
Viewer comments

@LeoJay  22/04/14 01:55  5517 likes

The AI getting scared and slowing down is kinda adorable lol

@DarkValorWolf  22/03/12 15:43  5424 likes

"and after 53 hours of learning, the AI gets this run" nice Wirtual reference there

@bgosl  22/03/22 19:49  4091 likes

Great video - I think your explanations and illustrations explain some tricky concepts in a super understandable way!

As to your issues at the end: I think maybe it's related to the way your rewards are structured. Looking at the illustration around 3:30, there's a massive reward associated with cutting a corner: while going along a straight bit of road, it's getting rewards like 1.4, 1.6, 1.7 - but once it cuts a corner, suddenly you get 8.7 in one step. So it makes a lot of sense that it learns to always cut corners aggressively, since that increases reward by a lot. But going quickly on the straights, which it seems it doesn't like to do, doesn't in itself carry all that much more positive reward. Since you're using discounted rewards to evaluate the expected reward of each action, you will see a slightly higher reward since you're moving further along - but relative to the rewards seen if it finds another corner to cut a little more, it's quite small. So it might just be favoring minor improvements to a corner cut over basically anything else, including just pushing the forward button on a straight.

I think restructuring your rewards could help. An obvious improvement would be to give rewards not relative to the midline of each block, but to place rewards along the optimal racing line - but at that point, are you even learning anything? You're just saying "you will get an increased reward if you follow my predetermined path", which to me isn't really learning. I think an intermediate step would be to place the rewards for each 90-degree corner at the inside corner of that block (maybe with a small margin from the actual edge): that should reduce the extreme impact of cutting corners vs going fast on straights, while still staying quite far from just indirectly providing the solution.

Also: unless you just didn't mention it, I don't think you have a negative reward at each timestep? That's typical for a "win, but as fast as possible" scenario, which is the case here. It would make sense as well: going in the right direction but super slowly is kind of like going backwards, so it should also be penalized. I think that might even eliminate the need for a negative reward for going backwards: by proxy, going backwards will always lead to taking more time, which leads to more negative rewards. You might even have to remove the negative reward for going backwards, as going backwards and going slowly might otherwise see the same net reward, which would leave the agent indifferent between the two. In the end, getting to the finish with less time spent will lead to the maximum reward.

Finally, of course: introducing the brake button would open up improved times - and might even let the agent learn some cool Trackmania tricks like drifting (tapping brake while steering) to go around corners faster. It does increase the action space though, which of course means longer training time. But something to consider if you want to iterate on this!

Regardless: I went to YouTube to procrastinate from my reinforcement learning course, and ended up using some of that knowledge anyway. I guess the algorithm now knows my interests a little _too well_.

PS: really well done on introducing exploring starts! When you got to that part of the video, I almost yelled "exploring starts!" at the screen, and then that's exactly what you decided to do. I'm curious if that was from knowing that exploring starts are a thing in RL, or if you just came up with the concept by thinking about it?
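The per-timestep penalty scheme this comment proposes could be sketched as follows (the penalty value, the terminal bonus, and the function name are illustrative assumptions, not taken from the video):

```python
def step_reward(progress_gain, finished, time_penalty=0.1, finish_bonus=100.0):
    """One timestep of reward: progress along the track, minus a flat
    time cost, plus a bonus on reaching the finish.

    Because every tick costs `time_penalty`, any policy that finishes
    sooner accumulates less total penalty - so slow driving (and, by
    proxy, driving backwards) is discouraged without needing a separate
    backwards penalty.
    """
    reward = progress_gain - time_penalty
    if finished:
        reward += finish_bonus
    return reward
```

With this shaping, total episode reward is maximized by the run that reaches the finish in the fewest timesteps, which matches the comment's "win, but as fast as possible" framing.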

@eL3ctric  22/03/13 12:58  3236 likes

Oh god, yes - finally someone who tackles the "my AI just learns the track layout" problem by adjusting the layout/starting position. Nice!

@Aoi_Haru763  22/05/14 07:12  1953 likes

"At one point, it even stops, as if it's afraid to continue. After a long minute, it finally decides to continue, and dies". Story of my life. I feel a connection between me and the AI. Empathy.

@noahhastings6145  22/04/27 00:00  1894 likes

*Turning* AI: "I got this." *Straights* AI: "🤷‍♂️ Guess I'll die"

@p3mikka709  22/03/12 15:31  1737 likes

"but then, the AI got this run" music starts playing

@DonatCallens  22/03/22 07:56  1189 likes

Suggestion: when you compare human runs versus AI runs, you immediately see a big difference, which is that humans make fewer corrections. The human driving style is infused with the biological constraint of energy preservation. I think we could improve the AI's learning greatly by adding a negative cost to the number of input changes the AI makes...
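The energy-preservation idea above can be sketched as a penalty on how many inputs flipped between consecutive timesteps (the coefficient, the function name, and the tuple encoding of actions are illustrative assumptions):

```python
def correction_cost(prev_action, action, coef=0.05):
    """Negative reward proportional to the number of inputs that
    changed since the previous timestep. Actions are assumed to be
    tuples of binary inputs, e.g. (accelerate, steer_left, steer_right).
    Holding the same inputs costs nothing; every flip costs `coef`.
    """
    changes = sum(a != b for a, b in zip(prev_action, action))
    return -coef * changes
```

Added to the per-step reward, this makes jittery, correction-heavy driving strictly worse than smooth driving that achieves the same progress.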

@petros4225  22/04/22 00:15  981 likes

I like how the AI figures out that by moving in a sinusoidal trajectory rather than a straight line, it covers more distance, thus generating more cumulative reward. Maybe you could penalize unnecessary steering somehow, to make it less wiggly 😜
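One way to realize the "penalize unnecessary steering" suggestion is a small quadratic cost on the steering input, so driving straight is free and the sinusoidal wiggle is no longer profitable (the coefficient and the assumption of a steering value in [-1, 1] are illustrative, not from the video):

```python
def steering_cost(steer, coef=0.02):
    """Small quadratic penalty on steering: zero when driving straight
    (steer == 0), growing with how hard the agent steers, so wiggling
    back and forth to rack up distance-based reward becomes a net loss.
    `steer` is assumed to lie in [-1, 1].
    """
    return -coef * steer ** 2
```

A quadratic rather than linear cost leaves gentle course corrections almost free while making full-lock oscillation comparatively expensive.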

@the_break1  22/03/12 15:16  836 likes

I really wonder how fast this AI would pass A01 and what its reaction would be on the final jump. Really cool stuff!

@marijnregterschot7009  22/03/12 16:54  818 likes

I think Trackmania is a great game for practicing machine learning. It has very basic inputs and the game is 100% deterministic. Most importantly, it's just satisfying to watch.

@BryceNewbury  22/03/12 15:53  390 likes

I really enjoyed the explanations of the different training methods paired with the excellent visuals. Keep up the good work, and I can’t wait to see what you try next!

@cparch1758  22/04/07 16:15  297 likes

I'm curious how adding walls would have affected the learning speed. Add barriers around the track, and subtract from the "reward" every time the car makes contact with a barrier.
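The barrier idea above could be sketched as a flat deduction on any timestep spent touching a wall (the cost value and function name are illustrative assumptions):

```python
def wall_adjusted_reward(progress_gain, touching_wall, wall_cost=2.0):
    """Base progress reward, minus a flat cost for any timestep spent
    in contact with a barrier - so lines that scrape the walls score
    worse than clean lines achieving the same progress."""
    return progress_gain - (wall_cost if touching_wall else 0.0)
```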

@jeromelageyre5287  22/03/12 15:17  252 likes

What a fun way to learn about machine learning and its variants! Very good video and editing! Very clear and accessible English! The return of yoshtm is more than a pleasure!

@June_auAlaska  22/08/11 13:12  219 likes

Never have I felt so much emotion for a programmed robot, but here we are.

@deathfoxstreams2542  22/03/22 03:32  216 likes

It would be cool to see a speedrun category based around learning AI.

@SelevanRsC  22/07/02 22:56  205 likes

I love how at 7:01 one car had such a good run that it was shocked at the end by how good it was, and got totally confused, lol

@TheStormyClouds  22/06/26 01:49  186 likes

I'm so happy that you did the randomized spawn points and speeds. I was worrying that you might simply be teaching the AI how to play a single map by it learning just pure inputs rather than seeing the actual turns and figuring out what to do. I was incredibly impressed with how many made it through the map with all sorts of jumps and terrain types.

@sjccsjcc  22/03/13 13:18  155 likes

i enjoyed this so much and the wirtual reference made it better, keep up the good work

@ToToMania  22/03/13 08:15  134 likes

I can't even think of how much time went into this video. Amazing visualizations, and a great AI of course. Very interesting to see the learning process. Great work!