Using a dataset of tournament players, we can train AIs that play Super Smash Bros. Melee in a competent and human-like manner.

Note: This page is a high-level overview; for technical details, please see this post.

Training Description

Project Nabla is trained using deep neural networks. At a high level, training proceeds in two stages. First, the agent learns from a dataset of Slippi replays recorded on tournament setups, imitating exactly what the human players do. At this stage it has no conception of which moves are “good” or “bad”; it simply tries to copy the humans. Second, it plays against itself and is rewarded when it deals damage or takes a stock, so actions that lead to more reward are reinforced and competence rises. Because its opponent is always a copy of itself, the challenge scales with its own skill as it improves, producing increasingly competent agents.

Videos

IBDW & KJH react to the bot

Toph plays against 2 frame delay Falco & Fox

VOD of random Twitch viewers playing against it

Follow @otter_collapse for updates; I may make a video at some point :)

FAQ

How’s this different from altf4’s SmashBot project? How’s this different from Vlad Firoiu’s Phillip project? altf4’s project is not trained at all; its behavior is hand-coded with heuristics. As a result it looks much less human-like (though it’s still very entertaining, and human-likeness isn’t the point of their project!). Vlad’s Phillip is the most similar project; the difference is that our agent starts off learning from Slippi replays, while Phillip starts from random actions. Note that he is now also pursuing an independent project with the same goal (his slippi-ai, linked below).

What’s the dataset it was trained on? What characters did it train against? Drive link. It includes nearly 100k SLP files from tournament setups, pruned to remove handwarmers, doubles, matches under 30 seconds, and other noise. Credit to altf4 for the original dataset and to Vlad Firoiu, whose filtered Fox dataset was used early on. It’s trained against all characters except Falcon, who was filtered out due to some (resolvable) bugs. No filtering or fine-tuning by APM or MMR was used.

Does it have delayed reaction times? Does it learn as it plays? It is trained with two frames of delay, while human reaction time is around 18 frames. Closing that gap is an area of future work; check out Vlad Firoiu’s previous work on this. It does not learn as it plays: it is trained once and deployed in a frozen state. (Technical note: it may have some meta-learning properties since it is trained using an LSTM, but I haven’t investigated this much.)

How can I play against it? Is the code available? I have been hosting it on my Twitch channel, @rakkob, while interest lasts; although it’s limited to one player at a time, it has reached thousands of games over a few weeks. The code is not going to be open-sourced at this time.

What are your next steps? How do I follow along? I am mainly focusing on other projects at the moment, but there is plenty of interesting future work: tackling the delayed reaction time, fine-tuning on pro players, training against old opponents, and more. Reach out to me if you’re interested in working on these ideas or others! Join the Slippi Discord’s #artificial-intelligence channel, and follow me on Twitter (@otter_collapse) for updates.

Thanks to: Fizzi and the Slippi team (Nikki and others) for Slippi and the FFW code that lets us speed up training (donate to Fizzi here); altf4 for libmelee and much more; Vlad Firoiu for the initial dataset, headless Dolphin and related code, and various discussions, ideas, and inspiration; Krohnos for the gecko code for endless time mode; and Aach, Lizardy, Raffle Winner, and Toph for playtesting early versions.

Slippi Discord (join #artificial-intelligence after getting a dev role): https://discord.com/invite/YRzDxT5

Libmelee: https://github.com/altf4/libmelee

Vlad’s libmelee fork and custom Dolphin with FFW and null video: https://github.com/vladfi1/libmelee/tree/dev and https://github.com/vladfi1/slippi-Ishiiruka/tree/exi-ai

Public SLP Database v3 (compiled by altf4): https://drive.google.com/file/d/1VqRECRNL8Zy4BFQVIHvoVGtfjz4fi9KC/view

Vlad’s open-source imitation learning project: https://github.com/vladfi1/slippi-ai and RL project: https://github.com/vladfi1/phillip

Research supported with Cloud TPUs from Google’s TPU Research Cloud (TRC).