GitHub - timovelten/2048: Playing 2048 with on-policy reinforcement learning. Featuring a custom PPO implementation with action masking.

This is my attempt at using reinforcement learning to solve the game 2048.

It features:

A gym.Env to play 2048, where performance-critical parts are compiled with Numba (this matters because most of the training time is actually spent simulating environments).
A custom, somewhat generic PPO implementation (loosely based on the Stable Baselines3 implementation).
Action masking. In my testing without masking, the agent really struggled to learn which moves were legal, and this noticeably hindered progress.
A simple reward function. A natural choice of reward function for 2048 is to award the agent the score delta between two states (i.e. merging two 1024 tiles into one 2048 tile would give a reward of 2048.0). However, this did not work very well in practice. I assume this is because these rewards grow exponentially with tile size. The reward function that I currently use is (score delta) / (sum of all tiles on the board), and the agent is awarded a small bonus for actually achieving the 2048 tile. Critically, I do not reward any hand-picked features (for example, giving a reward for putting larger tiles into the corners).
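
To make the last two items more concrete, here is a minimal NumPy sketch of how a legal-move mask and the normalised reward could be computed. It is not the code from this repository: the function names, the action encoding (0 = left, 1 = up, 2 = right, 3 = down), the bonus value of 1.0, and the rule for when the bonus is paid are all assumptions made for illustration, and the random tile spawn is left out.

```python
import numpy as np


def slide_row_left(row):
    """Slide one row to the left, merging equal neighbours once.

    Returns (new_row, score_gained), where the score is the sum of the
    values of the tiles created by merging.
    """
    tiles = [t for t in row if t != 0]
    merged, score, i = [], 0, 0
    while i < len(tiles):
        if i + 1 < len(tiles) and tiles[i] == tiles[i + 1]:
            merged.append(2 * tiles[i])
            score += 2 * tiles[i]
            i += 2
        else:
            merged.append(tiles[i])
            i += 1
    merged += [0] * (len(row) - len(merged))
    return merged, score


def move(board, action):
    """Apply an action (hypothetical encoding: 0=left, 1=up, 2=right, 3=down).

    The board is rotated so that every action becomes "slide left".
    No new tile is spawned here.
    """
    rotated = np.rot90(board, k=action)
    rows, gained = [], 0
    for row in rotated:
        new_row, score = slide_row_left(row.tolist())
        rows.append(new_row)
        gained += score
    moved = np.rot90(np.array(rows, dtype=board.dtype), k=-action)
    return moved, gained


def action_mask(board):
    """A move counts as legal only if it changes the board."""
    return np.array([not np.array_equal(move(board, a)[0], board) for a in range(4)])


def masked_policy(logits, mask):
    """Softmax over legal actions only (assumes at least one legal action)."""
    masked = np.where(mask, logits, -np.inf)
    exp = np.exp(masked - masked.max())
    return exp / exp.sum()


def reward(board, action, bonus=1.0):
    """(score delta) / (sum of all tiles), plus an assumed bonus for the first 2048 tile."""
    moved, gained = move(board, action)
    r = gained / board.sum()  # sliding and merging do not change the tile sum
    if moved.max() >= 2048 > board.max():
        r += bonus  # bonus size and trigger are assumptions
    return float(r)


board = np.array([[2, 2, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]])
mask = action_mask(board)
probs = masked_policy(np.zeros(4), mask)
```

On this example board, moving up is illegal because both tiles already sit in the top row, so the masked policy assigns it zero probability. With this normalisation, merging the two 2 tiles and merging two 1024 tiles on an otherwise empty board both yield a ratio of 1.0, which illustrates how dividing by the tile sum keeps rewards on a similar scale throughout the game.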

Training

With the current hyperparameters, training is rather slow. It takes around 70 000 000 timesteps for the agent to achieve the 2048 tile somewhat consistently (around 15% of the time in validation). After training for much longer (around 800 000 000 timesteps, which took 10 hours), the agent achieves the 2048 tile in 75% of attempts and the 4096 tile in around 15% of cases.
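
As a rough back-of-the-envelope check on these numbers, the snippet below derives the implied simulation throughput. It assumes the throughput was constant over the whole 10-hour run, which the figures above do not confirm, so the time to the first milestone is only an estimate.

```python
total_steps = 800_000_000      # timesteps of the long run
hours = 10                     # wall-clock time of the long run
milestone_steps = 70_000_000   # timesteps until 2048 is reached ~15% of the time

# Assumption: constant throughput for the whole run.
steps_per_second = total_steps / (hours * 3600)
minutes_to_milestone = milestone_steps / steps_per_second / 60

print(f"{steps_per_second:,.0f} steps/s")            # 22,222 steps/s
print(f"{minutes_to_milestone:.1f} min to milestone")  # 52.5 min to milestone
```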

Demo

Watch the model achieve the 4096 tile (using the weights stored under weights/) here. You can watch it play in your own browser by following the instructions in misc/play_browser.py.

demo.webm

Folders and files

.vscode
misc
ppo
weights
.gitignore
LICENSE
README.md
train.py
trajectory_writer.py
twentyfortyeight.py
util.py
