2048 Temporal Difference Learning

A 2048 agent with N-Tuple Network trained using Backward Temporal Coherence Learning.

Benchmark (Intel® Core™ i5-8300H Processor)

Score

1 ply search is the trained model without any tree search algorithm

3 ply is depth 2 expectimax search with the trained model for evaluation function

Depth	Games	Score	% 32768	% 16384	% 8192
1 ply	100000	201381	3.62	37.14	73.86
3 ply	1000	419479	35.7	80.2	95.5

You can achieve similar results with:

# Build the program with the 4x6 tuple network
make STRUCTURE=nw4x6

# Train the network with 1000000 episodes, α = 1.0, λ = 0.5, TC and restart strategy
./2048 train -e 1000 -a 1.0 -l 0.5 -c -o -r

# Run the agent for 10000 games with no search
./2048 agent -e 10000

# Run the agent for 300 games with 3 ply search
./2048 agent -e 300 -d 2

Tuple networks

Structure	Size	Max tile	Speed (1 thread)	Speed (8 threads)	Speed (3 ply, 8 threads)
nw5x4	1.25MB	8192	5901430 moves/s	20643753 moves/s	40542 moves/s
nw4x5	16MB	16384	5560291 moves/s	18563599 moves/s	50428 moves/s
nw6x5	24MB	16384	3905823 moves/s	12509119 moves/s	37908 moves/s
nw4x6	256MB	32768	2553369 moves/s	8381214 moves/s	40051 moves/s
nw8x6	512MB	32768	1525958 moves/s	4279976 moves/s	19764 moves/s

Features

Backward TD(λ) learning.
N-Tuple Network with configurable structure.
Optional temporal coherence learning and restart strategy.
Expectimax search with configurable depth.
Webview based GUI application.
Multi-threaded training and evaluation.

To achieve high speed and fast learning, both the agent and training code are heavily optimized:

64-bit bitboard representation.
Table lookup for movement and reward.
Transposistion table with Zobrist Hash.
Bit optimizations.
Efficient N-Tuple Network implementation with static structure.

Usage

Download and unzip the trained model (8x6tuple network) here.

Build

GCC and GNU Make are required to build the program. Currently, the makefile only supports Linux. Building the program on other platforms should be possible, but not tested. Cross platform build script is planned.

To build the GUI, you need to install the GTK+ development libraries and webkit2gtk-4.0 and enable the submodule.

Building the program on Google Colaboratory is also supported if you disable the GUI.

Build steps

make [Options]

Parameters:

STRUCTURE: The tuple network structure. (default: nw4x6)
ENABLE_GUI: Enable the GUI. (default: true)
EXTRAS: Extra compiler options for profiling, etc.

Available structures: (See benchmark)

Train model

./2048 train [Options]

Parameters:

-a <alpha>    -- Set the learning rate
                 default: 0.1
-l <lambda>   -- Set the trace decay
                 default: 0.5
-e <episodes> -- Set the number of training games * 1000
                 default: 1
-t <threads>  -- Set the number of threads
                 default: 1 (0 uses all threads)
-i            -- Enable reading from a binary file
-o            -- Enable writing to a binary file
-c            -- Enable temporal coherence learning
-r            -- Enable restart strategy
-h            -- Show this message

Run agent

./2048 agent [Options]

Parameters:

-d <depth>    -- Set the search depth
                 default: 0
-e <episodes> -- Set the number of games to play
                 default: 1
-t <threads>  -- Set the number of threads
                 default: 1 (0 uses all threads)
-g            -- Enable the GUI
-h            -- Show this message

Example:

./agent -d2 -i100 -t 0 # 3 ply, 100 games, multi-threaded
./agent -d4 -g         # 5 ply, enable GUI

Example game with the GUI:

Todo

Add multi-stage learning
Add more settings to the GUI application

License

This app is licensed under the MIT license.

Name		Name	Last commit message	Last commit date
Latest commit History 72 Commits
src		src
.clang-format		.clang-format
.clangd		.clangd
.gitignore		.gitignore
LICENSE.MD		LICENSE.MD
gui.png		gui.png
makefile		makefile
readme.md		readme.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

2048 Temporal Difference Learning

Benchmark (Intel® Core™ i5-8300H Processor)

Score

Tuple networks

Features

Usage

Build

Build steps

Train model

Run agent

Todo

License

About

Releases

Languages

License

ziap/2048-tdl

Folders and files

Latest commit

History

Repository files navigation

2048 Temporal Difference Learning

Benchmark (Intel® Core™ i5-8300H Processor)

Score

Tuple networks

Features

Usage

Build

Build steps

Train model

Run agent

Todo

License

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Languages