Connect4 seems not working. Am I wrong? #36

verystrongjoe · 2020-04-24T06:18:18Z

I am running your code on the connect4 game. It has already run for more than 250k steps, but the reward value keeps declining and is approaching the bottom of its range.

werner-duvaud · 2020-04-24T14:59:08Z

Hi,

Running MuZero on connect4 requires a lot of computing power that we don't have yet. The default hyperparameters may not allow good learning.

We quickly tested it today with a slightly larger number of blocks and a larger replay buffer, and it seems to learn slowly.

What exactly do you mean by the reward value?
How many self-played games do you have for 250k training steps?
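
To make the second question concrete, here is a minimal sketch of the ratio being asked about: training steps per self-played game. The function name `steps_per_game` and the game count in the example are hypothetical; only the 250k figure comes from this thread. A very high ratio would suggest the network is training many times on a small pool of games.

```python
def steps_per_game(training_steps: int, self_played_games: int) -> float:
    """Return the average number of training steps per self-played game."""
    if self_played_games <= 0:
        raise ValueError("self_played_games must be positive")
    return training_steps / self_played_games


if __name__ == "__main__":
    # 250_000 is the step count reported in this thread;
    # 5_000 is a made-up game count used only for illustration.
    ratio = steps_per_game(250_000, 5_000)
    print(f"training steps per self-played game: {ratio:.1f}")
```

Both numbers can usually be read off the training logs; what counts as a healthy ratio depends on the game and the hyperparameters.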

fidel-schaposnik · 2020-04-27T16:29:34Z

If you haven't done so already, you may want to check https://medium.com/oracledevs/lessons-from-alpha-zero-part-6-hyperparameter-tuning-b1cfcbe4ca9a and the previous articles in that series to get an idea of the hyperparameter values you could use (many translate directly from AlphaZero).

ahainaut closed this as completed Apr 29, 2020
