I am running your code on the game connect4. It has already run more than 250k training steps, but the reward value keeps declining and is approaching its minimum.
Hi,
Running MuZero on connect4 requires a lot of computing power that we don't have yet. The default hyperparameters may not allow good learning.
We ran a quick test today (with slightly more residual blocks and a larger replay buffer), and it seems to learn, but slowly.
What do you mean by the reward value? How many self-played games do you have for 250k training steps?
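The number of self-played games matters because it sets how often the network revisits the same data. If many training steps are taken per game, each stored position is sampled many times, which can lead to overfitting on a small replay buffer. The sketch below computes that ratio; the function names and all inputs are hypothetical and are not taken from the project's code, so read the real counters from your own training logs.

```python
def steps_per_game(training_steps: int, self_play_games: int) -> float:
    """Optimization steps taken per self-played game (hypothetical helper)."""
    if self_play_games <= 0:
        raise ValueError("self_play_games must be positive")
    return training_steps / self_play_games


def samples_per_position(training_steps: int, batch_size: int,
                         total_positions: int) -> float:
    """Average number of times each stored position is drawn for training.

    Assumes uniform sampling from a replay buffer holding total_positions.
    """
    if total_positions <= 0:
        raise ValueError("total_positions must be positive")
    return training_steps * batch_size / total_positions


if __name__ == "__main__":
    # Illustrative numbers only, not measurements from this issue.
    print(steps_per_game(250_000, 5_000))
    print(samples_per_position(250_000, 128, 30_000))
```

A ratio in the hundreds or thousands suggests that more self-play games (or a larger replay buffer) are needed relative to the number of training steps.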
If you haven't done so already, you may want to check https://medium.com/oracledevs/lessons-from-alpha-zero-part-6-hyperparameter-tuning-b1cfcbe4ca9a and the previous articles in that series to get an idea of the hyperparameter values you could use (many translate directly from AlphaZero).