
Update README.md

1 parent 5ff8ff6 commit eb35b8fc3a6b4776087c1e77223d6fb519c49e1d @yenchenlin committed Apr 17, 2016
Showing with 1 addition and 1 deletion.
  1. +1 −1 README.md
@@ -76,7 +76,7 @@ At first, I initialize all weight matrices randomly using a normal distribution
I start training by choosing actions uniformly at random for the first 10,000 time steps, without updating the network weights. This allows the system to populate the replay memory before training begins.
-Note that unlike [1], which initialize ϵ = 1, I linearly anneal ϵ from 0.1 to 0.0001 over the course of the next 3000,000 frames. The reason why I set it this way is that agent can choose an action every 0.03s (FPS=30) in our game, high ϵ will make it **flap** too much and thus keeps itself at the top of the game screen and finally bump the pipe clumsy. This condition will make Q function converge relatively slow since it only start to look other conditions when ϵ is low.
+Note that unlike [1], which initializes ϵ = 1, I linearly anneal ϵ from 0.1 to 0.0001 over the course of the next 3,000,000 frames. The reason is that the agent can choose an action every 0.03s (FPS=30) in our game, so a high ϵ will make it **flap** too much, keeping it at the top of the game screen until it finally bumps into a pipe in a clumsy way. This makes the Q function converge relatively slowly, since the agent only starts to explore other conditions once ϵ is low.
However, in other games, initializing ϵ to 1 is more reasonable.
During training time, at each time step, the network samples minibatches of size 32 from the replay memory to train on, and performs a gradient step on the loss function described above using the Adam optimization algorithm with a learning rate of 0.000001. After annealing finishes, the network continues to train indefinitely, with ϵ fixed at 0.001.
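
The exploration schedule described above can be sketched roughly as follows. This is only an illustration, not the repository's actual code: the constant names and the `epsilon_at` helper are hypothetical, the numbers are taken from the text, and treating the initial random-action phase as ϵ = 1 is a modeling assumption.

```python
# Hypothetical names; the values come from the README text.
OBSERVE_STEPS = 10_000       # steps of uniformly random actions, no weight updates
EXPLORE_FRAMES = 3_000_000   # frames over which epsilon is linearly annealed
INITIAL_EPSILON = 0.1        # epsilon when annealing starts
FINAL_EPSILON = 0.0001       # epsilon when annealing ends


def epsilon_at(step):
    """Return the exploration rate used to pick an action at time step `step`."""
    if step < OBSERVE_STEPS:
        # Assumption: "uniformly at random" is modeled as epsilon = 1.
        return 1.0
    # Fraction of the annealing window completed, clamped to [0, 1].
    progress = min(1.0, (step - OBSERVE_STEPS) / EXPLORE_FRAMES)
    # Linear interpolation from INITIAL_EPSILON down to FINAL_EPSILON.
    return INITIAL_EPSILON + progress * (FINAL_EPSILON - INITIAL_EPSILON)


if __name__ == "__main__":
    for step in (0, 10_000, 1_510_000, 3_010_000):
        print(step, epsilon_at(step))
```

Starting the annealing at 0.1 rather than 1 keeps random flaps rare even early in training, which is the motivation given in the paragraph above.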
