xysun/rl-algorithms

DQN, REINFORCE, actor-critic, Q-learning, SARSA, Monte Carlo prediction & control, policy & value iteration

A collection of my implementations of reinforcement learning algorithms.

Deep RL

  • DQN
    • Use a negative reward to penalise the terminal state.
    • Let TensorFlow do as much batch processing as possible (I was running inference on each sample of a training batch sequentially, which wasted a lot of time).
    • During the Q-target update, use the network's current weights to compute Q(s_{t+1}), not the weights in effect when that particular observation was recorded.
    • Provide the full action space to training! MSE(q_update, max(prediction)) is wrong, because max(prediction) can come from a different action than the one recorded in the experience and used for the Q update; the loss must compare the target against the Q-value of the action actually taken (see the sketch after this list).
    • Smoothed performance over episodes (the lighter blue line is unsmoothed): [performance plot]
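
To make the batching, target-weight, and action-masking points concrete, here is a minimal sketch of a single DQN training step. It assumes a Keras-style Q-network `model` mapping a batch of states to per-action Q-values, a `tf.keras` optimizer, and replay tuples of (state, action, reward, next_state, done); the function name, the discount value, and the batch layout are illustrative, not the exact code in this repo.

```python
import tensorflow as tf

GAMMA = 0.99  # discount factor (illustrative value)

def dqn_batch_update(model, optimizer, batch):
    """One DQN gradient step over a replay batch of numpy arrays."""
    states, actions, rewards, next_states, dones = batch
    rewards = tf.cast(rewards, tf.float32)
    dones = tf.cast(dones, tf.float32)

    # One batched forward pass over the whole batch instead of
    # per-sample inference. Q(s_{t+1}) is computed with the network's
    # *current* weights, not the weights from when the transition
    # was stored.
    next_q = model(next_states)                    # (B, n_actions)
    max_next_q = tf.reduce_max(next_q, axis=1)     # (B,)

    # No bootstrap term for terminal states; a negative terminal
    # reward can be baked into `rewards` when the transition is stored.
    targets = rewards + GAMMA * max_next_q * (1.0 - dones)

    with tf.GradientTape() as tape:
        q = model(states)                          # (B, n_actions)
        # Compare the target against the Q-value of the action that
        # was actually taken; using max(prediction) here would mix in
        # other actions and corrupt the loss.
        mask = tf.one_hot(actions, q.shape[-1])
        q_taken = tf.reduce_sum(q * mask, axis=1)
        loss = tf.reduce_mean(tf.square(targets - q_taken))

    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return float(loss)
```

Masking with `tf.one_hot` keeps the full action space in the forward pass while only the recorded action contributes to the loss, which is the fix for the MSE(q_update, max(prediction)) mistake described above.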

Classical RL
