PyTorch implementation of REINFORCE, a Monte Carlo, policy-network-based reinforcement learning algorithm.
Compared to value-based algorithms like DQN, policy-based methods directly output a probability distribution over all possible actions. To train such a network, Monte Carlo sampling is used to collect trajectories. Each trajectory has an associated reward, and we use gradient ascent to update the network in the direction that maximizes the expected reward.
REINFORCE lets the agent play a whole episode and updates the weights using the rewards obtained, so that the expected reward is maximized. The expected reward is:
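$$J(\theta) = \mathbb{E}_{\tau \sim \pi_\theta}\left[R(\tau)\right] = \sum_{\tau} P(\tau;\theta)\,R(\tau)$$

where, in standard REINFORCE notation, $\tau$ is a trajectory sampled from the policy $\pi_\theta$, $P(\tau;\theta)$ is its probability under the current policy, and $R(\tau)$ is its total reward.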
REINFORCE maximizes the above expectation by gradient ascent, and a few tricks are used to obtain the gradient.
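The main trick is the log-derivative identity $\nabla_\theta P(\tau;\theta) = P(\tau;\theta)\,\nabla_\theta \log P(\tau;\theta)$, which turns the gradient of the expectation into an expectation of gradients that sampled trajectories can estimate; the environment dynamics do not depend on $\theta$ and cancel out, leaving only the policy terms:

$$\nabla_\theta J(\theta) = \mathbb{E}_{\tau \sim \pi_\theta}\!\left[R(\tau)\sum_{t}\nabla_\theta \log \pi_\theta(a_t \mid s_t)\right]$$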
- agent.py: the REINFORCE agent
- network.py: the policy network (a rough sketch follows below)
- train.py: training entry point
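For illustration, here is a minimal sketch of what the policy network in network.py and the per-episode update in agent.py might look like; the actual class names and hyperparameters in this repository may differ:

```python
import torch
import torch.nn as nn

class PolicyNetwork(nn.Module):
    """Maps a state to a probability distribution over actions."""
    def __init__(self, state_dim, n_actions, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_actions),
            nn.Softmax(dim=-1),
        )

    def forward(self, state):
        return self.net(state)

def reinforce_loss(log_probs, rewards, gamma=0.99):
    """Monte Carlo policy-gradient loss for one episode.

    log_probs: list of log pi(a_t | s_t) tensors collected during the episode
    rewards:   list of scalar rewards r_t
    """
    # Discounted return G_t for every step, computed backwards from the end.
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.insert(0, g)
    returns = torch.tensor(returns)
    # Normalizing returns is a common variance-reduction trick.
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)
    # Gradient ascent on expected return = descent on -sum(log_prob * G_t).
    return -(torch.stack(log_probs) * returns).sum()
```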
To start training, first modify the environment name in train.py, then run:
python train.py
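Under the hood, a REINFORCE training loop is typically structured like the following hypothetical sketch (it reuses `PolicyNetwork` and `reinforce_loss` from above and assumes the classic Gym API, where `env.step` returns four values; newer Gymnasium returns five):

```python
import gym
import torch
from torch.distributions import Categorical

env = gym.make("CartPole-v0")
policy = PolicyNetwork(env.observation_space.shape[0], env.action_space.n)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

for episode in range(1000):
    state, log_probs, rewards, done = env.reset(), [], [], False
    while not done:
        # Sample an action from the current policy's distribution.
        probs = policy(torch.as_tensor(state, dtype=torch.float32))
        dist = Categorical(probs)
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        state, reward, done, _ = env.step(action.item())
        rewards.append(reward)
    # One gradient update per complete episode (Monte Carlo).
    loss = reinforce_loss(log_probs, rewards)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```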
The reward for every episode is recorded with TensorBoard; to view the curves, run:
tensorboard --logdir=runs
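Logging of this kind usually amounts to a `SummaryWriter` writing one scalar per episode; a self-contained example (with dummy reward values and a hypothetical tag name, which may differ from what train.py uses) is:

```python
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter()  # writes under ./runs/ by default, matching --logdir=runs
for episode, episode_reward in enumerate([10.0, 25.0, 200.0]):  # dummy values
    writer.add_scalar("episode_reward", episode_reward, episode)
writer.close()
```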
CartPole-v0 involves a pole attached to a cart, and the agent must push the cart left or right to keep the pole from falling. The agent receives a reward of 1 for every step the pole remains upright, up to a maximum of 200 steps.
The following plot shows the loss with respect to episodes:
And a demo of the trained agent:
Pong-v0 is an Atari 2600 game in which the agent plays against a built-in computer opponent. The final reward is the agent's score minus the computer's score.