A3C code and models for Atari games in gym

Multi-GPU version of the A3C algorithm in Asynchronous Methods for Deep Reinforcement Learning.

Results of this code trained on 47 different Atari games were uploaded to OpenAI Gym and are available for download. Most of them were the best reproducible results on gym. However, OpenAI later removed the leaderboard from its site.

To train on an Atari game:

./ --env Breakout-v0 --gpu 0

In each iteration it trains on a batch of 128 new states. The speed is about 20 iterations/s (2.5k images/s) on 1 V100 GPU plus 12+ CPU cores. Note that the network architecture is larger than what's used in the original paper.

The pretrained models were all trained with 4 GPUs for about 2 days. On simple games like Breakout, however, you can get decent performance within several hours. For example, it takes only 2 hours on a V100 to reach an average score of 400 on Breakout.
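The throughput figures above can be checked with a little arithmetic. The sketch below only multiplies the numbers quoted in the text; the function and constant names are illustrative and do not come from the repository, and the 2-hour total assumes the speed stays constant, which real training may not do.

```python
# Figures quoted in the text above; names are illustrative, not from the repository.
BATCH_SIZE = 128         # new states consumed per training iteration
ITERATIONS_PER_SEC = 20  # approximate speed on 1 V100 plus 12+ CPU cores


def states_per_second(batch_size=BATCH_SIZE, iterations_per_sec=ITERATIONS_PER_SEC):
    """Training throughput in states (images) per second."""
    return batch_size * iterations_per_sec


def states_seen(hours):
    """States consumed in `hours` of training, assuming constant speed."""
    return states_per_second() * hours * 3600


if __name__ == "__main__":
    print(states_per_second())  # 2560, i.e. roughly 2.5k images/s
    print(states_seen(2))       # 18432000, about 18M states in 2 hours
```

Under these assumptions, the 2-hour Breakout run mentioned above corresponds to roughly 18 million training states.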

Some practical notes:

  1. Prefer Python 3; Windows not supported.
  2. Training at a significantly slower speed (e.g. on CPU) will result in very bad scores, probably because of the slightly off-policy implementation.
  3. Occasionally, processes may not get terminated completely. If you're using Linux, install python-prctl to prevent this.

To test a model:

Download models from model zoo.

Watch the agent play: ./ --task play --env Breakout-v0 --load Breakout-v0.npz

Dump some videos: ./ --task dump_video --load Breakout-v0.npz --env Breakout-v0 --output output_dir --episode 3

This table lists the available pretrained models and their scores (averaged over 100 episodes), with their submission links. The old submission site is no longer maintained, so the links may become invalid at any time.

AirRaid(2727) Alien(2611) Amidar(1376) Assault(3397)
Asterix(407432) Asteroids(1965) Atlantis(217186) BankHeist(1274)
BattleZone(29210) BeamRider(5972) Berzerk(2289) Breakout(667)
Carnival(5211) Centipede(2909) ChopperCommand(6031) CrazyClimber(105297)
DemonAttack(33992) DoubleDunk(23) ElevatorAction(11377) FishingDerby(34)
Frostbite(6824) Gopher(22595) Gravitar(2144) IceHockey(19)
Jamesbond(640) JourneyEscape(-407) Kangaroo(6540) Krull(6100)
KungFuMaster(34767) MsPacman(5738) NameThisGame(15321) Phoenix(75312)
Pong(21) Pooyan(5607) Qbert(20182) Riverraid(14185)
RoadRunner(60615) Robotank(60) Seaquest(46890) SpaceInvaders(3454)
StarGunner(93480) Tennis(23) Tutankham(275) UpNDown(92163)
VideoPinball(140156) WizardOfWor(3824) Zaxxon(32894)
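Each score in the table is a mean over 100 episodes. As a minimal sketch of that kind of evaluation (not the repository's actual evaluation code), the helper below averages the returns of a caller-supplied `play_episode` function; both `average_score` and `play_episode` are hypothetical names introduced here for illustration.

```python
from statistics import mean


def average_score(play_episode, episodes=100):
    """Run `play_episode` `episodes` times and return the mean episode score.

    `play_episode` is a hypothetical zero-argument callable that plays one
    full episode and returns its total score.
    """
    scores = [play_episode() for _ in range(episodes)]
    return mean(scores)
```

Because single-episode scores on Atari games can vary widely, averaging over many episodes gives a more stable number than any one run.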

All models above were trained with the -v0 variant of Atari games. Note that this variant is quite different from the setting used in DeepMind papers, so the scores are not directly comparable. The most notable differences are:

  • Each action is randomly repeated 2~4 times.
  • Inputs are RGB instead of greyscale.
  • An episode is limited to 60000 steps.
  • Loss of a life does not end the episode.
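To make the first and third differences concrete, here is a small self-contained sketch of random action repeat and an episode step cap. It does not use gym: `ToyEmulator` is a hypothetical stand-in, the repeat count is assumed to be drawn uniformly from {2, 3, 4}, and a "step" is assumed to mean one agent decision rather than one emulator frame.

```python
import random

MAX_EPISODE_STEPS = 60000  # episode cap from the list above


class ToyEmulator:
    """Hypothetical stand-in for an Atari emulator (not a gym API).

    Every frame yields a reward of 1 and the game never ends by itself,
    so only the step cap can terminate an episode.
    """

    def __init__(self):
        self.frames = 0

    def step_frame(self, action):
        self.frames += 1
        return 1.0, False  # (reward, game_over)


def step_with_random_repeat(emulator, action, rng):
    """Apply `action` for k frames, k drawn uniformly from {2, 3, 4}."""
    repeat = rng.randint(2, 4)  # randint is inclusive on both ends
    total_reward, game_over = 0.0, False
    for _ in range(repeat):
        reward, game_over = emulator.step_frame(action)
        total_reward += reward
        if game_over:  # only a real game over stops early, not a lost life
            break
    return total_reward, game_over, repeat


def run_episode(emulator, policy, rng, max_steps=MAX_EPISODE_STEPS):
    """Play until game over or until `max_steps` agent steps have elapsed."""
    score, steps, game_over = 0.0, 0, False
    while not game_over and steps < max_steps:
        reward, game_over, _ = step_with_random_repeat(emulator, policy(), rng)
        score += reward
        steps += 1
    return score, steps
```

For example, `run_episode(ToyEmulator(), lambda: 0, random.Random(0), max_steps=10)` runs exactly 10 agent steps, each covering 2 to 4 emulator frames. The random repeat makes the environment stochastic from the agent's point of view, which is one reason scores on this variant are not comparable to those obtained with a fixed frame skip.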

Also see the DQN implementation in tensorpack
