Commit: Initial code release of the BDPI algorithm
steckdenis committed Sep 12, 2018
0 parents commit 6832906
Showing 11 changed files with 1,888 additions and 0 deletions.
4 changes: 4 additions & 0 deletions .gitignore
@@ -0,0 +1,4 @@
__pycache__
out-*
commands_*
*.episode
674 changes: 674 additions & 0 deletions COPYING

(License text not shown.)

27 changes: 27 additions & 0 deletions README.md
@@ -0,0 +1,27 @@
# Sample-Efficient Reinforcement Learning with Bootstrapped Dual Policy Iteration

This repository contains the complete implementation of the Bootstrapped Dual Policy Iteration (BDPI) algorithm we developed over the past year and a half, along with scripts to re-run and re-plot our experiments.

## Organization

The following components are available in this repository:

* The complete Python 3 source code of our algorithm;
* Two OpenAI Gym environments: Five Rooms and Table (FrozenLake8x8 is available in the OpenAI Gym);
* Scripts to re-run all our experiments in a fully automated way.

The files are organized as follows:

* `gym_envs/`: Gym environments.
* `main.py`: A simple RL agent that performs actions in a Gym environment and learns using BDPI.
* `bdpi.py`: The BDPI learning algorithm (actor, critics, and glue between them).
* `experiments_gym.sh`: A script that produces a job description for a given environment. Run an experiment with `./experiments_gym.sh table && cat commands_table.sh | parallel -jCORES`.
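The control flow in `main.py` is the standard Gym agent loop: observe, act, collect the reward, and let the learner update. A minimal sketch of that loop, using a hypothetical toy environment and a fixed policy as stand-ins (BDPI itself lives in `bdpi.py` and the real environments in `gym_envs/`):

```python
class ToyEnv:
    """Hypothetical stand-in mimicking the Gym env API (reset/step)."""

    def __init__(self, horizon=10):
        self.horizon = horizon
        self.action_space = [0, 1]

    def reset(self):
        self.t = 0
        return self.t  # initial observation

    def step(self, action):
        self.t += 1
        reward = 1.0 if action == 1 else 0.0
        done = self.t >= self.horizon
        return self.t, reward, done, {}  # obs, reward, done, info


def run_episode(env, policy):
    """One episode of the observe-act-learn loop main.py implements."""
    obs, done, total = env.reset(), False, 0.0
    while not done:
        action = policy(obs)  # BDPI's actor would choose the action here
        obs, reward, done, _ = env.step(action)
        total += reward  # BDPI's critics would also learn from this transition
    return total


env = ToyEnv()
print(run_episode(env, lambda obs: 1))  # always act: returns 10.0
```

The episode return printed above comes from the toy reward function, not from any BDPI experiment.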

## Dependencies

Reproducing our results requires a computer with the following components:

* A recent Linux distribution
* Python 3, with `lzo` and PyTorch
* GNU Parallel
* Gnuplot
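Before launching the experiments, it can help to confirm these dependencies are on the `PATH`. A small hedged check (the exact binary and module names are assumptions based on the list above):

```python
import importlib.util
import shutil

# Command-line tools the experiment scripts rely on (names assumed)
for cmd in ("python3", "parallel", "gnuplot"):
    status = "found" if shutil.which(cmd) else "MISSING"
    print(f"{status}: {cmd}")

# Python modules: PyTorch and the lzo bindings (module names assumed)
for mod in ("torch", "lzo"):
    status = "found" if importlib.util.find_spec(mod) else "MISSING"
    print(f"{status}: {mod}")
```

A `MISSING` line means the corresponding package should be installed through your distribution's package manager or `pip` before running `experiments_gym.sh`.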
