Commit
Initial code release of the BDPI algorithm
0 parents
commit 6832906
Showing 11 changed files with 1,888 additions and 0 deletions.
@@ -0,0 +1,4 @@
__pycache__
out-*
commands_*
*.episode
@@ -0,0 +1,27 @@
# Sample-Efficient Reinforcement Learning with Bootstrapped Dual Policy Iteration

This repository contains the complete implementation of the Bootstrapped Dual Policy Iteration (BDPI) algorithm we developed over the past year and a half. The repository also contains scripts to re-run and re-plot our experiments.

## Organization

The following components are available in this repository:

* The complete Python 3 source code of our algorithm;
* Two OpenAI Gym environments: Five Rooms and Table (FrozenLake8x8 is available in the OpenAI Gym);
* Scripts to re-run all our experiments in a fully automated way.

The files are organized as follows:

* `gym_envs/`: Gym environments.
* `main.py`: A simple RL agent that performs actions in a Gym environment, and learns using BDPI.
* `bdpi.py`: The BDPI learning algorithm (actor, critics, and glue between them).
* `experiments_gym.sh`: A script that produces a job description for a given environment. Run an experiment with `./experiments_gym.sh table && cat commands_table.sh | parallel -jCORES`, where `CORES` is the number of jobs to run in parallel.
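
As a rough illustration of what "an agent that performs actions in a Gym environment" means, here is a minimal sketch of an agent-environment loop. It is not the code in `main.py`: `ToyCorridor` and `run_episode` are hypothetical names, the environment is a stand-in that only mimics the classic Gym `reset()`/`step()` interface, and a fixed or random policy takes the place of BDPI.

```python
import random


class ToyCorridor:
    """Hypothetical stand-in for a Gym environment (classic reset/step API)."""

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        # Start every episode at the left end of the corridor
        self.position = 0
        return self.position

    def step(self, action):
        # Action 1 moves right, any other action moves left; walls clamp the position
        delta = 1 if action == 1 else -1
        self.position = min(max(self.position + delta, 0), self.length - 1)
        done = self.position == self.length - 1
        reward = 1.0 if done else 0.0
        return self.position, reward, done, {}


def run_episode(env, policy, max_steps=100):
    """Run one episode and return (cumulative reward, number of steps taken)."""
    state = env.reset()
    total_reward = 0.0
    steps = 0
    for steps in range(1, max_steps + 1):
        action = policy(state)
        state, reward, done, _info = env.step(action)
        total_reward += reward
        # A learning agent would record the transition here and update itself
        if done:
            break
    return total_reward, steps


if __name__ == "__main__":
    rng = random.Random(0)
    total, steps = run_episode(ToyCorridor(), lambda s: rng.choice([0, 1]))
    print(f"return={total} steps={steps}")
```

In a real agent, the `policy` callable would be backed by the learner, and the commented line in `run_episode` is where experiences would be handed to it.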

## Dependencies

Reproducing our results requires a computer with the following components:

* A recent Linux distribution
* Python 3, with `lzo` and PyTorch
* GNU Parallel
* Gnuplot
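
Before launching experiments, it can help to confirm that these dependencies are visible from Python. The following is a small sketch, not part of the repository: `check_dependencies` is a hypothetical helper, and it assumes that the Python modules are importable as `torch` and `lzo` and that the command-line tools are installed under the names `parallel` and `gnuplot`.

```python
import importlib.util
import shutil
import sys


def check_dependencies():
    """Return a dict mapping each requirement to True (found) or False."""
    return {
        # Interpreter version
        "python3": sys.version_info.major >= 3,
        # Python modules (assumed import names)
        "torch": importlib.util.find_spec("torch") is not None,
        "lzo": importlib.util.find_spec("lzo") is not None,
        # Command-line tools looked up on PATH (assumed executable names)
        "parallel": shutil.which("parallel") is not None,
        "gnuplot": shutil.which("gnuplot") is not None,
    }


if __name__ == "__main__":
    for name, found in check_dependencies().items():
        print(f"{name}: {'ok' if found else 'MISSING'}")
```

The check only reports whether each item can be located. It does not verify versions or that the tools actually run.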