Commit

release 0.1
zuoxingdong committed Sep 20, 2018
1 parent 4c511a5 commit e843d25
Showing 30 changed files with 577 additions and 35 deletions.
23 changes: 11 additions & 12 deletions README.md
@@ -22,22 +22,23 @@

# Basics

`lagom` balances between the flexibility and the userability when developing reinforcement learning (RL) algorithms. The library is built on top of [PyTorch](https://pytorch.org/) and provides modular tools to quickly prototype RL algorithms. However, we do not go overboard, because going too low level is rather time consuming and prone to potential bugs, while going too high level degrades the flexibility which makes it difficult to try out some crazy ideas.
`lagom` balances flexibility and usability when developing reinforcement learning (RL) algorithms. The library is built on top of [PyTorch](https://pytorch.org/) and provides modular tools to quickly prototype RL algorithms. However, it does not go overboard: going too low-level is time-consuming and prone to bugs, while going too high-level reduces the flexibility needed to quickly try out crazy ideas.

We are continuously making `lagom` more 'self-contained' to run experiments quickly. Now, it internally supports base classes for multiprocessing ([master-worker framework](https://en.wikipedia.org/wiki/Master/slave_(technology))) to parallelize (e.g. experiments and evolution strategies). It also supports hyperparameter search by defining configurations either as grid search or random search.
We are continuously making `lagom` more 'self-contained' so that experiments can be set up and run quickly. It provides base classes for multiprocessing (a [master-worker framework](https://en.wikipedia.org/wiki/Master/slave_(technology))) to parallelize work such as running experiments and evolution strategies. It also supports hyperparameter search, with configurations defined as either grid search or random search.
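
As a rough illustration of how such configurations might look, here is a hedged sketch modeled on the `experiment.py` files in this commit. Only `Configurator` and `configurator.fixed(...)` appear in the diff; the `Configurator('grid')` constructor argument, `configurator.grid(...)`, and `configurator.make_configs()` below are assumptions for illustration, not the confirmed 0.1 API:

```python
# Hedged sketch of defining a hyperparameter search (not the confirmed API).
# In practice this would live in experiment.py, e.g. as a make_configs() method.
from lagom.experiment import Configurator

def make_configs():
    configurator = Configurator('grid')              # assumption: select grid-search mode
    configurator.fixed('cuda', True)                 # fixed value, as in the diff below
    configurator.fixed('env.id', 'HalfCheetah-v2')
    configurator.grid('algo.lr', [1e-3, 3e-4])       # hypothetical: sweep the learning rate
    configurator.grid('algo.gamma', [0.99, 0.995])   # hypothetical: sweep the discount factor
    list_config = configurator.make_configs()        # assumption: expands to a list of configs
    return list_config
```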

One of the main pipelines to use `lagom` can be done as following:
1. Define environment and RL agent
2. User runner to collect data for agent
3. Define algorithm to train agent
4. Define experiment and configurations.
A common pipeline for using `lagom` is as follows (a schematic sketch is given after the list):
1. Define [environment](lagom/envs) and [agent](lagom/agents) (mainly for RL)
2. Use [runner](lagom/runner) to collect data (trajectories or segments) for agent
3. Define [engine](lagom/engine) for training and evaluating the agent
4. Define [algorithm](lagom/base_algo.py)
5. Define [experiment](lagom/experiment) and [configurations](lagom/experiment/configurator.py)
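
To make the steps above concrete, here is a schematic, self-contained sketch written with plain Python stubs. It is not the actual `lagom` API; every class and function in it is defined locally for illustration only:

```python
# Schematic sketch of the pipeline above using local stubs (NOT the lagom API).
import gym


class RandomAgent:
    """Step 1: a stand-in 'agent' that samples random actions."""
    def __init__(self, env):
        self.env = env

    def choose_action(self, obs):
        return self.env.action_space.sample()

    def learn(self, data):
        pass  # Steps 3-4: a real engine/algorithm would update parameters here


def collect_segment(agent, env, T):
    """Step 2: a stand-in 'runner' that collects one segment of length T."""
    obs = env.reset()
    segment = []
    for _ in range(T):
        action = agent.choose_action(obs)
        obs, reward, done, info = env.step(action)
        segment.append((obs, action, reward, done))
        if done:
            obs = env.reset()
    return segment


env = gym.make('CartPole-v0')       # Step 1: environment
agent = RandomAgent(env)            # Step 1: agent
for iteration in range(10):         # Steps 4-5: a minimal training/experiment loop
    data = [collect_segment(agent, env, T=5) for _ in range(16)]
    agent.learn(data)
```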

A graphical illustration is coming soon.

# Installation

## Install dependencies
Run the following command to install [all the dependencies](./requirements.txt):
Run the following command to install [all required dependencies](./requirements.txt):

```bash
pip install -r requirements.txt
@@ -53,7 +54,7 @@ We also provide some bash scripts in [scripts/](scripts/) directory to automatic

## Install lagom

Run the following command to install from source:
Run the following commands to install lagom from source:

```bash
git clone https://github.com/zuoxingdong/lagom.git
@@ -73,7 +74,7 @@ The documentation hosted by ReadTheDocs is available online at [http://lagom.rea

# Examples

We shall continuously provide [examples/](examples/) to use lagom.
We are continuously adding [examples/](examples/) that show how to use lagom.

# Test

@@ -86,7 +87,6 @@ pytest test -v
# Roadmap

## Core
- Readthedocs Documentation
- Tutorials
## More standard RL baselines
- TRPO/PPO
@@ -99,7 +99,6 @@ pytest test -v
## More standard networks
- Monte Carlo Dropout/Concrete Dropout
## Misc
- VecEnv: similar to that of OpenAI baseline
- Support pip install
- Technical report

2 changes: 1 addition & 1 deletion examples/es/rl/README.md
@@ -14,4 +14,4 @@ One could modify [experiment.py](./experiment.py) to quickly set up different co

# Results

<img src='data/result.png' width='100%'>
<img src='data/result.png' width='75%'>
8 changes: 4 additions & 4 deletions examples/policy_gradient/README.md
@@ -1,5 +1,5 @@
We benchmark three baselines for policy gradient method in several different perspectives
1. REINFORCE
2. Actor-Critic/Vanilla Policy Gradient
3. Advantage Actor-Critic (A2C)
This example includes implementations of the following policy gradient algorithms:

- [REINFORCE](reinforce)
- [Vanilla Policy Gradient (VPG)](vpg)
- [Advantage Actor-Critic (A2C)](a2c)
17 changes: 17 additions & 0 deletions examples/policy_gradient/a2c/README.md
@@ -0,0 +1,17 @@
# Advantage Actor-Critic (A2C)

This is an implementation of the [A2C](https://blog.openai.com/baselines-acktr-a2c/) algorithm.

# Usage

Run the following command to start parallelized training:

```bash
python main.py
```

One could modify [experiment.py](./experiment.py) to quickly set up different configurations.
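
For example, a different run might only change a few fixed values. The snippet below is a hedged illustration: the keys come from the `experiment.py` diff shown further down, while the new values are purely illustrative.

```python
# Inside make_configs() in experiment.py -- illustrative values only.
configurator.fixed('env.id', 'Hopper-v2')        # try a different MuJoCo task
configurator.fixed('algo.lr', 3e-4)              # use a smaller learning rate
configurator.fixed('agent.entropy_coef', 0.0)    # disable the entropy bonus
```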

# Results

<img src='data/result.png' width='75%'>
1 change: 1 addition & 0 deletions examples/policy_gradient/a2c/experiment.py
@@ -28,6 +28,7 @@ def make_configs(self):
configurator.fixed('algo.gamma', 0.99)

configurator.fixed('agent.standardize_Q', False) # whether to standardize discounted returns
configurator.fixed('agent.standardize_adv', True) # whether to standardize advantage estimates
configurator.fixed('agent.max_grad_norm', 0.5) # grad clipping, set None to turn off
configurator.fixed('agent.entropy_coef', 0.01)
configurator.fixed('agent.value_coef', 0.5)
1 change: 1 addition & 0 deletions examples/policy_gradient/a2c/logs/0/config.yml
@@ -9,6 +9,7 @@ algo.lr: 0.001
algo.use_lr_scheduler: true
algo.gamma: 0.99
agent.standardize_Q: false
agent.standardize_adv: true
agent.max_grad_norm: 0.5
agent.entropy_coef: 0.01
agent.value_coef: 0.5
Binary file modified examples/policy_gradient/a2c/logs/configs.pkl
Binary file not shown.
232 changes: 232 additions & 0 deletions examples/policy_gradient/a2c/main.ipynb
@@ -0,0 +1,232 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/home/zuo/Code/lagom/lagom/core/plotter/__init__.py:9: UserWarning: ImageViewer failed to import due to pyglet. \n",
" warnings.warn('ImageViewer failed to import due to pyglet. ')\n"
]
}
],
"source": [
"from pathlib import Path\n",
"from lagom.experiment import Configurator\n",
"\n",
"from lagom import pickle_load\n",
"\n",
"from lagom.core.plotter import CurvePlot"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>ID</th>\n",
" <th>cuda</th>\n",
" <th>env.id</th>\n",
" <th>env.standardize</th>\n",
" <th>network.hidden_sizes</th>\n",
" <th>algo.lr</th>\n",
" <th>algo.use_lr_scheduler</th>\n",
" <th>algo.gamma</th>\n",
" <th>agent.standardize_Q</th>\n",
" <th>agent.standardize_adv</th>\n",
" <th>...</th>\n",
" <th>agent.constant_std</th>\n",
" <th>agent.std_state_dependent</th>\n",
" <th>agent.init_std</th>\n",
" <th>train.timestep</th>\n",
" <th>train.N</th>\n",
" <th>train.T</th>\n",
" <th>eval.N</th>\n",
" <th>log.record_interval</th>\n",
" <th>log.print_interval</th>\n",
" <th>log.dir</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>0</td>\n",
" <td>True</td>\n",
" <td>HalfCheetah-v2</td>\n",
" <td>True</td>\n",
" <td>[64, 64]</td>\n",
" <td>0.001</td>\n",
" <td>True</td>\n",
" <td>0.99</td>\n",
" <td>False</td>\n",
" <td>True</td>\n",
" <td>...</td>\n",
" <td>None</td>\n",
" <td>False</td>\n",
" <td>0.5</td>\n",
" <td>1000000.0</td>\n",
" <td>16</td>\n",
" <td>5</td>\n",
" <td>10</td>\n",
" <td>100</td>\n",
" <td>1000</td>\n",
" <td>logs</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>1 rows × 25 columns</p>\n",
"</div>"
],
"text/plain": [
" ID cuda env.id env.standardize network.hidden_sizes algo.lr \\\n",
"0 0 True HalfCheetah-v2 True [64, 64] 0.001 \n",
"\n",
" algo.use_lr_scheduler algo.gamma agent.standardize_Q \\\n",
"0 True 0.99 False \n",
"\n",
" agent.standardize_adv ... agent.constant_std \\\n",
"0 True ... None \n",
"\n",
" agent.std_state_dependent agent.init_std train.timestep train.N train.T \\\n",
"0 False 0.5 1000000.0 16 5 \n",
"\n",
" eval.N log.record_interval log.print_interval log.dir \n",
"0 10 100 1000 logs \n",
"\n",
"[1 rows x 25 columns]"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"log_folder = Path('logs')\n",
"\n",
"list_config = pickle_load(log_folder/'configs.pkl')\n",
"configs = Configurator.to_dataframe(list_config)\n",
"configs"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"def load_results(log_folder, ID, f):\n",
" p = Path(log_folder)/str(ID)\n",
" \n",
" list_result = []\n",
" for sub in p.iterdir():\n",
" if sub.is_dir() and (sub/f).exists():\n",
" list_result.append(pickle_load(sub/f))\n",
" \n",
" return list_result\n",
"\n",
"\n",
"def get_returns(list_result):\n",
" returns = []\n",
" for result in list_result:\n",
" #x_values = [i['evaluation_iteration'][0] for i in result]\n",
" x_values = [i['accumulated_trained_timesteps'][0] for i in result]\n",
" y_values = [i['average_return'][0] for i in result]\n",
" returns.append([x_values, y_values])\n",
" \n",
" return returns\n"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"ID = 0\n",
"env_id = configs.loc[configs['ID'] == ID]['env.id'].values[0]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"list_result = load_results('logs', ID, 'eval_logs.pkl')\n",
"returns = get_returns(list_result)\n",
"x_values, y_values = zip(*returns)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"plot = CurvePlot()\n",
"plot.add('A2C', y_values, xvalues=x_values)\n",
"ax = plot(title=f'A2C on {env_id}', \n",
" xlabel='Iteration', \n",
" ylabel='Mean Episode Reward', \n",
" num_tick=6, \n",
" xscale_magnitude=None)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ax.figure.savefig('data/result.png')"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.6"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
17 changes: 17 additions & 0 deletions examples/policy_gradient/reinforce/README.md
@@ -0,0 +1,17 @@
# REINFORCE

This is an implementation of the [REINFORCE](https://link.springer.com/article/10.1007/BF00992696) algorithm.

# Usage

Run the following command to start parallelized training:

```bash
python main.py
```

One could modify [experiment.py](./experiment.py) to quickly set up different configurations.

# Results

<img src='data/result.png' width='75%'>
2 changes: 1 addition & 1 deletion examples/policy_gradient/reinforce/experiment.py
@@ -18,7 +18,7 @@ def make_configs(self):

configurator.fixed('cuda', True) # whether to use GPU

configurator.fixed('env.id', 'Reacher-v2')
configurator.fixed('env.id', 'HalfCheetah-v2')
configurator.fixed('env.standardize', True) # whether to use VecStandardize

configurator.fixed('network.hidden_sizes', [64, 64])
Binary file not shown.
2 changes: 1 addition & 1 deletion examples/policy_gradient/reinforce/logs/0/config.yml
@@ -1,6 +1,6 @@
ID: 0
cuda: true
env.id: Reacher-v2
env.id: HalfCheetah-v2
env.standardize: true
network.hidden_sizes:
- 64
Binary file modified examples/policy_gradient/reinforce/logs/configs.pkl
Binary file not shown.
