Project_RL

Implementation of the Deep Deterministic Policy Gradient (DDPG) and Maximum A Posteriori Policy Optimization (MPO) Reinforcement Learning Algorithms for continuous control on OpenAI gym environments.

Prerequisites

Using the algorithms requires Python 3 (>= 3.6.5).
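
A quick way to check which interpreter version a script would run with is from Python itself (a minimal sketch, not part of this repository):

  import sys

  # the minimum version comes from the prerequisite above
  assert sys.version_info >= (3, 6, 5), "python3 >= 3.6.5 is required"
  print(sys.version)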

Usage Examples

Note

The algorithms are intended for continuous gym environments!

DDPG

  • Usage with the provided noises (Ornstein-Uhlenbeck, adaptive parameter noise)

    If you want to use DDPG with the pre-implemented noises, you can do so via the main_ddpg.py file.

    • Training on the Qube-v0 environment (can be replaced with any environment ID, e.g. Levitation-v0) with default hyperparameters and Ornstein-Uhlenbeck noise, saving the model as furuta_model.pt and the log as furuta_log (by default the model is saved as ddpg_model.pt and the log in an automatically generated directory). Saving/logging can be disabled with --no-save and --no-log

      python3 path/to/main_ddpg.py --env Qube-v0 --save_path furuta_model.pt --log_name furuta_log
    • Loading a saved model and evaluating it in the Qube-v0 environment. The number of episodes and their length for testing can be adjusted with --eval_episodes and --eval_ep_length

      python3 path/to/main_ddpg.py --env Qube-v0 --no-train --eval --load furuta_model.pt
    • Training and evaluating are executed sequentially, so if you want to evaluate a just-trained model, this will work:

      python3 path/to/main_ddpg.py --env Qube-v0 --train --eval --save_path furuta_model.pt --log_name furuta_log
    • Loading a model and setting the train flag will continue training on the loaded model

      python3 path/to/main_ddpg.py --env Qube-v0 --train --eval --save_path furuta_model.pt --log_name furuta_log --load furuta_model.pt
    • Adapting hyperparameters is quite intuitive:

      python3 path/to/main_ddpg.py --env Qube-v0 --gamma 0.5 --tau 0.1 --batch_size 1024
    • For more information, e.g. about adaptable hyperparameters, use:

      python3 path/to/main_ddpg.py --help
  • Usage with a self-defined noise

    To use a self-defined noise, you will need to write a script yourself.

    • Make sure the script is in the same directory as the ddpg package (only necessary if you didn't install it via PyPI).

    • The noise should extend the Noise class in noise.py (i.e. provide a reset and an iteration function); a rough sketch of a custom noise is shown at the end of this section.

    • The following example covers the functionality of the previous examples:

      import gym    
      import quanser_robots
      
      from ddpg import DDPG
      from ddpg import OrnsteinUhlenbeck
      
      # create environment and noise
      env = gym.make('Qube-v0')
      action_shape = env.action_space.shape[0] 
      noise = OrnsteinUhlenbeck(action_shape)
      
      # setup a DDPG model w.r.t. the environment and self defined noise
      model = DDPG(env, noise, save_path="furuta_model.pt", log_name="furuta_log")
      
      # load a model
      model.load_model("furuta_model.pt")
      # train the model
      model.train()
      # evaluate the model; returns the mean reward over all episodes
      mean_reward = model.eval(episodes=100, episode_length=500)
      print(mean_reward)
      
      # always close the environment when finished
      env.close()
    • Setting hyperparameters would look something like this:

      model = DDPG(env, noise, gamma=0.5, tau=0.1, learning_rate=1e-2)
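
    Below is a rough sketch of a custom noise class. The import path of the Noise base class, its constructor, and the exact reset/iteration signatures are assumptions based on the description above and are not verified against noise.py; a simple Gaussian exploration noise might look like this:

      import numpy as np

      import gym
      import quanser_robots

      from ddpg import DDPG
      from ddpg import Noise  # assumed to be importable like OrnsteinUhlenbeck


      class GaussianNoise(Noise):
          """Hypothetical zero-mean Gaussian exploration noise."""

          def __init__(self, action_shape, sigma=0.1):
              self.action_shape = action_shape
              self.sigma = sigma

          def reset(self):
              # Gaussian noise is stateless, so there is nothing to reset
              pass

          def iteration(self):
              # draw one noise sample with the same shape as the action
              return np.random.normal(0.0, self.sigma, self.action_shape)


      env = gym.make('Qube-v0')
      noise = GaussianNoise(env.action_space.shape[0])
      model = DDPG(env, noise, save_path="gauss_model.pt", log_name="gauss_log")

    The resulting noise object is passed to DDPG exactly like the pre-implemented OrnsteinUhlenbeck noise in the example above.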
  • Using a model as a controller

    If you want to use a model as a simple controller, just call the model with an observation:

    ctrl = DDPG(env, noise)
    ctrl.load('furuta_model.pt')
    
    # run one episode, querying the model for an action at every step
    obs = env.reset()
    done = False
    while not done:
       env.render()
       act = ctrl(obs)
       obs, rwd, done, info = env.step(act)
    
    # always close the environment when finished
    env.close()

MPO

Using MPO is analogous to DDPG:

  • Use main_mpo.py instead of main_ddpg.py

  • For information on the parameters you can set: python3 main_mpo.py --help

  • Since no noise is required, writing your own script is not necessary, but for completeness here is a small example:

    import gym
    import quanser_robots
    from mpo import MPO
    
    env = gym.make('Qube-v0')
    model = MPO(env, save_path="furuta_model.pt", log_name="furuta_log")
    # continues like DDPG example ...

Logging

By default, logging is enabled and the logs are saved in the runs/ directory. The name of the logs can be set via the log_name parameter (--log_name command-line argument). Inspecting them works with:

tensorboard --logdir=/PATH/TO/runs

This starts a local server, which can be accessed in the browser. Connecting to the server should result in something like this:

(TensorBoard screenshot)

Open Source Infos

Contributing

Please read CONTRIBUTING.md for details on our code of conduct, and the process for submitting pull requests to us.

Authors

See also the list of contributors who participated in this project.

License

This project is licensed under the GNU GPLv3 license - see the LICENSE file for details.
