Status: WIP, actively developed
This project started as an exercise in training a Deep Q-Learning agent to navigate the BananaCollector environment (Unity ML-Agents 0.4.0).
Currently, I'm working on turning this project into a more general framework for training RL agents in any OpenAI Gym environment with any of the most common RL agent libraries. Too often, environments, agents, and the code in between (providing visualization, optimization, persistence, and reproducibility) are tightly coupled. Although this framework could work as glue between algorithm libraries and training environments, its main purpose is still for me to learn more about Deep Reinforcement Learning.
I don't consider the version to be v0.1 quite yet, so expect things to change and break.
Algorithm | State/Action Spaces | Example environments |
---|---|---|
DQN | Continuous/Discrete | |
DDPG | Continuous/Continuous | |
Q-learning | Discrete/Discrete | Taxi (Gym) |
Expected Sarsa | Discrete/Discrete | Taxi (Gym) |
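For the tabular methods, the core of the agent is a simple value-table update. As an illustration only (not the project's own code), a minimal Q-learning update could look like this:

```python
import numpy as np

# Illustrative tabular Q-learning update (not the project's implementation).
# Q is a (n_states, n_actions) table, e.g. 500 x 6 for Gym's Taxi-v3.
def q_learning_update(Q, state, action, reward, next_state, done,
                      alpha=0.1, gamma=0.99):
    # The target uses the greedy value of the next state (off-policy).
    target = reward if done else reward + gamma * np.max(Q[next_state])
    Q[state, action] += alpha * (target - Q[state, action])
    return Q
```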
- add DDPG algorithm
- add pytest tests
- add an Interaction class to serve as a middleman between the environment and the agent, removing the need for custom handling loops (see the sketch after this list)
- add more environments and an easier way to use and install them
- add better measurement (mean/min/max, episode lengths, details of exploration, etc.)
- add hyperparameter tuning example
- add Prioritized Experience Replay
- add Sarsa/Q-Learning agents and examples
- add callbacks for flexible logging, and adapters to handle non-standard environments and agents
- make runs reproducible
- remove legacy ML-Agents v0.4.0 examples
- add OpenAI Gym examples
- update usage examples and architecture diagrams
- add as-simple-as-possible vanilla versions of DQN and DDPG
- add basic algorithms like Policy Iteration and Vanilla Policy Gradient
- make DQN and DDPG agents capable of learning from pixel data
- add ML-Agents examples
- add proper documentation
- add examples of using algorithms from other packages
- add more advanced variations of DQN
- add more advanced algorithms like PPO and SAC
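The planned Interaction class mentioned above could look roughly like this (a rough sketch only; the names and the exact interface are assumptions, not a final API):

```python
# Rough sketch of the planned Interaction class (names and interface are not final).
class Interaction:
    """Runs episodes between an agent and an environment, so examples
    don't need to implement their own handling loops."""

    def __init__(self, agent, env):
        self.agent = agent
        self.env = env

    def run_episode(self, learn=True):
        state = self.env.reset()
        total_reward, done = 0.0, False
        while not done:
            action = self.agent.act(state)
            next_state, reward, done, _ = self.env.step(action)
            if learn:
                self.agent.step(state, action, reward, next_state, done)
            state = next_state
            total_reward += reward
        return total_reward
```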
Requirements for running the project are Linux (or similar, such as WSL), Python >= 3.6, and PyTorch 1.4.0.
- Clone the repository

```
git clone https://github.com/tjkemp/ubik-agent.git
```

- Create a Python virtual environment and install dependencies

```
cd ubik-agent
python -m venv venv
source venv/bin/activate
pip install --no-deps -r requirements.txt
```
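To verify the installation, you can check that PyTorch imports correctly (a quick sanity check, not part of the project code):

```python
# Quick sanity check that PyTorch installed correctly.
import torch
print(torch.__version__)  # the pinned requirement is 1.4.0
```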
All the examples live in the examples package directory, each as a class in its own file. (Currently only Gym Taxi-v3.)
You can run example modules in the following format: `python -m <package.module> <method> <experiment_name>`
To get help on the arguments for each executable, run the module with the `-h` switch:
```
$ python -m examples.taxi -h
usage: taxi.py [-h] {optimize,random,run,train} ...

optional arguments:
  -h, --help            show this help message and exit

method:
  {optimize,random,run,train}
                        a method in the class
    optimize            optimize
    random              random
    run                 run
    train               train
```
To test the environment with an agent behaving completely at random, run the module with the argument `random`:

```
python -m examples.taxi random
```
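Under the hood, a random run is essentially the following loop against the Gym environment (an illustrative sketch using the classic Gym API, not the project's own code):

```python
import gym

# Illustrative random rollout on Taxi-v3, roughly what the `random` method does.
env = gym.make('Taxi-v3')
state = env.reset()
done, total_reward = False, 0.0
while not done:
    action = env.action_space.sample()          # pick a random action
    state, reward, done, _ = env.step(action)   # classic Gym 4-tuple step
    total_reward += reward
print('episode return:', total_reward)
```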
To train an agent, the module takes the directory name of the trained agent instance as an argument. All trained agents are saved into the `models` directory.

```
python -m examples.taxi train my-cool-agent
```
At the end of training, the agent model will be saved into the directory `models/my-cool-agent` as `checkpoint.pth`. To run the trained agent, pass the same experiment name to the `run` method:

```
python -m examples.taxi run my-cool-agent
```
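The saved checkpoint is a regular PyTorch file, so it can also be inspected manually (a sketch; this assumes the file was written with `torch.save`, and the actual contents depend on the agent implementation):

```python
import torch

# Inspect a saved agent checkpoint; the structure depends on the agent.
checkpoint = torch.load('models/my-cool-agent/checkpoint.pth')
print(type(checkpoint))
```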
- This project is licensed under the MIT license.
- The original code was inspired by the Udacity Deep Reinforcement Learning Nanodegree materials.