
Solving continuous control via Q-learning

A simple implementation of the Decoupled Q-Networks (DecQN) agent in TensorFlow 2. The implementation builds on the Acme library and follows its overall design patterns, with additional customization of the run loop, logging, and agent definition. This minimal re-implementation streamlines the agent definition and should yield the performance reported in our accompanying paper.

Installation

  1. Get the code and change directory:

    git clone https://github.com/tseyde/decqn.git && cd decqn

  2. Create the conda environment and activate it:

    conda env create -f decqn.yml && conda activate decqn && cd decqn

  3. Run an example experiment with DecQN:

    python3 run_experiment.py --algorithm=decqn --task=walker_walk

Method overview

Main changes compared to DQN:

  1. Discretize the continuous action space along each dimension by considering only bang-bang actions

  2. Instead of enumerating the joint action space, add one Q-value output per action dimension and bin to the Q-network

  3. Recover the overall value function by selecting one output per action dimension and taking their mean

This assumes a linear value function decomposition and treats single-agent continuous control as a multi-agent discrete control problem. The key difference from the original DQN agent is the reduced number of output dimensions of the Q-network and the additional aggregation across action dimensions. The remaining structure of the original agent can be left unchanged.
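For illustration, the following is a minimal sketch of a decoupled value head and its aggregation in TensorFlow 2. The class and function names are hypothetical and the layer sizes are illustrative; the actual modules in this repository may differ.

```python
import tensorflow as tf


class DecoupledQNetwork(tf.keras.Model):
    """Sketch of a Q-network with one Q-value per (action dimension, bin)."""

    def __init__(self, action_dims, num_bins=2, hidden=(256, 256)):
        super().__init__()
        self.action_dims = action_dims
        self.num_bins = num_bins  # 2 bins per dimension for bang-bang control
        self.torso = tf.keras.Sequential(
            [tf.keras.layers.Dense(h, activation="relu") for h in hidden]
        )
        # Single linear head with action_dims * num_bins outputs instead of
        # one output per joint action.
        self.head = tf.keras.layers.Dense(action_dims * num_bins)

    def call(self, observation):
        q = self.head(self.torso(observation))
        # Reshape to [batch, action_dims, num_bins] so each action dimension
        # has its own set of Q-values.
        return tf.reshape(q, [-1, self.action_dims, self.num_bins])


def joint_q_value(per_dim_q, actions):
    """Linear decomposition: mean of the selected per-dimension Q-values."""
    # actions: integer bin indices of shape [batch, action_dims].
    selected = tf.gather(per_dim_q, actions, axis=-1, batch_dims=2)
    return tf.reduce_mean(selected, axis=-1)


def greedy_actions(per_dim_q):
    """Greedy action: independent argmax over bins in every dimension."""
    return tf.argmax(per_dim_q, axis=-1)


# Example usage (shapes only, with made-up observation/action sizes):
# net = DecoupledQNetwork(action_dims=6)
# per_dim_q = net(tf.zeros([32, 24]))                       # [32, 6, 2]
# q = joint_q_value(per_dim_q, greedy_actions(per_dim_q))   # [32]
```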

Reference

If you find our agent or code useful in your own research, please cite our paper:

@article{seyde2022solving,
  title={Solving Continuous Control via Q-learning},
  author={Seyde, Tim and Werner, Peter and Schwarting, Wilko and Gilitschenski, Igor and Riedmiller, Martin and Rus, Daniela and Wulfmeier, Markus},
  journal={arXiv preprint arXiv:2210.12566},
  year={2022}
}

Benchmark performance

Performance on a variety of tasks from the DeepMind Control Suite as well as MetaWorld.

Feature observations

DecQN trained on feature observations in comparison to the D4PG and DMPO baseline agents.

[Figure: decqn-returns-features — benchmark returns on feature observations]

Pixel observations

DecQN trained on pixel observations in comparison to the DrQ-v2 and DreamerV2 baseline agents.

[Figure: decqn-returns-pixels — benchmark returns on pixel observations]
