In this work, we present RMBench, the first benchmark for robotic manipulation tasks with high-dimensional continuous action and state spaces. We implement and evaluate reinforcement learning algorithms that directly use observed pixels as inputs.
This repository is the official implementation of our paper: Y. Xiang et al., “RMBench: Benchmarking Deep Reinforcement Learning for Robotic Manipulator Control,” Oct. 2022, doi: 10.48550/arXiv.2210.11262.
- VPG Sutton et al., 2000
- TRPO Schulman et al., 2015
- PPO Schulman et al., 2017
- DDPG Lillicrap et al., 2016
- TD3 Fujimoto et al., 2018
- SAC Haarnoja et al., 2018
- DrQ-v2 Yarats et al., 2021
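The on-policy methods above (VPG, TRPO, PPO) all build their gradient estimates from discounted returns. As a minimal illustration of that shared ingredient, here is a reward-to-go computation in NumPy (the function name and example values are ours, not taken from this repo):

```python
import numpy as np

def discounted_returns(rewards, gamma=0.99):
    """Compute reward-to-go R_t = sum_k gamma^k * r_{t+k} for each step."""
    returns = np.zeros_like(rewards, dtype=np.float64)
    running = 0.0
    # Walk backwards so each step reuses the return of the step after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

# Toy three-step episode with gamma = 0.5
print(discounted_returns(np.array([1.0, 0.0, 1.0]), gamma=0.5))
# -> [1.25 0.5  1.  ]
```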
We utilize the dm_control software package, which provides task suites for reinforcement learning agents in an articulated-body simulation. We focus on manipulation tasks with a 3D robotic arm, which can be divided into five categories: lifting, placing, reaching, stacking, and reassembling. They are described briefly below.
| Category | Task | Description |
| --- | --- | --- |
| Lifting | Lift brick | Elevate a brick above a threshold height. |
| | Lift large box | Elevate a large box above a threshold height. The box is too large to be grasped by the gripper, requiring non-prehensile manipulation. |
| Reaching | Reach site | Move the end effector to a target location in 3D space. |
| | Reach brick | Move the end effector to a brick resting on the ground. |
| Placing | Place cradle | Place a brick inside a concave 'cradle' situated on a pedestal. |
| | Place brick | Place a brick on top of another brick that is attached to the top of a pedestal. Unlike the stacking tasks below, the two bricks are not required to be snapped together to obtain the maximum reward. |
| Stacking | Stack 2 bricks | Snap together two bricks, one of which is attached to the floor. |
| | Stack 2 bricks movable base | Same as 'stack 2 bricks', except both bricks are movable. |
| Reassembling | Reassemble 5 bricks random order | The episode begins with all five bricks already assembled in a stack, with the bottom brick attached to the floor. The agent must disassemble the top four bricks and reassemble them in the opposite order. |
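The tasks above correspond to dm_control's `manipulation` suite. A hedged sketch of how one might map benchmark categories to dm_control task names and load one (the exact task-name strings, including the `_vision` suffix for pixel observations, are our assumption based on dm_control's naming convention; check `manipulation.ALL` locally):

```python
# Candidate dm_control task names per benchmark category.
# These strings are assumptions, not taken from this repo.
BENCHMARK_TASKS = {
    "lifting": ["lift_brick_vision", "lift_large_box_vision"],
    "reaching": ["reach_site_vision", "reach_brick_vision"],
    "placing": ["place_cradle_vision", "place_brick_vision"],
    "stacking": ["stack_2_bricks_vision", "stack_2_bricks_movable_base_vision"],
    "reassembling": ["reassemble_5_bricks_random_order_vision"],
}

def load_task(name, seed=0):
    # Import inside the function so the mapping above stays usable
    # even when dm_control is not installed.
    from dm_control import manipulation
    return manipulation.load(name, seed=seed)
```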
- Install MuJoCo
- Obtain a license on the MuJoCo website.
- Download the MuJoCo binaries from the MuJoCo website, e.g. 'mujoco210_linux.zip'.
- Unzip the downloaded archive into ~/.mujoco/mujoco210:
$ mkdir -p ~/.mujoco/mujoco210
$ cp mujoco210_linux.zip ~/.mujoco/mujoco210
$ cd ~/.mujoco/mujoco210
$ unzip mujoco210_linux.zip
- Place your license key file mjkey.txt at ~/.mujoco/mujoco210.
$ cp mjkey.txt ~/.mujoco/mujoco210
$ cp mjkey.txt ~/.mujoco/mujoco210/mujoco210_linux/bin
- Add environment variables: use MUJOCO_PY_MJKEY_PATH and MUJOCO_PY_MUJOCO_PATH to specify the MuJoCo license key path and the MuJoCo directory path.
$ export MUJOCO_PY_MJKEY_PATH=~/.mujoco/mujoco210/mjkey.txt
$ export MUJOCO_PY_MUJOCO_PATH=~/.mujoco/mujoco210/mujoco210_linux
- Append the MuJoCo bin subdirectory to the env variable LD_LIBRARY_PATH.
$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:~/.mujoco/mujoco210/mujoco210_linux/bin
- Install the required Python libraries
$ pip install -r requirements.txt
For example, to train an agent with the DrQ-v2 algorithm on the 'reach site' task:
$ cd 00_DrQv2
$ python drqv2_train.py task=reach_site
When training finishes, you can use 'plot_curve.py' to plot the reward curves.
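'plot_curve.py' is part of this repo; as a rough stand-in for what it does, a smoothed reward curve can be produced with a simple moving average like this (the toy data and window size are our choices, not the repo's defaults):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display
import matplotlib.pyplot as plt

def moving_average(x, window=10):
    """Smooth a 1-D reward series with a sliding-window mean."""
    kernel = np.ones(window) / window
    return np.convolve(x, kernel, mode="valid")

# Toy data standing in for logged episode rewards.
rewards = np.cumsum(np.random.default_rng(0).normal(0.5, 1.0, 500))
smoothed = moving_average(rewards, 10)

plt.plot(rewards, alpha=0.3, label="raw")
plt.plot(np.arange(9, 500), smoothed, label="smoothed")
plt.xlabel("episode")
plt.ylabel("reward")
plt.legend()
plt.savefig("reward_curve.png")
```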
Part of this code is inspired by SpinningUp (2018) and DrQ-v2.
Please consider citing our paper in your publications:
@misc{xiang2022rmbench,
doi = {10.48550/ARXIV.2210.11262},
url = {https://arxiv.org/abs/2210.11262},
author = {Xiang, Yanfei and Wang, Xin and Hu, Shu and Zhu, Bin and Huang, Xiaomeng and Wu, Xi and Lyu, Siwei},
keywords = {Robotics (cs.RO), Artificial Intelligence (cs.AI), FOS: Computer and information sciences},
title = {RMBench: Benchmarking Deep Reinforcement Learning for Robotic Manipulator Control},
publisher = {arXiv},
year = {2022},
copyright = {arXiv.org perpetual, non-exclusive license}
}