
Language-Conditioned Imitation Learning for Robot Manipulation Tasks

This repository is the official implementation of Language-Conditioned Imitation Learning for Robot Manipulation Tasks, which has been accepted to NeurIPS 2020 as a spotlight presentation.

Model figure

When using this code and/or model, we would appreciate the following citation:

@misc{stepputtis2020languageconditioned,
      title={Language-Conditioned Imitation Learning for Robot Manipulation Tasks}, 
      booktitle = {Advances in Neural Information Processing Systems},
      author={Simon Stepputtis and Joseph Campbell and Mariano Phielipp and Stefan Lee and Chitta Baral and Heni Ben Amor},
      year={2020},
      eprint={2010.12083},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
}

Index

  1. Environment Setup
  2. Quick Start
  3. Results

Environment Setup

Local Setup

Our code is tested on Ubuntu 18.04 with Python 3.6. At this time, running our code on macOS or Windows is not supported. To install the Python requirements:

pip install -r requirements.txt
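
If you want to confirm the interpreter version before installing, a minimal, optional check (not part of the original instructions) could look like this:

import sys

# The repository is tested with Python 3.6 on Ubuntu 18.04; other versions may
# not satisfy the pinned requirements.
major, minor = sys.version_info[:2]
if (major, minor) != (3, 6):
    print(f"Warning: tested with Python 3.6, found {major}.{minor}")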

Further requirements for Evaluation:

  • CoppeliaSim: Downloading and installing the player version will be sufficient, as long as you do not want to change the simulation environment itself. Our code was tested with versions 4.0 and 4.1.
  • ROS 2 Eloquent: ROS is used for communication between the simulator and the neural network. Before running our code, please make sure to compile and source the ros2 workspace in this repository so that our code can find the required packages.
  • PyRep: Please follow the installation instructions in the respective repository. A minimal import check for these evaluation dependencies is sketched after this list.
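
As a quick sanity check that the evaluation dependencies are visible to Python, the following sketch simply imports them. It assumes PyRep is installed and the ROS 2 workspace has been sourced; it does not start the simulator itself:

# Minimal import check for the evaluation dependencies.
try:
    import rclpy             # ROS 2 Python client library
    from pyrep import PyRep  # Python bindings for CoppeliaSim
    print("rclpy and PyRep imported successfully.")
except ImportError as err:
    print("Missing evaluation dependency:", err)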

Further requirements for Data Collection:

  • Orocos KDL: The Python wrapper has to match the solver version installed on your system. We strongly suggest installing both components from the Git repository. For Python 3, the following GitHub issue provides guidance for the installation process. A minimal PyKDL sanity check is sketched below.
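
One way to verify that the PyKDL wrapper matches the installed solver library is to run a trivial forward-kinematics query. The sketch below uses a made-up one-joint chain purely for illustration:

import PyKDL

# Toy chain: one revolute joint followed by a 0.5 m link.
chain = PyKDL.Chain()
chain.addSegment(PyKDL.Segment(PyKDL.Joint(PyKDL.Joint.RotZ),
                               PyKDL.Frame(PyKDL.Vector(0.0, 0.0, 0.5))))

# Forward kinematics for a joint angle of 0.5 rad.
solver = PyKDL.ChainFkSolverPos_recursive(chain)
q = PyKDL.JntArray(chain.getNrOfJoints())
q[0] = 0.5
frame = PyKDL.Frame()
solver.JntToCart(q, frame)
print("End-effector position:", frame.p)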

To run the model, you need to download the dataset, the pre-trained model, and other required files. They can be downloaded from here. The download contains a pre-trained model, the processed training dataset (and other supporting files), and the test data used for evaluation. The downloaded file should be placed next to the root folder of this repository, so that the folder LanguagePolicies and the extracted GDrive reside in the same directory.
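
To double-check that the download ended up where the code expects it, a small sketch like the following can help. It assumes the extracted folder is literally named GDrive and that the snippet is saved in the repository's root directory:

import os

# Expected layout: <parent>/LanguagePolicies (this repository) and <parent>/GDrive.
repo_root = os.path.dirname(os.path.abspath(__file__))
gdrive = os.path.join(os.path.dirname(repo_root), "GDrive")

if os.path.isdir(gdrive):
    print("Found GDrive directory:", gdrive)
else:
    print("GDrive directory not found next to the repository:", gdrive)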

Docker

If you would rather explore this code in a Docker container, we provide a Dockerfile with this repository. To build the container, run the following

docker build -t languagepolicies .

After the container is successfully built, start it with the following command (please note that the container takes some time to start up fully)

docker run -p 6081:80 -e RESOLUTION=1280x720 --rm languagepolicies

After seeing some terminal output, direct your browser to localhost:6081. This repository is fully set up in ~/Code, and you can follow the instructions below to train and/or evaluate the model. In the container, you can find a terminal in the start menu under System Tools -> LXTerminal.

Please note that data collection and processing are not supported in the Docker container.

Quick Start

A detailed description of the training and evaluation process can be found on our Details: Training and Evaluation page. If you are interested in collecting data, please refer to our Details: Data Collection page.

Training

To train the model with default parameters, run the following command in this repository's root directory.

python main.py

The trained model will be located in Data/Model, and TensorBoard logs will be in Data/TBoardLog. Overall, training takes around 35 hours, depending on your hardware. A GPU is not required; our model was trained on a node with two Intel Xeon E5-2699A v4 CPUs @ 2.40GHz. Please note that a GPU does not benefit our model due to the use of a custom RNN loop.
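
To follow training progress, point TensorBoard at the log directory. Besides the usual command-line invocation, it can also be launched from Python; the sketch below assumes TensorBoard is available in your environment:

from tensorboard import program

# Serve the training logs written by main.py (Data/TBoardLog) and print the URL.
tb = program.TensorBoard()
tb.configure(argv=[None, "--logdir", "Data/TBoardLog"])
url = tb.launch()
print("TensorBoard running at", url)
input("Press Enter to stop...")  # TensorBoard runs in a background thread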

Evaluation

Our model can be live-evaluated in CoppeliaSim. To run the evaluation, ROS 2 is required; please start by building and sourcing the ros2 workspace. First, the pre-trained model is loaded from the GDrive directory and provided as a service with

python service.py

After the service has been started, the model can be evaluated in the simulator with

python val_model_vrep.py

This will create a file val_result.json after ten evaluation runs (the results in our paper are from 100 runs; this value can be changed). The results can be printed in the terminal by running

python viz_val_vrep.py
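
If you prefer to inspect val_result.json directly instead of using the visualization script, a minimal sketch follows. The exact schema is produced by val_model_vrep.py and is not documented here, so the snippet only reports the top-level structure:

import json

# Load the evaluation output and report its top-level structure; the detailed
# contents depend on val_model_vrep.py.
with open("val_result.json", "r") as fh:
    results = json.load(fh)

if isinstance(results, dict):
    print("Top-level keys:", list(results.keys()))
elif isinstance(results, list):
    print("Number of entries:", len(results))
else:
    print("Unexpected top-level type:", type(results).__name__)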

Results

We summarize the results of testing our model on a set of 100 new, unseen environments. Our model's overall task success describes the percentage of cases in which the cup was first lifted and then successfully poured into the correct bowl. This sequence of steps was successfully executed in 84% of the new environments. Picking alone achieves a 98% success rate, while pouring succeeds in 85% of cases. The detection rate indicates the success rate of the semantic model in identifying the correct objects. Content-In-Bowl outlines the percentage of material that was delivered to the correct bowl during the pouring action. Finally, we report the mean absolute error of the robot's joint configuration. These results indicate that the model appropriately generalizes the trained behavior to changes in object position, verbal command, or perceptual input. In addition, we compared the model's performance to a simple RNN approach and a recent state-of-the-art baseline ("Pay Attention! Robustifying a Deep Visuomotor Policy Through Task-Focused Visual Attention", Abolghasemi et al.):

Model         | Picking | Pouring | Sequential | Detection | Content-In-Bowl | MAE (Joints, Radians)
Simple RNN    | 58%     | 0%      | 0%         | 52%       | 7%              | 0.30°
PayAttention! | 23%     | 8%      | 0%         | 66%       | 41%             | 0.13°
Ours          | 98%     | 85%     | 84%        | 94%       | 94%             | 0.05°

Further results can be found in our Additional Results page.

An execution of our model in a specific environment is shown below. First, the language command Raise the green cup and an image of the current environment are given to the model. This allows the robot to identify the target object in the current environment, as well as the desired action. After the cup has been picked up, a second command Fill all of it into the small red bowl is issued and processed in the same environment. In addition to identifying the target bowl and action (the what and where), the robot also identifies a quantity modifier that describes how the robot should execute the described task. In this case, all of the cup's content is poured into the target bowl.

More examples can be found on the Additional Examples page.

Contributing

If you would like to contribute or have any suggestions, feel free to open an issue on this GitHub repository or contact the first author of this work!

All contributions welcome! All content in this repository is licensed under the MIT license.
