CRAVES: Controlling Robotic Arm with a Vision-based, Economic System

This is the code for the pose estimation module of CRAVES. If you want to test on the OWI-535 hardware, please refer to the control module here.

The project controls a toy robotic arm (OWI-535) with a single RGB camera. Please read about the system pipeline and how it works in docs/ before trying the code. The following animation shows the arm, controlled through a mounted camera, reaching a goal without relying on any other sensors.


Here are some visualization results from the YouTube dataset:


./data_generation/ contains samples showing how to visualize the images and their annotations.
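As a rough illustration of what such a visualization involves, the snippet below parses 2D keypoint annotations from JSON. The schema shown (a "keypoints" list of [x, y, visible] triples) is an assumption for illustration only; see the samples in ./data_generation/ for the dataset's actual format.

```python
import json

def load_keypoints(annotation_json):
    """Parse 2D keypoints from an annotation string.
    Assumes a hypothetical schema: {"keypoints": [[x, y, visible], ...]}.
    Returns only the visible keypoints as (x, y) tuples."""
    ann = json.loads(annotation_json)
    return [(x, y) for x, y, visible in ann["keypoints"] if visible]

# A fabricated annotation for illustration only.
sample = '{"keypoints": [[12.5, 40.0, 1], [33.1, 58.2, 1], [70.0, 10.0, 0]]}'
points = load_keypoints(sample)

# The visible keypoints could then be drawn onto the image, e.g. with OpenCV:
#   for x, y in points:
#       cv2.circle(img, (int(round(x)), int(round(y))), 3, (0, 255, 0), -1)
```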

Dataset Download

We created three datasets for this project: synthetic, lab, and youtube.

Download the datasets from here.

For the usage of these datasets, please refer to here.

Pose Estimation

  1. Download the checkpoint for the pretrained model here and put it into a folder, e.g. ./checkpoint/checkpoint.pth.tar.

  2. Create a folder for result saving, e.g. ./saved_results.

  3. Open the evaluation script in ./scripts/ and make sure --data-dir, --resume and --save-result-dir point to the folders where you put the datasets, the pretrained model, and the saved results, respectively. For example: --data-dir ../data/test_20181024 --resume ../checkpoint/checkpoint.pth.tar --save-result-dir ../saved_results

  4. cd ./scripts, then run the script with sh; you should see the accuracy on the real lab dataset.

The output you should expect to see:

=> creating model 'hg', stacks=2, blocks=1
=> loading checkpoint '../checkpoint/checkpoint.pth.tar'
=> loaded checkpoint '../checkpoint/checkpoint.pth.tar' (epoch 30)
    Total params: 6.73M
No. images of dataset 1 : 428
merging 1 datasets, total No. images: 428
No. minibatches in validation set:72

Evaluation only
Processing |################################| (72/72) Data: 0.000000s | Batch: 0.958s | Total: 0:01:08 | ETA: 0:00:01 | Loss: 0.0009 | Acc:  0.9946

As you can see, the overall accuracy on the lab dataset is 99.46% under the PCK@0.2 metric.
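For reference, PCK@0.2 (Percentage of Correct Keypoints) counts a predicted keypoint as correct when it lies within 0.2 × a normalization length (typically the object's bounding-box size) of its ground-truth location. A minimal sketch of the metric, with made-up coordinates; this is an illustration of the general metric, not the repo's evaluation code:

```python
import math

def pck(pred, gt, norm_len, thresh=0.2):
    """Fraction of keypoints whose prediction lies within
    thresh * norm_len of the ground-truth location."""
    correct = sum(
        1 for (px, py), (gx, gy) in zip(pred, gt)
        if math.hypot(px - gx, py - gy) <= thresh * norm_len
    )
    return correct / len(gt)

# Fabricated example: 3 keypoints, normalization length 100 px,
# so the acceptance radius is 0.2 * 100 = 20 px.
pred = [(10.0, 10.0), (52.0, 48.0), (90.0, 95.0)]
gt   = [(11.0, 10.0), (50.0, 50.0), (60.0, 60.0)]
print(pck(pred, gt, norm_len=100.0))  # 2 of 3 within 20 px, so ~0.667
```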

Other shell scripts you may want to try:

  • and train a model from scratch on the synthetic dataset only and on multiple datasets, respectively.
  • evaluate the model on the synthetic dataset.
  • val_arm_reall_with_3D: evaluate the model on the synthetic dataset, giving both 2D and 3D output.
  • and evaluate the model on the youtube dataset, with all keypoints and with only visible keypoints, respectively.

Dependencies: PyTorch 0.4.1 or higher, OpenCV.

The 2D pose estimation module is developed based on pytorch-pose.

Data Generation from Simulator

Download the binary for Windows or Linux (tested on Ubuntu 16.04).

Unzip and run ./LinuxNoEditor/

Run the following script to generate images and ground truth:

pip install unrealcv imageio
cd ./data_generation

Generated data are saved in ./data/new_data by default. You can visualize the ground truth with the script ./data_generation/
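The generation step talks to the running simulator binary over UnrealCV's request protocol. The sketch below only shows the shape of such requests; the wrapper function and the specific camera modes are illustrative assumptions, not the repo's actual generation script:

```python
def camera_cmd(cam_id, mode, path):
    """Build an UnrealCV 'vget' request string for one camera view.
    (Hypothetical helper; mode availability depends on the binary.)"""
    return "vget /camera/{}/{} {}".format(cam_id, mode, path)

# With the simulator binary running, frames could be requested like:
#   from unrealcv import client
#   client.connect()
#   client.request(camera_cmd(0, "lit", "img_0000.png"))          # RGB frame
#   client.request(camera_cmd(0, "object_mask", "seg_0000.png"))  # segmentation
print(camera_cmd(0, "lit", "img_0000.png"))  # vget /camera/0/lit img_0000.png
```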

Control System

The control module of CRAVES is hosted in another repo.

Please see that repo for hardware drivers, the pose estimator, a PID-like controller, and an RL-based controller.
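For intuition, a "PID-like" controller repeatedly turns the current pose error into an actuation command from proportional, integral, and derivative terms. The class below is a textbook PID sketch for illustration only, not the controller implemented in the CRAVES control repo:

```python
class PID:
    """Textbook PID controller (illustrative; gains and state handling
    in the actual CRAVES controller may differ)."""
    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None

    def step(self, error, dt):
        """Return the control output for the current error and timestep."""
        self.integral += error * dt
        deriv = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * deriv

pid = PID(kp=1.0, ki=0.1, kd=0.05)
print(pid.step(2.0, dt=0.1))  # first step: 1.0*2.0 + 0.1*0.2 + 0 ≈ 2.02
```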


If you found CRAVES useful, please consider citing:

@inproceedings{zuo2019craves,
  title={CRAVES: Controlling Robotic Arm with a Vision-based, Economic System},
  author={Zuo, Yiming and Qiu, Weichao and Xie, Lingxi and Zhong, Fangwei and Wang, Yizhou and Yuille, Alan L},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2019}
}

If you have any questions or suggestions, please open an issue in this repo. Thanks.

Disclaimer: the authors are a group of scientists working on computer vision research. They are not associated with the company that manufactures this arm. If you have better hardware to recommend, or want to apply this technique to your own arm, please contact us.
