
Source for "Improving Robot Success Detection using Static Object Data." Rosario Scalise, Jesse Thomason, Yonatan Bisk, and Siddhartha Srinivasa. IROS 2019.

Supplementary Material



Sensor streams and robot trajectory rosbag files are available on request. These files total around 100GB.

Success Detection Models

This README walks you through:

  • extracting processed RGBD features from scans taken before and after each dropping behavior is executed;
  • extracting vector representations of RGB, depth, language, and visual information based on RGBD, referring expression, and YCB static image data about objects; and
  • training and evaluating end-to-end networks that take in those vector representations to classify "in"/"on" success labels between pairs of objects.

To clone this repo and download requirements, run:

git clone
cd YCBLanguage/
pip install -r requirements.txt

Downloading Data.

Static images (with preprocessed ResNet features already extracted) and downsampled RGB-D trial data can be downloaded at:

Static YCB Object Images

Move this imgs directory to data/imgs/.

RGB-D trial data

You'll need to place the RGB-D trial data in its own directory with no other JSON files.
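
Since the trial directory must contain no unrelated JSON files, it can help to list what is in it before running feature extraction. The following is a minimal sketch, not part of this repo; the function name and the file names in the demo are hypothetical.

```python
import glob
import os
import tempfile


def list_json_files(directory):
    """Return the sorted names of all .json files directly inside `directory`."""
    paths = glob.glob(os.path.join(directory, "*.json"))
    return sorted(os.path.basename(p) for p in paths)


# Demo on a throwaway directory; the file names here are made up.
with tempfile.TemporaryDirectory() as trial_dir:
    for name in ("trial_a.json", "trial_b.json", "notes.txt"):
        open(os.path.join(trial_dir, name), "w").close()
    found = list_json_files(trial_dir)

print(found)
```

Point `list_json_files` at your own RGB-D trial data directory and check that every listed file belongs to the download.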

Additionally, you will need a local copy of GloVe embedding vectors.

GloVe Vectors

For our experiments, we used glove.6B.300d.txt: 300-dimensional embeddings trained on 6 billion tokens.
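
For orientation, GloVe text files store one token per line followed by its space-separated vector components. The sketch below parses that format into a dictionary; it is an illustration only, not the loader used by the scripts in this repo, and `load_glove` is a hypothetical helper name.

```python
import io


def load_glove(lines, dim=300):
    """Parse GloVe-format text lines into a {token: [float, ...]} dict."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


# Tiny in-memory stand-in for the real file, using 3 dimensions for brevity.
sample = io.StringIO("cup 0.1 0.2 0.3\nbowl 0.4 0.5 0.6\n")
toy = load_glove(sample, dim=3)
print(sorted(toy))

# For the real file (assuming it sits in the working directory):
# with open("glove.6B.300d.txt", encoding="utf-8") as f:
#     vectors = load_glove(f)
```

Loading the full file this way keeps every vector in memory, so expect it to take some time and a fair amount of RAM.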

Generating RGBD features.

To extract processed RGBD features from pre- and post-manipulation scans:

cd scripts
python --splits_infile ../data/3class_object_split.json --features_indir [your path to RGB-D trial data directory] --rgbd_features_outfile [RGB-D features json]

The key output is the file passed as --rgbd_features_outfile, which stores the RGB-D features for later processing. The commands in the next step read it through --robot_infile.

Generating input/output training and evaluation data.

Next, prep all modalities' vectors using:

python --infile ../data/3class_object_split.json --metadata_infile ../data/all_data.json --glove_infile [your GloVe embeddings txt] --robot_infile [RGB-D features json] --exec_robo_indir ../src/robo_exec_gt/ --exec_human_indir ../src/human_exec_gt/ --rgbd_only 1 --out_fn [path for vectors]/rgbd_only_dev

--out_fn is the prefix for .on and .in JSON dumps of feature vectors for each modality: RGB, D, Vision, and Language, plus the target labels for the task, for each pair of objects. Labels are calculated for mturk, robo, and human execution, where applicable.
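
To inspect the dumps after this step, something like the following sketch may be useful. It assumes the two files are named by appending ".on" and ".in" directly to the --out_fn prefix and that each is a single JSON document; it makes no assumption about the keys inside, and the demo writes placeholder content rather than real feature vectors.

```python
import json
import os
import tempfile


def load_dumps(out_fn_prefix):
    """Load the '.on' and '.in' JSON dumps that share one --out_fn prefix."""
    dumps = {}
    for prep in ("on", "in"):
        with open(out_fn_prefix + "." + prep) as f:
            dumps[prep] = json.load(f)
    return dumps


# Demo with placeholder files; real dumps hold feature vectors and labels.
with tempfile.TemporaryDirectory() as tmp:
    prefix = os.path.join(tmp, "rgbd_only_dev")
    for prep in ("on", "in"):
        with open(prefix + "." + prep, "w") as f:
            json.dump({"placeholder": prep}, f)
    dumps = load_dumps(prefix)

print(sorted(dumps))
```

Printing the top-level keys of `dumps["on"]` on your own output is a quick way to see which modalities and label sources were written.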

Note that this limits the output to rgbd_only data: features are recorded only for the subset of Robot Pairs that have RGB-D trial data from the robot. We can also get information for All Pairs, with the RGB and D feature vectors set to None, using:

python --infile ../data/3class_object_split.json --metadata_infile ../data/all_data.json --glove_infile [your GloVe embeddings txt] --robot_infile [RGB-D features json] --exec_robo_indir ../src/robo_exec_gt/ --exec_human_indir ../src/human_exec_gt/ --rgbd_only 0 --out_fn [path for vectors]/all_dev

If the optional --test flag is set in these commands, the test set is used instead of dev, e.g.,

python --infile ../data/3class_object_split.json --metadata_infile ../data/all_data.json --glove_infile [your GloVe embeddings txt] --robot_infile [RGB-D features json] --exec_robo_indir ../src/robo_exec_gt/ --exec_human_indir ../src/human_exec_gt/ --rgbd_only 1 --test 1 --out_fn [path for vectors]/rgbd_only_test
python --infile ../data/3class_object_split.json --metadata_infile ../data/all_data.json --glove_infile [your GloVe embeddings txt] --robot_infile [RGB-D features json] --exec_robo_indir ../src/robo_exec_gt/ --exec_human_indir ../src/human_exec_gt/ --rgbd_only 0 --test 1 --out_fn [path for vectors]/all_test

Training and evaluating models.

Run the majority class baseline, egocentric, and egocentric+object data models with:

python --models mc,rgbd,rgbd+glove+resnet --round_m_to_n 1 --outdir [robot_pairs_trained_models] --input [path for vectors]/rgbd_only_dev --train_objective robo --test_objective robo --random_restarts 1,2,3,4,5,6,7,8,9,10

Note that this loads and evaluates on the dev fold, and uses robo target labels from robot trials.
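
As a point of reference for the mc model, a majority class baseline predicts the most frequent training label for every evaluation example. The sketch below shows the idea on made-up integer labels; it is not the implementation in this repo, and the label values are hypothetical.

```python
from collections import Counter


def majority_class_accuracy(train_labels, eval_labels):
    """Predict the most common training label everywhere; return (label, accuracy)."""
    majority = Counter(train_labels).most_common(1)[0][0]
    correct = sum(1 for y in eval_labels if y == majority)
    return majority, correct / len(eval_labels)


# Made-up labels for illustration only.
train = [0, 2, 2, 1, 2]
dev = [2, 0, 2, 2]
label, acc = majority_class_accuracy(train, dev)
print(label, acc)
```

Any learned model should be compared against this number, since it reflects label imbalance alone.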

All results in our submission use 10 random restarts with the given seeds. Omit the --random_restarts flag to run a single instance of the given models with a random seed.
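
The general pattern behind seeded restarts can be sketched as below: fix the seed, run one training and evaluation pass, and summarize the scores across seeds. This is an assumption-laden illustration, not the repo's training loop. `toy_train_and_eval` is a hypothetical stand-in that returns a random score, and reporting mean and standard deviation is one common way to summarize, not necessarily what the paper reports.

```python
import random
import statistics


def run_with_restarts(train_and_eval, seeds):
    """Run `train_and_eval` once per seed; return (mean, sample stdev) of its scores."""
    scores = []
    for seed in seeds:
        random.seed(seed)  # a real run would also seed numpy / torch
        scores.append(train_and_eval())
    return statistics.mean(scores), statistics.stdev(scores)


def toy_train_and_eval():
    """Hypothetical stand-in for training a model and returning dev accuracy."""
    return 0.5 + random.random() * 0.1


seeds = list(range(1, 11))  # mirrors --random_restarts 1,2,...,10
mean_acc, std_acc = run_with_restarts(toy_train_and_eval, seeds)
print(round(mean_acc, 3), round(std_acc, 3))
```

Because each restart reseeds the generator, rerunning with the same seed list reproduces the same scores.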

To run the egocentric+(pretrained) object data model, first pretrain the object data layers on All Pairs with target labels from mturk annotations, then load the pretrained weights for training and evaluating against robo labels:

python --models glove+resnet --round_m_to_n 1 --outdir [all_pairs_trained_models] --input [path for vectors]/all_dev --train_objective mturk --test_objective mturk --random_restarts 1,2,3,4,5,6,7,8,9,10
python --models rgbd+glove+resnet --round_m_to_n 1 --outdir [robot_pairs_trained_models] --input [path for vectors]/rgbd_only_dev --train_objective robo --test_objective robo --lv_pretrained_dir [all_pairs_trained_models] --random_restarts 1,2,3,4,5,6,7,8,9,10

This trains a separate model for each set of weights found in the [all_pairs_trained_models] directory.

Viewing contrastive examples.

To generate a summary of pairs on which two trained models disagree, record their predictions during training and evaluation with the optional --predictions_outfile flag, then run the analysis script on that file to generate the summary:

python --models rgbd,rgbd+glove+resnet --round_m_to_n 1 --outdir [robot_pairs_trained_models] --input [path for vectors]/rgbd_only_dev --train_objective robo --test_objective robo --lv_pretrained_dir [all_pairs_trained_models] --random_restarts 1,2,3,4,5,6,7,8,9,10 --predictions_outfile [dev predictions json]
cd analysis
python --predictions_infile [dev predictions json] --metadata_infile ../../data/all_data.json --torch_infile [path for vectors]/rgbd_only_dev --use_highest_acc --splits_infile ../../data/3class_object_split.json
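
Conceptually, the contrastive summary boils down to finding object pairs where two models predict different labels. The sketch below shows that comparison on made-up data; the dictionary layout keyed by (object, object) tuples is hypothetical and does not reflect the actual format of the predictions JSON.

```python
def disagreements(preds_a, preds_b, gold=None):
    """List (pair, pred_a, pred_b, gold_label) for pairs where the two models differ."""
    rows = []
    for pair in sorted(set(preds_a) & set(preds_b)):
        if preds_a[pair] != preds_b[pair]:
            gold_label = None if gold is None else gold.get(pair)
            rows.append((pair, preds_a[pair], preds_b[pair], gold_label))
    return rows


# Made-up predictions for two models over three object pairs.
rgbd = {("mug", "bowl"): 1, ("ball", "plate"): 0, ("spoon", "cup"): 2}
rgbd_obj = {("mug", "bowl"): 1, ("ball", "plate"): 2, ("spoon", "cup"): 2}
gold = {("ball", "plate"): 2}

rows = disagreements(rgbd, rgbd_obj, gold)
print(rows)
```

Including the gold label next to each disagreement makes it easy to see which model was right for a given pair.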


Success detection for placing objects in and on one another informed by language and vision priors.





