NeurIPS 2024 Datasets and Benchmarks
Yunong Liu1, Cristobal Eyzaguirre1, Manling Li1, Shubh Khanna1, Juan Carlos Niebles1, Vineeth Ravi2, Saumitra Mishra2, Weiyu Liu1*, Jiajun Wu1*
1Stanford University 2J.P. Morgan AI Research
*Equal advising
[Project Website] [Paper] [Dataset Setup Guide] [Notebook]
The IKEA Manuals at Work dataset provides detailed annotations that align 3D models, instruction manuals, and real-world assembly videos. It is the first dataset to provide 4D grounding of assembly instructions on Internet videos, with high-quality spatio-temporal alignments between the instructions, the 3D models, and the videos.
- 🪑 36 furniture models from 6 categories
- 🎥 98 assembly videos from the Internet
- 🔄 Dense spatio-temporal alignments between instructions and videos
- 📊 Rich annotations including part segmentation, 6D poses, and temporal alignments
# Create and activate conda environment
conda create -n IKEAVideo python=3.8
conda activate IKEAVideo
# Install dependencies
pip install -r requirements.txt
# Set PYTHONPATH
export PYTHONPATH="./src:$PYTHONPATH"

data/
├── data.json # Main annotation file
├── parts/ # 3D model files
├── manual_img/ # Instruction manual images
├── pdfs/ # Original PDF manuals
└── videos/ # Assembly videos
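To confirm that a local copy matches the layout above, a small check like the following can help. It is only a sketch: the entry names come from the tree above, while the helper name `missing_entries` and the assumption that the data lives in `data/` relative to the working directory are illustrative and not part of the repository.

```python
from pathlib import Path

# Entries taken from the directory tree above; True marks a directory.
EXPECTED = {
    "data.json": False,
    "parts": True,
    "manual_img": True,
    "pdfs": True,
    "videos": True,
}


def missing_entries(root):
    """Return the expected entries that are absent under `root` (hypothetical helper)."""
    root = Path(root)
    missing = []
    for name, is_dir in EXPECTED.items():
        path = root / name
        present = path.is_dir() if is_dir else path.is_file()
        if not present:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Assumes the script is run from the repository root.
    print(missing_entries("data") or "layout looks complete")
```

An empty result means every entry from the tree was found. It does not check that the contents of each directory are complete.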
The dataset includes:
- 3D Models: Detailed 3D models of furniture parts
- Instruction Manuals: Step-by-step assembly instructions
- Assembly Videos: Real-world assembly videos from the Internet
- Rich Annotations:
  - ⏱️ Temporal step alignments
  - 🔄 Temporal substep alignments
  - 🎯 2D-3D part correspondences
  - 🎨 Part segmentations
  - 📐 Part 6D poses
  - 📷 Estimated camera parameters
For detailed information about the dataset, please refer to our datasheet.
- Download Required Files:
  - Annotation file: data/data.json
  - Assembly videos: Stanford Digital Repository
  - Clone the repo to obtain other resources (e.g. 3D models, manual images)
  - Place downloads in their respective directories as shown in Dataset Structure
- Explore the Dataset:
  - Check our tutorial notebook: notebooks/data_viz.ipynb
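Before opening the notebook, it can be useful to take a quick look at the annotation file from a script. The sketch below deliberately assumes nothing about the schema of `data/data.json`: it only reports the container types and the first few keys it finds. The helper name `summarize` is illustrative and not part of the repository. The tutorial notebook remains the authoritative guide to the fields.

```python
import json
from pathlib import Path


def summarize(node, depth=0, max_depth=2):
    """Return a short, schema-agnostic description of a parsed JSON value."""
    if isinstance(node, dict):
        keys = list(node)
        desc = f"dict with {len(keys)} keys: {keys[:8]}"
        child = node[keys[0]] if keys else None
    elif isinstance(node, list):
        desc = f"list with {len(node)} items"
        child = node[0] if node else None
    else:
        # Scalars (str, int, float, bool, None) are reported by type name only.
        return type(node).__name__
    if child is not None and depth < max_depth:
        desc += " -> first child: " + summarize(child, depth + 1, max_depth)
    return desc


if __name__ == "__main__":
    path = Path("data/data.json")  # location given in the download steps above
    if path.is_file():
        with path.open() as f:
            print(summarize(json.load(f)))
    else:
        print(f"{path} not found; download it first")
```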
The dataset supports various research directions:
- 🔍 Assembly plan generation
- 🎯 Part-conditioned segmentation
- 📐 Part-conditioned pose estimation
- 🎥 Video object segmentation
- 🛠️ Shape assembly with instruction videos
We also provide the annotation tools used to create the IKEA Manuals at Work dataset.
This repository contains two annotation interfaces:
- Main Annotation Interface - For creating 2D masks and initial 3D poses
- Pose Refinement Interface - For fine-tuning 3D poses after initial estimation
Please see the respective directories for setup and usage instructions.
You can use the provided render_part.py script (in this repo) to project mesh vertices onto 2D images. Ensure you have the correct intrinsic and extrinsic matrices.
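For intuition about what such a projection involves, here is a minimal pinhole-camera sketch in NumPy. It is not the implementation in render_part.py. It assumes a 3x3 intrinsic matrix `K`, a 4x4 extrinsic matrix that maps object or world coordinates into the camera frame, and a camera that looks along +z with image y pointing down (the OpenCV-style convention). The dataset's own convention may differ, so check it against the script and the notebook. The function name `project_points` is illustrative.

```python
import numpy as np


def project_points(vertices, K, extrinsic):
    """Project Nx3 points to Nx2 pixel coordinates with a pinhole model.

    Assumptions (not taken from the repository): `extrinsic` is a 4x4
    transform into the camera frame, and the camera looks along +z.
    Returns the pixel coordinates and a mask of points in front of the camera.
    """
    vertices = np.asarray(vertices, dtype=float)
    K = np.asarray(K, dtype=float)
    extrinsic = np.asarray(extrinsic, dtype=float)

    # Homogeneous coordinates, then move the points into the camera frame.
    homo = np.hstack([vertices, np.ones((len(vertices), 1))])
    cam = (extrinsic @ homo.T).T[:, :3]

    # Points with non-positive depth are not visible; flag them for the caller.
    in_front = cam[:, 2] > 0

    # Apply the intrinsics and divide by depth (perspective division).
    pix = (K @ cam.T).T
    with np.errstate(divide="ignore", invalid="ignore"):
        uv = pix[:, :2] / pix[:, 2:3]
    return uv, in_front


if __name__ == "__main__":
    K = [[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]]
    uv, visible = project_points([[0.0, 0.0, 2.0]], K, np.eye(4))
    print(uv, visible)
```

If the projected points land mirrored or off-image, the usual culprits are an inverted extrinsic (camera-to-world instead of world-to-camera) or a different axis convention.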
Annotations are not provided for every part in every frame: they cover only the parts being interacted with in each frame. This aligns with the manual’s structure and supports assembly plan generation.
For more details, check the paper or feel free to ask!
This dataset is released under the CC-BY-4.0 license.
If you find this dataset useful for your research, please cite:
@inproceedings{
liu2024ikea,
title={{IKEA} Manuals at Work: 4D Grounding of Assembly Instructions on Internet Videos},
author={Yunong Liu and Cristobal Eyzaguirre and Manling Li and Shubh Khanna and Juan Carlos Niebles and Vineeth Ravi and Saumitra Mishra and Weiyu Liu and Jiajun Wu},
booktitle={The Thirty-eight Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
year={2024}
}

For questions and feedback:
- 📮 Open an issue on this GitHub repository
- 📧 Email Yunong Liu

