Project Page | Video | Paper
In IJCV 2023
Shangzhe Wu*, Tomas Jakab*, Christian Rupprecht, Andrea Vedaldi (*equal contribution)
Visual Geometry Group, University of Oxford
DOVE - Deformable Objects from VidEos. Given a collection of video clips of an object category as training data, we learn a model that predicts a textured, articulated 3D mesh from a single image of the object.
Setup (with conda)
1. Install dependencies
conda env create -f environment.yml
or manually:
conda install -c conda-forge matplotlib=3.3.1 opencv=3.4.2 scikit-image=0.17.2 pyyaml=5.4.1 tensorboard=2.7.0 trimesh=3.9.35 configargparse=1.2.3 einops=0.3.2 moviepy=1.0.1
2. Install PyTorch
conda install pytorch==1.6.0 torchvision==0.7.0 cudatoolkit=10.1 -c pytorch
Note: The code is tested with PyTorch 1.6.0 and CUDA 10.1.
3. Install PyTorch3D
conda install -c fvcore -c iopath -c conda-forge fvcore iopath
conda install -c bottler nvidiacub
conda install -c pytorch3d pytorch3d=0.3.0
or follow the instructions. The code is tested with PyTorch3D 0.3.0.
4. Install LPIPS (for computing perceptual loss)
pip install lpips
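After installation, it can help to confirm that the environment matches the versions the code was tested with. The following is a minimal sketch, not part of the repository: it assumes Python 3.8+ (for importlib.metadata) and assumes the packages register under the distribution names torch, torchvision and pytorch3d, which may differ depending on how they were installed.

```python
from importlib import metadata

# Versions reported as tested in this README.
# The distribution names used as keys are assumptions.
EXPECTED = {
    "torch": "1.6.0",
    "torchvision": "0.7.0",
    "pytorch3d": "0.3.0",
}


def check_versions(expected):
    """Return {name: (installed_version_or_None, matches_expected)}."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Prefix comparison, so a build tag such as "1.6.0+cu101" still matches.
        ok = installed is not None and installed.startswith(wanted)
        report[name] = (installed, ok)
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_versions(EXPECTED).items():
        status = "ok" if ok else "MISMATCH"
        print(f"{name}: installed={installed} expected={EXPECTED[name]} [{status}]")
```

A package reported as installed=None is either missing or registered under a different name than the one assumed here.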
The preprocessed datasets can be downloaded using the scripts in data/
cd data
The toy_birds dataset consists of 3D scans and real photos of 23 toy birds, which are preprocessed and used for 3D evaluation. toy_birds_raw contains all the raw captures.
The pretrained models for birds and horses can be downloaded using the scripts in results/, e.g.:
cd results/bird && sh
cd results/horse && sh
Check the configuration files in config/ and run, e.g.:
python --config configs/bird/train_bird.yml --gpu 0 --num_workers 4
python --config configs/bird/test_bird.yml --gpu 0 --num_workers 4
After generating the results on the bird test set (using config/bird/test_bird.yml), check the directories and run:
python scripts/
After generating the results on the bird test set (using config/bird/test_bird_toy.yml), check the directories and run:
python scripts/
Note: The canonical pose may face either towards or away from the camera, as both are valid solutions. The current script assumes the canonical pose faces away from the camera, hence line 157, which rotates the mesh 180° to roughly align it with the ground-truth scans. You might need to inspect the results and adjust accordingly.
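To illustrate the kind of adjustment described in the note, here is a minimal sketch of a 180° flip applied to mesh vertices. It is not the code from the evaluation script: the helper name rotate_about_y is hypothetical, and the choice of the vertical (y) axis is an assumption, so check the script for the axis convention it actually uses.

```python
import math


def rotate_about_y(vertices, degrees):
    """Rotate (x, y, z) vertices about the y axis (assumed to be vertical)."""
    theta = math.radians(degrees)
    c, s = math.cos(theta), math.sin(theta)
    # Right-handed rotation about y: x' = c*x + s*z, z' = -s*x + c*z.
    return [(c * x + s * z, y, -s * x + c * z) for x, y, z in vertices]


# A 180 degree turn negates x and z and leaves y unchanged,
# which turns a mesh facing away from the camera to face towards it.
flipped = rotate_about_y([(1.0, 2.0, 3.0)], 180.0)
```

If the predicted meshes already face the camera, the flip would be skipped (a rotation of 0° leaves the vertices unchanged).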
After generating the test results, check the directories and run:
python scripts/
There are multiple modes of visualization specified by render_mode, including novel views, rotations and animations. Check the script for details.
title = {{DOVE}: Learning Deformable 3D Objects by Watching Videos},
author = {Shangzhe Wu and Tomas Jakab and Christian Rupprecht and Andrea Vedaldi},
journal = {IJCV},
year = {2023}