3D Machine Learning
In recent years, a tremendous amount of progress has been made in the field of 3D Machine Learning, an interdisciplinary field that fuses computer vision, computer graphics, and machine learning. This repo is derived from my study notes and will be used as a place for triaging new research papers.
I'll use the following icons to differentiate 3D representations:
📷Multi-view Images 👾Volumetric 🎲Point Cloud 💎Polygonal Mesh 💊Primitive-based
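These icons mark how a shape is represented, and converting between representations is a common preprocessing step. Purely as an illustration (the function name and resolution are mine, not from any cited paper), a point cloud 🎲 can be rasterized into a volumetric 👾 occupancy grid like this:

```python
import numpy as np

def voxelize(points, resolution=32):
    """Map an (N, 3) point cloud onto a binary occupancy grid.

    Illustrative sketch only: the cloud is normalized into the unit cube,
    then every cell containing at least one point is marked occupied.
    """
    points = np.asarray(points, dtype=float)
    mins = points.min(axis=0)
    extent = (points.max(axis=0) - mins).max()  # uniform scale preserves aspect ratio
    normalized = (points - mins) / (extent + 1e-9)  # -> [0, 1]^3
    idx = np.clip((normalized * resolution).astype(int), 0, resolution - 1)
    grid = np.zeros((resolution,) * 3, dtype=bool)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = True
    return grid

# Opposite corners of the unit cube land in two distinct corner voxels.
grid = voxelize([[0, 0, 0], [1, 1, 1]], resolution=4)
print(int(grid.sum()))  # -> 2
```

Many of the papers below differ mainly in which of these representations they consume and produce.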
To contribute to this repo, you may add content through pull requests or open an issue to let me know.
We have also created a Slack workspace for people around the globe to ask questions, share knowledge, and facilitate collaborations. Together, I'm sure we can advance this field as a collaborative effort. Join the community with this link.
Table of Contents
- 3D Pose Estimation
- Single Object Classification
- Multiple Objects Detection
- Scene/Object Semantic Segmentation
- 3D Geometry Synthesis/Reconstruction
- Texture/Material Analysis and Synthesis
- Style Learning and Transfer
- Scene Synthesis/Reconstruction
- Scene Understanding
For a survey of RGBD datasets, check out Michael Firman's collection as well as the associated paper, RGBD Datasets: Past, Present and Future. The Point Cloud Library also has a good dataset catalogue.
Dataset for IKEA 3D models and aligned images (2013) [Link]
759 images and 219 models, including SketchUp (.skp) and Wavefront (.obj) files; good for pose estimation.
Open Surfaces: A Richly Annotated Catalog of Surface Appearance (SIGGRAPH 2013) [Link]
OpenSurfaces is a large database of annotated surfaces created from real-world consumer photographs. Our annotation framework draws on crowdsourcing to segment surfaces from photos, and then annotate them with rich surface properties, including material, texture and contextual information.
PASCAL3D+ (2014) [Link]
12 categories, on average 3k+ objects per category, for 3D object detection and pose estimation.
ModelNet (2015) [Link]
127,915 3D CAD models from 662 categories
ModelNet10: 4,899 models from 10 categories
ModelNet40: 12,311 models from 40 categories, all uniformly oriented
A Large Dataset of Object Scans (2016) [Link]
10,000 scans in RGB-D plus reconstructed 3D models in .PLY format.
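Since the reconstructed models ship as .PLY files, one would normally load them with a library such as Open3D or plyfile. Purely as an illustration of the format (the function name and the x/y/z-first property assumption are mine), here is a minimal reader for the ASCII variant:

```python
def read_ply_vertices(lines):
    """Parse vertex positions from an ASCII PLY file given as an iterable of lines.

    Minimal sketch: handles only the ASCII variant and assumes the first three
    vertex properties are x, y, z. Binary PLY (common for large scans) needs a
    real parser such as plyfile or Open3D.
    """
    it = iter(lines)
    if next(it).strip() != "ply":
        raise ValueError("not a PLY file")
    n_vertices = 0
    for line in it:  # scan the header
        tokens = line.split()
        if tokens[:2] == ["element", "vertex"]:
            n_vertices = int(tokens[2])
        elif tokens == ["end_header"]:
            break
    return [tuple(float(v) for v in next(it).split()[:3]) for _ in range(n_vertices)]

sample = """ply
format ascii 1.0
element vertex 2
property float x
property float y
property float z
end_header
0 0 0
1.5 2.0 -3.0""".splitlines()
print(read_ply_vertices(sample))  # -> [(0.0, 0.0, 0.0), (1.5, 2.0, -3.0)]
```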
ObjectNet3D: A Large Scale Database for 3D Object Recognition (2016) [Link]
100 categories, 90,127 images, 201,888 objects in these images and 44,147 3D shapes.
Tasks: region proposal generation, 2D object detection, joint 2D detection and 3D object pose estimation, and image-based 3D shape retrieval
Thingi10K: A Dataset of 10,000 3D-Printing Models (2016) [Link]
10,000 models from featured “things” on thingiverse.com, suitable for testing 3D printing techniques such as structural analysis, shape optimization, or solid geometry operations.
ABC: A Big CAD Model Dataset For Geometric Deep Learning [Link][Paper]
This work introduces a dataset for geometric deep learning consisting of over 1 million individual (and high-quality) geometric models, each associated with accurate ground-truth information on the decomposition into patches, explicit sharp feature annotations, and analytic differential properties.
SUNRGB-D 3D Object Detection Challenge [Link]
19 object categories for predicting a 3D bounding box in real-world dimensions
Training set: 10,355 RGB-D scene images; testing set: 2,860 RGB-D images
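The metric behind box-prediction challenges like this is 3D intersection-over-union between predicted and ground-truth boxes. Real SUN RGB-D boxes are oriented about the gravity axis; the axis-aligned case below is a simplification of mine that shows just the core overlap-over-union computation:

```python
import numpy as np

def iou_3d(box_a, box_b):
    """IoU of two axis-aligned 3D boxes, each (xmin, ymin, zmin, xmax, ymax, zmax).

    Simplified sketch: benchmark boxes are usually rotated about the gravity
    axis, which requires a polygon intersection; the axis-aligned case keeps
    only the essential volume arithmetic.
    """
    a, b = np.asarray(box_a, float), np.asarray(box_b, float)
    lo = np.maximum(a[:3], b[:3])  # intersection lower corner
    hi = np.minimum(a[3:], b[3:])  # intersection upper corner
    inter = np.prod(np.clip(hi - lo, 0, None))
    union = np.prod(a[3:] - a[:3]) + np.prod(b[3:] - b[:3]) - inter
    return float(inter / union) if union > 0 else 0.0

# Unit cube vs. the same cube shifted 0.5 along x: overlap 0.5, union 1.5.
print(iou_3d((0, 0, 0, 1, 1, 1), (0.5, 0, 0, 1.5, 1, 1)))  # -> 0.3333...
```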
SceneNN (2016) [Link]
100+ indoor scene meshes with per-vertex and per-pixel annotation.
ScanNet (2017) [Link]
An RGB-D video dataset containing 2.5 million views in more than 1500 scans, annotated with 3D camera poses, surface reconstructions, and instance-level semantic segmentations.
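The annotated camera poses are what tie individual RGB-D frames into a common surface reconstruction. As a small illustration (the camera-to-world convention here is an assumption; check the dataset docs for the one actually used), a 4x4 pose matrix maps camera-space points into world space via homogeneous coordinates:

```python
import numpy as np

def transform_points(pose, points_cam):
    """Map (N, 3) camera-space points to world space with a 4x4 pose matrix.

    Assumes the common camera-to-world convention: rotation in the top-left
    3x3 block, translation in the last column.
    """
    pts = np.asarray(points_cam, float)
    homo = np.hstack([pts, np.ones((len(pts), 1))])  # to homogeneous coords
    return (homo @ np.asarray(pose, float).T)[:, :3]

# A pose that only translates by (1, 2, 3) shifts every point by that offset.
pose = np.eye(4)
pose[:3, 3] = [1.0, 2.0, 3.0]
print(transform_points(pose, [[0, 0, 0]]))  # -> [[1. 2. 3.]]
```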
Matterport3D: Learning from RGB-D Data in Indoor Environments (2017) [Link]
10,800 panoramic views (in both RGB and depth) from 194,400 RGB-D images of 90 building-scale scenes of private rooms. Instance-level semantic segmentations are provided for region (living room, kitchen) and object (sofa, TV) categories.
SUNCG: A Large 3D Model Repository for Indoor Scenes (2017) [Link]
The dataset contains over 45K different scenes with manually created realistic room and furniture layouts. All of the scenes are semantically annotated at the object level.
MINOS: Multimodal Indoor Simulator (2017) [Link]
MINOS is a simulator designed to support the development of multisensory models for goal-directed navigation in complex indoor environments. MINOS leverages large datasets of complex 3D environments and supports flexible configuration of multimodal sensor suites. MINOS supports SUNCG and Matterport3D scenes.
Facebook House3D: A Rich and Realistic 3D Environment (2017) [Link]
House3D is a virtual 3D environment which consists of 45K indoor scenes equipped with a diverse set of scene types, layouts and objects sourced from the SUNCG dataset. All 3D objects are fully annotated with category labels. Agents in the environment have access to observations of multiple modalities, including RGB images, depth, segmentation masks and top-down 2D map views.
HoME: a Household Multimodal Environment (2017) [Link]
HoME integrates over 45,000 diverse 3D house layouts based on the SUNCG dataset, a scale which may facilitate learning, generalization, and transfer. HoME is an open-source, OpenAI Gym-compatible platform extensible to tasks in reinforcement learning, language grounding, sound-based navigation, robotics, and multi-agent learning.
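"OpenAI Gym-compatible" means the environment exposes the standard reset/step interface, so existing RL agent loops work unchanged. The toy environment below is entirely my own stand-in (not HoME's actual API) and only illustrates that contract:

```python
class ToyNavEnv:
    """Toy stand-in illustrating the Gym-style reset/step contract that
    platforms such as HoME or MINOS expose; the observation here is just
    the agent's 1-D position, not a real multimodal sensor suite."""

    def __init__(self, goal=5):
        self.goal, self.pos = goal, 0

    def reset(self):
        self.pos = 0
        return self.pos  # initial observation

    def step(self, action):  # action: -1 (left) or +1 (right)
        self.pos += action
        done = self.pos == self.goal
        reward = 1.0 if done else -0.01  # small per-step penalty
        return self.pos, reward, done, {}  # obs, reward, done, info

# The standard agent loop works against any env with this interface.
env = ToyNavEnv()
obs, done, steps = env.reset(), False, 0
while not done:
    obs, reward, done, info = env.step(+1)  # trivial policy: always go right
    steps += 1
print(steps)  # -> 5
```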
AI2-THOR: Photorealistic Interactive Environments for AI Agents [Link]
AI2-THOR is a photo-realistic interactable framework for AI agents. There are a total of 120 scenes in version 1.0 of the THOR environment, covering four different room categories: kitchens, living rooms, bedrooms, and bathrooms. Each room has a number of actionable objects.
Gibson Environment: Real-World Perception for Embodied Agents (2018 CVPR) [Link]
This platform provides RGB renderings from 1,000 point clouds, as well as multimodal sensor data: surface normals, depth, and, for a fraction of the spaces, semantic object annotations. The environment is also RL-ready with integrated physics. Using such datasets can further narrow the discrepancy between virtual environments and the real world.
InteriorNet: Mega-scale Multi-sensor Photo-realistic Indoor Scenes Dataset [Link]
System overview: an end-to-end pipeline to render an RGB-D-inertial benchmark for large-scale interior scene understanding and mapping. The dataset contains 20M images created by the pipeline:
(A) We collect around 1 million CAD models provided by world-leading furniture manufacturers; these models have been used in real-world production.
(B) Based on those models, around 1,100 professional designers create around 22 million interior layouts, most of which have been used in real-world decoration.
(C) For each layout, we generate a number of configurations to represent different random lightings and to simulate scene change over time in daily life.
(D) We provide an interactive simulator (ViSim) to help create ground-truth IMU and event data, as well as monocular or stereo camera trajectories, including hand-drawn, random-walk, and neural-network-based realistic trajectories.
(E) All supported image sequences and ground truth are provided.
Large-Scale Point Cloud Classification Benchmark, a large labelled 3D point cloud dataset of natural scenes with over 4 billion points in total, covering a range of diverse urban scenes.
3D Pose Estimation
Viewpoints and Keypoints (2015) [Paper]
Render for CNN: Viewpoint Estimation in Images Using CNNs Trained with Rendered 3D Model Views (2015 ICCV) [Paper]
PoseNet: A Convolutional Network for Real-Time 6-DOF Camera Relocalization (2015) [Paper]
Modeling Uncertainty in Deep Learning for Camera Relocalization (2016) [Paper]
Robust camera pose estimation by viewpoint classification using deep learning (2016) [Paper]
Geometric loss functions for camera pose regression with deep learning (2017 CVPR) [Paper]
Generic 3D Representation via Pose Estimation and Matching (2017) [Paper]
3D Bounding Box Estimation Using Deep Learning and Geometry (2017) [Paper]
6-DoF Object Pose from Semantic Keypoints (2017) [Paper]
Relative Camera Pose Estimation Using Convolutional Neural Networks (2017) [Paper]
3DMatch: Learning Local Geometric Descriptors from RGB-D Reconstructions (2017) [Paper]
Multi-view Consistency as Supervisory Signal for Learning Shape and Pose Prediction (2018 CVPR) [Paper]
PoseCNN: A Convolutional Neural Network for 6D Object Pose Estimation in Cluttered Scenes (2018) [Paper]
Feature Mapping for Learning Fast and Accurate 3D Pose Inference from Synthetic Images (2018 CVPR) [Paper]
Pix3D: Dataset and Methods for Single-Image 3D Shape Modeling (2018 CVPR) [Paper]
3D Pose Estimation and 3D Model Retrieval for Objects in the Wild (2018 CVPR) [Paper]
Deep Object Pose Estimation for Semantic Robotic Grasping of Household Objects (2018) [Paper]
Single Object Classification
Multiple Objects Detection
Object Detection in 3D Scenes Using CNNs in Multi-view Images (2016) [Paper]
DeepContext: Context-Encoding Neural Pathways for 3D Holistic Scene Understanding (2016) [Paper]
SUN RGB-D: A RGB-D Scene Understanding Benchmark Suite (2017) [Paper]
VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection (2017) [Paper]
Frustum PointNets for 3D Object Detection from RGB-D Data (CVPR2018) [Paper]
A^2-Net: Molecular Structure Estimation from Cryo-EM Density Volumes (AAAI2019) [Paper]
Stereo R-CNN based 3D Object Detection for Autonomous Driving (CVPR2019) [Paper]
Scene/Object Semantic Segmentation
Unsupervised Co-Segmentation of a Set of Shapes via Descriptor-Space Spectral Clustering (2011) [Paper]
Learning Hierarchical Shape Segmentation and Labeling from Online Repositories (2017) [Paper]
3D Geometry Synthesis/Reconstruction
Parametric Morphable Model-based methods
The Space of Human Body Shapes: Reconstruction and Parameterization from Range Scans (2003) [Paper]
Category-Specific Object Reconstruction from a Single Image (2014) [Paper]
Part-based Template Learning methods
Model Composition from Interchangeable Components (2007) [Paper]
Data-Driven Suggestions for Creativity Support in 3D Modeling (2010) [Paper]
Photo-Inspired Model-Driven 3D Object Modeling (2011) [Paper]
Probabilistic Reasoning for Assembly-Based 3D Modeling (2011) [Paper]
A Probabilistic Model for Component-Based Shape Synthesis (2012) [Paper]
Structure Recovery by Part Assembly (2012) [Paper]
Fit and Diverse: Set Evolution for Inspiring 3D Shape Galleries (2012) [Paper]
AttribIt: Content Creation with Semantic Attributes (2013) [Paper]
Learning Part-based Templates from Large Collections of 3D Shapes (2013) [Paper]
Topology-Varying 3D Shape Creation via Structural Blending (2014) [Paper]
Estimating Image Depth using Shape Collections (2014) [Paper]
Single-View Reconstruction via Joint Analysis of Image and Shape Collections (2015) [Paper]
Interchangeable Components for Hands-On Assembly Based Modeling (2016) [Paper]
Shape Completion from a Single RGBD Image (2016) [Paper]
Deep Learning Methods
Texture/Material Analysis and Synthesis
Two-Shot SVBRDF Capture for Stationary Materials (SIGGRAPH 2015) [Paper]
Reflectance Modeling by Neural Texture Synthesis (2016) [Paper]
Modeling Surface Appearance from a Single Photograph using Self-augmented Convolutional Neural Networks (2017) [Paper]
High-Resolution Multi-Scale Neural Texture Synthesis (2017) [Paper]
Reflectance and Natural Illumination from Single Material Specular Objects Using Deep Learning (2017) [Paper]
Joint Material and Illumination Estimation from Photo Sets in the Wild (2017) [Paper]
What Is Around The Camera? (2017) [Paper]
TextureGAN: Controlling Deep Image Synthesis with Texture Patches (2018 CVPR) [Paper]
Gaussian Material Synthesis (2018 SIGGRAPH) [Paper]
Non-stationary Texture Synthesis by Adversarial Expansion (2018 SIGGRAPH) [Paper]
Synthesized Texture Quality Assessment via Multi-scale Spatial and Statistical Texture Attributes of Image and Gradient Magnitude Coefficients (2018 CVPR) [Paper]
LIME: Live Intrinsic Material Estimation (2018 CVPR) [Paper]
Single-Image SVBRDF Capture with a Rendering-Aware Deep Network (2018) [Paper]
PhotoShape: Photorealistic Materials for Large-Scale Shape Collections (2018) [Paper]
Learning Material-Aware Local Descriptors for 3D Shapes (2018) [Paper]
FrankenGAN: Guided Detail Synthesis for Building Mass Models using Style-Synchonized GANs (2018 SIGGRAPH Asia) [Paper]
Style Learning and Transfer
Design Preserving Garment Transfer (2012) [Paper]
Analogy-Driven 3D Style Transfer (2014) [Paper]
Unsupervised Texture Transfer from Images to Model Collections (2016) [Paper]
Learning Detail Transfer based on Geometric Features (2017) [Paper]
Co-Locating Style-Defining Elements on 3D Shapes (2017) [Paper]
Appearance Modeling via Proxy-to-Image Alignment (2018) [Paper]
Scene Synthesis/Reconstruction
Interactive Furniture Layout Using Interior Design Guidelines (2011) [Paper]
Synthesizing Open Worlds with Constraints using Locally Annealed Reversible Jump MCMC (2012) [Paper]
Example-based Synthesis of 3D Object Arrangements (2012 SIGGRAPH Asia) [Paper]
Sketch2Scene: Sketch-based Co-retrieval and Co-placement of 3D Models (2013) [Paper]
Action-Driven 3D Indoor Scene Evolution (2016) [Paper]
The Clutterpalette: An Interactive Tool for Detailing Indoor Scenes (2015) [Paper]
Relationship Templates for Creating Scene Variations (2016) [Paper]
IM2CAD (2017) [Paper]
Predicting Complete 3D Models of Indoor Scenes (2017) [Paper]
Complete 3D Scene Parsing from Single RGBD Image (2017) [Paper]
Fully Convolutional Refined Auto-Encoding Generative Adversarial Networks for 3D Multi Object Scenes (2017) [Blog]
Adaptive Synthesis of Indoor Scenes via Activity-Associated Object Relation Graphs (2017 SIGGRAPH Asia) [Paper]
Automated Interior Design Using a Genetic Algorithm (2017) [Paper]
SceneSuggest: Context-driven 3D Scene Design (2017) [Paper]
A fully end-to-end deep learning approach for real-time simultaneous 3D reconstruction and material recognition (2017) [Paper]
Deep Convolutional Priors for Indoor Scene Synthesis (2018) [Paper]
Configurable 3D Scene Synthesis and 2D Image Rendering with Per-Pixel Ground Truth using Stochastic Grammars (2018) [Paper]
Holistic 3D Scene Parsing and Reconstruction from a Single RGB Image (ECCV 2018) [Paper]
Language-Driven Synthesis of 3D Scenes from Scene Databases (SIGGRAPH Asia 2018) [Paper]
Deep Generative Modeling for Scene Synthesis via Hybrid Representations (2018) [Paper]
GRAINS: Generative Recursive Autoencoders for INdoor Scenes (2018) [Paper]
SEETHROUGH: Finding Objects in Heavily Occluded Indoor Scene Images (2018) [Paper]
Scene Understanding
Characterizing Structural Relationships in Scenes Using Graph Kernels (2011 SIGGRAPH) [Paper]
Understanding Indoor Scenes Using 3D Geometric Phrases (2013) [Paper]
Organizing Heterogeneous Scene Collections through Contextual Focal Points (2014 SIGGRAPH) [Paper]
SceneGrok: Inferring Action Maps in 3D Environments (2014, SIGGRAPH) [Paper]
PanoContext: A Whole-room 3D Context Model for Panoramic Scene Understanding (2014) [Paper]
Learning Informative Edge Maps for Indoor Scene Layout Prediction (2015) [Paper]
Rent3D: Floor-Plan Priors for Monocular Layout Estimation (2015) [Paper]
A Coarse-to-Fine Indoor Layout Estimation (CFILE) Method (2016) [Paper]
DeLay: Robust Spatial Layout Estimation for Cluttered Indoor Scenes (2016) [Paper]
Deep Multi-Modal Image Correspondence Learning (2016) [Paper]
RoomNet: End-to-End Room Layout Estimation (2017) [Paper]
SUN RGB-D: A RGB-D Scene Understanding Benchmark Suite (2017) [Paper]
Cross-Domain Self-supervised Multi-task Feature Learning using Synthetic Imagery (2018 CVPR) [Paper]
Pano2CAD: Room Layout From A Single Panorama Image (2018 CVPR) [Paper]
Automatic 3D Indoor Scene Modeling from Single Panorama (2018 CVPR) [Paper]