A curated list of "3D Vision and Language" research, maintained by haonan. Watch this repository for the latest updates!
- 3D Multimodal Pretraining
- 3D Visual Understanding
- 3D Dense Captioning
- 3D Cross-Modal Retrieval
- 3D Visual QA & Reasoning
- 3D Visual Grounding
- Multi-view Learning
- Context-aware Alignment and Mutual Masking for 3D-Language Pre-training [CVPR 2023]
Zhao Jin1, Munawar Hayat2, Yuwei Yang1, Yulan Guo3, Yinjie Lei1†
1Sichuan University, 2Monash University, 3Sun Yat-sen University
- ULIP: Learning a Unified Representation of Language, Images, and Point Clouds for 3D Understanding [CVPR 2023]
Le Xue1†, Mingfei Gao1, Chen Xing1, Roberto Martín-Martín1,2, Jiajun Wu3, Caiming Xiong1, Ran Xu1, Juan Carlos Niebles1, Silvio Savarese1
1Salesforce Research, Palo Alto, USA, 2UT Austin, Texas, USA, 3Stanford University, Stanford, USA
- ULIP-2: Towards Scalable Multimodal Pre-training For 3D Understanding [Arxiv 2023]
Le Xue1†, Ning Yu1, Shu Zhang1, Junnan Li1, Roberto Martín-Martín3, Jiajun Wu2, Caiming Xiong1, Ran Xu1, Juan Carlos Niebles1,2, Silvio Savarese1,2
1Salesforce AI, 2Stanford University, 3The University of Texas at Austin
- PLA: Language-Driven Open-Vocabulary 3D Scene Understanding [CVPR 2023]
Runyu Ding1*, Jihan Yang1*, Chuhui Xue2, Wenqing Zhang2, Song Bai2†, Xiaojuan Qi1†
1The University of Hong Kong, 2ByteDance
- Autoencoders as Cross-Modal Teachers: Can Pretrained 2D Image Transformers Help 3D Representation Learning? [ICLR 2023]
Runpei Dong1, Zekun Qi1, Linfeng Zhang2, Junbo Zhang2, Jianjian Sun3, Zheng Ge4, Li Yi2,4,5†, Kaisheng Ma2†
1Xi’an Jiaotong University, 2Tsinghua University, 3MEGVII Technology, 4Shanghai Artificial Intelligence Laboratory, 5Shanghai Qi Zhi Institute
- End-to-End 3D Dense Captioning with Vote2Cap-DETR [CVPR 2023]
Sijin Chen1*, Hongyuan Zhu2, Xin Chen3, Yinjie Lei4, Tao Chen1†, Gang YU3
1Fudan University, 2Institute for Infocomm Research, A*STAR, 3Tencent PCG, 4Sichuan University
- Scan2Cap: Context-aware Dense Captioning in RGB-D Scans [CVPR 2021]
Dave Zhenyu Chen1, Ali Gholami2, Matthias Nießner1, Angel X. Chang2
1Technical University of Munich, 2Simon Fraser University
- RONO: Robust Discriminative Learning with Noisy Labels for 2D-3D Cross-Modal Retrieval [CVPR 2023]
Yanglin Feng1, Hongyuan Zhu2, Dezhong Peng1,3,4, Xi Peng1, Peng Hu1†
1College of Computer Science, Sichuan University, 2Institute for Infocomm Research (I2R), A*STAR, 3Sichuan Zhiqian Technology, 4Chengdu Ruibei Yingte Information Technology
- 3D Concept Learning and Reasoning from Multi-View Images [CVPR 2023]
Yining Hong1, Chunru Lin2, Yilun Du3, Zhenfang Chen5, Joshua B. Tenenbaum3, Chuang Gan4,5
1UCLA, 2Shanghai Jiaotong University, 3MIT CSAIL, 4UMass Amherst, 5MIT-IBM Watson AI Lab
- EDA: Explicit Text-Decoupling and Dense Alignment for 3D Visual Grounding [CVPR 2023]
Yanmin Wu1, Xinhua Cheng1, Renrui Zhang2,3, Zesen Cheng1, Jian Zhang1†
1Shenzhen Graduate School, Peking University, 2The Chinese University of Hong Kong, 3Shanghai AI Laboratory
- Investigating and Mitigating the Side Effects of Noisy Views in Multi-view Clustering in Practical Scenarios [Arxiv 2023]
Jie Xu, Gang Niu, Xiaolong Wang, Yazhou Ren, Lei Feng, Xiaoshuang Shi, Heng Tao Shen, Xiaofeng Zhu
- Deep Incomplete Multi-View Clustering via Mining Cluster Complementarity [AAAI 2022]
Jie Xu1, Chao Li1, Yazhou Ren1†, Liang Peng1, Yujie Mo1, Xiaoshuang Shi1†, Xiaofeng Zhu1,2
1UESTC, 2Shenzhen Institute for Advanced Study, UESTC
- Multi-level Feature Learning for Contrastive Multi-view Clustering [CVPR 2022]
Jie Xu1†, Huayi Tang1†, Yazhou Ren1, Liang Peng1, Xiaofeng Zhu1,2, Lifang He3
1UESTC, 2Shenzhen Institute for Advanced Study, UESTC, 3Lehigh University
- Multi-VAE: Learning Disentangled View-common and View-peculiar Visual Representations for Multi-view Clustering [ICCV 2022]
Jie Xu1, Yazhou Ren1†, Huayi Tang1, Xiaorong Pu1, Xiaofeng Zhu1, Ming Zeng2, Lifang He3
1UESTC, 2CMU, 3Lehigh University