vgthengane/Awesome-Multimodal-in-3D

Awesome-Multimodal-in-3D

A comprehensive survey on Multimodal Models in 3D

Generation

Title arXiv Github WebSite Pub. & Date
CLIP-Sculptor: Zero-Shot Generation of High-Fidelity and Diverse Shapes from Natural Language arXiv nan nan 2022
SINC: Spatial Composition of 3D Human Motions for Simultaneous Action Generation arXiv nan nan 2023
Make-An-Animation: Large-Scale Text-conditional 3D Human Motion Generation arXiv nan nan 2023
Articulated 3D Head Avatar Generation using Text-to-Image Diffusion Models arXiv nan nan 2023
Multimodal 3D Hand Pose Enhancement for Sign Language arXiv nan nan 2022
Text and Image Guided 3D Avatar Generation and Manipulation arXiv nan nan 2022
TexFusion: Synthesizing 3D Textures with Text-Guided Image Diffusion Models arXiv nan nan 2023
DreamAvatar: Text-and-Shape Guided 3D Human Avatar Generation via Diffusion Models arXiv nan nan 2023
Guide3D: Create 3D Avatars from Text and Image Guidance arXiv nan nan 2023
High-Fidelity Generalized Emotional Talking Face Generation with Multi-Modal Emotion Space Learning arXiv nan nan 2023
MRIS: A Multi-modal Retrieval Approach for Image Synthesis on Diverse Modalities arXiv nan nan 2023
Text2Tex: Text-driven Texture Synthesis via Diffusion Models arXiv nan nan 2023
Fantasia3D: Disentangling Geometry and Appearance for High-quality Text-to-3D Content Creation arXiv nan nan 2023
Text2Light: Zero-Shot Text-Driven HDR Panorama Generation arXiv nan nan 2023
Text-to-3D using Gaussian Splatting arXiv nan nan 2023
Autoregressive 3D Shape Generation via Canonical Mapping arXiv nan nan 2022
SDFusion: Multimodal 3D Shape Completion, Reconstruction, and Generation arXiv nan nan 2023
Efficient Text-Guided 3D-Aware Portrait Generation with Score Distillation Sampling on Distribution arXiv nan nan 2023
Cross-Modal 3D Shape Generation and Manipulation arXiv nan nan 2022
ITportrait: Image-Text Coupled 3D Portrait Domain Adaptation arXiv nan nan 2023
FaceFormer: Speech-Driven 3D Facial Animation with Transformers arXiv nan nan 2022
Ctrl-Room: Controllable Text-to-3D Room Meshes Generation with Layout Constraints arXiv nan nan 2023
Text-guided 3D Human Generation from 2D Collections arXiv nan nan 2023
TM2D: Bimodality Driven 3D Dance Generation via Music-Text Integration arXiv nan nan 2023
Semantify: Simplifying the Control of 3D Morphable Models using CLIP arXiv nan nan 2023
Point-Bind & Point-LLM: Aligning Point Cloud with Multi-modality for 3D Understanding, Generation, and Instruction Following arXiv nan nan 2023
Zero3D: Semantic-Driven Multi-Category 3D Shape Generation arXiv nan nan 2023
HeadSculpt: Crafting 3D Head Avatars with Text arXiv nan nan 2023
AI-enabled Automatic Multimodal Fusion of Cone-Beam CT and Intraoral Scans for Intelligent 3D Tooth-Bone Reconstruction and Clinical Applications arXiv nan nan 2022
T$^3$Bench: Benchmarking Current Progress in Text-to-3D Generation arXiv nan nan 2023
AvatarCLIP: Zero-Shot Text-Driven Generation and Animation of 3D Avatars arXiv nan nan 2022
ArK: Augmented Reality with Knowledge Interactive Emergent Ability arXiv nan nan 2023
AvatarFusion: Zero-shot Generation of Clothing-Decoupled 3D Avatars Using 2D Diffusion arXiv nan nan 2023
SUG: Single-dataset Unified Generalization for 3D Point Cloud Classification arXiv nan nan 2023
TextField3D: Towards Enhancing Open-Vocabulary 3D Generation with Noisy Text Fields arXiv nan nan 2023
HumanNorm: Learning Normal Diffusion Model for High-quality and Realistic 3D Human Generation arXiv nan nan 2023
DreamWaltz: Make a Scene with Complex 3D Animatable Avatars arXiv nan nan 2023
Zero-Shot Text-Guided Object Generation with Dream Fields arXiv nan nan 2022
ClipMatrix: Text-controlled Creation of 3D Textured Meshes arXiv nan nan 2021
AvatarCraft: Transforming Text into Neural Human Avatars with Parameterized Shape and Pose Control arXiv nan nan 2023
3D-TOGO: Towards Text-Guided Cross-Category 3D Object Generation arXiv nan nan 2023
CG-NeRF: Conditional Generative Neural Radiance Fields for 3D-aware Image Synthesis arXiv nan nan 2023
Shap-E: Generating Conditional 3D Implicit Functions arXiv nan nan 2023
Instruct 3D-to-3D: Text Instruction Guided 3D-to-3D conversion arXiv nan nan 2023
LERF: Language Embedded Radiance Fields arXiv nan nan 2023
CLIP-Mesh: Generating textured meshes from text using pretrained image-text models arXiv nan nan 2022
Chupa: Carving 3D Clothed Humans from Skinned Shape Priors using 2D Diffusion Probabilistic Models arXiv nan nan 2023
DATID-3D: Diversity-Preserved Domain Adaptation Using Text-to-Image Diffusion for 3D Generative Model arXiv nan nan 2023
PODIA-3D: Domain Adaptation of 3D Generative Model Across Large Domain Gap Using Pose-Preserved Text-to-Image Diffusion arXiv nan nan 2023
MPE4G: Multimodal Pretrained Encoder for Co-Speech Gesture Generation arXiv nan nan 2023
Decomposing NeRF for Editing via Feature Field Distillation arXiv nan nan 2022
DreamHuman: Animatable 3D Avatars from Text arXiv nan nan 2023
SALAD: Part-Level Latent Diffusion for 3D Shape Generation and Manipulation arXiv nan nan 2023
Image-free Domain Generalization via CLIP for 3D Hand Pose Estimation arXiv nan nan 2023
3DDesigner: Towards Photorealistic 3D Object Generation and Editing with Text-guided Diffusion Models arXiv nan nan 2023
Audio2Gestures: Generating Diverse Gestures from Audio arXiv nan nan 2023
Audio2Gestures: Generating Diverse Gestures from Speech Audio with Conditional Variational Autoencoders arXiv nan nan 2021
AI Choreographer: Music Conditioned 3D Dance Generation with AIST++ arXiv nan nan 2021
3D-CLFusion: Fast Text-to-3D Rendering with Contrastive Latent Diffusion arXiv nan nan 2023
3DQD: Generalized Deep 3D Shape Prior via Part-Discretized Diffusion Process arXiv nan nan 2023
AV-NeRF: Learning Neural Fields for Real-World Audio-Visual Scene Synthesis arXiv nan nan 2023
Magic3D: High-Resolution Text-to-3D Content Creation arXiv nan nan 2023
CLIP-Layout: Style-Consistent Indoor Scene Synthesis with Semantic Furniture Embedding arXiv nan nan 2023
3DALL-E: Integrating Text-to-Image AI in 3D Design Workflows arXiv nan nan 2023
StructDiffusion: Language-Guided Creation of Physically-Valid Structures using Unseen Objects arXiv nan nan 2023
DreamStone: Image as a Stepping Stone for Text-Guided 3D Shape Generation arXiv nan nan 2023
ISS: Image as Stepping Stone for Text-Guided 3D Shape Generation arXiv nan nan 2023
ATT3D: Amortized Text-to-3D Object Synthesis arXiv nan nan 2023
Learning Versatile 3D Shape Generation with Improved AR Models arXiv nan nan 2023
Revisiting Transformer for Point Cloud-based 3D Scene Graph Generation arXiv nan nan 2023

Pretraining

Title arXiv Github WebSite Pub. & Date
CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding arXiv nan nan 2022
Grounding Pretrained Features in 3D Representations arXiv nan nan 2023
MedBLIP: Bootstrapping Language-Image Pre-training from 3D Medical Images and Texts arXiv nan nan 2023
CrysMMNet: Multimodal Representation for Crystal Property Prediction arXiv nan nan 2023
Joint-MAE: 2D-3D Joint Masked Autoencoders for 3D Point Cloud Pre-training arXiv nan nan 2023
CHORUS: Learning Canonicalized 3D Human-Object Spatial Relations from Unbounded Synthesized Images arXiv nan nan 2023
CLIP goes 3D: Leveraging Prompt Tuning for Language Grounded 3D Recognition arXiv nan nan 2023
3D Concept Learning and Reasoning from Multi-View Images arXiv nan nan 2023
3D-LLM: Injecting the 3D World into Large Language Models arXiv nan nan 2023
Pri3D: Can 3D Priors Help 2D Representation Learning? arXiv nan nan 2021
Joint Representation Learning for Text and 3D Point Cloud arXiv nan nan 2023
CLIP2Point: Transfer CLIP to Point Cloud Classification with Image-Depth Pre-training arXiv nan nan 2023
ConceptFusion: Open-set Multimodal 3D Mapping arXiv nan nan 2023
MotionGPT: Human Motion as a Foreign Language arXiv nan nan 2023
JM3D&JM3D-LLM: Elevating 3D Representation with Joint Multi-modal Cues arXiv nan nan 2023
Context-aware Alignment and Mutual Masking for 3D-Language Pre-training arXiv nan nan 2023
Point Clouds Are Specialized Images: A Knowledge Transfer Approach for 3D Understanding arXiv nan nan 2023
ViT-Lens: Towards Omni-modal Representations arXiv nan nan 2023

Editing and Manipulation

Title arXiv Github WebSite Pub. & Date
ClipFace: Text-guided Editing of Textured 3D Morphable Models arXiv nan nan 2023
CompoNeRF: Text-guided Multi-object Compositional NeRF with Editable 3D Scene Layout arXiv nan nan 2023
Volumetric Disentanglement for 3D Scene Manipulation arXiv nan nan 2022
Instruct-NeRF2NeRF: Editing 3D Scenes with Instructions arXiv nan nan 2023
LADIS: Language Disentanglement for 3D Shape Editing arXiv nan nan 2022
Local 3D Editing via 3D Distillation of CLIP Knowledge arXiv nan nan 2023

Detection

Title arXiv Github WebSite Pub. & Date
TransFusion: Robust LiDAR-Camera Fusion for 3D Object Detection with Transformers arXiv nan nan 2022
Omni3D: A Large Benchmark and Model for 3D Object Detection in the Wild arXiv nan nan 2023
HRFuser: A Multi-resolution Sensor Fusion Architecture for 2D Object Detection arXiv nan nan 2023
ObjectFusion: Multi-modal 3D Object Detection with Object-Centric Fusion arXiv nan nan 2023
PiMAE: Point Cloud and Image Interactive Masked Autoencoders for 3D Object Detection arXiv nan nan 2023
FUTR3D: A Unified Sensor Fusion Framework for 3D Detection arXiv nan nan 2023
DSGN++: Exploiting Visual-Spatial Relation for Stereo-based 3D Detectors arXiv nan nan 2022
FocalFormer3D: Focusing on Hard Instance for 3D Object Detection arXiv nan nan 2023
Focal Sparse Convolutional Networks for 3D Object Detection arXiv nan nan 2022
AutoAlignV2: Deformable Feature Aggregation for Dynamic Multi-Modal 3D Object Detection arXiv nan nan 2022
BEVDistill: Cross-Modal BEV Distillation for Multi-View 3D Object Detection arXiv nan nan 2022
Language-Guided 3D Object Detection in Point Cloud for Autonomous Driving arXiv nan nan 2023
Semantics-aware LiDAR-Only Pseudo Point Cloud Generation for 3D Object Detection arXiv nan nan 2023
A Generalized Multi-Modal Fusion Detection Framework arXiv nan nan 2023
From Multi-View to Hollow-3D: Hallucinated Hollow-3D R-CNN for 3D Object Detection arXiv nan nan 2021
DeepFusion: A Robust and Modular 3D Object Detector for Lidars, Cameras and Radars arXiv nan nan 2022
PointSee: Image Enhances Point Cloud arXiv nan nan 2022
FusionFormer: A Multi-sensory Fusion in Bird's-Eye-View and Temporal Consistent Transformer for 3D Object Detection arXiv nan nan 2023
Joint Multi-Object Detection and Tracking with Camera-LiDAR Fusion for Autonomous Driving arXiv nan nan 2021
Geometric-aware Pretraining for Vision-centric 3D Object Detection arXiv nan nan 2023
TiG-BEV: Multi-view BEV 3D Object Detection via Target Inner-Geometry Learning arXiv nan nan 2022
FULLER: Unified Multi-modality Multi-task 3D Perception via Multi-level Gradient Calibration arXiv nan nan 2023
FaceCLIPNeRF: Text-driven 3D Face Manipulation using Deformable Neural Radiance Fields arXiv nan nan 2023
Center Feature Fusion: Selective Multi-Sensor Fusion of Center-based Objects arXiv nan nan 2023
SemanticBEVFusion: Rethink LiDAR-Camera Fusion in Unified Bird's-Eye View Representation for 3D Object Detection arXiv nan nan 2022
MSMDFusion: Fusing LiDAR and Camera at Multiple Scales with Multi-Depth Seeds for 3D Object Detection arXiv nan nan 2023
Paint and Distill: Boosting 3D Object Detection with Semantic Passing Network arXiv nan nan 2022
RCM-Fusion: Radar-Camera Multi-Level Fusion for 3D Object Detection arXiv nan nan 2023
X$^3$KD: Knowledge Distillation Across Modalities, Tasks and Stages for Multi-Camera 3D Object Detection arXiv nan nan 2023
Boosting Unsupervised Domain Adaptation for 3D Object Detection in Point Clouds with 2D Image Semantic Information arXiv nan nan 2023
LoGoNet: Towards Accurate 3D Object Detection with Local-to-Global Cross-Modal Fusion arXiv nan nan 2023
Homogeneous Multi-modal Feature Fusion and Interaction for 3D Object Detection arXiv nan nan 2022
Unifying Voxel-based Representation with Transformer for 3D Object Detection arXiv nan nan 2022
MoDAR: Using Motion Forecasting for 3D Object Detection in Point Cloud Sequences arXiv nan nan 2023
DeepFusion: Lidar-Camera Deep Fusion for Multi-Modal 3D Object Detection arXiv nan nan 2022
AShapeFormer: Semantics-Guided Object-Level Active Shape Encoding for 3D Object Detection via Transformers arXiv nan nan 2023
Delving into the Pre-training Paradigm of Monocular 3D Object Detection arXiv nan nan 2022
MLF-DET: Multi-Level Fusion for Cross-Modal 3D Object Detection arXiv nan nan 2023
Cross-Modal Analysis of Human Detection for Robotics: An Industrial Case Study arXiv nan nan 2021
PAI3D: Painting Adaptive Instance-Prior for 3D Object Detection arXiv nan nan 2022
EPNet++: Cascade Bi-directional Fusion for Multi-Modal 3D Object Detection arXiv nan nan 2022
Multi-Modal 3D Object Detection by Box Matching arXiv nan nan 2023
Open-Vocabulary 3D Detection via Image-level Class and Debiased Cross-modal Contrastive Learning arXiv nan nan 2022
Open-Vocabulary Point-Cloud Object Detection without 3D Annotation arXiv nan nan 2023

Segmentation

Title arXiv Github WebSite Pub. & Date
A Simple and Robust Framework for Cross-Modality Medical Image Segmentation applied to Vision Transformers arXiv nan nan 2023
MoPA: Multi-Modal Prior Aided Domain Adaptation for 3D Semantic Segmentation arXiv nan nan 2023
Multi-Modal Continual Test-Time Adaptation for 3D Semantic Segmentation arXiv nan nan 2023
Exploiting the Complementarity of 2D and 3D Networks to Address Domain-Shift in 3D Semantic Segmentation arXiv nan nan 2023
Segment Anything in 3D with NeRFs arXiv nan nan 2023
Context-Aware Entity Grounding with Open-Vocabulary 3D Scene Graphs arXiv nan nan 2023
Optimal Latent Vector Alignment for Unsupervised Domain Adaptation in Medical Image Segmentation arXiv nan nan 2021
MA-SAM: Modality-agnostic SAM Adaptation for 3D Medical Image Segmentation arXiv nan nan 2023
Generative Text-Guided 3D Vision-Language Pretraining for Unified Medical Image Segmentation arXiv nan nan 2023
LargeKernel3D: Scaling up Kernels in 3D Sparse CNNs arXiv nan nan 2023
PointDC: Unsupervised Semantic Segmentation of 3D Point Clouds via Cross-modal Distillation and Super-Voxel Clustering arXiv nan nan 2023
UniDA3D: Unified Domain Adaptive 3D Semantic Segmentation Pipeline arXiv nan nan 2023
MetaBEV: Solving Sensor Failures for BEV Detection and Map Segmentation arXiv nan nan 2023
Swin UNETR: Swin Transformers for Semantic Segmentation of Brain Tumors in MRI Images arXiv nan nan 2022
Prototype Adaption and Projection for Few- and Zero-Shot 3D Point Cloud Semantic Segmentation arXiv nan nan 2023
Mask3D: Pre-training 2D Vision Transformers by Learning Masked 3D Priors arXiv nan nan 2023
OpenIns3D: Snap and Lookup for 3D Open-vocabulary Instance Segmentation arXiv nan nan 2023
Contrastive Learning for Self-Supervised Pre-Training of Point Cloud Segmentation Networks With Image Data arXiv nan nan 2023
Cross-modal Learning for Domain Adaptation in 3D Semantic Segmentation arXiv nan nan 2022
Revisiting Multi-modal 3D Semantic Segmentation in Real-world Autonomous Driving arXiv nan nan 2023
Language-guided Semantic Style Transfer of 3D Indoor Scenes arXiv nan nan 2022
2DDATA: 2D Detection Annotations Transmittable Aggregation for Semantic Segmentation on Point Cloud arXiv nan nan 2023
Auto-Prompting SAM for Mobile Friendly 3D Medical Image Segmentation arXiv nan nan 2023
MSeg3D: Multi-modal 3D Semantic Segmentation for Autonomous Driving arXiv nan nan 2023
BEV-DG: Cross-Modal Learning under Bird's-Eye View for Domain Generalization of 3D Semantic Segmentation arXiv nan nan 2023
CKD-TransBTS: Clinical Knowledge-Driven Hybrid Transformer with Modality-Correlated Cross-Attention for Brain Tumor Segmentation arXiv nan nan 2022
Weakly Supervised 3D Open-vocabulary Segmentation arXiv nan nan 2023
PartSLIP: Low-Shot Part Segmentation for 3D Point Clouds via Pretrained Image-Language Models arXiv nan nan 2023
UniSeg: A Unified Multi-Modal LiDAR Segmentation Network and the OpenPCSeg Codebase arXiv nan nan 2023

Tracking

Title arXiv Github WebSite Pub. & Date
3D Multi-Object Tracking Using Graph Neural Networks with Cross-Edge Modality Attention arXiv nan nan 2022
LATTE: LAnguage Trajectory TransformEr arXiv nan nan 2022
3DMOTFormer: Graph Transformer for Online 3D Multi-Object Tracking arXiv nan nan 2023
EagerMOT: 3D Multi-Object Tracking via Sensor Fusion arXiv nan nan 2021
MMF-Track: Multi-modal Multi-level Fusion for 3D Single Object Tracking arXiv nan nan 2023

Matching

Title arXiv Github WebSite Pub. & Date
Self-Supervised Learning for Multimodal Non-Rigid 3D Shape Matching arXiv nan nan 2023

Open-Vocabulary Detection

Title arXiv Github WebSite Pub. & Date
CoDA: Collaborative Novel Box Discovery and Cross-modal Alignment for Open-vocabulary 3D Object Detection arXiv nan nan 2023

Anomaly Detection

Title arXiv Github WebSite Pub. & Date
Complementary Pseudo Multimodal Feature for Point Cloud Anomaly Detection arXiv nan nan 2023
EasyNet: An Easy Network for 3D Industrial Anomaly Detection arXiv nan nan 2023

Grounding

Title arXiv Github WebSite Pub. & Date
UniT3D: A Unified Transformer for 3D Dense Captioning and Visual Grounding arXiv nan nan 2022
Learning Point-Language Hierarchical Alignment for 3D Visual Grounding arXiv nan nan 2022
ViewRefer: Grasp the Multi-view Knowledge for 3D Visual Grounding with GPT and Prototype Guidance arXiv nan nan 2023
NS3D: Neuro-Symbolic Grounding of 3D Objects and Relations arXiv nan nan 2023
Multi-View Transformer for 3D Visual Grounding arXiv nan nan 2022
3D-SPS: Single-Stage 3D Visual Grounding via Referred Point Progressive Selection arXiv nan nan 2022
3D VR Sketch Guided 3D Shape Prototyping and Exploration arXiv nan nan 2023

Completion

Title arXiv Github WebSite Pub. & Date
AGG-Net: Attention Guided Gated-convolutional Network for Depth Image Completion arXiv nan nan 2023

Style-Transfer

Title arXiv Github WebSite Pub. & Date
TeSTNeRF: Text-Driven 3D Style Transfer via Cross-Modal Learning arXiv nan nan 2023
TANGO: Text-driven Photorealistic and Robust 3D Stylization via Lighting Decomposition arXiv nan nan 2022
HyperStyle3D: Text-Guided 3D Portrait Stylization via Hypernetworks arXiv nan nan 2023
CLIP3Dstyler: Language Guided 3D Arbitrary Neural Style Transfer arXiv nan nan 2023

Pose Estimation

Title arXiv Github WebSite Pub. & Date
ZeroPose: CAD-Model-based Zero-Shot Pose Estimation arXiv nan nan 2023
Weakly Supervised 3D Multi-person Pose Estimation for Large-scale Scenes based on Monocular Camera and Single LiDAR arXiv nan nan 2022
PoseScript: 3D Human Poses from Natural Language arXiv nan nan 2022
PoseFix: Correcting 3D Human Poses with Natural Language arXiv nan nan 2023
Human-M3: A Multi-view Multi-modal Dataset for 3D Human Pose Estimation in Outdoor Scenes arXiv nan nan 2023
CLIP-Hand3D: Exploiting 3D Hand Pose Estimation via Context-Aware Prompting arXiv nan nan 2023
Non-Local Latent Relation Distillation for Self-Adaptive 3D Human Pose Estimation arXiv nan nan 2022
Deep Two-Stream Video Inference for Human Body Pose and Shape Estimation arXiv nan nan 2021
Cross-Domain 3D Hand Pose Estimation with Dual Modalities arXiv nan nan 2023
3D-Augmented Contrastive Knowledge Distillation for Image-based Object Pose Estimation arXiv nan nan 2022

Scene Understanding

Title arXiv Github WebSite Pub. & Date
Towards Label-free Scene Understanding by Vision Foundation Models arXiv nan nan 2023
CLIP2Scene: Towards Label-efficient 3D Scene Understanding by CLIP arXiv nan nan 2023
Semantics-guided Transformer-based Sensor Fusion for Improved Waypoint Prediction arXiv nan nan 2023
Lowis3D: Language-Driven Open-World Instance-Level 3D Scene Understanding arXiv nan nan 2023
PLA: Language-Driven Open-Vocabulary 3D Scene Understanding arXiv nan nan 2023
Semantic Abstraction: Open-World 3D Scene Understanding from 2D Vision-Language Models arXiv nan nan 2022
OpenShape: Scaling Up 3D Shape Representation Towards Open-World Understanding arXiv nan nan 2023

Manipulation

Title arXiv Github WebSite Pub. & Date
PolarNet: 3D Point Clouds for Language-Guided Robotic Manipulation arXiv nan nan 2023
TextDeformer: Geometry Manipulation using Text Guidance arXiv nan nan 2023
Act3D: 3D Feature Field Transformers for Multi-Task Robotic Manipulation arXiv nan nan 2023

Retrieval

Title arXiv Github WebSite Pub. & Date
Democratising 2D Sketch to 3D Shape Retrieval Through Pivoting arXiv nan nan 2023
RONO: Robust Discriminative Learning With Noisy Labels for 2D-3D Cross-Modal Retrieval arXiv nan nan 2023
TextANIMAR: Text-based 3D Animal Fine-Grained Retrieval arXiv nan nan 2023
SCA-PVNet: Self-and-Cross Attention Based Aggregation of Point Cloud and Multi-View for 3D Object Retrieval arXiv nan nan 2023
OVIR-3D: Open-Vocabulary 3D Instance Retrieval Without Training on 3D Data arXiv nan nan 2023
Towards 3D VR-Sketch to 3D Shape Retrieval arXiv nan nan 2022

Classification

Title arXiv Github WebSite Pub. & Date
Multimodal Brain Disease Classification with Functional Interaction Learning from Single fMRI Volume arXiv nan nan 2023

Localization

Title arXiv Github WebSite Pub. & Date
3D Highlighter: Localizing Regions on 3D Shapes via Text Descriptions arXiv nan nan 2023
UnLoc: A Universal Localization Method for Autonomous Vehicles using LiDAR, Radar and/or Camera Input arXiv nan nan 2023
WildRefer: 3D Object Localization in Large-scale Dynamic Scenes with Multi-modal Visual Data and Natural Language arXiv nan nan 2023

Question Answering

Title arXiv Github WebSite Pub. & Date
Multi-CLIP: Contrastive Vision-Language Pre-training for Question Answering tasks in 3D Scenes arXiv nan nan 2023

Prediction

Title arXiv Github WebSite Pub. & Date
3D Spatial Multimodal Knowledge Accumulation for Scene Graph Prediction in Point Cloud arXiv nan nan 2023

Depth Estimation

Title arXiv Github WebSite Pub. & Date
Towards Zero-Shot Scale-Aware Monocular Depth Estimation arXiv nan nan 2023

Finetuning

Title arXiv Github WebSite Pub. & Date
ImageBind-LLM: Multi-modality Instruction Tuning arXiv nan nan 2023

Recognition

Title arXiv Github WebSite Pub. & Date
LiCamGait: Gait Recognition in the Wild by Using LiDAR and Camera Multi-modal Visual Sensors arXiv nan nan 2022
LATFormer: Locality-Aware Point-View Fusion Transformer for 3D Shape Recognition arXiv nan nan 2023
Cross-Modal Learning with 3D Deformable Attention for Action Recognition arXiv nan nan 2023
FER-former: Multi-modal Transformer for Facial Expression Recognition arXiv nan nan 2023

Reconstruction

Title arXiv Github WebSite Pub. & Date
Speech4Mesh: Speech-Assisted Monocular 3D Facial Reconstruction for Speech-Driven 3D Facial Animation arXiv nan nan 2023
Zero-1-to-3: Zero-shot One Image to 3D Object arXiv nan nan 2023

Style Generation

Title arXiv Github WebSite Pub. & Date
Style-aware Augmented Virtuality Embeddings (SAVE) arXiv nan nan 2023

Scene Understanding

Title arXiv Github WebSite Pub. & Date
M$^{3}$3D: Learning 3D priors using Multi-Modal Masked Autoencoders for 2D image and video understanding arXiv nan nan 2023

Captioning

Title arXiv Github WebSite Pub. & Date
Scalable 3D Captioning with Pretrained Models arXiv nan nan 2023
