- Simple but Effective: CLIP Embeddings for Embodied AI. 2022 CVPR
- ZSON: Zero-Shot Object-Goal Navigation using Multimodal Goal Embeddings. 2022 NeurIPS
- CLIP on Wheels: Zero-shot Object Navigation as Object Localization and Exploration. 2022
- ViNG: Learning Open-World Navigation with Visual Goals. 2021 ICRA
- Pre-Trained Language Models for Interactive Decision-Making. 2022
- R3M: A Universal Visual Representation for Robot Manipulation. 2022 CoRL
- BC-Z: Zero-shot Task Generalization with Robotic Imitation Learning. 2021 CoRL
- Grounding Language with Visual Affordance over Unstructured Data. 2022
- What Matters in Language Conditioned Robotic Imitation Learning over Unstructured Data. 2022
- LM-Nav: Robotic Navigation with Large Pre-Trained Models of Language, Vision and Action. 2022 CoRL
- Visual Language Maps for Robot Navigation. 2023 ICRA
- Do As I Can, Not As I Say: Grounding Language in Robotic Affordances. 2022 CoRL
- Open-vocabulary Queryable Scene Representations for Real World Planning. 2022
- Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents. 2022 ICML
- REVERIE: Remote Embodied Visual Referring Expression in Real Indoor Environments. 2020 CVPR
- ALFRED: A Benchmark for Interpreting Grounded Instructions for Everyday Tasks. 2020 CVPR
- SQA3D: Situated Question Answering in 3D Scenes. 2023 ICLR
- Episodic Transformer for Vision-and-Language Navigation. 2021 ICCV
- LXMERT: Learning Cross-Modality Encoder Representations from Transformers. 2019 EMNLP
- ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks. 2019 NeurIPS
- Vision-Language Navigation with Self-Supervised Auxiliary Reasoning Tasks. 2020 CVPR
- Towards Learning a Generic Agent for Vision-and-Language Navigation via Pre-training. 2020 CVPR
- Cross-modal Map Learning for Vision and Language Navigation. 2022 CVPR
- Airbert: In-domain Pretraining for Vision-and-Language Navigation. 2021 ICCV
- Instruction-Following Agents with Jointly Pre-Trained Vision-Language Models. 2022
- LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models. 2023
- LEBP - Language Expectation & Binding Policy: A Two-Stream Framework for Embodied Vision-and-Language Interaction Task Learning Agents
- Target-driven Visual Navigation in Indoor Scenes using Deep Reinforcement Learning. 2017 ICRA
- Semi-Parametric Topological Memory for Navigation. 2018 ICLR
- Scene Memory Transformer for Embodied Agents in Long-Horizon Tasks. 2019 CVPR
- Neural Topological SLAM for Visual Navigation. 2020 CVPR
- Visual Graph Memory with Unsupervised Representation for Visual Navigation. 2021 ICCV
- No RL, No Simulation: Learning to Navigate without Navigating. 2021 NeurIPS
- Topological Semantic Graph Memory for Image-Goal Navigation. 2022 CoRL
- Zero Experience Required: Plug & Play Modular Transfer Learning for Semantic Visual Navigation. 2022 CVPR
- Memory-Augmented Reinforcement Learning for Image-Goal Navigation. 2022 IROS
- Last-Mile Embodied Visual Navigation. 2022 CoRL
- Lifelong Topological Visual Navigation. 2022 RA-L
- Robotic Skill Acquisition via Instruction Augmentation with Vision-Language Models. 2022 NeurIPS workshop
- Scaling Robot Learning with Semantically Imagined Experience. 2023
- Learning Universal Policies via Text-Guided Video Generation. 2023
- Policy Adaptation from Foundation Model Feedback. 2023 CVPR
- CLIPort: What and Where Pathways for Robotic Manipulation. 2021 CoRL
- RT-1: Robotics Transformer for Real-World Control at Scale. 2022
- Open-World Object Manipulation using Pre-trained Vision-Language Models. 2023