Contrastive-VisionVAE-Follower is a model for the multi-modal task of Vision-and-Language Navigation (VLN).
Updated Jan 24, 2024 - C++
A list of research papers on knowledge-enhanced multimodal learning
LACMA: Language-Aligning Contrastive Learning with Meta-Actions for Embodied Instruction Following
Official repository of "Mind the Error! Detection and Localization of Instruction Errors in Vision-and-Language Navigation". We present the first dataset, R2R-IE-CE, to benchmark instruction errors in VLN. We then propose a method, IEDL.
Official implementation of the NAACL 2024 paper "Navigation as Attackers Wish? Towards Building Robust Embodied Agents under Federated Learning"
Code for ORAR Agent for Vision and Language Navigation on Touchdown and map2seq
Planning as In-Painting: A Diffusion-Based Embodied Task Planning Framework for Environments under Uncertainty
Fast-Slow Test-time Adaptation for Online Vision-and-Language Navigation
FLAME: Learning to Navigate with Multimodal LLM in Urban Environments (arXiv:2408.11051)
[ECCV 2022] Official PyTorch implementation of the paper "FedVLN: Privacy-preserving Federated Vision-and-Language Navigation"
Code and data of the Fine-Grained R2R Dataset proposed in the EMNLP 2021 paper Sub-Instruction Aware Vision-and-Language Navigation
Code for 'Chasing Ghosts: Instruction Following as Bayesian State Tracking' published at NeurIPS 2019
Repository for Vision-and-Language Navigation via Causal Learning (Accepted by CVPR 2024)
Companion Repo for the Vision Language Modelling YouTube series - https://bit.ly/3PsbsC2 - by Prithivi Da. Open to PRs and collaborations
Code of the NeurIPS 2021 paper: Language and Visual Entity Relationship Graph for Agent Navigation
Code and Data of the CVPR 2022 paper: Bridging the Gap Between Learning in Discrete and Continuous Environments for Vision-and-Language Navigation
PyTorch code for the ICRA'21 paper "Hierarchical Cross-Modal Agent for Robotics Vision-and-Language Navigation"
A curated list for vision-and-language navigation. ACL 2022 paper "Vision-and-Language Navigation: A Survey of Tasks, Methods, and Future Directions"
Code of the CVPR 2021 Oral paper: A Recurrent Vision-and-Language BERT for Navigation