PyTorch code for Finding in NAACL 2022 paper "Probing the Role of Positional Information in Vision-Language Models".
Source code and documentation for the LREC-COLING'24 paper "Sharing the Cost of Success: A Game for Evaluating and Learning Collaborative Multi-Agent Instruction Giving and Following Policies"
Training and inference code for a model that extracts license plate numbers
alt text for lazy people
A comparative study of two of the best-performing open-source Vision-Language Models: Google Gemini Vision and CogVLM
Code and models for the paper 'Exploring Multi-Modal Representations for Ambiguity Detection & Coreference Resolution in the SIMMC 2.0 Challenge' published at AAAI 2022 DSTC10 Workshop
An end-to-end multimodal framework incorporating explicit knowledge graphs and OOD-detection. (NeurIPS23)
VinVL+L: Enriching Visual Representation with Location Context in Visual Question Answering (VQA)
Vision-Controllable Natural Language Generation
PyTorch implementation of the paper: All For One: Multi-modal Multi-Task Learning
Multilingual vision and language research. Fork of MMF, Facebook AI Research's (FAIR) modular framework for vision & language research.
Code for the paper "Learning English with Peppa Pig" https://doi.org/10.48550/arXiv.2202.12917
Multimodal Learning - using CLIP (Internship Project)
Under review. [IROS 2024] PGA: Personalizing Grasping Agents with Single Human-Robot Interaction
An end-to-end vision and language model incorporating explicit knowledge graphs and OOD-detection.
Code for the paper "Rethinking Task Sampling for Few-shot Vision-Language Transfer Learning" (COLING 2022 workshop)
Probe Vision-Language Models
Adaptively fine-tuning transformer-based models for multiple domains and multiple tasks