Vision-language code for solving the GQA (Visual Reasoning in the Real World) dataset.
Official PyTorch Implementation of RITC
[ICCV2021 & TPAMI2023] Vision-Language Transformer and Query Generation for Referring Segmentation
[ICCV 2021] On the hidden treasure of dialog in video question answering
Repository for the ECCV 2020 paper "Active Visual Information Gathering for Vision-Language Navigation"
💐Kaleido-BERT: Vision-Language Pre-training on Fashion Domain
PyTorch code for the NAACL 2022 Findings paper "Probing the Role of Positional Information in Vision-Language Models".
This repository contains a spatial understanding test suite for vision-language models
PyTorch code for "Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners"
Video as Conditional Graph Hierarchy for Multi-Granular Question Answering (AAAI'22, Oral)
PyTorch implementation of NeuralTwinsTalk, presented at IEEE HCCAI 2020.
Authors' official PyTorch implementation of "ContraCLIP: Interpretable GAN generation driven by pairs of contrasting sentences".
DramaQA Starter Code (2021)
Improving Visual Grounding with Visual-Linguistic Verification and Iterative Reasoning, CVPR 2022
MixGen: A New Multi-Modal Data Augmentation
PyTorch version of DeCEMBERT: Learning from Noisy Instructional Videos via Dense Captions and Entropy Minimization (NAACL 2021)
PyTorch code for BagFormer: Better Cross-Modal Retrieval via bag-wise interaction
Code for the EMNLP 2021 Oral paper "Are Gender-Neutral Queries Really Gender-Neutral? Mitigating Gender Bias in Image Search" https://arxiv.org/abs/2109.05433
Code for "Learning the Best Pooling Strategy for Visual Semantic Embedding", CVPR 2021 (Oral)