LAVIS - A One-stop Library for Language-Vision Intelligence
Updated Oct 11, 2024 - Jupyter Notebook
A dataset of 500,000 multimodal short videos, with baseline models (TensorFlow 2.0).
PyTorch implementation of Multimodal Fusion Transformer for Remote Sensing Image Classification.
This repository is built in association with our position paper, "Multimodality for NLP-Centered Applications: Resources, Advances and Frontiers". As part of this release, we share information about recent multimodal datasets that are available for research purposes. We found that although 100+ multimodal language resources are available…
Real-world photo sequence question answering system (MemexQA). CVPR'18 and TPAMI'19
Compose multimodal datasets 🎹
Code and data to evaluate LLMs on the ENEM, the main standardized Brazilian university admission exam.
[NeurIPS 2023 Oral] Quilt-1M: One Million Image-Text Pairs for Histopathology.
This repository provides a comprehensive collection of research papers on multimodal representation learning, all of which are cited and discussed in the recently accepted survey (https://dl.acm.org/doi/abs/10.1145/3617833).
[Paperlist] Awesome paper list of multimodal dialog, including methods, datasets and metrics
Collects a multimodal dataset of Wikipedia articles and their images
Official Git repository for "Hakimov, S., and Schlangen, D., (2023). Images in Language Space: Exploring the Suitability of Large Language Models for Vision & Language Tasks. Findings of the Association for Computational Linguistics (ACL 2023 Findings)"
Image Recommendation for Wikipedia Articles
Create a large, well-managed, and clean dataset for the task of music composition for video soundtracks.
Pre-Processing of Annotated Music Video Corpora (COGNIMUSE and DEAP)
All experiments were done to classify multimodal data.
Data and code of the Findings of EMNLP'23 paper MuG: A Multimodal Classification Benchmark on Game Data with Tabular, Textual, and Visual Fields