Real-world photo sequence question answering system (MemexQA). CVPR'18 and TPAMI'19
Updated Jul 1, 2019 - Python
Code and data to evaluate LLMs on the ENEM, the main standardized Brazilian university admission exam.
[NeurIPS 2023 Oral] Quilt-1M: One Million Image-Text Pairs for Histopathology.
Compose multimodal datasets 🎹
Collects a multimodal dataset of Wikipedia articles and their images
Official Git repository for "Hakimov, S., and Schlangen, D. (2023). Images in Language Space: Exploring the Suitability of Large Language Models for Vision & Language Tasks. Findings of the Association for Computational Linguistics (ACL 2023 Findings)"
Pre-Processing of Annotated Music Video Corpora (COGNIMUSE and DEAP)
Data and code of the Findings of EMNLP'23 paper MuG: A Multimodal Classification Benchmark on Game Data with Tabular, Textual, and Visual Fields