# Data

| Data file name | Size |
|---|---|
| open-llava-next_instruct_mix1M.json | 1.64 GB |
| vqa_collection.zip | 30.20 GB |

We have made every effort to align our training data with that of LLaVA-NeXT. However, we were unable to access the tens of thousands of real-user interaction samples that LLaVA-NeXT collected, so we used 200K ALLaVA-Instruct-VFLAN-4V samples as a substitute. Additionally, since TextVQA is already included in the training data of most existing LMMs, we chose to retain it to enable fair comparisons with other LMMs.

## Dataset

The dataset builds on sharegpt4v_mix665k and is expanded with ALLaVA-Instruct-VFLAN-4V, DocVQA, SynthDog-EN, ChartQA, DVQA, AI2D, and GeoQA+, totaling 1M image-text pairs.
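If you want to see how the 1M mix breaks down by image source, a minimal sketch like the one below tallies samples per top-level image folder. It assumes the JSON follows the LLaVA-style format (a list of samples, each with an optional `image` path relative to the `data` folder); adjust the path to wherever you place the file.

```python
import json
from collections import Counter

# Assumption: the mix follows the LLaVA-style instruction format, i.e. a list of
# samples where image-grounded samples carry an "image" path such as
# "coco/train2017/xxx.jpg". Text-only samples have no "image" field.
with open("data/open-llava-next/open-llava-next_instruct_mix1M.json") as f:
    samples = json.load(f)

# Group by the first path component, which corresponds to the image source folder.
sources = Counter(
    s["image"].split("/")[0] if "image" in s else "text-only"
    for s in samples
)
for source, count in sources.most_common():
    print(f"{source}: {count}")
```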

## Prepare Images

First, download all of the images we used.

Then, organize the data as follows:

```
Open-LLaVA-NeXT
├── ...
├── data
│   ├── llava
│   │   ├── llava_pretrain
│   │   │   ├── images
│   ├── coco
│   │   ├── train2017
│   ├── sam
│   │   ├── images
│   ├── gqa
│   │   ├── images
│   ├── ocr_vqa
│   │   ├── images
│   ├── textvqa
│   │   ├── train_images
│   ├── vg
│   │   ├── VG_100K
│   │   ├── VG_100K_2
│   ├── open-llava-next
│   │   ├── open-llava-next_instruct_mix1M.json
│   ├── web-celebrity
│   │   ├── images
│   ├── web-landmark
│   │   ├── images
│   ├── wikiart
│   │   ├── images
│   ├── allava_vflan
│   │   ├── images
│   │   │   ├── images_191task_1k
│   ├── share_textvqa
│   │   ├── images
│   ├── ai2d
│   │   ├── images
│   ├── chatqa
│   │   ├── train
│   │   │   ├── png
│   ├── docvqa
│   │   ├── train
│   │   │   ├── documents
│   ├── dvqa
│   │   ├── images
│   ├── geoqa+
│   │   ├── images
│   ├── synthdog-en
│   │   ├── images
├── ...
```
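Before launching training, you can sanity-check the layout with a small script like the sketch below. The directory list simply mirrors the tree above; `DATA_ROOT` is a placeholder for your local `Open-LLaVA-NeXT/data` folder, so adjust it to your setup.

```python
from pathlib import Path

# Placeholder: point this at your local data folder.
DATA_ROOT = Path("Open-LLaVA-NeXT/data")

# These paths mirror the directory tree in this document.
EXPECTED_DIRS = [
    "llava/llava_pretrain/images",
    "coco/train2017",
    "sam/images",
    "gqa/images",
    "ocr_vqa/images",
    "textvqa/train_images",
    "vg/VG_100K",
    "vg/VG_100K_2",
    "web-celebrity/images",
    "web-landmark/images",
    "wikiart/images",
    "allava_vflan/images/images_191task_1k",
    "share_textvqa/images",
    "ai2d/images",
    "chatqa/train/png",
    "docvqa/train/documents",
    "dvqa/images",
    "geoqa+/images",
    "synthdog-en/images",
]

missing = [d for d in EXPECTED_DIRS if not (DATA_ROOT / d).is_dir()]
if missing:
    print("Missing directories:")
    for d in missing:
        print(f"  {d}")
else:
    print("All expected image directories are present.")

# Also confirm the instruction mix file is in place.
json_path = DATA_ROOT / "open-llava-next/open-llava-next_instruct_mix1M.json"
print(f"{json_path}: {'found' if json_path.is_file() else 'MISSING'}")
```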