
Prepare Datasets for PSALM

The training process of PSALM has two stages: the first stage is visual-language alignment, and the second stage is joint training on multiple segmentation tasks.

We use a custom dataset format to enable joint training. We assume that all dataset root paths are under /datasets.

First stage training

We follow LLaVA's training strategy; see here for detailed dataset preparation.

Second stage joint training

The second stage joint training of PSALM covers four different tasks: Generic Segmentation, Referring Segmentation, Interactive Segmentation, and Visual-Language Tasks.

We use COCO Panoptic for Generic Segmentation, RefCOCO/+/g for Referring Segmentation, COCO-Interactive for Interactive Segmentation, and LLaVA-1.5's training data for Visual-Language Tasks.

(Optional) We also support LVIS for PSALM's second stage joint training.

Expected dataset structure for COCO:

coco/
  annotations/
    instances_{train,val}2017.json
    panoptic_{train,val}2017.json
  {train,val}2017/
    # image files that are mentioned in the corresponding json
  panoptic_{train,val}2017/  # png annotations
  panoptic_semseg_{train,val}2017/  # generated by the script mentioned below
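
Before moving on, you can sanity-check this layout with a few lines of Python. This is a minimal sketch; COCO_ROOT and the /datasets prefix are assumptions taken from the note above.

import os

# Check that the COCO files described above are in place.
COCO_ROOT = "/datasets/coco"  # assumed dataset root
expected = [
    "annotations/instances_train2017.json",
    "annotations/instances_val2017.json",
    "annotations/panoptic_train2017.json",
    "annotations/panoptic_val2017.json",
    "train2017", "val2017",
    "panoptic_train2017", "panoptic_val2017",
]
for rel in expected:
    path = os.path.join(COCO_ROOT, rel)
    print(("OK      " if os.path.exists(path) else "MISSING ") + path)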

Install panopticapi by:

pip install git+https://github.com/cocodataset/panopticapi.git

Run python datasets/build_COCO_instance.py to get the dataset format for COCO instance segmentation.

Run python datasets/prepare_coco_semantic_annos_from_panoptic_annos.py to extract semantic annotations from panoptic annotations (only used for evaluation).
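
For intuition, the core of that conversion decodes the RGB-encoded panoptic PNG into segment ids and maps each segment to its category id. Below is a rough sketch using panopticapi; the file paths are illustrative, and the repo script remains the authoritative implementation.

import json
import numpy as np
from PIL import Image
from panopticapi.utils import rgb2id

# Illustrative only: convert one panoptic PNG to a semantic label map.
anns = json.load(open("/datasets/coco/annotations/panoptic_val2017.json"))
ann = anns["annotations"][0]  # one image's panoptic annotation
pan = np.asarray(Image.open(
    "/datasets/coco/panoptic_val2017/" + ann["file_name"]), dtype=np.uint32)
seg_ids = rgb2id(pan)  # decode RGB-encoded segment ids

semantic = np.zeros(seg_ids.shape, dtype=np.uint8)
for seg in ann["segments_info"]:
    semantic[seg_ids == seg["id"]] = seg["category_id"]  # segment -> class
Image.fromarray(semantic).save("semantic_example.png")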

Expected dataset structure for RefCOCO/+/g:

refseg/
    refcoco/
        instances.json
        merged_google.json
        refs(google).p
        refs(unc).p
    refcoco+/
        instances.json
        refs(unc).p
    refcocog/
        instances.json
        refs(google).p
refs(umd).p
    images/
        mscoco/
            train2014/

Run python datasets/build_RefCOCO.py to get the dataset format for joint training.
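
If you want to inspect the raw annotations first, the refs(*).p files are plain Python pickles in the standard REFER format. A quick sketch, with field names assumed from that format:

import pickle

# Illustrative: peek at RefCOCO referring expressions.
with open("/datasets/refseg/refcoco/refs(unc).p", "rb") as f:
    refs = pickle.load(f)

print(len(refs), "referring expressions")
r = refs[0]
# Standard REFER fields: ref_id, ann_id, image_id, split, sentences
print(r["split"], r["image_id"], [s["sent"] for s in r["sentences"]])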

Dataset preparation for COCO-Interactive:

We build COCO-Interactive on top of COCO instance annotations, so make sure to follow the COCO preparation instructions above.

Run python datasets/build_COCO_Interactivate.py to get the dataset format for joint training.

Alternatively, you can directly download the converted COCO-Interactive files from Google Drive | Baidu Cloud. The detailed format of the downloaded files is described here.
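
For intuition, COCO-Interactive pairs COCO instance masks with visual prompts such as points and boxes. The sketch below samples toy prompts from one instance with pycocotools; it only illustrates the idea and is not the repo's exact sampling logic.

import numpy as np
from pycocotools.coco import COCO

# Illustrative: derive simple interactive prompts from one instance.
coco = COCO("/datasets/coco/annotations/instances_val2017.json")
ann = coco.loadAnns(coco.getAnnIds())[0]
mask = coco.annToMask(ann)  # HxW binary mask

ys, xs = np.nonzero(mask)
rng = np.random.default_rng(0)
i = rng.integers(len(xs))
point_prompt = (int(xs[i]), int(ys[i]))  # a random foreground click
box_prompt = ann["bbox"]                 # [x, y, w, h] box prompt
print("point:", point_prompt, "box:", box_prompt)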

Dataset preparation for LLaVA-1.5 training data:

Please download the images and annotations following the LLaVA-1.5 stage-2 training instructions.

# no need to download COCO again
gqa/
    images/
ocr_vqa/
    images/
textvqa/
    train_images/
vg/
    VG_100K/
    VG_100K_2/
llava_v1_5_mix665k.json

Since the LLaVA-1.5 dataset contains text-only samples, run python datasets/prepare_llava_1_5.py to filter them out. Remember to change the paths in prepare_llava_1_5.py to your own dataset paths.
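
Conceptually, the filtering just drops entries without an image field, as in this sketch. The output file name here is an assumption; prepare_llava_1_5.py is the authoritative version.

import json

# Illustrative: drop text-only samples (entries without an "image" key).
with open("/datasets/llava_v1_5_mix665k.json") as f:
    data = json.load(f)

with_image = [d for d in data if "image" in d]
print(f"kept {len(with_image)} of {len(data)} samples")

with open("/datasets/llava_v1_5_mix665k_filtered.json", "w") as f:
    json.dump(with_image, f)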

(Optional) Expected dataset structure for LVIS:

We only use the LVIS dataset for training. If you have already downloaded the COCO images, you only need to download the LVIS annotations.

lvis/
    {train,val}2017/
        # since you already have the COCO images, there is no need to download these
    lvis_v1_train.json
    lvis_v1_val.json

Run python datasets/build_lvis.py to get the dataset format for joint training.
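
LVIS annotations are COCO-style JSON; here is a quick look at how they reference the COCO images you already have (a sketch, not part of the pipeline):

import json

# Illustrative: LVIS v1 images reference COCO files via "coco_url".
lvis = json.load(open("/datasets/lvis/lvis_v1_train.json"))
print(len(lvis["images"]), "images,", len(lvis["annotations"]), "annotations")
# The coco_url (e.g. .../train2017/000000xxxxxx.jpg) is how each LVIS
# image maps onto the COCO images downloaded earlier.
print(lvis["images"][0]["coco_url"])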

Zero-shot evaluation on other datasets

PSALM shows strong zero-shot capability on many unseen tasks: Open-Vocabulary Segmentation, Generalized Referring Segmentation, and Video Object Segmentation.

Dataset preparation for Open-Vocabulary Segmentation:

We follow here for the preparation of Cityscapes, ADE20K, Pascal VOC, and Pascal Context.

Expected dataset structure for gRefCOCO:

Download the gRefCOCO dataset from this link and put it in the same folder as RefCOCO.

refer_seg/
    grefcoco/
        grefs(unc).json
        instances.json
    refcoco/
    refcoco+/
    refcocog/

Run python datasets/build_gRefCOCO.py to get the dataset format for evaluation.
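
Unlike the pickled RefCOCO refs, grefs(unc).json is plain JSON, and gRefCOCO expressions can map to several instances (multi-target) or to none (no-target). A hedged peek, with field names assumed to follow the REFER convention:

import json

# Illustrative: gRefCOCO refs may carry several ann_ids (multi-target)
# or none at all (no-target expressions).
grefs = json.load(open("/datasets/refer_seg/grefcoco/grefs(unc).json"))
print(len(grefs), "expressions")
r = grefs[0]
print(r.get("ann_id"), [s["sent"] for s in r["sentences"]])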

Expected dataset structure for DAVIS-2017:

DAVIS/
    2017/
        trainval/
            Annotations/
                480p/
# one folder per video
            ImageSets/
                2017/
                    train.txt
                    val.txt
            JPEGImages/
                480p/
# one folder per video

Run python datasets/build_DAVIS.py to get the dataset format for evaluation.
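
A small sketch to enumerate the validation videos and their frames against the layout above (paths assume the /datasets root):

import os

# Illustrative: list DAVIS 2017 val videos and count their frames.
root = "/datasets/DAVIS/2017/trainval"
with open(os.path.join(root, "ImageSets/2017/val.txt")) as f:
    videos = [line.strip() for line in f if line.strip()]

for v in videos[:5]:
    frames = sorted(os.listdir(os.path.join(root, "JPEGImages/480p", v)))
    masks = sorted(os.listdir(os.path.join(root, "Annotations/480p", v)))
    print(v, len(frames), "frames,", len(masks), "annotation files")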

Download Converted Dataset Files

You can download the converted files (Google Drive | Baidu Cloud (code: hust)). The downloaded files should be organized in the following structure:

refcoco/
    refcoco_val.json
    refcoco_testA.json
    ...
refcoco+/
    refcoco+_val.json
    refcoco+_testA.json
    ...
refcocog/
    refcocog_val.json
    refcocog_test.json
    ...
grefcoco/
    grefcoco_val.json
    grefcoco_testA.json
    grefcoco_testB.json
coco_interactive_train_psalm.json  # training set for COCO-Interactive
coco_interactive_val_psalm.json  # val set for COCO-Interactive
instruction_dataset_coco_format.json  # GT for COCO instance; put this file in psalm/output/instance_segmentation
instruction_dataset_coco_format.json.lock  # put this file in psalm/output/instance_segmentation
instance_train_psalm.json  # training set for COCO instance
instance_val_psalm.json  # val set for COCO instance
trainval_val_psalm.json  # val set for DAVIS
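
As noted above, the two instruction_dataset_coco_format files must sit in psalm/output/instance_segmentation. A short sketch that copies them there; the source directory is whatever location you unpacked the download into, assumed here:

import shutil
from pathlib import Path

# Illustrative: place the COCO instance GT files where evaluation expects them.
src = Path("/datasets/converted")  # assumed location of the downloaded files
dst = Path("psalm/output/instance_segmentation")
dst.mkdir(parents=True, exist_ok=True)
for name in ["instruction_dataset_coco_format.json",
             "instruction_dataset_coco_format.json.lock"]:
    shutil.copy2(src / name, dst / name)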