The training process of PSALM has two stages: the first stage is visual-language alignment, and the second stage is joint training on multiple segmentation tasks.
We use a custom dataset to enable joint training. We assume that the root paths of all datasets are under `/datasets`.
We follow LLaVA's training strategy; see here for detailed dataset preparation.
The second-stage joint training of PSALM covers four different tasks: Generic Segmentation, Referring Segmentation, Interactive Segmentation, and Visual-Language Tasks.
We use COCO Panoptic for Generic Segmentation, RefCOCO/+/g for Referring Segmentation, COCO-Interactive for Interactive Segmentation, and LLaVA-1.5's training data for Visual-Language Tasks.
(Optional) We also support LVIS for PSALM's second-stage joint training.
Expected dataset structure for COCO:
```
coco/
  annotations/
    instances_{train,val}2017.json
    panoptic_{train,val}2017.json
  {train,val}2017/
    # image files that are mentioned in the corresponding json
  panoptic_{train,val}2017/         # png annotations
  panoptic_semseg_{train,val}2017/  # generated by the script mentioned below
```
Install panopticapi:

```
pip install git+https://github.com/cocodataset/panopticapi.git
```

Run `python datasets/build_COCO_instance.py` to get the dataset format for COCO instance segmentation.

Run `python datasets/prepare_coco_semantic_annos_from_panoptic_annos.py` to extract semantic annotations from panoptic annotations (only used for evaluation).
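If you want to sanity-check the panoptic PNGs, note that panopticapi packs each segment id into the RGB channels of the annotation image. A minimal sketch of the decoding, using the same formula as panopticapi's `rgb2id`:

```python
def rgb2id(color):
    # panopticapi encodes a segment id into an RGB pixel as
    # id = R + G * 256 + B * 256**2; this reverses that packing.
    r, g, b = color
    return r + g * 256 + b * 256 ** 2

# A pixel colored (0, 1, 0) in panoptic_train2017/*.png belongs to segment 256.
```

The generated `panoptic_semseg_*` folders store plain category ids instead, which is why the extraction script above is needed for semantic evaluation.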
Expected dataset structure for RefCOCO/+/g:
```
refseg/
  refcoco/
    instances.json
    merged_google.json
    refs(google).p
    refs(unc).p
  refcoco+/
    instances.json
    refs(unc).p
  refcocog/
    instances.json
    refs(google).p
    refs(umd).p
  images/
    mscoco/
      train2014/
```
Run `python datasets/build_RefCOCO.py` to get the dataset format for joint training.
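If you want to inspect the `refs(*).p` files before conversion, they are pickled lists of referring-expression records in the standard REFER toolkit layout. A hedged sketch, with the field names (`split`, `image_id`, `sentences`) assumed from REFER rather than from this repo:

```python
import pickle

def load_refs(path):
    # refs(unc).p etc. are plain pickles of a list of ref dicts.
    with open(path, "rb") as f:
        return pickle.load(f)

def sentences_for_split(refs, split):
    # Collect (image_id, expression) pairs for one split, assuming the
    # standard REFER record layout: 'split', 'image_id', 'sentences'.
    return [
        (r["image_id"], s["sent"])
        for r in refs if r["split"] == split
        for s in r["sentences"]
    ]
```

For example, `sentences_for_split(load_refs("refseg/refcoco/refs(unc).p"), "val")` would list every validation expression with its image id.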
We build COCO-Interactive upon COCO-Instance, so make sure to follow the COCO preparation instructions above first.

Run `python datasets/build_COCO_Interactivate.py` to get the dataset format for joint training.

Alternatively, you can directly download the converted COCO-Interactive files from Google Drive | Baidu Cloud. The detailed format of the downloaded files is described here.
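COCO-Interactive pairs each instance mask with visual prompts. The exact sampling logic lives in `build_COCO_Interactivate.py`; purely as an illustration (not the repo's code), deriving a box prompt and a click prompt from a binary instance mask can be sketched as:

```python
import numpy as np

def mask_to_box(mask):
    # Tight bounding box (x0, y0, x1, y1) around the foreground pixels.
    ys, xs = np.nonzero(mask)
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

def sample_click(mask, rng):
    # Uniformly sample one foreground pixel as a point prompt.
    ys, xs = np.nonzero(mask)
    i = rng.integers(len(xs))
    return int(xs[i]), int(ys[i])
```

Boxes and clicks sampled this way are the simplest prompt types; the converted files above already contain the prompts PSALM actually trains on.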
Please download the images and annotations following the LLaVA-1.5 stage-2 training instructions.
```
# No need to download COCO again
gqa/
  images/
ocr_vqa/
  images/
textvqa/
  train_images/
vg/
  VG_100K/
  VG_100K_2/
llava_v1_5_mix665k.json
```
Since the LLaVA-1.5 dataset contains text-only samples, run `python datasets/prepare_llava_1_5.py` to filter them out. Note: change the paths in `prepare_llava_1_5.py` to your dataset paths.
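The actual filtering criterion is defined in `prepare_llava_1_5.py`; as a sketch of the idea, text-only conversations in `llava_v1_5_mix665k.json` are the entries that carry no `image` field, so the core of such a filter looks like this (file paths here are placeholders for your own):

```python
import json

def drop_text_only(samples):
    # Text-only entries in llava_v1_5_mix665k.json have no 'image' key;
    # keep only samples grounded in an image.
    return [s for s in samples if "image" in s]

def filter_file(src, dst):
    # src/dst are your own paths, e.g. under /datasets.
    with open(src) as f:
        kept = drop_text_only(json.load(f))
    with open(dst, "w") as f:
        json.dump(kept, f)
```

Dropping these samples keeps every remaining conversation attachable to an image during joint training.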
(Optional) Expected dataset structure for LVIS:
We only use LVIS dataset for training. If you have already downloaded the COCO images, you only need to download the LVIS annotations.
```
lvis/
  {train,val}2017/
    # since you already have the COCO images, there is no need to download these
  lvis_v1_train.json
  lvis_v1_val.json
```
Run `python datasets/build_lvis.py` to get the dataset format for joint training.
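LVIS v1 reuses the COCO 2017 images: each image record in the annotation json points back to its COCO file via a `coco_url` field, whose last two path components give the split folder and file name. A small helper (assuming the standard LVIS v1 image record) to map a record onto your existing `coco/{train,val}2017` folders:

```python
def lvis_image_relpath(image_info):
    # The split folder and file name are the last two components of the
    # record's 'coco_url', e.g. .../train2017/000000000009.jpg.
    return "/".join(image_info["coco_url"].split("/")[-2:])
```

Joining this relative path onto your COCO image root locates the image without downloading anything new.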
PSALM shows powerful zero-shot capability for many unseen tasks: Open-Vocabulary Segmentation, Generalized Referring Segmentation, and Video Object Segmentation.
We follow the instructions here for the preparation of Cityscapes, ADE20K, Pascal VOC, and Pascal Context.
Expected dataset structure for gRefCOCO:
Download the gRefCOCO dataset from this link and put it in the same folder as RefCOCO.
```
refer_seg/
  grefcoco/
    grefs(unc).json
    instances.json
  refcoco/
  refcoco+/
  refcocog/
```
Run `python datasets/build_gRefCOCO.py` to get the dataset format for evaluation.
Expected dataset structure for DAVIS-2017:

```
DAVIS/
  2017/
    trainval/
      Annotations/
        480p/
          # one folder per video
      ImageSets/
        2017/
          train.txt
          val.txt
      JPEGImages/
        480p/
          # one folder per video
```
Run `python datasets/build_DAVIS.py` to get the dataset format for evaluation.
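DAVIS-2017 annotations are palette-indexed PNGs in which pixel value k labels object k and 0 is background. After loading a frame's annotation into an integer array (e.g. with PIL), per-object masks can be split out as follows (a sketch, not the repo's code):

```python
import numpy as np

def split_objects(id_map):
    # Map each non-background object id in a DAVIS annotation frame to
    # its binary mask; id 0 is background and is skipped.
    return {int(k): id_map == k for k in np.unique(id_map) if k != 0}
```

This per-object view matches how video object segmentation is evaluated: one mask track per object id across the video.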
You can download the converted files (Google Drive | Baidu Cloud (code: hust)). The downloaded files should be in the following structure:
```
refcoco/
  refcoco_val.json
  refcoco_testA.json
  ...
refcoco+/
  refcoco+_val.json
  refcoco+_testA.json
  ...
refcocog/
  refcocog_val.json
  refcocog_test.json
  ...
grefcoco/
  refcocog_val.json
  refcocog_testA.json
  refcocog_testB.json
coco_interactive_train_psalm.json          # training set for COCO-Interactive
coco_interactive_val_psalm.json            # val set for COCO-Interactive
instruction_dataset_coco_format.json       # GT for COCO instance; put this file in psalm/output/instance_segmentation
instruction_dataset_coco_format.json.lock  # put this file in psalm/output/instance_segmentation
instance_train_psalm.json                  # training set for COCO instance
instance_val_psalm.json                    # val set for COCO instance
trainval_val_psalm.json                    # val set for DAVIS
```
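After downloading, a quick sanity check that everything landed in the right place can save a failed training run. A minimal sketch (the `EXPECTED` list below is abbreviated; extend it with the full listing above):

```python
import os

EXPECTED = [
    "refcoco/refcoco_val.json",
    "refcocog/refcocog_val.json",
    "coco_interactive_train_psalm.json",
    "trainval_val_psalm.json",
]  # abbreviated; extend with the full listing above

def missing_files(root, expected=EXPECTED):
    # Return every expected file that is absent under `root`.
    return [p for p in expected if not os.path.exists(os.path.join(root, p))]
```

An empty return value means the abbreviated set of converted files is in place under the given root.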