How can i store dataset while im using train.py #28

Caixy1113 · 2023-03-07T12:45:17Z

Dear timojl,

I have a question regarding the use of the training.py script. Specifically, I am wondering how to store my training dataset. I am encountering an error that says there is no "dataset_repository" folder. Could you please provide guidance on how to properly store my training datasets, including PhraseCut, COCO, and so on?
Thank you for your time and assistance.

Best regards,
Cai

erkoiv · 2023-03-08T10:50:42Z

Disclaimer - Not a CLIPSeg author, just a user.

The unzipped and structured datasets should be located in ~/datasets/dataset_name/... Placing the data there bypasses the step that un-tars a complete, already structured dataset from ~/dataset_repository/, although you could use that mechanism if you have such an archive.
I reverse-engineered the dataset setup for COCO. Setting up this dataset for training required following these instructions from the hsnet repository:

COCO-20i

Download COCO2014 train/val images and annotations:

wget http://images.cocodataset.org/zips/train2014.zip
wget http://images.cocodataset.org/zips/val2014.zip
wget http://images.cocodataset.org/annotations/annotations_trainval2014.zip

Download COCO2014 train/val annotations from our Google Drive: [train2014.zip], [val2014.zip] (and place both train2014/ and val2014/ under the annotations/ directory).

Resulting in folder structure:

~/datasets/
├── COCO-20i/
│ ├── annotations/
│ │ ├── train2014/ # (dir.) training masks (from Google Drive)
│ │ ├── val2014/ # (dir.) validation masks (from Google Drive)
│ │ └── ..some json files..
│ ├── train2014/
│ └── val2014/
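
A quick way to confirm the layout before starting training is a small check script. The following is only a sketch: the directory names are copied from the tree above, while `EXPECTED_DIRS` and `missing_dirs` are hypothetical names that are not part of CLIPSeg or hsnet.

```python
from pathlib import Path

# Sub-directories taken from the folder tree above, relative to the dataset root.
EXPECTED_DIRS = [
    "annotations/train2014",
    "annotations/val2014",
    "train2014",
    "val2014",
]


def missing_dirs(root):
    """Return the expected sub-directories that do not exist under root."""
    root = Path(root).expanduser()
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    missing = missing_dirs("~/datasets/COCO-20i")
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        print("COCO-20i layout looks complete.")
```

If the script reports missing directories, the corresponding download or unzip step probably did not end up in the expected place.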

If you have a dataset that is set up like the COCO one above, you can change the dataset folder name in 'coco_wrapper.py' (in the 'wrappers' folder) so that the code uses your custom dataset instead. This will also require some changes to the way CLIPSeg uses hsnet to index the dataset.

timojl · 2023-03-19T10:39:44Z

I think @erkoiv already provided a great answer. The get_from_repository function is primarily used as an internal tool. In this repository it is sufficient to put the data into ~/datasets/<dataset>/.
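
For anyone who keeps datasets as tar archives, the manual route can be sketched as below. This is not the repository's get_from_repository function; `extract_dataset` and its parameters are hypothetical, and the sketch assumes the archive already contains the final folder structure.

```python
import tarfile
from pathlib import Path


def extract_dataset(archive_path, dataset_name, datasets_root="~/datasets"):
    """Extract a tar archive into <datasets_root>/<dataset_name>/ unless it already exists."""
    target = Path(datasets_root).expanduser() / dataset_name
    if target.is_dir():
        # The dataset folder is already in place, so nothing is extracted.
        return target
    target.mkdir(parents=True)
    # Only extract archives from a source you trust.
    with tarfile.open(archive_path) as tar:
        tar.extractall(target)
    return target
```

A call such as `extract_dataset("my_dataset.tar", "my_dataset")` (both names are placeholders) would leave the data in ~/datasets/my_dataset/.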

timojl closed this as completed Mar 19, 2023
