How can I store the dataset while I'm using train.py? #28

Closed
Caixy1113 opened this issue Mar 7, 2023 · 2 comments

Comments

@Caixy1113

Dear timojl,

I have a question regarding the use of the training.py script. Specifically, I am wondering how to store my training datasets. I am encountering an error that says there is no "dataset_repository" folder. Could you please provide guidance on how to properly store the training datasets, including PhraseCut, COCO, and so on?
Thank you for your time and assistance.

Best regards,
Cai

@erkoiv

erkoiv commented Mar 8, 2023

Disclaimer - Not a CLIPSeg author, just a user.

  1. The unzipped, structured datasets should be located in ~/datasets/dataset_name/... This bypasses the functionality that un-tars a complete, already structured dataset from ~/dataset_repository/, although you could use that route if you have such an archive.
  2. I reverse-engineered the dataset setup for COCO. Setting up this dataset for training required following these instructions from the hsnet repository:

COCO-20i

Download COCO2014 train/val images and annotations:

wget http://images.cocodataset.org/zips/train2014.zip
wget http://images.cocodataset.org/zips/val2014.zip
wget http://images.cocodataset.org/annotations/annotations_trainval2014.zip

Download COCO2014 train/val annotations from our Google Drive: [train2014.zip], [val2014.zip]. (and locate both train2014/ and val2014/ under annotations/ directory).

Resulting in folder structure:

~/datasets/
├── COCO-20i/
│ ├── annotations/
│ │ ├── train2014/ # (dir.) training masks (from Google Drive)
│ │ ├── val2014/ # (dir.) validation masks (from Google Drive)
│ │ └── ..some json files..
│ ├── train2014/
│ └── val2014/
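
A quick way to confirm the layout before starting training is to check that each directory from the tree above exists. The following is only a minimal sketch, not part of CLIPSeg or hsnet: the helper name missing_dirs and the EXPECTED_DIRS tuple are hypothetical, and the default path simply assumes the ~/datasets/COCO-20i location shown above.

```python
from pathlib import Path

# Sub-directories taken from the folder tree above, relative to the dataset root.
EXPECTED_DIRS = (
    "annotations/train2014",
    "annotations/val2014",
    "train2014",
    "val2014",
)


def missing_dirs(dataset_root, expected=EXPECTED_DIRS):
    """Return the expected sub-directories that are absent under dataset_root.

    Hypothetical helper for illustration; it only inspects the file system.
    """
    root = Path(dataset_root).expanduser()
    return [rel for rel in expected if not (root / rel).is_dir()]


if __name__ == "__main__":
    # Assumed location, following the ~/datasets/<dataset>/ convention.
    missing = missing_dirs("~/datasets/COCO-20i")
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        print("COCO-20i layout looks complete.")
```

If the script lists missing directories, the corresponding archive was probably not unpacked into the right place. The check does not look at the JSON annotation files or at the contents of the mask folders.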

If you have a dataset laid out like the COCO one above, you can change the dataset folder name in 'coco_wrapper.py' (in the 'wrappers' folder) so that the code uses your custom dataset instead. This will also require some changes to the way CLIPSeg uses hsnet to index the dataset.

@timojl
Owner

timojl commented Mar 19, 2023

I think @erkoiv already provided a great answer. The get_from_repository function is primarily used as an internal tool. In this repository it is sufficient to put the data into ~/datasets/<dataset>/.

@timojl timojl closed this as completed Mar 19, 2023