
Dataset upload structuring #577

Open
1 task done
Burhan-Q opened this issue Feb 19, 2024 · 10 comments
Assignees
Labels
bug Something isn't working documentation Improvements or additions to documentation web

Comments

@Burhan-Q
Member

Search before asking

  • I have searched the HUB issues and found no similar bug report.

HUB Component

Datasets

Bug

Dataset structure shown on HUB

[Screenshot: dataset structure shown on HUB]

Tested working dataset structure and YAML file

data
└───data-seg20
        ├───data.yaml
        ├───train
        │     ├───images
        │     └───labels
        └───valid
              ├───images
              └───labels
path: ../data-seg20
train: train/images
val: valid/images
test: null
names:
  0: crack
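To reproduce the working layout above, where every archive member sits under a single top-level data-seg20/ folder, one option is Python's shutil.make_archive with root_dir set to the parent folder and base_dir set to the dataset folder. This is a minimal sketch: zip_dataset is a hypothetical helper name, and it only controls the archive layout, not whether HUB accepts it.

```python
import shutil
from pathlib import Path


def zip_dataset(dataset_dir: str, out_base: str) -> str:
    """Zip dataset_dir so every member starts with '<folder name>/'.

    zip_dataset is a hypothetical helper. out_base is the archive path
    without the '.zip' suffix and must not lie inside dataset_dir.
    """
    src = Path(dataset_dir).resolve()
    # root_dir = parent folder, base_dir = folder name: the archive then
    # contains e.g. data-seg20/data.yaml, data-seg20/train/images/...
    return shutil.make_archive(
        out_base, "zip", root_dir=str(src.parent), base_dir=src.name
    )
```

For example, zip_dataset("data/data-seg20", "data-seg20") writes data-seg20.zip to the current directory.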

Also mentioned on #569 (comment)

Environment

OS                  Windows-10-10.0.19045-SP0
Environment         Windows
Python              3.11.6
Install             git
RAM                 31.86 GB
CPU                 Intel Core(TM) i5-10600K 4.10GHz
CUDA                12.1

matplotlib          ✅ 3.8.2>=3.3.0
numpy               ✅ 1.26.3>=1.22.2
opencv-python       ✅ 4.9.0.80>=4.6.0
pillow              ✅ 10.2.0>=7.1.2
pyyaml              ✅ 6.0.1>=5.3.1
requests            ✅ 2.31.0>=2.23.0
scipy               ✅ 1.11.4>=1.4.1
torch               ✅ 2.1.1+cu121>=1.8.0
torchvision         ✅ 0.16.1+cu121>=0.9.0
tqdm                ✅ 4.66.1>=4.64.0
psutil              ✅ 5.9.7
py-cpuinfo          ✅ 9.0.0
thop                ✅ 0.1.1-2209072238>=0.1.1
pandas              ✅ 2.1.4>=1.1.4
seaborn             ✅ 0.13.1>=0.11.0

Minimal Reproducible Example

No response

Additional

I attempted again with the tiger-pose dataset, which uploaded without issue but then failed with a timeout error. Retrying immediately raised another timeout error.

[Screenshot: timeout error]

@Burhan-Q Burhan-Q added bug Something isn't working documentation Improvements or additions to documentation web labels Feb 19, 2024
@Burhan-Q
Member Author

NOTE: The tiger-pose dataset eventually showed no errors, but I was not watching (or timing) when this happened.

[Screenshot: tiger-pose dataset with no errors]

@kalenmike
Member

@Burhan-Q We have error handling in place to manage multiple different formats, but we only suggest the correct one. I'm not clear whether you are saying that you are unable to upload a dataset formatted like the example, or only that you are able to upload a dataset formatted differently?

The timeout error suggests there was an issue connecting to the server; in those cases we offer a retry option from the dropdown.

@kalenmike kalenmike self-assigned this Feb 20, 2024
@Burhan-Q
Member Author

@kalenmike I was only able to upload a dataset with the structure mentioned in the opening comment. It is not possible to upload a dataset using the layout shown on HUB. I'm not the only one who has had this issue; other users have experienced it too (which is how it was brought to my attention).

Regarding the timeout error, I did retry and it immediately failed again, but I may not have waited long enough before retrying. The timeout error seemingly "resolved itself": the dataset showed as correctly uploaded some time later, without any intervention on my part.
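Retrying immediately after a timeout often just hits the same condition again. For scripted uploads, a common client-side pattern is to wait progressively longer between attempts (exponential backoff). The sketch below is generic: action stands in for whatever performs the upload (a hypothetical callable, not a HUB API), and which exceptions count as retryable is an assumption you would adapt.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    action: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 2.0,
    retry_on: Tuple[Type[BaseException], ...] = (TimeoutError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call action(); after each retryable failure wait base_delay * 2**n seconds."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return action()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: re-raise the last error
            sleep(base_delay * 2 ** attempt)  # waits 2 s, 4 s, 8 s, ... by default
    raise AssertionError("unreachable")
```

Usage would look like retry_with_backoff(lambda: upload(zip_path)), where upload is a hypothetical function. The sleep parameter is injectable so the delays can be checked without actually waiting.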

@Burhan-Q
Member Author

Burhan-Q commented Feb 20, 2024

One thing that was frustrating about the dataset uploading errors is that there is no indication as to what the error is or what the problem might be. This means that if an upload fails, as a user I have no clue why or what to change/fix. Having some kind of report of what errors occurred would be helpful.

@kalenmike
Member

@Burhan-Q There is error reporting; it sounds like you just hit the same error every time. A timeout means there was no response from the server. We also have:

  • "YAML Not Found."
  • "Multiple YAMLs Found."
  • "Zip Formatted Incorrectly."
  • "Dataset Empty."
  • "YAML Formatting Error."
  • "Processing Error."
  • "Unable to Reach Server."
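Several of these messages correspond to problems that can be checked locally before uploading. The sketch below is only a guess at what each message means (it is not HUB's implementation); the image-extension list and the rule used for "Dataset Empty." are assumptions, and "YAML Formatting Error." is not covered because parsing YAML needs a third-party library such as PyYAML.

```python
import posixpath
import zipfile
from typing import IO, List, Union

# Assumed set of image extensions; adjust to your data.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp", ".webp"}


def precheck_dataset_zip(zip_file: Union[str, IO[bytes]]) -> List[str]:
    """Return HUB-style messages for problems detectable without unzipping."""
    try:
        with zipfile.ZipFile(zip_file) as zf:
            # Keep files only; directory entries end with '/'.
            names = [n for n in zf.namelist() if not n.endswith("/")]
    except zipfile.BadZipFile:
        return ["Zip Formatted Incorrectly."]

    problems = []
    yamls = [n for n in names if n.lower().endswith((".yaml", ".yml"))]
    if not yamls:
        problems.append("YAML Not Found.")
    elif len(yamls) > 1:
        problems.append("Multiple YAMLs Found.")
    if not any(posixpath.splitext(n)[1].lower() in IMAGE_EXTS for n in names):
        problems.append("Dataset Empty.")
    return problems
```

An empty list means none of these particular checks fired; it does not guarantee that processing on the server will succeed.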

I may need to run through your issue with you tomorrow.

@kalenmike
Member

kalenmike commented Feb 20, 2024

Also, it looks like your dataset did not work because your YAML is not correct. Your YAML tells us to look one directory up (../), which is why you had to add another directory for it to work.
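One way to see the effect of ../ is to resolve the path value against the folder that contains the YAML inside the archive. Whether HUB resolves it exactly this way is an assumption; the sketch only shows why path: ../data-seg20 lands back on the dataset folder when the YAML sits inside a data-seg20/ subfolder, and points outside the archive when the YAML is at the top level.

```python
import posixpath


def resolve_dataset_root(yaml_member: str, path_value: str) -> str:
    """Resolve a YAML 'path' value relative to the YAML's folder in the zip.

    Assumes archive member names use '/' separators, as zip files do.
    """
    yaml_dir = posixpath.dirname(yaml_member)  # "" when the YAML is at the top level
    return posixpath.normpath(posixpath.join(yaml_dir, path_value))


print(resolve_dataset_root("data-seg20/data.yaml", "../data-seg20"))  # data-seg20
print(resolve_dataset_root("data.yaml", "../data-seg20"))  # ../data-seg20 (outside the zip)
```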

If you look at the example YAML in HUB, you will see that there is no path key.

[Screenshot: example YAML in HUB]
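A quick local check along those lines, assuming the YAML has already been parsed into a dict (for example with PyYAML's yaml.safe_load). Treating train, val and names as required is an assumption based on the Ultralytics dataset YAML convention, and flagging path simply reflects that HUB's example omits it; check_data_yaml is a hypothetical helper.

```python
from typing import Any, Dict, List

REQUIRED_KEYS = ("train", "val", "names")  # assumed minimum for a dataset YAML


def check_data_yaml(cfg: Dict[str, Any]) -> List[str]:
    """Return human-readable notes about a parsed dataset YAML."""
    notes = [f"missing required key: {key}" for key in REQUIRED_KEYS if key not in cfg]
    if "path" in cfg:
        notes.append(f"has path: {cfg['path']!r}; HUB's example YAML has no path key")
    return notes


# The YAML from the opening comment, as yaml.safe_load would return it.
opening_yaml = {
    "path": "../data-seg20",
    "train": "train/images",
    "val": "valid/images",
    "test": None,
    "names": {0: "crack"},
}
print(check_data_yaml(opening_yaml))
```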

@Burhan-Q
Member Author

@kalenmike that's the crazy part, the YAML with path: ../data-seg20 did work for me yesterday.

I did some more testing, and I'm wondering whether something unusual happened in the last few days, because every variation I tested below worked without error. I varied three things: whether the .zip contains a top-level subdirectory, the directory layout (I label these the HUB and YOLO formats), and whether the YAML uses path: ../VisDrone20 or path: VisDrone20.
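The three variables described above give 2 × 2 × 2 = 8 combinations. A small sketch that enumerates them so each case can be checked off; the labels are just the names used in this thread, and the case numbers it prints are arbitrary.

```python
from itertools import product

LAYOUTS = ("HUB", "YOLO")  # images/<split> vs <split>/images
ZIP_SUBDIR = (True, False)  # top-level VisDrone20/ folder inside the .zip or not
PATH_VALUES = ("../VisDrone20", "VisDrone20")  # value of the YAML path key

cases = list(product(LAYOUTS, ZIP_SUBDIR, PATH_VALUES))
for number, (layout, subdir, path) in enumerate(cases, start=1):
    print(f"case {number}: layout={layout}, subdirectory={subdir}, path={path}")
```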

Retesting 2024-02-20

Test 1

  • Successfully uploaded to HUB without errors
  • use path: ../VisDrone20 in YAML
  • includes subdirectory in .zip
  • use HUB dataset example structure

VisDrone20.yaml

path: ../VisDrone20
train: images/train
val: images/val
test: null

names:
  0: pedestrian
  1: people
  2: bicycle
  3: car
  4: van
  5: truck
  6: tricycle
  7: awning-tricycle
  8: bus
  9: motor

VisDrone20.zip structure

VisDrone20.zip
└───VisDrone20
        ├───visdrone20.yaml
        ├───images
        │     ├───train
        │     └───val
        └───labels
              ├───train
              └───val

Test 2

  • Successfully uploaded to HUB without errors
  • use path: ../VisDrone20 in YAML
  • no subdirectory in .zip
  • use HUB dataset example structure

VisDrone20.yaml

path: ../VisDrone20
train: images/train
val: images/val
test: null

names:
  0: pedestrian
  1: people
  2: bicycle
  3: car
  4: van
  5: truck
  6: tricycle
  7: awning-tricycle
  8: bus
  9: motor

VisDrone20.zip structure

VisDrone20.zip
    ├───visdrone20.yaml
    ├───images
    │     ├───train
    │     └───val
    └───labels
          ├───train
          └───val
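Tests 1 and 2 differ only in whether the archive has a top-level VisDrone20/ folder. A tool reading such an archive can handle both by taking the folder that holds the YAML as the dataset root and joining the YAML's split paths onto it. This is a minimal sketch of that idea, not a description of how HUB does it; find_dataset_root is a hypothetical name.

```python
import posixpath
from typing import List


def find_dataset_root(names: List[str]) -> str:
    """Return the archive folder holding the single dataset YAML ('' = top level)."""
    yamls = [n for n in names if n.lower().endswith((".yaml", ".yml"))]
    if len(yamls) != 1:
        raise ValueError(f"expected exactly one YAML file, found {len(yamls)}")
    return posixpath.dirname(yamls[0])


with_subdir = ["VisDrone20/visdrone20.yaml", "VisDrone20/images/train/0001.jpg"]
without_subdir = ["visdrone20.yaml", "images/train/0001.jpg"]
for names in (with_subdir, without_subdir):
    root = find_dataset_root(names)
    # posixpath.join("", "images/train") is just "images/train"
    print(repr(root), posixpath.join(root, "images/train"))
```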

Test 3

  • Successfully uploaded to HUB without errors
  • use path: VisDrone20 in YAML
  • no subdirectory in .zip
  • use HUB dataset example structure

VisDrone20.yaml

path: ../VisDrone20
train: images/train
val: images/val
test: null

names:
  0: pedestrian
  1: people
  2: bicycle
  3: car
  4: van
  5: truck
  6: tricycle
  7: awning-tricycle
  8: bus
  9: motor

VisDrone20.zip structure

VisDrone20.zip
    ├───visdrone20.yaml
    ├───images
    │     ├───train
    │     └───val
    └───labels
          ├───train
          └───val

Test 4

  • Successfully uploaded to HUB without errors
  • use path: VisDrone20 in YAML
  • includes subdirectory in .zip
  • use HUB dataset example structure

VisDrone20.yaml

path: VisDrone20
train: images/train
val: images/val
test: null

names:
  0: pedestrian
  1: people
  2: bicycle
  3: car
  4: van
  5: truck
  6: tricycle
  7: awning-tricycle
  8: bus
  9: motor

VisDrone20.zip structure

VisDrone20.zip
└───VisDrone20
        ├───visdrone20.yaml
        ├───images
        │     ├───train
        │     └───val
        └───labels
              ├───train
              └───val

Test 5

  • Successfully uploaded to HUB without errors
  • use path: VisDrone20 in YAML
  • includes subdirectory in .zip
  • use Ultralytics YOLO dataset structure

VisDrone20.yaml

path: VisDrone20
train: train/images
val: val/images
test: null

names:
  0: pedestrian
  1: people
  2: bicycle
  3: car
  4: van
  5: truck
  6: tricycle
  7: awning-tricycle
  8: bus
  9: motor

VisDrone20.zip structure

VisDrone20.zip
└───VisDrone20
        ├───visdrone20.yaml
        ├───train
        │     ├───images
        │     └───labels
        └───val
              ├───images
              └───labels
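Both layouts can work because, as far as I understand the Ultralytics YOLO loader (its img2label_paths helper), a label file is located by replacing the last images directory in an image path with labels and the extension with .txt. A standalone sketch of that rule, using '/' separators for simplicity where the real helper uses the OS separator:

```python
import posixpath


def image_to_label_path(img_path: str) -> str:
    """Map .../images/.../name.ext to .../labels/.../name.txt (last 'images' dir only)."""
    # Prefix '/' so a path that starts with 'images/' still matches '/images/'.
    head, sep, tail = ("/" + img_path).rpartition("/images/")
    if not sep:
        raise ValueError(f"no 'images' directory in {img_path!r}")
    stem, _ext = posixpath.splitext(tail)
    return (head + "/labels/" + stem + ".txt")[1:]  # drop the '/' added above


print(image_to_label_path("VisDrone20/images/train/0001.jpg"))  # HUB example layout
print(image_to_label_path("VisDrone20/train/images/0001.jpg"))  # YOLO layout
```

The first call gives VisDrone20/labels/train/0001.txt and the second VisDrone20/train/labels/0001.txt, which matches the labels folders in both archive structures.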

Test 6

  • Successfully uploaded to HUB without errors
  • use path: ../VisDrone20 in YAML
  • includes subdirectory in .zip
  • use Ultralytics YOLO dataset structure

VisDrone20.yaml

path: ../VisDrone20
train: train/images
val: val/images
test: null

names:
  0: pedestrian
  1: people
  2: bicycle
  3: car
  4: van
  5: truck
  6: tricycle
  7: awning-tricycle
  8: bus
  9: motor

VisDrone20.zip structure

VisDrone20.zip
└───VisDrone20
        ├───visdrone20.yaml
        ├───train
        │     ├───images
        │     └───labels
        └───val
              ├───images
              └───labels

Test 7

  • Successfully uploaded to HUB without errors
  • use path: ../VisDrone20 in YAML
  • no subdirectory in .zip
  • use Ultralytics YOLO dataset structure

VisDrone20.yaml

path: ../VisDrone20
train: train/images
val: val/images
test: null

names:
  0: pedestrian
  1: people
  2: bicycle
  3: car
  4: van
  5: truck
  6: tricycle
  7: awning-tricycle
  8: bus
  9: motor

VisDrone20.zip structure

VisDrone20.zip
        ├───visdrone20.yaml
        ├───train
        │     ├───images
        │     └───labels
        └───val
              ├───images
              └───labels

Test 8

  • Successfully uploaded to HUB without errors
  • use path: VisDrone_20 in YAML
  • includes subdirectory in .zip
  • use Ultralytics YOLO dataset structure

VisDrone20.yaml

path: VisDrone_20
train: train/images
val: val/images
test: null

names:
  0: pedestrian
  1: people
  2: bicycle
  3: car
  4: van
  5: truck
  6: tricycle
  7: awning-tricycle
  8: bus
  9: motor

VisDrone20.zip structure

VisDrone_20.zip
        ├───visdrone20.yaml
        ├───train
        │     ├───images
        │     └───labels
        └───val
              ├───images
              └───labels

@kalenmike
Member

@Burhan-Q To confirm: you are no longer seeing any errors?

We provide an example of what a dataset should look like, but we also fix datasets with very common, obvious mistakes. Dataset processing happens after the upload is requested, so it can sometimes fail for no apparent reason or crash due to excessive memory usage. We are continually optimizing this.

@Burhan-Q
Member Author

Yes, I was unable to reproduce an error with any of the examples above. I didn't document yesterday's attempts as thoroughly, which makes the issue harder to pin down. I think these tests cover most variations, and all of them were successful.

@kalenmike is it possible for you to enable verbose logging on my HUB account? Something like "log every action for N hours", so there's a more traceable history for testing? To be clear, I'm asking whether it's possible, not requesting a feature.

@kalenmike
Member

@Burhan-Q No, that's not possible.

Projects
None yet
Development

No branches or pull requests

2 participants