Add `ImageFolderDataset` #1232

laggui · 2024-02-01T21:09:02Z

Checklist

Confirmed that the `run-checks all` script has been executed.

Related Issues/PRs

Closes #1132

Changes

Added ImageFolderDataset to load images from disk (following a folder structure)
Added a custom-image-dataset example that downloads CIFAR-10 and trains a simple CNN model using the ImageFolderDataset
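
As a rough illustration of the "folder structure" idea, here is a minimal std-only sketch, assuming a `root/<class name>/<image file>` layout where each sub-directory name acts as the label. The helpers `collect_labeled_images` and `sorted_class_names` are hypothetical names made up for this sketch; they are not the `ImageFolderDataset` API from this PR, and no image decoding is attempted.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper (not part of burn-dataset): walks `root/<class>/<file>`
/// and returns `(file path, class name)` pairs sorted by path.
pub fn collect_labeled_images(root: &Path) -> io::Result<Vec<(PathBuf, String)>> {
    let mut items = Vec::new();
    for class_entry in fs::read_dir(root)? {
        let class_dir = class_entry?.path();
        if !class_dir.is_dir() {
            continue; // ignore stray files at the root level
        }
        // The directory name doubles as the class label.
        let label = match class_dir.file_name().and_then(|n| n.to_str()) {
            Some(name) => name.to_string(),
            None => continue, // skip directory names that are not valid UTF-8
        };
        for file_entry in fs::read_dir(&class_dir)? {
            let path = file_entry?.path();
            if path.is_file() {
                items.push((path, label.clone()));
            }
        }
    }
    // `read_dir` order is platform dependent; sorting keeps the sequence repeatable.
    items.sort();
    Ok(items)
}

/// Hypothetical helper: distinct class names in sorted order, so that the
/// position of a name in the returned vector can serve as its class index.
pub fn sorted_class_names(items: &[(PathBuf, String)]) -> Vec<String> {
    let mut classes: Vec<String> = items.iter().map(|(_, c)| c.clone()).collect();
    classes.sort();
    classes.dedup();
    classes
}
```

Sorting by path mirrors the "sort by path for repeatable items sequence" commit in this PR. Filtering by file extension and decoding pixels are left out of the sketch on purpose.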

Testing

Added a couple of unit tests and an example.

laggui · 2024-02-01T21:10:56Z

Note that I added an ImageTarget enum for image classification, object detection and segmentation targets. At this time only the image classification targets are implemented; the other two are left for the future.

We could remove the options until they are implemented. Let me know your thoughts :)

/edit: welp, looks like I have some tests to fix tomorrow.

codecov · 2024-02-02T13:31:45Z

Codecov Report

Attention: 45 lines in your changes are missing coverage. Please review.

Comparison is base (9df2071) 84.41% compared to head (3072439) 84.40%.
Report is 8 commits behind head on main.

❗ Current head 3072439 differs from pull request most recent head 1095d00. Consider uploading reports for the commit 1095d00 to get more accurate results

| Files | Patch % | Lines |
|---|---|---|
| burn-dataset/src/vision/image_folder.rs | 79.72% | 45 Missing ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1232      +/-   ##
==========================================
- Coverage   84.41%   84.40%   -0.02%     
==========================================
  Files         549      550       +1     
  Lines       61952    62174     +222     
==========================================
+ Hits        52295    52475     +180     
- Misses       9657     9699      +42


nathanielsimard

Very cool and complete. I have a few comments to address before merging, but great job! 👏

burn-dataset/src/vision/image_folder.rs

examples/custom-image-dataset/src/data.rs

nathanielsimard · 2024-02-02T18:41:39Z

examples/custom-image-dataset/src/downloader.rs

+///
+/// Taken from [burn-dataset](https://github.com/tracel-ai/burn/blob/main/burn-dataset/src/vision/downloader.rs).
+#[tokio::main(flavor = "current_thread")]
+pub async fn download_file_as_bytes(url: &str, message: &str) -> Vec<u8> {


It seems to be duplicated from the MNIST dataset implementation, right? Maybe we can actually add CIFAR to burn-dataset and reuse the setup from MNIST?

Yes, that's why I linked to it. That download function can be used for any other future dataset we might want to add, I just wasn't sure if we wanted to add CIFAR-10 explicitly so I put it as an example. And the downloader module is private so I copied the code over with a reference.

Btw the original CIFAR-10 source is in bytes form, closer to MNIST (I used a mirror with the folder structure, which is more common, to illustrate the ImageFolderDataset).

nathanielsimard · 2024-02-02T18:42:28Z

examples/custom-image-dataset/examples/custom-image-dataset.rs

+            TrainingConfig::new(SgdConfig::new().with_momentum(Some(MomentumConfig {
+                momentum: 0.9,
+                dampening: 0.,
+                nesterov: false,
+            }))),


The question is: is Adam better in this case 😅

I actually haven't fiddled with the training recipe too much for this example 😄 a bunch of things could probably be changed to further optimize the results, I just wanted to keep it simple.

Lmk if you want any changes to be applied to the training example.
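
For readers unfamiliar with the three fields in the `MomentumConfig` above, here is a minimal scalar sketch of one SGD-with-momentum step. It assumes the common PyTorch-style formulation, with the velocity starting at zero; whether burn's optimizer matches it term for term is not confirmed by this thread. The `Momentum` struct and its `step` method are hypothetical names for illustration only.

```rust
/// Hypothetical scalar illustration of SGD with momentum; this is not
/// burn's optimizer implementation.
pub struct Momentum {
    pub momentum: f64,
    pub dampening: f64,
    pub nesterov: bool,
    velocity: f64, // running velocity, assumed to start at zero
}

impl Momentum {
    pub fn new(momentum: f64, dampening: f64, nesterov: bool) -> Self {
        Self { momentum, dampening, nesterov, velocity: 0.0 }
    }

    /// Applies one update and returns the new parameter value.
    pub fn step(&mut self, param: f64, grad: f64, lr: f64) -> f64 {
        // Blend the previous velocity with the (optionally dampened) gradient.
        self.velocity = self.momentum * self.velocity + (1.0 - self.dampening) * grad;
        // Nesterov looks ahead along the velocity; otherwise use the velocity directly.
        let update = if self.nesterov {
            grad + self.momentum * self.velocity
        } else {
            self.velocity
        };
        param - lr * update
    }
}
```

With `momentum: 0.9`, `dampening: 0.` and `nesterov: false` as in the snippet above, a constant gradient makes the step size grow over successive updates, which is the behaviour Adam would replace with per-parameter adaptive scaling.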

laggui · 2024-02-02T21:05:41Z

Thanks for the feedback! Wasn't clear if you wanted to add CIFAR-10 to burn-dataset immediately or not. If that is the case just say so and I'll refactor it.

nathanielsimard

LGTM, we could add CIFAR to the burn-dataset crate, but since you mentioned you used the folder version as an example, maybe it's best to just keep it in the example dir. Maybe we can add another version more similar to MNIST in the crate.

laggui added 10 commits February 1, 2024 15:06

Add ImageFolderDataset

97b5c82

Implement TryFrom for image types and sort by path for repeatable items sequence

f030c10

Add CIFAR10Loader for custom ImageFolderdataset

9de5a95

Rename mod image -> image_folder

7dc72ee

Add CNN model and training

684acda

Update README description

e50ace8

Add ImageFolderDataset tests

6b7a0c6

Update Cargo.lock

6ebe74a

Fix typo

0fd93dd

Checks fixes

606a5b9

laggui requested review from louisfd and nathanielsimard February 1, 2024 21:09

Add type missing type hint for windows tests

cabe1a2

Rename DataType -> PixelDepth and add try_into tests

3072439

nathanielsimard requested changes Feb 2, 2024

laggui added 6 commits February 2, 2024 14:44

Rename new methods for classification

889dcf5

Rename ImageTarget to Annotation

2821b27

Generalize names for other image dataset types

021f380

Return Result from dataset creation

7f58cde

Move normalizer to ClassificationBatcher field

758e216

Add parse_image_annotation test

1095d00

laggui requested a review from nathanielsimard February 2, 2024 21:04

nathanielsimard approved these changes Feb 2, 2024

nathanielsimard merged commit 57ee2ce into main Feb 2, 2024
13 checks passed

nathanielsimard deleted the vision/image-dataset branch February 2, 2024 21:32
