Add ImageFolderDataset #1232

Merged
merged 18 commits into from Feb 2, 2024

Conversation

laggui
Member

@laggui laggui commented Feb 1, 2024

Checklist

  • Confirmed that run-checks all script has been executed.

Related Issues/PRs

Closes #1132

Changes

  • Added ImageFolderDataset to load images from disk (following a folder structure)
  • Added a custom-image-dataset example that downloads CIFAR-10 and trains a simple CNN model using the ImageFolderDataset
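
For readers unfamiliar with the folder convention mentioned above (one sub-directory per class, image files inside), here is a minimal std-only sketch of how such a layout can be turned into (path, label) pairs. It is not the code from this PR: `scan_image_folder`, `LabeledPath`, and the extension list are hypothetical names chosen for illustration, and the actual `ImageFolderDataset` may differ in how it orders classes and filters files.

```rust
use std::collections::BTreeMap;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// One dataset item: the path of an image file and its class index.
/// (Hypothetical type, not part of burn-dataset.)
#[derive(Debug, Clone, PartialEq)]
pub struct LabeledPath {
    pub path: PathBuf,
    pub label: usize,
}

/// File extensions treated as images in this sketch (an assumption).
const EXTENSIONS: [&str; 3] = ["jpg", "jpeg", "png"];

/// Scans `root/<class_name>/<image>` and returns the sorted class names
/// together with one `LabeledPath` per image file found.
pub fn scan_image_folder(root: &Path) -> io::Result<(Vec<String>, Vec<LabeledPath>)> {
    // BTreeMap keeps class names sorted, so label indices are deterministic.
    let mut by_class: BTreeMap<String, Vec<PathBuf>> = BTreeMap::new();

    for entry in fs::read_dir(root)? {
        let entry = entry?;
        let class_dir = entry.path();
        if !class_dir.is_dir() {
            continue; // Only sub-directories define classes.
        }
        let class_name = entry.file_name().to_string_lossy().into_owned();

        let mut files = Vec::new();
        for file in fs::read_dir(&class_dir)? {
            let path = file?.path();
            let is_image = path
                .extension()
                .and_then(|ext| ext.to_str())
                .map(|ext| EXTENSIONS.contains(&ext.to_ascii_lowercase().as_str()))
                .unwrap_or(false);
            if path.is_file() && is_image {
                files.push(path);
            }
        }
        files.sort();
        by_class.insert(class_name, files);
    }

    let classes: Vec<String> = by_class.keys().cloned().collect();
    // The position of a class in the sorted map is its label index.
    let items = by_class
        .into_values()
        .enumerate()
        .flat_map(|(label, files)| {
            files.into_iter().map(move |path| LabeledPath { path, label })
        })
        .collect();

    Ok((classes, items))
}
```

With a root containing `cat/a.jpg` and `dog/b.png`, this sketch would assign label 0 to `cat` and label 1 to `dog`, and ignore non-image files.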

Testing

Added a couple of unit tests and an example.

@laggui
Member Author

laggui commented Feb 1, 2024

Note that I added an ImageTarget enum for image classification, object detection and segmentation targets. At this time only the image classification targets are implemented while the other two would be for the future.

We could remove the options until they are implemented. Let me know your thoughts :)
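
To make the discussion concrete, a target enum along these lines could look roughly like the sketch below. The names `ImageTargetSketch`, `BoundingBoxSketch`, and `class_label` are hypothetical and the variant payloads are guesses; the PR only states that classification, object detection, and segmentation targets exist and that only classification is implemented so far.

```rust
/// Hypothetical bounding box in pixel coordinates with a class index.
#[derive(Debug, Clone, PartialEq)]
pub struct BoundingBoxSketch {
    pub x: f32,
    pub y: f32,
    pub width: f32,
    pub height: f32,
    pub label: usize,
}

/// Hypothetical stand-in for the target enum described in the comment.
#[derive(Debug, Clone, PartialEq)]
pub enum ImageTargetSketch {
    /// Image classification: a single class index.
    Label(usize),
    /// Object detection: one box per object (placeholder for future work).
    BoundingBoxes(Vec<BoundingBoxSketch>),
    /// Segmentation: one class index per pixel (placeholder for future work).
    SegmentationMask(Vec<usize>),
}

/// Returns the class index for classification targets and `None` for the
/// target kinds that are not handled yet.
pub fn class_label(target: &ImageTargetSketch) -> Option<usize> {
    match target {
        ImageTargetSketch::Label(index) => Some(*index),
        ImageTargetSketch::BoundingBoxes(_) | ImageTargetSketch::SegmentationMask(_) => None,
    }
}
```

Keeping the unimplemented variants means callers must handle them explicitly (here by returning `None`); removing them until they are implemented would avoid that branch at the cost of a breaking change later.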

/edit: welp, looks like I have some tests to fix tomorrow.

codecov bot commented Feb 2, 2024

Codecov Report

Attention: 45 lines in your changes are missing coverage. Please review.

Comparison is base (9df2071) 84.41% compared to head (3072439) 84.40%.
Report is 8 commits behind head on main.

❗ Current head 3072439 differs from pull request most recent head 1095d00. Consider uploading reports for the commit 1095d00 to get more accurate results

Files                                     Patch %   Lines
burn-dataset/src/vision/image_folder.rs   79.72%    45 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1232      +/-   ##
==========================================
- Coverage   84.41%   84.40%   -0.02%     
==========================================
  Files         549      550       +1     
  Lines       61952    62174     +222     
==========================================
+ Hits        52295    52475     +180     
- Misses       9657     9699      +42     

Member

@nathanielsimard nathanielsimard left a comment

Very cool and complete. I have a few comments to address before merging, but great job! 👏

examples/custom-image-dataset/src/data.rs (outdated)
///
/// Taken from [burn-dataset](https://github.com/tracel-ai/burn/blob/main/burn-dataset/src/vision/downloader.rs).
#[tokio::main(flavor = "current_thread")]
pub async fn download_file_as_bytes(url: &str, message: &str) -> Vec<u8> {
Member

It seems to be duplicated from the mnist dataset implementation right? Maybe we can actually add cifar to burn-dataset and reuse the setup from mnist?

Member Author

@laggui laggui Feb 2, 2024

Yes, that's why I linked to it. That download function can be used for any future dataset we might want to add; I just wasn't sure whether we wanted to add CIFAR-10 explicitly, so I put it in an example. The downloader module is also private, so I copied the code over with a reference.

Btw the original CIFAR-10 source is in bytes form, closer to MNIST. I used a mirror with the folder structure to illustrate the ImageFolderDataset, since that layout is more common.

Comment on lines +16 to +20
    TrainingConfig::new(SgdConfig::new().with_momentum(Some(MomentumConfig {
        momentum: 0.9,
        dampening: 0.,
        nesterov: false,
    }))),
Member

The question is: is Adam better in this case 😅

Member Author

I actually haven't fiddled with the training recipe too much for this example 😄 a bunch of things could probably be changed to further optimize the results, I just wanted to keep it simple.

Lmk if you want any changes to be applied to the training example.
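
For context on what the momentum, dampening, and nesterov fields in the snippet under review control, here is a small sketch of the commonly used SGD-with-momentum update on plain `f64` slices. It follows the widely used formulation (velocity initialized to the first gradient, then `v = momentum * v + (1 - dampening) * grad`); whether Burn's optimizer matches it in every detail is an assumption, and `MomentumSketch` and `sgd_momentum_step` are hypothetical names.

```rust
/// Hypothetical mirror of the three momentum settings discussed above.
#[derive(Debug, Clone, Copy)]
pub struct MomentumSketch {
    pub momentum: f64,
    pub dampening: f64,
    pub nesterov: bool,
}

/// Applies one SGD-with-momentum step in place.
/// `velocity` is `None` before the first step and holds the running
/// velocity buffer afterwards.
pub fn sgd_momentum_step(
    params: &mut [f64],
    grads: &[f64],
    velocity: &mut Option<Vec<f64>>,
    cfg: MomentumSketch,
    lr: f64,
) {
    assert_eq!(params.len(), grads.len());

    // First step: the velocity starts as the raw gradient.
    // Later steps: v = momentum * v + (1 - dampening) * grad.
    let v = match velocity.take() {
        Some(mut v) => {
            for (vi, g) in v.iter_mut().zip(grads) {
                *vi = cfg.momentum * *vi + (1.0 - cfg.dampening) * g;
            }
            v
        }
        None => grads.to_vec(),
    };

    for ((p, g), vi) in params.iter_mut().zip(grads).zip(&v) {
        // Nesterov looks ahead along the velocity; plain momentum uses it directly.
        let update = if cfg.nesterov { g + cfg.momentum * vi } else { *vi };
        *p -= lr * update;
    }

    *velocity = Some(v);
}
```

With `momentum: 0.9`, `dampening: 0.`, and `nesterov: false` as in the example, each step adds 90% of the previous velocity to the new gradient, which tends to smooth noisy updates.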

@laggui
Member Author

laggui commented Feb 2, 2024

Thanks for the feedback! Wasn't clear if you wanted to add CIFAR-10 to burn-dataset immediately or not. If that is the case just say so and I'll refactor it.

Member

@nathanielsimard nathanielsimard left a comment

LGTM. We could add CIFAR to the burn-dataset crate, but since you mentioned you used the folder version as an example, maybe it's best to just keep it in the example dir. Maybe we can add another version more similar to MNIST in the crate.

@nathanielsimard nathanielsimard merged commit 57ee2ce into main Feb 2, 2024
13 checks passed
@nathanielsimard nathanielsimard deleted the vision/image-dataset branch February 2, 2024 21:32
Development

Successfully merging this pull request may close these issues.

[Example] Add custom images dataset
2 participants