Issue description:
Pulling two different releases with RemoteDataset.pull() and with_folders=True can leave images missing from the second pull unless force_replace=True is specified.
When RemoteDataset.pull(release) is called consecutively with different releases, any file in the second release that shares a filename with a file in the first (even when the paths differ) is skipped by pull unless force_replace=True is passed.
release_A references the following files:
- /dir_A/000.png
- /dir_A/001.png
release_B references:
- /dir_B/000.png
- /dir_B/002.png
After calling
RemoteDataset.pull(release_A, with_folders=True)
RemoteDataset.pull(release_B, with_folders=True)
the resulting images directory is:
images:
- dir_A:
  - 000.png
  - 001.png
- dir_B:
  - 002.png
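The symptom can be reproduced without darwin-py using a small simulation. This is a sketch under one assumption: that the skip check compares bare filenames rather than full relative paths. That assumption matches the behavior above but has not been confirmed against the darwin-py source, and `files_to_download_by_name` is a hypothetical helper, not part of the library.

```python
from pathlib import PurePosixPath


def files_to_download_by_name(release_files, existing_files, force_replace=False):
    """Hypothetical skip check keyed on the bare filename (the suspected bug)."""
    if force_replace:
        # The workaround: download everything regardless of what is on disk.
        return list(release_files)
    # Only the final path component is compared, so /dir_B/000.png
    # "collides" with /dir_A/000.png even though the directories differ.
    existing_names = {PurePosixPath(p).name for p in existing_files}
    return [p for p in release_files if PurePosixPath(p).name not in existing_names]


release_a = ["/dir_A/000.png", "/dir_A/001.png"]
release_b = ["/dir_B/000.png", "/dir_B/002.png"]

# After the first pull, release_A's files are on disk.
on_disk = list(release_a)
print(files_to_download_by_name(release_b, on_disk))  # ['/dir_B/002.png']
```

Only /dir_B/002.png survives the check, which reproduces the directory tree reported above.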
Expected Behavior:
Since /dir_B/000.png does not conflict with /dir_A/000.png (their local paths differ), the second pull should download it as well.
force_replace=True should only be necessary to download files with conflicting local paths.
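A minimal sketch of the expected check, assuming files are compared by their full relative path. `files_to_download_by_path` is a hypothetical helper for illustration, not a darwin-py function. With this check, only genuine local-path conflicts are skipped unless force_replace=True is passed.

```python
from pathlib import PurePosixPath


def files_to_download_by_path(release_files, existing_files, force_replace=False):
    """Hypothetical skip check keyed on the full relative path (expected behavior)."""
    if force_replace:
        return list(release_files)
    # Comparing whole paths keeps /dir_A/000.png and /dir_B/000.png distinct.
    existing_paths = {PurePosixPath(p) for p in existing_files}
    return [p for p in release_files if PurePosixPath(p) not in existing_paths]


release_a = ["/dir_A/000.png", "/dir_A/001.png"]
release_b = ["/dir_B/000.png", "/dir_B/002.png"]

# Both files from release_B are downloaded after release_A is on disk.
print(files_to_download_by_path(release_b, release_a))  # ['/dir_B/000.png', '/dir_B/002.png']
# Re-pulling release_A hits real path conflicts, so nothing is downloaded.
print(files_to_download_by_path(release_a, release_a))  # []
```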
Workaround
Pass force_replace=True when pulling data.
Environment
- python==3.10
- darwin-py==0.8.24