
Empty subdirectories left in place while checking out previous version of the datasets #4344

@atifraza

Bug Report

I am tracking versions of a set of datasets in this repository.
The directory structure is as shown below.

.
|- datasets             // Root directory for all datasets
|  |
|  |- dataset1          // Directories are named after the datasets
|  |  |
|  |  |- TRAIN.tsv
|  |  |- ...
|  |
|  |- ...

Each new version of the datasets adds subdirectories to the datasets directory.
When checking out an older version (say v1) using git checkout v1 followed by dvc checkout, DVC restores the files correctly but leaves the subdirectories added by later versions in place as empty directories instead of removing them.

Specifically, if dataset1 was present in v1 but dataset2 was added by v2, checking out v1 leaves behind an empty dataset2 directory.

.
|- datasets             // Root directory for all datasets
|  |
|  |- dataset1          // Directories are named after the datasets
|  |  |
|  |  |- TRAIN.tsv      // Train/Test sets are actually from the correct version
|  |  |- ...
|  |
|  |- dataset2          // Empty directory
|  |
|  |- ...
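
Until the checkout removes such directories itself, the leftovers can be cleaned up by hand after dvc checkout. The following is only a minimal workaround sketch, not part of DVC or its API: the helper name prune_empty_dirs is hypothetical, and it assumes that no directory under datasets is meant to stay empty on purpose.

```python
import os
from pathlib import Path


def prune_empty_dirs(root):
    """Remove empty subdirectories below root (root itself is kept).

    Hypothetical helper for illustration; returns the removed paths.
    """
    root = Path(root)
    removed = []
    # Walk bottom-up so a parent that only contained empty children
    # is itself empty by the time it is visited.
    for dirpath, _dirnames, _filenames in os.walk(root, topdown=False):
        path = Path(dirpath)
        if path == root:
            continue
        # Re-check on disk: os.walk listed the children before they were removed.
        if not any(path.iterdir()):
            path.rmdir()
            removed.append(path)
    return removed
```

With the layout above, calling prune_empty_dirs("datasets") after git checkout v1 and dvc checkout would delete the empty dataset2 directory and leave dataset1 untouched, since dataset1 still contains files.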

Output of dvc version:

$ dvc version -v

DVC version: 1.3.1 (pip)
---------------------------------
Platform: Python 3.8.5 on Linux-4.9.0-0.bpo.6-amd64-x86_64-with-glibc2.10
Supports: azure, gdrive, gs, hdfs, http, https, s3, ssh, oss
Cache types: hardlink, symlink
Repo: dvc, git
2020-08-06 00:01:05,317 DEBUG: Analytics is enabled.
2020-08-06 00:01:05,410 DEBUG: Trying to spawn '['daemon', '-q', 'analytics', '/tmp/tmppp9x3ynw']'
2020-08-06 00:01:05,411 DEBUG: Spawned '['daemon', '-q', 'analytics', '/tmp/tmppp9x3ynw']'

Labels: bug (Did we break something?), p2-medium (Medium priority, should be done, but less important)
