diff: consider including modified git-tracked files #3385

@shcheklein

If the purpose of dvc diff is to take a glance at what has changed in my iteration vs HEAD, or to compare two experiments, it would be really convenient to include the output of git diff --name-only in its output.

E.g. when I run dvc diff baseline-experiment bigrams-experiment, I get output like:

Modified:
    auc.metric
    data/features/
    data/features/test.pkl
    data/features/train.pkl
    model.pkl

files summary: 0 added, 0 deleted, 4 modified

That's fine, but it can be misleading or not very informative. In this specific case there is also a change in the script train.py itself, which is an essential part of the pipeline. I think it would make sense to see its name here.
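
To make the request concrete, here is a minimal sketch of merging the paths that dvc diff reports with the output of git diff --name-only for the same two revisions. It is only an illustration, not how DVC works internally: the helper names git_changed_files and merge_changed are hypothetical, and the DVC paths are passed in as a plain list rather than read from DVC.

```python
import subprocess


def git_changed_files(rev_a, rev_b):
    """Return paths that differ between two Git revisions.

    Runs `git diff --name-only rev_a rev_b` in the current repository.
    """
    result = subprocess.run(
        ["git", "diff", "--name-only", rev_a, rev_b],
        capture_output=True, text=True, check=True,
    )
    return [line for line in result.stdout.splitlines() if line]


def merge_changed(dvc_modified, git_modified):
    """Combine both lists into one sorted list without duplicates."""
    return sorted(set(dvc_modified) | set(git_modified))


# Paths copied from the example output above; train.py is the Git-only change.
dvc_modified = [
    "auc.metric",
    "data/features/",
    "data/features/test.pkl",
    "data/features/train.pkl",
    "model.pkl",
]
combined = merge_changed(dvc_modified, ["auc.metric", "train.py"])
```

In a real repository the second argument would come from git_changed_files("baseline-experiment", "bigrams-experiment"); a literal list is used here so the sketch runs without a repository.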

Any thoughts, @iterative/engineering?

UPDATE: Giving this a second thought, I see that it's not even about DVC-tracked (cached) files vs Git-tracked files, but about DVC outputs vs outputs+dependencies vs all files (DVC and Git).

So, three options:

  1. Only changes to DVC "outputs" (e.g. including the Git-tracked metric file, like in the example above);
  2. Changes to DVC outputs and dependencies (that we take from DVC-files);
  3. All Git and DVC files.
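
The three options differ only in which sets of changed paths get combined. A rough sketch, assuming the changed outputs, changed dependencies, and changed Git files are already available as lists; the function name changed_paths, the scope names, and the README.md entry are hypothetical, and treating train.py as a stage dependency is an assumption:

```python
def changed_paths(scope, outputs, dependencies, git_files):
    """Pick which changed paths to report for a given scope.

    scope is one of "outputs", "outputs+deps", or "all"
    (hypothetical names for options 1, 2, and 3).
    """
    if scope == "outputs":
        selected = set(outputs)
    elif scope == "outputs+deps":
        selected = set(outputs) | set(dependencies)
    elif scope == "all":
        selected = set(outputs) | set(dependencies) | set(git_files)
    else:
        raise ValueError(f"unknown scope: {scope!r}")
    return sorted(selected)


# Illustrative inputs only: train.py is assumed to be a stage dependency,
# and README.md stands in for an unrelated Git-tracked file.
outputs = ["auc.metric", "model.pkl"]
dependencies = ["train.py"]
git_files = ["README.md", "train.py"]

option_1 = changed_paths("outputs", outputs, dependencies, git_files)
option_2 = changed_paths("outputs+deps", outputs, dependencies, git_files)
option_3 = changed_paths("all", outputs, dependencies, git_files)
```

With these inputs, train.py first appears under option 2, while files unrelated to the pipeline appear only under option 3.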
