Description
If the purpose of dvc diff is to get a quick look at what has changed in my iteration vs HEAD, or to compare two experiments, it would be really convenient to include the output of git diff --name-only in its output.
E.g. when I run dvc diff baseline-experiment bigrams-experiment, I get output like:
Modified:
auc.metric
data/features/
data/features/test.pkl
data/features/train.pkl
model.pkl
files summary: 0 added, 0 deleted, 4 modified
This is fine, but it can be misleading or not very informative. In this specific case there is also a change in the script train.py itself, which is an essential part of the pipeline. I think it would make sense to see its name here.
Any thoughts, @iterative/engineering?
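To make the idea concrete, here is a minimal sketch of how the Git-side names could be collected and appended to the listing. It only relies on git diff --name-only; the helper names (parse_name_only, git_changed_files, extra_git_changes) are made up for illustration and are not part of DVC.

```python
from __future__ import annotations

import subprocess


def parse_name_only(output: str) -> list[str]:
    """Split `git diff --name-only` output into a list of paths."""
    return [line for line in output.splitlines() if line.strip()]


def git_changed_files(rev_a: str, rev_b: str) -> list[str]:
    """Return paths that differ between two Git revisions."""
    result = subprocess.run(
        ["git", "diff", "--name-only", rev_a, rev_b],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_name_only(result.stdout)


def extra_git_changes(dvc_modified: list[str], git_changed: list[str]) -> list[str]:
    """Git-changed paths that the dvc diff listing does not already mention."""
    already_listed = set(dvc_modified)
    return [path for path in git_changed if path not in already_listed]
```

For the example above, passing the paths from the dvc diff listing together with the result of git_changed_files("baseline-experiment", "bigrams-experiment") would surface train.py as an extra entry, assuming it changed between the two revisions.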
UPDATE: On second thought, I see that this is not even about DVC-tracked (cached) files vs Git-tracked files, but about DVC outputs vs outputs+dependencies vs all files (DVC and Git).
So, three options:
- Only changes to DVC "outputs" (e.g. including the Git-tracked metric file, as in the example above);
- Changes to DVC outputs and dependencies (that we take from DVC-files);
- All Git and DVC files.
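The three options can be sketched as a filter over the set of changed paths. This is only an illustration of the semantics, not a proposal for the actual implementation: Scope and select_changes are hypothetical names, and the lists of outputs and dependencies are assumed to have been read from the DVC-files already.

```python
from enum import Enum


class Scope(Enum):
    OUTPUTS = "outputs"                # option 1: DVC outputs only
    OUTPUTS_AND_DEPS = "outputs+deps"  # option 2: outputs and dependencies
    ALL = "all"                        # option 3: all Git and DVC files


def select_changes(changed, outputs, deps, scope):
    """Return the changed paths that fall within the requested scope, sorted."""
    changed = set(changed)
    if scope is Scope.OUTPUTS:
        keep = changed & set(outputs)
    elif scope is Scope.OUTPUTS_AND_DEPS:
        keep = changed & (set(outputs) | set(deps))
    else:
        keep = changed
    return sorted(keep)


# Hypothetical data: README.md stands in for a file that is neither an
# output nor a dependency of any stage.
changed = ["auc.metric", "model.pkl", "train.py", "README.md"]
outputs = ["auc.metric", "model.pkl"]
deps = ["train.py"]

print(select_changes(changed, outputs, deps, Scope.OUTPUTS_AND_DEPS))
```

With the second option, train.py shows up next to the outputs, while README.md is only reported under the third.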