diff: clean up output for changed files #2982

@dmpetrov

Context:

$ dvc -V
0.77.3+e03136
# all txt files are data files
$ ls
newfile.txt     small.txt       ttt.txt
newfile.txt.dvc small.txt.dvc   ttt.txt.dvc
$ echo qwert >> small.txt
$ dvc add small.txt
$ git add small.txt.dvc
$ git commit -m hello
$ dvc diff HEAD^
dvc diff from ada9d97 to 1a22314  # <<-- "diff ada9d97..1a22314" might be enough

diff for 'small.txt'
-small.txt with md5 5f73f210b6202aa07279cdff6776b64f
+small.txt with md5 b4301ef665d0343e028fcd5821654edc

added file with size 6 Bytes  # <<-- The file was NOT added, it was modified. 6 bytes is the size difference, not the file size.

diff for 'ttt.txt'
-ttt.txt with md5 92a47ab38589702d290647ce24386181
+ttt.txt with md5 92a47ab38589702d290647ce24386181

file size was not changed  # <<-- Why output this at all? Also, it is hard to tell whether this line belongs to the previous diff or the next one.

diff for 'newfile.txt'
-newfile.txt with md5 99370a5386cac00c93c9fdc836076a7d
+newfile.txt with md5 99370a5386cac00c93c9fdc836076a7d

file size was not changed  # <<-- Why output this at all?

Also, it would be great to simplify this output for parsing.

Possible "clean" output:

$ dvc diff HEAD^
diff ada9d97..1a22314
Changed 'small.txt' +6 bytes
md5 5f73f210b6202aa07279cdff6776b64f b4301ef665d0343e028fcd5821654edc
New 'fakefile.txt' +13728494 bytes
md5 92a47ab38589702d290647ce24386181 99370a5386cac00c93c9fdc836076a7d
Removed 'fakefile_2222.txt' -8326436 bytes
md5 e0313660cfc07a10417543b8e4d08bea9 f4daf3bacc9a9494df7d641f572e92b66
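
To illustrate why the proposed format would be easier to parse, here is a minimal Python sketch that reads it. The function name `parse_clean_diff` and the record keys are hypothetical, and the format itself is only the proposal above, not something DVC currently emits.

```python
import re
from typing import Dict, List

# Hypothetical parser for the "clean" format proposed in this issue.
# Matches lines such as: Changed 'small.txt' +6 bytes
HEADER_RE = re.compile(r"^(Changed|New|Removed) '(.+)' ([+-]\d+) bytes$")


def parse_clean_diff(text: str) -> List[Dict[str, object]]:
    """Return one record per file entry found in the proposed output."""
    entries: List[Dict[str, object]] = []
    for raw in text.splitlines():
        line = raw.strip()
        header = HEADER_RE.match(line)
        if header:
            status, path, delta = header.groups()
            entries.append({"status": status, "path": path, "delta": int(delta)})
            continue
        parts = line.split()
        # An "md5 <old> <new>" line belongs to the most recent header line.
        if len(parts) == 3 and parts[0] == "md5" and entries:
            entries[-1]["old_md5"] = parts[1]
            entries[-1]["new_md5"] = parts[2]
    return entries
```

Because every file occupies exactly one header line plus one optional `md5` line, a single regular expression is enough; the current output would also need blank-line and free-text handling.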

Actions:

  • Do not show unchanged files (unless they are requested explicitly).
  • Clearly distinguish added files from modified files, and report the size for each.
  • Improve readability: no empty lines within the output for a single file (preferably no empty lines at all). See #2982 (comment).
  • Simplify the format for parsing. See --porcelain
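
The actions above could be sketched roughly as follows. This is an illustration only: the `Entry` record and `format_diff` function are hypothetical names, not DVC internals, and it assumes each file is described by its old and new md5 and size, with `None` meaning the file is absent on that side.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Entry:
    # Hypothetical record; None for an md5 means "file absent in that revision".
    path: str
    old_md5: Optional[str]
    new_md5: Optional[str]
    old_size: int = 0
    new_size: int = 0


def format_diff(entries: Iterable[Entry], show_unchanged: bool = False) -> List[str]:
    """Format entries one line per file, skipping unchanged files by default."""
    lines: List[str] = []
    for e in entries:
        if e.old_md5 == e.new_md5:
            # Unchanged files are shown only when requested explicitly.
            if show_unchanged:
                lines.append(f"Unchanged '{e.path}'")
            continue
        if e.old_md5 is None:
            status = "New"
        elif e.new_md5 is None:
            status = "Removed"
        else:
            status = "Changed"
        delta = e.new_size - e.old_size  # signed size difference, not file size
        lines.append(f"{status} '{e.path}' {delta:+d} bytes")
    return lines
```

For example, `format_diff([Entry("small.txt", "a", "b", 10, 16)])` yields a single `Changed` line with a `+6` delta and no blank lines.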

EDIT 12/21/19:

The same requirements should apply to directories:

$ dvc diff -t dir1 HEAD^
dvc diff from 025a8b3 to 41d8e27

diff for 'dir1'
-dir1 with md5 7492f47e0a1908a80d942a01702e6b8b.dir
+dir1 with md5 8f8824fbd7b1107ef69c85ab5db82973.dir

4 files untouched, 0 files modified, 1 file added, 0 files deleted, size was increased by 37 Bytes

Actions:

  • Add an option, --porcelain (as in Git) or --to-json, to make the output format easy for scripts to parse.
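
As a rough idea of what a JSON form of the diff might look like, here is a small Python sketch. Neither `--to-json` nor this schema exists in DVC as of this issue; the function name `diff_to_json` and the keys `old`, `new`, and `entries` are assumptions made purely for illustration.

```python
import json
from typing import Dict, List


def diff_to_json(old_rev: str, new_rev: str, entries: List[Dict[str, object]]) -> str:
    """Serialize a diff into a stable, script-friendly JSON document.

    The schema is hypothetical; it only shows that one flat list of
    per-file records is enough for scripts to consume.
    """
    payload = {"old": old_rev, "new": new_rev, "entries": entries}
    # sort_keys keeps the output deterministic across runs.
    return json.dumps(payload, indent=2, sort_keys=True)


if __name__ == "__main__":
    print(
        diff_to_json(
            "ada9d97",
            "1a22314",
            [{"path": "small.txt", "status": "changed", "delta": 6}],
        )
    )
```

A script can then call `json.loads` on the output instead of pattern-matching human-oriented text.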

EDIT 12/24/19:

  • Do not print checksum by default. A separate option might be needed.
  • Support files inside data directories.
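
Per-file reporting inside a data directory could be sketched as a comparison of two listings. The sketch assumes each revision of the directory is available as a mapping from relative path to md5; how DVC actually stores directory listings is not covered here, and `diff_manifests` is a hypothetical name.

```python
from typing import Dict, List


def diff_manifests(old: Dict[str, str], new: Dict[str, str]) -> Dict[str, List[str]]:
    """Classify files in a directory by comparing path -> md5 mappings."""
    return {
        "added": sorted(p for p in new if p not in old),
        "deleted": sorted(p for p in old if p not in new),
        "modified": sorted(p for p in old if p in new and old[p] != new[p]),
        "untouched": sorted(p for p in old if p in new and old[p] == new[p]),
    }
```

The lengths of the four lists give the same counts as the summary line in the directory example (untouched, modified, added, deleted), while the lists themselves name the individual files.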

Labels

bug, p1-important, product: VSCode, research, ui
