Skip to content

Conversation

@ghost
Copy link

@ghost ghost commented Jan 24, 2020

Close: #2982, #2998, #2999, #1893, #2372
Related: #770

  • Shows the difference between HEAD and the current workspace by default
  • Don't show files that stayed the same between both trees.
  • Remove --target option.
  • Checksums are only displayed when --checksum option is provided.
  • Format diff for machine readability when --show-json option is given.
  • Changes in output:
    • Remove file size information
    • Recurse directory outputs and display its cached information.
    • Group diff by state (i.e. "Added", "Deleted", "Modified")
  • Allow running dvc diff when there's no cache available.
  • Update completion scripts
  • Update documentation (diff: update docs according to the new patch dvc.org#953)

Questions:

  • Should we introduce a --recursive option to explicitly recurse directory outputs?
  • Should we support renamed files?

@ghost ghost requested a review from dmpetrov January 24, 2020 07:13
@efiop
Copy link
Contributor

efiop commented Jan 24, 2020

@MrOutis What is the point of these tests if they are all commented out and TODos?

@efiop
Copy link
Contributor

efiop commented Jan 24, 2020

As for the CLI, I was thinking on using the following as output:
Would it be OK to show checksums only when --checksums (new option) is specified?

I think the ticket was saying that we shouldn't show checksums by default. Wouldn't bother with introducing --checksums until someone asks. It should be present in json output though.

Suor
Suor previously requested changes Jan 24, 2020
Copy link
Contributor

@Suor Suor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks cool so far.

@ghost ghost changed the title diff: reimplement interface and tests from scratch [WIP] diff: reimplement interface and tests from scratch Jan 24, 2020
* OSError -> FileNotFoundError
* Make `a_ref` optional
* Use absolute imports for unrelated packages
* Relpath: `str(output.path_info)` -> `str(output)`
* `a_ref` defaults to `HEAD` at the signature and parser level
@ghost ghost changed the title [WIP] diff: reimplement interface and tests from scratch diff: reimplement interface and tests from scratch Jan 26, 2020
@ghost ghost requested review from Suor and efiop January 26, 2020 21:16
@ghost ghost dismissed Suor’s stale review January 26, 2020 21:16

@Suor, could you please take a look again? I did quite a lot of changes)

Suor
Suor previously requested changes Jan 28, 2020
cls._print_size(dct[diff.DIFF_SIZE])
)
return msg
for state, entries in diff.items()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The order of dict keys are not preserved before Python 3.6, so in Python 3.5 the sections will go in random order.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's why the py3.5 test suite was failing! 🙈

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now iterating with an explicit order

dvc/repo/diff.py Outdated
old = outs[a_ref]
new = outs[b_ref or "working tree"]

added = new - old
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are abusing set mechanics and __eq__ here and below, this results into iterating over intersection very tricky and unobvious. People reading this code will expect it to conform to:

old - (old ^ new) == new - (old ^ new) == old & new

which it doesn't. And we don't need this anyway as we may simply use dicts:

saved_tree = repo.tree
try:
    repo.tree = get_ree(a_ref)
    # we don't have repo.outs, but might be we should, 
    # or make it repo.iter_outs()?
    old = {str(out): out.checksum for out in repo.outs}
    repo.tree = get_ree(b_ref)
    new = {str(out): out.checksum for out in repo.outs}
finally:
    repo.tree = saved_tree

# set efficiently converts dict keys to set, we only compare names here
added = {name: new[name] for name in set(new) - set(old)}
deleted = {name: old[name] for name in set(old) - set(new)}
changed = {
    name: {"old": old[name], "new": new[name]} 
    for name in set(old) & set(new) 
    if old[name] != new[name]
}

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was mind-bending 👌 thanks for pointing it out, @Suor !
just added it)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

however, instead of repo.iter_outs, I defined a method to return the dictionary with paths and checksums, might refactor that one later

@efiop efiop assigned ghost Jan 28, 2020
@ghost ghost changed the title diff: reimplement interface and tests from scratch [WIP] diff: reimplement interface and tests from scratch Jan 29, 2020
@ghost ghost changed the title [WIP] diff: reimplement interface and tests from scratch diff: reimplement interface and tests from scratch Jan 30, 2020
@ghost ghost requested a review from Suor January 30, 2020 00:57
Copy link
Contributor

@dmpetrov dmpetrov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've done testing. New issues were created for features outside of the scope of this change: #3256, #3255. Great communication during the process!

LGTM 👍

@efiop
Copy link
Contributor

efiop commented Jan 30, 2020

@MrOutis please check the tests, looks like there are some dict ordering issues.

@ghost
Copy link
Author

ghost commented Jan 30, 2020

@efiop , doing it right now

Mr. Outis and others added 2 commits January 30, 2020 09:26
@ghost ghost dismissed Suor’s stale review January 30, 2020 16:01

@Suor, I changed the diff logic to use dicts

@efiop efiop merged commit fd1e6f4 into treeverse:master Jan 30, 2020
@ghost ghost deleted the enhance-diff branch January 30, 2020 17:28
@jorgeorpinel
Copy link
Contributor

What's the fastest way to see if this is released or get notified when it is? I don't think it's included in 0.82.6, for example. Thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

diff: clean up output for changed files

4 participants