-
Notifications
You must be signed in to change notification settings - Fork 1.3k
dvc: retain formatting and comments in .dvc files #1885
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Hi @Suor ! Just a heads up: tests for your PR are failing on travis and appveyor. Please take a look. |
|
For the record: talked in private and decided that there is no need to add explicit docs for this, since this behavior was expected from day 1 and we didn't document anywhere that we dropped comments before. |
|
Good stuff. 👍 Minor comments, I would add a test to check that if we put a comment into a stage file it does not modify its md5 - dvc repro does not run. |
4639b73 to
72b02de
Compare
dvc/stage.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is leaking fds not considered bad in Python 3? 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
repo.tree.open() might return StringIO, which acts as context manager in Python 3, but not in Python 2.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When dropping Python 2 we will just remove closing() leaving with block there. Comments like this will help easily find code to change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Suor Maybe we could return closing() in the tree itself then and leave a note about it there, instead of copying it everywhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using closing in a tree will prevent normal use of .open(). We may provide our subclasses of StringIO and BytesIO for Python 2 to make this transparently work everywhere, but is not worth the effort.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Suor Should be pretty simple to wrap them in compat.py, or am I missing something? At least we won't have to go looking for "is only needed for Python 2.7" comments and will have everything contained within compat.py.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The value of this change is around zero, we just hide a bit of complexity in other file, we grow overall complexity so I say it has negative value. It might had payed off if we used .tree.open() frequently, but most probably we will have those two uses next 8 months only.
So the best strategy is to wait 8 months and then remove this code with Python 2.7 support.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Suor Again, that other file is a special file where we hide all py2/3 compatibility wrappers. Removing it is much easier than going around searching for your duplicated comments around the code, that don't even mention anything about StringIO. How do you imagine anyone would read your # NOTE: closing() is only needed for Python 2.7 comment in a year and understand that it was about StringIO not having exit in python2? I'm not sure what we are fighting for, is it so hard to add something like
class StringIO(StringIO.StringIO):
def __enter__(self):
return self
def __exit__(self, *args):
self.close()
to compat.py and be done with it? Or am I missing something here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This all might be right in the long run, but there won't be one. Right now this is useless or negative value change, which should not be even discussed.
As for comment, anyone can search through project for "Python 2" and "Python 3" and remove closing(), he won't need to understand anything about StringIO. And whoever will make Python 2 drop will need to run that search anyway and that will take less time than doing senseless changes now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Suor 1 year is a long enough run to not create problems for ourselves by polluting code with nondescriptive comments. This is why when some piece of code is used more than once, in all other places we use compat.py and not copypasting NOTE: only needed for Python2 all over, which we would have to go looking for later. We do need to understand that closing is only needed for StringIO, because from your current comment it is not clear and people might get an impression that there are other things that need that. By adding a StringIO wrapper we mitigate the problem once and for all, and if anyone uses tree.open later he won't have to be aware of having to use closing() with it, because our wrapped StringIO would already be compatible with with.
|
@Suor Looks really good! A few minor comments above. |
tests/unit/utils/test_collections.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still not testing that ValueError that you are raising in apply_diff. 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test would have no value. This is not target behavior to test, this will be simple code duplication.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't even know if that code would work without at least unit testing it. We should either test it or remove it and/or replace it with an assert.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not code implementing any desirable behavior, we don't really care what it does as long as it's loud. And I am pretty sure that raise will work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Say I add code to test for ValueError, later I will change it to TypeError and test would be needed to change the same. A need for coordinated changes is a symptom of code duplication.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed this to assert, time will tell who was right)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Suor Thanks! 🙂 But now your assert doesn't match your ValueError that you've had before and has additional logic in it. It used to be:
if isinstance(src, Mapping) and isinstance(dest, Mapping):
...
elif isinstance(src, Seq) and isinstance(dest, Seq):
...
else:
raise ValueError()
and I meant that you could use something like
if isinstance(src, Mapping) and isinstance(dest, Mapping):
...
elif isinstance(src, Seq) and isinstance(dest, Seq):
...
else:
assert False
if you don't wish to test it. In this form, this is a simple defensive assert and there is no additional logic in the assert itself. And now you have
assert is_same_type(src, dst)
if isinstance(src, Mapping) and isinstance(dest, Mapping):
...
elif isinstance(src, Seq) and isinstance(dest, Seq):
...
which is not equivalent to your initial logic, right? And now your assert has additional logic in it, but at least it is not dead code and is actually run during normal operation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My asserts do much. Yours don't test for src and dest being both strings or bools.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New version of the code is still worse. There should be else and raise ValueError, this will guarantee that condition match is exhaustive and error type would make sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your initial ValueError didn't test for that either and I was just proposing to use assert instead if you are so unwilling to test your ValueError.
|
@Suor Sorry, might've missed it, but have you added a test that @shcheklein requested in #1885 (comment) ? |
|
btw, guys, does this change affect this issue - #995 ? In what way? |
|
@shcheklein this doesn't touch #995. |
As part of this I started writing pytest-like tests and added commonly used fixtures.
Also test that line comments after md5 fields are retained.
It used to encode in system default encoding. Not sure if PyYAML was handling this silently or we were just lucky all the way.
efiop
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! 🚀
ghost
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the late review 😅
I left you some comments @Suor
|
Since this is already merged, after-review resolution comes in #1916. |
This PR does several things:
PyYAMLwithruamel.yamlload_stage_file()anddump_stage_file()is used everywhere.NOTE: This does not include support for custom fields in stage files. That would go in separate PR.