Skip to content

Conversation

@Suor
Copy link
Contributor

@Suor Suor commented Apr 15, 2019

This PR does several things:

  • ensures comments and custom string layout in stage files are retained in the load/dump cycle.
  • replaces PyYAML with ruamel.yaml
  • fixes broken abstraction of saving/loading dvc files directly via yaml lib, mostly in tests. Now load_stage_file() and dump_stage_file() is used everywhere.
  • adds pytest like fixtures to write functional tests with test dir
  • ensures utf-8 is always used in stage files regardless of OS default encoding
  • ensures stage files are always properly closed

NOTE: This does not include support for custom fields in stage files. That would go in separate PR.

@efiop
Copy link
Contributor

efiop commented Apr 15, 2019

Hi @Suor !

Just a heads up: tests for your PR are failing on travis and appveyor. Please take a look.

@Suor Suor changed the title dvc: retain formatting and comments in .dvc files WIP [WIP] dvc: retain formatting and comments in .dvc files Apr 15, 2019
@efiop
Copy link
Contributor

efiop commented Apr 15, 2019

For the record: talked in private and decided that there is no need to add explicit docs for this, since this behavior was expected from day 1 and we didn't document anywhere that we dropped comments before.

@shcheklein
Copy link
Contributor

Good stuff. 👍 Minor comments, I would add a test to check that if we put a comment into a stage file it does not modify its md5 - dvc repro does not run.

@Suor Suor force-pushed the dvc-formatting branch 2 times, most recently from 4639b73 to 72b02de Compare April 16, 2019 11:49
@Suor Suor changed the title [WIP] dvc: retain formatting and comments in .dvc files dvc: retain formatting and comments in .dvc files Apr 16, 2019
dvc/stage.py Outdated
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is leaking fds not considered bad in Python 3? 🙂

Copy link
Contributor Author

@Suor Suor Apr 17, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

repo.tree.open() might return StringIO, which acts as context manager in Python 3, but not in Python 2.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When dropping Python 2 we will just remove closing() leaving with block there. Comments like this will help easily find code to change.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Suor Maybe we could return closing() in the tree itself then and leave a note about it there, instead of copying it everywhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using closing in a tree will prevent normal use of .open(). We may provide our subclasses of StringIO and BytesIO for Python 2 to make this transparently work everywhere, but is not worth the effort.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Suor Should be pretty simple to wrap them in compat.py, or am I missing something? At least we won't have to go looking for "is only needed for Python 2.7" comments and will have everything contained within compat.py.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The value of this change is around zero, we just hide a bit of complexity in other file, we grow overall complexity so I say it has negative value. It might had payed off if we used .tree.open() frequently, but most probably we will have those two uses next 8 months only.

So the best strategy is to wait 8 months and then remove this code with Python 2.7 support.

Copy link
Contributor

@efiop efiop Apr 18, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Suor Again, that other file is a special file where we hide all py2/3 compatibility wrappers. Removing it is much easier than going around searching for your duplicated comments around the code, that don't even mention anything about StringIO. How do you imagine anyone would read your # NOTE: closing() is only needed for Python 2.7 comment in a year and understand that it was about StringIO not having exit in python2? I'm not sure what we are fighting for, is it so hard to add something like

class StringIO(StringIO.StringIO):
    def __enter__(self):
        return self

    def __exit__(self, *args):
        self.close()

to compat.py and be done with it? Or am I missing something here?

Copy link
Contributor Author

@Suor Suor Apr 18, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This all might be right in the long run, but there won't be one. Right now this is useless or negative value change, which should not be even discussed.

As for comment, anyone can search through project for "Python 2" and "Python 3" and remove closing(), he won't need to understand anything about StringIO. And whoever will make Python 2 drop will need to run that search anyway and that will take less time than doing senseless changes now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Suor 1 year is a long enough run to not create problems for ourselves by polluting code with nondescriptive comments. This is why when some piece of code is used more than once, in all other places we use compat.py and not copypasting NOTE: only needed for Python2 all over, which we would have to go looking for later. We do need to understand that closing is only needed for StringIO, because from your current comment it is not clear and people might get an impression that there are other things that need that. By adding a StringIO wrapper we mitigate the problem once and for all, and if anyone uses tree.open later he won't have to be aware of having to use closing() with it, because our wrapped StringIO would already be compatible with with.

@efiop
Copy link
Contributor

efiop commented Apr 16, 2019

@Suor Looks really good! A few minor comments above.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still not testing that ValueError that you are raising in apply_diff. 🙂

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test would have no value. This is not target behavior to test, this will be simple code duplication.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't even know if that code would work without at least unit testing it. We should either test it or remove it and/or replace it with an assert.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not code implementing any desirable behavior, we don't really care what it does as long as it's loud. And I am pretty sure that raise will work.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Say I add code to test for ValueError, later I will change it to TypeError and test would be needed to change the same. A need for coordinated changes is a symptom of code duplication.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed this to assert, time will tell who was right)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Suor Thanks! 🙂 But now your assert doesn't match your ValueError that you've had before and has additional logic in it. It used to be:

if isinstance(src, Mapping) and isinstance(dest, Mapping):
    ...
elif isinstance(src, Seq) and isinstance(dest, Seq):
    ...
else:
    raise ValueError()

and I meant that you could use something like

if isinstance(src, Mapping) and isinstance(dest, Mapping):
    ...
elif isinstance(src, Seq) and isinstance(dest, Seq):
    ...
else:
    assert False

if you don't wish to test it. In this form, this is a simple defensive assert and there is no additional logic in the assert itself. And now you have

assert is_same_type(src, dst)
if isinstance(src, Mapping) and isinstance(dest, Mapping):
    ...
elif isinstance(src, Seq) and isinstance(dest, Seq):
    ...

which is not equivalent to your initial logic, right? And now your assert has additional logic in it, but at least it is not dead code and is actually run during normal operation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My asserts do much. Yours don't test for src and dest being both strings or bools.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New version of the code is still worse. There should be else and raise ValueError, this will guarantee that condition match is exhaustive and error type would make sense.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your initial ValueError didn't test for that either and I was just proposing to use assert instead if you are so unwilling to test your ValueError.

@efiop
Copy link
Contributor

efiop commented Apr 17, 2019

@Suor Sorry, might've missed it, but have you added a test that @shcheklein requested in #1885 (comment) ?

@shcheklein
Copy link
Contributor

btw, guys, does this change affect this issue - #995 ? In what way?

@Suor
Copy link
Contributor Author

Suor commented Apr 18, 2019

@shcheklein this doesn't touch #995.

@Suor Suor mentioned this pull request Apr 18, 2019
@Suor Suor mentioned this pull request Apr 19, 2019
Copy link
Contributor

@efiop efiop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! 🚀

@efiop efiop merged commit 0c86739 into treeverse:master Apr 19, 2019
Copy link

@ghost ghost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the late review 😅
I left you some comments @Suor

@Suor Suor mentioned this pull request Apr 23, 2019
@Suor
Copy link
Contributor Author

Suor commented Apr 23, 2019

Since this is already merged, after-review resolution comes in #1916.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants