Resume import #1540
Branch updated from d1b9126 to 314629a.
dvc/remote/http.py
tmp_file creates a random file name. See:
return fname + '.' + str(uuid.uuid4())
So you won't be able to find your previous unfinished download.
Your test doesn't catch this bug because you patch the return value of tmp_file, and you shouldn't do that. You should use something like fname + '.part' instead of self.tmp_file(fname).
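A minimal sketch of the difference, assuming a uuid-suffixed helper like the one quoted above. The functions `tmp_file` and `partial_file` here are hypothetical stand-ins, not DVC's actual code.

```python
import uuid


def tmp_file(fname):
    # Mirrors the quoted helper: a new random suffix on every call,
    # so a later run cannot reconstruct the name of an earlier partial file.
    return fname + '.' + str(uuid.uuid4())


def partial_file(fname):
    # Hypothetical deterministic alternative suggested in the review:
    # the same target always maps to the same partial path.
    return fname + '.part'


if __name__ == '__main__':
    print(tmp_file('data.csv') == tmp_file('data.csv'))          # False
    print(partial_file('data.csv') == partial_file('data.csv'))  # True
```

Patching `tmp_file` to return a fixed value in a test hides exactly this difference, which is why the bug slips through.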
Actually, I think I will be able to find the tmp file. See _existing_tmp(target_file):
targt_basename = os.path.basename(target_file)
if targt_basename in file:
return file
I check whether the basename of target_file is contained in a file's name and, if so, return that file. I have to admit this is what I wanted to ask: will such a check be enough for us? It certainly narrows how the resume option can be used.
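To illustrate the concern, here is a self-contained sketch of a substring-based lookup in the spirit of _existing_tmp. It assumes the helper scans the target's directory; that scan and the name `find_existing_tmp` are hypothetical, since only the matching lines are quoted. Any file whose name merely contains the target's basename matches, including unrelated files.

```python
import os
import tempfile


def find_existing_tmp(target_file):
    # Hypothetical reconstruction: return the first file in the target's
    # directory whose name contains the target's basename as a substring.
    directory = os.path.dirname(os.path.abspath(target_file))
    target_basename = os.path.basename(target_file)
    for name in sorted(os.listdir(directory)):
        if target_basename in name:
            return os.path.join(directory, name)
    return None


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as d:
        # An unrelated file that happens to contain "data.csv" in its name.
        open(os.path.join(d, 'mydata.csv'), 'w').close()
        match = find_existing_tmp(os.path.join(d, 'data.csv'))
        print(os.path.basename(match))  # mydata.csv
```

A deterministic partial name compared for equality avoids such false positives.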
For the record: discussed this during our meeting.
efiop left a comment
Looks great! Could you please introduce a --continue flag in this PR to make this behaviour non-default, as we've discussed previously? Also, a minor comment on the tests.
tests/test_import.py
Let's check the return value as well, just to be sure. Same for main() above.
efiop left a comment
Looks good! A few more comments down below.
dvc/command/imp.py
I'm a bit worried about using -r here, since it is used for -r|--remote in other commands. How about we keep only the long option --resume for now?
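A minimal argparse sketch of an opt-in, long-only flag. The parser, its `prog` name, and the positional arguments are hypothetical and not DVC's actual CLI definition; only the `--resume` flag itself comes from the review.

```python
import argparse


def build_parser():
    # Hypothetical parser for an "import" command; only the flag matters here.
    parser = argparse.ArgumentParser(prog='import')
    parser.add_argument('url')
    parser.add_argument('out')
    # Long option only, so it cannot clash with -r|--remote used elsewhere.
    # store_true keeps resuming opt-in: it is False unless explicitly given.
    parser.add_argument('--resume', action='store_true', default=False,
                        help='Resume a previously interrupted download.')
    return parser


if __name__ == '__main__':
    args = build_parser().parse_args(['--resume', 'http://example.com/f', 'f'])
    print(args.resume)  # True
```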
dvc/project.py
Let's not pass resume here, but rather pass it to stage.run() below.
dvc/remote/http.py
minor: why do we use __ (double underscore) instead of _ (single underscore) for private methods in some places?
There is no reason for that; I'll fix it.
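For context on why a single underscore is usually preferred: a double-underscore method name is mangled by Python to `_ClassName__name`, which mainly exists to avoid clashes in subclasses and makes overriding or patching the method in tests more awkward. The class below is a hypothetical example, not DVC code.

```python
class Example:
    def _single(self):
        # Single underscore: private by convention only, name unchanged.
        return 'single'

    def __double(self):
        # Double underscore: Python mangles this to _Example__double.
        return 'double'


e = Example()
print(e._single())            # single
print(e._Example__double())   # double
print(hasattr(e, '__double'))  # False
```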
dvc/remote/http.py
minor: present -> existing
dvc/remote/http.py
We should probably call fs.flush() (or its Python equivalent) right after that, to ensure each CHUNK_SIZE chunk is written as soon as possible and the file-size check won't fail the next time we run with --resume.
Maybe consider a smaller chunk size, something reasonable that the OS can flush more or less atomically. It should probably be a power of 2 as well, i.e. some number of filesystem blocks.
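A sketch of flushing after every chunk, assuming chunks arrive as an iterable of bytes. `write_chunks` and the 64 KiB value are hypothetical choices that follow the power-of-2 suggestion. Note that `flush()` only hands Python's buffer to the OS, while `os.fsync()` asks the OS to persist it; this narrows, but does not eliminate, the window in which a crash leaves a half-written chunk.

```python
import os

# Hypothetical value: 64 KiB, a power of 2 and a multiple of common 4 KiB blocks.
CHUNK_SIZE = 64 * 1024


def write_chunks(chunks, partial_path):
    """Append each chunk to partial_path and push it to disk before the next one."""
    written = 0
    with open(partial_path, 'ab') as fobj:
        for chunk in chunks:
            fobj.write(chunk)
            # flush() moves Python's buffer into the OS; fsync() asks the OS
            # to persist it, so the on-disk size tracks completed chunks.
            fobj.flush()
            os.fsync(fobj.fileno())
            written += len(chunk)
    return written
```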
dvc/remote/http.py
Maybe I'm missing something, but why do we check the size of target_file here rather than the partial one?
Good point, sorry!
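A sketch of deriving the resume offset from the partial file, assuming the server honours standard HTTP range requests (a `206 Partial Content` reply; a plain `200` means the range was ignored and the download must restart). `resume_headers` is a hypothetical helper, not DVC's implementation.

```python
import os


def resume_headers(partial_path):
    """Return (offset, headers) for resuming a download into partial_path."""
    # Size the *partial* file: that is what was actually downloaded so far.
    # The final target file does not exist yet during an interrupted download.
    offset = os.path.getsize(partial_path) if os.path.exists(partial_path) else 0
    # Standard HTTP range request for everything from offset onward.
    headers = {'Range': 'bytes={}-'.format(offset)} if offset else {}
    return offset, headers
```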
tests/utils/httpd.py
Is it used in tests?
Yes. The problem with the static server is that even after we close and shut down the socket and then the server, the port stays bound for some unspecified amount of time. We can solve this in two ways:
- draw a different port each time we use the server
- retry binding a few times
@efiop and I decided to go with the second approach.
Do you think I should change that?
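The chosen retry approach could look roughly like this. `bind_with_retries` and its defaults are hypothetical; a lingering socket typically makes `bind()` fail with `EADDRINUSE`, which Python raises as `OSError`.

```python
import socket
import time


def bind_with_retries(port, attempts=5, delay=0.5):
    """Bind a listening TCP socket on 127.0.0.1, retrying while the port is busy."""
    last_error = None
    for _ in range(attempts):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind(('127.0.0.1', port))
            sock.listen()
            return sock
        except OSError as exc:
            # Usually EADDRINUSE while the previous socket still holds the port.
            sock.close()
            last_error = exc
            time.sleep(delay)
    raise last_error
```

The worst-case wait is roughly `attempts * delay`, which is why a long retry budget makes every test slow when the port is genuinely taken.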
I wonder if there are libraries that help stub/mock an HTTP API, something similar to https://github.com/bblimke/webmock. It's good practice to mock/stub HTTP.
Good point. Do you think we should solve it in the current task, or make a new one?
@pared It's not urgent, obviously. I would definitely take a look, though, at how long it would take to mock this network-related stuff. I'm worried that requiring a server on port 8000 is not very reliable. For example, what happens if I'm already debugging some app on 8000? Each test will then wait for 100 seconds, which is not what I would expect.
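For comparison, the "draw a different port" option avoids a fixed port entirely: binding to port 0 lets the OS pick any free port, and the test reads it back from `server_address`. This is a sketch using only the standard library (`directory=` needs Python 3.7+); `start_static_server` is a hypothetical helper, not the project's test utility.

```python
import functools
import http.server
import threading


def start_static_server(directory='.'):
    """Serve `directory` on a free port chosen by the OS; return (server, port)."""
    handler = functools.partial(http.server.SimpleHTTPRequestHandler,
                                directory=directory)
    # Port 0 asks the OS for any free port, so a busy 8000 cannot collide.
    server = http.server.HTTPServer(('127.0.0.1', 0), handler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server, server.server_address[1]


if __name__ == '__main__':
    server, port = start_static_server()
    print('serving on port', port)
    server.shutdown()
    server.server_close()
```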
shcheklein left a comment
A few questions to clarify, and a few suggestions for improvement.
@shcheklein @efiop please re-review.
@pared Could you please rebase on top of
@efiop No problem, I had to squash it anyway :)
shcheklein left a comment
LGTM! Minor suggestion: in def _validate_existing_file_size(self, bytes_transferred, target_file), rename target_file to partial_file (or just path). Otherwise it's still a little confusing.
efiop left a comment
Looks great! Thank you! 🚀
How do I use this feature? I'm downloading a lot of large files using DVC, and I have to start all over if my laptop goes into sleep mode.
The feature is no longer available; it was dropped in #2275. Feel free to open a new feature request for supporting it.
Fixes #108