Skip to content

Conversation

@pared
Copy link
Contributor

@pared pared commented Jan 28, 2019

Fixes #108

@pared pared force-pushed the 108 branch 2 times, most recently from d1b9126 to 314629a Compare January 28, 2019 22:02
@pared pared requested a review from efiop January 29, 2019 13:21
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tmp_file creates a random file name. See:

return fname + '.' + str(uuid.uuid4())

So you won't be able to find your previous unfinished download.
Your test doesn't catch this bug because you patch return value of tmp_file and you shouldn't do that. You should use something like fname + '.part' instead of self.tmp_file(fname).

Copy link
Contributor Author

@pared pared Jan 29, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I think I will be able to find tmp file. See _existing_tmp(target_file):

            targt_basename = os.path.basename(target_file)
            if targt_basename in file:
                return file

I check if target_file is in name of file and, if so, it returns it. I have to admit that this is what I wanted to ask: Will such check be enough for us? It surely narrows usage of resume option.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the record: discussed this during our meeting.

Copy link
Contributor

@efiop efiop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Could you please introduce --continue flag in this PR to make this behaviour non-default as we've discussed previously? Also a minor comment in the tests.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's check return value as well, just to be sure. Same in the main() above.

Copy link
Contributor

@efiop efiop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! A few more comments down below.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit worried about using -r here, since it is used for -r|--remote in other commands. How about we only leave a long option --resume for it for now?

dvc/project.py Outdated
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's not pass resume here, but rather pass it to stage.run() below.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: why do we use __ (double underscore) instead of _ (single underscore) for private methods in some places?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no reason for that, ill fix that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: present -> existing

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should probably do fs.flush() (or something similar in python) right after that to ensure CHUNK SIZE is written asap and file check won't fail next time we run it with --resume

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

may be think about doing a smaller chunk size, something reasonable so OS can more or less atomically flush it, it should be probably power of 2 as well - some number of fs blocks

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

may be I'm missing something but why do we check the target_file size here, not the partial one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, sorry!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it used in tests?

Copy link
Contributor Author

@pared pared Feb 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, problem with static server, is that even that we close and shutdown socket, then server, the port is still binded for some unspecified amount of time. We can solve it two ways:

  • each time we use server draw different port
  • try to bind it few times

Me and @efiop decided to go with the second approach.
Do you think I should change that?

Copy link
Contributor

@shcheklein shcheklein Feb 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if there are libraries that help you stub/mock HTTP API, some thing similar to https://github.com/bblimke/webmock . It a good practice to mock/stub HTTP.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Do you think we should solve it in current tak, or make a new one?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pared it's not urgent, obviously. I would definitely take a look though at how long would take to mock this network related stuff. I'm worried that requiring a server on 8000 is not very reliable. For example, what will happen if I'm already debugging some app on 8000? It'll wait for 100 seconds each test, not what I would expect.

Copy link
Contributor

@shcheklein shcheklein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few questions to clarify, few improvement suggestions

@pared
Copy link
Contributor Author

pared commented Feb 3, 2019

@shcheklein @efiop please rereview

@efiop
Copy link
Contributor

efiop commented Feb 3, 2019

@pared Could you please rebase on top of master? Had to push some style changes once again, sorry for the inconvenience. 🙁

@pared
Copy link
Contributor Author

pared commented Feb 3, 2019

@efiop no problem, had to squash it anyway :)

Copy link
Contributor

@shcheklein shcheklein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm! Minor suggestion: def _validate_existing_file_size(self, bytes_transferred, target_file) - rename target_file -> partial_file (or just path). Otherwise it's still confusing a little bit.

Copy link
Contributor

@efiop efiop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Thank you! 🚀

@efiop efiop changed the title [WIP] Resume import Resume import Feb 4, 2019
@efiop efiop merged commit 3d07c0a into treeverse:master Feb 4, 2019
@pared pared deleted the 108 branch February 27, 2019 12:22
@jpaasen
Copy link

jpaasen commented Jul 4, 2022

How do I use this feature? I'm downloading a lot of large files using DVC, and I have to start all over if my laptop goes into sleep mode.

@daavoo
Copy link
Contributor

daavoo commented Jul 4, 2022

How do I use this feature? I'm downloading a lot of large files using DVC, and I have to start all over if my laptop goes into sleep mode.

The feature is no longer available. It was dropped in #2275 . Feel free to open a new feature request for supporting it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants