remote: gs/s3: remove batch_exists #2375
Conversation
For the record: as we've discussed in PMs, while we are at it, let's try to get rid of batch_exists, since we now have connection pools and no longer need it.

@pared Could you please check how this affects performance?

Sure, I'll prepare some benchmarks.

@efiop Prepared a repo script and a timing script. Average execution time for 5 runs: … EDIT: I'll run a more extensive test.
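The actual benchmark scripts aren't preserved in this thread; a hypothetical timing harness along those lines might look like the following (the command and run count are assumptions based on the comment above):

```python
# Hypothetical timing harness: time "dvc status -c" (the command
# that exercises the remote existence checks) over several runs.
import statistics
import subprocess
import time

RUNS = 5

def one_run():
    start = time.monotonic()
    subprocess.run(["dvc", "status", "-c"], check=True,
                   stdout=subprocess.DEVNULL)
    return time.monotonic() - start

timings = [one_run() for _ in range(RUNS)]
print("Average execution time for {} runs: {:.2f}s".format(
    RUNS, statistics.mean(timings)))
```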
This should be much worse for ssh. You dropped using many SFTP sessions per connection, which was a significant optimization.

P.S. What was the point of batch_exists for gs/s3 in the first place? We could drop only those while still having batch_exists for ssh.

@pared did you set …?

Also looks like …
Can't we use a pool there too, the same way we do for pull?

It was mainly because of batch_exists for ssh.
It is enabled by default.

Not sure what you mean. Add a pool of SFTP connections in each SSH connection? That will require special handling anyway, like …
@Suor Before the connection pool, we had a problem that we were limited by ~4 SSH connections, so we started using batch_exists, which multiplied those 4 by 8 SFTP connections. With the connection pool in place, we reuse already opened connections, which is probably why the tests show only a small performance degradation.
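For context, the "4 SSH × 8 SFTP" optimization described here can be sketched with paramiko (which DVC's SSH remote builds on): several SFTP channels multiplexed over one SSH transport. Host, credentials, and paths below are placeholders, not DVC's real code.

```python
# Sketch: one SSH transport carrying several SFTP sessions, so a
# handful of connections can serve many concurrent workers.
from concurrent.futures import ThreadPoolExecutor

import paramiko

transport = paramiko.Transport(("cache-host.example.com", 22))
transport.connect(username="user", password="secret")

# Each from_transport() call opens its own channel, multiplexed
# over the single underlying SSH connection.
sftp_sessions = [paramiko.SFTPClient.from_transport(transport)
                 for _ in range(8)]

def batch_exists(sftp, batch):
    out = []
    for path in batch:
        try:
            sftp.stat(path)          # raises if the file is missing
            out.append(True)
        except IOError:
            out.append(False)
    return out

paths = ["/cache/2f/%06d" % i for i in range(1000)]  # placeholder
batches = [paths[i::8] for i in range(8)]            # one batch per session

with ThreadPoolExecutor(max_workers=8) as executor:
    results = list(executor.map(batch_exists, sftp_sessions, batches))
```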
@efiop we are still limited by 4 SSH connections with or without the pool; if the SSH server has many CPUs, that should not be enough.

Average execution time for 50 repeats: …

@Suor but because of the pool, workers can reuse already opened SSH connections instead of opening new ones for each batch and then multiplexing SFTP.
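A minimal sketch of that pooling behavior (illustrative names, not DVC's actual classes):

```python
import queue

class ConnectionPool:
    """Hand out already-opened connections; open a new one only
    when no idle connection is available (illustrative sketch)."""

    def __init__(self, connect):
        self._connect = connect      # factory that opens a connection
        self._idle = queue.Queue()   # connections not currently in use

    def get(self):
        try:
            return self._idle.get_nowait()  # reuse an idle connection
        except queue.Empty:
            return self._connect()          # or open a fresh one

    def release(self, conn):
        self._idle.put(conn)                # return it for reuse
```

Workers call `get()` before each batch and `release()` after, so the expensive SSH handshake happens only as many times as there are truly concurrent workers.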
@Suor I've got 12. I'll limit and retry the tests.

@pared then something looks wrong: CPUs are not used properly by current master. Maybe it's I/O bound for you.

@Suor maybe I should try a "real" case? Like an SSH cache on a different physical machine?

@pared you can try, it will add network lag at least, which might also make the number of threads more important.

BTW, using …
It is known, not using …

Tried the same bench scenario with jobs=1 and jobs=2. Looks like there is almost no difference, at least against local SSH.
dvc/remote/base.py
Remove list() call here.
done
So the benches for me: current - 50s, …

Ok, so I tested with big latency by checking the status from SF to India, and got 31m vs 1h+ (couldn't wait longer, and the progress bar is broken on master). So it looks like we do need the SFTP pool too 🙁
I've also noticed that it spends around 10 minutes before even checking the remote, so there might be something else broken. Need to investigate.

Ok, guys, how about we re-define cache_exists for SSH?

@efiop I'll restore the previous version of cache_exists for SSH then.
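Roughly, the batched SSH implementation being restored looks like this sketch; the helpers `self.ssh()` (borrowing a pooled SSH connection) and `checksum_to_path()` are assumed names, not DVC's exact code:

```python
from concurrent.futures import ThreadPoolExecutor

def chunks(items, jobs):
    """Split items into one batch per job."""
    size = max(1, len(items) // jobs)
    return [items[i:i + size] for i in range(0, len(items), size)]

def cache_exists(self, checksums, jobs=8):
    checksums = list(checksums)
    batches = chunks(checksums, jobs)

    def batch_exists(batch):
        # Each worker checks its whole batch over one borrowed
        # connection, instead of one round-trip per checksum.
        with self.ssh() as ssh:
            return [c for c in batch
                    if ssh.exists(self.checksum_to_path(c))]

    with ThreadPoolExecutor(max_workers=jobs) as executor:
        found = executor.map(batch_exists, batches)
    return [c for batch in found for c in batch]
```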
Force-pushed from dad1a4e to fadf47c
```python
progress_callback = ProgressCallback(len(checksums))

def exists_with_progress(chunks):
    return self.batch_exists(chunks, callback=progress_callback)
```
We've lost the progress bar :)
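One way to wire the bar back into the per-object path is to tick the callback for every checksum checked; a sketch, where `ProgressCallback` comes from the snippet above and `update()`, `self.JOBS`, and `paths` are assumed names:

```python
from concurrent.futures import ThreadPoolExecutor

progress_callback = ProgressCallback(len(checksums))

def exists_with_progress(path):
    ret = self.exists(path)          # single-object existence check
    progress_callback.update(path)   # assumed: advance the bar by one
    return ret

with ThreadPoolExecutor(max_workers=self.JOBS) as executor:
    in_remote = list(executor.map(exists_with_progress, paths))
```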
Signed-off-by: Ruslan Kuprieiev <ruslan@iterative.ai>
efiop left a comment:
Thanks!
Have you followed the guidelines in our Contributing document?

Does your PR affect documented changes or does it add new functionality that should be documented? If yes, have you created a PR for dvc.org documenting it, or at least opened an issue for it? If so, please add a link to it.
Fix #2373