Conversation

@isidentical
Contributor

@isidentical isidentical commented Jan 21, 2021

All trees

Previously implemented

  • s3
  • ssh
  • local

Implemented in this PR

  • azure
  • gdrive
  • gs
  • hdfs
  • oss
  • webhdfs
  • webdav / webdavs?
  • http / https?
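
For context, `upload_fobj` streams an already-open file object to the remote instead of requiring a local path. A minimal sketch of the idea (the helper below and its signature are illustrative assumptions, not the PR's actual implementation):

```python
import io

# Illustrative sketch only: the kind of chunked copy loop an
# upload_fobj implementation runs against a remote upload handle.
def upload_fobj(fobj, dest, chunk_size=64 * 1024 * 1024):
    """Stream fobj into dest in chunk_size pieces (64 MiB default),
    so memory use stays bounded regardless of file size."""
    while True:
        chunk = fobj.read(chunk_size)
        if not chunk:
            break
        dest.write(chunk)
```

Here `dest` is any writable binary file-like object standing in for a provider's upload stream; each real tree wraps its own SDK call.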

@isidentical isidentical changed the title [WIP] tree: Implement upload_fobj protocol to all trees tree: Implement upload_fobj protocol to all trees Jan 22, 2021

Contributor

Yes, the information is confusing; they probably forgot to update the comment, or it should be interpreted in some other way. I see, for example, that rclone uses 8 MB (https://forum.rclone.org/t/google-drive-and-optimal-drive-chunk-size/1186/11), and people mention 256 MB when they deal with very large files. I think it would be a good option to have, and we might indeed consider making the default smaller to reduce memory pressure.

Contributor Author

We use 64 MiB on all other providers, though I think there is currently no way to pass the chunk_size into MediaIoBaseUpload. Maybe we should consider adding it as a feature to pydrive2?
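
To illustrate the memory-pressure point from the discussion above, here is a small self-contained sketch (the helper name is hypothetical) showing that the peak buffer held during a chunked copy is the chunk size, not the file size:

```python
import io

def chunked_copy(src, dst, chunk_size):
    """Copy src to dst in chunk_size pieces and return the largest
    buffer held in memory at any one time (hypothetical helper)."""
    peak = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        peak = max(peak, len(chunk))
        dst.write(chunk)
    return peak
```

Copying a 1 MiB payload with an 8 KiB chunk size never holds more than 8 KiB at once, which is why a smaller default trades some throughput for lower memory use.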

@isidentical isidentical requested a review from a team January 22, 2021 13:10
@efiop
Contributor

efiop commented Jan 25, 2021

Whoa, something strange is going on with the tests. I don't think it is related to your PR.

@efiop
Contributor

efiop commented Jan 25, 2021

@isidentical Fixed the issue in #5333 . Please rebase.

@isidentical isidentical force-pushed the azure-multipart branch 2 times, most recently from 799b013 to ce5579a Compare January 26, 2021 08:22
Comment on lines +62 to 64

# Needed for some providers, and http open()
CHUNK_SIZE = 64 * 1024 * 1024 # 64 MiB
Contributor

Looks like it is now only used in http, so I guess no need to have it in the base class.

Suggested change
# Needed for some providers, and http open()
CHUNK_SIZE = 64 * 1024 * 1024 # 64 MiB

Contributor Author

It's used in the open() call of _transfer_file().

Co-authored-by: Ruslan Kuprieiev <kupruser@gmail.com>
Contributor

@efiop efiop left a comment

Great stuff! 🔥

@efiop efiop merged commit a5e19ec into treeverse:master Jan 26, 2021