Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat S3 Transfer Manager v2 GetObject/DownloadObject #2996

Open
wants to merge 36 commits into
base: main
Choose a base branch
from

Conversation

wty-Bryant
Copy link
Contributor

Implement v2 s3 transfer manager's GetObject/DownloadObject api bound to single union client which mimics normal service client's initialization and api call. User could now download object sequentially with GetObject or asynchronously with DownloadObject.

Test: passed unit test for GetObject and DownloadObject, passed integration test of uploading object via s3 client and validating its content after downloading through v2 transfer manager.

@wty-Bryant wty-Bryant requested a review from a team as a code owner January 31, 2025 21:58
@@ -16,7 +19,7 @@ type Options struct {
MultipartUploadThreshold int64

// Option to disable checksum validation for download
DisableChecksum bool
DisableChecksumValidation bool
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really more complicated than just a bool now with the recent updates to flex checksums. We should probably promote this into the new RequestChecksumCalculation enum we have.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This client field was listed in SEP as boolean, but we can change that

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay I try to change that to ResponseChecksumValidation enum, but since s3 client in transfer manager is just an interface, we could not configure normal s3 checksum options like what we do for s3 service client, I think it's ok to just keep that bool which is only used to set request checksum mode

}

// Queue the next range of bytes to read.
ch <- getChunk{w: g.w, start: g.pos - g.offset, withRange: g.byteRange()}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So as far as I can tell, you're now doing concurrent downloads but you're buffering the entire object in memory before returning it to the caller - which is all well and good until someone goes to try and pull a multi-gigabyte object.

We're basically going to have to return a reader that "composes" the response bodies from inner getPart requests in order. That will allow you to stream the response back to the caller without that extra memory pressure. You might have a "scratch" buffer for that internal composition but naturally it won't be the full object size.

wty-Bryant and others added 23 commits March 12, 2025 14:55
* recommit transfer manager v2 files

* change pool to store slice pointer

* add integ test for putobject

* update go mod

* minor changes for v0.1.0

* update tags

* update tags

* update integ test dependency version

* change err var name

* update go mod

* change input/output type comment

* minor change

---------

Co-authored-by: Tianyi Wang <wty@amazon.com>

rebase from main

rebase branch from main
move transfer manager v2 integration tests to within module

move putobject integ test to transfer manager module
add getobject integ test
@wty-Bryant wty-Bryant force-pushed the feat-tmv2-getobject branch from dd3bcc7 to 510189e Compare March 12, 2025 19:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants