Conversation

@mikeauclair (Contributor)

Given that the underlying gocache struct takes an io.Reader as the body, pass the io.ReadCloser returned from the S3 SDK directly to it to avoid reading whole entries into memory. Also adjust the callers of GetData to close the returned io.ReadCloser.

@creachadair (Member) left a comment

Mechanically this change is fine, but I will note that it won't actually have the desired effect: The cache package accepts a reader there because that's what the prototype implementation does, but there is no practical way for it to actually stream data, so it's still going to read the whole thing into memory anyway. And the go tool will do the same, for roughly the same reason.

That said—I have no real objection to making the behaviour follow the interface. If we're going to do that, though, let's fix up the usage rather than changing the support library signature.

Review comment on this hunk:

```go
}
defer rc.Close()
return io.ReadAll(rc)
func (c *Client) GetData(ctx context.Context, key string) (io.ReadCloser, error) {
```

@creachadair (Member):
Let's not change the signature of this method—if callers want the semantics of a reader, I think they should switch to calling Get directly.

@mikeauclair (Contributor, Author) commented Mar 18, 2025

Can you say a little bit more about where in the chain the body would be read into memory regardless?

The protocol encodes all requests, data, and responses as JSON, using the standard encoder (on both ends). So you have to have the entire value in-hand before it can be written out anyway.

In theory one could write a custom encoder/decoder that turns the reader into an incremental writer for a base64 stream (with the JSON string framing, etc.), and reverses that on the other end, but that wouldn't currently be worthwhile, as the Go tool will pull the whole thing into memory anyway.

The protocol's use of a Reader here is aspirational; the protocol wasn't properly designed to support streaming. A better choice would have been to push the value as binary, since the caller provides the length anyway—but that wasn't how it shook out.

@creachadair (Member)

Apparently I failed at quoting your comment to reply to it, sorry.

@creachadair (Member)

The CI failure is not related to your change, if you rebase on main it should pass.

@mikeauclair (Contributor, Author)

> Can you say a little bit more about where in the chain the body would be read into memory regardless?

> The protocol encodes all requests, data, and responses as JSON, using the standard encoder (on both ends). So you have to have the entire value in-hand before it can be written out anyway.
>
> In theory one could write a custom encoder/decoder that turns the reader into an incremental writer for a base64 stream (with the JSON string framing, etc.), and reverses that on the other end, but that wouldn't currently be worthwhile, as the Go tool will pull the whole thing into memory anyway.
>
> The protocol's use of a Reader here is aspirational; the protocol wasn't properly designed to support streaming. A better choice would have been to push the value as binary, since the caller provides the length anyway—but that wasn't how it shook out.

Totally understood that the go bin will have to read the whole payload into memory, and that Puts necessarily have to have the data, but it looks like Get (which returns the path on the FS) doesn't actually need to have the payload in memory at all, as long as (in the "local miss but S3 hit" case) it can write it to the FS and get the path - am I missing something there?

For context, we're using this plugin in an environment where we're seeing a lot of CPU throttling in Kubernetes, and I'm trying to drive down GC-driven CPU utilization on Gets (the majority case in our situation) to avoid having to decrease GOMAXPROCS for the consuming Go process. Using my fork reduces CPU usage in my case by ~50%.

@creachadair (Member)

> Totally understood that the go bin will have to read the whole payload into memory, and that Puts necessarily have to have the data, but it looks like Get (which returns the path on the FS) doesn't actually need to have the payload in memory at all, as long as (in the "local miss but S3 hit" case) it can write it to the FS and get the path - am I missing something there?

No, I think that's right. And the disk cache implementation (already) handles that case, so it certainly won't break anything.

@creachadair (Member) left a comment

Thanks for your patience.

@creachadair creachadair merged commit 7346bbd into tailscale:main Mar 18, 2025
1 check passed
@creachadair (Member)

This is now tagged at and after v0.0.22.

@mikeauclair mikeauclair deleted the stream-get-data-to-disk branch March 20, 2025 14:26