Stream the retrieved S3 data to disk on Get #11
Conversation
creachadair
left a comment
Mechanically this change is fine, but I will note that it won't actually have the desired effect: The cache package accepts a reader there because that's what the prototype implementation does, but there is no practical way for it to actually stream data, so it's still going to read the whole thing into memory anyway. And the go tool will do the same, for roughly the same reason.
That said—I have no real objection to making the behaviour follow the interface. If we're going to do that, though, let's fix up the usage rather than changing the support library signature.
lib/s3util/s3util.go
Outdated
```go
}
defer rc.Close()
return io.ReadAll(rc)

func (c *Client) GetData(ctx context.Context, key string) (io.ReadCloser, error) {
```
Let's not change the signature of this method—if callers want the semantics of a reader, I think they should switch to calling Get directly.
The protocol encodes all requests, data, and responses as JSON, using the standard encoder (on both ends), so you have to have the entire value in hand before it can be written out anyway. In theory one could write a custom encoder/decoder that turns the reader into an incremental writer for a base64 stream (with the JSON string framing, etc.), and reverses that on the other end, but that wouldn't currently be worthwhile, since the Go tool will pull the whole thing into memory regardless. The protocol's use of a Reader here is aspirational; the protocol wasn't designed to support streaming. A better choice would have been to push the value as binary, since the caller provides the length anyway, but that wasn't how it shook out.
Apparently I failed at quoting your comment to reply to it, sorry.

The CI failure is not related to your change; if you rebase on main it should pass.
Totally understood that the go bin will have to read the whole payload into memory, and that Puts necessarily have to have the data, but it looks like Get (which returns the path on the FS) doesn't actually need to have the payload in memory at all, as long as (in the "local miss but S3 hit" case) it can write it to the FS and get the path - am I missing something there? For context, we're using this plugin in an environment where we're seeing a bunch of CPU throttling in Kubernetes, and I'm trying to drive down GC-driven CPU utilization on Gets (which are the majority case in our situation) to avoid having to decrease GOMAXPROCS for the consuming
No, I think that's right. And the disk cache implementation already handles that case, so it certainly won't break anything.
creachadair
left a comment
Thanks for your patience.
This is now tagged at and after v0.0.22.
Given that the underlying gocache struct takes an io.Reader as the body, pass the io.ReadCloser returned from the S3 SDK directly to it, to avoid reading entire entries into memory. Also adjust the callers of GetData to close the returned io.ReadCloser.