Fix 'fs.io.readInputStreamGeneric' overallocation of underlying buffers #3318
Conversation
Thanks for the PR! Unfortunately, I do not think this is the right way to solve this problem. The concern is: what if there is actually not enough data available on the `InputStream` at a particular moment, but the stream is not yet closed? E.g. consider interacting with an external process over stdin/stdout.
Have you seen this issue? I proposed a couple of different ideas to solve it in #3106 (comment).
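For concreteness, here is a minimal sketch of the partial-availability concern, assuming a Unix-like system with `cat` available; the object name and sizes are illustrative:

```scala
import java.io.InputStream

object PartialReadDemo extends App {
  // `cat` echoes stdin to stdout and stays open until stdin is closed.
  val process = new ProcessBuilder("cat").start()
  val stdin   = process.getOutputStream
  val stdout: InputStream = process.getInputStream

  stdin.write("ping".getBytes)
  stdin.flush()

  val buf = new Array[Byte](1024 * 1024)
  // `read` returns as soon as *some* bytes are available -- here 4, not
  // 1 MiB. Looping until the buffer is full would block indefinitely:
  // the stream is not exhausted, just momentarily empty.
  val n = stdout.read(buf)
  println(s"read $n bytes")

  stdin.close()
  process.waitFor()
}
```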
Ah, I'm sorry, I missed your edit. This is interesting :)
> As in the current implementation, `fs2.Stream` chunks are published with each successful `InputStream#read` invocation, but they may share the underlying array buffer.

I guess my only concern with this approach is that a single small chunk can retain a reference to the entire shared buffer, keeping it from being garbage collected. To avoid that would require allocating an appropriately sized array and copying into that, as suggested in #3106 (comment).
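To make the retention concern concrete, here is a hypothetical sketch (the buffer and byte counts are invented) contrasting a shared-buffer chunk with the copying approach suggested in #3106:

```scala
import fs2.Chunk

object RetentionSketch {
  // One 1 MiB read buffer of which only 100 bytes were actually filled.
  val buf       = new Array[Byte](1024 * 1024)
  val bytesRead = 100

  // Shares `buf`: this 100-byte chunk pins the whole 1 MiB array, so the
  // array cannot be garbage-collected while the chunk is alive.
  val shared: Chunk[Byte] = Chunk.array(buf, 0, bytesRead)

  // Copying into a right-sized array releases the large buffer for GC,
  // at the cost of an extra allocation and copy per read.
  val copied: Chunk[Byte] = Chunk.array(java.util.Arrays.copyOf(buf, bytesRead))
}
```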
@armanbilge, thanks for your input! The issue I'm trying to resolve is that we have network data on the order of megabytes, so naturally we assume that a buffer of 1 MiB would be good enough. The only problem is that the underlying `InputStream` delivers data in much smaller portions, so most of each allocated buffer goes unused.

What's worse, it is really hard to diagnose: at first everything works, maybe using more memory than expected (it's the JVM, after all); then suddenly there is an OOM error; then you spend some time hunting for a leak, except there are no leaks; and only then do you finally discover that memory is full of huge, mostly empty byte arrays. For example, with a 1 MiB `chunkSize` and ~1 KiB per underlying read, 1024 buffered chunks hold about 1 MiB of data while retaining roughly 1 GiB of heap.

To fix this currently we need to put custom re-chunking logic downstream of `readInputStream` to compact the sparse chunks.
I think the situation of 'short data streams' is, if not less frequent, then at least much more detectable than the issue above. To replicate the issue above you would need 1024 unconsumed streams of 1 KiB each, and I would say it is more a question of wrongly guessing the median data size. Also, the use of a copy introduces some questions of its own.
I would argue that this implementation tries its best to translate the user's intention: 'write data from the stream into contiguous memory regions of `chunkSize` bytes'.
I like this approach and I'm curious if we should do something similar for TCP sockets.
The current implementation of `fs2.io.readInputStream` allocates a new array of `chunkSize` bytes for every invocation of `InputStream#read(..)`. The problem is that the `read` spec does not require `InputStream` implementations to fill the provided buffer fully or to keep reading until stream exhaustion.

This leads to a situation where, if the `readInputStream` chunk size is big (megabytes) and the underlying input stream's inner 'chunks' are small (bytes or kilobytes), the returned `Stream[F, Byte]` is very 'sparse': in every chunk only a small fraction of the allocated capacity is actually used. This can be shown easily by combining `fs2.io.toInputStream` with `fs2.io.readInputStream`, as in the sketch below.
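A minimal sketch of that demonstration, assuming fs2 3.x and cats-effect 3 (the object name, chunk sizes, and printed output are illustrative):

```scala
import cats.effect.{IO, IOApp}
import fs2.Stream
import fs2.io.{readInputStream, toInputStream}

object SparseChunksDemo extends IOApp.Simple {
  // A byte stream made of many small (8-byte) chunks.
  val small: Stream[IO, Byte] =
    Stream
      .emits("hello, world! hello, world!".getBytes.toList)
      .chunkLimit(8)
      .flatMap(Stream.chunk)
      .covary[IO]

  def run: IO[Unit] =
    small
      .through(toInputStream)
      .flatMap(is => readInputStream(IO.pure(is), chunkSize = 1024 * 1024, closeAfterUse = false))
      .chunks
      .evalMap { c =>
        // `toArraySlice` exposes the backing array: without this fix,
        // each small chunk is backed by its own freshly allocated 1 MiB array.
        val slice = c.toArraySlice
        IO.println(s"chunk of ${slice.length} bytes backed by array of ${slice.values.length}")
      }
      .compile
      .drain
}
```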
This PR fixes the problem by reusing the leftover capacity of the allocated `Array[Byte]` for subsequent `InputStream` reads, until either the buffer is fully written or the stream is exhausted. As in the current implementation, `fs2.Stream` chunks are published with each successful `InputStream#read` invocation, but they may share the underlying array buffer.
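A condensed sketch of this buffer-reuse strategy (not the actual patch; `readLoop` and its shape are illustrative):

```scala
import cats.effect.Sync
import fs2.{Chunk, Stream}
import java.io.InputStream

object ReadLoopSketch {
  // Keep filling the same array across reads, emit a chunk per successful
  // read, and allocate a fresh array only once the current one is full.
  def readLoop[F[_]](is: InputStream, chunkSize: Int)(implicit F: Sync[F]): Stream[F, Byte] = {
    def go(buf: Array[Byte], offset: Int): Stream[F, Byte] =
      Stream.eval(F.blocking(is.read(buf, offset, buf.length - offset))).flatMap { n =>
        if (n < 0) Stream.empty // stream exhausted
        else {
          // Publish only the bytes just read; the chunk shares `buf`.
          val out  = Stream.chunk(Chunk.array(buf, offset, n))
          val next = offset + n
          if (next == buf.length) out ++ go(new Array[Byte](chunkSize), 0) // buffer full
          else out ++ go(buf, next) // reuse the leftover capacity
        }
      }
    go(new Array[Byte](chunkSize), 0)
  }
}
```

Because consecutive chunks may share one backing array, this trades the per-read over-allocation for the retention behaviour discussed in the conversation above.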