Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Supporting Chunking in ZStream.fromIterator #4834

Merged
merged 5 commits into from
Apr 12, 2021

Conversation

adamgfraser
Copy link
Contributor

Sometimes we want to pull values from an Iterator in chunks for greater efficiency.

/**
* Creates a stream from an iterator that may throw exceptions.
*/
def fromIterator[A](iterator: => Iterator[A], maxChunkSize: Int = 1): ZStream[Any, Throwable, A] =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a heads up that adding a default parameter breaks backwards compatibility, like here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I think since we are not guaranteeing binary compatibility at this point that is okay. I think this is the right API and it should be a very easy change, but open to other perspectives.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could also add another method with 2 parameters, and the 1 parameter one can delegate to this one.

@adamgfraser
Copy link
Contributor Author

@iravid Can I get a review on this?

.fold(
Pull.fail,
iterator =>
ZIO.effect {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you know the chunk size, it's probably more efficient to pre-allocate an array, fill it, convert to Chunk, and take the number of elements actually read (which is <= the chunk size).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hinting the chunk builder with the desired size will have the same effect and avoid dealing with the array directly

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the one watch out here is if the user uses an extremely large maxChunkSize. For example, if I say "I'm not sure how many elements I have, but just read them all as one Chunk" and do fromIterator(iterator, Int.MaxValue) I am going to blow up with an OutOfMemoryError because it is going to try to actually allocate an Array with Int.MaxValue size.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My POV on this is that if it says "chunk" on the label, it's eager and must fit in memory. So I kinda think it's a fair tradeoff.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that is right but just to push on this that means that ZStream.fromIterator(Iterator(1, 2, 3, 4, 5), Int.MaxValue) will blow up. I think definitely if the Iterator is actually too large to fit in memory it has to fail, but failing when the Iterator is small is a little counterintuitive.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True. Maybe we could call it a chunkHint and that way we would be free to limit it to some reasonable value (our default chunk size, e.g.).

if (maxChunkSize <= 1) {
if (iterator.isEmpty) Pull.end else Pull.emit(iterator.next())
} else {
val builder = ChunkBuilder.make[A]()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
val builder = ChunkBuilder.make[A]()
val builder = ChunkBuilder.make[A](maxChunkSize)

iravid
iravid previously approved these changes Apr 6, 2021
Copy link
Member

@iravid iravid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good with the sizeHint changes!

@adamgfraser adamgfraser merged commit 6f2dfc6 into zio:master Apr 12, 2021
@adamgfraser adamgfraser deleted the fromiterator branch April 12, 2021 23:38
@deusaquilus
Copy link

Wow, you guys really care about getting the naming right. I’m impressed!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants