
Drop Seek or input size knowledge requirements #7

Merged
AlexTMjugador merged 3 commits into main from refactor/drop-seek-requirement on Aug 3, 2022

Conversation

AlexTMjugador
Member

Requiring input byte sources to implement the Seek trait is onerous for end users, as most programs that generate DEFLATE streams do not impose seekability requirements. In the Unix world, it's common to pipe the output of one program as the input of a compression program, which makes for a non-seekable data source. In addition, achieving input seekability is often impractical in network scenarios due to buffering resource requirements and other reasons.

By being smarter about how we use a sliding window, we can drop the seekability and exact input size knowledge requirements from the API exposed by this crate, making it readily applicable to even more usage scenarios without affecting compression. A high-level overview of the new technique is given through code comments.

A downside I can see of this change is that it requires a sliding window ZOPFLI_MASTER_BLOCK_SIZE bytes (≈ 1 MiB) bigger than before due to the need to temporarily store an additional uncompressed master block in memory. However, I think that the better API design makes this trade-off worth it: the additional memory usage is insignificant for the kind of computers that are most likely to run a compression algorithm as demanding as Zopfli anyway.

This is also a breaking change, as the compress_seekable function was removed from the public API. As a minor note, the sliding window for the deflate function is no longer allocated on the stack, which avoids running out of stack memory in practical scenarios I've encountered: allocating ZOPFLI_MASTER_BLOCK_SIZE bytes on the stack is a lot, and the cost of calling the memory allocator pales in comparison to actually compressing data with Zopfli.
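For illustration, here is a minimal sketch of the stack-versus-heap point; the constant values come from upstream Zopfli's defaults, and the exact buffer layout is an assumption for this example rather than the crate's actual code:

```rust
// Constants as defined by upstream Zopfli, repeated here only for the sketch.
const ZOPFLI_MASTER_BLOCK_SIZE: usize = 1_000_000;
const ZOPFLI_WINDOW_SIZE: usize = 32_768;

// Before: a stack array of roughly 1 MiB, which can overflow the stack.
// let mut window = [0u8; ZOPFLI_WINDOW_SIZE + ZOPFLI_MASTER_BLOCK_SIZE];

// After: a heap allocation. The single allocator call is negligible next to
// the cost of actually compressing the data with Zopfli.
fn allocate_window() -> Vec<u8> {
    vec![0u8; ZOPFLI_WINDOW_SIZE + ZOPFLI_MASTER_BLOCK_SIZE]
}
```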

@AlexTMjugador AlexTMjugador added the enhancement New feature or request label Jul 24, 2022
@AlexTMjugador AlexTMjugador added this to the v0.7.0 milestone Jul 24, 2022
@mqudsi
Collaborator

mqudsi commented Jul 25, 2022

Thanks for putting in all this work, @AlexTMjugador. I'm all for dropping the Seek or file size oracle requirement here, but I do have a couple of questions about the approach you took, if you don't mind.

  • If we magically knew whether or not a block was the last one at the time of reading it, that would make the code a bit simpler, no? We can easily implement a fake "peek" by reading one byte more than we want the very first time; then, so long as we read a full block each time thereafter, we know it's not the end of the input stream (because we have at least one byte more than what we're going to handle).
  • If we do keep your current approach, I think we can at least avoid the 2x memory read/write penalty by using a ring buffer instead of moving the previous buffer to the window location, and the data we just read to the previous-block location, on each iteration. If you don't want a ring buffer because of all the index juggling, two separate ZOPFLI_WINDOW_SIZE + ZOPFLI_MASTER_BLOCK_SIZE buffers (previous and current) that you swap each time increase the memory requirement by ZOPFLI_WINDOW_SIZE but avoid moving the freshly read bytes to the previous-block index every time; see the sketch after this list. (Zopfli is already insanely slow; anything we can do to avoid adding to that would be nice.)
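To make the buffer-swapping variant concrete, here is a rough, hypothetical Rust sketch; the buffer layout, constants, and function name are assumptions for illustration, and short reads plus last-block detection are elided:

```rust
use std::io::Read;
use std::mem;

const ZOPFLI_MASTER_BLOCK_SIZE: usize = 1_000_000;
const ZOPFLI_WINDOW_SIZE: usize = 32_768;

/// Two (window + master block)-sized buffers that are swapped on every
/// iteration: the freshly read data never has to be copied back to the
/// "previous block" position; only a window-sized prefix is copied.
fn process_master_blocks<R: Read>(mut input: R) -> std::io::Result<()> {
    let mut previous = vec![0u8; ZOPFLI_WINDOW_SIZE + ZOPFLI_MASTER_BLOCK_SIZE];
    let mut current = vec![0u8; ZOPFLI_WINDOW_SIZE + ZOPFLI_MASTER_BLOCK_SIZE];

    loop {
        // Read the next master block after the window-sized prefix.
        // (Short reads and last-block handling are elided in this sketch.)
        let read = input.read(&mut current[ZOPFLI_WINDOW_SIZE..])?;
        if read == 0 {
            break;
        }

        // Seed the window prefix of `current` from the tail of `previous`,
        // so back-references can still reach into the previous master block.
        let window_start = previous.len() - ZOPFLI_WINDOW_SIZE;
        current[..ZOPFLI_WINDOW_SIZE].copy_from_slice(&previous[window_start..]);

        // ... compress current[..ZOPFLI_WINDOW_SIZE + read] here ...

        // Swap the buffers instead of moving ~1 MiB of data around.
        mem::swap(&mut previous, &mut current);
    }

    Ok(())
}
```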

Not that it's a problem, but note that the changes are already breaking because you drop insize from the public compress() function; dropping compress_seekable is actually a non-issue, since we could just keep it until a breaking release (although there would be no benefit to using it anymore).

@AlexTMjugador
Member Author

AlexTMjugador commented Jul 25, 2022

Thank you for the review! 😄 I will try to address your questions.

If we magically knew whether or not a block was the last one at the time of reading it, that would make the code a bit simpler, no?

Yes, it would. The only thing preventing us from compressing blocks as they are read without any further ado is knowing whether that block is the last one in the stream or not, as we can't seek back to write the proper last block flag after the fact.

Because we can easily implement a fake "peek" by reading one byte more than we want the very first time, then so long as we read in a full block each time thereafter we can know that it's not the end of the input stream (because we have at least one byte more than what we're going to handle).

This is a very interesting idea! It requires handling that extra byte with care, so implementing it is not dead simple, but I've done so in the latest commit I've pushed. I think this technique provides a far more elegant solution to the problem, without any additional memory usage (technically, it uses an extra byte to store what I've called the "sentinel byte", but this is offset by the simpler implementation, with less code and fewer local variables).
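For reference, a simplified sketch of how such a sentinel-byte peek can look; the helper name, signature, and constant are made up for this example and the actual code in the commit may differ:

```rust
use std::io::{self, Read};

// Master block size as in upstream Zopfli, repeated here only for the sketch.
const MASTER_BLOCK_SIZE: usize = 1_000_000;

/// Fills `block` with up to a master block of data, starting with the
/// sentinel byte left over from the previous call (if any), and then tries to
/// read one extra byte: success means the block just read is not the last one.
fn read_block<R: Read>(
    input: &mut R,
    pending_sentinel: Option<u8>,
    block: &mut Vec<u8>,
) -> io::Result<Option<u8>> {
    block.clear();
    if let Some(byte) = pending_sentinel {
        block.push(byte); // The byte peeked while handling the previous block.
    }

    // A single `read` may return less than requested, so keep reading until
    // the block is full or the stream ends.
    let mut chunk = [0u8; 8192];
    while block.len() < MASTER_BLOCK_SIZE {
        let wanted = (MASTER_BLOCK_SIZE - block.len()).min(chunk.len());
        let read = input.read(&mut chunk[..wanted])?;
        if read == 0 {
            return Ok(None); // EOF before filling the block: it is the last one.
        }
        block.extend_from_slice(&chunk[..read]);
    }

    // The block is full; peek one byte further to learn whether data follows.
    let mut sentinel = [0u8; 1];
    Ok(match input.read(&mut sentinel)? {
        0 => None,              // Nothing after a full block: still the last one.
        _ => Some(sentinel[0]), // More data: keep the sentinel for the next call.
    })
}
```

The Option in the return value doubles as the last-block flag: None tells the caller to set the final-block bit when writing the DEFLATE block, while Some carries the peeked byte into the next call.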

The idea for the new compression algorithm was suggested by @mqudsi.

I've replaced the several_master_blocks.bin test file with a more realistic one extracted from the well-known Calgary corpus, as all-zeros data may not be enough to expose some implementation problems. The other test files separate words in their names with dashes instead of underscores, so the new test files follow that convention too.
@AlexTMjugador AlexTMjugador force-pushed the refactor/drop-seek-requirement branch from b8c756a to 47d2ae7 on July 25, 2022 13:17
@AlexTMjugador
Member Author

I've successfully tested the changes both manually and with cargo test, and I can't see anything else that could be improved right now, so I'm merging this. If anyone has something to discuss about this PR, I'd be glad to hear it 😄

@AlexTMjugador AlexTMjugador merged commit 318bf34 into main Aug 3, 2022
@AlexTMjugador AlexTMjugador deleted the refactor/drop-seek-requirement branch August 3, 2022 10:47
@mqudsi
Collaborator

mqudsi commented Aug 3, 2022

Thanks for your work on this!

@AlexTMjugador AlexTMjugador mentioned this pull request Aug 14, 2022