Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WiP] Rewrite zstd decoder to use an API that supports multiple frames (fix issue #3008) #3021

Closed
wants to merge 1 commit into from

Conversation

grossag
Copy link

@grossag grossag commented May 8, 2023

Compressed zstd data is composed of one or more frames. python-zstandard has several APIs that only support decoding one frame. This is not useful for urllib3 because it will result in decompressed data that is truncated to 1048576 bytes.

This patch changes the zstd decoder to use an API that is known to be able to decompress multiple frames at a time. It uses ZstdDecompressionReader, which takes any object that fulfills the interface of io.RawBase. The approach here is to have ZstdDecoder create an io.BytesIo object, feed data to it, and have python-zstandard consume data when it can.

With this change, two zstd tests are broken:

  • test_decode_zstd_incomplete seems to be an incorrect test because it assumes that slicing the last element will cause zstd decompression to fail, but the current compressed package doesn't behave that way. TODO: I need to change the test to create data with checksums and slice in a way that breaks the checksum validation.
  • test_chunked_decoding_zstd is expecting an exception to be thrown but it isn't. This seems to be because the HTTPResponse.read() is misbehaving. The issue is that there is a loop while len(self._decoded_buffer) < amt and data: which drains enough data to grab headers. But this test is draining the full response buffer and thus causing _fp to close, so ZstdDecoder.flush() is never called. TODO: figure out how to fix HTTPResponse.read() so ZstdDecoder.flush() is called in this case.

This change also adds a new test test_decode_zstd_multiple_frames which is a work-in-progress. I'd like to find a way to generate this data programmatically instead of adding the file text.txt.zstd, but I haven't been able to do it so far.

Compressed zstd data is composed of one or more frames. python-zstandard has
several APIs that only support decoding one frame. This is not useful for
urllib3 because it will result in decompressed data that is truncated to
1048576 bytes.

This patch changes the zstd decoder to use an API that is known to be able to
decompress multiple frames at a time. It uses ZstdDecompressionReader, which
takes any object that fulfills the interface of io.RawBase. The approach here
is to have ZstdDecoder create an io.BytesIo object, feed data to it, and have
python-zstandard consume data when it can.
@grossag
Copy link
Author

grossag commented May 8, 2023

I'm going to close this because #3022 is a better solution than mine.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant