[WiP] Rewrite zstd decoder to use an API that supports multiple frames (fix issue #3008) #3021
Compressed zstd data is composed of one or more frames. python-zstandard has several APIs that only support decoding a single frame. Those APIs are not suitable for urllib3 because the decompressed data ends up truncated to 1048576 bytes.
This patch changes the zstd decoder to use an API that is known to be able to decompress multiple frames at a time. It uses ZstdDecompressionReader, which takes any object that fulfills the interface of io.RawIOBase. The approach here is to have ZstdDecoder create an io.BytesIO object, feed data to it, and have python-zstandard consume data when it can.
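As a rough illustration of the buffering half of this approach, here is a minimal standard-library sketch of a buffer that is appended to at the end while a reader consumes from its own position. The name `FeedBuffer` and its methods are hypothetical and are not taken from this patch; the python-zstandard reader that would sit on top of it is deliberately left out.

```python
import io


class FeedBuffer:
    """Hypothetical helper: append bytes at the end, read from a separate cursor."""

    def __init__(self) -> None:
        self._buf = io.BytesIO()
        self._read_pos = 0  # where the next read() resumes

    def feed(self, data: bytes) -> None:
        # Always append at the end, regardless of where the reader stopped.
        self._buf.seek(0, io.SEEK_END)
        self._buf.write(data)

    def read(self, size: int = -1) -> bytes:
        # Resume from the saved read position and remember the new one.
        self._buf.seek(self._read_pos)
        chunk = self._buf.read(size)
        self._read_pos = self._buf.tell()
        return chunk


buf = FeedBuffer()
buf.feed(b"abc")
first = buf.read(2)   # consumes part of what was fed
buf.feed(b"def")      # more data arrives later
rest = buf.read()     # picks up where the previous read stopped
```

Keeping separate read and write positions is one way to let the producer (the HTTP response) and the consumer (the decompressor) share a single `io.BytesIO` without overwriting unread data.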
With this change, two zstd tests are broken:
- `test_decode_zstd_incomplete` seems to be an incorrect test because it assumes that slicing off the last element will cause zstd decompression to fail, but the current compressed package doesn't behave that way. TODO: I need to change the test to create data with checksums and slice in a way that breaks the checksum validation.
- `test_chunked_decoding_zstd` is expecting an exception to be thrown but it isn't. This seems to be because `HTTPResponse.read()` is misbehaving. The issue is that there is a loop `while len(self._decoded_buffer) < amt and data:` which drains enough data to grab headers. But this test is draining the full response buffer and thus causing `_fp` to close, so `ZstdDecoder.flush()` is never called. TODO: figure out how to fix `HTTPResponse.read()` so `ZstdDecoder.flush()` is called in this case.

This change also adds a new test, `test_decode_zstd_multiple_frames`, which is a work-in-progress. I'd like to find a way to generate this data programmatically instead of adding the file `text.txt.zstd`, but I haven't been able to do it so far.
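One possible direction for generating the multi-frame test data, sketched under assumptions: compress each chunk separately so that every call produces one complete frame, then concatenate the results. The helper `build_multi_frame` is hypothetical. To stay runnable with only the standard library, the demonstration below uses gzip members as a stand-in for zstd frames; for zstd the assumption is that `zstandard.ZstdCompressor().compress` could be passed as `compress_frame` instead, which I have not verified against the tests in this PR.

```python
import gzip
from typing import Callable, Iterable


def build_multi_frame(
    chunks: Iterable[bytes],
    compress_frame: Callable[[bytes], bytes],
) -> bytes:
    # Assumption: each call to compress_frame returns one complete,
    # self-contained frame, so concatenation yields multi-frame data.
    return b"".join(compress_frame(chunk) for chunk in chunks)


# Stand-in demonstration with gzip members (standard library only).
# For zstd, the untested assumption is to pass
# zstandard.ZstdCompressor().compress here instead of gzip.compress.
chunks = [b"foo" * 100, b"bar" * 100]
payload = build_multi_frame(chunks, gzip.compress)

# gzip.decompress reads across concatenated members, which is the
# behaviour the new zstd decoder is meant to have for frames.
roundtrip = gzip.decompress(payload)
```

If this works for zstd as well, the test could build its input inline and the `text.txt.zstd` fixture would not be needed.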