Implement compress and decompress operators #3443

Merged
dominiklohmann merged 5 commits into main from topic/compress-and-decompress on Aug 10, 2023

dominiklohmann
Member

@dominiklohmann dominiklohmann commented Aug 7, 2023

compress [--level <level>] <format> and its dual decompress <format> make it easy to work with compressed data directly in pipelines.

This comes with support for all formats that Apache Arrow ships with and that support streaming compression and decompression. As of Arrow 12.0, the supported formats are brotli, bz2, gzip, lz4, and zstd.
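
To illustrate what "streaming" means for the two operators, here is a minimal sketch of a chunk-wise compress/decompress pair. It is not the code from this PR: it uses Python's standard-library `zlib` module for the gzip format instead of Arrow's codecs, and the function names `compress_chunks` and `decompress_chunks` are hypothetical. The `level` parameter plays the role that `--level` is assumed to play for `compress`; the decompressing side takes no level.

```python
import zlib
from typing import Iterable, Iterator


def compress_chunks(chunks: Iterable[bytes], level: int = 6) -> Iterator[bytes]:
    """Gzip-compress a stream of byte chunks without buffering the whole input."""
    # wbits=31 (16 + 15) tells zlib to emit the gzip container format.
    compressor = zlib.compressobj(level, zlib.DEFLATED, 31)
    for chunk in chunks:
        out = compressor.compress(chunk)
        if out:  # the compressor may hold data back until it has enough input
            yield out
    # Emit whatever is still buffered, including the gzip trailer.
    yield compressor.flush()


def decompress_chunks(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Dual of compress_chunks: decompress a stream of gzip-compressed chunks."""
    decompressor = zlib.decompressobj(31)
    for chunk in chunks:
        out = decompressor.decompress(chunk)
        if out:
            yield out
    tail = decompressor.flush()
    if tail:
        yield tail


# Round trip: decompressing the compressed stream yields the original bytes.
original = [b"hello ", b"compressed ", b"world\n"] * 100
roundtrip = b"".join(decompress_chunks(compress_chunks(original, level=9)))
assert roundtrip == b"".join(original)
```

Both functions consume and produce chunks of bytes, so they compose in the same way that a bytes-to-bytes operator would sit between a loader and a parser in a pipeline.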

@dominiklohmann dominiklohmann added the feature (New functionality) and operator (Source, transformation, and sink) labels Aug 7, 2023
The (intentionally) undocumented `discard` operator is mostly useful for
benchmarking. This extends it to also accept bytes as input.
@dominiklohmann dominiklohmann force-pushed the topic/compress-and-decompress branch 2 times, most recently from 7476679 to bfb1ea7 Compare August 8, 2023 21:14
`compress [--level <level>] <format>` and its dual `decompress <format>` make it
easy to work with compressed data directly in pipelines.

This comes with support for all formats that Apache Arrow ships with and that
support streaming compression and decompression. As of Arrow 12.0, the supported
formats are `brotli`, `bz2`, `gzip`, `lz4`, and `zstd`.
@dominiklohmann dominiklohmann marked this pull request as ready for review August 8, 2023 21:15
@dominiklohmann dominiklohmann requested a review from a team August 8, 2023 21:16
@Dakostu Dakostu left a comment

I did some quick compression-decompression tests with every codec and the operators are doing their job.

I say, let's ship it.

Of course I left some minor comments.

@dominiklohmann dominiklohmann merged commit 409d094 into main Aug 10, 2023
@dominiklohmann dominiklohmann deleted the topic/compress-and-decompress branch August 10, 2023 16:25
Labels
feature (New functionality), operator (Source, transformation, and sink)
3 participants