Feature request: Zstandard compression for sinks #2302
Comments
We definitely want to build on our compression feature in the near future! I think giving folks the option can be done similarly to how we do encoding.
For whoever wants to tackle this: I think adding an …
@bruceg before we begin work, we should identify the sinks where this is compatible.
So it looks like …
Just to be clear, I don't think we have to implement anything for kafka beyond passing the configs down and enabling the relevant features on the crate.
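To make the "just pass the configs down" idea concrete: librdkafka itself understands a `compression.codec` setting that accepts `zstd`, so a sketch of a Vector kafka sink forwarding it might look like this (the pass-through option names on the Vector side are an assumption for illustration):

```toml
# Hypothetical sketch: letting librdkafka handle zstd itself.
# The `librdkafka_options` pass-through shown here is an assumption,
# not confirmed Vector schema.
[sinks.kafka_out]
type = "kafka"
inputs = ["my_source"]
bootstrap_servers = "kafka-1:9092"
topic = "logs"
librdkafka_options."compression.codec" = "zstd"
```

On the Rust side this would also require enabling the rdkafka crate's zstd support, per the "relevant features on the crate" point above.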
Though, `compression.level` is still zlib-specific, whereas zstd has a range of 1-21 and a default of 3. Cf. #3032 regarding other algorithms' levels.
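As an illustration of how an algorithm-specific level might be expressed in config, something along these lines (the field names and structure are a hypothetical sketch, not Vector's confirmed schema):

```toml
# Hypothetical sketch: per-algorithm compression levels.
[sinks.my_sink.compression]
algorithm = "zstd"
level = 3   # zstd accepts levels 1-21; zstd's own default is 3
```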
n00b here - does Vector support …
Hey @venkat-sneller! That's correct, we only support …
@hdhoang Sorry for pinging so late. Did you get a chance to send a Zstd-related PR? If not, could you please try to do it? Thanks in advance!
no worries! I'll try it again this month. (Currently we still do td-agent + exec zstd as a flush, FWIW.)
Related: #14349
Has there been any traction on this? I currently have multiple chained Vector instances. One of them is running the HTTP Server sink using …
#17371 added support for zstd compression to a good number of sinks (including …
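Assuming that change shipped as described, enabling zstd on an http sink would presumably look something like the sketch below (`compression = "zstd"` as the accepted value is an assumption here; check the current Vector reference docs for the exact field values):

```toml
# Sketch of an http sink with zstd compression enabled.
[sinks.http_out]
type = "http"
inputs = ["my_source"]
uri = "https://collector.example.com/ingest"
encoding.codec = "json"
compression = "zstd"
```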
Vector's sinks mostly support only gzip compression, which is to say, a compressor from the early 1990s based on methods that were already 20 years old, performing string deduplication over a tiny 32 KiB window. Deflate is neither quick to compress nor to decompress, has tragic ratios for bulk data, and over the past 30 years has been roundly obsoleted in every metric except mass adoption by numerous newer compressors.
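The 32 KiB window limitation is easy to demonstrate with Python's stdlib `zlib` (an implementation of Deflate): a duplicate that sits within the window compresses away almost entirely, while the very same duplicate placed beyond 32 KiB gains nothing.

```python
import random
import zlib

rng = random.Random(0)

# Duplicate within Deflate's 32 KiB window: 16 KiB of random bytes
# repeated back-to-back. The second copy is encoded as cheap
# back-references into the first, so the data compresses ~2x.
near = rng.randbytes(16 * 1024) * 2

# Same trick at a distance of 40 KiB: the second copy starts beyond
# the 32 KiB window, so Deflate cannot reference the first copy at
# all, and random data is otherwise incompressible.
far = rng.randbytes(40 * 1024) * 2

near_ratio = len(near) / len(zlib.compress(near, 9))
far_ratio = len(far) / len(zlib.compress(far, 9))

print(f"duplicate within window: ratio ~ {near_ratio:.2f}")  # roughly 2x
print(f"duplicate beyond window: ratio ~ {far_ratio:.2f}")   # roughly 1x
```

Zstandard's default window is orders of magnitude larger (and configurable), which is exactly why it dedupes bulk log data so much better.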
Of those modern compressors, LZMA and Zstandard have some level of adoption and are fit for general use, but for log analysis in particular, Zstandard hits a massive sweet spot: state-of-the-art compression ratios combined with best-in-class decompression speed.
It's possible to get 20x compression of logs with Zstandard and then decompress them for analysis at almost 2 GiB/s on a single thread. That theoretically allows a 20-core machine to process 40 GiB/s of decompressed logs while saturating an underlying 2 GiB/s NVMe storage device (assuming no work other than decompression is being performed).
LZMA is competitive with Zstandard in ratio and overall performance, but Zstandard still enjoys a significant lead in absolute decompression speed, which for me is the major deciding factor for long-term log storage.
This is a request to consider modern gzip alternatives, or if there is no time for that, perhaps consider only my suggestion to go the Zstandard route. ;)
Thanks