expose zstd compression #1326

Merged
merged 1 commit into from Aug 29, 2022

@heinerstilz heinerstilz commented Aug 24, 2022

Database name

PostgreSQL

Pull request description

Describe what this PR fixes

The zstd compression library is already supported internally in WAL-G. It seems to have been disabled because of a data corruption bug in zstd that has since been resolved (DataDog/zstd#39).
It has been mentioned that it might be good to bring zstd back (see discussion on #300).

This PR

  • exposes the zstd compression option to the user
  • upgrades zstd to a more recent version (1.5.2 plus patches)
  • adjusts the docs to mention zstd.
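
For illustration, a minimal sketch of how a user might select the exposed option. It assumes zstd is picked up through the same `WALG_COMPRESSION_METHOD` setting as the other codecs and that the accepted value is `zstd`; the backup command in the comment is only an example invocation, so check all of it against the docs:

```shell
# Assumption: zstd is selected like the other codecs (lz4, lzma, brotli).
export WALG_COMPRESSION_METHOD=zstd
# Upload concurrency, set to the value used in our measurements.
export WALG_UPLOAD_CONCURRENCY=8

# Then take a backup as usual, for example:
# wal-g backup-push "$PGDATA"
```

Backups written with a different codec should not need this setting changed to be restored only if the codec is detected from the stored files, which is worth confirming before switching an existing setup.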

In our measurements with PostgreSQL and WAL-G on a ~270 GB dataset, zstd was significantly less CPU intensive than brotli:

  • backing up/compressing with WALG_UPLOAD_CONCURRENCY=8: 1.6x less real time and 1.25x less user time
  • restoring/decompressing: 1.6x less real time and 2.8x less user time

The compression ratio was around 2.4x for both.

An independent benchmark also seems to support zstd's performance advantage over brotli: https://peazip.github.io/fast-compression-benchmark-brotli-zstandard.html#:~:text=Comparing%20Brotli%20and%20Zstandard%20extraction,twice%20as%20fast%20as%20Brotli.

Please provide steps to test this PR

Which of the existing tests would make the most sense to adapt for zstd?

* expose upgraded zstd compressor

* mention zstd in docs

* zstd 1.5.2 + patches

* remove unused lines in go.sum
@heinerstilz heinerstilz requested a review from a team as a code owner August 24, 2022 14:03

x4m commented Aug 24, 2022

FWIW I saw one report of brotli corruption: a file properly decrypted by gpg, but unextractable. Probably due to cosmic rays or something.
Zstd is a great codec, the best on the Pareto frontier. It would be very good to expose it. But...

@heinerstilz do you validate your backups? Will you use Zstd? We need someone who will warn us if something is still wrong with Zstd.

heinerstilz commented

We are implementing our backup solution for PostgreSQL, and the plan is to go with zstd unless we find an issue with it.
We'll likely soon have a large volume of frequent Postgres backups in production. Usable restores are needed on a regular basis.
So should zstd still corrupt data, there is quite a good chance we'll notice (and of course flag it here).

@x4m x4m merged commit 25616a7 into wal-g:master Aug 29, 2022