Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: inline DecodeBit function for improved performance #55

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

orisano
Copy link

@orisano orisano commented Jun 17, 2023

Description:

With immense respect for the work done so far, I am writing this pull request to propose an optimization that would enhance the performance of the github.com/bodgit/sevenzip project, which relies on the lzma module from github.com/ulikunitz/xz.

Currently, I am working on the decompression of large-scale 7z archives (using lzma2) in the GB range. As part of this effort, I have noticed that the repetitive calls to the DecodeBit function within the rangeDecoder module lead to frequent memory writes and potential performance bottlenecks.

To address this issue, I propose inlining the DecodeBit function, which would reduce the overhead of function calls and make efficient use of registers for intermediate states. By doing so, we can significantly improve the decompression performance of large-scale 7z archives.

The key advantages of this optimization include:

  • Efficient usage of registers to store intermediate states, resulting in fewer memory accesses.
  • Reducing function call overhead by eliminating the need to save and restore the caller's context.

During my local testing, I observed a notable performance improvement of approximately 19% in scenarios involving repetitive calls to DecodeBit. This suggests the potential benefits it can bring to the github.com/bodgit/sevenzip project, particularly when decompressing GB-sized 7z archives.

before:

goos: linux
goarch: amd64
pkg: github.com/bodgit/sevenzip
cpu: Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz
BenchmarkLargeLZMA2-8   	       1	400362537495 ns/op

after:

goos: linux
goarch: amd64
pkg: github.com/bodgit/sevenzip
cpu: Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz
BenchmarkLargeLZMA2-8   	       1	323613023997 ns/op

I kindly request your review and feedback on this pull request. Thank you.

@ulikunitz
Copy link
Owner

Many thanks for the PR. I will review it this week.

@orisano
Copy link
Author

orisano commented Jun 20, 2023

benchstat:

goos: linux
goarch: amd64
pkg: github.com/bodgit/sevenzip
cpu: Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz
             │  base.txt   │             xz#55.txt              │
             │   sec/op    │   sec/op    vs base                │
LargeLZMA2-8   10.775 ± 0%   8.704 ± 0%  -19.22% (p=0.000 n=10)

@orisano
Copy link
Author

orisano commented Jul 3, 2023

@ulikunitz How can I help?

@brancz
Copy link

brancz commented Jul 21, 2023

Just want to add, we just benchmarked this against our workload, and we even see a 25% improvement. Awesome work @orisano!

@ulikunitz
Copy link
Owner

Hi, please adapt the change for the rewrite branch as I suggested for the other changes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants