Add SIMD helpers to speed up Rust get_sad #3050

Merged: 4 commits, Oct 31, 2022

@shssoichiro commented Oct 26, 2022

The Rust version of get_sad is still used during motion estimation (ME) for regions that are not of an exact block size.
This was consuming about 2.3% of runtime at speed 2.

This change makes the function iterate over chunks of 4, which can easily be vectorized,
so that only the remainder goes through the slower non-vectorized SAD path.

After the change, the time spent in the Rust get_sad
appears to be reduced by about two-thirds, and the overall runtime of a speed 2 encode decreases by about 1% on an AVX2 CPU.
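
The chunking approach described above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: the name `sad_row` is hypothetical, and it assumes 8-bit pixels stored in plain `u8` slices, whereas the real `get_sad` operates on rav1e's own plane and pixel types.

```rust
/// Sum of absolute differences over one row of 8-bit pixels.
/// Hypothetical helper for illustration only; assumes `u8` pixels.
fn sad_row(src: &[u8], refr: &[u8]) -> u32 {
    assert_eq!(src.len(), refr.len());

    // Split both rows into groups of exactly 4 pixels.
    let mut src_chunks = src.chunks_exact(4);
    let mut ref_chunks = refr.chunks_exact(4);

    let mut sum = 0u32;
    for (s, r) in src_chunks.by_ref().zip(ref_chunks.by_ref()) {
        // Fixed trip count of 4: a shape the compiler can vectorize easily.
        for i in 0..4 {
            sum += u32::from(s[i].abs_diff(r[i]));
        }
    }

    // The leftover 0 to 3 pixels take the plain scalar path.
    for (&s, &r) in src_chunks.remainder().iter().zip(ref_chunks.remainder()) {
        sum += u32::from(s.abs_diff(r));
    }
    sum
}
```

Whether the inner loop actually becomes SIMD code depends on the compiler version and target features, so any speedup should be confirmed by profiling, as was done for this PR.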

codecov-commenter commented Oct 26, 2022

Codecov Report

Base: 86.36% // Head: 86.32% // Decreases project coverage by -0.03% ⚠️

Coverage data is based on head (42cfde1) compared to base (8417409).
Patch coverage: 100.00% of modified lines in pull request are covered.

❗ Current head 42cfde1 differs from pull request most recent head e23c2c5. Consider uploading reports for the commit e23c2c5 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3050      +/-   ##
==========================================
- Coverage   86.36%   86.32%   -0.04%     
==========================================
  Files          83       83              
  Lines       33004    33019      +15     
==========================================
  Hits        28505    28505              
- Misses       4499     4514      +15     
| Impacted Files | Coverage Δ |
| --- | --- |
| src/dist.rs | 98.98% <100.00%> (+0.03%) ⬆️ |
| src/context/frame_header.rs | 66.06% <0.00%> (-2.27%) ⬇️ |
| src/context/transform_unit.rs | 89.78% <0.00%> (-0.72%) ⬇️ |
| src/rdo.rs | 85.24% <0.00%> (-0.42%) ⬇️ |
| src/context/partition_unit.rs | 89.44% <0.00%> (-0.28%) ⬇️ |
| src/encoder.rs | 87.15% <0.00%> (-0.07%) ⬇️ |
| src/me.rs | 95.69% <0.00%> (+0.09%) ⬆️ |
| src/asm/x86/lrf.rs | 94.11% <0.00%> (+0.63%) ⬆️ |
| src/scenechange/fast.rs | 45.00% <0.00%> (+1.00%) ⬆️ |


@barrbrain
Collaborator

I don't think the wide crate is necessary here. If you use arrays and write the kernel idiomatically, LLVM will do the right thing.
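
One way to read this suggestion is sketched below: the kernel takes fixed-size arrays, so the compiler knows the length at compile time and no SIMD crate is needed. The names `sad4` and `sad_row_arrays` are hypothetical, the code assumes 8-bit `u8` pixels, and it is not the kernel that was committed here.

```rust
use std::convert::TryInto;

/// SAD of two groups of exactly 4 pixels (hypothetical kernel, `u8` pixels assumed).
#[inline(always)]
fn sad4(a: [u8; 4], b: [u8; 4]) -> u32 {
    a.iter()
        .zip(b.iter())
        .map(|(&x, &y)| u32::from(x.abs_diff(y)))
        .sum()
}

/// Row SAD built on the array kernel, with a scalar tail for leftover pixels.
fn sad_row_arrays(src: &[u8], refr: &[u8]) -> u32 {
    assert_eq!(src.len(), refr.len());
    let src_chunks = src.chunks_exact(4);
    let ref_chunks = refr.chunks_exact(4);

    // Pixels that do not fill a whole group of 4.
    let tail: u32 = src_chunks
        .remainder()
        .iter()
        .zip(ref_chunks.remainder())
        .map(|(&x, &y)| u32::from(x.abs_diff(y)))
        .sum();

    // Each chunk has length 4, so the conversion to `[u8; 4]` cannot fail.
    let body: u32 = src_chunks
        .zip(ref_chunks)
        .map(|(s, r)| {
            let s: [u8; 4] = s.try_into().unwrap();
            let r: [u8; 4] = r.try_into().unwrap();
            sad4(s, r)
        })
        .sum();

    body + tail
}
```

Keeping the kernel on plain arrays avoids an extra dependency and leaves instruction selection to LLVM; the generated assembly is still worth inspecting for the targets that matter.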

Cargo.toml Outdated
@@ -108,6 +108,7 @@ new_debug_unreachable = "1.0.4"
 once_cell = "1.13.0"
 av1-grain = { version = "0.2.0", features = ["serialize"] }
 serde-big-array = { version = "0.4.1", optional = true }
+wide = "0.7.5"
Collaborator
We can drop this new dependency for now, as it is unused.

Collaborator Author
Don't worry, I'll pull it back in shortly 🙂

shssoichiro and others added 4 commits October 31, 2022 13:04
The Rust version of get_sad is still used during ME
for regions that are not of an exact block size.
This was consuming about 2.3% of runtime at speed 2.
After the change, the time spent in Rust get_sad
appears to be reduced by about two-thirds, and the overall
runtime of a speed 2 encode decreases by about 1% on an AVX2 CPU.
Co-authored-by: David Michael Barr <b@rr-dav.id.au>
@shssoichiro shssoichiro merged commit 02110c5 into xiph:master Oct 31, 2022
@shssoichiro shssoichiro deleted the speed-up-get-sad branch October 31, 2022 19:53