Commit a420bc3

Enable AVX2 avg HBD function.
Speed-up of 0.5% to 2.88% for 10-bit encoding as measured on the first 10 frames of Bosphorus 3840x2160 10bit using --tiles 16 --threads 16 when compared to commit 8b930d2 on a 32-core 3970x:

                   Before               After           Delta FPS
       Real (s)  User (s)  Real (s)  User (s)  Real (%)  User (%)
s0      358.144  4041.872   354.193  3991.381      1.12      1.26
s1       57.760   616.687    57.023   608.007      1.29      1.43
s2       36.050   345.250    35.594   340.669      1.28      1.34
s3       31.198   303.964    30.886   300.703      1.01      1.08
s4       30.988   302.335    30.664   299.023      1.06      1.11
s5       29.363   279.151    29.051   275.367      1.07      1.37
s6       17.947   157.552    17.706   154.048      1.36      2.27
s7       17.384   148.874    17.149   146.637      1.37      1.53
s8       16.863   148.554    16.676   144.990      1.12      2.46
s9       14.415   130.140    14.344   129.308      0.50      0.64
s10      10.974    66.348    10.774    64.491      1.86      2.88
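
The Delta FPS figures appear to be derived from the timings as before / after - 1, expressed as a percentage. That formula is an assumption inferred from the numbers, not something the commit message states, and the function name below is hypothetical. Under that assumption, a minimal sketch of the arithmetic:

```rust
/// Throughput gain in percent, assuming the same number of frames is
/// encoded in both runs, so FPS is inversely proportional to elapsed time.
/// (Assumed formula: before / after - 1.)
fn speedup_percent(before_s: f64, after_s: f64) -> f64 {
  (before_s / after_s - 1.0) * 100.0
}
```

For example, the s0 real times 358.144 and 354.193 give roughly 1.12, and the s10 user times 66.348 and 64.491 give roughly 2.88. Rows can differ in the last digit because the displayed times are themselves rounded.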
1 parent 3834925 commit a420bc3

1 file changed: +10 -1 lines changed

src/asm/x86/mc.rs

Lines changed: 10 additions & 1 deletion
@@ -475,6 +475,11 @@ extern {
     dst: *mut u8, dst_stride: libc::ptrdiff_t, tmp1: *const i16,
     tmp2: *const i16, w: i32, h: i32,
   );
+
+  fn rav1e_avg_16bpc_avx2(
+    dst: *mut u16, dst_stride: libc::ptrdiff_t, tmp1: *const i16,
+    tmp2: *const i16, w: i32, h: i32, bitdepth_max: i32,
+  );
 }
 
 cpu_function_lookup_table!(
@@ -483,7 +488,11 @@ cpu_function_lookup_table!(
   [(SSSE3, Some(rav1e_avg_ssse3)), (AVX2, Some(rav1e_avg_avx2))]
 );
 
-cpu_function_lookup_table!(AVG_HBD_FNS: [Option<AvgHBDFn>], default: None, []);
+cpu_function_lookup_table!(
+  AVG_HBD_FNS: [Option<AvgHBDFn>],
+  default: None,
+  [(AVX2, Some(rav1e_avg_16bpc_avx2))]
+);
 
 #[cfg(test)]
 mod test {
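
The change above registers an AVX2 kernel in a table of Option function pointers whose default is None. The general pattern can be sketched as follows. This is a simplified illustration, not rav1e's macro or kernel: CpuLevel, build_table, avg_hbd, avg_hbd_rust and avg_hbd_fast are hypothetical names, the averaging arithmetic is a stand-in, and it is an assumption of this sketch that higher CPU levels inherit kernels registered at lower ones.

```rust
/// Signature shared by every high-bit-depth averaging kernel in this sketch.
type AvgHbdFn =
  fn(dst: &mut [u16], tmp1: &[i16], tmp2: &[i16], bitdepth_max: i32);

/// Hypothetical CPU feature levels, ordered from least to most capable.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum CpuLevel {
  Rust = 0,
  Ssse3 = 1,
  Avx2 = 2,
}

const LEVELS: usize = 3;

/// Portable fallback: rounded average of the two inputs, clamped to the
/// valid sample range. Stand-in arithmetic, not the real kernel's.
fn avg_hbd_rust(
  dst: &mut [u16], tmp1: &[i16], tmp2: &[i16], bitdepth_max: i32,
) {
  for ((d, &a), &b) in dst.iter_mut().zip(tmp1).zip(tmp2) {
    let v = (a as i32 + b as i32 + 1) >> 1;
    *d = v.clamp(0, bitdepth_max) as u16;
  }
}

/// Stand-in for an optimized kernel. A real one would be an extern
/// assembly symbol; here it simply reuses the portable code.
fn avg_hbd_fast(
  dst: &mut [u16], tmp1: &[i16], tmp2: &[i16], bitdepth_max: i32,
) {
  avg_hbd_rust(dst, tmp1, tmp2, bitdepth_max)
}

/// Builds the lookup table. Every slot starts as None; each registered
/// kernel fills its own level and all higher levels (assumed behavior).
fn build_table(
  entries: &[(CpuLevel, AvgHbdFn)],
) -> [Option<AvgHbdFn>; LEVELS] {
  let mut table: [Option<AvgHbdFn>; LEVELS] = [None; LEVELS];
  for &(level, f) in entries {
    for slot in table[level as usize..].iter_mut() {
      *slot = Some(f);
    }
  }
  table
}

/// Dispatches to the registered kernel, or to the portable fallback
/// when the slot for this CPU level is None.
fn avg_hbd(
  table: &[Option<AvgHbdFn>; LEVELS], cpu: CpuLevel, dst: &mut [u16],
  tmp1: &[i16], tmp2: &[i16], bitdepth_max: i32,
) {
  match table[cpu as usize] {
    Some(f) => f(dst, tmp1, tmp2, bitdepth_max),
    None => avg_hbd_rust(dst, tmp1, tmp2, bitdepth_max),
  }
}
```

With only an AVX2 entry registered, as in this commit, a table built this way leaves the lower levels as None, so those CPUs keep using the fallback path while AVX2-capable CPUs pick up the new kernel.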
