Merge dav1d assembly for AArch64, by release #1754

vibhoothi · 2019-10-11T12:40:16Z

Releases of dav1d

Extract dav1d subtree history

# Split the history of src/arm/64 into one branch per release tag
for TAG in $(git -C dav1d tag -l)
do git -C dav1d subtree split -P src/arm/64 -b "src-arm-64-$TAG" "$TAG"
done

Apply to rav1e, restoring the prefix

git -C dav1d format-patch --no-stat --keep-subject --stdout --full-index -M100% \
--src-prefix=a/src/arm/64/ --dst-prefix=b/src/arm/64/ ..src-arm-64-0.1.0 -- '*.S' |
git -C rav1e am --keep
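
The command above covers only the 0.1.0 history. As a rough sketch of how later releases might be fed through the same pipeline, the hypothetical helper `release_ranges` (not part of dav1d or rav1e) prints one revision range per consecutive pair of releases. It assumes the split branches from the extraction step exist and that consecutive split branches share history, which this thread does not confirm.

```shell
# Hypothetical helper: given release tags in order, print the
# "older..newer" range between each consecutive pair of split branches.
release_ranges() {
    prev=""
    for tag in "$@"; do
        if [ -n "$prev" ]; then
            printf '%s..%s\n' "src-arm-64-$prev" "src-arm-64-$tag"
        fi
        prev=$tag
    done
}

# Example: ranges for the first three releases.
release_ranges 0.1.0 0.2.0 0.2.1
```

Each printed range could then take the place of `..src-arm-64-0.1.0` in the format-patch command, one release at a time.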

Reference #1750


shssoichiro · 2019-10-11T12:42:01Z

This process will be a little bit different since we don't have ARM assembly in rav1e yet. But adding dav1d's ARM assembly is a great plan!

vibhoothi · 2019-10-11T12:45:08Z

@shssoichiro Yeah,
we have been working on improving the build system for AArch64,
and with this we are slowly bringing in the dav1d ARM assembly.

EwoutH · 2019-10-14T09:22:18Z

@vibhoothiiaanand Great work, it would be amazing if rav1e works faster on ARMv8 CPUs, especially as rav1e's multithreading gets better.

vibhoothi · 2019-11-19T15:49:41Z

Ok,

Here are the new updates from this week

We tried to incorporate the CDEF changes, but it is not easily doable: we would have to rework the memory buffers to support a variable stride for the input buffer, which would take time and would be a blocker. So what we are going to do now is bring over for AArch64 all the changes that are already integrated for x86, get them passing all tests, and then run a benchmark.

Here is the summary of changes from upstream

0.2.0 has CDEF, ARM Optimisations for MC, LR Improvements
0.2.1 has smart padding for CDEF
0.2.2 has SGR Looprestoration, Loopfiltering
0.3.0 No changes for src/arm/
0.3.1 has cold attribute changes + msac_decode_symbol_adapt
0.4.0 has msac_decode_bool, blend, w_mask, inv_txfm_add
0.5.0 has msac optimisations, blend_h, blend_v, w_mask, intra_pred_(dc/h/v), paeth, smooth, palette, filter, cfl_pred, cfl_ac.
0.5.1 has looprestoration improvements
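
The per-release summary above can be cross-checked against upstream history. A minimal sketch, assuming a local dav1d clone that carries the release tags; the helper name `arm64_commits` is made up for illustration:

```shell
# Hypothetical helper: list commits touching src/arm/64 between two tags.
# $1 = path to a dav1d checkout, $2 = older tag, $3 = newer tag
arm64_commits() {
    git -C "$1" log --oneline --no-merges "$2..$3" -- src/arm/64
}

# Example (assumes a dav1d clone in ./dav1d with the release tags present):
# arm64_commits dav1d 0.4.0 0.5.0
```

An empty result for a pair of tags would match an entry such as "No changes for src/arm/".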

So the interesting parts start from 0.4.0.
Below is the list of commits from 0.2.0 through 0.5.0. Commits relevant for integration are in green (prefixed with +); commits that will be added but not integrated are in red (prefixed with -). The split follows what is already integrated for x86 from dav1d.

- arm64: looprestoration: Minimal scheduling improvements
- arm64: looprestoration: Fix a typo  …
- arm64: looprestoration: Fix register references in comments
- arm64: looprestoration: Use ld2r instead of ld1+dup+dup
+ arm64: ipred: Make sure all symbols are aligned 
+ arm: util: Split movrel into movrel and movrel_local
- arm64: ipred: NEON implementation of the cfl_ac functions  
+ arm64: ipred: NEON implementation of the cfl_pred functions  
- arm64: ipred: NEON implementation of the filter function  
- arm64: ipred: NEON implementation of palette prediction  
+ arm64: ipred: NEON implementation of smooth prediction  
+ arm64: ipred: NEON implementation of paeth prediction  
+ arm64: mc: Use addp instead of addv+trn1 in warp  
- arm64: cdef: Improve find_dir  
- arm64: cdef: Calculate two initial parameters in the same vector  
- arm64: cdef: Use loads with postincrement in more places in the padding function
- arm64: cdef: Rewrite an expression slightly  
+ arm64: mc: Schedule instructions better in the warp8x8 functions  
+ arm64: mc: Use sbfx instead of ubfx+sxth in the warp function
+ arm64: ipred: NEON implementation of dc/h/v prediction modes  
+ arm64: itx: Fix overflows in idct  
+ arm64: itx: Consistently use the factor 2896 in adst  
+ arm64: itx: Use smull+smlal instead of addl+mul  
+ arm64: itx: Do the final calculation of adst4/adst8/adst16 in 32 bit to avoid too narrow clipping  
- arm64: mc: NEON implementation of w_mask_444/422/420 function  
- arm64: mc: NEON implementation of blend, blend_h and blend_v function  
- Add msac optimizations  
+ arm64: itx: Add NEON optimized inverse transforms  
+ arm64: Consistently name macro arguments tX for temporaries in transposes
- arm64: msac: Add handwritten versions of msac_decode_bool functions  
- arm64: msac: Fix a typo in a comment
- Add __attribute__((cold)) to rarely used functions
- arm64: remove invalid macro argument delimiter
- arm64: msac: Implement NEON msac_decode_symbol_adapt  
- arm64: loopfilter: Implement NEON loop filters  
- arm64: looprestoration: Add a NEON implementation of SGR  
- arm64: cdef: Clarify a slightly confusing comment  
- arm64: cdef: Use a smarter padding constant  
- arm64: cdef: Do saturating subtractions to avoid max operations with 0  
+ fix dav1d spelling
- arm64/ios: use prefixed dav1d_mc_warp_filter symbol
- arm64: mc: NEON implementation of warp8x8{,t}  
- arm64: cdef: NEON implementation of the dir function  
- arm64: cdef: NEON optimized cdef filter function  
- arm64: looprestoration: Optimize loop termination checks in copy_narrow_neon
- arm64: looprestoration: Simplify the horizontal filtering of one pixel at a time
- arm64: looprestoration: Simplify the setup of wiener_filter_v_neon
- arm64: looprestoration: Fix the loop condition in copy_narrow_neon  
- arm64: looprestoration: Fix comment typos
- arm64: looprestoration: Avoid unnecessary alignment of the mid buffer  
+ arm64: mc: Optimize mc_8tap_regular_w4_hv_8bpc for A53  
+ arm64: mc: Simplify the 8tap_2w_hv code slightly  
+ arm64: mc: Optimize the mul_mla_8_* macros for Cortex A53  
+ arm64: mc: Improve a comment
+ arm64: mc: Remove unused/unnecessary macro args
- arm64: mc: Use ubfx instead of ubfm, for consistency with arm  
- arm64: looprestoration: NEON optimized wiener filter  
- arm64: mc: Implement 8tap and bilin functions

barrbrain added this to To do in Hacktoberfest 2019 via automation Oct 11, 2019

barrbrain added the hacktoberfest label Oct 11, 2019

shssoichiro added SIMD Architecture-specific SIMD optimization speed performance labels Oct 14, 2019

This was referenced Oct 19, 2019

Merge dav1d 0.1.0 AArch64 assembly #1784

Merged

Merge dav1d 0.3.0 AArch64 Assembly #1791

Merged

barrbrain assigned vibhoothi Oct 23, 2019

barrbrain moved this from To do to In progress in Hacktoberfest 2019 Oct 23, 2019

This was referenced Nov 21, 2019

Merge dav1d 0.4.0 AArch64 Assembly #1865

Merged

Merge dav1d 0.5.1 AArch64 Assembly #1868

Merged

barrbrain closed this as completed in #1868 Nov 23, 2019

Hacktoberfest 2019 automation moved this from In progress to Done Nov 23, 2019
