Import x86 assembly for dav1d 0.9.1 #2769

Merged: 54 commits, Aug 9, 2021

Commits on Aug 6, 2021

  1. x86: itx: Add 12-bit wht

    anotherwon authored and shssoichiro committed Aug 6, 2021
    a89b1ec
  2. x86: itx: Add 10/12-bit SSE2 WHT

    anotherwon authored and shssoichiro committed Aug 6, 2021
    970691e
  3. 3cf93ec
  4. 8b011e0
  5. 716764a
  6. 0e05d98
  7. 130dd7c
  8. be93308
  9. 7bfa514
  10. x86: Add high bitdepth wiener filter SSSE3 asm

    Victorien Le Couviour--Tuffet authored and shssoichiro committed Aug 6, 2021
    55e0e04
  11. 2533871
  12. Add 10/12-bit deblock SSSE3 implementation

    Currently 64-bit only.
    rbultje authored and shssoichiro committed Aug 6, 2021
    6a65694
  13. 01a837a
  14. x86inc: Support memory operands in src1 in 3-operand instructions

    Particularly in code that makes heavy use of macros it's possible
    to end up with 3-operand instructions with a memory operand in src1.
    In the case of SSE this works fine due to automatic move insertions,
    but in AVX that fails since memory operands are only allowed in src2.
    
    The main purpose of this feature is to minimize the amount of code
    changes required to facilitate conversion of existing SSE code to AVX.
    gramner-twoorioles authored and shssoichiro committed Aug 6, 2021
    1000dbb
  15. 78d4849
  16. 94e07e7
  17. x86: Fix warp_affine_8x8t_16bpc_ssse3 on 64-bit Windows + LLVM

    The stack size calculation ended up being incorrect when the stack
    alignment was larger than 16 due to auto-generated alignment padding.
    gramner-twoorioles authored and shssoichiro committed Aug 6, 2021
    4f688fc
  18. x86: itx: wht: Minor fixes

    * Rename macro for consistency. WHT has exactly one line per register.
    * Use REPX to make code more readable.
    anotherwon authored and shssoichiro committed Aug 6, 2021
    eab2fd6
  19. x86: itx: Port 10-bit 4x4 transforms to SSE4

                                                     64-bit  32-bit
    inv_txfm_add_4x4_adst_adst_0_10bpc_c:            257.0   346.3
    inv_txfm_add_4x4_adst_adst_0_10bpc_sse4:          47.1    51.7
    inv_txfm_add_4x4_adst_adst_0_10bpc_avx2:          57.4
    inv_txfm_add_4x4_adst_adst_1_10bpc_c:            259.8   345.6
    inv_txfm_add_4x4_adst_adst_1_10bpc_sse4:          47.1    52.0
    inv_txfm_add_4x4_adst_adst_1_10bpc_avx2:          56.9
    inv_txfm_add_4x4_adst_dct_0_10bpc_c:             284.6   369.9
    inv_txfm_add_4x4_adst_dct_0_10bpc_sse4:           42.2    46.0
    inv_txfm_add_4x4_adst_dct_0_10bpc_avx2:           51.9
    inv_txfm_add_4x4_adst_dct_1_10bpc_c:             285.2   369.8
    inv_txfm_add_4x4_adst_dct_1_10bpc_sse4:           42.4    45.9
    inv_txfm_add_4x4_adst_dct_1_10bpc_avx2:           51.9
    inv_txfm_add_4x4_adst_flipadst_0_10bpc_c:        262.9   345.0
    inv_txfm_add_4x4_adst_flipadst_0_10bpc_sse4:      46.8    50.1
    inv_txfm_add_4x4_adst_flipadst_0_10bpc_avx2:      57.0
    inv_txfm_add_4x4_adst_flipadst_1_10bpc_c:        262.1   345.6
    inv_txfm_add_4x4_adst_flipadst_1_10bpc_sse4:      46.8    50.3
    inv_txfm_add_4x4_adst_flipadst_1_10bpc_avx2:      57.1
    inv_txfm_add_4x4_adst_identity_0_10bpc_c:        225.6   302.9
    inv_txfm_add_4x4_adst_identity_0_10bpc_sse4:      38.0    42.3
    inv_txfm_add_4x4_adst_identity_0_10bpc_avx2:      41.4
    inv_txfm_add_4x4_adst_identity_1_10bpc_c:        225.7   303.1
    inv_txfm_add_4x4_adst_identity_1_10bpc_sse4:      37.8    42.3
    inv_txfm_add_4x4_adst_identity_1_10bpc_avx2:      41.4
    inv_txfm_add_4x4_dct_adst_0_10bpc_c:             274.6   378.0
    inv_txfm_add_4x4_dct_adst_0_10bpc_sse4:           44.8    48.5
    inv_txfm_add_4x4_dct_adst_0_10bpc_avx2:           50.7
    inv_txfm_add_4x4_dct_adst_1_10bpc_c:             274.0   377.4
    inv_txfm_add_4x4_dct_adst_1_10bpc_sse4:           44.6    48.6
    inv_txfm_add_4x4_dct_adst_1_10bpc_avx2:           51.0
    inv_txfm_add_4x4_dct_dct_0_10bpc_c:               39.2    50.6
    inv_txfm_add_4x4_dct_dct_0_10bpc_sse4:            29.1    33.8
    inv_txfm_add_4x4_dct_dct_0_10bpc_avx2:            29.3
    inv_txfm_add_4x4_dct_dct_1_10bpc_c:              300.6   399.0
    inv_txfm_add_4x4_dct_dct_1_10bpc_sse4:            39.7    44.3
    inv_txfm_add_4x4_dct_dct_1_10bpc_avx2:            48.6
    inv_txfm_add_4x4_dct_flipadst_0_10bpc_c:         278.6   377.8
    inv_txfm_add_4x4_dct_flipadst_0_10bpc_sse4:       45.3    49.6
    inv_txfm_add_4x4_dct_flipadst_0_10bpc_avx2:       50.2
    inv_txfm_add_4x4_dct_flipadst_1_10bpc_c:         277.1   378.3
    inv_txfm_add_4x4_dct_flipadst_1_10bpc_sse4:       45.0    49.7
    inv_txfm_add_4x4_dct_flipadst_1_10bpc_avx2:       50.2
    inv_txfm_add_4x4_dct_identity_0_10bpc_c:         246.9   335.8
    inv_txfm_add_4x4_dct_identity_0_10bpc_sse4:       37.1    41.7
    inv_txfm_add_4x4_dct_identity_0_10bpc_avx2:       37.4
    inv_txfm_add_4x4_dct_identity_1_10bpc_c:         247.2   336.2
    inv_txfm_add_4x4_dct_identity_1_10bpc_sse4:       37.1    41.6
    inv_txfm_add_4x4_dct_identity_1_10bpc_avx2:       37.3
    inv_txfm_add_4x4_flipadst_adst_0_10bpc_c:        259.4   351.7
    inv_txfm_add_4x4_flipadst_adst_0_10bpc_sse4:      47.1    51.8
    inv_txfm_add_4x4_flipadst_adst_0_10bpc_avx2:      57.9
    inv_txfm_add_4x4_flipadst_adst_1_10bpc_c:        258.7   350.8
    inv_txfm_add_4x4_flipadst_adst_1_10bpc_sse4:      47.1    51.8
    inv_txfm_add_4x4_flipadst_adst_1_10bpc_avx2:      57.4
    inv_txfm_add_4x4_flipadst_dct_0_10bpc_c:         282.3   375.4
    inv_txfm_add_4x4_flipadst_dct_0_10bpc_sse4:       42.2    45.8
    inv_txfm_add_4x4_flipadst_dct_0_10bpc_avx2:       52.5
    inv_txfm_add_4x4_flipadst_dct_1_10bpc_c:         283.0   375.8
    inv_txfm_add_4x4_flipadst_dct_1_10bpc_sse4:       42.5    45.9
    inv_txfm_add_4x4_flipadst_dct_1_10bpc_avx2:       52.4
    inv_txfm_add_4x4_flipadst_flipadst_0_10bpc_c:    258.8   356.1
    inv_txfm_add_4x4_flipadst_flipadst_0_10bpc_sse4:  47.3    50.1
    inv_txfm_add_4x4_flipadst_flipadst_0_10bpc_avx2:  57.4
    inv_txfm_add_4x4_flipadst_flipadst_1_10bpc_c:    259.0   355.3
    inv_txfm_add_4x4_flipadst_flipadst_1_10bpc_sse4:  47.8    50.2
    inv_txfm_add_4x4_flipadst_flipadst_1_10bpc_avx2:  57.4
    inv_txfm_add_4x4_flipadst_identity_0_10bpc_c:    228.6   309.4
    inv_txfm_add_4x4_flipadst_identity_0_10bpc_sse4:  37.8    42.0
    inv_txfm_add_4x4_flipadst_identity_0_10bpc_avx2:  41.4
    inv_txfm_add_4x4_flipadst_identity_1_10bpc_c:    229.1   309.6
    inv_txfm_add_4x4_flipadst_identity_1_10bpc_sse4:  37.9    42.2
    inv_txfm_add_4x4_flipadst_identity_1_10bpc_avx2:  41.3
    inv_txfm_add_4x4_identity_adst_0_10bpc_c:        200.8   275.8
    inv_txfm_add_4x4_identity_adst_0_10bpc_sse4:      39.0    43.9
    inv_txfm_add_4x4_identity_adst_0_10bpc_avx2:      47.4
    inv_txfm_add_4x4_identity_adst_1_10bpc_c:        200.8   276.5
    inv_txfm_add_4x4_identity_adst_1_10bpc_sse4:      39.0    44.0
    inv_txfm_add_4x4_identity_adst_1_10bpc_avx2:      47.2
    inv_txfm_add_4x4_identity_dct_0_10bpc_c:         226.4   300.3
    inv_txfm_add_4x4_identity_dct_0_10bpc_sse4:       36.9    41.7
    inv_txfm_add_4x4_identity_dct_0_10bpc_avx2:       42.8
    inv_txfm_add_4x4_identity_dct_1_10bpc_c:         229.0   300.6
    inv_txfm_add_4x4_identity_dct_1_10bpc_sse4:       36.8    41.6
    inv_txfm_add_4x4_identity_dct_1_10bpc_avx2:       42.7
    inv_txfm_add_4x4_identity_flipadst_0_10bpc_c:    202.6   278.9
    inv_txfm_add_4x4_identity_flipadst_0_10bpc_sse4:  39.2    43.7
    inv_txfm_add_4x4_identity_flipadst_0_10bpc_avx2:  47.1
    inv_txfm_add_4x4_identity_flipadst_1_10bpc_c:    202.6   279.3
    inv_txfm_add_4x4_identity_flipadst_1_10bpc_sse4:  39.2    43.8
    inv_txfm_add_4x4_identity_flipadst_1_10bpc_avx2:  47.0
    inv_txfm_add_4x4_identity_identity_0_10bpc_c:    168.7   235.9
    inv_txfm_add_4x4_identity_identity_0_10bpc_sse4:  31.7    37.6
    inv_txfm_add_4x4_identity_identity_0_10bpc_avx2:  33.9
    inv_txfm_add_4x4_identity_identity_1_10bpc_c:    169.1   235.7
    inv_txfm_add_4x4_identity_identity_1_10bpc_sse4:  31.7    37.4
    inv_txfm_add_4x4_identity_identity_1_10bpc_avx2:  33.8
    anotherwon authored and shssoichiro committed Aug 6, 2021
    eb40a72
  20. 1050c1d
  21. ecf8bed
  22. 4243642
  23. x86: itx4: Inline transpose

    Saves one move.
    anotherwon authored and shssoichiro committed Aug 6, 2021
    e6a7c42
  24. 9464e7b
  25. 23c57d1
  26. f4d9939
  27. 86d027b
  28. 95a5c6c
  29. 55e7c79
  30. 47e957a
  31. 3741624
  32. 3a7b019
  33. ee3c498
  34. x86: Add minor improvements to wiener16 SSSE3 asm

    Victorien Le Couviour--Tuffet authored and shssoichiro committed Aug 6, 2021
    2dcdbf0
  35. x86: Add high bitdepth (10-bit) sgr SSSE3 asm

    Victorien Le Couviour--Tuffet authored and shssoichiro committed Aug 6, 2021
    786f44a
  36. 52748e3
  37. bf9d4a4
  38. 8df07e4
  39. e2f6e5f
  40. 0dabf25
  41. ade10cc
  42. x86/itx: change function signatures of itx_4x4 to 0 GPRs

    The wrapper function already backs up GPRs, and declaring 7 here means
    we will backup/restore twice on x86-32.
    rbultje authored and shssoichiro committed Aug 6, 2021
    969d156
  43. 5c2ec59
  44. ff3b64c
  45. d84a00b
  46. 681aa00
  47. e8d249a
  48. 3b7def0
  49. c66786b
  50. c422356
  51. 05cc987
  52. abba2c2
  53. 3014f67
  54. 76d531e
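
The timings listed for commit 19 (eb40a72) can be turned into speedup ratios with a few lines of Python. This is only a minimal sketch: the three 64-bit figures are copied from the `inv_txfm_add_4x4_dct_dct_1_10bpc` rows above, the unit is left unspecified because the commit message does not state it, and the helper name `speedup` is hypothetical.

```python
# 64-bit timings for inv_txfm_add_4x4_dct_dct_1_10bpc, copied from commit 19.
# Lower is better; the commit message does not state the unit.
timings = {"c": 300.6, "sse4": 39.7, "avx2": 48.6}


def speedup(baseline: float, optimized: float) -> float:
    """Return how many times faster `optimized` is than `baseline`."""
    return baseline / optimized


# Compare each assembly version against the C reference.
ratios = {
    name: speedup(timings["c"], value)
    for name, value in timings.items()
    if name != "c"
}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x faster than C")
```

For this row the SSE4 figure is lower than the AVX2 one, so its ratio against the C reference comes out higher. The same calculation applies to any other row by swapping in its three numbers.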