{"payload":{"feedbackUrl":"https://github.com/orgs/community/discussions/53140","repo":{"id":154475642,"defaultBranch":"master","name":"dav1d","ownerLogin":"videolan","currentUserCanPush":false,"isFork":false,"isEmpty":false,"createdAt":"2018-10-24T09:37:02.000Z","ownerAvatar":"https://avatars.githubusercontent.com/u/1389585?v=4","public":true,"private":false,"isOrgOwned":true},"refInfo":{"name":"","listCacheKey":"v0:1718200542.0","currentOid":""},"activityList":{"items":[{"before":"01b94cc33ba1ac5d53b085e57af2902a5054de7a","after":"ca83ee6d9dd2c2210deb8e285de4fd72e929e390","ref":"refs/heads/master","pushedAt":"2024-06-17T16:53:42.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"vlc-mirrorer","name":null,"path":"/vlc-mirrorer","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/20596405?s=80&v=4"},"commit":{"message":"itx: restrict number of columns iterated over based on EOB","shortMessageHtmlLink":"itx: restrict number of columns iterated over based on EOB"}},{"before":"92f592ed104ba92ad35c781ee93f354525eef503","after":"01b94cc33ba1ac5d53b085e57af2902a5054de7a","ref":"refs/heads/master","pushedAt":"2024-06-11T13:22:03.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"vlc-mirrorer","name":null,"path":"/vlc-mirrorer","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/20596405?s=80&v=4"},"commit":{"message":"cli: Prevent buffer over-read","shortMessageHtmlLink":"cli: Prevent buffer over-read"}},{"before":"da2cc7817cff218b30f2c813a8a142a43f9376bd","after":"92f592ed104ba92ad35c781ee93f354525eef503","ref":"refs/heads/master","pushedAt":"2024-06-10T11:13:24.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"vlc-mirrorer","name":null,"path":"/vlc-mirrorer","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/20596405?s=80&v=4"},"commit":{"message":"AArch64: Fix potential out of bounds access in DotProd H/HV filters\n\nThe DotProd/I8MM horizontal and HV/2D subpel filters use -4 offset\nfor sampling instead of -3 to be better aligned in some cases. This\nresulted in an out of bounds access, which led to crashes.\n\nThis patch fixes it.","shortMessageHtmlLink":"AArch64: Fix potential out of bounds access in DotProd H/HV filters"}},{"before":"ca156d90b8745273604674967186ae5b38208a3d","after":"da2cc7817cff218b30f2c813a8a142a43f9376bd","ref":"refs/heads/master","pushedAt":"2024-05-28T10:52:57.000Z","pushType":"push","commitsCount":4,"pusher":{"login":"vlc-mirrorer","name":null,"path":"/vlc-mirrorer","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/20596405?s=80&v=4"},"commit":{"message":"x86: Eliminate hardcoded struct offsets in refmvs load_tmvs() asm","shortMessageHtmlLink":"x86: Eliminate hardcoded struct offsets in refmvs load_tmvs() asm"}},{"before":"805d9e5a8ffce3ef78cebde4bfedf3642907b2d3","after":"ca156d90b8745273604674967186ae5b38208a3d","ref":"refs/heads/master","pushedAt":"2024-05-27T15:24:59.000Z","pushType":"push","commitsCount":4,"pusher":{"login":"vlc-mirrorer","name":null,"path":"/vlc-mirrorer","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/20596405?s=80&v=4"},"commit":{"message":"x86: Add 6-tap variants of 8bpc mc SSSE3 functions","shortMessageHtmlLink":"x86: Add 6-tap variants of 8bpc mc SSSE3 functions"}},{"before":"3623543c4117f413110b27b5c20c6ae1638a22f9","after":"805d9e5a8ffce3ef78cebde4bfedf3642907b2d3","ref":"refs/heads/master","pushedAt":"2024-05-25T08:26:57.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"vlc-mirrorer","name":null,"path":"/vlc-mirrorer","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/20596405?s=80&v=4"},"commit":{"message":"Update NEWS for 1.4.2","shortMessageHtmlLink":"Update NEWS for 1.4.2"}},{"before":"bb948769e351374b5cc16565cc309ea31f4cf360","after":"3623543c4117f413110b27b5c20c6ae1638a22f9","ref":"refs/heads/master","pushedAt":"2024-05-20T13:18:18.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"vlc-mirrorer","name":null,"path":"/vlc-mirrorer","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/20596405?s=80&v=4"},"commit":{"message":"ARM64: Minor improvement to symbol decode\n\nUse a slightly shorter series of instructions to compute cdf update\nrate.","shortMessageHtmlLink":"ARM64: Minor improvement to symbol decode"}},{"before":"9469e18458c92d1e606b138cc58d8158994e9234","after":"bb948769e351374b5cc16565cc309ea31f4cf360","ref":"refs/heads/master","pushedAt":"2024-05-20T12:42:15.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"vlc-mirrorer","name":null,"path":"/vlc-mirrorer","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/20596405?s=80&v=4"},"commit":{"message":"tests: Verify dav1d command line in dav1d_argon.bash\n\nError out early instead of producing bogus mismatch errors in case\nof an incorrect cpu mask for example.","shortMessageHtmlLink":"tests: Verify dav1d command line in dav1d_argon.bash"}},{"before":"37155c11474c1812de6c96656fb167a67d307a37","after":"9469e18458c92d1e606b138cc58d8158994e9234","ref":"refs/heads/master","pushedAt":"2024-05-20T12:07:01.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"vlc-mirrorer","name":null,"path":"/vlc-mirrorer","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/20596405?s=80&v=4"},"commit":{"message":"arm64: msac: Explicitly use the ldur instruction\n\nThe ldr instruction can take an immediate offset which is a multiple\nof the loaded element size. If the ldr instruction is given an\nimmediate offset which isn't a multiple of the element size,\nmost assemblers implicitly generate a \"ldur\" instruction instead.\n\nOlder versions of MS armasm64.exe don't do this, but instead error\nout with \"error A2518: operand 2: Memory offset must be aligned\".\n(Current versions don't do this but correctly generate \"ldur\"\nimplicitly.)\n\nSwitch this instruction to an explicit \"ldur\", like we do elsewhere,\nto fix building with these older tools.","shortMessageHtmlLink":"arm64: msac: Explicitly use the ldur instruction"}},{"before":"7f68f23c2731fe13b6c7d26c7d0c38b874be4c5d","after":"37155c11474c1812de6c96656fb167a67d307a37","ref":"refs/heads/master","pushedAt":"2024-05-18T10:25:21.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"vlc-mirrorer","name":null,"path":"/vlc-mirrorer","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/20596405?s=80&v=4"},"commit":{"message":"CI: Update Android image\n\nNDK 26 dropped support for API versions 19 and 20 (KitKat, Android 4.4).\nThe minimum supported API is now 21 (Lollipop, Android 5.0).","shortMessageHtmlLink":"CI: Update Android image"}},{"before":"d835c6bf69d074c57b416c867c2586940a39adbf","after":"7f68f23c2731fe13b6c7d26c7d0c38b874be4c5d","ref":"refs/heads/master","pushedAt":"2024-05-18T10:02:26.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"vlc-mirrorer","name":null,"path":"/vlc-mirrorer","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/20596405?s=80&v=4"},"commit":{"message":"ARM64: Various optimizations for symbol decode\n\nChanges stem from redesigning the reduction stage of the multisymbol\ndecode function.\n* No longer use adapt4 for 5 possible symbol values\n* Specialize reduction for 4/8/16 decode functions\n* Modify control flow\n\n+------------------------+--------------+--------------+---------------+\n|                        |  Neoverse V1 |  Neoverse N1 |   Cortex A72  |\n|                        | (Graviton 3) | (Graviton 2) |  (Graviton 1) |\n+------------------------+-------+------+-------+------+-------+-------+\n|                        |  Old  |  New |  Old  |  New |  Old  |  New  |\n+------------------------+-------+------+-------+------+-------+-------+\n| decode_bool_neon       |  13.0 | 12.9 |  14.9 | 14.0 |  39.3 |  29.0 |\n+------------------------+-------+------+-------+------+-------+-------+\n| decode_bool_adapt_neon |  15.4 | 15.6 |  17.5 | 16.8 |  41.6 |  33.5 |\n+------------------------+-------+------+-------+------+-------+-------+\n| decode_bool_equi_neon  |  11.3 | 12.0 |  14.0 | 12.2 |  35.0 |  26.3 |\n+------------------------+-------+------+-------+------+-------+-------+\n| decode_hi_tok_c        |  73.7 | 57.8 |  73.4 | 60.5 | 130.1 | 103.9 |\n+------------------------+-------+------+-------+------+-------+-------+\n| decode_hi_tok_neon     |  63.3 | 48.2 |  65.2 | 51.2 | 119.0 | 105.3 |\n+------------------------+-------+------+-------+------+-------+-------+\n| decode_symbol_\\        |  28.6 | 22.5 |  28.4 | 23.5 |  67.8 |  55.1 |\n| adapt4_neon            |       |      |       |      |       |       |\n+------------------------+-------+------+-------+------+-------+-------+\n| decode_symbol_\\        |  29.5 | 26.6 |  29.0 | 28.8 |  76.6 |  74.0 |\n| adapt8_neon            |       |      |       |      |       |       |\n+------------------------+-------+------+-------+------+-------+-------+\n| decode_symbol_\\        |  31.6 | 31.2 |  33.3 | 33.0 |  77.5 |  68.1 |\n| adapt16_neon           |       |      |       |      |       |       |\n+------------------------+-------+------+-------+------+-------+-------+","shortMessageHtmlLink":"ARM64: Various optimizations for symbol decode"}},{"before":"841853031b41ddf84cb7c7a183c14450c17df054","after":"d835c6bf69d074c57b416c867c2586940a39adbf","ref":"refs/heads/master","pushedAt":"2024-05-14T15:17:33.000Z","pushType":"push","commitsCount":3,"pusher":{"login":"vlc-mirrorer","name":null,"path":"/vlc-mirrorer","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/20596405?s=80&v=4"},"commit":{"message":"AArch64: Optimize prep_neon function\n\nOptimize the widening copy part of subpel filters (the prep_neon\nfunction). In this patch we combine widening shifts with widening\nmultiplications in the inner loops to get maximum throughput.\n\nThe change will increase .text by 36 bytes.\n\nRelative performance of micro benchmarks (lower is better):\n\nCortex-A55:\n  mct_w4:   0.795x\n  mct_w8:   0.913x\n  mct_w16:  0.912x\n  mct_w32:  0.838x\n  mct_w64:  1.025x\n  mct_w128: 1.002x\n\nCortex-A510:\n  mct_w4:   0.760x\n  mct_w8:   0.636x\n  mct_w16:  0.640x\n  mct_w32:  0.854x\n  mct_w64:  0.864x\n  mct_w128: 0.995x\n\nCortex-A72:\n  mct_w4:   0.616x\n  mct_w8:   0.854x\n  mct_w16:  0.756x\n  mct_w32:  1.052x\n  mct_w64:  1.044x\n  mct_w128: 0.702x\n\nCortex-A76:\n  mct_w4:   0.837x\n  mct_w8:   0.797x\n  mct_w16:  0.841x\n  mct_w32:  0.804x\n  mct_w64:  0.948x\n  mct_w128: 0.904x\n\nCortex-A78:\n  mct_w16:  0.542x\n  mct_w32:  0.725x\n  mct_w64:  0.741x\n  mct_w128: 0.745x\n\nCortex-A715:\n  mct_w16:  0.561x\n  mct_w32:  0.720x\n  mct_w64:  0.740x\n  mct_w128: 0.748x\n\nCortex-X1:\n  mct_w32:  0.886x\n  mct_w64:  0.882x\n  mct_w128: 0.917x\n\nCortex-X3:\n  mct_w32:  0.835x\n  mct_w64:  0.803x\n  mct_w128: 0.808x","shortMessageHtmlLink":"AArch64: Optimize prep_neon function"}},{"before":"8141546da973cec34f69fc28314aa83624966e11","after":"841853031b41ddf84cb7c7a183c14450c17df054","ref":"refs/heads/master","pushedAt":"2024-05-14T13:44:33.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"vlc-mirrorer","name":null,"path":"/vlc-mirrorer","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/20596405?s=80&v=4"},"commit":{"message":"x86: Update x86inc.asm\n\nhttps://code.videolan.org/videolan/x86inc.asm/-/commit/b6ba1e3045d758fd6c6e24591dac21a3dc812e1d","shortMessageHtmlLink":"x86: Update x86inc.asm"}},{"before":"cc1137c85b5cc7e082e7040f4160e5a6da1f06ff","after":"8141546da973cec34f69fc28314aa83624966e11","ref":"refs/heads/master","pushedAt":"2024-05-13T19:51:30.000Z","pushType":"push","commitsCount":3,"pusher":{"login":"vlc-mirrorer","name":null,"path":"/vlc-mirrorer","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/20596405?s=80&v=4"},"commit":{"message":"AArch64: Optimize put_neon function\n\nOptimize the copy part of subpel filters (the put_neon function).\nFor small block sizes (<16) the usage of general purpose registers\nis usually the best way to do the copy.\n\nRelative performance of micro benchmarks (lower is better):\n\nCortex-A55:\n  w2:   0.991x\n  w4:   0.992x\n  w8:   0.999x\n  w16:  0.875x\n  w32:  0.775x\n  w64:  0.914x\n  w128: 0.998x\n\nCortex-A510:\n  w2:   0.159x\n  w4:   0.080x\n  w8:   0.583x\n  w16:  0.588x\n  w32:  0.966x\n  w64:  1.111x\n  w128: 0.957x\n\nCortex-A76:\n  w2:   0.903x\n  w4:   0.683x\n  w8:   0.944x\n  w16:  0.948x\n  w32:  0.919x\n  w64:  0.855x\n  w128: 0.991x\n\nCortex-A78:\n  w32:  0.867x\n  w64:  0.820x\n  w128: 1.011x\n\nCortex-A715:\n  w32:  0.834x\n  w64:  0.778x\n  w128: 1.000x\n\nCortex-X1:\n  w32:  0.809x\n  w64:  0.762x\n  w128: 1.000x\n\nCortex-X3:\n  w32: 0.733x\n  w64: 0.720x\n  w128: 0.999x","shortMessageHtmlLink":"AArch64: Optimize put_neon function"}},{"before":"a6d57b11401f1ac3f65dfa20995c296a08a4802f","after":"cc1137c85b5cc7e082e7040f4160e5a6da1f06ff","ref":"refs/heads/master","pushedAt":"2024-05-13T12:39:11.000Z","pushType":"push","commitsCount":2,"pusher":{"login":"vlc-mirrorer","name":null,"path":"/vlc-mirrorer","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/20596405?s=80&v=4"},"commit":{"message":"checkasm: Eliminate unreachable code in the Windows exception handler","shortMessageHtmlLink":"checkasm: Eliminate unreachable code in the Windows exception handler"}},{"before":"2d2c6c65a5e6a07f349f127168d98abe1ffe26ce","after":"a6d57b11401f1ac3f65dfa20995c296a08a4802f","ref":"refs/heads/master","pushedAt":"2024-05-12T14:49:20.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"vlc-mirrorer","name":null,"path":"/vlc-mirrorer","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/20596405?s=80&v=4"},"commit":{"message":"AArch64: Optimize the init of DotProd+ 2D subpel filters\n\nRemoved some unnecessary vector register copies from the initial\nhorizontal filter parts of the HV subpel filters. The performance\nimprovements are better for the smaller filter block sizes.\n\nThe narrowing shifts were also rewritten at the end of the *filter8*\nbecause it was only beneficial for the Cortex-A55 among the DotProd\ncapable CPU cores. On other out-of-order or newer CPUs the UZP1+SHRN\ninstruction combination is better.\n\nRelative performance of micro benchmarks (lower is better):\n\nCortex-A55:\n  mct regular w4:  0.980x\n  mct regular w8:  1.007x\n  mct regular w16: 1.007x\n\n  mct sharp w4:    0.983x\n  mct sharp w8:    1.012x\n  mct sharp w16:   1.005x\n\nCortex-A510:\n  mct regular w4:  0.935x\n  mct regular w8:  0.984x\n  mct regular w16: 0.986x\n\n  mct sharp w4:    0.927x\n  mct sharp w8:    0.983x\n  mct sharp w16:   0.987x\n\nCortex-A78:\n  mct regular w4:  0.974x\n  mct regular w8:  0.988x\n  mct regular w16: 0.991x\n\n  mct sharp w4:    0.971x\n  mct sharp w8:    0.987x\n  mct sharp w16:   0.979x\n\nCortex-715:\n  mct regular w4:  0.958x\n  mct regular w8:  0.993x\n  mct regular w16: 0.998x\n\n  mct sharp w4:    0.974x\n  mct sharp w8:    0.991x\n  mct sharp w16:   0.997x\n\nCortex-X1:\n  mct regular w4:  0.983x\n  mct regular w8:  0.993x\n  mct regular w16: 0.996x\n\n  mct sharp w4:    0.974x\n  mct sharp w8:    0.990x\n  mct sharp w16:   0.995x\n\nCortex-X3:\n  mct regular w4:  0.953x\n  mct regular w8:  0.993x\n  mct regular w16: 0.997x\n\n  mct sharp w4:    0.981x\n  mct sharp w8:    0.993x\n  mct sharp w16:   0.995x","shortMessageHtmlLink":"AArch64: Optimize the init of DotProd+ 2D subpel filters"}},{"before":"643195f5468bf7e35ad9c0f4fdbd59de0ba6925d","after":"2d2c6c65a5e6a07f349f127168d98abe1ffe26ce","ref":"refs/heads/master","pushedAt":"2024-05-10T19:19:07.000Z","pushType":"push","commitsCount":3,"pusher":{"login":"vlc-mirrorer","name":null,"path":"/vlc-mirrorer","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/20596405?s=80&v=4"},"commit":{"message":"ppc: Loopfilter targeting pwr9\n\nIt relies on vec_absd and vec_xst_len.","shortMessageHtmlLink":"ppc: Loopfilter targeting pwr9"}},{"before":"b2eca1aca7b055ec6255ebb286edab080a377526","after":"643195f5468bf7e35ad9c0f4fdbd59de0ba6925d","ref":"refs/heads/master","pushedAt":"2024-05-09T09:06:02.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"vlc-mirrorer","name":null,"path":"/vlc-mirrorer","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/20596405?s=80&v=4"},"commit":{"message":"AArch64: Optimize 2D i8mm subpel filters\n\nRewrite the accumulator initializations of the horizontal part of the\n2D filters with zero register fills. It can improve the performance\non out-of-order CPUs which can fill vector registers by zero with\nzero latency. Zeroed accumulators imply the usage of the rounding\nshifts at the end of filters.\n\nThe only exception is the very short *hv_filter4*, where the longer\nlatency of rounding shift could decrease the performance.\n\nThe *filter8* function uses a different (alternating) dot product\ncomputation order for DotProd+ feature level, it gives a better\noverall performance for out-of-order and some in-order CPU cores.\n\nThe i8mm version does not need to use bias for the loaded samples, so\na different instruction scheduling is beneficial mostly affecting the\norder of TBL instructions in the 8-tap case.\n\nRelative performance of micro benchmarks (lower is better):\n\nCortex-X3:\n  mct_8tap_regular_w16_hv_8bpc_i8mm:  0.982x\n  mct_8tap_sharp_w16_hv_8bpc_i8mm:    0.979x\n  mct_8tap_regular_w8_hv_8bpc_i8mm:   0.972x\n  mct_8tap_sharp_w8_hv_8bpc_i8mm:     0.969x\n  mct_8tap_regular_w4_hv_8bpc_i8mm:   0.942x\n  mct_8tap_sharp_w4_hv_8bpc_i8mm:     0.935x\n  mc_8tap_regular_w16_hv_8bpc_i8mm:   0.988x\n  mc_8tap_sharp_w16_hv_8bpc_i8mm:     0.982x\n  mc_8tap_regular_w8_hv_8bpc_i8mm:    0.981x\n  mc_8tap_sharp_w8_hv_8bpc_i8mm:      0.975x\n  mc_8tap_regular_w4_hv_8bpc_i8mm:    0.998x\n  mc_8tap_sharp_w4_hv_8bpc_i8mm:      0.996x\n  mc_8tap_regular_w2_hv_8bpc_i8mm:    1.006x\n  mc_8tap_sharp_w2_hv_8bpc_i8mm:      0.993x\n\nCortex-A715:\n  mct_8tap_regular_w16_hv_8bpc_i8mm:  0.883x\n  mct_8tap_sharp_w16_hv_8bpc_i8mm:    0.931x\n  mct_8tap_regular_w8_hv_8bpc_i8mm:   0.882x\n  mct_8tap_sharp_w8_hv_8bpc_i8mm:     0.928x\n  mct_8tap_regular_w4_hv_8bpc_i8mm:   0.969x\n  mct_8tap_sharp_w4_hv_8bpc_i8mm:     0.934x\n  mc_8tap_regular_w16_hv_8bpc_i8mm:   0.881x\n  mc_8tap_sharp_w16_hv_8bpc_i8mm:     0.925x\n  mc_8tap_regular_w8_hv_8bpc_i8mm:    0.879x\n  mc_8tap_sharp_w8_hv_8bpc_i8mm:      0.925x\n  mc_8tap_regular_w4_hv_8bpc_i8mm:    0.917x\n  mc_8tap_sharp_w4_hv_8bpc_i8mm:      0.976x\n  mc_8tap_regular_w2_hv_8bpc_i8mm:    0.915x\n  mc_8tap_sharp_w2_hv_8bpc_i8mm:      0.972x\n\nCortex-A510:\n  mct_8tap_regular_w16_hv_8bpc_i8mm:  0.994x\n  mct_8tap_sharp_w16_hv_8bpc_i8mm:    0.949x\n  mct_8tap_regular_w8_hv_8bpc_i8mm:   0.987x\n  mct_8tap_sharp_w8_hv_8bpc_i8mm:     0.947x\n  mct_8tap_regular_w4_hv_8bpc_i8mm:   1.002x\n  mct_8tap_sharp_w4_hv_8bpc_i8mm:     0.999x\n  mc_8tap_regular_w16_hv_8bpc_i8mm:   0.989x\n  mc_8tap_sharp_w16_hv_8bpc_i8mm:     1.003x\n  mc_8tap_regular_w8_hv_8bpc_i8mm:    0.986x\n  mc_8tap_sharp_w8_hv_8bpc_i8mm:      1.000x\n  mc_8tap_regular_w4_hv_8bpc_i8mm:    1.007x\n  mc_8tap_sharp_w4_hv_8bpc_i8mm:      1.000x\n  mc_8tap_regular_w2_hv_8bpc_i8mm:    1.005x\n  mc_8tap_sharp_w2_hv_8bpc_i8mm:      1.000x","shortMessageHtmlLink":"AArch64: Optimize 2D i8mm subpel filters"}},{"before":"d1bdf4f1ff4bae70834d9e5391bb68b75c1c9111","after":"b2eca1aca7b055ec6255ebb286edab080a377526","ref":"refs/heads/master","pushedAt":"2024-05-08T21:38:17.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"vlc-mirrorer","name":null,"path":"/vlc-mirrorer","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/20596405?s=80&v=4"},"commit":{"message":"AArch64: Optimize vertical i8mm subpel filters\n\nReplace the accumulator initializations of the vertical subpel\nfilters with register fills by zeros (which are usually zero latency\noperations in this feature class), this implies the usage of rounding\nshifts at the end in the prep cases. Out-of-order CPU cores can\nbenefit from this change.\n\nThe width=16 case uses a simpler register duplication scheme that\nrelies on MOV instructions for the subsequent shuffles. This approach\nuses a different register to load the data into for better instruction\nscheduling and data dependency chain.\n\nRelative performance of micro benchmarks (lower is better):\n\nCortex-X3:\nmct_8tap_sharp_w16_v_8bpc_i8mm:\t0.910x\nmct_8tap_sharp_w8_v_8bpc_i8mm: \t0.986x\n\nmc_8tap_sharp_w16_v_8bpc_i8mm: \t0.864x\nmc_8tap_sharp_w8_v_8bpc_i8mm:  \t0.882x\nmc_8tap_sharp_w4_v_8bpc_i8mm:  \t0.933x\nmc_8tap_sharp_w2_v_8bpc_i8mm:  \t0.926x\n\nCortex-A715:\nmct_8tap_sharp_w16_v_8bpc_i8mm:\t0.855x\nmct_8tap_sharp_w8_v_8bpc_i8mm: \t0.784x\nmct_8tap_sharp_w4_v_8bpc_i8mm:  1.069x\n\nmc_8tap_sharp_w16_v_8bpc_i8mm: \t0.850x\nmc_8tap_sharp_w8_v_8bpc_i8mm:  \t0.779x\nmc_8tap_sharp_w4_v_8bpc_i8mm:  \t0.971x\nmc_8tap_sharp_w2_v_8bpc_i8mm:  \t0.975x\n\nCortex-A510:\nmct_8tap_sharp_w16_v_8bpc_i8mm: 1.001x\nmct_8tap_sharp_w8_v_8bpc_i8mm: \t0.979x\nmct_8tap_sharp_w4_v_8bpc_i8mm: \t0.998x\n\nmc_8tap_sharp_w16_v_8bpc_i8mm: \t0.998x\nmc_8tap_sharp_w8_v_8bpc_i8mm:   1.004x\nmc_8tap_sharp_w4_v_8bpc_i8mm:   1.003x\nmc_8tap_sharp_w2_v_8bpc_i8mm:  \t0.996x","shortMessageHtmlLink":"AArch64: Optimize vertical i8mm subpel filters"}},{"before":"fc4763c5a4d31aa08f3a0671b2111fbcb67f378d","after":"d1bdf4f1ff4bae70834d9e5391bb68b75c1c9111","ref":"refs/heads/master","pushedAt":"2024-05-08T20:23:15.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"vlc-mirrorer","name":null,"path":"/vlc-mirrorer","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/20596405?s=80&v=4"},"commit":{"message":"AArch64: Optimize horizontal i8mm prep filters\n\nReplace the accumulator initializations of the horizontal prep\nfilters with register fills by zeros. Most i8mm capable CPUs can do\nthese with zero latency, but we also need to use rounding shifts at\nthe end of the filter. We can see better performance with this\nchange on out-of-order CPUs.\n\nRelative performance of micro benchmarks (lower is better):\n\nCortex-X3:\nmct_8tap_sharp_w32_h_8bpc_i8mm:  0.914x\nmct_8tap_sharp_w16_h_8bpc_i8mm:  0.906x\nmct_8tap_sharp_w8_h_8bpc_i8mm:   0.877x\n\nCortex-A715:\nmct_8tap_sharp_w32_h_8bpc_i8mm:  0.819x\nmct_8tap_sharp_w16_h_8bpc_i8mm:  0.805x\nmct_8tap_sharp_w8_h_8bpc_i8mm:   0.779x\n\nCortex-A510:\nmct_8tap_sharp_w32_h_8bpc_i8mm:  0.999x\nmct_8tap_sharp_w16_h_8bpc_i8mm:  1.001x\nmct_8tap_sharp_w8_h_8bpc_i8mm:   0.996x\nmct_8tap_sharp_w4_h_8bpc_i8mm:   0.915x","shortMessageHtmlLink":"AArch64: Optimize horizontal i8mm prep filters"}},{"before":"c7df9a3e65a8d74c57131d379db93856c38ae47c","after":"fc4763c5a4d31aa08f3a0671b2111fbcb67f378d","ref":"refs/heads/master","pushedAt":"2024-05-06T18:44:12.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"vlc-mirrorer","name":null,"path":"/vlc-mirrorer","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/20596405?s=80&v=4"},"commit":{"message":"riscv: Check for standards compliant RVV 1.0+","shortMessageHtmlLink":"riscv: Check for standards compliant RVV 1.0+"}},{"before":"223901243c17b504ca93cb49dc797733f9fe4876","after":"c7df9a3e65a8d74c57131d379db93856c38ae47c","ref":"refs/heads/master","pushedAt":"2024-05-01T13:34:12.000Z","pushType":"push","commitsCount":2,"pusher":{"login":"vlc-mirrorer","name":null,"path":"/vlc-mirrorer","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/20596405?s=80&v=4"},"commit":{"message":"CI: Improve coverage for argon samples using different thread counts\n\nSimilar to 4796b59fc0a459588183dc2ea199ba1074befc67.","shortMessageHtmlLink":"CI: Improve coverage for argon samples using different thread counts"}},{"before":"236e1d19125c41c5bd978bd55838a74611448cba","after":"223901243c17b504ca93cb49dc797733f9fe4876","ref":"refs/heads/master","pushedAt":"2024-04-29T16:30:37.000Z","pushType":"push","commitsCount":2,"pusher":{"login":"vlc-mirrorer","name":null,"path":"/vlc-mirrorer","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/20596405?s=80&v=4"},"commit":{"message":"x86: Add 6-tap variants of high bit-depth mc AVX-512 (Ice Lake) functions","shortMessageHtmlLink":"x86: Add 6-tap variants of high bit-depth mc AVX-512 (Ice Lake) funct…"}},{"before":"1776c45a087845efad28e893ce4414a2a91786a3","after":"236e1d19125c41c5bd978bd55838a74611448cba","ref":"refs/heads/master","pushedAt":"2024-04-26T15:46:08.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"vlc-mirrorer","name":null,"path":"/vlc-mirrorer","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/20596405?s=80&v=4"},"commit":{"message":"tools: Make ARM cpu flags imply relevant lower level flags\n\nThe --cpumask flag only takes one single flag name, one can't set\na combination like neon+dotprod.\n\nTherefore, apply the same pattern as for x86, by adding mask values\nthat contain all the implied lower level flags.\n\nThis is somewhat complicated, as the set of features isn't entirely\nlinear - in particular, SVE doesn't imply either dotprod or i8mm,\nand SVE2 only implies dotprod, but not i8mm.\n\nThis makes sure that \"dav1d --cpumask dotprod\" actually uses any\nSIMD at all, as it previously only set the dotprod flag but not\nneon, which essentially opted out from all SIMD.","shortMessageHtmlLink":"tools: Make ARM cpu flags imply relevant lower level flags"}},{"before":"fbf23637ceae37a55ab8624ec31a705db6b6dea9","after":"1776c45a087845efad28e893ce4414a2a91786a3","ref":"refs/heads/master","pushedAt":"2024-04-26T15:07:11.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"vlc-mirrorer","name":null,"path":"/vlc-mirrorer","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/20596405?s=80&v=4"},"commit":{"message":"AArch64: Add basic i8mm support for convolutions\n\nAdd an Armv8.6-A i8mm code path for standard bitdepth convolutions.\nOnly horizontal-vertical (HV) convolutions have 6-tap specialisations\nof their vertical passes. All other convolutions are 4- or 8-tap\nfilters which fit well with the 4-element USDOT instruction.\n\nBenchmarks show 4-9% FPS increase relative to the Armv8.4-A\ncode path depending on the input video and the CPU used.\n\nThis patch will increase the .text by around 5.7 KiB.\n\nRelative performance to the C reference on some Cortex CPU cores:\n\n                       Cortex-A715   Cortex-X3  Cortex-A510\nregular w4 hv neon:          7.20x      11.20x        4.40x\nregular w4 hv dotprod:      12.77x      18.35x        6.21x\nregular w4 hv i8mm:         14.50x      21.42x        6.16x\n\n  sharp w4 hv neon:          6.24x       9.77x        3.96x\n  sharp w4 hv dotprod:       9.76x      14.02x        5.20x\n  sharp w4 hv i8mm:         10.84x      16.09x        5.42x\n\nregular w8 hv neon:          2.17x       2.46x        3.17x\nregular w8 hv dotprod:       3.04x       3.11x        3.03x\nregular w8 hv i8mm:          3.57x       3.40x        3.27x\n\n  sharp w8 hv neon:          1.72x       1.93x        2.75x\n  sharp w8 hv dotprod:       2.49x       2.54x        2.62x\n  sharp w8 hv i8mm:          2.80x       2.79x        2.70x\n\nregular w16 hv neon:         1.90x       2.17x        2.02x\nregular w16 hv dotprod:      2.59x       2.64x        1.93x\nregular w16 hv i8mm:         3.01x       2.85x        2.05x\n\n  sharp w16 hv neon:         1.51x       1.72x        1.74x\n  sharp w16 hv dotprod:      2.17x       2.22x        1.70x\n  sharp w16 hv i8mm:         2.42x       2.42x        1.72x\n\nregular w32 hv neon:         1.80x       1.96x        1.81x\nregular w32 hv dotprod:      2.43x       2.36x        1.74x\nregular w32 hv i8mm:         2.83x       2.51x        1.83x\n\n  sharp w32 hv neon:         1.42x       1.54x        1.56x\n  sharp w32 hv dotprod:      2.07x       2.00x        1.55x\n  sharp w32 hv i8mm:         2.29x       2.16x        1.55x\n\nregular w64 hv neon:         1.82x       1.89x        1.70x\nregular w64 hv dotprod:      2.43x       2.25x        1.65x\nregular w64 hv i8mm:         2.84x       2.39x        1.73x\n\n  sharp w64 hv neon:         1.43x       1.47x        1.49x\n  sharp w64 hv dotprod:      2.08x       1.91x        1.49x\n  sharp w64 hv i8mm:         2.30x       2.07x        1.48x\n\nregular w128 hv neon:        1.77x       1.84x        1.75x\nregular w128 hv dotprod:     2.37x       2.18x        1.70x\nregular w128 hv i8mm:        2.76x       2.33x        1.78x\n\n  sharp w128 hv neon:        1.40x       1.45x        1.42x\n  sharp w128 hv dotprod:     2.04x       1.87x        1.43x\n  sharp w128 hv i8mm:        2.24x       2.02x        1.42x\n\nregular w8 h neon:           3.16x       3.51x        3.43x\nregular w8 h dotprod:        4.97x       7.43x        4.95x\nregular w8 h i8mm:           7.28x      10.38x        5.69x\n\n  sharp w8 h neon:           2.71x       2.77x        3.10x\n  sharp w8 h dotprod:        4.92x       7.14x        4.94x\n  sharp w8 h i8mm:           7.21x      10.11x        5.70x\n\nregular w16 h neon:          2.79x       2.76x        3.53x\nregular w16 h dotprod:       3.81x       4.77x        3.13x\nregular w16 h i8mm:          5.21x       6.04x        3.56x\n\n  sharp w16 h neon:          2.31x       2.38x        3.12x\n  sharp w16 h dotprod:       3.80x       4.74x        3.13x\n  sharp w16 h i8mm:          5.20x       5.98x        3.56x\n\nregular w64 h neon:          2.49x       2.46x        2.94x\nregular w64 h dotprod:       3.17x       3.60x        2.41x\nregular w64 h i8mm:          4.22x       4.40x        2.72x\n\n  sharp w64 h neon:          2.07x       2.06x        2.60x\n  sharp w64 h dotprod:       3.16x       3.58x        2.40x\n  sharp w64 h i8mm:          4.20x       4.38x        2.71x\n\nregular w8 v neon:           6.11x       8.05x        4.07x\nregular w8 v dotprod:        5.45x       8.15x        4.01x\nregular w8 v i8mm:           7.30x       9.46x        4.19x\n\n  sharp w8 v neon:           4.23x       5.46x        3.09x\n  sharp w8 v dotprod:        5.43x       7.96x        4.01x\n  sharp w8 v i8mm:           7.26x       9.12x        4.19x\n\nregular w16 v neon:          3.44x       4.33x        2.40x\nregular w16 v dotprod:       3.20x       4.53x        2.85x\nregular w16 v i8mm:          4.09x       5.27x        2.87x\n\n  sharp w16 v neon:          2.50x       3.14x        1.82x\n  sharp w16 v dotprod:       3.20x       4.52x        2.86x\n  sharp w16 v i8mm:          4.09x       5.15x        2.86x\n\nregular w64 v neon:          2.74x       3.11x        1.53x\nregular w64 v dotprod:       2.63x       3.30x        1.84x\nregular w64 v i8mm:          3.31x       3.73x        1.84x\n\n  sharp w64 v neon:          2.01x       2.29x        1.16x\n  sharp w64 v dotprod:       2.61x       3.27x        1.83x\n  sharp w64 v i8mm:          3.29x       3.68x        1.84x","shortMessageHtmlLink":"AArch64: Add basic i8mm support for convolutions"}},{"before":"cb8151c969a13c353ba5818d47b628633d1ebe23","after":"fbf23637ceae37a55ab8624ec31a705db6b6dea9","ref":"refs/heads/master","pushedAt":"2024-04-25T20:18:50.000Z","pushType":"push","commitsCount":6,"pusher":{"login":"vlc-mirrorer","name":null,"path":"/vlc-mirrorer","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/20596405?s=80&v=4"},"commit":{"message":"AArch64: Simplify DotProd path of 2D subpel filters\n\nSimplify the DotProd code path of the 2D (horizontal-vertical) subpel\nfilters. It contains some instruction reordering and some macro\nsimplifications to be more similar to the upcoming i8mm version.\n\nThese changes have negligible effect on performance.\n\nCortex-A510:\nmc_8tap_regular_w2_hv_8bpc_dotprod:   8.3769 ->  8.3380\nmc_8tap_sharp_w2_hv_8bpc_dotprod:     9.5441 ->  9.5457\nmc_8tap_regular_w4_hv_8bpc_dotprod:   8.3422 ->  8.3444\nmc_8tap_sharp_w4_hv_8bpc_dotprod:     9.5441 ->  9.5367\nmc_8tap_regular_w8_hv_8bpc_dotprod:   9.9852 ->  9.9666\nmc_8tap_sharp_w8_hv_8bpc_dotprod:    12.5554 -> 12.5314\n\nCortex-A55:\nmc_8tap_regular_w2_hv_8bpc_dotprod:  6.4504  ->  6.4892\nmc_8tap_sharp_w2_hv_8bpc_dotprod:    7.5732  ->  7.6078\nmc_8tap_regular_w4_hv_8bpc_dotprod:  6.5088  ->  6.4760\nmc_8tap_sharp_w4_hv_8bpc_dotprod:    7.5796  ->  7.5763\nmc_8tap_regular_w8_hv_8bpc_dotprod:  9.3384  ->  9.3078\nmc_8tap_sharp_w8_hv_8bpc_dotprod:   11.1159  -> 11.1401\n\nCortex-A78:\nmc_8tap_regular_w2_hv_8bpc_dotprod:  1.4122  ->  1.4250\nmc_8tap_sharp_w2_hv_8bpc_dotprod:    1.7696  ->  1.7821\nmc_8tap_regular_w4_hv_8bpc_dotprod:  1.4243  ->  1.4243\nmc_8tap_sharp_w4_hv_8bpc_dotprod:    1.7866  ->  1.7863\nmc_8tap_regular_w8_hv_8bpc_dotprod:  2.5304  ->  2.5171\nmc_8tap_sharp_w8_hv_8bpc_dotprod:    3.0815  ->  3.0632\n\nCortex-X1:\nmc_8tap_regular_w2_hv_8bpc_dotprod:  0.8195  ->  0.8194\nmc_8tap_sharp_w2_hv_8bpc_dotprod:    1.0092  ->  1.0081\nmc_8tap_regular_w4_hv_8bpc_dotprod:  0.8197  ->  0.8166\nmc_8tap_sharp_w4_hv_8bpc_dotprod:    1.0089  ->  1.0068\nmc_8tap_regular_w8_hv_8bpc_dotprod:  1.5230  ->  1.5166\nmc_8tap_sharp_w8_hv_8bpc_dotprod:    1.8683  ->  1.8625","shortMessageHtmlLink":"AArch64: Simplify DotProd path of 2D subpel filters"}},{"before":"a9feab9bc18093d0a758b2dbfa6b4591e90bf3ed","after":"cb8151c969a13c353ba5818d47b628633d1ebe23","ref":"refs/heads/master","pushedAt":"2024-04-22T09:38:37.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"vlc-mirrorer","name":null,"path":"/vlc-mirrorer","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/20596405?s=80&v=4"},"commit":{"message":"aarch64: Avoid unaligned jump tables\n\nManually add a padding 0 entry to make the odd number of .hword\nentries align with the instruction size.\n\nThis fixes assembling with GAS, with the --gdwarf2 option, where\nit previously produced the error message \"unaligned opcodes detected\nin executable segment\".\n\nThe message is slightly misleading, as the error is printed even\nif there actually are no opcodes that are misaligned, as the jump\ntable is the last thing within the .text section. The issue can\nbe reproduced with an input as small as this, assembled with\n\"as --gdwarf2 -c test.s\".\n\n        .text\n        nop\n        .hword 0\n\nSee a6228f47f0eebcdfebb1753a786e3e1654b51ea4 for earlier cases of\nthe same error - although in those cases, we actually did have more\ncode and labels following the unaligned jump tables.\n\nThis error is present with binutils 2.39 and earlier; in\nbinutils 2.40, this input no longer is considered an error, fixed\nin https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=6f6f5b0adc9efd103c434fd316e8c880a259775d.","shortMessageHtmlLink":"aarch64: Avoid unaligned jump tables"}},{"before":"585190177241ef4e77f462decdc6ad5b2cc5e5e6","after":"a9feab9bc18093d0a758b2dbfa6b4591e90bf3ed","ref":"refs/heads/master","pushedAt":"2024-04-21T11:04:24.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"vlc-mirrorer","name":null,"path":"/vlc-mirrorer","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/20596405?s=80&v=4"},"commit":{"message":"ARM64: Minor msac improvements\n\nOne addressing optimization and fix some missing changes to a previous\ncommit that ported improvements from hi tok to other decode tok\nfunctions.","shortMessageHtmlLink":"ARM64: Minor msac improvements"}},{"before":"37d52435d1e839546e725b7b4116334d3b3a5bac","after":"585190177241ef4e77f462decdc6ad5b2cc5e5e6","ref":"refs/heads/master","pushedAt":"2024-04-21T10:30:37.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"vlc-mirrorer","name":null,"path":"/vlc-mirrorer","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/20596405?s=80&v=4"},"commit":{"message":"CI: Move llvm crossfiles from image to project\n\nSince dav1d was the only user of these crossfiles, it was agreed upon to\nremove them from the image [0] and move to dav1d directly. [1]\n\n[0] https://code.videolan.org/videolan/docker-images/-/merge_requests/293\n[1] https://code.videolan.org/videolan/docker-images/-/merge_requests/294#note_434720","shortMessageHtmlLink":"CI: Move llvm crossfiles from image to project"}},{"before":"5b5399911dd24703de641d65eda5b7f1e845d060","after":"37d52435d1e839546e725b7b4116334d3b3a5bac","ref":"refs/heads/master","pushedAt":"2024-04-15T13:13:33.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"vlc-mirrorer","name":null,"path":"/vlc-mirrorer","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/20596405?s=80&v=4"},"commit":{"message":"ARM64: Port msac improvements to more functions\n\nPort improvements from the hi token functions to the rest of the symbol\nadaption functions. These weren't originally ported since they didn't\nwork with arbitrary padding. In practice, zero padding is already used\nand only the tests need to be updated.\n\nResults - Neoverse N1\n\nOld:\nmsac_decode_symbol_adapt4_c:         41.4 ( 1.00x)\nmsac_decode_symbol_adapt4_neon:      31.0 ( 1.34x)\nmsac_decode_symbol_adapt8_c:         54.5 ( 1.00x)\nmsac_decode_symbol_adapt8_neon:      32.2 ( 1.69x)\nmsac_decode_symbol_adapt16_c:        85.6 ( 1.00x)\nmsac_decode_symbol_adapt16_neon:     37.5 ( 2.28x)\n\nNew:\nmsac_decode_symbol_adapt4_c:         41.5 ( 1.00x)\nmsac_decode_symbol_adapt4_neon:      27.7 ( 1.50x)\nmsac_decode_symbol_adapt8_c:         55.7 ( 1.00x)\nmsac_decode_symbol_adapt8_neon:      30.1 ( 1.85x)\nmsac_decode_symbol_adapt16_c:        82.4 ( 1.00x)\nmsac_decode_symbol_adapt16_neon:     35.2 ( 2.34x)","shortMessageHtmlLink":"ARM64: Port msac improvements to more functions"}}],"hasNextPage":true,"hasPreviousPage":false,"activityType":"all","actor":null,"timePeriod":"all","sort":"DESC","perPage":30,"cursor":"djE6ks8AAAAEZ5rjagA","startCursor":null,"endCursor":null}},"title":"Activity · videolan/dav1d"}