resample: port resample_neon.h to aarch64 #8

fbarchard · 2016-07-21T23:04:13Z

port optimized inner_product_single and WORD2INT(x) for fixed
and floating point from 32 bit armv7 NEON to aarch64 NEON.

port optimized inner_product_single and WORD2INT(x) for fixed and floating point from 32 bit armv7 NEON to aarch64 NEON.

tmatth · 2016-08-09T03:26:58Z

libspeexdsp/resample_neon.h

+#if defined(__aarch64__)
+static inline int32_t saturate_float_to_16bit(float a) {
+    int32_t ret;
+    asm ("fmov s0, %w[a]\n"


I'm getting an assembler error with gcc 4.9 here:
Error: operand 2 should be an integer register -- fmov s0,v1
Why not:

asm ("fcvtas s0, %s[a]\n" "sqxtn h0, s0\n" "sxtl v0.4s, v0.4h\n" "fmov %w[ret], s0\n" : [ret] "=r" (ret) : [a] "w" (a) : "v0");

On Mon, Aug 8, 2016 at 8:26 PM, Tristan Matthews notifications@github.com
wrote:

In libspeexdsp/resample_neon.h
#8 (comment):

+#elif defined(FLOATING_POINT)
+#if defined(aarch64)
+static inline int32_t saturate_float_to_16bit(float a) {

int32_t ret;

asm ("fmov s0, %w[a]\n"

I'm getting an assembler error with gcc 4.9 here:
Error: operand 2 should be an integer register -- fmov s0,v1

doh! My bad. I've reproduced the error with gcc 4.9. It was supposed to
be:

asm ("fmov s0, *%s[a]\*n"

..

Why not:

asm ("fcvtas s0, %s[a]\n" "sqxtn h0, s0\n" "sxtl v0.4s, v0.4h\n" "fmov %w[ret], s0\n" : [ret] "=r" (ret) : [a] "w" (a) : "v0");

at the time I had trouble with clang... it said it couldnt allocate a
register. perhaps because the float is in s0, and I used s0. Reguardless,
I'm able to build your code with clang 3.8 so should be fine.
Changing it to use v1 saves a mov

asm ("fcvtas s1, %s[a]\n" "sqxtn h1, s1\n" "sxtl v1.4s, v1.4h\n" "fmov %w[ret], s1\n" : [ret] "=r" (ret) : [a] "w" (a) : "v1");

0: 5e21c801 fcvtas s1, s0
4: 5e614821 sqxtn h1, s1
8: 0f10a421 sxtl v1.4s, v1.4h
c: 1e260020 fmov w0, s1
10: d65f03c0 ret

—

You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
https://github.com/xiph/speexdsp/pull/8/files/7c20961d4688ca1a486557d88fefcfe4f6c11412#r73993202,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AOOYDR-ch7wMFbOcSTADN7F1NWm2nvKeks5qd_ODgaJpZM4JSTPL
.

lu-zero · 2018-01-20T10:38:17Z

Probably @sasshka might help on that.

tmatth · 2018-01-22T17:30:18Z

@lu-zero besides this, the existing NEON could also use some magic from @sasshka

fbarchard · 2018-01-22T20:17:48Z

The 64 bit NEON in this pull request, is a direct port of the 32 bit version.
Both float and short resamplers are implemented and performance is identical to arm32.
The short version is faster due to small stores and the mla pipelines nicely to 1 register.
The thumb2 32 bit version runs faster on current cpus due to smaller instructions than arm32 and aarch64.

I tested it with clang, but if you're still seeing saturate_float_to_16bit build errors (compiler bug), try a simple '=r' for the output, which will generate 2 mov's. Or use tristans suggestion, which failed on older clang versions, but the bug is fixed on current versions of clang.

tmatth · 2018-01-22T20:24:32Z

@fbarchard are you able to reproduce this issue by any chance?
#13

tmatth · 2018-01-26T03:49:57Z

@fbarchard I squashed your patch and subsequent fix for GCC into this branch:
https://git.xiph.org/?p=speexdsp.git;a=shortlog;h=refs/heads/resample-aarch64
Let me know if you're able to test.

lu-zero · 2018-02-27T13:45:26Z

I tested on the odroid and opened #17 to make the experience a little nicer.

tmatth · 2018-07-18T03:56:18Z

Merged as 8ce055a (finally!), thanks

For speexdsp, support for NEON on aarch64 was added in 1.2.0[1]. [1]: xiph/speexdsp#8 Signed-off-by: Jeffery To <jeffery.to@gmail.com>

For speexdsp, support for NEON on aarch64 was added in 1.2.0[1]. [1]: xiph/speexdsp#8 Signed-off-by: Jeffery To <jeffery.to@gmail.com> (cherry picked from commit dec17eb)

For speexdsp, support for NEON on aarch64 was added in 1.2.0[1]. [1]: xiph/speexdsp#8 Signed-off-by: Jeffery To <jeffery.to@gmail.com>

resample: port resample_neon.h to aarch64

7c20961

port optimized inner_product_single and WORD2INT(x) for fixed and floating point from 32 bit armv7 NEON to aarch64 NEON.

fbarchard mentioned this pull request Jul 21, 2016

port resample_neon.h to aarch64 #7

Closed

tmatth reviewed Aug 9, 2016
View reviewed changes

tmatth mentioned this pull request Jan 1, 2017

new release #9

Closed

tmatth closed this Jul 18, 2018

jefferyto mentioned this pull request May 12, 2020

treewide: Enable VFP/NEON optimizations for aarch64 openwrt/packages#12152

Merged

1715173329 pushed a commit to immortalwrt/packages that referenced this pull request May 16, 2020

treewide: Enable VFP/NEON optimizations for aarch64

dce5809

For speexdsp, support for NEON on aarch64 was added in 1.2.0[1]. [1]: xiph/speexdsp#8 Signed-off-by: Jeffery To <jeffery.to@gmail.com>

jekkos pushed a commit to jekkos/packages that referenced this pull request May 22, 2020

treewide: Enable VFP/NEON optimizations for aarch64

65724c5

For speexdsp, support for NEON on aarch64 was added in 1.2.0[1]. [1]: xiph/speexdsp#8 Signed-off-by: Jeffery To <jeffery.to@gmail.com>

farmergreg pushed a commit to farmergreg/packages that referenced this pull request Sep 8, 2020

treewide: Enable VFP/NEON optimizations for aarch64

95e07ae

For speexdsp, support for NEON on aarch64 was added in 1.2.0[1]. [1]: xiph/speexdsp#8 Signed-off-by: Jeffery To <jeffery.to@gmail.com>

farmergreg pushed a commit to farmergreg/packages that referenced this pull request Sep 8, 2020

treewide: Enable VFP/NEON optimizations for aarch64

7cb972a

For speexdsp, support for NEON on aarch64 was added in 1.2.0[1]. [1]: xiph/speexdsp#8 Signed-off-by: Jeffery To <jeffery.to@gmail.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

resample: port resample_neon.h to aarch64 #8

resample: port resample_neon.h to aarch64 #8

fbarchard commented Jul 21, 2016

tmatth Aug 9, 2016

fbarchard Aug 10, 2016

lu-zero commented Jan 20, 2018

tmatth commented Jan 22, 2018

fbarchard commented Jan 22, 2018

tmatth commented Jan 22, 2018

tmatth commented Jan 26, 2018

lu-zero commented Feb 27, 2018

tmatth commented Jul 18, 2018

resample: port resample_neon.h to aarch64 #8

resample: port resample_neon.h to aarch64 #8

Conversation

fbarchard commented Jul 21, 2016

tmatth Aug 9, 2016

Choose a reason for hiding this comment

fbarchard Aug 10, 2016

Choose a reason for hiding this comment

lu-zero commented Jan 20, 2018

tmatth commented Jan 22, 2018

fbarchard commented Jan 22, 2018

tmatth commented Jan 22, 2018

tmatth commented Jan 26, 2018

lu-zero commented Feb 27, 2018

tmatth commented Jul 18, 2018