Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

resample: port resample_neon.h to aarch64 #8

Closed
wants to merge 1 commit into from

Conversation

fbarchard
Copy link
Contributor

port optimized inner_product_single and WORD2INT(x) for fixed
and floating point from 32 bit armv7 NEON to aarch64 NEON.

port optimized inner_product_single and WORD2INT(x) for fixed
    and floating point from 32 bit armv7 NEON to aarch64 NEON.
#if defined(__aarch64__)
static inline int32_t saturate_float_to_16bit(float a) {
int32_t ret;
asm ("fmov s0, %w[a]\n"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm getting an assembler error with gcc 4.9 here:
Error: operand 2 should be an integer register -- fmov s0,v1
Why not:

asm ("fcvtas s0, %s[a]\n"
     "sqxtn h0, s0\n"
     "sxtl v0.4s, v0.4h\n"
     "fmov %w[ret], s0\n"
     : [ret] "=r" (ret)
     : [a] "w" (a)
     : "v0");

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On Mon, Aug 8, 2016 at 8:26 PM, Tristan Matthews notifications@github.com
wrote:

In libspeexdsp/resample_neon.h
#8 (comment):

+#elif defined(FLOATING_POINT)
+#if defined(aarch64)
+static inline int32_t saturate_float_to_16bit(float a) {

  • int32_t ret;
  • asm ("fmov s0, %w[a]\n"

I'm getting an assembler error with gcc 4.9 here:
Error: operand 2 should be an integer register -- fmov s0,v1

doh! My bad. I've reproduced the error with gcc 4.9. It was supposed to
be:

asm ("fmov s0, *%s[a]\*n"

..

Why not:

asm ("fcvtas s0, %s[a]\n" "sqxtn h0, s0\n" "sxtl v0.4s, v0.4h\n" "fmov %w[ret], s0\n" : [ret] "=r" (ret) : [a] "w" (a) : "v0");

at the time I had trouble with clang... it said it couldnt allocate a
register. perhaps because the float is in s0, and I used s0. Reguardless,
I'm able to build your code with clang 3.8 so should be fine.
Changing it to use v1 saves a mov

     asm ("fcvtas s1, %s[a]\n"
     "sqxtn h1, s1\n"
     "sxtl v1.4s, v1.4h\n"
     "fmov %w[ret], s1\n"
     : [ret] "=r" (ret)
     : [a] "w" (a)
     : "v1");

0: 5e21c801 fcvtas s1, s0
4: 5e614821 sqxtn h1, s1
8: 0f10a421 sxtl v1.4s, v1.4h
c: 1e260020 fmov w0, s1
10: d65f03c0 ret

You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
https://github.com/xiph/speexdsp/pull/8/files/7c20961d4688ca1a486557d88fefcfe4f6c11412#r73993202,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AOOYDR-ch7wMFbOcSTADN7F1NWm2nvKeks5qd_ODgaJpZM4JSTPL
.

@tmatth tmatth mentioned this pull request Jan 1, 2017
@lu-zero
Copy link
Contributor

lu-zero commented Jan 20, 2018

Probably @sasshka might help on that.

@tmatth
Copy link
Member

tmatth commented Jan 22, 2018

@lu-zero besides this, the existing NEON could also use some magic from @sasshka

@fbarchard
Copy link
Contributor Author

The 64 bit NEON in this pull request, is a direct port of the 32 bit version.
Both float and short resamplers are implemented and performance is identical to arm32.
The short version is faster due to small stores and the mla pipelines nicely to 1 register.
The thumb2 32 bit version runs faster on current cpus due to smaller instructions than arm32 and aarch64.

I tested it with clang, but if you're still seeing saturate_float_to_16bit build errors (compiler bug), try a simple '=r' for the output, which will generate 2 mov's. Or use tristans suggestion, which failed on older clang versions, but the bug is fixed on current versions of clang.

@tmatth
Copy link
Member

tmatth commented Jan 22, 2018

@fbarchard are you able to reproduce this issue by any chance?
#13

@tmatth
Copy link
Member

tmatth commented Jan 26, 2018

@fbarchard I squashed your patch and subsequent fix for GCC into this branch:
https://git.xiph.org/?p=speexdsp.git;a=shortlog;h=refs/heads/resample-aarch64
Let me know if you're able to test.

@lu-zero
Copy link
Contributor

lu-zero commented Feb 27, 2018

I tested on the odroid and opened #17 to make the experience a little nicer.

@tmatth
Copy link
Member

tmatth commented Jul 18, 2018

Merged as 8ce055a (finally!), thanks

@tmatth tmatth closed this Jul 18, 2018
jefferyto added a commit to jefferyto/openwrt-packages that referenced this pull request May 12, 2020
For speexdsp, support for NEON on aarch64 was added in 1.2.0[1].

[1]: xiph/speexdsp#8

Signed-off-by: Jeffery To <jeffery.to@gmail.com>
jefferyto added a commit to jefferyto/openwrt-packages that referenced this pull request May 14, 2020
For speexdsp, support for NEON on aarch64 was added in 1.2.0[1].

[1]: xiph/speexdsp#8

Signed-off-by: Jeffery To <jeffery.to@gmail.com>
(cherry picked from commit dec17eb)
jefferyto added a commit to jefferyto/openwrt-packages that referenced this pull request May 14, 2020
For speexdsp, support for NEON on aarch64 was added in 1.2.0[1].

[1]: xiph/speexdsp#8

Signed-off-by: Jeffery To <jeffery.to@gmail.com>
(cherry picked from commit dec17eb)
1715173329 pushed a commit to immortalwrt/packages that referenced this pull request May 16, 2020
For speexdsp, support for NEON on aarch64 was added in 1.2.0[1].

[1]: xiph/speexdsp#8

Signed-off-by: Jeffery To <jeffery.to@gmail.com>
jekkos pushed a commit to jekkos/packages that referenced this pull request May 22, 2020
For speexdsp, support for NEON on aarch64 was added in 1.2.0[1].

[1]: xiph/speexdsp#8

Signed-off-by: Jeffery To <jeffery.to@gmail.com>
farmergreg pushed a commit to farmergreg/packages that referenced this pull request Sep 8, 2020
For speexdsp, support for NEON on aarch64 was added in 1.2.0[1].

[1]: xiph/speexdsp#8

Signed-off-by: Jeffery To <jeffery.to@gmail.com>
farmergreg pushed a commit to farmergreg/packages that referenced this pull request Sep 8, 2020
For speexdsp, support for NEON on aarch64 was added in 1.2.0[1].

[1]: xiph/speexdsp#8

Signed-off-by: Jeffery To <jeffery.to@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants