Add Aarch64 support #183

coreyjjames · 2019-12-02T03:14:19Z

This pull request adds Aarch64 support to the FLAC project.
fixes #156

What is included in this pull request:

Edited configure script to check for aarch64 CPU and define FLAC__CPU_AARCH64 if aarch64 is detected.
Edited configure script to check for arm_neon.h and define FLAC__HAS_NEONINTRIN if arm_neon.h is detected.
Add the logic required to choose the aarch64 versions of the lpc_compute_autocorrelation function. When on an aarch64 machine.
Translated SSE intrinsics from the lpc_intrin_sse.c file into neon intrinsics and put them into a file called lpc_intrin_neon.c.
Added the new file lpc_intrin_neon.c to the src/libFLAC/Makefile.am.

Performance Boost to encoding:
I tested the performance with two aarch64 machines. The test I ran was encoding a .wav file to .flac. The size of the wave file was 1.57 gigabytes.
The machine, with a cortex-a57 8 threads, I got a performance increase of 106.57% for the compute autocorrelation function.
A savings of 0m8.281 seconds.
The machine, with a cortex-a53 24 threads, I got a performance increase of 254.6% for the compute autocorrelation function.
A savings of 1m40.166 seconds.

petterreinholdtsen · 2019-12-02T05:41:26Z

Will this patch do CPU feature detection at runtime or compile time? It look like it will do it at compile time, and I wonder if it is better to do it at run time?

coreyjjames · 2019-12-02T05:46:33Z

From what I could see all CPU detection for this project is done at compile time. So I followed the convention that the project is currently doing.

petterreinholdtsen · 2019-12-02T06:35:38Z

To me it look like the sorting into x86 or ppc is done at compile time, and the feature detection is done at run time. Does something like that make sense for aarch64?

coreyjjames · 2019-12-02T06:52:42Z

Oh sorry I miss understood you. Yes, I implemented the featured detection for aarch64 at runtime. I followed the same pattern as the x86 and ppc. The program will decide what feature to run based on conditions at runtime. For example, it will choose which auto correlation function to run based on the variable encoder->protected_->max_lpc_order.

erikd · 2020-01-04T03:46:12Z

Sorry, forgot about / lost track of this PR.

@coreyjjames would you be able to squash this down to a single commit? Also, is there anyway we can set up CI for Aarch64?

coreyjjames · 2020-01-04T05:00:58Z

@erikd I squashed the commits into a single commit and check out this link from the Travis CI documentation. It looks like Travis CI should be able to do Arrch64 testing. https://docs.travis-ci.com/user/multi-cpu-architectures/

coreyjjames · 2020-01-07T05:38:55Z

@erikd I added your PR #191 to this branch. I think the issue may be a problem with the GCC compiler version. Travis is using GCC version 5.4.1 and I tested with GCC version 8.3.1.

erikd · 2020-01-07T09:44:17Z

@coreyjjames What Linux distro are you running? Are you able to figure out which header file provides these functions and what package provides that header file?

coreyjjames · 2020-01-20T02:02:15Z

@erikd Been busy just got some time to look into this issue again. So it seems like the intrinsic "vcopyq_laneq_f32" causes the problem.

From my research, the reason for the issue is the "vcopyq_laneq_f32" intrinsic it is one of the Aarch64 exclusive intrinsic and Travis CI is running arm64 (ARMv8) that is why we are getting an error.

I am going to look into a substitute for the "vcopyq_laneq_f32" intrinsic. I am going to see if I can find a solution that is more compatible with the different versions of ARM.

NotTsunami · 2020-01-23T17:14:39Z

Can you update the CMake configuration as well?

NotTsunami · 2020-05-21T17:49:55Z

Can you rebase this again? The CPU detection for CMake was recently changed so that might be the fault of the falsely reported optimizations. @coreyjjames

coreyjjames · 2020-05-21T17:54:59Z

@NotTsunami

Ya, I can rebase again.

Also, are you seeing the issues on the arm64 and OSX builds?
I just noticed it is building with arm64 OSX that is a bug with the merge. On my fork, Travis did not do that. It is not supposed to build with arm64 and OSX.

NotTsunami · 2020-05-21T17:55:54Z

Exactly that.

coreyjjames · 2020-05-21T21:52:48Z

I just pushed the rebase, and Travis is still building with arm64 and OSX. On my fork, it does not do that. Does anyone know why it is doing this?

coreyjjames · 2020-05-24T04:25:22Z

I believe I fixed the problem with Travis building with arm and osx.

I am still looking into why the Autotools tests are considerably slower than the Cmake tests. From my debugging so far, the Autotools tests seem to be running the optimization code, but I am not getting the speed increase as I see with Cmake tests.

I am thinking its something with the build files if anybody wants to take a look, I would appreciate a second perspective.

NotTsunami · 2020-05-24T22:33:53Z

configure.ac

@@ -156,6 +156,12 @@ case "$host_cpu" in
 		AH_TEMPLATE(FLAC__CPU_PPC, [define if building for PowerPC])
 		asm_optimisation=$asm_opt
 		;;
+	arm*|aarch64*)


I believe all we need is:
arm64|aarch64)

Otherwise, armv7 gets caught up here too (and is still very common). I don't believe we need anything else here, but someone with more architecture knowledge can chime in.

NotTsunami · 2020-05-24T22:37:13Z

src/libFLAC/CMakeLists.txt

+elseif(CMAKE_SYSTEM_PROCESSOR MATCHES "([xX]|i[346])86")
+    set(IA32 TRUE)
+    add_definitions(-DFLAC__CPU_IA32 -DFLAC__ALIGN_MALLOC_DATA)
+elseif(CMAKE_SYSTEM_PROCESSOR MATCHES "(arm|aarch64)")


Should probably be:
elseif(CMAKE_SYSTEM_PROCESSOR MATCHES "(arm64|aarch64)")

Otherwise you will catch armv7, which I don't believe you are supporting.

NotTsunami · 2020-07-16T02:58:18Z

src/libFLAC/include/private/cpu.h

@@ -154,6 +154,7 @@ typedef enum {
 	FLAC__CPUINFO_TYPE_IA32,
 	FLAC__CPUINFO_TYPE_X86_64,
 	FLAC__CPUINFO_TYPE_PPC,
+	FLAC__CPUINFO_TYPE_ARM,


Revisited this today, and I think FLAC__CPUINFO_TYPE_ARM64 is a more accurate name

NotTsunami · 2020-07-16T02:59:13Z

Any progress on debugging? Hopefully this can get merged soon.

@coreyjjames Friendly ping

coreyjjames · 2020-07-16T13:09:01Z

@NotTsunami

Hey,

Still working on the auto tools build configuration. I am thinking there is a compiler option that needs to be set or unset. That is causing the tests to run slow.

Either that or everything is correct and it just runs slower building with autotools vs CMake.

I should have some time tonight to work on it.

coreyjjames · 2020-08-25T01:16:40Z

Update on progress

I cannot find any issue that would explain why the Autotools test suite is significantly slower than the CMake test suite. I did notice that on all platforms, the Autotools test suite is slower than the CMake test suite; this is leading me to think that it is correct, and if we want it to build faster, more optimizations need to be added to improve the performance.

I believe I am ready for this pull request to be reviewed, please let me know if you would like any changes to be made.

nnghuy · 2020-11-21T18:38:10Z

I found myself here after seeing the native Apple M1 ARM64 vs Rosetta translated FLAC encoding benchmark on Phoronix.

FWIW, this PR compiles and runs on an M1 Air.

As a sanity check, I ran the benchmark above with native 1.3.3 and got a similar 28.x sec. I reran with this PR and got 14.x sec.

Rosetta translation is still slightly faster (!!!), but the current improvements are large.

coreyjjames · 2020-11-21T18:51:39Z

@nnghuy Cool, Thanks for testing this PR on the new M1 chip! Glad to hear it compiles and runs.

This PR basically only adds one optimization to the FLAC project. It would be interesting if we could beat Rosetta's score with more optimizations.

moisespr123 · 2020-12-21T19:43:34Z

Seems to be running fine on Termux on a Samsung Galaxy S20 Ultra. Great job!

ktmf01 · 2022-04-29T19:45:43Z

Closing because of the merge of #270 and #332

coreyjjames force-pushed the addAarch64Support branch from f1d79f2 to 9f08b02 Compare January 4, 2020 04:53

coreyjjames force-pushed the addAarch64Support branch from 9f08b02 to 9242aab Compare January 7, 2020 04:04

NotTsunami mentioned this pull request May 21, 2020

Aarch64 support #156

Closed

Add Aarch64 Support

f592baa

coreyjjames force-pushed the addAarch64Support branch from 8143fe5 to f592baa Compare May 21, 2020 21:47

coreyjjames added 3 commits May 22, 2020 08:53

update configure

cfbde3e

test

7dbb28a

test

605dccf

coreyjjames force-pushed the addAarch64Support branch from 878afa7 to 605dccf Compare May 24, 2020 04:09

Fix Travis

6223ab6

coreyjjames force-pushed the addAarch64Support branch from fadc721 to 6223ab6 Compare May 24, 2020 04:12

NotTsunami suggested changes May 24, 2020

View reviewed changes

NotTsunami reviewed Jul 16, 2020

View reviewed changes

Fix cpu conditions

80ae044

tmkk added a commit to tmkk/flac that referenced this pull request Dec 9, 2020

Apply patches from xiph#183

2623b74

ktmf01 mentioned this pull request Jul 7, 2021

Change autoc[] to double #245

Closed

ktmf01 mentioned this pull request Jan 5, 2022

Adding ARM64 support and optimized Neon implementation #270

Merged

ktmf01 mentioned this pull request Apr 29, 2022

Add ARM64 NEON intrinsics lpc_compute_autocorrelation routines #332

Merged

ktmf01 closed this Apr 29, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add Aarch64 support #183

Add Aarch64 support #183

coreyjjames commented Dec 2, 2019 •

edited

Loading

petterreinholdtsen commented Dec 2, 2019

coreyjjames commented Dec 2, 2019

petterreinholdtsen commented Dec 2, 2019

coreyjjames commented Dec 2, 2019

erikd commented Jan 4, 2020

coreyjjames commented Jan 4, 2020

coreyjjames commented Jan 7, 2020 •

edited

Loading

erikd commented Jan 7, 2020

coreyjjames commented Jan 20, 2020

NotTsunami commented Jan 23, 2020 •

edited

Loading

NotTsunami commented May 21, 2020

coreyjjames commented May 21, 2020

NotTsunami commented May 21, 2020

coreyjjames commented May 21, 2020

coreyjjames commented May 24, 2020

NotTsunami May 24, 2020

NotTsunami May 24, 2020

NotTsunami Jul 16, 2020

NotTsunami commented Jul 16, 2020

coreyjjames commented Jul 16, 2020

coreyjjames commented Aug 25, 2020

nnghuy commented Nov 21, 2020 •

edited

Loading

coreyjjames commented Nov 21, 2020

moisespr123 commented Dec 21, 2020

ktmf01 commented Apr 29, 2022

Add Aarch64 support #183

Add Aarch64 support #183

Conversation

coreyjjames commented Dec 2, 2019 • edited Loading

petterreinholdtsen commented Dec 2, 2019

coreyjjames commented Dec 2, 2019

petterreinholdtsen commented Dec 2, 2019

coreyjjames commented Dec 2, 2019

erikd commented Jan 4, 2020

coreyjjames commented Jan 4, 2020

coreyjjames commented Jan 7, 2020 • edited Loading

erikd commented Jan 7, 2020

coreyjjames commented Jan 20, 2020

NotTsunami commented Jan 23, 2020 • edited Loading

NotTsunami commented May 21, 2020

coreyjjames commented May 21, 2020

NotTsunami commented May 21, 2020

coreyjjames commented May 21, 2020

coreyjjames commented May 24, 2020

NotTsunami May 24, 2020

Choose a reason for hiding this comment

NotTsunami May 24, 2020

Choose a reason for hiding this comment

NotTsunami Jul 16, 2020

Choose a reason for hiding this comment

NotTsunami commented Jul 16, 2020

coreyjjames commented Jul 16, 2020

coreyjjames commented Aug 25, 2020

Update on progress

nnghuy commented Nov 21, 2020 • edited Loading

coreyjjames commented Nov 21, 2020

moisespr123 commented Dec 21, 2020

ktmf01 commented Apr 29, 2022

coreyjjames commented Dec 2, 2019 •

edited

Loading

coreyjjames commented Jan 7, 2020 •

edited

Loading

NotTsunami commented Jan 23, 2020 •

edited

Loading

nnghuy commented Nov 21, 2020 •

edited

Loading