Add Aarch64 support #183

Open
wants to merge 2 commits into
base: master
from


coreyjjames commented Dec 2, 2019

This pull request adds Aarch64 support to the FLAC project.
fixes #156

What is included in this pull request:

  • Edited configure script to check for aarch64 CPU and define FLAC__CPU_AARCH64 if aarch64 is detected.
  • Edited configure script to check for arm_neon.h and define FLAC__HAS_NEONINTRIN if arm_neon.h is detected.
  • Added the logic required to choose the aarch64 versions of the lpc_compute_autocorrelation function when running on an aarch64 machine.
  • Translated SSE intrinsics from the lpc_intrin_sse.c file into neon intrinsics and put them into a file called lpc_intrin_neon.c.
  • Added the new file lpc_intrin_neon.c to the src/libFLAC/Makefile.am.
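
To illustrate the compile-time guards described in the list above, here is a minimal sketch. It is not the code from this pull request: the function names are hypothetical, and the two macros are derived from compiler-predefined macros (`__aarch64__`, `__ARM_NEON`) purely so the sketch builds on any host, whereas the real build is assumed to get them from the configure script.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: in the real build these macros come from configure (config.h).
 * Here they are derived from compiler-predefined macros instead. */
#if defined(__aarch64__)
#  define FLAC__CPU_AARCH64 1
#  if defined(__ARM_NEON)
#    define FLAC__HAS_NEONINTRIN 1
#    include <arm_neon.h>
#  endif
#endif

/* Portable reference: autoc[l] = sum over i of data[i] * data[i + l]. */
static void autocorrelation_generic(const float *data, size_t data_len,
                                    size_t lag, float *autoc)
{
    assert(lag > 0 && lag <= data_len);
    for (size_t l = 0; l < lag; l++) {
        float sum = 0.0f;
        for (size_t i = 0; i + l < data_len; i++)
            sum += data[i] * data[i + l];
        autoc[l] = sum;
    }
}

#ifdef FLAC__HAS_NEONINTRIN
/* Illustrative NEON kernel (hypothetical, not the PR's translation of the
 * SSE code): accumulates four products per step. */
static void autocorrelation_neon(const float *data, size_t data_len,
                                 size_t lag, float *autoc)
{
    assert(lag > 0 && lag <= data_len);
    for (size_t l = 0; l < lag; l++) {
        const size_t n = data_len - l;   /* number of products for this lag */
        float32x4_t acc = vdupq_n_f32(0.0f);
        size_t i = 0;
        for (; i + 4 <= n; i += 4)
            acc = vmlaq_f32(acc, vld1q_f32(data + i), vld1q_f32(data + i + l));
        float sum = vaddvq_f32(acc);     /* horizontal add, AArch64 only */
        for (; i < n; i++)               /* scalar tail */
            sum += data[i] * data[i + l];
        autoc[l] = sum;
    }
}
#endif

/* Compile-time choice of implementation. */
static void compute_autocorrelation(const float *data, size_t data_len,
                                    size_t lag, float *autoc)
{
#ifdef FLAC__HAS_NEONINTRIN
    autocorrelation_neon(data, data_len, lag, autoc);
#else
    autocorrelation_generic(data, data_len, lag, autoc);
#endif
}
```

Calling `compute_autocorrelation(data, 4, 2, autoc)` on the samples `{1, 2, 3, 4}` fills `autoc` with the lag-0 and lag-1 sums on either path. With real audio data the two paths may differ in the last bits because the summation order differs.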

Performance Boost to encoding:
I tested the performance on two aarch64 machines by encoding a 1.57 gigabyte .wav file to .flac.
On the machine with a Cortex-A57 (8 threads), I got a performance increase of 106.57% for the compute autocorrelation function,
a savings of 0m8.281 seconds.
On the machine with a Cortex-A53 (24 threads), I got a performance increase of 254.6% for the compute autocorrelation function,
a savings of 1m40.166 seconds.


petterreinholdtsen commented Dec 2, 2019

Will this patch do CPU feature detection at runtime or compile time? It looks like it will do it at compile time, and I wonder whether it would be better to do it at runtime.

Author

coreyjjames commented Dec 2, 2019

From what I could see, all CPU detection for this project is done at compile time, so I followed the project's current convention.


petterreinholdtsen commented Dec 2, 2019

To me it looks like the sorting into x86 or ppc is done at compile time, and the feature detection is done at runtime. Does something like that make sense for aarch64?

Author

coreyjjames commented Dec 2, 2019

Oh sorry, I misunderstood you. Yes, I implemented the feature detection for aarch64 at runtime, following the same pattern as x86 and ppc. The program decides which implementation to run based on conditions at runtime. For example, it chooses which autocorrelation function to run based on the variable encoder->protected_->max_lpc_order.
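
As a rough sketch of that selection step: the function names below are hypothetical stand-ins, and the thresholds (4, 8, 12, 16) are an assumption modeled on how the x86 path is understood to pick its lag-specialized kernels. The sketch also assumes the encoder later calls the chosen function with lag = max_lpc_order + 1.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*autocorr_fn)(const float *data, size_t data_len,
                            size_t lag, float *autoc);

/* Generic fallback: autoc[l] = sum over i of data[i] * data[i + l]. */
static void autocorr_generic(const float *data, size_t data_len,
                             size_t lag, float *autoc)
{
    assert(lag > 0 && lag <= data_len);
    for (size_t l = 0; l < lag; l++) {
        float sum = 0.0f;
        for (size_t i = 0; i + l < data_len; i++)
            sum += data[i] * data[i + l];
        autoc[l] = sum;
    }
}

/* Hypothetical stand-ins for lag-specialized NEON kernels. Each one only
 * checks its lag bound and forwards to the generic code. */
static void autocorr_lag_4(const float *d, size_t n, size_t lag, float *a)
{ assert(lag <= 4); autocorr_generic(d, n, lag, a); }
static void autocorr_lag_8(const float *d, size_t n, size_t lag, float *a)
{ assert(lag <= 8); autocorr_generic(d, n, lag, a); }
static void autocorr_lag_12(const float *d, size_t n, size_t lag, float *a)
{ assert(lag <= 12); autocorr_generic(d, n, lag, a); }
static void autocorr_lag_16(const float *d, size_t n, size_t lag, float *a)
{ assert(lag <= 16); autocorr_generic(d, n, lag, a); }

/* Runtime choice: pick the smallest kernel that covers max_lpc_order + 1 lags.
 * The thresholds are an assumption, see the note above. */
static autocorr_fn select_autocorr(unsigned max_lpc_order)
{
    if (max_lpc_order < 4)  return autocorr_lag_4;
    if (max_lpc_order < 8)  return autocorr_lag_8;
    if (max_lpc_order < 12) return autocorr_lag_12;
    if (max_lpc_order < 16) return autocorr_lag_16;
    return autocorr_generic;
}
```

The function pointer would be stored once when the encoder is initialized, so the per-frame code pays only an indirect call rather than re-evaluating the condition.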

Member

erikd commented Jan 4, 2020

Sorry, forgot about / lost track of this PR.

@coreyjjames would you be able to squash this down to a single commit? Also, is there any way we can set up CI for Aarch64?

coreyjjames force-pushed the coreyjjames:addAarch64Support branch from f1d79f2 to 9f08b02 on Jan 4, 2020

Author

coreyjjames commented Jan 4, 2020

@erikd I squashed the commits into a single commit. Also, check out this link from the Travis CI documentation; it looks like Travis CI should be able to do Aarch64 testing: https://docs.travis-ci.com/user/multi-cpu-architectures/

coreyjjames force-pushed the coreyjjames:addAarch64Support branch from 9f08b02 to 9242aab on Jan 7, 2020

Author

coreyjjames commented Jan 7, 2020

@erikd I added your PR #191 to this branch. I think the issue may be a problem with the GCC compiler version. Travis is using GCC version 5.4.1 and I tested with GCC version 8.3.1.

Member

erikd commented Jan 7, 2020

@coreyjjames What Linux distro are you running? Are you able to figure out which header file provides these functions and what package provides that header file?

Author

coreyjjames commented Jan 20, 2020

@erikd I have been busy and only just got time to look into this issue again. It seems the intrinsic "vcopyq_laneq_f32" causes the problem.

From my research, "vcopyq_laneq_f32" is one of the Aarch64-exclusive intrinsics, and Travis CI is running arm64 (ARMv8), which is why we are getting an error.

I am going to look for a substitute for the "vcopyq_laneq_f32" intrinsic that is more compatible with the different versions of ARM.
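
One possible substitute, sketched under assumptions: vcopyq_laneq_f32(a, lane1, b, lane2) returns a with lane lane1 replaced by lane lane2 of b, and the same result can be built from vgetq_lane_f32 and vsetq_lane_f32, which are also available on 32-bit ARM NEON. The names copy_lane_f32, vec4f, load4 and store4 are hypothetical helpers, the non-ARM branch exists only so the sketch compiles elsewhere, and whether this resolves the Travis CI error has not been verified here.

```c
#include <assert.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>

typedef float32x4_t vec4f;

static vec4f load4(const float *p) { return vld1q_f32(p); }
static void store4(float *p, vec4f v) { vst1q_f32(p, v); }

/* Copy lane `src` of b into lane `dst` of a. The lane indices must be
 * compile-time constants for the intrinsics, hence a macro. */
#define copy_lane_f32(a, dst, b, src) \
    vsetq_lane_f32(vgetq_lane_f32((b), (src)), (a), (dst))

#else
/* Portable stand-in so the sketch also builds on non-ARM hosts. */
typedef struct { float v[4]; } vec4f;

static vec4f load4(const float *p)
{
    vec4f r;
    for (int i = 0; i < 4; i++)
        r.v[i] = p[i];
    return r;
}

static void store4(float *p, vec4f v)
{
    for (int i = 0; i < 4; i++)
        p[i] = v.v[i];
}

static vec4f copy_lane_f32(vec4f a, int dst, vec4f b, int src)
{
    assert(dst >= 0 && dst < 4 && src >= 0 && src < 4);
    a.v[dst] = b.v[src];   /* a is passed by value, so the caller's copy is untouched */
    return a;
}
#endif
```

For example, `copy_lane_f32(load4(xa), 0, load4(xb), 3)` yields a vector equal to `xa` except that its first lane holds the last element of `xb`. A get/set pair may compile to more instructions than a single lane-copy instruction, so the effect on performance would need to be measured.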
