LACE / NoLACE and DRED on Fixed Point implementations? #318

Open
expresspotato opened this issue Mar 5, 2024 · 10 comments

@expresspotato

Hi,

The new ML algorithms in v1.5 are really impressive. It looks like they're only available in floating-point builds of Opus.

I'm compiling here for Xtensa LX6 (ESP32), which doesn't have a hard FPU, so I need the fixed-point implementation for any real-time audio encoding/decoding.

I haven't really dug into the code, but my guess is that the networks are stored as, and fed with, floating-point values.

make clean && ./configure CC=/Users/kevin/.espressif/tools/xtensa-esp32-elf/esp-2021r2-patch3-8.4.0/xtensa-esp32-elf/bin/xtensa-esp32-elf-gcc --host=xtensa --disable-extra-programs --enable-osce --disable-hardening --disable-doc --enable-asm --enable-fixed-point && make CC=/Users/kevin/.espressif/tools/xtensa-esp32-elf/esp-2021r2-patch3-8.4.0/xtensa-esp32-elf/bin/xtensa-esp32-elf-gcc

configure:
------------------------------------------------------------------------
  opus 1.5.1-dirty:  Automatic configuration OK.

    Compiler support:

      C99 var arrays: ................ yes
      C99 lrintf: .................... yes
      Use alloca: .................... no (using var arrays)

    General configuration:

      Floating point support: ........ no
      Fast float approximations: ..... no
      Fixed point debugging: ......... no
      Inline Assembly Optimizations: . No inline ASM for your platform, please send patches
      External Assembly Optimizations: 
      Intrinsics Optimizations: ...... no
      Run-time CPU detection: ........ no
      Custom modes: .................. no
      Assertion checking: ............ no
      Hardening: ..................... no
      Fuzzing: ....................... no
      Check ASM: ..................... no

      API documentation: ............. no
      Extra programs: ................ no
------------------------------------------------------------------------

 Type "make; make install" to compile and install
 Type "make check" to run the test suite

/Applications/Xcode.app/Contents/Developer/usr/bin/make  all-recursive
  CC       celt/bands.lo
  CC       celt/celt.lo
  CC       celt/celt_encoder.lo
  CC       celt/celt_decoder.lo
In file included from /Users/kevin/.espressif/tools/xtensa-esp32-elf/esp-2021r2-patch3-8.4.0/xtensa-esp32-elf/xtensa-esp32-elf/sys-include/string.h:180,
                 from celt/os_support.h:41,
                 from celt/celt_decoder.c:37:
celt/celt_decoder.c: In function 'celt_decode_lost':
celt/os_support.h:79:83: error: invalid operands to binary - (have 'float *' and 'celt_sig *' {aka 'int *'})
 #define OPUS_COPY(dst, src, n) (memcpy((dst), (src), (n)*sizeof(*(dst)) + 0*((dst)-(src)) ))
                                                                              ~~~~~^~~~~~
celt/celt_decoder.c:914:13: note: in expansion of macro 'OPUS_COPY'
             OPUS_COPY(buf_copy+c*overlap, &decode_mem[c][DECODE_BUFFER_SIZE-N], overlap);
             ^~~~~~~~~
celt/os_support.h:79:83: error: invalid operands to binary - (have 'float *' and 'celt_sig *' {aka 'int *'})
 #define OPUS_COPY(dst, src, n) (memcpy((dst), (src), (n)*sizeof(*(dst)) + 0*((dst)-(src)) ))
                                                                              ~~~~~^~~~~~
celt/celt_decoder.c:914:13: note: in expansion of macro 'OPUS_COPY'
             OPUS_COPY(buf_copy+c*overlap, &decode_mem[c][DECODE_BUFFER_SIZE-N], overlap);
             ^~~~~~~~~
make[2]: *** [celt/celt_decoder.lo] Error 1
make[1]: *** [all-recursive] Error 1
make: *** [all] Error 2

@jmvalin
Member

jmvalin commented Mar 5, 2024

Correct. All new DNN-based features are floating-point only. The reasoning is that most of the chips that are powerful enough to run that DNN code will also have an FPU. So at least for now (things can change) there's no plan to implement those in fixed-point.

@bateyejoe

Here's my vote for fixed-point support of all features. In my testing (mainly speech), the fixed-point build of 1.5 uses only about 2/3 of the CPU time of the floating-point build when encoding complexity is above 5. My application is a high-density server, so in moving to 1.5 I have to choose between a decrease in density to get PLC and LACE/NoLACE, or a significant increase in density by using 1.5 fixed-point and losing the new features.

@jmvalin
Member

jmvalin commented Mar 12, 2024

On most modern chips floating-point should actually be faster than fixed-point. Maybe there's some optimization that isn't getting enabled.

@xnorpx
Contributor

xnorpx commented Mar 12, 2024

Try enabling fast-math and float-approx, and if you run a server with known hardware from this decade you can presume AVX2 and SSE4.2 in your Opus build.

@bateyejoe

bateyejoe commented Mar 13, 2024

> Try enabling fast-math and float-approx, and if you run a server with known hardware from this decade you can presume AVX2 and SSE4.2 in your Opus build.

Where do I find the "fast-math" option? float-approx is enabled. I have MAY_HAVE_SSE4_1 and MAY_HAVE_AVX2 enabled, but only presume up to SSE2. Could the run-time dispatching account for such a big difference? I can set those to PRESUME and give it a try. Testing on a Core i9-13900, btw.

@xnorpx
Contributor

xnorpx commented Mar 13, 2024

> > Try enabling fast-math and float-approx, and if you run a server with known hardware from this decade you can presume AVX2 and SSE4.2 in your Opus build.
>
> Where do I find the "fast-math" option? float-approx is enabled. I have MAY_HAVE_SSE4_1 and MAY_HAVE_AVX2 enabled, but only presume up to SSE2. Could the run-time dispatching account for such a big difference? I can set those to PRESUME and give it a try. Testing on a Core i9-13900, btw.

What build system are you using? Autotools, CMake or Meson?

@bateyejoe

bateyejoe commented Mar 13, 2024

> What build system are you using? Autotools, CMake or Meson?

Using our own CMake-based system. I started off with the Linux build and generated a Makefile with configure. I used that to build our CMakeLists.txt with just the options we need. The only difference in options between the Windows and Linux builds was that Linux had VAR_ARRAY enabled and Windows has ALLOCA enabled instead.

I just finished rebuilding with PRESUME for SSE4.1 and AVX2 and re-ran the benchmarks, and now, to my surprise, the 1.5-fixed and 1.5-float results are much closer. Either my initial test run was flawed, or PRESUME makes a pretty large difference. I will try going back to MAY_HAVE for SSE4.1 and AVX2 and let you know if that was really the difference.

@xnorpx
Contributor

xnorpx commented Mar 13, 2024

@bateyejoe if you have a custom build system then you are on your own :) You can look at the Opus CMake files and see how they enable the following options.

OPUS_FLOAT_APPROX, enable floating point approximations (Ensure your platform supports IEEE 754 before enabling).
OPUS_FAST_MATH, enable fast math (unsupported and discouraged use, as code is not well tested with this build option).
OPUS_X86_PRESUME_SSE4_1, assume target CPU has SSE4.1 support (override runtime check).
OPUS_X86_PRESUME_AVX2, assume target CPU has AVX, FMA, and AVX2 support (override runtime check).

It's some defines and some compiler flags.

@jmvalin
Member

jmvalin commented Mar 13, 2024

It's possible you never actually enabled the RTCD (run-time CPU detection), which would prevent the code from taking advantage of any of the MAY_HAVEs.

@bateyejoe

> It's possible you never actually enabled the RTCD, which would prevent the code from taking advantage of any of the MAY_HAVEs.

I think you're right. Switching back to MAY_HAVE-only still performs on par with the fixed version, so I obviously missed something in that first config.

In any case, I don't think fixed-point support is completely worthless on modern processors. As I understand it, on multithreaded cores integer and floating-point operations can execute simultaneously, so mixing integer and float workloads is beneficial. In our case, we already have quite a bit of float math going on, which is one of the reasons we chose the Opus fixed-point build in the past.

Thanks for the assistance.
