ARM performance #17

awesie · 2017-06-27T07:54:02Z

I added a couple of patches to the experimental branch to hopefully reduce CPU usage on ARM. They may degrade receiver performance, however.

@mrbubble62 In #15 you mentioned that your ARM R8 platform was not fast enough. Could you test with a new build using these options: cmake -DUSE_THREADS=ON -DUSE_NEON=ON -DUSE_FAST_MATH=ON ..

On a Raspberry Pi 3, using only 1 CPU core, the average CPU usage is 60~70%.

mrbubble62 · 2017-06-28T02:54:42Z

On the Allwinner R8 (C.H.I.P.) there is still more work to do. Definite improvement, though: playing back sample.xz, it now drops out <50% vs. >90% before. Built with:
CMAKE_C_FLAGS "-mcpu=cortex-a8 -mfloat-abi=hard -mfpu=neon"
-DUSE_THREADS=ON -DUSE_NEON=ON -DUSE_FAST_MATH=ON

Have to say works brilliantly on i386 :) TY

awesie · 2017-06-28T03:01:09Z

If you are testing with sample.xz, make sure that you decompress it first, and then test the performance. The xz tool itself will use quite a bit of CPU.

mrbubble62 · 2017-06-28T03:17:36Z

Decompressed the sample, but no detectable difference with: nrsc5 -r ../support/sample 0

awesie · 2017-06-28T03:24:47Z

Great to know, thanks!

awesie · 2017-06-29T07:35:39Z

I decreased the number of taps in the filters when USE_FAST_MATH is set. This should shave off another 10~20% of CPU usage. I would be curious if this makes things any better.
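To illustrate why fewer taps saves CPU: in a direct-form FIR filter, every output sample costs one multiply-accumulate per tap, so the work scales linearly with the tap count. The sketch below is a plain-Python illustration only, not nrsc5's actual filter code; the function name `fir_filter` and the tap counts 32 and 16 are hypothetical.

```python
def fir_filter(samples, taps):
    """Direct-form FIR: each output is the dot product of the taps
    with the most recent len(taps) input samples.
    Returns the outputs and the number of multiply-accumulates performed."""
    n = len(taps)
    out = []
    macs = 0
    for i in range(n - 1, len(samples)):
        acc = 0.0
        for k in range(n):
            acc += taps[k] * samples[i - k]
            macs += 1
        out.append(acc)
    return out, macs


# Hypothetical comparison: the same input through a 32-tap and a 16-tap filter.
signal = [float(i % 8) for i in range(1024)]
_, macs_long = fir_filter(signal, [1.0 / 32] * 32)
_, macs_short = fir_filter(signal, [1.0 / 16] * 16)
print(macs_long, macs_short)
```

Halving the taps roughly halves the filter's arithmetic, at the cost of a less selective filter. The overall saving is smaller because filtering is only part of the processing.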

Useful metrics for performance would be:

time src/nrsc5 -r sample -o /dev/null -f wav -q 0
time src/nrsc5 -r sample -o /dev/null -f adts -q 0

The adts run should show how much time is required to process the data alone, and the wav run how much time is required to process the data and also decode it to audio.
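To judge whether a timed run keeps up with a live signal, the elapsed time can be compared with the duration of the recording. This is a minimal sketch, assuming the sample file is raw 8-bit interleaved I/Q (two bytes per complex sample) captured at about 1488375 Hz, the rate nrsc5 reports for the RTL-SDR. The function name `realtime_factor` and the 60-second capture in the usage line are hypothetical.

```python
def realtime_factor(file_bytes, elapsed_s, sample_rate_hz=1488375.0, bytes_per_sample=2):
    """Return elapsed time divided by recording duration.
    A value below 1.0 means the run was faster than real time."""
    if file_bytes <= 0 or elapsed_s < 0:
        raise ValueError("file_bytes must be positive and elapsed_s non-negative")
    # Duration of the capture in seconds (assumed format: 2 bytes per I/Q sample).
    duration_s = file_bytes / (bytes_per_sample * sample_rate_hz)
    return elapsed_s / duration_s


# Hypothetical example: a 60-second capture processed in 45 s of "real" time.
sixty_second_capture = 60 * 2 * 1488375  # bytes
print(round(realtime_factor(sixty_second_capture, 45.0), 2))
```

A factor close to or above 1.0 would suggest the platform cannot keep up with a live input, which would be consistent with dropouts.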

mrbubble62 · 2017-07-01T12:06:38Z

Results:

chip@chip:~/nrsc5/build$ time src/nrsc5 -r sample -o /dev/null -f adts -q 0
real    0m0.238s
user    0m0.215s
sys     0m0.020s
chip@chip:~/nrsc5/build$ time src/nrsc5 -r sample -o /dev/null -f wav -q 0
real    0m0.218s
user    0m0.205s
sys     0m0.015s

mrbubble62 · 2017-07-01T12:15:32Z

Performance has definitely improved: with a strong signal, audio now decodes occasionally.

chip@chip:~/nrsc5/build$ nrsc5 -p 12  88500000 0
12:10:30 INFO  main.c:176: [0] Generic RTL2832U OEM
Found Rafael Micro R820T tuner
Exact sample rate is: 1488375.071248 Hz
12:10:31 INFO  main.c:63: Gain: 0.0 dB, CNR: 13.824152 dB
12:10:31 INFO  main.c:63: Gain: 0.9 dB, CNR: 14.034353 dB
12:10:32 INFO  main.c:63: Gain: 1.4 dB, CNR: 14.064837 dB
12:10:32 INFO  main.c:63: Gain: 2.7 dB, CNR: 14.218107 dB
12:10:32 INFO  main.c:63: Gain: 3.7 dB, CNR: 14.165344 dB
12:10:33 INFO  main.c:63: Gain: 7.7 dB, CNR: 13.962760 dB
12:10:33 INFO  main.c:63: Gain: 8.7 dB, CNR: 13.858078 dB
12:10:33 INFO  main.c:63: Gain: 12.5 dB, CNR: 13.359507 dB
12:10:34 INFO  main.c:63: Gain: 14.4 dB, CNR: 13.144488 dB
12:10:34 INFO  main.c:63: Gain: 15.7 dB, CNR: 12.828616 dB
12:10:35 INFO  main.c:63: Gain: 16.6 dB, CNR: 12.347807 dB
12:10:35 INFO  main.c:63: Gain: 19.7 dB, CNR: 10.950316 dB
12:10:35 DEBUG main.c:67: Best gain: 27
12:10:38 INFO  input.c:154: CFO: 1090.118408 Hz (12 ppm)
12:10:38 DEBUG sync.c:244: First block @ 15
12:10:39 INFO  sync.c:222: Synchronized!
12:10:41 INFO  sync.c:298: MER: 7.237570 dB (lower), 7.242609 dB (upper)
12:10:41 INFO  decode.c:74: BER: 0.000027, avg: 0.000027, min: 0.000027, max: 0.000027
12:10:41 DEBUG frame.c:168: pdu_seq: 1, seq: 32, nop: 33
12:10:41 DEBUG frame.c:197: ignoring partial pdu
12:10:43 INFO  sync.c:298: MER: 7.404940 dB (lower), 7.376904 dB (upper)
12:10:44 INFO  decode.c:74: BER: 0.000022, avg: 0.000025, min: 0.000022, max: 0.000027
12:10:44 DEBUG frame.c:168: pdu_seq: 0, seq: 0, nop: 33
12:10:46 INFO  sync.c:298: MER: -1.457466 dB (lower), -3.960696 dB (upper)
12:10:47 INFO  decode.c:74: BER: 0.062330, avg: 0.020793, min: 0.000022, max: 0.062330
12:10:47 DEBUG frame.c:168: pdu_seq: 1, seq: 32, nop: 33
12:10:48 ERROR input.c:265: input buffer overflow!
12:10:48 ERROR input.c:265: input buffer overflow!
12:10:48 ERROR input.c:265: input buffer overflow!
12:10:48 ERROR input.c:265: input buffer overflow!
12:10:48 ERROR input.c:265: input buffer overflow!
12:10:48 DEBUG sync.c:199: lost sync (-1, -1)!
12:10:48 ERROR input.c:265: input buffer overflow!
12:10:48 DEBUG sync.c:244: First block @ 11
12:10:48 ERROR input.c:265: input buffer overflow!
12:10:48 ERROR input.c:265: input buffer overflow!
12:10:48 ERROR input.c:265: input buffer overflow!
12:10:49 DEBUG sync.c:244: First block @ 30
12:10:50 DEBUG sync.c:244: First block @ 3
12:10:50 DEBUG sync.c:244: First block @ 1
12:10:51 DEBUG sync.c:244: First block @ 0
12:10:51 INFO  sync.c:222: Synchronized!
12:10:52 INFO  acquire.c:98: Timing offset: 642.187500, slope: -4.199219 (adjust)
12:10:52 INFO  sync.c:298: MER: 6.963532 dB (lower), 6.934787 dB (upper)
12:10:53 INFO  decode.c:74: BER: 0.000022, avg: 0.015600, min: 0.000022, max: 0.062330
12:10:53 DEBUG frame.c:168: pdu_seq: 0, seq: 0, nop: 33
12:10:53 ERROR output.c:125: Decode error: Array index out of range
12:10:54 ERROR input.c:265: input buffer overflow!
12:10:54 ERROR input.c:265: input buffer overflow!
12:10:54 ERROR input.c:265: input buffer overflow!
12:10:54 ERROR input.c:265: input buffer overflow!
12:10:54 ERROR input.c:265: input buffer overflow!
12:10:54 ERROR input.c:265: input buffer overflow!
12:10:56 INFO  sync.c:298: MER: 0.483052 dB (lower), -0.037717 dB (upper)
12:10:56 INFO  decode.c:74: BER: 0.207394, avg: 0.053959, min: 0.000022, max: 0.207394
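A log like the one above is easier to compare between builds when it is summarized. The sketch below is an illustration, not part of nrsc5: it counts the "input buffer overflow!" errors and "lost sync" events and collects the reported BER values, assuming the line format shown in this log. The function name `summarize_log` is hypothetical.

```python
import re

# Matches the first value in lines such as "BER: 0.062330, avg: ...".
BER_RE = re.compile(r"BER: ([0-9.]+),")


def summarize_log(lines):
    """Count overflow and lost-sync lines and collect per-report BER values."""
    overflows = 0
    sync_losses = 0
    bers = []
    for line in lines:
        if "input buffer overflow!" in line:
            overflows += 1
        elif "lost sync" in line:
            sync_losses += 1
        else:
            match = BER_RE.search(line)
            if match:
                bers.append(float(match.group(1)))
    return {"overflows": overflows, "sync_losses": sync_losses, "ber": bers}


# Three lines taken from the log above.
sample = [
    "12:10:47 INFO  decode.c:74: BER: 0.062330, avg: 0.020793, min: 0.000022, max: 0.062330",
    "12:10:48 ERROR input.c:265: input buffer overflow!",
    "12:10:48 DEBUG sync.c:199: lost sync (-1, -1)!",
]
print(summarize_log(sample))
```

Running it over a full capture, for example with `summarize_log(open("nrsc5.log"))` where the file name is a placeholder, gives one set of numbers per build to compare.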

argilo · 2017-11-14T02:03:24Z

#95, #106 and #107 have made significant improvements in ARM performance, and it looks like USE_FAST_MATH is no longer required. The 15-minute load average is around 0.55 on a Raspberry Pi 3 with USE_NEON. I have some further improvements in mind, but I think I'll close this issue for now, as ARM performance seems to be adequate already.

argilo added the enhancement label Sep 6, 2017

argilo closed this as completed Nov 14, 2017
