Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up serial GPS message decoding #13469

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

tbolin
Copy link
Contributor

@tbolin tbolin commented Mar 21, 2024

This PR improves the performance of the serial GPS message decoding. The execute time of the GPS task now usually stays bellow 20 us on an F722. Usually only one more fast cycle is required process a nav-pvt message compared to current master. Previously the processing time was just bellow 30.

Limit gps processing time with clock cycles instead of micros
Replacing the repeated calls to micros with checking the number of CPU cycles speeds up the message processing by about 25-30%. The max processing time is lowered to 15 us (from 25) to take advantage of the faster processing.

Defer ublox message processing if low on time
Processing an ublox message after the last byte is received takes about 4-5 us. The task will now use a lower time limit if the ublox parser is on the last byte to prevent the task time from spiking at the end of a message.

Logs

gps_processing_logs.zip
The logs were recorder at 115200 kbit to create a worst case scenario.

Copy link

Do you want to test this code? You can flash it directly from Betaflight Configurator:

  • Simply put #13469 (this pull request number) in the Select commit field of the Configurator firmware flasher tab (you need to Enable expert mode, Show release candidates and Development).

WARNING: It may be unstable. Use only for testing!

@haslinghuis
Copy link
Member

Test unit needs some changes:

 /usr/bin/ld: ../../obj/test/cli_unittest/io/gps.c.o: in function `gpsUpdate':
/home/runner/work/betaflight/betaflight/src/test/../main/io/gps.c:1348: undefined reference to `getCycleCounter'
/usr/bin/ld: /home/runner/work/betaflight/betaflight/src/test/../main/io/gps.c:1360: undefined reference to `getCycleCounter'
/usr/bin/ld: ../../obj/test/cli_unittest/io/gps.c.o: in function `cpuCycleLimit':
/home/runner/work/betaflight/betaflight/src/test/../main/io/gps.c:1335: undefined reference to `clockMicrosToCycles'
/usr/bin/ld: /home/runner/work/betaflight/betaflight/src/test/../main/io/gps.c:1337: undefined reference to `clockMicrosToCycles'

@tbolin
Copy link
Contributor Author

tbolin commented Mar 21, 2024

Test unit needs some changes:

The problem is the cli test, which pulls in the io/gps.c which now requires drivers/system.c. Pulling in drivers/system.c directly doesn't work. Creating mocks of the required functions would be easy, but I'm not sure where to put them.

@tbolin
Copy link
Contributor Author

tbolin commented Mar 21, 2024

Unit tests should build now

Copy link
Contributor

@ledvinap ledvinap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another hack to avoid fixing the real problem ..

while (serialRxBytesWaiting(gpsPort)) {
wait = 0;
if (!isFast) {
rescheduleTask(TASK_SELF, TASK_PERIOD_HZ(TASK_GPS_RATE_FAST));
isFast = true;
}
if (cmpTimeUs(micros(), currentTimeUs) > GPS_RECV_TIME_MAX) {
if ((getCycleCounter() - initialCycleCount) > cpuCycleLimit()) {
Copy link
Contributor

@ledvinap ledvinap Mar 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cmp32() here. Subtraction result is unsigned and it will fail on wraparound. And in this case, it may happen every few seconds.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there some kind of Betaflight or MCU specific weirdness going on with unsigned int arithmetics? As far as I can tell the current code should handle wraparound as expected.

I did a small test to be sure

unsigned int val1 = 2U;
unsigned int val2 = 4294967295U;  // 2^32 - 1
printf("value: %lld \n", val1 - val2);

which resulted in

value: 3

as expected.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CmpTimeUs macro handles wrap. See cmpTimeCycles for cycle comparison. Used in the scheduler.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I must be missing something here. Why should the result of the subtraction be converted to a signed value if it is known which value that is older? As far as I can tell the result will be the same in most cases, and if the difference is large it will result in a negative value, which we do not want here (then, if the difference ever become that large there are bigger problems elsewhere).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tbolin : Sorry, I was wrong. The wrap does work as intended.

Why should the result of the subtraction be converted to a signed value if it is known which value that is older?

For ms/us, 31bits is considerably longer than any intended delay. Signed value does work independent of value ordering, so there is lesser chance to get it wrong. ( cmp32(a, b) > tout / cmp32(a, b+tout) > 0). Also, cmpTimeUs / cmp32 marks calculation with values that are expected to overflow (so cmpTimeCyclesU(a, b) instead of a - b).

For tick counter, the situation is more complicated due to range. Personally, I'd use signed conversion and limit maximum delay to some small value (<1s). This should be enforced (compile-time asserts / runtime saturation). Without this, code may behave differently depending on clock speed, leading to support nightmare. We will still run in problems with 2GHz chips, but that will take some time.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW: It may be good idea to use

uint32_t getCycleCounter(void)
{
    return DWT->CYCCNT | 1;
}

This way, you can use if (timeout && cmp32(getCycleCounter(), timeout) < 0) safely and resolution loss is negligible.

@ledvinap ledvinap added the Pinned Pinned items are excluded from automatically being marked as stale label Apr 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Pinned Pinned items are excluded from automatically being marked as stale RN: IMPROVEMENT
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants