Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce RX task minimum update rate to 22Hz, to improve 25Hz link stability #13435

Open
wants to merge 16 commits into
base: master
Choose a base branch
from

Conversation

ctzsnooze
Copy link
Member

@ctzsnooze ctzsnooze commented Mar 10, 2024

Users had noted a high incidence of RXLOSS warnings when using ELRS 25Hz, despite 100% LQ being reported by the ELRS code.

ELRS code uses a 100 sample moving average, effectively, to return LQ (with telemetry packets counted as 'good' despite no data being received); the internal Betaflight LQ method is not used. It seems that single packet loss, and telemetry loss, isn't counted.

Previously we would announce RXLOSS if there were no good frames over a continuous 100ms period. For 25Hz users this meant that dropping just two frames in a row or one frame either side of a telemetry packet would result in an RXLOSS warning. Logging showed single and double-frame random drops were not all that uncommon at range, but rarely interfere significantly with flight. By increasing the 'no data' window to 150ms we can reduce 'false alarms' quite a lot, not only at 25Hz but also at 50Hz.

Currently the minimum Rx task interval is 33Hz, ie intervals of ~30ms. This could cause a detection error when the intervals are 40ms long, as in a 25Hz link.

This PR makes quite a few small changes to improve the tendency for false RXLOSS warnings at low Rx rates:

  • increasing the allowed minimum Rx task interval to about 45ms (by reducing the minimum rate to 22Hz)
  • requiring a gap > 150ms of no valid Rx frames, up from 100ms, before announcingRXLOSS
  • removing the FrameAge parameter, which was used in a manner that seemed quite confusing
  • simplifying the calculation of frame delta when based on values provided by the Rx driver
  • using frame delta values from the Rx driver, if available.
  • some variables were re-named for consistency and to make the code more readable.
  • removes three separate places where rxRate is calculated independently, with one single calculation to derive rxRateHz in rc.c, and a get function to return it where needed in cli.c via getCurrentRxRateHz.

Overall these changes should reduce inappropriate RXLOSS indications for 25Hz links, and help a bit with 50Hz links also. Most pilots won't notice any significant difference. RXLOSS should appear whenever a sustained loss of data for more than 150ms occurs.

The refactoring should reduce CPU load a fraction.

Please test with the RX_TIMING debug, to which LQ and signalReceived values are logged; please compare Master to this PR.

This is the debug content of RX_TIMING debug in this PR:
Screen Shot 2024-03-26 at 12 01 15 pm

vs master:
Screen Shot 2024-03-26 at 12 02 20 pm

Testing appears OK, has been rebased to master.

Here is a graphic showing simulated link loss (put the quad into the microwave oven and shut the door). Axes are not labeled properly. The spikes coming up from the bottom indicate the duration of the dropout; the log is done at 50Hz so each discrete step up is 20ms of lag from one lost packet.

The very top trace is the SignalReceived indicator that triggers RXLOSS warning when it goes false, which happens on the left on four occasions when the LQ falls to around 50%. On the less severe link problem to the right, where LQ drops to around 70, there is no RXLOSS warning.

Screen Shot 2024-03-12 at 20 43 01

I do appreciate that quite a few long-range pilots felt that the RXLOSS warning was too intrusive. This perhaps shifts the balance back a bit away from it being a false alarm. When it appears, you have had no packets at all for 150ms, and that's cause for concern regardless of link speed.

Copy link

Do you want to test this code? You can flash it directly from Betaflight Configurator:

  • Simply put #13435 (this pull request number) in the Select commit field of the Configurator firmware flasher tab (you need to Enable expert mode, Show release candidates and Development).

WARNING: It may be unstable. Use only for testing!

@ctzsnooze ctzsnooze self-assigned this Mar 10, 2024
@ctzsnooze ctzsnooze force-pushed the Reduce-min-Rx-packet-interval-to-22ms branch 3 times, most recently from dcf81f0 to 99c97c6 Compare March 12, 2024 10:23
src/main/fc/rc.c Outdated Show resolved Hide resolved
src/main/rx/rx.c Outdated Show resolved Hide resolved
src/main/rx/rx.c Outdated Show resolved Hide resolved
@ledvinap
Copy link
Contributor

@ctzsnooze : Last image is with this PR applied?

@ctzsnooze
Copy link
Member Author

ctzsnooze commented Mar 13, 2024

Yes, with commit [Handle NULL input as before, log frame time

At least with ELRS/CRSF, there is less than 20uS difference in the interval measurements from frameDeltaUsRx and frameDeltaUs (except of course when frameDeltaUsRx is zero). We could save some CPU cycles by not worrying about rxRuntimeState.lastRcFrameTimeUs. If both MSP and SPI should set rxIsReceivingSignal correctly, I don't think we need rxRuntimeState.lastRcFrameTimeUs at all. And both MSP and SPI must set rxIsReceivingSignal correctly, otherwise failsafe and RC smoothing won't work.

What I don't know about is if commands are received over MSP or SPI, exactly what would happen then

Observing the RXLOSS marker, I think that 150ms is better. It will trigger fast enough on absolute loss, and probably triggers when it is bad enough that the pilot should get that warning appear.

Also, the final frame delta value (red line in traces) does do a pretty good job of tracking frame delta, I think, at the moment. If OK for SPI and MSP then I'm happy.

I'll make another log at higher speed. My great fear is accidentally powering up the microwave accidentally after I put the Tx in it :-) It is a perfect test situation, closing the door reliably wrecks the Rx link, but there is an automatic

@ctzsnooze
Copy link
Member Author

ctzsnooze commented Mar 13, 2024

This is at 250Hz with 1:32 telemetry, a standard kind of configuration, ELRS/CRSF 50mw.
First image is 'normal' showing telemetry episodes at 128ms intervals. Difference between direct and driver timing is still of the order of max 20us, typically 5us, suggesting that either method for calculating link time would be OK.

Quite a few dropped packets, which are enough to lower LQ to 97 - 98 while walking to the kitchen.
Screen Shot 2024-03-13 at 15 13 31.

Here is an image of link failure.... same method as above, Tx in the microwave, door closed. Similar time scale as previous tests.
Screen Shot 2024-03-13 at 15 20 37

Because LQ is based on a count over 100samples, it reacts MUCH faster to the loss of data - 5x faster.

Center cursor is at the 150ms mark on the first major loss.

Because link is weaker at 250Hz than 50Hz, the dropouts are more frequent and longer duration. LQ falls to 2 at 250Hz, whereas at 50Hz it was difficult to get it below 30.

RXLOSS appears multiple times, reflecting multiple long dropout periods. Longest dropout was 600ms, the typical big ones were 250-300ms duration.

To simplify comparisons, here is a 50Hz log, all display settings identical. The door closed just as fast!

Screen Shot 2024-03-13 at 15 38 08

Cool to see how much weaker the link is at 250Hz. RXLOSS onset seems fast enough despite the 150ms lag (vs 100ms previously)

Copy link
Contributor

@ledvinap ledvinap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some (random) comments

It would be best to cleanup this code a bit - document expected result of important functions and then try to implement this specification. Also, hidden state variables shall be removed where possible.

src/main/rx/rx.c Outdated Show resolved Hide resolved
src/main/rx/rx.c Outdated Show resolved Hide resolved
src/main/fc/rc.c Outdated Show resolved Hide resolved
@ctzsnooze
Copy link
Member Author

With current PR code, this is looking closely at frameDeltaUsRx (orange) vs frameDeltaUs (purple).

The value derived from Rx time stamps is more stable, and probably better suited for the correction of the derivative of stick travel in feedforward.

Finally I have some data to support using frameDeltaUsRx, if it is available as we do in the PR.

Screen Shot 2024-03-13 at 22 39 02

src/main/rx/rx.c Outdated Show resolved Hide resolved
@ledvinap
Copy link
Contributor

ctzsnooze#21

@ctzsnooze
Copy link
Member Author

ctzsnooze commented Mar 14, 2024

@ledvinap Thank you, sincerely, for your refactoring suggestions via PR21 - much nicer code! I'm very grateful.

Tested... first by comparing the 'noise' in the estimate of interval with a historical log of the same code but without PR21.

Both logs are 250Hz; the value in red is the output value for Frame Delta in both cases.
The purple trace was the old value for delta calculated by time stamps in updateRcRefreshRate.

Screen Shot 2024-03-15 at 10 26 59 am

Comparing the two in low LQ tests, the responses appear identical in time and form, in relation to signal lost.

Screen Shot 2024-03-15 at 10 42 48 am

Behaviour when powering the transmitter down, to failsafe, is exactly the same in both cases.

Update - tested with ELRS SPI, no logging, but in Sensors tab, appeared to work well. The link was extremely precise, reporting 40.0ms at 250Hz nearly every packet. The only thing of note was that with the same LQ drop test as before, there were no gradual interruptions, the link just died abruptly, and recovered equally quickly. And the link would only work at 250 and 500Hz, anything else failed.

Petr: Would it be OK to for me to add your refactoring changes from PR21 into this PR?

@ledvinap
Copy link
Contributor

And the link would only work at 250 and 500Hz, anything else failed.

Is this different from old code? I don't see why it should be related, but I may have overlooked something.

Would it be OK to for me to add your refactoring changes from PR21 into this PR?

Sure, you are welcome!

@ctzsnooze
Copy link
Member Author

ctzsnooze commented Mar 16, 2024

Thanks, @ledvinap! Really appreciate your direct proposal, I could not have done that. It is definitely much more clear, now.

I've committed and pushed your changes to this PR, re-named one constant as suggested, and logged the time stamp (because I quite like seeing it ramp up, and when it goes flat, I know what that means).

Anything else we should be doing within the context of this PR?

One thing I noticed is that we calculate rxRateHz from currentRxIntervalUs in processRcSmoothingFilter(), and also do much the same, but with rerate in calculateFeedforward, both in rc.c. And a similar conversion is done in cli.c at 4758, that one checks for zero on the interval. I think units returned are in Hz.

I wondered how best to just do the maths once, perhaps where we calculate currentRxIntervalUs?

The feedforward and rcSmoothing applications can use the constrained RxRate value (between Min and Max).

The CLI would report a limited frequency range of 15-1000Hz, and as a result would not report zero, even if no packets were received, which is perhaps not ideal; maybe we should check there if the interval is valid with isRxIntervalValid and somehow show invalid or something in the CLI? Don't need to over-think this though.

ELRS devs say they have identified a reason for the 'more frequent than expected' dropouts, with perhaps a solution at their end, and will look into their expectations of the SPI code also. I can go back and flash older versions down to 4.4 and test ELRS 3.x over SPI, but. that will take time. I have no clue about it to be honest, I never tried it at 50Hz before. Very unlikely to be something we have done.

@ctzsnooze ctzsnooze requested a review from ledvinap March 19, 2024 02:44
@ctzsnooze
Copy link
Member Author

@ledvinap In the most recent commit, I have tried to refactor out the unnecessary repetitions of the rxRate calculation, and to avoid unnecessary translations into seconds. I'm not sure about the best way to set previousRxIntervalUs, I think what I've done is OK but am not sure if it is the best way.

cliPrint("# Detected Rx frequency: ");
if (getCurrentRxIntervalUs() == 0) {
cliPrintLine("NO SIGNAL");
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check could not have worked since currentRxIntervalUs was constrained to a min value that was >0.

@ctzsnooze ctzsnooze force-pushed the Reduce-min-Rx-packet-interval-to-22ms branch 2 times, most recently from 3af28c5 to eaaf220 Compare March 19, 2024 04:05
@ctzsnooze ctzsnooze force-pushed the Reduce-min-Rx-packet-interval-to-22ms branch from bd6588c to cff7f8c Compare March 26, 2024 00:49
@ctzsnooze ctzsnooze added this to the 4.6 milestone Mar 26, 2024
@ledvinap ledvinap added the Pinned Pinned items are excluded from automatically being marked as stale label Apr 20, 2024
Copy link
Contributor

@ledvinap ledvinap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any obvious problem and it is code improvement overall. Let's finish it

Comment on lines +4949 to 4950
const uint16_t smoothedRxRateHz = lrintf(rcSmoothingData->smoothedRxRateHz);
cliPrintLinef("%dHz", smoothedRxRateHz);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
const uint16_t smoothedRxRateHz = lrintf(rcSmoothingData->smoothedRxRateHz);
cliPrintLinef("%dHz", smoothedRxRateHz);
cliPrintLinef("%dHz", lrintf(rcSmoothingData->smoothedRxRateHz));

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Pinned Pinned items are excluded from automatically being marked as stale RN: IMPROVEMENT
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants