
[Numerical accuracy] Change data type for accumulators #351

Closed

Conversation

@kubark42 (Contributor) commented Oct 8, 2021

A float is limited enough in resolution that it is worth using a double in the context of accumulators. A float's mantissa can only represent integers up to 2^23 ≈ 8.4 million, which means almost 7 significant digits. That initially sounds like a lot, but the R calibration uses 10,000 measurements and the L calibration uses 20,000, so the headroom left per sample is 8.4e6 / 1e4 = 840 for R and 8.4e6 / 2e4 = 420 for L. That is fewer than three significant digits for the raw R measurements, and barely more than two for L. Converting to doubles ensures that we keep numerical accuracy for statistical measurements.

Likewise, it doesn't make sense to use a float as a counter, as it will run out of precision long before an integer would.
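As a quick illustration of the counter problem (a standalone sketch, not part of this PR's diff): an IEEE 754 single can only count by ones up to 2^24, so a float counter silently stalls long before a 32-bit integer would overflow.

#include <stdio.h>

int main(void) {
   float fcounter = 16777216.0f;  // 2^24, the last point where a float can still count by ones
   fcounter += 1.0f;              // 16777217 is not representable; this rounds back to 16777216
   printf("%.1f\n", fcounter);    // prints 16777216.0 -- the counter has stalled
   return 0;
}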

THIS CODE HAS NOT BEEN TESTED

Old:

  text	   data	    bss	    dec	    hex	filename
 275200	   3196	 140900	 419296	  665e0	build/BLDC_4_ChibiOS.elf

New:

   text	   data	    bss	    dec	    hex	filename
 275376	   3196	 140908	 420008	  666a0	build/BLDC_4_ChibiOS.elf

Notice that it uses a little more memory. This is because while some floats became uint16_t, that didn't completely offset the floats which became double.

@kalvdans (Contributor) commented Oct 8, 2021

Please use int for small counters in registers; the size is then 16 or 32 bits, whichever is faster on the hardware, and using a signed type moves the compiler from assuming wraparound to assuming it never overflows, which saves some bit-masking instructions.

@kubark42 (Contributor, Author) commented Oct 8, 2021

> Please use int for small counters in registers; the size is then 16 or 32 bits, whichever is faster on the hardware, and using a signed type moves the compiler from assuming wraparound to assuming it never overflows, which saves some bit-masking instructions.

I think this should be an explicit size, because we don't want the compiler to be the reason why a statistical technique fails on a different target. I would prioritize reliability over speed in this case, especially since most of these accumulators are used in parameter identification, so they would presumably only run once each time a new motor is brought online.

I'm happy to change it to a signed int32_t so that we have max performance on a 32-bit ARM, but I don't quite understand the impact of "moves the compiler from assuming wraparound to assuming it never overflows". Why does this make a difference?

@kalvdans (Contributor) commented Oct 8, 2021

> but I don't quite understand the impact of "moves the compiler from assuming wraparound to assuming it never overflows".

Let's assume you have variables `uint16_t ucounter` and `int16_t scounter`. When you do `ucounter++`, the compiler (on 32-bit architectures) basically has to do `reg32 = (reg32 + 1) & 0xffff`, whereas for the signed `scounter++` the compiler is free to emit `reg32 = reg32 + 1`, since signed overflow is undefined behaviour in C. Unsigned overflow must always wrap around.
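A hypothetical sketch of that pattern (not from the VESC code; whether the mask actually appears depends on what the optimizer can prove about the counter's range):

#include <stdint.h>

/* The uint16_t index must behave as if it wraps mod 2^16, so the compiler
 * may have to re-mask it (uxth) while it lives in a 32-bit register. */
float sum_u16(const float *samples, uint16_t count) {
   float sum = 0.0f;
   for (uint16_t i = 0; i < count; i++) {
      sum += samples[i];
   }
   return sum;
}

/* Signed overflow is undefined behaviour, so the compiler assumes the int
 * never wraps and can use a plain 32-bit increment. */
float sum_int(const float *samples, int count) {
   float sum = 0.0f;
   for (int i = 0; i < count; i++) {
      sum += samples[i];
   }
   return sum;
}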

@kubark42 (Contributor, Author) commented Oct 8, 2021

So tell me if I have this right. As I understand it, in either case things have gone really wrong, because your counter is now either 0 or negative. Since if we ever hit that point we really don't care anymore, we might as well not burn extra processing power looking for the wrap-around case every time the value is incremented. Alright, makes sense.

I'm still somewhat partial to a uint16_t, if only because I don't see processor efficiency in a calibration routine being as important as reducing the memory footprint of an RTOS thread, but I'm good to go to an int32_t if you think it's best. Just lemme know.

@kalvdans (Contributor) commented Oct 8, 2021

> As I understand it, in either case things have gone really wrong, because your counter is now either 0 or negative. Since if we ever hit that point we really don't care anymore, we might as well not burn extra processing power looking for the wrap-around case every time the value is incremented.

That's a very clever way to phrase it. Completely right.

> I'm still somewhat partial to a uint16_t, if only because I don't see processor efficiency in a calibration routine being as important as reducing the memory footprint of an RTOS thread, but I'm good to go to an int32_t if you think it's best. Just lemme know.

When it comes to local variables, we are only talking about reducing the stack usage (which is not visible in the size output). Since we have to overcommit stack space anyway (gcc has no way to calculate max stack usage), I don't think it is worth the trouble to optimize every bit out of it.

@kubark42 force-pushed the tweak_accumulator_data_types branch 2 times, most recently from 3feb749 to 81d13ba on October 8, 2021 at 13:24
@kubark42 (Contributor, Author) commented Oct 8, 2021

@kalvdans Updated.

New build:

   text	   data	    bss	    dec	    hex	filename
 275536	   3196	 140916	 419968	  66740	build/BLDC_4_ChibiOS.elf

Looks like BSS crept up a teensy bit, but DEC actually dropped. Nice.

A `float` is limited enough in resolution that it is worth using a double
in this context. Likewise, it doesn't make sense to have a float used as
a counter, as it will run out of precision long before an integer.
The argument for doing this is that the processing is more efficient:
when we do `ucounter++` the compiler (on 32-bit architectures) basically
has to do `reg32 = (reg32 + 1) & 0xffff`, whereas for the signed `scounter++`
the compiler is free to emit `reg32 = reg32 + 1`, since signed overflow is
undefined behaviour in C. Unsigned overflow must always wrap around.
@kubark42 force-pushed the tweak_accumulator_data_types branch from aef2f8e to 154714b on October 8, 2021 at 13:43
@kalvdans (Contributor) commented Oct 8, 2021

I'm opposing the float -> double changes; I think they should be postponed and only added when the need arises for more than 23-bit precision.

The float -> integer changes look good!

@kubark42 (Contributor, Author) commented Oct 8, 2021

> I'm opposing the float -> double changes; I think they should be postponed and only added when the need arises for more than 23-bit precision.

I targeted only the variables which looked like they could suffer from imprecision. Basically, anything which was averaging 1000 values or more.

I believe that all doubles are in calibration functions, so aside from the brief memory impact and the additional required flash they have no visible impact on the CPU. Do you see a greater impact which I've missed?

From experience, imprecision and numerical instability sneak up on us faster than we realize. If there are just 1k elements being summed together, then the resulting resolution is 13 bits. If there is a 12-bit ADC, then this barely squeaks by. If the ADC is 14- or 16-bit, then the float will fail to capture both very large and very small ADC values.

The problem can become quite acute with 10k or 20k measurements, at which point there is less than 9 bits of precision. I don't know offhand what resolution the onboard ADCs use; this would be something to investigate. However, it seems fair to think that in the future, boards will be using continually higher-precision ADCs which will gather data more quickly, both of which put pressure on the 23 bits of precision.
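A back-of-the-envelope sketch of that arithmetic (illustrative only; compile with -lm): accumulating n same-magnitude samples into a float leaves roughly 23 - log2(n) bits per sample.

#include <stdio.h>
#include <math.h>

int main(void) {
   const int counts[] = {1000, 10000, 20000};
   for (int i = 0; i < 3; i++) {
      /* 23-bit mantissa minus the bits consumed by the running sum */
      printf("n = %5d -> ~%.1f bits per sample\n",
             counts[i], 23.0 - log2(counts[i]));
   }
   return 0;
}

This prints about 13.0, 9.7, and 8.7 bits respectively, matching the figures above.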

These numerical instability problems are hard to catch in the act, as they will only manifest themselves when there is a wide range in the input values. If we're measuring a constant V_bus, then it's extremely likely all the values are the same order of magnitude and so float is perfect. If the accumulator is sampling a moving signal, such as a wave which rises from 0 and goes to some peak, then it's the luck of the draw if it bites the user or not. The trick is that what we program today is sometimes co-opted into a different use tomorrow, and that's when the gremlins come out.

I've been caught by this on flight hardware and it looked like inexplicable algorithm errors, or a flaky power supply. BLDC and autopilots share the trait that when things go wrong, things go boom! One time our fixed-wing drone rolled inverted and plummeted to the ground due to the observer using floats instead of doubles. As a result I've learned to be very careful about when numerical imprecision can sneak up on me.

@kalvdans (Contributor) commented Oct 8, 2021

Thanks for the thorough explanation! 👍 I agree with you!

What I'm curious about is what instructions the compiler generates, since the STM32F4 chip's Cortex-M4 only has a single-precision FPU?

@kubark42 (Contributor, Author) commented Oct 8, 2021

> What I'm curious about is what instructions the compiler generates...

It's a lot more instructions, that's for sure. Check it out:

Addition

double precision:

rpm_sum += (double)mc_interface_get_rpm();
 8019c24:	f006 ffc4 	bl	8020bb0 <mc_interface_get_rpm>
 8019c28:	ee10 0a10 	vmov	r0, s0
 8019c2c:	f7f2 fbf4 	bl	800c418 <__aeabi_f2d>
 8019c30:	4602      	mov	r2, r0
 8019c32:	460b      	mov	r3, r1
 8019c34:	4630      	mov	r0, r6
 8019c36:	4639      	mov	r1, r7
 8019c38:	f7f2 fa90 	bl	800c15c <__adddf3>

float precision:

rpm_sum += mc_interface_get_rpm();
 8019bba:	f006 fee1 	bl	8020980 <mc_interface_get_rpm>
 8019bc0:	ee78 8a80 	vadd.f32	s17, s17, s0

Division

double precision:

const float voltage_avg = motor->m_samples.avg_voltage_tot / motor->m_samples.sample_num;
 80251ce:	f8d4 00c0 	ldr.w	r0, [r4, #192]	; 0xc0
 80251d2:	f7e7 f90f 	bl	800c3f4 <__aeabi_i2d>
 80251d6:	4602      	mov	r2, r0
 80251d8:	460b      	mov	r3, r1
 80251da:	4650      	mov	r0, sl
 80251dc:	4659      	mov	r1, fp
 80251de:	f7e7 faa1 	bl	800c724 <__aeabi_ddiv>
 80251e2:	f7e7 fc55 	bl	800ca90 <__aeabi_d2f>
 80251e6:	ee08 0a90 	vmov	s17, r0

float precision:

const float voltage_avg = motor->m_samples.avg_voltage_tot / (float)motor->m_samples.sample_num;
 8024f64:	eec6 8aa7 	vdiv.f32	s17, s13, s15

This is why I'm parsimonious about using doubles on an F4!

Of course, a division is going to be one of the costliest operations in doubles, although it's nowhere near as bad as a sqrt().

@kalvdans (Contributor) commented Oct 8, 2021

Thank you for the investigation! I think it is fine to do costly operations during calibration, so LGTM!

@vedderb (Owner) commented Oct 10, 2021

This needs some testing and I would like to see an example that shows that this actually is a problem before making changes that can break things. The ADC is 12 bits btw.

Also, I prefer to have the iterators as floats as they are used in floating point divisions later on. That makes it less likely that a mistake sneaks in that runs calculations with integers and creates rounding errors.
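For context, the integer-division mistake being guarded against looks something like this (an illustrative sketch, not code from the repository):

#include <stdio.h>

int main(void) {
   int hits = 3, samples = 8;               /* hypothetical integer counters */

   float ratio_bad = hits / samples;        /* integer division runs first: 0.0 */
   float ratio_ok = (float)hits / samples;  /* promote before dividing: 0.375 */

   printf("bad: %f, ok: %f\n", ratio_bad, ratio_ok);
   return 0;
}

Keeping the counter as a float sidesteps the cast entirely.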

@kubark42 (Contributor, Author) commented Oct 10, 2021

> This needs some testing...

I agree! What is the best way I can test this? I have a VESC 100/250, but it's connected to a 20kW motor on a thrust stand, so I'm not really keen to start playing around with the firmware on that platform.

I have a bunch of STM32F4 boards, including several STM32F4 Discovery boards. Could one of those be pressed into service in any kind of meaningful manner?

> I would like to see an example that shows that this actually is a problem before making changes that can break things.

// bob.c

#include <stdio.h>
#include <stdint.h>

int main() {
   // Iterative sum is float type
   float iterativeSum = 0;

   uint32_t n = 10000;  // The calibration routine for resistance uses this many values

   for (uint32_t i = 0; i < n; i++) {
      iterativeSum += i;
   }

   // Closed-form sum of 0..n-1, computed exactly in integer math
   float closedFormSum = n * (n - 1) / 2;

   printf("\n");
   printf("expected: %.0f, returned: %.0f\n", closedFormSum, iterativeSum);

   return 0;
}
% gcc bob.c && ./a.out

expected: 49995000, returned: 49992896

So on a sawtooth, we start losing data at the 5th significant digit. Because every waveform will behave uniquely, there are only so many conclusions we can make about the degree of imprecision in general. With 10k-20k loops it starts to get into an area where deliberate analysis can be more costly than using a double.

> Also, I prefer to have the iterators as floats as they are used in floating point divisions later on. That makes it less likely that a mistake sneaks in that runs calculations with integers and creates rounding errors.

I bet there's a story there! That sounds like you've been bitten by this in the past. I kind of wish there were a way to instruct the compiler to warn anytime there is integer division.

I see how this solves the problem on the STM32F4. Is there any intention to be compatible with other chips as well? One thing to keep in mind for future development is that using floats is less portable, since performance depends heavily on the chip's FPU. Program speed and real-time jitter stability will improve with more of a dependence on integers.

My solution to this problem has been to label iteration counters in such a way that I always ask myself, before dividing by them, whether I intend to be doing integer math. For instance, `accumCnt` to mean the accumulator loop count.

@vedderb (Owner) commented Oct 11, 2021

I was referring to practical problems. The example shows a difference of 0.005%, which is less than the ADC resolution under ideal conditions (using the full range, no offsets, only noise distributions that can be averaged out). Also, the reason for the 20k samples in the longest loop is more of a timeout to wait for the motor to stop and is never reached in practice. Further, even if the conditions were ideal and that precision were reached, it would not make any difference when it comes to running the motor - motors are not that precise.

About the integer iterators: the worry that the unaltered code will one day run slowly on some other processor without an FPU is much smaller than the worry of introducing complexity that makes mistakes more likely. I know that I'm doing floating point math with the iterators, and making them floats makes them one less thing to worry about.

Please don't get me wrong here, I appreciate the effort and help, I just don't want to risk breaking things that are not really problems and work as they are.

@kubark42 (Contributor, Author) commented Oct 11, 2021

I have been bitten by floating point inaccuracy an embarrassing number of times, so now I tend to be on the lookout for it. The real world has this nasty tendency of sneaking up on me. I usually find it when I'm digging into why code output differs from theory.

A lot of its relevance seems to depend on whether the number can get corrected over time or whether it bakes in error. Five significant digits on copper resistance is effectively measuring thermal noise, which raises the question of whether there's any value to an estimate beyond three significant digits.

It sounds like you've got this where you'd like it, and that keeping a watchful eye on the situation is enough.

I'll close this PR so it doesn't clutter up the queue. The conversation was useful and it'll stay documented here in case anyone wants to do further digging. It can always be re-opened for subsequent merge if it's determined it makes an unambiguous positive impact.

@kubark42 closed this Oct 11, 2021