Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Zephyr support for ABIs without a defined floating-point calling convention #32192

Open
jharris-intel opened this issue Feb 10, 2021 · 14 comments
Labels
Needs review This PR needs attention from Zephyr's maintainers RFC Request For Comments: want input from the community

Comments

@jharris-intel
Copy link
Contributor

Introduction

Even with all FPU configuration options disabled, Zephyr currently requires a defined floating-point calling convention, due to several functions in timeutil_skew_* taking and using floating-point values.

Note that this is distinct from having soft-float enabled. (There's really three alternatives here: an ABI that uses hardware floating-point operations and that passes floating-point values in hardware registers, an ABI that uses software floating-point operations and that does not use floating-point registers (instead passing floating-point values between functions using e.g. general-purpose registers)), or an ABI that neither allows floating-point operations nor allows floats to be passed between functions at all. Roughly: hard-float, soft-float, and no-float.)

Problem description

GCC and clang will error if they encounter a function defined with a floating-point type with no-float (as defined above). This results in being unable to compile or use Zephyr on any such platforms, even if the timeutil_* functions taking floats are unused and GC'd out.

Proposed change

Either:

  1. Provide a CONFIG flag to disable timeutil_* functions that use floating-point arguments.

and/or

  1. Split out timeutil_skew_* functions into their own file.

and/or

  1. Change timeutil_* functions that use floating-point values or arguments to not use floating-point values or arguments.

and/or

  1. Like 3, but "add new versions" instead of "change existing versions". Doesn't make sense without 1.

I am able to do these changes; I do not wish to go ahead with these until/unless someone pokes at any obvious holes in the above.

Detailed RFC

Option 1 would be as follows:

  1. Add a flag to lib/os/Kconfig called CONFIG_ENABLE_FLOAT_TIMEUTIL [n.b. I am terrible at naming things.]
  2. Gate function and type declarations and definitions in include/sys/timeutil.h that rely on floating-point types behind the flag defined in 1.
  3. Gate function implementations in lib/os/timeutil.c that were gated in 2 behind the flag defined in 1.
  4. Gate any tests in tests/util/timeutil/test_sync.c that use gated function definitions behind the flag defined in 1. (Is there a "proper" way to make a unit test require a feature?)

Option 2 would be as follows:

  1. Add a new file, include/sys/timeout_sync.h.
  2. Move all timeout_sync_*-related functions and types from timeout.h to timeout_sync.h
  3. Change any c-file internal includes from timeout.h to timeout_sync.h if they are only using timeout_sync.h functions. (I think this is only test_sync.c.)
  4. (Not sure about this one): add an include of timeout_sync.h to timeout.h. Required for backwards compatibility.
  5. Move all timeout_sync_*-related functions and types from timeout.c to timeout_sync.c
  6. Add timeout_sync.c to CMakeLists.txt

Option 3/4 would be as follows (Note: replace change with add a new version with as appropriate.):

  1. Change float skew to an int64_t [way overkill, but oh well], defined as a fixed-point type of absolute clockspeed in units of, oh, 2^-24th of a part per billion. (This gives a clockspeed range of of 0-1100, if you're wondering why the weird and seemingly-arbitrary unit. Units of 2^-32 would potentially be better, but would give a range of 0-4, which I could see someone overflowing. If your clock is running 1100x realtime you've got worse problems. You could choose something in the range 24-32; I just am generally lothe to pick shift values that aren't a multiple of 8 for various reasons. Ditto, the reason for having the units be a twos-based fraction of a decimal fraction (2^-24th of a part per billion) is to allow timeutil_sync_skew_to_ppb to be reasonably quick... which may be an overoptimization.)
  2. Change timeutil_sync_* functions to use said fixed-point value instead of floats or doubles.

Proposed change (Detailed)

I'm honestly not sure how this section is supposed to be different than the above section.

Dependencies

Any other APIs added that use floating-point in a non-optional manner would break this.

Concerns and Unresolved Questions

See above.

Option 3 would be a breaking change. Option 2 could be a breaking change in certain cases, if timeutil.h didn't include timeutil_skew.h

Alternatives

Listed inline above.

@jharris-intel jharris-intel added the RFC Request For Comments: want input from the community label Feb 10, 2021
@jharris-intel
Copy link
Contributor Author

BTW I'll be making another ticket soon(TM) for the specific weird platform in question, if you're wondering.

@nashif
Copy link
Member

nashif commented Feb 10, 2021

@pabigot FYI

@jharris-intel
Copy link
Contributor Author

As an aside, I am somewhat confused about what should be a question versus bug versus enhancement versus feature request. Everything I'm doing seems to fall into two (or more) of those categories at once.

Any input as to my (mis)use of these labels would be appreciated.

@jharris-intel
Copy link
Contributor Author

BTW I'll be making another ticket soon(TM) for the specific weird platform in question, if you're wondering.

See #32195.

Long story short, GCC does not have a defined softfloat ABI for AArch64, as AArch64 does not have an official softfloat ABI.

(Interestingly, recent versions of Clang do have a defined softfloat ABI for AArch64. See e.g. https://godbolt.org/z/TGe38f )

@pabigot
Copy link
Collaborator

pabigot commented Feb 11, 2021

Interesting.

We have historically variant terminology around floating point. CONFIG_FLOAT vs CONFIG_FP_SHARING were renamed to CONFIG_FPU{,_SHARING} back in 2.3.0, but in this case I don't think "FPU" is relevant. In that refactoring we lost the ability to indicate generically a desire for floating point support, however it was accomplished.

Perhaps we could benefit from a more precise taxonomy for floating point support that selects the floating point support level from the options identified in the issue introduction (isn't there a fourth, hardware floating point operations with arguments passed on stack?). So this could be a Kconfig choice like:

  • CONFIG_FLOAT_USE_HARD
  • CONFIG_FLOAT_USE_SOFT
  • CONFIG_FLOAT_UNSUPPORTED

where with CONFIG_FLOAT_UNSUPPORTED the problematic API would be excluded.

Or if that's too complex, just CONFIG_FLOAT_UNSUPPORTED, defaulted based on toolchain+arch.

As to the potential solutions: I made several attempts to use a 64-bit fixed point representation in developing this API, and they all failed because of truncation in intermediate values due to the wide range of local and remote clock speeds that need to be supported. So I don't like 3 or 4. We could revisit that if the timesync API becomes critical to Zephyr, rather than just the applications that need it.

I prefer 1 (leave the API in timeutil.h but conditionally exclude building it), but rather than use a dedicated Kconfig to filter this specific use of floating point data types I'd like to see it depend on !CONFIG_FLOAT_UNSUPPORTED, so we don't have to special case other API. Note that ptp and gptp also involve floating point calculations, and would have to be documented as not usable in this case.

GCC and clang will error if they encounter a function defined with a floating-point type with no-float (as defined above).

Only on the function definition? If they're also going to reject a declaration/prototype, we have a different (albeit minor) problem related to documentation generation.

2021-02-16: see also #32373 (comment) and surrounding context.

@galak
Copy link
Collaborator

galak commented Feb 11, 2021

@tejlmand might be good to discuss part of this in the toolchain working group and how to convey the toolchain features (runtime checks, Cmake, Kconfig, etc..)

@jharris-intel
Copy link
Contributor Author

(isn't there a fourth, hardware floating point operations with arguments passed on stack?)

I knew I shouldn't have left that out :-)

Yes, this mode exists, although I haven't really run across any architectures which have compiler flags to differentiate this mode. There are ABIs with hardware floating-point that occasionally pass arguments outside dedicated floating-point registers, but they tend to be happenstance as opposed to an explicit selectable variant ABI, if that makes sense.

So this could be a Kconfig choice like:
CONFIG_FLOAT_USE_HARD
CONFIG_FLOAT_USE_SOFT
CONFIG_FLOAT_UNSUPPORTED

+1 from me on this.

So I don't like 3 or 4.

Assuming that 3 or 4 functioned properly, would you still dislike it? Or is this "just" a statement of "I don't think 3 or 4 are feasible"?

As to the potential solutions: I made several attempts to use a 64-bit fixed point representation in developing this API, and they all failed because of truncation in intermediate values due to the wide range of local and remote clock speeds that need to be supported.

Oh, so it wasn't the skew value itself that was the issue, but the resulting arithmetic with the clock speeds? Interesting, and I could see why that could be an issue.

If I came up with a solution that handled the entire range properly (I know this is nontrivial, to put it mildly), would you be interested in taking a look at it?

rather than just the applications that need it.

I mean, we would consider it useful, although obviously not in its current state.

I prefer 1 (leave the API in timeutil.h but conditionally exclude building it), but rather than use a dedicated Kconfig to filter this specific use of floating point data types I'd like to see it depend on !CONFIG_FLOAT_UNSUPPORTED, so we don't have to special case other API.

Hmm. I suppose this comes down to the question of how exactly to use Kconfig. In particular I'm used to module features that are dependent on some global feature, rather than directly using said global feature.

E.g. CONFIG_NET_POWER_MANAGEMENT existing and depending on CONFIG_PM_DEVICE, rather than directly using CONFIG_PM_DEVICE in the networking subsystem. (Or NOCACHE_MEMORY depending on ARCH_HAS_NOCACHE_MEMORY_SUPPORT for that matter.)

Not the end of the world either way I suppose.

Note that ptp and gptp also involve floating point calculations, and would have to be documented as not usable in this case.

Good point. Ditto format string support for floats/doubles, among other things.

This mainly came up because currently timeutil is a mandatory part of Zephyr.

Only on the function definition?

So far as I can tell this is the case, luckily, on both GCC and clang: https://godbolt.org/z/n68zsG

@pabigot
Copy link
Collaborator

pabigot commented Feb 11, 2021

Assuming that 3 or 4 functioned properly, would you still dislike it? Or is this "just" a statement of "I don't think 3 or 4 are feasible"?

The latter.

If I came up with a solution that handled the entire range properly (I know this is nontrivial, to put it mildly), would you be interested in taking a look at it?

Yes, if you want to give it a try I'll certainly take a look. The solutions I came up with mostly worked, but in addition to overflow causing divergence, the reduced accuracy led to significantly longer periods to converge to the right answer (hours, instead of minutes).

The API is also used to measure the skew between independent clocks, such as a 32 KiHz crystal oscillator and the main CPU clock. The range of frequencies for which there are use cases is 1 Hz (1 s) to perhaps 4 GHz (250 ps) or even higher. Restricting frequency values to (2^N)*(5^M) might be a reasonable restriction, if that helps. The magnitude of the time differences involved in calculating skews may also be large (counts that exceed 2^32).

Floating point is really suited to this; fixed point isn't an obvious good choice. Measuring skew isn't currently critical to any core Zephyr functionality, so effort required, and potential increase in code size to the majority of systems where floats could be used, are criteria for a trade-off between making it optional or selecting between float or fixed-point implementations.

rather than just the applications that need it.

I mean, we would consider it useful, although obviously not in its current state.

That interests me; I think there are only two people, one of them me, who actively supported this capability.

Hmm. I suppose this comes down to the question of how exactly to use Kconfig. In particular I'm used to module features that are dependent on some global feature, rather than directly using said global feature.

If the part of the original PR that added a generic "convert Zephyr local clock to realtime" API had gotten any support, that'd be the situation we're in, but it didn't. Zephyr commonly uses Kconfig to allow applications to say "I don't need that feature, don't make me pay for it", which applies in this case.

Note that ptp and gptp also involve floating point calculations, and would have to be documented as not usable in this case.

Good point. Ditto format string support for floats/doubles, among other things.

That should be covered; cbprintf does it without floating point operations.

This mainly came up because currently timeutil is a mandatory part of Zephyr.

That wasn't intentional, it's just there was no reason to support disabling it. I have no problem with making it optional.

@jharris-intel
Copy link
Contributor Author

The range of frequencies for which there are use cases is 1 Hz (1 s) to perhaps 4 GHz (250 ps) or even higher.

Hmm. The current timeutil sync API has both local and remote nominal frequencies as uint32_t hertz. So can't be too much higher than 4GHz as an upper limit with the current API.

What's the desired overall accuracy?

That should be covered; cbprintf does it without floating point operations.

Yep. Just a (somewhat trivial/obvious) example of an optional feature that requires floating-point. Though I'll have a patch incoming for that - there's one spot in the current code that's unreachable at runtime but that the compiler can't figure out, and errors out in the FLOAT_UNSUPPORTED case (...and I think, emitting needless calls to softfloat APIs in the soft float case, although I haven't actually checked.). Simple fix.

That interests me; I think there are only two people, one of them me, who actively supported this capability.

It may be a case of a solution in need of a problem, but having a combination of an accurate timesource that you don't want to use for everything (due to power / latency / etc) and an inaccurate but "easy" timesource is not uncommon.


So, perhaps just 1 as a short-to-mid-term solution, leaving open the possibility of 3 or 4 down the line?

@pabigot
Copy link
Collaborator

pabigot commented Feb 11, 2021

The range of frequencies for which there are use cases is 1 Hz (1 s) to perhaps 4 GHz (250 ps) or even higher.

Hmm. The current timeutil sync API has both local and remote nominal frequencies as uint32_t hertz. So can't be too much higher than 4GHz as an upper limit with the current API.

True.

What's the desired overall accuracy?

I want to be able to correct local clocks to about 2 ppm, like I'd get from a quality external RTC, assuming the skew is not affected by environmental concerns (e.g. temperature). This helps with correlating events recorded on distinct deployed data loggers, limiting the timestamp error to about +/- 1 minute per year. So the estimated skew has to be about that accurate.

That should be covered; cbprintf does it without floating point operations.

Yep. Just a (somewhat trivial/obvious) example of an optional feature that requires floating-point. Though I'll have a patch incoming for that - there's one spot in the current code that's unreachable at runtime but that the compiler can't figure out, and errors out in the FLOAT_UNSUPPORTED case (...and I think, emitting needless calls to softfloat APIs in the soft float case, although I haven't actually checked.). Simple fix.

Please add me as a reviewer when that's submitted if github doesn't do so.

So, perhaps just 1 as a short-to-mid-term solution, leaving open the possibility of 3 or 4 down the line?

Yes.

@jharris-intel
Copy link
Contributor Author

Please add me as a reviewer when that's submitted if github doesn't do so.

Sure, shall do.

perhaps just 1 as a short-to-mid-term solution, leaving open the possibility of 3 or 4 down the line?
Yes.

Ditto.

(On both of these I'm largely blocked on #32237, but that "should" be resolved shortly I think. At the very least workaround-able.)

I want to be able to correct local clocks to about 2 ppm, like I'd get from a quality external RTC, assuming the skew is not affected by environmental concerns (e.g. temperature). This helps with correlating events recorded on distinct deployed data loggers, limiting the timestamp error to about +/- 1 minute per year.

Great, thank you for the clear requirements here.

A question (pedantic I am aware... "being pedantic" sometimes seems like my job description):

  1. 2ppm meaning what precisely? Instantaneous ppm isn't really attainable (consider a (terrible/failed) timesource that alternates between 1kHz and 2kHz every second. If you're updating every minute, you cannot maintain 2ppm instantaneous.), so I assume you either have a missing assumption (e.g. "local clockspeed does not change by more than 1ppm over the time synchronization period") or I'm missing something.

@pabigot
Copy link
Collaborator

pabigot commented Feb 11, 2021

so I assume you either have a missing assumption (e.g. "local clockspeed does not change by more than 1ppm over the time synchronization period")

I meant that to be addressed by:

assuming the skew is not affected by environmental concerns (e.g. temperature)

because I know temperature is likely to be an issue. In my applications the skew does converge (the local clock for me is the Zephyr tick counter from a 25 ppm 32 KiHz crystal oscillator; the reference clock is TAI from a Linux host using NTP, transferred over Bluetooth).

So yes, I'm assuming that the local clock local runs at a constant unknown rate close to but not exactly its nominal rate. E.g. 32767.181 Hz instead of 32768.000 Hz. The skew combined with a time measured in the local clock gets me a pretty accurate estimate of the time passed in the reference clock.

@jharris-intel
Copy link
Contributor Author

So yes, I'm assuming that the local clock local runs at a constant unknown rate close to but not exactly its nominal rate.

Over a single update's time period I assume?

That makes sense.

@npitre
Copy link
Collaborator

npitre commented Mar 11, 2021

Tagging myself in.

CONFIG_FLOAT_USE_HARD
CONFIG_FLOAT_USE_SOFT
CONFIG_FLOAT_UNSUPPORTED

+1 here too.

Being a fixed point math adept myself I understand the motivation for trying to solve the timeutil API, however this should be dealt separately from the
CONFIG_FLOAT_UNSUPPORTED issue.

@zephyrbot zephyrbot added the Needs review This PR needs attention from Zephyr's maintainers label Feb 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Needs review This PR needs attention from Zephyr's maintainers RFC Request For Comments: want input from the community
Projects
Status: No status
Development

No branches or pull requests

6 participants