Some 64-bit milliseconds tick values were not migrated to 32-bit #209

warasilapm · 2024-02-28T23:38:09Z

It would appear at some time in the past (I believe here), the portable API was changed from 64-bit millisecond ticks to 32-bit milliseconds ticks. In general, this was migrated successfully. However, there are still a number of 64-bit millisecond time values which would more correctly be 32-bit values, since that matches the portable API's type. None of these included here strictly cause issues, but they are confusing and their use could conceivably lead to issues with misuse in the future. See attached patch.

I may have missed some uses; perhaps a more thorough review is in order.

RobMeades · 2024-02-29T12:08:05Z

Agreed: I was aware of this but it is kind of a ticklish thing to change, need to check each one to make sure we're not introducing a new problem. Thanks for the ones you've spotted here, what I will do is search the codebase for int64_t variables and sift through them to see if there are any more to be had, then we will have to check each one.

I had wondered if there is an argument for creating a macro which performs the compare "safely", and which would fall-through if the tick wraps, though the macro would need to be named very well in order that the person writing the code knows how it will behave, since the macro is now hiding the truth.

Anyway, maybe we can discuss this later.

RobMeades · 2024-03-01T15:44:03Z

As discussed, I have created a branch which integrates your proposed fixes here and in #207/#208, and also introduces a pair of macros, U_PORT_TICK_TIME_EXPIRED_OR_WRAP_MS() and U_PORT_TICK_TIME_BEYOND_STOP_OR_WRAP_MS(); the macros are used where timeouts are checked throughout the code so as to ensure the wrap case (which should cause both macros to return true) is handled.

You can find a preview of the branch here:

https://github.com/u-blox/ubxlib/tree/preview_fix_tick_wrap_rmea

Note that this has NOT BEEN tested properly yet, since a key part of our test system is down. The test system should be fixed on Monday, at which point I can begin running proper tests. Obviously with such a wide change I'm concerned that I've typed in something wrong, maybe got the sense of a check upside-down: please, if you have the time, give it a thorough review.

All feedback welcomed.

warasilapm · 2024-03-15T18:52:10Z

If I may ask - why use a macro for the tick comparison functions instead of a static inline style header function? I notice they both have explicit casts to uint32_t. These suppress the compiler warnings for implicit integer conversion. If you're concerned with the types, a static inline style function would enforce the type more naturally and (if memory serves correctly) also issue a warning if the implicit integer conversion warning is turned on in the compiler.

RobMeades · 2024-03-15T18:58:38Z

Hi: only because static inline is, IIRC, a request to the compiler, not an instruction, so it cannot be guaranteed. Though the macro doesn't enforce type-checking, casting does enforce the type used.

RobMeades · 2024-03-15T19:00:23Z

But I could go either way of course, just seems a bit more "embedded" to use a macro, which guarantees a specific behaviour irrespective of toolchain.

warasilapm · 2024-03-15T19:38:30Z

Hi: only because static inline is, IIRC, a request to the compiler, not an instruction, so it cannot be guaranteed. Though the macro doesn't enforce type-checking, casting does enforce the type used.

This is largely true. The nuance is complicated but I'm so used to always having optimization turned on. For the three primary arm targeting c compilers (gcc, clang, and armcc), with any level of optimization static inline is always respected, since the internal linkage of the function allows the compiler to avoid emitting an out-of-line version. I believe gcc (which is what I use most often) tends to still emit an out-of-line version when compiled with -O0.

All three compilers do support the __attribute__((always_inline)) (gcc, clang, armcc) syntax for forcing functions inline, though I think this can sometimes tie the compiler's hands in -Os which may prefer to emit an out of line version for the code reuse benefits. Generally I prefer to trust the compiler in that case.

I generally live on -O1, -Og, or -Os. In my experience with gcc, all of them will pick up the hint correctly and avoid emitting an out-of-line version when unnecessary. I also find that an attached debugger will better follow the code path of the inline function, allowing easier line steps through it, while a macro can only be stepped over or instruction stepped. I'm just being an evangelist for header delcared static inline functions; I don't think using a macro is strictly a problem. The names are a bit long though; maybe uPortTickTimeExpired() and uPortTickTimePastStop() would be less cumbersome.

RobMeades · 2024-03-15T19:56:15Z

OK, I can be persuaded to go for static inline, though, personal preference again of course, I would prefer to leave the "or wrap" part in the function name; we are doing this to point out to the caller that the function will return true at wrap as well as on expiry or past stop. Otherwise the developer might just do a compare to avoid this stupidly inefficient function call that just, after all, compares two numbers. Definitely want to leave the ms in there, 'cos it is not a tick, it is a tick-derived-time-value in milliseconds.

warasilapm · 2024-03-15T20:27:07Z

You're welcome to name it whatever you want; it's not my codebase after all. I have a different perspective on things again so I'll share for posterity.

Wrapping Schmapping

The thing is, it's not that it returns true if the value(s) has/have wrapped. It returns true if the time has expired or is past stop. The caller shouldn't need to know or even think about the wrap around that occurs around INT32_MAX. I think what you're implying is actually further from the right behavior, suggesting that if either value has wrapped past this point the function will return true. That would be a different buggy behavior compared to handling the wrap around correctly.

If there's any caveats to list, it would be that this is only valid within the range (INT32_MIN, INT32_MAX ) relative to the current time. Otherwise, the math will correctly handle the timeouts presented. I generally just list these as a @note field in the doxygen comment.

What's in a Millisecond

This can be nice to have in the signature, but truthfully it doesn't matter what unit the int32_t values have. The math is the same either way. Hence, I dropped them. In ubxlib's case there's only milliseconds ticks values as far as I can see. It's more important that the two values are both the same base unit, and no change to the function signature can really enforce that in C without a more complicated structure or timing system. In my opinion, it's not really worth it.

On a different evangelism note, static inline functions tend to show up in the call graph during static analysis. Some tools will not pick up the macro as part of the call graph. That can be handy some times when working with automated coverage and analysis tools, but as far as working with the code I still mostly rely on a text search myself so it doesn't really matter.

RobMeades · 2024-03-15T21:12:18Z

We make ubxlib as source code with the primary intention that it is understandable/absorbable by customers; your opinion carries more weight than mine/ours, so we will follow your pattern. I am testing the revised changes now, will post here shortly.

My concern remains that I have got a comparison the wrong way around somewhere in this relatively broad change; I will merge the change when you have confirmed when you are happy that has not happened.

RobMeades · 2024-03-15T22:25:29Z

Branch is now updated with the new form, do let me know when you are happy with it and I will merge it to master here and delete the preview branch some time after that.

warasilapm · 2024-03-27T20:37:07Z

Been looking at this again trying to resolve issues running ubxlib past INT32_MAX rollover. Had the following comments:

The keep going callback in u_device_private_cell.c is still broken. We both missed this one.
I misunderstood that your expiry function calls uPortGetTickTimeMs() under the hood, and therefore is only compatible with milliseconds values. I had assumed that you passed in the return value from uPortGetTickTimeMs() into the elapsed functions. Given that context, it makes more sense for the unit to be in there. My mistake.
For the static inline to compile as I described, the function body needs to be in the header, not in a separate compilation unit. Just as a random example plucked out of thin air, this style is what I'm thinking of.

Unfortunately, I don't really have a great way of pulling this down and testing it at the moment. Our fork diverged way back at 1.2.0; not sure what the executive decision will be on updating.

RobMeades · 2024-03-27T21:31:31Z

The keep going callback in u_device_private_cell.c is still broken. We both missed this one.

Ooo, well spotted, will fix.

Given that context, it makes more sense for the unit to be in there.

Understood, will modify in next revision.

For the static inline to compile as I described, the function body needs to be in the header.

Interesting, hadn't appreciated that but it does seem to be the case that relying on the linker is not so wise. Will modify.

Our fork diverged way back at 1.2.0; not sure what the executive decision will be on updating.

Well I guess you could apply the changes manually to places you know/care about; with the exception of the one you spotted above they were reasonably easy to search for. Either way the result will require a careful review.

RobMeades · 2024-03-28T00:25:54Z

Changes now applied to the preview branch, see here:

https://github.com/u-blox/ubxlib/tree/preview_fix_tick_wrap_rmea

Thanks again for the feedback. I'm not going to merge this change into master here until you are happy with it, of course, so let me know how it goes.

warasilapm · 2024-03-29T20:30:38Z

So we've made progress on handling the INT32_MAX wrap around point. However, there are still rough edges to consider. Code which immediately captures a start time and then retries for some timeout do not suffer from this next issue, but ones that revisit a potentially much earlier set start time do. For example, consider the code which enforces a 1 second gap between CFUN commands in registerNetwork().

ubxlib/cell/src/u_cell_net.c

Lines 1243 to 1248 in 3cf5348

    
           // Come out of airplane mode and try to register 
        
           // Wait for flip time to expire first though 
        
           while (!uPortTickTimeExpiredMs(pInstance->lastCfunFlipTimeMs, 
        
                                          U_CELL_PRIVATE_AT_CFUN_FLIP_DELAY_SECONDS * 1000)) { 
        
               uPortTaskBlock(1000); 
        
           }

What if it's been more than INT32_MAX ticks since the last CFUN command was issued? This while loop will end up waiting as much as INT32_MAX + U_CELL_PRIVATE_AT_CFUN_FLIP_DELAY_SECONDS * 1000 milliseconds. This is much longer than the expected worst case timeout. The issue here is that the current comparison (and indeed all using the newly introduced port API functions) maps half of the integer space (red and yellow below) as within the timeout period instead of just the timeout period (yellow only below).

There are a couple of different approaches to solving this sort of problem. The first is to explicitly check for values in that range. With the current code that would look like a timeout implementation below:

static U_INLINE bool uPortTickTimeExpiredMs(int32_t startTimeMs,
                                            int32_t timeoutMs)
{
    int32_t delta = uPortGetTickTimeMs() - startTimeMs;
    return (delta > timeoutMs) || (delta < 0);
}

The second way this is often resolved is to just used unsigned math for expiration values, since the negative values we had to explicitly check for before are simply large values which will always exceed the timeout. Both of these approaches map only the yellow area above and below as within the timeout value. This more tightly bounds the potential waiting time to the potential timeout window anyways, so worst case the check will result in the timeout to be observed in code.

If that's not sufficient, the only way is to maintain these timeout values, essentially checking that they are within the expected value and bringing them within range or invalidating them every 2^31 ticks. This is a lot of complexity and overhead and generally it's acceptable to have the application wait up to the specified timeout anyways in the exceedingly rare case this occurs.

Note that a similar relationship is true with the current state of the comparisons for "stop" values instead of "start" values with a timeout. These map stopTimeMs - INT32_MAX to "not yet timed out", just like startTimeMs + timeoutMs - INT32_MAX.

I'm curious if there was a reason the portable layer chose int32_t instead of uint32_t. At least in STM32Cube land both HAL and FreeRTOS os ticks are both unsigned numbers.

An additional spot we have identified this style of is issue is below:

ubxlib/cell/src/u_cell_sock.c

Lines 2003 to 2006 in a2d986e

    
           while (uPortGetTickTimeMs() - pInstance->connectedAtMs < 
        
                  U_CELL_SOCK_SARA_R422_DNS_DELAY_MILLISECONDS) { 
        
               uPortTaskBlock(100); 
        
           }

Another potential source of issues worth revisiting is the use of 0 or -1 to indicate unset or invalid times. This works when it is used for optionally set timeout values, such as the AT client's atTimeoutSavedMs member. However, it doesn't work well for start times, like startTimeMs in the cell instance structure. The local keep going callback in the cellular network code will fail to abort a failing connection if startTimeMs is a negative tick value as a result of a check for this "unset" value that it is returned to.

ubxlib/cell/src/u_cell_net.c

Lines 751 to 755 in 3cf5348

    
           if ((pInstance->startTimeMs > 0) && 
        
               !uPortTickTimeExpiredMs(pInstance->startTimeMs, 
        
                                       U_CELL_NET_CONNECT_TIMEOUT_SECONDS * 1000)) { 
        
               keepGoing = false; 
        
           }

This check has to be removed to prevent immediate timeout in these cases. The good news on this one is that, currently, there does not seem to be a code path which calls this without first setting startTimeMs so a bug from an unset value does not seem to be an issue. In our application this has yet to cause a problem because we only call through uNetworkInterfaceUp() and uNetworkInterfaceDown() which passes in the keepGoingCallback() in u_network_private_cell.h.

RobMeades · 2024-03-29T23:53:56Z

All good stuff, give me a little time to process...

RobMeades · 2024-04-02T15:11:47Z

@warasilapm: have been discussing with Michael today, hope to have next revision tomorrow.

RobMeades · 2024-04-04T00:15:54Z

Branch is now updated.

This is the "Michael edition": a uTimeout API is introduced in /common/timeout with "anonymous" [not really] type uTimeoutStart_t, a function uTimeoutStart() to populate variables of that type and functions uTimeoutExpiredMs()/uTimeoutExpiredSeconds() to check for expiry. All calculations are uint32_t.

This new API is used in all cases where time calculations are performed, aside from one case in BLE and the lock timer in the AT Client (which has been pretty darned thoroughly tested and doesn't quite fit the pattern).

Just about every file in ubxlib has been touched by this so I don't want to leave it sitting too long or it will be a nightmare to merge: please let me know what you think, @mazgch as well as @warasilapm.

RobMeades · 2024-04-04T12:13:15Z

@mazgch has made initial comments on the branch (thanks Michael). I have addressed these, I think, but it was getting quite difficult for Michael to view (the baseline for the current preview branch is also pretty old now) so I have (a) deleted the old preview branch, (b) created a new branch preview_fix_tick_wrap_2_rmea and (c), in order to make commenting hopefully easier, I have made this branch a PR #225 to this repo (rather than our internal development repo). I won't merge this PR, it will only be for comments.

RobMeades · 2024-04-04T12:17:23Z

@warasilapm: in the hope that it will enable you to comment directly on #225 I have sent you an invitation to this repo as a read-only collaborator. Or you can also just comment here of course.

warasilapm · 2024-04-05T19:03:25Z

@RobMeades As far as I know, I can comment on PRs and submit code review on PRs in public repositories. I just can't be used as an "approving" review for the purposes of code review requirements, etc. As such, I have declined the invitation for now; if there are other features you need me to have access for in the future I'll let you know.

I'll still take a look at the changes in #225 as I am able.

warasilapm · 2024-04-05T19:06:36Z

@RobMeades If memory serves, you should even be able to request review from me on the PR without me as a collaborator.

RobMeades · 2024-04-05T22:48:43Z

If memory serves, you should even be able to request review from me on the PR without me as a collaborator.

Unfortunately typing your Github ID doesn't seem to resolve:

RobMeades · 2024-04-10T15:26:28Z

@warasilapm: I understand from separate e-mail communication that you cannot use a new release of ubxlib in your product, whatever is proposed in the very fruitful discussions we have had here. I had planned to wait until you were happy with the changes but maybe that is not good use of your time, or at least not something you would be able to prioritise? If that is the case, let me know and we will continue with our internal review process; the changes have already been tested, but of course, given the statistical nature of these things, can be tested further, I just didn't want to head down that route if you disagreed with the direction.

Please advise.

warasilapm · 2024-04-10T16:15:35Z

@RobMeades Did you develop a test case where the tick value is initialized close to the 2^31 and 2^32 rollover values?

RobMeades · 2024-04-10T16:23:21Z

Yes, we do this on test instance 23 (where ubxlib happens to be running on Windows, not that it matters) which has a SARA-R5, NINA-W15 and M8 module attached, and also runs out-of-coverage tests. It is running with U_CFG_TEST_TIMEOUT_SPEED_UP=14, so a multiplier of 2^14: time should wrap roughly every 4 minutes. This is about as fast as we can go without having a wrap that is within our largest timeout.

EDIT: for context, the complete test run on that instance takes just under one hour.

warasilapm · 2024-04-10T17:31:39Z

Okay sounds good! Just wanted to make sure that was being covered explicitly since that's the main advantage I could provide with testing on our application.

I may be able to find some time to review the PR you put up in the next couple of days but you are correct in that the takeaway in the email from our side today is a) our version is not synchronized with latest ubxlib so it's challenging for me to work with, b) we aren't updating to latest ubxlib on this release cycle, and c) there are other immediate priorities for me during the work day over reviewing your PR.

RobMeades · 2024-04-10T18:02:51Z

Got it: in that case I will wait until late Friday evening UK time before I go ahead with things on our side; there are reasons for me not to upset the applecart before then.

RobMeades · 2024-04-13T11:40:54Z

FYI, #225 is about to enter our internal review process; once it has passed that it should appear on master here, hopefully next week. @warasilapm: still happy to receive any comments you may have, which I will feed into that process.

RobMeades · 2024-04-23T14:39:11Z

@warasilapm: the change has now passed testing, review, and is pushed to master here in commit 79e808e. Are you OK for me to close #207, #208 and #209 or is there more you would like to discuss?

warasilapm · 2024-04-24T20:37:04Z

I'm good to close the issues for now.

RobMeades self-assigned this Feb 29, 2024

RobMeades mentioned this issue Mar 29, 2024

few tick timeout comparison and other issues/improvements: #223

Closed

RobMeades closed this as completed Apr 24, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Some 64-bit milliseconds tick values were not migrated to 32-bit #209

Some 64-bit milliseconds tick values were not migrated to 32-bit #209

warasilapm commented Feb 28, 2024 •

edited

Loading

RobMeades commented Feb 29, 2024

RobMeades commented Mar 1, 2024 •

edited

Loading

warasilapm commented Mar 15, 2024

RobMeades commented Mar 15, 2024 •

edited

Loading

RobMeades commented Mar 15, 2024 •

edited

Loading

warasilapm commented Mar 15, 2024 •

edited

Loading

RobMeades commented Mar 15, 2024 •

edited

Loading

warasilapm commented Mar 15, 2024

RobMeades commented Mar 15, 2024 •

edited

Loading

RobMeades commented Mar 15, 2024 •

edited

Loading

warasilapm commented Mar 27, 2024

RobMeades commented Mar 27, 2024 •

edited

Loading

RobMeades commented Mar 28, 2024

warasilapm commented Mar 29, 2024

RobMeades commented Mar 29, 2024

RobMeades commented Apr 2, 2024

RobMeades commented Apr 4, 2024 •

edited

Loading

RobMeades commented Apr 4, 2024 •

edited

Loading

RobMeades commented Apr 4, 2024

warasilapm commented Apr 5, 2024

warasilapm commented Apr 5, 2024

RobMeades commented Apr 5, 2024

RobMeades commented Apr 10, 2024

warasilapm commented Apr 10, 2024

RobMeades commented Apr 10, 2024 •

edited

Loading

warasilapm commented Apr 10, 2024 •

edited

Loading

RobMeades commented Apr 10, 2024

RobMeades commented Apr 13, 2024

RobMeades commented Apr 23, 2024

warasilapm commented Apr 24, 2024

Some 64-bit milliseconds tick values were not migrated to 32-bit #209

Some 64-bit milliseconds tick values were not migrated to 32-bit #209

Comments

warasilapm commented Feb 28, 2024 • edited Loading

RobMeades commented Feb 29, 2024

RobMeades commented Mar 1, 2024 • edited Loading

warasilapm commented Mar 15, 2024

RobMeades commented Mar 15, 2024 • edited Loading

RobMeades commented Mar 15, 2024 • edited Loading

warasilapm commented Mar 15, 2024 • edited Loading

RobMeades commented Mar 15, 2024 • edited Loading

warasilapm commented Mar 15, 2024

Wrapping Schmapping

What's in a Millisecond

RobMeades commented Mar 15, 2024 • edited Loading

RobMeades commented Mar 15, 2024 • edited Loading

warasilapm commented Mar 27, 2024

RobMeades commented Mar 27, 2024 • edited Loading

RobMeades commented Mar 28, 2024

warasilapm commented Mar 29, 2024

RobMeades commented Mar 29, 2024

RobMeades commented Apr 2, 2024

RobMeades commented Apr 4, 2024 • edited Loading

RobMeades commented Apr 4, 2024 • edited Loading

RobMeades commented Apr 4, 2024

warasilapm commented Apr 5, 2024

warasilapm commented Apr 5, 2024

RobMeades commented Apr 5, 2024

RobMeades commented Apr 10, 2024

warasilapm commented Apr 10, 2024

RobMeades commented Apr 10, 2024 • edited Loading

warasilapm commented Apr 10, 2024 • edited Loading

RobMeades commented Apr 10, 2024

RobMeades commented Apr 13, 2024

RobMeades commented Apr 23, 2024

warasilapm commented Apr 24, 2024

warasilapm commented Feb 28, 2024 •

edited

Loading

RobMeades commented Mar 1, 2024 •

edited

Loading

RobMeades commented Mar 15, 2024 •

edited

Loading

RobMeades commented Mar 15, 2024 •

edited

Loading

warasilapm commented Mar 15, 2024 •

edited

Loading

RobMeades commented Mar 15, 2024 •

edited

Loading

RobMeades commented Mar 15, 2024 •

edited

Loading

RobMeades commented Mar 15, 2024 •

edited

Loading

RobMeades commented Mar 27, 2024 •

edited

Loading

RobMeades commented Apr 4, 2024 •

edited

Loading

RobMeades commented Apr 4, 2024 •

edited

Loading

RobMeades commented Apr 10, 2024 •

edited

Loading

warasilapm commented Apr 10, 2024 •

edited

Loading