Bluetooth: controller: Follow up on ticker conflict resolution #16830
Is your enhancement proposal related to a problem? Please describe.
With the changes in PR #15787, I notice two main issues:
I assume upstream/master will be broken for nRF51, and degraded in terms of expected functionality for nRF52, until the above issues are addressed.
Describe the solution you'd like
Some of it is in the above description (which was initially an email).
Describe alternatives you've considered
We will discuss alternatives in the comments below.
From: Chettimada, Vinayak Kariappa [mailto:email@example.com]
Let's hold back on a roll-back. I feel we can iterate on the existing changes and come up with a decision or a list of requirements to be covered. A Skype call makes sense; unfortunately, I will not be available until next Friday (I need to be at my desk full-time for the discussion to be fruitful, and I travel back to Bangalore next Tuesday).
From: Morten Priess (MTPR) firstname.lastname@example.org
It turns out that it is a matter of the “old” ticker favoring long intervals.
The “old” ticker reserves the ADV slot in good time, as the node is re-inserted in the ticker “bottom half” (ticker_job) after its timeout.
The “new” ticker evaluates the nodes at the timeout, just prior to the ADV ticker node's timeout, concludes “next node (ADV) does not have higher priority, no latency”, and goes ahead and schedules the current node.
So basically one could argue that the “new” ticker does it right… Why should old reservations have higher priority? If this is a feature of the ticker that actually makes things run better, obviously that’s how it should be.
Could setting the default priority of the ADV ticker node to -1 be the solution to the collision-resolution issue?
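A minimal C sketch of the just-in-time check described above shows why equal default priorities let the current node keep the slot, and why an ADV priority of -1 would change that. The `sketch_node` struct and both functions are hypothetical; they are not the Zephyr `ticker_node` or `ticker_worker` code. The sketch assumes, as the -1 suggestion implies, that a lower numeric value means higher priority and that 0 is the default. Tick wrap-around is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified ticker node (not the Zephyr struct ticker_node). */
struct sketch_node {
	uint32_t ticks_to_expire; /* absolute expiry, in ticks */
	uint32_t ticks_slot;      /* reserved duration, in ticks */
	int8_t priority;          /* assumption: lower value = higher priority */
};

/* Does next expire inside the slot reserved by current? */
static bool slots_overlap(const struct sketch_node *current,
			  const struct sketch_node *next)
{
	return next->ticks_to_expire <
	       current->ticks_to_expire + current->ticks_slot;
}

/* Just-in-time decision at current's timeout: yield only if the overlapping
 * next node has strictly higher priority. Equal priorities never yield, which
 * is why an ADV node at the default priority loses the slot. */
static bool current_should_yield(const struct sketch_node *current,
				 const struct sketch_node *next)
{
	assert(current && next);
	assert(next->ticks_to_expire >= current->ticks_to_expire);
	return slots_overlap(current, next) && next->priority < current->priority;
}
```

With both nodes at priority 0 the current node is scheduled; lowering the ADV node to -1 makes the current node yield, but only when the slots actually overlap.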
Regarding the CPU usage, I will look into that tomorrow.
From: Chettimada, Vinayak Kariappa email@example.com
In the old ticker, conflicts are resolved on the following basis:
The BLE controller and the flash driver are both users of the ticker. A time space reserved by the flash driver halts the CPU, so conflict resolution could be delayed by as much as the reserved duration requested by the flash driver.
Regarding the definition of priority: to me it means a fixed precedence set by ‘an’ authority (like a master or a single controlling entity in the system). In the old controller there is no single master/entity module that maintains the states/scheduling of the roles (adv, scan, slave, master, flash driver, or any new proprietary role that can be added at runtime). Conflicts are resolved by contest, hence my difficult decision to use the term ‘force’, which is applied just-in-time by the contesting roles (or ticker instances).
With the introduction of ‘priority’ (set by an authority), I would prefer that we continue to use ‘force’, now derived from ‘priority’, but still try to resolve conflicts by contest.
No. We don’t have an authority entity in the controller yet; we would need one to decide on the priority based on all active ticker_instances. Do we want to go this way?
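To illustrate ‘force’ derived from ‘priority’ while conflicts are still resolved by contest, here is a hypothetical C sketch. The `contender` struct, the `contest` function, and the mapping `force = lazy - priority` are assumptions made only for illustration: a node gains force each time it loses, so no static authority decides the outcome, and priority merely shifts the starting point.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical contender in a contest-based conflict resolution. */
struct contender {
	uint16_t lazy;   /* consecutive expiries skipped so far */
	int8_t priority; /* assumption: lower value = higher priority, 0 default */
};

/* Assumed mapping: force grows with every skip and is biased by priority. */
static int32_t contender_force(const struct contender *c)
{
	return (int32_t)c->lazy - (int32_t)c->priority;
}

/* Returns 0 if a wins, 1 if b wins. a is the node that expires first and
 * keeps the slot on a tie. */
static int contest(const struct contender *a, const struct contender *b)
{
	assert(a && b);
	return contender_force(b) > contender_force(a) ? 1 : 0;
}
```

Because the loser's `lazy` is incremented after each lost contest, a starved node eventually wins regardless of the static priorities.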
Regarding 1 "The scan_adv sample is missing required advertising events due to incorrect conflict resolution":
A quick test with this added shows identical behavior for the legacy and new ticker in the adv_scan sample simulation.
Question: Should this feature be behind a CONFIG_ option? That would allow users to choose legacy or new behavior.
Regarding 2 "...any execution of ticker_worker inside an occupied region will introduce high Radio ISR latency":
This means that jitter will be seen on the prepare callback due to some early "must_expire"; however, this is not thought to be a problem.
A CONFIG_ option is not required: we use the legacy behavior for the legacy BLE states/roles (adv, scan, slave, master and flash driver), and priority can be used for future feature requirements.
This appears to be like a revert of the #15787 conflict resolution from ticker_worker back to ticker_job; if that is what you mean, I am OK with it. The reason: keeping the old implementation and adding a simple "must expire" check in ticker_job plus a "would-have-been" callback seems reasonable. I would like ticker_worker to be very lightweight, so that soft real-time event anchors suffer the least latency from the ticker's own code.
This would not be exactly like a revert of #15787. One reason is that the old implementation could potentially skip several intervals (in theory) to find a free region. I would prefer to skip only one interval per ticker_job (for each conflicting node). This way, you get only one early "must expire" callback, and it is easier to undo if a ticker update opens a free slot (see the next comment about dynamic audio slave timing).
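A minimal sketch of the one-interval-per-ticker_job policy, under stated assumptions: `me_node`, `resolve_loser`, `undo_skip` and `job_begin` are hypothetical names, ticks are plain integers without wrap-around, and the early "must expire" callback is represented only by a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-node state for the one-interval-per-job policy. */
struct me_node {
	uint32_t ticks_to_expire;
	uint32_t ticks_periodic;
	bool must_expire;
	bool skipped_this_job;    /* already pushed out once in this job pass */
	bool skip_pending;        /* the push can still be undone */
	uint32_t early_callbacks; /* stand-in for early "must expire" callbacks */
};

/* Start of a new ticker_job pass: the previous push is committed. */
static void job_begin(struct me_node *n)
{
	assert(n);
	n->skipped_this_job = false;
	n->skip_pending = false;
}

/* Called for a node that lost a conflict. Pushes it at most one interval per
 * job pass. Returns true if the node was moved. */
static bool resolve_loser(struct me_node *n)
{
	assert(n);
	if (n->skipped_this_job) {
		return false; /* wait for the next ticker_job pass */
	}
	n->ticks_to_expire += n->ticks_periodic;
	n->skipped_this_job = true;
	n->skip_pending = true;
	if (n->must_expire) {
		n->early_callbacks++;
	}
	return true;
}

/* A ticker update freed the original slot: move the node back. Returns false
 * if there is no pending push to undo. */
static bool undo_skip(struct me_node *n)
{
	assert(n);
	if (!n->skip_pending) {
		return false;
	}
	n->ticks_to_expire -= n->ticks_periodic;
	n->skip_pending = false; /* skipped_this_job stays set: no re-push now */
	return true;
}
```

Since a node moves by exactly one interval per pass, undoing it is a single subtraction, which is the property argued for above.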
Regarding audio timing requirements
In our audio implementation, we use a modified LLL BLE slave implementation, where after each successful RX of the primary packet, we add a 1 cycle latency to the ticker node (lazy_periodic = 1) to skip the retransmission, thereby freeing the next interval for other events. At the next prepare, we remove the latency (lazy_periodic = 0). By doing this, we can effectively implement optimal audio in LLL, without need for modifying ULL.
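The lazy_periodic ping-pong can be modelled as a toy simulation, assuming each expiry of the audio node schedules the next one `1 + lazy_periodic` intervals later. `sketch_update_lazy` is a hypothetical stand-in for a `ticker_update` call, and `simulate_audio_slave` counts those calls, which makes the CPU cost of the ping-pong visible. This is not the LLL or ticker code itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of the audio slave's periodic ticker node. */
struct audio_node {
	uint16_t lazy_periodic; /* intervals to skip after each expiry */
};

/* Hypothetical stand-in for a ticker_update() call that sets the latency. */
static void sketch_update_lazy(struct audio_node *n, uint16_t lazy)
{
	n->lazy_periodic = lazy;
}

/* rx_ok[i]: primary packet received OK in interval i.
 * occupied[i]: set when the audio node runs in interval i (caller zeroes it).
 * Returns the number of update calls made. */
static unsigned simulate_audio_slave(struct audio_node *n, const bool *rx_ok,
				     bool *occupied, unsigned n_intervals)
{
	unsigned updates = 0;
	unsigned i = 0;

	assert(n && rx_ok && occupied);
	while (i < n_intervals) {
		occupied[i] = true;
		/* Prepare: remove the latency added after the previous RX. */
		if (n->lazy_periodic != 0) {
			sketch_update_lazy(n, 0);
			updates++;
		}
		/* Primary RX OK: skip the retransmission interval. */
		if (rx_ok[i]) {
			sketch_update_lazy(n, 1);
			updates++;
		}
		i += 1u + n->lazy_periodic;
	}
	return updates;
}
```

With every primary packet received, the node runs in every other interval, freeing the retransmission intervals, at the cost of roughly two update calls per audio event.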
So maybe we're at a crossroads where we need to decide whether to split the ticker functionality in two: a legacy config that makes sure ticker_worker is not run during radio activity, and an "advanced" (or whatever) version favoring real time and optimal conflict resolution?
Let's not skip one full interval, but:
This way if the winning ticker gets removed, then the must-expire ticker node gets the reserved region.
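The fallback described above, where the must-expire node inherits the reserved region if the winning ticker is removed, can be sketched as a reservation record that remembers the loser. The `region` struct, integer node IDs and `NO_NODE` value are hypothetical and do not reflect the ticker's actual data layout.

```c
#include <assert.h>

#define NO_NODE (-1) /* hypothetical "no node" marker */

/* Hypothetical record for one contested region. */
struct region {
	int winner_id;   /* node scheduled in the region */
	int fallback_id; /* losing must-expire node, or NO_NODE */
};

/* Record the outcome of a conflict over the region. */
static void region_contest(struct region *r, int winner_id, int loser_id)
{
	assert(r);
	r->winner_id = winner_id;
	r->fallback_id = loser_id;
}

/* A node is stopped/removed before the region starts. */
static void region_remove(struct region *r, int node_id)
{
	assert(r);
	if (r->winner_id == node_id) {
		/* The must-expire loser takes over the reserved region. */
		r->winner_id = r->fallback_id;
		r->fallback_id = NO_NODE;
	} else if (r->fallback_id == node_id) {
		r->fallback_id = NO_NODE;
	}
}
```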
Case 1: Periodic must-expire ticker node and normal ticker node of same interval overlap with must-expire expiring 'before' normal ticker node:
Audio is spared the extra CPU usage of calling ticker_update to ping-pong the lazy_periodic value.
Case 2: Periodic must-expire ticker node and normal ticker node of same interval overlap with must-expire expiring 'after' normal ticker node:
I do not think it is a good idea to involve LLL mechanisms in the basic scheduling rules for several reasons:
As I see it, we agree that the scheduler should have an option to support radios with critical IRQ latency, i.e.,
Thus, the losing entries are moved out of the just-forwarded reservation slot; however, the scheduling resolution is NOT done until the scheduler time/head reaches them at a future time.
One remaining question is how the losing entries are moved out of the just-forwarded reservation slot when this requires several connection intervals and the entries are of the 'must-expire' type. An easy solution would be a counter saying how many times the 'must-expire' entry was skipped, so that the 'must-expire' execution can take that into account when updating the event state, etc.
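A sketch of the skip-counter idea, assuming a hypothetical `must_expire_node` that carries both its tick expiry and an event counter maintained by the role (taken here to be the number of the last executed event); tick wrap-around is ignored. The losing node is moved past the end of the forwarded slot one interval at a time, and the must-expire execution then advances the event state by all skipped intervals at once.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical must-expire entry with a skip counter. */
struct must_expire_node {
	uint32_t ticks_to_expire;
	uint32_t ticks_periodic;
	uint16_t skip_count;    /* intervals skipped since the last execution */
	uint16_t event_counter; /* assumption: last executed event number */
};

/* Move the losing entry out of a forwarded reservation slot ending at
 * slot_end, skipping as many whole intervals as needed and counting them. */
static void move_out_of_slot(struct must_expire_node *n, uint32_t slot_end)
{
	assert(n && n->ticks_periodic > 0);
	while (n->ticks_to_expire < slot_end) {
		n->ticks_to_expire += n->ticks_periodic;
		n->skip_count++;
	}
}

/* Must-expire execution: account for every skipped interval plus this one. */
static void on_must_expire(struct must_expire_node *n)
{
	assert(n);
	n->event_counter = (uint16_t)(n->event_counter + n->skip_count + 1u);
	n->skip_count = 0;
}
```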