ble: gatt: db_hash_work runs for too long and makes serial communication fail #43811
Comments
@thoh-ot assigned this to you, following up on the work you did on GATT hashing recently, but feel free to reassign. By the way, have you seen this on the Oticon side? @closatt can you please try to reproduce with the latest Zephyr? And, if possible, provide a basic sample that can reproduce the problem.
@thoh-ot will you be able to take a look at this? Otherwise we'll try to find someone else to assign this to.
So I have been thinking a little bit about this - to me this seems like an architectural issue with using the system work queue for "long running" work while also servicing other constraints in the system. As the system work queue executes the work in a cooperative thread with fairly high priority, I think I won't have the capacity to make a long-term viable solution.
@jori-nordic could take a look into using a user work queue or pre-emptible threads to defer long-duration operations/functions. Ex. BT ECC thread (hci_ecc.c).
@jori-nordic thanks for the patch and sorry for the delay.
@closatt good to hear. Something else needed attention the past week, but I'll try to merge that patch this week.
Send long-running tasks to a dedicated low-priority workqueue. This shouldn't increase memory usage since by doing this, we get rid of the ECC processing thread. This should fix issues like zephyrproject-rtos#43811, since the system workqueue runs at a cooperative priority, and the new dedicated one runs at a pre-emptible priority. Fixes zephyrproject-rtos#43811 Signed-off-by: Jonathan Rico <jonathan.rico@nordicsemi.no>
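For readers who want to see the shape of the approach described in the commit message, here is a minimal sketch of a dedicated workqueue running at a pre-emptible priority. It is not the code from the patch: the names long_wq, hash_work, long_wq_init and long_wq_submit_hash, the stack size and the priority value are all assumptions made for illustration. The Zephyr calls used are k_work_queue_init, k_work_queue_start, k_work_init and k_work_submit_to_queue (the header path assumes a recent Zephyr; older trees use <kernel.h> and, in 2.5, a different workqueue start call). The #else branch holds host-only stand-ins so the sketch can be compiled off-target; they are not Zephyr code.

```c
#include <stddef.h>

#ifdef __ZEPHYR__
#include <zephyr/kernel.h>
#else
/* Host-only stand-ins (NOT Zephyr code): they mimic the shape of the calls
 * used below and run submitted work synchronously. */
struct k_work;
typedef void (*k_work_handler_t)(struct k_work *work);
struct k_work { k_work_handler_t handler; };
struct k_work_q { int prio; };
typedef char k_thread_stack_t;
#define K_THREAD_STACK_DEFINE(sym, size) k_thread_stack_t sym[size]
#define K_THREAD_STACK_SIZEOF(sym) sizeof(sym)
#define K_PRIO_PREEMPT(x) (x)

static void k_work_init(struct k_work *w, k_work_handler_t h)
{
    w->handler = h;
}

static void k_work_queue_init(struct k_work_q *q)
{
    q->prio = 0;
}

static void k_work_queue_start(struct k_work_q *q, k_thread_stack_t *stack,
                               size_t size, int prio, const void *cfg)
{
    (void)stack; (void)size; (void)cfg;
    q->prio = prio;
}

static int k_work_submit_to_queue(struct k_work_q *q, struct k_work *w)
{
    (void)q;
    w->handler(w);
    return 1;
}
#endif

/* Assumed values: tune the stack and priority for the real workload. */
#define LONG_WQ_STACK_SIZE 1024
#define LONG_WQ_PRIO       K_PRIO_PREEMPT(10)

static K_THREAD_STACK_DEFINE(long_wq_stack, LONG_WQ_STACK_SIZE);
static struct k_work_q long_wq;   /* hypothetical dedicated queue */
static struct k_work hash_work;   /* hypothetical long-running work item */
static int hash_runs;             /* counts handler invocations */

static void hash_work_handler(struct k_work *work)
{
    (void)work;
    /* Placeholder for a long computation such as a GATT database hash. */
    hash_runs++;
}

/* Start the queue at a pre-emptible priority so that higher-priority
 * threads can interrupt the long-running handler. */
void long_wq_init(void)
{
    k_work_queue_init(&long_wq);
    k_work_queue_start(&long_wq, long_wq_stack,
                       K_THREAD_STACK_SIZEOF(long_wq_stack),
                       LONG_WQ_PRIO, NULL);
    k_work_init(&hash_work, hash_work_handler);
}

/* Submit to the dedicated queue instead of the system workqueue. */
int long_wq_submit_hash(void)
{
    return k_work_submit_to_queue(&long_wq, &hash_work);
}
```

The point of the design is only the priority class: work submitted with k_work_submit goes to the cooperative system workqueue, while work submitted to a queue started at a pre-emptible priority can be interrupted by more urgent threads.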
Environment
Zephyr version: 2.5
Target: NRF52840
Bug description
Just after registering a service, UART communication is blocked for more than 10ms, causing a communication failure. In the following image, the moment when the bus is interrupted is circled in red. As a result, the chip answers before the tx communication finishes.
Some investigation showed that the db_hash_work (gatt.c) execution is taking several ms, and blocks all serial communications during this time as sysworkq is very high priority: db_hash_work was executed just at the time of the bug. Setting a k_busy_wait(5000) in db_hash_process confirmed this, as the interruption time was now 15ms.

The execution time of this work depends on the number of services (in this case 13 services).
The only simple temporary workaround we found (which is not thread safe) is to add a k_sleep of several ms after each call to bt_gatt_service_register.
To Reproduce
bt_gatt_service_register
This communication failure is not easy to reproduce as it requires a particular timing. But even without reproducing the communication failure, the fact that db_hash_work lasts a long time can be easily seen with Segger SystemView (or even with timestamped logs placed at the beginning and at the end of the db_hash_process function). For this, it is enough to register a lot of services (for example 15, each with 5 characteristics) with bt_gatt_service_register and capture the right moment, 10ms after the last service register.

Maybe the execution time is completely different on another target than the NRF52840.
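As a rough illustration of the timestamped-log approach, the helper below converts two readings of a free-running 32-bit cycle counter into microseconds. The function name elapsed_us is hypothetical. On Zephyr the readings could come from k_cycle_get_32() and the frequency from sys_clock_hw_cycles_per_sec(), but that wiring is an assumption and is only shown in the comment.

```c
#include <stdint.h>

/* Elapsed microseconds between two readings of a free-running 32-bit
 * cycle counter. Unsigned subtraction stays correct across a single
 * counter wrap-around.
 *
 * Possible use on target (assumed wiring, not taken from the issue):
 *   uint32_t t0 = k_cycle_get_32();
 *   ... code under measurement ...
 *   uint32_t t1 = k_cycle_get_32();
 *   uint32_t us = elapsed_us(t0, t1, sys_clock_hw_cycles_per_sec());
 */
static inline uint32_t elapsed_us(uint32_t start, uint32_t end,
                                  uint32_t cycles_per_sec)
{
    uint32_t cycles = end - start;

    /* Widen to 64 bits so the multiplication cannot overflow. */
    return (uint32_t)(((uint64_t)cycles * 1000000u) / cycles_per_sec);
}
```

Logging the result after the measured function returns, rather than inside it, keeps the log call itself out of the measured interval.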
Expected behavior
If this db_hash_process can be interrupted, it should maybe be executed in a low priority thread. If it cannot be interrupted, I think there should be at least a callback indicating to the caller of bt_gatt_service_register that db_hash_process has finished and serial communications are safe.

There are certainly better solutions that I didn't think of.
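To make the callback idea concrete, here is a small sketch of what such a completion notification could look like. Nothing here is an existing Zephyr API: hash_done_cb_t, hash_done_cb_register, hash_process and mark_serial_safe are hypothetical names, and hash_process is only a stand-in for the long-running hash computation.

```c
#include <stddef.h>

/* Hypothetical completion callback type. */
typedef void (*hash_done_cb_t)(void *user_data);

static hash_done_cb_t done_cb;
static void *done_cb_data;

/* Register the function to call once the hash computation has finished. */
void hash_done_cb_register(hash_done_cb_t cb, void *user_data)
{
    done_cb = cb;
    done_cb_data = user_data;
}

/* Stand-in for the long-running hash worker. */
void hash_process(void)
{
    /* ... long-running hash computation would go here ... */

    if (done_cb != NULL) {
        done_cb(done_cb_data);
    }
}

/* Example callback: set a flag the application checks before it starts
 * timing-sensitive serial traffic. */
void mark_serial_safe(void *user_data)
{
    *(volatile int *)user_data = 1;
}
```

In a real system the callback would run in the worker's thread context, so registering it and reading the flag would need whatever synchronization the application already uses between threads.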