Bluetooth: Host: Fixed where bt_send returns an error but is actually… #74287
@LingaoM while the change looks reasonable, you need to explain (in the commit message) in more detail the sequence of events which will trigger it. Is this purely theoretical, or did you actually see it in practice? If the latter, what kind of build configuration & HW did you have? Btw, there's a merge conflict, so you need to rebase.
9f2b545 to
4487985
Compare
In fact, it is not a problem with Zephyr itself. The problem occurred when we ran the Zephyr protocol stack on Linux and tested it. However, since the Zephyr protocol stack uses a local variable here, we believe it should be cleared, which is safer :).
The `sync` is a local variable in the stack space. Clearing this pointer explicitly before releasing it is a safer way. Signed-off-by: Lingao Meng <menglingao@xiaomi.com>
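The commit message above can be illustrated with a minimal sketch. It does not use Zephyr's real types: `toy_sem`, `toy_cmd`, `toy_cmd_complete` and `toy_send_sync` are hypothetical stand-ins, and the assumption is that the command metadata outlives the sender's stack frame. Under that assumption, a `sync` pointer left set would refer to a dead stack slot, so it is cleared before the function returns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only, not Zephyr types. */
struct toy_sem {
    int count;
};

struct toy_cmd {
    struct toy_sem *sync; /* non-NULL only while a sender is waiting */
    int ref;              /* the command metadata may outlive the sender */
};

static void toy_sem_give(struct toy_sem *sem)
{
    sem->count++;
}

/* Completion path: signal the waiter only if one is registered. */
static bool toy_cmd_complete(struct toy_cmd *cmd)
{
    assert(cmd != NULL);
    if (cmd->sync == NULL) {
        return false; /* nobody is waiting, nothing to touch */
    }
    toy_sem_give(cmd->sync);
    return true;
}

/* Sender path: the semaphore lives in this stack frame. */
static int toy_send_sync(struct toy_cmd *cmd)
{
    struct toy_sem sync_sem = { 0 };
    int signalled;

    cmd->sync = &sync_sem;
    toy_cmd_complete(cmd); /* stands in for the controller's response */
    signalled = sync_sem.count;

    /* Clear the pointer before the frame (and sync_sem) goes away. */
    cmd->sync = NULL;
    return signalled;
}
```

With the pointer cleared, a late or duplicate completion finds `sync == NULL` and does nothing instead of writing through a dangling pointer.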
c92df8b to
7c2c4a0
Compare
@alwa-nordic CC :). |
7c2c4a0 to
6272e3c
Compare
@alwa-nordic isn't that what you were trying to avoid with 1cb83a8 ? |
Add `rsp` field to avoid deep-copy for every cmd. Signed-off-by: Lingao Meng <menglingao@xiaomi.com>
6272e3c to
c130755
Compare
I don't think my approach delays the event flow. If you look at the implementation of |
Thalley
left a comment
So what this PR really does (besides the renames), is that it replaced a copy with another pointer.
I'm not convinced that this is a better approach as it doesn't seem to solve any issues, but does increase our RAM usage for a small performance gain.
The RAM usage is easily measurable, but do we save anything meaningful when not doing the copy?
subsys/bluetooth/host/hci_core.c
Outdated
Is cmd(buf)->rsp always non-NULL here?
Changed :)
c130755 to
a392245
Compare
@Thalley I don't think this change will increase RAM usage by much; if you look at https://github.com/zephyrproject-rtos/zephyr/blob/main/subsys/bluetooth/host/hci_core.c#L121 , that the default BTW: From a coding-style perspective, a double copy is not a good idea; it may not cause a performance decrease, but it's odd.
7ff6d51 to
299c117
Compare
I tend to agree, but also consider that events and responses are usually so small that it doesn't really matter either :) Not opposed to the change, but still unsure whether it's better overall.
Since `send_cmd` will follow request-response, rename separately to make this clear. Signed-off-by: Lingao Meng <menglingao@xiaomi.com>
299c117 to
b2a66a8
Compare
It's true that this PR does not improve performance much, but it does make the code more concise, at least in my opinion. :)
Most cases did indeed involve the host making the deadlock by using Anyway, I was left scratching my head at the previous logic (what does |
@jori-nordic BabbleSim Tests PASSED. |
I think it's a sanity-check that the HCI driver used the appropriate buffer allocation method for the command complete event which should result in getting hold of the original command buffer, and if it didn't do that (used some generic allocator or even its own pool) then the code was trying to work around it. |
The significant change in this PR is that applications get a reference to the buffer the Command Complete event was received into. This is orthogonal to the primary reason for 1cb83a8, which was to remove the assumption that a Command Complete event is a response to the previously sent command.

More importantly, there exists a separate issue: blocking the HCI event stream makes it impossibly complicated to guarantee that no deadlocks form. Giving the application a reference to an event buffer is a hazard in this respect. The hazard is equivalent to invoking an application callback from

To remain safe, the application must give the buffer back before it can expect any synchronization with the Bluetooth Host to complete, since the Host may potentially be blocked by the application. In terms of a callback, we would just say that the callback should be ISR-safe.

Stalling due to a held reference is very non-intuitive for our users. It's even less intuitive than stalling due to control held in a callback, which is already a confusing topic. (Aside: The obvious version would be an event loop, with the application not getting any events when it is not polling for events because it's handling the previous event.)

Due to the hazard outlined above, I am against allowing the application to get a reference to a stack-internal buffer in the common case. I would be OK with adding a second 'expert API' for those who have got to go fast.

Then there is also the question of benchmarking this. Do you have any numbers from experiments that show a gain in speed or a real reduction in power use? I fear we are doing premature optimization.
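The hazard described in this comment can be sketched with a toy model. Everything here is hypothetical (`toy_buf`, `toy_evt_alloc`, `toy_host_can_receive`, and a pool of exactly one event buffer); it is not Zephyr's `net_buf` API. The allocator returns NULL when the pool is exhausted so the effect is observable in a single thread; a stack that instead waits for a free buffer would stall at the same point.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption for illustration: the event pool holds a single buffer. */
#define TOY_EVT_POOL_SIZE 1

struct toy_buf {
    int ref;
};

static struct toy_buf toy_pool[TOY_EVT_POOL_SIZE];

/* Non-blocking alloc: NULL when every buffer is still referenced. */
static struct toy_buf *toy_evt_alloc(void)
{
    for (size_t i = 0; i < TOY_EVT_POOL_SIZE; i++) {
        if (toy_pool[i].ref == 0) {
            toy_pool[i].ref = 1;
            return &toy_pool[i];
        }
    }
    return NULL;
}

static void toy_buf_unref(struct toy_buf *buf)
{
    assert(buf != NULL && buf->ref > 0);
    buf->ref--;
}

/* Can the host take in one more event right now? */
static bool toy_host_can_receive(void)
{
    struct toy_buf *next = toy_evt_alloc();

    if (next == NULL) {
        return false; /* event stream is stalled by a held reference */
    }
    toy_buf_unref(next);
    return true;
}
```

While the caller holds the only event buffer as its response, `toy_host_can_receive()` returns false; it returns true again only after the caller unreferences the buffer.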
I don't think so. Generally, the `bt_hci_cmd_send_sync` API is called only by the host stack, not by application users. Although this API is public, a user would call it only in some situations, such as sending a vendor command. Most callers are in the host stack, which is code we maintain, so we can ensure this.
net_buf_reset(buf);
bt_buf_set_type(buf, BT_BUF_EVT);
net_buf_reserve(buf, BT_BUF_RESERVE);
net_buf_add_mem(buf, evt_buf->data, evt_buf->len);
@alwa-nordic Here we actually borrow the cmd buffer to carry the rsp data, but this relies on the premise that the cmd buffer is at least as long as the rsp. Therefore, the current implementation actually sizes the buffer as the maximum of the two: https://github.com/zephyrproject-rtos/zephyr/blob/main/subsys/bluetooth/host/hci_core.c#L162 . After my PR, this constraint can be avoided.
Summarize:
BTW:
I checked all the places where rsp is used in the host stack. There are 38 places in total, and none of them blocks.
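The size constraint discussed in this comment can be shown with a small sketch that contrasts the two approaches. The types and helpers (`toy_buf`, `toy_rsp_copy`, `toy_rsp_ref`) are hypothetical and only mimic the idea; they are not the `net_buf` API or the PR's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-capacity buffer, for illustration only. */
struct toy_buf {
    unsigned char data[16];
    size_t size; /* usable capacity, at most sizeof(data) */
    size_t len;  /* bytes currently stored */
    int ref;
};

/* Copy approach: the response must fit in the command buffer. */
static int toy_rsp_copy(struct toy_buf *cmd, const struct toy_buf *evt)
{
    assert(cmd->size <= sizeof(cmd->data));
    if (evt->len > cmd->size) {
        return -1; /* command buffer too small to carry the response */
    }
    memcpy(cmd->data, evt->data, evt->len);
    cmd->len = evt->len;
    return 0;
}

/* Reference approach: no copy and no size relation between the buffers. */
static struct toy_buf *toy_rsp_ref(struct toy_buf *evt)
{
    evt->ref++;
    return evt;
}
```

The copy variant fails when the event is longer than the command buffer's capacity, which is why the capacity has to cover both; the reference variant has no such requirement, but it leaves the caller holding an event buffer that must be released promptly.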
From my perspective, you have identified a defect here. The Command Complete events can and should go in |
Enhancement
Make `sync` safer.
Use `rsp` to avoid a deep copy. (related: Bluetooth: Host: Remove `bt_buf_get_cmd_complete` #68008)