Bluetooth: HCI command corruption on NUM_COMPLETED_PACKETS
#64158
Comments
(edited to have the right PR reference) @JordanYates wasn't #64106 the fix for this? (just wondering why they weren't linked and why this issue is still open) |
No, that PR stops the response (after allocation) from incorrectly giving the This issue is about the allocation process itself. |
@JordanYates got it, thanks. The challenge is that we don't know, and don't require the HCI driver to know, which command is completing through a Command Status or Command Complete event. Instead we allow the drivers to initiate the allocation of a buffer as soon as they've received a complete HCI Event header (which does not yet contain the information on what command completed). If we did have this information we could simply make an exception of not using |
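To illustrate the point about the event header: a minimal sketch, assuming the standard HCI event layout, of what a driver would have to do to learn which command completed. The helper name `peek_completed_opcode` is hypothetical and not part of Zephyr. It shows that the 2-byte header alone is never enough; the driver has to receive 3 or 4 more bytes first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HCI_EVT_HDR_SIZE     2    /* event code + parameter total length */
#define HCI_EVT_CMD_COMPLETE 0x0e
#define HCI_EVT_CMD_STATUS   0x0f

/*
 * Hypothetical helper: extract the command opcode from a partially received
 * HCI event. Returns 0 and stores the opcode on success, -1 if the event
 * carries no opcode or too few bytes have arrived so far.
 */
int peek_completed_opcode(const uint8_t *evt, size_t received, uint16_t *opcode)
{
	size_t off;

	assert(evt != NULL && opcode != NULL);

	if (received < HCI_EVT_HDR_SIZE) {
		return -1;
	}

	switch (evt[0]) {
	case HCI_EVT_CMD_COMPLETE:
		/* Parameters: Num_HCI_Command_Packets (1), Command_Opcode (2) */
		off = HCI_EVT_HDR_SIZE + 1;
		break;
	case HCI_EVT_CMD_STATUS:
		/* Parameters: Status (1), Num_HCI_Command_Packets (1), Command_Opcode (2) */
		off = HCI_EVT_HDR_SIZE + 2;
		break;
	default:
		return -1;
	}

	if (received < off + 2) {
		return -1; /* the header alone does not contain the opcode */
	}

	/* HCI multi-byte fields are little-endian */
	*opcode = (uint16_t)(evt[off] | (evt[off + 1] << 8));
	return 0;
}
```

With only the header received, the helper fails; once the full Command Complete for opcode 0x0c35 has arrived, it returns that opcode. Every driver would need equivalent parsing before it could ask for the right kind of buffer.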
Another thing I wanted to mention is that it's absolutely essential that |
@JordanYates based on my two comments above (and since you've clearly looked into and thought about this a lot), do you have any proposals for how to fix this? |
Yea, never suspected that moving it after would be acceptable (and wouldn't even fix the issue of consuming the command's response buffer).
So my current solution (which I in no way recommend) is to simply drop all The path I went down before going with the hack was providing the opcode to I suspect the best path would be to special case the |
Special-casing this command completion on the HCI driver-side has similar implications as digging out the opcode and passing it to the buffer allocator. I.e. either way we impose extra parsing requirements for every driver. Another option is |
Sure, but there aren't that many drivers, and it would allow for extra validation in the future (checking that the
I have no comment on how useful it is overall, I saw it was enabled on all the samples, so I enabled it too. |
@cvinayak will get a clarification from the SIG on BT spec 5.4, Vol.4 Part E section 7.3.40 re. the Events generated |
I am unwilling to disable the flow control without an explanation of why it exists in the first place; I don't want the cure to be worse than the disease. |
@JordanYates Controller-to-host flow control exists so that the number of statically allocated host rx buffers can be kept small: the controller throttles ACL data reception to match the rx buffers available in the host. Typically the transport's own flow control can implicitly ensure that the rx buffers do not overflow, but it has a disadvantage: under high rx throughput, other HCI events, which arrive serialized on the same transport, are delayed until the host has dequeued all queued ACL data. If your application, host and controller execute on a single CPU, the implicit flow control provided by buffer allocation and kernel semaphores/queues should ensure that no rx buffer overflow occurs. And if delayed events are acceptable under high rx throughput (for example, Tx waiting for a Number of Completed Packets event that is delayed until the host has received all rx buffers enqueued inside the controller), you can safely disable controller-to-host flow control. |
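As a rough illustration of the throttling described in this comment, here is a minimal sketch of the credit accounting a controller could keep. The struct and function names are hypothetical and the model ignores per-connection accounting; it only shows that the controller holds ACL data back once every advertised host buffer is in use, and resumes when the host reports completed packets.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical controller-side view of the host's ACL rx buffers. */
struct c2h_flow {
	uint16_t host_bufs; /* rx buffers the host advertised */
	uint16_t in_flight; /* ACL packets sent to the host, not yet acknowledged */
};

/* Returns true if the packet may be sent to the host right now. */
bool ctrl_try_send_acl(struct c2h_flow *f)
{
	assert(f != NULL);

	if (f->in_flight >= f->host_bufs) {
		return false; /* no free host buffer: hold the packet back */
	}

	f->in_flight++;
	return true;
}

/* The host freed `count` buffers and reported them as completed. */
void host_num_completed(struct c2h_flow *f, uint16_t count)
{
	assert(f != NULL && count <= f->in_flight);

	f->in_flight -= count;
}
```

In this model the report from the host is the only thing that returns credits, which is why the host sends it outside the normal command flow control.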
FYI, the Nordic controller in NCS 2.5.0 has a workaround for this. The Controller will now never send Command Complete for Host Num Complete. Link to changelog. |
`bt_buf_get_cmd_complete` is broken, and fixing it would put even more complexity into the HCI drivers. To fix the problem while decreasing complexity in the drivers, this patch removes its use and makes the host accept command complete events in normal event buffers. This decision is based on a goal to simplify the drivers. This patch also aligns well with the goal of getting rid of generic event buffers. Fixes: zephyrproject-rtos#64158 Like every command completed event, `BT_HCI_OP_HOST_NUM_COMPLETED_PACKETS` are now placed in normal event buffers. These packets will no longer confuse the Host. Signed-off-by: Aleksander Wasaznik <aleksander.wasaznik@nordicsemi.no>
`bt_buf_get_cmd_complete` is broken due to zephyrproject-rtos#64158, and fixing it would require changing its signature and putting even more complexity into the HCI drivers, as it would require the drivers to perform an even deeper peek into the event in order to observe the opcode. Instead of the above, this patch removes the use of `bt_buf_get_cmd_complete` and adds logic to allow the host to accept command complete events in normal event buffers. The above means performing a copy into the destination buffer, which is the original command buffer. This is a small inefficiency for now, but we should strive to redesign the host into a streaming architecture as much as possible and handle events immediately instead of retaining buffers. This fixes zephyrproject-rtos#64158: Like all command completed events, the completion event for `BT_HCI_OP_HOST_NUM_COMPLETED_PACKETS` is now placed in normal event buffers. The logic where the host discards this event is already present. Since it's discarded, it will not interfere with the logic around `bt_dev.cmd_send`. Signed-off-by: Aleksander Wasaznik <aleksander.wasaznik@nordicsemi.no>
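The approach in this commit message can be sketched roughly as follows. This is a simplified model, not the actual patch: `struct pending_cmd`, `handle_cmd_complete` and the result enum are hypothetical names, and the pending command is reduced to one fixed-size buffer. The point it illustrates is that the event arrives in an ordinary event buffer, the completion for opcode 0x0c35 is dropped, and the copy into the command buffer happens only when the opcode matches the pending command.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OP_HOST_NUM_COMPLETED_PACKETS 0x0c35
#define CMD_BUF_SIZE 64

/* Hypothetical stand-in for the pending command and its buffer. */
struct pending_cmd {
	bool     active;
	uint16_t opcode;
	uint8_t  data[CMD_BUF_SIZE];
	size_t   len;
};

enum cc_result { CC_DISCARDED, CC_UNEXPECTED, CC_STORED };

/*
 * `params` holds Command Complete parameters from an ordinary event buffer:
 * Num_HCI_Command_Packets (1), Command_Opcode (2, little-endian), return params.
 */
enum cc_result handle_cmd_complete(struct pending_cmd *cmd,
				   const uint8_t *params, size_t len)
{
	uint16_t opcode;
	size_t rsp_len;

	assert(cmd != NULL && params != NULL);

	if (len < 3) {
		return CC_UNEXPECTED;
	}

	opcode = (uint16_t)(params[1] | (params[2] << 8));

	if (opcode == OP_HOST_NUM_COMPLETED_PACKETS) {
		return CC_DISCARDED; /* never touches the pending command */
	}

	if (!cmd->active || cmd->opcode != opcode) {
		return CC_UNEXPECTED;
	}

	rsp_len = len - 3;
	if (rsp_len > sizeof(cmd->data)) {
		return CC_UNEXPECTED;
	}

	/* Copy the return parameters into the command buffer on a match only */
	memcpy(cmd->data, &params[3], rsp_len);
	cmd->len = rsp_len;
	cmd->active = false;
	return CC_STORED;
}
```

The extra `memcpy` is the "small inefficiency" the message mentions; in exchange, the allocation path no longer needs to know anything about the pending command.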
`bt_buf_get_cmd_complete` is broken due to zephyrproject-rtos/zephyr#64158, and fixing it would require changing its signature and putting even more complexity into the HCI drivers, as it would require the drivers to perform an even deeper peek into the event in order to observe the opcode. Instead of the above, this patch removes the use of `bt_buf_get_cmd_complete` and adds logic to allow the host to accept command complete events in normal event buffers. The above means performing a copy into the destination buffer, which is the original command buffer. This is a small inefficiency for now, but we should strive to redesign the host into a streaming architecture as much as possible and handle events immediately instead of retaining buffers. This fixes zephyrproject-rtos/zephyr#64158: Like all command completed events, the completion event for `BT_HCI_OP_HOST_NUM_COMPLETED_PACKETS` is now placed in normal event buffers. The logic where the host discards this event is already present. Since it's discarded, it will not interfere with the logic around `bt_dev.cmd_send`. (cherry picked from commit 1cb83a8) Original-Signed-off-by: Aleksander Wasaznik <aleksander.wasaznik@nordicsemi.no> GitOrigin-RevId: 1cb83a8 Change-Id: I9c1641f1bde863f4a7e19334f2f9b12cf52e9ed2 Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/zephyr/+/5280013 Tested-by: ChromeOS Prod (Robot) <chromeos-ci-prod@chromeos-bot.iam.gserviceaccount.com> Reviewed-by: Al Semjonovs <asemjonovs@google.com> Commit-Queue: Al Semjonovs <asemjonovs@google.com> Tested-by: Al Semjonovs <asemjonovs@google.com>
Describe the bug
HCI command buffers can be corrupted by the host receiving a `BT_HCI_OP_HOST_NUM_COMPLETED_PACKETS` (opcode 0x0c35) response at an inopportune time.

`hci_core.c` assigns `bt_dev.sent_cmd` before actually sending the command to the HCI backend:
zephyr/subsys/bluetooth/host/hci_core.c, lines 2692 to 2696 in 625d1bf

`bt_buf_get_cmd_complete`, which is used to allocate a buffer for the command response, naively re-uses the command buffer and overwrites the packet type and length.
zephyr/subsys/bluetooth/host/buf.c, lines 89 to 103 in 625d1bf

`BT_HCI_OP_HOST_NUM_COMPLETED_PACKETS` is a special command in that it ignores the normal flow control, and normally does not receive a response. From the BT spec:

If such a response is received after the host has assigned `bt_dev.sent_cmd`, but before it is actually sent, the buffer allocation function will corrupt the command parameters, usually leading to the HCI backend rejecting the command as invalid.

Expected behavior
Receiving `BT_HCI_OP_HOST_NUM_COMPLETED_PACKETS` should not corrupt pending HCI commands.

Impact
Arbitrary HCI command failures
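
The failure sequence described above can be reproduced in a small stand-alone model. This is a sketch under stated assumptions, not the Zephyr code: `struct buf`, `struct host` and `get_cmd_complete_naive` are hypothetical stand-ins for the net_buf, `bt_dev` and the allocator, reduced to the fields that matter for the race.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum buf_type { BUF_CMD, BUF_EVT };

struct buf {
	enum buf_type type;
	size_t len;
	uint8_t data[16];
};

/* Hypothetical stand-in for the host state; only the relevant field. */
struct host {
	struct buf *sent_cmd;
};

/*
 * Models the naive allocator: if a command is recorded as sent, its buffer
 * is recycled for the response, which resets the packet type and length.
 */
struct buf *get_cmd_complete_naive(struct host *h, struct buf *fallback)
{
	assert(h != NULL);

	if (h->sent_cmd != NULL) {
		struct buf *b = h->sent_cmd;

		b->type = BUF_EVT;
		b->len = 0;
		return b;
	}

	return fallback;
}
```

If `sent_cmd` is set to a queued command and an unrelated Command Complete then triggers this allocator before the command reaches the transport, the command goes out with its type and length already overwritten.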