Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bluetooth host: bt_conn_tx_processor() does not handle multiple ISO connections being ready correctly #74321

Closed
Tronil opened this issue Jun 14, 2024 · 5 comments · Fixed by #74385
Assignees
Labels
area: Bluetooth Host area: Bluetooth ISO Bluetooth LE Isochronous Channels area: Bluetooth bug The issue is a bug, or the PR is fixing a bug priority: high High impact/importance bug

Comments

@Tronil
Copy link
Contributor

Tronil commented Jun 14, 2024

Describe the bug
When multiple ISO payloads are generated at the same time (as in some BSim tests), only one of the connections are processed by bt_conn_tx_processor(). The other(s) are only processed after the next call to bt_conn_data_ready() - which is likely at the next ISO event, i.e. way too late.

The issue arises because tx_data_pull (iso_data_pull in this case) eventually returns NULL after the payload(s) for a particular ISO "connection" has been handled - and this causes bt_conn_tx_processor() to stop completely instead of re-launching itself via bt_tx_irq_raise.

This bug was introduced by PR #72090.

To Reproduce
Steps to reproduce the behavior:
Run BSim test cap_broadcast_ac_13.sh and look at the iso_data_pull trace output.

Expected behavior
All ISO payloads should get processed

Impact
Breaks multiple concurrent ISO streams (at the very least; ACL connections are probably impacted as well)

Logs and console output
Excerpt from a trace from PR #74244 (where this was found):

d_00: @00:00:01.387051  [00:00:01.387,023] �[0m<dbg> bt_iso: bt_iso_chan_send: send-iso (no ts)�[0m
d_00: @00:00:01.387051  conn_iso_send conn 0x84e8c80 buf 0x85787a8 has_ts 0
d_00: @00:00:01.387051  [00:00:01.387,023] �[0m<dbg> bt_iso: conn_iso_send: 0x85787a8 put on list�[0m
d_00: @00:00:01.387051  [00:00:01.387,023] �[0m<dbg> bt_conn: bt_conn_data_ready: DR�[0m
d_00: @00:00:01.387051  [00:00:01.387,023] �[0m<dbg> bt_conn: bt_conn_data_ready: already in list�[0m
d_00: @00:00:01.387051  broadcast_sent_cb Sent payload with seq_num: 183 on stream 0x84a3980
d_00: @00:00:01.387051  [00:00:01.387,023] �[0m<dbg> bt_conn: tx_notify: raise TX IRQ�[0m
d_00: @00:00:01.387051  [00:00:01.387,023] �[0m<dbg> bt_conn: tx_complete_work: conn 0x84e8d5c�[0m
d_00: @00:00:01.387051  [00:00:01.387,023] �[0m<dbg> bt_conn: tx_notify: conn 0x84e8d5c�[0m
d_00: @00:00:01.387051  [00:00:01.387,023] �[0m<dbg> bt_conn: tx_notify: tx 0x84d6f60 cb 0x81a1d6b user_data (nil)�[0m
d_00: @00:00:01.387051  [00:00:01.387,023] �[0m<dbg> bt_conn: tx_free: 0x84d6f60�[0m
d_00: @00:00:01.387051  [00:00:01.387,023] �[0m<dbg> bt_iso: bt_iso_chan_send: send-iso (no ts)�[0m
d_00: @00:00:01.387051  conn_iso_send conn 0x84e8d5c buf 0x85787d0 has_ts 0
d_00: @00:00:01.387051  [00:00:01.387,023] �[0m<dbg> bt_iso: conn_iso_send: 0x85787d0 put on list�[0m
d_00: @00:00:01.387051  [00:00:01.387,023] �[0m<dbg> bt_conn: bt_conn_data_ready: DR�[0m
d_00: @00:00:01.387051  [00:00:01.387,023] �[0m<dbg> bt_conn: bt_conn_data_ready: raised�[0m
d_00: @00:00:01.387051  broadcast_sent_cb Sent payload with seq_num: 183 on stream 0x84a39c4
d_00: @00:00:01.387051  [00:00:01.387,023] �[0m<dbg> bt_conn: tx_notify: raise TX IRQ�[0m
d_00: @00:00:01.387051  [00:00:01.387,023] �[0m<dbg> bt_conn: bt_conn_tx_processor: start�[0m
d_00: @00:00:01.387051  [00:00:01.387,023] �[0m<dbg> bt_conn: should_stop_tx: 0x84e8c80�[0m
d_00: @00:00:01.387051  [00:00:01.387,023] �[0m<dbg> bt_conn: bt_conn_tx_processor: processing conn 0x84e8c80�[0m
d_00: @00:00:01.387051  [00:00:01.387,023] �[0m<dbg> bt_iso: iso_data_pull: conn 0x84e8c80 amount 255�[0m
d_00: @00:00:01.387051  [00:00:01.387,023] �[0m<dbg> bt_iso: iso_data_pull: last frag, pop buf 0x8578780�[0m
d_00: @00:00:01.387051  [00:00:01.387,023] �[0m<dbg> bt_conn: bt_conn_tx_processor: pop: cb 0x81a1d6b userdata (nil)�[0m
d_00: @00:00:01.387051  [00:00:01.387,023] �[0m<dbg> bt_conn: bt_conn_tx_processor: TX process: conn 0x84e8c80 buf 0x8578780 (last)�[0m
d_00: @00:00:01.387051  [00:00:01.387,023] �[0m<dbg> bt_conn: send_buf: conn 0x84e8c80 buf 0x8578780 len 102 buf->len 102�[0m
d_00: @00:00:01.387051  [00:00:01.387,023] �[0m<dbg> bt_conn: conn_tx_alloc: 0x84d6f78�[0m
d_00: @00:00:01.387051  [00:00:01.387,023] �[0m<dbg> bt_conn: send_buf: move 0x8578780 ref in�[0m
d_00: @00:00:01.387051  [00:00:01.387,023] �[0m<dbg> bt_conn: get_data_frag: get-acl-frag: outside 0x8578780 window 0x857bbc0 size 102�[0m
d_00: @00:00:01.387051  [00:00:01.387,023] �[0m<dbg> bt_conn: send_buf: send frag: buf 0x8578780 len 102�[0m
d_00: @00:00:01.387051  [00:00:01.387,023] �[0m<dbg> bt_conn: send_buf: conn 0x84e8c80 buf 0x857bbc0 len 102 flags 0x02�[0m
d_00: @00:00:01.387051  hci_iso_handle BIS payload for source_hdl 0 (HCI handle 3) with ts: 1377899/1377899
d_00: @00:00:01.387051  [00:00:01.387,023] �[0m<dbg> bt_conn: frag_destroy: �[0m
d_00: @00:00:01.387051  [00:00:01.387,023] �[0m<dbg> bt_conn: bt_conn_tx_processor: start�[0m
d_00: @00:00:01.387051  [00:00:01.387,023] �[0m<dbg> bt_conn: should_stop_tx: 0x84e8c80�[0m
d_00: @00:00:01.387051  [00:00:01.387,023] �[0m<dbg> bt_conn: bt_conn_tx_processor: processing conn 0x84e8c80�[0m
d_00: @00:00:01.387051  [00:00:01.387,023] �[0m<dbg> bt_iso: iso_data_pull: conn 0x84e8c80 amount 255�[0m
d_00: @00:00:01.387051  [00:00:01.387,023] �[0m<dbg> bt_iso: iso_data_pull: last frag, pop buf 0x85787a8�[0m
d_00: @00:00:01.387051  [00:00:01.387,023] �[0m<dbg> bt_conn: bt_conn_tx_processor: pop: cb 0x81a1d6b userdata (nil)�[0m
d_00: @00:00:01.387051  [00:00:01.387,023] �[0m<dbg> bt_conn: bt_conn_tx_processor: TX process: conn 0x84e8c80 buf 0x85787a8 (last)�[0m
d_00: @00:00:01.387051  [00:00:01.387,023] �[0m<dbg> bt_conn: send_buf: conn 0x84e8c80 buf 0x85787a8 len 102 buf->len 102�[0m
d_00: @00:00:01.387051  [00:00:01.387,023] �[0m<dbg> bt_conn: conn_tx_alloc: 0x84d6f84�[0m
d_00: @00:00:01.387051  [00:00:01.387,023] �[0m<dbg> bt_conn: send_buf: move 0x85787a8 ref in�[0m
d_00: @00:00:01.387051  [00:00:01.387,023] �[0m<dbg> bt_conn: get_data_frag: get-acl-frag: outside 0x85787a8 window 0x857bbc0 size 102�[0m
d_00: @00:00:01.387051  [00:00:01.387,023] �[0m<dbg> bt_conn: send_buf: send frag: buf 0x85787a8 len 102�[0m
d_00: @00:00:01.387051  [00:00:01.387,023] �[0m<dbg> bt_conn: send_buf: conn 0x84e8c80 buf 0x857bbc0 len 102 flags 0x02�[0m
d_00: @00:00:01.387051  hci_iso_handle BIS payload for source_hdl 0 (HCI handle 3) with ts: 1377899/1377899

Environment (please complete the following information):

  • OS: Linux (Debian 12)
  • Toolchain: Using Zephyr CI docker image
  • Commit SHA: d45605e
@Tronil Tronil added the bug The issue is a bug, or the PR is fixing a bug label Jun 14, 2024
@jori-nordic
Copy link
Collaborator

jori-nordic commented Jun 15, 2024

Run BSim test cap_broadcast_ac_13.sh and look at the iso_data_pull trace output.

One shouldn't have to scrutinize logs to figure out if a test is PASS or FAIL.
I'll look at the bug, but please submit a PR to make the test fail in this case.

edit: re-reading my comment, that sounds a bit harsh, sorry.
Thank you for the bug report and analysis. I don't want to shoot the messenger, I was just mad at the misleading passing tests.

@jori-nordic jori-nordic added the priority: high High impact/importance bug label Jun 17, 2024
jori-nordic added a commit to jori-nordic/zephyr that referenced this issue Jun 17, 2024
ISO connections that were in the TX queue were not getting serviced in
time. This happens because `iso_data_pull()` returns `NULL` when that
particular connection (`conn`) is done sending.

But it doesn't trigger the TX processor again to process other channels in
the queue. This patch fixes that by calling `bt_tx_irq_raise()`.

We can't do this from `conn.c` as we don't know if the `NULL` returned is
because the current channel is out of data or because it has data but it
can't send it (e.g. the current buf is being "viewed" already).

Fixes zephyrproject-rtos#74321

Signed-off-by: Jonathan Rico <jonathan.rico@nordicsemi.no>
@jori-nordic
Copy link
Collaborator

@Tronil re: the supplied logs.
I don't really understand what I'm looking at or what I'm supposed to look for. The traces all have the same timestamp so I don't see the delay. The problem you describe is real though.

Does the linked patch fix the issue?

@Tronil
Copy link
Contributor Author

Tronil commented Jun 17, 2024

PR #74244 fails the bsim tests because of this; So if you are looking for a failing test case, you can use that. Note that it is of course not actually testing this part of the code directly, this issue just happens to make an existing test case fail because of this issue.

jori-nordic added a commit to jori-nordic/zephyr that referenced this issue Jun 17, 2024
ISO connections that were in the TX queue were not getting serviced in
time. This happens because `iso_data_pull()` returns `NULL` when that
particular connection (`conn`) is done sending.

But it doesn't trigger the TX processor again to process other channels in
the queue. This patch fixes that by calling `bt_tx_irq_raise()`.

We can't do this from `conn.c` as we don't know if the `NULL` returned is
because the current channel is out of data or because it has data but it
can't send it (e.g. the current buf is being "viewed" already).

Fixes zephyrproject-rtos#74321

Signed-off-by: Jonathan Rico <jonathan.rico@nordicsemi.no>
@jori-nordic
Copy link
Collaborator

PR #74244 fails the bsim tests because of this

Just what I was looking for, thanks. The test now passes with #74385 cherry-picked. Could you also test on your end?

@Tronil
Copy link
Contributor Author

Tronil commented Jun 17, 2024

PR #74244 fails the bsim tests because of this

Just what I was looking for, thanks. The test now passes with #74385 cherry-picked. Could you also test on your end?

I can confirm that the test passes with the change from #74385

coreboot-org-bot pushed a commit to coreboot/zephyr-cros that referenced this issue Jun 21, 2024
ISO connections that were in the TX queue were not getting serviced in
time. This happens because `iso_data_pull()` returns `NULL` when that
particular connection (`conn`) is done sending.

But it doesn't trigger the TX processor again to process other channels in
the queue. This patch fixes that by calling `bt_tx_irq_raise()`.

We can't do this from `conn.c` as we don't know if the `NULL` returned is
because the current channel is out of data or because it has data but it
can't send it (e.g. the current buf is being "viewed" already).

Fixes zephyrproject-rtos/zephyr#74321

(cherry picked from commit 8af7180)

Original-Signed-off-by: Jonathan Rico <jonathan.rico@nordicsemi.no>
GitOrigin-RevId: 8af7180
Change-Id: I104f1fd52c9b2fd8f129b4564a84af2a52a40e9e
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/zephyr/+/5644690
Tested-by: ChromeOS Prod (Robot) <chromeos-ci-prod@chromeos-bot.iam.gserviceaccount.com>
Commit-Queue: Dawid Niedźwiecki <dawidn@google.com>
Reviewed-by: Dawid Niedźwiecki <dawidn@google.com>
rugeGerritsen pushed a commit to rugeGerritsen/sdk-zephyr that referenced this issue Jul 31, 2024
ISO connections that were in the TX queue were not getting serviced in
time. This happens because `iso_data_pull()` returns `NULL` when that
particular connection (`conn`) is done sending.

But it doesn't trigger the TX processor again to process other channels in
the queue. This patch fixes that by calling `bt_tx_irq_raise()`.

We can't do this from `conn.c` as we don't know if the `NULL` returned is
because the current channel is out of data or because it has data but it
can't send it (e.g. the current buf is being "viewed" already).

Fixes zephyrproject-rtos/zephyr#74321

Signed-off-by: Jonathan Rico <jonathan.rico@nordicsemi.no>
(cherry picked from commit 8af7180)
nordicjm pushed a commit to nrfconnect/sdk-zephyr that referenced this issue Aug 1, 2024
ISO connections that were in the TX queue were not getting serviced in
time. This happens because `iso_data_pull()` returns `NULL` when that
particular connection (`conn`) is done sending.

But it doesn't trigger the TX processor again to process other channels in
the queue. This patch fixes that by calling `bt_tx_irq_raise()`.

We can't do this from `conn.c` as we don't know if the `NULL` returned is
because the current channel is out of data or because it has data but it
can't send it (e.g. the current buf is being "viewed" already).

Fixes zephyrproject-rtos/zephyr#74321

Signed-off-by: Jonathan Rico <jonathan.rico@nordicsemi.no>
(cherry picked from commit 8af7180)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: Bluetooth Host area: Bluetooth ISO Bluetooth LE Isochronous Channels area: Bluetooth bug The issue is a bug, or the PR is fixing a bug priority: high High impact/importance bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants