Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bluetooth: l2cap do not recover when faced with long packets and run out of buffers #20640

Closed
jfischer-no opened this issue Nov 12, 2019 · 3 comments · Fixed by #20661
Closed
Assignees
Labels
area: Bluetooth bug The issue is a bug, or the PR is fixing a bug priority: medium Medium impact/importance bug

Comments

@jfischer-no
Copy link
Collaborator

jfischer-no commented Nov 12, 2019

Describe the bug
l2cap do not recover when faced with long packets and runs out of buffers. Discovered while testing hci_usb with IPSP sample. After ping with long payload or short intervals the IPSP device remains unresponsive (does not recover / not respond anymore).

If the l2cap_chan_le_send_sdu returns EAGAIN and the buffer is queued and then in l2cap_chan_le_send_resume the same situation happens again because the l2cap_chan_le_send_sdu() and subsequent calls of
l2cap_chan_le_send()->
l2cap_chan_create_seg()->
l2cap_alloc_seg()->
bt_l2cap_create_pdu() fail (with bt_conn: Unable to allocate buffer), the device remains unresponsive or not recover even if the host stops sending data. l2cap also do not try l2cap_chan_le_send_resume() again.

To Reproduce
Build, flash, connect to IPSP sample, try ping -i 0,3 -s 512 2001:db8::1 or ping -i 1 -s 1024 2001:db8::1

Expected behavior
The device should recover after ping flood, even if it does not have enough buffer for the data.

console output

[00:01:26.327,941] <dbg> bt_l2cap.l2cap_chan_le_send_resume: buf 0x20009f90 sent 113                     
[00:01:26.327,972] <dbg> bt_l2cap.l2cap_chan_create_seg: ch 0x2000d18c seg 0x20009f58 len 10             
[00:01:26.328,002] <dbg> bt_l2cap.l2cap_chan_le_send: ch 0x2000d18c cid 0x0040 len 10 credits 50         
[00:01:26.328,033] <dbg> bt_l2cap.bt_l2cap_send_cb: conn 0x20000698 cid 64 len 10                        
[00:01:26.328,063] <dbg> bt_l2cap.l2cap_chan_create_seg: ch 0x2000d18c seg 0x20009f90 len 23             
[00:01:26.328,094] <dbg> bt_l2cap.l2cap_chan_le_send: ch 0x2000d18c cid 0x0040 len 23 credits 49         
[00:01:26.328,125] <dbg> bt_l2cap.bt_l2cap_send_cb: conn 0x20000698 cid 64 len 23                        
[00:01:26.328,155] <dbg> bt_l2cap.l2cap_chan_create_seg: ch 0x2000d18c seg 0x20009fe4 len 23     
[00:01:26.328,186] <dbg> bt_l2cap.l2cap_chan_le_send: ch 0x2000d18c cid 0x0040 len 23 credits 48         
[00:01:26.328,216] <dbg> bt_l2cap.bt_l2cap_send_cb: conn 0x20000698 cid 64 len 23                        
[00:01:26.328,247] <dbg> bt_l2cap.l2cap_chan_create_seg: ch 0x2000d18c seg 0x2000a070 len 23           
[00:01:26.328,277] <dbg> bt_l2cap.l2cap_chan_le_send: ch 0x2000d18c cid 0x0040 len 23 credits 47         
[00:01:26.328,308] <dbg> bt_l2cap.bt_l2cap_send_cb: conn 0x20000698 cid 64 len 23                        
[00:01:26.328,338] <dbg> bt_l2cap.l2cap_chan_create_seg: ch 0x2000d18c seg 0x2000a08c len 23
[00:01:26.328,369] <dbg> bt_l2cap.l2cap_chan_le_send: ch 0x2000d18c cid 0x0040 len 23 credits 46
[00:01:26.328,369] <dbg> bt_l2cap.bt_l2cap_send_cb: conn 0x20000698 cid 64 len 23
[00:01:26.328,430] <dbg> bt_l2cap.l2cap_chan_create_seg: ch 0x2000d18c seg 0x2000a0a8 len 23
[00:01:26.328,460] <dbg> bt_l2cap.l2cap_chan_le_send: ch 0x2000d18c cid 0x0040 len 23 credits 45
[00:01:26.328,460] <dbg> bt_l2cap.bt_l2cap_send_cb: conn 0x20000698 cid 64 len 23
[00:01:26.328,521] <dbg> bt_l2cap.l2cap_chan_create_seg: ch 0x2000d18c seg 0x2000a0c4 len 13
[00:01:26.328,521] <dbg> bt_l2cap.l2cap_chan_le_send: ch 0x2000d18c cid 0x0040 len 13 credits 44
[00:01:26.328,552] <dbg> bt_l2cap.bt_l2cap_send_cb: conn 0x20000698 cid 64 len 13
[00:01:26.328,613] <dbg> bt_l2cap.l2cap_chan_create_seg: ch 0x2000d18c seg 0x2000a054 len 23
[00:01:26.328,643] <dbg> bt_l2cap.l2cap_chan_le_send: ch 0x2000d18c cid 0x0040 len 23 credits 43
[00:01:26.328,643] <dbg> bt_l2cap.bt_l2cap_send_cb: conn 0x20000698 cid 64 len 23
[00:01:26.328,704] <dbg> bt_l2cap.l2cap_chan_create_seg: ch 0x2000d18c seg 0x20009f20 len 23
[00:01:26.328,704] <dbg> bt_l2cap.l2cap_chan_le_send: ch 0x2000d18c cid 0x0040 len 23 credits 42
[00:01:26.328,735] <dbg> bt_l2cap.bt_l2cap_send_cb: conn 0x20000698 cid 64 len 23
[00:01:26.328,796] <dbg> bt_l2cap.l2cap_chan_create_seg: ch 0x2000d18c seg 0x20009ecc len 23
[00:01:26.328,796] <dbg> bt_l2cap.l2cap_chan_le_send: ch 0x2000d18c cid 0x0040 len 23 credits 41
[00:01:26.328,826] <dbg> bt_l2cap.bt_l2cap_send_cb: conn 0x20000698 cid 64 len 23
[00:01:26.328,857] <dbg> bt_l2cap.l2cap_chan_create_seg: ch 0x2000d18c seg 0x20009654 len 23
[00:01:26.328,887] <dbg> bt_l2cap.l2cap_chan_le_send: ch 0x2000d18c cid 0x0040 len 23 credits 40
[00:01:26.328,918] <dbg> bt_l2cap.bt_l2cap_send_cb: conn 0x20000698 cid 64 len 23
[00:01:26.328,948] <dbg> bt_l2cap.l2cap_chan_create_seg: ch 0x2000d18c seg 0x2000961c len 23
[00:01:26.328,979] <dbg> bt_l2cap.l2cap_chan_le_send: ch 0x2000d18c cid 0x0040 len 23 credits 39
[00:01:26.328,979] <dbg> bt_l2cap.bt_l2cap_send_cb: conn 0x20000698 cid 64 len 23
[00:01:26.329,040] <dbg> bt_l2cap.l2cap_chan_create_seg: ch 0x2000d18c seg 0x20009638 len 13
[00:01:26.329,071] <dbg> bt_l2cap.l2cap_chan_le_send: ch 0x2000d18c cid 0x0040 len 13 credits 38
[00:01:26.329,071] <dbg> bt_l2cap.bt_l2cap_send_cb: conn 0x20000698 cid 64 len 13
[00:01:26.329,132] <dbg> bt_l2cap.l2cap_chan_create_seg: ch 0x2000d18c seg 0x20009f74 len 23
[00:01:26.329,132] <dbg> bt_l2cap.l2cap_chan_le_send: ch 0x2000d18c cid 0x0040 len 23 credits 37
[00:01:26.329,162] <dbg> bt_l2cap.bt_l2cap_send_cb: conn 0x20000698 cid 64 len 23
[00:01:26.329,193] <wrn> bt_conn: Unable to allocate buffer with K_NO_WAIT
[00:01:26.329,193] <wrn> bt_conn: Unable to allocate buffer: timeout -1
[00:01:27.971,221] <dbg> bt_l2cap.bt_l2cap_recv: Packet for CID 64 len 22
[00:01:27.971,282] <dbg> bt_l2cap.l2cap_rx_process: chan 0x2000d18c buf 0x20009590
[00:01:27.971,313] <dbg> bt_l2cap.l2cap_chan_le_recv: chan 0x2000d18c len 20 sdu_len 20
[00:01:27.971,313] <dbg> net_bt.ipsp_alloc_buf: (0x20002de0): Channel 0x2000d18c requires buffer
[00:01:27.971,343] <dbg> bt_l2cap.l2cap_chan_le_recv_seg: chan 0x2000d18c seg 1 len 20
[00:01:27.971,374] <dbg> bt_l2cap.l2cap_chan_le_recv_sdu: chan 0x2000d18c len 20
@jfischer-no jfischer-no added bug The issue is a bug, or the PR is fixing a bug area: Bluetooth labels Nov 12, 2019
@jhedberg jhedberg added the priority: medium Medium impact/importance bug label Nov 12, 2019
@Vudentz
Copy link
Contributor

Vudentz commented Nov 13, 2019

It looks like we are not generating a -EAGAIN in case we run out of memory, that would indeed stop us from resuming so the following patch should fix it:

diff --git a/subsys/bluetooth/host/l2cap.c b/subsys/bluetooth/host/l2cap.c
index 24d4b5f261..65292ca715 100644
--- a/subsys/bluetooth/host/l2cap.c
+++ b/subsys/bluetooth/host/l2cap.c
@@ -1181,7 +1181,7 @@ static int l2cap_chan_le_send(struct bt_l2cap_le_chan *ch, struct net_buf *buf,
 
        seg = l2cap_chan_create_seg(ch, buf, sdu_hdr_len);
        if (!seg) {
-               return -ENOMEM;
+               return -EAGAIN;
        }
 
        /* Channel may have been disconnected while waiting for a buffer */

@jfischer-no
Copy link
Collaborator Author

@Vudentz I thought so too, but was not successful. I will test it again, maybe I missed something

@jfischer-no
Copy link
Collaborator Author

@Vudentz As I mentioned above it is not enough. l2cap debug logging level is to noisy, I switch to info and changed few lines:

diff --git a/subsys/bluetooth/host/l2cap.c b/subsys/bluetooth/host/l2cap.c
index 24d4b5f261..52a70031c7 100644
--- a/subsys/bluetooth/host/l2cap.c
+++ b/subsys/bluetooth/host/l2cap.c
@@ -1181,7 +1181,8 @@ static int l2cap_chan_le_send(struct bt_l2cap_le_chan *ch, struct net_buf *buf,
 
        seg = l2cap_chan_create_seg(ch, buf, sdu_hdr_len);
        if (!seg) {
-               return -ENOMEM;
+               BT_INFO("l2cap_chan_create_seg() failed");
+               return -EAGAIN;
        }
 
        /* Channel may have been disconnected while waiting for a buffer */
@@ -1259,6 +1260,7 @@ static int l2cap_chan_le_send_sdu(struct bt_l2cap_le_chan *ch,
 
                ret = l2cap_chan_le_send(ch, frag, 0);
                if (ret < 0) {
+                       BT_INFO("l2cap_chan_le_send() failed %d", ret);
                        if (ret == -EAGAIN) {
                                /* Store sent data into user_data */
                                memcpy(net_buf_user_data(frag), &sent,
@@ -1299,10 +1301,11 @@ static void l2cap_chan_le_send_resume(struct bt_l2cap_le_chan *ch)
        while ((buf = l2cap_chan_le_get_tx_buf(ch))) {
                int sent = *((int *)net_buf_user_data(buf));
 
-               BT_DBG("buf %p sent %u", buf, sent);
+               BT_INFO("buf %p sent %u", buf, sent);
 
                sent = l2cap_chan_le_send_sdu(ch, &buf, sent);
                if (sent < 0) {
+                       BT_INFO("l2cap_chan_le_send_sdu() failed %d", sent);
                        if (sent == -EAGAIN) {
                                ch->tx_buf = buf;
                        }
@@ -1328,7 +1331,7 @@ static void le_credits(struct bt_l2cap *l2cap, u8_t ident,
        cid = sys_le16_to_cpu(ev->cid);
        credits = sys_le16_to_cpu(ev->credits);
 
-       BT_DBG("cid 0x%04x credits %u", cid, credits);
+       BT_INFO("cid 0x%04x credits %u", cid, credits);
 
        chan = bt_l2cap_le_lookup_tx_cid(conn, cid);
        if (!chan) {
@@ -1892,6 +1895,7 @@ int bt_l2cap_chan_send(struct bt_l2cap_chan *chan, struct net_buf *buf)
        if (err < 0) {
                if (err == -EAGAIN) {
                        /* Queue buffer to be sent later */
+                       BT_INFO("Queue buffer, l2cap_chan_le_send_sdu() failed %d", err);
                        net_buf_put(&(BT_L2CAP_LE_CHAN(chan))->tx_queue, buf);
                        return *((int *)net_buf_user_data(buf));
                }

console:

[00:00:03.365,844] <inf> ipsp: Run IPSP sample
[00:00:03.367,126] <inf> ipsp: Starting to wait
[00:00:12.587,524] <inf> os: prio recv thread stack :   unused 336      usage 112 / 448 (25 %)
[00:00:12.588,012] <inf> os: recv thread stack :        unused 736      usage 288 / 1024 (28 %)
[00:00:12.840,148] <inf> net_config: IPv6 address: 2001:db8::1
[00:00:12.840,209] <inf> net_config: IPv6 address: 2001:db8::1
[00:00:12.889,587] <inf> bt_l2cap: cid 0x0040 credits 3
[00:00:12.890,289] <inf> bt_l2cap: cid 0x0040 credits 2
[00:00:12.890,899] <inf> bt_l2cap: cid 0x0040 credits 2
[00:00:12.938,934] <inf> bt_l2cap: cid 0x0040 credits 1
[00:00:13.838,928] <inf> bt_l2cap: cid 0x0040 credits 1
[00:00:14.838,806] <inf> bt_l2cap: cid 0x0040 credits 1
[00:00:17.888,641] <inf> os: prio recv thread stack :   unused 248      usage 200 / 448 (44 %)
[00:00:18.189,025] <inf> os: recv thread stack :        unused 736      usage 288 / 1024 (28 %)
[00:00:21.841,735] <inf> bt_l2cap: cid 0x0040 credits 2
[00:00:22.141,998] <inf> bt_l2cap: cid 0x0040 credits 3
[00:00:22.343,963] <inf> bt_l2cap: l2cap_chan_le_send() failed -11
[00:00:22.343,963] <inf> bt_l2cap: Queue buffer, l2cap_chan_le_send_sdu() failed -11
[00:00:22.344,207] <inf> bt_l2cap: cid 0x0040 credits 51
[00:00:22.344,543] <inf> bt_l2cap: buf 0x20009f90 sent 113
[00:00:22.344,879] <wrn> bt_conn: Unable to allocate buffer with K_NO_WAIT
[00:00:22.344,879] <wrn> bt_conn: Unable to allocate buffer: timeout -1
[00:00:22.344,879] <inf> bt_l2cap: l2cap_chan_create_seg() failed
[00:00:22.344,909] <inf> bt_l2cap: l2cap_chan_le_send() failed -11
[00:00:22.344,909] <inf> bt_l2cap: l2cap_chan_le_send_sdu() failed -11

l2cap_chan_le_send_resume() is called only once.

Vudentz added a commit to Vudentz/zephyr that referenced this issue Dec 12, 2019
When a segment could not be allocated it should be possible to resume
sending it later once previous segments complete, the only exception is
when there is no previous activity and we are unable to alocate even the
very first segment which should indicate to the caller that it would
block since that only happens on syswq the caller might need to defer to
another thread or resubmit the work.

Fixes zephyrproject-rtos#20640

Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
jhedberg pushed a commit that referenced this issue Dec 13, 2019
When a segment could not be allocated it should be possible to resume
sending it later once previous segments complete, the only exception is
when there is no previous activity and we are unable to alocate even the
very first segment which should indicate to the caller that it would
block since that only happens on syswq the caller might need to defer to
another thread or resubmit the work.

Fixes #20640

Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: Bluetooth bug The issue is a bug, or the PR is fixing a bug priority: medium Medium impact/importance bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants