Bluetooth: l2cap do not recover when faced with long packets and run out of buffers #20640

jfischer-no · 2019-11-12T19:23:51Z

Describe the bug
l2cap do not recover when faced with long packets and runs out of buffers. Discovered while testing hci_usb with IPSP sample. After ping with long payload or short intervals the IPSP device remains unresponsive (does not recover / not respond anymore).

If the l2cap_chan_le_send_sdu returns EAGAIN and the buffer is queued and then in l2cap_chan_le_send_resume the same situation happens again because the l2cap_chan_le_send_sdu() and subsequent calls of
l2cap_chan_le_send()->
l2cap_chan_create_seg()->
l2cap_alloc_seg()->
bt_l2cap_create_pdu() fail (with bt_conn: Unable to allocate buffer), the device remains unresponsive or not recover even if the host stops sending data. l2cap also do not try l2cap_chan_le_send_resume() again.

To Reproduce
Build, flash, connect to IPSP sample, try ping -i 0,3 -s 512 2001:db8::1 or ping -i 1 -s 1024 2001:db8::1

Expected behavior
The device should recover after ping flood, even if it does not have enough buffer for the data.

console output

[00:01:26.327,941] <dbg> bt_l2cap.l2cap_chan_le_send_resume: buf 0x20009f90 sent 113                     
[00:01:26.327,972] <dbg> bt_l2cap.l2cap_chan_create_seg: ch 0x2000d18c seg 0x20009f58 len 10             
[00:01:26.328,002] <dbg> bt_l2cap.l2cap_chan_le_send: ch 0x2000d18c cid 0x0040 len 10 credits 50         
[00:01:26.328,033] <dbg> bt_l2cap.bt_l2cap_send_cb: conn 0x20000698 cid 64 len 10                        
[00:01:26.328,063] <dbg> bt_l2cap.l2cap_chan_create_seg: ch 0x2000d18c seg 0x20009f90 len 23             
[00:01:26.328,094] <dbg> bt_l2cap.l2cap_chan_le_send: ch 0x2000d18c cid 0x0040 len 23 credits 49         
[00:01:26.328,125] <dbg> bt_l2cap.bt_l2cap_send_cb: conn 0x20000698 cid 64 len 23                        
[00:01:26.328,155] <dbg> bt_l2cap.l2cap_chan_create_seg: ch 0x2000d18c seg 0x20009fe4 len 23     
[00:01:26.328,186] <dbg> bt_l2cap.l2cap_chan_le_send: ch 0x2000d18c cid 0x0040 len 23 credits 48         
[00:01:26.328,216] <dbg> bt_l2cap.bt_l2cap_send_cb: conn 0x20000698 cid 64 len 23                        
[00:01:26.328,247] <dbg> bt_l2cap.l2cap_chan_create_seg: ch 0x2000d18c seg 0x2000a070 len 23           
[00:01:26.328,277] <dbg> bt_l2cap.l2cap_chan_le_send: ch 0x2000d18c cid 0x0040 len 23 credits 47         
[00:01:26.328,308] <dbg> bt_l2cap.bt_l2cap_send_cb: conn 0x20000698 cid 64 len 23                        
[00:01:26.328,338] <dbg> bt_l2cap.l2cap_chan_create_seg: ch 0x2000d18c seg 0x2000a08c len 23
[00:01:26.328,369] <dbg> bt_l2cap.l2cap_chan_le_send: ch 0x2000d18c cid 0x0040 len 23 credits 46
[00:01:26.328,369] <dbg> bt_l2cap.bt_l2cap_send_cb: conn 0x20000698 cid 64 len 23
[00:01:26.328,430] <dbg> bt_l2cap.l2cap_chan_create_seg: ch 0x2000d18c seg 0x2000a0a8 len 23
[00:01:26.328,460] <dbg> bt_l2cap.l2cap_chan_le_send: ch 0x2000d18c cid 0x0040 len 23 credits 45
[00:01:26.328,460] <dbg> bt_l2cap.bt_l2cap_send_cb: conn 0x20000698 cid 64 len 23
[00:01:26.328,521] <dbg> bt_l2cap.l2cap_chan_create_seg: ch 0x2000d18c seg 0x2000a0c4 len 13
[00:01:26.328,521] <dbg> bt_l2cap.l2cap_chan_le_send: ch 0x2000d18c cid 0x0040 len 13 credits 44
[00:01:26.328,552] <dbg> bt_l2cap.bt_l2cap_send_cb: conn 0x20000698 cid 64 len 13
[00:01:26.328,613] <dbg> bt_l2cap.l2cap_chan_create_seg: ch 0x2000d18c seg 0x2000a054 len 23
[00:01:26.328,643] <dbg> bt_l2cap.l2cap_chan_le_send: ch 0x2000d18c cid 0x0040 len 23 credits 43
[00:01:26.328,643] <dbg> bt_l2cap.bt_l2cap_send_cb: conn 0x20000698 cid 64 len 23
[00:01:26.328,704] <dbg> bt_l2cap.l2cap_chan_create_seg: ch 0x2000d18c seg 0x20009f20 len 23
[00:01:26.328,704] <dbg> bt_l2cap.l2cap_chan_le_send: ch 0x2000d18c cid 0x0040 len 23 credits 42
[00:01:26.328,735] <dbg> bt_l2cap.bt_l2cap_send_cb: conn 0x20000698 cid 64 len 23
[00:01:26.328,796] <dbg> bt_l2cap.l2cap_chan_create_seg: ch 0x2000d18c seg 0x20009ecc len 23
[00:01:26.328,796] <dbg> bt_l2cap.l2cap_chan_le_send: ch 0x2000d18c cid 0x0040 len 23 credits 41
[00:01:26.328,826] <dbg> bt_l2cap.bt_l2cap_send_cb: conn 0x20000698 cid 64 len 23
[00:01:26.328,857] <dbg> bt_l2cap.l2cap_chan_create_seg: ch 0x2000d18c seg 0x20009654 len 23
[00:01:26.328,887] <dbg> bt_l2cap.l2cap_chan_le_send: ch 0x2000d18c cid 0x0040 len 23 credits 40
[00:01:26.328,918] <dbg> bt_l2cap.bt_l2cap_send_cb: conn 0x20000698 cid 64 len 23
[00:01:26.328,948] <dbg> bt_l2cap.l2cap_chan_create_seg: ch 0x2000d18c seg 0x2000961c len 23
[00:01:26.328,979] <dbg> bt_l2cap.l2cap_chan_le_send: ch 0x2000d18c cid 0x0040 len 23 credits 39
[00:01:26.328,979] <dbg> bt_l2cap.bt_l2cap_send_cb: conn 0x20000698 cid 64 len 23
[00:01:26.329,040] <dbg> bt_l2cap.l2cap_chan_create_seg: ch 0x2000d18c seg 0x20009638 len 13
[00:01:26.329,071] <dbg> bt_l2cap.l2cap_chan_le_send: ch 0x2000d18c cid 0x0040 len 13 credits 38
[00:01:26.329,071] <dbg> bt_l2cap.bt_l2cap_send_cb: conn 0x20000698 cid 64 len 13
[00:01:26.329,132] <dbg> bt_l2cap.l2cap_chan_create_seg: ch 0x2000d18c seg 0x20009f74 len 23
[00:01:26.329,132] <dbg> bt_l2cap.l2cap_chan_le_send: ch 0x2000d18c cid 0x0040 len 23 credits 37
[00:01:26.329,162] <dbg> bt_l2cap.bt_l2cap_send_cb: conn 0x20000698 cid 64 len 23
[00:01:26.329,193] <wrn> bt_conn: Unable to allocate buffer with K_NO_WAIT
[00:01:26.329,193] <wrn> bt_conn: Unable to allocate buffer: timeout -1
[00:01:27.971,221] <dbg> bt_l2cap.bt_l2cap_recv: Packet for CID 64 len 22
[00:01:27.971,282] <dbg> bt_l2cap.l2cap_rx_process: chan 0x2000d18c buf 0x20009590
[00:01:27.971,313] <dbg> bt_l2cap.l2cap_chan_le_recv: chan 0x2000d18c len 20 sdu_len 20
[00:01:27.971,313] <dbg> net_bt.ipsp_alloc_buf: (0x20002de0): Channel 0x2000d18c requires buffer
[00:01:27.971,343] <dbg> bt_l2cap.l2cap_chan_le_recv_seg: chan 0x2000d18c seg 1 len 20
[00:01:27.971,374] <dbg> bt_l2cap.l2cap_chan_le_recv_sdu: chan 0x2000d18c len 20

The text was updated successfully, but these errors were encountered:

Vudentz · 2019-11-13T10:53:40Z

It looks like we are not generating a -EAGAIN in case we run out of memory, that would indeed stop us from resuming so the following patch should fix it:

diff --git a/subsys/bluetooth/host/l2cap.c b/subsys/bluetooth/host/l2cap.c
index 24d4b5f261..65292ca715 100644
--- a/subsys/bluetooth/host/l2cap.c
+++ b/subsys/bluetooth/host/l2cap.c
@@ -1181,7 +1181,7 @@ static int l2cap_chan_le_send(struct bt_l2cap_le_chan *ch, struct net_buf *buf,
 
        seg = l2cap_chan_create_seg(ch, buf, sdu_hdr_len);
        if (!seg) {
-               return -ENOMEM;
+               return -EAGAIN;
        }
 
        /* Channel may have been disconnected while waiting for a buffer */

jfischer-no · 2019-11-13T11:01:25Z

@Vudentz I thought so too, but was not successful. I will test it again, maybe I missed something

jfischer-no · 2019-11-13T14:49:34Z

@Vudentz As I mentioned above it is not enough. l2cap debug logging level is to noisy, I switch to info and changed few lines:

diff --git a/subsys/bluetooth/host/l2cap.c b/subsys/bluetooth/host/l2cap.c
index 24d4b5f261..52a70031c7 100644
--- a/subsys/bluetooth/host/l2cap.c
+++ b/subsys/bluetooth/host/l2cap.c
@@ -1181,7 +1181,8 @@ static int l2cap_chan_le_send(struct bt_l2cap_le_chan *ch, struct net_buf *buf,
 
        seg = l2cap_chan_create_seg(ch, buf, sdu_hdr_len);
        if (!seg) {
-               return -ENOMEM;
+               BT_INFO("l2cap_chan_create_seg() failed");
+               return -EAGAIN;
        }
 
        /* Channel may have been disconnected while waiting for a buffer */
@@ -1259,6 +1260,7 @@ static int l2cap_chan_le_send_sdu(struct bt_l2cap_le_chan *ch,
 
                ret = l2cap_chan_le_send(ch, frag, 0);
                if (ret < 0) {
+                       BT_INFO("l2cap_chan_le_send() failed %d", ret);
                        if (ret == -EAGAIN) {
                                /* Store sent data into user_data */
                                memcpy(net_buf_user_data(frag), &sent,
@@ -1299,10 +1301,11 @@ static void l2cap_chan_le_send_resume(struct bt_l2cap_le_chan *ch)
        while ((buf = l2cap_chan_le_get_tx_buf(ch))) {
                int sent = *((int *)net_buf_user_data(buf));
 
-               BT_DBG("buf %p sent %u", buf, sent);
+               BT_INFO("buf %p sent %u", buf, sent);
 
                sent = l2cap_chan_le_send_sdu(ch, &buf, sent);
                if (sent < 0) {
+                       BT_INFO("l2cap_chan_le_send_sdu() failed %d", sent);
                        if (sent == -EAGAIN) {
                                ch->tx_buf = buf;
                        }
@@ -1328,7 +1331,7 @@ static void le_credits(struct bt_l2cap *l2cap, u8_t ident,
        cid = sys_le16_to_cpu(ev->cid);
        credits = sys_le16_to_cpu(ev->credits);
 
-       BT_DBG("cid 0x%04x credits %u", cid, credits);
+       BT_INFO("cid 0x%04x credits %u", cid, credits);
 
        chan = bt_l2cap_le_lookup_tx_cid(conn, cid);
        if (!chan) {
@@ -1892,6 +1895,7 @@ int bt_l2cap_chan_send(struct bt_l2cap_chan *chan, struct net_buf *buf)
        if (err < 0) {
                if (err == -EAGAIN) {
                        /* Queue buffer to be sent later */
+                       BT_INFO("Queue buffer, l2cap_chan_le_send_sdu() failed %d", err);
                        net_buf_put(&(BT_L2CAP_LE_CHAN(chan))->tx_queue, buf);
                        return *((int *)net_buf_user_data(buf));
                }

console:

[00:00:03.365,844] <inf> ipsp: Run IPSP sample
[00:00:03.367,126] <inf> ipsp: Starting to wait
[00:00:12.587,524] <inf> os: prio recv thread stack :   unused 336      usage 112 / 448 (25 %)
[00:00:12.588,012] <inf> os: recv thread stack :        unused 736      usage 288 / 1024 (28 %)
[00:00:12.840,148] <inf> net_config: IPv6 address: 2001:db8::1
[00:00:12.840,209] <inf> net_config: IPv6 address: 2001:db8::1
[00:00:12.889,587] <inf> bt_l2cap: cid 0x0040 credits 3
[00:00:12.890,289] <inf> bt_l2cap: cid 0x0040 credits 2
[00:00:12.890,899] <inf> bt_l2cap: cid 0x0040 credits 2
[00:00:12.938,934] <inf> bt_l2cap: cid 0x0040 credits 1
[00:00:13.838,928] <inf> bt_l2cap: cid 0x0040 credits 1
[00:00:14.838,806] <inf> bt_l2cap: cid 0x0040 credits 1
[00:00:17.888,641] <inf> os: prio recv thread stack :   unused 248      usage 200 / 448 (44 %)
[00:00:18.189,025] <inf> os: recv thread stack :        unused 736      usage 288 / 1024 (28 %)
[00:00:21.841,735] <inf> bt_l2cap: cid 0x0040 credits 2
[00:00:22.141,998] <inf> bt_l2cap: cid 0x0040 credits 3
[00:00:22.343,963] <inf> bt_l2cap: l2cap_chan_le_send() failed -11
[00:00:22.343,963] <inf> bt_l2cap: Queue buffer, l2cap_chan_le_send_sdu() failed -11
[00:00:22.344,207] <inf> bt_l2cap: cid 0x0040 credits 51
[00:00:22.344,543] <inf> bt_l2cap: buf 0x20009f90 sent 113
[00:00:22.344,879] <wrn> bt_conn: Unable to allocate buffer with K_NO_WAIT
[00:00:22.344,879] <wrn> bt_conn: Unable to allocate buffer: timeout -1
[00:00:22.344,879] <inf> bt_l2cap: l2cap_chan_create_seg() failed
[00:00:22.344,909] <inf> bt_l2cap: l2cap_chan_le_send() failed -11
[00:00:22.344,909] <inf> bt_l2cap: l2cap_chan_le_send_sdu() failed -11

l2cap_chan_le_send_resume() is called only once.

When a segment could not be allocated it should be possible to resume sending it later once previous segments complete, the only exception is when there is no previous activity and we are unable to alocate even the very first segment which should indicate to the caller that it would block since that only happens on syswq the caller might need to defer to another thread or resubmit the work. Fixes zephyrproject-rtos#20640 Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>

When a segment could not be allocated it should be possible to resume sending it later once previous segments complete, the only exception is when there is no previous activity and we are unable to alocate even the very first segment which should indicate to the caller that it would block since that only happens on syswq the caller might need to defer to another thread or resubmit the work. Fixes #20640 Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>

jfischer-no added bug The issue is a bug, or the PR is fixing a bug area: Bluetooth labels Nov 12, 2019

jfischer-no assigned Vudentz Nov 12, 2019

jhedberg added the priority: medium Medium impact/importance bug label Nov 12, 2019

Vudentz mentioned this issue Nov 13, 2019

Bluetooth: L2CAP: Queue packets when a segment could not be allocated #20661

Merged

jhedberg closed this as completed in #20661 Dec 13, 2019

endian-albin mentioned this issue Apr 17, 2020

Enable date and time syncronization over bluetooth albsod/pinetime-hypnos#18

Merged

alwa-nordic mentioned this issue Feb 11, 2022

Bluetooth: L2CAP: Deadlock when there are no free buffers while transmitting on multiple channels #34600

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Bluetooth: l2cap do not recover when faced with long packets and run out of buffers #20640

Bluetooth: l2cap do not recover when faced with long packets and run out of buffers #20640

jfischer-no commented Nov 12, 2019 •

edited

Vudentz commented Nov 13, 2019

jfischer-no commented Nov 13, 2019

jfischer-no commented Nov 13, 2019

Bluetooth: l2cap do not recover when faced with long packets and run out of buffers #20640

Bluetooth: l2cap do not recover when faced with long packets and run out of buffers #20640

Comments

jfischer-no commented Nov 12, 2019 • edited

Vudentz commented Nov 13, 2019

jfischer-no commented Nov 13, 2019

jfischer-no commented Nov 13, 2019

jfischer-no commented Nov 12, 2019 •

edited