Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ATT Notification buffer not released after reconnection #47649

Closed
carlst opened this issue Jul 11, 2022 · 9 comments · Fixed by #48395
Closed

ATT Notification buffer not released after reconnection #47649

carlst opened this issue Jul 11, 2022 · 9 comments · Fixed by #48395
Assignees
Labels
area: Bluetooth Host area: Bluetooth bug The issue is a bug, or the PR is fixing a bug priority: medium Medium impact/importance bug

Comments

@carlst
Copy link

carlst commented Jul 11, 2022

Describe the bug
A buffer allocated by Bluetooth ATT (and passed on to conn) but not yet sent when a disconnect occurs is never freed.

Please also mention any information which could help others to understand
the problem you're facing:
-Added BT_DBG messages to tx_meta_data_alloc and tx_meta_data_free to track buffer allocation count

To Reproduce
The issue occurs on a custom application which is sending several Notifications per second in an environment where disconnects occur every few minutes (perhaps electrically noisy environment).

Expected behavior
Either the buffer should be freed upon disconnect, or else freed after reconnect and send operation.

Impact
Annoyance. Workaround is currently to reboot system on disconnect.

Logs and console output
In boflo2.log, the first 2 disconnects occur when no Notification data is pending. In this case the buffer alloc and free counts are either 0 or 1 as expected (see log lines after 'Client MTU 512'). Subsequent 2 disconnects occur when Notification data is pending. In these cases, the buffer alloc count increases.

In boflo3.log (after more disconnects), the system eventually hangs (no response to ATT Client MTU request).

Environment (please complete the following information):

boflo2.log
boflo3.log

@carlst carlst added the bug The issue is a bug, or the PR is fixing a bug label Jul 11, 2022
@carlescufi
Copy link
Member

@carlst thanks for reporting. Is this a regression? Or you haven't tried older Zephyr versions?

@carlst
Copy link
Author

carlst commented Jul 12, 2022

I have not tried older Zephyr versions. I have reviewed parts of the conn module handling disconnects, and will add further logging to gather more details.

@carlst
Copy link
Author

carlst commented Jul 13, 2022

I added further logging to bluetooth/host/conn.c and captured a send_buf operation followed by a disconnect in log file boflo4_47649

The added log entries (see send_frag 2 failed and send_buf: failed) indicate failure to completely send the buffer. As the connection proceeds through disconnect state, it is marked for cleanup (the added log entries bt_conn_set_state: set cleanup : old_state disconnect-complete and conn_cleanup: enter 1) but no callback to ATT to free its buffers (e.g. conn_tx_destroy?) occurs.

Please let me know if I can provide any further information.

boflo4_47649.log
.

@jori-nordic
Copy link
Contributor

jori-nordic commented Jul 15, 2022

Hmm I thought it was resolved, but I guess I didn't look hard enough. This very much looks like #43718
Edit: FYI @hermabe

@hermabe
Copy link
Member

hermabe commented Jul 15, 2022

Hmm I thought it was resolved, but I guess I didn't look hard enough. This very much looks like #43718

Yes, most likely missing a call to conn_tx_destroy() somewhere. I am also wondering why the buffers are not freed on disconnect. conn_cleanup() should be called on disconnect and free all buffers.

jori-nordic added a commit to jori-nordic/zephyr that referenced this issue Jul 15, 2022
Signed-off-by: Jonathan Rico <jonathan.rico@nordicsemi.no>
@jori-nordic
Copy link
Contributor

@carlst I'm having some trouble reproducing it, do you mind uploading the build/zephyr/.config file so I can use your exact configuration ? If you have proprietary Kconfigs you can just grep any lines that contain CONFIG_BT, that will be enough.

Also, do you do anything on (re)connection before sending the notifications ? e.g. MTU exchange, encryption, stuff like that.

@carlst
Copy link
Author

carlst commented Jul 20, 2022

@jori-nordic Uploaded as config_zephyr.log to (to avoid OS and browser limitations). Let me know if you need anything else.

Yes, the issue is difficult to reproduce. It requires Notification data to be in flight when a disconnect occurs. i.e. data is handed over from ATT/L2CAP buffer to Bluetooth host but network buffer(s) not fully sent. Using long Notifications (e.g. 150 bytes) with short LL data lengths (e.g. default of 27) seems to help trigger it (multiple transfers during connection event). To increase odds, I also run with the external antenna of the sender disconnected so the signal is very weak.

Upon connection, there is a request to change the MTU (to 512). No encryption.

The peripheral also makes a Connection Parameter Update Request over the L2CAP signaling channel, but only after the connection exists for ~5 seconds.

config_zephyr.log
.

@jori-nordic
Copy link
Contributor

jori-nordic commented Jul 28, 2022

@carlst thanks to the config, I think I managed to reproduce it (using babblesim, which is pretty neat btw) here: main...jori-nordic:zephyr:att-buffer-leak .

I opened a proper PR to fix it. Thanks for the bug report !

edit: just thought about it, could you try to reproduce your issue with the patch from #48395 applied ? Then you can add a comment to the PR directly if that really fixes the issue.

@carlst
Copy link
Author

carlst commented Jul 28, 2022

@jori-nordic I can no longer reproduce the issue with #48395 applied. And I verified the patch code was called (and the in-flight buffer freed by ATT). A nice, clean fix.

Babblesim is indeed quite good; I have not used it much where timing is important but may do so in the future.

jori-nordic added a commit to jori-nordic/zephyr that referenced this issue Jul 29, 2022
Symptoms were that a disconnect happening when a buffer was enqueued but
not sent resulted in a meta-data memory leak.

This is problematic because it seemed (as per the OP) that it resulted in
the whole host getting locked up when a device sending long L2CAP packets
was prone to disconnect-reconnect cycles (e.g. in a bad RF environment).

Fixes zephyrproject-rtos#47649

Signed-off-by: Jonathan Rico <jonathan.rico@nordicsemi.no>
fabiobaltieri pushed a commit that referenced this issue Aug 1, 2022
Symptoms were that a disconnect happening when a buffer was enqueued but
not sent resulted in a meta-data memory leak.

This is problematic because it seemed (as per the OP) that it resulted in
the whole host getting locked up when a device sending long L2CAP packets
was prone to disconnect-reconnect cycles (e.g. in a bad RF environment).

Fixes #47649

Signed-off-by: Jonathan Rico <jonathan.rico@nordicsemi.no>
jeremybettis pushed a commit to jeremybettis/zephyr that referenced this issue Aug 4, 2022
Symptoms were that a disconnect happening when a buffer was enqueued but
not sent resulted in a meta-data memory leak.

This is problematic because it seemed (as per the OP) that it resulted in
the whole host getting locked up when a device sending long L2CAP packets
was prone to disconnect-reconnect cycles (e.g. in a bad RF environment).

Fixes zephyrproject-rtos#47649

Signed-off-by: Jonathan Rico <jonathan.rico@nordicsemi.no>
theob-pro pushed a commit that referenced this issue Jun 1, 2023
Symptoms were that a disconnect happening when a buffer was enqueued but
not sent resulted in a meta-data memory leak.

This is problematic because it seemed (as per the OP) that it resulted in
the whole host getting locked up when a device sending long L2CAP packets
was prone to disconnect-reconnect cycles (e.g. in a bad RF environment).

Fixes #47649

Signed-off-by: Jonathan Rico <jonathan.rico@nordicsemi.no>
(cherry picked from commit 00709e5)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: Bluetooth Host area: Bluetooth bug The issue is a bug, or the PR is fixing a bug priority: medium Medium impact/importance bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants