Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bluetooth: BLE 5 data throughput to Linux host #31742

Closed
Oxymoron79 opened this issue Jan 28, 2021 · 17 comments · Fixed by #32740
Closed

Bluetooth: BLE 5 data throughput to Linux host #31742

Oxymoron79 opened this issue Jan 28, 2021 · 17 comments · Fixed by #32740
Assignees
Labels
area: Bluetooth area: USB Universal Serial Bus bug The issue is a bug, or the PR is fixing a bug priority: medium Medium impact/importance bug

Comments

@Oxymoron79
Copy link

Oxymoron79 commented Jan 28, 2021

Describe the bug

I'm trying to maximize the data throughput of a Bluetooth LE 5 Peripheral to a Bluetooth Dongle connected to a
Linux host running BlueZ using two Nordic nRF52840 Dongles.

The firmware of the Bluetooth LE 5 Peripheral and the Dongle are built with zephyr.

The data is transferred using GATT notifications using the Bluetooth LE 5 Data Length Extension and the 2MBit PHY.

I found 2 issues when I increase the GATT notifications rate to about 250 notifications per second (or higher):

  1. The data transmission gets stuck after a few notifications have been transferred. This may be related to the issue
    Bluetooth: Deadlock with TX of ACL data and HCI commands (command blocked by data) #25917
  2. After the data transmission has finished, bluez is unable to disconnect the peripheral. In this case, the bluez service
    has to be restarted and the device has to be removed from the bluez cache.

I am new to using the Zephyr RTOS, so these issues may be caused by some wrong kernel configuration parameters. I tried
different settings but was not successful to resolve the issues.

To Reproduce

To reproduce the behavior I have uploaded the zephyr projects for the peripheral and the central devices to my
GitHub repository with a python script to test the data throughput.

The README describes the build steps and the usage of the test script.

Run the throughput_test.py with these parameters for a notification rate of about 250 notifications per second and
a notification payload size of 240 bytes:
./throughput_test.py -i 4 -l 240 -n 1000

Expected behavior

The test script should finish without exceptions.

Logs and console output

Peripheral console logs some occurrences of bt_gatt_notify returning a error
and the kernel logs a Unable to allocate TX context error when the issue occurs:

data_work_handler: bt_gatt_notify returned: -57
[14:07:44.410,400] <err> bt_conn: Unable to allocate TX context

I also captured the HCI traffic with btmon for the 2 issues and attached the files:

  1. throughput_stuck_i4_l240_n250.log when the notification transmission was stuck.
  2. throughput_disconnect_i4_l240_n200.log when the disconnect failed.

Environment (please complete the following information):

  • OS: Arch Linux, Bluez 5.55, Linux Kernel 5.10.10-arch1-1 SMP PREEMPT
  • Toolchain: Zephyr SDK 0.11.4
  • v2.5.0-rc1
@Oxymoron79 Oxymoron79 added the bug The issue is a bug, or the PR is fixing a bug label Jan 28, 2021
@Oxymoron79
Copy link
Author

@carlescufi
Copy link
Member

carlescufi commented Jan 28, 2021

The firmware of the Bluetooth LE 5 Peripheral and the Dongle are built with zephyr.

So one dongle runs a standalone Zephyr application and the other runs hci_usb?
In fact, can you please give us steps to reproduce the issue?
I see it's well documented in the repo now.

@Oxymoron79
Copy link
Author

So one dongle runs a standalone Zephyr application and the other runs hci_usb?

Yes, exactly.

@carlescufi
Copy link
Member

Could you please try setting:
CONFIG_BT_RX_BUF_LEN=258
on both apps? Note that this is 258, not 255.

@carlescufi
Copy link
Member

Also, in hci_usb's prj.conf, could you uncomment:
#CONFIG_BT_CTLR_TX_BUFFER_SIZE=251

@Oxymoron79
Copy link
Author

Oxymoron79 commented Jan 28, 2021

@carlescufi I applied your suggestions but unfortunately it didn't help. The test got stuck after 71 notifications:

./throughput_test.py -i 4 -l 240 -n 1000
Using Adapter (/org/bluez/hci0, org.bluez.Adapter1)
Discover device hosting the throughput service abcdef00-f5bf-58d5-9d17-172177d1316a for 10 seconds.
Found.
Connect to Device (/org/bluez/hci0/dev_E3_4D_2A_E8_66_06, org.bluez.Device1)
Done.
Check for throughput service abcdef00-f5bf-58d5-9d17-172177d1316a
Found.
Read config characteristic abcdef01-f5bf-58d5-9d17-172177d1316a parameters:
- interval: 100 ms
- data_len: 10 bytes
Set config characteristic abcdef01-f5bf-58d5-9d17-172177d1316a parameters:
- interval: 4 ms
- data_len: 240 bytes
Done.
Receive 1000 notifications from data characteristic abcdef02-f5bf-58d5-9d17-172177d1316a
Notification    1: 240 bytes, timestamp: 0.00132 s, dt: 0.00132 s
Notification    2: 240 bytes, timestamp: 0.00465 s, dt: 0.00333 s
...
Notification   70: 240 bytes, timestamp:   0.328 s, dt: 0.00338 s
Notification   71: 240 bytes, timestamp:   0.331 s, dt: 0.00342 s
<it's stuck here>

Here's the peripheral console log:

Connected                                                                                                 
Connection Information:                                                                                   
* Role: Slave.                                                                                            
* Connection Type: LE                                                                                     
  PHY: TX: 0x01, RX: 0x01.                                                                                
  Data Length: TX Max length: 27, TX Max time: 328us, RX Max length: 27, RX Max time: 328us.              
Update Data Length.                                                                                       
Update 2MBit PHY.                                                                                         
Update Connection Parameters: Interval min: 7ms, Interval max: 400ms, Latency: 0, timeout: 400ms.         
Data Length updated: TX Max length: 251, TX Max time: 1064us, RX Max length: 251, RX Max time: 1064us     
PHY updated: TX: 0x02, RX: 0x02                                                                           
Wrote config:                                                                                             
- interval_ms: 4                                                                                          
- data_length: 240                                                                                        
"Data" Characteristic Notifications got enabled                                                           
[00:00:22.615,661] <err> bt_conn: Unable to allocate TX context 

I also pushed the changes to https://github.com/Oxymoron79/zephyr-ble-peripheral

@carlescufi
Copy link
Member

[00:00:22.615,661] bt_conn: Unable to allocate TX context

@joerchan the number of contexts are not configurable, so I think this is either a problem in the way the application uses the APIs or an actual bug.

@nashif nashif added the priority: medium Medium impact/importance bug label Jan 30, 2021
@carlescufi
Copy link
Member

@Oxymoron79 a few fixes have been merged to master since you last reported here. Could you please try to reproduce the issue again and let us know?

@Oxymoron79
Copy link
Author

@carlescufi I tested it with the commit b6a4db3 on the master branch.

Unfortunately I could reproduce the same issues.

BUT: I started to measure the time delta between the notifications received and noticed that it sometimes easily exceeds 100 milliseconds. I wondered whether submitting the bt_gatt_notify call to a system work item in the timer handler is a good idea.

So I refactored the peripheral to enqueue a notification request into a FIFO in the timer and feed the bt_gatt_notify calls from that FIFO. I pushed this implementation to the notification_fifo branch of my repository, in case you're interested.

With this implementation I couldn't reproduce the issues and was able to get the throughput up to about 640 kbit/second:

$ ./throughput_test.py -i 3 -l 243 -n 10000
...
Notification 9998: 243 bytes, timestamp:    30.2 s, dt: 0.00133 s
Notification 9999: 243 bytes, timestamp:    30.2 s, dt: 0.00139 s
Notification 10000: 243 bytes, timestamp:    30.2 s, dt: 0.00146 s
Summary: Received 2430000 bytes in 10000 notifications during 30.2 seconds: 643.536 kbits/sec.
min dt:   0.133 ms, max dt: 1.03e+02 ms.

Also the error bt_conn: Unable to allocate TX context never occurred with this implementation.

So I think that the issues I encountered were caused by my usage of the API - but the latest commits to master certainly also helped.

I think this issue can be closed.

Thank you for your support with this issue.

@joerchan
Copy link
Contributor

joerchan commented Feb 5, 2021

@Oxymoron79 Thank you for the update.

I'm gonna re-open this issue just so that we can agree on the facts and close it once we all agree that there is nothing that should be fixed here.

The reason for the Unable to allocate TX context is because bt_gatt_notify is called from the system work queue. We cannot block from this thread, since this is the thread that would release them, and blocking would cause a deadlock.

	/* The TX context always get freed in the system workqueue,
	 * so if we're in the same workqueue but there are no immediate
	 * contexts available, there's no chance we'll get one by waiting.
	 */
	if (k_current_get() == &k_sys_work_q.thread) {
		return k_fifo_get(&free_tx, K_NO_WAIT);
	}

@jhedberg @carlescufi Do we agree that this is the behavior, or is the number of TX contexts incorrect?

@Oxymoron79 Can you elaborate on what you meant by "stuck", I think there is some confusion about whether you managed to get the Stack stuck, or if your application just didn't handle the returned error code from bt_gatt_notify?

@joerchan joerchan reopened this Feb 5, 2021
@jfischer-no
Copy link
Collaborator

@Oxymoron79 Can you retest with the changes in #32009, see also bug report #31922, please?

@Oxymoron79
Copy link
Author

@carlescufi With "stuck" I meant that the receiver of the notifications (the application accessing the BlueZ API) no longer receives notifications.

Here's how I will refer to the involved components:

 ------------------       -----------        -----------------------------------
| Peripheral       |     | hci_usb   |      | Kernelspace   | Userspace         |
| App -> BT stack -|-RF->| BT stack -|-USB->| btusb -|-HCI->| BlueZ -> Receiver |
 ------------------       -----------       |           ^---- btmon             |
                                             -----------------------------------

The Peripheral App sets up a timer whose expiry function adds a work item to the system workqueue.
The enqueued work item calls bt_gatt_notify of the BT stack API, logging the error to the console if the function fails:

static void data_work_handler(struct k_work *work)
{
    int err = 0;
    err = bt_gatt_notify(NULL, &service.attrs[3], data, config.data_length);
    if (err != 0)
        printk("data_work_handler: bt_gatt_notify returned: %i\n", err);
}

On the PC side, the Receiver application is receiving the notifications from the BlueZ userspace daemon. To check whether
the BlueZ daemon is working properly, the HCI communcation with the Kernel Bluetooth subsystem is captured with btmon.

When the notification transfer is "stuck", this is what I observed:

  • The Peripheral App doesn't log anything to the console. Some time before the transmission gets "stuck" the BT stack
    of the Peripheral logged the bt_conn: Unable to allocate TX context error:
...
Wrote config:
- interval_ms: 3
- data_length: 243
"Data" Characteristic Notifications got enabled
[00:05:07.898,895] <err> bt_conn: Unable to allocate TX context
  • The Receiver is waiting for the notification from the BlueZ API. There is no timeout or error received from the
    BlueZ daemon:
...
Notification  240: 243 bytes, timestamp:   0.756 s, dt:  0.0014 s
Notification  241: 243 bytes, timestamp:   0.758 s, dt: 0.00142 s
Notification  242: 243 bytes, timestamp:   0.759 s, dt: 0.00134 s
Notification  243: 243 bytes, timestamp:   0.761 s, dt: 0.00143 s
<stuck here>
  • The btmon is no longer capturing any communication:
...
> ACL Data RX: Handle 0 flags 0x02 dlen 250                                         #310 [hci0] 17.212098
      ATT: Handle Value Notification (0x1b) len 245
        Handle: 0x0014
          Data: 000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f202122232425262728292a2b2c2d2e2f303132333435363738393a3b3c3d3e3f404142434445464748494a4b4c4d4e4f505152535455565758595a5b5c5d5e5f606162636465666768696a6b6c6d6e6f707172737475767778797a7b7c7d7e7f808182838485868788898a8b8c8d8e8f909192939495969798999a9b9c9d9e9fa0a1a2a3a4a5a6a7a8a9aaabacadaeafb0b1b2b3b4b5b6b7b8b9babbbcbdbebfc0c1c2c3c4c5c6c7c8c9cacbcccdcecfd0d1d2d3d4d5d6d7d8d9dadbdcdddedfe0e1e2e3e4e5e6e7e8e9eaebecedeeeff0f1f2
> ACL Data RX: Handle 0 flags 0x02 dlen 250                                         #311 [hci0] 17.213525
      ATT: Handle Value Notification (0x1b) len 245
        Handle: 0x0014
          Data: 000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f202122232425262728292a2b2c2d2e2f303132333435363738393a3b3c3d3e3f404142434445464748494a4b4c4d4e4f505152535455565758595a5b5c5d5e5f606162636465666768696a6b6c6d6e6f707172737475767778797a7b7c7d7e7f808182838485868788898a8b8c8d8e8f909192939495969798999a9b9c9d9e9fa0a1a2a3a4a5a6a7a8a9aaabacadaeafb0b1b2b3b4b5b6b7b8b9babbbcbdbebfc0c1c2c3c4c5c6c7c8c9cacbcccdcecfd0d1d2d3d4d5d6d7d8d9dadbdcdddedfe0e1e2e3e4e5e6e7e8e9eaebecedeeeff0f1f2
<stuck here>

What is interessting is that when the system is stuck and I send a read request to another characteristic of the
Peripheral (here: the configuration characteristic with the UUID abcdef01-f5bf-58d5-9d17-172177d1316a),
the data characteristic (UUID: abcdef02-f5bf-58d5-9d17-172177d1316a) "wakes up" and transmits a few more
notifications. Eventually the read request receives a Timeout error from the BlueZ API and the BlueZ daemon
disconnects the Peripheral.
During this

  • the Peripheral logs nothing - not even the disconnect command that should be logged by the
    disconnected callback.
  • the Receiver receives a few more notifications and then terminates due to the disconnect:
...
Notification  244: 243 bytes, timestamp: 1.66e+03 s, dt: 1.66e+03 s
Notification  245: 243 bytes, timestamp: 1.66e+03 s, dt: 0.00126 s
Notification  246: 243 bytes, timestamp: 1.66e+03 s, dt:  0.0013 s
Notification  247: 243 bytes, timestamp: 1.66e+03 s, dt: 0.00132 s
Notification  248: 243 bytes, timestamp: 1.66e+03 s, dt:  0.0014 s
Notification  249: 243 bytes, timestamp: 1.66e+03 s, dt: 0.00138 s
Notification  250: 243 bytes, timestamp: 1.66e+03 s, dt: 0.00133 s
Notification  251: 243 bytes, timestamp: 1.66e+03 s, dt: 0.00142 s
Notification  252: 243 bytes, timestamp: 1.66e+03 s, dt:  0.0014 s
Notification  253: 243 bytes, timestamp: 1.66e+03 s, dt: 0.00131 s
<stuck again until disconnect>
  • btmon captures the notifications and the Disconnect command:
....
< ACL Data TX: Handle 0 flags 0x00 dlen 7                                         #312 [hci0] 1672.174660
      ATT: Read Request (0x0a) len 2
        Handle: 0x0012
> HCI Event: Number of Completed Packets (0x13) plen 5                            #313 [hci0] 1672.182105
        Num handles: 1
        Handle: 0
        Count: 1
> ACL Data RX: Handle 0 flags 0x02 dlen 250                                       #314 [hci0] 1672.233094
      ATT: Handle Value Notification (0x1b) len 245
        Handle: 0x0014
          Data: 000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f202122232425262728292a2b2c2d2e2f303132333435363738393a3b3c3d3e3f404142434445464748494a4b4c4d4e4f505152535455565758595a5b5c5d5e5f606162636465666768696a6b6c6d6e6f707172737475767778797a7b7c7d7e7f808182838485868788898a8b8c8d8e8f909192939495969798999a9b9c9d9e9fa0a1a2a3a4a5a6a7a8a9aaabacadaeafb0b1b2b3b4b5b6b7b8b9babbbcbdbebfc0c1c2c3c4c5c6c7c8c9cacbcccdcecfd0d1d2d3d4d5d6d7d8d9dadbdcdddedfe0e1e2e3e4e5e6e7e8e9eaebecedeeeff0f1f2
> ACL Data RX: Handle 0 flags 0x02 dlen 250                                       #315 [hci0] 1672.234431
      ATT: Handle Value Notification (0x1b) len 245
        Handle: 0x0014
          Data: 000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f202122232425262728292a2b2c2d2e2f303132333435363738393a3b3c3d3e3f404142434445464748494a4b4c4d4e4f505152535455565758595a5b5c5d5e5f606162636465666768696a6b6c6d6e6f707172737475767778797a7b7c7d7e7f808182838485868788898a8b8c8d8e8f909192939495969798999a9b9c9d9e9fa0a1a2a3a4a5a6a7a8a9aaabacadaeafb0b1b2b3b4b5b6b7b8b9babbbcbdbebfc0c1c2c3c4c5c6c7c8c9cacbcccdcecfd0d1d2d3d4d5d6d7d8d9dadbdcdddedfe0e1e2e3e4e5e6e7e8e9eaebecedeeeff0f1f2
> ACL Data RX: Handle 0 flags 0x02 dlen 250                                       #316 [hci0] 1672.235810
      ATT: Handle Value Notification (0x1b) len 245
        Handle: 0x0014
          Data: 000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f202122232425262728292a2b2c2d2e2f303132333435363738393a3b3c3d3e3f404142434445464748494a4b4c4d4e4f505152535455565758595a5b5c5d5e5f606162636465666768696a6b6c6d6e6f707172737475767778797a7b7c7d7e7f808182838485868788898a8b8c8d8e8f909192939495969798999a9b9c9d9e9fa0a1a2a3a4a5a6a7a8a9aaabacadaeafb0b1b2b3b4b5b6b7b8b9babbbcbdbebfc0c1c2c3c4c5c6c7c8c9cacbcccdcecfd0d1d2d3d4d5d6d7d8d9dadbdcdddedfe0e1e2e3e4e5e6e7e8e9eaebecedeeeff0f1f2
> ACL Data RX: Handle 0 flags 0x02 dlen 250                                       #317 [hci0] 1672.237161
      ATT: Handle Value Notification (0x1b) len 245
        Handle: 0x0014
          Data: 000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f202122232425262728292a2b2c2d2e2f303132333435363738393a3b3c3d3e3f404142434445464748494a4b4c4d4e4f505152535455565758595a5b5c5d5e5f606162636465666768696a6b6c6d6e6f707172737475767778797a7b7c7d7e7f808182838485868788898a8b8c8d8e8f909192939495969798999a9b9c9d9e9fa0a1a2a3a4a5a6a7a8a9aaabacadaeafb0b1b2b3b4b5b6b7b8b9babbbcbdbebfc0c1c2c3c4c5c6c7c8c9cacbcccdcecfd0d1d2d3d4d5d6d7d8d9dadbdcdddedfe0e1e2e3e4e5e6e7e8e9eaebecedeeeff0f1f2
> ACL Data RX: Handle 0 flags 0x02 dlen 250                                       #318 [hci0] 1672.238564
      ATT: Handle Value Notification (0x1b) len 245
        Handle: 0x0014
          Data: 000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f202122232425262728292a2b2c2d2e2f303132333435363738393a3b3c3d3e3f404142434445464748494a4b4c4d4e4f505152535455565758595a5b5c5d5e5f606162636465666768696a6b6c6d6e6f707172737475767778797a7b7c7d7e7f808182838485868788898a8b8c8d8e8f909192939495969798999a9b9c9d9e9fa0a1a2a3a4a5a6a7a8a9aaabacadaeafb0b1b2b3b4b5b6b7b8b9babbbcbdbebfc0c1c2c3c4c5c6c7c8c9cacbcccdcecfd0d1d2d3d4d5d6d7d8d9dadbdcdddedfe0e1e2e3e4e5e6e7e8e9eaebecedeeeff0f1f2
> ACL Data RX: Handle 0 flags 0x02 dlen 250                                       #319 [hci0] 1672.239999
      ATT: Handle Value Notification (0x1b) len 245
        Handle: 0x0014
          Data: 000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f202122232425262728292a2b2c2d2e2f303132333435363738393a3b3c3d3e3f404142434445464748494a4b4c4d4e4f505152535455565758595a5b5c5d5e5f606162636465666768696a6b6c6d6e6f707172737475767778797a7b7c7d7e7f808182838485868788898a8b8c8d8e8f909192939495969798999a9b9c9d9e9fa0a1a2a3a4a5a6a7a8a9aaabacadaeafb0b1b2b3b4b5b6b7b8b9babbbcbdbebfc0c1c2c3c4c5c6c7c8c9cacbcccdcecfd0d1d2d3d4d5d6d7d8d9dadbdcdddedfe0e1e2e3e4e5e6e7e8e9eaebecedeeeff0f1f2
> ACL Data RX: Handle 0 flags 0x02 dlen 250                                       #320 [hci0] 1672.241357
      ATT: Handle Value Notification (0x1b) len 245
        Handle: 0x0014
          Data: 000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f202122232425262728292a2b2c2d2e2f303132333435363738393a3b3c3d3e3f404142434445464748494a4b4c4d4e4f505152535455565758595a5b5c5d5e5f606162636465666768696a6b6c6d6e6f707172737475767778797a7b7c7d7e7f808182838485868788898a8b8c8d8e8f909192939495969798999a9b9c9d9e9fa0a1a2a3a4a5a6a7a8a9aaabacadaeafb0b1b2b3b4b5b6b7b8b9babbbcbdbebfc0c1c2c3c4c5c6c7c8c9cacbcccdcecfd0d1d2d3d4d5d6d7d8d9dadbdcdddedfe0e1e2e3e4e5e6e7e8e9eaebecedeeeff0f1f2
> ACL Data RX: Handle 0 flags 0x02 dlen 250                                       #321 [hci0] 1672.242757
      ATT: Handle Value Notification (0x1b) len 245
        Handle: 0x0014
          Data: 000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f202122232425262728292a2b2c2d2e2f303132333435363738393a3b3c3d3e3f404142434445464748494a4b4c4d4e4f505152535455565758595a5b5c5d5e5f606162636465666768696a6b6c6d6e6f707172737475767778797a7b7c7d7e7f808182838485868788898a8b8c8d8e8f909192939495969798999a9b9c9d9e9fa0a1a2a3a4a5a6a7a8a9aaabacadaeafb0b1b2b3b4b5b6b7b8b9babbbcbdbebfc0c1c2c3c4c5c6c7c8c9cacbcccdcecfd0d1d2d3d4d5d6d7d8d9dadbdcdddedfe0e1e2e3e4e5e6e7e8e9eaebecedeeeff0f1f2
> ACL Data RX: Handle 0 flags 0x02 dlen 250                                       #322 [hci0] 1672.244164
      ATT: Handle Value Notification (0x1b) len 245
        Handle: 0x0014
          Data: 000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f202122232425262728292a2b2c2d2e2f303132333435363738393a3b3c3d3e3f404142434445464748494a4b4c4d4e4f505152535455565758595a5b5c5d5e5f606162636465666768696a6b6c6d6e6f707172737475767778797a7b7c7d7e7f808182838485868788898a8b8c8d8e8f909192939495969798999a9b9c9d9e9fa0a1a2a3a4a5a6a7a8a9aaabacadaeafb0b1b2b3b4b5b6b7b8b9babbbcbdbebfc0c1c2c3c4c5c6c7c8c9cacbcccdcecfd0d1d2d3d4d5d6d7d8d9dadbdcdddedfe0e1e2e3e4e5e6e7e8e9eaebecedeeeff0f1f2
> ACL Data RX: Handle 0 flags 0x02 dlen 250                                       #323 [hci0] 1672.245505
      ATT: Handle Value Notification (0x1b) len 245
        Handle: 0x0014
          Data: 000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f202122232425262728292a2b2c2d2e2f303132333435363738393a3b3c3d3e3f404142434445464748494a4b4c4d4e4f505152535455565758595a5b5c5d5e5f606162636465666768696a6b6c6d6e6f707172737475767778797a7b7c7d7e7f808182838485868788898a8b8c8d8e8f909192939495969798999a9b9c9d9e9fa0a1a2a3a4a5a6a7a8a9aaabacadaeafb0b1b2b3b4b5b6b7b8b9babbbcbdbebfc0c1c2c3c4c5c6c7c8c9cacbcccdcecfd0d1d2d3d4d5d6d7d8d9dadbdcdddedfe0e1e2e3e4e5e6e7e8e9eaebecedeeeff0f1f2

<stuck here until disconnect>
 
< HCI Command: Disconnect (0x01|0x0006) plen 3                                    #324 [hci0] 1704.302848
        Handle: 0
        Reason: Remote User Terminated Connection (0x13)
> HCI Event: Command Status (0x0f) plen 4                                         #325 [hci0] 1704.305107
      Disconnect (0x01|0x0006) ncmd 1
        Status: Success (0x00)
> HCI Event: Disconnect Complete (0x05) plen 4                                    #326 [hci0] 1704.330136
        Status: Success (0x00)
        Handle: 0
        Reason: Connection Terminated By Local Host (0x16)
@ MGMT Event: Device Disconnected (0x000c) plen 8                             {0x0001} [hci0] 1704.330189
        LE Address: E3:4D:2A:E8:66:06 (Static)
        Reason: Connection terminated by local host (0x02)

After the peripeheral has been disconnected

  1. it can be discovered again with bluetoothctl
  2. a connection can be established, but the peripheral doesn't call the
    connected callback.
  3. a few seconds later, bluetoothctl reports that the services were resolved (without actually creating the DBus
    objects for the services and characteristics).
  4. after that, the device is disconnected again at which time the peripheral logs the error: bt_hci_core: Unable to look up conn with handle 1

The system can only be recovered by restarting the Peripheral.

I can't deduce with certainty which component gets "stuck" in this setup:

  • The Peripheral seems to be the component as it is the only one that needs to be reset to get the system working again.
    IMO, it can get stuck due to:
    • a bug in the BT stack.
    • the application overloading the system workqueue with work items from the timer. This seems to be the case as I am
      able to work around the issue by using a normal FIFO instead of the system workqueue.
  • The Bluetooth dongle running the hci_usb sample application may be the cause.
  • The btusb kernel module may be the cause as well.
  • The BlueZ daemon is unlikely to be the cause as its communication with the kernel driver seems to be fine according to
    the captured messages from btmon.

Sorry for this long post, but I wanted to explain my findings in detail. I hope it helps.

@Oxymoron79
Copy link
Author

Oxymoron79 commented Feb 7, 2021

@jfischer-no I built the Peripheral and the Dongle using your pr-fix-hci_usb branch.

Unfortunately this issue still occurs. Here are the logs:

Peripheral:

Connected                                                                                                 
Connection Information:                                                                                   
* Role: Slave.                                                                                            
* Connection Type: LE                                                                                     
  PHY: TX: 0x01, RX: 0x01.                                                                                
  Data Length: TX Max length: 27, TX Max time: 328us, RX Max length: 27, RX Max time: 328us.              
Update Data Length.                                                                                       
Update 2MBit PHY.                                                                                         
Update Connection Parameters: Interval min: 7ms, Interval max: 400ms, Latency: 0, timeout: 400ms.         
Data Length updated: TX Max length: 251, TX Max time: 1064us, RX Max length: 251, RX Max time: 1064us     
PHY updated: TX: 0x02, RX: 0x02                                                                           
Wrote config:                                                                                             
- interval_ms: 3                                                                                          
- data_length: 243                                                                                        
"Data" Characteristic Notifications got enabled                                                           
[00:00:31.398,315] <err> bt_conn: Unable to allocate TX context 

Receiver application:

$ ./throughput_test.py -i 3 -l 243 -n 1000
Using Adapter (/org/bluez/hci0, org.bluez.Adapter1)
Discover device hosting the throughput service abcdef00-f5bf-58d5-9d17-172177d1316a for 10 seconds.
Found.
Connect to Device (/org/bluez/hci0/dev_E3_4D_2A_E8_66_06, org.bluez.Device1)
Done.
Check for throughput service abcdef00-f5bf-58d5-9d17-172177d1316a
Found.
Read config characteristic abcdef01-f5bf-58d5-9d17-172177d1316a parameters:
- interval: 100 ms
- data_len: 10 bytes
Set config characteristic abcdef01-f5bf-58d5-9d17-172177d1316a parameters:
- interval: 3 ms
- data_len: 243 bytes
Done.
Receive 1000 notifications from data characteristic abcdef02-f5bf-58d5-9d17-172177d1316a
Notification    1: 243 bytes, timestamp: 0.000306 s, dt: 0.000306 s
Notification    2: 243 bytes, timestamp: 0.00166 s, dt: 0.00136 s
Notification    3: 243 bytes, timestamp: 0.00298 s, dt: 0.00132 s
....
Notification  310: 243 bytes, timestamp:   0.959 s, dt: 0.00134 s
Notification  311: 243 bytes, timestamp:   0.961 s, dt: 0.00146 s
Notification  312: 243 bytes, timestamp:   0.962 s, dt: 0.00129 s
<stuck here>

btmon capture:

Bluetooth monitor ver 5.55
= Note: Linux version 5.10.13-arch1-1 (x86_64)                                                   0.482684
= Note: Bluetooth subsystem version 2.22                                                         0.482689
= New Index: 00:00:00:00:00:00 (Primary,USB,hci0)                                         [hci0] 0.482692
= Open Index: 00:00:00:00:00:00                                                           [hci0] 0.482693
= Index Info: 00:00:00:00:00:00 (The Linux Foundation)                                    [hci0] 0.482695
@ MGMT Open: bluetoothd (privileged) version 1.18                                       {0x0001} 0.482698
@ MGMT Command: Start Discovery (0x0023) plen 1                                 {0x0001} [hci0] 15.919294
        Address type: 0x06
          LE Public
          LE Random
< HCI Command: LE Set Random Address (0x08|0x0005) plen 6                             #1 [hci0] 15.919395
        Address: 19:82:E1:FB:AD:71 (Non-Resolvable)
> HCI Event: Command Complete (0x0e) plen 4                                           #2 [hci0] 15.921397
      LE Set Random Address (0x08|0x0005) ncmd 1
        Status: Success (0x00)
< HCI Command: LE Set Scan Parameters (0x08|0x000b) plen 7                            #3 [hci0] 15.921442
        Type: Active (0x01)
        Interval: 11.250 msec (0x0012)
        Window: 11.250 msec (0x0012)
        Own address type: Random (0x01)
        Filter policy: Accept all advertisement (0x00)
> HCI Event: Command Complete (0x0e) plen 4                                           #4 [hci0] 15.924367
      LE Set Scan Parameters (0x08|0x000b) ncmd 1
        Status: Success (0x00)
< HCI Command: LE Set Scan Enable (0x08|0x000c) plen 2                                #5 [hci0] 15.924408
        Scanning: Enabled (0x01)
        Filter duplicates: Enabled (0x01)
> HCI Event: Command Complete (0x0e) plen 4                                           #6 [hci0] 15.927367
      LE Set Scan Enable (0x08|0x000c) ncmd 1
        Status: Success (0x00)
@ MGMT Event: Command Complete (0x0001) plen 4                                  {0x0001} [hci0] 15.927423
      Start Discovery (0x0023) plen 1
        Status: Success (0x00)
        Address type: 0x06
          LE Public
          LE Random
@ MGMT Event: Discovering (0x0013) plen 2                                       {0x0001} [hci0] 15.927433
        Address type: 0x06
          LE Public
          LE Random
        Discovery: Enabled (0x01)
> HCI Event: LE Meta Event (0x3e) plen 36                                             #7 [hci0] 15.944373
      LE Advertising Report (0x02)
        Num reports: 1
        Event type: Connectable undirected - ADV_IND (0x00)
        Address type: Public (0x00)
        Address: D0:03:4B:1E:A5:80 (Apple, Inc.)
        Data length: 24
        Flags: 0x1a
          LE General Discoverable Mode
          Simultaneous LE and BR/EDR (Controller)
          Simultaneous LE and BR/EDR (Host)
        TX power: 12 dBm
        Company: Apple, Inc. (76)
          Type: Unknown (15)
          Data: c00aa9184c00440b
          Type: Unknown (16)
          Data: 0104
        RSSI: -72 dBm (0xb8)
> HCI Event: LE Meta Event (0x3e) plen 12                                             #8 [hci0] 15.945368
      LE Advertising Report (0x02)
        Num reports: 1
        Event type: Scan response - SCAN_RSP (0x04)
        Address type: Public (0x00)
        Address: D0:03:4B:1E:A5:80 (Apple, Inc.)
        Data length: 0
        RSSI: -71 dBm (0xb9)
@ MGMT Event: Device Found (0x0012) plen 38                                     {0x0001} [hci0] 15.945388
        LE Address: D0:03:4B:1E:A5:80 (Apple, Inc.)
        RSSI: -71 dBm (0xb9)
        Flags: 0x00000000
        Data length: 24
        Flags: 0x1a
          LE General Discoverable Mode
          Simultaneous LE and BR/EDR (Controller)
          Simultaneous LE and BR/EDR (Host)
        TX power: 12 dBm
        Company: Apple, Inc. (76)
          Type: Unknown (15)
          Data: c00aa9184c00440b
          Type: Unknown (16)
          Data: 0104
> HCI Event: LE Meta Event (0x3e) plen 33                                             #9 [hci0] 15.947366
      LE Advertising Report (0x02)
        Num reports: 1
        Event type: Connectable undirected - ADV_IND (0x00)
        Address type: Random (0x01)
        Address: E3:4D:2A:E8:66:06 (Static)
        Data length: 21
        Flags: 0x06
          LE General Discoverable Mode
          BR/EDR Not Supported
        128-bit Service UUIDs (complete): 1 entry
          Vendor specific (abcdef00-f5bf-58d5-9d17-172177d1316a)
        RSSI: -39 dBm (0xd9)
> HCI Event: LE Meta Event (0x3e) plen 16                                            #10 [hci0] 15.948379
      LE Advertising Report (0x02)
        Num reports: 1
        Event type: Scan response - SCAN_RSP (0x04)
        Address type: Random (0x01)
        Address: E3:4D:2A:E8:66:06 (Static)
        Data length: 4
        Name (complete): TP
        RSSI: -38 dBm (0xda)
@ MGMT Event: Device Found (0x0012) plen 39                                     {0x0001} [hci0] 15.948410
        LE Address: E3:4D:2A:E8:66:06 (Static)
        RSSI: -38 dBm (0xda)
        Flags: 0x00000000
        Data length: 25
        Flags: 0x06
          LE General Discoverable Mode
          BR/EDR Not Supported
        128-bit Service UUIDs (complete): 1 entry
          Vendor specific (abcdef00-f5bf-58d5-9d17-172177d1316a)
        Name (complete): TP
> HCI Event: LE Meta Event (0x3e) plen 43                                            #11 [hci0] 15.950372
      LE Advertising Report (0x02)
        Num reports: 1
        Event type: Connectable undirected - ADV_IND (0x00)
        Address type: Public (0x00)
        Address: 54:BD:79:1F:CF:29 (Samsung Electronics Co.,Ltd)
        Data length: 31
        Flags: 0x1a
          LE General Discoverable Mode
          Simultaneous LE and BR/EDR (Controller)
          Simultaneous LE and BR/EDR (Host)
        Company: Samsung Electronics Co. Ltd. (117)
          Data: 420401016154bd791fcf2956bd791fcf28ff443800000000
        RSSI: -64 dBm (0xc0)
> HCI Event: LE Meta Event (0x3e) plen 40                                            #12 [hci0] 15.951373
      LE Advertising Report (0x02)
        Num reports: 1
        Event type: Scan response - SCAN_RSP (0x04)
        Address type: Public (0x00)
        Address: 54:BD:79:1F:CF:29 (Samsung Electronics Co.,Ltd)
        Data length: 28
        Name (short): [TV] Samsung 8 Series (65)
        RSSI: -63 dBm (0xc1)
@ MGMT Event: Device Found (0x0012) plen 73                                     {0x0001} [hci0] 15.951398
        LE Address: 54:BD:79:1F:CF:29 (Samsung Electronics Co.,Ltd)
        RSSI: -63 dBm (0xc1)
        Flags: 0x00000000
        Data length: 59
        Flags: 0x1a
          LE General Discoverable Mode
          Simultaneous LE and BR/EDR (Controller)
          Simultaneous LE and BR/EDR (Host)
        Company: Samsung Electronics Co. Ltd. (117)
          Data: 420401016154bd791fcf2956bd791fcf28ff443800000000
        Name (short): [TV] Samsung 8 Series (65)
@ MGMT Command: Stop Discovery (0x0024) plen 1                                  {0x0001} [hci0] 15.962884
        Address type: 0x06
          LE Public
          LE Random
< HCI Command: LE Set Scan Enable (0x08|0x000c) plen 2                               #13 [hci0] 15.962950
        Scanning: Disabled (0x00)
        Filter duplicates: Disabled (0x00)
> HCI Event: Command Complete (0x0e) plen 4                                          #14 [hci0] 15.964368
      LE Set Scan Enable (0x08|0x000c) ncmd 1
        Status: Success (0x00)
@ MGMT Event: Command Complete (0x0001) plen 4                                  {0x0001} [hci0] 15.964418
      Stop Discovery (0x0024) plen 1
        Status: Success (0x00)
        Address type: 0x06
          LE Public
          LE Random
@ MGMT Event: Discovering (0x0013) plen 2                                       {0x0001} [hci0] 15.964431
        Address type: 0x06
          LE Public
          LE Random
        Discovery: Disabled (0x00)
< HCI Command: LE Set Random Address (0x08|0x0005) plen 6                            #15 [hci0] 15.969147
        Address: EF:F9:48:07:CD:80 (Static)
> HCI Event: Command Complete (0x0e) plen 4                                          #16 [hci0] 15.971366
      LE Set Random Address (0x08|0x0005) ncmd 1
        Status: Success (0x00)
< HCI Command: LE Add Device To White List (0x08|0x0011) plen 7                      #17 [hci0] 15.971395
        Address type: Random (0x01)
        Address: E3:4D:2A:E8:66:06 (Static)
> HCI Event: Command Complete (0x0e) plen 4                                          #18 [hci0] 15.974366
      LE Add Device To White List (0x08|0x0011) ncmd 1
        Status: Success (0x00)
< HCI Command: LE Set Scan Parameters (0x08|0x000b) plen 7                           #19 [hci0] 15.974405
        Type: Passive (0x00)
        Interval: 60.000 msec (0x0060)
        Window: 60.000 msec (0x0060)
        Own address type: Random (0x01)
        Filter policy: Ignore not in white list (0x01)
> HCI Event: Command Complete (0x0e) plen 4                                          #20 [hci0] 15.977359
      LE Set Scan Parameters (0x08|0x000b) ncmd 1
        Status: Success (0x00)
< HCI Command: LE Set Scan Enable (0x08|0x000c) plen 2                               #21 [hci0] 15.977382
        Scanning: Enabled (0x01)
        Filter duplicates: Enabled (0x01)
> HCI Event: Command Complete (0x0e) plen 4                                          #22 [hci0] 15.980361
      LE Set Scan Enable (0x08|0x000c) ncmd 1
        Status: Success (0x00)
> HCI Event: LE Meta Event (0x3e) plen 33                                            #23 [hci0] 16.051387
      LE Advertising Report (0x02)
        Num reports: 1
        Event type: Connectable undirected - ADV_IND (0x00)
        Address type: Random (0x01)
        Address: E3:4D:2A:E8:66:06 (Static)
        Data length: 21
        Flags: 0x06
          LE General Discoverable Mode
          BR/EDR Not Supported
        128-bit Service UUIDs (complete): 1 entry
          Vendor specific (abcdef00-f5bf-58d5-9d17-172177d1316a)
        RSSI: -39 dBm (0xd9)
< HCI Command: LE Set Scan Enable (0x08|0x000c) plen 2                               #24 [hci0] 16.051437
        Scanning: Disabled (0x00)
        Filter duplicates: Disabled (0x00)
> HCI Event: Command Complete (0x0e) plen 4                                          #25 [hci0] 16.054367
      LE Set Scan Enable (0x08|0x000c) ncmd 1
        Status: Success (0x00)
< HCI Command: LE Create Connection (0x08|0x000d) plen 25                            #26 [hci0] 16.054396
        Scan interval: 60.000 msec (0x0060)
        Scan window: 60.000 msec (0x0060)
        Filter policy: White list is not used (0x00)
        Peer address type: Random (0x01)
        Peer address: E3:4D:2A:E8:66:06 (Static)
        Own address type: Random (0x01)
        Min connection interval: 30.00 msec (0x0018)
        Max connection interval: 50.00 msec (0x0028)
        Connection latency: 0 (0x0000)
        Supervision timeout: 420 msec (0x002a)
        Min connection length: 0.000 msec (0x0000)
        Max connection length: 0.000 msec (0x0000)
> HCI Event: Command Status (0x0f) plen 4                                            #27 [hci0] 16.057368
      LE Create Connection (0x08|0x000d) ncmd 1
        Status: Success (0x00)
> HCI Event: LE Meta Event (0x3e) plen 31                                            #28 [hci0] 16.152374
      LE Enhanced Connection Complete (0x0a)
        Status: Success (0x00)
        Handle: 0
        Role: Master (0x00)
        Peer address type: Random (0x01)
        Peer address: E3:4D:2A:E8:66:06 (Static)
        Local resolvable private address: 00:00:00:00:00:00 (Non-Resolvable)
        Peer resolvable private address: 00:00:00:00:00:00 (Non-Resolvable)
        Connection interval: 50.00 msec (0x0028)
        Connection latency: 0 (0x0000)
        Supervision timeout: 420 msec (0x002a)
        Master clock accuracy: 0x07
@ MGMT Event: Device Connected (0x000b) plen 34                                 {0x0001} [hci0] 16.152423
        LE Address: E3:4D:2A:E8:66:06 (Static)
        Flags: 0x00000000
        Data length: 21
        Flags: 0x06
          LE General Discoverable Mode
          BR/EDR Not Supported
        128-bit Service UUIDs (complete): 1 entry
          Vendor specific (abcdef00-f5bf-58d5-9d17-172177d1316a)
< HCI Command: LE Read Remote Used Features (0x08|0x0016) plen 2                     #29 [hci0] 16.152561
        Handle: 0
> HCI Event: LE Meta Event (0x3e) plen 4                                             #30 [hci0] 16.153361
      LE Channel Selection Algorithm (0x14)
        Handle: 0
        Algorithm: #2 (0x01)
> HCI Event: Command Status (0x0f) plen 4                                            #31 [hci0] 16.154361
      LE Read Remote Used Features (0x08|0x0016) ncmd 1
        Status: Success (0x00)
> HCI Event: LE Meta Event (0x3e) plen 12                                            #32 [hci0] 16.254384
      LE Read Remote Used Features (0x04)
        Status: Success (0x00)
        Handle: 0
        Features: 0x7f 0x41 0x01 0x00 0x00 0x00 0x00 0x00
          LE Encryption
          Connection Parameter Request Procedure
          Extended Reject Indication
          Slave-initiated Features Exchange
          LE Ping
          LE Data Packet Length Extension
          LL Privacy
          LE 2M PHY
          Channel Selection Algorithm #2
          Minimum Number of Used Channels Procedure
< ACL Data TX: Handle 0 flags 0x00 dlen 7                                            #33 [hci0] 16.255433
      ATT: Exchange MTU Request (0x02) len 2
        Client RX MTU: 517
> HCI Event: LE Meta Event (0x3e) plen 11                                            #34 [hci0] 16.301367
      LE Data Length Change (0x07)
        Handle: 0
        Max TX octets: 251
        Max TX time: 2112
        Max RX octets: 251
        Max RX time: 2120
> HCI Event: Number of Completed Packets (0x13) plen 5                               #35 [hci0] 16.303378
        Num handles: 1
        Handle: 0
        Count: 1
> ACL Data RX: Handle 0 flags 0x02 dlen 7                                            #36 [hci0] 16.453293
      ATT: Exchange MTU Response (0x03) len 2
        Server RX MTU: 247
< ACL Data TX: Handle 0 flags 0x00 dlen 11                                           #37 [hci0] 16.453771
      ATT: Read By Type Request (0x08) len 6
        Handle range: 0x0001-0xffff
        Attribute type: Unknown (0x2b3a)
> HCI Event: Number of Completed Packets (0x13) plen 5                               #38 [hci0] 16.504383
        Num handles: 1
        Handle: 0
        Count: 1
> ACL Data RX: Handle 0 flags 0x02 dlen 9                                            #39 [hci0] 16.553304
      ATT: Error Response (0x01) len 4
        Read By Type Request (0x08)
        Handle: 0x0001
        Error: Attribute Not Found (0x0a)
< ACL Data TX: Handle 0 flags 0x00 dlen 11                                           #40 [hci0] 16.553717
      ATT: Read By Type Request (0x08) len 6
        Handle range: 0x0001-0xffff
        Attribute type: Database Hash (0x2b2a)
> HCI Event: Number of Completed Packets (0x13) plen 5                               #41 [hci0] 16.603384
        Num handles: 1
        Handle: 0
        Count: 1
> ACL Data RX: Handle 0 flags 0x02 dlen 24                                           #42 [hci0] 16.653426
      ATT: Read By Type Response (0x09) len 19
        Attribute data length: 18
        Attribute data list: 1 entry
        Handle: 0x0008
        Value: f587fd9e0c6ecb58f9d709a9f27bead1
< ACL Data TX: Handle 0 flags 0x00 dlen 11                                           #43 [hci0] 16.653921
      ATT: Read By Type Request (0x08) len 6
        Handle range: 0x0009-0xffff
        Attribute type: Database Hash (0x2b2a)
> HCI Event: LE Meta Event (0x3e) plen 6                                             #44 [hci0] 16.701369
      LE PHY Update Complete (0x0c)
        Status: Success (0x00)
        Handle: 0
        TX PHY: LE 2M (0x02)
        RX PHY: LE 2M (0x02)
> HCI Event: LE Meta Event (0x3e) plen 11                                            #45 [hci0] 16.702352
      LE Data Length Change (0x07)
        Handle: 0
        Max TX octets: 251
        Max TX time: 1064
        Max RX octets: 251
        Max RX time: 1064
> HCI Event: Number of Completed Packets (0x13) plen 5                               #46 [hci0] 16.703359
        Num handles: 1
        Handle: 0
        Count: 1
> ACL Data RX: Handle 0 flags 0x02 dlen 9                                            #47 [hci0] 16.753160
      ATT: Error Response (0x01) len 4
        Read By Type Request (0x08)
        Handle: 0x0009
        Error: Attribute Not Found (0x0a)
< ACL Data TX: Handle 0 flags 0x00 dlen 8                                            #48 [hci0] 16.772361
      ATT: Write Request (0x12) len 3
        Handle: 0x0006
          Data: 05
> HCI Event: Number of Completed Packets (0x13) plen 5                               #49 [hci0] 16.803385
        Num handles: 1
        Handle: 0
        Count: 1
> ACL Data RX: Handle 0 flags 0x02 dlen 5                                            #50 [hci0] 16.853157
      ATT: Write Response (0x13) len 0
< ACL Data TX: Handle 0 flags 0x00 dlen 9                                            #51 [hci0] 16.853710
      ATT: Write Request (0x12) len 4
        Handle: 0x0004
          Data: 0200
> HCI Event: Number of Completed Packets (0x13) plen 5                               #52 [hci0] 16.903379
        Num handles: 1
        Handle: 0
        Count: 1
> ACL Data RX: Handle 0 flags 0x02 dlen 5                                            #53 [hci0] 16.953144
      ATT: Write Response (0x13) len 0
< ACL Data TX: Handle 0 flags 0x00 dlen 7                                            #54 [hci0] 16.953643
      ATT: Read Request (0x0a) len 2
        Handle: 0x000b
> HCI Event: Number of Completed Packets (0x13) plen 5                               #55 [hci0] 17.003379
        Num handles: 1
        Handle: 0
        Count: 1
> ACL Data RX: Handle 0 flags 0x02 dlen 7                                            #56 [hci0] 17.053150
      ATT: Read Response (0x0b) len 2
        Value: 5450
< ACL Data TX: Handle 0 flags 0x00 dlen 7                                            #57 [hci0] 17.053632
      ATT: Read Request (0x0a) len 2
        Handle: 0x000d
> HCI Event: Number of Completed Packets (0x13) plen 5                               #58 [hci0] 17.103383
        Num handles: 1
        Handle: 0
        Count: 1
> ACL Data RX: Handle 0 flags 0x02 dlen 7                                            #59 [hci0] 17.153146
      ATT: Read Response (0x0b) len 2
        Value: 0000
< ACL Data TX: Handle 0 flags 0x00 dlen 7                                            #60 [hci0] 17.153619
      ATT: Read Request (0x0a) len 2
        Handle: 0x0012
> HCI Event: Number of Completed Packets (0x13) plen 5                               #61 [hci0] 17.203383
        Num handles: 1
        Handle: 0
        Count: 1
> ACL Data RX: Handle 0 flags 0x02 dlen 8                                            #62 [hci0] 17.253153
      ATT: Read Response (0x0b) len 3
        Value: 64000a
< ACL Data TX: Handle 0 flags 0x00 dlen 10                                           #63 [hci0] 17.256246
      ATT: Write Request (0x12) len 5
        Handle: 0x0012
          Data: 0300f3
> HCI Event: Number of Completed Packets (0x13) plen 5                               #64 [hci0] 17.303367
        Num handles: 1
        Handle: 0
        Count: 1
> ACL Data RX: Handle 0 flags 0x02 dlen 5                                            #65 [hci0] 17.353146
      ATT: Write Response (0x13) len 0
< ACL Data TX: Handle 0 flags 0x00 dlen 9                                            #66 [hci0] 17.356010
      ATT: Write Request (0x12) len 4
        Handle: 0x0015
          Data: 0100
> HCI Event: Number of Completed Packets (0x13) plen 5                               #67 [hci0] 17.403368
        Num handles: 1
        Handle: 0
        Count: 1
> ACL Data RX: Handle 0 flags 0x02 dlen 5                                            #68 [hci0] 17.453145
      ATT: Write Response (0x13) len 0
> ACL Data RX: Handle 0 flags 0x02 dlen 250                                          #69 [hci0] 17.454869
      ATT: Handle Value Notification (0x1b) len 245
        Handle: 0x0014
          Data: 000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f202122232425262728292a2b2c2d2e2f303132333435363738393a3b3c3d3e3f404142434445464748494a4b4c4d4e4f505152535455565758595a5b5c5d5e5f606162636465666768696a6b6c6d6e6f707172737475767778797a7b7c7d7e7f808182838485868788898a8b8c8d8e8f909192939495969798999a9b9c9d9e9fa0a1a2a3a4a5a6a7a8a9aaabacadaeafb0b1b2b3b4b5b6b7b8b9babbbcbdbebfc0c1c2c3c4c5c6c7c8c9cacbcccdcecfd0d1d2d3d4d5d6d7d8d9dadbdcdddedfe0e1e2e3e4e5e6e7e8e9eaebecedeeeff0f1f2
> ACL Data RX: Handle 0 flags 0x02 dlen 250                                          #70 [hci0] 17.456279
      ATT: Handle Value Notification (0x1b) len 245
        Handle: 0x0014
          Data: 000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f202122232425262728292a2b2c2d2e2f303132333435363738393a3b3c3d3e3f404142434445464748494a4b4c4d4e4f505152535455565758595a5b5c5d5e5f606162636465666768696a6b6c6d6e6f707172737475767778797a7b7c7d7e7f808182838485868788898a8b8c8d8e8f909192939495969798999a9b9c9d9e9fa0a1a2a3a4a5a6a7a8a9aaabacadaeafb0b1b2b3b4b5b6b7b8b9babbbcbdbebfc0c1c2c3c4c5c6c7c8c9cacbcccdcecfd0d1d2d3d4d5d6d7d8d9dadbdcdddedfe0e1e2e3e4e5e6e7e8e9eaebecedeeeff0f1f2
> ACL Data RX: Handle 0 flags 0x02 dlen 250                                          #71 [hci0] 17.457622
      ATT: Handle Value Notification (0x1b) len 245
        Handle: 0x0014
          Data: 000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f202122232425262728292a2b2c2d2e2f303132333435363738393a3b3c3d3e3f404142434445464748494a4b4c4d4e4f505152535455565758595a5b5c5d5e5f606162636465666768696a6b6c6d6e6f707172737475767778797a7b7c7d7e7f808182838485868788898a8b8c8d8e8f909192939495969798999a9b9c9d9e9fa0a1a2a3a4a5a6a7a8a9aaabacadaeafb0b1b2b3b4b5b6b7b8b9babbbcbdbebfc0c1c2c3c4c5c6c7c8c9cacbcccdcecfd0d1d2d3d4d5d6d7d8d9dadbdcdddedfe0e1e2e3e4e5e6e7e8e9eaebecedeeeff0f1f2

...

> ACL Data RX: Handle 0 flags 0x02 dlen 250                                         #378 [hci0] 18.414089
      ATT: Handle Value Notification (0x1b) len 245
        Handle: 0x0014
          Data: 000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f202122232425262728292a2b2c2d2e2f303132333435363738393a3b3c3d3e3f404142434445464748494a4b4c4d4e4f505152535455565758595a5b5c5d5e5f606162636465666768696a6b6c6d6e6f707172737475767778797a7b7c7d7e7f808182838485868788898a8b8c8d8e8f909192939495969798999a9b9c9d9e9fa0a1a2a3a4a5a6a7a8a9aaabacadaeafb0b1b2b3b4b5b6b7b8b9babbbcbdbebfc0c1c2c3c4c5c6c7c8c9cacbcccdcecfd0d1d2d3d4d5d6d7d8d9dadbdcdddedfe0e1e2e3e4e5e6e7e8e9eaebecedeeeff0f1f2
> ACL Data RX: Handle 0 flags 0x02 dlen 250                                         #379 [hci0] 18.415506
      ATT: Handle Value Notification (0x1b) len 245
        Handle: 0x0014
          Data: 000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f202122232425262728292a2b2c2d2e2f303132333435363738393a3b3c3d3e3f404142434445464748494a4b4c4d4e4f505152535455565758595a5b5c5d5e5f606162636465666768696a6b6c6d6e6f707172737475767778797a7b7c7d7e7f808182838485868788898a8b8c8d8e8f909192939495969798999a9b9c9d9e9fa0a1a2a3a4a5a6a7a8a9aaabacadaeafb0b1b2b3b4b5b6b7b8b9babbbcbdbebfc0c1c2c3c4c5c6c7c8c9cacbcccdcecfd0d1d2d3d4d5d6d7d8d9dadbdcdddedfe0e1e2e3e4e5e6e7e8e9eaebecedeeeff0f1f2
> ACL Data RX: Handle 0 flags 0x02 dlen 250                                         #380 [hci0] 18.416869
      ATT: Handle Value Notification (0x1b) len 245
        Handle: 0x0014
          Data: 000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f202122232425262728292a2b2c2d2e2f303132333435363738393a3b3c3d3e3f404142434445464748494a4b4c4d4e4f505152535455565758595a5b5c5d5e5f606162636465666768696a6b6c6d6e6f707172737475767778797a7b7c7d7e7f808182838485868788898a8b8c8d8e8f909192939495969798999a9b9c9d9e9fa0a1a2a3a4a5a6a7a8a9aaabacadaeafb0b1b2b3b4b5b6b7b8b9babbbcbdbebfc0c1c2c3c4c5c6c7c8c9cacbcccdcecfd0d1d2d3d4d5d6d7d8d9dadbdcdddedfe0e1e2e3e4e5e6e7e8e9eaebecedeeeff0f1f2
<stuck here>

@carlescufi
Copy link
Member

The reason for the Unable to allocate TX context is because bt_gatt_notify is called from the system work queue. We cannot block from this thread, since this is the thread that would release them, and blocking would cause a deadlock.

@Oxymoron79 could you please move the call to bt_gatt_notify() out of the system workqueue and then see if this still reproduces the issue?

@Oxymoron79
Copy link
Author

The reason for the Unable to allocate TX context is because bt_gatt_notify is called from the system work queue. We cannot block from this thread, since this is the thread that would release them, and blocking would cause a deadlock.

@Oxymoron79 could you please move the call to bt_gatt_notify() out of the system workqueue and then see if this still reproduces the issue?

@carlescufi You mean to use a separate workqueue to call bt_gatt_notify() in, right? I did that and it indeed solves the issue.

I put that version in the notification_workqueue branch of my repository.

@carlescufi
Copy link
Member

@carlescufi You mean to use a separate workqueue to call bt_gatt_notify() in, right? I did that and it indeed solves the issue.

Right, so this probably requires a fix in the documentation. @joerchan will send a PR.

I put that version in the notification_workqueue branch of my repository.

Can we close this GitHub issue now?

@joerchan
Copy link
Contributor

joerchan commented Mar 1, 2021

Can we close this GitHub issue now?

Assuming that the "stuck" was because the error code returned (data_work_handler: bt_gatt_notify returned: -57) wasn't handled then I think this can be closed with the documentation update.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: Bluetooth area: USB Universal Serial Bus bug The issue is a bug, or the PR is fixing a bug priority: medium Medium impact/importance bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants