Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bluetooth: host: bt_unpair failed because function [bt_conn_set_state] wont work as expected #38853

Closed
briliant-ben opened this issue Sep 27, 2021 · 3 comments
Assignees
Labels
area: Bluetooth Host area: Bluetooth bug The issue is a bug, or the PR is fixing a bug

Comments

@briliant-ben
Copy link

briliant-ben commented Sep 27, 2021

Describe the bug
When a central wants to disconnect from a peripheral with unpair(), it wont work as expected because the bt_conn_set_state() is not a thread-safe function, I think.

First, the function conn_disconnect(), it has two steps, the former will send a disconenct HCI cmd, th later will set connection state, while between them, if the HCI diconnect EVT comes and rx_thread has higher priority, the conn->state may will experience a flow: BT_CONN_CONNECTED -> BT_CONN_DISCONNECT_COMPLETE -> BT_CONN_DISCONNECTED -> BT_CONN_DISCONNECT.

Then, if this thread switch happens in bt_conn_set_state(), we wont get the right state as expected and do bt_conn_ref(conn), would cause a failure when connect to this peripheral next time with err log "Found valid connection in disconnect state".

static int conn_disconnect(struct bt_conn *conn, uint8_t reason)
{
	int err;

	err = bt_hci_disconnect(conn->handle, reason);
	if (err) {
		return err;
	}

	if (conn->state == BT_CONN_CONNECTED) {   // <-- the precondition BT_CONN_CONNECTED
		bt_conn_set_state(conn, BT_CONN_DISCONNECT);
	}

	return 0;
}

void bt_conn_set_state(struct bt_conn *conn, bt_conn_state_t state)
{
	bt_conn_state_t old_state;

	BT_DBG("%s -> %s", state2str(conn->state), state2str(state));

	if (conn->state == state) {
		BT_WARN("no transition %s", state2str(state));
		return;
	}

	old_state = conn->state;   //<-- get the state again, maybe it has already been modified
	conn->state = state;
	// so following flow wont work correctly
	......
}

To Reproduce
Steps to reproduce the behavior:

  1. make menuconfig and make sure the rx_thread has higher priority than tx_thread,
  2. let central connects to a peripheral,
  3. Here I used a shell cmd like 'bt clear 00:E0:4C:09:E1:CE public' to disconnect.
  4. when reconnect, will have "Connection failed (-22)"

Expected behavior
Maybe the conn->state experience a flow: BT_CONN_CONNECTED -> BT_CONN_DISCONNECT --> BT_CONN_DISCONNECT_COMPLETE -> BT_CONN_DISCONNECTED.

Impact
annoyance: reconnect failed.

Logs and console output
NOTE: it`s not a standard zephyr console
we will have OK flow log like this:

[2021/9/27 9:48:30] ATBZ=bt,clear,00:E0:4C:09:E1:CE,public
......
[2021/9/27 9:48:30] [BT][DBG][bt_conn][bt_conn_set_state]	connected -> disconnect-complete
......
[2021/9/27 9:48:30] [BT][DBG][bt_conn][bt_conn_set_state]	disconnect-complete -> disconnected
......
[2021/9/27 9:48:30] Disconnected: 00:E0:4C:09:E1:CE (public) (reason 0x16)
[2021/9/27 9:48:30] [BT][DBG][bt_conn][bt_conn_unref]	handle 17 ref 4 -> 3
[2021/9/27 9:48:30] [BT][DBG][bt_conn][bt_conn_unref]	handle 17 ref 3 -> 2
[2021/9/27 9:48:30] [BT][DBG][bt_conn][bt_conn_unref]	handle 0 ref 2 -> 1
[2021/9/27 9:48:30] [BT][DBG][bt_hci_core][hci_rx_thread]	calling fifo_get_wait
[2021/9/27 9:48:30] [BT][DBG][bt_hci_core][hci_rx_thread]	buf ->10010d68 type 1 len 12
[2021/9/27 9:48:30] [BT][DBG][bt_hci_core][hci_event]	event 0xff
[2021/9/27 9:48:30] [BT][DBG][bt_hci_core][hci_rx_thread]	calling fifo_get_wait
[2021/9/27 9:48:30] d_send_sync]	rsp ->1000dbbc opcode 0x0406 len 0
[2021/9/27 9:48:30] [BT][DBG][bt_conn][bt_conn_unref]	handle 0 ref 1 -> 0
[2021/9/27 9:48:30] [BT][DBG][bt_keys][bt_keys_clear]	00:E0:4C:09:E1:CE (public) (keys 0x001d)
[2021/9/27 9:48:30] [BT][DBG][bt_id][bt_id_del]	addr 00:E0:4C:09:E1:CE (public)
[2021/9/27 9:48:30] [BT][DBG][bt_gatt][find_sc_cfg]	id: 0, addr: 00:E0:4C:09:E1:CE (public)
[2021/9/27 9:48:30] Pairing successfully cleared
......
[2021/9/27 9:48:47] ATBZ=bt,connect,00:E0:4C:09:E1:CE,public
[2021/9/27 9:48:47] [BT][DBG][bt_keys][bt_keys_find_irk]	00:E0:4C:09:E1:CE (public)
[2021/9/27 9:48:47] [BT][DBG][bt_conn][bt_conn_set_state]	disconnected -> connect-scan
[2021/9/27 9:48:47] [BT][DBG][bt_conn][bt_conn_ref]	handle 0 ref 1 -> 2
[2021/9/27 9:48:47] [BT][DBG][bt_conn][bt_conn_ref]	handle 0 ref 2 -> 3
[2021/9/27 9:48:47] [BT][DBG][bt_conn][bt_conn_unref]	handle 0 ref 3 -> 2
[2021/9/27 9:48:47] [BT][DBG][bt_conn][bt_conn_ref]	handle 0 ref 2 -> 3
......
[2021/9/27 9:48:49] Connection pending
......

and have FAIL flow log like this:

[2021/9/27 9:49:20] ATBZ=bt,clear,00:E0:4C:09:E1:CE,public
......
[2021/9/27 9:49:20] [BT][DBG][bt_conn][bt_conn_set_state]	connected -> disconnect-complete
......
[2021/9/27 9:49:20] [BT][DBG][bt_conn][bt_conn_set_state]	disconnect-complete -> disconnected
.....
[2021/9/27 9:49:20] Disconnected: 00:E0:4C:09:E1:CE (public) (reason 0x16)
[2021/9/27 9:49:20] [BT][DBG][bt_conn][bt_conn_unref]	handle 17 ref 4 -> 3
[2021/9/27 9:49:20] [BT][DBG][bt_conn][bt_conn_unref]	handle 17 ref 3 -> 2
[2021/9/27 9:49:20] [BT][DBG][bt_conn][bt_conn_unref]	handle 0 ref 2 -> 1
[2021/9/27 9:49:20] [BT][DBG][bt_hci_core][hci_rx_thread]	calling fifo_get_wait
[2021/9/27 9:49:20] [BT][DBG][bt_hci_core][hci_rx_thread]	buf ->10010d84 type 1 len 12
[2021/9/27 9:49:20] [BT][DBG][bt_hci_core][hci_event]	event 0xff
[2021/9/27 9:49:20] [BT][DBG][bt_hci_core][hci_rx_thread]	calling fifo_get_wait
[2021/9/27 9:49:20] [BT][DBG][bt_conn][bt_conn_set_state]	connected -> disconnect
[2021/9/27 9:49:20] [BT][DBG][bt_conn][bt_conn_ref]	handle 0 ref 1 -> 2
[2021/9/27 9:49:20] [BT][DBG][bt_conn][bt_conn_unref]	handle 0 ref 2 -> 1
[2021/9/27 9:49:20] [BT][DBG][bt_keys][bt_keys_clear]	00:E0:4C:09:E1:CE (public) (keys 0x001d)
[2021/9/27 9:49:20] [BT][DBG][bt_id][bt_id_del]	addr 00:E0:4C:09:E1:CE (public)
[2021/9/27 9:49:20] [BT][DBG][bt_gatt][find_sc_cfg]	id: 0, addr: 00:E0:4C:09:E1:CE (public)
[2021/9/27 9:49:20] Pairing successfully cleared
......
[2021/9/27 9:49:37] ATBZ=bt,connect,00:E0:4C:09:E1:CE,public
[2021/9/27 9:49:37] [BT][DBG][bt_conn][bt_conn_ref]	handle 0 ref 1 -> 2
[2021/9/27 9:49:37] [BT][WRN][bt_conn][bt_conn_exists_le]	Found valid connection in disconnect state
[2021/9/27 9:49:37] [BT][DBG][bt_conn][bt_conn_unref]	handle 0 ref 2 -> 1
[2021/9/27 9:49:37] Connection failed (-22)
....

Environment (please complete the following information):
Not a standard zephyr development environement

Additional context

  • If I am right, I think, the state-machine of connection state shouldn't directly use bt_conn_set_state, maybe we should
    • use a lock to protect conn->state transition ?
    • put conn->state transition in a single thread?
    • now i just modified function conn_disconnect like this to avoid the problem (NOT SURE):
static int conn_disconnect(struct bt_conn *conn, uint8_t reason)
{
	int err;

	if (conn->state == BT_CONN_CONNECTED) {
		bt_conn_set_state(conn, BT_CONN_DISCONNECT);
	}

	err = bt_hci_disconnect(conn->handle, reason);
	if (err) {
		return err;
	}

	return 0;
}
  • a little bug I found (maybe), put here, so a new issue wont be needed:
static int bt_smp_br_accept(struct bt_conn *conn, struct bt_l2cap_chan **chan)
{
......
	for (i = 0; i < ARRAY_SIZE(bt_smp_pool); i++) {   // <-- bt_smp_pool
		struct bt_smp_br *smp = &bt_smp_br_pool[i]; // <-- bt_smp_br_pool
......
}
@briliant-ben briliant-ben added the bug The issue is a bug, or the PR is fixing a bug label Sep 27, 2021
@briliant-ben briliant-ben changed the title Bluetooth: host: bt_unpair failed due to function [bt_conn_set_state] wont work as expected Bluetooth: host: bt_unpair failed because function [bt_conn_set_state] wont work as expected Sep 27, 2021
@joerchan
Copy link
Contributor

make menuconfig and make sure the rx_thread has higher priority than tx_thread,

We have this:

/* The Bluetooth subsystem requires the Tx thread to execute at higher priority
 * than the Rx thread as the Tx thread needs to process the acknowledgements
 * before new Rx data is processed. This is a necessity to correctly detect
 * transaction violations in ATT and SMP protocols.
 */
BUILD_ASSERT(CONFIG_BT_HCI_TX_PRIO < CONFIG_BT_RX_PRIO);

Are you even able to build it like this?

@briliant-ben
Copy link
Author

OH, I made a mistake.

Thanks for your help!

I haven't studied the zephyr kernel scheduling mechanism yet, if there wont be a thread switching in bt_conn_set_state() (in conn_disconnect), then it wont fail.

@carlescufi
Copy link
Member

Thanks for replying back. I will close this, but feel free to comment further if you need help.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: Bluetooth Host area: Bluetooth bug The issue is a bug, or the PR is fixing a bug
Projects
None yet
Development

No branches or pull requests

4 participants