Skip to content

Conversation

@LingaoM
Copy link
Contributor

@LingaoM LingaoM commented Jun 21, 2024

The default main branch, cmd complete & cmd status use common normal net_buf pool, will be increase dead-lock, so split it to separate.

This always ensures that the event flow is not blocked.

@zephyrbot zephyrbot added area: Bluetooth area: Bluetooth Host Bluetooth Host (excluding BR/EDR) labels Jun 21, 2024
@LingaoM LingaoM force-pushed the split_pool branch 2 times, most recently from a3ca302 to cd33a8b Compare June 21, 2024 07:39
@LingaoM LingaoM added the Enhancement Changes/Updates/Additions to existing features label Jun 21, 2024
@LingaoM LingaoM force-pushed the split_pool branch 3 times, most recently from f49cf94 to 37d5879 Compare June 21, 2024 08:31
@LingaoM LingaoM force-pushed the split_pool branch 3 times, most recently from 96db9d2 to bdec445 Compare June 21, 2024 09:18
The default main branch, cmd complete & cmd status use common normal
net_buf pool, will be increase dead-lock, so split it to separate.

Signed-off-by: Lingao Meng <menglingao@xiaomi.com>
Copy link
Contributor

@Thalley Thalley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not yet sure whether this is a good change or not.

Do you have any logs or (even better) some tests that show the issue you are trying to solve, which shows that the issue is solved by this PR?

@LingaoM
Copy link
Contributor Author

LingaoM commented Jun 22, 2024

I'm not yet sure whether this is a good change or not.

Do you have any logs or (even better) some tests that show the issue you are trying to solve, which shows that the issue is solved by this PR?

#74287 (comment)

Maybe this answer by @alwa-nordic better.

Comment on lines +116 to +118
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the range and default value here due to the fallback mechanism you have defined?

If we did not fallback to the event buffers in case of error, could this be reduced? Do we have any events larger than ~32 without ext_adv?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have also thought of a similar approach, but considering that almost all configurations now use extended advertising as a default option, the value of this optimization is not high.

Copy link
Contributor Author

@LingaoM LingaoM Jun 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current design will add about 140 bytes of RAM overhead, but it ensures relative safety, so I think it is worth it and change this way will not be noticeable to users.

Comment on lines +120 to +121
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that valid behavior of the controller?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is the command complete for NOP opcode. This is legal to send on boot, and we have a quirk in the host to wait for it, as it is somewhat useful over H4.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also: #64158

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently normal event pool will also use by extended advertising event report, this is a huge activity, Also other events such as periodic advertising report. So for command complete and command status also use same normal pool not a good way.

Copy link
Contributor

@Thalley Thalley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change looks good to me, but it looks to me that we can get some significant optimization from the default size of BT_BUF_CMD_COMPLETE_EVT_SIZE

jhedberg
jhedberg previously approved these changes Jun 24, 2024
@LingaoM LingaoM requested a review from Thalley June 25, 2024 01:00
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me this reads as "if you use a split build (ie hci on another chip), then you better use 255".
So I propose you also add default 255 if !BT_CTLR. Same remark for the other kconfig.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

After zephyrproject-rtos#72135
there are no-used at all, so deleted.

Signed-off-by: Lingao Meng <menglingao@xiaomi.com>
jori-nordic
jori-nordic previously approved these changes Jun 25, 2024
@aescolar aescolar added this to the v4.0.0 milestone Jun 26, 2024
@alwa-nordic
Copy link
Contributor

Do we need a pool for Command Complete? I think we don't need the new pool and can save the memory by putting Command Complete events into the single-buffer sync_evt_pool.

The current implementation only needs one event buffer, since we copy out the data immediately in hci_cmd_done. But even with the changes proposed in #74287, #74287 (comment) says we will never block the buffer.

@LingaoM
Copy link
Contributor Author

LingaoM commented Jun 28, 2024

Do we need a pool for Command Complete? I think we don't need the new pool and can save the memory by putting Command Complete events into the single-buffer sync_evt_pool.

Three reasons to keep a separate pool for command complete events:

  1. Please do not couple command complete event and bluetooth connection, since sync_evt_pool only used with connection, but command will use in any scenes. and also the default size of SYNC_EVT_SIZE less than 68, this does not save memory, but it is more likely to cause dead-lock anyway.
  2. Currently normal event pool will also use by extended advertising event report, this is a huge activity, Also other events such as periodic advertising report. So for command complete and command status also use same normal pool not a good way.
  3. The current design will add about 140 bytes of RAM overhead, but it ensures relative safety, so I think it is worth it and change this way will not be noticeable to users.

@LingaoM
Copy link
Contributor Author

LingaoM commented Jun 28, 2024

The current implementation only needs one event buffer, since we copy out the data immediately in hci_cmd_done. But even with the changes proposed in #74287, #74287 (comment) says we will never block the buffer.

I fully understand that currently only one is supported, but the current configuration is two:

config BT_BUF_CMD_TX_COUNT
	int "Number of HCI command buffers"
	default 2
	range 2 64
	help
	  Number of buffers available for outgoing HCI commands from the Host.

Another advantage of using two here is that we can always ensure that there is always a free buffer for command complete event even in some extreme scenarios, if the Controller reports a message inexplicably, we can still tolerate the error., so I use K_NO_WAIT instead of timeout, to avoiding dead-lock.

@jori-nordic
Copy link
Contributor

but it is more likely to cause dead-lock anyway

What deadlock are you hinting at?

@jori-nordic
Copy link
Contributor

even in some extreme scenarios, if the Controller reports a message inexplicably, we can still tolerate the error., so I use K_NO_WAIT instead of timeout, to avoiding dead-lock

Can you elaborate on this deadlock scenario? What sequence of events/data would get us into this state. Have you seen it in manual testing?

@jori-nordic jori-nordic dismissed their stale review July 1, 2024 08:17

I don't understand the issue this patch is supposed to address.

@LingaoM
Copy link
Contributor Author

LingaoM commented Jul 4, 2024

even in some extreme scenarios, if the Controller reports a message inexplicably, we can still tolerate the error., so I use K_NO_WAIT instead of timeout, to avoiding dead-lock

Can you elaborate on this deadlock scenario? What sequence of events/data would get us into this state. Have you seen it in manual testing?

Please see:#68008 (comment)
In this PR, allocate for command complete events will not block at all, because use k_NO_WAIT.

@LingaoM
Copy link
Contributor Author

LingaoM commented Jul 4, 2024

but it is more likely to cause dead-lock anyway

What deadlock are you hinting at?

#70153

@LingaoM LingaoM closed this Jul 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area: Bluetooth Host Bluetooth Host (excluding BR/EDR) area: Bluetooth Enhancement Changes/Updates/Additions to existing features

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants