
Bluetooth: controller: Fix premature connection event close #28089

Merged

Conversation


@cvinayak cvinayak commented Sep 6, 2020

Fixes the following:

  1. Fix premature connection event close due to new Tx buffers not being de-multiplexed and routed to the connection's Lower Link Layer context when they arrive while inside the connection's radio event.

  2. Fix central role event prepare to demux and enqueue two Tx buffers so that MD bit can be set correctly.

  3. Add force MD bit feature wherein connection events can be extended to match the rate at which applications provide new Tx data.
    The MD bit in Tx PDUs is forced on if Tx/Rx has happened CONFIG_BT_CTLR_TX_BUFFERS times in a connection event. The assumption here is that, if the controller's Tx buffers are full, it is probable that the application will send more Tx data during the connection event, and keeping the connection event alive will help improve overall throughput.

  4. Added a feature to measure incoming Tx throughput in-system.

  5. Added automatic runtime calculation of Forced MD bit count based on incoming Tx buffer throughput.

Fixes #27981.
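The keep-alive decision in item 3 could be sketched as follows. This is a minimal, hypothetical illustration of the described logic, not the actual Zephyr controller code; `force_md_bit`, `TX_BUFFERS` and `FORCE_MD_MAX` are illustrative names standing in for the real symbols and Kconfig options:

```c
#include <stdbool.h>
#include <stdint.h>

#define TX_BUFFERS    5  /* stands in for CONFIG_BT_CTLR_TX_BUFFERS */
#define FORCE_MD_MAX  7  /* stands in for CONFIG_BT_CTLR_FORCE_MD_COUNT */

/* Per Tx PDU: decide whether to force the MD (More Data) bit even though
 * the Tx queue is momentarily empty, keeping the connection event open so
 * the application has time to push the next buffer over HCI.
 */
static bool force_md_bit(uint16_t trx_cnt, uint8_t force_md_cnt)
{
    /* Only engage in high-throughput events: all Tx buffers were used. */
    if (trx_cnt < TX_BUFFERS) {
        return false;
    }
    /* Force MD for up to FORCE_MD_MAX extra (empty) PDUs. */
    return force_md_cnt < FORCE_MD_MAX;
}
```

With these example values, the event stays alive for at most 7 extra empty-PDU exchanges, and only when the event has already consumed all Tx buffers.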

Signed-off-by: Vinayak Kariappa Chettimada vich@nordicsemi.no


cvinayak commented Sep 6, 2020

@jjamesson FYI

Please try the following Kconfig options in your hci_uart application:
CONFIG_BT_CTLR_TX_BUFFERS=5
CONFIG_BT_CTLR_ADVANCED_FEATURES=y
CONFIG_BT_CTLR_FORCE_MD_COUNT=7

Below is the throughput test of an nRF52840 hci_uart board and an nRF52840 throughput sample board connected to each other using 5 wires, with the UART at 1 Mbps baud rate:
[image: throughput test screenshot]

No. of times to force MD bit to be set in Tx PDU after a successful
transmission of non-empty PDU.

This will prolong the connection event from being closed in cases

Would it make sense to allow runtime reconfiguration of this value? When it's non zero does it mean higher power consumption for all connections?


Would it make sense to allow runtime reconfiguration of this value?

I can't think of an easy way for this value to be determined at runtime.
If the rate at which Tx buffers are supplied is one every 2.6 ms, and a BLE Trx of an empty PDU takes 388 µs, this number shall be higher than ~7.

When it's non zero does it mean higher power consumption for all connections?

Yes, power consumption per connection event with non-empty PDUs will be higher. I will make this feature kick in only if the Trx count in a radio event is BT_CTLR_TX_BUFFERS or more; this way only high-throughput scenarios will be affected.

@@ -50,6 +50,11 @@ static uint16_t trx_cnt;
static uint8_t mic_state;
#endif /* CONFIG_BT_CTLR_LE_ENC */

#if defined(CONFIG_BT_CTLR_FORCE_MD_COUNT) && \

As I am a big fan of avoiding the preprocessor, I would do something like:

#if defined(CONFIG_BT_CTLR_FORCE_MD_COUNT) && (CONFIG_BT_CTLR_FORCE_MD_COUNT > 0)
#define FORCE_MD_COUNT CONFIG_BT_CTLR_FORCE_MD_COUNT 
#else
#define FORCE_MD_COUNT 0
#endif

And use that everywhere else as a C value.
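The reviewer's suggestion in use might look like the sketch below. The `should_force_md` function and counter are hypothetical names for illustration only; the point is that with `FORCE_MD_COUNT` defined unconditionally, the compiler folds the dead branch away and no `#ifdef` is needed at the use site:

```c
#include <stdint.h>

/* Define FORCE_MD_COUNT unconditionally, per the review suggestion. */
#if defined(CONFIG_BT_CTLR_FORCE_MD_COUNT) && (CONFIG_BT_CTLR_FORCE_MD_COUNT > 0)
#define FORCE_MD_COUNT CONFIG_BT_CTLR_FORCE_MD_COUNT
#else
#define FORCE_MD_COUNT 0
#endif

static uint8_t md_force_cnt;  /* illustrative state, not a real symbol */

static int should_force_md(void)
{
    /* When FORCE_MD_COUNT is 0, this whole branch is compile-time dead
     * code and the function reduces to "return 0" with no preprocessor
     * conditionals at the call site. */
    if (FORCE_MD_COUNT > 0 && md_force_cnt < FORCE_MD_COUNT) {
        md_force_cnt++;
        return 1;
    }
    return 0;
}
```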


Yes, that will reduce the conditional-compile noise.

@cvinayak cvinayak force-pushed the github_hci_uart_throughput branch 2 times, most recently from 2928fb7 to 3bf04a7 Compare September 7, 2020 12:06

jjamesson commented Sep 7, 2020

Hi, @cvinayak,

I have tried the contents of this PR on my embedded platform (see thread #27981), and indeed the throughput increased for me. However, I am unable to match the speeds that you have demonstrated.

Where previously I had throughputs of ~410 kbps and ~100 kbps (with connection intervals of 6 units and 31 units respectively), I now achieve ~570 kbps and ~360 kbps. This is definitely better, but I still expect this number could go higher.

I did some troubleshooting, and I see that the controller occasionally utilizes the connection interval pretty well, but then at times starts stuttering again when sending back ACKs. See the image below, where I use a connection interval of 31 units.

With regards,
James

[image: oscilloscope capture (osc5)]

@cvinayak cvinayak added this to the v2.4.0 milestone Sep 7, 2020

cvinayak commented Sep 7, 2020

@jjamesson I suspect the rate at which new Tx buffers are made available is not fast enough; hence BLE connection events are closed. The keep-alive feature using CONFIG_BT_CTLR_FORCE_MD_COUNT=7 may need an increased count. Is there a way you can measure the time your uC takes to send 2 Tx packets over UART? For example, if a Tx packet takes 2.6 ms over UART, then 2.6 ms / 388 µs (transmission of empty PDUs over the 2M PHY) gives ~7. Based on your measurement, increase the number.

I have updated the PR, please fetch latest for your testing.
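The back-of-the-envelope calculation above (UART transfer time of one Tx packet divided by the empty-PDU Tx/Rx time, rounded up) could be expressed as a tiny helper. This is only a sketch of the arithmetic from the discussion; the function name is illustrative and the 2.6 ms / 388 µs figures are the example values from this thread, not measured constants:

```c
#include <stdint.h>

/* Estimate the Force MD count needed to bridge the UART delay for one
 * Tx packet, measured in empty-PDU Tx/Rx slots. Rounds up, since any
 * fraction of a slot still needs one more forced MD bit.
 */
static uint32_t force_md_count_estimate(uint32_t tx_pkt_us, uint32_t empty_trx_us)
{
    return (tx_pkt_us + empty_trx_us - 1) / empty_trx_us;
}

/* force_md_count_estimate(2600, 388) -> 7, matching the ~7 above */
```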


jjamesson commented Sep 7, 2020

Is there a way you can measure the time your uC takes to send 2 Tx packets over UART? For example, if a Tx packet takes 2.6 ms over UART, then 2.6 ms / 388 µs (transmission of empty PDUs over the 2M PHY) gives ~7.

@cvinayak,

Yes, I can easily view the TX times via an oscilloscope. I see that one TX packet does indeed take 2.6 ms to transfer, and for two sequential packets the transfer time is 5.2 ms (without an ACK in between). Thus I would assume that CONFIG_BT_CTLR_FORCE_MD_COUNT=7 would be suitable.

I did try values 8 and 9 as well, without any increase in throughput. Having CONFIG_BT_CTLR_FORCE_MD_COUNT=9 resulted in a throughput decrease.

PS. Perhaps it was meant this way, but the latest commit you pushed results in a throughput decrease in general: I am able to achieve a throughput of 478 kbps with this SHA, whereas with the last-but-one commit it was 568 kbps.

@jjamesson I suspect the rate at which new Tx buffers are made available is not fast enough; hence BLE connection events are closed.

Please note the image displayed previously. The green signal represents the ACKs from the controller. As I see it, in some places the ACKs over time are not very frequent, but then all of a sudden the controller starts doing faster bursts. Would this be expected if the number of TX buffers is 0?

With regards,
James


cvinayak commented Sep 8, 2020

Hi @jjamesson

PS. Perhaps it was meant this way, but the latest commit you pushed results in a throughput decrease in general: I am able to achieve a throughput of 478 kbps with this SHA, whereas with the last-but-one commit it was 568 kbps.

The commit added was ae42af0; I forgot to cherry-pick it into this PR when originally pushed. This commit removes the redundant, faulty design left when adding the fix in 9132f46.

Please note the image displayed previously. The green signal represents the ACKs from the controller. As I see it, in some places the ACKs over time are not very frequent, but then all of a sudden the controller starts doing faster bursts. Would this be expected if the number of TX buffers is 0?

Reasons for connection events to close:

  • No more data to Tx to the peer and no more Rx data from the peer
  • Two consecutive packet receptions with CRC error (radio channel collision, other 2.4 GHz transmissions/noise around)
  • Peer device closed the connection event (peer radio scheduling, BR/EDR/Wi-Fi chip coexistence, shared radio timeline/antenna)
  • Close to the next connection interval

My throughput numbers are with an nRF52dk_nrf52832 running the throughput sample from the NCS repository as the peer. This application will not close the connection event if received PDUs have the MD bit set. With the >690 kbps throughput reported by the application, I am assuming I don't have CRC errors.

If you can provide me nRF sniffer log for your latest throughput testing, I can conclude on the reason for your bursts.

-Vinayak

@jjamesson

Hi, @cvinayak,

The commit added was ae42af0; I forgot to cherry-pick it into this PR when originally pushed. This commit removes the redundant, faulty design left when adding the fix in 9132f46.

When I tried this, I cloned your repo and checked out the github_hci_uart_throughput branch, so I was referencing your 2 latest commits, namely f6e16bc5 and a922a9e4.

... I can conclude on the reason for your bursts.

I see that the bursts were present with the a922a9e4 commit, and I'm not able to reproduce them with the latest f6e16bc5 commit. The incoming ACKs are more controlled, but then again the throughput is decreased by the absence of these bursts, and I still get few ACKs per connection interval.

Here's the nRF sniffer log for a connection interval of 31. Maybe this uncovers something.

With regards,
James

@cvinayak cvinayak marked this pull request as draft September 8, 2020 10:54

cvinayak commented Sep 8, 2020

[image: nRF sniffer trace screenshot]

@jjamesson from the sniffer trace, a new Tx packet was not made available within the Forced MD bit keep-alive time (I observe ~3.9 ms between packet no. 2712 and 2727). The next connection event did not have all its BT_CTLR_TX_BUFFERS filled, hence the keep-alive logic did not fire. A similar pattern appears around packets 492 to 507, etc.

I am on Slack with the nickname vich, and available in Zephyr's Bluetooth channel. If our timezones overlap, we can discuss live.

@carlescufi

@jjamesson please use this link to join Slack if you would like to discuss this in real-time: https://tinyurl.com/y5glwylp
Make sure you join the #bluetooth channel.

@cvinayak
cvinayak commented Sep 9, 2020

Highlights of the discussion on Slack:

carlesc: I see that you are using a 52810. Shouldn't make a difference, just noting divergences with our own test setup.
carlesc: Yes, the screenshots you post do seem to show the host sending in lockstep with the num completes.
carlesc: But your connection intervals are being cut short, which is surprising if you are using @vich's patch from the PR.
carlesc: And what is even more surprising is that when I look at your on-air trace, as @vich says, the controller is sitting there sending empty packets and keeping the connection event alive, waiting for TX data.
carlesc: So there's something here escaping us that we could not reproduce in our own setup.
carlesc: Are you sure your Host is sending as much data as the num complete value contains? Remember that 1 num complete event may be acking multiple data packets.
carlesc: (That's what we would see with a logic analyzer that decodes UART, btw.)
James Jamesson:

    I don't have a logic analyzer, but I may find an oscilloscope with one built in; maybe this could be of use.
    Could you elaborate on the note about the connection interval being cut short? Are you looking at the nRF sniffer trace or the oscilloscope image?
    I see that @vich has added one more commit 3 hours ago; I haven't tested with this yet.
    I have checked that it indeed ACKs 1 message at a time (although I haven't browsed through the entire throughput test). Moreover, I use ACKs to count the number of available controller buffers, and I would run into a stall pretty quickly if this wasn't the case.

carlesc: @James Jamesson I'll get back to you in a few.
vich: Point 3, you don't need the latest commit, it is an unrelated optimisation.
vich: Point 4, so when your test completes, do you have all the num complete buffers back, or did you lose some count? The controller does not accumulate all into one event: the first time, the event will have a count of one; then, during the UART delay, it will accumulate. With this theory, all but 2 counts would always remain, and this would not stall the transfer.
carlesc: @James Jamesson let me reply to point 2. The controller will keep the connection event alive as long as it has data in its buffers to be sent. Once it has no data left to send, it will close the event until the connection interval elapses. This can be overridden with the new BT_CTLR_FORCE_MD_COUNT, where the controller will keep the connection event alive for MD_COUNT extra "packets" (which will be sent as empty packets) while it waits for more actual tx data to arrive via UART.
carlesc: According to the nRF sniffer trace you sent, the controller does wait with empty packets for more tx data to arrive, but it never arrives, so it ends up closing the connection event.
James Jamesson: I have managed to fix the issue. Your concern about ACKing multiple packets was the issue in my case. I modified my code and saw that indeed, the controller occasionally ACKed two packets in one message, and I didn't use the correct means to take this into account. This is also why I saw this same behaviour when using the nRF52 dev kit with BlueZ.
James Jamesson: Given that my application has some overhead when switching from one FIFO to another, I now manage to get a max throughput of 750 kbps, which is almost the max limit of UART.
carlesc: Excellent! Thank you for your feedback, James! 750 kbps is more than reasonable over UART at 1 Mbps.
carlesc: However, I am not sure why you saw the issue with BlueZ, which certainly correctly parses the number of completed packets event?
James Jamesson: I apologize you had to bear with me for so long, but I'm also extremely grateful that you helped me solve this issue. I'm amazed at how fast you guys reacted to this issue, even over the weekend. Thanks so much!
James Jamesson: I wrote an application to run over BlueZ, so BlueZ did all of the connection management, yet my application would do the buffer management and enqueueing.
carlesc: Oh, I see.
carlesc: @James Jamesson you are welcome, but since your setup is particularly exotic, may I ask why you are writing your own Host instead of using Zephyr's or BlueZ directly?
James Jamesson: Yes, of course :slightly_smiling_face:

    The embedded application we are going to run on top of the STM32F2 is quite constrained, so we want an extremely lightweight stack. We can't afford too much thread occupation for the BT stack.
    The throughput needs to be the absolute maximum possible over UART. To achieve this, we want to understand and have 100% control of what's going over UART.
    We have some previous application data parsers that worked with previous Bluetooth versions, so we thought it would be a piece of cake to migrate this to BT5 with Zephyr's HCI controller :)

carlesc: Very nice :slightly_smiling_face: Thanks for the info! Let us know if you have more questions. The issue you opened will be closed as soon as @vich's PR is merged.
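The root cause James describes above was host-side bookkeeping: one HCI Number Of Completed Packets event may acknowledge several data packets, so the free-buffer count must be credited by the reported count, not incremented by one. A minimal sketch of the corrected accounting, with illustrative names (not real BlueZ or Zephyr API):

```c
#include <stdint.h>

#define CTLR_TX_BUFFERS 5  /* illustrative controller buffer count */

static uint8_t free_bufs = CTLR_TX_BUFFERS;

/* Called when a data packet is handed to the controller over UART. */
static void on_sent(void)
{
    free_bufs--;
}

/* Called on an HCI Number Of Completed Packets event. The fix: credit
 * back 'completed' buffers from the event payload, not just one.
 */
static void on_num_completed(uint16_t completed)
{
    free_bufs += completed;  /* free_bufs++ here was the bug described */
}
```

With `free_bufs++` instead, any event acking two packets leaks one credit, and the host gradually throttles itself, matching the stuttering ACK pattern seen on the oscilloscope.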

@cvinayak

@jjamesson Please try a build of hci_uart with the latest commits in this PR, and let me know if the throughput is satisfactory.
https://github.com/zephyrproject-rtos/zephyr/commits/7bb9920d16579f3d3b89e5687bbdaa629e5624ca

I have added a draft automatic calculation of the Force MD bit count; hence, you do not need to set any explicit Kconfig options other than configuring the max data length and Tx buffer size.
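The automatic calculation could work roughly as follows: measure the incoming Tx buffer interarrival time in-system (item 4 of the PR description) and derive the forced MD count from it (item 5). This is only a hedged sketch of that idea; the function names, the smoothing scheme, and the 388 µs constant (the empty-PDU figure quoted earlier in this thread) are assumptions for illustration, not the PR's actual implementation:

```c
#include <stdint.h>

#define EMPTY_PDU_TRX_US 388  /* empty-PDU Tx/Rx on the 2M PHY, from the thread */

static uint32_t last_enqueue_us;  /* timestamp of previous Tx enqueue */
static uint32_t tx_period_us;     /* smoothed Tx buffer interarrival time */

/* Called each time the host enqueues a new Tx buffer; now_us must be a
 * monotonically increasing, non-zero microsecond timestamp. */
static void on_tx_enqueue(uint32_t now_us)
{
    if (last_enqueue_us != 0) {
        uint32_t delta = now_us - last_enqueue_us;
        /* Simple exponential smoothing: period = (3*period + delta) / 4 */
        tx_period_us = tx_period_us ? (3 * tx_period_us + delta) / 4 : delta;
    }
    last_enqueue_us = now_us;
}

/* Forced MD count = buffer interarrival time in empty-PDU slots, rounded up. */
static uint32_t auto_force_md_count(void)
{
    if (tx_period_us == 0) {
        return 0;  /* no measurement yet: don't force MD */
    }
    return (tx_period_us + EMPTY_PDU_TRX_US - 1) / EMPTY_PDU_TRX_US;
}
```

With buffers arriving every 2.6 ms, this yields a count of 7, consistent with the manually tuned CONFIG_BT_CTLR_FORCE_MD_COUNT=7 earlier in the thread.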

@cvinayak cvinayak marked this pull request as ready for review September 18, 2020 08:13
@carlescufi

@nordic-krch could you please test this with the AUTO option enabled and see that you get full throughput?
@joerchan any chance you could give this a go as well before Friday?

@ioannisg

@cvinayak this needs rebase, now.


@carlescufi carlescufi left a comment


All in order. @MaureenHelm and @ioannisg we really need this for 2.4.0, it fixes an impactful issue with throughput.


@ioannisg ioannisg left a comment


I think it is safe to get this in at this late state of the release, considering that it had been tested extensively while it was waiting to get approvals.

@carlescufi carlescufi added the priority: medium Medium impact/importance bug label Sep 25, 2020
@MaureenHelm MaureenHelm modified the milestones: v2.4.0, v2.5.0 Sep 27, 2020
Remove commented out code.

Signed-off-by: Vinayak Kariappa Chettimada <vich@nordicsemi.no>
Minor relocation of the lll_conn_flush function to place it
along with non-static functions.

Signed-off-by: Vinayak Kariappa Chettimada <vich@nordicsemi.no>
Move the tx_ull_dequeue function to be placed along with
other static function definitions.

Signed-off-by: Vinayak Kariappa Chettimada <vich@nordicsemi.no>
Add a static inline interface to get ULL context reference
count.

Signed-off-by: Vinayak Kariappa Chettimada <vich@nordicsemi.no>
Refactor ull_conn_tx_ack function as it no longer needs to
return the connection context back to caller.

Signed-off-by: Vinayak Kariappa Chettimada <vich@nordicsemi.no>
Fix premature connection event close due to the new Tx
buffers not being de-multiplexed and routed to connection's
Lower Link Layer context when they arrive while being
inside the connection's radio event.

Also, fix master prepare to demux and enqueue two Tx buffers
so that MD bit can be set correctly.

Relates to zephyrproject-rtos#27981.

Signed-off-by: Vinayak Kariappa Chettimada <vich@nordicsemi.no>
Optimize the Tx PDU preparation: the empty PDU only needs
the MD bit to be modified; other fields need only be
initialised at power up.

Signed-off-by: Vinayak Kariappa Chettimada <vich@nordicsemi.no>
Fix the check that decides whether to close a connection
event, which was missing a check of the MD bit being set
for the empty PDU to be Tx-ed out.

Signed-off-by: Vinayak Kariappa Chettimada <vich@nordicsemi.no>
Add Force MD bit feature wherein connection events can be
extended to match the rate at which applications provide
new Tx data.

Fixes zephyrproject-rtos#27981.

Signed-off-by: Vinayak Kariappa Chettimada <vich@nordicsemi.no>
Add Force MD bit feature wherein connection events can be
extended to match the rate at which applications provide
new Tx data. The MD bit in Tx PDUs is forced on if Tx/Rx
has happened CONFIG_BT_CTLR_TX_BUFFERS times in a
connection event. The assumption here is that, if the
controller's Tx buffers are full, it is probable that
the application would send more Tx data during the
connection event, and keeping the connection event alive
will help improve overall throughput.

Signed-off-by: Vinayak Kariappa Chettimada <vich@nordicsemi.no>
Added a feature to measure incoming Tx throughput in-system.

Signed-off-by: Vinayak Kariappa Chettimada <vich@nordicsemi.no>
Added automatic runtime calculation of Forced MD bit count
based on incoming Tx buffer throughput.

Signed-off-by: Vinayak Kariappa Chettimada <vich@nordicsemi.no>
@ioannisg ioannisg merged commit e4fe743 into zephyrproject-rtos:master Sep 28, 2020
@cvinayak cvinayak deleted the github_hci_uart_throughput branch March 1, 2021 00:33
Labels
area: Bluetooth bug The issue is a bug, or the PR is fixing a bug priority: medium Medium impact/importance bug
Development

Successfully merging this pull request may close these issues.

Low UART utilization for hci_uart
8 participants