Bluetooth: controller: Fix Tx Buffer Overflow #31813

cvinayak · 2021-02-01T05:16:06Z

Fix Tx Buffer Overflow caused by uninitialized node_tx
memory being used by ULL ISR context due to Compiler
Instructions Reordering in the use of MFIFO_ENQUEUE.

The MFIFO last index was committed before the data element
was stored in the MFIFO due to Compiler Instructions
Reordering.

This is fixed now by adding a Data Memory Barrier
instruction alongwith a compiler memory clobber.

Fixes #30378.

Signed-off-by: Vinayak Kariappa Chettimada vich@nordicsemi.no

thoh-ot

Squash your commits, makes no sense to have a second commit that only overrides part of the first commit.

Why wouldn't the compiler_barrier() have worked in this case?

mped-oticon · 2021-02-01T11:36:56Z

Using a memory barrier is overkill on a single-core CPU; generous use of volatile should be sufficient in preventing the compiler from scheduling this "optimized" instruction order.

carlescufi · 2021-02-01T12:56:56Z

subsys/bluetooth/controller/ll_sw/nordic/hal/nrf5/cpu.h

This is OK for now, but I feel this belongs somewhere in arch/ ?

Will use it, when an interface is available in ARCH, but dont want to go into introducing one now which is going to to be ARM unique.

carlescufi · 2021-02-01T12:59:27Z

subsys/bluetooth/controller/ll_sw/nordic/hal/nrf5/cpu.h

Why not use CONFIG_CPU_CORTEX_M?
@ioannisg is this applicable to Cortex-A or -R as well?

Why not use CONFIG_CPU_CORTEX_M?

It is just a choice to have explicit CPU check, to the architecture variants supported by controller, so that new ports are aware of this interface use.

If the problem is the instruction re-ordering, i think we should try to solve it without asking the CPU to execute any instruction (DMB would eat 2 cycles, i think)

It is just a choice to have explicit CPU check, to the architecture variants supported by controller, so that new ports are aware of this interface use.

I guess it's better to hide it under SOC_NORDIC_NRF5_COMPATIBLE or similar.
Otherwise, CPU_CORTEX_M is cleaner, I think.

cvinayak · 2021-02-01T13:39:29Z

@thoh-ot

Squash your commits, makes no sense to have a second commit that only overrides part of the first commit.

First commit is for use of DMB instructions instead of DSB, data with instruction sync not needed in the controller, DMB sufficient.
Second commit is to fix the compiler instruction reordering plus any write buffer be flushed (using DMB).

Why wouldn't the compiler_barrier() have worked in this case?

It will work for Cortex-M0 but for Cortex-M4 with its Write Buffer, it is recommended to use DMB.
https://static.docs.arm.com/dai0321/a/DAI0321A_programming_guide_memory_barriers_for_m_profile.pdf

@mped-oticon

Using a memory barrier is overkill on a single-core CPU; generous use of volatile should be sufficient in preventing the compiler from scheduling this "optimized" instruction order.

Use of volatile does not ensure that compiler will not reorder instructions, see this:
15bc537
and, this:

ioannisg · 2021-02-01T15:24:40Z

subsys/bluetooth/controller/ll_sw/nordic/hal/nrf5/cpu.h

IF the problem is only the re-ordering, why use a data memory barrier? _DMB() is really not needed in Cortex-M. Instruction re-ordering could be avoided with a compiler barrier, or am I missing something?

Based on ARM recommendation:

On microcontroller devices using the Cortex-M3 and Cortex-M4 processors, omission of the DMB instruction in semaphore and Mutex operations does not cause an error. However, this is not guaranteed if:
•the processor has a cache
•the software is used in a multi-core system.
ARM recommends that the DMB instruction should be used in semaphores and Mutex operations in OS designs.

will keep the DMB instruction to be safe with respect to if cache is implemented in processor.

but that certainly does not apply to Cortex-M0, for example, and even M4, I am not sure.
My suggestion is to limit the solution to the minimal: Instruction re-orderning -> compliler barrier, so we are not hiding with DSB/DMB any other corner cases that we can not later detect and educate ourselves.

As long as the Zephyr kernel code is using the recommendations, I will be against taking short cuts. Let zephyr kernel code, remove the use of DSB/DMB first. This is something yourself added in some places in the ARM architecture code.

DMB should not really be added anywhere, if i remember correctly. If it is, it should be removed, agreed. As for DSB (and ISB), this is added, for reasons other than fighting instruction re-ordering. See relevant comments below :) .

ioannisg · 2021-02-01T15:30:59Z

subsys/bluetooth/controller/ll_sw/nordic/hal/nrf5/cpu.h

Could this be applied in the case above, as well?

This solves the issue, but following the ARM recommendation, will keep the DMB instruction.

A reason not to do this is, if you ever want this code to switch to using arch_ APIs; eventually we're going to get an ARM or even a cross-ARCH API to insert compiler barriers and avoid instruction re-ordering. Then if you already do what's needed here, migrating to generic API calls will be easier and backwards-compatible.

Kernel code uses DSB/ISB instructions for both M0 and M4 architectures, maybe i understand the control path incorrectly, but they are there definitely in the cpu_idle.S, swap_helper.S to name a few.

When there is a cross-ARCH API available, the change can be done to reuse.

Yeap, DSB, ISBs are used, for reasons explained inline (not related to compiler barrier placements)

ioannisg

It doesn't matter so much, really, but I'd like the fix to contain only what was necessary, i.e. the compiler barrier, not the additional instruction, coming at a 2-cycle cost, for the reasons mentioned in the comments.
But it's not core ARCH code so I won't insist if you want to use _DMB.

cvinayak · 2021-02-01T16:33:01Z

It doesn't matter so much, really, but I'd like the fix to contain only what was necessary, i.e. the compiler barrier, not the additional instruction, coming at a 2-cycle cost, for the reasons mentioned in the comments.
But it's not core ARCH code so I won't insist if you want to use _DMB.

core ARCH code does use DSB/ISB though, and was the inspiration (and ARM recommendation) to use _DMB. The 2-cycle cost is too much actually in the kernel, if I think of it for an architecture like M0/M3/M4, kernel does lot more swap compared to what BLE controller uses it for.

ioannisg · 2021-02-01T17:05:46Z

core ARCH code does use DSB/ISB though, and was the inspiration (and ARM recommendation) to use _DMB.

Core ARCH uses these, not to battle instruction re-ordering but in real use-cases where it is important to do DSB (e.g. after WFI, or SCB register writes) or ISB (after CONTROL, BASEPRI register sets, etc.). But as I said, I won't insist agains __DMB(), after all, it is still CMSIS-compatible.

cvinayak · 2021-02-01T17:45:49Z

not to battle instruction re-ordering

C compiler is not used, the instruction order and ARM recommendations are implemented in assembly code. Example, DSB used here:

zephyr/arch/arm/core/aarch32/swap_helper.S

Line 554 in 8410a8c

dsb

where it is important to do DSB (e.g. after WFI, or SCB register writes) or ISB (after CONTROL, BASEPRI register sets, etc.).

For WFI example, I am refering to ARM Cortex™-M Programming Guide to Memory Barrier Instruction, Section 4.14 Entering sleep mode

In simple designs that do no include a system level write buffer, the memory barrier instruction is not required before entering sleep mode in the Cortex-M3 (r2p0 or later), Cortex-M4, Cortex-M0+, and Cortex-M0 processors. This is handled by the processor.

Yet this is implemented as a recommendation, in the zephyr kernel. I dont see why controller code should not follow the ARM architectural recommendations to ensure that the write buffer on the processor has completed its operation before subsequent operations can be started.

cvinayak · 2021-02-02T04:26:57Z

I have now removed use of DMB for Cortex-M0, but retain it for Cortex-M4/M33, so as to be consistent with GCC builtins which use DMB as memory barrier when CONFIG_ATOMIC_OPERATIONS_BUILTIN=y

(https://github.com/gcc-mirror/gcc/blob/16e2427f50c208dfe07d07f18009969502c25dc8/gcc/config/arm/sync.md#L46)

Also, see that cpu_idle.S for Cortex-M0 has introduced ISR latency by using DSB with interrupts locked before entering WFI

zephyr/arch/arm/core/aarch32/cpu_idle.S

Line 98 in fec399e

dsb

@ioannisg do you think I should remove use of DMB for Cortex-M33, seems GCC does not use DMB for ARMv8 architecture:
https://github.com/gcc-mirror/gcc/blob/16e2427f50c208dfe07d07f18009969502c25dc8/gcc/config/arm/arm.h#L287

ioannisg · 2021-02-02T07:35:24Z

@cvinayak DMB is really not needed in Cortex-M implementations, at least, I have not found yet a practical use case. I would consider removing it, yes.

DSB, ISBs are a different story.

DSB is needed before WFI (it is actually an implementation requirement for Cortex-M3,4,7,33,35,55) AFAIK.
ISB is also needed for BASEPRI, and CONTROL so the irq_lock or CONTROL register changes are realized immediately before anything else is executed. Again, these have little to do with instruction re-ordering.

cvinayak · 2021-02-02T09:15:05Z

@ioannisg you may open up any of the Cortex-M4 Zephyr Bluetooth application's zephyr.lst file, which use GCC builtins and you will notice a lot of the builtins are enclosed with DMB instructions. These are just wasted CPU cycles.

ok, I will remove the use of DMB in this PR.

Use of Data Memory Barrier instruction with memory clobber in ARM Cortex M architectures is sufficient in the controller implementation to keep compiler data access instructions in order so that an ISR vectoring has memory accesses in the correct order as intented by design. Signed-off-by: Vinayak Kariappa Chettimada <vich@nordicsemi.no>

Fix Tx Buffer Overflow caused by uninitialized node_tx memory being used by ULL ISR context due to Compiler Instructions Reordering in the use of MFIFO_ENQUEUE. The MFIFO last index was committed before the data element was stored in the MFIFO due to Compiler Instructions Reordering. This is fixed now by adding a Data Memory Barrier instruction alongwith a compiler memory clobber. Fixes zephyrproject-rtos#30378. Signed-off-by: Vinayak Kariappa Chettimada <vich@nordicsemi.no>

ioannisg · 2021-02-02T13:40:53Z

Thanks @cvinayak

Yet this is implemented as a recommendation, in the zephyr kernel. I dont see why controller code should not follow the ARM architectural recommendations to ensure that the write buffer on the processor has completed its operation before subsequent operations can be started.

As a side comment, IMHO, it is more practical to look at the implementation requirements, instead of the architecture requirements, to see what is really needed in the case of barriers (and other features). After all, we don't make any custom ARM-M design - we all implement the existing Cortex-M CPUs, so their implementation requirements is all what matters :)

Anyways, as said, DMB won't harm you here, so wouldn't complain more (in case you ever want to bring this back) :)

cvinayak requested review from carlescufi, kruithofa and thoh-ot as code owners February 1, 2021 05:16

github-actions bot added area: Bluetooth area: Bluetooth Controller labels Feb 1, 2021

cvinayak force-pushed the github_tx_overflow_fix branch from 57bc864 to ce3c307 Compare February 1, 2021 05:18

cvinayak added this to the v2.5.0 milestone Feb 1, 2021

cvinayak mentioned this pull request Feb 1, 2021

Unbonded peripheral gets 'Tx Buffer Overflow' when erasing bond on master #15570

Closed

thoh-ot reviewed Feb 1, 2021

View reviewed changes

carlescufi added the bug The issue is a bug, or the PR is fixing a bug label Feb 1, 2021

carlescufi reviewed Feb 1, 2021

View reviewed changes

carlescufi requested review from ioannisg and joerchan February 1, 2021 13:00

cvinayak requested review from carlescufi and thoh-ot February 1, 2021 13:52

ioannisg reviewed Feb 1, 2021

View reviewed changes

zephyrbot assigned cvinayak Feb 1, 2021

cvinayak force-pushed the github_tx_overflow_fix branch from ce3c307 to fba2bbf Compare February 2, 2021 03:36

cvinayak force-pushed the github_tx_overflow_fix branch from fba2bbf to ac8b90d Compare February 2, 2021 09:49

ioannisg approved these changes Feb 2, 2021

View reviewed changes

carlescufi approved these changes Feb 2, 2021

View reviewed changes

nashif merged commit 1af2b91 into zephyrproject-rtos:master Feb 2, 2021

cvinayak deleted the github_tx_overflow_fix branch March 1, 2021 00:33

Bluetooth: controller: Fix Tx Buffer Overflow #31813

Bluetooth: controller: Fix Tx Buffer Overflow #31813

Uh oh!

Conversation

cvinayak commented Feb 1, 2021

Uh oh!

thoh-ot left a comment

Choose a reason for hiding this comment

Uh oh!

mped-oticon commented Feb 1, 2021

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

carlescufi Feb 1, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

cvinayak commented Feb 1, 2021

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ioannisg left a comment

Choose a reason for hiding this comment

Uh oh!

cvinayak commented Feb 1, 2021

Uh oh!

ioannisg commented Feb 1, 2021

Uh oh!

cvinayak commented Feb 1, 2021

Uh oh!

cvinayak commented Feb 2, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ioannisg commented Feb 2, 2021

Uh oh!

cvinayak commented Feb 2, 2021

Uh oh!

ioannisg commented Feb 2, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

carlescufi Feb 1, 2021 •

edited

Loading

cvinayak commented Feb 2, 2021 •

edited

Loading