Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

arch: arm: cortex_m: maintainability improvement by reducing assembly surface #65071

Conversation

ithinuel
Copy link
Collaborator

This PR simplifies maintenance and improves portability by relying on C where asm is not strictly required.

This requires:

@ithinuel ithinuel force-pushed the arch/arm/cortex_m/reduce_assembly_occurences branch from 56315ad to 3a84306 Compare November 27, 2023 11:10
@ithinuel ithinuel force-pushed the arch/arm/cortex_m/reduce_assembly_occurences branch from 3a84306 to c48b0e3 Compare December 5, 2023 15:51
@decsny decsny requested review from yvanderv and removed request for dleach02, yonsch, ifyall, npal-cy, MulinChao and ChiHuaL April 8, 2024 16:42
@MaureenHelm MaureenHelm removed the dev-review To be discussed in dev-review meeting label Apr 10, 2024
@carlescufi
Copy link
Member

Merging since the maintainer is the author.

@carlescufi carlescufi merged commit 42036cd into zephyrproject-rtos:main Apr 15, 2024
30 of 31 checks passed
@ithinuel ithinuel deleted the arch/arm/cortex_m/reduce_assembly_occurences branch April 15, 2024 16:21
@Thalley
Copy link
Collaborator

Thalley commented Apr 16, 2024

@ithinuel this PR seems to break some ARM targets like the nRF5340. I haven't tested other platforms just yet, but at least the Bluetooth IPC driver seems to not work after this.

@ithinuel
Copy link
Collaborator Author

Hello @Thalley, I am aware of some breakage on some NXP platforms (see #64784) but I’m not aware of anything else.

Can you please create an issue & provide a method to reproduce?

@gchwier
Copy link
Collaborator

gchwier commented Apr 16, 2024

It causes also a rolling reboot in tests of MCUboot.
To reproduce:
west build --sysbuild -p -b nrf5340dk/nrf5340/cpuapp zephyr/tests/boot/with_mcumgr
west flash

@gchwier
Copy link
Collaborator

gchwier commented Apr 16, 2024

when added

	while (1) {
		k_sleep(K_MSEC(1000));
	}

instead of just returning main, it works (no rolling reboot)

@carlescufi
Copy link
Member

@gchwier can you please open an issue as well?

@ithinuel
Copy link
Collaborator Author

@gchwier I think I identified an issue in cpu_idle.c.

Would you mind checking if #71593 solves it?
This is the only significant difference I could notice between the two generated binaries.

@Thalley
Copy link
Collaborator

Thalley commented Apr 17, 2024

@gchwier I think I identified an issue in cpu_idle.c.

Would you mind checking if #71593 solves it? This is the only significant difference I could notice between the two generated binaries.

It does not fix the issue I'm seeing - I'll create a bug report soon

@Thalley
Copy link
Collaborator

Thalley commented Apr 17, 2024

@gchwier I think I identified an issue in cpu_idle.c.
Would you mind checking if #71593 solves it? This is the only significant difference I could notice between the two generated binaries.

It does not fix the issue I'm seeing - I'll create a bug report soon

@ithinuel #71596

Interestingly there are no issues for non-Bluetooth applications, so may be related to the HCI driver or something else.

@nordic-piks
Copy link
Collaborator

nordic-piks commented Apr 17, 2024

@gchwier I think I identified an issue in cpu_idle.c.
Would you mind checking if #71593 solves it? This is the only significant difference I could notice between the two generated binaries.

It does not fix the issue I'm seeing - I'll create a bug report soon

@ithinuel #71596

Interestingly there are no issues for non-Bluetooth applications, so may be related to the HCI driver or something else.

There are issues for multiple tests (not only Bluetooth), eg.
nrf9160dk_nrf9160: tests/kernel/workq/work_queue/kernel.workqueue

10:12:56  Running TESTSUITE workqueue_delayed
10:12:56  ===================================================================
10:12:56  ASSERTION FAIL [!z_is_thread_state_set(_kernel.cpus[0].current, ((1UL << (4))))] @ ../../../../../../../zephyr/kernel/sched.c:1170
10:12:56  E: ***** HARD FAULT *****
10:12:56  E:   Fault escalation (see below)
10:12:56  E: ARCH_EXCEPT with reason 4
10:12:56  E: r0/a1:  0x00000004  r1/a2:  0x00000492  r2/a3:  0x00000001
10:12:56  E: r3/a4:  0x00000004 r12/ip:  0x00000000 r14/lr:  0x00006c09
10:12:56  E:  xpsr:  0x29000000
10:12:56  E: Faulting instruction address (r15/pc): 0x000086d0
10:12:56  E: >>> ZEPHYR FATAL ERROR 4: Kernel panic on CPU 0
10:12:56  E: Current thread: 0x200008f0 (main)
10:12:56  E: Halting system```

@gchwier
Copy link
Collaborator

gchwier commented Apr 17, 2024

@gchwier I think I identified an issue in cpu_idle.c.

Would you mind checking if #71593 solves it? This is the only significant difference I could notice between the two generated binaries.

With that fix there is no rolling-boot, but there is also no printed messages on the console, and shell is not available

Copy link
Member

@stephanosio stephanosio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a bit late to the party here ... but is converting the core arch implementations written in ASM to C really something we want/should be doing?

The _isr_wrapper function in isr_wrapper.S, for instance, is called for every interrupt and just a few instructions added to it can lead to a significant performance degradation. Same goes for z_arm_pendsv, which is called for every context switch.

Places like these are where we want complete control over the exact instructions used and hence why they were written in ASM.

Also, it is more difficult to control data and instruction barriers by switching to the C implementation (especially using the CMSIS functions), and this can lead to unnecessary and redundant pipeline flushes and hence performance degradation.

Moreover, the existing ASM implementation was fairly clean, well commented, and easy enough to read.

Given that this has already caused many issues for us and will likely cause more in the future, should we look into reverting it and getting back to the good old ASM implementation?

Comment on lines -231 to -235
/* ISB is not strictly necessary here (stack pointer is not being
* touched), but it's recommended to avoid executing pre-fetched
* instructions with the previous privilege.
*/
isb
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Important bits like these get lost when translating to C.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That’s a good point. The ISB is included in __set_CONTROL, so the barrier is not lost, but it would be good to transfer the comment about pre-fetched instructions.

@ithinuel
Copy link
Collaborator Author

Thank you for your valuable feedback.
The idea of this change was initiated by the difficulties encountered while learning about zephyr and its implementation.

I understand all your points and agree with most of them. However, since its introduction, the code has evolved to include a significant amount of conditional features inclusion which compounds with accommodations for the various architectural variants (v6m, v7m, v8m.main, v8m.base etc) makes some of the code difficult to read and keep track of.
isr_wrapper.S literally had nearly as much instruction lines (36) as pre-processor (30), and this is counting the blocks like this as 8instructions for 5pre-processor.

In the various comparison I made, the generated code was on par or better than the previous solution. The most recent comparison I did was built with west build --sysbuild -p -b nrf5340dk/nrf5340/cpuapp tests/boot/with_mcumgr at a2ed816 and 4b276c4 where the generated code for _isr_wrapper is effectively 1 instruction smaller than the ASM implementation.

In a number of places, the generated code is/was either suboptimal or incorrect. Eg in this instance (which hasn’t been changed as part of this PR), it would fail to build for Cortex-M23 with BUILTIN_STACK_GUARD enabled and will generate sub-optimal code with unnecessary pop/push sequences. So the ASM implementation is not free from latent issues and I obviously intend to fix these aswell in upcoming contributions.

A notable pain point is around manual in-lining of irq locking/unlocking which makes it hard to keep them all consistent and to introduce changes. Eg, the implementation of SMP on Armv6-m will require to
enable SoC specific synchronisation between core. Identifying and propagating the required changes will be challenging.
The issues raised so far are actually all related to missing calls __enable_irq()/__disable_irq().

Due to the nature of the change, I was expecting some breakage to have sipped through the checks (all CI passed). I am still ready to investigate and document the cause of any issue that may still arise. Hopefully, this hasn’t inconvenienced too many people and I appreciate the patience of those who have been.

So I don’t think reverting would be an appropriate move and I remain convinced this is a good thing for the maintainability.

@gmarull
Copy link
Member

gmarull commented Apr 26, 2024

CMSIS is actually a problem: #57246, I hope it gets some attention.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: Architectures area: ARM ARM (32-bit) Architecture platform: nRF Nordic nRFx
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet