Big Xtensa architecture cleanup #32356
Merged
17 commits
c669607
kernel/swap: Move arch_cohere_stacks() back under the lock
4d61aff
arch/xtensa: General cleanup, remove dead code
088d1cf
arch/xtensa: Inline atomics
1a6d7b6
arch/xtensa: Remove XTOS dependency in irq_lock()
ed7a7e4
soc/intel_adsp: Elevate cached/uncached mapping to a SoC API
81ed70a
soc/intel_adsp: Clean up MP startup
962039a
soc/intel_adsp: Clean up cache handling in MP startup
12b4bc4
soc/intel_adsp: Use the correct MP stack pointer
efa05d1
soc/intel_adsp: Put initial stack into the CPU0 interrupt stack
811aa96
arch/xtensa: soc/intel_adsp: Rework MP code entry
5a47aa0
arch/xtensa: Add an arch-internal README on register windows
882c24d
arch/xtensa: Add non-HAL caching primitives
7130571
arch/xtensa: Invalidate bottom of outbound stacks
1aab761
tests/fifo_api: Move k_fifo off stack
1e3b4af
arch/xtensa: Remember to spill windows in arch_cohere_stacks()
3ead8f1
tests/queue: tests/lifo_usage: Address ADSP/coherence issues
c780c20
tests/p4wq: Fix impossible sleep interval
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,108 @@ | ||
# How Xtensa register windows work | ||
|
||
There is a paucity of introductory material on this subject, and | ||
Zephyr plays some tricks here that require understanding the base | ||
layer. | ||
|
||
## Hardware | ||
|
||
When register windows are configured in the CPU, there are either 32 | ||
or 64 "real" registers in hardware, with 16 visible at one time. | ||
Registers are grouped and rotated in units of 4, so there are 8 or 16 | ||
such "quads" (my term, not Tensilica's) in hardware of which 4 are | ||
visible as A0-A15. | ||
|
||
The first quad (A0-A3) is pointed to by a special register called | ||
WINDOWBASE. The register file is cyclic, so for example if NREGS==64 | ||
and WINDOWBASE is 15, quads 15, 0, 1, and 2 will be visible as | ||
(respectively) A0-A3, A4-A7, A8-A11, and A12-A15. | ||
|
||
There is a ROTW instruction that can be used to manually rotate the | ||
window by an immediate number of quads that is added to WINDOWBASE. | |
Positive rotations "move" high registers into low registers | ||
(i.e. after "ROTW 1" the register that used to be called A4 is now | ||
A0). | ||
|
||
There are CALL4/CALL8/CALL12 instructions to effect rotated calls | ||
which rotate registers upward (i.e. "hiding" low registers from the | ||
callee) by 1, 2 or 3 quads. These do not rotate the window | ||
themselves. Instead they place the rotation amount in two places | ||
(yes, two; see below): the 2-bit CALLINC field of the PS register, and | ||
the top two bits of the return address placed in A0. | ||
|
||
There is an ENTRY instruction that does the rotation. It adds CALLINC | ||
to WINDOWBASE, at the same time copying the old (now hidden) stack | ||
pointer in A1 into the "new" A1 in the rotated frame, subtracting an | ||
immediate offset from it to make space for the new frame. | ||
|
||
There is a RETW instruction that undoes the rotation. It reads the | ||
top two bits from the return address in A0 and subtracts that value | ||
from WINDOWBASE before returning. This is why the CALLINC bits went | ||
in two places. They have to be stored on the stack across potentially | ||
many calls, so they need to be GPR data that lives in registers and | ||
can be spilled. But ENTRY isn't specified to assume a particular | |
return address format and executes immediately after the call, so it | |
makes more sense for it to use processor state instead. | |
|
||
Note that we still don't know how to detect when the register file has | ||
wrapped around and needs to be spilled or filled. To do this there is | ||
a WINDOWSTART register used to detect which register quads are in use. | ||
The name "start" is somewhat confusing; this is not a pointer. | |
WINDOWSTART stores a bitmask with one bit per hardware quad (so it's 8 | |
or 16 bits wide). The bit in WINDOWSTART corresponding to WINDOWBASE | |
will be set by the ENTRY instruction, and remain set after rotations | ||
until cleared by a function return (by RETW, see below). Other bits | ||
stay zero. So there is one set bit in WINDOWSTART corresponding to | ||
each call frame that is live in hardware registers, and it will be | ||
followed by 0, 1 or 2 zero bits that tell you how "big" (how many | ||
quads of registers) that frame is. | ||
|
||
So the CPU executing RETW checks to make sure that the register quad | ||
being brought into A0-A3 (i.e. the new WINDOWBASE) has a set bit | ||
indicating it's valid. If it does not, the registers must have been | ||
spilled and the CPU traps to an exception handler to fill them. | ||
|
||
Likewise, the processor can tell if a high register is "owned" by | ||
another call by seeing if there is a one in WINDOWSTART between that | ||
register's quad and WINDOWBASE. If there is, the CPU traps to a spill | ||
handler to spill one frame. Note that a frame might be only four | ||
registers, but it's possible to hit registers 12 out from WINDOWBASE, | ||
so it's actually possible to trap again when the instruction restarts | ||
to spill a second quad, and even a third time at maximum. | ||
|
||
Finally: note that hardware checks the two bits of WINDOWSTART after | ||
the frame bit to detect how many quads are represented by the one | ||
frame. So there are six separate exception handlers to spill/fill | ||
1/2/3 quads of registers. | ||
|
||
## Software & ABI | ||
|
||
The advantage of the scheme above is that it allows the registers to | ||
be spilled naturally into the stack by using the stack pointers | ||
embedded in the register file. But the hardware design assumes and to | ||
some extent enforces a fairly complicated stack layout to make that | ||
work: | ||
|
||
The spill area for a single frame's A0-A3 registers is not in its own | ||
stack frame. It lies in the 16 bytes below its CALLEE's stack | ||
pointer. This is so that the callee (and exception handlers invoked | ||
on its behalf) can see its caller's potentially-spilled stack pointer | ||
register (A1) on the stack and be able to walk back up on return. | ||
Other architectures do this too by e.g. pushing the incoming stack | ||
pointer onto the stack as a standard "frame pointer" defined in the | ||
platform ABI. Xtensa wraps this together with the natural spill area | ||
for register windows. | ||
|
||
By convention spill regions always store the lowest numbered register | ||
in the lowest address. | ||
|
||
The spill area for a frame's A4-A11 registers may or may not exist | ||
depending on whether the call was made with CALL8/CALL12. It is legal | ||
to write a function using only A0-A3 and CALL4 calls and ignore higher | ||
registers. But if those 0-2 register quads are in use, they appear at | ||
the top of the stack frame, immediately below the parent call's A0-A3 | ||
spill area. | ||
|
||
There is no spill area for A12-A15. Those registers are always | ||
caller-save. When using CALLn, you always need to overlap 4 registers | ||
to provide arguments and take a return value. |
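
The WINDOWBASE/WINDOWSTART bookkeeping described in the README above can be modeled in a few lines of C. This is a minimal sketch for illustration only, not code from this PR: the `win_model` struct and the `phys_quad`, `rotw`, `entry`, `retw`, `frame_quads` and `would_overflow` helpers are hypothetical names, NREGS is assumed to be 64, and the overflow/underflow conditions are reduced to a boolean result instead of a real exception.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: a 64-register core, i.e. 16 hardware quads. */
#define NREGS  64
#define NQUADS (NREGS / 4)

/* Hypothetical model of the two window special registers. */
struct win_model {
	unsigned windowbase;  /* quad currently visible as A0-A3 */
	uint16_t windowstart; /* one bit per hardware quad */
};

/* Hardware quad that holds architectural register A<areg>. */
static unsigned phys_quad(const struct win_model *w, unsigned areg)
{
	assert(areg < 16);
	return (w->windowbase + areg / 4) % NQUADS;
}

/* ROTW n: add a signed immediate (-8..7) to WINDOWBASE, cyclically. */
static void rotw(struct win_model *w, int n)
{
	w->windowbase = (unsigned)((int)w->windowbase + n + NQUADS) % NQUADS;
}

/* ENTRY: rotate by CALLINC and mark the new frame in WINDOWSTART. */
static void entry(struct win_model *w, unsigned callinc)
{
	assert(callinc >= 1 && callinc <= 3);
	w->windowbase = (w->windowbase + callinc) % NQUADS;
	w->windowstart |= (uint16_t)(1u << w->windowbase);
}

/* RETW: the rotation amount is the top two bits of the return address
 * in A0. Returns false, leaving state untouched, when the caller's quad
 * is not marked live (real hardware would take a fill exception).
 */
static bool retw(struct win_model *w, uint32_t a0)
{
	unsigned n = a0 >> 30;
	unsigned caller = (w->windowbase + NQUADS - n) % NQUADS;

	if (!(w->windowstart & (1u << caller))) {
		return false;
	}
	w->windowstart &= (uint16_t)~(1u << w->windowbase);
	w->windowbase = caller;
	return true;
}

/* Frame size in quads: look at the two WINDOWSTART bits after the frame bit. */
static unsigned frame_quads(const struct win_model *w, unsigned quad)
{
	if (w->windowstart & (1u << ((quad + 1) % NQUADS))) {
		return 1;
	}
	if (w->windowstart & (1u << ((quad + 2) % NQUADS))) {
		return 2;
	}
	return 3;
}

/* True if touching A<areg> would hit a quad owned by another live frame. */
static bool would_overflow(const struct win_model *w, unsigned areg)
{
	for (unsigned k = 1; k <= areg / 4; k++) {
		if (w->windowstart & (1u << ((w->windowbase + k) % NQUADS))) {
			return true;
		}
	}
	return false;
}
```

With `windowbase` set to 15, `phys_quad(&w, 4)` yields quad 0, which matches the wraparound example in the README (quads 15, 0, 1 and 2 visible as A0-A15).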
@andyross fwiw, I've noticed some differences between GCC and XCC here wrt stack and register layout when debugging exceptions. The SOF exception code will dump registers and stack, but the dump looks different between XCC and GCC. SOF has a Python script to decode the dump (it works with XCC output, not GCC). Something to be aware of here for later on.