Big Xtensa architecture cleanup #32356
Commits on Mar 2, 2021
kernel/swap: Move arch_cohere_stacks() back under the lock
Commit 6b84ab3 ("kernel/sched: Adjust locking in z_swap()") moved the call to arch_cohere_stacks() out of the scheduler lock while doing some reorganizing. On further reflection, this is incorrect. When done outside the lock, the two arch_cohere_stacks() calls will race against each other.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
Commit c669607 (Andy Ross, Mar 2, 2021)
arch/xtensa: General cleanup, remove dead code
There was a bunch of dead historical cruft floating around in the arch/xtensa tree, left over from older code versions. It's time to do a cleanup pass. This is entirely refactoring and size optimization; no behavior changes on any in-tree devices should be present. Among the more notable changes:

+ xtensa_context.h offered an elaborate API to deal with a stack frame and context layout that we no longer use.
+ xtensa_rtos.h was entirely dead code.
+ xtensa_timer.h was a parallel abstraction layer implementing in the architecture layer what we're already doing in our timer driver.
+ The architecture thread structs (_callee_saved and _thread_arch) aren't used by current code, and had dead fields that were removed. Unfortunately, for standards compliance and C++ compatibility it's not possible to leave an empty struct here, so they have a single byte field.
+ xtensa_api.h was really just some interrupt management inlines used by irq.h, so fold that code into the outer header.
+ Remove the stale assembly offsets. This architecture doesn't use that facility.

All told, more than a thousand lines have been removed. Not bad.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
Commit 4d61aff (Andy Ross, Mar 2, 2021)
The xtensa atomics layer was written with hand-coded assembly that had to be called as functions. That's needlessly slow, given that the low-level primitives are a two-instruction sequence. Ideally the compiler should see this as an inline to permit it to better optimize around the needed barriers.

There was also a bug with the atomic_cas function, which had a loop internally instead of returning the old value synchronously on a failed swap. That's benign right now because our existing spin lock does nothing but retry it in a tight loop anyway, but it's incorrect per spec and would have caused a contention hang with more elaborate algorithms (for example a spinlock with backoff semantics).

Remove the old implementation and replace it with a much smaller inline C one based on just two assembly primitives. This patch also contains a little bit of refactoring: the implementation scheme has been split out into a separate header for each variant, and the ATOMIC_OPERATIONS_CUSTOM kconfig has been renamed to ATOMIC_OPERATIONS_ARCH to better capture what it means.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
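The contract the commit fixes can be sketched in portable C. This is a stand-in only: the real Zephyr primitive is an inline two-instruction Xtensa sequence (roughly WSR.SCOMPARE1 followed by S32C1I), while this sketch uses C11 atomics to show the per-spec behavior — on a failed swap, return the observed old value synchronously instead of looping internally. The function name is hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Sketch of the corrected compare-and-swap semantics: no internal retry
 * loop.  On failure the value actually observed is returned, and the
 * caller decides whether to retry (e.g. a spinlock with backoff). */
static inline uint32_t atomic_cas_sketch(_Atomic uint32_t *target,
                                         uint32_t oldval, uint32_t newval)
{
    uint32_t expected = oldval;

    /* On failure, 'expected' is overwritten with the current value. */
    atomic_compare_exchange_strong(target, &expected, newval);

    return expected; /* equals oldval iff the swap succeeded */
}
```

A spinlock built on this retries in its own loop; an algorithm with backoff can inspect the returned value and wait before retrying, which is exactly what the old looping implementation made impossible.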
Commit 088d1cf (Andy Ross, Mar 2, 2021)
arch/xtensa: Remove XTOS dependency in irq_lock()
This whole file is written to assume XEA2, so there's no value to using an abstraction call here. Use the RSIL instruction directly.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
Commit 1a6d7b6 (Andy Ross, Mar 2, 2021)
soc/intel_adsp: Elevate cached/uncached mapping to a SoC API
The trace output layer was using this transformation already; make it an official API. There are other places doing similar logic that can benefit.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
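The shape of such an API can be sketched as pointer arithmetic. On cAVS hardware the cached and uncached views of HP-SRAM are aliases of the same physical memory at different virtual addresses; the sketch below assumes, purely for illustration, that they differ in a single high address bit (bit 29) — the function names and the bit value are assumptions, not the verified hardware constants or Zephyr's actual identifiers.

```c
#include <stdint.h>

/* ASSUMPTION for illustration: the cached alias is the uncached
 * address with bit 29 set.  Real hardware constants differ. */
#define SOC_CACHE_ALIAS_BIT ((uintptr_t)1u << 29)

/* Map a pointer into the uncached alias of the same memory. */
static inline void *soc_uncached_ptr_sketch(void *p)
{
    return (void *)((uintptr_t)p & ~SOC_CACHE_ALIAS_BIT);
}

/* Map a pointer into the cached alias of the same memory. */
static inline void *soc_cached_ptr_sketch(void *p)
{
    return (void *)((uintptr_t)p | SOC_CACHE_ALIAS_BIT);
}
```

Because the transformation is a pure address computation, making it an official inline API costs nothing at the call site, which is why the other places "doing similar logic" can simply switch over.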
Commit ed7a7e4 (Andy Ross, Mar 2, 2021)
soc/intel_adsp: Clean up MP startup
The multiprocessor entry code here had some bits that look to have been copied from esp32, including a clumsy stack switch that's needed there. But it wasn't actually switching the stack at all, which on this device is pointed at the top of HP-SRAM and can stay there until the second CPU swaps away into a real thread (this will need to change once we support more than two CPUs, though).
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
Commit 81ed70a (Andy Ross, Mar 2, 2021)
soc/intel_adsp: Clean up cache handling in MP startup
There's no need to muck with the cache directly as long as we're careful about addressing the shared start record through an uncached volatile pointer. Correct a theoretical bug with the initial cache invalidate on the second CPU which was actually doing a flush (and thus potentially pushing things the boot ROM wrote into RAM now owned by the OS). Optimize memory layout a bit when using KERNEL_COHERENCE; we don't need a full cache line for the start record there as it's already in uncached memory. Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
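The handshake pattern the commit describes can be modeled in plain C. The struct layout, field names, and function below are hypothetical stand-ins, not Zephyr's real start record: the point is only that when both CPUs address the record exclusively through a volatile pointer into the uncached alias, every access goes to memory, so no explicit flush or invalidate is needed at all.

```c
#include <stdint.h>

/* Hypothetical start record shared between the boot CPU and the CPU
 * being brought up.  On the real SoC, both sides would obtain their
 * pointer via the uncached-alias translation. */
struct start_rec_sketch {
    uint32_t cpu_num;
    void *stack_top;
    uint32_t alive;   /* written last by the boot CPU */
};

/* Boot-CPU side: fill in the payload, then set the flag.  The volatile
 * qualifier forces each store to be emitted, in this order. */
static void publish_start_rec(volatile struct start_rec_sketch *rec,
                              uint32_t cpu, void *stack)
{
    rec->cpu_num = cpu;
    rec->stack_top = stack;
    rec->alive = 1;   /* flag set only after the payload fields */
}
```

The secondary CPU would spin on `rec->alive` through the same volatile uncached pointer; since the region is uncached, the theoretical flush-vs-invalidate bug the commit mentions cannot arise for this record at all.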
Commit 962039a (Andy Ross, Mar 2, 2021)
soc/intel_adsp: Use the correct MP stack pointer
The kernel passes the CPU's interrupt stack, expecting that the CPU will start on it, so do that. Pass the initial stack pointer from the SoC layer in the variable "z_mp_stack_top" and set it in the assembly startup before calling z_mp_entry().
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
Commit 12b4bc4 (Andy Ross, Mar 2, 2021)
soc/intel_adsp: Put initial stack into the CPU0 interrupt stack
Zephyr's normal architecture is to do all initialization on the interrupt stacks. The CAVS code was traditionally written to start the stack at the end of HP-SRAM, where it has no protection against overlap with other uses (e.g. MP startup used the same region for stacks and saw cache collisions, and the SOF heap lives in this area too). Put it where Zephyr expects and we'll have fewer surprises.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
Commit efa05d1 (Andy Ross, Mar 2, 2021)
arch/xtensa: soc/intel_adsp: Rework MP code entry
Instead of passing the crt1 _start function as the entry code for auxiliary CPUs, use a tiny assembly stub instead which can avoid the runtime testing needed to skip the work in _start. All the crt1 code was doing was clearing BSS (which must not happen on a second CPU) and setting the stack pointer (which is wrong on the second CPU). This allows us to clean out the SMP code in crt1. Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
Commit 811aa96 (Andy Ross, Mar 2, 2021)
arch/xtensa: Add an arch-internal README on register windows
Back when I started work on this stuff, I had a set of notes on register windows that slowly evolved into something that looks like formal documentation. There really isn't any overview-style documentation of this stuff on the public internet, so it couldn't hurt to commit it here for posterity. Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
Commit 5a47aa0 (Andy Ross, Mar 2, 2021)
arch/xtensa: Add non-HAL caching primitives
The Xtensa L1 cache layer has straightforward semantics accessible via single instructions that operate on cache lines via physical addresses. These are very amenable to inlining. Unfortunately the Xtensa HAL layer requires function calls to do this, leading to significant code waste at the calling site, an extra frame on the stack, and needless runtime instructions for situations where the call is over a constant region that could elide the loop. This is made even worse because the HAL library is not built with -ffunction-sections, so pulling in even one of these tiny cache functions has the effect of importing a 1500-byte object file into the link!

Add our own tiny cache layer to include/arch/xtensa/cache.h and use that instead.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
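The inlinable loop shape can be sketched on the host. The per-line operation is a stub here — on real hardware it would be a single instruction such as DHWB, DHWBI, or DII — and the 64-byte line size is a typical, configuration-dependent assumption, not a guaranteed constant.

```c
#include <stdint.h>
#include <stddef.h>

/* ASSUMPTION: 64-byte D-cache lines (configuration-dependent). */
#define DCACHE_LINESIZE_SKETCH 64u

/* Host-side stand-in for the single per-line cache instruction; it
 * just counts invocations so the loop can be checked off-target. */
static size_t lines_touched;
static void cache_line_op_stub(uintptr_t addr)
{
    (void)addr;
    lines_touched++;
}

/* Walk the range one cache line at a time, starting from the
 * line-aligned base.  When 'size' is a compile-time constant the
 * compiler can unroll or fully elide this loop, which is the
 * size/speed win over an out-of-line HAL call. */
static inline void cache_range_op_sketch(void *addr, size_t size)
{
    uintptr_t first = (uintptr_t)addr & ~(uintptr_t)(DCACHE_LINESIZE_SKETCH - 1u);
    uintptr_t last = (uintptr_t)addr + size;

    for (uintptr_t line = first; line < last; line += DCACHE_LINESIZE_SKETCH) {
        cache_line_op_stub(line);
    }
}
```

Note the alignment down to the start of the first line: a range that straddles a line boundary must touch both lines, even if it is only a few bytes long.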
Commit 882c24d (Andy Ross, Mar 2, 2021)
arch/xtensa: Invalidate bottom of outbound stacks
Both new thread creation and context switch had the same mistake in cache management: the bottom of the stack (the "unused" region between the lower memory bound and the live stack pointer) needs to be invalidated before we switch, because otherwise any dirty lines we might have left over can get flushed out on top of the same thread on another CPU that is putting live data there. Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
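The region in question is easy to state precisely. A sketch of the computation, with hypothetical names (Zephyr's real code lives in the arch layer and uses its own thread struct fields): the invalidate range runs from the stack's low bound up to the saved stack pointer, and the correct operation on it is invalidate, not flush, precisely because any dirty lines there are stale leftovers.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical descriptor for the range to invalidate. */
struct stack_range_sketch {
    uintptr_t start;
    size_t size;
};

/* The "unused" bottom of an outgoing thread's stack: everything
 * between the low memory bound and the saved stack pointer.  Dirty
 * cache lines here are leftovers from earlier, deeper call frames;
 * if merely flushed (or left dirty), they could later be written
 * back on top of live data placed there by another CPU. */
static struct stack_range_sketch
outbound_invalidate_range(uintptr_t stack_base, uintptr_t saved_sp)
{
    struct stack_range_sketch r = { stack_base, saved_sp - stack_base };
    return r;
}
```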
Commit 7130571 (Andy Ross, Mar 2, 2021)
tests/fifo_api: Move k_fifo off stack
Putting spinlocks (or things containing them) onto the stack is a KERNEL_COHERENCE violation. This doesn't need to be there so just make it static. Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
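The storage-class distinction behind the fix can be illustrated in plain C (the struct is a stand-in for `struct k_fifo`, not Zephyr's definition): a static object lives at one stable address in coherent memory for the program's lifetime, while a stack object lives in a region that, under KERNEL_COHERENCE, is cached per-thread and therefore unsafe to share.

```c
#include <stdint.h>

/* Stand-in for a kernel object containing a spinlock. */
struct fake_fifo {
    uint32_t lock;
    uint32_t head;
};

/* Static storage: one stable, shareable address -- the pattern the
 * commit switches the test to.  A 'struct fake_fifo' declared as a
 * local variable would instead live on the caller's (possibly
 * incoherent) stack. */
static struct fake_fifo *get_static_fifo(void)
{
    static struct fake_fifo fifo;
    return &fifo;
}
```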
Commit 1aab761 (Andy Ross, Mar 2, 2021)
arch/xtensa: Remember to spill windows in arch_cohere_stacks()
When we reach this code in interrupt context, our upper GPRs contain a cross-stack call that may still include some registers from the interrupted thread. Those need to go out to memory before we can do our cache coherence dance here. Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
Commit 1e3b4af (Andy Ross, Mar 2, 2021)
tests/queue: tests/lifo_usage: Address ADSP/coherence issues
These tests would pass pointers to data on their own stacks to other threads, which is forbidden when CONFIG_KERNEL_COHERENCE (because stack memory isn't cache-coherent). Make the variables static. Also, queue had two sleeps of 2 ticks (having been written in an era where that meant "20-30ms"), and on a device with a 50 kHz tick rate that's not very much time at all. It would sometimes fail spuriously because the spawned threads didn't consume the queue entries in time. How about 10ms of real time instead? Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
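The arithmetic behind the timing fix is worth spelling out. At the roughly 100 Hz tick rates of older configurations, 2 ticks was 20 ms of real time; at a 50 kHz tick rate it is 40 microseconds, far too short for the consumer threads to get scheduled. A tick-rate-independent sleep (milliseconds) avoids the problem:

```c
#include <stdint.h>

/* Convert a tick count to microseconds of real time for a given tick
 * rate.  64-bit intermediate avoids overflow for large tick counts. */
static uint32_t ticks_to_us(uint32_t ticks, uint32_t ticks_per_sec)
{
    return (uint32_t)(((uint64_t)ticks * 1000000u) / ticks_per_sec);
}
```

This is why the commit replaces the 2-tick sleeps with a fixed real-time interval: the same tick count means wildly different durations on different hardware.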
Commit 3ead8f1 (Andy Ross, Mar 2, 2021)
tests/p4wq: Fix impossible sleep interval
The code here was written to "get out of the way just long enough for the trivial context switch and callback to execute". But on a machine with 50 kHz ticks, that's not reliably enough time and this was failing spuriously. Which would have been a reasonably forgivable mistake to make had I not written this code with this very machine in mind... Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
Commit c780c20 (Andy Ross, Mar 2, 2021)