X86 optimize context switch #17788
Conversation
All checks are passing now. Review the history of this comment for details about previous failed statuses.
No doc changes to review
@ioannisg @vonhust @ruuddw any objections to the two patches which touch the mem domain API, and to the documentation changes? They are fairly trivial; they mostly just move the current-thread check into the arch implementations. The relevant patches in this PR are
@andrewboie sorry for being late on this. As far as I can see, it is the arch-specific code (ARC, ARM) that decides whether to re-program the MPU when the functions are called for the current thread. Does this mean that this principle is no longer needed in general, e.g. for other architectures such as x86? In that case, I believe we lack some documentation in arm_core_mpu.c, arc_core_mpu.c, or in the corresponding API header, clarifying why these checks are required. IMHO, some notes on the function docs would suffice.
I'm not quite sure what you mean; it depends on the arch code. On x86 it's any thread, because there is arch-specific state that has to be managed for any thread that was in the domain being modified (in this case the per-thread page tables). So if I destroy a memory domain, I need to update the per-thread page tables of every thread that was running in it, because those threads are no longer in a memory domain. The key thing about per-thread state management is that it removes having to compute anything at context switch, at the expense of more overhead when the memory domain is configured.
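The trade-off described above can be sketched in plain C. This is a toy model, not Zephyr's actual API: names like `struct mem_domain`, `mem_domain_reconfigure`, and `context_switch` are illustrative. The point it shows is that the expensive work is done eagerly, once, when the domain is reconfigured, so the context switch reduces to a single pointer (register) load.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of per-thread MMU state management; all names here are
 * illustrative, not Zephyr's real data structures. */

struct page_tables {
	int generation;             /* stands in for real MMU entries */
};

struct mem_domain;

struct thread {
	struct page_tables tables;  /* per-thread copy of MMU state */
	struct mem_domain *domain;
	struct thread *next;        /* domain membership list link */
};

struct mem_domain {
	int generation;             /* bumped on every reconfiguration */
	struct thread *members;
};

/* The expensive work happens once, when the domain changes... */
void mem_domain_reconfigure(struct mem_domain *dom)
{
	dom->generation++;
	for (struct thread *t = dom->members; t != NULL; t = t->next) {
		t->tables.generation = dom->generation; /* update every member */
	}
}

/* ...so the context switch reduces to a single pointer/register load
 * (on x86, a write to CR3). */
struct page_tables *active_tables;

void context_switch(struct thread *incoming)
{
	active_tables = &incoming->tables;
}
```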
The checks are required because the arch code was assuming that any changes were for the active thread, and the MPU hardware gets immediately reprogrammed. For 2.1 I'm prioritizing #13074 for at least the ARMv7 MPU, and these checks will get rewritten.
Moving the check for whether to reprogram the MPU hardware into the arch-specific code is OK with me if it helps other arches. I'm not sure yet whether it would help or be required for #13074; managing such a 'TLB' would preferably be generic rather than arch-specific code. I can imagine a three-layer approach instead of the current two layers as well: logical/API memory domain management -> logical-to-physical mapping (with lazy programming/TLB management) -> physical MPU programming (as in the current arch-specific code).
That doesn't matter; it manages per-thread data, so it will require this.
> So if I destroy a memory domain, I need to update all the per-thread page tables for the threads that were running in it, because they are no longer in a memory domain.

OK, so I understand that ARC and ARM only do the on-line reprogramming if it is the current thread, while on x86 it is any thread. Thanks for clarifying. So, as I said above (maybe not clearly enough), I would like us to document in the ARM core MPU code the reasoning why we only need on-line reprogramming for the current thread. Earlier, these current-thread checks were not present in the arch code, because they lived in kernel/mem_domain; but since they are now in the ARM core MPU code, I think it would be nice to comment on why we do them, to explicitly help the ARM developer understand the code better.
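The current-thread check being discussed can be sketched as follows. This is a hypothetical illustration, not the actual Zephyr ARM/ARC code; the hook name `arch_mem_domain_thread_update` and the helper `mpu_reprogram` are made up for the example. The idea: MPU registers only ever describe the running thread, so a non-current thread needs no immediate hardware update.

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of the current-thread check, with hypothetical names: on
 * MPU-based arches (ARM, ARC) there is no per-thread MPU state, so the
 * hardware only needs on-line reprogramming when the modified domain
 * belongs to the thread that is currently running. */

struct k_mem_domain { int dummy; };

struct k_thread {
	struct k_mem_domain *mem_domain;
};

struct k_thread *current_thread;   /* models the running thread */
int mpu_reprogram_count;           /* counts MPU register writes */

static void mpu_reprogram(struct k_mem_domain *dom)
{
	(void)dom;
	mpu_reprogram_count++;     /* stands in for writing MPU registers */
}

/* Arch hook: the kernel now calls this for *any* thread in the domain,
 * so the current-thread check lives here rather than in kernel/mem_domain. */
void arch_mem_domain_thread_update(struct k_thread *thread)
{
	if (thread == current_thread) {
		mpu_reprogram(thread->mem_domain);
	}
	/* A non-current thread needs nothing now: its partitions get
	 * programmed from scratch when it is next switched in. */
}
```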
Populate thread->stack_obj earlier in the thread initialization process such that it is set when z_new_thread() is called. There was nothing specific about its position, or the rest of the code in that CONFIG_USERSPACE block, so just move it all up. Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
Some options, like stack canaries, use more stack space, and on x86, 512 bytes is not quite enough for ztest's main thread stack. Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
Has the same effect of catching stack overflows, but makes debugging with GDB simpler since we won't get errors when inspecting such regions. Making these areas non-present was more than we needed, read-only is sufficient. Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
Currently page tables have to be re-computed in an expensive operation on context switch. Here we reserve some room in the page tables such that we can have per-thread page table data, which will be much simpler to update on context switch at the expense of memory. Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
These turned out to be quite useful when debugging MMU issues, commit them to the tree. The output format is virtually the same as gen_mmu_x86.py's verbose output. Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
Add a wrapper for assembly code that works with the CR3 register. Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
Lets us know what set of page tables were in use when the error occurred. Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
Need to enumerate the constraints on adding a partition to a memory domain; some may not be obvious. Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
The current API was assuming too much, in that it expected that arch-specific memory domain configuration is only maintained in some global area, and updates to domains that are not currently active have no effect. This was true when all memory domain state was tracked in page tables or MPU registers, but no longer works when arch-specific memory management information is stored in thread-specific areas. This is needed for: zephyrproject-rtos#13441 zephyrproject-rtos#13074 zephyrproject-rtos#15135 Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
Previously, context switching on x86 with memory protection enabled involved walking the page tables, de-configuring all the partitions in the outgoing thread's memory domain, and then configuring all the partitions in the incoming thread's domain, on a global set of page tables. We now have a much faster design. Each thread has reserved in its stack object a number of pages to store page directories and page tables pertaining to the system RAM area. Each thread also has a toplevel PDPT which is configured to use the per-thread tables for system RAM, and the global tables for the rest of the address space. The result of this is on context switch, at most we just have to update the CR3 register to the incoming thread's PDPT. The x86_mmu_api test was making too many assumptions and has been adjusted to work with the new design. Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
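The table layout described in this commit message can be modeled in a few lines of C. This is only a sketch under stated assumptions: on 32-bit x86 with PAE the toplevel PDPT has 4 entries, but which entry covers system RAM depends on the board's memory map, so `RAM_PDPT_INDEX` and all the type names here are illustrative. Each thread's PDPT shares the global directories except for the system-RAM entry, which points at the per-thread tables reserved in the stack object; switching threads then only requires loading the new PDPT address into CR3.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of the per-thread PDPT arrangement. RAM_PDPT_INDEX and the
 * struct names are assumptions for illustration, not Zephyr's code. */
#define PDPT_ENTRIES   4
#define RAM_PDPT_INDEX 0

struct page_dir { int id; };

/* Page directories shared by every thread (flash, MMIO, ...). */
struct page_dir global_pd[PDPT_ENTRIES];

struct thread_mmu {
	struct page_dir ram_pd;               /* per-thread tables, reserved in the stack object */
	struct page_dir *pdpt[PDPT_ENTRIES];  /* toplevel; its address goes into CR3 */
};

void thread_mmu_init(struct thread_mmu *m)
{
	for (int i = 0; i < PDPT_ENTRIES; i++) {
		m->pdpt[i] = &global_pd[i];       /* share everything by default... */
	}
	m->pdpt[RAM_PDPT_INDEX] = &m->ram_pd; /* ...except system RAM */
}
```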
Makes the code that defines stacks, and code referencing areas within the stack object, much clearer. Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
The PR does the following:
Major changes:
Minor changes:
Please read individual commit messages for more details.
This patch leaves the build-time page table generation alone; the new code doesn't care how the master page tables were created, so when we switch to runtime-generated page tables at boot, this code can remain the same.
Addresses #15135 on x86
Fixes: #13003
Contains groundwork necessary for: #13441 #13074 #15223