plat/kvm/arm: Rework early cache clean & invalidate #1049
Conversation
Marking as Draft to discuss PIE with @mogasergiu and get feedback from @kubanrob.
plat/kvm/arm/entry64.S
 * expensive to invalidate the whole cache. In this case, we
 * just need to invalidate what we are going to use:
 * DTB, TEXT, DATA, BSS, and bootstack.
#ifndef CONFIG_OPTIMIZE_PIE
Does `OPTIMIZE_PIE` imply that the Linux Boot Protocol is used? At a first glance this seems unrelated.
It's unrelated indeed (also notice this is an `ifndef`). I do not make an explicit check for `CONFIG_KVM_BOOT_PROTO_LXBOOT`, but rely on the MMU heuristic to also cover QEMU virt, and potentially other platforms that don't use the Linux boot protocol but exhibit the same behavior. Tbh I'm in two minds about this. On the one hand, implicit detection looks like a bad idea; on the other hand, it allows more flexibility. If we want to be on the safe side, I can make this conditional on `BOOT_PROTO_LXBOOT` OR `BOOT_PROTO_QEMU_VIRT`.
What do you think?
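For context, the MMU heuristic discussed here boils down to testing the M bit (bit 0) of `SCTLR_EL1`. Below is a minimal C sketch with a hypothetical helper name; the register value is passed in as a parameter for illustration, whereas the actual implementation is arm64 assembly that reads it directly (`mrs x0, sctlr_el1`):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SCTLR_EL1_M (UINT64_C(1) << 0) /* MMU enable bit of SCTLR_EL1 */

/* Returns true if the expensive full-image clean & invalidate can be
 * skipped: with the MMU off at entry, the boot protocol guarantees
 * that the loader cleaned the loaded image to the point of coherence.
 * Hypothetical helper for illustration only. */
static bool can_skip_full_clean(uint64_t sctlr_el1)
{
	return !(sctlr_el1 & SCTLR_EL1_M);
}
```

The heuristic is deliberately boot-protocol-agnostic: any loader that enters the kernel with the MMU disabled and a coherent image (LXBOOT, QEMU virt, or others) is covered without an explicit protocol check.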
Force-pushed from `324357d` to `8fe3016` (compare)
After discussing with @mogasergiu I am updating this to also have PIE rely on the bootloader to have the cache clean, but still unconditionally invalidate the entire image region.
Awesome optimizations! 💯
A few minor comments.
The arm64 Linux boot protocol provides the requirements for the system's state before jumping into the kernel [1]. Among these, it is required that upon entry:

- The MMU is off.
- The D-cache for the region corresponding to the loaded image has been cleaned to the point of coherence.

Skip the expensive cache clean & invalidate operations if the MMU is found to be disabled at boot. Although this heuristic is not strictly required, it provides some additional confidence that the bootloader behaves as expected. With a clean cache, additionally optimize cache invalidations by limiting them to the regions accessed before the MMU is enabled.

[1] https://www.kernel.org/doc/Documentation/arm64/booting.txt

Signed-off-by: Michalis Pappas <michalis@unikraft.io>
To maximize boot performance, we only invalidate the cache for regions accessed before enabling the MMU. This makes the clean & invalidate step of the entire image region after enabling the MMU redundant.

Signed-off-by: Michalis Pappas <michalis@unikraft.io>
The current implementation of clean and invalidate by region uses a barrier after every cache line. This is unnecessary and expensive. Use a barrier once, at the end of the operation.

Signed-off-by: Michalis Pappas <michalis@unikraft.io>
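The loop structure this commit describes can be sketched in C. The per-line operation and the barrier are stubbed out with instrumented placeholders (the real implementation is arm64 assembly using `dc civac` per line and a `dsb` barrier); the cache line size is assumed to be 64 bytes here, whereas the real code derives it from `CTR_EL0`:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 64 /* assumed; real code derives it from CTR_EL0 */

/* Instrumentation counters for this sketch (not in the real code). */
static unsigned long line_count;
static unsigned long barrier_count;

/* Stand-in for the per-line op, e.g. `dc civac, <va>` in the assembly. */
static void clean_invalidate_line(uintptr_t va)
{
	(void)va;
	line_count++;
}

/* Stand-in for the completion barrier, e.g. `dsb sy`. */
static void barrier(void)
{
	barrier_count++;
}

/* Clean & invalidate [start, start + len) by virtual address.
 * The barrier is issued once after the loop, instead of once per
 * cache line as the previous implementation did. */
static void clean_invalidate_range(uintptr_t start, size_t len)
{
	uintptr_t end = start + len;
	uintptr_t va = start & ~(uintptr_t)(CACHE_LINE_SIZE - 1);

	for (; va < end; va += CACHE_LINE_SIZE)
		clean_invalidate_line(va);

	barrier();
}
```

Hoisting the barrier out of the loop is safe because the individual per-line operations do not need to be ordered against each other, only completed as a whole before execution proceeds past the function.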
Force-pushed from `8fe3016` to `74ce86b` (compare)
Updated commit message, fixed conditional includes according to coding style based on comments from @mogasergiu. Replaced …
This has been successfully tested on QEMU both natively and emulated, and on Firecracker. @kubanrob besides the above open question, please let me know if you need to do any testing on your side, otherwise we're close to merge.
Right, that too, forgot about it, thanks for keeping it in mind! 🙏
It is very much appreciated that you took PIE into consideration. Thank you! 🛩️
Reviewed-by: Sergiu Moga <sergiu@unikraft.io>
Approved-by: Razvan Deaconescu <razvand@unikraft.io>
To maximize boot performance, we only invalidate the cache for regions accessed before enabling the MMU. This makes the clean & invalidate step of the entire image region after enabling the MMU redundant.

Signed-off-by: Michalis Pappas <michalis@unikraft.io>
Reviewed-by: Sergiu Moga <sergiu@unikraft.io>
Approved-by: Razvan Deaconescu <razvand@unikraft.io>
Tested-by: Unikraft CI <monkey@unikraft.io>
GitHub-Closes: #1049
The current implementation of clean and invalidate by region uses a barrier after every cache line. This is unnecessary and expensive. Use a barrier once, at the end of the operation.

Signed-off-by: Michalis Pappas <michalis@unikraft.io>
Reviewed-by: Sergiu Moga <sergiu@unikraft.io>
Approved-by: Razvan Deaconescu <razvand@unikraft.io>
Tested-by: Unikraft CI <monkey@unikraft.io>
GitHub-Closes: #1049
✅ Checkpatch passed

Beep boop! I ran Unikraft's `checkpatch.uk` on your commit series.

Prerequisite checklist

- [x] Run `checkpatch.uk` on your commit series before opening this PR

Base target

- Architecture(s): [arm64]
- Platform(s): [kvm]
- Application(s): [app-python3 or N/A]

Additional configuration

Description of changes
During boot the following cache operations are required:
The arm64 Linux boot protocol provides the requirements for the system's state before jumping into the kernel [1]. Among these, it is required that upon entry:

- The MMU is off.
- The D-cache for the region corresponding to the loaded image has been cleaned to the point of coherence.
These conditions provide an opportunity to optimize the existing boot code. Specifically, we can skip the extremely expensive clean & invalidate of the entire image on LXBOOT and QEMU virt, and furthermore limit cache invalidation to the regions accessed before enabling the MMU.
In addition to the above, optimize the existing clean & invalidate functionality by using a single barrier at the end of the operation.