plat/kvm/arm: Rework early cache clean & invalidate #1049

Closed

Conversation

michpappas
Member

Prerequisite checklist

  • Read the contribution guidelines regarding submitting new changes to the project;
  • Tested your changes against relevant architectures and platforms;
  • Ran the checkpatch.uk on your commit series before opening this PR;
  • Updated relevant documentation.

Base target

  • Architecture(s): [arm64]
  • Platform(s): [kvm]
  • Application(s): [e.g. app-python3 or N/A]

Additional configuration

Description of changes

During boot the following cache operations are required:

  • Clean the cache to the PoC so that dirty cache lines evicted later cannot overwrite subsequent writes to memory.
  • With the MMU / caches off, perform any necessary memory accesses.
  • Invalidate any memory accessed so that clean cache lines do not shadow what was previously written to memory.

The arm64 Linux boot protocol specifies the required system state before jumping into the kernel [1]. Among other things, it requires that upon entry:

  • The MMU is off
  • The D-cache for the region corresponding to the loaded image must be cleaned to the point of coherence.

These conditions provide an opportunity to optimize the existing boot code. Specifically, we can skip the extremely expensive clean & invalidate of the entire image on LXBOOT and QEMU virt, and furthermore limit subsequent cache invalidation to the regions accessed before enabling the MMU.
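
As a rough illustration of why limiting invalidation to the accessed regions is cheaper, the sketch below counts how many cache lines a by-line operation would touch for a set of regions. It is a model only, not the code in this PR: `struct early_region`, `lines_in_range` and `lines_in_regions` are hypothetical names, and the line size is passed in as a parameter rather than read from the hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor for a region touched before the MMU is enabled
 * (for example the DTB or the boot stack).
 */
struct early_region {
	uintptr_t base;
	size_t len;
};

/* Number of cache lines covering [base, base + len) for a power-of-two
 * line size. The start is rounded down and the end is rounded up so that
 * partially covered lines are counted too.
 */
static size_t lines_in_range(uintptr_t base, size_t len, size_t line)
{
	uintptr_t start, end;

	assert(line != 0 && (line & (line - 1)) == 0);
	if (len == 0)
		return 0;

	start = base & ~((uintptr_t)line - 1);
	end = (base + len + line - 1) & ~((uintptr_t)line - 1);
	return (end - start) / line;
}

/* Total number of by-line operations needed to cover all regions. */
static size_t lines_in_regions(const struct early_region *r, size_t n,
			       size_t line)
{
	size_t total = 0;
	size_t i;

	for (i = 0; i < n; i++)
		total += lines_in_range(r[i].base, r[i].len, line);
	return total;
}
```

Comparing `lines_in_regions()` for the few early regions against `lines_in_range()` for the whole image gives a feel for how many by-line operations the change avoids; the actual saving depends on the image layout and the cache line size of the target.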

In addition, optimize the existing clean and invalidate functionality by issuing a single barrier at the end of the operation instead of one after every cache line.
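
The barrier change can be modelled as follows. This is a minimal sketch under stated assumptions, not the implementation in this PR: the hardware operations are replaced by hypothetical hooks (`line_op`, `barrier`) so the two loop shapes can be compared on any host. On arm64 the hooks would correspond to a by-VA maintenance instruction such as `dc civac` and to a `dsb sy` barrier.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hooks standing in for the hardware operations. */
struct cache_ops {
	void (*line_op)(uintptr_t va, void *ctx);
	void (*barrier)(void *ctx);
	void *ctx;
};

/* Previous pattern: one barrier after every cache line. */
static void maint_range_barrier_per_line(const struct cache_ops *ops,
					 uintptr_t start, uintptr_t end,
					 size_t line)
{
	uintptr_t va;

	assert(line != 0 && (line & (line - 1)) == 0);
	for (va = start & ~((uintptr_t)line - 1); va < end; va += line) {
		ops->line_op(va, ops->ctx);
		ops->barrier(ops->ctx);
	}
}

/* Reworked pattern: issue all line operations, then a single barrier. */
static void maint_range_single_barrier(const struct cache_ops *ops,
				       uintptr_t start, uintptr_t end,
				       size_t line)
{
	uintptr_t va;

	assert(line != 0 && (line & (line - 1)) == 0);
	for (va = start & ~((uintptr_t)line - 1); va < end; va += line)
		ops->line_op(va, ops->ctx);
	ops->barrier(ops->ctx);
}

/* Counting stubs, used only to compare the two patterns. */
struct op_counts {
	size_t line_ops;
	size_t barriers;
};

static void count_line_op(uintptr_t va, void *ctx)
{
	(void)va;
	((struct op_counts *)ctx)->line_ops++;
}

static void count_barrier(void *ctx)
{
	((struct op_counts *)ctx)->barriers++;
}

/* Run one of the two patterns over [start, end) and return the counts. */
static struct op_counts run_counted(int per_line, uintptr_t start,
				    uintptr_t end, size_t line)
{
	struct op_counts c = { 0, 0 };
	struct cache_ops ops = { count_line_op, count_barrier, &c };

	if (per_line)
		maint_range_barrier_per_line(&ops, start, end, line);
	else
		maint_range_single_barrier(&ops, start, end, line);
	return c;
}
```

Both variants issue the same number of line operations; only the barrier count differs, growing with the size of the range in the first variant and staying at one in the second.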

@michpappas michpappas requested review from a team as code owners August 14, 2023 13:31
@michpappas michpappas marked this pull request as draft August 14, 2023 13:31
@michpappas
Member Author

Marking as Draft to discuss PIE with @mogasergiu and get feedback from @kubanrob.

@michpappas michpappas requested review from kubanrob and mogasergiu and removed request for a team August 14, 2023 13:33
@razvand razvand added kind/enhancement New feature or request area/plat Unikraft Patform plat/kvm Unikraft for KVM kind/proposal lang/c Issues or PRs to do with C/C++ arch/arm64 labels Aug 14, 2023
@razvand razvand added this to the v0.14.0 (Prometheus) milestone Aug 14, 2023
* expensive to invalidate the whole cache. In this case, just
* just need to invalidate what we are going to use:
* DTB, TEXT, DATA, BSS, and bootstack.
#ifndef CONFIG_OPTIMIZE_PIE
Contributor

Does OPTIMIZE_PIE imply that the Linux Boot Protocol is used? At first glance this seems unrelated.

Member Author

It's unrelated indeed (also notice this is an ifndef). I do not make an explicit check for CONFIG_KVM_BOOT_PROTO_LXBOOT but rely on the MMU heuristic to also cover QEMU virt, and potentially other platforms that don't use the Linux boot protocol but exhibit the same behavior. Tbh I'm in two minds about this. On the one hand, implicit detection looks like a bad idea; on the other hand, it allows more flexibility. If we want to be on the safe side I can make this conditional on BOOT_PROTO_LXBOOT OR BOOT_PROTO_QEMU_VIRT.
What do you think?
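
To make the two options concrete, here is a small model of the decision. It is a sketch under assumptions, not the code in entry64.S: the enum and function names are hypothetical, and the register value is passed in as a plain integer instead of being read with `mrs`. The only architectural fact it relies on is that bit 0 (M) of SCTLR_EL1 is the stage 1 MMU enable bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* SCTLR_EL1.M: stage 1 MMU enable for EL1 and EL0 */
#define SCTLR_EL1_M (UINT64_C(1) << 0)

/* Hypothetical names for the two early cache strategies discussed here. */
enum early_cache_policy {
	CACHE_CLEAN_AND_INVALIDATE_IMAGE, /* conservative: whole image */
	CACHE_INVALIDATE_ACCESSED_ONLY,   /* trust the loader's clean */
};

static bool mmu_is_off(uint64_t sctlr_el1)
{
	return (sctlr_el1 & SCTLR_EL1_M) == 0;
}

/* Implicit variant: any boot path that enters with the MMU off is
 * assumed to have cleaned the image to the PoC.
 */
static enum early_cache_policy policy_implicit(uint64_t sctlr_el1)
{
	return mmu_is_off(sctlr_el1) ? CACHE_INVALIDATE_ACCESSED_ONLY
				     : CACHE_CLEAN_AND_INVALIDATE_IMAGE;
}

/* Explicit variant: additionally require that the build selected a boot
 * protocol known to hand over a clean cache. In a real build this flag
 * would come from the configuration rather than a runtime argument.
 */
static enum early_cache_policy policy_explicit(uint64_t sctlr_el1,
					       bool boot_proto_known_clean)
{
	if (boot_proto_known_clean && mmu_is_off(sctlr_el1))
		return CACHE_INVALIDATE_ACCESSED_ONLY;
	return CACHE_CLEAN_AND_INVALIDATE_IMAGE;
}
```

The implicit variant covers loaders that are not listed in the configuration but behave the same way; the explicit variant falls back to the conservative path whenever the boot protocol is not one of the known ones.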

@michpappas michpappas force-pushed the arm64_rework_cache_invalidate branch from 324357d to 8fe3016 Compare August 15, 2023 09:20
@michpappas michpappas changed the title RFC plat/kvm/arm: Rework early cache clean & invalidate plat/kvm/arm: Rework early cache clean & invalidate Aug 15, 2023
@michpappas michpappas changed the title plat/kvm/arm: Rework early cache clean & invalidate RFC:plat/kvm/arm: Rework early cache clean & invalidate Aug 15, 2023
@michpappas
Member Author

After discussing with @mogasergiu, I am updating this so that PIE builds also rely on the bootloader having cleaned the cache, but still unconditionally invalidate the entire image region, as libukreloc does not invalidate individual cache lines right now.

@michpappas michpappas marked this pull request as ready for review August 15, 2023 14:18
@michpappas michpappas changed the title RFC:plat/kvm/arm: Rework early cache clean & invalidate plat/kvm/arm: Rework early cache clean & invalidate Aug 15, 2023
Member

@mogasergiu mogasergiu left a comment

Awesome optimizations! 💯
A few minor comments.

plat/kvm/arm/entry64.S (3 review comments, all resolved; 2 outdated)
The arm64 linux boot protocol provides the requirements for
the system's state before jumping into the kernel [1]. Among
these it is required that upon entry:

- The MMU is off
- The D-cache for the region corresponding to the loaded image
  must be cleaned to the point of coherence.

Skip expensive cache clean & invalidate operations if the MMU is
found to be disabled at boot. Although this heuristic is not
strictly required, it provides some additional confidence
that the bootloader behaves as expected.

With a clean cache, additionally optimize cache invalidations by
limiting them to the regions accessed before the MMU is enabled.

[1] https://www.kernel.org/doc/Documentation/arm64/booting.txt

Signed-off-by: Michalis Pappas <michalis@unikraft.io>
To maximize boot performance we only invalidate the cache for regions
accessed before enabling the MMU. This makes the clean & invalidate
step of the entire image region after enabling the MMU redundant.

Signed-off-by: Michalis Pappas <michalis@unikraft.io>
The current implementation of clean and invalidate by region uses
a barrier after every cache line. This is unnecessary and expensive.
Use a barrier once, at the end of the operation.

Signed-off-by: Michalis Pappas <michalis@unikraft.io>
@michpappas michpappas force-pushed the arm64_rework_cache_invalidate branch from 8fe3016 to 74ce86b Compare August 15, 2023 15:16
@michpappas
Member Author

Updated commit message, fixed conditional includes according to coding style based on comments from @mogasergiu. Replaced CONFIG_OPTIMIZE_PIE with CONFIG_LIBUKRELOC based on internal discussion with @skuenzer.

@michpappas
Member Author

This has been successfully tested on QEMU both natively and emulated, and on Firecracker. @kubanrob besides the above open question, please let me know if you need to do any testing on your side, otherwise we're close to merge.

@mogasergiu
Member

> Updated commit message, fixed conditional includes according to coding style based on comments from @mogasergiu. Replaced CONFIG_OPTIMIZE_PIE with CONFIG_LIBUKRELOC based on internal discussion with @skuenzer.

Right, that too, forgot about it, thanks for keeping it in mind! 🙏

Member

@mogasergiu mogasergiu left a comment

It is very much appreciated that you took PIE into consideration. Thank you! 🛩️

Reviewed-by: Sergiu Moga <sergiu@unikraft.io>

@razvand razvand requested a review from kubanrob August 15, 2023 18:09
Contributor

@razvand razvand left a comment

Approved-by: Razvan Deaconescu <razvand@unikraft.io>

unikraft-bot pushed a commit that referenced this pull request Aug 15, 2023
To maximize boot performance we only invalidate the cache for regions
accessed before enabling the MMU. This makes the clean & invalidate
step of the entire image region after enabling the MMU redundant.

Signed-off-by: Michalis Pappas <michalis@unikraft.io>
Reviewed-by: Sergiu Moga <sergiu@unikraft.io>
Approved-by: Razvan Deaconescu <razvand@unikraft.io>
Tested-by: Unikraft CI <monkey@unikraft.io>
GitHub-Closes: #1049
unikraft-bot pushed a commit that referenced this pull request Aug 15, 2023
The current implementation of clean and invalidate by region uses
a barrier after every cache line. This is unnecessary and expensive.
Use a barrier once, at the end of the operation.

Signed-off-by: Michalis Pappas <michalis@unikraft.io>
Reviewed-by: Sergiu Moga <sergiu@unikraft.io>
Approved-by: Razvan Deaconescu <razvand@unikraft.io>
Tested-by: Unikraft CI <monkey@unikraft.io>
GitHub-Closes: #1049
@unikraft-bot unikraft-bot added the ci/merged Merged by CI label Aug 15, 2023
@unikraft-bot
Member

Checkpatch passed

Beep boop! I ran Unikraft's checkpatch.pl support script on your pull request and it all looks good!

SHA commit checkpatch
cc88672 plat/kvm: Do not unconditionally clean and invalidate the cache
caba225 plat/kvm/arm: Do not clean & invalidate the cache after enabling the MMU
74ce86b plat/common/arm: Do not use a barrier on every invalidation

Labels
arch/arm64 area/plat Unikraft Patform ci/merged Merged by CI kind/enhancement New feature or request kind/proposal lang/c Issues or PRs to do with C/C++ plat/kvm Unikraft for KVM
Projects
Status: Done!
5 participants