[BPF] scale trampoline stride in policy programs #10602
Pull Request Overview
This PR adjusts the trampoline stride in policy programs to better handle long jumps when jit_harden is enabled, reducing the likelihood of jumps exceeding the allowed distance.
- Introduces an atomic field for caching the current trampoline stride and updates it dynamically when errors are encountered.
- Updates BPF policy program building and associated tests to use the new stride parameter.
- Adds error handling in the BPF system call code to detect potential JIT errors.
Reviewed Changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
File | Description |
---|---|
felix/dataplane/linux/bpf_ep_mgr.go | Introduces an atomic trampoline stride field and adjusts policy program loading with retry logic. |
felix/bpf/polprog/pol_prog_builder.go | Adds support for passing and storing the trampoline stride to the builder. |
felix/bpf/bpf_syscall.go | Updates error handling for loading BPF programs with potential JIT errors. |
felix/bpf/asm/asm_test.go | Updates tests to use the new TrampolineStrideDefault constant. |
felix/bpf/asm/asm.go | Replaces the hard-coded interval with the new trampoline stride and adds a setter with validation. |
When jit_harden is enabled, some long jumps may exceed the allowed distance of +/-15 bits once the kernel has rewritten the program. When this happens, we retry and recompile the policy programs with trampolines that make the jumps shorter. If that still fails, we retry with even denser trampolines. We reduce the stride by 1/4 with every retry to converge reasonably fast without overshooting too much. A reduction of about 1/2 (2 iterations) may be good enough for most programs. We remember the newly found stride for the given node so that we do not need to rediscover it for other policy programs; it is likely that many policies would face the same issue. Note that hardening already increases the number of executed instructions, so adding some extra trampolines probably has a much smaller effect than it would in a non-hardened program.
LGTM. I think it would be better to run the BPF UTs and FVs with jit_harden set to 2. WDYT?
Description
Related issues/PRs
refs bottlerocket-os/bottlerocket#4567
Todos
Release Note
Reminder for the reviewer
Make sure that this PR has the correct labels and milestone set.
Every PR needs one docs-* label.
- docs-pr-required: This change requires a change to the documentation that has not been completed yet.
- docs-completed: This change has all necessary documentation completed.
- docs-not-required: This change has no user-facing impact and requires no docs.
Every PR needs one release-note-* label.
- release-note-required: This PR has user-facing changes. Most PRs should have this label.
- release-note-not-required: This PR has no user-facing changes.
Other optional labels:
- cherry-pick-candidate: This PR should be cherry-picked to an earlier release. For bug fixes only.
- needs-operator-pr: This PR is related to install and requires a corresponding change to the operator.