[BPF] scale trampoline stride in policy programs #10602

Open
wants to merge 3 commits into base: master

tomastigera
Contributor

@tomastigera tomastigera commented Jun 25, 2025

Description

When jit_harden is enabled, some long jumps may exceed the allowed
distance of +/-15 bits (the jump offset is a signed 16-bit field) when
rewritten by the kernel. When this happens, we retry and recompile the
policy programs with trampolines that make the jumps shorter. If that
still fails, we retry with even denser trampolines.

We remember the newly found stride for the given node so that we do not
need to rediscover it for other policy programs. It is likely that
many policies would hit the same issue.

Note that hardening already increases the number of executed
instructions, so adding some extra trampolines probably has a much
smaller effect than in a non-hardened program.

Related issues/PRs

refs bottlerocket-os/bottlerocket#4567

Todos

  • Tests
  • Documentation
  • Release note

Release Note

ebpf: Fix loading of large policy programs when jit_harden is set, e.g. on Bottlerocket

Reminder for the reviewer

Make sure that this PR has the correct labels and milestone set.

Every PR needs one docs-* label.

  • docs-pr-required: This change requires a change to the documentation that has not been completed yet.
  • docs-completed: This change has all necessary documentation completed.
  • docs-not-required: This change has no user-facing impact and requires no docs.

Every PR needs one release-note-* label.

  • release-note-required: This PR has user-facing changes. Most PRs should have this label.
  • release-note-not-required: This PR has no user-facing changes.

Other optional labels:

  • cherry-pick-candidate: This PR should be cherry-picked to an earlier release. For bug fixes only.
  • needs-operator-pr: This PR is related to install and requires a corresponding change to the operator.

@Copilot Copilot AI review requested due to automatic review settings June 25, 2025 21:56
@tomastigera tomastigera requested a review from a team as a code owner June 25, 2025 21:56
@marvin-tigera marvin-tigera added this to the Calico v3.31.0 milestone Jun 25, 2025
@marvin-tigera marvin-tigera added release-note-required Change has user-facing impact (no matter how small) docs-pr-required Change is not yet documented labels Jun 25, 2025
Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

This PR adjusts the trampoline stride in policy programs to better handle long jumps when jit_harden is enabled, reducing the likelihood of jumps exceeding the allowed distance.

  • Introduces an atomic field for caching the current trampoline stride and updates it dynamically when errors are encountered.
  • Updates BPF policy program building and associated tests to use the new stride parameter.
  • Adds error handling in the BPF system call code to detect potential JIT errors.
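
To illustrate the first bullet, here is a minimal sketch of how a node-wide stride could be cached in an atomic field. The type and method names (`strideCache`, `Get`, `Lower`) and the numeric values are hypothetical and are not taken from the PR. Whether the actual change uses compare-and-swap or a plain store is not shown in this conversation.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// strideCache is a hypothetical holder for the trampoline stride that
// worked last time on this node. It is safe for concurrent use.
type strideCache struct {
	stride atomic.Int32
}

// newStrideCache returns a cache seeded with the initial stride.
func newStrideCache(initial int32) *strideCache {
	c := &strideCache{}
	c.stride.Store(initial)
	return c
}

// Get returns the stride that the next policy program should start with.
func (c *strideCache) Get() int32 {
	return c.stride.Load()
}

// Lower records a smaller stride. It never raises the stored value, even
// when several loaders race with each other.
func (c *strideCache) Lower(candidate int32) {
	for {
		cur := c.stride.Load()
		if candidate >= cur || c.stride.CompareAndSwap(cur, candidate) {
			return
		}
	}
}

func main() {
	cache := newStrideCache(16000) // assumed starting value
	cache.Lower(9000)              // a load succeeded only with a denser stride
	cache.Lower(12000)             // a stale, larger value is ignored
	fmt.Println(cache.Get())       // prints 9000
}
```

Only ever lowering the value means that a slow loader finishing late cannot undo a denser stride that another loader already found necessary.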

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| felix/dataplane/linux/bpf_ep_mgr.go | Introduces an atomic trampoline stride field and adjusts policy program loading with retry logic. |
| felix/bpf/polprog/pol_prog_builder.go | Adds support for passing and storing the trampoline stride to the builder. |
| felix/bpf/bpf_syscall.go | Updates error handling for loading BPF programs with potential JIT errors. |
| felix/bpf/asm/asm_test.go | Updates tests to use the new TrampolineStrideDefault constant. |
| felix/bpf/asm/asm.go | Replaces the hard-coded interval with the new trampoline stride and adds a setter with validation. |

@tomastigera tomastigera added docs-not-required Docs not required for this change and removed docs-pr-required Change is not yet documented labels Jun 25, 2025
@tomastigera tomastigera force-pushed the tomas-bpf-harden-fix branch from adc8169 to 992cd65 Compare June 26, 2025 21:34
When jit_harden is enabled, some long jumps may exceed the allowed
distance of +/-15 bits (the jump offset is a signed 16-bit field) when
rewritten by the kernel. When this happens, we retry and recompile the
policy programs with trampolines that make the jumps shorter. If that
still fails, we retry with even denser trampolines.

We reduce the stride by 1/4 with every retry to converge reasonably
fast without overshooting too much. A reduction by about 1/2 (2
iterations) may be good enough for most programs.

We remember the newly found stride for the given node so that we do not
need to rediscover it for other policy programs. It is likely that
many policies would hit the same issue.

Note that hardening already increases the number of executed
instructions, so adding some extra trampolines probably has a much
smaller effect than in a non-hardened program.
@tomastigera tomastigera force-pushed the tomas-bpf-harden-fix branch from 992cd65 to 720983c Compare June 27, 2025 17:13
@sridhartigera
Member

LGTM to me. I feel it will be better to run the bpf UTs, FVs with jit_harden set to 2. WDYT?

Labels
docs-not-required Docs not required for this change release-note-required Change has user-facing impact (no matter how small)
3 participants