Skip to content

SPMI: Handle BRANCH26 relocs similar to REL32 relocs #116340

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jun 6, 2025

Conversation

jakobbotsch
Copy link
Member

BRANCH26 relocs were handled by seeing if we can fit a relative offset to the actual target in the instruction, and if so, doing that. Otherwise we would hardcode a jump to the end of the code.

It turns out that outside macOS we never hit the former case, so we always hardcode a jump to the end of the code section. That makes diffing work outside macOS.

On macOS, it happens rarely that the target (that comes from the original SPMI collection) is actually within range of the code that was allocated during SPMI replay. When that happens we end up with a two different immediates provided to the instruction by the base and the diff compilers, and hence diffing fails.

Now, NearDiffer::compareOffsets actually has code that tries to compensate for relative offsets like this, by seeing if the absolute offset computed by "address + instrLen + immediate" is the same for the base and the diff compilers. However, it turns out that LLVM does not scale the immediate in the b/bl instruction representation. So for the above to work, it would actually need to be something like instrLen + immediate << 2. We could do that, but it seems better to align this with x64, which just skips all of the troubles by hardcoding the lower bits of the absolute offsets as if it was the relative offset.

BRANCH26 relocs were handled by seeing if we can fit a relative offset
to the actual target in the instruction, and if so, doing that.
Otherwise we would hardcode a jump to the end of the code.

It turns out that outside macOS we never hit the former case, so we
always hardcode a jump to the end of the code section. That makes
diffing work outside macOS.

On macOS, it happens rarely that the target (that comes from the
original SPMI collection) is actually within range of the code that was
allocated during SPMI replay. When that happens we end up with a two
different immediates provided to the instruction by the base and the
diff compilers, and hence diffing fails.

Now, `NearDiffer::compareOffsets` actually has code that tries to
compensate for relative offsets like this, by seeing if the absolute
offset computed by "address + instrLen + immediate" is the same for the
base and the diff compilers. However, it turns out that LLVM does not
scale the immediate in the b/bl instruction representation. So for the
above to work, it would actually need to be something like `instrLen +
immediate << 2`. We could do that, but it seems better to align this
with x64, which just skips all of the troubles by hardcoding the lower
bits of the absolute offsets as if it was the relative offset.
@Copilot Copilot AI review requested due to automatic review settings June 5, 2025 15:10
@github-actions github-actions bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Jun 5, 2025
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adjusts ARM64 BRANCH26 relocation handling to mimic x64’s IMAGE_REL_BASED_REL32 strategy by embedding the target’s lower bits directly, removing previous range checks and jump-stub fallback.

  • Removed delta calculation and FitsInRel28 check
  • Always call PutArm64Rel28 with the absolute target value
  • Added comment aligning behavior with x64 rel32 handling
Comments suppressed due to low confidence (1)

src/coreclr/tools/superpmi/superpmi-shared/compileresult.cpp:860

  • Passing tmp.target (an absolute address) directly to PutArm64Rel28 will embed the wrong immediate. You need to derive the 28-bit relative displacement or mask the lower bits of tmp.target (e.g., tmp.target & Rel28Mask) so the instruction encodes the correct branch offset.
PutArm64Rel28((UINT32*)address, (INT32)tmp.target);

@jakobbotsch jakobbotsch added area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI and removed needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels Jun 5, 2025
Copy link
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@jakobbotsch
Copy link
Member Author

cc @dotnet/jit-contrib PTAL @kunalspathak

There's a superpmi-diffs run over in #116299 that now shows no diffs (it was showing spurious diffs without this change).

@jakobbotsch jakobbotsch requested a review from kunalspathak June 5, 2025 16:03
@kunalspathak
Copy link
Member

do you know when/why it broke?

@kunalspathak
Copy link
Member

Ah, we found out because we have (will) start running differ natively on osx, i assume.

@jakobbotsch
Copy link
Member Author

Ah, we found out because we have (will) start running differ natively on osx, i assume.

Yes, I expect this has always been causing spurious diffs when running natively on macOS.

@jakobbotsch
Copy link
Member Author

@kunalspathak can you approve?

Copy link
Member

@kunalspathak kunalspathak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@kunalspathak kunalspathak merged commit 3c77442 into dotnet:main Jun 6, 2025
97 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants