plan9asm: infer target features for ISA-specific asm functions #8

Merged
cpunion merged 7 commits into xgo-dev:main from cpunion:fix/x86-target-feature-attrs on Mar 7, 2026

Conversation

cpunion (Collaborator) commented Mar 7, 2026

Summary

  • infer function-level target-features from ISA-specific asm instructions instead of emitting unused static attribute groups
  • cover x86 CRC32, PCLMULQDQ, SSSE3 PSHUFB, AESNI, and arm64 CRC instructions
  • add a stdlib hash/crc32 amd64 compile regression test

Why

hash/crc32/crc32_amd64.s was translated without attaching target-features to functions using llvm.x86.sse42.crc32.* / llvm.x86.pclmulqdq. On Ubuntu x86_64 with LLVM 19 this led to backend selection failures during llgo CI.

Validation

  • go test ./...
  • validated from llgo in Linux amd64 docker with this plan9asm tree wired in:
    • llgo test -timeout=20m github.com/goplus/llgo/test/std/archive/tar
    • llgo test -timeout=20m -tags testGC github.com/goplus/llgo/test/std/hash/crc32

@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request refactors how target-features are handled for ISA-specific assembly functions during LLVM IR generation. Instead of relying on predefined static attribute groups, the translator now infers the required target-features from the assembly instructions within each function. This resolves the backend selection failures seen in llgo CI for hash/crc32/crc32_amd64.s and ensures that the generated LLVM IR declares the CPU features each function needs.

Highlights

  • Dynamic Target Feature Inference: Implemented dynamic inference of target-features for ISA-specific assembly functions based on the instructions they contain, replacing static attribute groups.
  • Expanded Feature Coverage: Extended feature detection to cover x86 CRC32, PCLMULQDQ, SSSE3 PSHUFB, AESNI, and ARM64 CRC instructions.
  • Attribute Group Management: Introduced a new featureAttrRegistry to efficiently manage and emit unique target-features attribute groups as needed.
  • Regression Test Added: Added a new regression test for hash/crc32 on AMD64 to validate the compilation of standard library assembly with the newly inferred target features.
  • Removed Static Attributes: Removed hardcoded static target-features attribute groups from AMD64 and ARM64 translation logic, streamlining the code.
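
The registry described in the highlights can be pictured roughly as follows. This is a minimal sketch, not the PR's code: the type name registry, its methods ref and emit, and the constant attrBase are hypothetical, and the only thing taken from the PR is the idea of emitting one attribute group per distinct target-features string.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// attrBase is a hypothetical starting ID for generated attribute groups.
const attrBase = 200

// registry maps each distinct target-features string to one group ID.
type registry struct {
	ids map[string]int
}

func newRegistry() *registry { return &registry{ids: make(map[string]int)} }

// ref returns the "#N" reference for features, allocating an ID on first
// use. An empty feature string yields no reference.
func (r *registry) ref(features string) string {
	if features == "" {
		return ""
	}
	id, ok := r.ids[features]
	if !ok {
		id = attrBase + len(r.ids)
		r.ids[features] = id
	}
	return fmt.Sprintf("#%d", id)
}

// emit writes one attribute group definition per registered feature
// string, ordered by ID so the output is deterministic.
func (r *registry) emit() string {
	keys := make([]string, 0, len(r.ids))
	for k := range r.ids {
		keys = append(keys, k)
	}
	sort.Slice(keys, func(i, j int) bool { return r.ids[keys[i]] < r.ids[keys[j]] })
	var b strings.Builder
	for _, k := range keys {
		fmt.Fprintf(&b, "attributes #%d = { %q=%q }\n", r.ids[k], "target-features", k)
	}
	return b.String()
}

func main() {
	r := newRegistry()
	fmt.Println(r.ref("+crc32,+sse4.2"))
	fmt.Println(r.ref("+pclmul,+sse4.1"))
	fmt.Println(r.ref("+crc32,+sse4.2")) // reuses the first group
	fmt.Print(r.emit())
}
```

Because groups are only created on demand, a module with no ISA-specific functions would emit no target-features groups at all under this scheme.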

Changelog
  • amd64_translate.go
    • Removed hardcoded static attribute groups for SSE4.2 CRC32 and PCLMULQDQ + SSE4.1.
  • arm64_translate.go
    • Removed the hardcoded static attribute group for ARM64 CRC features.
  • feature_attrs.go
    • Added a new file defining featureAttrRegistry to manage unique target-features attributes.
    • Added inferFuncTargetFeatures to detect required ISA features from assembly instructions (e.g., CRC32, PCLMULQDQ, PSHUFB, AES instructions for AMD64, and CRC for ARM64).
  • stdlib_crc32_amd64_test.go
    • Added a new test case to compile hash/crc32/crc32_amd64.s using the new translation logic.
    • Verified that the generated LLVM IR contains the expected target-features attributes.
  • translate.go
    • Integrated the featureAttrRegistry and inferFuncTargetFeatures to dynamically assign target-features to functions based on their content.
    • Replaced the previous static approach for applying target-features.
Activity
  • The pull request was created by cpunion.

codecov Bot commented Mar 7, 2026

Codecov Report

❌ Patch coverage is 51.79856% with 402 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (main@a5f1e3b). Learn more about missing BASE report.

Files with missing lines      Patch %   Lines
amd64_lower_vec.go            48.14%    194 Missing and 72 partials ⚠️
cmd/plan9asmscan/main.go      39.31%    68 Missing and 3 partials ⚠️
amd64_ctx.go                  75.60%    10 Missing and 10 partials ⚠️
amd64_lower_fp.go             17.39%    18 Missing and 1 partial ⚠️
amd64_lower_branch.go         28.57%    9 Missing and 1 partial ⚠️
translate.go                  44.44%    5 Missing ⚠️
arm64_lower_arith.go          50.00%    4 Missing ⚠️
translate_module_direct.go    57.14%    2 Missing and 1 partial ⚠️
amd64_lower_mov.go            60.00%    1 Missing and 1 partial ⚠️
feature_attrs.go              96.00%    1 Missing and 1 partial ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main       #8   +/-   ##
=======================================
  Coverage        ?   32.44%           
=======================================
  Files           ?       41           
  Lines           ?    12640           
  Branches        ?        0           
=======================================
  Hits            ?     4101           
  Misses          ?     7806           
  Partials        ?      733           
Flag        Coverage Δ
unittests   32.44% <51.79%> (?)

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a mechanism to dynamically infer required CPU features from assembly instructions, which is a significant improvement over hardcoding them. The changes are well-structured, and the inclusion of a regression test is commendable. I have one suggestion to improve the implementation of the feature set for better performance and code clarity.

Comment thread feature_attrs.go
Comment on lines +43 to +89
func inferFuncTargetFeatures(arch Arch, fn Func) string {
	var featureSet []string
	add := func(features ...string) {
		for _, feature := range features {
			if feature == "" {
				continue
			}
			exists := false
			for _, v := range featureSet {
				if v == feature {
					exists = true
					break
				}
			}
			if !exists {
				featureSet = append(featureSet, feature)
			}
		}
	}

	for _, ins := range fn.Instrs {
		op := strings.ToUpper(string(ins.Op))
		switch arch {
		case ArchAMD64:
			switch {
			case strings.HasPrefix(op, "CRC32"):
				add("+crc32", "+sse4.2")
			case op == "PCLMULQDQ":
				add("+pclmul", "+sse4.1")
			case op == "PSHUFB" || op == "VPSHUFB":
				add("+ssse3")
			case op == "AESENC" || op == "AESENCLAST" || op == "AESDEC" || op == "AESDECLAST" || op == "AESIMC" || op == "AESKEYGENASSIST":
				add("+aes")
			}
		case ArchARM64:
			if strings.HasPrefix(op, "CRC32") {
				add("+crc")
			}
		}
	}

	if len(featureSet) == 0 {
		return ""
	}
	sort.Strings(featureSet)
	return strings.Join(featureSet, ",")
}

medium

The current implementation of inferFuncTargetFeatures uses a slice to store unique features, which leads to an inefficient O(n) lookup for each feature addition. A more idiomatic and performant approach in Go for handling sets of unique items is to use a map[string]struct{}. This improves the feature addition to an average of O(1) time complexity and makes the code cleaner.

func inferFuncTargetFeatures(arch Arch, fn Func) string {
	featureSet := make(map[string]struct{})
	add := func(features ...string) {
		for _, feature := range features {
			if feature != "" {
				featureSet[feature] = struct{}{}
			}
		}
	}

	for _, ins := range fn.Instrs {
		op := strings.ToUpper(string(ins.Op))
		switch arch {
		case ArchAMD64:
			switch {
			case strings.HasPrefix(op, "CRC32"):
				add("+crc32", "+sse4.2")
			case op == "PCLMULQDQ":
				add("+pclmul", "+sse4.1")
			case op == "PSHUFB" || op == "VPSHUFB":
				add("+ssse3")
			case op == "AESENC" || op == "AESENCLAST" || op == "AESDEC" || op == "AESDECLAST" || op == "AESIMC" || op == "AESKEYGENASSIST":
				add("+aes")
			}
		case ArchARM64:
			if strings.HasPrefix(op, "CRC32") {
				add("+crc")
			}
		}
	}

	if len(featureSet) == 0 {
		return ""
	}
	features := make([]string, 0, len(featureSet))
	for feature := range featureSet {
		features = append(features, feature)
	}
	sort.Strings(features)
	return strings.Join(features, ",")
}

Comment thread arm64_translate.go
@@ -27,7 +27,7 @@ func emitARM64Prelude(b *strings.Builder) {
	b.WriteString("\n")
	// Attribute group used by some functions to enable optional ISA features.
	// (Example: "+crc" for hash/crc32 arm64 fast paths.)

The comment on lines 28–29 describes behavior that was removed by this PR. attributes #0 is no longer emitted here — the comment now floats above a bare b.WriteString("\n") and is misleading. Consider removing it.

Comment thread feature_attrs.go
"strings"
)

const featureAttrBase = 200

featureAttrBase = 200 is a magic number with no rationale documented. The intent is presumably to avoid collisions with manually-assigned attribute group numbers (#0, #1, etc.), but there's no comment, and no enforcement. If a caller passes a sig.Attrs like "#200", LLVM will silently see two definitions of attributes #200 in the module (which is invalid IR). A short comment explaining the choice would help future contributors.
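
One way to add the missing enforcement is to reject manual attribute strings that reference the reserved range. The sketch below is an illustration only: checkManualAttrs is a hypothetical helper, and treating every ID at or above featureAttrBase as reserved for inferred groups is an assumption about the intent, not something the PR states.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// featureAttrBase matches the value in the PR. Reading it as the start of a
// range reserved for inferred groups is an assumption.
const featureAttrBase = 200

// attrRefRE finds attribute group references such as "#0" or "#200".
var attrRefRE = regexp.MustCompile(`#(\d+)`)

// checkManualAttrs (hypothetical) returns an error if a caller-supplied
// attribute string references a group ID in the reserved range.
func checkManualAttrs(attrs string) error {
	for _, m := range attrRefRE.FindAllStringSubmatch(attrs, -1) {
		id, err := strconv.Atoi(m[1])
		if err != nil {
			return fmt.Errorf("bad attribute group reference %q: %v", m[0], err)
		}
		if id >= featureAttrBase {
			return fmt.Errorf("attribute group #%d collides with the inferred range starting at #%d", id, featureAttrBase)
		}
	}
	return nil
}

func main() {
	fmt.Println(checkManualAttrs("#0"))
	fmt.Println(checkManualAttrs("nounwind #200"))
}
```

A check like this turns the silent duplicate definition into an error at translation time, before LLVM sees the module.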

Comment thread feature_attrs.go
				add("+pclmul", "+sse4.1")
			case op == "PSHUFB" || op == "VPSHUFB":
				add("+ssse3")
			case op == "AESENC" || op == "AESENCLAST" || op == "AESDEC" || op == "AESDECLAST" || op == "AESIMC" || op == "AESKEYGENASSIST":

SHA-NI instructions (SHA256MSG1, SHA256MSG2, SHA256RNDS2) appear to be handled in the lowering layer but are not covered here — a function using only SHA instructions would receive no target-features attribute, likely causing LLVM backend failures. Either add the +sha case or add a comment marking this as a known gap.
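
A table-driven variant makes a gap like this easier to spot and to close. The sketch below is not the PR's implementation: amd64OpFeatures and inferAMD64Features are hypothetical names, the AES rows are abbreviated, and the "+sha" flag assumes LLVM's x86 feature name for SHA-NI.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// amd64OpFeatures is a hypothetical lookup table from exact mnemonics to
// LLVM feature flags. The non-SHA rows follow the PR's switch (AES list
// abbreviated); the SHA rows are the addition suggested in the review.
var amd64OpFeatures = map[string][]string{
	"PCLMULQDQ":   {"+pclmul", "+sse4.1"},
	"PSHUFB":      {"+ssse3"},
	"VPSHUFB":     {"+ssse3"},
	"AESENC":      {"+aes"},
	"AESENCLAST":  {"+aes"},
	"SHA256MSG1":  {"+sha"},
	"SHA256MSG2":  {"+sha"},
	"SHA256RNDS2": {"+sha"},
}

// inferAMD64Features returns the sorted, comma-joined feature string for
// the given mnemonics, or "" when no ISA extension is needed.
func inferAMD64Features(ops []string) string {
	set := make(map[string]struct{})
	for _, op := range ops {
		op = strings.ToUpper(op)
		feats := amd64OpFeatures[op]
		// CRC32 has several width suffixes, so it is matched by prefix.
		if strings.HasPrefix(op, "CRC32") {
			feats = []string{"+crc32", "+sse4.2"}
		}
		for _, f := range feats {
			set[f] = struct{}{}
		}
	}
	out := make([]string, 0, len(set))
	for f := range set {
		out = append(out, f)
	}
	sort.Strings(out)
	return strings.Join(out, ",")
}

func main() {
	fmt.Println(inferAMD64Features([]string{"MOVQ", "SHA256RNDS2", "SHA256MSG1"}))
	fmt.Println(inferAMD64Features([]string{"CRC32Q", "SHA256MSG2"}))
}
```

With the mapping in one table, adding a new extension is a data change, and a test can iterate over the table to check that every lowered mnemonic has an entry.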

Comment thread feature_attrs.go
		}
	}

	for _, ins := range fn.Instrs {

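strings.ToUpper(string(ins.Op)) runs for every instruction in every function. If Op is a string-based type the conversion itself does not allocate, and ToUpper returns its input unchanged when there is nothing to uppercase, so an allocation only occurs for mnemonics that contain lowercase letters. Normalising Op to uppercase at parse time (or storing it normalised) would still remove the repeated scan and any per-instruction allocation.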
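
Normalising at parse time could look like the sketch below. Op and Instr here are hypothetical stand-ins for the parser's real types, and parseInstr and needsCRC are illustrative helpers, not functions from the repository.

```go
package main

import (
	"fmt"
	"strings"
)

// Op and Instr are hypothetical stand-ins for the parser's types.
type Op string

type Instr struct {
	Op Op
}

// parseInstr normalises the mnemonic once, when the instruction is created.
func parseInstr(mnemonic string) Instr {
	return Instr{Op: Op(strings.ToUpper(strings.TrimSpace(mnemonic)))}
}

// needsCRC compares against uppercase literals directly, with no
// per-instruction case conversion.
func needsCRC(instrs []Instr) bool {
	for _, ins := range instrs {
		if strings.HasPrefix(string(ins.Op), "CRC32") {
			return true
		}
	}
	return false
}

func main() {
	fn := []Instr{parseInstr("movq"), parseInstr("crc32q")}
	fmt.Println(fn[1].Op, needsCRC(fn))
}
```

Every later pass can then rely on the same invariant, which also removes the risk of one pass forgetting to uppercase before comparing.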

Comment thread translate.go
	if sig.Ret == "" {
		return "", fmt.Errorf("missing return type for %q", name)
	}
	if sig.Attrs == "" {

The guard sig.Attrs == "" means there's no way for a caller to explicitly opt out of feature inference while keeping an empty Attrs. An empty string from the caller and "no manual override" are indistinguishable. If a future caller wants a function with no attribute group, it can't express that. A sentinel value or an explicit DisableFeatureInference bool field on FuncSig would make the intent unambiguous.
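
The explicit opt-out could be expressed along these lines. This is a sketch under assumptions: the FuncSig shown here carries only the two fields relevant to the point, DisableFeatureInference is the field name proposed in the comment rather than an existing one, and resolveAttrs is a hypothetical helper.

```go
package main

import "fmt"

// FuncSig is a trimmed, hypothetical version of the real signature type.
type FuncSig struct {
	Attrs                   string // manual attribute reference, e.g. "#0"
	DisableFeatureInference bool   // proposed opt-out flag
}

// resolveAttrs picks the attribute reference for a function:
// a manual override wins, then the opt-out, then the inferred group.
func resolveAttrs(sig FuncSig, inferred string) string {
	switch {
	case sig.Attrs != "":
		return sig.Attrs
	case sig.DisableFeatureInference:
		return ""
	default:
		return inferred
	}
}

func main() {
	fmt.Printf("%q\n", resolveAttrs(FuncSig{}, "#200"))
	fmt.Printf("%q\n", resolveAttrs(FuncSig{Attrs: "#0"}, "#200"))
	fmt.Printf("%q\n", resolveAttrs(FuncSig{DisableFeatureInference: true}, "#200"))
}
```

A boolean keeps the zero value of FuncSig backward compatible: existing callers that leave both fields unset still get inference.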

	}
	if !strings.Contains(ll, `"target-features"="+pclmul,+sse4.1"`) {
		t.Fatalf("missing pclmul target-features attr:\n%s", ll)
	}

The test checks that the expected feature strings appear somewhere in the generated IR, but not that they're attached to the correct functions. A regression where both attribute sets are applied to every function would still pass. Consider asserting the per-function define line contains the expected #NNN reference, matching against what attrRegistry emits for each function.
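
A per-function check could resolve each define line's group reference back to its feature string. The sketch below is illustrative: funcFeatures and both regular expressions are hypothetical, the sample IR and its function names are made up, and the patterns assume one define per line ending in "#N {", which may not hold for all generated IR.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	// Matches: attributes #200 = { "target-features"="+crc32,+sse4.2" }
	attrGroupRE = regexp.MustCompile(`(?m)^attributes #(\d+) = \{[^\n]*"target-features"="([^"\n]*)"`)
	// Matches: define <ret> @name(<params>) ... #200 {
	defineRE = regexp.MustCompile(`(?m)^define\s[^@\n]*@("[^"\n]+"|[-\w.$]+)\([^\n]*#(\d+)\s*\{\s*$`)
)

// funcFeatures maps each defined function that references an attribute
// group to that group's target-features string.
func funcFeatures(ir string) map[string]string {
	groups := make(map[string]string)
	for _, m := range attrGroupRE.FindAllStringSubmatch(ir, -1) {
		groups[m[1]] = m[2]
	}
	out := make(map[string]string)
	for _, m := range defineRE.FindAllStringSubmatch(ir, -1) {
		out[strings.Trim(m[1], `"`)] = groups[m[2]]
	}
	return out
}

// sampleIR is made-up input for illustration only.
const sampleIR = `define i32 @crcFn(i32 %x) #200 {
  ret i32 %x
}

define void @clmulFn() #201 {
  ret void
}

define void @plain() {
  ret void
}

attributes #200 = { "target-features"="+crc32,+sse4.2" }
attributes #201 = { "target-features"="+pclmul,+sse4.1" }
`

func main() {
	m := funcFeatures(sampleIR)
	fmt.Println(m["crcFn"])
	fmt.Println(m["clmulFn"])
}
```

In a test, the resulting map can be compared against the expected features per function, so a regression that attaches every group to every function no longer passes.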

fennoai Bot commented Mar 7, 2026

Good approach — moving from static prelude attributes to per-function inference eliminates spurious attribute groups and fixes real CI failures. The core logic in feature_attrs.go is clean and correct for the covered cases. A few items worth addressing: SHA-NI instructions appear to be lowered but aren't covered by the inference (would cause LLVM failures on SHA-heavy code); the stale comment in arm64_translate.go should be removed; and the test asserts feature strings exist in the IR but not that they're bound to the right functions.

cpunion merged commit 822503d into xgo-dev:main on Mar 7, 2026
12 checks passed