plan9asm: infer target features for ISA-specific asm functions#8
Conversation
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request refactors how Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #8 +/- ##
=======================================
Coverage ? 32.44%
=======================================
Files ? 41
Lines ? 12640
Branches ? 0
=======================================
Hits ? 4101
Misses ? 7806
Partials ? 733
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Code Review
This pull request introduces a mechanism to dynamically infer required CPU features from assembly instructions, which is a significant improvement over hardcoding them. The changes are well-structured, and the inclusion of a regression test is commendable. I have one suggestion to improve the implementation of the feature set for better performance and code clarity.
| func inferFuncTargetFeatures(arch Arch, fn Func) string { | ||
| var featureSet []string | ||
| add := func(features ...string) { | ||
| for _, feature := range features { | ||
| if feature == "" { | ||
| continue | ||
| } | ||
| exists := false | ||
| for _, v := range featureSet { | ||
| if v == feature { | ||
| exists = true | ||
| break | ||
| } | ||
| } | ||
| if !exists { | ||
| featureSet = append(featureSet, feature) | ||
| } | ||
| } | ||
| } | ||
|
|
||
| for _, ins := range fn.Instrs { | ||
| op := strings.ToUpper(string(ins.Op)) | ||
| switch arch { | ||
| case ArchAMD64: | ||
| switch { | ||
| case strings.HasPrefix(op, "CRC32"): | ||
| add("+crc32", "+sse4.2") | ||
| case op == "PCLMULQDQ": | ||
| add("+pclmul", "+sse4.1") | ||
| case op == "PSHUFB" || op == "VPSHUFB": | ||
| add("+ssse3") | ||
| case op == "AESENC" || op == "AESENCLAST" || op == "AESDEC" || op == "AESDECLAST" || op == "AESIMC" || op == "AESKEYGENASSIST": | ||
| add("+aes") | ||
| } | ||
| case ArchARM64: | ||
| if strings.HasPrefix(op, "CRC32") { | ||
| add("+crc") | ||
| } | ||
| } | ||
| } | ||
|
|
||
| if len(featureSet) == 0 { | ||
| return "" | ||
| } | ||
| sort.Strings(featureSet) | ||
| return strings.Join(featureSet, ",") | ||
| } |
There was a problem hiding this comment.
The current implementation of inferFuncTargetFeatures uses a slice to store unique features, which leads to an inefficient O(n) lookup for each feature addition. A more idiomatic and performant approach in Go for handling sets of unique items is to use a map[string]struct{}. This improves the feature addition to an average of O(1) time complexity and makes the code cleaner.
func inferFuncTargetFeatures(arch Arch, fn Func) string {
featureSet := make(map[string]struct{})
add := func(features ...string) {
for _, feature := range features {
if feature != "" {
featureSet[feature] = struct{}{}
}
}
}
for _, ins := range fn.Instrs {
op := strings.ToUpper(string(ins.Op))
switch arch {
case ArchAMD64:
switch {
case strings.HasPrefix(op, "CRC32"):
add("+crc32", "+sse4.2")
case op == "PCLMULQDQ":
add("+pclmul", "+sse4.1")
case op == "PSHUFB" || op == "VPSHUFB":
add("+ssse3")
case op == "AESENC" || op == "AESENCLAST" || op == "AESDEC" || op == "AESDECLAST" || op == "AESIMC" || op == "AESKEYGENASSIST":
add("+aes")
}
case ArchARM64:
if strings.HasPrefix(op, "CRC32") {
add("+crc")
}
}
}
if len(featureSet) == 0 {
return ""
}
features := make([]string, 0, len(featureSet))
for feature := range featureSet {
features = append(features, feature)
}
sort.Strings(features)
return strings.Join(features, ",")
}| @@ -27,7 +27,7 @@ func emitARM64Prelude(b *strings.Builder) { | |||
| b.WriteString("\n") | |||
| // Attribute group used by some functions to enable optional ISA features. | |||
| // (Example: "+crc" for hash/crc32 arm64 fast paths.) | |||
There was a problem hiding this comment.
The comment on lines 28–29 describes behavior that was removed by this PR. attributes #0 is no longer emitted here — the comment now floats above a bare b.WriteString("\n") and is misleading. Consider removing it.
| "strings" | ||
| ) | ||
|
|
||
| const featureAttrBase = 200 |
There was a problem hiding this comment.
featureAttrBase = 200 is a magic number with no rationale documented. The intent is presumably to avoid collisions with manually-assigned attribute group numbers (#0, #1, etc.), but there's no comment, and no enforcement. If a caller passes a sig.Attrs like "#200", LLVM will silently see two definitions of attributes #200 in the module (which is invalid IR). A short comment explaining the choice would help future contributors.
| add("+pclmul", "+sse4.1") | ||
| case op == "PSHUFB" || op == "VPSHUFB": | ||
| add("+ssse3") | ||
| case op == "AESENC" || op == "AESENCLAST" || op == "AESDEC" || op == "AESDECLAST" || op == "AESIMC" || op == "AESKEYGENASSIST": |
There was a problem hiding this comment.
SHA-NI instructions (SHA256MSG1, SHA256MSG2, SHA256RNDS2) appear to be handled in the lowering layer but are not covered here — a function using only SHA instructions would receive no target-features attribute, likely causing LLVM backend failures. Either add the +sha case or add a comment marking this as a known gap.
| } | ||
| } | ||
|
|
||
| for _, ins := range fn.Instrs { |
There was a problem hiding this comment.
strings.ToUpper(string(ins.Op)) allocates two strings per instruction (type conversion + ToUpper). Since this is called for every instruction in every function, normalising Op to uppercase at parse time (or storing it normalised) would eliminate this per-instruction allocation cost.
| if sig.Ret == "" { | ||
| return "", fmt.Errorf("missing return type for %q", name) | ||
| } | ||
| if sig.Attrs == "" { |
There was a problem hiding this comment.
The guard sig.Attrs == "" means there's no way for a caller to explicitly opt out of feature inference while keeping an empty Attrs. An empty string from the caller and "no manual override" are indistinguishable. If a future caller wants a function with no attribute group, it can't express that. A sentinel value or an explicit DisableFeatureInference bool field on FuncSig would make the intent unambiguous.
| } | ||
| if !strings.Contains(ll, `"target-features"="+pclmul,+sse4.1"`) { | ||
| t.Fatalf("missing pclmul target-features attr:\n%s", ll) | ||
| } |
There was a problem hiding this comment.
The test checks that the expected feature strings appear somewhere in the generated IR, but not that they're attached to the correct functions. A regression where both attribute sets are applied to every function would still pass. Consider asserting the per-function define line contains the expected #NNN reference, matching against what attrRegistry emits for each function.
|
Good approach — moving from static prelude attributes to per-function inference eliminates spurious attribute groups and fixes real CI failures. The core logic in |
Summary
target-featuresfrom ISA-specific asm instructions instead of emitting unused static attribute groupshash/crc32amd64 compile regression testWhy
hash/crc32/crc32_amd64.swas translated without attachingtarget-featuresto functions usingllvm.x86.sse42.crc32.*/llvm.x86.pclmulqdq. On Ubuntu x86_64 with LLVM 19 this led to backend selection failures during llgo CI.Validation
go test ./...llgoin Linux amd64 docker with this plan9asm tree wired in:llgo test -timeout=20m github.com/goplus/llgo/test/std/archive/tarllgo test -timeout=20m -tags testGC github.com/goplus/llgo/test/std/hash/crc32