seed(elixir-pattern-match-refactor): SKLD-bench v2.1 challenge pool (120 challenges) by ty13r · Pull Request #15 · ty13r/skillforge

ty13r · 2026-04-11T13:55:36Z

SKLD-bench v2.1 challenge pool: elixir-pattern-match-refactor

Seventh and final family shipped this morning, following ecto-schema-changeset #9, ecto-query-writer #10, oban-worker #11, security-linter #12, ecto-sandbox-test #13, and phoenix-liveview #14.

Partial-state recovery note

The overnight drafting subagent completed the easy, medium, and hard tiers plus 15 fixtures and 12 goldens, but the Max subscription rate limit cut it off before it started the legendary tier. This PR completes the family with these hand-authored files:

family.json (with known_gaps declaration)
seed.json — gen 0 SkillGenome with 11 starter variants
evaluation/score.py — refactor-quality scorer
evaluation/criteria.json
evaluation/environment.yml
challenges/_calibration.json — generated post-hoc

Pool stats

Total challenges: 120 (rich curve target 150 — 80% of target)
Tier distribution: easy 35 / medium 47 / hard 38 / legendary 0 ⚠️
Held-out: 20 balanced across tiers
Capability coverage: 10 capabilities + 1 foundation = 11 dimensions
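
The arithmetic behind these figures can be re-derived in a few lines. This is only a sanity-check sketch: the variable names are illustrative and are not taken from `_calibration.json` or any other file in the PR.

```python
# Tier counts, target, and held-out size as reported in the pool stats above.
tiers = {"easy": 35, "medium": 47, "hard": 38, "legendary": 0}
RICH_TARGET = 150
HELD_OUT = 20

total = sum(tiers.values())
pct_of_target = 100 * total / RICH_TARGET
held_out_pct = 100 * HELD_OUT / total
empty_tiers = [name for name, count in tiers.items() if count == 0]

print(f"total={total}, {pct_of_target:.0f}% of target, held-out {held_out_pct:.1f}%")
print("empty tiers:", empty_tiers)
```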

⚠️ Known gap: legendary tier empty

The legendary tier has 0 of 30 target challenges. Drafting was cut off before any legendary challenges were written.

One orphaned golden/elixir-pattern-match-refactor-legendary-01.ex file exists (~75 lines of reference output), but it has no corresponding challenge JSON. Future augmentation can use it as a starting point.

Impact: 120 challenges across easy/medium/hard still provide substantial coverage across 11 dimensions. Champion fitness curves won't have a meaningful "legendary" anchor until augmented.

Per-capability primary coverage

| Capability | Primary | Notes |
| --- | --- | --- |
| with-expressions ⭐ | 17 | ✅ |
| refactor-philosophy (F) | 16 | ✅ |
| defensive-nil-checks-elimination ⭐ | 14 | ✅ |
| enum-vs-recursion-choice | 13 | ✅ |
| pipe-operator-flows | 12 | ✅ |
| guard-clauses | 10 | below 12 |
| recursive-functions | 9 | below 12 |
| function-head-pattern-matching | 8 | below 12 |
| binary-pattern-matching-basic | 8 | below 12 |
| map-and-struct-destructuring | 7 | below 12 |
| cond-and-if-reduction | 6 | below 12 |

⭐ = highest-priority capabilities per the research dossier. with-expressions is the bridge between pattern matching and error handling; defensive-nil-checks-elimination is the most-cited single complaint.

Six capabilities are below the rich target of 12 primary challenges per capability. All six still have coverage across the three populated tiers (easy, medium, hard). Augmentation is a follow-up.
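
The below-target list can be reproduced from the primary counts. This is a minimal sketch in which the dictionary is typed in by hand from the coverage figures; it does not read the actual challenge JSON files, whose schema is not shown in this PR.

```python
# Primary challenge counts per capability, copied from the coverage figures above.
RICH_MIN = 12
primary = {
    "with-expressions": 17,
    "refactor-philosophy": 16,
    "defensive-nil-checks-elimination": 14,
    "enum-vs-recursion-choice": 13,
    "pipe-operator-flows": 12,
    "guard-clauses": 10,
    "recursive-functions": 9,
    "function-head-pattern-matching": 8,
    "binary-pattern-matching-basic": 8,
    "map-and-struct-destructuring": 7,
    "cond-and-if-reduction": 6,
}

# Capabilities that need augmentation, with how many challenges each is short.
shortfall = {name: RICH_MIN - n for name, n in primary.items() if n < RICH_MIN}

for name, missing in shortfall.items():
    print(f"{name}: {primary[name]} primary, {missing} short of {RICH_MIN}")
```

The counts sum to the pool total of 120, which is consistent with each challenge having exactly one primary capability.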

Score.py validation

| Check | Result | Target | Status |
| --- | --- | --- | --- |
| Sanity (golden `easy-01` defensive-nil elimination) | 0.9593 | ≥0.9 | ✅ |
| Discrimination (`ruby_style_user_service` fixture) | 0.2703 | <0.7 (fail) | ✅ |

Discrimination headroom: 0.69 (the gap between the golden score and the fixture score, 0.9593 − 0.2703 ≈ 0.69). Excellent.
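
The two checks and the headroom figure amount to the comparison below. This is a sketch under the assumption that headroom means golden score minus fixture score, which matches the reported 0.69; the constant names are illustrative and not taken from score.py.

```python
# Thresholds and scores as reported in the validation results above.
SANITY_MIN = 0.9          # golden reference must score at least this
DISCRIMINATION_MAX = 0.7  # Ruby-style fixture must score below this

golden_score = 0.9593     # golden easy-01
fixture_score = 0.2703    # ruby_style_user_service fixture

sanity_ok = golden_score >= SANITY_MIN
discrimination_ok = fixture_score < DISCRIMINATION_MAX
headroom = golden_score - fixture_score  # assumed definition of "headroom"

print(f"sanity_ok={sanity_ok}, discrimination_ok={discrimination_ok}, headroom={headroom:.2f}")
```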

Score.py approach

This family scores refactor quality via structural counting rather than fixed substring matches:

Positive signals (rewarded):

Multi-clause function heads (same name, multiple def with different patterns)
|> pipe operator usage
with expressions
when guard clauses
Map/struct destructure in function heads (%User{id: id})
List head/tail patterns ([h | t])
Binary patterns (<<"prefix", rest::binary>>)
Enum.map / Enum.reduce / Enum.filter
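
As an illustration of what counting these signals with regular expressions might look like, here is a minimal sketch. It is not the code in evaluation/score.py: the function name, the patterns, and the Elixir sample are all assumptions made for this example, and real Elixir source has cases (comments, strings, macros) that these patterns ignore.

```python
import re
from collections import Counter

# Hypothetical idiomatic Elixir input, embedded as a string for the sketch.
ELIXIR_SAMPLE = '''
defmodule Greeter do
  def greet(%User{name: name}) when is_binary(name), do: "Hello, " <> name
  def greet(nil), do: "Hello, stranger"

  def total([]), do: 0
  def total([h | t]), do: h + total(t)

  def kind(<<"img_", _rest::binary>>), do: :image
  def kind(_), do: :other

  def names(users) do
    users
    |> Enum.filter(& &1.active)
    |> Enum.map(& &1.name)
  end
end
'''

def count_positive_signals(source: str) -> dict:
    """Count rough structural signals of pattern-matched Elixir (illustrative only)."""
    # Heads per function name; a name with several heads suggests multi-clause matching.
    heads = Counter(re.findall(r"^[ \t]*defp?\s+([a-z_][\w?!]*)", source, flags=re.MULTILINE))
    return {
        "multi_clause_functions": sum(1 for n in heads.values() if n > 1),
        "pipes": len(re.findall(r"\|>", source)),
        "with_exprs": len(re.findall(r"^[ \t]*with\s", source, flags=re.MULTILINE)),
        "guards": len(re.findall(r"\)\s+when\s", source)),
        "struct_destructures": len(re.findall(r"defp?\s+\w+\(.*%\w*\{", source)),
        "head_tail_patterns": len(re.findall(r"\[\s*\w+\s*\|\s*\w+\s*\]", source)),
        "binary_patterns": len(re.findall(r"<<.+?>>", source)),
        "enum_calls": len(re.findall(r"\bEnum\.(?:map|reduce|filter)\b", source)),
    }

signals = count_positive_signals(ELIXIR_SAMPLE)
print(signals)
```

Counting heads per name rather than matching fixed substrings is what lets a scorer of this kind reward any multi-clause refactor, whatever the function happens to be called.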

Anti-patterns (penalized):

if / case / cond keyword counts (total ≤2 for refactored output)
is_nil() defensive guards
x && x.field Ruby-style safe-nav pun
Intermediate temp/tmp/result variables breaking pipe flow
String.starts_with? / ends_with? instead of binary patterns
Complex function calls in guards
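
The anti-pattern side can be sketched the same way. Again, this is an assumption-laden illustration rather than the contents of score.py: the patterns and the Ruby-style Elixir sample are invented for the example, and the "complex function calls in guards" check is omitted because the PR does not say how it is detected.

```python
import re

# Hypothetical Ruby-style Elixir input of the kind the pool asks models to refactor.
RUBY_STYLE_SAMPLE = '''
def display_name(user) do
  if is_nil(user) do
    "anonymous"
  else
    result = user.profile && user.profile.name
    if String.starts_with?(result, "Dr") do
      result
    else
      "Mx " <> result
    end
  end
end
'''

def count_anti_patterns(source: str) -> dict:
    """Count rough signals of imperative, Ruby-style Elixir (illustrative only)."""
    return {
        # if / case / cond at the start of a line; the PR's budget is a total of 2 or fewer.
        "branch_keywords": len(re.findall(r"^[ \t]*(?:if|case|cond)\b", source, flags=re.MULTILINE)),
        "is_nil_checks": len(re.findall(r"\bis_nil\(", source)),
        # `x && x.field`: the same expression on both sides of &&, then a field access.
        "safe_nav_puns": len(re.findall(r"\b([\w.]+)\s*&&\s*\1\.\w+", source)),
        # Assignments to temp/tmp/result; (?!=) keeps `result == x` comparisons out.
        "temp_vars": len(re.findall(r"^[ \t]*(?:temp|tmp|result)\s*=(?!=)", source, flags=re.MULTILINE)),
        "string_prefix_suffix": len(re.findall(r"\bString\.(?:starts_with|ends_with)\?", source)),
    }

penalties = count_anti_patterns(RUBY_STYLE_SAMPLE)
print(penalties)
```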

Research provenance

38 citations across 11 capabilities in research.md. Key sources:

BoothIQ "150k lines of vibe-coded Elixir" post-mortem: "Claude writes Ruby-style Elixir — if/then/else chains, defensive nil-checking, early returns"
HN troupo: "writes Java even if it's Elixir"
HN dnautics: "case functioncall() do nil -> ... end instead of idiomatic if var = functioncall() do"
Elixir Forum Alex66: "Still correcting if/else chains that should be pattern matches"
Dashbit, José Valim on idiomatic Elixir

Tier methodology

Heuristic per SEEDING-PLAN.md item 4.

🤖 Generated with Claude Code

Authors the complete SKLD-bench v2.1 family for elixir-pattern-match-refactor per the workstream plan in taxonomy/elixir/SEEDING-PLAN.md. Seventh and FINAL family shipped this morning.

The drafting subagent authored 120 challenges + 15 test fixtures + 12 golden references before hitting the Max subscription rate limit BEFORE the legendary tier was written at all. This commit completes the family with hand-authored family.json, seed.json, score.py, criteria.json, environment.yml, and _calibration.json.

Pool stats:
- 120 total challenges (rich curve target 150)
- Tier distribution: 35 easy / 47 medium / 38 hard / 0 legendary
- 10 capabilities + 1 foundation = 11 dimensions covered
- 15 test fixtures, 12 golden references
- 20 challenges held out (~17% balanced across tiers)

Known gap: legendary tier has 0 challenges (target 30). The drafting agent completed easy/medium/hard tiers but was cut off before the legendary tier was started. One orphaned legendary golden reference file exists but has no corresponding challenge JSON. The family ships as-is because 120 challenges across easy/medium/hard already provide substantial evaluation coverage.

Per-capability primary counts (rich target 12-16):
- with-expressions: 17
- refactor-philosophy (foundation): 16
- defensive-nil-checks-elimination: 14
- enum-vs-recursion-choice: 13
- pipe-operator-flows: 12
- guard-clauses: 10 [below 12]
- recursive-functions: 9 [below 12]
- function-head-pattern-matching: 8 [below 12]
- binary-pattern-matching-basic: 8 [below 12]
- map-and-struct-destructuring: 7 [below 12]
- cond-and-if-reduction: 6 [below 12]

Score.py: regex-based structural scorer. Counts function heads per name (more = better, since it indicates multi-clause pattern matching), counts if/case/cond constructs (fewer = better), detects pipe usage, with expressions, defensive is_nil checks, Ruby-style `x && x.field` puns, intermediate temp vars breaking pipe flow.

Score.py validation:
- Sanity (golden easy-01, defensive-nil elimination): 0.9593 (above 0.9 target)
- Discrimination (ruby_style fixture): 0.2703 (well below 0.7 pass)
- Discrimination headroom: 0.69

This is the most-cited Elixir+Claude complaint (per research). The pool teaches Claude to write idiomatic Elixir by refactoring Ruby/Java-style imperative code into pattern-matched function heads, pipes, and with expressions.

Tier methodology: heuristic per SEEDING-PLAN.md item 4.

Research: 38 citations across 11 capabilities (see research.md).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Covers Phases 0-5 of PLAN-V2.1.3, the $53 API incident, Bible rewrite, and the 6-workstream frontend sprint (PR #36). Updates PROGRESS.md with frontend sprint completion entry. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

ty13r merged commit 95ec4ae into main Apr 11, 2026

ty13r deleted the seed/elixir-pattern-match-refactor branch April 11, 2026 13:55

ty13r mentioned this pull request Apr 11, 2026

fix(elixir-seeds): audit gaps across 5 families (goldens, fixtures, discrimination) #16

Merged

ty13r merged 1 commit into main from seed/elixir-pattern-match-refactor
