Authors the complete SKLD-bench v2.1 family for elixir-pattern-match-refactor per the workstream plan in taxonomy/elixir/SEEDING-PLAN.md. Seventh and FINAL family shipped this morning.

The drafting subagent authored 120 challenges + 15 test fixtures + 12 golden references before the Max subscription rate limit cut it off; none of the legendary tier had been written at that point. This commit completes the family with hand-authored family.json, seed.json, score.py, criteria.json, environment.yml, and _calibration.json.

Pool stats:
- 120 total challenges (rich curve target 150)
- Tier distribution: 35 easy / 47 medium / 38 hard / 0 legendary
- 10 capabilities + 1 foundation = 11 dimensions covered
- 15 test fixtures, 12 golden references
- 20 challenges held out (~17% balanced across tiers)

Known gap: legendary tier has 0 challenges (target 30). The drafting agent completed the easy/medium/hard tiers but was cut off before the legendary tier was started. One orphaned legendary golden reference file exists but has no corresponding challenge JSON. The family ships as-is because 120 challenges across easy/medium/hard already provide substantial evaluation coverage.

Per-capability primary counts (rich target 12-16):
- with-expressions: 17
- refactor-philosophy (foundation): 16
- defensive-nil-checks-elimination: 14
- enum-vs-recursion-choice: 13
- pipe-operator-flows: 12
- guard-clauses: 10 [below 12]
- recursive-functions: 9 [below 12]
- function-head-pattern-matching: 8 [below 12]
- binary-pattern-matching-basic: 8 [below 12]
- map-and-struct-destructuring: 7 [below 12]
- cond-and-if-reduction: 6 [below 12]

Score.py: regex-based structural scorer. Counts function heads per name (more = better, since multiple heads indicate multi-clause pattern matching), counts if/case/cond constructs (fewer = better), and detects pipe usage, with expressions, defensive is_nil checks, Ruby-style `x && x.field` puns, and intermediate temp vars that break pipe flow.

Score.py validation:
- Sanity (golden easy-01, defensive-nil elimination): 0.9593 (above 0.9 target)
- Discrimination (ruby_style fixture): 0.2703 (well below 0.7 pass)
- Discrimination headroom: 0.69

This is the most-cited Elixir+Claude complaint (per research). The pool teaches Claude to write idiomatic Elixir by refactoring Ruby/Java-style imperative code into pattern-matched function heads, pipes, and with expressions.

Tier methodology: heuristic per SEEDING-PLAN.md item 4.
Research: 38 citations across 11 capabilities (see research.md).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ty13r pushed a commit that referenced this pull request (Apr 13, 2026)
Covers Phases 0-5 of PLAN-V2.1.3, the $53 API incident, Bible rewrite, and the 6-workstream frontend sprint (PR #36). Updates PROGRESS.md with frontend sprint completion entry.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
SKLD-bench v2.1 challenge pool: elixir-pattern-match-refactor
Seventh and final family being shipped this morning, following ecto-schema-changeset #9, ecto-query-writer #10, oban-worker #11, security-linter #12, ecto-sandbox-test #13, and phoenix-liveview #14.
Partial-state recovery note
The overnight drafting subagent completed the easy/medium/hard tiers + 15 fixtures + 12 goldens before the Max subscription rate limit cut it off; the legendary tier was never started. This PR completes the family with hand-authored:
- family.json (with known_gaps declaration)
- seed.json: gen 0 SkillGenome with 11 starter variants
- evaluation/score.py: refactor-quality scorer
- evaluation/criteria.json
- evaluation/environment.yml
- challenges/_calibration.json: generated post-hoc
Pool stats
The legendary tier has 0 of 30 target challenges. Drafting was cut off before any legendary challenges were written.
One orphaned golden/elixir-pattern-match-refactor-legendary-01.ex file exists (~75 lines of reference output) but has no corresponding challenge JSON. Future augmentation can use it as a starting point.
Impact: 120 challenges across easy/medium/hard still provide substantial coverage across 11 dimensions. Champion fitness curves won't have a meaningful "legendary" anchor until augmented.
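As a quick cross-check of the pool figures, the arithmetic can be sketched as below. The numbers are copied from the commit message for this PR; the variable names are illustrative only and do not come from family.json or _calibration.json.

```python
# Figures copied from the pool stats in the commit message.
tiers = {"easy": 35, "medium": 47, "hard": 38, "legendary": 0}
rich_target = 150       # rich curve target for the whole pool
legendary_target = 30   # target for the legendary tier alone
held_out = 20           # challenges held out of the pool

total = sum(tiers.values())
shortfall = rich_target - total
held_out_share = held_out / total

print(f"total challenges: {total}")              # 120
print(f"shortfall vs rich target: {shortfall}")  # 30
print(f"held-out share: {held_out_share:.1%}")   # 16.7%
```

The shortfall against the rich curve target equals the missing legendary tier (30), so filling that tier alone would bring the pool to 150.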
Per-capability primary coverage
⭐ = highest-priority capabilities per the research dossier.
with-expressions is the bridge between pattern matching and error handling; defensive-nil-checks-elimination is the most-cited single complaint.
6 capabilities are below the 12-per-cap rich target. All are covered across the remaining 3 tiers. Augmentation is a follow-up.
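The coverage gap can be checked with a short sketch. The counts are the per-capability primary counts from the commit message; `RICH_MIN` and the `below` mapping are names chosen here for illustration, not identifiers from the repository.

```python
# Per-capability primary counts, copied from the commit message.
primary_counts = {
    "with-expressions": 17,
    "refactor-philosophy": 16,  # foundation dimension
    "defensive-nil-checks-elimination": 14,
    "enum-vs-recursion-choice": 13,
    "pipe-operator-flows": 12,
    "guard-clauses": 10,
    "recursive-functions": 9,
    "function-head-pattern-matching": 8,
    "binary-pattern-matching-basic": 8,
    "map-and-struct-destructuring": 7,
    "cond-and-if-reduction": 6,
}

RICH_MIN = 12  # lower bound of the 12-16 rich target

# Map each under-target capability to the number of challenges it is missing.
below = {name: RICH_MIN - n for name, n in primary_counts.items() if n < RICH_MIN}

for name, missing in below.items():
    print(f"{name}: needs {missing} more")
print(f"{len(below)} capabilities below target, {sum(below.values())} challenges short in total")
```

Under these figures, six capabilities fall short and 24 additional primary challenges would lift every capability to the lower bound of the rich target.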
Score.py validation
- Sanity (golden easy-01, defensive-nil elimination): 0.9593 (above 0.9 target)
- Discrimination (ruby_style_user_service fixture): 0.2703 (well below 0.7 pass)
Discrimination headroom: 0.69, excellent.
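The headroom figure is consistent with a simple difference between the two validation scores. The sketch below assumes that definition (golden score minus anti-pattern fixture score); the PR does not spell the formula out, and the constant names are illustrative.

```python
# Validation scores reported for score.py.
golden_score = 0.9593      # golden easy-01 reference
ruby_style_score = 0.2703  # ruby_style_user_service fixture

SANITY_TARGET = 0.9   # golden output should score above this
PASS_THRESHOLD = 0.7  # anti-pattern fixture should score below this

# Assumption: headroom = golden score - anti-pattern fixture score.
headroom = golden_score - ruby_style_score

print(f"sanity ok: {golden_score > SANITY_TARGET}")
print(f"discrimination ok: {ruby_style_score < PASS_THRESHOLD}")
print(f"headroom: {headroom:.2f}")  # 0.69
```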
Score.py approach
This family scores refactor quality via structural counting rather than fixed substring matches:
Positive signals (rewarded):
- Multiple function heads per name (def with different patterns)
- |> pipe operator usage
- with expressions
- when guard clauses
- Map and struct destructuring (%User{id: id})
- List head/tail patterns ([h | t])
- Binary pattern matching (<<"prefix", rest::binary>>)
- Enum.map / Enum.reduce / Enum.filter
Anti-patterns (penalized):
- if / case / cond keyword counts (total ≤2 for refactored output)
- is_nil() defensive guards
- x && x.field Ruby-style safe-nav pun
- String.starts_with? / ends_with? instead of binary patterns
Research provenance
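To make the structural-counting idea from the Score.py approach section concrete, here is a minimal sketch. It is not the actual evaluation/score.py: the regexes, the `structural_signals` and `toy_score` names, the equal weighting, and both Elixir snippets are assumptions made for illustration, and only a subset of the listed signals is covered.

```python
import re
from collections import Counter


def structural_signals(source: str) -> dict:
    """Count a few structural signals in Elixir source text using regexes."""
    # Function heads per name: lines starting with "def name" or "defp name".
    heads = Counter(
        re.findall(r"^[ \t]*defp?[ \t]+([a-z_][\w?!]*)", source, re.MULTILINE)
    )
    return {
        # Rewarded signals
        "multi_clause_functions": sum(1 for n in heads.values() if n > 1),
        "pipes": source.count("|>"),
        "with_expressions": len(re.findall(r"^[ \t]*with\s", source, re.MULTILINE)),
        "guards": len(re.findall(r"\bwhen\b", source)),
        # Penalized signals
        "branch_keywords": len(re.findall(r"\b(?:if|case|cond)\b", source)),
        "is_nil_checks": len(re.findall(r"\bis_nil\(", source)),
        "safe_nav_puns": len(re.findall(r"\b(\w+)\s*&&\s*\1\.\w+", source)),
    }


def toy_score(signals: dict) -> int:
    """Hypothetical equal-weight score: rewarded counts minus penalized counts."""
    rewarded = (
        signals["multi_clause_functions"]
        + signals["pipes"]
        + signals["with_expressions"]
        + signals["guards"]
    )
    penalized = (
        signals["branch_keywords"]
        + signals["is_nil_checks"]
        + signals["safe_nav_puns"]
    )
    return rewarded - penalized


# Illustrative inputs only; these are not fixtures from the challenge pool.
RUBY_STYLE = '''
def display_name(user) do
  if is_nil(user) do
    "anonymous"
  else
    if user && user.name do
      user.name
    else
      "unnamed"
    end
  end
end
'''

REFACTORED = '''
def display_name(nil), do: "anonymous"
def display_name(%{name: name}) when is_binary(name), do: name
def display_name(_user), do: "unnamed"
'''

ruby_signals = structural_signals(RUBY_STYLE)
refactored_signals = structural_signals(REFACTORED)
print(toy_score(ruby_signals), toy_score(refactored_signals))
```

Counting structure rather than matching fixed substrings means any refactor that moves branching into function heads and guards is rewarded, whatever names it uses. A regex approach like this sketch also counts keywords inside strings and comments, which is a known trade-off of not parsing the code.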
38 citations across 11 capabilities in research.md. Key sources:
- "... case functioncall() do nil -> ... end instead of idiomatic if var = functioncall() do"
Tier methodology
Heuristic per SEEDING-PLAN.md item 4.
🤖 Generated with Claude Code