Authors the complete SKLD-bench v2.1 family for elixir-ecto-schema-changeset per the workstream plan in taxonomy/elixir/SEEDING-PLAN.md. First of 7 families being shipped under the SKLD-bench overnight workstream.

Pool stats:
- 100 total challenges (binary curve target hit exactly)
- Tier distribution: 35 easy / 35 medium / 22 hard / 8 legendary
- 11 capabilities + 1 foundation = 12 dimensions covered
- 16 test fixtures, 12 golden references
- 20 challenges held out (~20%, balanced across tiers)

Capability primary-tag counts (target >=5 for binary):
- field-types-and-decimal: 14 (highest; carries the :decimal-not-:float iron law)
- embedded-schemas: 11
- associations: 10
- cast-and-allowed-fields: 9
- schema-organization (foundation): 9
- validations-basic: 8
- migrations: 7
- soft-deletes-and-timestamps: 7
- unique-constraints-and-indexes: 7
- validations-custom: 7
- polymorphic-associations: 6
- multi-tenant-schemas: 5

Score.py validation:
- Sanity check vs goldens: 0.86-1.0 (target >=0.9; one golden scores 0.86 after rebalanced weighting, but all are well above the 0.7 pass threshold)
- Discrimination check vs bad input: 0.36 (target <=0.3; a near-miss, but well below the 0.7 pass threshold)
- Empty file: 0.0
- Family-specific checks: money_not_float regex guard, no_is_admin_public_cast heuristic, unique_constraint/unique_index matching

Tier methodology: heuristic. Tiers assigned by drafting agent judgment per SEEDING-PLAN.md item 4. Empirical Haiku+Sonnet calibration is a deferred future workstream.

Research: 47 citations across 12 capabilities (see research.md).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This was referenced Apr 11, 2026
ty13r pushed a commit that referenced this pull request Apr 11, 2026
Captures the full SKLD-bench v2.1 authoring + audit + augment story:
- journal/012-skld-bench-authoring.md: 14-hour session narrative covering the overnight autonomous run, the Max rate-limit cut-off at 22-27 min, the morning recovery via sequential hand-authoring, the deep audit pass that discovered 9 cross-file consistency issues no structural validator could catch, and the legendary-tier augmentation PR.
- plans/PROGRESS.md: 4 new completed entries (seed shipping, audit fixes, legendary augment, journal entry), all dated 2026-04-11. No MVP checklist or Decisions Log changes: this workstream was content authoring, not new features requiring architectural decisions.
- CLAUDE.md: Current Status section updated to reflect v2.0 shipped, v2.1 content shipped (7 Elixir families, 867 challenges, PRs #9-#17), v2.1 plumbing pending. Key Reference Documents section now lists SPEC-V2.1, SEEDING-PLAN.md, and SCHEMAS.md. Plans & Progress section updated to point to PLAN-V2.1.md as the next active plan (pending write) and to demote PLAN-V2.0.md to shipped.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
SKLD-bench v2.1 challenge pool: elixir-ecto-schema-changeset
Per the SKLD-bench v2.1 workstream documented in taxonomy/elixir/SEEDING-PLAN.md, this PR ships the complete challenge pool, score script, fixtures, golden references, gen 0 seed, and per-capability research dossier for the elixir-ecto-schema-changeset family.

This is the first of 7 families being shipped under the SKLD-bench overnight workstream. The other 6 families are still in drafting and will arrive as separate PRs.
Pool stats
Capability coverage (primary-tagged)
The field-types-and-decimal capability is the highest-confidence iron law in the family: it carries the :decimal-not-:float rule for monetary fields named in BoothIQ's "ugly" post-mortem.
Score.py validation
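Both near-misses are small: the lowest golden score (0.86) is 0.04 below its >=0.9 target, and the bad-input score (0.36) is 0.06 above its <=0.3 target. Discrimination headroom from the lowest golden score to the bad-input score is 0.50, which is sufficient for fitness comparison. The score.py is functionally discriminating; both numeric targets can be tightened in a follow-up.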
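The arithmetic behind these figures can be checked with a short sketch. The numbers are the ones quoted in this PR; the constant and function names are illustrative assumptions and are not taken from score.py.

```python
# Figures quoted in this PR. Names are hypothetical, not from score.py.
GOLDEN_MIN = 0.86      # lowest score among the golden references
BAD_INPUT = 0.36       # score of the deliberately bad input
GOLDEN_TARGET = 0.90   # goldens should score at or above this
BAD_TARGET = 0.30      # bad input should score at or below this
PASS_THRESHOLD = 0.70  # pass/fail line used for fitness


def summarize():
    """Return the gaps to target and the golden-to-bad-input headroom."""
    return {
        "golden_gap": round(GOLDEN_TARGET - GOLDEN_MIN, 2),
        "bad_input_gap": round(BAD_INPUT - BAD_TARGET, 2),
        "headroom": round(GOLDEN_MIN - BAD_INPUT, 2),
        "golden_passes": GOLDEN_MIN >= PASS_THRESHOLD,
        "bad_input_fails": BAD_INPUT < PASS_THRESHOLD,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value}")
```

Both scores sit on the correct side of the 0.7 pass threshold, so the misses affect only the stricter calibration targets.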
Family-specific scoring checks:
- money_not_float regex guard (catches field :amount, :float and similar money-named fields)
- no_is_admin_public_cast heuristic (catches missing cast/3 allowlists for role/admin fields)
- unique_constraint/unique_index matching (catches mismatched changeset and migration constraints)
Research provenance
Per-capability research dossier at taxonomy/elixir/elixir-ecto-schema-changeset/research.md (47 citations across 12 capabilities). Key sources:
- :float for money clincher)
- oliver-kriska/claude-elixir-phoenix iron-law catalog
Tier methodology
Heuristic: tiers assigned by drafting agent judgment per the rubric in taxonomy/elixir/SEEDING-PLAN.md § Heuristic tier rubric. Empirical Haiku+Sonnet calibration is deferred as a future workstream (see SEEDING-PLAN.md item 4).
Files added
- family.json + seed.json + research.md
- test_fixtures/ (16 .ex files)
- golden/ (12 .ex files)
- challenges/{easy,medium,hard,legendary}/ (100 .json files)
- challenges/_calibration.json
- evaluation/{score.py,criteria.json,environment.yml}
Test plan
🤖 Generated with Claude Code