v1.2.0

@thettwe released this on 04 Apr 16:54 (commit 9afe2c9) · 250 commits to develop since this release

New Features

  • Statistical confusable gate: bigram-ratio detection at priority 24 (within the fast-path cutoff), reaching a 78% confusable detection rate without any ML models
  • MLP classifier for confusable/compound detection (F1=0.900) with ONNX export
  • CMS multi-signal scoring for confusable detection (ngram + collocation thresholds)
  • Neural reranker v2: 3-gate reranking with MLM logit wiring and a 19-feature MLP
  • Mandatory compounds expanded from 63 to 3,315 via template mining
  • Title/suffix compound detection layer
  • POS-based V+particle detection for broken compounds
  • 28 confusable pair benchmark sentences added
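
The notes do not spell out how the bigram-ratio gate scores a word. As a rough illustration only, a gate of this kind can compare how often the previous token is followed by the written word versus its confusable partner, and flag the word when the partner is much more frequent in that context. The function names, the add-alpha smoothing, the threshold of 5.0 and the English toy data are all assumptions for this sketch, not the project's implementation or data.

```python
from collections import Counter


def bigram_ratio(bigrams: Counter, prev: str, word: str, alt: str, alpha: float = 1.0) -> float:
    """Smoothed ratio count(prev, alt) / count(prev, word).

    Counter returns 0 for unseen bigrams; alpha (hypothetical) avoids division by zero.
    """
    return (bigrams[(prev, alt)] + alpha) / (bigrams[(prev, word)] + alpha)


def flag_confusables(tokens, confusable_pairs, bigrams, threshold=5.0):
    """Return (index, written, suggested) for tokens whose confusable partner is
    at least `threshold` times likelier after the previous token (hypothetical threshold)."""
    flags = []
    for i in range(1, len(tokens)):
        prev, word = tokens[i - 1], tokens[i]
        for alt in confusable_pairs.get(word, ()):
            if bigram_ratio(bigrams, prev, word, alt) >= threshold:
                flags.append((i, word, alt))
    return flags


if __name__ == "__main__":
    # Toy English counts standing in for real corpus bigram statistics.
    bigrams = Counter({("over", "there"): 40, ("over", "their"): 2})
    pairs = {"their": ["there"], "there": ["their"]}
    print(flag_confusables(["go", "over", "their"], pairs, bigrams))  # [(2, 'their', 'there')]
```

Because the gate only needs a couple of counter lookups per token, a check like this is cheap enough to run early in a pipeline, which is consistent with the notes placing it within the fast-path cutoff.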

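The notes do not expand "CMS" or list its thresholds, so the following is only a generic sketch of multi-signal scoring under stated assumptions: a candidate is flagged only when an n-gram signal and a collocation signal both clear their thresholds, with pointwise mutual information (PMI) used here as the collocation measure. The optional lower thresholds for curated pairs mirror the curated-pair threshold reduction listed under Improvements; the names `Signals` and `should_flag`, the default thresholds and the 0.5 scale factor are all hypothetical.

```python
import math
from dataclasses import dataclass


def pmi(pair_count: int, x_count: int, y_count: int, total: int) -> float:
    """Pointwise mutual information log(P(x, y) / (P(x) * P(y))) from raw corpus counts."""
    return math.log((pair_count * total) / (x_count * y_count))


@dataclass
class Signals:
    ngram_ratio: float  # smoothed n-gram count ratio, suggested word vs. written word
    pmi_gain: float     # PMI(context, suggested) minus PMI(context, written)


def should_flag(sig: Signals, curated: bool = False,
                ngram_threshold: float = 3.0, pmi_threshold: float = 1.0,
                curated_scale: float = 0.5) -> bool:
    """Flag only when both signals clear their thresholds.

    Curated confusable pairs get proportionally lower thresholds (hypothetical scale factor).
    """
    if curated:
        ngram_threshold *= curated_scale
        pmi_threshold *= curated_scale
    return sig.ngram_ratio >= ngram_threshold and sig.pmi_gain >= pmi_threshold


if __name__ == "__main__":
    # Hypothetical counts: (context, suggested) vs. (context, written) in a 10k-token corpus.
    gain = pmi(30, 100, 60, 10_000) - pmi(2, 100, 80, 10_000)
    sig = Signals(ngram_ratio=2.0, pmi_gain=gain)
    print(should_flag(sig), should_flag(sig, curated=True))  # False True
```

Requiring both signals to agree trades some recall for fewer false positives; lowering the thresholds only for a curated list limits that recall loss to pairs already known to be genuinely confusable.
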
Improvements

  • Error budget relaxed from a per-sentence skip to a guard that applies only to heavy-error sentences
  • CMS threshold reduction extended to curated confusable pairs
  • Dual MLP/LightGBM training pipelines with configurable MLM logit wiring
  • Added a confidence-gap flag to the benchmark for analysis
  • Pipeline hardening to reduce the false positive rate (FPR) on the expanded benchmark
  • 34 code review issues resolved across 24 files
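
The error-budget change is described in a single line, so this is one possible reading rather than the project's code: previously, any sentence whose count of flagged errors exceeded a small budget was skipped; now only sentences carrying a heavy error load are skipped. Both function names and the `budget` and `heavy_threshold` values are hypothetical.

```python
from typing import Iterable, List


def eligible_per_sentence_skip(error_counts: Iterable[int], budget: int = 1) -> List[bool]:
    """Stricter reading: any sentence whose flagged-error count exceeds the budget is skipped."""
    return [n <= budget for n in error_counts]


def eligible_heavy_error_guard(error_counts: Iterable[int], heavy_threshold: int = 4) -> List[bool]:
    """Relaxed reading: only sentences with a heavy error load (>= heavy_threshold) are skipped."""
    return [n < heavy_threshold for n in error_counts]


if __name__ == "__main__":
    counts = [0, 1, 2, 5]  # hypothetical per-sentence error counts
    print(eligible_per_sentence_skip(counts))  # [True, True, False, False]
    print(eligible_heavy_error_guard(counts))  # [True, True, True, False]
```

Under this reading, sentences with a few genuine errors stay eligible for further checks, which would account for additional true positives, while badly garbled sentences are still kept out to protect the false positive rate.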

Stats

  • Benchmark: 1,138 sentences (28 confusable)
  • Confusable detection: 18/23 (78%)
  • Overall true positives (TP): 506 (+38 over v1.1.0)
  • 4,758 tests, 75% coverage