v1.3.0

@thettwe thettwe released this 07 Apr 23:52
· 215 commits to develop since this release
5b24ef9

v1.3.0 — Candidate Fusion Pipeline

Highlights

  • Candidate fusion pipeline: Calibrated Noisy-OR fusion replaces single-winner selection. Every validation strategy can now fire at every position, and a confidence-based arbiter decides which errors to surface, which significantly reduces false positives.
  • Confidence gates: Word error suppression and context evidence guards filter out low-confidence results on structurally clean text.
  • Pipeline conflict resolution: Override matrix and candidate arbiter for resolving cross-strategy conflicts.
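
To make the fusion step concrete, here is a minimal sketch of a weighted Noisy-OR combination followed by a threshold arbiter. It is an illustration under stated assumptions, not the library's implementation: the `Candidate` type, the `noisy_or` and `arbitrate` functions, the strategy names, the weights, and the 0.5 threshold are all hypothetical.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class Candidate:
    """One strategy's vote that a position contains an error (hypothetical type)."""
    strategy: str
    confidence: float  # assumed to be a calibrated probability in [0, 1]


def noisy_or(candidates, reliability, default_weight=1.0):
    """Fuse votes as P(error) = 1 - prod(1 - w_i * p_i).

    Each vote is scaled by its strategy's reliability weight before fusion,
    so an unreliable strategy contributes less evidence.
    """
    return 1.0 - prod(
        1.0 - reliability.get(c.strategy, default_weight) * c.confidence
        for c in candidates
    )


def arbitrate(candidates, reliability, threshold=0.5):
    """Surface an error only when the fused probability clears the threshold."""
    return noisy_or(candidates, reliability) >= threshold


# Illustrative weights only; these are not values shipped with the project.
weights = {"ngram": 0.8, "confusable": 0.6}
votes = [Candidate("ngram", 0.5), Candidate("confusable", 0.5)]
fused = noisy_or(votes, weights)  # 1 - (1 - 0.4) * (1 - 0.3) = 0.58
```

With these numbers, neither vote alone clears the threshold (0.4 and 0.3 after weighting), but the two together do. That is the practical difference from single-winner selection: agreement between weak strategies counts as evidence.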

Added

  • StatisticalConfusableStrategy (priority 24): Bidirectional bigram ratio for detecting confusable word pairs in context.
  • ConfusableCompoundClassifierStrategy (priority 47): MLP-based compound word detection using ONNX.
  • Data-driven calibration: Per-strategy breakpoints and reliability weights loaded from YAML.
  • Error.source_strategy field tracks which strategy produced each error.
  • SpellCheckerBuilder.with_candidate_fusion() for fluent configuration.
  • Expanded confusable pairs from 37 to 87, mined from the production database.
  • Expanded grammar rules: additional auxiliary verbs, null-copula fix, grammar frequency guard.
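
One plausible reading of the "bidirectional bigram ratio" is sketched below: compare how often the written word and its confusable alternative occur next to the left neighbour and next to the right neighbour, and flag the word only when both directions favour the alternative. The function names, the add-k smoothing, and the `min_ratio` value are assumptions for illustration. English tokens are used only to keep the example readable.

```python
from collections import Counter


def bigram_ratio(counts, context, original, alternative, left, smoothing=1.0):
    """Smoothed count ratio of the alternative bigram to the original bigram.

    With left=True the context word precedes the target; otherwise it follows.
    """
    if left:
        alt_key, orig_key = (context, alternative), (context, original)
    else:
        alt_key, orig_key = (alternative, context), (original, context)
    return (counts[alt_key] + smoothing) / (counts[orig_key] + smoothing)


def prefers_alternative(counts, prev_word, word, next_word, alternative, min_ratio=5.0):
    """Flag `word` only if both neighbours favour `alternative` (hypothetical rule)."""
    left = bigram_ratio(counts, prev_word, word, alternative, left=True)
    right = bigram_ratio(counts, next_word, word, alternative, left=False)
    return left >= min_ratio and right >= min_ratio


# Toy bigram counts; a Counter returns 0 for unseen bigrams.
counts = Counter({
    ("over", "there"): 50,
    ("over", "their"): 1,
    ("there", "is"): 40,
})

flag_their = prefers_alternative(counts, "over", "their", "is", "there")
flag_there = prefers_alternative(counts, "over", "there", "is", "their")
```

Here `flag_their` is true because both ratios (51/2 and 41/1) exceed the threshold, while `flag_there` is false. Requiring agreement from both sides is what keeps a single unusual neighbour from triggering a correction.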

Changed

  • ConfigPresets.accurate() now enables candidate fusion by default.
  • NERConfig split into lightweight module for faster imports.
  • Lowered default strategy confidences and confusable thresholds for better recall.
  • BrokenCompound span extraction uses word positions directly instead of sentence.find().
  • N-gram strategy applies error correction to all context words.
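
The span extraction change matters whenever a word occurs more than once in a sentence, because `str.find()` always returns the first match. The sketch below contrasts the two approaches. Both helper functions are hypothetical, and the example assumes tokens are concatenated without separators.

```python
def spans_by_find(sentence, words):
    """Naive approach: look each word up in the sentence text."""
    spans = []
    for word in words:
        begin = sentence.find(word)  # always the FIRST occurrence
        spans.append((begin, begin + len(word)))
    return spans


def spans_by_position(words):
    """Accumulate offsets from the token sequence itself."""
    spans, offset = [], 0
    for word in words:
        spans.append((offset, offset + len(word)))
        offset += len(word)
    return spans


words = ["ab", "cd", "ab"]
sentence = "".join(words)

naive = spans_by_find(sentence, words)   # third span wrongly points at offset 0
direct = spans_by_position(words)        # third span points at offset 4
```

The naive version reports the repeated token at `(0, 2)` instead of `(4, 6)`, so an error on the second occurrence would be highlighted on the first.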

Fixed

  • ထင်ရှား no longer incorrectly flagged as aspiration error.
  • ကုမ်ပဏီ now correctly suggests ကုမ္ပဏီ instead of ကုန်ပဏီ.
  • Removed disputed ha-htoe correction for နူနာ per MLC dictionary.
  • CalibrationData loader handles invalid thresholds gracefully.
  • Narrowed overly broad exception catches in classifier and suggestion pipeline.
  • Neural reranker warns on missing stats file instead of silently using unnormalized features.
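
The loader and exception fixes share one pattern: catch only the failures you expect, log a warning, and fall back to a known default. The sketch below shows that pattern for a single threshold value. The `load_threshold` function, the logger name, and the default of 0.5 are assumptions, not the project's actual `CalibrationData` code.

```python
import logging
import math

logger = logging.getLogger("calibration_sketch")  # hypothetical logger name


def load_threshold(raw, default=0.5):
    """Return a threshold in [0, 1], falling back to `default` when invalid."""
    try:
        value = float(raw)
    except (TypeError, ValueError):
        # Narrow catch: only conversion failures, not every Exception.
        logger.warning("Invalid threshold %r; using default %s", raw, default)
        return default
    if math.isnan(value) or not 0.0 <= value <= 1.0:
        logger.warning("Threshold %r out of range; using default %s", raw, default)
        return default
    return value


parsed = [load_threshold(v) for v in ("0.7", "abc", None, 1.5, float("nan"))]
```

Only the first input survives validation. The other four fall back to the default and leave a warning in the log rather than raising or silently passing a bad value downstream.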