
Audit fixes: MLX chat template, Ollama export preflight, migration intent preservation, smol-135m warning #17

Merged

mfwolffe merged 15 commits into trunk from audit-fixes on May 3, 2026

Conversation

@espadonne
Contributor

Summary

Fixes the four P0 + four P1/P2 issues surfaced by the brutal-auditor pass over DLM+Sway. Verified end-to-end: a trained adapter (qwen2.5-coder-1.5b, finding-04 corpus) now round-trips through dlm prompt --backend auto (MLX, with chat template) → dlm export --target ollama (preflight clean, valid Modelfile) → ollama run (verbatim trained Fortran answer).

P0 — documented happy path was broken

  • MLX backend skipped the chat template. dlm prompt defaulted to --backend auto → MLX on Apple Silicon, and MlxBackend.generate fed the raw query to mlx_lm.generate. Trained INSTRUCTION adapters returned base-style completion noise on every default invocation. Fix: render through tokenizer.apply_chat_template before generation, mirroring format_chat_prompt on the PyTorch path; see the first sketch after this list. (Includes two cherry-picked commits from sprint/45 that fixed the upstream MLX adapter-binding bug; required for the chat-template fix to be verifiable e2e.)
  • Export preflight check_chat_template read the wrong file. Modern HF tokenizers (Qwen2.5+, Llama-3.x) write to a sibling chat_template.jinja, not inline in tokenizer_config.json. Fix: fall back to checking the sibling file (second sketch below).
  • Export preflight check_tokenizer_vocab used the wrong invariant. Strict equality between the BPE vocab (151643 for Qwen2.5) and GGUF tokens (151936) refused every Qwen-family export — the 293-token gap is reserved/special-token slots in the embedding matrix. Fix: include added_tokens in the count, and relax the runner check from != to > (only "tokenizer addresses indices the model has no embeddings for" is unsafe); see the third sketch below.
  • Modelfile emitted an invalid PARAMETER draft_model directive. draft_model is a runtime/API option in Ollama, not a Modelfile PARAMETER — ollama create failed with unknown parameter 'draft_model'. Fix: keep the suggested-pairing comment, drop the PARAMETER line, and document the OLLAMA_DRAFT_MODEL env-var path (fourth sketch below).
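A minimal sketch of the chat-template fix, assuming an HF-style tokenizer (which mlx-lm wraps); the render_prompt and mlx_generate helper names are illustrative, not the actual DLM internals:

```python
from mlx_lm import generate


def render_prompt(tokenizer, query: str) -> str:
    """Mirror format_chat_prompt on the PyTorch path: wrap the raw
    user query in the model's chat template before generation."""
    if getattr(tokenizer, "chat_template", None) is None:
        return query  # no template shipped; pass the raw query through
    return tokenizer.apply_chat_template(
        [{"role": "user", "content": query}],
        tokenize=False,
        add_generation_prompt=True,
    )


def mlx_generate(model, tokenizer, query: str) -> str:
    # Previously the raw `query` went straight to mlx_lm.generate,
    # which produced base-style completions from INSTRUCTION adapters.
    return generate(model, tokenizer, prompt=render_prompt(tokenizer, query))
```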
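The sibling-file fallback for the second fix, sketched against the standard HF model-directory layout (has_chat_template is a hypothetical name for the preflight helper):

```python
import json
from pathlib import Path


def has_chat_template(model_dir: Path) -> bool:
    # Older layout: template stored inline under the "chat_template"
    # key of tokenizer_config.json.
    config = model_dir / "tokenizer_config.json"
    if config.is_file() and json.loads(config.read_text()).get("chat_template"):
        return True
    # Modern HF tokenizers (Qwen2.5+, Llama-3.x) write the template to
    # a sibling file instead; fall back to checking for it.
    return (model_dir / "chat_template.jinja").is_file()
```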
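The relaxed vocab invariant, with illustrative parameter names and the Qwen2.5 numbers from the bullet above:

```python
def check_tokenizer_vocab(
    vocab_size: int, added_tokens: int, embedding_rows: int
) -> str | None:
    """Return an error message, or None if the export is safe."""
    effective = vocab_size + added_tokens  # 151643 + specials for Qwen2.5
    # A tokenizer with fewer tokens than embedding rows is fine: the
    # gap (293 rows for Qwen2.5's 151936-row matrix) is reserved and
    # special-token slots. Only a tokenizer that addresses rows the
    # model has no embeddings for is unsafe, hence `>` rather than `!=`.
    if effective > embedding_rows:
        return (
            f"tokenizer addresses {effective} token ids but the model "
            f"has only {embedding_rows} embedding rows"
        )
    return None
```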
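And the Modelfile emission change, as a hypothetical emitter (FROM and ADAPTER are real Modelfile directives; the helper and its comment format are illustrative):

```python
def modelfile_lines(base: str, adapter: str, draft: str | None) -> list[str]:
    lines = [f"FROM {base}", f"ADAPTER {adapter}"]
    if draft:
        # draft_model is a runtime/API option in Ollama, not a Modelfile
        # PARAMETER: `ollama create` rejects "PARAMETER draft_model".
        # Keep the suggested pairing as a comment and document the
        # OLLAMA_DRAFT_MODEL env-var path instead.
        lines.append(f"# suggested draft model: {draft} (see OLLAMA_DRAFT_MODEL)")
    return lines
```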

P1

  • Smol-135m had no warning despite a measured architectural floor. Audit-13 follow-up findings 02 + 05 showed every LoRA recipe on this base actively degrades general-chat capability; smollm2-135m had no warning field and no refusal hook. Fix: added capability_warning: str | None to BaseModelSpec, populated it on smollm2-135m citing the audit findings, and surfaced it at dlm train start (first sketch below).
  • Migration v1→v15 silently dropped user-tuned fields. A v1 doc with lora_r: 8 lost the explicit pin during migration because 8 happens to match the current schema default. CLAUDE.md calls v2-v11 "additive identity" — that contract was honored at the behavior level (default match) but not at the intent level (a future default change would silently override the user's pin). Fix: collect the post-migration dict's field paths and pass them to the serializer as force-emit overrides (second sketch below).
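A sketch of the warning plumbing, assuming a dataclass-style BaseModelSpec (the field name matches the bullet above; everything else is illustrative):

```python
from dataclasses import dataclass


@dataclass
class BaseModelSpec:
    name: str
    capability_warning: str | None = None  # None: no known capability floor


SMOLLM2_135M = BaseModelSpec(
    name="smollm2-135m",
    capability_warning=(
        "audit-13 follow-up findings 02+05: every measured LoRA recipe on "
        "this base degrades general-chat capability (architectural floor)"
    ),
)


def warn_on_train_start(spec: BaseModelSpec, console) -> None:
    # Surfaced at `dlm train` start; the console API is illustrative.
    if spec.capability_warning:
        console.warning(spec.capability_warning)
```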
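And the intent-preservation fix, sketched with a hypothetical apply_migrations chain; the point is that every field explicitly present after migration is force-emitted even when its value equals the current default:

```python
def collect_field_paths(doc: dict, prefix: str = "") -> set[str]:
    """Dotted paths of every leaf field in the post-migration dict."""
    paths: set[str] = set()
    for key, value in doc.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            paths |= collect_field_paths(value, f"{path}.")
        else:
            paths.add(path)
    return paths


def migrate_v1_to_v15(v1_doc: dict) -> tuple[dict, set[str]]:
    migrated = apply_migrations(v1_doc)  # hypothetical v2..v15 chain
    # Pass these paths to the serializer as force-emit overrides so an
    # explicit `lora_r: 8` survives even though 8 matches the v15
    # default, and a future default change cannot override the pin.
    return migrated, collect_field_paths(migrated)
```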

P2

  • dlm cache show reported 0 entries after a successful train. By design — the tokenized-section cache only fires for runs whose frontmatter declares training.sources (in-body sections go through TRL's tokenizer). The output gave no hint of this. Fix: surface "not used (doc has no training.sources directive)" or "disabled (training.cache.enabled = false)" when the cache is gated off (sketched after this list).
  • dlm doctor MLX line was a bare yes/no. With the chat-template fix landed, the original silent-failure mode is closed; the residual surface is "what is MLX for?". Fix: annotate the line as yes (prompt-only; default backend for dlm prompt on Apple Silicon).
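A sketch of the gating message, with an illustrative doc shape; count_entries is hypothetical:

```python
def cache_status_line(doc) -> str:
    if not getattr(doc.training, "sources", None):
        # In-body sections go through TRL's tokenizer; the cache never fires.
        return "not used (doc has no training.sources directive)"
    if not doc.training.cache.enabled:
        return "disabled (training.cache.enabled = false)"
    return f"{count_entries(doc)} entries"
```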

Test plan

  • pytest tests/unit/ — 4139 passed, 4 skipped (no regressions)
  • mypy --strict src/dlm — clean (300 source files)
  • ruff check src/ tests/ — clean
  • E2E: dlm prompt --backend auto on the audit-13 finding-04 adapter returns the verbatim trained Fortran answer
  • E2E: dlm export --target ollama --quant Q4_K_M --no-imatrix succeeds; ollama create accepts the Modelfile
  • E2E: ollama run dlm-01kqdwahnj7fd72eq4j4fxbj2v:v0002 returns the trained sorting-routine signature
  • Repro: v1 doc with explicit lora_r: 8, lora_alpha: 16, num_epochs: 3, seed: 42, export.default_quant: Q4_K_M migrates to v15 with all fields preserved (was: only learning_rate survived)

Audit context

Full audit report at /tmp/dlm-audit/REPORT.md (local-only); this branch addresses every flagged P0 and the P1/P2 issues that were code-fixable in scope. Verdict on the original audit's 9 promises moves from "Working tool, gated by two trust-killing P0 bugs" to "Working tool, documented happy path verified end-to-end."

mfwolffe added 15 commits May 1, 2026 23:52
Two bugs combined to make `dlm prompt --backend mlx` produce
base-model behavior even with a fully-trained PEFT LoRA adapter:

1. `target_modules` from PEFT is bare (`q_proj`), but mlx-lm's
   `linear_to_lora_layers` matches `named_modules()` keys inside
   each transformer block via exact equality. The FQN within a
   block is `self_attn.q_proj`, so no keys ever matched and
   `linear_to_lora_layers` silently left the model un-wrapped.

2. PEFT and mlx-lm use different LoRA tensor layouts:
   PEFT lora_A=[r,in], lora_B=[out,r]; mlx-lm lora_a=[in,r],
   lora_b=[r,out]. mlx-lm's `model.load_weights(strict=False)`
   silently skipped the mismatched shapes, leaving zero overlay.

The user-visible failure was "trained model behaves identically
to base" — surfaced during the audit-13 follow-up Finding 04
direct-query smoke test.
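A sketch of the two translations this commit describes; the helper names are illustrative, and real PEFT keys carry a longer module prefix than shown:

```python
ATTN = {"q_proj", "k_proj", "v_proj", "o_proj"}
MLP = {"gate_proj", "up_proj", "down_proj"}


def qualify(target: str) -> str:
    """Bare PEFT target ("q_proj") -> the in-block FQN that mlx-lm's
    linear_to_lora_layers matches by exact equality."""
    if target in ATTN:
        return f"self_attn.{target}"
    if target in MLP:
        return f"mlp.{target}"
    raise ValueError(f"cannot qualify target module {target!r}")


def convert_tensor(peft_key: str, tensor):
    """PEFT lora_A=[r,in] / lora_B=[out,r] -> mlx-lm lora_a=[in,r] /
    lora_b=[r,out]: rename the key and transpose the tensor, so
    load_weights(strict=False) no longer skips it on shape mismatch."""
    mlx_key = peft_key.replace("lora_A.weight", "lora_a").replace(
        "lora_B.weight", "lora_b"
    )
    return mlx_key, tensor.T
```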
Even with the conversion fix, an unconvertible adapter (architecture
whose layers don't follow the self_attn/mlp convention) would still
fall through to base-model output silently. Add a post-load guard
that walks the model's `trainable_parameters` and raises
`MlxConversionError` when zero `lora_a`/`lora_b` parameters are
present. Surfaces the failure as a clear message pointing at
`--backend pytorch` instead of letting the trained adapter behave
identically to the base.
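A sketch of that guard, assuming an mlx.nn model whose trainable_parameters() tree can be flattened with mlx.utils.tree_flatten; MlxConversionError is defined inline here for the sketch:

```python
from mlx.utils import tree_flatten


class MlxConversionError(RuntimeError):
    pass


def assert_lora_overlay(model) -> None:
    """Raise if the adapter load left zero LoRA parameters attached."""
    lora_keys = [
        name
        for name, _ in tree_flatten(model.trainable_parameters())
        if name.endswith(("lora_a", "lora_b"))
    ]
    if not lora_keys:
        raise MlxConversionError(
            "adapter loaded but no lora_a/lora_b parameters are present; "
            "the model would silently behave like the base model. "
            "Re-run with --backend pytorch."
        )
```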
mfwolffe merged commit 683f54c into trunk on May 3, 2026
5 checks passed