# Audit fixes: MLX chat template, Ollama export preflight, migration intent preservation, smol-135m warning (#17)
Two bugs combined to make `dlm prompt --backend mlx` produce base-model behavior even with a fully-trained PEFT LoRA adapter:

1. `target_modules` from PEFT is bare (`q_proj`), but mlx-lm's `linear_to_lora_layers` matches `named_modules()` keys inside each transformer block via exact equality. The FQN within a block is `self_attn.q_proj`, so no keys ever matched and `linear_to_lora_layers` silently left the model un-wrapped.
2. PEFT and mlx-lm use different LoRA tensor layouts: PEFT stores `lora_A` as `[r, in]` and `lora_B` as `[out, r]`; mlx-lm expects `lora_a` as `[in, r]` and `lora_b` as `[r, out]`. mlx-lm's `model.load_weights(strict=False)` silently skipped the mismatched shapes, leaving a zero overlay.

The user-visible failure was "trained model behaves identically to base" — surfaced during the audit-13 follow-up Finding 04 direct-query smoke test.
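For context, a minimal sketch of the layout conversion described in point 2 — the file format, key prefixes, and function name are assumptions for illustration, not the actual DLM code (which must also prefix `target_modules` with `self_attn.`/`mlp.` so they match `named_modules()` keys, per point 1):

```python
from safetensors.numpy import load_file, save_file

def convert_peft_lora_to_mlx(peft_file: str, out_file: str) -> None:
    """Illustrative PEFT -> mlx-lm LoRA layout conversion (not the real DLM helper)."""
    peft = load_file(peft_file)
    converted = {}
    for key, tensor in peft.items():
        # PEFT keys look roughly like:
        #   base_model.model.model.layers.0.self_attn.q_proj.lora_A.weight
        name = key.removeprefix("base_model.model.").removesuffix(".weight")
        if name.endswith("lora_A"):
            # PEFT lora_A is [r, in]; mlx-lm lora_a is [in, r] -> transpose.
            converted[name.replace("lora_A", "lora_a")] = tensor.T
        elif name.endswith("lora_B"):
            # PEFT lora_B is [out, r]; mlx-lm lora_b is [r, out] -> transpose.
            converted[name.replace("lora_B", "lora_b")] = tensor.T
    save_file(converted, out_file)
```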
Even with the conversion fix, an unconvertible adapter (one whose architecture's layers don't follow the `self_attn`/`mlp` convention) would still fall through to base-model output silently. Add a post-load guard that walks the model's `trainable_parameters` and raises `MlxConversionError` when zero `lora_a`/`lora_b` parameters are present. This surfaces the failure as a clear message pointing at `--backend pytorch` instead of letting the trained adapter behave identically to the base (sketched below).
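A minimal sketch of such a guard, assuming mlx's `tree_flatten` utility and the `lora_a`/`lora_b` parameter names mlx-lm uses; the function and error-message wording are illustrative:

```python
from mlx.utils import tree_flatten

class MlxConversionError(RuntimeError):
    """Raised when the converted adapter attached no LoRA parameters to the model."""

def assert_lora_overlay_present(model) -> None:
    # Flatten the nested trainable-parameter tree into (dotted_name, array) pairs.
    flat = tree_flatten(model.trainable_parameters())
    has_lora = any(name.endswith(("lora_a", "lora_b")) for name, _ in flat)
    if not has_lora:
        raise MlxConversionError(
            "Adapter conversion attached no lora_a/lora_b parameters; the trained "
            "adapter would be silently ignored on MLX. Use --backend pytorch instead."
        )
```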
## Summary
Fixes the four P0 + four P1/P2 issues surfaced by the brutal-auditor pass over DLM+Sway. Verified end-to-end: a trained adapter (`qwen2.5-coder-1.5b`, finding-04 corpus) now round-trips through `dlm prompt --backend auto` (MLX, with chat template) → `dlm export --target ollama` (preflight clean, valid Modelfile) → `ollama run` (verbatim trained Fortran answer).

## P0 — documented happy path was broken
- `dlm prompt` defaulted to `--backend auto` → MLX on Apple Silicon, and `MlxBackend.generate` fed the raw query to `mlx_lm.generate`. Trained INSTRUCTION adapters returned base-style completion noise on every default invocation. Fix: render through `tokenizer.apply_chat_template` before generation, mirroring `format_chat_prompt` on the PyTorch path (see the sketch after this list). (Includes two cherry-picked commits from sprint/45 that fixed the upstream MLX adapter-binding bug; required for the chat-template fix to be verifiable e2e.)
- `check_chat_template` read the wrong file. Modern HF tokenizers (Qwen2.5+, Llama-3.x) write to a sibling `chat_template.jinja`, not inline in `tokenizer_config.json`. Fix: fall back to checking the sibling file.
- `check_tokenizer_vocab` used the wrong invariant. Strict equality between the BPE vocab size (151643 for Qwen2.5) and the GGUF token count (151936) refused every Qwen-family export — the 293-token gap is reserved/special-token slots in the embedding matrix. Fix: include `added_tokens` in the count, and relax the runner check from `!=` to `>` (only "tokenizer addresses indices the model has no embeddings for" is unsafe).
- `PARAMETER draft_model` directive: `draft_model` is a runtime/API option in Ollama, not a Modelfile PARAMETER — `ollama create` failed with `unknown parameter 'draft_model'`. Fix: keep the suggested-pairing comment, drop the PARAMETER line, document the `OLLAMA_DRAFT_MODEL` env-var path.
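A minimal sketch of the corrected MLX prompt path, assuming mlx-lm's `load`/`generate` helpers and a chat-capable tokenizer; the real `MlxBackend.generate` differs in structure:

```python
from mlx_lm import load, generate

def mlx_prompt(model_path: str, adapter_path: str, query: str) -> str:
    # load() returns (model, tokenizer); adapter_path applies the LoRA overlay.
    model, tokenizer = load(model_path, adapter_path=adapter_path)
    # Render the query through the chat template instead of passing raw text,
    # mirroring format_chat_prompt on the PyTorch path.
    messages = [{"role": "user", "content": query}]
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    return generate(model, tokenizer, prompt=prompt, max_tokens=256)
```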
## P1

- Added `capability_warning: str | None` to `BaseModelSpec`, populated on smollm2-135m citing the audit findings, surfaced at `dlm train` start.
- `lora_r: 8` lost its explicit pin during migration because 8 happens to match the current schema default. CLAUDE.md calls v2–v11 "additive identity" — that contract was honored at behavior level (default match) but not at intent level (a future default change would silently override the user's pin). Fix: collect the post-migration dict's field paths and pass them to the serializer as force-emit overrides (sketched below).
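A minimal sketch of the force-emit idea; `collect_field_paths` is a hypothetical name, not the real DLM helper:

```python
def collect_field_paths(doc: dict, prefix: str = "") -> set[str]:
    """Record a dotted path for every field present after migration.

    Passing these to the serializer as force-emit overrides preserves
    explicit pins such as `lora_r: 8` even when the value equals the
    current schema default.
    """
    paths: set[str] = set()
    for key, value in doc.items():
        path = f"{prefix}{key}"
        paths.add(path)
        if isinstance(value, dict):
            paths |= collect_field_paths(value, prefix=f"{path}.")
    return paths

# collect_field_paths({"training": {"lora_r": 8}})
#   -> {"training", "training.lora_r"}
```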
## P2

- `dlm cache show` reported 0 entries after a successful train. By design — the tokenized-section cache only fires for runs whose frontmatter declares `training.sources` (in-body sections go through TRL's tokenizer) — but the output gave no hint of this. Fix: surface "not used (doc has no `training.sources` directive)" or "disabled (`training.cache.enabled = false`)" when the cache is gated off (see the sketch after this list).
- The `dlm doctor` MLX line was a bare yes/no. With the chat-template fix landed, the original silent-failure mode is closed; the residual surface is "what is MLX for?". Fix: annotate the line as `yes (prompt-only; default backend for dlm prompt on Apple Silicon)`.
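A minimal sketch of the gating message, assuming a dict-shaped frontmatter config; names and structure are illustrative only:

```python
def cache_status_line(frontmatter: dict, entry_count: int) -> str:
    training = frontmatter.get("training", {})
    if not training.get("sources"):
        # In-body sections are tokenized by TRL, so the section cache never fires.
        return "not used (doc has no training.sources directive)"
    if not training.get("cache", {}).get("enabled", True):
        return "disabled (training.cache.enabled = false)"
    return f"{entry_count} entries"
```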
## Test plan

- `pytest tests/unit/` — 4139 passed, 4 skipped (no regressions)
- `mypy --strict src/dlm` — clean (300 source files)
- `ruff check src/ tests/` — clean
- `dlm prompt --backend auto` on the audit-13 finding-04 adapter returns the verbatim trained Fortran answer
- `dlm export --target ollama --quant Q4_K_M --no-imatrix` succeeds; `ollama create` accepts the Modelfile
- `ollama run dlm-01kqdwahnj7fd72eq4j4fxbj2v:v0002` returns the trained sorting-routine signature
- A doc pinning `lora_r: 8`, `lora_alpha: 16`, `num_epochs: 3`, `seed: 42`, `export.default_quant: Q4_K_M` migrates to v15 with all fields preserved (was: only `learning_rate` survived)
## Audit context

Full audit report at `/tmp/dlm-audit/REPORT.md` (local-only); this branch addresses every flagged P0 and the P1/P2 issues that were code-fixable in scope. Verdict on the original audit's 9 promises moves from "Working tool, gated by two trust-killing P0 bugs" to "Working tool, documented happy path verified end-to-end."