# Audit fixes: MLX chat template, Ollama export preflight, migration intent preservation, smol-135m warning (#17)
Two bugs combined to make `dlm prompt --backend mlx` produce base-model behavior even with a fully-trained PEFT LoRA adapter:

1. `target_modules` from PEFT is bare (`q_proj`), but mlx-lm's `linear_to_lora_layers` matches `named_modules()` keys inside each transformer block via exact equality. The FQN within a block is `self_attn.q_proj`, so no keys ever matched and `linear_to_lora_layers` silently left the model un-wrapped.
2. PEFT and mlx-lm use different LoRA tensor layouts: PEFT stores `lora_A` as `[r, in]` and `lora_B` as `[out, r]`; mlx-lm expects `lora_a` as `[in, r]` and `lora_b` as `[r, out]`. mlx-lm's `model.load_weights(strict=False)` silently skipped the mismatched shapes, leaving a zero overlay.

The user-visible failure was "trained model behaves identically to base" — surfaced during the audit-13 follow-up Finding 04 direct-query smoke test.
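For context, a minimal sketch of the layout conversion described in point 2 — the file format, key prefixes, and function name are assumptions for illustration, not the actual DLM code (which must also prefix `target_modules` with `self_attn.`/`mlp.` so they match `named_modules()` keys, per point 1):

```python
from safetensors.numpy import load_file, save_file

def convert_peft_lora_to_mlx(peft_file: str, out_file: str) -> None:
    """Illustrative PEFT -> mlx-lm LoRA layout conversion (not the real DLM helper)."""
    peft = load_file(peft_file)
    converted = {}
    for key, tensor in peft.items():
        # PEFT keys look roughly like:
        #   base_model.model.model.layers.0.self_attn.q_proj.lora_A.weight
        name = key.removeprefix("base_model.model.").removesuffix(".weight")
        if name.endswith("lora_A"):
            # PEFT lora_A is [r, in]; mlx-lm lora_a is [in, r] -> transpose.
            converted[name.replace("lora_A", "lora_a")] = tensor.T
        elif name.endswith("lora_B"):
            # PEFT lora_B is [out, r]; mlx-lm lora_b is [r, out] -> transpose.
            converted[name.replace("lora_B", "lora_b")] = tensor.T
    save_file(converted, out_file)
```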
Even with the conversion fix, an unconvertible adapter (one whose architecture's layers don't follow the `self_attn`/`mlp` convention) would still fall through to base-model output silently. Add a post-load guard that walks the model's `trainable_parameters` and raises `MlxConversionError` when zero `lora_a`/`lora_b` parameters are present. This surfaces the failure as a clear message pointing at `--backend pytorch` instead of letting the trained adapter behave identically to the base (sketched below).
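A minimal sketch of such a guard, assuming mlx's `tree_flatten` utility and the `lora_a`/`lora_b` parameter names mlx-lm uses; the function and error-message wording are illustrative:

```python
from mlx.utils import tree_flatten

class MlxConversionError(RuntimeError):
    """Raised when the converted adapter attached no LoRA parameters to the model."""

def assert_lora_overlay_present(model) -> None:
    # Flatten the nested trainable-parameter tree into (dotted_name, array) pairs.
    flat = tree_flatten(model.trainable_parameters())
    has_lora = any(name.endswith(("lora_a", "lora_b")) for name, _ in flat)
    if not has_lora:
        raise MlxConversionError(
            "Adapter conversion attached no lora_a/lora_b parameters; the trained "
            "adapter would be silently ignored on MLX. Use --backend pytorch instead."
        )
```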
## Summary
Fixes the four P0 + four P1/P2 issues surfaced by the brutal-auditor pass over DLM+Sway. Verified end-to-end: a trained adapter (`qwen2.5-coder-1.5b`, finding-04 corpus) now round-trips through `dlm prompt --backend auto` (MLX, with chat template) → `dlm export --target ollama` (preflight clean, valid Modelfile) → `ollama run` (verbatim trained Fortran answer).

## P0 — documented happy path was broken
- `dlm prompt` defaulted to `--backend auto` → MLX on Apple Silicon, and `MlxBackend.generate` fed the raw query to `mlx_lm.generate`. Trained INSTRUCTION adapters returned base-style completion noise on every default invocation. Fix: render through `tokenizer.apply_chat_template` before generation, mirroring `format_chat_prompt` on the PyTorch path (see the sketch after this list). (Includes two cherry-picked commits from sprint/45 that fixed the upstream MLX adapter-binding bug; required for the chat-template fix to be verifiable e2e.)
- `check_chat_template` read the wrong file. Modern HF tokenizers (Qwen2.5+, Llama-3.x) write to a sibling `chat_template.jinja`, not inline in `tokenizer_config.json`. Fix: fall back to checking the sibling file.
- `check_tokenizer_vocab` used the wrong invariant. Strict equality between the BPE vocab size (151643 for Qwen2.5) and the GGUF token count (151936) refused every Qwen-family export — the 293-token gap is reserved/special-token slots in the embedding matrix. Fix: include `added_tokens` in the count, and relax the runner check from `!=` to `>` (only "tokenizer addresses indices the model has no embeddings for" is unsafe).
- `PARAMETER draft_model` directive: `draft_model` is a runtime/API option in Ollama, not a Modelfile PARAMETER — `ollama create` failed with `unknown parameter 'draft_model'`. Fix: keep the suggested-pairing comment, drop the PARAMETER line, document the `OLLAMA_DRAFT_MODEL` env-var path.
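A minimal sketch of the corrected MLX prompt path, assuming mlx-lm's `load`/`generate` helpers and a chat-capable tokenizer; the real `MlxBackend.generate` differs in structure:

```python
from mlx_lm import load, generate

def mlx_prompt(model_path: str, adapter_path: str, query: str) -> str:
    # load() returns (model, tokenizer); adapter_path applies the LoRA overlay.
    model, tokenizer = load(model_path, adapter_path=adapter_path)
    # Render the query through the chat template instead of passing raw text,
    # mirroring format_chat_prompt on the PyTorch path.
    messages = [{"role": "user", "content": query}]
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    return generate(model, tokenizer, prompt=prompt, max_tokens=256)
```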
## P1

- Added `capability_warning: str | None` to `BaseModelSpec`, populated on smollm2-135m citing the audit findings, surfaced at `dlm train` start.
- `lora_r: 8` lost its explicit pin during migration because 8 happens to match the current schema default. CLAUDE.md calls v2–v11 "additive identity" — that contract was honored at behavior level (default match) but not at intent level (a future default change would silently override the user's pin). Fix: collect the post-migration dict's field paths and pass them to the serializer as force-emit overrides (sketched below).
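A minimal sketch of the force-emit idea; `collect_field_paths` is a hypothetical name, not the real DLM helper:

```python
def collect_field_paths(doc: dict, prefix: str = "") -> set[str]:
    """Record a dotted path for every field present after migration.

    Passing these to the serializer as force-emit overrides preserves
    explicit pins such as `lora_r: 8` even when the value equals the
    current schema default.
    """
    paths: set[str] = set()
    for key, value in doc.items():
        path = f"{prefix}{key}"
        paths.add(path)
        if isinstance(value, dict):
            paths |= collect_field_paths(value, prefix=f"{path}.")
    return paths

# collect_field_paths({"training": {"lora_r": 8}})
#   -> {"training", "training.lora_r"}
```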
## P2

- `dlm cache show` reported 0 entries after a successful train. By design — the tokenized-section cache only fires for runs whose frontmatter declares `training.sources` (in-body sections go through TRL's tokenizer) — but the output gave no hint of this. Fix: surface "not used (doc has no `training.sources` directive)" or "disabled (`training.cache.enabled = false`)" when the cache is gated off (see the sketch after this list).
- The `dlm doctor` MLX line was a bare yes/no. With the chat-template fix landed, the original silent-failure mode is closed; the residual surface is "what is MLX for?". Fix: annotate the line as `yes (prompt-only; default backend for dlm prompt on Apple Silicon)`.
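A minimal sketch of the gating message, assuming a dict-shaped frontmatter config; names and structure are illustrative only:

```python
def cache_status_line(frontmatter: dict, entry_count: int) -> str:
    training = frontmatter.get("training", {})
    if not training.get("sources"):
        # In-body sections are tokenized by TRL, so the section cache never fires.
        return "not used (doc has no training.sources directive)"
    if not training.get("cache", {}).get("enabled", True):
        return "disabled (training.cache.enabled = false)"
    return f"{entry_count} entries"
```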
## Test plan

- `pytest tests/unit/` — 4139 passed, 4 skipped (no regressions)
- `mypy --strict src/dlm` — clean (300 source files)
- `ruff check src/ tests/` — clean
- `dlm prompt --backend auto` on the audit-13 finding-04 adapter returns the verbatim trained Fortran answer
- `dlm export --target ollama --quant Q4_K_M --no-imatrix` succeeds; `ollama create` accepts the Modelfile
- `ollama run dlm-01kqdwahnj7fd72eq4j4fxbj2v:v0002` returns the trained sorting-routine signature
- A doc pinning `lora_r: 8`, `lora_alpha: 16`, `num_epochs: 3`, `seed: 42`, `export.default_quant: Q4_K_M` migrates to v15 with all fields preserved (was: only `learning_rate` survived)
## Audit context

Full audit report at `/tmp/dlm-audit/REPORT.md` (local-only); this branch addresses every flagged P0 and the P1/P2 issues that were code-fixable in scope. Verdict on the original audit's 9 promises moves from "Working tool, gated by two trust-killing P0 bugs" to "Working tool, documented happy path verified end-to-end."