v1.7.0 — custom encoder framework + DeepSeek-V4-Flash + Granite 4.1 8B recipes
Custom encoder framework (#52)
Models that ship a Python encoding_*.py instead of a Jinja chat_template are now first-class. Two new ModelConfig fields, both None-defaulted so existing recipes keep working unchanged:
custom_encoder_module— filesystem path to a Python module exportingencode_messages(messages: list[dict], **kw) -> str.custom_encoder_kwargs— kwargs forwarded toencode_messages(e.g.{thinking_mode = "non-thinking"}for DeepSeek-V4).
On engine init, abliterix imports the module and monkey-patches tokenizer.apply_chat_template so the rest of the pipeline (tokenisation, hidden-state extraction, steering) stays unchanged. Fails fast on a missing file or missing encode_messages symbol.
DeepSeek-V4-Flash recipe (#52)
DSV4-Flash (284B / 13B active, 256 routed experts top-6, 43 layers) end-to-end:
configs/deepseek_v4_flash.toml— HF + EGA path, 4× B200 sizing (165 GiB/card cap perb200_real_vram.md),experts_implementation="eager"perblackwell_grouped_mm_eager.md.quick_start/deploy_dsv4_flash.sh— pod prereq checks, resolves BF16 source (prefersunsloth/DeepSeek-V4-Flash, falls back todeepseek-ai/DeepSeek-V4-Flash+ offline dequant), pre-flight 1-token smoke test, launch.quick_start/_dsv4_dequant_fp4_experts.py— FP4 expert dequant helper. Strategy 1: trust the modeling code (Mxfp4Config(dequantize=True)). Strategy 2: manual NVFP4 e2m1 unpack on safetensors shards. Strategy 3: diagnostic dump.quick_start/probe_dsv4_residual.py— mHC residual sanity probe answering one go/no-go question (does Sinkhorn-iterated residual mixing preserve a clean rank-1 refusal direction?) before burning 50 trials of optimiser budget. Reports per-layer residual delta, SVD top-1 concentration ratio, and consecutive-layer cosine drift with a reference profile from MiniMax-M2.7.
Engine-level detection: when transformers' AutoConfig reports quant_method="fp8" AND expert_dtype="fp4", abliterix prints a yellow warning pointing the user at the pre-dequant path (the hybrid layout can't be dequanted in-memory and EGA needs writable BF16 expert tensors).
Why HF + BF16 + DIRECT + EGA, not vLLM (verified 2026-05-05): vLLM 0.20.0+ registers DeepseekV4ForCausalLM and runs inference natively in FP8+FP4 (vllm-project/vllm#40760), but its Unsupported list excludes LoRA, EP, PP, in-place weight editing, AND hooks — every abliterix steering primitive. HF on BF16-dequanted weights remains the only working path.
Granite 4.1 8B recipes (#53)
Six BF16 LoRA configs for ibm-granite/granite-4.1-8b (dense GraniteForCausalLM, 40 layers, Apache-2.0) covering the V1 broad sweep + five quality-tuned Pareto-frontier strategies:
granite4.1_8b.toml— baseline 50-trial mean directiongranite4.1_8b_lowkl.toml— narrow search around V1's trial-7 Pareto point (13/200 refusals @ KL 0.3185)granite4.1_8b_klfirst.toml— KL-first SRA below the refusal cliff (residual-output path only)granite4.1_8b_sra_quality.toml— SRA (Surgical Refusal Ablation) over the mean signalgranite4.1_8b_ot_quality.toml— optimal_transport direction with k/v pruninggranite4.1_8b_cosmic_quality.toml— COSMIC cosine-similarity direction selection
quick_start/deploy_granite41_8b.sh runs on a single GPU (≥24 GiB VRAM, 48/80 GiB recommended) with GPU/disk/dataset pre-flight checks before installing deps.
Housekeeping
- Fix: stale
scripts/dequant_dsv4_fp4.pypath in the DSV4 hybrid-quant warning corrected toquick_start/_dsv4_dequant_fp4_experts.py. .gitignore: added/.scratch/for one-off scratch dirs.
Full Changelog: v1.6.0...v1.7.0