Merged
Co-authored-by: Guojun Chen <gjchen@live.com>
Co-authored-by: Yuxiao Guo <yuxiao.guo@outlook.com>
Co-authored-by: Yuqing Xia <Xiayuqing0622@outlook.com>
Co-authored-by: Jilong Xue <xuejilong@gmail.com>
Co-authored-by: Lingxiao Ma <xysmlx@gmail.com>
Co-authored-by: Liu Heng <18821707235@163.com>
Co-authored-by: Zheng QiHang <zhengqihang0915@qq.com>
xiayuqing0622
approved these changes
Feb 14, 2026
Pull request overview
This PR cuts a v0.1.3 release that adds GLM-5 support alongside DeepSeek-V3.2, introduces a small benchmarking suite, and refactors/extends the Python-side TileRT model/ops wrappers to support the new kernels and weight formats.
Changes:
- Add GLM-5 model args + integrate GLM-5 dispatch paths across many DeepSeek v3.2 ops/modules (shared implementation via shape/dim-based dispatch).
- Introduce a benchmark harness and update the generation CLI to support model selection and sampling options.
- Refactor core model utilities/base classes and add new shared model primitives (python/models/common.py).
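The "shape/dim-based dispatch" that lets DeepSeek-V3.2 and GLM-5 share one op implementation can be pictured roughly as follows. This is a minimal sketch under stated assumptions, not TileRT's code: `shared_op`, `_DISPATCH`, `DS_DIM`, and `GLM5_DIM` are hypothetical names, the widths are placeholders rather than the real model dimensions, and the real wrappers take tensors and launch kernels instead of returning strings.

```python
from typing import Callable, Dict, Tuple

Shape = Tuple[int, ...]

# Placeholder trailing-dim sizes for illustration only; they are NOT the
# real DeepSeek-V3.2 or GLM-5 dimensions.
DS_DIM = 4
GLM5_DIM = 6


def _ds_path(shape: Shape) -> str:
    # Stand-in for launching a DeepSeek-V3.2-specific kernel variant.
    return "ds"


def _glm5_path(shape: Shape) -> str:
    # Stand-in for launching a GLM-5-specific kernel variant.
    return "glm5"


# Map the trailing (feature) dimension to the model-specific implementation.
_DISPATCH: Dict[int, Callable[[Shape], str]] = {
    DS_DIM: _ds_path,
    GLM5_DIM: _glm5_path,
}


def shared_op(shape: Shape) -> str:
    """Hypothetical shared op: choose a code path from the input's last dim."""
    impl = _DISPATCH.get(shape[-1])
    if impl is None:
        raise ValueError(f"no implementation registered for trailing dim {shape[-1]}")
    return impl(shape)
```

This layout keeps a single public entry point per op; supporting another model would mean registering one more entry rather than duplicating the wrapper.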
Reviewed changes
Copilot reviewed 63 out of 69 changed files in this pull request and generated 12 comments.
| File | Description |
|---|---|
| python/tilert_init.py | Simplifies init/force-init wrappers to call ops without placeholder tensors. |
| `python/profiler/__init__.py` | Adds profiler package docstring. |
| python/models/utils.py | Fixes the conditional so the RoPE correction is applied only when `factor is not None`. |
| `python/models/preprocess/__init__.py` | Removes preprocess package exports (WeightLoader no longer re-exported). |
| python/models/glm_5/params.py | Adds GLM-5 params module stub/docstring. |
| python/models/glm_5/model_args.py | Introduces ModelArgsGLM5 with GLM-5 hyperparameters. |
| `python/models/glm_5/__init__.py` | Package init (empty in diff). |
| python/models/deepseek_v3_2/temp_var_indices.py | Adds named temp-var indices + validation helper. |
| python/models/deepseek_v3_2/refs/kernel.py | Adds reference kernels (tilelang/triton) for fp8 ops/quant/dequant. |
| `python/models/deepseek_v3_2/refs/__init__.py` | Exposes reference kernel helpers via package exports. |
| python/models/deepseek_v3_2/ops/up_gate_silu.py | Adds TileRT op wrapper. |
| python/models/deepseek_v3_2/ops/unproj_o_allreduce.py | Adds unified DS/GLM5 unproj+allreduce module + weight converters. |
| python/models/deepseek_v3_2/ops/topk.py | Adds top-k wrappers + TopK nn.Module wrapper. |
| python/models/deepseek_v3_2/ops/top_p.py | Adds unified top-p dispatch for DS/GLM5. |
| python/models/deepseek_v3_2/ops/top1_allreduce.py | Adds top1 allreduce wrapper. |
| python/models/deepseek_v3_2/ops/sparse_index.py | Adds sparse index + sparse index topk wrappers (GLM5 paths). |
| python/models/deepseek_v3_2/ops/rotate.py | Adds unified rotate op + Rotate module. |
| python/models/deepseek_v3_2/ops/rmsnorm_up_gate_silu.py | Adds fused RMSNorm+UpGateSiLU module + algorithm selection. |
| python/models/deepseek_v3_2/ops/rmsnorm_quant.py | Adds unified RMSNorm(+optional quant) wrapper for DS/GLM5. |
| python/models/deepseek_v3_2/ops/rmsnorm_proj_top1.py | Adds RMSNorm+proj+top1 wrapper. |
| python/models/deepseek_v3_2/ops/rmsnorm_kv.py | Adds KV RMSNorm module. |
| python/models/deepseek_v3_2/ops/rmsnorm_head_proj.py | Adds head projection module + GLM5/DS dispatch. |
| python/models/deepseek_v3_2/ops/rmsnorm_expert_proj.py | Adds expert-projection module wrapper. |
| python/models/deepseek_v3_2/ops/qkv_rope.py | Adds unified QKV RoPE wrapper + module. |
| python/models/deepseek_v3_2/ops/projx_wis.py | Adds projection wrapper/module for indexer score weights. |
| python/models/deepseek_v3_2/ops/projq_wqb.py | Adds Q projection wrapper/module for KV-LoRA (GLM5 support). |
| python/models/deepseek_v3_2/ops/projo_wkvb.py | Adds O projection wrapper/module for KV-LoRA (GLM5 support). |
| python/models/deepseek_v3_2/ops/layernorm_rope_rotate.py | Adds LayerNorm+RoPE+rotate wrapper/module. |
| python/models/deepseek_v3_2/ops/head_proj.py | Adds head projection wrapper. |
| python/models/deepseek_v3_2/ops/flash_sparse_mla.py | Adds flash sparse MLA wrapper + combine module. |
| python/models/deepseek_v3_2/ops/expert_select.py | Adds expert select wrappers (two-stage DS vs one-stage GLM5). |
| python/models/deepseek_v3_2/ops/eh_proj_allreduce.py | Adds EH proj + allreduce module with DS/GLM5 dispatch. |
| python/models/deepseek_v3_2/ops/down_allreduce.py | Adds down+allreduce wrappers + module with DS/GLM5 dispatch. |
| `python/models/deepseek_v3_2/ops/__init__.py` | Exposes deepseek_v3_2 ops package API. |
| python/models/deepseek_v3_2/modules/mtp_preprocess.py | Adds MTP preprocess module + weight converter. |
| python/models/deepseek_v3_2/modules/mtp.py | Adds MTP module wiring (preprocess + moe + head). |
| python/models/deepseek_v3_2/modules/moe.py | Adds MoE module wiring + GLM5 algorithm selection. |
| python/models/deepseek_v3_2/modules/mlp.py | Adds MLP module wiring + GLM5 algorithm selection. |
| python/models/deepseek_v3_2/modules/mla.py | Adds MLA module wiring + GLM5 algorithm selection + cache vars. |
| python/models/deepseek_v3_2/modules/dsa.py | Refactors DSA module temp-var allocation using named indices. |
| `python/models/deepseek_v3_2/modules/__init__.py` | Adds modules package exports list. |
| python/models/deepseek_v3_2/model_args.py | Updates DS v3.2 defaults (dtype, seq-len, etc.) and adds arch_name, kv_cache_pad, and quant params. |
| python/models/deepseek_v3_2/dsa_mtp_e2e_show_hands.py | Removes legacy “show hands” E2E module. |
| python/models/common.py | Adds shared linear/RMSNorm/parallel layers + fp8 reference-kernel usage. |
| python/models/base.py | Refactors base module + adds SerializableTileRTModule + converter base class. |
| python/generate.py | Adds model selection + sampling args and integrates benchmark modes. |
| python/benchmark/short_prompt.py | Adds short-prompt benchmark. |
| python/benchmark/long_prompt.py | Adds long-prompt benchmark. |
| python/benchmark/coding_prompt.py | Adds coding-prompt benchmark. |
| `python/benchmark/__init__.py` | Adds benchmark utilities/types + markdown table printer. |
| `python/__init__.py` | Removes ShowHandsGenerator export from top-level package. |
| assets/perf.png | Asset changes for docs/benchmarks (binary). |
| assets/generate.gif | Asset changes for docs/benchmarks (binary). |
| assets/glm5-mtp.png | Adds GLM5 MTP benchmark figure for README. |
| assets/glm5-without-mtp.png | Adds GLM5 non-MTP benchmark figure for README. |
| assets/logo.png | Adds/updates logo asset for README/site. |
| README.md | Updates release notes + GLM5 benchmarking figures + new weight conversion workflow. |
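The benchmark package row mentions "benchmark utilities/types + markdown table printer". A minimal sketch of that idea follows, assuming a hypothetical `BenchResult` record and `to_markdown_table` helper; neither name, nor the chosen columns, comes from the PR.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BenchResult:
    # Hypothetical result record; field names are illustrative only.
    name: str
    prompt_tokens: int
    tokens_per_s: float


def to_markdown_table(results: List[BenchResult]) -> str:
    """Render benchmark results as a GitHub-flavored markdown table."""
    header = "| Benchmark | Prompt tokens | Tokens/s |"
    # Right-align the numeric columns.
    sep = "|---|---:|---:|"
    rows = [
        f"| {r.name} | {r.prompt_tokens} | {r.tokens_per_s:.1f} |"
        for r in results
    ]
    return "\n".join([header, sep, *rows])
```

Returning a string instead of printing directly keeps the helper easy to test and lets the caller decide whether to print it or paste it into a README.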
kdtree
approved these changes
Feb 14, 2026
jlxue
approved these changes
Feb 14, 2026
v0.1.3 release. GLM-5 lands.