Insights: NVIDIA/TensorRT-LLM
Overview
140 Pull requests merged by 72 people
- [nvbug/5361223] doc: Update Llama4 deployment guide: update config & note concurrency (#6222, merged Jul 22, 2025)
- set NVIDIA_IMEX_CHANNELS for dlcluster slurm job only (#6234, merged Jul 22, 2025)
- [feat] Enable TP and batching for PixtralVisionModel / Mistral3VLM (#6152, merged Jul 22, 2025)
- [Issue 6193] Fix gemma3vl weight loader (#6233, merged Jul 22, 2025)
- Add register_fake for finegrained_mixed_dtype_gemm torch_op (#6255, merged Jul 22, 2025)
- fix: bindings unit tests for nanobind (#6221, merged Jul 22, 2025)
- test: update test list for RTX6KD (#6213, merged Jul 22, 2025)
- doc: update known issues (#6247, merged Jul 22, 2025)
- [TRTLLM-6537][infra] extend multi-gpu tests related file list (#6139, merged Jul 22, 2025)
- chore: bump version to 1.0.0rc5 (#6252, merged Jul 22, 2025)
- doc: add supported data modality and types on multimodal serve (#5988, merged Jul 22, 2025)
- chore: Mass integration of release/0.21 (part 4) (#6211, merged Jul 22, 2025)
- bug: [https://nvbugs/5368507] Fix test_generate_with_seed. (#6206, merged Jul 22, 2025)
- [fix] Fix flaky mistral E2E test (#6230, merged Jul 22, 2025)
- Update model-feature document (#6243, merged Jul 22, 2025)
- feat: moe prepare support topk % 4 != 0 (#5742, merged Jul 22, 2025)
- tests: add timeout_manager to tensorrt flow test cases (#5942, merged Jul 22, 2025)
- feat: Refactor the fetching request logic (#5786, merged Jul 22, 2025)
- [TRTLLM-5059][feat] Add KV cache reuse support for multimodal models (#5444, merged Jul 21, 2025)
- disable gc (#6227, merged Jul 21, 2025)
- [chore] Clean up quickstart_advanced.py (#6021, merged Jul 21, 2025)
- [fix] Correct the returned value of has_spec_drafter (#6178, merged Jul 21, 2025)
- test: Enable GB200 torch compile multi gpu tests (#6145, merged Jul 21, 2025)
- [BREAKING CHANGE]: change default backend to PyTorch in trtllm-serve (#5717, merged Jul 21, 2025)
- [Infra] - Waive failed cases on recent post-merge (#6212, merged Jul 21, 2025)
- [TRTLLM-4279] feat: Multistream initial support for torch compile flow (#5847, merged Jul 21, 2025)
- doc: add Deprecation Policy section (#5784, merged Jul 21, 2025)
- infra: [TRTLLM-5250] Add sanity check stage for ngc-release images (Build wheels for devel image) (#4656, merged Jul 21, 2025)
- feat: nanobind bindings (#6185, merged Jul 21, 2025)
- test: [CI] remove closed bugs (#6201, merged Jul 21, 2025)
- [TRTLLM-5863][feat] Support Weight-Only-Quantization in PyTorch Workflow (#5850, merged Jul 21, 2025)
- add model-feature supported matrix doc (#5914, merged Jul 21, 2025)
- [None] infra: Update dependencies for DLFW 25.06 (#5967, merged Jul 21, 2025)
- [fix] Fix can_use_alltoall in fused_moe_wide_ep.py (#6173, merged Jul 21, 2025)
- doc: remove cuda_graph_config: {} from doc since cuda_graph enabled b… (#6150, merged Jul 21, 2025)
- fix: Flush stale PlanParams with custom attention mask (#6163, merged Jul 21, 2025)
- test: add phi-4 multimodal and bielik-11b-v2.2 models for perf test (#5826, merged Jul 21, 2025)
- enh: Lift expectation of single image per sample in Gemma3 VLM (#6195, merged Jul 21, 2025)
- W4A8 GEMM (#6005, merged Jul 20, 2025)
- [TRTLLM-5826][feat] Support pytorch LoRA adapter eviction (#5616, merged Jul 20, 2025)
- fix: Ensure mlx5 library is installed for deep_ep and remove deprecated python bindings (#6189, merged Jul 20, 2025)
- [Fix][Chore][Qwen3] fix bug of using fp4 on sm120 (#6065, merged Jul 20, 2025)
- DeepEP LL support variable hidden size and tokens num (#6141, merged Jul 20, 2025)
- [fix]: Skip prompt length checking for generation only requests (#6146, merged Jul 19, 2025)
- [TRTLLM-6452][feat]: Two-model engine KV cache reuse support (#6133, merged Jul 19, 2025)
- [refactor] Unify name of NGram speculative decoding (#5937, merged Jul 19, 2025)
- [Disaggregated] Add retry knobs and handling (#5808, merged Jul 18, 2025)
- [Issue 5927][fix] Avoid memory calls during broadcast for single GPU (#6010, merged Jul 18, 2025)
- [https://nvbugs/5393961][fix] record kv-cache size in MLACacheFormatter (#6181, merged Jul 18, 2025)
- [nvbug/5393888][nvbug/5393042] Always use py_seq_slot (#6147, merged Jul 18, 2025)
- [nvbugs/5354884][fix] Update beam search workspace estimation to new upper bound (#5926, merged Jul 18, 2025)
- [nvbugs/5369799] fix: Update disaggregation handling in sampler (#5762, merged Jul 18, 2025)
- feat(eagle3): support qwen3 dense model (#5879, merged Jul 18, 2025)
- enh: Add script to map tests <-> jenkins stages & vice-versa (#5177, merged Jul 18, 2025)
- [TRTLLM-6471] Infra: unwaive nixl tests and some disagg-serve tests (#6095, merged Jul 18, 2025)
- feat: Remove padding in attention DP. (#6064, merged Jul 18, 2025)
- [ci] Speedup beam search unit tests with fixtures for LLM (#5843, merged Jul 18, 2025)
- infra: fix failed single-GPU stage not raising an error (#6165, merged Jul 18, 2025)
- fix: NVBug 5385576 py_batch_idx issue (#6153, merged Jul 18, 2025)
- update broken link of PyTorchModelEngine in arch_overview (#6171, merged Jul 18, 2025)
- refactor: Enhanced handling of decoder requests and logits within the batch manager (#6055, merged Jul 18, 2025)
- [fix]: Revert commit 388b491 (#6143, merged Jul 18, 2025)
- [Infra] - Waive failed tests in post-merge (#6176, merged Jul 18, 2025)
- chore: add more logging in FmhaDispatcher (#6170, merged Jul 18, 2025)
- [TRTLLM-6091][docs] Update docs/trtllm sampler 1.0 (#5833, merged Jul 18, 2025)
- [None][infra] Update the allow list of CI trigger (#6168, merged Jul 18, 2025)
- [TRTLLM-5179] - Update bot help messages (#5277, merged Jul 18, 2025)
- fix single_disagg_test (#6166, merged Jul 18, 2025)
- [Doc][Qwen3] add qwen3 to support matrix (#6161, merged Jul 18, 2025)
- [None][infra] Cherry-pick #6128 and #6130 from main branch (#6151, merged Jul 18, 2025)
- feat: add support for Modelopt fp8_pb_wo quantization scheme (#6106, merged Jul 18, 2025)
- [https://nvbugs/5387375] fix(scaffolding): fix scaffolding aime test in test_e2e (#6140, merged Jul 18, 2025)
- fix TMA error with GEMM+AR on TP=2 (#6075, merged Jul 18, 2025)
- fix: Unable to load phi4-model with tp_size>1 (#6093, merged Jul 18, 2025)
- [TRTLLM-6368] Update deepep dispatch API (#6037, merged Jul 18, 2025)
- Revert "feat: nanobind bindings (#5961)" (#6160, merged Jul 18, 2025)
- feat: Add support for benchmarking individual gemms in MOE benchmark (#6080, merged Jul 17, 2025)
- Refactor KVCacheManager: Simplify token availability calculation and … (#6134, merged Jul 17, 2025)
- fix: Update trtllm args issues with extra nested config (#5996) (#6114, merged Jul 17, 2025)
- [fix] Fix KV cache overrides in trtllm-bench (#6103, merged Jul 17, 2025)
- [fix] Fix Mistral3VLM weight-loading & enable in pre-merge (#6105, merged Jul 17, 2025)
- [fix] Remove duplicated KVCache transmission check (#6022, merged Jul 17, 2025)
- [fix] Update jenkins container images (#6094, merged Jul 17, 2025)
- feat: nanobind bindings (#5961, merged Jul 17, 2025)
- [TRTLLM-6352][feat] Migrate EAGLE3 and draft/target speculation to Drafter (#6007, merged Jul 17, 2025)
- test: fix PytestUnknownMarkWarning: Unknown pytest.mark.timeout (#6115, merged Jul 17, 2025)
- fix: Fix DeepSeek R1 CI (#6129, merged Jul 17, 2025)
- test: update max_beam_width to 1 due to torchsampler changes. (#6101, merged Jul 17, 2025)
- chore: unwaive a few tests for v1.0 (#6107, merged Jul 17, 2025)
- [TRTLLM-6406, TRTLLM-5172] feat: Enable guided decoding with overlap scheduler (#6000, merged Jul 17, 2025)
- chore: [BREAKING CHANGE] use cacheTransceiverConfig as knobs for disagg service (#5234, merged Jul 17, 2025)
- [Infra] - Add waive list for pytest when using slurm (#6130, merged Jul 17, 2025)
- fix: convert venv_prefix to str before comparison with base_prefix (#6121, merged Jul 17, 2025)
- CI: update multi gpu test trigger file list (#6131, merged Jul 17, 2025)
- [None][infra] Set up the initial config for CodeRabbit (#6128, merged Jul 17, 2025)
- doc: sync llm api example (#6122, merged Jul 17, 2025)
- [fix] Release slots with spec decode + disagg (#5975) (#6032, merged Jul 17, 2025)
- feat: Add vectorized loading for finalize kernel in MoE Trtllm backend (#5919, merged Jul 17, 2025)
- optimize: ADP schedule optimization (#6087, merged Jul 17, 2025)
- infra: fix SBSA test stage (#6113, merged Jul 17, 2025)
- [fix] Performance Optimization for MNNVL TwoShot Kernel (#5934, merged Jul 17, 2025)
- [TRTLLM-6070] docs: Add initial documentation for trtllm-bench CLI. (… (#6112, merged Jul 17, 2025)
- test: Update Llama4 Scout FP4 & FP8 accuracy tests (#5901, merged Jul 17, 2025)
- [TRTLLM-6070] docs: Add initial documentation for trtllm-bench CLI. (#5734, merged Jul 17, 2025)
- doc: merge main branch docs into 1.0 doc branch (#6099, merged Jul 17, 2025)
- feat: TRTLLM-5574 Add phi-4-multimodal pytorch-backend support (#5644, merged Jul 16, 2025)
- Fix: Enhance ModelConfig for kv cache size calculations (#5868, merged Jul 16, 2025)
- fix: Fix triton backend build [nvbug 5396469] (#6098, merged Jul 16, 2025)
- [refactor] Clean up drafter/resource manager creation logic (#5805, merged Jul 16, 2025)
- [Whisper] add whisper support (#6083, merged Jul 16, 2025)
- [fix] Correct handling of NVFP4 block scaling factors in preprocessing for MoE (#6073, merged Jul 16, 2025)
- Fix TMA error with GEMM+AR on TP=2 (#6071, merged Jul 16, 2025)
- [TRTLLM-5493] Add core infrastructure to enable loading of custom checkpoint formats (#5372, merged Jul 16, 2025)
- fix: Update trtllm args issues with extra nested config (#5996, merged Jul 16, 2025)
- Add documentation for eagle3+disagg+dynamo (#6072, merged Jul 16, 2025)
- [Infra] - Waive failed cases in post-merge on main (#6096, merged Jul 16, 2025)
- [TRTLLM-6071] doc: Add trtllm-eval 1.0 doc (#5877, merged Jul 16, 2025)
- infra: [TRTLLM-5879] Split single GPU test and multi GPU test into 2 pipelines (#5199, merged Jul 16, 2025)
- chore: Cleanup disable_fp4_allgather. (#6006, merged Jul 16, 2025)
- update spec_dec (#6079, merged Jul 16, 2025)
- BlockManager copy constructor fix (#5982, merged Jul 16, 2025)
- add release notes for 0.21 release (#6049, merged Jul 16, 2025)
- [TRTLLM-5530][BREAKING CHANGE] refactor: unify KvCacheConfig in LLM class for pytorch backend (#5752, merged Jul 16, 2025)
- fix: Add $HOME/.local/bin to PATH when running docker in local user mode (#6062, merged Jul 16, 2025)
- tests: add QA test cases (#5959, merged Jul 16, 2025)
- [nvbug/5387226] chore: add propagation for trust_remote_code to AutoConfig (#6001, merged Jul 16, 2025)
- [nvbug/5359218][tests] add LLM API test case on lookahead with chunked prefill (#6051, merged Jul 16, 2025)
- feat: Add deepseek-lite tests for RTX pro 6000 (#5903, merged Jul 16, 2025)
- Cherry Pick: PR #6076 (#6088, merged Jul 16, 2025)
- [TRTLLM-6471] Infra: Upgrade NIXL to 0.3.1 (#5991, merged Jul 16, 2025)
- feat: use session abstraction in data transceiver and cache formatter (#5611, merged Jul 16, 2025)
- [nvbug/5347489][nvbug/5388036] increase timeout in disagg worker test (#6041, merged Jul 16, 2025)
- [None] - Waive L0 tests (#6082, merged Jul 16, 2025)
- chore: upgrade modelopt to 0.33 (#6058, merged Jul 16, 2025)
- chore: Bump version to 1.0.0rc4 (#6086, merged Jul 16, 2025)
- fix: Unable to load phi4-model with tp_size>1 (#5962, merged Jul 16, 2025)
- [fix] Fix Triton build (#6076, merged Jul 16, 2025)
- feat: Add support for Triton request cancellation (#5898, merged Jul 16, 2025)
- feat/add latency support for trtllm bench (#3730, merged Jul 15, 2025)
69 Pull requests opened by 58 people
- [fix] Correct handling of NVFP4 block scaling factors in preprocessing for MoE (#6069, opened Jul 15, 2025)
- [Draft] Inter-request kv cache manager support for HSTU (#6077, opened Jul 16, 2025)
- support JIT mha.cu for SPEC_DEC in runtime (#6078, opened Jul 16, 2025)
- [TRTLLM-6471] Infra: Upgrade NIXL to 0.3.1 and unwaive test (#6084, opened Jul 16, 2025)
- [TRTLLM-6444] Add some UCX troubleshooting docs and print UCX-related logs (#6085, opened Jul 16, 2025)
- ucx establish connection with zmq (#6090, opened Jul 16, 2025)
- [draft][TRTLLM-1302][feat]: topk logprobs for TRT backend & top1 logprob for PyT backend (#6097, opened Jul 16, 2025)
- Draft: FP8 R1 (#6100, opened Jul 16, 2025)
- [TRTLLM-6453][feat] Support chunked prefill on spec decode 2 model (#6104, opened Jul 16, 2025)
- [nvbug/5393849]: phi4-mini will generate garbage outputs with tp_size>1 with trt backend (#6108, opened Jul 17, 2025)
- Add disagg launcher scripts (#6109, opened Jul 17, 2025)
- add safe chunked broadcast (#6110, opened Jul 17, 2025)
- feat: Support server reload (#6116, opened Jul 17, 2025)
- [TRTLLM-5061] chore: add tags to API reference (#6123, opened Jul 17, 2025)
- [TRTLLM-5331] perf: Replace allgather with AllToAllPrepare (#5570) (#6124, opened Jul 17, 2025)
- perf: Add MOE support for dynamic cluster shapes and custom epilogue … (#6126, opened Jul 17, 2025)
- Cherry-pick moe sort (and all its dependencies) (#6127, opened Jul 17, 2025)
- infra: [TRTLLM-6499] Split L0_Test into two pipelines by single GPU and multi GPU (for SBSA) (#6132, opened Jul 17, 2025)
- [TRTLLM-6549] chore: record delay introduced by disaggregated serving in kv cache measure (#6135, opened Jul 17, 2025)
- [nvbug/5322354] fix PD + MTP + overlap scheduler accuracy issue (#6136, opened Jul 17, 2025)
- [Perf]: Add residual, norm for nemotron_nas models (#6157, opened Jul 17, 2025)
- [linting] Enable ruff on more files (wave 2/N) (#6162, opened Jul 17, 2025)
- DON'T MERGE: log paused requests info (#6174, opened Jul 18, 2025)
- [TRTLLM-6357][test] Add accuracy tests for Qwen3 (#6177, opened Jul 18, 2025)
- feat: Support Aggregate mode for phi4-mm (#6184, opened Jul 18, 2025)
- fix: Ensure that Python stub generation works against libnvidia-ml stubs (#6188, opened Jul 18, 2025)
- DRAFT: Changes for multi stream executor (#6190, opened Jul 18, 2025)
- [AutoDeploy] merge feat/ad-2025-07-07 (#6196, opened Jul 18, 2025)
- [nvbugs/5361178] feat: json_schema support in trtllm-serve using xgrammar (#6197, opened Jul 18, 2025)
- Draft: Qwen3: Fix eagle hidden states (#6199, opened Jul 20, 2025)
- [https://nvbugs/5378031] Hopper W4A8 MoE supports ModelOpt ckpt for PyT backend (#6200, opened Jul 20, 2025)
- Draft: Nanobind integration tests (#6203, opened Jul 20, 2025)
- [TRTLLM-6445] feat: Enable AllReduce-associated fusion patterns in Llama3/4. (#6205, opened Jul 20, 2025)
- Draft: Deepseek: Start Eagle work (#6210, opened Jul 21, 2025)
- Draft: Feat/support lora cuda graph (#6215, opened Jul 21, 2025)
- feat: Enable TRTLLM sampler by default (#6216, opened Jul 21, 2025)
- [TRTLLM-6650][feat] Enhance beam search support with CUDA graph integration (#6217, opened Jul 21, 2025)
- [5830][feat] Improve LoRA cache memory control (#6220, opened Jul 21, 2025)
- [TRTLLM-6651][feat] Enable Overlap scheduler + Beam Search in TRTLLM Sampler (#6223, opened Jul 21, 2025)
- [nvbugs/5401261][fix] Fix Triton backend disaggregated serving support (#6224, opened Jul 21, 2025)
- Bump version to 0.21.1 (#6225, opened Jul 21, 2025)
- Change the all-reduce strategy to NCCL (#6226, opened Jul 21, 2025)
- [fix] Allow custom model config for Kimi-K2 (#6228, opened Jul 21, 2025)
- Waive flaky tests (#6229, opened Jul 21, 2025)
- Remove input_sf swizzle for module WideEPMoE (#6231, opened Jul 21, 2025)
- Auto-enable ngram with concurrency <= 32. (#6232, opened Jul 21, 2025)
- [Fix][Nvbug 5401163] Fix bug of MoE on tp > 1 with trtllm moe backend (#6235, opened Jul 21, 2025)
- [nvbug/5376229]: Remove flash-attn dependency from test_ptp_quickstart_multimodal (#6236, opened Jul 21, 2025)
- [fix][nvbugs/5399355] Fix Lamport buffer clear issue for MNNVL TwoShot Allreduce (#6237, opened Jul 21, 2025)
- fix: nvbug_5398806 (#6239, opened Jul 21, 2025)
- Add Acceptance Rate calculation to benchmark_serving (#6240, opened Jul 22, 2025)
- [PERF] Don't use hmac encryption for loopback interfaces (#6241, opened Jul 22, 2025)
- Chore: remove duplicate should_stop_processing check (#6242, opened Jul 22, 2025)
- [feat] Support NVFP4 KV Cache (#6244, opened Jul 22, 2025)
- [TRTLLM-5627] feat: Implement pytorch sampler for MTP (#6245, opened Jul 22, 2025)
- feat: Add support for scheduling attention DP requests (#6246, opened Jul 22, 2025)
- update disagg slurm scripts (#6248, opened Jul 22, 2025)
- Improve TransferAgentTest.SyncMessage (#6250, opened Jul 22, 2025)
- @coderabbitai title (#6251, opened Jul 22, 2025)
- feat: support SharedTensor on MultimodalParams (#6254, opened Jul 22, 2025)
- [TRTLLM-4279] fix: Add fake implementations for several custom ops and add a protection test (#6257, opened Jul 22, 2025)
- [https://nvbugs/5402719][fix]: Skip CUDA graph dummy request in spec decoding (#6258, opened Jul 22, 2025)
- [fix]: Revert commit 48ddc3d & add test for disagg server with different max_num_tokens (#6259, opened Jul 22, 2025)
- [https://nvbugs/5387771] fix deadlocks due to insufficient numSemaphores (#6262, opened Jul 22, 2025)
- [TRTLLM-6654][feat] Add support for external multimodal embeddings (#6263, opened Jul 22, 2025)
- add eagle3 one model disagg accuracy test (#6264, opened Jul 22, 2025)
- fix: Fix kv_cache_events unit tests [nvbug5362412] (#6265, opened Jul 22, 2025)
14 Issues closed by 10 people
- [LOAD MODEL] Can't load gemma3 models (#6193, closed Jul 22, 2025)
- Empty response when setting --reasoning_parser "deepseek-r1" for EXAONE 4.0 32B model (#6183, closed Jul 22, 2025)
- [AutoDeploy] Create export patch registry (#5728, closed Jul 22, 2025)
- Running TensorRT-LLM Using Docker Setup failure (#6238, closed Jul 22, 2025)
- Using Tensorrt Whisper in Triton with your examples (#6144, closed Jul 21, 2025)
- [PERF] Qwen-VLs. Speedup by avoiding MPI overhead (#5927, closed Jul 18, 2025)
- Attention Pattern Matching with Inductor Utilities (#4404, closed Jul 18, 2025)
- test (#6155, closed Jul 17, 2025)
- Guided decoding is not supported with overlap scheduler (#5858, closed Jul 17, 2025)
- Qwen3-235B-A22B-FP8 error in sampling (#5803, closed Jul 16, 2025)
- [AutoDeploy] ABC for Graph Transformation (#4328, closed Jul 15, 2025)
- [AutoDeploy] Configuration System for Transformation (#4366, closed Jul 15, 2025)
- [AutoDeploy] Arch1: Configurable transformation pipeline (#4327, closed Jul 15, 2025)
- [AutoDeploy] Example Transformation in new configuration system (#4367, closed Jul 15, 2025)
27 Issues opened by 23 people
- Set batch sizes in --extra_llm_api_options or check if we can avoid explicitly setting it (#6261, opened Jul 22, 2025)
- Switch to use torch-opt in the trtllm-bench dashboard (#6260, opened Jul 22, 2025)
- [Inefficient design] ModelRunnerCpp - optionally return Tensor instead of List (#6218, opened Jul 21, 2025)
- Pytorch workflow Logits Processor problem (#6214, opened Jul 21, 2025)
- Can't perform streaming (#6207, opened Jul 21, 2025)
- Qwen3 pytorch backend stuck when sampling params n > 1 (#6204, opened Jul 20, 2025)
- Support `LogitsPostProcessorConfig` in Triton Backend (#6202, opened Jul 20, 2025)
- How to specify stop_token_ids and repetition_penalty in trtllm-serve serve (#6198, opened Jul 19, 2025)
- [Feature Request] Introduce a LayerNorm module in PyTorch workflow (#6187, opened Jul 18, 2025)
- [Feature Request] Simplify QK layernorm implementation in PyTorch workflow (#6186, opened Jul 18, 2025)
- convert_checkpoint.py fails on Qwen/Qwen3-0.6B: unexpected qkv weight shape (#6182, opened Jul 18, 2025)
- [QES] Could I use `mpi_run` with the trtllm-pytorch backend? (#6179, opened Jul 18, 2025)
- [Feature Request] Add optimized support for DiT model (#6175, opened Jul 18, 2025)
- Error when deploying tensorrtllm backend using triton (#6169, opened Jul 18, 2025)
- Error serving a Qwen3 model quantized to int8 with modelopt (#6167, opened Jul 18, 2025)
- Add support for meta-llama/Llama-Guard-4-12B (#6164, opened Jul 18, 2025)
- Add support for meta-llama/Llama-Prompt-Guard-2-86M (#6159, opened Jul 17, 2025)
- speculative decoding for B200 (#6158, opened Jul 17, 2025)
- Sesame CSM in Tensorrt-llm (#6156, opened Jul 17, 2025)
- Function tooling like in vllm (#6154, opened Jul 17, 2025)
- Optimize torch.compile's recompilation limit (#6142, opened Jul 17, 2025)
- w4fp8 with mtp load model error (#6137, opened Jul 17, 2025)
- [AutoDeploy] Architecture for VLM support (#6120, opened Jul 17, 2025)
- The performance of the pytorch backend is worse than that of the C++ backend (#6119, opened Jul 17, 2025)
- ImportError on `from tensorrt_llm.bindings import DataType, GptJsonConfig` (#6102, opened Jul 16, 2025)
- VRAM exploded when deploying qwen2.5-1.5B on a 3090 (#6091, opened Jul 16, 2025)
87 Unresolved conversations
Sometimes conversations happen on old items that aren't yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
- [Infra] Add nightly pipeline to generate lock files (#5798, commented on Jul 18, 2025 • 66 new comments)
- Add basic Nemo Ckpt Lora Loading in pytorch flow (#6019, commented on Jul 22, 2025 • 53 new comments)
- feat: best_of/n for pytorch workflow (#5997, commented on Jul 22, 2025 • 30 new comments)
- feat: Core Metrics Implementation (#5785, commented on Jul 22, 2025 • 26 new comments)
- [TRTLLM-5633] - Merge current waive list with the TOT waive list (#5198, commented on Jul 22, 2025 • 23 new comments)
- feat: LLM sleep & wakeup Part 1: virtual device memory (#5034, commented on Jul 22, 2025 • 15 new comments)
- [nvbugs/5302040] feat: Add whisper support (Bert Attention on SM100 and GPTAttention for cross attention on SM100) (#5527, commented on Jul 21, 2025 • 13 new comments)
- Add vLLM KV Pool support for XQA kernel (#6013, commented on Jul 22, 2025 • 13 new comments)
- infra: [TRTLLM-5873] Use build stage wheels to speed up docker release image build (#4939, commented on Jul 18, 2025 • 9 new comments)
- [feat] low precision all2all (#6047, commented on Jul 22, 2025 • 8 new comments)
- fix/improve kvcache allocation in PyTorch runtime (#5933, commented on Jul 22, 2025 • 8 new comments)
- [feat]: support logit_bias (#5354, commented on Jul 18, 2025 • 7 new comments)
- [TRTLLM-5061] chore: add tags to API reference (#5707, commented on Jul 22, 2025 • 7 new comments)
- Draft: [TRTLLM-4719][enhance] Refactor to add scenarios to trtllm-bench (#6023, commented on Jul 21, 2025 • 7 new comments)
- Blackwell/Hopper MoE Gemm2+Finalize fusion (#3294, commented on Jul 17, 2025 • 6 new comments)
- [fix] Add detokenization-based stop word logic to LLM API (#5948, commented on Jul 22, 2025 • 6 new comments)
- feat: spec dec with external API (#5990, commented on Jul 22, 2025 • 6 new comments)
- Mtp optimizations round1 (#5689, commented on Jul 22, 2025 • 5 new comments)
- doc: Refactor documents and examples of disaggregated serving and wide ep (#6054, commented on Jul 22, 2025 • 5 new comments)
- [feat]: Add FP8 context MLA support for SM120 (#6059, commented on Jul 21, 2025 • 5 new comments)
- [1/N] Add NCCL Symmetric Integration for All Reduce (#4500, commented on Jul 19, 2025 • 5 new comments)
- [TRTLLM-5312] - Add bot run rules for triton tests (#4988, commented on Jul 22, 2025 • 4 new comments)
- doc: Adding disaggregated serving page to features section for 1.0 docs [TRTLLM-6086] (#6024, commented on Jul 22, 2025 • 4 new comments)
- tests: add test_chunked_prefill for llama4 (#5549, commented on Jul 22, 2025 • 3 new comments)
- feat: [AutoDeploy] Support llama4 FP8 (#5935, commented on Jul 22, 2025 • 2 new comments)
- [TRTLLM-5996][feat] FP8 blockwise scaling GEMM support on Blackwell (#5987, commented on Jul 17, 2025 • 2 new comments)
- [security] add nspce allow list for false positive secrets (#5797, commented on Jul 22, 2025 • 2 new comments)
- Draft: [TRTLLM-5234][feature] Add a serve subcommand to trtllm-bench (#6025, commented on Jul 22, 2025 • 2 new comments)
- [feat] Implement pytorch sampler for MTP (#5627, commented on Jul 17, 2025 • 2 new comments)
- [FIX] Fix build with ENABLE_MULTI_DEVICE=0; fix Qwen-VL failure with requests without MM data (#6063, commented on Jul 22, 2025 • 2 new comments)
- [TRTLLM-5508] feat: check input tokens + improve error handling (#5170, commented on Jul 22, 2025 • 1 new comment)
- Nixl agent (#5488, commented on Jul 17, 2025 • 1 new comment)
- [Infra][TRTLLM-6224] - Upgrade dependencies to DLFW 25.06 and CUDA 12.9.1 (#5678, commented on Jul 21, 2025 • 1 new comment)
- [fix] improve head_dim calculation in Qwen config (#5913, commented on Jul 21, 2025 • 1 new comment)
- [Issue/5952][feat] Support JSON Schema in OpenAI-Compatible API (#5957, commented on Jul 16, 2025 • 1 new comment)
- Pass mode & directory (#5983, commented on Jul 21, 2025 • 1 new comment)
- [PERF] Move calculation of Qwen2-VL's rotary_cos_sin to LLM worker process (#6004, commented on Jul 22, 2025 • 1 new comment)
- [TRTLLM-4413][infra] Add additional nightly build of wheel package with local version besides ordinary one (#4570, commented on Jul 18, 2025 • 1 new comment)
- [feat] Support NVFP4 KV Cache (#5132, commented on Jul 17, 2025 • 0 new comments)
- [refactor] Simplification of Speculative decoding configs - Part 2 (#5936, commented on Jul 22, 2025 • 0 new comments)
- Add new debug hooks to trace memory usage and where the process hangs (#5943, commented on Jul 18, 2025 • 0 new comments)
- Assertion failed: Must set crossKvCacheFraction for encoder-decoder model (#2419, commented on Jul 16, 2025 • 0 new comments)
- Pip-installed tensorrt-llm for ubuntu24.04 seems broken, both on the host system and in a docker container (#4459, commented on Jul 16, 2025 • 0 new comments)
- [MLPerf] Potential fix for response queue size (#5972, commented on Jul 21, 2025 • 0 new comments)
- [nvbug/5374773] chore: Add a runtime flag to enable fail fast when attn window is too large to fit at least one sequence in KV cache (#5974, commented on Jul 22, 2025 • 0 new comments)
- [PERF] MM models: transfer images main->worker in low precision (#6068, commented on Jul 16, 2025 • 0 new comments)
- feat: Simplify and Improve Whisper Example (#5984, commented on Jul 16, 2025 • 0 new comments)
- InternVL3 support (#4625, commented on Jul 16, 2025 • 0 new comments)
- Slower than expected inference with p-tuning/prompt embedding enabled, when PromptTuningConfig is provided (#5889, commented on Jul 15, 2025 • 0 new comments)
- chore: set default device to cpu on Multimodal models (#5994, commented on Jul 22, 2025 • 0 new comments)
- speculative decoding in gemma3 (#6067, commented on Jul 15, 2025 • 0 new comments)
- How to run cross attention with different numbers of q and kv tokens (#6066, commented on Jul 15, 2025 • 0 new comments)
- Why are certain layer normalization parameters cast to trt_llm_config.dtype instead of using float32 for better precision? (#6052, commented on Jul 15, 2025 • 0 new comments)
- Sample script works with TP_SIZE=2 but fails with PP_SIZE=2 (#5970, commented on Jul 15, 2025 • 0 new comments)
- Mistral Small 3.2 architecture (#5968, commented on Jul 15, 2025 • 0 new comments)
- [Feature Request] Direct JSON Schema Support in OpenAI-Compatible API (#5952, commented on Jul 15, 2025 • 0 new comments)
- DS-V3 W4FP8 is OOM with 1.0.0rc2, but 1.0.0rc1 is fine (#5950, commented on Jul 15, 2025 • 0 new comments)
- Only Build CPP benchmarks (#5930, commented on Jul 15, 2025 • 0 new comments)
- Add fmha test (#6050, commented on Jul 17, 2025 • 0 new comments)
- trtllm-serve crashes for qwen3 235b a22b fp8, if enable_block_reuse false + tp 8 (#5872, commented on Jul 15, 2025 • 0 new comments)
- Why is cloning so slow when running python scripts/build_wheel.py --clean? (#5871, commented on Jul 15, 2025 • 0 new comments)
- optimize: ADP schedule optimization (#6061, commented on Jul 16, 2025 • 0 new comments)
- How to enable deepep (especially cuda graph in generation) in PD disaggregated serving (#5869, commented on Jul 15, 2025 • 0 new comments)
- [Infra] Add rtx pro 6000 post-merge tests for DS (#5126, commented on Jul 22, 2025 • 0 new comments)
- [TRTLLM-5990] doc: trtllm-serve doc improvement (#5220, commented on Jul 22, 2025 • 0 new comments)
- Draft: chore: Make GEMM config enums human readable for better logging (#4111, commented on Jul 17, 2025 • 0 new comments)
- fix: Enable num_return_sequences (`n`) support in PyTorch backend (#5415, commented on Jul 17, 2025 • 0 new comments)
- test: test debug hook (#3317, commented on Jul 18, 2025 • 0 new comments)
- KeyError: 'gemma3' in GemmaConfig.from_hugging_face when converting Gemma 3 model (#4825, commented on Jul 22, 2025 • 0 new comments)
- DS-R1 W4AFP8 slow-down over DS-R1 FP8 unless ep=None (#5928, commented on Jul 22, 2025 • 0 new comments)
- Broken checkpoint converter for whisper (#5210, commented on Jul 21, 2025 • 0 new comments)
- Rebase fp8 blockwise gemm autotune (#5635, commented on Jul 17, 2025 • 0 new comments)
- [TRTLLM-5966][feat] Initial steps towards Helix parallelism support (#5668, commented on Jul 17, 2025 • 0 new comments)
- Force KV Cache Offload (#3130, commented on Jul 18, 2025 • 0 new comments)
- [feat] add support for Eclairv2 model (#5686, commented on Jul 16, 2025 • 0 new comments)
- [AutoDeploy] Arch4: Quantization and Mixed Dtype Support (#5046, commented on Jul 18, 2025 • 0 new comments)
- Error in mpi4py when using official sanity check code from TensorRT-LLM v0.19.0rc0 on NVIDIA 5090 (#3705, commented on Jul 18, 2025 • 0 new comments)
- hopper-style context MLA (#5713, commented on Jul 21, 2025 • 0 new comments)
- Move existing transformation into new configurable system (#4403, commented on Jul 17, 2025 • 0 new comments)
- Qwen3MoeForCausalLM not supported when exporting to TensorRT-LLM format (#5978, commented on Jul 17, 2025 • 0 new comments)
- [AutoDeploy] Investigate torch.export as a preprocessing step to InferenceOptimizer (#4704, commented on Jul 17, 2025 • 0 new comments)
- refactor: decoding inputs, part 2 (#5799, commented on Jul 18, 2025 • 0 new comments)
- [WIP] Block diffusion (#5890, commented on Jul 16, 2025 • 0 new comments)
- [AutoDeploy] Better PyExecutor Integration (#4307, commented on Jul 17, 2025 • 0 new comments)
- fix: compatibility with CUDA < 12.9 on `__CUDA_ARCH_SPECIFIC__` macro (#5917, commented on Jul 22, 2025 • 0 new comments)
- Lookahead Decoding broken with Prompt Embedding Table/Multimodal (#6009, commented on Jul 17, 2025 • 0 new comments)
- Profiling time by Nsight Compute is too long (#5979, commented on Jul 16, 2025 • 0 new comments)