Insights: NVIDIA/TensorRT-LLM
Overview
140 Pull requests merged by 72 people
- [nvbug/5361223] doc: Update Llama4 deployment guide: update config & note concurrency (#6222, merged Jul 22, 2025)
- set NVIDIA_IMEX_CHANNELS for dlcluster slurm job only (#6234, merged Jul 22, 2025)
- [feat] Enable TP and batching for PixtralVisionModel / Mistral3VLM (#6152, merged Jul 22, 2025)
- [Issue 6193] Fix gemma3vl weight loader (#6233, merged Jul 22, 2025)
- Add register_fake for finegrained_mixed_dtype_gemm torch_op (#6255, merged Jul 22, 2025)
- fix: bindings unit tests for nanobind (#6221, merged Jul 22, 2025)
- test: update test list for RTX6KD (#6213, merged Jul 22, 2025)
- doc: update known issues (#6247, merged Jul 22, 2025)
- [TRTLLM-6537][infra] extend multi-gpu tests related file list (#6139, merged Jul 22, 2025)
- chore: bump version to 1.0.0rc5 (#6252, merged Jul 22, 2025)
- doc: add supported data modality and types on multimodal serve (#5988, merged Jul 22, 2025)
- chore: Mass integration of release/0.21 (part 4) (#6211, merged Jul 22, 2025)
- bug: [https://nvbugs/5368507] Fix test_generate_with_seed. (#6206, merged Jul 22, 2025)
- [fix] Fix flaky mistral E2E test (#6230, merged Jul 22, 2025)
- Update model-feature document (#6243, merged Jul 22, 2025)
- feat: moe prepare support topk % 4 != 0 (#5742, merged Jul 22, 2025)
- tests: add timeout_manager to tensorrt flow test cases (#5942, merged Jul 22, 2025)
- feat: Refactor the fetching request logic (#5786, merged Jul 22, 2025)
- [TRTLLM-5059][feat] Add KV cache reuse support for multimodal models (#5444, merged Jul 21, 2025)
- disable gc (#6227, merged Jul 21, 2025)
- [chore] Clean up quickstart_advanced.py (#6021, merged Jul 21, 2025)
- [fix] Correct the returned value of has_spec_drafter (#6178, merged Jul 21, 2025)
- test: Enable GB200 torch compile multi gpu tests (#6145, merged Jul 21, 2025)
- [BREAKING CHANGE]: change default backend to PyTorch in trtllm-serve (#5717, merged Jul 21, 2025)
- [Infra] - Waive failed cases on recent post-merge (#6212, merged Jul 21, 2025)
- [TRTLLM-4279] feat: Multistream initial support for torch compile flow (#5847, merged Jul 21, 2025)
- doc: add Deprecation Policy section (#5784, merged Jul 21, 2025)
- infra: [TRTLLM-5250] Add sanity check stage for ngc-release images (Build wheels for devel image) (#4656, merged Jul 21, 2025)
- feat: nanobind bindings (#6185, merged Jul 21, 2025)
- test: [CI] remove closed bugs (#6201, merged Jul 21, 2025)
- [TRTLLM-5863][feat] Support Weight-Only-Quantization in PyTorch Workflow (#5850, merged Jul 21, 2025)
- add model-feature supported matrix doc (#5914, merged Jul 21, 2025)
- [None] infra: Update dependencies for DLFW 25.06 (#5967, merged Jul 21, 2025)
- [fix] Fix can_use_alltoall in fused_moe_wide_ep.py (#6173, merged Jul 21, 2025)
- doc: remove cuda_graph_config: {} from doc since cuda_graph enabled b… (#6150, merged Jul 21, 2025)
- fix: Flush stale PlanParams with custom attention mask (#6163, merged Jul 21, 2025)
- test: add phi-4 multimodal and bielik-11b-v2.2 models for perf test (#5826, merged Jul 21, 2025)
- enh: Lift expectation of single image per sample in Gemma3 VLM (#6195, merged Jul 21, 2025)
- W4A8 GEMM (#6005, merged Jul 20, 2025)
- [TRTLLM-5826][feat] Support pytorch LoRA adapter eviction (#5616, merged Jul 20, 2025)
- fix: Ensure mlx5 library is installed for deep_ep and remove deprecated python bindings (#6189, merged Jul 20, 2025)
- [Fix][Chore][Qwen3] fix bug of using fp4 on sm120 (#6065, merged Jul 20, 2025)
- DeepEP LL support variable hidden size and tokens num (#6141, merged Jul 20, 2025)
- [fix]: Skip prompt length checking for generation only requests (#6146, merged Jul 19, 2025)
- [TRTLLM-6452][feat]: Two-model engine KV cache reuse support (#6133, merged Jul 19, 2025)
- [refactor] Unify name of NGram speculative decoding (#5937, merged Jul 19, 2025)
- [Disaggregated] Add retry knobs and handling (#5808, merged Jul 18, 2025)
- [Issue 5927][fix] Avoid memory calls during broadcast for single GPU (#6010, merged Jul 18, 2025)
- [https://nvbugs/5393961][fix] record kv-cache size in MLACacheFormatter (#6181, merged Jul 18, 2025)
- [nvbug/5393888][nvbug/5393042] Always use py_seq_slot (#6147, merged Jul 18, 2025)
- [nvbugs/5354884][fix] Update beam search workspace estimation to new upper bound (#5926, merged Jul 18, 2025)
- [nvbugs/5369799] fix: Update disaggregation handling in sampler (#5762, merged Jul 18, 2025)
- feat(eagle3): support qwen3 dense model (#5879, merged Jul 18, 2025)
- enh: Add script to map tests <-> jenkins stages & vice-versa (#5177, merged Jul 18, 2025)
- [TRTLLM-6471] Infra: unwaive nixl tests and some disagg-serve tests (#6095, merged Jul 18, 2025)
- feat: Remove padding in attention DP. (#6064, merged Jul 18, 2025)
- [ci] Speedup beam search unit tests with fixtures for LLM (#5843, merged Jul 18, 2025)
- infra: fix failed single-GPU stage not raising an error (#6165, merged Jul 18, 2025)
- fix: NVBug 5385576 py_batch_idx issue (#6153, merged Jul 18, 2025)
- update broken link of PyTorchModelEngine in arch_overview (#6171, merged Jul 18, 2025)
- refactor: Enhanced handling of decoder requests and logits within the batch manager (#6055, merged Jul 18, 2025)
- [fix]: Revert commit 388b491 (#6143, merged Jul 18, 2025)
- [Infra] - Waive failed tests in post-merge (#6176, merged Jul 18, 2025)
- chore: add more logging in FmhaDispatcher (#6170, merged Jul 18, 2025)
- [TRTLLM-6091][docs] Update docs/trtllm sampler 1.0 (#5833, merged Jul 18, 2025)
- [None][infra] Update the allow list of CI trigger (#6168, merged Jul 18, 2025)
- [TRTLLM-5179] - Update bot help messages (#5277, merged Jul 18, 2025)
- fix single_disagg_test (#6166, merged Jul 18, 2025)
- [Doc][Qwen3] add qwen3 to support matrix (#6161, merged Jul 18, 2025)
- [None][infra] Cherry-pick #6128 and #6130 from main branch (#6151, merged Jul 18, 2025)
- feat: add support for Modelopt fp8_pb_wo quantization scheme (#6106, merged Jul 18, 2025)
- [https://nvbugs/5387375] fix(scaffolding): fix scaffolding aime test in test_e2e (#6140, merged Jul 18, 2025)
- fix TMA error with GEMM+AR on TP=2 (#6075, merged Jul 18, 2025)
- fix: Unable to load phi4-model with tp_size>1 (#6093, merged Jul 18, 2025)
- [TRTLLM-6368] Update deepep dispatch API (#6037, merged Jul 18, 2025)
- Revert "feat: nanobind bindings (#5961)" (#6160, merged Jul 18, 2025)
- feat: Add support for benchmarking individual gemms in MOE benchmark (#6080, merged Jul 17, 2025)
- Refactor KVCacheManager: Simplify token availability calculation and … (#6134, merged Jul 17, 2025)
- fix: Update trtllm args issues with extra nested config (#5996) (#6114, merged Jul 17, 2025)
- [fix] Fix KV cache overrides in trtllm-bench (#6103, merged Jul 17, 2025)
- [fix] Fix Mistral3VLM weight-loading & enable in pre-merge (#6105, merged Jul 17, 2025)
- [fix] Remove duplicated KVCache transmission check (#6022, merged Jul 17, 2025)
- [fix] Update jenkins container images (#6094, merged Jul 17, 2025)
- feat: nanobind bindings (#5961, merged Jul 17, 2025)
- [TRTLLM-6352][feat] Migrate EAGLE3 and draft/target speculation to Drafter (#6007, merged Jul 17, 2025)
- test: fix PytestUnknownMarkWarning: Unknown pytest.mark.timeout (#6115, merged Jul 17, 2025)
- fix: Fix DeepSeek R1 CI (#6129, merged Jul 17, 2025)
- test: update max_beam_width to 1 due to torchsampler changes. (#6101, merged Jul 17, 2025)
- chore: unwaive a few tests for v1.0 (#6107, merged Jul 17, 2025)
- [TRTLLM-6406, TRTLLM-5172] feat: Enable guided decoding with overlap scheduler (#6000, merged Jul 17, 2025)
- chore: [BREAKING CHANGE] use cacheTransceiverConfig as knobs for disagg service (#5234, merged Jul 17, 2025)
- [Infra] - Add waive list for pytest when using slurm (#6130, merged Jul 17, 2025)
- fix: convert venv_prefix to str before comparison with base_prefix (#6121, merged Jul 17, 2025)
- CI: update multi gpu test trigger file list (#6131, merged Jul 17, 2025)
- [None][infra] Set up the initial config for CodeRabbit (#6128, merged Jul 17, 2025)
- doc: sync llm api example (#6122, merged Jul 17, 2025)
- [fix] Release slots with spec decode + disagg (#5975) (#6032, merged Jul 17, 2025)
- feat: Add vectorized loading for finalize kernel in MoE Trtllm backend (#5919, merged Jul 17, 2025)
- optimize: ADP schedule optimization (#6087, merged Jul 17, 2025)
- infra: fix SBSA test stage (#6113, merged Jul 17, 2025)
- [fix] Performance Optimization for MNNVL TwoShot Kernel (#5934, merged Jul 17, 2025)
- [TRTLLM-6070] docs: Add initial documentation for trtllm-bench CLI. (… (#6112, merged Jul 17, 2025)
- test: Update Llama4 Scout FP4 & FP8 accuracy tests (#5901, merged Jul 17, 2025)
- [TRTLLM-6070] docs: Add initial documentation for trtllm-bench CLI. (#5734, merged Jul 17, 2025)
- doc: merge main branch docs into 1.0 doc branch (#6099, merged Jul 17, 2025)
- feat: TRTLLM-5574 Add phi-4-multimodal pytorch-backend support (#5644, merged Jul 16, 2025)
- Fix: Enhance ModelConfig for kv cache size calculations (#5868, merged Jul 16, 2025)
- fix: Fix triton backend build [nvbug 5396469] (#6098, merged Jul 16, 2025)
- [refactor] Clean up drafter/resource manager creation logic (#5805, merged Jul 16, 2025)
- [Whisper] add whisper support (#6083, merged Jul 16, 2025)
- [fix] Correct handling of NVFP4 block scaling factors in preprocessing for MoE (#6073, merged Jul 16, 2025)
- Fix TMA error with GEMM+AR on TP=2 (#6071, merged Jul 16, 2025)
- [TRTLLM-5493] Add core infrastructure to enable loading of custom checkpoint formats (#5372, merged Jul 16, 2025)
- fix: Update trtllm args issues with extra nested config (#5996, merged Jul 16, 2025)
- Add documentation for eagle3+disagg+dynamo (#6072, merged Jul 16, 2025)
- [Infra] - Waive failed cases in post-merge on main (#6096, merged Jul 16, 2025)
- [TRTLLM-6071] doc: Add trtllm-eval 1.0 doc (#5877, merged Jul 16, 2025)
- infra: [TRTLLM-5879] Split single GPU test and multi GPU test into 2 pipelines (#5199, merged Jul 16, 2025)
- chore: Cleanup disable_fp4_allgather. (#6006, merged Jul 16, 2025)
- update spec_dec (#6079, merged Jul 16, 2025)
- BlockManager copy constructor fix (#5982, merged Jul 16, 2025)
- add release notes for 0.21 release (#6049, merged Jul 16, 2025)
- [TRTLLM-5530][BREAKING CHANGE] refactor: unify KvCacheConfig in LLM class for pytorch backend (#5752, merged Jul 16, 2025)
- fix: Add $HOME/.local/bin to PATH when running docker in local user mode (#6062, merged Jul 16, 2025)
- tests: add QA test cases (#5959, merged Jul 16, 2025)
- [nvbug/5387226] chore: add propagation for trust_remote_code to AutoConfig (#6001, merged Jul 16, 2025)
- [nvbug/5359218][tests] add LLM API test case on lookahead with chunked prefill (#6051, merged Jul 16, 2025)
- feat: Add deepseek-lite tests for RTX pro 6000 (#5903, merged Jul 16, 2025)
- Cherry Pick: PR #6076 (#6088, merged Jul 16, 2025)
- [TRTLLM-6471] Infra: Upgrade NIXL to 0.3.1 (#5991, merged Jul 16, 2025)
- feat: use session abstraction in data transceiver and cache formatter (#5611, merged Jul 16, 2025)
- [nvbug/5347489][nvbug/5388036] increase timeout in disagg worker test (#6041, merged Jul 16, 2025)
- [None] - Waive L0 tests (#6082, merged Jul 16, 2025)
- chore: upgrade modelopt to 0.33 (#6058, merged Jul 16, 2025)
- chore: Bump version to 1.0.0rc4 (#6086, merged Jul 16, 2025)
- fix: Unable to load phi4-model with tp_size>1 (#5962, merged Jul 16, 2025)
- [fix] Fix Triton build (#6076, merged Jul 16, 2025)
- feat: Add support for Triton request cancellation (#5898, merged Jul 16, 2025)
- feat/add latency support for trtllm bench (#3730, merged Jul 15, 2025)
69 Pull requests opened by 58 people
- [fix] Correct handling of NVFP4 block scaling factors in preprocessing for MoE (#6069, opened Jul 15, 2025)
- [Draft] Inter-request kv cache manager support for HSTU (#6077, opened Jul 16, 2025)
- support JIT mha.cu for SPEC_DEC in runtime (#6078, opened Jul 16, 2025)
- [TRTLLM-6471] Infra: Upgrade NIXL to 0.3.1 and unwaive test (#6084, opened Jul 16, 2025)
- [TRTLLM-6444] Add some UCX troubleshooting docs and print UCX-related logs (#6085, opened Jul 16, 2025)
- ucx establish connection with zmq (#6090, opened Jul 16, 2025)
- [draft][TRTLLM-1302][feat]: topk logprobs for TRT backend & top1 logprob for PyT backend (#6097, opened Jul 16, 2025)
- Draft: FP8 R1 (#6100, opened Jul 16, 2025)
- [TRTLLM-6453][feat] Support chunked prefill on spec decode 2 model (#6104, opened Jul 16, 2025)
- [nvbug/5393849]: phi4-mini will generate garbage outputs with tp_size>1 with trt backend (#6108, opened Jul 17, 2025)
- Add disagg launcher scripts (#6109, opened Jul 17, 2025)
- add safe chunked broadcast (#6110, opened Jul 17, 2025)
- feat: Support server reload (#6116, opened Jul 17, 2025)
- [TRTLLM-5061] chore: add tags to API reference (#6123, opened Jul 17, 2025)
- [TRTLLM-5331] perf: Replace allgather with AllToAllPrepare (#5570) (#6124, opened Jul 17, 2025)
- perf: Add MOE support for dynamic cluster shapes and custom epilogue … (#6126, opened Jul 17, 2025)
- Cherry-pick moe sort (and all its dependencies) (#6127, opened Jul 17, 2025)
- infra: [TRTLLM-6499] Split L0_Test into two pipelines by single GPU and multi GPU (for SBSA) (#6132, opened Jul 17, 2025)
- [TRTLLM-6549] chore: record delay introduced by disaggregated serving in kv cache measure (#6135, opened Jul 17, 2025)
- [nvbug/5322354] fix PD + MTP + overlap scheduler accuracy issue (#6136, opened Jul 17, 2025)
- [Perf]: Add residual, norm for nemotron_nas models (#6157, opened Jul 17, 2025)
- [linting] Enable ruff on more files (wave 2/N) (#6162, opened Jul 17, 2025)
- DON'T MERGE: log paused requests info (#6174, opened Jul 18, 2025)
- [TRTLLM-6357][test] Add accuracy tests for Qwen3 (#6177, opened Jul 18, 2025)
- feat: Support Aggregate mode for phi4-mm (#6184, opened Jul 18, 2025)
- fix: Ensure that Python stub generation works against libnvidia-ml stubs (#6188, opened Jul 18, 2025)
- DRAFT: Changes for multi stream executor (#6190, opened Jul 18, 2025)
- [AutoDeploy] merge feat/ad-2025-07-07 (#6196, opened Jul 18, 2025)
- [nvbugs/5361178] feat: json_schema support in trtllm-serve using xgrammar (#6197, opened Jul 18, 2025)
- Draft: Qwen3: Fix eagle hidden states (#6199, opened Jul 20, 2025)
- [https://nvbugs/5378031] Hopper W4A8 MoE supports ModelOpt ckpt for PyT backend (#6200, opened Jul 20, 2025)
- Draft: Nanobind integration tests (#6203, opened Jul 20, 2025)
- [TRTLLM-6445] feat: Enable AllReduce-associated fusion patterns in Llama3/4. (#6205, opened Jul 20, 2025)
- Draft: Deepseek: Start Eagle work (#6210, opened Jul 21, 2025)
- Draft: Feat/support lora cuda graph (#6215, opened Jul 21, 2025)
- feat: Enable TRTLLM sampler by default (#6216, opened Jul 21, 2025)
- [TRTLLM-6650][feat] Enhance beam search support with CUDA graph integration (#6217, opened Jul 21, 2025)
- [5830][feat] Improve LoRA cache memory control (#6220, opened Jul 21, 2025)
- [TRTLLM-6651][feat] Enable Overlap scheduler + Beam Search in TRTLLM Sampler (#6223, opened Jul 21, 2025)
- [nvbugs/5401261][fix] Fix Triton backend disaggregated serving support (#6224, opened Jul 21, 2025)
- Bump version to 0.21.1 (#6225, opened Jul 21, 2025)
- Change the all-reduce strategy to NCCL (#6226, opened Jul 21, 2025)
- [fix] Allow custom model config for Kimi-K2 (#6228, opened Jul 21, 2025)
- Waive flaky tests (#6229, opened Jul 21, 2025)
- Remove input_sf swizzle for module WideEPMoE (#6231, opened Jul 21, 2025)
- Auto-enable ngram with concurrency <= 32. (#6232, opened Jul 21, 2025)
- [Fix][Nvbug 5401163] Fix bug of MoE on tp > 1 with trtllm moe backend (#6235, opened Jul 21, 2025)
- [nvbug/5376229]: Remove flash-attn dependency from test_ptp_quickstart_multimodal (#6236, opened Jul 21, 2025)
- [fix][nvbugs/5399355] Fix Lamport buffer clear issue for MNNVL TwoShot Allreduce (#6237, opened Jul 21, 2025)
- fix: nvbug_5398806 (#6239, opened Jul 21, 2025)
- Add Acceptance Rate calculation to benchmark_serving (#6240, opened Jul 22, 2025)
- [PERF] Don't use hmac encryption for loopback interfaces (#6241, opened Jul 22, 2025)
- Chore: remove duplicate should_stop_processing check (#6242, opened Jul 22, 2025)
- [feat] Support NVFP4 KV Cache (#6244, opened Jul 22, 2025)
- [TRTLLM-5627] feat: Implement pytorch sampler for MTP (#6245, opened Jul 22, 2025)
- feat: Add support for scheduling attention DP requests (#6246, opened Jul 22, 2025)
- update disagg slurm scripts (#6248, opened Jul 22, 2025)
- Improve TransferAgentTest.SyncMessage (#6250, opened Jul 22, 2025)
- @coderabbitai title (#6251, opened Jul 22, 2025)
- feat: support SharedTensor on MultimodalParams (#6254, opened Jul 22, 2025)
- [TRTLLM-4279] fix: Add fake implementations for several custom ops and add a protection test (#6257, opened Jul 22, 2025)
- [https://nvbugs/5402719][fix]: Skip CUDA graph dummy request in spec decoding (#6258, opened Jul 22, 2025)
- [fix]: Revert commit 48ddc3d & add test for disagg server with different max_num_tokens (#6259, opened Jul 22, 2025)
- [https://nvbugs/5387771] fix deadlocks due to insufficient numSemaphores (#6262, opened Jul 22, 2025)
- [TRTLLM-6654][feat] Add support for external multimodal embeddings (#6263, opened Jul 22, 2025)
- add eagle3 one model disagg accuracy test (#6264, opened Jul 22, 2025)
- fix: Fix kv_cache_events unit tests [nvbug5362412] (#6265, opened Jul 22, 2025)
14 Issues closed by 10 people
- [LOAD MODEL] Can't load gemma3 models (#6193, closed Jul 22, 2025)
- Empty response when setting --reasoning_parser "deepseek-r1" for EXAONE 4.0 32B model (#6183, closed Jul 22, 2025)
- [AutoDeploy] Create export patch registry (#5728, closed Jul 22, 2025)
- Running TensorRT-LLM Using Docker Setup failure (#6238, closed Jul 22, 2025)
- Using Tensorrt Whisper in Triton with your examples (#6144, closed Jul 21, 2025)
- [PERF] Qwen-VLs. Speedup by avoiding MPI overhead (#5927, closed Jul 18, 2025)
- Attention Pattern Matching with Inductor Utilities (#4404, closed Jul 18, 2025)
- test (#6155, closed Jul 17, 2025)
- Guided decoding is not supported with overlap scheduler (#5858, closed Jul 17, 2025)
- Qwen3-235B-A22B-FP8 error in sampling (#5803, closed Jul 16, 2025)
- [AutoDeploy] ABC for Graph Transformation (#4328, closed Jul 15, 2025)
- [AutoDeploy] Configuration System for Transformation (#4366, closed Jul 15, 2025)
- [AutoDeploy] Arch1: Configurable transformation pipeline (#4327, closed Jul 15, 2025)
- [AutoDeploy] Example Transformation in new configuration system (#4367, closed Jul 15, 2025)
27 Issues opened by 23 people
- Set batch sizes in --extra_llm_api_options or check if we can avoid explicitly setting it (#6261, opened Jul 22, 2025)
- Switch to use torch-opt in the trtllm-bench dashboard (#6260, opened Jul 22, 2025)
- [Inefficient design] ModelRunnerCpp - optionally return Tensor instead of List (#6218, opened Jul 21, 2025)
- Pytorch workflow Logits Processor problem (#6214, opened Jul 21, 2025)
- Can't perform streaming (#6207, opened Jul 21, 2025)
- Qwen3 pytorch backend stuck when sampling params n > 1 (#6204, opened Jul 20, 2025)
- Support `LogitsPostProcessorConfig` in Triton Backend (#6202, opened Jul 20, 2025)
- How to specify stop_token_ids and repetition_penalty in trtllm-serve serve (#6198, opened Jul 19, 2025)
- [Feature Request] Introduce a LayerNorm module in PyTorch workflow (#6187, opened Jul 18, 2025)
- [Feature Request] Simplify QK layernorm implementation in PyTorch workflow (#6186, opened Jul 18, 2025)
- convert_checkpoint.py fails on Qwen/Qwen3-0.6B: unexpected qkv weight shape (#6182, opened Jul 18, 2025)
- [QES] Could I use `mpi_run` with the trtllm-pytorch backend? (#6179, opened Jul 18, 2025)
- [Feature Request] Add optimized support for DiT model (#6175, opened Jul 18, 2025)
- Error when deploying tensorrtllm backend using triton (#6169, opened Jul 18, 2025)
- Error serving a Qwen3 model quantized to int8 with modelopt (#6167, opened Jul 18, 2025)
- Add support for meta-llama/Llama-Guard-4-12B (#6164, opened Jul 18, 2025)
- Add support for meta-llama/Llama-Prompt-Guard-2-86M (#6159, opened Jul 17, 2025)
- speculative decoding for B200 (#6158, opened Jul 17, 2025)
- Sesame CSM in Tensorrt-llm (#6156, opened Jul 17, 2025)
- Function tooling like in vllm (#6154, opened Jul 17, 2025)
- Optimize torch.compile's recompilation limit (#6142, opened Jul 17, 2025)
- w4fp8 with mtp load model error (#6137, opened Jul 17, 2025)
- [AutoDeploy] Architecture for VLM support (#6120, opened Jul 17, 2025)
- The performance of the pytorch backend is worse than that of the C++ backend (#6119, opened Jul 17, 2025)
- ImportError on `from tensorrt_llm.bindings import DataType, GptJsonConfig` (#6102, opened Jul 16, 2025)
- VRAM exploded when deploying qwen2.5-1.5B on a 3090 (#6091, opened Jul 16, 2025)
87 Unresolved conversations
Sometimes conversations happen on old items that aren't yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
- [Infra] Add nightly pipeline to generate lock files (#5798, commented on Jul 18, 2025 • 66 new comments)
- Add basic Nemo Ckpt Lora Loading in pytorch flow (#6019, commented on Jul 22, 2025 • 53 new comments)
- feat: best_of/n for pytorch workflow (#5997, commented on Jul 22, 2025 • 30 new comments)
- feat: Core Metrics Implementation (#5785, commented on Jul 22, 2025 • 26 new comments)
- [TRTLLM-5633] - Merge current waive list with the TOT waive list (#5198, commented on Jul 22, 2025 • 23 new comments)
- feat: LLM sleep & wakeup Part 1: virtual device memory (#5034, commented on Jul 22, 2025 • 15 new comments)
- [nvbugs/5302040] feat: Add whisper support (Bert Attention on SM100 and GPTAttention for cross attention on SM100) (#5527, commented on Jul 21, 2025 • 13 new comments)
- Add vLLM KV Pool support for XQA kernel (#6013, commented on Jul 22, 2025 • 13 new comments)
- infra: [TRTLLM-5873] Use build stage wheels to speed up docker release image build (#4939, commented on Jul 18, 2025 • 9 new comments)
- [feat] low precision all2all (#6047, commented on Jul 22, 2025 • 8 new comments)
- fix/improve kvcache allocation in PyTorch runtime (#5933, commented on Jul 22, 2025 • 8 new comments)
- [feat]: support logit_bias (#5354, commented on Jul 18, 2025 • 7 new comments)
- [TRTLLM-5061] chore: add tags to API reference (#5707, commented on Jul 22, 2025 • 7 new comments)
- Draft: [TRTLLM-4719][enhance] Refactor to add scenarios to trtllm-bench (#6023, commented on Jul 21, 2025 • 7 new comments)
- Blackwell/Hopper MoE Gemm2+Finalize fusion (#3294, commented on Jul 17, 2025 • 6 new comments)
- [fix] Add detokenization-based stop word logic to LLM API (#5948, commented on Jul 22, 2025 • 6 new comments)
- feat: spec dec with external API (#5990, commented on Jul 22, 2025 • 6 new comments)
- Mtp optimizations round1 (#5689, commented on Jul 22, 2025 • 5 new comments)
- doc: Refactor documents and examples of disaggregated serving and wide ep (#6054, commented on Jul 22, 2025 • 5 new comments)
- [feat]: Add FP8 context MLA support for SM120 (#6059, commented on Jul 21, 2025 • 5 new comments)
- [1/N] Add NCCL Symmetric Integration for All Reduce (#4500, commented on Jul 19, 2025 • 5 new comments)
- [TRTLLM-5312] - Add bot run rules for triton tests (#4988, commented on Jul 22, 2025 • 4 new comments)
- doc: Adding disaggregated serving page to features section for 1.0 docs [TRTLLM-6086] (#6024, commented on Jul 22, 2025 • 4 new comments)
- tests: add test_chunked_prefill for llama4 (#5549, commented on Jul 22, 2025 • 3 new comments)
- feat: [AutoDeploy] Support llama4 FP8 (#5935, commented on Jul 22, 2025 • 2 new comments)
- [TRTLLM-5996][feat] FP8 blockwise scaling GEMM support on Blackwell (#5987, commented on Jul 17, 2025 • 2 new comments)
- [security] add nspce allow list for false positive secrets (#5797, commented on Jul 22, 2025 • 2 new comments)
- Draft: [TRTLLM-5234][feature] Add a serve subcommand to trtllm-bench (#6025, commented on Jul 22, 2025 • 2 new comments)
- [feat] Implement pytorch sampler for MTP (#5627, commented on Jul 17, 2025 • 2 new comments)
- [FIX] Fix build with ENABLE_MULTI_DEVICE=0; fix Qwen-VL failure with requests without MM data (#6063, commented on Jul 22, 2025 • 2 new comments)
- [TRTLLM-5508] feat: check input tokens + improve error handling (#5170, commented on Jul 22, 2025 • 1 new comment)
- Nixl agent (#5488, commented on Jul 17, 2025 • 1 new comment)
- [Infra][TRTLLM-6224] - Upgrade dependencies to DLFW 25.06 and CUDA 12.9.1 (#5678, commented on Jul 21, 2025 • 1 new comment)
- [fix] improve head_dim calculation in Qwen config (#5913, commented on Jul 21, 2025 • 1 new comment)
- [Issue/5952][feat] Support JSON Schema in OpenAI-Compatible API (#5957, commented on Jul 16, 2025 • 1 new comment)
- Pass mode & directory (#5983, commented on Jul 21, 2025 • 1 new comment)
- [PERF] Move calculation of Qwen2-VL's rotary_cos_sin to LLM worker process (#6004, commented on Jul 22, 2025 • 1 new comment)
- [TRTLLM-4413][infra] Add additional nightly build of wheel package with local version besides ordinary one (#4570, commented on Jul 18, 2025 • 1 new comment)
- [feat] Support NVFP4 KV Cache (#5132, commented on Jul 17, 2025 • 0 new comments)
- [refactor] Simplification of Speculative decoding configs - Part 2 (#5936, commented on Jul 22, 2025 • 0 new comments)
- Add new debug hooks to trace memory usage and where the process hangs (#5943, commented on Jul 18, 2025 • 0 new comments)
- Assertion failed: Must set crossKvCacheFraction for encoder-decoder model (#2419, commented on Jul 16, 2025 • 0 new comments)
- Pip-installed tensorrt-llm for ubuntu24.04 seems broken, both on the host system and in a docker container (#4459, commented on Jul 16, 2025 • 0 new comments)
- [MLPerf] Potential fix for response queue size (#5972, commented on Jul 21, 2025 • 0 new comments)
- [nvbug/5374773] chore: Add a runtime flag to enable fail fast when attn window is too large to fit at least one sequence in KV cache (#5974, commented on Jul 22, 2025 • 0 new comments)
- [PERF] MM models: transfer images main->worker in low precision (#6068, commented on Jul 16, 2025 • 0 new comments)
- feat: Simplify and Improve Whisper Example (#5984, commented on Jul 16, 2025 • 0 new comments)
- InternVL3 support (#4625, commented on Jul 16, 2025 • 0 new comments)
- Slower than expected inference with p-tuning/prompt embedding enabled, when PromptTuningConfig is provided (#5889, commented on Jul 15, 2025 • 0 new comments)
- chore: set default device to cpu on Multimodal models (#5994, commented on Jul 22, 2025 • 0 new comments)
- speculative decoding in gemma3 (#6067, commented on Jul 15, 2025 • 0 new comments)
- How to run cross attention with different numbers of q and kv tokens (#6066, commented on Jul 15, 2025 • 0 new comments)
- Why are certain layer normalization parameters cast to trt_llm_config.dtype instead of using float32 for better precision? (#6052, commented on Jul 15, 2025 • 0 new comments)
- Sample script works with TP_SIZE=2 but fails with PP_SIZE=2 (#5970, commented on Jul 15, 2025 • 0 new comments)
- Mistral Small 3.2 architecture (#5968, commented on Jul 15, 2025 • 0 new comments)
- [Feature Request] Direct JSON Schema Support in OpenAI-Compatible API (#5952, commented on Jul 15, 2025 • 0 new comments)
- DS-V3 W4FP8 is OOM with 1.0.0rc2, but 1.0.0rc1 is fine (#5950, commented on Jul 15, 2025 • 0 new comments)
- Only Build CPP benchmarks (#5930, commented on Jul 15, 2025 • 0 new comments)
- Add fmha test (#6050, commented on Jul 17, 2025 • 0 new comments)
- trtllm-serve crashes for qwen3 235b a22b fp8, if enable_block_reuse false + tp 8 (#5872, commented on Jul 15, 2025 • 0 new comments)
- Why is cloning so slow when running python scripts/build_wheel.py --clean? (#5871, commented on Jul 15, 2025 • 0 new comments)
- optimize: ADP schedule optimization (#6061, commented on Jul 16, 2025 • 0 new comments)
- How to enable deepep (especially cuda graph in generation) in PD disaggregated serving (#5869, commented on Jul 15, 2025 • 0 new comments)
- [Infra] Add rtx pro 6000 post-merge tests for DS (#5126, commented on Jul 22, 2025 • 0 new comments)
- [TRTLLM-5990] doc: trtllm-serve doc improvement (#5220, commented on Jul 22, 2025 • 0 new comments)
- Draft: chore: Make GEMM config enums human readable for better logging (#4111, commented on Jul 17, 2025 • 0 new comments)
- fix: Enable num_return_sequences (`n`) support in PyTorch backend (#5415, commented on Jul 17, 2025 • 0 new comments)
- test: test debug hook (#3317, commented on Jul 18, 2025 • 0 new comments)
- KeyError: 'gemma3' in GemmaConfig.from_hugging_face when converting Gemma 3 model (#4825, commented on Jul 22, 2025 • 0 new comments)
- DS-R1 W4AFP8 slow-down over DS-R1 FP8 unless ep=None (#5928, commented on Jul 22, 2025 • 0 new comments)
- Broken checkpoint converter for whisper (#5210, commented on Jul 21, 2025 • 0 new comments)
- Rebase fp8 blockwise gemm autotune (#5635, commented on Jul 17, 2025 • 0 new comments)
- [TRTLLM-5966][feat] Initial steps towards Helix parallelism support (#5668, commented on Jul 17, 2025 • 0 new comments)
- Force KV Cache Offload (#3130, commented on Jul 18, 2025 • 0 new comments)
- [feat] add support for Eclairv2 model (#5686, commented on Jul 16, 2025 • 0 new comments)
- [AutoDeploy] Arch4: Quantization and Mixed Dtype Support (#5046, commented on Jul 18, 2025 • 0 new comments)
- Error in mpi4py when using official sanity check code from TensorRT-LLM v0.19.0rc0 on NVIDIA 5090 (#3705, commented on Jul 18, 2025 • 0 new comments)
- hopper-style context MLA (#5713, commented on Jul 21, 2025 • 0 new comments)
- Move existing transformation into new configurable system (#4403, commented on Jul 17, 2025 • 0 new comments)
- Qwen3MoeForCausalLM not supported when exporting to TensorRT-LLM format (#5978, commented on Jul 17, 2025 • 0 new comments)
- [AutoDeploy] Investigate torch.export as a preprocessing step to InferenceOptimizer (#4704, commented on Jul 17, 2025 • 0 new comments)
- refactor: decoding inputs, part 2 (#5799, commented on Jul 18, 2025 • 0 new comments)
- [WIP] Block diffusion (#5890, commented on Jul 16, 2025 • 0 new comments)
- [AutoDeploy] Better PyExecutor Integration (#4307, commented on Jul 17, 2025 • 0 new comments)
- fix: compatibility with CUDA < 12.9 on `__CUDA_ARCH_SPECIFIC__` macro (#5917, commented on Jul 22, 2025 • 0 new comments)
- Lookahead Decoding broken with Prompt Embedding Table/Multimodal (#6009, commented on Jul 17, 2025 • 0 new comments)
- Profiling time by Nsight Compute is too long (#5979, commented on Jul 16, 2025 • 0 new comments)