Pull requests: NVIDIA/TensorRT-LLM
Refactor the topk parallelization part for the routing kernels (#5567, opened Jun 28, 2025 by ChristinaZ)
test: Deprecate gpt_model_type "v1" static batching from triton_backe… (#5562, opened Jun 27, 2025 by mc-nv)
Implement --served_model_name and improve command line parsing (#5561, opened Jun 27, 2025 by pathorn)
[TRTLLM-4926][feat] Reimplement metrics endpoint with stats about requests (#5560, opened Jun 27, 2025 by pathorn)
[fix] Use decorator for request cancelation and handle CancelledError (#5559, opened Jun 27, 2025 by pathorn)
[nvbug/5337601][fix] Fix disagg + speculative decoding (#5558, opened Jun 27, 2025 by Tabrizian)
Refactor moe permute and finalize op by removing duplicated code (#5557, opened Jun 27, 2025 by limin2021)
[TRTLLM-6104] feat: add request_perf_metrics to triton LLMAPI backend (#5554, opened Jun 27, 2025 by xuanzic)
[feat] Support MXFP4 x BF16 Grouped GEMM in FusedMoE Pytorch Module (#5552, opened Jun 27, 2025 by jinyangyuan-nvidia)
feat: Optimize TRTLLM Sampler perf single beam single step (#5550, opened Jun 27, 2025 by dcampora)
rcca: test default kv_cache_reuse option for pytorch multimodal (#5544, opened Jun 27, 2025 by StanleySun639)
[nvbug 5304752][fix]: enhance _check_arguments to filter illegal requests for pytorch backend (#5541, opened Jun 27, 2025 by LinPoly)