Issues: NVIDIA/TensorRT-LLM
#783 · [Issue Template] Short one-line summary of the issue #270 · opened Jan 1, 2024 by juney-nvidia
#3149 · How to build TensorRT-LLM engine on host and deploy to Jetson Orin Nano Super? · opened Mar 29, 2025 by Sesameisgod · labels: question, triaged
#3143 · When will Gemma 3 be supported? · opened Mar 29, 2025 by bebilli · labels: feature request, triaged
#3142 · Executor API: How to get throughput · opened Mar 28, 2025 by khayamgondal · labels: Investigating, Performance, triaged
#3138 · [Feature] Prompt lookup speculative decoding for LLM API · opened Mar 28, 2025 by tonyay163 · labels: Community Engagement, feature request
#3137 · Lookahead decoding and multimodal input support · opened Mar 28, 2025 by maxilevi · labels: question, triaged
#3130 · Force KV Cache Offload · opened Mar 27, 2025 by khayamgondal · labels: question, triaged
#3125 · Model built with ReDrafter produces substantially lower quality outputs · opened Mar 27, 2025 by geaned · labels: bug · 2 of 4 tasks
#3124 · [RFC] Topics you want to discuss with TensorRT-LLM team in the upcoming meet-ups · opened Mar 27, 2025 by juney-nvidia · labels: Community Engagement, RFC
#3123 · CUDA Device Binding Runtime Error When Running GPT-3 in Multi-Node Mode Using Slurm · opened Mar 27, 2025 by glara76 · labels: bug, triaged · 4 tasks
#3121 · How to implement attention when query and value have different hidden dims? · opened Mar 27, 2025 by ChaseMonsterAway · labels: question, triaged
#3118 · Unable to run Deepseek R1 on Blackwell · opened Mar 27, 2025 by pankajroark · labels: bug, triaged · 1 of 4 tasks
#3111 · .devcontainer points to internal Docker image · opened Mar 26, 2025 by aspctu · labels: feature request, triaged
#3108 · How to reproduce 150 TPS using FP8 + MTP=0 + BSZ=1 on H200? · opened Mar 26, 2025 by ghostplant · labels: triaged
#3058 · How to achieve 253 tok/sec with DeepSeek-R1-FP4 on 8xB200 · opened Mar 25, 2025 by jeffye-dev · labels: triaged
#3034 · [RFC] [PyTorch Flow] Re-implement LlmRequest and Scheduler in pure Python · opened Mar 24, 2025 by QiJune · labels: RFC
#3031 · Same GPU build, same files, but got the error: "The engine plan file is generated on an incompatible device, expecting compute 9.0 got compute 8.9, please rebuild." · opened Mar 24, 2025 by JoJoLev · labels: bug · 4 tasks
#2974 · TensorRT-LLM [Branch v0.12.0-jetson] Quick confirmation: Gemma2 not supported yet? · opened Mar 21, 2025 by sdecoder
#2970 · [Question] Why delete q_b_scale kv_b_scale k_b_trans_scale · opened Mar 21, 2025 by nanmi · labels: not a bug
#2964 · Request for Reproduction Configuration of DeepSeek-R1 on H200 & B200 · opened Mar 20, 2025 by xwuShirley · labels: triaged
#2953 · Running into free(): double free detected in tcache 2 when using trtllm-bench in a multi-node scenario · opened Mar 19, 2025 by snl-nvda · labels: Investigating, triaged
#2952 · TypeError in convert_checkpoint.py During Model Conversion: nvidia/Llama-3_3-Nemotron-Super-49B-v1 · opened Mar 19, 2025 by imenselmi
#2932 · Does trtllm-serve enable prefix caching automatically with Deepseek-R1? · opened Mar 17, 2025 by Bihan · labels: triaged
#2928 · What's the throughput of R1 671B using bs=1 without quant? · opened Mar 17, 2025 by ghostplant · labels: not a bug