Issues: NVIDIA/TensorRT-LLM
#783 · [Issue Template] Short one-line summary of the issue #270 · opened Jan 1, 2024 by juney-nvidia
#3149 · How to build TensorRT-LLM engine on host and deploy to Jetson Orin Nano Super? · opened Mar 29, 2025 by Sesameisgod · labels: question, triaged
#3143 · When will Gemma 3 be supported? · opened Mar 29, 2025 by bebilli · labels: feature request, triaged
#3142 · Executor API: How to get throughput · opened Mar 28, 2025 by khayamgondal · labels: Investigating, Performance, triaged
#3138 · [Feature] Prompt lookup speculative decoding for LLM API · opened Mar 28, 2025 by tonyay163 · labels: Community Engagement, feature request
#3137 · Lookahead decoding and multimodal input support · opened Mar 28, 2025 by maxilevi · labels: question, triaged
#3130 · Force KV Cache Offload · opened Mar 27, 2025 by khayamgondal · labels: question, triaged
#3125 · Model built with ReDrafter produces substantially lower quality outputs · opened Mar 27, 2025 by geaned · labels: bug · 2 of 4 tasks
#3124 · [RFC] Topics you want to discuss with TensorRT-LLM team in the upcoming meet-ups · opened Mar 27, 2025 by juney-nvidia · labels: Community Engagement, RFC
#3123 · CUDA Device Binding Runtime Error When Running GPT-3 in Multi-Node Mode Using Slurm · opened Mar 27, 2025 by glara76 · labels: bug, triaged · 4 tasks
#3121 · How to implement attention when query and value have different hidden dims? · opened Mar 27, 2025 by ChaseMonsterAway · labels: question, triaged
#3118 · Unable to run Deepseek R1 on Blackwell · opened Mar 27, 2025 by pankajroark · labels: bug, triaged · 1 of 4 tasks
#3111 · .devcontainer points to internal Docker image · opened Mar 26, 2025 by aspctu · labels: feature request, triaged
#3108 · How to reproduce 150 TPS using FP8 + MTP=0 + BSZ=1 on H200? · opened Mar 26, 2025 by ghostplant · labels: triaged
#3058 · How to achieve 253 tok/sec with DeepSeek-R1-FP4 on 8xB200 · opened Mar 25, 2025 by jeffye-dev · labels: triaged
#3034 · [RFC] [PyTorch Flow] Re-implement LlmRequest and Scheduler in pure Python · opened Mar 24, 2025 by QiJune · labels: RFC
#3031 · Same GPU build, same files, but got the error: "The engine plan file is generated on an incompatible device, expecting compute 9.0 got compute 8.9, please rebuild." · opened Mar 24, 2025 by JoJoLev · labels: bug · 4 tasks
#2974 · TensorRT-LLM [Branch v0.12.0-jetson] Quick confirmation: Gemma2 not supported yet? · opened Mar 21, 2025 by sdecoder
#2970 · [Question] Why delete q_b_scale kv_b_scale k_b_trans_scale · opened Mar 21, 2025 by nanmi · labels: not a bug
#2964 · Request for Reproduction Configuration of DeepSeek-R1 on H200 & B200 · opened Mar 20, 2025 by xwuShirley · labels: triaged
#2953 · Running into free(): double free detected in tcache 2 when using trtllm-bench in a multi-node scenario · opened Mar 19, 2025 by snl-nvda · labels: Investigating, triaged
#2952 · TypeError in convert_checkpoint.py During Model Conversion: nvidia/Llama-3_3-Nemotron-Super-49B-v1 · opened Mar 19, 2025 by imenselmi
#2932 · Does trtllm-serve enable prefix caching automatically with Deepseek-R1? · opened Mar 17, 2025 by Bihan · labels: triaged
#2928 · What's the throughput of R1 671B using bs=1 without quant? · opened Mar 17, 2025 by ghostplant · labels: not a bug