[TRTLLM-5530][BREAKING CHANGE] refactor: unify KvCacheConfig in LLM class for pytorch backend #5752

Superjomn · 2025-07-04T07:16:09Z

PR title

Please write the PR title by following template:

[JIRA ticket link/nvbug link/github issue link][fix/feat/doc/infra/...] <summary of this PR>

For example, a PR that adds a new cache manager feature tracked by Jira ticket TRTLLM-1000 would be titled:

[TRTLLM-1000][feat] Support a new feature about cache manager

Description

Please explain the issue and the solution in short.

Test Coverage

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provide a user-friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

run [--disable-fail-fast --skip-test --stage-list "A10-1, xxx" --gpu-type "A30, H100_PCIe" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-[Post-Merge]-1, xxx"]

Launch build/test pipelines. All previously running jobs will be killed.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests. Will also run L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-[Post-Merge]-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-[Post-Merge]-1, xxx".

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md.

kill

kill

Kill all running builds associated with pull request.

skip

skip --comment COMMENT

Skip testing for latest commit on pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

Superjomn · 2025-07-04T09:30:31Z

/bot run --disable-fail-fast

tensorrt-cicd · 2025-07-04T09:36:04Z

PR_Github #10985 [ run ] triggered by Bot

tensorrt-cicd · 2025-07-04T10:41:50Z

PR_Github #10985 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #8114 completed with status: 'FAILURE'

Superjomn · 2025-07-06T00:06:26Z

/bot run

tensorrt-cicd · 2025-07-06T00:12:13Z

PR_Github #11041 [ run ] triggered by Bot

tensorrt-cicd · 2025-07-06T01:18:50Z

PR_Github #11041 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #8162 completed with status: 'FAILURE'

Superjomn · 2025-07-06T07:05:48Z

/bot run --disable-fail-fast

Superjomn · 2025-07-07T02:50:00Z

/bot run --disable-fail-fast

tensorrt-cicd · 2025-07-07T02:55:05Z

PR_Github #11089 [ run ] triggered by Bot

Superjomn · 2025-07-07T02:59:10Z

/bot kill

tensorrt-cicd · 2025-07-07T03:04:04Z

PR_Github #11091 [ kill ] triggered by Bot

tensorrt-cicd · 2025-07-07T03:04:05Z

PR_Github #11089 [ run ] completed with state ABORTED

tensorrt-cicd · 2025-07-07T03:04:36Z

PR_Github #11091 [ kill ] completed with state SUCCESS
Successfully killed previous jobs for commit f9c7310

Superjomn · 2025-07-07T06:37:38Z

/bot run --disable-fail-fast

tensorrt-cicd · 2025-07-07T06:42:39Z

PR_Github #11112 [ run ] triggered by Bot

tensorrt-cicd · 2025-07-07T09:49:46Z

PR_Github #11112 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #8218 completed with status: 'FAILURE'

Superjomn · 2025-07-07T14:14:04Z

/bot run --disable-fail-fast

tensorrt-cicd · 2025-07-07T14:19:15Z

PR_Github #11158 [ run ] triggered by Bot

tensorrt-cicd · 2025-07-07T17:35:24Z

PR_Github #11158 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #8251 completed with status: 'FAILURE'

FrankD412 · 2025-07-07T19:20:30Z

tensorrt_llm/bench/benchmark/utils/general.py

@@ -89,6 +89,9 @@ def get_settings(params: dict, dataset_metadata: DatasetMetadata, model: str,
    if extra_llm_api_options:
        with open(extra_llm_api_options, 'r') as f:
            llm_args_dict = yaml.safe_load(f)
+            if "kv_cache_config" in llm_args_dict:
+                kv_cache_dtype = llm_args_dict["kv_cache_config"].get(
+                    "dtype", "auto")

        if "kv_cache_dtype" in llm_args_dict:


Shouldn't we just augment the retrieval of the dtype in this conditional?

Yes. "kv_cache_dtype" has been merged into KvCacheConfig (a pure-Python config) as its "dtype" field, so this code now reads it from "kv_cache_config".
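To make the lookup discussed in this thread concrete, here is a minimal sketch of reading the dtype from the nested "kv_cache_config" section of an already-parsed options dict. The helper name resolve_kv_cache_dtype is hypothetical and not part of the PR, and treating a missing or empty section as "auto" is an assumption based on the default shown in the diff above.

```python
def resolve_kv_cache_dtype(llm_args_dict: dict, default: str = "auto") -> str:
    """Return the KV cache dtype from a parsed extra-LLM-API-options dict.

    Hypothetical helper for illustration only; it mirrors the nested lookup
    shown in the diff, where the dtype lives under "kv_cache_config".
    """
    # A YAML file may omit the section or leave it empty (parsed as None),
    # so fall back to an empty dict before reading "dtype".
    kv_cache_config = llm_args_dict.get("kv_cache_config") or {}
    return kv_cache_config.get("dtype", default)


# Example: options as they might look after yaml.safe_load (assumed contents).
options = {"kv_cache_config": {"dtype": "fp8"}}
print(resolve_kv_cache_dtype(options))
```

Keeping the lookup in one small function would let callers read the dtype the same way regardless of whether the section is present in the options file.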

Superjomn · 2025-07-08T07:20:36Z

/bot run

tensorrt-cicd · 2025-07-08T07:26:16Z

PR_Github #11247 [ run ] triggered by Bot

tensorrt-cicd · 2025-07-08T09:06:24Z

PR_Github #11247 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #8318 completed with status: 'FAILURE'

Superjomn · 2025-07-08T10:19:56Z

/bot run

tensorrt-cicd · 2025-07-08T10:24:55Z

PR_Github #11281 [ run ] triggered by Bot

tensorrt-cicd · 2025-07-08T11:29:04Z

PR_Github #11281 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #8344 completed with status: 'FAILURE'

Superjomn · 2025-07-08T13:43:11Z

/bot run

tensorrt-cicd · 2025-07-08T13:48:23Z

PR_Github #11309 [ run ] triggered by Bot

tensorrt-cicd · 2025-07-08T16:04:32Z

PR_Github #11309 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #8363 completed with status: 'FAILURE'

Superjomn · 2025-07-09T00:32:55Z

/bot run

tensorrt-cicd · 2025-07-09T00:39:12Z

PR_Github #11358 [ run ] triggered by Bot

tensorrt-cicd · 2025-07-09T02:22:39Z

PR_Github #11358 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #8406 completed with status: 'FAILURE'

Superjomn · 2025-07-09T05:36:58Z

/bot run --stage-list "A30-PyTorch-2"

Superjomn · 2025-07-09T05:51:54Z

/bot run --stage-list "A30-PyTorch-2"

tensorrt-cicd · 2025-07-09T05:56:57Z

PR_Github #11400 [ run ] triggered by Bot

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

Superjomn · 2025-07-09T07:14:41Z

/bot run

tensorrt-cicd · 2025-07-09T07:19:45Z

PR_Github #11406 [ run ] triggered by Bot

tensorrt-cicd · 2025-07-09T07:19:47Z

PR_Github #11400 [ run ] completed with state ABORTED

tensorrt-cicd · 2025-07-09T11:38:28Z

PR_Github #11406 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #8437 completed with status: 'FAILURE'

Superjomn · 2025-07-09T14:07:43Z

/bot run --disable-fail-fast

tensorrt-cicd · 2025-07-09T14:12:53Z

PR_Github #11442 [ run ] triggered by Bot

Superjomn requested a review from a team as a code owner July 4, 2025 07:16

Superjomn changed the title ~~[BREAKING CHANGE]: unify KvCacheConfig in LLM class for pytorch backend~~ [TRTLLM-5530][BREAKING CHANGE] refactor: unify KvCacheConfig in LLM class for pytorch backend Jul 4, 2025

Superjomn force-pushed the change-kvcache-config branch from 5df018b to d0f6cb0 Compare July 6, 2025 00:06

Superjomn force-pushed the change-kvcache-config branch 2 times, most recently from 6abe190 to bf0839b Compare July 6, 2025 07:05

Superjomn force-pushed the change-kvcache-config branch from bf0839b to f9c7310 Compare July 7, 2025 02:59

Superjomn requested review from QiJune and lucaslie July 7, 2025 02:59

Superjomn force-pushed the change-kvcache-config branch from f9c7310 to 67a8191 Compare July 7, 2025 14:12

FrankD412 requested changes Jul 7, 2025

Superjomn force-pushed the change-kvcache-config branch from 67a8191 to aff5c11 Compare July 8, 2025 07:20

Superjomn force-pushed the change-kvcache-config branch from aff5c11 to 57dcea6 Compare July 8, 2025 10:09

Superjomn force-pushed the change-kvcache-config branch from 57dcea6 to fefcca4 Compare July 8, 2025 13:42

Superjomn force-pushed the change-kvcache-config branch 2 times, most recently from ff511c3 to 712c60b Compare July 9, 2025 00:28

Superjomn force-pushed the change-kvcache-config branch from 712c60b to d37c6f9 Compare July 9, 2025 05:35

Superjomn force-pushed the change-kvcache-config branch from d37c6f9 to 4c7f2e6 Compare July 9, 2025 05:51

Superjomn requested a review from FrankD412 July 9, 2025 06:18

init

5634f17

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

Superjomn force-pushed the change-kvcache-config branch from 4c7f2e6 to 5634f17 Compare July 9, 2025 07:14
