[nvbug 5304752][fix] enhance _check_arguments to filter illegal requests for pytorch backend #5541
base: release/0.21
Conversation
/bot run
PR_Github #10126 [ run ] triggered by Bot
PR_Github #10126 [ run ] completed with state
Force-pushed from d0023fa to 7635b98
/bot run
PR_Github #10147 [ run ] triggered by Bot
LGTM
PR_Github #10147 [ run ] completed with state
@@ -529,8 +529,13 @@ def _check_arguments(self, prompt_len: int, query_len: int,
                raise ValueError(
                    f"PyTorch backend currently only supports `logprobs=1`. Received `logprobs={sampling_params.logprobs}` (Top{sampling_params.logprobs} logprobs). Please set `logprobs=1` in `sampling_params` instead."
                )
            return
        elif self.args.backend == "_autodeploy":
Is this line expected to be removed?
I think this line will not be touched: if backend == "_autodeploy", execution will fall into the if statement at line 521, and it will either return or raise an error.
@lucaslie Can you help confirm if the changes are ok? Thanks.
Yes, that looks good. It's actually already removed on the main branch:
TensorRT-LLM/tensorrt_llm/llmapi/llm.py, lines 522 to 533 in 2b0c87e:
if self.args.backend in ["pytorch", "_autodeploy"]:
    # TODO: remove these checks after PyTorch backend
    # fully support TopK prompt and generation logprobs.
    if sampling_params.prompt_logprobs:
        raise ValueError(
            f"`prompt_logprobs` in sampling_params is not supported in the PyTorch backend yet. Received `prompt_logprobs={sampling_params.prompt_logprobs}`. Please unset this field."
        )
    if sampling_params.logprobs and sampling_params.logprobs > 1:
        raise ValueError(
            f"PyTorch backend currently only supports `logprobs=1`. Received `logprobs={sampling_params.logprobs}` (Top{sampling_params.logprobs} logprobs). Please set `logprobs=1` in `sampling_params` instead."
        )
    return
if not self.args.enable_chunked_prefill:
    max_num_tokens = self.args.max_num_tokens
    if max_num_tokens and prompt_len / self.args.parallel_config.cp_size + query_len > max_num_tokens:
        raise ValueError(
Ideally we should just return an error for the request instead of stopping the system?
This will be caught by OpenaiServer, which will return an error response to the client; the service stays alive.
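For illustration, a minimal sketch of that pattern, assuming a FastAPI-style handler (this is not the actual OpenaiServer code; the route and helper names are made up): a ValueError raised during argument checking is caught per request and converted into an error response, so the serving process keeps running.

from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

app = FastAPI()

def run_generation(prompt: str) -> dict:
    # Placeholder for the call into the LLM; _check_arguments may raise
    # ValueError for an illegal request (e.g. prompt longer than max_num_tokens).
    raise ValueError("prompt length exceeds max_num_tokens")

@app.post("/v1/completions")
async def create_completion(request: Request):
    body = await request.json()
    try:
        return run_generation(body.get("prompt", ""))
    except ValueError as e:
        # Only this request fails; the server stays alive for other requests.
        return JSONResponse(status_code=400, content={"error": str(e)})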
auto_deploy may not support the same features, so it's better to keep the skipping logic here.
That would be easy; another backend check before raising this error is enough. Thanks, will do.
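A rough sketch of that change, continuing the snippet quoted above; the exact backend condition and error message are assumptions, not the merged code:

if not self.args.enable_chunked_prefill and self.args.backend != "_autodeploy":
    max_num_tokens = self.args.max_num_tokens
    if max_num_tokens and prompt_len / self.args.parallel_config.cp_size + query_len > max_num_tokens:
        # Reject only this request; the caller turns the ValueError into an error response.
        raise ValueError(
            f"The prompt length ({prompt_len}) plus query length ({query_len}) "
            f"exceeds max_num_tokens ({max_num_tokens}).")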
build_config = BuildConfig()
build_config.max_num_tokens = 64

llm = LLM(
    model=llama_model_path,
    build_config=build_config,
    tensor_parallel_size=tp_size,
We are going to remove the backend argument, so it is better to use `from tensorrt_llm import LLM as TorchLLM` and `from tensorrt_llm._tensorrt_engine import LLM as TrtLLM`, and test with both.
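A possible shape for such a test, assuming pytest parametrization; llama_model_path and tp_size stand in for the real fixtures:

import pytest
from tensorrt_llm import LLM as TorchLLM
from tensorrt_llm._tensorrt_engine import LLM as TrtLLM

llama_model_path = "/path/to/llama"  # placeholder for the real model fixture
tp_size = 1                          # placeholder

@pytest.mark.parametrize("llm_cls", [TorchLLM, TrtLLM])
def test_llm_capture_request_error(llm_cls):
    # Same body as the existing test, exercised for both LLM entry points.
    llm = llm_cls(model=llama_model_path, tensor_parallel_size=tp_size)
    ...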
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
Force-pushed from 7635b98 to a036283
/bot run
PR_Github #10584 [ run ] triggered by Bot
PR_Github #10584 [ run ] completed with state
Description
This PR adds a check against the prompt length, to prevent a single illegal request from killing the PyTorch executor.
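To illustrate the intended behavior, a minimal sketch adapted from the test snippet quoted above (the model path, token limit, and the exception type caught are placeholders): an over-long prompt is rejected as a single bad request while the LLM instance stays usable.

from tensorrt_llm import LLM, BuildConfig

build_config = BuildConfig()
build_config.max_num_tokens = 64

llm = LLM(model="/path/to/llama", build_config=build_config)

try:
    llm.generate("word " * 1000)  # prompt longer than max_num_tokens
except Exception as e:  # surfaced as a request-level error, not an executor crash
    print(f"request rejected: {e}")

# The same instance keeps serving valid requests.
print(llm.generate("Hello").outputs[0].text)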
Test Coverage
Extend test_llm_capture_request_error to also cover the PyTorch backend.
GitHub Bot Help
/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...
Provide a user-friendly way for developers to interact with a Jenkins server.
Run /bot [-h|--help] to print this help message. See details below for each supported subcommand.
run [--disable-fail-fast --skip-test --stage-list "A10-1, xxx" --gpu-type "A30, H100_PCIe" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-[Post-Merge]-1, xxx"]
Launch build/test pipelines. All previously running jobs will be killed.
--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.
--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.
--stage-list "A10-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-1, xxx". Note: Does NOT update GitHub check status.
--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.
--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.
--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.
--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests. Will also run L0 pre-merge pipeline.
--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.
--extra-stage "H100_PCIe-[Post-Merge]-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-[Post-Merge]-1, xxx".
For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md.
kill
kill
Kill all running builds associated with pull request.
skip
skip --comment COMMENT
Skip testing for latest commit on pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.
reuse-pipeline
reuse-pipeline
Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.