Add VLLM_ENABLE_INDUCTOR_MAX_AUTOTUNE & VLLM_ENABLE_INDUCTOR_COORDINA… #25493
Conversation
…TE_DESCENT_TUNING in environment variable Signed-off-by: rouchenzi <ruochenwen@gmail.com>
Code Review
This pull request successfully exposes two Inductor tuning parameters, max_autotune and coordinate_descent_tuning, as environment variables. This is a valuable change that gives users more control over the trade-off between compilation time and potential runtime performance. The implementation is clean and follows existing patterns, and the PR description includes thorough test results. I have one suggestion to improve the robustness of how the new environment variables are parsed, to prevent potential runtime errors.
"VLLM_ENABLE_INDUCTOR_MAX_AUTOTUNE": | ||
lambda: bool(int(os.getenv("VLLM_ENABLE_INDUCTOR_MAX_AUTOTUNE", "1"))), | ||
# If set to 1, enable coordinate_descent_tuning; | ||
# By default, this is enabled (1) | ||
"VLLM_ENABLE_INDUCTOR_COORDINATE_DESCENT_TUNING": | ||
lambda: bool(int(os.getenv("VLLM_ENABLE_INDUCTOR_COORDINATE_DESCENT_TUNING", | ||
"1"))), |
The current implementation bool(int(os.getenv(...))) for parsing these boolean environment variables is not robust. It will raise a ValueError if a user sets the variable to a non-integer string like "true" or "false". To improve user experience and prevent runtime crashes from misconfiguration, it's better to adopt a more resilient parsing pattern that is also used elsewhere in this file.
"VLLM_ENABLE_INDUCTOR_MAX_AUTOTUNE": | |
lambda: bool(int(os.getenv("VLLM_ENABLE_INDUCTOR_MAX_AUTOTUNE", "1"))), | |
# If set to 1, enable coordinate_descent_tuning; | |
# By default, this is enabled (1) | |
"VLLM_ENABLE_INDUCTOR_COORDINATE_DESCENT_TUNING": | |
lambda: bool(int(os.getenv("VLLM_ENABLE_INDUCTOR_COORDINATE_DESCENT_TUNING", | |
"1"))), | |
"VLLM_ENABLE_INDUCTOR_MAX_AUTOTUNE": | |
lambda: os.getenv("VLLM_ENABLE_INDUCTOR_MAX_AUTOTUNE", "1").lower() in ("1", "true"), | |
# If set to 1, enable coordinate_descent_tuning; | |
# By default, this is enabled (1) | |
"VLLM_ENABLE_INDUCTOR_COORDINATE_DESCENT_TUNING": | |
lambda: (os.getenv("VLLM_ENABLE_INDUCTOR_COORDINATE_DESCENT_TUNING", | |
"1").lower() in ("1", "true")), |
Can these be compilation config instead? Seems like they are not temporary...
@simon-mo yeah, I was thinking about compilation config, but those vars currently only apply to the static inductor config, and there is an ongoing compilation overhaul of CompilationConfig: #20283. So I'm thinking it might be more straightforward to start with env vars to avoid conflicts; we could move them into the config incrementally once it's more stable.
I think this should start as an env variable because of the compilation config overhaul. Alternatively we could add a
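For context on the config-based alternative being discussed, here is a purely hypothetical sketch; the class and field names below are illustrative assumptions, not the actual vLLM CompilationConfig API, which is being reworked in #20283. The two dictionary keys are real torch._inductor.config option names.

```python
from dataclasses import dataclass

@dataclass
class InductorTuningConfig:
    # Hypothetical fields mirroring the two env vars in this PR.
    max_autotune: bool = True
    coordinate_descent_tuning: bool = True

    def to_inductor_overrides(self) -> dict:
        # Keys match torch._inductor.config option names.
        return {
            "max_autotune": self.max_autotune,
            "coordinate_descent_tuning": self.coordinate_descent_tuning,
        }

# Example: opting out of both tunings via config instead of env vars.
cfg = InductorTuningConfig(max_autotune=False, coordinate_descent_tuning=False)
print(cfg.to_inductor_overrides())
```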
This pull request has merge conflicts that must be resolved before it can be merged.
Signed-off-by: rouchenzi <40842833+rouchenzi@users.noreply.github.com>
vllm-project#25493) Signed-off-by: rouchenzi <ruochenwen@gmail.com> Signed-off-by: rouchenzi <40842833+rouchenzi@users.noreply.github.com>
#25493) Signed-off-by: rouchenzi <ruochenwen@gmail.com> Signed-off-by: rouchenzi <40842833+rouchenzi@users.noreply.github.com> Signed-off-by: yewentao256 <zhyanwentao@126.com>
Set max_autotune & coordinate_descent_tuning as env variables in the inductor config for static shape compilation.

Purpose
Those two inductor variables are hard-coded as True in the inductor config for static compile shapes (code ref), and are thus enabled by default whenever compile_sizes is passed from the vLLM engine args. For some model architectures, these two tunings may not be consistently useful and instead extend vLLM engine first start time. This change surfaces them as vLLM-level environment variables, giving users the flexibility to turn them off to reduce cold start, or keep them on if they improve performance.
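Conceptually, the change amounts to something like the sketch below (simplified, not the exact diff; the function name and its placement are assumptions, while the two option keys are real torch._inductor.config names):

```python
import os

def static_shape_inductor_overrides() -> dict:
    # For static-shape compilation (compile_sizes), only enable the expensive
    # Inductor tunings when the corresponding vLLM env vars are on.
    max_autotune = os.getenv("VLLM_ENABLE_INDUCTOR_MAX_AUTOTUNE", "1") == "1"
    coord_descent = os.getenv(
        "VLLM_ENABLE_INDUCTOR_COORDINATE_DESCENT_TUNING", "1") == "1"
    return {
        "max_autotune": max_autotune,
        "coordinate_descent_tuning": coord_descent,
    }

print(static_shape_inductor_overrides())
# Defaults: {'max_autotune': True, 'coordinate_descent_tuning': True}
```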
Added as env vars for the initial commit, also to avoid conflicts with the ongoing CompilationConfig overhaul PR: #20283.

Test Plan & Test Result
Sanity tested on a Qwen3 model. Performance may vary across models, but overall cold start with compile_sizes is reduced when these two envs are set to False.
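As a usage illustration (the model name, the compile_sizes values, and the dict form of compilation_config below are assumptions mirroring the tests in this description, not part of the PR itself):

```python
import os

# Set before vLLM reads its environment variables.
os.environ["VLLM_ENABLE_INDUCTOR_MAX_AUTOTUNE"] = "0"
os.environ["VLLM_ENABLE_INDUCTOR_COORDINATE_DESCENT_TUNING"] = "0"

from vllm import LLM

# compile_sizes triggers static-shape compilation, which is where the
# env vars above take effect.
llm = LLM(
    model="Qwen/Qwen3-8B",
    compilation_config={"compile_sizes": [1, 2, 4, 8, 16, 24]},
)
print(llm.generate("Hello")[0].outputs[0].text)
```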
vLLM engine first start time (w/o cache)
[TEST1] w/o compile_sizes
vLLM log
[TEST2] w/ compile_sizes=[1,2,4,8,16,24]
This by default has max_autotune=True & coordinate_descent_tuning=True
vLLM log
[TEST3] w/ compile_sizes=[1,2,4,8,16,24] & VLLM_ENABLE_INDUCTOR_MAX_AUTOTUNE=0
This by default has coordinate_descent_tuning=True
vLLM log
[TEST4] w/ compile_sizes=[1,2,4,8,16,24] & VLLM_ENABLE_INDUCTOR_COORDINATE_DESCENT_TUNING=0 & VLLM_ENABLE_INDUCTOR_MAX_AUTOTUNE=0
vLLM log
vLLM performance
command
The shapes [1,2,4,8,16,24] can be seen among the running batch sizes in one of the example logs.
[TEST1] w/o compile_sizes
[TEST2] w/ compile_sizes=[1,2,4,8,16,24]
This by default has max_autotune=True & coordinate_descent_tuning=True
[TEST3] w/ compile_sizes=[1,2,4,8,16,24] & VLLM_ENABLE_INDUCTOR_MAX_AUTOTUNE=0
This by default has coordinate_descent_tuning=True
[TEST4] w/ compile_sizes=[1,2,4,8,16,24] & VLLM_ENABLE_INDUCTOR_COORDINATE_DESCENT_TUNING=0 & VLLM_ENABLE_INDUCTOR_MAX_AUTOTUNE=0