-
-
Notifications
You must be signed in to change notification settings - Fork 11.6k
[Misc] Feature flag to hide verbose prompt logging to debug level #26708
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Aleksei Tsvetkov <aitsvet@ya.ru>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a feature flag VLLM_DEBUG_LOG_API_SERVER_REQUEST_PROMPT to move verbose prompt logging to the DEBUG level, reducing log noise. The implementation is correct, but there is significant code duplication in vllm/entrypoints/logger.py which poses a maintainability risk. I've suggested a refactoring to address this by using dictionary-based string formatting, which eliminates redundancy and improves clarity.
| if not envs.VLLM_DEBUG_LOG_API_SERVER_REQUEST_PROMPT: | ||
| # Original logging behavior | ||
| logger.info( | ||
| "Received request %s: prompt: %r, " | ||
| "params: %s, prompt_token_ids: %s, " | ||
| "prompt_embeds shape: %s, " | ||
| "lora_request: %s.", | ||
| request_id, | ||
| prompt, | ||
| params, | ||
| prompt_token_ids, | ||
| prompt_embeds.shape if prompt_embeds is not None else None, | ||
| lora_request, | ||
| ) | ||
| return | ||
|
|
||
| # Split logging: basic info at INFO level, prompt details at DEBUG level | ||
| logger.info( | ||
| "Received request %s: prompt: %r, " | ||
| "params: %s, prompt_token_ids: %s, " | ||
| "prompt_embeds shape: %s, " | ||
| "lora_request: %s.", | ||
| "Received request %s: params: %s, lora_request: %s.", | ||
| request_id, | ||
| prompt, | ||
| params, | ||
| lora_request, | ||
| ) | ||
| logger.debug( | ||
| "Request %s prompt details: prompt: %r, prompt_token_ids: %s, " | ||
| "prompt_embeds shape: %s", | ||
| request_id, | ||
| prompt, | ||
| prompt_token_ids, | ||
| prompt_embeds.shape if prompt_embeds is not None else None, | ||
| lora_request, | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current implementation duplicates the logging logic. The original logger.info call is copied into the if not envs.VLLM_DEBUG_LOG_API_SERVER_REQUEST_PROMPT: block. This code duplication can lead to maintenance issues, as future changes to the log message might not be applied in both places.
To improve maintainability and avoid redundancy, I suggest refactoring this to use dictionary-based string formatting. This approach centralizes the log data and constructs the log messages conditionally, making the code cleaner and easier to maintain. I've also slightly modified prompt_embeds shape to prompt_embeds_shape in the log message to make it a valid identifier for dictionary-based formatting.
log_data = {
"request_id": request_id,
"prompt": prompt,
"params": params,
"prompt_token_ids": prompt_token_ids,
"prompt_embeds_shape": prompt_embeds.shape if prompt_embeds is not None else None,
"lora_request": lora_request,
}
if not envs.VLLM_DEBUG_LOG_API_SERVER_REQUEST_PROMPT:
# Original logging behavior
logger.info(
"Received request %(request_id)s: prompt: %(prompt)r, "
"params: %(params)s, prompt_token_ids: %(prompt_token_ids)s, "
"prompt_embeds_shape: %(prompt_embeds_shape)s, "
"lora_request: %(lora_request)s.",
log_data,
)
return
# Split logging: basic info at INFO level, prompt details at DEBUG level
logger.info(
"Received request %(request_id)s: params: %(params)s, lora_request: %(lora_request)s.",
log_data,
)
logger.debug(
"Request %(request_id)s prompt details: prompt: %(prompt)r, "
"prompt_token_ids: %(prompt_token_ids)s, "
"prompt_embeds_shape: %(prompt_embeds_shape)s",
log_data,
)|
I think we should avoid a proliferation of magic environment variables like this Unconditionally changing the default behaviour seems reasonable to me - basically, reducing the (Note for reviewers - this already requires |
|
gonna just split log lines and change levels as suggested |
|
No need to close this and file a new PR. Keeping it in this PR actually helps maintain the context of the discussion |
Purpose
Introduces a feature flag
VLLM_DEBUG_LOG_API_SERVER_REQUEST_PROMPTto control the verbosity of API server request logging. When enabled, detailed prompt information (prompt, prompt_token_ids, prompt_embeds) is logged atDEBUGlevel instead ofINFOlevel, reducing noise in standardINFOlogs.Test Plan
VLLM_DEBUG_LOG_API_SERVER_REQUEST_PROMPTset (or set toFalse).INFOlevel.VLLM_DEBUG_LOG_API_SERVER_REQUEST_PROMPT=TrueandVLLM_LOG_LEVEL=debug.INFOlevel, and the detailed prompt information should only appear atDEBUGlevel.Test Result
False):INFOlogs contain full prompt details.True):INFOlogs contain only basic request info; full prompt details are moved toDEBUGlogs. This behavior was verified locally.Essential Elements of an Effective PR Description Checklist
supported_models.mdandexamplesfor a new model.