
Conversation

@dongbo910220 (Contributor) commented Sep 15, 2025

Purpose

This pull request resolves #19881 by improving the HTTP semantics of the /health endpoint when the V1 engine dies unexpectedly.

Currently, when the EngineCore process is terminated (e.g., via kill -9), vLLM is able to reliably detect this condition thanks to the robust monitoring mechanism introduced in #21728. This detection correctly raises an EngineDeadError, which is then caught by a generic, high-level exception handler in launcher.py that returns a broad HTTP 500 Internal Server Error.

This PR introduces a more specific try...except block for EngineDeadError directly within the /health route handler. This change achieves two key objectives:

  1. Corrects the HTTP Semantics: It changes the response to a more appropriate HTTP 503 Service Unavailable. This accurately signals that the service is temporarily unable to handle requests due to an unavailable dependency (the engine), which is distinct from a 500 (an unexpected bug in the application code).
  2. Improves Production Observability: An explicit 503 response allows automated systems like Kubernetes and load balancers to make better decisions, such as gracefully routing traffic away from the unhealthy instance instead of treating it as a crashing application.
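
For illustration, here is a minimal sketch of the kind of route-level handling described above. It is not the actual api_server.py diff: the EngineDeadError class and engine_client helper below are self-contained stand-ins for the real vLLM names.

    from http import HTTPStatus

    from fastapi import FastAPI, Request, Response

    app = FastAPI()

    class EngineDeadError(Exception):
        """Stand-in for vLLM's EngineDeadError so the sketch runs on its own."""

    def engine_client(request: Request):
        # Stand-in for the api_server.py helper that returns the engine client
        # stored on the application state.
        return request.app.state.engine_client

    @app.get("/health")
    async def health(raw_request: Request) -> Response:
        """Return 200 while the engine answers, 503 once it is dead."""
        try:
            await engine_client(raw_request).check_health()
            return Response(status_code=HTTPStatus.OK.value)
        except EngineDeadError:
            # The engine process has died: 503 tells probes and load balancers
            # that the service is temporarily unavailable, not buggy.
            return Response(status_code=HTTPStatus.SERVICE_UNAVAILABLE.value)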

Test Plan

The functionality can be verified with the following end-to-end test using the multiprocessing backend:

  1. Start the vLLM server on this branch:

    python -m vllm.entrypoints.openai.api_server --model meta-llama/Llama-2-7b-hf --distributed-executor-backend mp
  2. Confirm the service is healthy:

    curl -v http://localhost:8000/health
    • Expected Output: The response should include < HTTP/1.1 200 OK.
  3. Find and terminate the EngineCore process:

    • In a new terminal, find the main server's PID: pgrep -f "vllm.entrypoints.openai.api_server"
    • Find the EngineCore child process PID (replace <SERVER_PID> with the result from the previous step): pgrep -P <SERVER_PID>
    • Forcefully kill the EngineCore process (replace <ENGINE_PID>):
      kill -9 <ENGINE_PID>
  4. Verify the new health check behavior:

    • Wait for the monitor thread to detect the failure.
    • Check the health endpoint again:
      curl -v http://localhost:8000/health
    • Expected Output: The response should now include < HTTP/1.1 503 Service Unavailable.
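
For convenience, steps 2-4 can also be scripted. The following is a rough Python sketch, not part of the PR: it assumes the server from step 1 is already running on port 8000, that pgrep is available, and that the first child reported for the server process is the EngineCore process.

    import subprocess
    import time
    import urllib.error
    import urllib.request

    def health_status(url: str = "http://localhost:8000/health") -> int:
        """Return the HTTP status code of the /health endpoint."""
        try:
            with urllib.request.urlopen(url) as resp:
                return resp.status
        except urllib.error.HTTPError as exc:
            return exc.code

    # Step 2: the service should be healthy before we touch anything.
    assert health_status() == 200

    # Step 3: find the api_server PID, then its EngineCore child, and kill -9 it.
    server_pid = subprocess.check_output(
        ["pgrep", "-f", "vllm.entrypoints.openai.api_server"], text=True).split()[0]
    engine_pid = subprocess.check_output(["pgrep", "-P", server_pid], text=True).split()[0]
    subprocess.run(["kill", "-9", engine_pid], check=True)

    # Step 4: give the monitor thread a moment to notice, then expect a 503.
    time.sleep(5)
    assert health_status() == 503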

Test Result

Before (on main branch):
When the EngineCore process is killed, a curl command to the /health endpoint returns HTTP 500 Internal Server Error. This is because the EngineDeadError is caught by a generic exception handler called runtime_exception_handler in launcher.py. While the server does not crash, the status code does not accurately reflect the "service unavailable" nature of the failure.

After (on this improve-health-check branch):
When the EngineCore process is killed, a curl command to the /health endpoint correctly returns HTTP 503 Service Unavailable. The server remains responsive and provides the correct semantic signal for monitoring systems before gracefully shutting down.

@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request correctly identifies the need to return an HTTP 503 status when the engine is dead, improving the semantics of the /health endpoint. My review focuses on making this behavior consistent across the entire API by proposing a more robust implementation using a global exception handler, which will also simplify the code and enhance maintainability.

Comment on lines +451 to +358

gemini-code-assist bot commented (severity: high):

While this try...except block correctly handles the EngineDeadError for the /health endpoint, this approach has a significant drawback: other endpoints that rely on the engine (e.g., /v1/chat/completions) will still raise EngineDeadError and fall back to a generic 500 error. This creates inconsistent API behavior for the same underlying issue.

A more robust and maintainable solution is to use a global FastAPI exception handler. This ensures all endpoints consistently return a 503 Service Unavailable response when the engine is dead, and it keeps the route handler logic clean.

You can achieve this by adding the following exception handler to the build_app function (e.g., after the RequestValidationError handler around line 1614):

    @app.exception_handler(EngineDeadError)
    async def engine_dead_exception_handler(request: Request, exc: EngineDeadError):
        # The /health and /ping endpoints expect a plain Response
        if request.url.path in ("/health", "/ping"):
            return Response(status_code=HTTPStatus.SERVICE_UNAVAILABLE.value)

        # Other endpoints expect an OpenAI-compatible error JSON
        err = ErrorResponse(
            error=ErrorInfo(message="The service is currently unavailable, please try again later. Reason: Engine is dead.",
                            type=HTTPStatus.SERVICE_UNAVAILABLE.phrase,
                            code=HTTPStatus.SERVICE_UNAVAILABLE.value))
        return JSONResponse(err.model_dump(),
                            status_code=HTTPStatus.SERVICE_UNAVAILABLE.value)

With this handler in place, the body of the health function can be reverted to its original, simpler implementation.

    await engine_client(raw_request).check_health()
    return Response(status_code=200)

@dongbo910220 (Contributor, Author) replied:

Thank you for the suggestion to use a global exception handler; this is a great point for ensuring API consistency.

I initially explored this approach. However, my investigation revealed that a pre-existing, higher-level try...except block in launcher.py wraps the entire application.

Due to the execution order, this generic handler would intercept the EngineDeadError before a newly registered, specific handler in api_server.py could act, effectively overriding it and still returning a 500.

To ensure the correct 503 is returned for the /health endpoint without altering the existing high-level exception logic, I chose to handle the EngineDeadError specifically within the route itself. This is the most direct and lowest-risk way to achieve the desired behavior and avoid issues with handler precedence.

I'm happy to discuss further if you think modifying the root handler in launcher.py is a better path forward!

@dongbo910220 (Contributor, Author) commented:

cc @robertgshaw2-redhat #19881

@lengrongfu (Contributor) commented:

My PR #24491 solves a similar problem; maybe we should discuss whether to keep both, or just one?

@dongbo910220 (Contributor, Author) commented:

Hi @lengrongfu, thank you for pointing this out! I've reviewed your PR #24491 and the related discussion in issue #24207.

This is a great observation. It seems we are working on improving the /health endpoint from two different, but complementary, angles.

I believe our changes can coexist perfectly. My PR ensures we report a known failure with the correct semantics, while your PR works on the more complex challenge of how to detect a silent failure.

I'm happy to collaborate to ensure our changes merge smoothly together. My current implementation should not conflict with your proposed minimal_generation() logic.

mergify bot commented Sep 17, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @dongbo910220.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify bot added the needs-rebase label Sep 17, 2025
dongbo910220 and others added 3 commits September 17, 2025 22:24
Signed-off-by: dongbo910220 <1275604947@qq.com>
Signed-off-by: dongbo910220 <1275604947@qq.com>
🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: dongbo910220 <1275604947@qq.com>
This makes the health check more precise by only returning 503 for
engine death scenarios rather than all exceptions.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: dongbo910220 <1275604947@qq.com>
@dongbo910220 (Contributor, Author) commented:

Hi @DarkLight1337, would you have a moment to review this PR when you get a chance?

It's a small fix for the /health endpoint's status code (resolves #19881). Thanks!

@DarkLight1337 (Member) left a comment:

Since @robertgshaw2-redhat seems busy, I'll just stamp this as it looks reasonable.

DarkLight1337 enabled auto-merge (squash) September 18, 2025 12:55
github-actions bot added the ready label (ONLY add when PR is ready to merge/full CI is needed) Sep 18, 2025
DarkLight1337 merged commit 67244c8 into vllm-project:main Sep 18, 2025
54 checks passed
debroy-rh pushed a commit to debroy-rh/vllm that referenced this pull request Sep 19, 2025
@cadedaniel (Collaborator) commented:

Thanks for the PR. Could we add a test for this PR? Otherwise, this behavior cannot be relied upon by downstream users as it can break in any commit.

FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025
charlifu pushed a commit to ROCm/vllm that referenced this pull request Sep 25, 2025
@dongbo910220 (Contributor, Author) commented:

Hi @cadedaniel, apologies for the late reply; I was on vacation.

Thank you for the suggestion to add a test for this behavior. I've just created a new pull request #26074 to add the corresponding test case. It mocks the EngineDeadError and verifies that the /health endpoint correctly returns a 503.

Would appreciate a review on the new PR when you have a chance. Thanks!
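
For reference, a self-contained sketch of this kind of test is shown below. It is purely illustrative and not the actual code in #26074: both the engine client and EngineDeadError are stubbed locally instead of being imported from vLLM.

    from http import HTTPStatus

    from fastapi import FastAPI, Request, Response
    from fastapi.testclient import TestClient

    class EngineDeadError(Exception):
        """Stand-in for vLLM's EngineDeadError so the test is self-contained."""

    class DeadEngineClient:
        async def check_health(self) -> None:
            raise EngineDeadError("EngineCore process is dead")

    def build_app() -> FastAPI:
        app = FastAPI()
        app.state.engine_client = DeadEngineClient()

        @app.get("/health")
        async def health(raw_request: Request) -> Response:
            try:
                await raw_request.app.state.engine_client.check_health()
                return Response(status_code=HTTPStatus.OK.value)
            except EngineDeadError:
                return Response(status_code=HTTPStatus.SERVICE_UNAVAILABLE.value)

        return app

    def test_health_returns_503_when_engine_is_dead():
        client = TestClient(build_app())
        assert client.get("/health").status_code == HTTPStatus.SERVICE_UNAVAILABLE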

Labels: frontend, ready (ONLY add when PR is ready to merge/full CI is needed)
Projects: None yet

Successfully merging this pull request may close these issues:
[Feature]: Implement check_health for V1

4 participants