Add health check, make async Engine more robust #3015

Merged: 8 commits into vllm-project:main on Mar 4, 2024

Conversation

@Yard1 (Collaborator) commented on Feb 23, 2024:

For production use cases, we want to be able to detect Engine failures, especially ones that can happen silently (e.g. due to NCCL timeouts). This PR adds a health check method (currently only checking the health of Ray workers) and makes the async engine more robust by adding a timeout for each iteration, as well as better error reporting.
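For orientation, here is a minimal sketch of the two mechanisms described above: a per-iteration timeout around the engine step, and a health check that surfaces a dead background loop. All names (`AsyncEngineSketch`, `ENGINE_ITERATION_TIMEOUT_S`, `step_async`, `check_health`) are illustrative assumptions, not the actual vLLM implementation.

```python
import asyncio
from typing import Awaitable, Callable, Optional

# Assumed per-iteration timeout; the real constant name and value may differ.
ENGINE_ITERATION_TIMEOUT_S = 60.0


class AsyncEngineDeadError(RuntimeError):
    """Raised when the background engine loop is no longer running."""


class AsyncEngineSketch:
    """Toy model of the robustness pattern, not the vLLM AsyncLLMEngine API."""

    def __init__(self, step_async: Callable[[], Awaitable[None]]):
        self._step_async = step_async
        self._background_task: Optional[asyncio.Task] = None
        self._errored_with: Optional[BaseException] = None

    def start_background_loop(self) -> None:
        # Must be called from inside a running event loop.
        self._background_task = asyncio.create_task(self._run_loop())

    async def _run_loop(self) -> None:
        try:
            while True:
                # Bound each iteration so silent hangs (e.g. NCCL timeouts)
                # turn into visible errors instead of a stuck loop.
                await asyncio.wait_for(self._step_async(),
                                       timeout=ENGINE_ITERATION_TIMEOUT_S)
        except BaseException as exc:
            # Remember the failure so later calls report it instead of hanging.
            self._errored_with = exc
            raise

    async def check_health(self) -> None:
        """Raise if the engine has died; a real check would also ping the Ray workers."""
        if self._errored_with is not None:
            raise AsyncEngineDeadError("Background loop errored.") from self._errored_with
        if self._background_task is None or self._background_task.done():
            raise AsyncEngineDeadError("Background loop is not running.")
```

A server could expose `check_health()` through a liveness endpoint so an orchestrator can restart a wedged engine instead of waiting indefinitely.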

@zhuohan123 self-assigned this on Mar 2, 2024
@zhuohan123 (Collaborator) left a comment:

Thanks for the contribution! In general LGTM. Left some small questions.

(Outdated review threads on vllm/engine/llm_engine.py and vllm/engine/async_llm_engine.py, both resolved.)
Comment on lines 42 to 44:

    finally:
        if exception:
            error_callback(exception)

Collaborator: We raise errors in both the try branch and the except branch. Then what does the finally here do?

Yard1 (author): We want to run the error callback even after we re-raise an exception in the except block.

@njhill (Collaborator): I think you could just do this in the except block though, before re-raising (things will still run in the same order).
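For illustration, a minimal, self-contained sketch of the two equivalent orderings discussed in this thread (`step` and `error_callback` are placeholder names):

```python
# Pattern from the snippet above: remember the exception, invoke the callback in `finally`.
def run_with_finally(step, error_callback):
    exception = None
    try:
        step()
    except Exception as exc:
        exception = exc
        raise
    finally:
        # Runs after the `raise` statement but before the exception leaves this function.
        if exception:
            error_callback(exception)


# Reviewer's alternative: call the callback in `except` before re-raising.
# The callback still fires before the exception propagates, so the observable order is the same.
def run_with_except(step, error_callback):
    try:
        step()
    except Exception as exc:
        error_callback(exc)
        raise
```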

Comment on lines 174 to 178:

    async def wait_for_new_requests(self, clear: bool):
        if not self.has_new_requests():
            await self.new_requests_event.wait()
        if clear:
            self.new_requests_event.clear()

Collaborator: Why don't we always clear this flag?

Suggested change (drop the clear parameter and always clear the event):

    async def wait_for_new_requests(self):
        if not self.has_new_requests():
            await self.new_requests_event.wait()
        self.new_requests_event.clear()

Collaborator: Also, what's the reason behind this change? Why do we need to move the clear call from get_new_and_finished_requests to here?

Yard1 (author): Yes, we can always clear it. The reason for the change is to ensure the event is cleared as soon as we have new requests.
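As a standalone illustration of the pattern being discussed, here is a toy request tracker built around `asyncio.Event`. Only `has_new_requests`, `new_requests_event`, and `wait_for_new_requests` appear in the diff above; the class name and the `add_request`/`get_new_requests` helpers are assumptions.

```python
import asyncio
from typing import Any, List


class RequestTrackerSketch:
    """Toy model of the new-requests event handling, not the vLLM class."""

    def __init__(self) -> None:
        self._new_requests: List[Any] = []
        self.new_requests_event = asyncio.Event()

    def add_request(self, request: Any) -> None:
        self._new_requests.append(request)
        # Wake up the engine loop blocked in wait_for_new_requests().
        self.new_requests_event.set()

    def has_new_requests(self) -> bool:
        return bool(self._new_requests)

    async def wait_for_new_requests(self) -> None:
        if not self.has_new_requests():
            await self.new_requests_event.wait()
        # Clearing here (right after new requests arrive) means the next call
        # blocks until another request is added.
        self.new_requests_event.clear()

    def get_new_requests(self) -> List[Any]:
        new, self._new_requests = self._new_requests, []
        return new
```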

Yard1 and others added 4 commits on March 4, 2024 at 11:00 (co-authored by Zhuohan Li <zhuohan123@gmail.com>).
@njhill (Collaborator) left a comment:

Thanks @Yard1, this looks great.

(Outdated review thread on vllm/engine/async_llm_engine.py, resolved.)

Comment on lines +175 to +177:

    if not self.has_new_requests():
        await self.new_requests_event.wait()
    self.new_requests_event.clear()

@njhill (Collaborator): Suggestion to only clear before waiting.

Suggested change:

    if not self.has_new_requests():
        self.new_requests_event.clear()
        if not self.has_new_requests():
            await self.new_requests_event.wait()

Yard1 (author): Hmm, can you explain why we should do it like that?

@njhill (Collaborator): Just to avoid flip-flopping the event; it only needs to be cleared when you're actually about to wait on it. But I guess with Python/asyncio it doesn't matter anyway.

Yard1 (author): Yeah, I think it should be fine.

@Yard1 enabled auto-merge (squash) on March 4, 2024 at 21:44.
@Yard1 merged commit ff578ca into vllm-project:main on Mar 4, 2024; 22 checks passed.
dtransposed pushed a commit to afeldman-nm/vllm that referenced this pull request on Mar 26, 2024 (co-authored by Zhuohan Li <zhuohan123@gmail.com>).