
[Bugfix] Add validation for seed #4529

Merged 1 commit into vllm-project:main on May 1, 2024

Conversation

@sasha0552 sasha0552 (Contributor) commented May 1, 2024

The following code causes a vLLM crash:

from openai import OpenAI

api_key = "token-abc123"
base_url = "http://localhost:8000/v1"
content = "test " * 20
model = "mistralai/Mistral-7B-Instruct-v0.2"

client = OpenAI(
    api_key=api_key,
    base_url=base_url,
)

client.chat.completions.create(
    seed=2**64,  # out of the signed 64-bit range accepted by torch.Generator.manual_seed
    max_tokens=128,
    messages=[
        {"role": "user", "content": content},
    ],
    model=model,
)
Exception
ERROR 05-01 13:52:05 worker_base.py:157] Error executing method execute_model. This might cause deadlock in distributed execution.
ERROR 05-01 13:52:05 worker_base.py:157] Traceback (most recent call last):
ERROR 05-01 13:52:05 worker_base.py:157]   File "/mnt/ml/vllm/vllm/worker/worker_base.py", line 149, in execute_method
ERROR 05-01 13:52:05 worker_base.py:157]     return executor(*args, **kwargs)
ERROR 05-01 13:52:05 worker_base.py:157]            ^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 05-01 13:52:05 worker_base.py:157]   File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
ERROR 05-01 13:52:05 worker_base.py:157]     return func(*args, **kwargs)
ERROR 05-01 13:52:05 worker_base.py:157]            ^^^^^^^^^^^^^^^^^^^^^
ERROR 05-01 13:52:05 worker_base.py:157]   File "/mnt/ml/vllm/vllm/worker/worker.py", line 254, in execute_model
ERROR 05-01 13:52:05 worker_base.py:157]     output = self.model_runner.execute_model(seq_group_metadata_list,
ERROR 05-01 13:52:05 worker_base.py:157]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 05-01 13:52:05 worker_base.py:157]   File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
ERROR 05-01 13:52:05 worker_base.py:157]     return func(*args, **kwargs)
ERROR 05-01 13:52:05 worker_base.py:157]            ^^^^^^^^^^^^^^^^^^^^^
ERROR 05-01 13:52:05 worker_base.py:157]   File "/mnt/ml/vllm/vllm/worker/model_runner.py", line 828, in execute_model
ERROR 05-01 13:52:05 worker_base.py:157]     ) = self.prepare_input_tensors(seq_group_metadata_list)
ERROR 05-01 13:52:05 worker_base.py:157]         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 05-01 13:52:05 worker_base.py:157]   File "/mnt/ml/vllm/vllm/worker/model_runner.py", line 688, in prepare_input_tensors
ERROR 05-01 13:52:05 worker_base.py:157]     sampling_metadata = self._prepare_sample(seq_group_metadata_list,
ERROR 05-01 13:52:05 worker_base.py:157]                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 05-01 13:52:05 worker_base.py:157]   File "/mnt/ml/vllm/vllm/worker/model_runner.py", line 598, in _prepare_sample
ERROR 05-01 13:52:05 worker_base.py:157]     device=self.device).manual_seed(sampling_params.seed)
ERROR 05-01 13:52:05 worker_base.py:157]                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 05-01 13:52:05 worker_base.py:157] RuntimeError: Overflow when unpacking long
ERROR 05-01 13:52:05 async_llm_engine.py:43] Engine background task failed
ERROR 05-01 13:52:05 async_llm_engine.py:43] Traceback (most recent call last):
ERROR 05-01 13:52:05 async_llm_engine.py:43]   File "/mnt/ml/vllm/vllm/engine/async_llm_engine.py", line 38, in _raise_exception_on_finish
ERROR 05-01 13:52:05 async_llm_engine.py:43]     task.result()
ERROR 05-01 13:52:05 async_llm_engine.py:43]   File "/mnt/ml/vllm/vllm/engine/async_llm_engine.py", line 496, in run_engine_loop
ERROR 05-01 13:52:05 async_llm_engine.py:43]     has_requests_in_progress = await asyncio.wait_for(
ERROR 05-01 13:52:05 async_llm_engine.py:43]                                ^^^^^^^^^^^^^^^^^^^^^^^
ERROR 05-01 13:52:05 async_llm_engine.py:43]   File "/usr/lib/python3.11/asyncio/tasks.py", line 489, in wait_for
ERROR 05-01 13:52:05 async_llm_engine.py:43]     return fut.result()
ERROR 05-01 13:52:05 async_llm_engine.py:43]            ^^^^^^^^^^^^
ERROR 05-01 13:52:05 async_llm_engine.py:43]   File "/mnt/ml/vllm/vllm/engine/async_llm_engine.py", line 470, in engine_step
ERROR 05-01 13:52:05 async_llm_engine.py:43]     request_outputs = await self.engine.step_async()
ERROR 05-01 13:52:05 async_llm_engine.py:43]                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 05-01 13:52:05 async_llm_engine.py:43]   File "/mnt/ml/vllm/vllm/engine/async_llm_engine.py", line 213, in step_async
ERROR 05-01 13:52:05 async_llm_engine.py:43]     output = await self.model_executor.execute_model_async(
ERROR 05-01 13:52:05 async_llm_engine.py:43]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 05-01 13:52:05 async_llm_engine.py:43]   File "/mnt/ml/vllm/vllm/executor/ray_gpu_executor.py", line 418, in execute_model_async
ERROR 05-01 13:52:05 async_llm_engine.py:43]     all_outputs = await self._run_workers_async(
ERROR 05-01 13:52:05 async_llm_engine.py:43]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 05-01 13:52:05 async_llm_engine.py:43]   File "/mnt/ml/vllm/vllm/executor/ray_gpu_executor.py", line 408, in _run_workers_async
ERROR 05-01 13:52:05 async_llm_engine.py:43]     all_outputs = await asyncio.gather(*coros)
ERROR 05-01 13:52:05 async_llm_engine.py:43]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 05-01 13:52:05 async_llm_engine.py:43]   File "/usr/lib/python3.11/concurrent/futures/thread.py", line 58, in run
ERROR 05-01 13:52:05 async_llm_engine.py:43]     result = self.fn(*self.args, **self.kwargs)
ERROR 05-01 13:52:05 async_llm_engine.py:43]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 05-01 13:52:05 async_llm_engine.py:43]   File "/mnt/ml/vllm/vllm/worker/worker_base.py", line 158, in execute_method
ERROR 05-01 13:52:05 async_llm_engine.py:43]     raise e
ERROR 05-01 13:52:05 async_llm_engine.py:43]   File "/mnt/ml/vllm/vllm/worker/worker_base.py", line 149, in execute_method
ERROR 05-01 13:52:05 async_llm_engine.py:43]     return executor(*args, **kwargs)
ERROR 05-01 13:52:05 async_llm_engine.py:43]            ^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 05-01 13:52:05 async_llm_engine.py:43]   File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
ERROR 05-01 13:52:05 async_llm_engine.py:43]     return func(*args, **kwargs)
ERROR 05-01 13:52:05 async_llm_engine.py:43]            ^^^^^^^^^^^^^^^^^^^^^
ERROR 05-01 13:52:05 async_llm_engine.py:43]   File "/mnt/ml/vllm/vllm/worker/worker.py", line 254, in execute_model
ERROR 05-01 13:52:05 async_llm_engine.py:43]     output = self.model_runner.execute_model(seq_group_metadata_list,
ERROR 05-01 13:52:05 async_llm_engine.py:43]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 05-01 13:52:05 async_llm_engine.py:43]   File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
ERROR 05-01 13:52:05 async_llm_engine.py:43]     return func(*args, **kwargs)
ERROR 05-01 13:52:05 async_llm_engine.py:43]            ^^^^^^^^^^^^^^^^^^^^^
ERROR 05-01 13:52:05 async_llm_engine.py:43]   File "/mnt/ml/vllm/vllm/worker/model_runner.py", line 828, in execute_model
ERROR 05-01 13:52:05 async_llm_engine.py:43]     ) = self.prepare_input_tensors(seq_group_metadata_list)
ERROR 05-01 13:52:05 async_llm_engine.py:43]         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 05-01 13:52:05 async_llm_engine.py:43]   File "/mnt/ml/vllm/vllm/worker/model_runner.py", line 688, in prepare_input_tensors
ERROR 05-01 13:52:05 async_llm_engine.py:43]     sampling_metadata = self._prepare_sample(seq_group_metadata_list,
ERROR 05-01 13:52:05 async_llm_engine.py:43]                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 05-01 13:52:05 async_llm_engine.py:43]   File "/mnt/ml/vllm/vllm/worker/model_runner.py", line 598, in _prepare_sample
ERROR 05-01 13:52:05 async_llm_engine.py:43]     device=self.device).manual_seed(sampling_params.seed)
ERROR 05-01 13:52:05 async_llm_engine.py:43]                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 05-01 13:52:05 async_llm_engine.py:43] RuntimeError: Overflow when unpacking long
ERROR:asyncio:Exception in callback functools.partial(<function _raise_exception_on_finish at 0x7ca81a718c20>, error_callback=<bound method AsyncLLMEngine._error_callback of <vllm.engine.async_llm_engine.AsyncLLMEngine object at 0x7ca810e79010>>)
handle: <Handle functools.partial(<function _raise_exception_on_finish at 0x7ca81a718c20>, error_callback=<bound method AsyncLLMEngine._error_callback of <vllm.engine.async_llm_engine.AsyncLLMEngine object at 0x7ca810e79010>>)>
Traceback (most recent call last):
  File "/mnt/ml/vllm/vllm/engine/async_llm_engine.py", line 38, in _raise_exception_on_finish
    task.result()
  File "/mnt/ml/vllm/vllm/engine/async_llm_engine.py", line 496, in run_engine_loop
    has_requests_in_progress = await asyncio.wait_for(
                               ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/asyncio/tasks.py", line 489, in wait_for
    return fut.result()
           ^^^^^^^^^^^^
  File "/mnt/ml/vllm/vllm/engine/async_llm_engine.py", line 470, in engine_step
    request_outputs = await self.engine.step_async()
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/ml/vllm/vllm/engine/async_llm_engine.py", line 213, in step_async
    output = await self.model_executor.execute_model_async(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/ml/vllm/vllm/executor/ray_gpu_executor.py", line 418, in execute_model_async
    all_outputs = await self._run_workers_async(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/ml/vllm/vllm/executor/ray_gpu_executor.py", line 408, in _run_workers_async
    all_outputs = await asyncio.gather(*coros)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/ml/vllm/vllm/worker/worker_base.py", line 158, in execute_method
    raise e
  File "/mnt/ml/vllm/vllm/worker/worker_base.py", line 149, in execute_method
    return executor(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/ml/vllm/vllm/worker/worker.py", line 254, in execute_model
    output = self.model_runner.execute_model(seq_group_metadata_list,
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/ml/vllm/vllm/worker/model_runner.py", line 828, in execute_model
    ) = self.prepare_input_tensors(seq_group_metadata_list)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/ml/vllm/vllm/worker/model_runner.py", line 688, in prepare_input_tensors
    sampling_metadata = self._prepare_sample(seq_group_metadata_list,
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/ml/vllm/vllm/worker/model_runner.py", line 598, in _prepare_sample
    device=self.device).manual_seed(sampling_params.seed)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Overflow when unpacking long

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "uvloop/cbhandles.pyx", line 63, in uvloop.loop.Handle._run
  File "/mnt/ml/vllm/vllm/engine/async_llm_engine.py", line 45, in _raise_exception_on_finish
    raise AsyncEngineDeadError(
vllm.engine.async_llm_engine.AsyncEngineDeadError: Task finished unexpectedly. This should never happen! Please open an issue on Github. See stack trace above for the actual cause.
ERROR:    Exception in ASGI application
  + Exception Group Traceback (most recent call last):
  |   File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/starlette/middleware/base.py", line 192, in __call__
  |     await response(scope, wrapped_receive, send)
  |   File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/starlette/responses.py", line 258, in __call__
  |     async with anyio.create_task_group() as task_group:
  |   File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 678, in __aexit__
  |     raise BaseExceptionGroup(
  | ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
  +-+---------------- 1 ----------------
    | Exception Group Traceback (most recent call last):
    |   File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/starlette/responses.py", line 261, in wrap
    |     await func()
    |   File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/starlette/middleware/base.py", line 217, in stream_response
    |     return await super().stream_response(send)
    |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    |   File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/starlette/responses.py", line 250, in stream_response
    |     async for chunk in self.body_iterator:
    |   File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/starlette/middleware/base.py", line 181, in body_stream
    |     raise app_exc
    |   File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/starlette/middleware/base.py", line 151, in coro
    |     await self.app(scope, receive_or_disconnect, send_no_error)
    |   File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/starlette/middleware/cors.py", line 85, in __call__
    |     await self.app(scope, receive, send)
    |   File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 65, in __call__
    |     await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
    |   File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    |     raise exc
    |   File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    |     await app(scope, receive, sender)
    |   File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/starlette/routing.py", line 756, in __call__
    |     await self.middleware_stack(scope, receive, send)
    |   File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/starlette/routing.py", line 776, in app
    |     await route.handle(scope, receive, send)
    |   File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/starlette/routing.py", line 297, in handle
    |     await self.app(scope, receive, send)
    |   File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/starlette/routing.py", line 77, in app
    |     await wrap_app_handling_exceptions(app, request)(scope, receive, send)
    |   File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    |     raise exc
    |   File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    |     await app(scope, receive, sender)
    |   File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/starlette/routing.py", line 75, in app
    |     await response(scope, receive, send)
    |   File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/starlette/responses.py", line 258, in __call__
    |     async with anyio.create_task_group() as task_group:
    |   File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 678, in __aexit__
    |     raise BaseExceptionGroup(
    | ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
    +-+---------------- 1 ----------------
      | Traceback (most recent call last):
      |   File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/uvicorn/protocols/http/httptools_impl.py", line 411, in run_asgi
      |     result = await app(  # type: ignore[func-returns-value]
      |              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      |   File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/uvicorn/middleware/proxy_headers.py", line 69, in __call__
      |     return await self.app(scope, receive, send)
      |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      |   File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/fastapi/applications.py", line 1054, in __call__
      |     await super().__call__(scope, receive, send)
      |   File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/starlette/applications.py", line 123, in __call__
      |     await self.middleware_stack(scope, receive, send)
      |   File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/starlette/middleware/errors.py", line 186, in __call__
      |     raise exc
      |   File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/starlette/middleware/errors.py", line 164, in __call__
      |     await self.app(scope, receive, _send)
      |   File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/starlette/middleware/base.py", line 189, in __call__
      |     with collapse_excgroups():
      |   File "/usr/lib/python3.11/contextlib.py", line 158, in __exit__
      |     self.gen.throw(typ, value, traceback)
      |   File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/starlette/_utils.py", line 93, in collapse_excgroups
      |     raise exc
      |   File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/starlette/responses.py", line 261, in wrap
      |     await func()
      |   File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/starlette/responses.py", line 250, in stream_response
      |     async for chunk in self.body_iterator:
      |   File "/mnt/ml/vllm/vllm/entrypoints/openai/serving_completion.py", line 194, in completion_stream_generator
      |     async for prompt_idx, res in result_generator:
      |   File "/mnt/ml/vllm/vllm/utils.py", line 228, in consumer
      |     raise item
      |   File "/mnt/ml/vllm/vllm/utils.py", line 213, in producer
      |     async for item in iterator:
      |   File "/mnt/ml/vllm/vllm/engine/async_llm_engine.py", line 661, in generate
      |     raise e
      |   File "/mnt/ml/vllm/vllm/engine/async_llm_engine.py", line 655, in generate
      |     async for request_output in stream:
      |   File "/mnt/ml/vllm/vllm/engine/async_llm_engine.py", line 77, in __anext__
      |     raise result
      |   File "/mnt/ml/vllm/vllm/engine/async_llm_engine.py", line 38, in _raise_exception_on_finish
      |     task.result()
      |   File "/mnt/ml/vllm/vllm/engine/async_llm_engine.py", line 496, in run_engine_loop
      |     has_requests_in_progress = await asyncio.wait_for(
      |                                ^^^^^^^^^^^^^^^^^^^^^^^
      |   File "/usr/lib/python3.11/asyncio/tasks.py", line 489, in wait_for
      |     return fut.result()
      |            ^^^^^^^^^^^^
      |   File "/mnt/ml/vllm/vllm/engine/async_llm_engine.py", line 470, in engine_step
      |     request_outputs = await self.engine.step_async()
      |                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      |   File "/mnt/ml/vllm/vllm/engine/async_llm_engine.py", line 213, in step_async
      |     output = await self.model_executor.execute_model_async(
      |              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      |   File "/mnt/ml/vllm/vllm/executor/ray_gpu_executor.py", line 418, in execute_model_async
      |     all_outputs = await self._run_workers_async(
      |                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      |   File "/mnt/ml/vllm/vllm/executor/ray_gpu_executor.py", line 408, in _run_workers_async
      |     all_outputs = await asyncio.gather(*coros)
      |                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      |   File "/usr/lib/python3.11/concurrent/futures/thread.py", line 58, in run
      |     result = self.fn(*self.args, **self.kwargs)
      |              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      |   File "/mnt/ml/vllm/vllm/worker/worker_base.py", line 158, in execute_method
      |     raise e
      |   File "/mnt/ml/vllm/vllm/worker/worker_base.py", line 149, in execute_method
      |     return executor(*args, **kwargs)
      |            ^^^^^^^^^^^^^^^^^^^^^^^^^
      |   File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
      |     return func(*args, **kwargs)
      |            ^^^^^^^^^^^^^^^^^^^^^
      |   File "/mnt/ml/vllm/vllm/worker/worker.py", line 254, in execute_model
      |     output = self.model_runner.execute_model(seq_group_metadata_list,
      |              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      |   File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
      |     return func(*args, **kwargs)
      |            ^^^^^^^^^^^^^^^^^^^^^
      |   File "/mnt/ml/vllm/vllm/worker/model_runner.py", line 828, in execute_model
      |     ) = self.prepare_input_tensors(seq_group_metadata_list)
      |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      |   File "/mnt/ml/vllm/vllm/worker/model_runner.py", line 688, in prepare_input_tensors
      |     sampling_metadata = self._prepare_sample(seq_group_metadata_list,
      |                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      |   File "/mnt/ml/vllm/vllm/worker/model_runner.py", line 598, in _prepare_sample
      |     device=self.device).manual_seed(sampling_params.seed)
      |                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      | RuntimeError: Overflow when unpacking long
      +------------------------------------

During handling of the above exception, another exception occurred:

  + Exception Group Traceback (most recent call last):
  |   File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/starlette/_utils.py", line 87, in collapse_excgroups
  |     yield
  |   File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/starlette/middleware/base.py", line 190, in __call__
  |     async with anyio.create_task_group() as task_group:
  |   File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 678, in __aexit__
  |     raise BaseExceptionGroup(
  | ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
  +-+---------------- 1 ----------------
    | Traceback (most recent call last):
    |   File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/starlette/responses.py", line 265, in __call__
    |     await wrap(partial(self.listen_for_disconnect, receive))
    |   File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/starlette/responses.py", line 261, in wrap
    |     await func()
    |   File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/starlette/responses.py", line 238, in listen_for_disconnect
    |     message = await receive()
    |               ^^^^^^^^^^^^^^^
    |   File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/starlette/middleware/base.py", line 54, in wrapped_receive
    |     msg = await self.receive()
    |           ^^^^^^^^^^^^^^^^^^^^
    |   File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/uvicorn/protocols/http/httptools_impl.py", line 568, in receive
    |     await self.message_event.wait()
    |   File "/usr/lib/python3.11/asyncio/locks.py", line 213, in wait
    |     await fut
    | asyncio.exceptions.CancelledError: Cancelled by cancel scope 7ca7dc2dad50
    |
    | During handling of the above exception, another exception occurred:
    |
    | Exception Group Traceback (most recent call last):
    |   File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/starlette/middleware/base.py", line 192, in __call__
    |     await response(scope, wrapped_receive, send)
    |   File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/starlette/responses.py", line 258, in __call__
    |     async with anyio.create_task_group() as task_group:
    |   File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 678, in __aexit__
    |     raise BaseExceptionGroup(
    | ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
    +-+---------------- 1 ----------------
      | Traceback (most recent call last):
      |   File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/starlette/responses.py", line 265, in __call__
      |     await wrap(partial(self.listen_for_disconnect, receive))
      |   File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/starlette/responses.py", line 261, in wrap
      |     await func()
      |   File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/starlette/responses.py", line 238, in listen_for_disconnect
      |     message = await receive()
      |               ^^^^^^^^^^^^^^^
      |   File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/starlette/middleware/base.py", line 128, in receive_or_disconnect
      |     message = await wrap(wrapped_receive)
      |               ^^^^^^^^^^^^^^^^^^^^^^^^^^^
      |   File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/starlette/middleware/base.py", line 123, in wrap
      |     result = await func()
      |              ^^^^^^^^^^^^
      |   File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/starlette/middleware/base.py", line 54, in wrapped_receive
      |     msg = await self.receive()
      |           ^^^^^^^^^^^^^^^^^^^^
      |   File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/uvicorn/protocols/http/httptools_impl.py", line 568, in receive
      |     await self.message_event.wait()
      |   File "/usr/lib/python3.11/asyncio/locks.py", line 213, in wait
      |     await fut
      | asyncio.exceptions.CancelledError: Cancelled by cancel scope 7ca7dc06e0d0
      |
      | During handling of the above exception, another exception occurred:
      |
      | Exception Group Traceback (most recent call last):
      |   File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/starlette/responses.py", line 261, in wrap
      |     await func()
      |   File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/starlette/middleware/base.py", line 217, in stream_response
      |     return await super().stream_response(send)
      |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      |   File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/starlette/responses.py", line 250, in stream_response
      |     async for chunk in self.body_iterator:
      |   File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/starlette/middleware/base.py", line 181, in body_stream
      |     raise app_exc
      |   File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/starlette/middleware/base.py", line 151, in coro
      |     await self.app(scope, receive_or_disconnect, send_no_error)
      |   File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/starlette/middleware/cors.py", line 85, in __call__
      |     await self.app(scope, receive, send)
      |   File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 65, in __call__
      |     await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
      |   File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
      |     raise exc
      |   File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
      |     await app(scope, receive, sender)
      |   File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/starlette/routing.py", line 756, in __call__
      |     await self.middleware_stack(scope, receive, send)
      |   File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/starlette/routing.py", line 776, in app
      |     await route.handle(scope, receive, send)
      |   File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/starlette/routing.py", line 297, in handle
      |     await self.app(scope, receive, send)
      |   File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/starlette/routing.py", line 77, in app
      |     await wrap_app_handling_exceptions(app, request)(scope, receive, send)
      |   File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
      |     raise exc
      |   File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
      |     await app(scope, receive, sender)
      |   File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/starlette/routing.py", line 75, in app
      |     await response(scope, receive, send)
      |   File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/starlette/responses.py", line 258, in __call__
      |     async with anyio.create_task_group() as task_group:
      |   File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 678, in __aexit__
      |     raise BaseExceptionGroup(
      | ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
      +-+---------------- 1 ----------------
        | Traceback (most recent call last):
        |   File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/uvicorn/protocols/http/httptools_impl.py", line 411, in run_asgi
        |     result = await app(  # type: ignore[func-returns-value]
        |              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        |   File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/uvicorn/middleware/proxy_headers.py", line 69, in __call__
        |     return await self.app(scope, receive, send)
        |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        |   File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/fastapi/applications.py", line 1054, in __call__
        |     await super().__call__(scope, receive, send)
        |   File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/starlette/applications.py", line 123, in __call__
        |     await self.middleware_stack(scope, receive, send)
        |   File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/starlette/middleware/errors.py", line 186, in __call__
        |     raise exc
        |   File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/starlette/middleware/errors.py", line 164, in __call__
        |     await self.app(scope, receive, _send)
        |   File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/starlette/middleware/base.py", line 189, in __call__
        |     with collapse_excgroups():
        |   File "/usr/lib/python3.11/contextlib.py", line 158, in __exit__
        |     self.gen.throw(typ, value, traceback)
        |   File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/starlette/_utils.py", line 93, in collapse_excgroups
        |     raise exc
        |   File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/starlette/responses.py", line 261, in wrap
        |     await func()
        |   File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/starlette/responses.py", line 250, in stream_response
        |     async for chunk in self.body_iterator:
        |   File "/mnt/ml/vllm/vllm/entrypoints/openai/serving_completion.py", line 194, in completion_stream_generator
        |     async for prompt_idx, res in result_generator:
        |   File "/mnt/ml/vllm/vllm/utils.py", line 228, in consumer
        |     raise item
        |   File "/mnt/ml/vllm/vllm/utils.py", line 213, in producer
        |     async for item in iterator:
        |   File "/mnt/ml/vllm/vllm/engine/async_llm_engine.py", line 661, in generate
        |     raise e
        |   File "/mnt/ml/vllm/vllm/engine/async_llm_engine.py", line 655, in generate
        |     async for request_output in stream:
        |   File "/mnt/ml/vllm/vllm/engine/async_llm_engine.py", line 77, in __anext__
        |     raise result
        |   File "/mnt/ml/vllm/vllm/engine/async_llm_engine.py", line 38, in _raise_exception_on_finish
        |     task.result()
        |   File "/mnt/ml/vllm/vllm/engine/async_llm_engine.py", line 496, in run_engine_loop
        |     has_requests_in_progress = await asyncio.wait_for(
        |                                ^^^^^^^^^^^^^^^^^^^^^^^
        |   File "/usr/lib/python3.11/asyncio/tasks.py", line 489, in wait_for
        |     return fut.result()
        |            ^^^^^^^^^^^^
        |   File "/mnt/ml/vllm/vllm/engine/async_llm_engine.py", line 470, in engine_step
        |     request_outputs = await self.engine.step_async()
        |                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        |   File "/mnt/ml/vllm/vllm/engine/async_llm_engine.py", line 213, in step_async
        |     output = await self.model_executor.execute_model_async(
        |              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        |   File "/mnt/ml/vllm/vllm/executor/ray_gpu_executor.py", line 418, in execute_model_async
        |     all_outputs = await self._run_workers_async(
        |                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        |   File "/mnt/ml/vllm/vllm/executor/ray_gpu_executor.py", line 408, in _run_workers_async
        |     all_outputs = await asyncio.gather(*coros)
        |                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        |   File "/usr/lib/python3.11/concurrent/futures/thread.py", line 58, in run
        |     result = self.fn(*self.args, **self.kwargs)
        |              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        |   File "/mnt/ml/vllm/vllm/worker/worker_base.py", line 158, in execute_method
        |     raise e
        |   File "/mnt/ml/vllm/vllm/worker/worker_base.py", line 149, in execute_method
        |     return executor(*args, **kwargs)
        |            ^^^^^^^^^^^^^^^^^^^^^^^^^
        |   File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
        |     return func(*args, **kwargs)
        |            ^^^^^^^^^^^^^^^^^^^^^
        |   File "/mnt/ml/vllm/vllm/worker/worker.py", line 254, in execute_model
        |     output = self.model_runner.execute_model(seq_group_metadata_list,
        |              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        |   File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
        |     return func(*args, **kwargs)
        |            ^^^^^^^^^^^^^^^^^^^^^
        |   File "/mnt/ml/vllm/vllm/worker/model_runner.py", line 828, in execute_model
        |     ) = self.prepare_input_tensors(seq_group_metadata_list)
        |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        |   File "/mnt/ml/vllm/vllm/worker/model_runner.py", line 688, in prepare_input_tensors
        |     sampling_metadata = self._prepare_sample(seq_group_metadata_list,
        |                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        |   File "/mnt/ml/vllm/vllm/worker/model_runner.py", line 598, in _prepare_sample
        |     device=self.device).manual_seed(sampling_params.seed)
        |                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        | RuntimeError: Overflow when unpacking long
        +------------------------------------

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/uvicorn/protocols/http/httptools_impl.py", line 411, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/uvicorn/middleware/proxy_headers.py", line 69, in __call__
    return await self.app(scope, receive, send)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/starlette/applications.py", line 123, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/starlette/middleware/errors.py", line 186, in __call__
    raise exc
  File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/starlette/middleware/errors.py", line 164, in __call__
    await self.app(scope, receive, _send)
  File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/starlette/middleware/base.py", line 189, in __call__
    with collapse_excgroups():
  File "/usr/lib/python3.11/contextlib.py", line 158, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/starlette/_utils.py", line 93, in collapse_excgroups
    raise exc
  File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/starlette/responses.py", line 261, in wrap
    await func()
  File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/starlette/responses.py", line 250, in stream_response
    async for chunk in self.body_iterator:
  File "/mnt/ml/vllm/vllm/entrypoints/openai/serving_completion.py", line 194, in completion_stream_generator
    async for prompt_idx, res in result_generator:
  File "/mnt/ml/vllm/vllm/utils.py", line 228, in consumer
    raise item
  File "/mnt/ml/vllm/vllm/utils.py", line 213, in producer
    async for item in iterator:
  File "/mnt/ml/vllm/vllm/engine/async_llm_engine.py", line 661, in generate
    raise e
  File "/mnt/ml/vllm/vllm/engine/async_llm_engine.py", line 655, in generate
    async for request_output in stream:
  File "/mnt/ml/vllm/vllm/engine/async_llm_engine.py", line 77, in __anext__
    raise result
  File "/mnt/ml/vllm/vllm/engine/async_llm_engine.py", line 38, in _raise_exception_on_finish
    task.result()
  File "/mnt/ml/vllm/vllm/engine/async_llm_engine.py", line 496, in run_engine_loop
    has_requests_in_progress = await asyncio.wait_for(
                               ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/asyncio/tasks.py", line 489, in wait_for
    return fut.result()
           ^^^^^^^^^^^^
  File "/mnt/ml/vllm/vllm/engine/async_llm_engine.py", line 470, in engine_step
    request_outputs = await self.engine.step_async()
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/ml/vllm/vllm/engine/async_llm_engine.py", line 213, in step_async
    output = await self.model_executor.execute_model_async(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/ml/vllm/vllm/executor/ray_gpu_executor.py", line 418, in execute_model_async
    all_outputs = await self._run_workers_async(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/ml/vllm/vllm/executor/ray_gpu_executor.py", line 408, in _run_workers_async
    all_outputs = await asyncio.gather(*coros)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/ml/vllm/vllm/worker/worker_base.py", line 158, in execute_method
    raise e
  File "/mnt/ml/vllm/vllm/worker/worker_base.py", line 149, in execute_method
    return executor(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/ml/vllm/vllm/worker/worker.py", line 254, in execute_model
    output = self.model_runner.execute_model(seq_group_metadata_list,
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/ml/vllm/vllm/worker/model_runner.py", line 828, in execute_model
    ) = self.prepare_input_tensors(seq_group_metadata_list)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/ml/vllm/vllm/worker/model_runner.py", line 688, in prepare_input_tensors
    sampling_metadata = self._prepare_sample(seq_group_metadata_list,
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/ml/vllm/vllm/worker/model_runner.py", line 598, in _prepare_sample
    device=self.device).manual_seed(sampling_params.seed)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Overflow when unpacking long
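
For reference, `torch.Generator.manual_seed` only accepts values representable as a signed 64-bit integer, so `2**64` overflows when it is unpacked on the worker and takes down the engine loop instead of being rejected at request time. Below is a minimal sketch of this kind of up-front seed validation; the helper name `validate_seed` and the exact bounds and error message are illustrative assumptions, not the code merged in this PR.

from typing import Optional

import torch

# Bounds accepted by torch.Generator.manual_seed (a signed 64-bit integer).
_SEED_MIN = torch.iinfo(torch.long).min
_SEED_MAX = torch.iinfo(torch.long).max


def validate_seed(seed: Optional[int]) -> None:
    """Reject seeds that would overflow when passed to manual_seed."""
    if seed is not None and not (_SEED_MIN <= seed <= _SEED_MAX):
        raise ValueError(
            f"seed must be in [{_SEED_MIN}, {_SEED_MAX}], got {seed}.")


validate_seed(42)         # fine
try:
    validate_seed(2**64)  # one past the unsigned 64-bit maximum -> rejected up front
except ValueError as err:
    print(err)

Checking the seed at the API layer (for example, in the request model or when constructing the sampling parameters) turns this crash into an ordinary validation error returned to the client before the request ever reaches a worker.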

@Yard1 Yard1 (Collaborator) left a comment:

lgtm

@Yard1 Yard1 enabled auto-merge (squash) May 1, 2024 18:35
@Yard1 Yard1 merged commit c47ba4a into vllm-project:main May 1, 2024
48 checks passed
@sasha0552 sasha0552 deleted the seed-validation branch May 2, 2024 02:13
robertgshaw2-neuralmagic pushed a commit to neuralmagic/nm-vllm that referenced this pull request May 6, 2024
z103cb pushed a commit to z103cb/opendatahub_vllm that referenced this pull request May 7, 2024
dtrifiro pushed a commit to opendatahub-io/vllm that referenced this pull request May 7, 2024
mawong-amd pushed a commit to ROCm/vllm that referenced this pull request Jun 3, 2024