High concurrency benchmarks (rate >= 24) cause unhandled ConnectionRefusedError on shutdown #329

@git-jxj

Describe the bug

When running a benchmark with --rate-type "concurrent", a ConnectionRefusedError traceback is consistently printed at the end of the run if the concurrency rate is high (e.g., >= 24). The error does not occur at lower concurrency rates (e.g., <= 20).

Despite the error, the final benchmark statistics appear to be correct and complete. However, the presence of an unhandled exception indicates an unclean shutdown process.

Expected behavior

The benchmark should complete without any errors or exceptions printed to the console, regardless of the concurrency rate. The process shutdown should be graceful.
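As a minimal sketch of what a graceful shutdown could look like (an assumption about one possible approach, not guidellm's actual implementation): the worker keeps a reference to every outstanding send task and waits for all of them before the owner of the results queue is torn down. Only `send_result` and `results_queue` come from the traceback in this report; `run_worker`, `pending`, and the use of a plain `queue.Queue` are hypothetical stand-ins.

```python
import asyncio
import queue


async def send_result(results_queue, result):
    # The blocking put runs in a worker thread, as in the reported traceback.
    await asyncio.to_thread(results_queue.put, result)


async def run_worker(results_queue, results):
    """Hypothetical worker loop: schedule sends, then drain them before returning."""
    pending = set()
    for result in results:
        task = asyncio.create_task(send_result(results_queue, result))
        pending.add(task)  # keep a reference so the task is tracked
        task.add_done_callback(pending.discard)
    # Wait for every outstanding send; exceptions are collected, not left unretrieved.
    outcomes = await asyncio.gather(*pending, return_exceptions=True)
    return [o for o in outcomes if isinstance(o, BaseException)]


def demo():
    q = queue.Queue()  # stand-in for the real results queue
    errors = asyncio.run(run_worker(q, list(range(24))))
    return q.qsize(), errors
```

Because `run_worker` only returns after all sends have finished, a caller that shuts the queue down afterwards would not race with in-flight `put` calls.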

Environment
Include all relevant environment information:

  1. OS: macOS (darwin)
  2. Python version: 3.10.x

To Reproduce
Exact steps to reproduce the behavior:
The bug can be reliably reproduced by comparing a low-rate run with a high-rate run.

  1. Run the "Good" Case (No Error): Execute the benchmark with rate=20. This command completes cleanly every time.

guidellm benchmark \
  --target "http://<your_model_endpoint>" \
  --processor "Qwen/Qwen3-0.6B" \
  --rate-type "concurrent" \
  --rate 20 \
  --max-requests 20 \
  --data "prompt_tokens=16,output_tokens=16,samples=1"

  2. Run the "Bad" Case (Error Occurs): Execute the benchmark with rate=24 or higher. This command consistently produces the error at the end of the run.

guidellm benchmark \
  --target "http://<your_model_endpoint>" \
  --processor "Qwen/Qwen3-0.6B" \
  --rate-type "concurrent" \
  --rate 24 \
  --max-requests 24 \
  --data "prompt_tokens=16,output_tokens=16,samples=1"

Errors

The following traceback is printed to the console on failure:

Task exception was never retrieved
future: <Task finished name='Task-22' coro=<RequestsWorker.send_result() done, defined at /path/to/guidellm/src/guidellm/scheduler/worker.py:111> exception=ConnectionRefusedError(61, 'Connection refused')>
Traceback (most recent call last):
  File "/path/to/python3.10/multiprocessing/managers.py", line 810, in _callmethod
    conn = self._tls.connection
AttributeError: 'ForkAwareLocal' object has no attribute 'connection'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/path/to/guidellm/src/guidellm/scheduler/worker.py", line 116, in send_result
    await asyncio.to_thread(results_queue.put, result)
  File "/path/to/python3.10/asyncio/threads.py", line 25, in to_thread
    return await loop.run_in_executor(None, func_call)
  File "/path/to/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "<string>", line 2, in put
  File "/path/to/python3.10/multiprocessing/managers.py", line 814, in _callmethod
    self._connect()
  File "/path/to/python3.10/multiprocessing/managers.py", line 801, in _connect
    conn = self._Client(self._token.address, authkey=self._authkey)
  File "/path/to/python3.10/multiprocessing/connection.py", line 502, in Client
    c = SocketClient(address)
  File "/path/to/python3.10/multiprocessing/connection.py", line 630, in SocketClient
    s.connect(address)
ConnectionRefusedError: [Errno 61] Connection refused
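One reading of this traceback is that `results_queue.put` runs in a fresh executor thread, the queue proxy tries to open a new connection for that thread, and the endpoint it connects to is no longer accepting connections. The sketch below illustrates that failure mode and one way a send could tolerate it. It is only an illustration under that assumption: `ClosedQueue` is a hypothetical stand-in that raises the same exception as the report, and the guarded `send_result` is not guidellm's code.

```python
import asyncio
import queue


class ClosedQueue:
    """Hypothetical stand-in for a queue proxy whose server side is already gone."""

    def put(self, item):
        # Same exception type and arguments as in the reported traceback.
        raise ConnectionRefusedError(61, "Connection refused")


async def send_result(results_queue, result) -> bool:
    """Try to deliver a result; report failure instead of raising at shutdown."""
    try:
        await asyncio.to_thread(results_queue.put, result)
        return True
    except (ConnectionError, EOFError):
        # ConnectionRefusedError and BrokenPipeError are ConnectionError subclasses.
        return False


async def main():
    delivered = await send_result(queue.Queue(), "result")
    dropped = await send_result(ClosedQueue(), "result")
    return delivered, dropped
```

Swallowing the error hides the symptom but still drops the result, so a fix that waits for pending sends before shutting down the queue owner would likely be preferable where results must not be lost.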

Additional context
