**What happened**:
When attempting to send a large load, we presumably overrun the length of the ShareGPT dataset and hit `StopIteration`: the data generator calls `next()` on the exhausted dataset iterator, and because that call happens inside a generator, it surfaces as `RuntimeError: generator raised StopIteration` (see the sketch below).
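A minimal sketch of the failure mode (illustrative only, not the inference-perf source): since PEP 479 (Python 3.7+), a `StopIteration` that escapes inside a generator body is converted into `RuntimeError("generator raised StopIteration")`, which matches the traceback below.

```python
# Minimal sketch of the failure mode (not the inference-perf source):
# per PEP 479, StopIteration raised inside a generator body becomes
# RuntimeError("generator raised StopIteration") at the call site.
def get_data(dataset_iter):
    while True:
        # Raises StopIteration once the finite dataset is exhausted.
        yield next(dataset_iter)

for prompt in get_data(iter(["prompt-1", "prompt-2"])):
    print(prompt)
# Prints prompt-1 and prompt-2, then the loop dies with:
# RuntimeError: generator raised StopIteration
```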
**What you expected to happen**:
Requests from the ShareGPT dataset should be reused (the generator should wrap around to the start of the dataset) rather than the run crashing; one possible shape for that fix is sketched below.
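A hypothetical sketch of the expected wrap-around behavior (not a patch against the actual generator in `hf_sharegpt_datagen.py`; the `sharegpt_records` parameter is assumed):

```python
import itertools

# Hypothetical sketch, assuming the generator holds the ShareGPT records
# in memory: itertools.cycle replays them indefinitely, so a stage with
# rate * duration larger than the dataset keeps getting (reused) requests.
def get_data(sharegpt_records):
    for record in itertools.cycle(sharegpt_records):
        yield record
```

Note that `itertools.cycle` caches the items it has consumed, so memory stays bounded by the dataset size.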
**How to reproduce it (as minimally and precisely as possible)**:
Run a stage whose `rate * duration` product exceeds the number of available ShareGPT prompts; the config below requests 1000 req/s × 120 s = 120,000 requests.
**Anything else we need to know?**:

**Environment**:
- inference-perf version:
- config.yml (entire one printed by the benchmark run):
- cloud provider or hardware configuration:
- others:
```
Using configuration from: config.yml
Benchmarking with the following config:

api: completion
data:
  type: shareGPT
  input_distribution:
    min: 10
    max: 1024
    mean: 512.0
    std_dev: 200.0
    total_count: 1000
  output_distribution:
    min: 10
    max: 1024
    mean: 512.0
    std_dev: 200.0
    total_count: 1000
load:
  type: constant
  interval: 1.0
  stages:
  - rate: 1000
    duration: 120
  num_workers: 44
  worker_max_concurrency: 100
  worker_max_tcp_connections: 100
metrics: null
report:
  request_lifecycle:
    summary: true
    per_stage: true
    per_request: false
storage:
  google_cloud_storage: null
server:
  type: vllm
  model_name: meta-llama/Meta-Llama-3-8B
  base_url: http://pyserver-service:80
  ignore_eos: true
tokenizer:
  pretrained_model_name_or_path: meta-llama/Meta-Llama-3-8B
```
```
Stage 0 - run started
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/inference_perf/datagen/hf_sharegpt_datagen.py", line 46, in get_data
    data = next(self.sharegpt_dataset)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
StopIteration

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/bin/inference-perf", line 8, in <module>
    sys.exit(main_cli())
             ^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/inference_perf/main.py", line 162, in main_cli
    perfrunner.run()
  File "/usr/local/lib/python3.12/site-packages/inference_perf/main.py", line 56, in run
    asyncio.run(_run())
  File "/usr/local/lib/python3.12/asyncio/runners.py", line 195, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/asyncio/base_events.py", line 691, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/inference_perf/main.py", line 53, in _run
    await self.loadgen.run(self.client)
  File "/usr/local/lib/python3.12/site-packages/inference_perf/loadgen/load_generator.py", line 140, in run
    return await self.mp_run(client)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/inference_perf/loadgen/load_generator.py", line 118, in mp_run
    for request_number, (request_data, request_time) in enumerate(
                                                        ^^^^^^^^^^
RuntimeError: generator raised StopIteration
^CException ignored in atexit callback: <function _exit_function at 0x7db50f5c8220>
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/multiprocessing/util.py", line 360, in _exit_function
    p.join()
  File "/usr/local/lib/python3.12/multiprocessing/process.py", line 149, in join
    res = self._popen.wait(timeout)
          ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/multiprocessing/popen_fork.py", line 43, in wait
    return self.poll(os.WNOHANG if timeout == 0.0 else 0)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/multiprocessing/popen_fork.py", line 27, in poll
    pid, sts = os.waitpid(self.pid, flag)
```