Description
System Info
When executing the benchmark workload below, evalscope throws a ZeroDivisionError. The LLM model does generate response tokens, but the metrics calculator does not count them correctly.
evalscope perf --url 'http://localhost:3000/v1/chat/completions' --parallel 1 --model 'ensemble' --log-every-n-query 10 --read-timeout=60 --dataset-path '/root/Dataset/open_qa.jsonl' -n 1 --max-prompt-length 1000 --max-tokens 100 --api openai --stop '<|im_end|>' --dataset openqa --debug
System information: x86, L20 GPU, Triton Server 0.11.0, TensorRT-LLM 0.11.0, openai_trtllm 0.21.0
2024-10-14 21:28:31,788 - perf - http_client.py - on_request_start - 54 - INFO - Starting request: <TraceRequestStartParams(method='POST', url=URL('http://localhost:3000/v1/chat/completions'), headers=<CIMultiDict('Content-Type': 'application/json', 'user-agent': 'modelscope_bench')>)>
2024-10-14 21:28:31,790 - perf - http_client.py - on_request_chunk_sent - 58 - INFO - Request body: TraceRequestChunkSentParams(method='POST', url=URL('http://localhost:3000/v1/chat/completions'), chunk=b'{"messages": [{"role": "user", "content": "\u76d7\u8d3c\u5929\u8d4b\u76d7\u8d3c\u600e\u4e48\u52a0\u5929\u8d4b?\u77e5\u9053\u544a\u8bc9\u4e00\u4e0b\u4e0b\u5566~~"}], "model": "ensemble", "max_tokens": 100, "stop": ["<|im_end|>"]}')
2024-10-14 21:28:34,041 - perf - http_client.py - on_response_chunk_received - 62 - INFO - Response info: <TraceResponseChunkReceivedParams(method='POST', url=URL('http://localhost:3000/v1/chat/completions'), chunk=b'{"id":"cmpl-a6097c78-5c47-44e7-8ad6-3da44429b551","object":"text_completion","created":1728912514,"model":"ensemble","system_fingerprint":null,"choices":[{"index":0,"message":{"role":"assistant","content":"User: \xe7\x9b\x97\xe8\xb4\xbc\xe5\xa4\xa9\xe8\xb5\x8b\xe7\x9b\x97\xe8\xb4\xbc\xe6\x80\x8e\xe4\xb9\x88\xe5\x8a\xa0\xe5\xa4\xa9\xe8\xb5\x8b?\xe7\x9f\xa5\xe9\x81\x93\xe5\x91\x8a\xe8\xaf\x89\xe4\xb8\x80\xe4\xb8\x8b\xe4\xb8\x8b\xe5\x95\xa6~~\nASSISTANT: \xe5\xaf\xb9\xe4\xba\x8e\xe7\x9b\x97\xe8\xb4\xbc\xe5\xa4\xa9\xe8\xb5\x8b\xef\xbc\x8c\xe5\xbb\xba\xe8\xae\xae\xe5\x85\x88\xe9\x80\x89\xe6\x8b\xa9\xe2\x80\x9c\xe6\x89\xab\xe8\x8d\xa1\xe9\x81\x97\xe4\xba\xa7\xe2\x80\x9d\xef\xbc\x8c\xe8\x83\xbd\xe5\xa4\x9f\xe6\x8f\x90\xe9\xab\x98\xe5\x81\xb7\xe5\x8f\x96\xe5\xae\x9d\xe7\xae\xb1\xe7\x9a\x84\xe9\x80\x9f\xe5\xba\xa6\xe3\x80\x81\xe6\x89\x93\xe5\xbc\x80\xe6\xa2\x81\xe7\x9a\x84\xe5\x87\xa0\xe7\x8e\x87\xe4\xbb\xa5\xe5\x8f\x8a\xe6\x8c\x82\xe9\xa5\xb0\xe4\xb8\xa2\xe5\xbc\x83\xe7\x9a\x84\xe9\x87\x91\xe9\x92\xb1\xe3\x80\x82\xe5\xb0\x86\xe6\x9b\xb4\xe5\xa4\x9a\xe7\x82\xb9\xe6\x95\xb0\xe6\x8a\x95\xe5\x85\xa5\xe2\x80\x9c\xe8\xb1\xa1\xe7\x89\x99\xe5\xae\x9d\xe5\x89\x91\xe6\x8a\x80\xe5\xb8\x88\xe2\x80\x9d\xef\xbc\x8c\xe8\xbf\x99\xe5\xb0\x86\xe5\xa4\xa7\xe5\xa4\xa7\xe5\xa2\x9e\xe5\x8a\xa0\xe4\xbd\xa0\xe5\xaf\xb9\xe6\x95\x8c\xe4\xba\xba\xe7\x9a\x84\xe5\x8f\x8d\xe4\xbc\xa4\xe5\x92\x8c\xe9\x98\xb2\xe5\xbe\xa1\xe8\xa7\xa3\xe6\x95\xa3\xe5\x87\xa0\xe7\x8e\x87\xe3\x80\x82\xe5\x90\x8c\xe6\x97\xb6\xef\xbc\x8c\xe6\x88\x91\xe4\xbb\xac\xe5\x8f\xaf\xe4\xbb\xa5\xe8\xae\xa9\xe6\x8a\x80\xe8\x89\xba\xe5\x9c\xa8\xe6\x8a\x80\xe8\x83\xbd\xe7\x82\xb9\xe8\xb6\xb3\xe5\xa4\x9f\xe7\x9a\x84\xe6\x83\x85\xe5\x86\xb5\xe4\xb8\x8b\xe8\xa6\x86\xe7\x9b\x96\xe4\xb8\x8a\xe9\x80\x82\xe5\x90\x88\xe4\xbd\xa0\xe7\x9
a\x84\xe5\x85\xb6\xe4\xbb\x96\xe5\xa4\xa9\xe8\xb5\x8b\xe3\x80\x82\xe8\xae\xb0\xe5\xbe\x97\xe4\xb8\x8d\xe6\x96\xad\xe7\xbb\x83\xe4\xb9\xa0\xe6\x93\x8d\xe4\xbd\x9c\xe6\x8a\x80\xe5\xb7\xa7\xef\xbc\x8c\xe5\xb9\xb6\xe9\x80\x82\xe6\x97\xb6\xe4\xbd\xbf\xe7\x94\xa8\xe5\x85\xb3\xe9\x94\xae\xe6\x8a\x80\xe8\x83\xbd\xef\xbc\x8c\xe6\x89\x8d\xe8\x83\xbd\xe5\x9c\xa8\xe6\x88\x98\xe6\x96\x97\xe4\xb8\xad\xe6\x97\xa0\xe5\xbe\x80\xe4\xb8\x8d\xe8\x83\x9c~"},"finish_reason":"stop"}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}')>
2024-10-14 21:28:34,041 - perf - http_client.py - send_requests_worker - 570 - INFO - {"id": "cmpl-a6097c78-5c47-44e7-8ad6-3da44429b551", "object": "text_completion", "created": 1728912514, "model": "ensemble", "system_fingerprint": null, "choices": [{"index": 0, "message": {"role": "assistant", "content": "User: \u76d7\u8d3c\u5929\u8d4b\u76d7\u8d3c\u600e\u4e48\u52a0\u5929\u8d4b?\u77e5\u9053\u544a\u8bc9\u4e00\u4e0b\u4e0b\u5566~~\nASSISTANT: \u5bf9\u4e8e\u76d7\u8d3c\u5929\u8d4b\uff0c\u5efa\u8bae\u5148\u9009\u62e9\u201c\u626b\u8361\u9057\u4ea7\u201d\uff0c\u80fd\u591f\u63d0\u9ad8\u5077\u53d6\u5b9d\u7bb1\u7684\u901f\u5ea6\u3001\u6253\u5f00\u6881\u7684\u51e0\u7387\u4ee5\u53ca\u6302\u9970\u4e22\u5f03\u7684\u91d1\u94b1\u3002\u5c06\u66f4\u591a\u70b9\u6570\u6295\u5165\u201c\u8c61\u7259\u5b9d\u5251\u6280\u5e08\u201d\uff0c\u8fd9\u5c06\u5927\u5927\u589e\u52a0\u4f60\u5bf9\u654c\u4eba\u7684\u53cd\u4f24\u548c\u9632\u5fa1\u89e3\u6563\u51e0\u7387\u3002\u540c\u65f6\uff0c\u6211\u4eec\u53ef\u4ee5\u8ba9\u6280\u827a\u5728\u6280\u80fd\u70b9\u8db3\u591f\u7684\u60c5\u51b5\u4e0b\u8986\u76d6\u4e0a\u9002\u5408\u4f60\u7684\u5176\u4ed6\u5929\u8d4b\u3002\u8bb0\u5f97\u4e0d\u65ad\u7ec3\u4e60\u64cd\u4f5c\u6280\u5de7\uff0c\u5e76\u9002\u65f6\u4f7f\u7528\u5173\u952e\u6280\u80fd\uff0c\u624d\u80fd\u5728\u6218\u6597\u4e2d\u65e0\u5f80\u4e0d\u80dc~"}, "finish_reason": "stop"}], "usage": {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0}}
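Note that in the response above the server returns `"usage": {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0}` even though `choices[0].message.content` is non-empty. A minimal sketch of a workaround (the helper `completion_tokens` is hypothetical, not part of evalscope) that detects the zeroed usage and falls back to a rough local count:

```python
import json

def completion_tokens(response_body: str) -> int:
    """Return the completion-token count reported by the server, falling
    back to a crude whitespace-based estimate when the server reports zero."""
    data = json.loads(response_body)
    usage = data.get("usage") or {}
    reported = usage.get("completion_tokens", 0)
    if reported > 0:
        return reported
    # Fallback: estimate from the generated text. A real implementation
    # should use the model's own tokenizer instead of str.split().
    text = data["choices"][0]["message"]["content"]
    return len(text.split())

body = json.dumps({
    "choices": [{"message": {"content": "hello world from the model"}}],
    "usage": {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0},
})
print(completion_tokens(body))  # → 5
```

This suggests the root cause may be on the openai_trtllm side (usage accounting not populated) rather than in evalscope itself.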
Traceback (most recent call last):
  File "/usr/local/bin/evalscope", line 8, in <module>
    sys.exit(run_cmd())
  File "/usr/local/lib/python3.10/dist-packages/evalscope/cli/cli.py", line 21, in run_cmd
    cmd.execute()
  File "/usr/local/lib/python3.10/dist-packages/evalscope/cli/start_perf.py", line 33, in execute
    run_perf_benchmark(self.args)
  File "/usr/local/lib/python3.10/dist-packages/evalscope/perf/http_client.py", line 669, in run_perf_benchmark
    asyncio.run(benchmark(args))
  File "/usr/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/usr/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/usr/local/lib/python3.10/dist-packages/evalscope/perf/http_client.py", line 626, in benchmark
    avg_time_per_token, result_db_path) = await statistic_benchmark_metric_task
  File "/usr/local/lib/python3.10/dist-packages/evalscope/perf/http_client.py", line 407, in statistic_benchmark_metric_worker
    avg_time_per_token = total_time / n_total_completion_tokens
ZeroDivisionError: float division by zero
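The crash happens because `statistic_benchmark_metric_worker` divides the total time by the summed completion-token count, which stays at zero when the server reports no usage. A minimal sketch of a defensive guard (the function below is hypothetical; names mirror the traceback, and the actual evalscope code may differ):

```python
def avg_time_per_token(total_time: float, n_total_completion_tokens: int) -> float:
    """Average latency per generated token; returns 0.0 instead of raising
    ZeroDivisionError when no completion tokens were counted."""
    if n_total_completion_tokens <= 0:
        return 0.0
    return total_time / n_total_completion_tokens

print(avg_time_per_token(2.5, 0))    # → 0.0
print(avg_time_per_token(2.5, 100))  # → 0.025
```

A guard like this would keep the benchmark from aborting, but the reported metric would still be meaningless until the token counting itself is fixed.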
Who can help?
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
- Run qwen2_7b on Triton Server
- Use openai_trtllm to expose the Triton API as an OpenAI-compatible API
- Run the benchmark script
Expected behavior
Calculate the serving performance of the model.
Actual behavior
Throws a ZeroDivisionError.
Additional notes
None

