
ZeroDivisionError thrown when running benchmark  #619

@moyerlee

Description


System Info

When executing the benchmark workload below, a ZeroDivisionError is thrown. The LLM generates response tokens, but the metric calculator does not count them correctly.
evalscope perf --url 'http://localhost:3000/v1/chat/completions' --parallel 1 --model 'ensemble' --log-every-n-query 10 --read-timeout=60 --dataset-path '/root/Dataset/open_qa.jsonl' -n 1 --max-prompt-length 1000 --max-tokens 100 --api openai --stop '<|im_end|>' --dataset openqa --debug
System information: x86, L20 GPU, Triton Server 0.11.0, TensorRT-LLM 0.11.0, openai_trtllm 0.21.0

2024-10-14 21:28:31,788 - perf - http_client.py - on_request_start - 54 - INFO - Starting request: <TraceRequestStartParams(method='POST', url=URL('http://localhost:3000/v1/chat/completions'), headers=<CIMultiDict('Content-Type': 'application/json', 'user-agent': 'modelscope_bench')>)>
2024-10-14 21:28:31,790 - perf - http_client.py - on_request_chunk_sent - 58 - INFO - Request body: TraceRequestChunkSentParams(method='POST', url=URL('http://localhost:3000/v1/chat/completions'), chunk=b'{"messages": [{"role": "user", "content": "\u76d7\u8d3c\u5929\u8d4b\u76d7\u8d3c\u600e\u4e48\u52a0\u5929\u8d4b?\u77e5\u9053\u544a\u8bc9\u4e00\u4e0b\u4e0b\u5566~~"}], "model": "ensemble", "max_tokens": 100, "stop": ["<|im_end|>"]}')
2024-10-14 21:28:34,041 - perf - http_client.py - on_response_chunk_received - 62 - INFO - Response info: <TraceResponseChunkReceivedParams(method='POST', url=URL('http://localhost:3000/v1/chat/completions'), chunk=b'{"id":"cmpl-a6097c78-5c47-44e7-8ad6-3da44429b551","object":"text_completion","created":1728912514,"model":"ensemble","system_fingerprint":null,"choices":[{"index":0,"message":{"role":"assistant","content":"User: \xe7\x9b\x97\xe8\xb4\xbc\xe5\xa4\xa9\xe8\xb5\x8b\xe7\x9b\x97\xe8\xb4\xbc\xe6\x80\x8e\xe4\xb9\x88\xe5\x8a\xa0\xe5\xa4\xa9\xe8\xb5\x8b?\xe7\x9f\xa5\xe9\x81\x93\xe5\x91\x8a\xe8\xaf\x89\xe4\xb8\x80\xe4\xb8\x8b\xe4\xb8\x8b\xe5\x95\xa6~~\nASSISTANT: \xe5\xaf\xb9\xe4\xba\x8e\xe7\x9b\x97\xe8\xb4\xbc\xe5\xa4\xa9\xe8\xb5\x8b\xef\xbc\x8c\xe5\xbb\xba\xe8\xae\xae\xe5\x85\x88\xe9\x80\x89\xe6\x8b\xa9\xe2\x80\x9c\xe6\x89\xab\xe8\x8d\xa1\xe9\x81\x97\xe4\xba\xa7\xe2\x80\x9d\xef\xbc\x8c\xe8\x83\xbd\xe5\xa4\x9f\xe6\x8f\x90\xe9\xab\x98\xe5\x81\xb7\xe5\x8f\x96\xe5\xae\x9d\xe7\xae\xb1\xe7\x9a\x84\xe9\x80\x9f\xe5\xba\xa6\xe3\x80\x81\xe6\x89\x93\xe5\xbc\x80\xe6\xa2\x81\xe7\x9a\x84\xe5\x87\xa0\xe7\x8e\x87\xe4\xbb\xa5\xe5\x8f\x8a\xe6\x8c\x82\xe9\xa5\xb0\xe4\xb8\xa2\xe5\xbc\x83\xe7\x9a\x84\xe9\x87\x91\xe9\x92\xb1\xe3\x80\x82\xe5\xb0\x86\xe6\x9b\xb4\xe5\xa4\x9a\xe7\x82\xb9\xe6\x95\xb0\xe6\x8a\x95\xe5\x85\xa5\xe2\x80\x9c\xe8\xb1\xa1\xe7\x89\x99\xe5\xae\x9d\xe5\x89\x91\xe6\x8a\x80\xe5\xb8\x88\xe2\x80\x9d\xef\xbc\x8c\xe8\xbf\x99\xe5\xb0\x86\xe5\xa4\xa7\xe5\xa4\xa7\xe5\xa2\x9e\xe5\x8a\xa0\xe4\xbd\xa0\xe5\xaf\xb9\xe6\x95\x8c\xe4\xba\xba\xe7\x9a\x84\xe5\x8f\x8d\xe4\xbc\xa4\xe5\x92\x8c\xe9\x98\xb2\xe5\xbe\xa1\xe8\xa7\xa3\xe6\x95\xa3\xe5\x87\xa0\xe7\x8e\x87\xe3\x80\x82\xe5\x90\x8c\xe6\x97\xb6\xef\xbc\x8c\xe6\x88\x91\xe4\xbb\xac\xe5\x8f\xaf\xe4\xbb\xa5\xe8\xae\xa9\xe6\x8a\x80\xe8\x89\xba\xe5\x9c\xa8\xe6\x8a\x80\xe8\x83\xbd\xe7\x82\xb9\xe8\xb6\xb3\xe5\xa4\x9f\xe7\x9a\x84\xe6\x83\x85\xe5\x86\xb5\xe4\xb8\x8b\xe8\xa6\x86\xe7\x9b\x96\xe4\xb8\x8a\xe9\x80\x82\xe5\x90\x88\xe4\xbd\xa0\xe7\x9
a\x84\xe5\x85\xb6\xe4\xbb\x96\xe5\xa4\xa9\xe8\xb5\x8b\xe3\x80\x82\xe8\xae\xb0\xe5\xbe\x97\xe4\xb8\x8d\xe6\x96\xad\xe7\xbb\x83\xe4\xb9\xa0\xe6\x93\x8d\xe4\xbd\x9c\xe6\x8a\x80\xe5\xb7\xa7\xef\xbc\x8c\xe5\xb9\xb6\xe9\x80\x82\xe6\x97\xb6\xe4\xbd\xbf\xe7\x94\xa8\xe5\x85\xb3\xe9\x94\xae\xe6\x8a\x80\xe8\x83\xbd\xef\xbc\x8c\xe6\x89\x8d\xe8\x83\xbd\xe5\x9c\xa8\xe6\x88\x98\xe6\x96\x97\xe4\xb8\xad\xe6\x97\xa0\xe5\xbe\x80\xe4\xb8\x8d\xe8\x83\x9c~"},"finish_reason":"stop"}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}')>
2024-10-14 21:28:34,041 - perf - http_client.py - send_requests_worker - 570 - INFO - {"id": "cmpl-a6097c78-5c47-44e7-8ad6-3da44429b551", "object": "text_completion", "created": 1728912514, "model": "ensemble", "system_fingerprint": null, "choices": [{"index": 0, "message": {"role": "assistant", "content": "User: \u76d7\u8d3c\u5929\u8d4b\u76d7\u8d3c\u600e\u4e48\u52a0\u5929\u8d4b?\u77e5\u9053\u544a\u8bc9\u4e00\u4e0b\u4e0b\u5566~~\nASSISTANT: \u5bf9\u4e8e\u76d7\u8d3c\u5929\u8d4b\uff0c\u5efa\u8bae\u5148\u9009\u62e9\u201c\u626b\u8361\u9057\u4ea7\u201d\uff0c\u80fd\u591f\u63d0\u9ad8\u5077\u53d6\u5b9d\u7bb1\u7684\u901f\u5ea6\u3001\u6253\u5f00\u6881\u7684\u51e0\u7387\u4ee5\u53ca\u6302\u9970\u4e22\u5f03\u7684\u91d1\u94b1\u3002\u5c06\u66f4\u591a\u70b9\u6570\u6295\u5165\u201c\u8c61\u7259\u5b9d\u5251\u6280\u5e08\u201d\uff0c\u8fd9\u5c06\u5927\u5927\u589e\u52a0\u4f60\u5bf9\u654c\u4eba\u7684\u53cd\u4f24\u548c\u9632\u5fa1\u89e3\u6563\u51e0\u7387\u3002\u540c\u65f6\uff0c\u6211\u4eec\u53ef\u4ee5\u8ba9\u6280\u827a\u5728\u6280\u80fd\u70b9\u8db3\u591f\u7684\u60c5\u51b5\u4e0b\u8986\u76d6\u4e0a\u9002\u5408\u4f60\u7684\u5176\u4ed6\u5929\u8d4b\u3002\u8bb0\u5f97\u4e0d\u65ad\u7ec3\u4e60\u64cd\u4f5c\u6280\u5de7\uff0c\u5e76\u9002\u65f6\u4f7f\u7528\u5173\u952e\u6280\u80fd\uff0c\u624d\u80fd\u5728\u6218\u6597\u4e2d\u65e0\u5f80\u4e0d\u80dc~"}, "finish_reason": "stop"}], "usage": {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0}}
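The logged response is the root cause: the backend returns non-empty assistant content but reports `"usage": {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0}`, so the benchmark's token counter never increments. A minimal sketch of the mismatch, using a trimmed stand-in for the logged body (the field names follow the OpenAI response schema; the variable names are illustrative):

```python
import json

# Trimmed stand-in for the response body logged above: non-empty assistant
# content, but a usage block that reports zero tokens.
body = json.dumps({
    "choices": [{"message": {"role": "assistant", "content": "some non-empty reply"}}],
    "usage": {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0},
})

resp = json.loads(body)
content = resp["choices"][0]["message"]["content"]
reported = resp["usage"]["completion_tokens"]

# Tokens were clearly generated, yet the server counted none of them.
print(len(content) > 0, reported)
```

Any metric that divides by the summed `completion_tokens` will then divide by zero, regardless of how many requests succeeded.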
Traceback (most recent call last):
File "/usr/local/bin/evalscope", line 8, in
sys.exit(run_cmd())
File "/usr/local/lib/python3.10/dist-packages/evalscope/cli/cli.py", line 21, in run_cmd
cmd.execute()
File "/usr/local/lib/python3.10/dist-packages/evalscope/cli/start_perf.py", line 33, in execute
run_perf_benchmark(self.args)
File "/usr/local/lib/python3.10/dist-packages/evalscope/perf/http_client.py", line 669, in run_perf_benchmark
asyncio.run(benchmark(args))
File "/usr/lib/python3.10/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/usr/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
return future.result()
File "/usr/local/lib/python3.10/dist-packages/evalscope/perf/http_client.py", line 626, in benchmark
avg_time_per_token, result_db_path) = await statistic_benchmark_metric_task
File "/usr/local/lib/python3.10/dist-packages/evalscope/perf/http_client.py", line 407, in statistic_benchmark_metric_worker
avg_time_per_token = total_time / n_total_completion_tokens
ZeroDivisionError: float division by zero
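The failing line divides `total_time` by the accumulated completion-token count, which stays zero because of the empty `usage` block above. A defensive sketch of that average, assuming the same variable names as the traceback (this is a hypothetical guard, not the library's actual fix):

```python
# Hypothetical zero-safe version of the failing line in
# evalscope/perf/http_client.py (statistic_benchmark_metric_worker).
def safe_avg_time_per_token(total_time: float, n_total_completion_tokens: int) -> float:
    """Average time per completion token, or 0.0 when no tokens were counted.

    The backend reported usage.completion_tokens == 0, so a plain division
    raises ZeroDivisionError; guarding keeps the benchmark summary alive.
    """
    if n_total_completion_tokens <= 0:
        return 0.0
    return total_time / n_total_completion_tokens

print(safe_avg_time_per_token(2.25, 0))   # no tokens counted: 0.0 instead of raising
print(safe_avg_time_per_token(2.25, 90))  # normal case
```

A guard like this only masks the symptom; the underlying problem is that the serving stack (openai_trtllm in front of Triton/TensorRT-LLM here) does not populate `usage`, so the benchmark would also need a fallback such as tokenizing the returned content itself.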

Who can help?

@byshiue @schetlur-nv

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

  1. Run qwen2_7b on Triton Server
  2. Use openai_trtllm to expose the Triton API as an OpenAI-compatible API
  3. Run the benchmark script

Expected behavior

Calculate the serving performance of the model

actual behavior

The ZeroDivisionError above is thrown

additional notes

None
