Description
System Info
When executing the benchmark workload below, evalscope throws a ZeroDivisionError. The LLM model does generate response tokens, but the metrics calculator does not count them correctly.
evalscope perf --url 'http://localhost:3000/v1/chat/completions' --parallel 1 --model 'ensemble' --log-every-n-query 10 --read-timeout=60 --dataset-path '/root/Dataset/open_qa.jsonl' -n 1 --max-prompt-length 1000 --max-tokens 100 --api openai --stop '<|im_end|>' --dataset openqa --debug
System information: x86, L20 GPU, Triton Server 0.11.0, TensorRT-LLM 0.11.0, openai_trtllm 0.21.0
2024-10-14 21:28:31,788 - perf - http_client.py - on_request_start - 54 - INFO - Starting request: <TraceRequestStartParams(method='POST', url=URL('http://localhost:3000/v1/chat/completions'), headers=<CIMultiDict('Content-Type': 'application/json', 'user-agent': 'modelscope_bench')>)>
2024-10-14 21:28:31,790 - perf - http_client.py - on_request_chunk_sent - 58 - INFO - Request body: TraceRequestChunkSentParams(method='POST', url=URL('http://localhost:3000/v1/chat/completions'), chunk=b'{"messages": [{"role": "user", "content": "\u76d7\u8d3c\u5929\u8d4b\u76d7\u8d3c\u600e\u4e48\u52a0\u5929\u8d4b?\u77e5\u9053\u544a\u8bc9\u4e00\u4e0b\u4e0b\u5566~~"}], "model": "ensemble", "max_tokens": 100, "stop": ["<|im_end|>"]}')
2024-10-14 21:28:34,041 - perf - http_client.py - on_response_chunk_received - 62 - INFO - Response info: <TraceResponseChunkReceivedParams(method='POST', url=URL('http://localhost:3000/v1/chat/completions'), chunk=b'{"id":"cmpl-a6097c78-5c47-44e7-8ad6-3da44429b551","object":"text_completion","created":1728912514,"model":"ensemble","system_fingerprint":null,"choices":[{"index":0,"message":{"role":"assistant","content":"User: \xe7\x9b\x97\xe8\xb4\xbc\xe5\xa4\xa9\xe8\xb5\x8b\xe7\x9b\x97\xe8\xb4\xbc\xe6\x80\x8e\xe4\xb9\x88\xe5\x8a\xa0\xe5\xa4\xa9\xe8\xb5\x8b?\xe7\x9f\xa5\xe9\x81\x93\xe5\x91\x8a\xe8\xaf\x89\xe4\xb8\x80\xe4\xb8\x8b\xe4\xb8\x8b\xe5\x95\xa6~~\nASSISTANT: \xe5\xaf\xb9\xe4\xba\x8e\xe7\x9b\x97\xe8\xb4\xbc\xe5\xa4\xa9\xe8\xb5\x8b\xef\xbc\x8c\xe5\xbb\xba\xe8\xae\xae\xe5\x85\x88\xe9\x80\x89\xe6\x8b\xa9\xe2\x80\x9c\xe6\x89\xab\xe8\x8d\xa1\xe9\x81\x97\xe4\xba\xa7\xe2\x80\x9d\xef\xbc\x8c\xe8\x83\xbd\xe5\xa4\x9f\xe6\x8f\x90\xe9\xab\x98\xe5\x81\xb7\xe5\x8f\x96\xe5\xae\x9d\xe7\xae\xb1\xe7\x9a\x84\xe9\x80\x9f\xe5\xba\xa6\xe3\x80\x81\xe6\x89\x93\xe5\xbc\x80\xe6\xa2\x81\xe7\x9a\x84\xe5\x87\xa0\xe7\x8e\x87\xe4\xbb\xa5\xe5\x8f\x8a\xe6\x8c\x82\xe9\xa5\xb0\xe4\xb8\xa2\xe5\xbc\x83\xe7\x9a\x84\xe9\x87\x91\xe9\x92\xb1\xe3\x80\x82\xe5\xb0\x86\xe6\x9b\xb4\xe5\xa4\x9a\xe7\x82\xb9\xe6\x95\xb0\xe6\x8a\x95\xe5\x85\xa5\xe2\x80\x9c\xe8\xb1\xa1\xe7\x89\x99\xe5\xae\x9d\xe5\x89\x91\xe6\x8a\x80\xe5\xb8\x88\xe2\x80\x9d\xef\xbc\x8c\xe8\xbf\x99\xe5\xb0\x86\xe5\xa4\xa7\xe5\xa4\xa7\xe5\xa2\x9e\xe5\x8a\xa0\xe4\xbd\xa0\xe5\xaf\xb9\xe6\x95\x8c\xe4\xba\xba\xe7\x9a\x84\xe5\x8f\x8d\xe4\xbc\xa4\xe5\x92\x8c\xe9\x98\xb2\xe5\xbe\xa1\xe8\xa7\xa3\xe6\x95\xa3\xe5\x87\xa0\xe7\x8e\x87\xe3\x80\x82\xe5\x90\x8c\xe6\x97\xb6\xef\xbc\x8c\xe6\x88\x91\xe4\xbb\xac\xe5\x8f\xaf\xe4\xbb\xa5\xe8\xae\xa9\xe6\x8a\x80\xe8\x89\xba\xe5\x9c\xa8\xe6\x8a\x80\xe8\x83\xbd\xe7\x82\xb9\xe8\xb6\xb3\xe5\xa4\x9f\xe7\x9a\x84\xe6\x83\x85\xe5\x86\xb5\xe4\xb8\x8b\xe8\xa6\x86\xe7\x9b\x96\xe4\xb8\x8a\xe9\x80\x82\xe5\x90\x88\xe4\xbd\xa0\xe7\x9
a\x84\xe5\x85\xb6\xe4\xbb\x96\xe5\xa4\xa9\xe8\xb5\x8b\xe3\x80\x82\xe8\xae\xb0\xe5\xbe\x97\xe4\xb8\x8d\xe6\x96\xad\xe7\xbb\x83\xe4\xb9\xa0\xe6\x93\x8d\xe4\xbd\x9c\xe6\x8a\x80\xe5\xb7\xa7\xef\xbc\x8c\xe5\xb9\xb6\xe9\x80\x82\xe6\x97\xb6\xe4\xbd\xbf\xe7\x94\xa8\xe5\x85\xb3\xe9\x94\xae\xe6\x8a\x80\xe8\x83\xbd\xef\xbc\x8c\xe6\x89\x8d\xe8\x83\xbd\xe5\x9c\xa8\xe6\x88\x98\xe6\x96\x97\xe4\xb8\xad\xe6\x97\xa0\xe5\xbe\x80\xe4\xb8\x8d\xe8\x83\x9c~"},"finish_reason":"stop"}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}')>
2024-10-14 21:28:34,041 - perf - http_client.py - send_requests_worker - 570 - INFO - {"id": "cmpl-a6097c78-5c47-44e7-8ad6-3da44429b551", "object": "text_completion", "created": 1728912514, "model": "ensemble", "system_fingerprint": null, "choices": [{"index": 0, "message": {"role": "assistant", "content": "User: \u76d7\u8d3c\u5929\u8d4b\u76d7\u8d3c\u600e\u4e48\u52a0\u5929\u8d4b?\u77e5\u9053\u544a\u8bc9\u4e00\u4e0b\u4e0b\u5566~~\nASSISTANT: \u5bf9\u4e8e\u76d7\u8d3c\u5929\u8d4b\uff0c\u5efa\u8bae\u5148\u9009\u62e9\u201c\u626b\u8361\u9057\u4ea7\u201d\uff0c\u80fd\u591f\u63d0\u9ad8\u5077\u53d6\u5b9d\u7bb1\u7684\u901f\u5ea6\u3001\u6253\u5f00\u6881\u7684\u51e0\u7387\u4ee5\u53ca\u6302\u9970\u4e22\u5f03\u7684\u91d1\u94b1\u3002\u5c06\u66f4\u591a\u70b9\u6570\u6295\u5165\u201c\u8c61\u7259\u5b9d\u5251\u6280\u5e08\u201d\uff0c\u8fd9\u5c06\u5927\u5927\u589e\u52a0\u4f60\u5bf9\u654c\u4eba\u7684\u53cd\u4f24\u548c\u9632\u5fa1\u89e3\u6563\u51e0\u7387\u3002\u540c\u65f6\uff0c\u6211\u4eec\u53ef\u4ee5\u8ba9\u6280\u827a\u5728\u6280\u80fd\u70b9\u8db3\u591f\u7684\u60c5\u51b5\u4e0b\u8986\u76d6\u4e0a\u9002\u5408\u4f60\u7684\u5176\u4ed6\u5929\u8d4b\u3002\u8bb0\u5f97\u4e0d\u65ad\u7ec3\u4e60\u64cd\u4f5c\u6280\u5de7\uff0c\u5e76\u9002\u65f6\u4f7f\u7528\u5173\u952e\u6280\u80fd\uff0c\u624d\u80fd\u5728\u6218\u6597\u4e2d\u65e0\u5f80\u4e0d\u80dc~"}, "finish_reason": "stop"}], "usage": {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0}}
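Note that in the response above the server returns `"usage": {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0}` even though `choices[0].message.content` is non-empty. A minimal sketch of a workaround (the helper `completion_tokens` is hypothetical, not part of evalscope) that detects the zeroed usage and falls back to a rough local count:

```python
import json

def completion_tokens(response_body: str) -> int:
    """Return the completion-token count reported by the server, falling
    back to a crude whitespace-based estimate when the server reports zero."""
    data = json.loads(response_body)
    usage = data.get("usage") or {}
    reported = usage.get("completion_tokens", 0)
    if reported > 0:
        return reported
    # Fallback: estimate from the generated text. A real implementation
    # should use the model's own tokenizer instead of str.split().
    text = data["choices"][0]["message"]["content"]
    return len(text.split())

body = json.dumps({
    "choices": [{"message": {"content": "hello world from the model"}}],
    "usage": {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0},
})
print(completion_tokens(body))  # → 5
```

This suggests the root cause may be on the openai_trtllm side (usage accounting not populated) rather than in evalscope itself.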
Traceback (most recent call last):
  File "/usr/local/bin/evalscope", line 8, in <module>
    sys.exit(run_cmd())
  File "/usr/local/lib/python3.10/dist-packages/evalscope/cli/cli.py", line 21, in run_cmd
    cmd.execute()
  File "/usr/local/lib/python3.10/dist-packages/evalscope/cli/start_perf.py", line 33, in execute
    run_perf_benchmark(self.args)
  File "/usr/local/lib/python3.10/dist-packages/evalscope/perf/http_client.py", line 669, in run_perf_benchmark
    asyncio.run(benchmark(args))
  File "/usr/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/usr/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/usr/local/lib/python3.10/dist-packages/evalscope/perf/http_client.py", line 626, in benchmark
    avg_time_per_token, result_db_path) = await statistic_benchmark_metric_task
  File "/usr/local/lib/python3.10/dist-packages/evalscope/perf/http_client.py", line 407, in statistic_benchmark_metric_worker
    avg_time_per_token = total_time / n_total_completion_tokens
ZeroDivisionError: float division by zero
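The crash happens because `statistic_benchmark_metric_worker` divides the total time by the summed completion-token count, which stays at zero when the server reports no usage. A minimal sketch of a defensive guard (the function below is hypothetical; names mirror the traceback, and the actual evalscope code may differ):

```python
def avg_time_per_token(total_time: float, n_total_completion_tokens: int) -> float:
    """Average latency per generated token; returns 0.0 instead of raising
    ZeroDivisionError when no completion tokens were counted."""
    if n_total_completion_tokens <= 0:
        return 0.0
    return total_time / n_total_completion_tokens

print(avg_time_per_token(2.5, 0))    # → 0.0
print(avg_time_per_token(2.5, 100))  # → 0.025
```

A guard like this would keep the benchmark from aborting, but the reported metric would still be meaningless until the token counting itself is fixed.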
Who can help?
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
- Run qwen2_7b on Triton Server
- Use openai_trtllm to expose the Triton API as an OpenAI-compatible API
- Run the benchmark script
Expected behavior
Calculate the serving performance of the model.
Actual behavior
Throws a ZeroDivisionError.
Additional notes
None

