Metric "nv_inference_request_failure" is always 0 even after the client receives 5xx responses #582

### System Info

- CPU architecture: x86_64
- GPU: A100-80GB
- CUDA version: 11
- TensorRT-LLM version: 0.9.0
- Triton server version: 2.46.0
- Model: Llama3-7b

### Who can help?

_No response_

### Information

- [ ] The official example scripts
- [X] My own modified scripts

### Tasks

- [ ] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)

### Reproduction

Deploy a Llama3-7b model on Triton server 2.46.0, then send a request that fails, such as the one under additional notes.


### Expected behavior

The `nv_inference_request_failure` metric should increase when the client receives a 5xx response.

### actual behavior

The value is never updated. It stays at zero even after the server returns 5xx responses.

### additional notes

curl --location --request POST 'http://sampletritonmodel-triton.genai-a100-mh-prod.fkcloud.in/v2/models/ensemble/generate' \
--header 'Content-Type: application/json'
{"error":"failed to parse the request JSON buffer: The document is empty. at 0"}

Even after this error, the failure metric count does not increase.
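
To make the observation easier to reproduce, here is a minimal sketch for reading the counter before and after a failing request. It assumes the Triton metrics endpoint is reachable at `http://localhost:8002/metrics` (the default port, which may differ in this deployment). The helper names `sum_metric` and `fetch_metrics` and the `SAMPLE` text are hypothetical, written by hand for illustration and not taken from real server output.

```python
import urllib.request


def sum_metric(metrics_text: str, name: str) -> float:
    """Sum every sample of `name` across label sets in Prometheus text format."""
    total = 0.0
    for line in metrics_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # Sample lines look like: name{label="v",...} value
        # (assumes no trailing timestamp after the value).
        metric, _, value = line.rpartition(" ")
        if metric == name or metric.startswith(name + "{"):
            total += float(value)
    return total


def fetch_metrics(url: str = "http://localhost:8002/metrics") -> str:
    """Fetch the raw metrics text; the URL is an assumed default, adjust as needed."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.read().decode("utf-8")


# Hand-written sample in Prometheus text format, for illustration only.
SAMPLE = """# TYPE nv_inference_request_failure counter
nv_inference_request_failure{model="ensemble",version="1"} 0
nv_inference_request_success{model="ensemble",version="1"} 12
"""

if __name__ == "__main__":
    print(sum_metric(SAMPLE, "nv_inference_request_failure"))
```

Calling `sum_metric(fetch_metrics(), "nv_inference_request_failure")` once before and once after the failing `curl` request shows whether the counter moved.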
