System Info
- CPU Architecture x86_64
- GPU - A100-80GB
- CUDA version - 11
- Tensorrt LLM version : 0.9.0
- Triton server version - 2.46.0
- model : Llama3-7b
Who can help?
No response
Information
Tasks
Reproduction
deploy a llama3-7b model on triton server 2.46.0
Expected behavior
Expected is to get some failure rate in this metrics when nv_inference_request_failure when getting 5xx at the client side
actual behavior
Currently, this value is not getting updated. It is only showing zero even after the server is giving 5xx
additional notes
curl --location --request POST 'http://sampletritonmodel-triton.genai-a100-mh-prod.fkcloud.in/v2/models/ensemble/generate'
--header 'Content-Type: application/json'
{"error":"failed to parse the request JSON buffer: The document is empty. at 0"}%
After getting this error as well I am not getting failure metric count