
Metrics "nv_inference_request_failure" value is always 0 even after getting 5xx at the client side #582

@ajagetia2001

Description


System Info

  • CPU architecture: x86_64
  • GPU: A100 80GB
  • CUDA version: 11
  • TensorRT-LLM version: 0.9.0
  • Triton server version: 2.46.0
  • Model: Llama3-7b

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Deploy a Llama3-7b model on Triton server 2.46.0, then send a request that the server rejects (see the curl call under Additional notes).

Expected behavior

The `nv_inference_request_failure` metric should report a non-zero failure count when the server returns 5xx errors to the client.

Actual behavior

The metric is never updated: it stays at zero even after the server returns 5xx responses.
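For context on "stays at zero": the counter can be read directly from Triton's Prometheus endpoint (`/metrics`, default port 8002). A minimal sketch of parsing a scrape for this metric; the sample text below is illustrative, not real server output:

```python
def get_metric_values(metrics_text, name):
    """Parse Prometheus text-format output and return {labels: value} for one metric."""
    values = {}
    for line in metrics_text.splitlines():
        if line.startswith(name + "{"):
            # Split 'name{labels} value' into its labels and numeric value.
            labels, _, value = line.partition("} ")
            values[labels[len(name) + 1:]] = float(value)
    return values

# Illustrative scrape excerpt -- not real server output.
sample = """\
# HELP nv_inference_request_failure Number of failed inference requests
# TYPE nv_inference_request_failure counter
nv_inference_request_failure{model="ensemble",version="1"} 0
nv_inference_request_success{model="ensemble",version="1"} 42
"""

print(get_metric_values(sample, "nv_inference_request_failure"))
# -> {'model="ensemble",version="1"': 0.0}
```

Against the deployment in this report, the failure entry for `model="ensemble"` stays at 0 no matter how many requests fail.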

Additional notes

```shell
curl --location --request POST \
  'http://sampletritonmodel-triton.genai-a100-mh-prod.fkcloud.in/v2/models/ensemble/generate' \
  --header 'Content-Type: application/json'
```

This returns:

```
{"error":"failed to parse the request JSON buffer: The document is empty. at 0"}
```

Even after getting this error, the failure metric count does not increase.
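The check above can be scripted end to end: scrape the counter, send the empty-body POST from the report, and scrape again. A minimal sketch assuming default localhost ports (8000 for HTTP inference, 8002 for metrics); adjust the URLs and model name for your deployment:

```python
import urllib.error
import urllib.request

# Assumed endpoints -- adjust for your deployment.
GENERATE_URL = "http://localhost:8000/v2/models/ensemble/generate"
METRICS_URL = "http://localhost:8002/metrics"

def extract_failure_count(metrics_text, model="ensemble"):
    """Pull the nv_inference_request_failure value for one model out of a scrape."""
    for line in metrics_text.splitlines():
        if line.startswith("nv_inference_request_failure{") and f'model="{model}"' in line:
            return float(line.rsplit(" ", 1)[1])
    return None

def scrape_failures(model="ensemble"):
    """Fetch the live metrics page and return the failure count for one model."""
    text = urllib.request.urlopen(METRICS_URL).read().decode()
    return extract_failure_count(text, model)

def send_empty_post():
    """Reproduce the report: POST an empty JSON body, which the server rejects."""
    req = urllib.request.Request(
        GENERATE_URL, data=b"", method="POST",
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status, resp.read().decode()
    except urllib.error.HTTPError as e:
        return e.code, e.read().decode()
```

Against a live server, compare `scrape_failures()` before and after `send_empty_post()`; per this report the counter does not increase even though the request errors out.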

Labels

bug