The normalized_time_per_output_token_seconds metric is not recorded #939

Open
@delavet

Description

What happened:
According to this guide, EPP supports the normalized_time_per_output_token_seconds metric, described as "Distribution of ntpot (response latency per output token)".

However, this metric is not actually recorded in the latest EPP code. The RecordNormalizedTimePerOutputToken function here is called only from unit tests.

func RecordNormalizedTimePerOutputToken(ctx context.Context, modelName, targetModelName string, received time.Time, complete time.Time, outputTokenCount int) bool {

What you expected to happen:

normalized_time_per_output_token_seconds should be recorded and exposed by EPP after a streaming response has been generated.

How to reproduce it (as minimally and precisely as possible):

I discovered this issue while adding e2e tests for metrics. You can refer to my branch (#938). Remove the line marked with TODO and run the e2e tests to observe the problem.

Anything else we need to know?:

I also found problems in the metrics guide itself.

  1. I think the normalized_time_per_output_token_seconds metric mentioned in the document should actually be the inference_model_normalized_time_per_output_token_seconds metric. The prefix for the subsystem is not included here.
  2. The doc says:

"To have response metrics, ensure the body mode is set to Buffered or Streamed (this should be the default behavior for all implementations)."

Is the description here somewhat outdated? As far as I know, EPP now exclusively uses FULL_DUPLEX_STREAMED.
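
To illustrate point 1: Prometheus-style metric names are conventionally formed by joining the subsystem and the metric name with an underscore, so a check for the unprefixed name in the exposed metrics will find nothing. The sketch below uses only the standard library. The helpers fqName and hasMetricFamily are hypothetical, and the subsystem value inference_model is inferred from the full metric name quoted above rather than read from the EPP source.

```go
package main

import (
	"bufio"
	"strings"
)

// fqName is a hypothetical helper that joins a subsystem and a metric
// name with "_", following the usual Prometheus naming convention.
func fqName(subsystem, name string) string {
	if subsystem == "" {
		return name
	}
	return subsystem + "_" + name
}

// hasMetricFamily is a hypothetical helper that reports whether text in
// the Prometheus exposition format declares a metric family with exactly
// the given name, by looking for its "# TYPE <name> <type>" line.
func hasMetricFamily(exposition, family string) bool {
	scanner := bufio.NewScanner(strings.NewReader(exposition))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if strings.HasPrefix(line, "# TYPE "+family+" ") {
			return true
		}
	}
	return false
}
```

With these helpers, fqName("inference_model", "normalized_time_per_output_token_seconds") yields the name an e2e test would need to look for in the scraped metrics text.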

Environment:

  • Kubernetes version (use kubectl version):
  • Inference extension version (use git describe --tags --dirty --always):
  • Cloud provider or hardware configuration:
  • Install tools:
  • Others:

Metadata

Labels

  • good first issue: Denotes an issue ready for a new contributor, according to the "help wanted" guidelines.
  • help wanted: Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines.
  • kind/bug: Categorizes issue or PR as related to a bug.
  • triage/accepted: Indicates an issue or PR is ready to be actively worked on.
