Description
What happened:
As this guide says, the `normalized_time_per_output_token_seconds` metric is supported by EPP and is described as "Distribution of ntpot (response latency per output token)". However, this metric is not actually recorded anywhere in the latest EPP code: the `RecordNormalizedTimePerOutputToken` function here is only called by unit tests.
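For context, here is a minimal sketch of what recording this metric amounts to: NTPOT is the total response latency divided by the number of generated output tokens. The histogram definition, the `recordNTPOT` helper, its signature, and the label and model names below are my assumptions for illustration; the real `RecordNormalizedTimePerOutputToken` in EPP's metrics package may differ.

```go
package main

import (
	"time"

	"github.com/prometheus/client_golang/prometheus"
)

// Hypothetical histogram mirroring the documented metric; the real EPP
// definition (buckets, labels) may differ.
var normalizedTimePerOutputToken = prometheus.NewHistogramVec(
	prometheus.HistogramOpts{
		Subsystem: "inference_model",
		Name:      "normalized_time_per_output_token_seconds",
		Help:      "Distribution of ntpot (response latency per output token).",
	},
	[]string{"model_name", "target_model_name"},
)

// recordNTPOT is a hypothetical helper: it observes total response latency
// divided by the number of generated output tokens. The streaming response
// handler would need to call something like this after the final chunk,
// which is exactly the call that is missing today.
func recordNTPOT(model, targetModel string, received, complete time.Time, outputTokens int) {
	if outputTokens <= 0 {
		return // nothing to normalize by; skip recording
	}
	elapsed := complete.Sub(received).Seconds()
	normalizedTimePerOutputToken.
		WithLabelValues(model, targetModel).
		Observe(elapsed / float64(outputTokens))
}

func main() {
	prometheus.MustRegister(normalizedTimePerOutputToken)

	// Example: a 2s response with 100 output tokens records 0.02s per token.
	received := time.Now().Add(-2 * time.Second)
	recordNTPOT("my-model", "my-model-v1", received, time.Now(), 100)
}
```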
What you expected to happen:
`normalized_time_per_output_token_seconds` should be recorded and exposed by EPP after generating a streaming response.
How to reproduce it (as minimally and precisely as possible):
I actually discovered this issue while writing e2e tests for metrics. You can refer to my branch (#938): remove the line marked with TODO and run the e2e tests to observe the problem. A minimal standalone version of the failing check is sketched below.
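The failing check boils down to scraping the EPP metrics endpoint and looking for the metric name. Here is a standalone sketch of that idea, assuming a placeholder endpoint address (the real e2e test resolves the endpoint from the deployed pod, not from localhost):

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
)

func main() {
	// Placeholder address; the real e2e test resolves the EPP metrics
	// endpoint from the deployed pod/service.
	resp, err := http.Get("http://localhost:9090/metrics")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		panic(err)
	}

	// The metric never shows up, because RecordNormalizedTimePerOutputToken
	// is never called outside unit tests.
	if strings.Contains(string(body), "inference_model_normalized_time_per_output_token_seconds") {
		fmt.Println("metric is exposed")
	} else {
		fmt.Println("metric is missing") // observed behavior today
	}
}
```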
Anything else we need to know?:
I find the guidance documentation for the metrics also to be problematic:

- I think the `normalized_time_per_output_token_seconds` metric mentioned in the document should actually be the `inference_model_normalized_time_per_output_token_seconds` metric; the subsystem prefix is not included there (see the naming sketch after this list).
- The doc says: "To have response metrics, ensure the body mode is set to Buffered or Streamed (this should be the default behavior for all implementations)." Is this description somewhat outdated? As far as I know, EPP now exclusively uses `FULL_DUPLEX_STREAMED`.
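To illustrate the first point: client_golang joins the subsystem into the exposed metric name, so a reader searching the metrics endpoint for the bare documented name will never find it. A quick sketch using `prometheus.BuildFQName`:

```go
package main

import (
	"fmt"

	"github.com/prometheus/client_golang/prometheus"
)

func main() {
	// BuildFQName joins namespace, subsystem, and name with underscores;
	// this is how client_golang derives the exposed metric name.
	fqName := prometheus.BuildFQName("", "inference_model", "normalized_time_per_output_token_seconds")
	fmt.Println(fqName) // inference_model_normalized_time_per_output_token_seconds
}
```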
Environment:

- Kubernetes version (use `kubectl version`):
- Inference extension version (use `git describe --tags --dirty --always`):
- Cloud provider or hardware configuration:
- Install tools:
- Others: