Missing Granular Metrics for upstream_addr and upstream_response_time #14374

Open
JDarzan opened this issue Mar 19, 2025 · 2 comments

JDarzan commented Mar 19, 2025

I am using Kong API Gateway Enterprise (via Konnect Hybrid mode) and Self-Managed servers.

My goal is to monitor individual targets within an upstream to diagnose performance bottlenecks. However, I have realized that there is no native way in Kong to capture detailed metrics for each specific target within an upstream. In both Datadog and Prometheus I can see aggregated upstream metrics, but I cannot obtain per-target granularity.

Currently, I can access metrics such as kong.http.requests.count, kong.upstream.latency.ms.bucket, kong.upstream.latency.ms.count, and kong.upstream.latency.ms.sum, as well as general latency metrics like kongdd.upstream_latency.avg and kongdd.upstream_latency.max.
However, all these metrics refer to the upstream as a whole, not to each individual target. The only per-target information available in Prometheus is the kong_upstream_target_health metric, which only indicates whether the target is healthy or unhealthy, without any visibility into the number of requests received or the individual response time.

In Kong’s access logs, both upstream_addr and upstream_response_time appear correctly, confirming that Kong knows which target was used and how long it took to respond. However, there is no native way to convert this information into metrics consumable by Datadog or Prometheus. I have attempted multiple approaches, such as modifying the Datadog plugin to include upstream_addr as a tag, creating a Kong Post-Function plugin to add an X-Upstream-Addr header to the response, and even storing the information in kong.ctx.shared within a Pre-Function plugin, but in all cases, the values were not available in the logging phase.
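
As a stopgap, a log-scraping exporter along the following lines could turn those access-log fields into per-target Prometheus metrics. This is only a rough sketch: the log path, the key="value" log_format it expects, the metric names, and the listen port are all assumptions rather than Kong defaults, and it skips the comma-separated values that upstream_response_time takes when retries occur.

```python
# Hypothetical workaround: tail Kong's Nginx access log and expose
# per-target metrics for Prometheus to scrape. Assumes a custom log_format
# that writes: ... upstream_addr="<addr>" upstream_response_time="<secs>"
import re
import time

from prometheus_client import Counter, Histogram, start_http_server

LOG_PATH = "/usr/local/kong/logs/access.log"  # adjust to your deployment
LINE_RE = re.compile(
    r'upstream_addr="(?P<addr>[^"]+)"\s+'
    r'upstream_response_time="(?P<rt>[\d.]+)"'
)

REQUESTS = Counter(
    "kong_target_requests_total",
    "Requests per upstream target (derived from access logs)",
    ["upstream_addr"],
)
RESPONSE_TIME = Histogram(
    "kong_target_response_time_seconds",
    "Upstream response time per target (derived from access logs)",
    ["upstream_addr"],
)


def follow(path):
    """Yield lines appended to the file, like `tail -f`."""
    with open(path) as f:
        f.seek(0, 2)  # start at the end of the file
        while True:
            line = f.readline()
            if not line:
                time.sleep(0.5)
                continue
            yield line


if __name__ == "__main__":
    start_http_server(9201)  # Prometheus scrape endpoint
    for line in follow(LOG_PATH):
        m = LINE_RE.search(line)
        if not m:
            continue
        addr = m.group("addr")
        REQUESTS.labels(upstream_addr=addr).inc()
        RESPONSE_TIME.labels(upstream_addr=addr).observe(float(m.group("rt")))
```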

Since Kong already has internal access to upstream_addr and upstream_response_time, why is it so difficult to expose them as metrics? The lack of this granularity makes it challenging to monitor individual targets within an upstream, preventing precise identification of specific instances that may be causing performance issues. Is there a technical limitation preventing this feature from being implemented, or is there a recommended approach to efficiently work around this problem within Kong?

chobits (Contributor) commented May 7, 2025

Can Kong Expose These Metrics?

It is technically possible for Kong to expose these metrics. The main effort would involve:

  1. Developing New Metric Collection Logic: Implementing new code within Kong to specifically track and expose upstream_addr and upstream_response_time on a per-target basis.
  2. Integrating with Metric Reporting Plugins: Ensuring that existing or new metric reporting plugins (like Prometheus or Datadog) can access and format this granular data.
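
To make point 2 concrete, the per-target data would need to carry the target address as a label or tag. The snippet below only illustrates that shape on the Datadog side using the dogstatsd client; the metric names and the upstream_addr tag are hypothetical and are not what Kong's Datadog plugin emits today.

```python
# Illustrative sketch of per-target metrics emitted as tagged DogStatsD data.
from datadog import initialize, statsd

initialize(statsd_host="127.0.0.1", statsd_port=8125)


def report_upstream_request(upstream_addr: str, response_time_ms: float) -> None:
    """Emit one request's data tagged by the concrete upstream target."""
    tags = [f"upstream_addr:{upstream_addr}"]
    statsd.increment("kong.target.requests", tags=tags)
    statsd.histogram("kong.target.response_time_ms", response_time_ms, tags=tags)


# Example call, as if Kong had routed the request to target 10.0.0.5:8080:
report_upstream_request("10.0.0.5:8080", 42.7)
```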

Developing this feature will require some effort. Since you're already an Enterprise user, I suggest opening a support ticket with the Enterprise Team if this functionality is critical for you.

While it might seem simple, this small feature involves intricate work and would change the log format, which could be a breaking change for other users.

chobits (Contributor) commented May 7, 2025

I also found a discussion of a similar issue: #8270, which uses a custom plugin to expose custom variables.
