-
Notifications
You must be signed in to change notification settings - Fork 107
refactored updateConnectionMetricCache to report connection-metrics correctly #1377
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR refactors how connection metrics are reported to ensure the connection metric cache is updated correctly. Key changes include:
- Removing the tcpConns parameter from updateConnectionMetricCache and its tests.
- Updating updateConnectionMetricCache to use requestMetric data directly for computing metric differences.
- Changing the access log condition to skip logging based on totalReports instead of a duration threshold.
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
File | Description |
---|---|
pkg/controller/telemetry/metric_test.go | Removed the tcpConns field from the test struct and updated the call to updateConnectionMetricCache. |
pkg/controller/telemetry/metric.go | Refactored updateConnectionMetricCache to remove the tcpConns parameter and the duration threshold check, using data fields instead. |
pkg/controller/telemetry/accesslog.go | Modified the condition for skipping access log output from a duration threshold to checking totalReports. |
@hzxuzhonghu can u review it |
Codecov ReportAttention: Patch coverage is
... and 1 file with indirect coverage changes Continue to review full report in Codecov by Sentry.
🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does not fix. Please take a look at #1378.
I have looked to #1378, this pr will work as successor of it, this pr refactor the updateConnMetricCache code to give correct connectionMetrics @hzxuzhonghu |
Let me have a look, it's great if you could describe the bug a little bit |
pkg/controller/telemetry/metric.go
Outdated
@@ -508,7 +506,7 @@ func (m *MetricController) Run(ctx context.Context, mapOfTcpInfo *ebpf.Map) { | |||
} | |||
|
|||
connectionLabels := connectionMetricLabels{} | |||
if m.EnableConnectionMetric.Load() && data.duration > LONG_CONN_METRIC_THRESHOLD { | |||
if m.EnableConnectionMetric.Load() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you refactor a little more to make it together with L522. We have so many duplicate flag checking in this function
pkg/controller/telemetry/metric.go
Outdated
@@ -908,7 +906,7 @@ func (m *MetricController) updateServiceMetricCache(data requestMetric, labels s | |||
} | |||
} | |||
|
|||
func (m *MetricController) updateConnectionMetricCache(data requestMetric, connData connMetric, labels connectionMetricLabels) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC, this is just a cleanup, because data
contains only delta sent received bytes while connData
contains datas till now
currently busy i will complete this after 1-2 days |
ok, so this may not be back to release-1.1 |
This in needed in release, for correct connection-metrics, can u please wait 2 days for the release, currently having a last exam tomorrow. @hzxuzhonghu |
After review, i thought this is kind of cleanup. Not sure what bug it is fixing. |
pkg/controller/telemetry/metric.go
Outdated
@@ -508,7 +506,7 @@ func (m *MetricController) Run(ctx context.Context, mapOfTcpInfo *ebpf.Map) { | |||
} | |||
|
|||
connectionLabels := connectionMetricLabels{} | |||
if m.EnableConnectionMetric.Load() && data.duration > LONG_CONN_METRIC_THRESHOLD { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you mean we should remove data.duration > LONG_CONN_METRIC_THRESHOLD
, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added this const to only create conn metrics for the connections which have passed the 5 sec duration. But this is not necessary as we already flushing the metrics after 5 sec,
pkg/controller/telemetry/metric.go
Outdated
@@ -47,8 +47,6 @@ const ( | |||
metricFlushInterval = 5 * time.Second | |||
|
|||
DEFAULT_UNKNOWN = "-" | |||
|
|||
LONG_CONN_METRIC_THRESHOLD = uint64(5 * time.Second) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why we add this const before?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added this const to only create conn metrics for the connections which have passed the 5 sec duration. But this is not necessary as we already flushing the metrics after 5 sec,
@hzxuzhonghu can u review and merge |
@yp969803 You can rebase on your branch by |
pkg/controller/telemetry/metric.go
Outdated
@@ -508,7 +506,7 @@ func (m *MetricController) Run(ctx context.Context, mapOfTcpInfo *ebpf.Map) { | |||
} | |||
|
|||
connectionLabels := connectionMetricLabels{} | |||
if m.EnableConnectionMetric.Load() && reqMetric.duration > LONG_CONN_METRIC_THRESHOLD { | |||
if m.EnableConnectionMetric.Load() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we only generate connection metrics for connection duration > 5s?
Otherwise how do you filter out the connection establishment report
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yp969803 Please explain #1377 (comment) |
kmesh/pkg/controller/telemetry/metric.go Line 970 in e908419
filtering long connections at the time of report to prometheus |
I am keeping the information of connection metrics from the start at the cache so I can calculate the correct metrics, |
we can only use delta of connection metrics here, after the 5 sec duration of a connection we have to report the metrics till 5 sec, |
i have to come up with this approach
|
@hzxuzhonghu did u understand !! |
No I do think At the connection establishment, we donot store tcp connection, it doesnot matter much |
i think you are correct it will not matter much, I will refactor it |
b3d0ff2
to
76fa21a
Compare
@hzxuzhonghu can u look now, i have refactored it |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: hzxuzhonghu The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Signed-off-by: Yash Patel <yp969803@gmail.com> rfac: TestMetricController_updatePrometheusMetric Signed-off-by: Yash Patel <yp969803@gmail.com>
@hzxuzhonghu waiting for merge |
/lgtm |
What type of PR is this?
/kind bug
What this PR does / why we need it:
refactored updateConnectionMetricCache to report connection-metrics correctly
Which issue(s) this PR fixes:
Fixes #1370
Special notes for your reviewer:
Does this PR introduce a user-facing change?: