metrics: refine batch client metrics #1931
Conversation
Signed-off-by: zyguan <zhongyangguan@gmail.com>
📝 Walkthrough

Per-store batch-request stage metrics were added and wired through batch client paths: entries record atomic batched/sent/recv timestamps.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client as Client
    participant ConnPool as ConnPool
    participant StoreMetrics as StoreMetrics
    participant BatchClient as BatchClient
    participant BatchEntry as BatchEntry
    participant Observer as Observer
    rect rgba(100,200,150,0.5)
        Note over Client,ConnPool: Start request — fetch per-store metrics
        Client->>ConnPool: sendRequest(req)
        ConnPool->>StoreMetrics: getStoreMetrics(storeID)
        ConnPool-->>Client: storeMetrics (includes batchReqStage)
    end
    rect rgba(150,150,200,0.5)
        Note over Client,BatchEntry: Create entry and record batched stage
        Client->>BatchEntry: create entry + attach storeMetrics.batchReqStage
        Client->>BatchEntry: recordBatchRequestStage(batched)
    end
    rect rgba(200,150,150,0.5)
        Note over Client,BatchClient: Send path records sent stage
        Client->>BatchClient: send via stream (connIdx metadata)
        BatchClient->>BatchEntry: recordBatchRequestStage(sent)
    end
    rect rgba(150,200,150,0.5)
        Note over BatchEntry,Observer: Recv path records recv and emits observations
        BatchClient->>BatchEntry: receive response -> recordBatchRequestStage(recv)
        BatchEntry->>Observer: visitBatchRequestObservations -> observe(stage,outcome,duration)
    end
```
Force-pushed from 80b2be9 to 723f512
Signed-off-by: zyguan <zhongyangguan@gmail.com>
Force-pushed from 723f512 to 4b701c5
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/client/client.go`:
- Around line 342-346: The deferred call currently invokes the removed
connPool.updateRPCMetrics and must instead call the store-specific metrics
updater; capture elapsed as already done and replace
connPool.updateRPCMetrics(req, resp, elapsed) with
storeMetrics.updateRPCMetrics(req, resp, elapsed) so the sync path delegates to
the same storeMetrics implementation used by the async path (see
connPool.getStoreMetrics, start, req, resp, elapsed, and
storeMetrics.updateRPCMetrics).
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 48eee403-94f7-4d36-bd24-b24f75b094c3
📒 Files selected for processing (8)
- internal/client/client.go
- internal/client/client_async.go
- internal/client/client_batch.go
- internal/client/client_test.go
- internal/client/conn_batch.go
- internal/client/conn_pool.go
- metrics/metrics.go
- metrics/shortcuts.go
💤 Files with no reviewable changes (1)
- metrics/shortcuts.go
/retest
🧹 Nitpick comments (1)
internal/client/client.go (1)
**833-883:** Consider documenting that non-OK `done` stage outcomes are intentionally not observed.

The `observe` method only records `doneOK` for the `batchRequestStageDone` stage (lines 875-878), silently ignoring other outcomes. This appears intentional since failed requests would have already recorded their failure in a prior stage, but a brief comment would clarify this design choice for future maintainers.

💡 Suggested comment addition

```diff
 case batchRequestStageDone:
+	// Only OK outcome is recorded for done stage; failure outcomes are captured in earlier stages
 	if outcome == batchRequestOutcomeOK {
 		observer = m.doneOK
 	}
```
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 4bae2e87-4e1b-47fa-80a3-89d52300aea9
📒 Files selected for processing (8)
- internal/client/client.go
- internal/client/client_async.go
- internal/client/client_batch.go
- internal/client/client_test.go
- internal/client/conn_batch.go
- internal/client/conn_pool.go
- metrics/metrics.go
- metrics/shortcuts.go
💤 Files with no reviewable changes (1)
- metrics/shortcuts.go
🚧 Files skipped from review as they are similar to previous changes (3)
- internal/client/client_test.go
- internal/client/client_async.go
- internal/client/client_batch.go
Signed-off-by: zyguan <zhongyangguan@gmail.com>
Actionable comments posted: 1
🧹 Nitpick comments (1)
internal/client/client_batch.go (1)
**252-257:** String-based error matching is fragile.

Comparing `cause.Error()` against literal strings couples this code to exact error message text. If the error messages at creation sites change, this matching silently fails and falls through to `batchRequestOutcomeFailed`.

Consider defining sentinel errors and comparing by identity:

♻️ Suggested refactor using sentinel errors

```diff
+var (
+	errBatchConnClosed   = errors.New("batchConn closed")
+	errBatchClientClosed = errors.New("batch client closed")
+)
+
 func batchRequestTerminalOutcome(err error) batchRequestOutcome {
 	if err == nil {
 		return batchRequestOutcomeOK
 	}
 	cause := errors.Cause(err)
 	switch cause {
 	case context.DeadlineExceeded:
 		return batchRequestOutcomeTimeout
 	case context.Canceled:
 		return batchRequestOutcomeCanceled
+	case errBatchConnClosed, errBatchClientClosed:
+		return batchRequestOutcomeClosed
+	default:
+		return batchRequestOutcomeFailed
 	}
-
-	switch cause.Error() {
-	case "batchConn closed", "batch client closed":
-		return batchRequestOutcomeClosed
-	default:
-		return batchRequestOutcomeFailed
-	}
 }
```

Then update the error creation sites (e.g., line 1217, 1236, and in `failAsyncRequestsOnClose`) to use these sentinels.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/client/client_batch.go`:
- Around line 1044-1046: The canceled entry tail latency can be inflated when
entry.sentNS is still zero due to a race between send() and batchRecvLoop;
update the logic around canceledEntryTailLat.Observe to load entry.sentNS into a
local variable (e.g., sent := entry.sentNS.Load()), check that sent != 0 (or
sent < recvNS) and only observe (float64(recvNS-sent)/1e9) when sent is
non-zero/valid; otherwise skip the observation (or record a separate "unknown"
metric) so you don't record recvNS/1e9 when sentNS is uninitialized. Use the
existing symbols entry.sentNS, canceledEntryTailLat, batchRecvLoop and send() to
locate where to add this guard.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: bee95a27-8d11-420d-bdc7-a5c6afb8d197
📒 Files selected for processing (2)
- internal/client/client_batch.go
- metrics/metrics.go
🚧 Files skipped from review as they are similar to previous changes (1)
- metrics/metrics.go
Signed-off-by: zyguan <zhongyangguan@gmail.com>
```go
for _, requestID := range req.RequestIds {
	value, ok := c.batched.Load(requestID)
	if !ok {
		continue
```
Does it mean the request was responded to very quickly? Will it introduce any confusion if we skip updating entry.sentNS?
Also, I see a similar skip in failRequestsByIDs:

```go
func (c *batchCommandsClient) failRequestsByIDs(err error, requestIDs []uint64) {
	for _, requestID := range requestIDs {
		value, ok := c.batched.Load(requestID)
		if !ok {
			continue
		}
		c.failRequest(err, requestID, value.(*batchCommandsEntry))
	}
}
```
Do we need to assert `intest.Assert(ok)` here? And if not ok, maybe something is wrong; maybe we can also output some logs.
```go
batchedNS atomic.Int64
sentNS    atomic.Int64
recvNS    atomic.Int64
```
Better to have some comments about these NS fields, i.e., the interval definition, unit, etc.
```go
// this maybe caused by batchCommandsClient#send meets ambiguous error that request has be sent to TiKV but still report a error.
// then TiKV will send response back though stream and reach here.
outdatedRespCount.Inc()
logutil.BgLogger().Warn("batchRecvLoop receives outdated response", zap.Uint64("requestID", requestID), zap.String("forwardedHost", streamClient.forwardedHost))
```
Better to add `conn` to the log fields
```go
}

func recordBatchRequestStage(stage *atomic.Int64, start, now time.Time) int64 {
	elapsed := batchRequestElapsedNS(start, now)
```
Could we set elapsed to max(1, elapsed) to distinguish "not set" from a very small duration?
🧹 Nitpick comments (1)
internal/client/client_batch.go (1)
**251-257:** String-based error matching is fragile and requires synchronization across multiple creation sites.

Comparing `cause.Error()` against string literals will silently break if error messages change in any of the 6 locations where "batchConn closed" and "batch client closed" are created. Consider defining sentinel errors at the package level and using identity comparison:

♻️ Suggested approach

Define sentinels once:

```go
var (
	errBatchConnClosed   = errors.New("batchConn closed")
	errBatchClientClosed = errors.New("batch client closed")
)
```

Replace all six `errors.New()` calls in `client_async.go` and `client_batch.go` with these sentinels, then update the matching logic:

```go
func batchRequestTerminalOutcome(err error) batchRequestOutcome {
	if err == nil {
		return batchRequestOutcomeOK
	}
	cause := errors.Cause(err)
	switch cause {
	case context.DeadlineExceeded:
		return batchRequestOutcomeTimeout
	case context.Canceled:
		return batchRequestOutcomeCanceled
	case errBatchConnClosed, errBatchClientClosed:
		return batchRequestOutcomeClosed
	default:
		return batchRequestOutcomeFailed
	}
}
```
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: baae8de4-988c-4204-b709-8593d4603f64
📒 Files selected for processing (3)
- internal/client/client.go
- internal/client/client_batch.go
- internal/client/conn_pool.go
✅ Files skipped from review due to trivial changes (1)
- internal/client/conn_pool.go
🚧 Files skipped from review as they are similar to previous changes (1)
- internal/client/client.go
Signed-off-by: zyguan <zhongyangguan@gmail.com>
Actionable comments posted: 1
🧹 Nitpick comments (1)
internal/client/client_batch.go (1)
**238-257:** Use sentinel errors instead of string matching for close classification.

The `batchRequestTerminalOutcome` function maps literal error strings to the `closed` outcome. If any error message is changed at the return sites (lines 898, 1239, 1258 in client_batch.go or lines 159, 177, 184 in client_async.go), the metric will silently mislabel those requests as `failed` instead of `closed`. Replace the strings with a shared sentinel error variable to prevent this fragility.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/client/client_batch.go`:
- Around line 1005-1009: The code currently uses recvLoopStartTime as the start
for both the "recv" and "process" measurements so the process metric
double-counts recv time; after computing recvDur and calling
recvDur.Observe(...) reset the timer (e.g., set processStart := time.Now()) and
use that new timestamp to measure/process duration
(processDur.Observe(processTime.Seconds())), ensuring recvLoopStartTime is only
used for the "recv" series; update locations around recvLoopStartTime,
respRecvTime, recvDur.Observe and the process measurement to use the new
processStart.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 016ed5e4-c982-48b1-a722-033b3591c784
📒 Files selected for processing (2)
- internal/client/client_batch.go
- internal/client/client_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
- internal/client/client_test.go
Signed-off-by: zyguan <zhongyangguan@gmail.com>
/retest
Signed-off-by: zyguan <zhongyangguan@gmail.com>
Signed-off-by: zyguan <zhongyangguan@gmail.com>
…ient-metrics Signed-off-by: zyguan <zhongyangguan@gmail.com>
Signed-off-by: zyguan <zhongyangguan@gmail.com>
/retest

@lcwangchao @ekexium Could you PTAL?
Signed-off-by: zyguan <zhongyangguan@gmail.com>
Signed-off-by: zyguan <zhongyangguan@gmail.com>
Signed-off-by: zyguan <zhongyangguan@gmail.com>
@hbisheng Could you PTAL?
ekexium
left a comment
LGTM overall.
Have you run any benchmarks to check for performance regressions?
hbisheng
left a comment
I reviewed the logic that sets ClientSendTimeNs before sending and measures recv_ts - resp.GetTikvSendTimeNs() after receiving, and it LGTM.
I also ran an AI-assisted review, which didn't surface any critical issues. A few minor suggestions it raised:
- **Cross-node clock assumptions:** This relies on client/TiKV wall-clock sync (`UnixNano()` vs `TikvSendTimeNs`). Consider documenting this assumption and clamping or ignoring obvious skew (e.g. negative or extreme values) to avoid metric pollution.
- **Potential panic risk:** In `batchRecvLoop()`, `batchState := entry.batchState.Load()` is dereferenced based on an invariant. A defensive nil check (warn + continue) could improve robustness against future regressions.
- **Metrics stability / label cardinality:** This introduces `batch_stream_*` metrics with new labels. Please confirm `target` is a stable store address (avoid peer or ephemeral port strings causing high cardinality), and clarify whether any existing metrics are being replaced or renamed (call out if this is a breaking change).
Signed-off-by: zyguan <zhongyangguan@gmail.com>
…ient-metrics Signed-off-by: zyguan <zhongyangguan@gmail.com>
```go
epoch := atomic.LoadUint64(&c.epoch)
for {
```
```go
atomic.StoreInt64(&entry.recvLat, int64(respRecvTime.Sub(entry.start)))
batchState := entry.batchState.Load()
batchState.firstRespAfterBatchedNS.CompareAndSwap(0, max(respRecvTime.Sub(batchState.batchedAt).Nanoseconds(), int64(1)))
entry.recvAfterStartNS.CompareAndSwap(0, max(respRecvTime.Sub(entry.start).Nanoseconds(), int64(1)))
```
Better to also record tikvSendTailLat here?
I prefer not to: all durations reported by EntryProgress are observed on the TiDB side; however, tikvSendTailLat is an estimated duration calculated from cross-node wall clocks. Tracking it via a metric is okay, but printing the exact values to the log can be misleading.
I think the benefit is: if only part of a batch's requests have been responded to, all requests in the batch can still get the estimated TiKV latency, so we can confirm whether a timed-out request has arrived at TiKV. Compared with this, metrics only display the whole picture without per-request detail.
To reduce the risk of misleading output, we can just record two fields here:

- Whether this request is confirmed to have arrived at TiKV.
- Only if the request arrived at TiKV too late (for example, tikvTime - tidbTime > 5s), we record the latency. Otherwise, the slow log does not display it.
- Whether this request is confirmed to have arrived at TiKV.
It's already included in this PR, please search "ack:" for details.
- Only if the request arrived at TiKV too late (for example, tikvTime - tidbTime > 5s), we record the latency. Otherwise, the slow log does not display it.
I wonder how you would like to use tikvSendTailLat if it's recorded as a field in the entry? Log it? When?
This info can be recorded in the error when waiting for the recvLoop to time out:
client-go/internal/client/client_batch.go, lines 969 to 978 (at d1ffcbb)
For example, if a batch contains 10 requests, as one of the requests responds, all the requests in this batch can be updated.
For example, if a batch contains 10 requests, as one of the requests responds, all the requests in this batch can be updated.
It's already included, please search "ack:" as I mentioned in #1931 (comment), and I think this is not the tikvSendTailLat you originally mentioned, BTW.
If you mean recording the tidbRecvTimeNs - tikvSendTimeNs of the very first response returned within the batch into the current timed-out request, wouldn't that be even more misleading?
My point is not that every request should log this. Rather, when tidbRecvTimeNs - tikvSendTimeNs is unusually large, for example greater than 5000000000 (5s), we could add a new log field, slowTiKVResponse, to print both tidbRecvTimeNs and tikvSendTimeNs. We do not need to record this in every entry. It is sufficient to log it only when the condition is met, so that we can retrieve this information when needed.
But it's still fine for me to approve this PR because we added a lot of statistics in this PR. Feel free to ask me for approval once you think this PR is ready to merge.
f44bd00 PTAL. I would like to emphasize a crucial point: just because something does not typically occur does not mean it cannot happen. In this context, the condition for triggering log entries should not be based solely on tidbRecvTimeNs - tikvSendTimeNs exceeding a specific threshold; otherwise, should a significant clock skew arise between nodes, it would result in a massive volume of false positive log entries.
```go
if dur := sendLoopEndTime.Sub(headRecvTime); dur > batchSendTailLatThreshold {
	a.metrics.batchSendTailLat.Observe(dur.Seconds())
}
if sendLoopEndTime.Sub(lastPendingInspectAt) >= batchRequestInspectInterval {
```
I think we can also call inspectPendingRequests at line 233, after the loop exits for the idle reason.
Signed-off-by: zyguan <zhongyangguan@gmail.com>
/retest
Signed-off-by: zyguan <zhongyangguan@gmail.com>
```go
	return false
}
requestID := entry.requestID.Load()
return requestID > 0 && requestID < stream.maxRespReqID.Load()
```
Should it be requestID <= stream.maxRespReqID.Load()?
Yes. Since the request whose ID equals maxRespReqID must have firstRespAfterSendStartNS, using < here won't lead to that request being considered unconfirmed; however, to make the function semantics more accurate, it would be better to use <=.
…ient-metrics Signed-off-by: zyguan <zhongyangguan@gmail.com>
Signed-off-by: zyguan <zhongyangguan@gmail.com>
/retest
Signed-off-by: zyguan <zhongyangguan@gmail.com>
/retest
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cfzjywxk, ekexium

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Details: Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.


This PR refines the TiKV batch client metrics system:

- `TiKVBatchRecvLoopDuration` becomes `TiKVBatchStreamRecvLoopDuration`, with additional labels (conn, forward) for better granularity.
- `TiKVBatchRequestDuration` is renamed to `TiKVBatchRequestStageDuration`, with an added stage label to break down latency by request lifecycle stages.