Skip to content

Add metrics on yamux sessions and active AdminStreamReplicationMessages#97

Merged
temporal-nick merged 7 commits intomainfrom
nick/monitormuxes
Jun 23, 2025
Merged

Add metrics on yamux sessions and active AdminStreamReplicationMessages#97
temporal-nick merged 7 commits intomainfrom
nick/monitormuxes

Conversation

@temporal-nick
Copy link
Copy Markdown
Collaborator

What was changed

Added some gauges to report the current state of the underlying yamux sessions and the application-level AdminStreamReplicationMessages connections. The gauges for the yamux session have labels for the config name, local and remote address, and session type.

Why?

This should help us validate that the proxy is alive and running correctly.

Checklist

  1. Part of https://temporalio.atlassian.net/browse/CGS-839

  2. How was this tested:
    No unit tests yet. These changes should help improve observability in test while I backfill them.

@temporal-nick temporal-nick requested a review from a team as a code owner June 20, 2025 23:40
Comment thread transport/mux_connection_manager.go Outdated
Comment thread transport/mux_connection_manager.go Outdated
Comment thread transport/mux_connection_manager.go Outdated
Comment on lines +252 to +256
if session.IsClosed() {
metrics.MuxSessionOpen.WithLabelValues(labels...).Set(0)
metrics.MuxStreamsActive.WithLabelValues(labels...).Set(float64(0))
return
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still need this block? Or will the loop on L258 handle it correctly?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, removed

@temporal-nick temporal-nick merged commit b48d9bc into main Jun 23, 2025
5 checks passed
@temporal-nick temporal-nick deleted the nick/monitormuxes branch June 23, 2025 17:31
}
defer func() {
// This is an async monitor. Don't let it crash the rest of the program if there's a problem
recover()
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

log the result of recover

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do

// This is an async monitor. Don't let it crash the rest of the program if there's a problem
recover()
}()
labels := []string{session.LocalAddr().String(),
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

potentially session can be nil even with the check at line 237. Probably rare but maybe there is a better solution?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Double-checked https://github.com/hashicorp/yamux/blob/master/addr.go#L44, and these are guaranteed non-nil, so long as nothing deliberately deletes session. So I think this should be safe

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants