Silence count metric collection #4358

Open
rajagopalanand opened this issue Apr 19, 2025 · 2 comments
Comments

@rajagopalanand
Contributor

rajagopalanand commented Apr 19, 2025

Currently, silence metric collection happens at scrape time. When Alertmanager is under heavy load, lock contention can occur and cause high scrape latency. One such scenario is when there are many aggregation groups and new silences are being added.

Would it be acceptable to collect the silence counts in the background instead of at scrape time? Doing so would reduce scrape latency by removing lock contention from the scrape path. Lock contention can still occur in the background goroutine.

Profile captured during high latency in scraping

-----------+-------------------------------------------------------
             runtime.gopark build/lib/src/runtime/proc.go:424
             runtime.goparkunlock build/lib/src/runtime/proc.go:430 (inline)
             runtime.semacquire1 build/lib/src/runtime/sema.go:178
             sync.runtime_SemacquireMutex build/lib/src/runtime/sema.go:95
             sync.(*Mutex).lockSlow build/lib/src/sync/mutex.go:173
             sync.(*Mutex).Lock build/lib/src/sync/mutex.go:92 (inline)
             sync.(*RWMutex).Lock build/lib/src/sync/rwmutex.go:148
             github.com/prometheus/alertmanager/silence.(*Silences).Query /build/gopath/src/github.com/prometheus/alertmanager/silence/silence.go:797
             github.com/prometheus/alertmanager/silence.(*Silencer).Mutes /build/gopath/src/github.com/prometheus/alertmanager/silence/silence.go:145
             github.com/prometheus/alertmanager/notify.(*MuteStage).Exec /build/gopath/src/github.com/prometheus/alertmanager/notify/notify.go:599
             github.com/prometheus/alertmanager/notify.MultiStage.Exec /build/gopath/src/github.com/prometheus/alertmanager/notify/notify.go:512
             github.com/prometheus/alertmanager/notify.RoutingStage.Exec /build/gopath/src/github.com/prometheus/alertmanager/notify/notify.go:495
             github.com/prometheus/alertmanager/dispatch.(*Dispatcher).processAlert.func1 /build/gopath/src/github.com/prometheus/alertmanager/dispatch/dispatch.go:423
             github.com/prometheus/alertmanager/dispatch.(*aggrGroup).run.func1 /build/gopath/src/github.com/prometheus/alertmanager/dispatch/dispatch.go:548
             github.com/prometheus/alertmanager/dispatch.(*aggrGroup).flush /build/gopath/src/github.com/prometheus/alertmanager/dispatch/dispatch.go:611
             github.com/prometheus/alertmanager/dispatch.(*aggrGroup).run /build/gopath/src/github.com/prometheus/alertmanager/dispatch/dispatch.go:547
-----------+-------------------------------------------------------
             runtime.gopark build/lib/src/runtime/proc.go:424
             runtime.goparkunlock build/lib/src/runtime/proc.go:430 (inline)
             runtime.semacquire1 build/lib/src/runtime/sema.go:178
             sync.runtime_SemacquireMutex build/lib/src/runtime/sema.go:95
             sync.(*Mutex).lockSlow build/lib/src/sync/mutex.go:173
             sync.(*Mutex).Lock build/lib/src/sync/mutex.go:92 (inline)
             sync.(*RWMutex).Lock build/lib/src/sync/rwmutex.go:148
             github.com/prometheus/alertmanager/silence.(*Silences).Query /build/gopath/src/github.com/prometheus/alertmanager/silence/silence.go:797
             github.com/prometheus/alertmanager/silence.(*Silences).CountState /build/gopath/src/github.com/prometheus/alertmanager/silence/silence.go:827
             github.com/prometheus/alertmanager/silence.newSilenceMetricByState.func1 /build/gopath/src/github.com/prometheus/alertmanager/silence/silence.go:242
             github.com/prometheus/client_golang/prometheus.(*valueFunc).Write /build/gopath/src/github.com/prometheus/client_golang/prometheus/value.go:95
             github.com/prometheus/client_golang/prometheus.processMetric /build/gopath/src/github.com/prometheus/client_golang/prometheus/registry.go:633
             github.com/prometheus/client_golang/prometheus.(*Registry).Gather /build/gopath/src/github.com/prometheus/client_golang/prometheus/registry.go:502
-----------+-------------------------------------------------------   

PR to collect silence counts in a separate goroutine
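
For illustration, the background collection proposed above could look roughly like the following sketch. It is not the code from the PR: `silenceStore`, `cachedCounts`, `refresh`, and `run` are hypothetical names, the store is reduced to a slice of state strings, and only the standard library is used. The idea is that a goroutine periodically takes the store lock and caches the counts, so the scrape path only reads atomics.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// silenceStore is a hypothetical stand-in for the silence store: a slice of
// states guarded by a lock that may be contended under load.
type silenceStore struct {
	mtx    sync.RWMutex
	states []string // "active", "pending" or "expired"
}

// countState walks all silences while holding the store lock.
func (s *silenceStore) countState(state string) int64 {
	s.mtx.RLock()
	defer s.mtx.RUnlock()
	var n int64
	for _, st := range s.states {
		if st == state {
			n++
		}
	}
	return n
}

// cachedCounts holds the most recently computed counts.
// Reading them never touches the store lock.
type cachedCounts struct {
	active, pending, expired atomic.Int64
}

// refresh recomputes all counts once. Only this path can block on the lock.
func (c *cachedCounts) refresh(s *silenceStore) {
	c.active.Store(s.countState("active"))
	c.pending.Store(s.countState("pending"))
	c.expired.Store(s.countState("expired"))
}

// run refreshes the cache immediately and then once per interval
// until ctx is cancelled.
func (c *cachedCounts) run(ctx context.Context, s *silenceStore, interval time.Duration) {
	t := time.NewTicker(interval)
	defer t.Stop()
	for {
		c.refresh(s)
		select {
		case <-ctx.Done():
			return
		case <-t.C:
		}
	}
}

func main() {
	store := &silenceStore{states: []string{"active", "active", "pending", "expired"}}
	var cache cachedCounts

	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	go cache.run(ctx, store, 50*time.Millisecond)

	time.Sleep(100 * time.Millisecond) // let at least one refresh complete

	// A scrape handler would read these values without taking the store lock.
	fmt.Println(cache.active.Load(), cache.pending.Load(), cache.expired.Load())
}
```

The trade-off is that scraped values can be up to one interval old, or older if the refresh itself is blocked on the lock.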

@grobinson-grafana
Collaborator

I'm more interested to see if this can be made faster instead of offloading it to a goroutine. There is a comment in CountState:

// This could probably be optimized.

Perhaps first look at how to make it faster? The lock still has to be acquired, so I would assume that under very heavy load, you're just scraping stale silence metrics.

@rajagopalanand
Contributor Author

> I'm more interested to see if this can be made faster instead of offloading it to a goroutine. There is a comment in CountState:
>
> // This could probably be optimized.
>
> Perhaps first look at how to make it faster? The lock still has to be acquired, so I would assume that under very heavy load, you're just scraping stale silence metrics.

I can investigate whether any improvements can be made. I just want to note that counting silences holds up the collection of other metrics too.
