rule: fix panic when calling API /api/v1/rules?type=alert #6189

Merged
merged 1 commit into from Mar 7, 2023

thib-ack (Contributor) commented Mar 6, 2023

Hello,

Sometimes, when I open the /alerts page in the Ruler web UI, I get an error. Once it happens, the Ruler does not recover and I have to restart the component completely. As you can see in the stack trace below, the error looks like an old issue fixed in #2925. After digging, I think it is linked to the AlertInstance.ActiveAt date, which is not converted with UTC() before protobuf encoding, unlike the fields fixed in that earlier PR.

I tried to write a test case to reproduce this, but unfortunately I could not find the exact combination of events.
I think this has something to do with the Ruler reload process (either SIGHUP or POST /-/reload), which might be copying the alerts and losing or changing the time.Time Locations (which is the real problem with protobuf encoding).

Here is the full stack trace from the server:

2023/03/03 14:48:44 http: panic serving 10.10.90.171:64321: merger not found for type:int
goroutine 18109 [running]:
net/http.(*conn).serve.func1()
#011/usr/lib/go-1.19/src/net/http/server.go:1850 +0xbf
panic({0x2184f60, 0xc001afbc50})
#011/usr/lib/go-1.19/src/runtime/panic.go:890 +0x262
github.com/gogo/protobuf/proto.(*mergeInfo).computeMergeInfo(0xc001d70d80)
#011/home/jenkins/go/pkg/mod/github.com/gogo/protobuf@v1.3.2/proto/table_merge.go:662 +0xe85
github.com/gogo/protobuf/proto.(*mergeInfo).merge(0xc001d70d80, {0xc0001029c0?}, {0x2145420?})
#011/home/jenkins/go/pkg/mod/github.com/gogo/protobuf@v1.3.2/proto/table_merge.go:113 +0x58
github.com/gogo/protobuf/proto.(*mergeInfo).computeMergeInfo.func27({0x40d95f?}, {0x3f5560?})
#011/home/jenkins/go/pkg/mod/github.com/gogo/protobuf@v1.3.2/proto/table_merge.go:545 +0x165
github.com/gogo/protobuf/proto.(*mergeInfo).merge(0xc001d70d00, {0xc0012f01c0?}, {0x25bf1e0?})
#011/home/jenkins/go/pkg/mod/github.com/gogo/protobuf@v1.3.2/proto/table_merge.go:139 +0x305
github.com/gogo/protobuf/proto.(*mergeInfo).computeMergeInfo.func30({0x40d95f?}, {0xc000670a20?})
#011/home/jenkins/go/pkg/mod/github.com/gogo/protobuf@v1.3.2/proto/table_merge.go:587 +0x8b
github.com/gogo/protobuf/proto.(*mergeInfo).merge(0xc001d70940, {0xc000b8f278?}, {0x94cd06?})
#011/home/jenkins/go/pkg/mod/github.com/gogo/protobuf@v1.3.2/proto/table_merge.go:139 +0x305
github.com/gogo/protobuf/proto.(*mergeInfo).computeMergeInfo.func30({0x40d95f?}, {0xc000596340?})
#011/home/jenkins/go/pkg/mod/github.com/gogo/protobuf@v1.3.2/proto/table_merge.go:587 +0x8b
github.com/gogo/protobuf/proto.(*mergeInfo).merge(0xc001d70a40, {0xc001e084e0?}, {0x1?})
#011/home/jenkins/go/pkg/mod/github.com/gogo/protobuf@v1.3.2/proto/table_merge.go:139 +0x305
github.com/gogo/protobuf/proto.(*mergeInfo).computeMergeInfo.func29({0x1609a0?}, {0x0?})
#011/home/jenkins/go/pkg/mod/github.com/gogo/protobuf@v1.3.2/proto/table_merge.go:567 +0xf2
github.com/gogo/protobuf/proto.(*mergeInfo).merge(0xc001d70980, {0x2356140?}, {0x4?})
#011/home/jenkins/go/pkg/mod/github.com/gogo/protobuf@v1.3.2/proto/table_merge.go:139 +0x305
github.com/gogo/protobuf/proto.(*InternalMessageInfo).Merge(0x40b8bd?, {0x2bee3b0, 0xc0007ace40}, {0x2bee3b0, 0xc001cfc300})
#011/home/jenkins/go/pkg/mod/github.com/gogo/protobuf@v1.3.2/proto/table_merge.go:50 +0xb6
github.com/thanos-io/thanos/pkg/rules/rulespb.(*Alert).XXX_Merge(0x3e4fec0?, {0x2bee3b0?, 0xc001cfc300?})
#011/home/jenkins/workspace/arty-foss_thanos_alcatel_v0.30.2/pkg/rules/rulespb/rpc.pb.go:486 +0x3a
github.com/gogo/protobuf/proto.Merge({0x2bee3b0?, 0xc0007ace40}, {0x2bee3b0?, 0xc001cfc300})
#011/home/jenkins/go/pkg/mod/github.com/gogo/protobuf@v1.3.2/proto/clone.go:95 +0x4a3
github.com/gogo/protobuf/proto.(*mergeInfo).computeMergeInfo.func32({0x40d95f?}, {0x62c5a0?})
#011/home/jenkins/go/pkg/mod/github.com/gogo/protobuf@v1.3.2/proto/table_merge.go:652 +0x686
github.com/gogo/protobuf/proto.(*mergeInfo).merge(0xc001d70900, {0xc001afbc40?}, {0x8?})
#011/home/jenkins/go/pkg/mod/github.com/gogo/protobuf@v1.3.2/proto/table_merge.go:139 +0x305
github.com/gogo/protobuf/proto.(*mergeInfo).computeMergeInfo.func29({0x3fbce49454356161?}, {0x408c200000000000?})
#011/home/jenkins/go/pkg/mod/github.com/gogo/protobuf@v1.3.2/proto/table_merge.go:567 +0xf2
github.com/gogo/protobuf/proto.(*mergeInfo).merge(0xc001d70840, {0x418a880?}, {0x2?})
#011/home/jenkins/go/pkg/mod/github.com/gogo/protobuf@v1.3.2/proto/table_merge.go:139 +0x305
github.com/gogo/protobuf/proto.(*InternalMessageInfo).Merge(0x40b8bd?, {0x2bee470, 0xc0012f0150}, {0x2bee470, 0xc0012f00e0})
#011/home/jenkins/go/pkg/mod/github.com/gogo/protobuf@v1.3.2/proto/table_merge.go:50 +0xb6
github.com/thanos-io/thanos/pkg/rules/rulespb.(*RuleGroup).XXX_Merge(0x3e4fec0?, {0x2bee470?, 0xc0012f00e0?})
#011/home/jenkins/workspace/arty-foss_thanos_alcatel_v0.30.2/pkg/rules/rulespb/rpc.pb.go:310 +0x3a
github.com/gogo/protobuf/proto.Merge({0x2bee470?, 0xc0012f0150}, {0x2bee470?, 0xc0012f00e0})
#011/home/jenkins/go/pkg/mod/github.com/gogo/protobuf@v1.3.2/proto/clone.go:95 +0x4a3
github.com/gogo/protobuf/proto.Clone({0x2bee470?, 0xc0012f00e0?})
#011/home/jenkins/go/pkg/mod/github.com/gogo/protobuf@v1.3.2/proto/clone.go:52 +0x1a5
github.com/thanos-io/thanos/pkg/rules.(*Manager).Rules(0xc000bae300, 0xc000bd0080, {0x2c02670, 0xc001c3c060})
#011/home/jenkins/workspace/arty-foss_thanos_alcatel_v0.30.2/pkg/rules/manager.go:409 +0x219
github.com/thanos-io/thanos/pkg/rules.(*GRPCClient).Rules(0xc0001721c8, {0x2bf4408?, 0xc001d8c450?}, 0xc000bd0080)
#011/home/jenkins/workspace/arty-foss_thanos_alcatel_v0.30.2/pkg/rules/rules.go:60 +0x174
github.com/thanos-io/thanos/pkg/api/query.NewRulesHandler.func1.3({0x2bf4408?, 0xc001d8c450?})
#011/home/jenkins/workspace/arty-foss_thanos_alcatel_v0.30.2/pkg/api/query/v1.go:990 +0x58
github.com/thanos-io/thanos/pkg/tracing.DoInSpan({0x2bf4408?, 0xc001d8c390?}, {0x26a1d4a?, 0x7?}, 0xc0019dec60, {0x0?, 0x0?, 0x7f78c0b90108?})
#011/home/jenkins/workspace/arty-foss_thanos_alcatel_v0.30.2/pkg/tracing/tracing.go:95 +0xa3
github.com/thanos-io/thanos/pkg/api/query.NewRulesHandler.func1(0xc001104400)
#011/home/jenkins/workspace/arty-foss_thanos_alcatel_v0.30.2/pkg/api/query/v1.go:989 +0x485
github.com/thanos-io/thanos/pkg/api.GetInstr.func1.1({0x2be9b80, 0xc0012f0000}, 0x4?)
#011/home/jenkins/workspace/arty-foss_thanos_alcatel_v0.30.2/pkg/api/api.go:211 +0x50
net/http.HandlerFunc.ServeHTTP(0xc00110c1e0?, {0x2be9b80?, 0xc0012f0000?}, 0x2bce5cc?)
#011/usr/lib/go-1.19/src/net/http/server.go:2109 +0x2f
github.com/thanos-io/thanos/pkg/logging.(*HTTPServerMiddleware).HTTPMiddleware.func1({0x2be9b80?, 0xc0012f0000}, 0xc001104400)
#011/home/jenkins/workspace/arty-foss_thanos_alcatel_v0.30.2/pkg/logging/http.go:69 +0x3b8
net/http.HandlerFunc.ServeHTTP(0x2bf4408?, {0x2be9b80?, 0xc0012f0000?}, 0x2bcee48?)
#011/usr/lib/go-1.19/src/net/http/server.go:2109 +0x2f
github.com/thanos-io/thanos/pkg/server/http/middleware.RequestID.func1({0x2be9b80, 0xc0012f0000}, 0xc001104200)
#011/home/jenkins/workspace/arty-foss_thanos_alcatel_v0.30.2/pkg/server/http/middleware/request_id.go:40 +0x542
net/http.HandlerFunc.ServeHTTP(0x2184f60?, {0x2be9b80?, 0xc0012f0000?}, 0x4?)
#011/usr/lib/go-1.19/src/net/http/server.go:2109 +0x2f
github.com/NYTimes/gziphandler.GzipHandlerWithOpts.func1.1({0x2bedf60, 0xc000bd0020}, 0x490001?)
#011/home/jenkins/go/pkg/mod/github.com/!n!y!times/gziphandler@v1.1.1/gzip.go:338 +0x26f
net/http.HandlerFunc.ServeHTTP(0x7f7899a5efff?, {0x2bedf60?, 0xc000bd0020?}, 0xc0019df170?)
#011/usr/lib/go-1.19/src/net/http/server.go:2109 +0x2f
github.com/thanos-io/thanos/pkg/extprom/http.httpInstrumentationHandler.func1({0x7f78993897e0?, 0xc0017f80a0}, 0xc001104200)
#011/home/jenkins/workspace/arty-foss_thanos_alcatel_v0.30.2/pkg/extprom/http/instrument_server.go:75 +0x10b
net/http.HandlerFunc.ServeHTTP(0x7f78993897e0?, {0x7f78993897e0?, 0xc0017f80a0?}, 0xc001d8c270?)
#011/usr/lib/go-1.19/src/net/http/server.go:2109 +0x2f
github.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerResponseSize.func1({0x7f78993897e0?, 0xc0017f8050?}, 0xc001104200)
#011/home/jenkins/go/pkg/mod/github.com/prometheus/client_golang@v1.14.0/prometheus/promhttp/instrument_server.go:288 +0xc5
net/http.HandlerFunc.ServeHTTP(0x7f78993897e0?, {0x7f78993897e0?, 0xc0017f8050?}, 0x0?)
#011/usr/lib/go-1.19/src/net/http/server.go:2109 +0x2f
github.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerCounter.func1({0x7f78993897e0?, 0xc0017f8000?}, 0xc001104200)
#011/home/jenkins/go/pkg/mod/github.com/prometheus/client_golang@v1.14.0/prometheus/promhttp/instrument_server.go:146 +0xb8
net/http.HandlerFunc.ServeHTTP(0x22c9b80?, {0x7f78993897e0?, 0xc0017f8000?}, 0x6?)
#011/usr/lib/go-1.19/src/net/http/server.go:2109 +0x2f
github.com/thanos-io/thanos/pkg/extprom/http.instrumentHandlerInFlight.func1({0x7f78993897e0, 0xc0017f8000}, 0xc001104200)
#011/home/jenkins/workspace/arty-foss_thanos_alcatel_v0.30.2/pkg/extprom/http/instrument_server.go:162 +0x169
net/http.HandlerFunc.ServeHTTP(0x2bf1530?, {0x7f78993897e0?, 0xc0017f8000?}, 0xc00180b698?)
#011/usr/lib/go-1.19/src/net/http/server.go:2109 +0x2f
github.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerRequestSize.func1({0x2bf1530?, 0xc00160c0e0?}, 0xc001104200)
#011/home/jenkins/go/pkg/mod/github.com/prometheus/client_golang@v1.14.0/prometheus/promhttp/instrument_server.go:238 +0xc5
net/http.HandlerFunc.ServeHTTP(0x2bf4408?, {0x2bf1530?, 0xc00160c0e0?}, 0x418a220?)
#011/usr/lib/go-1.19/src/net/http/server.go:2109 +0x2f
github.com/thanos-io/thanos/pkg/tracing.HTTPMiddleware.func1({0x2bf1530, 0xc00160c0e0}, 0xc001104100)
#011/home/jenkins/workspace/arty-foss_thanos_alcatel_v0.30.2/pkg/tracing/http.go:62 +0x9a2
github.com/prometheus/common/route.(*Router).handle.func1({0x2bf1530, 0xc00160c0e0}, 0xc001104000, {0x0, 0x0, 0x478d2e?})
#011/home/jenkins/go/pkg/mod/github.com/prometheus/common@v0.37.1/route/route.go:83 +0x18d
github.com/julienschmidt/httprouter.(*Router).ServeHTTP(0xc001a045a0, {0x2bf1530, 0xc00160c0e0}, 0xc001104000)
#011/home/jenkins/go/pkg/mod/github.com/julienschmidt/httprouter@v1.3.0/router.go:387 +0x81c
github.com/prometheus/common/route.(*Router).ServeHTTP(0xc00180baf0?, {0x2bf1530?, 0xc00160c0e0?}, 0x0?)
#011/home/jenkins/go/pkg/mod/github.com/prometheus/common@v0.37.1/route/route.go:126 +0x26
net/http.(*ServeMux).ServeHTTP(0xc0006ea042?, {0x2bf1530, 0xc00160c0e0}, 0xc001104000)
#011/usr/lib/go-1.19/src/net/http/server.go:2487 +0x149
net/http.serverHandler.ServeHTTP({0xc001d8c090?}, {0x2bf1530, 0xc00160c0e0}, 0xc001104000)
#011/usr/lib/go-1.19/src/net/http/server.go:2947 +0x30c
net/http.(*conn).serve(0xc0017fa000, {0x2bf4408, 0xc000b12690})
#011/usr/lib/go-1.19/src/net/http/server.go:1991 +0x607
created by net/http.(*Server).Serve
#011/usr/lib/go-1.19/src/net/http/server.go:3102 +0x4db
  • I added CHANGELOG entry for this change.
  • Change is not relevant to the end user.

Changes

Verification

fpetkovski (Contributor) previously approved these changes Mar 6, 2023

Thanks, let's try this out.

saswatamcode (Member) previously approved these changes Mar 7, 2023

Thanks for fixing! 🙂

In the rules proto spec, we have four stdtime fields: last_evaluation in RuleGroup, RecordingRule, and Alert, plus this active_at in AlertInstance. The gogo/protobuf issue does specify that the field needs to be non-nullable, but there is no harm in trying this out.

@@ -97,12 +97,14 @@ func ActiveAlertsToProto(s storepb.PartialResponseStrategy, a *rules.AlertingRul
active := a.ActiveAlerts()
ret := make([]*rulespb.AlertInstance, len(active))
for i, ruleAlert := range active {
// https://github.com/gogo/protobuf/issues/519

Let's keep comments consistent with the other workarounds as well.

Suggested change
- // https://github.com/gogo/protobuf/issues/519
+ // UTC needed due to https://github.com/gogo/protobuf/issues/519.

Hello,
I updated the comment (+ another similar one found in the codebase)

Signed-off-by: Thibaut Ackermann <thibaut.ackermann@al-enterprise.com>
@saswatamcode saswatamcode enabled auto-merge (squash) March 7, 2023 08:43
@saswatamcode saswatamcode merged commit dabbeda into thanos-io:main Mar 7, 2023
14 checks passed
junotx pushed a commit to junotx/thanos that referenced this pull request Aug 3, 2023
…6189)

Signed-off-by: Thibaut Ackermann <thibaut.ackermann@al-enterprise.com>