
Problem with thanos compact #5675

Open
danielmotaleite opened this issue Sep 6, 2022 · 1 comment

@danielmotaleite
Contributor

Thanos, Prometheus and Golang version used:

quay.io/thanos/thanos:v0.28.0

running inside a k8s setup with these startup options:

  • args:
    • compact
    • --data-dir=/prometheus/tmp
    • --objstore.config-file=/srv/s3.yml
    • --http-address=0.0.0.0:10930
    • -w
    • --wait-interval=5m
    • --log.level=debug
    • --consistency-delay=1h
    • --retention.resolution-raw=900d
    • --retention.resolution-5m=1300d
    • --retention.resolution-1h=1600d
    • --compact.enable-vertical-compaction
    • --deduplication.replica-label='replica'
    • --debug.accept-malformed-index
    • --compact.concurrency=4
    • --selector.relabel-config-file=/prometheus/compact-relabel.yml
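The contents of `/prometheus/compact-relabel.yml` are not shown in this report. As a hypothetical illustration only, a selector relabel config for Thanos uses standard Prometheus `relabel_config` syntax, matched against each block's external labels, e.g.:

```yaml
# Hypothetical example -- the real file from this report is not shown.
# Rules are applied to block external labels; blocks matching a drop
# rule are skipped by the compactor.
- action: drop
  source_labels: [environment]
  regex: staging
```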

Object Storage Provider:
S3

What happened:

During thanos-compact, the process reports the known issue prometheus-junkyard/tsdb#347 but exits just after that.
The exit code is 1, so it doesn't look like a normal exit (and I would expect some ending output too).

The S3 data spans several years, from much older Thanos versions (as old as Thanos 0.5, IIRC).
I bypassed thanos compact for several years, as there wasn't that much data and I had many initial problems with compact; only recently did I give it another try.

What you expected to happen:
Thanos compact should repair the problem and continue working.

How to reproduce it (as minimally and precisely as possible):
I can reproduce it locally, and can probably download the S3 data and send it if needed.

Full logs to relevant components:

Logs

level=warn ts=2022-09-06T00:28:16.993082834Z caller=index.go:267 msg="out-of-order label set: known bug in Prometheus 2.8.0 and below" labelset="{__name__=\"zookeeper_znode_count\", address=\"172.26.30.116:9009\", alias=\"172.26.30.116\", dc=\"interxion-fra6\", environment=\"staging\", exported_dc=\"eu-central-1a\", host=\"zookeeper-staging-a02\", id=\"zookeeper-staging-a02\", instance=\"172.26.30.116:9009\", job=\"HostsMetrics\", node=\"zookeeper-staging-a02\", port=\"2181\", region=\"eu-central-1\", rkt=\"true\", server=\"zookeeper-staging-a03\", service=\"telegraf:metrics\", service_id=\"zookeeper-staging-a02:telegraf\", service_port=\"9009\", state=\"follower\", tags=\",http,telegraf,host,output,metrics,alias_zookeeper-staging-a02,eu-central-1a,eu-central-1,\", type=\"type_t2_small\", version=\"3.4.11-37e277162d567b55a07d1755f0b31c32e93c01a0\", zookeeper=\"true\", url=\"http://localhost:10020\", vhost=\"jm_staging\", version_malloc_library=\"system jemalloc\", version_ssl_library=\"OpenSSL 1.0.2l 25 May 2017\", wsrep_patch_version=\"wsrep_25.21\", type=\"type_i3_2xlarge\", vault=\"true\", vault=\"true\", type=\"type_c5_large\", vault=\"true\"}" series=68344
level=warn ts=2022-09-06T00:28:17.187858177Z caller=intrumentation.go:67 msg="changing probe status" status=not-ready reason="error executing compaction: compaction: group 0@2976064661857305202: invalid, but reparable block /prometheus/tmp/compact/0@2976064661857305202/01CVKPRHWYEZBZMZ65JWWV1AF1: found 48 chunks outside the block time range introduced by https://github.com/prometheus-junkyard/tsdb/issues/347"
level=info ts=2022-09-06T00:28:17.187929581Z caller=http.go:84 service=http/server component=compact msg="internal server is shutting down" err="error executing compaction: compaction: group 0@2976064661857305202: invalid, but reparable block /prometheus/tmp/compact/0@2976064661857305202/01CVKPRHWYEZBZMZ65JWWV1AF1: found 48 chunks outside the block time range introduced by https://github.com/prometheus-junkyard/tsdb/issues/347"
level=info ts=2022-09-06T00:28:17.189210412Z caller=http.go:103 service=http/server component=compact msg="internal server is shutdown gracefully" err="error executing compaction: compaction: group 0@2976064661857305202: invalid, but reparable block /prometheus/tmp/compact/0@2976064661857305202/01CVKPRHWYEZBZMZ65JWWV1AF1: found 48 chunks outside the block time range introduced by https://github.com/prometheus-junkyard/tsdb/issues/347"
level=info ts=2022-09-06T00:28:17.189271633Z caller=intrumentation.go:81 msg="changing probe status" status=not-healthy reason="error executing compaction: compaction: group 0@2976064661857305202: invalid, but reparable block /prometheus/tmp/compact/0@2976064661857305202/01CVKPRHWYEZBZMZ65JWWV1AF1: found 48 chunks outside the block time range introduced by https://github.com/prometheus-junkyard/tsdb/issues/347"
level=error ts=2022-09-06T00:28:17.189477697Z caller=main.go:158 err="group 0@2976064661857305202: invalid, but reparable block /prometheus/tmp/compact/0@2976064661857305202/01CVKPRHWYEZBZMZ65JWWV1AF1: found 48 chunks outside the block time range introduced by https://github.com/prometheus-junkyard/tsdb/issues/347\ncompaction\nmain.runCompact.func7\n\t/app/cmd/thanos/compact.go:424\nmain.runCompact.func8.1\n\t/app/cmd/thanos/compact.go:478\ngithub.com/thanos-io/thanos/pkg/runutil.Repeat\n\t/app/pkg/runutil/runutil.go:74\nmain.runCompact.func8\n\t/app/cmd/thanos/compact.go:477\ngithub.com/oklog/run.(*Group).Run.func1\n\t/go/pkg/mod/github.com/oklog/run@v1.1.0/group.go:38\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1571\nerror executing compaction\nmain.runCompact.func8.1\n\t/app/cmd/thanos/compact.go:505\ngithub.com/thanos-io/thanos/pkg/runutil.Repeat\n\t/app/pkg/runutil/runutil.go:74\nmain.runCompact.func8\n\t/app/cmd/thanos/compact.go:477\ngithub.com/oklog/run.(*Group).Run.func1\n\t/go/pkg/mod/github.com/oklog/run@v1.1.0/group.go:38\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1571\ncompact command failed\nmain.main\n\t/app/cmd/thanos/main.go:158\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1571"

and then the process terminates with exit code 1
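The log calls the block "invalid, but reparable", and the compactor itself does not repair it before exiting. A possible workaround (a sketch, not confirmed by this report; exact flag names and requirements may differ across Thanos versions) is an offline repair with `thanos tools bucket verify`:

```shell
# Sketch: attempt to repair blocks affected by known index issues
# (such as prometheus-junkyard/tsdb#347) before re-running compact.
# Back up the bucket first; depending on the Thanos version, --repair
# may also require a backup object store via --objstore-backup.config-file.
thanos tools bucket verify \
  --objstore.config-file=/srv/s3.yml \
  --issues=index_known_issues \
  --repair
```

If the repair succeeds, re-running the compactor against the bucket should no longer hit the "found 48 chunks outside the block time range" error for that block.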

Anything else we need to know:

@stale

stale bot commented Nov 13, 2022

Hello 👋 Looks like there was no activity on this issue for the last two months.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

@stale stale bot added the stale label Nov 13, 2022