compactor: fails "invalid checksum" #2128

Closed
Zergvl opened this issue Feb 13, 2020 · 6 comments

@Zergvl

Zergvl commented Feb 13, 2020

Thanos, Prometheus and Golang version used:
thanos, version 0.10.1
go version: go1.13.1
prometheus, version 2.15.2

Object Storage Provider:
S3 (MinIO)
What happened:

thanos compact --wait --log.level=debug --data-dir=/thanos/compact --http-address=x.x.x.x:19191 --objstore.config-file=/opt/thanos/storage.yaml --retention.resolution-raw=21d --retention.resolution-5m=90d --retention.resolution-1h=365d --consistency-delay=30m --block-sync-concurrency=20 --compact.concurrency=5

level=info ts=2020-02-13T05:40:10.336479317Z caller=prober.go:137 msg="changing probe status" status=not-healthy reason="error executing compaction: compaction failed: 3 errors: compaction failed for group 0@3615588222792942986: compact blocks [/thanos/compact/compact/0@3615588222792942986/01E09MMHZ2T5J3ZMTN4R0DJG2J /thanos/compact/compact/0@3615588222792942986/01E09VG972MW8JM3A74SM57BA9 /thanos/compact/compact/0@3615588222792942986/01E0A2C0F21KY74XA6VB43FEXK /thanos/compact/compact/0@3615588222792942986/01E0A97QQ26W046HY0A14DVM9Z]: write compaction: iterate compaction set: chunk 134217715 not found: segment doesn't include enough bytes to read the chunk - required:134217829, available:134217728; compaction failed for group 0@10694505281190012710: compact blocks [/thanos/compact/compact/0@10694505281190012710/01E09MMHXTHJWZTBX4DSVEHWWQ /thanos/compact/compact/0@10694505281190012710/01E09VG95EYKXMHZWFX8DK6MAQ /thanos/compact/compact/0@10694505281190012710/01E0A2C0DGW6BQXW8E9K4CMHZ4 /thanos/compact/compact/0@10694505281190012710/01E0A97QNS22R5Z64FYBR1W27Y]: write compaction: iterate compaction set: chunk 134217302 not found: segment doesn't include enough bytes to read the chunk - required:134218156, available:134217728; compaction failed for group 0@15871345448339717723: gather index issues for block /thanos/compact/compact/0@15871345448339717723/01DW4CKA83QVDRSYT8NBZ5545G: open index file: read TOC: read TOC: invalid checksum"
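The three failing groups in this log (`0@3615588222792942986`, `0@10694505281190012710`, `0@15871345448339717723`) appear to be named `<resolution>@<hash of external labels>`. Below is a minimal Go sketch that splits such a key. It assumes that layout and uses a hypothetical helper, `parseGroupKey`, which is not a Thanos API.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseGroupKey (hypothetical) splits a compaction group name such as
// "0@3615588222792942986" into a resolution in milliseconds and a labels hash.
// Assumption: the "<resolution>@<labels hash>" layout seen in the compactor log.
func parseGroupKey(key string) (resolution int64, labelsHash uint64, err error) {
	parts := strings.SplitN(key, "@", 2)
	if len(parts) != 2 {
		return 0, 0, fmt.Errorf("malformed group key %q", key)
	}
	if resolution, err = strconv.ParseInt(parts[0], 10, 64); err != nil {
		return 0, 0, fmt.Errorf("bad resolution in %q: %w", key, err)
	}
	// The hash can exceed math.MaxInt64 (e.g. 15871345448339717723), so parse it unsigned.
	if labelsHash, err = strconv.ParseUint(parts[1], 10, 64); err != nil {
		return 0, 0, fmt.Errorf("bad labels hash in %q: %w", key, err)
	}
	return resolution, labelsHash, nil
}

func main() {
	keys := []string{"0@3615588222792942986", "0@10694505281190012710", "0@15871345448339717723"}
	for _, key := range keys {
		res, hash, err := parseGroupKey(key)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("group %s: resolution=%dms labelsHash=%d\n", key, res, hash)
	}
}
```

If this reading is correct, resolution `0` means raw data, so all three failing groups hold raw (not yet downsampled) blocks.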

What you expected to happen:

The compaction should not fail.

How to reproduce it (as minimally and precisely as possible):

Full logs to relevant components:

Anything else we need to know:

@bwplotka
Member

bwplotka commented Feb 13, 2020

Hi, and thanks for the report!

We have seen these errors before on some object storages, and it looks like corrupted data in the object store itself. What this says is that one of the chunk files is broken in one of the blocks /thanos/compact/compact/0@3615588222792942986/01E09MMHZ2T5J3ZMTN4R0DJG2J /thanos/compact/compact/0@3615588222792942986/01E09VG972MW8JM3A74SM57BA9 /thanos/compact/compact/0@3615588222792942986/01E0A2C0F21KY74XA6VB43FEXK /thanos/compact/compact/0@3615588222792942986/01E0A97QQ26W046HY0A14DVM9Z, and in one of the blocks thanos/compact/compact/0@10694505281190012710/01E09MMHXTHJWZTBX4DSVEHWWQ /thanos/compact/compact/0@10694505281190012710/01E09VG95EYKXMHZWFX8DK6MAQ /thanos/compact/compact/0@10694505281190012710/01E0A2C0DGW6BQXW8E9K4CMHZ4 /thanos/compact/compact/0@10694505281190012710/01E0A97QNS22R5Z64FYBR1W27Y, and that the index of 01DW4CKA83QVDRSYT8NBZ5545G is broken in the same way.
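The `required:134217829, available:134217728` part of the log is a bounds check. The chunk at offset 134217715 needs more bytes than the segment file holds, and 134217728 bytes is exactly 128 MiB. Below is a minimal Go sketch of such a check. It assumes the Prometheus TSDB chunk layout `uvarint length | 1-byte encoding | data | CRC32 (Castagnoli) over encoding and data` and ignores the segment file header. `appendChunk` and `readChunk` are hypothetical helpers, not the TSDB API.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"hash/crc32"
)

// castagnoli is the CRC32 table assumed for chunk checksums.
var castagnoli = crc32.MakeTable(crc32.Castagnoli)

// appendChunk (hypothetical) appends one chunk to a segment using the assumed layout.
func appendChunk(seg []byte, enc byte, data []byte) []byte {
	var lenBuf [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(lenBuf[:], uint64(len(data)))
	seg = append(seg, lenBuf[:n]...)
	start := len(seg)
	seg = append(seg, enc)
	seg = append(seg, data...)
	var crcBuf [crc32.Size]byte
	binary.BigEndian.PutUint32(crcBuf[:], crc32.Checksum(seg[start:], castagnoli))
	return append(seg, crcBuf[:]...)
}

// readChunk (hypothetical) returns the data of the chunk starting at off. It fails if the
// segment is too short to hold the whole chunk or if the stored checksum does not match.
func readChunk(seg []byte, off int) ([]byte, error) {
	if off < 0 || off >= len(seg) {
		return nil, fmt.Errorf("offset %d outside segment of %d bytes", off, len(seg))
	}
	dataLen, k := binary.Uvarint(seg[off:])
	if k <= 0 {
		return nil, errors.New("invalid chunk length")
	}
	if dataLen > uint64(len(seg)) {
		return nil, fmt.Errorf("chunk length %d exceeds segment size %d", dataLen, len(seg))
	}
	start := off + k                      // position of the encoding byte
	end := start + 1 + int(dataLen)       // end of the chunk data
	required := end + crc32.Size          // end of the trailing checksum
	if required > len(seg) {
		return nil, fmt.Errorf("segment doesn't include enough bytes to read the chunk - required:%d, available:%d", required, len(seg))
	}
	want := binary.BigEndian.Uint32(seg[end:required])
	if got := crc32.Checksum(seg[start:end], castagnoli); got != want {
		return nil, fmt.Errorf("checksum mismatch: expected %x, got %x", want, got)
	}
	return seg[start+1 : end], nil
}

func main() {
	seg := appendChunk(nil, 1, []byte("samples"))
	// Simulate a segment file cut short by two bytes, as in a partial upload or download.
	if _, err := readChunk(seg[:len(seg)-2], 0); err != nil {
		fmt.Println(err) // segment doesn't include enough bytes to read the chunk - required:13, available:11
	}
}
```

In this model, a chunk near the end of a file that was cut off at a round size fails exactly as in the log. The model does not explain why the files were cut off.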

Is MinIO distributed or single-node? Maybe it's eventually consistent and we need https://thanos.io/proposals/201901-read-write-operations-bucket.md/ (which is in progress).

What is interesting is that all of those groups failed... It would be extremely helpful if you could download any of those blocks and make them available to us for investigation. Would that be possible, @Zergvl? Also cc @krasi-georgiev, as it looks like our improved errors for chunk reading work well (: But it would be nice to understand this problem, @krasi-georgiev.

@krasi-georgiev
Contributor

Yeah, it looks like a partial upload or download, but it is surprising that it affects more than a single block.

@Zergvl
Author

Zergvl commented Feb 14, 2020

Hi and thanks for report!

Mailed you!
Thanks for the answer!

@Zergvl
Author

Zergvl commented Feb 18, 2020

I tried clearing the storage and taking the load balancer in front of MinIO out of the path. The problem still happens =(.
Thanos Store also logs warnings like this one: level=warn ts=2020-02-18T06:23:34.445304758Z caller=bucket.go:432 msg="loading block failed" elapsed=46.190744694s id=01DZR8G50S5HRXBXNCDXZBD0WK err="create index header reader: write index cache: open index reader: read TOC: read TOC: invalid checksum"
So I think it's a Prometheus problem, but I don't know what to do about it.

@stale

stale bot commented Mar 19, 2020

This issue/PR has been automatically marked as stale because it has not had recent activity. Please comment on status otherwise the issue will be closed in a week. Thank you for your contributions.

@GiedriusS
Member

Should be fixed with #3795.

4 participants