compactor: add sync block metas timeout for block-viewer #4763

Closed
ianwoolf opened this issue Oct 10, 2021 · 2 comments · Fixed by #4764

Comments

ianwoolf (Contributor) commented Oct 10, 2021

Related to #3868 and #4689.

Thanos Version 0.22
Prometheus: 2.27
Go: 1.16.3

Object Storage Provider: MinIO S3

What happened: Thanos Compact hits many "context deadline exceeded" errors when syncing block metas from object storage.

I have a lot of blocks, and compaction often fails, especially under high load. The logs are as follows:

level=info ts=2021-10-09T11:06:00.428302213Z caller=clean.go:33 msg="started cleaning of aborted partial uploads"
level=info ts=2021-10-09T11:06:00.428323703Z caller=clean.go:60 msg="cleaning of aborted partial uploads done"
level=info ts=2021-10-09T11:06:00.428335948Z caller=blocks_cleaner.go:43 msg="started cleaning of blocks marked for deletion"
level=info ts=2021-10-09T11:06:00.428345569Z caller=blocks_cleaner.go:57 msg="cleaning of blocks marked for deletion done"
level=error ts=2021-10-09T11:10:48.404517968Z caller=runutil.go:101 msg="function failed. Retrying in next tick" err="incomplete view: 4 errors: meta.json file exists: 01FHEXFYSKP32GKM68YX8GAS0G/meta.json: stat s3 object: Head \"http://xx.xx.xx.xx:9000/thanos/01FHEXFYSKP32GKM68YX8GAS0G/meta.json\": context deadline exceeded; meta.json file exists: 01FGNFR72ZNXSZKJ4G23G532QM/meta.json: stat s3 object: Head \"http://xx.xx.xx.xx:9000/thanos/01FGNFR72ZNXSZKJ4G23G532QM/meta.json\": context deadline exceeded; meta.json file exists: 01FG690731HPETEJWHN6A5SM5A/meta.json: stat s3 object: Head \"http://xx.xx.xx.xx:9000/thanos/01FG690731HPETEJWHN6A5SM5A/meta.json\": context deadline exceeded; meta.json file exists: 01FGNDTQM1SFPKQXG2DT568RK6/meta.json: stat s3 object: Head \"http://xx.xx.xx.xx:9000/thanos/01FGNDTQM1SFPKQXG2DT568RK6/meta.json\": context deadline exceeded"
level=error ts=2021-10-09T11:10:48.425380702Z caller=runutil.go:101 msg="function failed. Retrying in next tick" err="incomplete view: 2 errors: meta.json file exists: 01FHJ4FTQN66W64Y3WAVAA1ETM/meta.json: stat s3 object: Head \"http://http://xx.xx.xx.xx:9000/thanos/01FHJ4FTQN66W64Y3WAVAA1ETM/meta.json\": context canceled; meta.json file exists: 01FHHV1BH1EP3Q3SF731NCH235/meta.json: stat s3 object: Head \"http://xx.xx.xx.xx:9000/thanos/01FHHV1BH1EP3Q3SF731NCH235/meta.json\": context canceled"
level=warn ts=2021-10-09T11:10:48.411623431Z caller=intrumentation.go:54 msg="changing probe status" status=not-ready reason="syncing metas: incomplete view: 4 errors: meta.json file exists: 01FHEXFYSKP32GKM68YX8GAS0G/meta.json: stat s3 object: Head \"http://xx.xx.xx.xx:9000/thanos/01FHEXFYSKP32GKM68YX8GAS0G/meta.json\": context deadline exceeded;  ...... ;meta.json file exists: 01FGNDTQM1SFPKQXG2DT568RK6/meta.json: stat s3 object: Head \"http://xx.xx.xx.xx:9000/thanos/01FGNDTQM1SFPKQXG2DT568RK6/meta.json\": context deadline exceeded"
level=info ts=2021-10-09T11:10:48.427451801Z caller=http.go:74 service=http/server component=compact msg="internal server is shutting down" err="syncing metas: incomplete view: 4 errors: meta.json file exists: 01FHEXFYSKP32GKM68YX8GAS0G/meta.json: stat s3 object: Head \"http://xx.xx.xx.xx:9000/thanos/01FHEXFYSKP32GKM68YX8GAS0G/meta.json\": context deadline exceeded; ...... ; meta.json file exists: 01FGNDTQM1SFPKQXG2DT568RK6/meta.json: stat s3 object: Head \"http://xx.xx.xx.xx:9000/thanos/01FGNDTQM1SFPKQXG2DT568RK6/meta.json\": context deadline exceeded"

I found that the compactor uses the wait interval as the timeout when syncing block metas. --wait-interval is the period between compaction runs and shouldn't double as the timeout for syncing block metas, so I added a parameter that serves as that timeout.

ianwoolf (Contributor, Author) commented Oct 11, 2021

--block-viewer.global.sync-block-interval and --wait-interval are merely cool-down periods between block meta syncs. I don't think --wait-interval should be used as a timeout.

But currently --wait-interval is used as the timeout for syncing block metas, and it does cause problems. So I think it is a bug, and I think many issues are related to it, such as #4689, #3966, and #3868.

@Venture200

Hi @ianwoolf, what's your recommended value for --block-viewer.global.sync-block-interval? Also, can't --wait-interval and --block-viewer.global.sync-block-interval be used in the same configuration?
