Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

compact: do not cleanup blocks on boot #3532

Merged
merged 1 commit into from Dec 5, 2020

Conversation

GiedriusS
Copy link
Member

@GiedriusS GiedriusS commented Dec 2, 2020

Do not clean up blocks on boot because in some very bad cases there could
be thousands of blocks ready-to-be deleted and doing that makes Thanos
Compact exceed initialDelaySeconds on k8s.

Signed-off-by: Giedrius Statkevičius giedriuswork@gmail.com

  • I added CHANGELOG entry for this change.
  • Change is not relevant to the end user.

Changes

  • Removed cleanup on boot; added another metric which shows how many loops were performed. Used that metric in tests.

Verification

Updated e2e tests that pass.

Fixes #3395.

@GiedriusS GiedriusS force-pushed the remove_initiaL_cleanup branch 2 times, most recently from eb3ed62 to cf9ef37 Compare December 2, 2020 21:58
@GiedriusS GiedriusS marked this pull request as ready for review December 2, 2020 21:59
@kesor
Copy link

kesor commented Dec 3, 2020

This doesn't look like it addresses #3395 correctly. What should actually happen, is the HTTP socket should become open at the start of the process. And when either the cleanup, or compaction, or whatever starts failing the health check needs to start failing - or maybe the process might die because of errors. If there was a good reason to add cleanup at the start of the process, why remove it just to make the HTTP socket available?

@GiedriusS
Copy link
Member Author

This doesn't look like it addresses #3395 correctly. What should actually happen, is the HTTP socket should become open at the start of the process. And when either the cleanup, or compaction, or whatever starts failing the health check needs to start failing - or maybe the process might die because of errors. If there was a good reason to add cleanup at the start of the process, why remove it just to make the HTTP socket available?

That's true on a very pedantic level but syncing the metas + doing an initial cleanup potentially can take a lot of time because some people had many left-over blocks. That's where the complaints were coming from. With these changes now we barely do anything on boot before enabling the HTTP socket so it should be seamless just like before.

@metalmatze
Copy link
Member

It'd be great if we can merge this into release-0.17 first and then add it to v0.17.2. I'll merge it back to master afterwards.

Copy link
Member

@bwplotka bwplotka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This simplifies a lot things, thanks 💪

@bwplotka bwplotka changed the base branch from master to release-0.17 December 3, 2020 16:53
@bwplotka
Copy link
Member

bwplotka commented Dec 3, 2020

I based this branch to release-0.17 but looks like we need to rebase git commit as well 🤗 can you help @GiedriusS

Once this is merged we can do release 0.17.2

Do not cleanup blocks on boot because in some very bad cases there could
be thousands of blocks ready-to-be deleted and doing that makes Thanos
Compact exceed `initialDelaySeconds` on k8s.

Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com>
@GiedriusS
Copy link
Member Author

@metalmatze @bwplotka I have rebased this PR, PTAL.

@bwplotka bwplotka merged commit d57813b into thanos-io:release-0.17 Dec 5, 2020
metalmatze pushed a commit to metalmatze/thanos that referenced this pull request Dec 9, 2020
Do not cleanup blocks on boot because in some very bad cases there could
be thousands of blocks ready-to-be deleted and doing that makes Thanos
Compact exceed `initialDelaySeconds` on k8s.

Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com>
bwplotka added a commit that referenced this pull request Dec 9, 2020
* Fix query frontend regression on release v0.17.0 (#3480)

* query-frontend: make POST-request to downstream url for labels and series api endpoints (#3444)

Signed-off-by: Alexander Tunik <2braven@gmail.com>

* remove default response cache config

Signed-off-by: Ben Ye <yb532204897@gmail.com>

* ensure order when merging multiple responses

Signed-off-by: Ben Ye <yb532204897@gmail.com>

Co-authored-by: Alexander Tunik <2braven@gmail.com>

* *: Set debug.SetPanicOnFault(true) so we can recover seg faults. (#3498)

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Prepare v0.17.1 release (#3505)

Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>

* fix index out of bound bug when comparing ZLabelSets (#3520)

* fix index out of bound bug when comparing ZLabelSets

Signed-off-by: Ben Ye <yb532204897@gmail.com>

* fix param parsing error message

Signed-off-by: Ben Ye <yb532204897@gmail.com>

* address comment feedbacks

Signed-off-by: Ben Ye <yb532204897@gmail.com>

* compact: do not cleanup blocks on boot (#3532)

Do not cleanup blocks on boot because in some very bad cases there could
be thousands of blocks ready-to-be deleted and doing that makes Thanos
Compact exceed `initialDelaySeconds` on k8s.

Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com>

* Prepare v0.17.2 (#3543)

Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>

* Properly rebase CHANGELOG.md after merging release-0.17

Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>

Co-authored-by: Ben Ye <yb532204897@gmail.com>
Co-authored-by: Alexander Tunik <2braven@gmail.com>
Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
Co-authored-by: Giedrius Statkevičius <giedriuswork@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Compactor: HTTP socket is only opened after initial cleanup of compacted blocks is complete
4 participants