Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

compact: introduce flag --wait-sync-block-duration #2752

Merged

Conversation

chrischdi
Copy link
Contributor

@chrischdi chrischdi commented Jun 10, 2020

Thanos compact currently does metadata sync every minute if --wait is
active. Thanos store has a equivalent flag named sync-block-duration.
This commit introduces --wait-sync-block-duration to be able to
configure the interval via a flag.

Resolves #2642

Signed-off-by: Schlotter, Christian christi.schlotter@gmail.com

  • I added CHANGELOG entry for this change.

Changes

  • added flag --wait-sync-block-duration to compact command
  • used value of flag instead of hardcoded time.Minute

Verification

Run it in production

Copy link
Member

@bwplotka bwplotka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, this is important fix I agree.

BTW what do you think of something else - some logic of 0 were you disable syncing totally.

  • It would be only refreshed when refresh bucket UI page, WDYT?

In the mean time we can merge this 👍 one suggestion only.

Thanks! ❤️

@@ -475,6 +476,8 @@ func (cc *compactConfig) registerFlag(cmd *kingpin.CmdClause) *compactConfig {
Short('w').BoolVar(&cc.wait)
cmd.Flag("wait-interval", "Wait interval between consecutive compaction runs and bucket refreshes. Only works when --wait flag specified.").
Default("5m").DurationVar(&cc.waitInterval)
cmd.Flag("wait-sync-block-interval", "Repeat interval for syncing the blocks between local and remote view.").
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is good but I think we have to name it better to show it's related to the UI.

What about something like:

Suggested change
cmd.Flag("wait-sync-block-interval", "Repeat interval for syncing the blocks between local and remote view.").
cmd.Flag("block-viewer.global.sync-block-interval", "Repeat interval for syncing the blocks between local and remote view for /global Block Viewer UI.").

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I will fix that and also fix the changelog conflict :-) (will have to rebuild the docs anyway)

@chrischdi
Copy link
Contributor Author

Awesome, this is important fix I agree.

BTW what do you think of something else - some logic of 0 were you disable syncing totally.

  • It would be only refreshed when refresh bucket UI page, WDYT?

In the mean time we can merge this one suggestion only.

Thanks!

Sounds also reasonable to me.
But maybe it would also need some logic to not re-execute syncing on every request maybe (use-case: opening the page multiple times by accident).

Maybe it would make more sense to allow using the cached index for the bucket UI?

@bwplotka
Copy link
Member

Maybe it would make more sense to allow using the cached index for the bucket UI?

Yes, it's must have - in separate PR (:

BTW @pstibrany do we have to cache for Cortex compactor? I guess yes?

@pstibrany
Copy link
Contributor

BTW @pstibrany do we have to cache for Cortex compactor? I guess yes?

In Cortex we don't use cache for compactor. Our default interval for running compactor is one hour, so it scans bucket only once per hour.

While we could use cache that is shared by queriers/store-gateways, but since scans are so infrequent, we decided that compactor having the correct picture of the world is more important.

@chrischdi chrischdi force-pushed the compact/add-wait-sync-block-duration-flag branch from d681877 to b38c6cd Compare June 19, 2020 05:42
@pracucci
Copy link
Contributor

BTW @pstibrany do we have to cache for Cortex compactor? I guess yes?

In Cortex we don't use cache for compactor. Our default interval for running compactor is one hour, so it scans bucket only once per hour.

While we could use cache that is shared by queriers/store-gateways, but since scans are so infrequent, we decided that compactor having the correct picture of the world is more important.

I'm not sure it's safe. The compactor should always have the real state of the world bucket.

Copy link
Contributor

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (modulo a nit)

CHANGELOG.md Outdated Show resolved Hide resolved
@chrischdi
Copy link
Contributor Author

BTW @pstibrany do we have to cache for Cortex compactor? I guess yes?

In Cortex we don't use cache for compactor. Our default interval for running compactor is one hour, so it scans bucket only once per hour.
While we could use cache that is shared by queriers/store-gateways, but since scans are so infrequent, we decided that compactor having the correct picture of the world is more important.

I'm not sure it's safe. The compactor should always have the real state of the world bucket.

The compactor yes but the bucket web UI may not need to have that?!

@pracucci
Copy link
Contributor

BTW @pstibrany do we have to cache for Cortex compactor? I guess yes?

In Cortex we don't use cache for compactor. Our default interval for running compactor is one hour, so it scans bucket only once per hour.
While we could use cache that is shared by queriers/store-gateways, but since scans are so infrequent, we decided that compactor having the correct picture of the world is more important.

I'm not sure it's safe. The compactor should always have the real state of the world bucket.

The compactor yes but the bucket web UI may not need to have that?!

The UI can use as far as you accept to not see the latest state of the bucket. I personally would prefer to have the UI to always reflect the latest state, but I'm open to discuss it.

@chrischdi chrischdi force-pushed the compact/add-wait-sync-block-duration-flag branch from 1a26214 to bc9a7a0 Compare June 20, 2020 07:35
@chrischdi
Copy link
Contributor Author

Hi @pracucci / @bwplotka ,

I also fixed the nit :-)

Could it be that the thanos tests failed due to some flakyness? I thought they had been green before merging that nit.

Thanos compact currently does metadata sync every minute if `--wait` is
active. Thanos store has a equivalent flag named `sync-block-duration`.
This commit introduces `--block-viewer.global.sync-block-interval` to be able to
configure the interval via a flag.

Signed-off-by: Schlotter, Christian <christi.schlotter@gmail.com>
@chrischdi chrischdi force-pushed the compact/add-wait-sync-block-duration-flag branch from bc9a7a0 to 1011c4b Compare June 26, 2020 05:16
@chrischdi
Copy link
Contributor Author

Rebased to fix CHANGELOG.md conflict

Copy link
Member

@bwplotka bwplotka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, thanks! 👍

@@ -482,6 +483,8 @@ func (cc *compactConfig) registerFlag(cmd *kingpin.CmdClause) *compactConfig {

cmd.Flag("block-sync-concurrency", "Number of goroutines to use when syncing block metadata from object storage.").
Default("20").IntVar(&cc.blockSyncConcurrency)
cmd.Flag("block-viewer.global.sync-block-interval", "Repeat interval for syncing the blocks between local and remote view for /global Block Viewer UI.").
Default("1m").DurationVar(&cc.blockViewerSyncBlockInterval)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it is causing problems, why not reducing this to a minimum? 🤔 Anyway we can think in later PRs!

@bwplotka bwplotka merged commit a7bb287 into thanos-io:master Jun 26, 2020
@chrischdi chrischdi deleted the compact/add-wait-sync-block-duration-flag branch June 26, 2020 09:56
paulfantom added a commit to paulfantom/thanos that referenced this pull request Jul 9, 2020
* upstream/release-0.14: (46 commits)
  Cut release v0.14.0-rc.1 (thanos-io#2853)
  Query: correctly marshal errors to JSON and ignore if nil (thanos-io#2848)
  ci: Manually download promu in crossbuild stage (thanos-io#2828)
  Cut release v0.14.0-rc.0 (thanos-io#2826)
  Soft cut changelog on master to indicate v0.14.0 being in progress (thanos-io#2824)
  Update ThanosReceiveNoUpload to select sum == 0 (thanos-io#2819)
  receive: Added more observability, fixed leaktest, to actually check leaks ): (thanos-io#2817)
  Query: always return a string in the `lastError` field (thanos-io#2809)
  Added missing CHANGELOG entry for PR 2613 (thanos-io#2820)
  receive: Fixed small options race; Removed unused StartTime feature. (thanos-io#2816)
  go.mod: Bump Prometheus to current latest (thanos-io#2814)
  Implement CLI Flags page in React UI (thanos-io#2796)
  Improve ThanosReceiveNoUpload to only alert on current instances
  store: Preallocate output buffer when encoding postings. (thanos-io#2812)
  compact: introduce flag --block-viewer.global.sync-block-interval (thanos-io#2752)
  docs: compact: add blurb about how retention policy works (thanos-io#2808)
  Reduced memory allocations in readIndexRange() (thanos-io#2807)
  ui: Add Stores page to React UI (thanos-io#2754)
  Added Kemal to Maintainer Role; Kemal is volounteering to be next release shephard (thanos-io#2804)
  proposal: Add scalable rule storage proposal (thanos-io#2661)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

compact: --wait results in syncing blocks every minute for global view
4 participants