
compact: Introduce smart bucket caching for metadata with TTL; replace fetcher cache with it. #3408

Closed
bwplotka opened this issue Nov 4, 2020 · 17 comments

Comments

@bwplotka (Member) commented Nov 4, 2020

Action Items:

  1. Make sure it's programmatically configurable which files / dirs to cache.
  2. Currently metadata is cached by two separate mechanisms: the fetcher's internal code and the caching bucket. Unify this.
  3. For compact we don't use the object-store-level cache, only the fetcher one. Unify to the bucket cache.
  4. Add reset-cache functionality so we can reset certain files, e.g. markers when we add them. Also expose this API through HTTP (a rough sketch follows this list).
  5. Add fuzzing to TTL.
  6. Add in-memory bucket caching as well.
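
For item 4, a rough sketch of what such an HTTP reset endpoint could look like. Everything below (the route, the Invalidate method, the handler name) is hypothetical and only illustrates the idea, not an existing Thanos API:

import "net/http"

// invalidator is the minimal capability the caching bucket would need to expose.
// Hypothetical interface; the real caching bucket has no such method today.
type invalidator interface {
	Invalidate(fileSuffix string)
}

// cacheResetHandler drops cached entries for a given file name, e.g.
// DELETE /api/v1/cache?file=deletion-mark.json
func cacheResetHandler(c invalidator) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		file := r.URL.Query().Get("file")
		if file == "" {
			http.Error(w, "missing ?file= parameter", http.StatusBadRequest)
			return
		}
		c.Invalidate(file)
		w.WriteHeader(http.StatusNoContent)
	}
}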
@bwplotka (Member Author) commented Nov 6, 2020

Code for bucket caching:

type CachingBucket struct {
	objstore.Bucket // Wrapped object storage bucket; non-cached operations pass through.

	// Fetcher dir and memory caching.
	cacheDir string
}
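
As a rough illustration of the direction (not the actual implementation), a Get on such a bucket could first consult the cache directory and only then hit object storage. This assumes the embedded objstore.Bucket above is the wrapped bucket; TTL handling, memory caching, size limits and locking are left out:

import (
	"bytes"
	"context"
	"io"
	"os"
	"path/filepath"
)

// Get returns the object from the local cache dir if present, otherwise
// fetches it from the wrapped bucket and stores a copy on disk.
// Sketch only: errors while writing the cache are ignored.
func (cb *CachingBucket) Get(ctx context.Context, name string) (io.ReadCloser, error) {
	cached := filepath.Join(cb.cacheDir, filepath.FromSlash(name))
	if b, err := os.ReadFile(cached); err == nil {
		return io.NopCloser(bytes.NewReader(b)), nil
	}

	rc, err := cb.Bucket.Get(ctx, name) // Cache miss: go to object storage.
	if err != nil {
		return nil, err
	}
	defer rc.Close()

	b, err := io.ReadAll(rc)
	if err != nil {
		return nil, err
	}
	if err := os.MkdirAll(filepath.Dir(cached), 0o750); err == nil {
		_ = os.WriteFile(cached, b, 0o640)
	}
	return io.NopCloser(bytes.NewReader(b)), nil
}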

@Sudhar287 (Contributor) commented:

Does it make sense to divide the tasks you mentioned above and send a PR for each one separately?

@bwplotka (Member Author) commented:

Definitely (:

@bwplotka (Member Author) commented Nov 12, 2020

I need some clarifications here. Presently, when we use baseFetcher, all the metadata is cached either to runtime memory or into a cacheDir, which I understand...
How would I introduce a feature that asks users what directories or files they want to cache?
Does that involve getting a set of block IDs from the user, such as 01BKGV7JBM69T2G1BGBGM6KB12 01BKGTZQ1SYQJTR4PB43C8PD98?

or both! (:

No, what I mean is rather what KIND of files to cache where.

Currently, we cache only the meta.json per block in memory and optionally on disk (fetcher). Plus, if you use bucket caching, it does its own thing. The current caching bucket implementation is quite flexible and good, but I think we might want to improve the YAML configuration and how it's used in our code.

For example:

  • The impl is in pkg/store/cache - since we want to use it in the fetcher for other components, we might want to move it to e.g. pkg/objstore/cache?
  • Our YAML config is block specific and that's fantastic. However, it's missing a couple of files it would be nice to cache; e.g. it currently has:
// Config for Exists and Get operations for metadata files.
MetafileExistsTTL      time.Duration `yaml:"metafile_exists_ttl"`
MetafileDoesntExistTTL time.Duration `yaml:"metafile_doesnt_exist_ttl"`
MetafileContentTTL     time.Duration `yaml:"metafile_content_ttl"`
MetafileMaxSize        model.Bytes   `yaml:"metafile_max_size"`

It would be nice to specify this configuration per "metadata" file (make this more generic, per file name). We could then cache no-compact-mark.json and deletion-mark.json with different TTLs etc. (see the sketch after this list) (:

  • I think the implementation does not support memory and disk caching backends; we might want to add those.
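
For illustration, a per-file-name variant of that config could look roughly like this (struct names and YAML keys are made up; it reuses the time.Duration and model.Bytes types from the snippet above):

// FileCachingConfig configures caching for one metadata file name, matched as
// a suffix of the object path (e.g. "meta.json", "deletion-mark.json",
// "no-compact-mark.json"). Hypothetical shape, not the current caching bucket config.
type FileCachingConfig struct {
	Filename string `yaml:"filename"`

	ExistsTTL      time.Duration `yaml:"exists_ttl"`
	DoesntExistTTL time.Duration `yaml:"doesnt_exist_ttl"`
	ContentTTL     time.Duration `yaml:"content_ttl"`
	MaxSize        model.Bytes   `yaml:"max_size"`
}

// MetadataConfig would replace the single hard-coded Metafile* block with a list,
// so each marker file can get its own TTLs.
type MetadataConfig struct {
	Files []FileCachingConfig `yaml:"files"`
}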

cc @Sudhar287 and @pstibrany who wrote the caching bucket we want to reuse more (:

@pstibrany (Contributor) commented:

It would be nice to specify this configuration per "metadata" file (make this more generic, per file name). We could then cache no-compact-mark.json and deletion-mark.json with different TTLs etc.

You risk having too many config options. In Cortex we have these:

	TenantsListTTL         time.Duration `yaml:"tenants_list_ttl"`
	TenantBlocksListTTL    time.Duration `yaml:"tenant_blocks_list_ttl"`
	ChunksListTTL          time.Duration `yaml:"chunks_list_ttl"`
	MetafileExistsTTL      time.Duration `yaml:"metafile_exists_ttl"`
	MetafileDoesntExistTTL time.Duration `yaml:"metafile_doesnt_exist_ttl"`
	MetafileContentTTL     time.Duration `yaml:"metafile_content_ttl"`
	MetafileMaxSize        int           `yaml:"metafile_max_size_bytes"`

@bwplotka (Member Author) commented:

Yes. I would aim to define some defaults first, THEN allow configuring those only if there really is a use case. But good point 👍

@Sudhar287 (Contributor) commented Nov 16, 2020

@bwplotka I'm trying to scope this issue. Can you please elaborate on

Add fuzzing to TTL

Do you mean additional test cases for the time-to-live functionality?
This is the definition of fuzzing from a quick online search:

Fuzzing or fuzz testing is an automated software testing technique that involves providing invalid, unexpected, or random data as inputs to a computer program

@bwplotka (Member Author) commented:

No, I rather mean cache jitter (see http://neopatel.blogspot.com/2012/04/adding-jitter-to-cache-layer.html, point no. 2).

The problem is that the sync is done once and then all metadata are cached at the same time. With no jitter on the cache TTL (expiry), we evict all blocks at the same time, so we re-download everything in one go, introducing a spike in bucket API calls and potential slowdowns or even rate limits. If we add some randomness to the TTL, cache evictions and refreshes are spread out more uniformly.
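
A minimal sketch of such a jittered TTL (a plain helper, not existing Thanos code):

import (
	"math/rand"
	"time"
)

// jitteredTTL shifts ttl by a random factor in (-fraction, +fraction), so entries
// cached at the same time (e.g. right after one sync) don't all expire together.
func jitteredTTL(ttl time.Duration, fraction float64) time.Duration {
	if fraction <= 0 {
		return ttl
	}
	// rand.Float64() is in [0, 1); map it to [-1, 1).
	delta := (rand.Float64()*2 - 1) * fraction * float64(ttl)
	return ttl + time.Duration(delta)
}

// Example: expire a 24h entry somewhere between ~21.6h and ~26.4h.
// cache.Set(key, value, jitteredTTL(24*time.Hour, 0.1))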

stale bot commented Jan 16, 2021

Hello 👋 Looks like there was no activity on this issue for the last two months.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

@stale stale bot added the stale label Jan 16, 2021
stale bot commented Jan 30, 2021

Closing for now as promised, let us know if you need this to be reopened! 🤗

@stale stale bot closed this as completed Jan 30, 2021
@Sudhar287 (Contributor) commented:

This can be reopened, as point 6 (Add in-memory bucket caching as well) is still in PR.

@yeya24 yeya24 reopened this Mar 20, 2021
@stale stale bot removed the stale label Mar 20, 2021
stale bot commented Jun 2, 2021

Hello 👋 Looks like there was no activity on this issue for the last two months.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

@stale stale bot added the stale label Jun 2, 2021
stale bot commented Jun 16, 2021

Closing for now as promised, let us know if you need this to be reopened! 🤗

@stale stale bot closed this as completed Jun 16, 2021
@yeya24 yeya24 reopened this Oct 2, 2021
stale bot commented Oct 30, 2021

Closing for now as promised, let us know if you need this to be reopened! 🤗

stale bot commented Jan 9, 2022

Closing for now as promised, let us know if you need this to be reopened! 🤗

@stale stale bot closed this as completed Jan 9, 2022
@yeya24 yeya24 reopened this Jan 10, 2022
@stale stale bot removed the stale label Jan 10, 2022
stale bot commented Apr 17, 2022

Hello 👋 Looks like there was no activity on this issue for the last two months.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

@stale stale bot added the stale label Apr 17, 2022
stale bot commented May 1, 2022

Closing for now as promised, let us know if you need this to be reopened! 🤗

@stale stale bot closed this as completed May 1, 2022