
compact: Introduce smart bucket caching for metadata with TTL; replace fetcher cache with it. #3408

Closed
bwplotka opened this issue Nov 4, 2020 · 17 comments

Comments

@bwplotka (Member) commented Nov 4, 2020

Action Items:

  1. Make sure it's programmatically configurable which files / dirs to cache.
  2. Currently metadata is cached by two separate mechanisms: the fetcher's internal code and the caching bucket. Unify this.
  3. For compact we don't use the object-store-level cache, only the fetcher one. Unify to the bucket cache.
  4. Add reset-cache functionality so we can reset certain files, e.g. markers when we add them. Also expose this API through HTTP (a rough sketch follows this list).
  5. Add fuzzing to TTL.
  6. Add in-memory bucket caching as well.
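
For item 4, a rough sketch of what such an HTTP reset endpoint could look like. Everything below (the route, the Invalidate method, the handler name) is hypothetical and only illustrates the idea, not an existing Thanos API:

import "net/http"

// invalidator is the minimal capability the caching bucket would need to expose.
// Hypothetical interface; the real caching bucket has no such method today.
type invalidator interface {
	Invalidate(fileSuffix string)
}

// cacheResetHandler drops cached entries for a given file name, e.g.
// DELETE /api/v1/cache?file=deletion-mark.json
func cacheResetHandler(c invalidator) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		file := r.URL.Query().Get("file")
		if file == "" {
			http.Error(w, "missing ?file= parameter", http.StatusBadRequest)
			return
		}
		c.Invalidate(file)
		w.WriteHeader(http.StatusNoContent)
	}
}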
@bwplotka (Member Author) commented Nov 6, 2020

Code for bucket caching:

type CachingBucket struct {
	objstore.Bucket // Wrapped object storage bucket; non-cached operations pass through.

	// Fetcher dir and memory caching.
	cacheDir string
}
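
As a rough illustration of the direction (not the actual implementation), a Get on such a bucket could first consult the cache directory and only then hit object storage. This assumes the embedded objstore.Bucket above is the wrapped bucket; TTL handling, memory caching, size limits and locking are left out:

import (
	"bytes"
	"context"
	"io"
	"os"
	"path/filepath"
)

// Get returns the object from the local cache dir if present, otherwise
// fetches it from the wrapped bucket and stores a copy on disk.
// Sketch only: errors while writing the cache are ignored.
func (cb *CachingBucket) Get(ctx context.Context, name string) (io.ReadCloser, error) {
	cached := filepath.Join(cb.cacheDir, filepath.FromSlash(name))
	if b, err := os.ReadFile(cached); err == nil {
		return io.NopCloser(bytes.NewReader(b)), nil
	}

	rc, err := cb.Bucket.Get(ctx, name) // Cache miss: go to object storage.
	if err != nil {
		return nil, err
	}
	defer rc.Close()

	b, err := io.ReadAll(rc)
	if err != nil {
		return nil, err
	}
	if err := os.MkdirAll(filepath.Dir(cached), 0o750); err == nil {
		_ = os.WriteFile(cached, b, 0o640)
	}
	return io.NopCloser(bytes.NewReader(b)), nil
}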

@Sudhar287 (Contributor) commented:

Does it make sense to divide the tasks you mentioned above and send a PR for each one separately?

@bwplotka (Member Author) commented:

Definitely (:

@bwplotka (Member Author) commented Nov 12, 2020

I need some clarifications here. Presently, when we use baseFetcher, all the metadata is cached either to runtime memory or into a cacheDir, which I understand...
How would I introduce a feature that asks users what directories or files they want to cache?
Does that involve getting a set of block IDs from the user, such as 01BKGV7JBM69T2G1BGBGM6KB12 01BKGTZQ1SYQJTR4PB43C8PD98?

or both! (:

No, what I mean is rather what KIND of files to cache where.

Currently, we cache only the meta.json per block in memory and optionally on disk (fetcher). Plus, if you use bucket caching, it does its own thing. The current caching bucket implementation is quite flexible and good, but I think we might want to improve the YAML configuration and how it's used in our code.

For example:

  • The impl is in pkg/store/cache - since we want to use it in the fetcher for other components, we might want to move it to e.g. pkg/objstore/cache?
  • Our YAML config is block specific and that's fantastic. However, it's missing a couple of files it would be nice to cache; e.g. it currently has:
// Config for Exists and Get operations for metadata files.
MetafileExistsTTL      time.Duration `yaml:"metafile_exists_ttl"`
MetafileDoesntExistTTL time.Duration `yaml:"metafile_doesnt_exist_ttl"`
MetafileContentTTL     time.Duration `yaml:"metafile_content_ttl"`
MetafileMaxSize        model.Bytes   `yaml:"metafile_max_size"`

It would be nice to specify this configuration per "metadata" file (make this more generic, per file name). We could then cache no-compact-mark.json and deletion-mark.json with different TTLs etc. (see the sketch after this list) (:

  • I think the implementation does not support memory and disk caching backends; we might want to add those.
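
For illustration, a per-file-name variant of that config could look roughly like this (struct names and YAML keys are made up; it reuses the time.Duration and model.Bytes types from the snippet above):

// FileCachingConfig configures caching for one metadata file name, matched as
// a suffix of the object path (e.g. "meta.json", "deletion-mark.json",
// "no-compact-mark.json"). Hypothetical shape, not the current caching bucket config.
type FileCachingConfig struct {
	Filename string `yaml:"filename"`

	ExistsTTL      time.Duration `yaml:"exists_ttl"`
	DoesntExistTTL time.Duration `yaml:"doesnt_exist_ttl"`
	ContentTTL     time.Duration `yaml:"content_ttl"`
	MaxSize        model.Bytes   `yaml:"max_size"`
}

// MetadataConfig would replace the single hard-coded Metafile* block with a list,
// so each marker file can get its own TTLs.
type MetadataConfig struct {
	Files []FileCachingConfig `yaml:"files"`
}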

cc @Sudhar287 and @pstibrany who wrote the caching bucket we want to reuse more (:

@pstibrany (Contributor) commented:

It would be nice to specify this configuration per "metadata" file (make this more generic, per file name). We could then cache no-compact-mark.json and deletion-mark.json with different TTLs etc.

You risk having too many config options. In Cortex we have these:

	TenantsListTTL         time.Duration `yaml:"tenants_list_ttl"`
	TenantBlocksListTTL    time.Duration `yaml:"tenant_blocks_list_ttl"`
	ChunksListTTL          time.Duration `yaml:"chunks_list_ttl"`
	MetafileExistsTTL      time.Duration `yaml:"metafile_exists_ttl"`
	MetafileDoesntExistTTL time.Duration `yaml:"metafile_doesnt_exist_ttl"`
	MetafileContentTTL     time.Duration `yaml:"metafile_content_ttl"`
	MetafileMaxSize        int           `yaml:"metafile_max_size_bytes"`

@bwplotka (Member Author) commented:

Yes. I would aim to define some defaults first, THEN allow configuring those only if there really is a use case. But good point 👍

@Sudhar287 (Contributor) commented Nov 16, 2020

@bwplotka I'm trying to scope this issue. Can you please elaborate on

Add fuzzing to TTL

Do you mean additional test cases for the time-to-live functionality?
This is the definition of fuzzing from a quick online search:

Fuzzing or fuzz testing is an automated software testing technique that involves providing invalid, unexpected, or random data as inputs to a computer program

@bwplotka (Member Author) commented:

No, I rather mean cache jitter (see http://neopatel.blogspot.com/2012/04/adding-jitter-to-cache-layer.html, point no. 2).

The problem is that the sync is done once and then all metadata are cached at the same time. With no jitter on the cache TTL (expiry), we evict all blocks at the same time, so we re-download everything in one go, introducing a spike in bucket API calls and potential slowdowns or even rate limits. If we add some randomness to the TTL, cache evictions and refreshes are spread out more uniformly.
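
A minimal sketch of such a jittered TTL (a plain helper, not existing Thanos code):

import (
	"math/rand"
	"time"
)

// jitteredTTL shifts ttl by a random factor in (-fraction, +fraction), so entries
// cached at the same time (e.g. right after one sync) don't all expire together.
func jitteredTTL(ttl time.Duration, fraction float64) time.Duration {
	if fraction <= 0 {
		return ttl
	}
	// rand.Float64() is in [0, 1); map it to [-1, 1).
	delta := (rand.Float64()*2 - 1) * fraction * float64(ttl)
	return ttl + time.Duration(delta)
}

// Example: expire a 24h entry somewhere between ~21.6h and ~26.4h.
// cache.Set(key, value, jitteredTTL(24*time.Hour, 0.1))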

stale bot commented Jan 16, 2021

Hello 👋 Looks like there was no activity on this issue for the last two months.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

@stale stale bot added the stale label Jan 16, 2021
stale bot commented Jan 30, 2021

Closing for now as promised, let us know if you need this to be reopened! 🤗

@stale stale bot closed this as completed Jan 30, 2021
@Sudhar287 (Contributor) commented:

This can be reopened, as point 6 (Add in-memory bucket caching as well) is still in PR.

@yeya24 yeya24 reopened this Mar 20, 2021
@stale stale bot removed the stale label Mar 20, 2021
stale bot commented Jun 2, 2021

Hello 👋 Looks like there was no activity on this issue for the last two months.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

@stale stale bot added the stale label Jun 2, 2021
stale bot commented Jun 16, 2021

Closing for now as promised, let us know if you need this to be reopened! 🤗

@stale stale bot closed this as completed Jun 16, 2021
@yeya24 yeya24 reopened this Oct 2, 2021
stale bot commented Oct 30, 2021

Closing for now as promised, let us know if you need this to be reopened! 🤗

stale bot commented Jan 9, 2022

Closing for now as promised, let us know if you need this to be reopened! 🤗

@stale stale bot closed this as completed Jan 9, 2022
@yeya24 yeya24 reopened this Jan 10, 2022
@stale stale bot removed the stale label Jan 10, 2022
stale bot commented Apr 17, 2022

Hello 👋 Looks like there was no activity on this issue for the last two months.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

@stale stale bot added the stale label Apr 17, 2022
stale bot commented May 1, 2022

Closing for now as promised, let us know if you need this to be reopened! 🤗

@stale stale bot closed this as completed May 1, 2022