Long Term Storage Improvements [Tracking Issue] #1705

Closed · 3 of 34 tasks · bwplotka opened this issue Nov 1, 2019 · 5 comments

bwplotka (Member) commented Nov 1, 2019

This is the "index" issue that helps track issues, initiatives, and ideas that might improve how long term storage of metrics is used, on both the read and the storage side. It currently works, but there are many things we can improve. The goal of this ticket is to be clearer, give an overview of what's happening, and compare potential improvement ideas! Targeted mostly at contributors who want to help us with some challenging problems.

Overall we want to encourage more collaboration and contributions on this! So please jump on anything interesting and propose new ideas! (:

Let's keep the discussion about each idea in its own GitHub issue. If no GitHub issue exists yet, please create one and link it here if it relates to the problems we want to solve! I will try to keep this issue updated as we progress.

Store Gateway: Syncing Blocks

Things to improve

Overview

Current logic:

Syncing has the biggest impact during startup, especially with an empty local Store Gateway directory; however, it is also performed on a 3-minute interval (configurable by flag). This means that any improvement to syncing will improve both startup time and the overall baseline memory used.

The main goal of the block sync process is to let the Store Gateway access block data from object storage and return it to the Querier on Series gRPC requests. The process looks as follows (a rough Go sketch follows the list):

  • Iterate over all blocks in object storage:
  • For every block not seen before:
    • Download meta.json
      • Skip if younger than the consistency delay
      • Skip if outside of the time partition
      • Skip if relabelling ignores the block (sharding)
    • Check if index-cache.json is present on disk. If not:
      • Download the whole index file and mmap all of it (!)
      • Calculate index-cache.json
    • Load the whole index-cache into memory
    • Remove index-cache JSON files (let's call them index-headers) of blocks that no longer exist from disk.
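
To make the shape of this loop concrete, here is a minimal Go sketch; all type and method names are hypothetical stand-ins, not the real Thanos code, and error handling for filters is simplified:

```go
package storegateway // hypothetical package, for illustration only

import "time"

type BlockMeta struct {
	ULID       string
	UploadedAt time.Time
	// time range, external labels, resolution, ... elided
}

type Bucket interface {
	ListBlocks() ([]string, error)
	DownloadMeta(id string) (BlockMeta, error) // meta.json
	DownloadIndex(id string) ([]byte, error)   // whole index file (mmapped in reality!)
}

type LocalStore interface {
	Loaded(id string) bool
	HasIndexHeader(id string) bool
	WriteIndexHeader(id string, header []byte) error
	LoadIndexHeader(id string) error // whole index-header kept in memory
	DropIndexHeadersNotIn(keep map[string]bool) error
}

// buildIndexHeader would extract symbols, label values and posting offsets
// from the full TSDB index; elided here.
func buildIndexHeader(index []byte) []byte { return nil }

func syncBlocks(bkt Bucket, local LocalStore, now time.Time,
	consistencyDelay time.Duration, keep func(BlockMeta) bool) error {

	ids, err := bkt.ListBlocks()
	if err != nil {
		return err
	}
	seen := map[string]bool{}
	for _, id := range ids {
		seen[id] = true
		if local.Loaded(id) {
			continue // block already synced before
		}
		meta, err := bkt.DownloadMeta(id)
		if err != nil {
			return err
		}
		// Skip blocks younger than the consistency delay or filtered out by
		// time partitioning / relabelling (sharding).
		if now.Sub(meta.UploadedAt) < consistencyDelay || !keep(meta) {
			continue
		}
		if !local.HasIndexHeader(id) {
			idx, err := bkt.DownloadIndex(id)
			if err != nil {
				return err
			}
			if err := local.WriteIndexHeader(id, buildIndexHeader(idx)); err != nil {
				return err
			}
		}
		if err := local.LoadIndexHeader(id); err != nil {
			return err
		}
	}
	// Remove index-headers for blocks that no longer exist in object storage.
	return local.DropIndexHeadersNotIn(seen)
}
```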

The mentioned index-cache.json (index-header) holds the block's symbol table, label values, and posting offsets.
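
As a rough illustration only (field names are mine, not the exact on-disk schema):

```go
// Approximate shape of what index-cache.json (the index-header) keeps per block.
type byteRange struct{ Start, End int64 }

type indexHeader struct {
	Version     int
	Symbols     []string             // symbol table for label name/value strings
	LabelValues map[string][]string  // label name -> sorted label values
	Postings    map[string]byteRange // "name=value" -> offsets of its postings list in the full index
}
```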

Initiatives/Ideas

  • Switch to a different format for the on-disk index-cache: issue
  • Work purely on symbols instead of strings; only look up strings afterwards.
  • Fix regression introduced in v0.8: issue
  • Mmap/load the index-header into memory on demand
  • Optimize constructing the index-cache from the index: issue
  • Make the index more object storage friendly (:

Querying

Things to improve

Overview

Current logic:

So how does querying work in Thanos? The query is delivered through different components (top down):

  • Optional Cortex response cacher
    • Time-aligns the request
    • Splits it by day (!)
    • Caches responses
  • Querier
    • Performs PromQL evaluation
    • Fans out to each StoreAPI
    • Merges all responses series by series, in sorted order
  • Store Gateway
    • Chooses blocks to query based on time, external labels and resolution
    • Per block (see the sketch after this list):
      • Get matching postings in a partitioned way.
      • Get the series pointed to by those postings in a partitioned way
      • Choose the chunks within each series that match the time range within the block
      • Fetch chunks in a partitioned way
    • Merge the data into a SeriesSet
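
A minimal sketch of the Store Gateway per-block path, with partitioning and caching elided; all names are hypothetical stand-ins for the real code:

```go
type labelMatcher struct{ Name, Value string }

type chunkRef struct {
	MinTime, MaxTime int64 // chunk time range within the block
	Offset           uint64
}

type series struct {
	Labels map[string]string
	Chunks []chunkRef
}

// blockReader abstracts the partitioned GetRange fetches against one block's
// index and chunk files in object storage.
type blockReader interface {
	MatchingPostings(ms []labelMatcher) ([]uint64, error)
	SeriesByRef(refs []uint64) ([]series, error)
	FetchChunks(cs []chunkRef) error
}

func blockSeries(b blockReader, ms []labelMatcher, mint, maxt int64) ([]series, error) {
	// 1. Resolve matchers to postings (series references).
	refs, err := b.MatchingPostings(ms)
	if err != nil {
		return nil, err
	}
	// 2. Fetch the series entries those postings point to.
	all, err := b.SeriesByRef(refs)
	if err != nil {
		return nil, err
	}
	// 3. Keep only chunks overlapping the requested time range, then fetch them.
	var out []series
	for _, s := range all {
		var keep []chunkRef
		for _, c := range s.Chunks {
			if c.MaxTime >= mint && c.MinTime <= maxt {
				keep = append(keep, c)
			}
		}
		if len(keep) == 0 {
			continue
		}
		if err := b.FetchChunks(keep); err != nil {
			return nil, err
		}
		out = append(out, series{Labels: s.Labels, Chunks: keep})
	}
	// 4. The caller merges per-block results into one sorted SeriesSet.
	return out, nil
}
```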

Important facts:

  • We partition index-data fetches at every step, combining multiple object storage GetRange requests against the same index file into bigger requests to avoid rate limiting. This partitioning somewhat blocks streaming of postings and series fetches (see the sketch below).
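
A minimal sketch of the partitioning idea, assuming the ranges are already sorted; the gap threshold and names are hypothetical:

```go
// rng is a [start, end) byte range within an index or chunk file.
type rng struct{ start, end int64 }

// partition merges sorted ranges whose gap is at most maxGap bytes into one
// bigger GetRange request, trading some over-fetching for fewer requests
// (and thus less chance of hitting object storage rate limits).
func partition(ranges []rng, maxGap int64) []rng {
	var out []rng
	for _, r := range ranges {
		if n := len(out); n > 0 && r.start-out[n-1].end <= maxGap {
			if r.end > out[n-1].end {
				out[n-1].end = r.end
			}
			continue
		}
		out = append(out, r)
	}
	return out
}
```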

We currently don't cache anything on disk other than the index-cache.json mentioned in the Sync section.

Initiatives/Ideas

  • Store GW:
    • Avoid startup syncing; move to lazy block loading: proposal
    • Time sharding
    • Block sharding: proposal
    • Index Caching in memory:
      • Cache only per block.
      • Evict a block's items from the cache once the block no longer exists.
      • Use ristretto for caching instead of our naive LRU: PR
      • Shared index cache across instances, e.g. with Memcached
    • Disk caching
  • Rate limit / Limit memory (we have some concurrency limits and sample limits currently)
    • We do that partially, but we can be smarter about it, e.g. limiting bytes/chunks and detecting overload early.
    • Count allocations (roughly) per user/query: proposal
  • StoreAPI: Split gRPC Series stream frames into smaller ones, as recommended by gRPC: todo
  • Unify labels: issue
  • Querier
    • Prevent known-bad queries
    • Response caching: tracking issue
    • Distributed Queries: issue
    • Does splitting by day even make sense? Maybe split more, as discussed here
    • Relax the StoreAPI.Series contract and merge more on the Querier side

Testability/Observability

Initiatives/Ideas

Stability/Maintainability

Initiatives/Ideas

  • Improved panic handling in run group: PR
  • Eventual consistency handling between store GW and writers: Proposal, Tracking issue
  • Smooth partial upload logic for Compactor: issue

Downsampling

Things to improve

  • It still causes a bit of confusion, e.g. its purpose, cost, and usage (e.g. choosing the range in rate[X])

Initiatives/Ideas

  • Different query auto-downsampling logic: issue
  • Step-based auto-downsampling: PR (see the sketch after this list)
  • Staleness + downsampling: issue
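
As a hypothetical sketch of the step-based idea (the divisor and thresholds here are assumptions, not the merged logic): pick the coarsest resolution that still yields a few samples per query step.

```go
// Thanos resolutions: raw (0), 5 minutes and 1 hour, in milliseconds.
const (
	resRaw = int64(0)
	res5m  = int64(5 * 60 * 1000)
	res1h  = int64(60 * 60 * 1000)
)

// autoMaxResolution picks the max source resolution for a query step so that
// roughly 5 downsampled samples still fall into each step.
func autoMaxResolution(stepMillis int64) int64 {
	switch target := stepMillis / 5; {
	case target >= res1h:
		return res1h
	case target >= res5m:
		return res5m
	default:
		return resRaw
	}
}
```
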
bwplotka (Member, Author) commented Nov 1, 2019

Tried and rejected:

* Simple global memory limiter in Querier. Rejected. attempt

bwplotka changed the title from "Long Term Storage Improvements [Tracking Issues]" to "Long Term Storage Improvements [Tracking Issue]" on Nov 28, 2019
stale bot commented Jan 11, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the stale label on Jan 11, 2020
GiedriusS (Member) commented:

Not stale, very much a work in progress.

stale bot removed the stale label on Jan 13, 2020
stale bot commented Feb 12, 2020

This issue/PR has been automatically marked as stale because it has not had recent activity. Please comment on status otherwise the issue will be closed in a week. Thank you for your contributions.

stale bot added the stale label on Feb 12, 2020
bwplotka (Member, Author) commented:

Let's close it. It was super not useful. Lesson learned. (: Milestones and separate issues work much better.
