-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added proposal for moving to index-header binary format. #1839
Conversation
Signed-off-by: Bartek Plotka <bwplotka@gmail.com>
Love the gist of this. I haven't had the chance to do a very thorough review, but I think this is only natural to do. Nice job! |
* Remove building `index-cache.json` in compactor. | ||
Step 2: Load on demand in query time: | ||
* Allow Store Gateway to load `index-header` on demand from disk in query time. | ||
* Add background job to unmmap/unload `index-header` for blocks that are unused (LRU). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not just unused but the least recently used blocks if the memory limit is violated, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How we would know this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I meant some kind of limit in terms of their size on disk because it would be really nice to avoid RAM usage blowing up in case of some queries. Having all of it in RAM is horrible, yes, but OTOH it gives the users a relatively certain thing: we know that at least that RAM usage is more or less equal over a period of time due to the retention policies being applied by the Thanos Compact component. Unless I am missing something but it seems that this is true.
|
||
TSDB index is in binary [format](https://github.com/prometheus/prometheus/blob/master/tsdb/docs/format/index.md). | ||
|
||
To allow reduced resource consumption and effort when building (1), (2), "index-header" for blocks we plan to reuse similar format for sections like symbols, label indices and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if I have missed this detail but what is the actual format that this proposes? Protobufs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A custom binary format was planned, composed with those sections as mentioned, however, I might find a blocker to this ):
Signed-off-by: Bartek Plotka <bwplotka@gmail.com>
9dad9bf
to
dfd21c9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks all for the review! Updated, with the new info, not a good news ):
|
||
## Risks | ||
|
||
### Posting size to fetch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added this which we missed initially in this design. Let me know if it's clear.
TL;DR: we scan postings
form TSDB index when building index-cache
and without changing TSDB index this will be still required building index-header
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We got some answers (:
* Check if index-cache.json is present in the bucket. If not: | ||
* Download the whole TSDB index file, mmap all of it. | ||
* Build index-cache.json | ||
else: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Silly thing: once this proposal will be displayed as markdown, the else:
won't be formatted the way I guess you want.
* Check if index-cache.json is present on the disk. If not: | ||
* Check if index-cache.json is present in the bucket. If not: | ||
* Download the whole TSDB index file, mmap all of it. | ||
* Build index-cache.json |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After this one, I would also add another point * Delete downloaded TSDB index
to outline the waste of resources to download the TSDB index file just to build the cache (it won't be re-used the fully downloaded TSDB index).
## Goals | ||
|
||
* Reduce confusion between index-cache.json and IndexCache for series and postings. | ||
* Do not keep large pieces of the blocks (e.g symbols, label values) in the memory for all blocks. Be able to load it quickly in query time from disk. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in query time
: I guess you mean at query time
. If this comment makes sense, then I would suggest you to look for in query time
around the doc too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will do thanks
## No Goals | ||
|
||
* Removing initial startup for Thanos Store Gateway completely as designed in [Cortex, no initial block sync](https://github.com/thanos-io/thanos/issues/1813) | ||
* However this proposal might a step towards that as we might be able to load index-cache/index quickly on demand from disk. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might
> might be
The process for building this will be as follows: | ||
|
||
* Thanks to https://github.com/thanos-io/thanos/pull/1792 we can check final size of index and scan for TOC file. | ||
* With TOC: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For clarity, I think you also need to add a new bullet point at the end for writing the index-header
TOC. I guess you need it, right? Otherwise I'm not sure how you could read back from this file.
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All updated (:
|
||
## Risks | ||
|
||
### Posting size to fetch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We got some answers (:
Starting implementation (: cc @GiedriusS @pracucci if you want to take another look. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All good to me! Can't wait to see it live!
While idea of combing different pieces of TSDB index as our index-header is great, unfortunately we heavily rely | ||
on information about size of each posting represented as `postingRange.End`. | ||
|
||
We need to know that apriori to know how to partition and how many bytes we need to fetch from the storage to get each posting: https://github.com/thanos-io/thanos/blob/7e11afe64af0c096743a3de8a594616abf52be45/pkg/store/bucket.go#L1567 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to know that apriori to know how to partition
I guess you mean We need to know apriori how to partition
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will fix in next PR (:
Some verifications in TODO, otherwise. Feedback welcome.
Definitely not a silver bullet here, but rather change in a good direction (:
Signed-off-by: Bartek Plotka bwplotka@gmail.com