What is a "granule"? #4414
A granule is a fixed-size batch of rows that is addressed by the primary key. The term only makes sense for the MergeTree* engine family. The granule size can be set with the setting
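A minimal sketch of that definition, assuming a simplified in-memory model rather than ClickHouse's actual storage code: rows that are already sorted by primary key are cut into consecutive granules of `index_granularity` rows, so a row's granule number is just integer division. The helper names `split_into_granules` and `granule_of_row` are hypothetical.

```python
# Simplified model (assumption, not ClickHouse source code): the rows of a
# MergeTree data part are sorted by primary key and cut into granules of
# `index_granularity` rows; in this sketch the last granule holds the remainder.

def split_into_granules(rows, index_granularity):
    """Return consecutive slices of at most `index_granularity` rows."""
    return [rows[i:i + index_granularity]
            for i in range(0, len(rows), index_granularity)]


def granule_of_row(row_number, index_granularity):
    """0-based granule number that contains the given 0-based row number."""
    return row_number // index_granularity


rows = list(range(20))                  # 20 rows, already sorted by key
granules = split_into_granules(rows, 8)
print([len(g) for g in granules])       # [8, 8, 4]
print(granule_of_row(17, 8))            # 2
```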
Yes, you understood correctly. This way of indexing (a sparse index) is very efficient. The index is very small, so it can be kept in memory, and sequentially processing a group of small granules is also very fast.
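To illustrate why a sparse index stays small, here is a sketch under the assumption of a simplified in-memory model (not ClickHouse's on-disk `primary.idx` format): only the first key of each granule is stored, and a range lookup binary-searches those entries to find the granules that must be read. The function names are hypothetical.

```python
import bisect

# Simplified sparse primary index (assumption: an illustration only).
# It keeps one key per granule instead of one per row.

def build_sparse_index(sorted_keys, index_granularity):
    """First primary-key value of every granule."""
    return sorted_keys[::index_granularity]


def granules_for_range(sparse_index, lo, hi):
    """Granule numbers that may hold keys in [lo, hi].

    Conservative: a key equal to sparse_index[g] may also sit at the end of
    granule g - 1, so we may read one extra granule but never miss one.
    """
    first = max(bisect.bisect_left(sparse_index, lo) - 1, 0)
    last = bisect.bisect_right(sparse_index, hi) - 1
    return list(range(first, last + 1))


keys = list(range(0, 100, 5))              # 20 sorted keys: 0, 5, ..., 95
sparse = build_sparse_index(keys, 4)       # one entry per 4-row granule
print(sparse)                              # [0, 20, 40, 60, 80]
print(granules_for_range(sparse, 25, 45))  # [1, 2]
```

With 20 rows the index holds 5 entries; with `index_granularity=8192` it would hold one entry per 8192 rows, which is what makes it cheap to keep in memory.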
Thank you for the explanation. Maybe a small piece of text could be added to the documentation, like "(a granule is one block of primary key containing …)".
I see now how this index can be used properly. It only makes sense when the value being filtered for is very sparse or when one needs a very fine-grained primary key. As I now understand it, the data skipping index is tied to the primary key. E.g. if I have index_granularity=8192 and GRANULARITY=1, then for every 8192 rows the index contains, say, the minmax for the Nth primary key granule.
Is there an advantage to tying the data skipping index to the primary key, or would it make sense to make it a stand-alone index with its own granularity defined in rows? If I had a data skipping index with GRANULARITY=4096 rows, one could easily compute which primary key granule the current data skipping index batch belongs to, since the number of rows is always fixed. That way one could have a finer-grained data skipping index when filtering by just that column. It would also make the index easier to understand.
Correct.
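The confirmed behaviour can be sketched as follows, assuming a simplified in-memory minmax skip index: each entry summarizes `granularity` consecutive primary-key granules (so with GRANULARITY=1 and index_granularity=8192, one entry per 8192 rows). The last helper models the row-based proposal from the question; it is hypothetical and not an existing ClickHouse feature. All function names are illustrative.

```python
# Sketch of a minmax data skipping index (assumption: simplified, in-memory).
# One entry summarizes `granularity` granules = index_granularity * granularity rows.

def build_minmax_skip_index(values, index_granularity, granularity):
    rows_per_entry = index_granularity * granularity
    return [(min(values[i:i + rows_per_entry]), max(values[i:i + rows_per_entry]))
            for i in range(0, len(values), rows_per_entry)]


def granules_to_read(skip_index, granularity, n_granules, value):
    """Granules whose index entry says `value` *might* be present."""
    result = []
    for entry_no, (lo, hi) in enumerate(skip_index):
        if lo <= value <= hi:
            first = entry_no * granularity
            # The last entry may cover fewer than `granularity` granules.
            result.extend(range(first, min(first + granularity, n_granules)))
    return result


def primary_granule_of_batch(batch_no, batch_rows, index_granularity):
    """Hypothetical row-based skip-index batch -> primary-key granule it falls in.
    Assumes index_granularity is a multiple of batch_rows (no straddling)."""
    return (batch_no * batch_rows) // index_granularity


values = list(range(30))                       # 30 rows -> 8 granules of 4 rows
idx = build_minmax_skip_index(values, index_granularity=4, granularity=3)
print(idx)                                     # [(0, 11), (12, 23), (24, 29)]
print(granules_to_read(idx, 3, 8, 25))         # [6, 7]
print(primary_granule_of_batch(3, 4096, 8192))  # 1
```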
Every column has a .mrk file alongside its .bin (data) file. The .mrk files store "marks": offsets into the data file that allow reading or skipping the data of specific granules. These marks have the primary key index granularity. If you have a different granularity for secondary keys, you either:
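The role of marks can be modelled with a small sketch. It assumes uncompressed little-endian int64 values and a single byte offset per mark; real ClickHouse data files are compressed, so actual marks carry more information than this. One mark is recorded at the start of every granule, and reading a granule means seeking to its mark. The names `write_column` and `read_granule` are hypothetical.

```python
import struct

# Simplified model of a column's .bin/.mrk pair (assumption: uncompressed
# int64 values, one byte offset per mark). One mark is written per granule.

def write_column(values, index_granularity):
    data = bytearray()
    marks = []
    for row, v in enumerate(values):
        if row % index_granularity == 0:
            marks.append(len(data))          # byte offset where a granule starts
        data += struct.pack("<q", v)
    return bytes(data), marks


def read_granule(data, marks, granule_no):
    """Seek straight to one granule via its mark; other granules are skipped."""
    start = marks[granule_no]
    end = marks[granule_no + 1] if granule_no + 1 < len(marks) else len(data)
    return [v for (v,) in struct.iter_unpack("<q", data[start:end])]


data, marks = write_column(list(range(10)), index_granularity=4)
print(marks)                           # [0, 32, 64]
print(read_granule(data, marks, 2))    # [8, 9]
```

In this model there is no mark inside a granule, so a skip-index batch finer than a granule would have no offset to seek to.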
The documentation on data skipping indexes states:
What exactly is a granule? Is it a row?
As a related question: are there plans for an index type similar to the btree/hash secondary indexes of a traditional RDBMS, so that a WHERE clause could efficiently look up rows by a column that is not a prefix of the primary key, without scanning all values of that column?
As I understand it, the current data skipping indexes basically only allow answering the question "does this block of rows contain the value I am looking for?" rather than "which rows in this block contain the value I am looking for?".
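The difference between the two questions can be shown with a small illustrative sketch. The block-level minmax index stands in for a data skipping index; the row-level dictionary stands in for a hypothetical hash secondary index of the kind asked about. Neither is ClickHouse code, and all names are illustrative.

```python
# Contrast (illustration only): block-level minmax skip index versus a
# hypothetical row-level hash index like a traditional RDBMS secondary index.

def build_minmax(values, rows_per_block):
    return [(min(values[i:i + rows_per_block]), max(values[i:i + rows_per_block]))
            for i in range(0, len(values), rows_per_block)]


def blocks_that_may_contain(minmax, value):
    """'Does this block possibly contain the value?' (false positives allowed)."""
    return [b for b, (lo, hi) in enumerate(minmax) if lo <= value <= hi]


def build_row_index(values):
    """Hypothetical hash index: value -> exact row numbers."""
    index = {}
    for row, v in enumerate(values):
        index.setdefault(v, []).append(row)
    return index


def lookup_rows(row_index, value):
    """'Which rows contain the value?'"""
    return row_index.get(value, [])


values = [5, 1, 9, 5, 2, 8, 7, 3]
minmax = build_minmax(values, rows_per_block=4)   # [(1, 9), (2, 8)]
print(blocks_that_may_contain(minmax, 5))         # [0, 1]
print(lookup_rows(build_row_index(values), 5))    # [0, 3]
```

Block 1 is a false positive here: its range 2..8 covers 5 but it holds no 5, and even for block 0 the minmax entry cannot say which of its rows match.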