Performance of WHERE vs PREWHERE when selecting by primary key #2601
Another example.
The table has an index on (tags_id, created_at) and 100M rows. Using PREWHERE on the index columns makes the query 3 times faster.
I checked the trace: everything is the same. The only difference is in the query pipeline, which has a 'Filter' stage in the WHERE case.
WHERE (slow):
PREWHERE (fast):
This is an obsolete statement; I have removed it from the docs. Sometimes it does make sense to use PREWHERE even for indexed columns. PREWHERE first reads the columns specified in the PREWHERE conditions and filters them by these conditions. This can narrow down the ranges to be read by cutting off their "tail". The other columns are then read only for the narrower ranges, which can span fewer than index_granularity rows. This helps especially if the other columns are heavy enough. (The ability to read ranges shorter than one granule was implemented about a year ago in #903.)
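A toy Python model may help make the mechanism above concrete. It assumes a sorted key column, a range condition `lo <= key <= hi`, a fixed byte cost per row for each column, and that index analysis keeps a granule when its first and last keys overlap the range. `Part`, `bytes_where` and `bytes_prewhere` are hypothetical names, not ClickHouse internals, and real reads happen in compressed blocks, so actual savings are coarser than this per-row count. The model trims both ends of each range, not only the tail.

```python
from dataclasses import dataclass

@dataclass
class Part:
    """Toy stand-in for one data part (hypothetical; not a ClickHouse type)."""
    keys: list[int]     # primary key values, sorted, one per row
    granularity: int    # rows per granule, playing the role of index_granularity
    key_bytes: int      # assumed fixed cost per row of the key column
    other_bytes: int    # assumed fixed cost per row of all other columns together

def selected_granules(part, lo, hi):
    """Approximate index analysis: keep granules whose key range overlaps [lo, hi]."""
    ranges = []
    for start in range(0, len(part.keys), part.granularity):
        end = min(start + part.granularity, len(part.keys))
        if part.keys[start] <= hi and part.keys[end - 1] >= lo:
            ranges.append((start, end))
    return ranges

def bytes_where(part, lo, hi):
    """WHERE: read all columns for whole selected granules, then filter the rows."""
    rows = sum(end - start for start, end in selected_granules(part, lo, hi))
    return rows * (part.key_bytes + part.other_bytes)

def bytes_prewhere(part, lo, hi):
    """PREWHERE: read the key column, filter, then read the other columns
    only for the narrowed row range inside each granule."""
    total = 0
    for start, end in selected_granules(part, lo, hi):
        total += (end - start) * part.key_bytes
        matching = [i for i in range(start, end) if lo <= part.keys[i] <= hi]
        if matching:
            # Keys are sorted, so matching rows are contiguous: cut head and tail.
            total += (matching[-1] - matching[0] + 1) * part.other_bytes
    return total

part = Part(keys=list(range(1000)), granularity=100, key_bytes=8, other_bytes=200)
print(bytes_where(part, 150, 249), bytes_prewhere(part, 150, 249))  # 41600 21600
print(bytes_where(part, 100, 299), bytes_prewhere(part, 100, 299))  # 41600 41600
```

When the condition boundaries fall inside granules (150..249), the PREWHERE path reads roughly half the bytes in this model; when they align with granule boundaries (100..299), both paths read the same amount, because only partially matched granules can be narrowed.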
Actually, cases where PREWHERE helps for indexed columns are not frequent. If we allowed conditions on indexed columns to be moved to PREWHERE, they would almost always be selected for PREWHERE, because these columns are usually highly compressed. But that would usually give less benefit than moving conditions on other columns. We will leave the current behaviour as is until we can implement a smarter solution.
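The selection problem described above can be sketched as a hypothetical heuristic that moves the condition with the cheapest columns (smallest compressed size) to PREWHERE. This is a simplified model, not ClickHouse's actual optimizer; `Condition`, `choose_prewhere` and the byte sizes below are made up for illustration. It shows why, once key columns are allowed, a condition on a highly compressed key column wins over a condition on another column.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Condition:
    text: str                 # the condition as written in WHERE
    columns: frozenset[str]   # columns the condition reads

def choose_prewhere(conditions, compressed_size, primary_key, allow_pk=False):
    """Pick the condition whose columns are cheapest to read.

    Hypothetical model: conditions touching primary-key columns are skipped
    unless allow_pk is True, mirroring the behaviour discussed in this issue.
    """
    candidates = [c for c in conditions if allow_pk or not (c.columns & primary_key)]
    if not candidates:
        return None
    # Rank only by read cost (sum of compressed sizes of the referenced columns).
    return min(candidates, key=lambda c: sum(compressed_size[col] for col in c.columns))

# Made-up compressed sizes in bytes: key columns compress very well.
sizes = {"tags_id": 1_000_000, "created_at": 2_000_000, "status": 40_000_000}
pk = frozenset({"tags_id", "created_at"})
conds = [
    Condition("tags_id = 42", frozenset({"tags_id"})),
    Condition("status = 'ok'", frozenset({"status"})),
]
print(choose_prewhere(conds, sizes, pk).text)                 # status = 'ok'
print(choose_prewhere(conds, sizes, pk, allow_pk=True).text)  # tags_id = 42
```

The sketch ranks only by read cost and does not estimate how much data each condition would filter out, which is why a cheap-to-read key condition can win even when moving it gives less benefit.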
Your test case also has some complications, because the amount of filtered data is not accounted correctly for …
Cool. :) Didn't know that.
I see. So in general it will lead to reading the PK columns for all the granules, and if the PK condition matches a lot of granules, it will do a lot of extra work.
That makes sense. A possible optimization would be something like checking PREWHERE only for those granules that match the PK partially (i.e. if …
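The optimization suggested above could look roughly like this: granules whose whole key range lies inside the condition need no row-level filtering, and only the boundary granules do. This is a hypothetical sketch that assumes sorted keys and a closed range condition `lo <= key <= hi`; `classify_granules` and `rows_to_filter` are illustrative names, not existing ClickHouse functions.

```python
def classify_granules(keys, granularity, lo, hi):
    """Split index-selected granules into fully and partially matching ones.

    Assumes keys are sorted (the primary key order inside a part) and the
    condition is a closed range lo <= key <= hi.
    """
    full, partial = [], []
    for start in range(0, len(keys), granularity):
        end = min(start + granularity, len(keys))
        first, last = keys[start], keys[end - 1]
        if first > hi or last < lo:
            continue                      # excluded by the index, never read
        if lo <= first and last <= hi:
            full.append((start, end))     # every row matches: no filtering needed
        else:
            partial.append((start, end))  # boundary granule: filter row by row
    return full, partial

def rows_to_filter(keys, granularity, lo, hi):
    """Rows whose key must be evaluated if only partial granules are filtered."""
    _, partial = classify_granules(keys, granularity, lo, hi)
    return sum(end - start for start, end in partial)

keys = list(range(1000))
full, partial = classify_granules(keys, 100, 150, 849)
print(len(full), partial)                   # 6 [(100, 200), (800, 900)]
print(rows_to_filter(keys, 100, 150, 849))  # 200
```

Here the range spans eight granules, but only the two boundary granules (200 rows) would need the condition evaluated.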
In the cases where PREWHERE is more efficient than WHERE, the biggest difference is the amount of data read (I was checking the server logs / query_log). That's why I've used …
According to the docs: "Keep in mind that it does not make much sense for PREWHERE to only specify those columns that have an index, because when using an index, only the data blocks that match the index are read."
In practice, there is a significant difference between WHERE and PREWHERE when selecting by PK. If the PK conditions are in WHERE, ClickHouse reads much more data and responds much more slowly than with PREWHERE.
If this is expected behaviour, then why does optimize_move_to_prewhere skip PK fields?
Test case: