Slow order by primary key with small limit on big data #1344
Hello. I have the following table:
It has 4.3 billion records:
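The original table definition was not preserved in this thread; a minimal sketch of a log table of this shape (the table name, column names, and MergeTree parameters are assumptions, not the original DDL) might look like:

```sql
-- Hypothetical log table; names, types, and engine parameters
-- are illustrative, not taken from the original issue.
CREATE TABLE logs
(
    date Date,
    timestamp DateTime,
    nanoseconds UInt32,
    message String
) ENGINE = MergeTree(date, (timestamp, nanoseconds), 8192);

-- Counting the rows (the issue reports about 4.3 billion):
SELECT count() FROM logs;
```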
When I just select with a small limit, everything works perfectly:
But that returns the first five logs. What if I want to get the last five logs? I do this by ordering by the primary key:
It starts processing all 4.3 billion rows, so it's slow.
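The queries themselves were not preserved here; assuming the hypothetical `logs` table sketched above, the fast unordered read versus the slow ordered one would look something like:

```sql
-- Fast: reads just a handful of rows from one part.
SELECT * FROM logs LIMIT 5;

-- Slow: to find the 5 most recent rows, ClickHouse reads and
-- sorts all 4.3 billion rows before applying the LIMIT.
SELECT * FROM logs ORDER BY timestamp DESC LIMIT 5;
```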
Even if I use WHERE to reduce the number of rows, it's not that fast:
Five seconds to select 5 rows out of 45 million when ordering by the primary key seems slow.
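A sketch of such a restricted query, again assuming the hypothetical `logs` table (the WHERE condition shown is an illustrative guess, not the original):

```sql
-- Still slow: the WHERE clause narrows the scan to ~45 million
-- rows, but all of them are read and sorted to find the last 5.
SELECT * FROM logs
WHERE date = today()
ORDER BY timestamp DESC
LIMIT 5;
```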
Is this OK? What am I doing wrong when getting the last N rows (by primary key)? Or maybe the table is configured incorrectly?
P.S. The version is:
ClickHouse doesn't have an optimization that allows skipping rows while reading the first (or last) N rows ordered by primary key, so the results you see are expected.
Even when you only need the first N rows ordered by primary key, it's necessary to merge data from different parts, which amounts to a full scan if your query has no limiting condition on the date column or primary key.
However, it's possible to implement this more efficiently: read only the first (or last) N rows from each block. (There are also some other technical details.)
If you store (structured) logs, it's a frequent need to see the "last logs". In this scenario, the table's primary key is a timestamp (or timestamp + nanoseconds, as shown above).
From time to time it could be the "last logs" without any filter, or filtered by a couple of columns, but it's always the last logs. By "last" I mean the last 500-1000 results; that's usually enough.
Will this optimisation be implemented in the near future? Or could you suggest a workaround for this case?
If it's possible in ClickHouse internals to just get the last N elements without a full scan, maybe there should be a special construction like
I just tried: this limit-with-offset solution also doesn't work; it does a full scan:
This is planned after more fundamental modifications to the query pipeline.
Referenced by other issues on Sep 20, 2018 and Oct 29, 2018.
Yes, that would be a great feature. @alexey-milovidov, could you share when it will be available?
Meanwhile, I came up with the following workaround:
It may require an iterative process -
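The workaround itself was not preserved in this thread; a common approach of this iterative kind (a sketch only, with the table and column names assumed as above) is to bound the scan with a time-range predicate so the primary key can prune parts, widening the window when too few rows come back:

```sql
-- Try the last hour first; the condition on the primary-key
-- column lets ClickHouse skip most of the older parts.
SELECT * FROM logs
WHERE timestamp > now() - 3600
ORDER BY timestamp DESC
LIMIT 500;

-- If fewer than 500 rows come back, retry with a wider window
-- (last day, last week, ...) until enough rows are returned.
```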