Implement a method to estimate the number of rows in the index #769
Comments
I can add a function like this. I'm just dubious as to how much faster it'll be than zdb.count().
@eeeebbbbrrrr, I wouldn't ask for it without a real need... See for yourself:
The difference is 238x! Obviously, the point is that Mostly, I need this feature just to detect in advance that a user is about to run a very frequent query, i.e. one with extremely low selectivity (then I can turn off the scoring of results in this case, hehe). There are other uses as well.
@eeeebbbbrrrr, hello! It would be great if we could get back to this issue.
@eeeebbbbrrrr - I am willing to take this up over the weekend/next week if it's fine, as this would be pretty useful in our case as well. Will try to align it with other functions that are part of ZomboDB.
I'd be happy to review and merge it.
@mwieczorkiewicz, hello! Any updates?
ZomboDB version: 3000.0.12
Postgres version: 14.x
Elasticsearch version: 8.3
This issue has already been mentioned in a neighboring issue (by my mistake). Again:
So, I need an extremely fast mechanism (much faster than the `zdb.count(...)` function) to estimate whether a query in ZDB will find more than 10'000 rows or not (don't be scared of this constant: it's inherent in ES, I'll explain it below).
Directly in ES this is quite simple: you need to send a `_search` query with `size` equal to `0`. Like this:
Example response (insignificant data removed):
In short, ES in this case returns either the exact count in `hits.total.value` or a constant of 10'000 if the number of rows found >= 10'000. Basically, it's like `_count` (aka `zdb.count(...)`), only the counter stops when it finds 10'000 rows.
The 10'000 constant is actually the default value for the `track_total_hits` parameter (described here).
Thus, the main difference from `_count` is that this method of estimating the number of rows works immediately on any number of rows in store.
At the same time, if I try to make a query in ZDB with `size` equal to `0`, we come across this behavior:
Consequently, the problem:
- How to get `hits.total.value` from the request result?
- How to set the `track_total_hits` parameter (to change the value 10'000 to any other)? I couldn't find anything about it in ZDB.

By the way, this whole problem could be solved as follows: perhaps a function should be added to the aggregate functions that returns the estimated number of rows for the request (according to the method I described)?
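For reference, the ES-side estimation method described above can be sketched in Python. This is only an illustration under stated assumptions: the helper names `build_estimate_request` and `read_estimate` are hypothetical and not part of ZomboDB, the `title` field in the query is made up, and the sample response is hand-written to mirror the shape ES returns (`hits.total.value` together with `hits.total.relation`).

```python
import json

# Default cap ES applies to hit counting when track_total_hits is not set.
DEFAULT_TRACK_TOTAL_HITS = 10_000


def build_estimate_request(query, track_total_hits=DEFAULT_TRACK_TOTAL_HITS):
    """Build a _search body that returns no documents, only a capped hit count."""
    return {
        "size": 0,                             # do not fetch any hits
        "track_total_hits": track_total_hits,  # stop counting at this threshold
        "query": query,
    }


def read_estimate(response):
    """Return (count, is_exact) from a _search response.

    relation == "eq" means the count is exact; "gte" means counting
    stopped at the threshold and the real number is at least `count`.
    """
    total = response["hits"]["total"]
    return total["value"], total["relation"] == "eq"


# Hypothetical query against a made-up "title" field.
body = build_estimate_request({"match": {"title": "database"}})
print(json.dumps(body))

# Hand-written sample shaped like an ES response for a very frequent term.
sample_response = {"hits": {"total": {"value": 10000, "relation": "gte"}, "hits": []}}
count, is_exact = read_estimate(sample_response)
print(count, is_exact)
```

A caller could treat `is_exact == False` as the signal that the query matches at least the threshold number of rows, for example to decide to skip scoring.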
It seems to me that it would be logical and useful for many to have a special function for this, especially for those who work with very large collections of data. The definition of the function could be this: