Introduce index:dcount() function #6080

Open
Korablev77 opened this issue May 19, 2021 · 4 comments
Labels
feature (A new functionality), in design (Requires design document)

Comments

@Korablev77
Contributor

Korablev77 commented May 19, 2021

HyperLogLog is currently one of the mainstream algorithms for approximately counting the number of distinct elements with almost no memory overhead. I suggest implementing an index method :dcount() (d stands for distinct) that would apply HyperLogLog and return the approximate cardinality of the index keys (obviously this makes sense only for non-unique indexes). I suppose it might be useful for some customers.
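To make the idea concrete, here is a minimal HyperLogLog sketch in Python. It is only an illustration of the algorithm, not a proposed Tarantool implementation: the class name `HyperLogLog`, the parameter `p`, and the choice of BLAKE2b as the hash are all assumptions made for this example.

```python
import hashlib
import math


class HyperLogLog:
    """Illustrative HyperLogLog: 2**p registers fed by 64-bit hashes."""

    def __init__(self, p=12):
        self.p = p                          # number of bits used as register index
        self.m = 1 << p                     # number of registers
        self.registers = bytearray(self.m)  # one byte per register for simplicity
        # Bias-correction constant, valid for m >= 128.
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, key):
        # Hash choice is an assumption; any well-mixed 64-bit hash would do.
        digest = hashlib.blake2b(repr(key).encode(), digest_size=8).digest()
        h = int.from_bytes(digest, "big")
        idx = h >> (64 - self.p)                # top p bits select a register
        rest = h & ((1 << (64 - self.p)) - 1)   # remaining 64 - p bits
        # Rank: 1-based position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def count(self):
        # Harmonic-mean based raw estimate.
        z = sum(2.0 ** -r for r in self.registers)
        estimate = self.alpha * self.m * self.m / z
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            estimate = self.m * math.log(self.m / zeros)
        return round(estimate)


# Simulate a non-unique index: 100000 rows, 20000 distinct keys.
hll = HyperLogLog()
for row in range(100000):
    hll.add(row % 20000)
print(hll.count())  # an estimate near 20000, not necessarily exact
```

With `p = 12` the sketch keeps 4096 small registers no matter how many rows are scanned, which is where the "almost no memory overhead" comes from. The price is that the result is an estimate rather than an exact count.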

@Korablev77 Korablev77 added the feature A new functionality label May 19, 2021
@unera
Collaborator

unera commented Aug 11, 2021

I think a new index method is the wrong idea, because we already have the :count method, which could be extended:

box.space.tester.index.primary:count('Alpha!', { iterator = 'LE', distinct = true })
box.space.tester.index.primary:count('Alpha!', { distinct = true })  -- iterator = 'ALL'

@unera unera added good first issue Good for newcomers teamC and removed teamP labels Aug 11, 2021
@Gerold103
Collaborator

You seem to misunderstand the ticket. It is not a distinct count. It is an approximate distinct count. Using a flag named distinct would be misleading, because the returned value would not always be exactly correct.

@kyukhin kyukhin added this to the wishlist milestone Aug 19, 2021
@alyapunov alyapunov assigned AnastasMIPT and unassigned unera Sep 27, 2021
@AnastasMIPT AnastasMIPT added in design Requires design document and removed good first issue Good for newcomers labels Sep 29, 2021
@alyapunov alyapunov added the 5sp label Oct 21, 2021
@unera
Collaborator

unera commented Oct 22, 2021

OK, change distinct=true to approximate=true, distinct=true.

My points:

  1. Don't add new methods
  2. Add options
  3. I don't know if the index is useful for anyone.

Looks good to do if we have no other tasks.

@no1seman

approximate, distinct, it doesn't matter... If this function has the same accuracy as the Redis implementation, that seems to be enough. But why not make this function part of the Lua API, so that it can be used for tables as well as for indexes?
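For reference, the accuracy in question can be estimated with a short calculation. The sketch below uses the usual HyperLogLog rule of thumb (relative standard error of about 1.04 / sqrt(m) for m registers) and assumes Redis-like parameters of 16384 six-bit registers. The function names are hypothetical and nothing here describes an actual Tarantool implementation.

```python
import math


def hll_standard_error(registers):
    """Approximate relative standard error of HyperLogLog with `registers` registers."""
    return 1.04 / math.sqrt(registers)


def hll_memory_bytes(registers, bits_per_register=6):
    """Memory needed for a dense register array, in bytes."""
    return registers * bits_per_register // 8


redis_like_registers = 16384  # 2**14, assumed to match Redis' dense encoding
print(f"{hll_standard_error(redis_like_registers):.2%}")  # 0.81%
print(hll_memory_bytes(redis_like_registers))             # 12288
```

Halving the error requires four times as many registers, so the register count is the main knob any such function would have to fix or expose.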

@kyukhin kyukhin removed this from the wishlist milestone May 8, 2023
8 participants