kNN Classification: New meta scores/distances #1244

Closed
etiennedi opened this issue Sep 17, 2020 · 5 comments · Fixed by #1311

@etiennedi
Member

This issue is meant to discuss new metrics to express how certain Weaviate is that something in a kNN classification was correctly classified.

@michaverhagen could you make a start by describing the ideas you already had? Thanks.

@michaverhagen

  • Winner's percentage: for example, if k=7 and 4 of the nearest neighbors have the same attribute (thus making it the winning outcome), the winner's percentage would be 57%.
  • Average winner's distance: take the average of the distances between the new concept and each of the winning neighbors (4 in the example).
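
A minimal sketch of the two metrics above, assuming the k nearest neighbors are available as `(label, distance)` pairs. The function name `winner_metrics` and the dictionary keys are hypothetical and are not taken from the Weaviate code base.

```python
from collections import Counter

def winner_metrics(neighbors):
    """Compute both metrics from a list of (label, distance) pairs (assumed input shape)."""
    counts = Counter(label for label, _ in neighbors)
    # most_common(1) returns the label with the highest count; on a tie,
    # the label encountered first wins (an arbitrary choice for this sketch).
    winner, winner_count = counts.most_common(1)[0]
    winner_distances = [d for label, d in neighbors if label == winner]
    return {
        "winner": winner,
        "winners_percentage": winner_count / len(neighbors),
        "average_winners_distance": sum(winner_distances) / len(winner_distances),
    }

# k=7, four neighbors share label "A" (illustrative data only)
neighbors = [("A", 0.10), ("A", 0.20), ("B", 0.15), ("A", 0.30),
             ("B", 0.25), ("C", 0.40), ("A", 0.20)]
metrics = winner_metrics(neighbors)
print(metrics)
```

With this data the winner's percentage is 4/7, which rounds to the 57% used in the example.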

Ideally, what I would like to be able to do is the following:

  • Assume we have a fully qualified training set
  • Take a percentage of this training set, un-classify it, re-classify it and compare the new classification with the original one
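
The two steps above could be sketched roughly as follows. This is a plain in-memory kNN with Euclidean distance, not Weaviate's classification; `knn_label`, `holdout_agreement`, and the `(vector, label)` data layout are assumptions made for illustration.

```python
import math
import random
from collections import Counter

def knn_label(point, labeled, k):
    """Majority label among the k labeled items closest to point (Euclidean distance)."""
    nearest = sorted(labeled, key=lambda item: math.dist(point, item[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def holdout_agreement(dataset, fraction, k, seed=0):
    """Un-classify a fraction of the data, re-classify it, and return the
    share of items whose new label matches the original one."""
    rng = random.Random(seed)
    shuffled = dataset[:]
    rng.shuffle(shuffled)
    n_held = max(1, int(len(shuffled) * fraction))
    held_out, remaining = shuffled[:n_held], shuffled[n_held:]
    matches = sum(knn_label(vec, remaining, k) == label for vec, label in held_out)
    return matches / n_held

# Two well-separated clusters (illustrative data only)
dataset = [((i * 0.01, 0.0), "a") for i in range(10)] + \
          [((10.0 + i * 0.01, 10.0), "b") for i in range(10)]
agreement = holdout_agreement(dataset, fraction=0.2, k=3)
print(agreement)
```

On real data the agreement would be below 1.0, and the per-item scores from such a run are what the boundaries discussed next would be applied to.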

Then, using boundaries for the metrics, we can have a discussion with customers: if we set the threshold for our metric at value X and assume everything with a score above X is classified correctly, then:

  • 80% of the data is assumed to be classified correctly, which means that you would have to manually check 20% of your data. Is that acceptable?
  • 98% of the 80% assumed correctly classified is in fact correct. Can you live with 2% of that data being wrongly classified?
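
As a rough illustration of this trade-off, the sketch below takes `(score, is_correct)` pairs from a re-classification run and reports, for a threshold X, how much data is accepted and how much of the accepted data is actually correct. The function name, the input shape, and the sample numbers are hypothetical.

```python
def threshold_report(results, threshold):
    """results: list of (score, is_correct) pairs (assumed shape).
    Returns (share of data assumed correct, share of that which is really correct)."""
    accepted = [is_correct for score, is_correct in results if score > threshold]
    coverage = len(accepted) / len(results)
    precision = sum(accepted) / len(accepted) if accepted else 0.0
    return coverage, precision

# Illustrative scores only, e.g. winner's percentages from a hold-out run
results = [(1.0, True), (0.86, True), (0.71, True), (0.57, False), (0.43, False)]
coverage, precision = threshold_report(results, threshold=0.5)
print(coverage, precision)
```

Here 80% of the items score above X=0.5 and would be assumed correct, leaving 20% for manual review, while 75% of the accepted items are in fact correct. Sweeping the threshold over a range of values shows how the two numbers move against each other.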

@etiennedi
Member Author

etiennedi commented Sep 17, 2020

Re winner's percentage
Clear.

Re winning distances
Based on quickly studying the code base:

It seems that the current winningDistance is already the mean of all winners, so what we don't currently have is the closestDistance. So, I could add that instead.

EDIT: Question was answered via Slack

Micha  3:31 PM
Do you mean closestDistance instead of numberOfNeighborsInWinningGroup (name tbd)?

etienne  3:32 PM
No, the numberOfNeighborsInWinningGroup you’ll get in any case

Micha  3:32 PM
Or do you mean all three:
winningPercentage
winningDistance = average of all winners
closestDistance

3:32
Ah ok perfect

etienne  3:33 PM
I investigated the current implementation and realized that we already have winningDistance == average of all winners
3:33
exactly, so basically I’m proposing, since we already have winningDistance, we’ll add winningPercentage and closestDistance. Will that help or are we missing anything then?

Micha  3:34 PM
That would help greatly!
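
For illustration, closestDistance could be computed along these lines. The thread does not say whether it is measured among the winning neighbors only or among all k neighbors; this sketch assumes the winning group, and the function name and input shape are hypothetical.

```python
def closest_distance(neighbors, winner):
    """Smallest distance among the neighbors carrying the winning label.
    Assumption: 'closest' refers to the winning group, not to all k neighbors."""
    return min(d for label, d in neighbors if label == winner)

# Same illustrative (label, distance) layout: four of seven neighbors are "A"
neighbors = [("A", 0.10), ("A", 0.20), ("B", 0.15), ("A", 0.30),
             ("B", 0.25), ("C", 0.40), ("A", 0.20)]
closest = closest_distance(neighbors, "A")
print(closest)
```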

@stale

stale bot commented Nov 16, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the "autoclosed" label (label description: "Closed by the bot. We still want this, but it didn't quite make the latest prioritization round") Nov 16, 2020
@michaverhagen

This one is still relevant

@stale stale bot removed the "autoclosed" label (label description: "Closed by the bot. We still want this, but it didn't quite make the latest prioritization round") Nov 16, 2020
@etiennedi
Member Author

prioritized for delivery before EOW (November 27).

@etiennedi etiennedi self-assigned this Nov 25, 2020
etiennedi added a commit that referenced this issue Nov 26, 2020
currently only happening at a persistence level, not yet implemented in
the APIs
etiennedi added a commit that referenced this issue Nov 26, 2020
antas-marcin added a commit that referenced this issue Nov 27, 2020
…n-new-distances

classification new distances