Data with noise class #265

Open
gusnunes opened this issue Sep 20, 2022 · 0 comments

Comments

@gusnunes

While clustering data generated by "RandomRBFGeneratorEvents", I noticed that when the stream contains noise, measures such as Purity are computed incorrectly. The cause is in MembershipMatrix: the "classmap" doesn't contain the key "-1" that should map the noise label to the last "workcluster" index. Instead, the noise label's key is the number of clusters, so it can be mapped to any "workcluster".
As a consequence, line 52 of the F1 measure has no effect: "mm.hasNoiseClass()" always returns false, so the number of classes is never reduced.
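
To make the expected mapping concrete, here is a minimal standalone sketch of it. This is not MOA's code: `ClassMapSketch`, `buildClassMap`, and `NOISE_LABEL` are hypothetical names used only for illustration, and I am assuming that real classes come first and noise takes the last index.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical illustration, not part of MOA.
class ClassMapSketch {
    static final int NOISE_LABEL = -1;

    // Maps each distinct label to a column index:
    // real classes first (in ascending order), noise last.
    static Map<Integer, Integer> buildClassMap(int[] labels) {
        TreeSet<Integer> distinct = new TreeSet<>();
        for (int label : labels) {
            distinct.add(label);
        }
        // remove() returns true only if the noise label was present.
        boolean hasNoise = distinct.remove(NOISE_LABEL);

        Map<Integer, Integer> classMap = new HashMap<>();
        int index = 0;
        for (int label : distinct) {
            classMap.put(label, index++);
        }
        if (hasNoise) {
            // The key stays -1, so a later lookup can detect noise.
            classMap.put(NOISE_LABEL, index);
        }
        return classMap;
    }

    // With the key -1 present, this check can actually return true.
    static boolean hasNoiseClass(Map<Integer, Integer> classMap) {
        return classMap.containsKey(NOISE_LABEL);
    }
}
```

With the labels `{2, -1, 0, 2, -1}`, this sketch maps class 0 to index 0, class 2 to index 1, and the noise label -1 to index 2, and `hasNoiseClass` returns true.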

For example, suppose a cluster contains 2 instances of a real class and 5 noise instances.
The current implementation reports a purity of 5/7 for that cluster, because the noise index is not skipped when "mm.getClusterClassWeight()" is read inside the for loop. The same thing happens when a cluster contains only noise instances, which is clearly wrong.
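
The arithmetic of this example can be reproduced with a small standalone sketch. It does not call MOA: `NoisePuritySketch` and `clusterPurity` are hypothetical names, and keeping the noise weight in the denominator while skipping it in the majority search is only one possible treatment, assumed here for illustration.

```java
// Hypothetical illustration, not part of MOA.
class NoisePuritySketch {

    // classWeights[j] is the weight of class j inside one cluster.
    // noiseIndex is the column holding noise, or -1 if there is none.
    static double clusterPurity(double[] classWeights, int noiseIndex, boolean skipNoise) {
        double total = 0.0;
        double best = 0.0;
        for (int j = 0; j < classWeights.length; j++) {
            total += classWeights[j];
            if (skipNoise && j == noiseIndex) {
                continue; // noise must never be picked as the majority class
            }
            best = Math.max(best, classWeights[j]);
        }
        return total == 0.0 ? 0.0 : best / total;
    }

    public static void main(String[] args) {
        double[] mixed = {2.0, 5.0};     // 2 real instances, 5 noise instances
        double[] onlyNoise = {0.0, 4.0}; // a cluster made of noise only
        System.out.println(clusterPurity(mixed, 1, false));    // noise counted as a class
        System.out.println(clusterPurity(mixed, 1, true));     // noise skipped
        System.out.println(clusterPurity(onlyNoise, 1, true)); // noise skipped
    }
}
```

Counting noise as a class gives 5/7 for the mixed cluster, which is the reported behavior. Skipping the noise column gives 2/7 for the mixed cluster and 0 for the noise-only cluster. Whether noise should also be dropped from the denominator is a separate design decision.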

@gusnunes gusnunes changed the title Data with noise label Data with noise class Sep 20, 2022