Generate Synthetic Data in Clusters #66
Pull Request Test Coverage Report for Build 814 (Coveralls)
@yzhao062:
I think it is complaining about the coverage change. You will want to add some test cases in https://github.com/yzhao062/pyod/blob/master/pyod/test/test_data.py. You could take a look at that file first (mainly checking the generated data shape, the percentage of outliers, etc.) :)
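The checks suggested above (data shape and outlier percentage) could look roughly like the sketch below. It is a hypothetical NumPy helper; the name `check_generated_data`, its parameters, and the toy data standing in for the generator under test are assumptions for illustration, not part of pyod's existing `test_data.py`.

```python
import numpy as np


def check_generated_data(X, y, n_samples, n_features, contamination):
    """Hypothetical sanity checks on a synthetic (X, y) pair.

    Assumes X has one row per sample and y is a 0/1 label vector
    in which 1 marks an outlier.
    """
    # Shape checks: expected number of rows and features, one label per row.
    assert X.shape == (n_samples, n_features)
    assert y.shape == (n_samples,)
    # Labels must be strictly binary.
    assert set(np.unique(y)).issubset({0, 1})
    # The outlier count should match the requested contamination.
    expected_outliers = int(n_samples * contamination)
    assert int(y.sum()) == expected_outliers
    return True


# Toy data standing in for the generator under test (an assumption).
rng = np.random.default_rng(42)
n_samples, n_features, contamination = 200, 2, 0.1
n_outliers = int(n_samples * contamination)
X = np.vstack([
    rng.normal(0.0, 1.0, size=(n_samples - n_outliers, n_features)),
    rng.uniform(-8.0, 8.0, size=(n_outliers, n_features)),
])
y = np.concatenate([np.zeros(n_samples - n_outliers, dtype=int),
                    np.ones(n_outliers, dtype=int)])
check_generated_data(X, y, n_samples, n_features, contamination)
```

In a real test module the same assertions would simply run against the output of the new generation function instead of the toy arrays.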
@yzhao062:
I will look into this shortly and try to understand how this new function works. Do not worry about the code coverage; I will write a test if needed. If possible, could you give a short description of how this data generation algorithm works? It would be very helpful for the code review. Thanks a lot for the contribution.
Hi Yue (@yzhao062), it generates one or more clusters of data points with the same or different sizes and densities, depending on the parameters the user passes. As mentioned previously, having clusters of different sizes and densities makes outlier detection challenging, especially for algorithms based on k-nearest neighbors, such as
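A minimal sketch of the idea described above, assuming NumPy: inliers come from isotropic Gaussian clusters whose share of the data (`sizes`) and spread (`densities`, treated here as a per-cluster standard deviation) can differ, and outliers are drawn uniformly from a box around them. The function `make_clustered_outlier_data` and all of its parameters are hypothetical; this is not the implementation in this pull request.

```python
import numpy as np


def make_clustered_outlier_data(n_samples, n_features=2, sizes=(0.5, 0.3, 0.2),
                                densities=(0.5, 1.0, 2.0), contamination=0.1,
                                random_state=None):
    """Hypothetical sketch: Gaussian clusters of differing size/density plus uniform outliers.

    sizes: relative share of the inliers assigned to each cluster.
    densities: per-cluster standard deviation (assumption); smaller means denser.
    Returns X of shape (n_samples, n_features) and y, where 1 marks an outlier.
    """
    if len(sizes) != len(densities):
        raise ValueError("sizes and densities must have the same length")
    rng = np.random.default_rng(random_state)
    sizes = np.asarray(sizes, dtype=float)
    n_outliers = int(n_samples * contamination)
    n_inliers = n_samples - n_outliers

    # Split inliers across clusters; the rounding remainder goes to the last cluster.
    counts = np.floor(sizes / sizes.sum() * n_inliers).astype(int)
    counts[-1] += n_inliers - counts.sum()

    parts = []
    for count, std in zip(counts, densities):
        # Each cluster gets a random centre; std controls how tightly points pack.
        centre = rng.uniform(-10.0, 10.0, size=n_features)
        parts.append(rng.normal(centre, std, size=(count, n_features)))
    inliers = np.vstack(parts)

    # Outliers are drawn uniformly over a box slightly larger than all clusters.
    low = inliers.min(axis=0) - 2.0
    high = inliers.max(axis=0) + 2.0
    outliers = rng.uniform(low, high, size=(n_outliers, n_features))

    X = np.vstack([inliers, outliers])
    y = np.concatenate([np.zeros(n_inliers, dtype=int),
                        np.ones(n_outliers, dtype=int)])
    return X, y


X, y = make_clustered_outlier_data(500, random_state=0)
print(X.shape, y.sum())  # (500, 2) 50
```

Because nearest-neighbor distances inside the dense cluster are much smaller than inside the sparse one, a single global k-NN distance threshold tends either to flag inliers of the sparse cluster or to miss outliers lying near the dense cluster, which is the difficulty this kind of data is meant to expose.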
Opening a new pull request to avoid conflicts (15/04/2019).
All Submissions Basics:
#65
All Submissions Cores: