Update step_nearmiss.R #109
Fixed a few typos. Also, a quick question: does `step_nearmiss()` provide three different versions of near miss like this [link](https://machinelearningmastery.com/undersampling-algorithms-for-imbalanced-classification/)? Thanks!
Thank you for the PR! This step only does what that article calls NearMiss-1. If you would like to see the other methods, feel free to request them in an issue so I can keep track of them for when I get around to more {themis} development.
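For readers unfamiliar with the variant named above, here is a minimal sketch of the NearMiss-1 idea: keep the majority-class points whose mean distance to their `k` nearest minority-class points is smallest. This is not the {themis} implementation. The function name `nearmiss1`, the two-class setup, the default `k=3`, and undersampling down to the minority-class size are all assumptions made for illustration.

```python
import math


def nearmiss1(points, labels, minority, k=3):
    """Return the sorted indices kept by a NearMiss-1 style undersampling.

    Assumes two classes: every label other than `minority` is the majority.
    The majority class is cut down to the size of the minority class
    (an assumption of this sketch).
    """
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    majority_idx = [i for i, y in enumerate(labels) if y != minority]

    def score(i):
        # Mean distance from majority point i to its k nearest minority points.
        dists = sorted(math.dist(points[i], points[m]) for m in minority_idx)
        nearest = dists[:k]
        return sum(nearest) / len(nearest)

    # Keep the majority points that lie closest to the minority class.
    kept_majority = sorted(majority_idx, key=score)[: len(minority_idx)]
    return sorted(kept_majority + minority_idx)


# Toy data: four majority points and two minority points on a line.
points = [(0, 0), (1, 0), (4, 0), (9, 0), (10, 0), (11, 0)]
labels = ["rest", "rest", "rest", "rest", "circle", "circle"]
kept = nearmiss1(points, labels, minority="circle", k=2)
```

In this toy data the two majority points nearest the minority cluster are kept and the two far ones are dropped.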
Thank you. I still have two questions: 1) When using |
so if there are nominal variables, it will error.
Thank you. The Tomek link method is interesting. Based on the two plots shown, it seems this method does not downsample the majority class to the point where the classes are balanced. As the plot you attached shows, there are still far more majority cases (Rest dots) than minority cases (Circle dots). Could you help me understand this? I am confused about why the class ratio isn't 1 after using this method.
Also, another follow-up question if you don't mind. Don't we just keep all the minority class observations and remove the majority class only? Why do these two points get removed at the same time?
A Tomek link is a pair of observations that belong to different classes and are each other's nearest neighbors. This method then removes the whole link.
The method could have been modified to remove only the majority-class observation, but right now it follows the literature (as far as I can tell).
Tomek link removal is less about balancing the classes than about removing "troublesome" observations.
If you find any particular documentation unclear, please open an issue. :)
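To make the answer above concrete, here is a small sketch of Tomek link removal as described in this thread: find pairs of points from different classes that are each other's nearest neighbors, then drop both members of each pair. It is an illustration only, not the {themis} code. The function names, the Euclidean distance, and the toy data (labelled "rest" and "circle" to echo the plot) are assumptions.

```python
import math


def nearest_neighbor(i, points):
    """Index of the point closest to points[i], excluding i itself."""
    return min(
        (j for j in range(len(points)) if j != i),
        key=lambda j: math.dist(points[i], points[j]),
    )


def tomek_links(points, labels):
    """Pairs (i, j) with i < j that are mutual nearest neighbors with different labels."""
    nn = [nearest_neighbor(i, points) for i in range(len(points))]
    return [
        (i, j)
        for i, j in enumerate(nn)
        if i < j and nn[j] == i and labels[i] != labels[j]
    ]


def remove_tomek_links(points, labels):
    """Drop both members of every Tomek link, whatever their class."""
    drop = {k for pair in tomek_links(points, labels) for k in pair}
    keep = [k for k in range(len(points)) if k not in drop]
    return [points[k] for k in keep], [labels[k] for k in keep]


# Toy data: 4 "rest" (majority) and 3 "circle" (minority) points.
points = [(0.0, 0.0), (0.2, 0.1), (0.0, 0.4), (1.0, 1.0),
          (1.1, 1.0), (3.0, 3.0), (5.0, 5.0)]
labels = ["rest", "rest", "rest", "rest", "circle", "circle", "circle"]

links = tomek_links(points, labels)
new_points, new_labels = remove_tomek_links(points, labels)
```

Only the pair at `(1.0, 1.0)` and `(1.1, 1.0)` forms a link here, so one observation is removed from each class and the classes stay unbalanced (3 "rest" against 2 "circle"). This matches the point that the method cleans up overlapping observations and does not target a class ratio of 1.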
Thanks for your clarification. When you check this link, you actually put |
This pull request has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.