The accuracy of a Random Forest is dependent upon the accuracy of the individual trees as well as the diversity between them. This is an attempt to add an additional element of diversity through randomness between the trees without sacrificing their accuracy.
xgess/New-Random-Forest-Idea
Alex's Crazy Random Forest Idea
===============================

Random Forests typically promote diversity between trees by considering only a subset of the predictors at each node: M of them, where M is less than the total number of predictors (P). M seems to range from about 0.3 P to about 0.6 P, but it is constant across the entire forest.

This implementation instead draws M from a distribution at each node, before choosing the predictor variables to consider there. The goal is to increase diversity without reducing the accuracy of the individual trees. My thought is that this will allow some nodes in each tree to be much more accurate than others. Since the trees are grown out completely (i.e. not pruned), this might make a difference.

Tests indicate nothing conclusive. This implementation was a little better for some data sets and a little worse for others. All the data and tests are in this repo. The professor of the class and I both agree it is worth exploring further with more data and a faster implementation, perhaps in something like C that can then be hooked into R. If I wind up doing that, I'll add a link here.

The main code (building and traversing the trees) is in randomForestM.R, and the test scripts are in the files whose names begin with overnight. Those test scripts include comparisons with the standard random forest code.
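
To make the per-node draw of M concrete, here is a minimal sketch in Python. It is not the code in randomForestM.R. The function names `draw_m` and `candidate_predictors` are hypothetical, and the uniform distribution over the integers between 0.3 P and 0.6 P is an assumption for illustration, since this README does not say which distribution is used.

```python
import random


def draw_m(p, low=0.3, high=0.6, rng=random):
    """Draw the number of predictors to consider at one node.

    Assumption: M is uniform on the integers between low * p and
    high * p. The actual distribution used in the repo is not stated.
    """
    lo = max(1, round(low * p))
    hi = min(p, max(lo, round(high * p)))
    return rng.randint(lo, hi)  # inclusive on both ends


def candidate_predictors(p, rng=random):
    """Pick the predictor indices to evaluate for a split at one node.

    A standard random forest would use the same M at every node;
    here M is redrawn on every call, i.e. once per node.
    """
    m = draw_m(p, rng=rng)
    # Sample m distinct predictor indices out of 0..p-1.
    return sorted(rng.sample(range(p), m))


if __name__ == "__main__":
    rng = random.Random(42)
    for node in range(5):
        print(node, candidate_predictors(10, rng=rng))
```

Each call stands in for one node of one tree: the number of candidate predictors changes from node to node, while a conventional forest would hold it fixed.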