The accuracy of a Random Forest is dependent upon the accuracy of the individual trees as well as the diversity between them. This is an attempt to add an additional element of diversity through randomness between the trees without sacrificing their accuracy.

xgess/New-Random-Forest-Idea

Alex's Crazy Random Forest Idea
===============================

Random Forests typically promote diversity between trees by considering only a subset of the 
predictors at each node: M predictors out of the P available, with M less than P. M seems to 
range from about 0.3 P to about 0.6 P, but it is held constant across the entire forest. This 
implementation instead draws M from a distribution. 
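
To make the difference concrete, here is a minimal Python sketch (not the R code in this repo) 
contrasting a constant M with an M drawn fresh for each node. The function names and the choice 
of a uniform distribution over 0.3 P to 0.6 P are assumptions made for illustration; this README 
does not say which distribution the implementation actually uses.

```python
import random


def fixed_m(p, fraction=0.5):
    """Standard approach: the same M at every node (here a fixed fraction of P)."""
    return max(1, min(p, round(fraction * p)))


def drawn_m(p, rng, low=0.3, high=0.6):
    """Variant: draw a fresh M for each node.

    ASSUMPTION: M is a uniformly drawn fraction of P between `low` and `high`.
    The distribution is a placeholder chosen for this sketch only.
    """
    fraction = rng.uniform(low, high)
    return max(1, min(p, round(fraction * p)))


def candidate_predictors(p, m, rng):
    """Pick m distinct predictor indices (out of p) to consider at one node."""
    return sorted(rng.sample(range(p), m))


rng = random.Random(0)  # seeded so repeated runs give the same draws
p = 20
for node in range(3):
    m = drawn_m(p, rng)
    print(f"node {node}: M = {m}, candidates = {candidate_predictors(p, m, rng)}")
```

With a constant M every node sees the same number of candidate predictors; with a drawn M some 
nodes see more candidates and some see fewer.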


In an attempt to increase diversity without reducing the accuracy of the individual trees, 
this code draws a new M from a distribution at each node, before choosing the predictor 
variables to consider there. My thought is that this will allow some nodes in each tree to 
be much more accurate than others. Since the trees are grown out completely (i.e. not 
pruned), this might make a difference. Tests so far are inconclusive: this implementation 
was a little better on some data sets and a little worse on others. All the data and tests 
are in this repo. The professor of the class and I both agree it is worth exploring further 
with more data and a faster implementation, perhaps written in something like C and then 
hooked into R. If I wind up doing that, I'll add a link here. 
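
As a rough illustration of the idea, the following Python sketch grows a single unpruned 
classification tree and draws M anew at every node. It is not a port of randomForestM.R: the 
Gini criterion, the uniform range for M, the tuple-based tree layout, and every function name 
are assumptions made for this example, and a real forest would also bootstrap the rows and 
combine many such trees.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels, features):
    """Return the (feature, threshold) with the lowest weighted Gini, or None."""
    best, best_score = None, None
    for f in features:
        # Dropping the largest value keeps both sides of the split non-empty.
        for t in sorted({r[f] for r in rows})[:-1]:
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = len(left) * gini(left) + len(right) * gini(right)
            if best_score is None or score < best_score:
                best, best_score = (f, t), score
    return best


def grow(rows, labels, rng, low=0.3, high=0.6):
    """Grow an unpruned tree; labels must not be tuples (tuples mark inner nodes)."""
    if len(set(labels)) == 1:
        return labels[0]  # pure node becomes a leaf
    p = len(rows[0])
    # ASSUMPTION: M is redrawn at every node as a uniform fraction of P.
    m = max(1, min(p, round(rng.uniform(low, high) * p)))
    split = best_split(rows, labels, rng.sample(range(p), m))
    if split is None:
        # Simplification: if the sampled predictors are all constant here, stop
        # with a majority-vote leaf instead of sampling again.
        return Counter(labels).most_common(1)[0][0]
    f, t = split
    left = [i for i, r in enumerate(rows) if r[f] <= t]
    right = [i for i, r in enumerate(rows) if r[f] > t]
    return (f, t,
            grow([rows[i] for i in left], [labels[i] for i in left], rng, low, high),
            grow([rows[i] for i in right], [labels[i] for i in right], rng, low, high))


def predict(node, row):
    """Walk from the root to a leaf and return the leaf's label."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


# Tiny made-up data set: 7 rows, 4 predictors, 2 classes.
rows = [(i, (i * 3) % 7, (i * 5) % 7, -i) for i in range(7)]
labels = ["a" if i % 2 == 0 else "b" for i in range(7)]
tree = grow(rows, labels, random.Random(1))
print([predict(tree, r) for r in rows])
```

Because the tree is never pruned, it keeps splitting until each leaf is pure (or no sampled 
predictor can separate the remaining rows), so the per-node choice of M affects every level of 
the tree.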

The main code (building and traversing the trees) is in `randomForestM.R`, and the test scripts 
are in the files whose names begin with `overnight`. Those test scripts include comparisons 
with the standard random forest code. 
