RandomForestClassifier

Random Forests

In random forests (see the RandomForestClassifier and RandomForestRegressor classes), each tree in the ensemble is built from a sample drawn with replacement (i.e., a bootstrap sample) from the training set. In addition, when splitting a node during the construction of a tree, the split that is chosen is no longer the best split among all features. Instead, it is the best split among a random subset of the features. As a result of this randomness, the bias of the forest usually increases slightly (with respect to the bias of a single non-random tree) but, due to averaging, its variance also decreases, usually more than compensating for the increase in bias and hence yielding an overall better model.

In contrast to the original publication [B2001], the scikit-learn implementation combines classifiers by averaging their probabilistic predictions, instead of letting each classifier vote for a single class.

Extremely Randomized Trees

In extremely randomized trees (see the ExtraTreesClassifier and ExtraTreesRegressor classes), randomness goes one step further in the way splits are computed. As in random forests, a random subset of candidate features is used, but instead of looking for the most discriminative threshold, a threshold is drawn at random for each candidate feature, and the best of these randomly generated thresholds is picked as the splitting rule. This usually reduces the variance of the model a bit more, at the expense of a slightly greater increase in bias:

>>> from sklearn.model_selection import cross_val_score
>>> from sklearn.datasets import make_blobs
>>> from sklearn.ensemble import RandomForestClassifier
>>> from sklearn.ensemble import ExtraTreesClassifier
>>> from sklearn.tree import DecisionTreeClassifier

>>> X, y = make_blobs(n_samples=10000, n_features=10, centers=100,
...     random_state=0)

>>> clf = DecisionTreeClassifier(max_depth=None, min_samples_split=2,
...     random_state=0)
>>> scores = cross_val_score(clf, X, y)
>>> scores.mean()
0.97...

>>> clf = RandomForestClassifier(n_estimators=10, max_depth=None,
...     min_samples_split=2, random_state=0)
>>> scores = cross_val_score(clf, X, y)
>>> scores.mean()
0.999...

>>> clf = ExtraTreesClassifier(n_estimators=10, max_depth=None,
...     min_samples_split=2, random_state=0)
>>> scores = cross_val_score(clf, X, y)
>>> scores.mean() > 0.999
True
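The splitting rule described for extremely randomized trees can be illustrated with a small NumPy sketch. It is not scikit-learn's internal code: the helper names gini and random_threshold_split, the use of Gini impurity as the split criterion, and the toy data are all assumptions made for illustration. For each feature in a random candidate subset, a single threshold is drawn uniformly between that feature's minimum and maximum, and the candidate with the lowest weighted child impurity is kept.

```python
import numpy as np


def gini(labels):
    """Gini impurity of a 1-D array of class labels."""
    if labels.size == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / labels.size
    return 1.0 - np.sum(p ** 2)


def random_threshold_split(X, y, max_features, rng):
    """Choose a split from one random threshold per candidate feature.

    Hypothetical helper for illustration only.
    Returns (feature_index, threshold, weighted_child_impurity).
    """
    n_samples, n_features = X.shape
    # Random subset of candidate features, as in random forests.
    candidates = rng.choice(n_features, size=max_features, replace=False)
    best = None
    for j in candidates:
        lo, hi = X[:, j].min(), X[:, j].max()
        if lo == hi:
            continue  # constant feature, no split possible
        # The threshold is drawn at random instead of being optimized.
        t = rng.uniform(lo, hi)
        left = X[:, j] <= t
        n_left = left.sum()
        score = (n_left * gini(y[left])
                 + (n_samples - n_left) * gini(y[~left])) / n_samples
        # Keep the best of the randomly generated candidates.
        if best is None or score < best[2]:
            best = (int(j), float(t), float(score))
    return best


rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 5))
y_demo = (X_demo[:, 2] > 0).astype(int)  # toy labels driven by feature 2
feature, threshold, impurity = random_threshold_split(
    X_demo, y_demo, max_features=3, rng=rng
)
print(feature, round(threshold, 3), round(impurity, 3))
```

Because only one threshold per feature is evaluated, the chosen split is generally worse than the optimized split a random forest would find on the same candidates, which is where the additional bias comes from.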

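Returning to the combination rule described for random forests (averaging probabilistic predictions rather than letting each tree vote for a single class), the following sketch compares the two on a fitted forest. The small dataset and the variable names are assumptions chosen for illustration; the hard-voting branch is written by hand here to show the alternative and is not something the estimator exposes.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.ensemble import RandomForestClassifier

X_small, y_small = make_blobs(n_samples=500, n_features=10, centers=5,
                              random_state=0)
forest = RandomForestClassifier(n_estimators=10, random_state=0)
forest.fit(X_small, y_small)

# Averaging: mean of the per-tree class probabilities.
tree_probas = np.stack(
    [tree.predict_proba(X_small) for tree in forest.estimators_]
)  # shape (n_trees, n_samples, n_classes)
mean_proba = tree_probas.mean(axis=0)
soft_pred = forest.classes_[mean_proba.argmax(axis=1)]

# Voting: every tree casts one vote for its single most likely class.
votes = tree_probas.argmax(axis=2)  # shape (n_trees, n_samples)
n_classes = len(forest.classes_)
hard_pred = forest.classes_[
    np.array([np.bincount(col, minlength=n_classes).argmax()
              for col in votes.T])
]

# The averaged probabilities should match the forest's own output.
print(np.allclose(mean_proba, forest.predict_proba(X_small)))
print(np.array_equal(soft_pred, forest.predict(X_small)))
# Fraction of samples on which averaging and voting agree.
print((soft_pred == hard_pred).mean())
```

The two rules can disagree when a few confident trees outweigh a larger number of uncertain ones, since averaging takes each tree's confidence into account while voting does not.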