
Random Forest

Random Forest is an ensemble supervised learning algorithm capable of performing both regression and classification tasks. The Random Forest classifier builds a so-called "forest" made up of many individual decision trees.

How does Random Forest work?

The Random Forest algorithm uses bagging (bootstrap aggregating): the training set is sampled with replacement to create multiple bootstrap samples, and a separate decision tree is fit on each one. Each decision tree acts as a base learner, and combining their predictions is what makes this an ensemble method. The general idea behind ensemble methods is that a combination of learning models improves the overall result compared to any single model.

The Random Forest classifier creates a set of decision trees from randomly selected subsets of the training set. It then aggregates the votes from the different decision trees to decide the final class of a test object.
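The following is a minimal sketch of this bagging-and-voting idea written from scratch with scikit-learn decision trees as the base learners. The dataset, number of trees, and random seeds are illustrative assumptions, not values from the text.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Toy dataset purely for illustration.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

n_trees = 10
rng = np.random.default_rng(0)
trees = []

# Bagging: each tree is fit on a bootstrap sample drawn with replacement.
for _ in range(n_trees):
    idx = rng.integers(0, len(X), size=len(X))  # sampling with replacement
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=0)
    tree.fit(X[idx], y[idx])
    trees.append(tree)

# Majority vote across the base learners decides the final class.
votes = np.stack([t.predict(X) for t in trees]).astype(int)
ensemble_pred = np.apply_along_axis(
    lambda col: np.bincount(col).argmax(), axis=0, arr=votes
)
print("training accuracy of the majority vote:", (ensemble_pred == y).mean())
```

In practice you would not write this loop yourself; it is only meant to show that each base learner sees a different resampled view of the data before their votes are combined.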

Let’s take an example:

Suppose we have a training dataset [X1, X2, X3, … X10]. The Random Forest may create three decision trees, each taking its input from a bootstrap subset of this dataset.

Finally, it predicts the outcome based on the majority vote (in the case of classification) or the average (in the case of regression) of the predictions from the individual decision trees.
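In everyday use, this whole procedure is available off the shelf. Here is a short, hedged example using scikit-learn's RandomForestClassifier; the synthetic dataset and parameter values are assumptions made for the demo.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic data standing in for the schematic dataset [X1, ..., X10] above.
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# n_estimators is the number of decision trees whose votes are aggregated.
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```

For regression tasks, the analogous RandomForestRegressor averages the trees' numeric predictions instead of taking a majority vote.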

Advantages

  • Reduces overfitting compared to a single decision tree, because predictions from many different trees are averaged
  • The same algorithm can be used for both classification and regression
  • Provides feature importance scores that help identify the most influential features (see the sketch after this list)
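As a small illustration of the feature-importance point, scikit-learn exposes impurity-based importance scores on a fitted forest. The dataset and parameters here are again assumptions for the demo.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Illustrative dataset with a few genuinely informative features.
X, y = make_classification(n_samples=300, n_features=6, n_informative=3, random_state=1)

clf = RandomForestClassifier(n_estimators=100, random_state=1).fit(X, y)

# feature_importances_ holds impurity-based importance scores that sum to 1.
for i in np.argsort(clf.feature_importances_)[::-1]:
    print(f"feature {i}: importance {clf.feature_importances_[i]:.3f}")
```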

Disadvantages

  • Difficult to interpret, since the prediction is spread across many randomized trees
  • Computationally more expensive than a single decision tree