Information Gain
Criteria to measure the impurity of a node I(node):
1. Variance (Regression) [Variance reduction of a node N is defined as the total reduction of the variance of the target variable x due to the split at this node]
2. Gini impurity (Classification) [Measure of impurity. Gini impurity is a measure of how often a randomly chosen element from the set would be incorrectly labeled if it was randomly labeled according to the distribution of labels in the subset]
3. Entropy (Classification) [Measure of impurity/disorder. Information entropy is the average rate at which information is produced by a stochastic source of data]
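The three criteria can be computed directly; below is a minimal sketch (toy node values and helper names are assumptions for illustration, not from this repository):

```python
# Minimal sketch of the three impurity measures for a single node.
import numpy as np

def node_variance(y):
    # Regression: variance of the target values falling in the node.
    return np.var(y)

def gini_impurity(labels):
    # Classification: probability that a random element would be mislabeled if
    # labeled according to the node's label distribution: 1 - sum_k p_k^2.
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def entropy(labels):
    # Classification: average information of the label distribution in bits:
    # -sum_k p_k * log2(p_k).
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

labels = ["a", "a", "a", "b"]               # toy node: 3 of class 'a', 1 of class 'b'
print(gini_impurity(labels))                # 1 - (0.75**2 + 0.25**2) = 0.375
print(entropy(labels))                      # ~0.811 bits
print(node_variance([1.0, 2.0, 2.0, 3.0]))  # 0.5
```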
Note
- Most of the time, the Gini index and entropy lead to the same results.
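As a hedged illustration of this note (the dataset, depth, and cross-validation setup are assumptions chosen for the example), the two criteria can be compared directly in scikit-learn:

```python
# Compare the 'gini' and 'entropy' split criteria on a small dataset.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

for criterion in ("gini", "entropy"):
    tree = DecisionTreeClassifier(criterion=criterion, max_depth=3, random_state=0)
    print(criterion, cross_val_score(tree, X, y, cv=5).mean())
# The two criteria usually pick very similar splits, so the scores tend to be
# close; entropy is marginally slower because of the log computation.
```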
Feature Importance
- In sklearn:
- how much the tree nodes use a particular feature (weighted average) to reduce impurity
- accessed using the attribute feature_importances_
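A minimal sketch of reading that attribute (the dataset and tree depth are assumptions for illustration):

```python
# Read feature importances from a fitted scikit-learn decision tree.
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

data = load_breast_cancer()
tree = DecisionTreeClassifier(max_depth=4, random_state=0)
tree.fit(data.data, data.target)

# feature_importances_ gives, per feature, the normalized total impurity
# reduction contributed by the splits using that feature (values sum to 1).
ranked = sorted(zip(data.feature_names, tree.feature_importances_),
                key=lambda pair: pair[1], reverse=True)
for name, importance in ranked[:5]:
    print(f"{name}: {importance:.3f}")
```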
Part 4 BOOSTING
- Boosting refers to an ensemble method in which several models are trained sequentially with each model learning from the errors of its predecessors.
- Boosting: Ensemble method combining several weak learners to form a strong learner.
- Weak learner: Model doing slightly better than random guessing.
- Example of weak learner: Decision stump (CART whose maximum depth is 1).
- Train an ensemble of predictors sequentially.
- Each predictor tries to correct its predecessor.
- Most popular boosting methods: AdaBoost, Gradient Boosting.
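To make the weak-learner vs. strong-learner idea concrete, here is a hedged sketch (synthetic data and hyperparameters are assumptions) comparing a single decision stump with AdaBoost and Gradient Boosting ensembles built from shallow trees:

```python
# A single stump is a weak learner; AdaBoost and Gradient Boosting combine
# many such weak learners, trained sequentially, into a stronger model.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_informative=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "single stump (weak learner)": DecisionTreeClassifier(max_depth=1),
    # In scikit-learn versions before 1.2 the parameter is named base_estimator.
    "AdaBoost of stumps": AdaBoostClassifier(
        estimator=DecisionTreeClassifier(max_depth=1), n_estimators=200, random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(n_estimators=200, random_state=0),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, round(model.score(X_test, y_test), 3))
```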
AdaBoost
- Stands for Adaptive Boosting.
- Each predictor pays more attention to the instances wrongly predicted by its predecessor.
- Achieved by changing the weights of training instances.
- Each predictor is assigned a coefficient α.
- α depends on the predictor's training error.
- Learning rate: 0 < η ≤ 1. It shrinks the coefficient α, so there is a tradeoff between η and the number of estimators.
- A smaller η should be compensated by a larger number of estimators (see the sketch below).
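A minimal sketch of this tradeoff (the toy data and grid values are assumptions; in scikit-learn the base-learner argument is `estimator`, named `base_estimator` before version 1.2):

```python
# A smaller learning_rate (η) shrinks each predictor's coefficient α, so more
# estimators are typically needed to reach comparable accuracy.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)

for eta, n_estimators in [(1.0, 50), (0.1, 50), (0.1, 500)]:
    ada = AdaBoostClassifier(
        estimator=DecisionTreeClassifier(max_depth=1),  # decision stump
        n_estimators=n_estimators,
        learning_rate=eta,
        random_state=0,
    )
    score = cross_val_score(ada, X, y, cv=5).mean()
    print(f"eta={eta}, n_estimators={n_estimators}: score={score:.3f}")
```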