# Scikit-Learn Estimators

## Table of Contents

- [Regression](#regression)
  - [Distance Based](#distance-based)
    - [Linear Regression](#linear-regression)
  - [Tree Based](#tree-based)
    - [Decision Tree Regressor](#decision-tree-regressor)
    - [Bagging Regressor](#bagging-regressor)
    - [Voting Regressor](#voting-regressor)
    - [Random Forest Regressor](#random-forest-regressor)
    - [Extra Tree Regressor](#extra-tree-regressor)
    - [AdaBoost Regressor](#adaboost-regressor)
- [Classification](#classification)
  - [Tree Based](#tree-based)
    - [Decision Tree Classifier](#decision-tree-classifier)
    - [Bagging Classifier](#bagging-classifier)
    - [Voting Classifier](#voting-classifier)
    - [Random Forest Classifier](#random-forest-classifier)
    - [Extra Tree Classifier](#extra-tree-classifier)
    - [AdaBoost Classifier](#adaboost-classifier)

# Regression

## Distance Based

### Linear Regression

In [None]:
from sklearn.linear_model import LinearRegression

LinearRegression(
    fit_intercept=True, 
    copy_X=True, 
    n_jobs=-1, 
    positive=False
)

### Preprocessing

* Requires standardization (subtract mean and divide by standard deviation). Based on Kutner, M. H., Nachtsheim, C., Neter, J. Applied linear statistical models.

* Multicollinearity: when predictor variables are correlated, the regression coefficient of any one variable depends on which other predictors are included in the model and which are left out. The usual interpretation of a coefficient, as the change in the expected value of the response when that predictor increases by one unit while all other predictors are held constant, is then not fully applicable.

* Sensitive to outliers and influential observations:

    * Outlying target values (studentized deleted residuals, studentized residuals)
  
    * Outlying X observations (hat matrix, leverage values)

    * Influential cases (DFFITS, Cook's Distance, DFBETAS)
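
The diagnostics listed above can be computed directly from the hat matrix. The following is a minimal NumPy sketch on synthetic data (the data, the planted outlier in row 0, and all variable names are assumptions made for illustration); it computes leverage values and Cook's distance using the standard textbook formulas.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
X = rng.normal(size=(n, 2))
y = 1.0 + X @ np.array([2.0, -1.0]) + rng.normal(scale=0.5, size=n)

# Plant one observation that is outlying in both X and y (illustrative only)
X[0] = [6.0, 6.0]
y[0] = 30.0

# Design matrix with an intercept column; k = number of fitted coefficients
A = np.column_stack([np.ones(n), X])
k = A.shape[1]

# Hat matrix H = A (A'A)^-1 A'; its diagonal holds the leverage values
H = A @ np.linalg.solve(A.T @ A, A.T)
leverage = np.diag(H)

# Ordinary residuals and mean squared error of the least-squares fit
residuals = y - H @ y
mse = residuals @ residuals / (n - k)

# Cook's distance: D_i = e_i^2 / (k * MSE) * h_ii / (1 - h_ii)^2
cooks_d = residuals**2 / (k * mse) * leverage / (1.0 - leverage) ** 2

print("highest leverage at row", int(np.argmax(leverage)))
print("largest Cook's distance at row", int(np.argmax(cooks_d)))
```

Rows with unusually large leverage or Cook's distance are candidates for closer inspection before fitting `LinearRegression`.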
  

## Tree Based

### Decision Tree Regressor

In [None]:
from sklearn.tree import DecisionTreeRegressor

DecisionTreeRegressor(
    criterion='squared_error', 
    splitter='best', 
    max_depth=None, 
    min_samples_split=2, 
    min_samples_leaf=1, 
    min_weight_fraction_leaf=0.0, 
    # A good value is m = sqrt(p) based on ISL book
    max_features=None, 
    random_state=None, 
    max_leaf_nodes=None, 
    min_impurity_decrease=0.0, 
    ccp_alpha=0.0
)

### Preprocessing

* No need to standardize.

* High variance (tend to overfit the training data).

### Hyperparameter Tuning

* min_samples_leaf

* min_samples_split

* max_depth 

* max_features 

In [None]:
parameter = [
    {'max_features': ['sqrt', 'log2'], 'ccp_alpha': [0.0, 0.25, 0.5, 0.75], 'min_samples_split': [10, 100, 200]}
]
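
A grid like the one above is typically passed to `GridSearchCV`. The sketch below assumes a synthetic regression dataset, 5-fold cross-validation, and negative mean squared error as the score; none of these choices come from the original notes.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

# Synthetic data, used only to make the example runnable
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)

parameter = [
    {'max_features': ['sqrt', 'log2'],
     'ccp_alpha': [0.0, 0.25, 0.5, 0.75],
     'min_samples_split': [10, 100, 200]}
]

# Every combination in the grid is evaluated with 5-fold cross-validation
search = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_grid=parameter,
    scoring='neg_mean_squared_error',
    cv=5,
)
search.fit(X, y)

print(search.best_params_)
```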

### Bagging Regressor

In [None]:
from sklearn.ensemble import BaggingRegressor

BaggingRegressor(
    # If None, then the base estimator is a DecisionTreeRegressor (the parameter is named `estimator` from scikit-learn 1.2 onward)
    base_estimator=None, 
    # Number of base estimators in the ensemble
    n_estimators=10,
    max_samples=1.0, 
    max_features=1.0,
    # Bootstrap samples with replacement
    bootstrap=True, 
    bootstrap_features=False,
    # Use out-of-bag samples to estimate the generalization error
    oob_score=True, 
    warm_start=False,
    random_state=None,
    n_jobs=-1,
    verbose=0
)

### Preprocessing

* No need to standardize.

* Reduce variance of individual weak learners.
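
With `bootstrap=True` and `oob_score=True`, each sample is scored only by the trees that did not see it during training, which gives an estimate of generalization performance without a separate validation set. A minimal sketch on synthetic data (the dataset and the choice of 100 estimators are assumptions):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor

# Synthetic data, used only to make the example runnable
X, y = make_regression(n_samples=400, n_features=8, noise=10.0, random_state=0)

# Default base estimator is a DecisionTreeRegressor
bag = BaggingRegressor(n_estimators=100, bootstrap=True, oob_score=True, random_state=0)
bag.fit(X, y)

# R^2 computed from out-of-bag predictions
print(bag.oob_score_)
```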

### Voting Regressor

In [None]:
from sklearn.ensemble import VotingRegressor

VotingRegressor(
    # List of (str, estimator) tuples
    estimators=[('lr', LinearRegression()), ('dt', DecisionTreeRegressor())],
    weights=None, 
    n_jobs=-1, 
    verbose=False
)

### Requirement

* Generally used towards the end of a project, once there are a few strong candidate models. A voting regressor is an ensemble meta-estimator that fits several base regressors, each on the whole dataset, and then averages their individual predictions to form the final prediction.
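
The averaging step can be checked by hand. The sketch below, on assumed synthetic data, fits a `VotingRegressor` with equal weights and compares its output to the plain mean of the fitted base regressors' predictions.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import VotingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

# Synthetic data, used only to make the example runnable
X, y = make_regression(n_samples=200, n_features=5, noise=5.0, random_state=0)

voter = VotingRegressor(
    estimators=[('lr', LinearRegression()), ('dt', DecisionTreeRegressor(random_state=0))]
)
voter.fit(X, y)

# Average the predictions of the fitted clones stored in estimators_
individual = np.column_stack([est.predict(X) for est in voter.estimators_])
manual_average = individual.mean(axis=1)

print(np.allclose(voter.predict(X), manual_average))
```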

### Random Forest Regressor

In [None]:
from sklearn.ensemble import RandomForestRegressor

RandomForestRegressor(
    n_estimators=100, 
    criterion='squared_error', 
    max_depth=None, 
    min_samples_split=2, 
    min_samples_leaf=1, 
    min_weight_fraction_leaf=0.0, 
    max_features=1.0, 
    max_leaf_nodes=None, 
    min_impurity_decrease=0.0, 
    bootstrap=True, 
    # Set to False by default
    oob_score=True, 
    # Change to -1 to use all cores
    n_jobs=-1, 
    # Controls both the bootstrapping of the samples used to build each tree and the random subset of features considered when searching for the best split
    random_state=None, 
    verbose=0, 
    warm_start=False, 
    ccp_alpha=0.0, 
    max_samples=None
)

# Roughly equivalent to the following
BaggingRegressor(
    DecisionTreeRegressor(criterion='squared_error', splitter='best', max_features=1.0),
    n_estimators=100, 
    random_state=None
)

### Preprocessing

* No need to standardize.

* Reduces the variance of the individual trees, trading a slightly higher bias (risk of underfitting) for a lower variance (less overfitting).

* Differs from bagging by introducing more randomness and diversity such that each split only considers a subset of the predictors or features, which can be controlled via `max_features`. Instead of searching for the very best feature among all features when splitting a node, it searches for the best feature among a random subset of features.
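
The per-split feature subsampling can be inspected on a fitted forest. In this sketch (synthetic data with 16 features is an assumption), `max_features='sqrt'` is passed down to every tree, which still searches for the best split but only among the sampled features.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic data with 16 features, used only to make the example runnable
X, y = make_regression(n_samples=300, n_features=16, noise=10.0, random_state=0)

forest = RandomForestRegressor(n_estimators=50, max_features='sqrt', random_state=0)
forest.fit(X, y)

first_tree = forest.estimators_[0]
# Split strategy used by each tree in the forest
print(first_tree.splitter)
# Number of features each tree considers at every split
print(first_tree.max_features_)
```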

### Hyperparameter Tuning

* `RandomForestRegressor` has most of the hyperparameters of a `DecisionTreeRegressor` (to control how trees are grown) and all the hyperparameters of a `BaggingRegressor` to control the ensemble itself.

In [None]:
parameter = [
    {'n_estimators': [100, 300, 500],
     'max_features': ['sqrt', 'log2'], 
     'ccp_alpha': [0.0, 0.25, 0.5, 0.75], 
     'min_samples_split': [10, 100, 200], 
     'max_samples': [0.5, 0.75, 1.0]}
]

### Extra Tree Regressor

In [None]:
from sklearn.ensemble import ExtraTreesRegressor

ExtraTreesRegressor(
    n_estimators=100, 
    criterion='squared_error', 
    max_depth=None, 
    min_samples_split=2, 
    min_samples_leaf=1, 
    min_weight_fraction_leaf=0.0, 
    max_features=1.0, 
    max_leaf_nodes=None, 
    min_impurity_decrease=0.0, 
    # Changed default values from False to True
    bootstrap=True, 
    oob_score=True, 
    n_jobs=-1, 
    random_state=None, 
    verbose=0, 
    warm_start=False, 
    ccp_alpha=0.0, 
    max_samples=None
)

### Preprocessing

* No need to standardize.

* Even more randomness and diversity than `RandomForestRegressor` by using random thresholds for each feature rather than searching for the best possible thresholds at each split.
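
The difference in split strategy shows up in the fitted base trees. The sketch below, on assumed synthetic data, fits both ensembles and prints the type and `splitter` setting of their first tree.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor, RandomForestRegressor

# Synthetic data, used only to make the example runnable
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)

extra = ExtraTreesRegressor(n_estimators=20, random_state=0).fit(X, y)
forest = RandomForestRegressor(n_estimators=20, random_state=0).fit(X, y)

# Extra-trees draw split thresholds at random; random forests search for the best one
print(type(extra.estimators_[0]).__name__, extra.estimators_[0].splitter)
print(type(forest.estimators_[0]).__name__, forest.estimators_[0].splitter)
```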

### Hyperparameter Tuning

In [None]:
parameter = [
    {'n_estimators': [100, 300, 500],
     'max_features': ['sqrt', 'log2'], 
     'ccp_alpha': [0.0, 0.25, 0.5, 0.75], 
     'min_samples_split': [10, 100, 200], 
     'max_samples': [0.5, 0.75, 1.0]}
]

### AdaBoost Regressor

In [None]:
from sklearn.ensemble import AdaBoostRegressor

AdaBoostRegressor(
    # Base estimator from which the boosted ensemble is built (the parameter is named `estimator` from scikit-learn 1.2 onward)
    # If None, then the base estimator is a DecisionTreeRegressor initialized with max_depth=3
    base_estimator=None, 
    # In case of perfect fit, the learning procedure is stopped early
    n_estimators=500, 
    # A higher learning rate increases the contribution of each weak learner (trades off against n_estimators)
    learning_rate=1.0, 
    loss='linear', 
    random_state=None
)

### Preprocessing

* Depending on the base estimator used, inputs may need to be standardized.

### Hyperparameter Tuning

* base_estimator

* n_estimators (controls the trade-off between underfitting (high bias) and overfitting (high variance))

* learning_rate (how strongly the worst weak learners are penalized and the stronger learners are rewarded)

* loss (the loss function to use when updating the weights after each boosting iteration)

In [None]:
from sklearn.svm import SVR, LinearSVR

parameter = [
    {'base_estimator': [SVR(kernel='rbf'), LinearSVR()],
     'n_estimators': [100, 500, 1000],
     'loss': ['linear', 'square', 'exponential'],
     'learning_rate': [1, 5, 10]}
]
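
One way to see how `n_estimators` affects the fit is `staged_predict`, which yields the ensemble prediction after each boosting round. The sketch below uses the default base estimator on assumed synthetic data and records the held-out error at every stage.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import AdaBoostRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data, used only to make the example runnable
X, y = make_regression(n_samples=400, n_features=8, noise=15.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

booster = AdaBoostRegressor(n_estimators=100, learning_rate=1.0, loss='linear', random_state=0)
booster.fit(X_train, y_train)

# Held-out error after 1, 2, ..., n fitted estimators
test_errors = [
    mean_squared_error(y_test, y_pred) for y_pred in booster.staged_predict(X_test)
]
best_n = test_errors.index(min(test_errors)) + 1

print("estimators fitted:", len(booster.estimators_))
print("lowest held-out MSE after", best_n, "estimators")
```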

# Classification

## Tree Based

### Decision Tree Classifier

In [None]:
from sklearn.tree import DecisionTreeClassifier

DecisionTreeClassifier(
    criterion='gini', 
    splitter='best', 
    # Hyperparameter
    max_depth=None, 
    # Hyperparameter
    min_samples_split=2, 
    # Hyperparameter
    min_samples_leaf=1, 
    min_weight_fraction_leaf=0.0,
    # A good value is m = sqrt(p) based on ISL book
    max_features=None, 
    random_state=None, 
    max_leaf_nodes=None, 
    min_impurity_decrease=0.0, 
    class_weight=None, 
    # Hyperparameter
    ccp_alpha=0.0
)

### Preprocessing

* No need to standardize.

* High variance (tend to overfit the training data).
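
The tendency to overfit is easy to observe by comparing an unrestricted tree with a regularized one. This sketch assumes synthetic data with 10% label noise; the specific `max_depth` and `min_samples_leaf` values are illustrative, not recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with noisy labels, used only to make the example runnable
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=4, flip_y=0.1, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Unrestricted tree: grows until every training sample is classified correctly
full_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Regularized tree: depth and leaf size are limited
small_tree = DecisionTreeClassifier(
    max_depth=3, min_samples_leaf=10, random_state=0
).fit(X_train, y_train)

for name, tree in [('full', full_tree), ('regularized', small_tree)]:
    print(name, tree.get_depth(), tree.score(X_train, y_train), tree.score(X_test, y_test))
```

A large gap between training and test accuracy for the unrestricted tree is the symptom the hyperparameters below are meant to control.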

### Hyperparameter Tuning

* min_samples_leaf (Regularizer)

* min_samples_split (Regularizer)

* max_depth (Pruning)

* max_features (Pruning)

In [None]:
parameter = [
    {'max_features': ['sqrt', 'log2'], 'ccp_alpha': [0.0, 0.25, 0.5, 0.75], 'min_samples_split': [10, 100, 200]}
]
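
Instead of fixed `ccp_alpha` values, candidates can be derived from the data with `cost_complexity_pruning_path`. The following is a sketch under assumptions: synthetic data, roughly ten candidates taken from the path, and 5-fold cross-validation.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic data, used only to make the example runnable
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=4, flip_y=0.1, random_state=0
)

# Effective alphas at which subtrees are pruned away during cost-complexity pruning
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X, y)

# ccp_alpha must be non-negative, so clip defensively, then thin out the candidates
ccp_alphas = np.clip(path.ccp_alphas, 0.0, None)
step = max(1, len(ccp_alphas) // 10)
candidates = np.unique(ccp_alphas[::step])

search = GridSearchCV(
    DecisionTreeClassifier(random_state=0), {'ccp_alpha': candidates}, cv=5
)
search.fit(X, y)

print(search.best_params_)
```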

### Bagging Classifier

In [None]:
from sklearn.ensemble import BaggingClassifier

BaggingClassifier(
    # If None, then the base estimator is a DecisionTreeClassifier (the parameter is named `estimator` from scikit-learn 1.2 onward)
    base_estimator=None, 
    # Number of base estimators in the ensemble
    n_estimators=10,
    max_samples=1.0, 
    max_features=1.0,
    # Bootstrap samples with replacement
    bootstrap=True, 
    bootstrap_features=False,
    # Use out-of-bag samples to estimate the generalization error
    oob_score=True, 
    warm_start=False,
    random_state=None,
    n_jobs=-1,
    verbose=0
)

### Preprocessing

* No need to standardize.

* Reduce variance of individual weak learners.
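
To compare a single tree with a bagged ensemble of trees, cross-validated accuracy can be computed for both. This sketch assumes synthetic data, 50 estimators, and 5 folds; the actual scores depend on the data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with noisy labels, used only to make the example runnable
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=4, flip_y=0.1, random_state=0
)

# A single unrestricted tree versus 50 bagged trees (default base estimator)
tree_scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5)
bag_scores = cross_val_score(BaggingClassifier(n_estimators=50, random_state=0), X, y, cv=5)

print("single tree:", tree_scores.mean(), tree_scores.std())
print("bagging:    ", bag_scores.mean(), bag_scores.std())
```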

### Voting Classifier

In [None]:
from sklearn.ensemble import VotingClassifier

VotingClassifier(
    estimators=[('dt', DecisionTreeClassifier())],
    # Hard or soft voting
    voting='hard', 
    weights=None, 
    n_jobs=-1, 
    # If voting='soft' and flatten_transform=True, transform method returns matrix with shape (n_samples, n_classifiers * n_classes)
    # If flatten_transform=False, it returns (n_classifiers, n_samples, n_classes)
    flatten_transform=True, 
    verbose=False
)

### Requirement

* If ‘hard’, uses the predicted class labels for majority-rule voting. If ‘soft’, predicts the class label based on the `argmax` of the sums of the predicted probabilities, which is recommended for an ensemble of well-calibrated classifiers. With soft voting, all classifiers must have a `predict_proba` method.
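
The two voting modes can be compared side by side. In the sketch below (the synthetic data and the three base classifiers are assumptions chosen because each provides `predict_proba`), the soft-voting prediction is reproduced by averaging the class probabilities and taking the `argmax`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic data, used only to make the example runnable
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

estimators = [
    ('lr', LogisticRegression(max_iter=1000)),
    ('nb', GaussianNB()),
    ('dt', DecisionTreeClassifier(max_depth=3, random_state=0)),
]

# Each VotingClassifier fits its own clones of the base classifiers
hard = VotingClassifier(estimators=estimators, voting='hard').fit(X, y)
soft = VotingClassifier(estimators=estimators, voting='soft').fit(X, y)

# Soft voting by hand: average the class probabilities, then take the argmax
avg_proba = np.mean([est.predict_proba(X) for est in soft.estimators_], axis=0)
manual_soft = soft.classes_[np.argmax(avg_proba, axis=1)]

print("hard and soft agree on", np.mean(hard.predict(X) == soft.predict(X)), "of samples")
print("manual soft vote matches:", np.array_equal(manual_soft, soft.predict(X)))
```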

### Random Forest Classifier

In [None]:
from sklearn.ensemble import RandomForestClassifier

RandomForestClassifier(
    n_estimators=100,
    criterion='gini', 
    max_depth=None, 
    min_samples_split=2, 
    min_samples_leaf=1, 
    min_weight_fraction_leaf=0.0, 
    max_features='sqrt', 
    max_leaf_nodes=None, 
    min_impurity_decrease=0.0, 
    bootstrap=True, 
    # Set to False by default
    oob_score=True, 
    n_jobs=-1, 
    random_state=None, 
    verbose=0, 
    warm_start=False, 
    class_weight=None, 
    ccp_alpha=0.0, 
    max_samples=None
)

# Roughly equivalent to the following
BaggingClassifier(
    DecisionTreeClassifier(criterion='gini', splitter='best', max_features='sqrt'),
    n_estimators=100, 
    random_state=None
)

### Preprocessing

* No need to standardize.

* Reduces the variance of the individual trees, trading a slightly higher bias (risk of underfitting) for a lower variance (less overfitting).

* Differs from bagging by introducing more randomness and diversity such that each split only considers a subset of the predictors or features, which can be controlled via `max_features`. Instead of searching for the very best feature among all features when splitting a node, it searches for the best feature among a random subset of features.

### Hyperparameter Tuning

* `RandomForestClassifier` has most of the hyperparameters of a `DecisionTreeClassifier` (to control how trees are grown) and all the hyperparameters of a `BaggingClassifier` to control the ensemble itself.

In [None]:
parameter = [
    {'n_estimators': [100, 300, 500],
     'max_features': ['sqrt', 'log2'], 
     'ccp_alpha': [0.0, 0.25, 0.5, 0.75], 
     'min_samples_split': [10, 100, 200], 
     'max_samples': [0.5, 0.75, 1.0]}
]

### Extra Tree Classifier

In [None]:
from sklearn.ensemble import ExtraTreesClassifier

ExtraTreesClassifier(
    n_estimators=100, 
    criterion='gini', 
    max_depth=None, 
    min_samples_split=2, 
    min_samples_leaf=1, 
    min_weight_fraction_leaf=0.0, 
    max_features='sqrt', 
    max_leaf_nodes=None, 
    min_impurity_decrease=0.0, 
    # Changed default values from False to True
    bootstrap=True, 
    oob_score=True, 
    n_jobs=-1, 
    random_state=None, 
    verbose=0, 
    warm_start=False, 
    class_weight=None, 
    ccp_alpha=0.0, 
    max_samples=None
)

### Preprocessing

* No need to standardize.

* Even more randomness and diversity than `RandomForestClassifier` by using random thresholds for each feature rather than searching for the best possible thresholds at each split.

### Hyperparameter Tuning

In [None]:
parameter = [
    {'n_estimators': [100, 300, 500],
     'max_features': ['sqrt', 'log2'], 
     'ccp_alpha': [0.0, 0.25, 0.5, 0.75], 
     'min_samples_split': [10, 100, 200], 
     'max_samples': [0.5, 0.75, 1.0]}
]

### AdaBoost Classifier

In [None]:
from sklearn.ensemble import AdaBoostClassifier

AdaBoostClassifier(
    # Base estimator where sample weighting is required, as well as proper classes_ and n_classes_ attributes (the parameter is named `estimator` from scikit-learn 1.2 onward)
    # If None, then the base estimator is a DecisionTreeClassifier initialized with max_depth=1
    base_estimator=None, 
    # In case of perfect fit, the learning procedure is stopped early
    n_estimators=500, 
    # A higher learning rate increases the contribution of each weak learner (trades off against n_estimators)
    learning_rate=1.0, 
    # 'SAMME.R' requires a base_estimator that can compute class probabilities; 'SAMME' is the discrete boosting algorithm
    algorithm='SAMME.R', 
    random_state=None
)

### Preprocessing

* Depending on the base estimator used, inputs may need to be standardized.

### Hyperparameter Tuning

* base_estimator

* n_estimators (controls the trade-off between underfitting (high bias) and overfitting (high variance))

* learning_rate (how strongly the worst weak learners are penalized and the stronger learners are rewarded)

In [None]:
from sklearn.svm import SVC, LinearSVC

parameter = [
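    {'base_estimator': [SVC(kernel='rbf'), DecisionTreeClassifier(max_depth=3), LinearSVC()],
     # SVC (with the default probability=False) and LinearSVC have no predict_proba, so use the discrete 'SAMME' algorithm
     'algorithm': ['SAMME'],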
     'n_estimators': [100, 500, 1000],
     'learning_rate': [1, 5, 10]}
]
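
The interplay between `learning_rate` and `n_estimators` can be explored with `staged_score`, which yields the accuracy after each boosting round. The sketch below uses the default base estimator and default algorithm on assumed synthetic data; the two learning rates are illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

# Synthetic data with noisy labels, used only to make the example runnable
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=4, flip_y=0.1, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

staged_accuracy = {}
n_fitted = {}
for lr in (0.1, 1.0):
    booster = AdaBoostClassifier(n_estimators=100, learning_rate=lr, random_state=0)
    booster.fit(X_train, y_train)
    # Held-out accuracy after each boosting round
    staged_accuracy[lr] = list(booster.staged_score(X_test, y_test))
    n_fitted[lr] = len(booster.estimators_)

for lr, scores in staged_accuracy.items():
    best_round = scores.index(max(scores)) + 1
    print(f"learning_rate={lr}: best held-out accuracy at round {best_round}")
```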