# Scikit-Learn Estimators

## Table of Contents

- [Regression](#regression)
  - [Linear Regression](#linear-regression)
  - [Decision Tree Regressor](#decision-tree-regressor)
  - [Bagging Regressor](#bagging-regressor)
  - [Voting Regressor](#voting-regressor)
- [Classification](#classification)
  - [Decision Tree Classifier](#decision-tree-classifier)
  - [Bagging Classifier](#bagging-classifier)
  - [Voting Classifier](#voting-classifier)

## Regression

### Linear Regression

In [None]:
from sklearn.linear_model import LinearRegression

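# The `normalize` parameter was deprecated and later removed from scikit-learn;
# scale features with StandardScaler in a Pipeline instead.
LinearRegression(
    fit_intercept=True,
    copy_X=True,
    n_jobs=None,
    positive=False
)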

### Preprocessing

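* Standardize the predictors (subtract the mean and divide by the standard deviation) when they differ widely in scale or when coefficients need to be compared across features. Ordinary least squares predictions do not change under this rescaling, but it reduces round-off error and makes coefficient magnitudes comparable. Based on Kutner, M. H., Nachtsheim, C., Neter, J., *Applied Linear Statistical Models*.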

* Multicollinearity: when predictor variables are correlated, the regression coefficient of any one variable depends on which other predictors are included in the model and which are left out. The usual interpretation of a coefficient, as the change in the expected response when that predictor increases by one unit while all other predictors are held constant, is then not fully applicable.

* Sensitive to outliers, influential observations:

    * Outlying target values (studentized deleted residuals, studentized residuals)
  
    * Outlying X observations (hat matrix, leverage values)

    * Influential cases (DFFITS, Cook's Distance, DFBETAS)
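The outlier diagnostics listed above can be sketched with NumPy on top of a fitted `LinearRegression`. This is a minimal illustration, not a replacement for a full diagnostics package: the synthetic data, the planted outlier at index 0, and all variable names are assumptions made for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (assumption for illustration): y = 3 + 1.5*x1 - 2*x2 + noise
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 3.0 + X @ np.array([1.5, -2.0]) + rng.normal(scale=0.5, size=50)

# Plant one observation that is outlying in both X and y
X[0] = [6.0, 6.0]
y[0] = -20.0

model = LinearRegression().fit(X, y)
residuals = y - model.predict(X)

# Design matrix with an intercept column; p counts the intercept too
X_design = np.column_stack([np.ones(len(X)), X])
n, p = X_design.shape

# Hat matrix H = X (X'X)^-1 X'; its diagonal holds the leverage values
hat = X_design @ np.linalg.inv(X_design.T @ X_design) @ X_design.T
leverage = np.diag(hat)

# Cook's distance: D_i = e_i^2 / (p * MSE) * h_ii / (1 - h_ii)^2
mse = residuals @ residuals / (n - p)
cooks_d = residuals**2 / (p * mse) * leverage / (1.0 - leverage) ** 2

print("highest leverage at index", int(np.argmax(leverage)))
print("largest Cook's distance at index", int(np.argmax(cooks_d)))
```

The leverage values sum to `p`, so a common rule of thumb flags observations whose leverage exceeds roughly `2 * p / n`.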
  

### Decision Tree Regressor

In [None]:
from sklearn.tree import DecisionTreeRegressor

DecisionTreeRegressor(
    criterion='squared_error', 
    splitter='best', 
    max_depth=None, 
    min_samples_split=2, 
    min_samples_leaf=1, 
    min_weight_fraction_leaf=0.0, 
    max_features=None, 
    random_state=None, 
    max_leaf_nodes=None, 
    min_impurity_decrease=0.0, 
    ccp_alpha=0.0
)

### Preprocessing

* No need to standardize.

* High variance (tend to overfit the training data).

### Hyperparameter Tuning

* min_samples_leaf (Regularizer)

* min_samples_split (Regularizer)

* max_depth (Pruning)

* max_features (Pruning)

In [None]:
parameter = [
    {'max_features': ['sqrt', 'log2'], 'ccp_alpha': [0.25, 0.5, 0.75], 'min_samples_split': [5, 6, 7, 8, 9, 10]}
]
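
One way to use a grid like the one above is to pass it to `GridSearchCV`. The sketch below assumes a synthetic dataset from `make_regression`, 5-fold cross-validation, and negative mean squared error as the score; none of these choices come from the original notes.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data (assumption for illustration)
X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=0)

parameter = [
    {
        'max_features': ['sqrt', 'log2'],
        'ccp_alpha': [0.25, 0.5, 0.75],
        'min_samples_split': [5, 6, 7, 8, 9, 10],
    }
]

# Every combination in the grid is evaluated with 5-fold cross-validation
search = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_grid=parameter,
    cv=5,
    scoring='neg_mean_squared_error',
)
search.fit(X, y)

print(search.best_params_)
print(len(search.cv_results_['params']), "candidate settings evaluated")
```

The grid expands to 2 × 3 × 6 = 36 candidates, so the search fits 180 trees plus one final refit on the full data.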

### Bagging Regressor

In [None]:
from sklearn.ensemble import BaggingRegressor

BaggingRegressor(
    # If None, then the base estimator is a DecisionTreeRegressor
    # (this parameter was named `base_estimator` before scikit-learn 1.2)
    estimator=None,
    # Number of base estimators in the ensemble
    n_estimators=10,
    max_samples=1.0, 
    max_features=1.0,
    # Bootstrap samples with replacement
    bootstrap=True, 
    bootstrap_features=False,
    # Use out-of-bag samples to estimate the generalization error
    oob_score=True, 
    warm_start=False,
    random_state=None,
    n_jobs=None,
    verbose=0
)

### Preprocessing

* No need to standardize.

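* Reduces variance by averaging the predictions of many high-variance base learners (such as deep decision trees), each fit on a bootstrap sample.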
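
With `oob_score=True`, each training sample is scored only by the trees that did not see it during fitting, which gives an estimate of generalization performance without a separate validation set. A minimal sketch, assuming synthetic data and 50 estimators (both chosen for the example):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor

# Synthetic regression data (assumption for illustration)
X, y = make_regression(n_samples=300, n_features=5, noise=15.0, random_state=0)

# The default base estimator is a DecisionTreeRegressor
bag = BaggingRegressor(n_estimators=50, oob_score=True, random_state=0)
bag.fit(X, y)

# For a regressor, oob_score_ is the R^2 of the out-of-bag predictions
print("OOB R^2:", bag.oob_score_)
print("OOB predictions shape:", bag.oob_prediction_.shape)
```

Using enough estimators matters here: with very few trees, some samples may never be out of bag and their OOB predictions are undefined.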

### Voting Regressor

In [None]:
from sklearn.ensemble import VotingRegressor

VotingRegressor(
    # List of (str, estimator) tuples
    estimators=[('lr', LinearRegression()), ('dt', DecisionTreeRegressor())],
    weights=None, 
    n_jobs=None, 
    verbose=False
)

### Requirement

* Generally carried out towards the end of a project, once there are a few strong candidate models. A voting regressor is an ensemble meta-estimator that fits several base regressors, each on the whole dataset, and then averages their individual predictions to form the final prediction.
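
The averaging behaviour can be checked directly. The sketch below, on synthetic data assumed for the example, compares the ensemble prediction with the unweighted mean of the fitted base regressors' predictions (`weights=None`).

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import VotingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data (assumption for illustration)
X, y = make_regression(n_samples=200, n_features=4, noise=5.0, random_state=0)

voter = VotingRegressor(
    estimators=[
        ('lr', LinearRegression()),
        ('dt', DecisionTreeRegressor(max_depth=3, random_state=0)),
    ]
)
voter.fit(X, y)

# estimators_ holds the fitted clones of the base regressors
manual_average = np.mean([est.predict(X) for est in voter.estimators_], axis=0)

print(np.allclose(manual_average, voter.predict(X)))
```

Passing `weights=[...]` turns the plain mean into a weighted average, which can be useful when one candidate model is clearly stronger than the others.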

## Classification

### Decision Tree Classifier

In [None]:
from sklearn.tree import DecisionTreeClassifier

DecisionTreeClassifier(
    criterion='gini', 
    splitter='best', 
    # Hyperparameter
    max_depth=None, 
    # Hyperparameter
    min_samples_split=2, 
    # Hyperparameter
    min_samples_leaf=1, 
    min_weight_fraction_leaf=0.0,
    # Hyperparameter
    max_features=None, 
    random_state=None, 
    max_leaf_nodes=None, 
    min_impurity_decrease=0.0, 
    class_weight=None, 
    # Hyperparameter
    ccp_alpha=0.0
)

### Preprocessing

* No need to standardize.

* High variance (tend to overfit the training data).
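
The overfitting tendency is easy to see by comparing an unconstrained tree with a regularized one. This is a sketch under assumptions: the noisy synthetic dataset (`flip_y=0.2`) and the particular values of `max_depth` and `min_samples_leaf` are chosen only for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Noisy synthetic data: flip_y randomly reassigns a fraction of the labels
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=4, flip_y=0.2, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Unconstrained tree: grows until every leaf is pure
full_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Regularized tree: depth and leaf size are limited
small_tree = DecisionTreeClassifier(
    max_depth=3, min_samples_leaf=10, random_state=0
).fit(X_train, y_train)

for name, tree in [('full', full_tree), ('small', small_tree)]:
    print(
        name,
        'depth:', tree.get_depth(),
        'train acc:', round(tree.score(X_train, y_train), 3),
        'test acc:', round(tree.score(X_test, y_test), 3),
    )
```

The unconstrained tree memorizes the training set, including the flipped labels, so the gap between its training and test accuracy is the quantity to watch.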

### Hyperparameter Tuning

* min_samples_leaf (Regularizer)

* min_samples_split (Regularizer)

* max_depth (Pruning)

* max_features (Pruning)

In [None]:
parameter = [
    {'max_features': ['sqrt', 'log2'], 'ccp_alpha': [0.25, 0.5, 0.75], 'min_samples_split': [5, 6, 7, 8, 9, 10]}
]

### Bagging Classifier

In [None]:
from sklearn.ensemble import BaggingClassifier

BaggingClassifier(
    # If None, then the base estimator is a DecisionTreeClassifier
    # (this parameter was named `base_estimator` before scikit-learn 1.2)
    estimator=None,
    # Number of base estimators in the ensemble
    n_estimators=10,
    max_samples=1.0, 
    max_features=1.0,
    # Bootstrap samples with replacement
    bootstrap=True, 
    bootstrap_features=False,
    # Use out-of-bag samples to estimate the generalization error
    oob_score=True, 
    warm_start=False,
    random_state=None,
    n_jobs=None,
    verbose=0
)

### Preprocessing

* No need to standardize.

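* Reduces variance by combining the predictions of many high-variance base learners (such as deep decision trees), each fit on a bootstrap sample.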
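
To get a rough sense of the variance reduction, a single tree and a bagged ensemble of trees can be compared under cross-validation. The dataset, the number of estimators, and the 5-fold setup below are assumptions for the example, and the outcome will vary from dataset to dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Noisy synthetic data (assumption for illustration)
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=4, flip_y=0.1, random_state=0
)

# A single unconstrained tree versus 50 bagged trees (default base estimator)
tree_scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5)
bag_scores = cross_val_score(
    BaggingClassifier(n_estimators=50, random_state=0), X, y, cv=5
)

print('single tree : mean %.3f, std %.3f' % (tree_scores.mean(), tree_scores.std()))
print('bagged trees: mean %.3f, std %.3f' % (bag_scores.mean(), bag_scores.std()))
```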

### Voting Classifier

In [None]:
from sklearn.ensemble import VotingClassifier

VotingClassifier(
    estimators=[('dt', DecisionTreeClassifier())],
    # Hard or soft voting
    voting='hard', 
    weights=None, 
    n_jobs=None, 
    # If voting='soft' and flatten_transform=True, transform method returns matrix with shape (n_samples, n_classifiers * n_classes)
    # If flatten_transform=False, it returns (n_classifiers, n_samples, n_classes)
    flatten_transform=True, 
    verbose=False
)

### Requirement

* If `voting='hard'`, the ensemble uses the predicted class labels and applies majority-rule voting. If `voting='soft'`, it predicts the class label from the `argmax` of the sums of the predicted probabilities, which is recommended for an ensemble of well-calibrated classifiers. With soft voting, all classifiers must have a `predict_proba` method.
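
Both voting rules can be reproduced by hand from the fitted base classifiers. The sketch below is illustrative only: the synthetic binary dataset and the choice of `LogisticRegression` and `GaussianNB` alongside the decision tree are assumptions that do not come from the notes above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary data with labels 0 and 1 (assumption for illustration)
X, y = make_classification(n_samples=300, n_features=6, n_informative=3, random_state=0)

estimators = [
    ('lr', LogisticRegression(max_iter=1000)),
    ('nb', GaussianNB()),
    ('dt', DecisionTreeClassifier(max_depth=3, random_state=0)),
]

hard = VotingClassifier(estimators=estimators, voting='hard').fit(X, y)
soft = VotingClassifier(estimators=estimators, voting='soft').fit(X, y)

# Hard voting: with three voters and labels {0, 1}, the majority class
# is 1 whenever at least two classifiers predict 1
votes = np.array([est.predict(X) for est in hard.estimators_])
manual_hard = (votes.sum(axis=0) >= 2).astype(int)

# Soft voting: average the class probabilities, then take the argmax
avg_proba = np.mean([est.predict_proba(X) for est in soft.estimators_], axis=0)
manual_soft = soft.classes_[np.argmax(avg_proba, axis=1)]

print(np.array_equal(manual_hard, hard.predict(X)))
print(np.array_equal(manual_soft, soft.predict(X)))

# With voting='soft' and flatten_transform=True, transform returns
# (n_samples, n_classifiers * n_classes)
print(soft.transform(X).shape)
```

Hard and soft voting can disagree when one classifier is very confident and the other two are only marginally on the opposite side, which is why soft voting depends on reasonably calibrated probabilities.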