## Scikit-Learn (Sklearn) Course

<span>
0. sklearn workflow overview<br>
1. preparing data (exploring, cleaning, transforming, reducing, splitting)<br>
2. selecting machine learning model / algorithm<br>
3. algorithm training and prediction<br>
<span style="color:orange">4. algorithm evaluation</span><br>
5. improving algorithm<br>
6. saving and loading algorithm<br>
7. putting it all together
</span>

## 4. Evaluating Model

#### General concepts

--- resources  
[sklearn documentation > model evaluation](https://scikit-learn.org/stable/modules/model_evaluation.html)  
[statquest youtube video: ROC and AUC explained](https://www.youtube.com/watch?v=4jRBRDbJemM)  
[sklearn documentation > ROC curve for multiclass classification algorithms](https://scikit-learn.org/stable/auto_examples/model_selection/plot_roc.html)  
<span style="color:red">>>> One-vs-Rest multiclass ROC</span>

--- sklearn built-in evaluation techniques  
`.score()` method  
cross validation  
metric functions

--- cross validation  
creates `cv=k` different train/test splits from the same dataset (k-fold cross validation)  
trains and scores the algorithm on every split > across the splits, each sample is used for testing exactly once  
scoring metric is defined by the `scoring=` parameter > `scoring=None` invokes the default scorer  
provides algorithm metric as mean of scores from all splits
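
--- sketch: k-fold cross validation on synthetic data  
a minimal illustration of the steps above, not the course workflow itself  
assumptions: `make_classification` stand-in data, a small forest, and an explicit `KFold` splitter (with a plain `cv=5`, sklearn picks a stratified splitter for classifiers)  
all `demo_` names are made up for this example

```python
import numpy
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, cross_val_score

### synthetic stand-in data (assumption: not the heart disease dataset)
demo_features, demo_target = make_classification(n_samples=200, n_features=8, random_state=42)

### 5 folds > 5 different train/test splits of the same data
demo_folds = KFold(n_splits=5, shuffle=True, random_state=42)

### counting how often each sample lands in a test split
demo_test_counts = numpy.zeros(len(demo_target), dtype=int)
for demo_train_index, demo_test_index in demo_folds.split(demo_features):
    demo_test_counts[demo_test_index] += 1

### one score per split, then the mean as the overall metric
demo_model = RandomForestClassifier(n_estimators=20, random_state=42)
demo_scores = cross_val_score(estimator=demo_model, X=demo_features, y=demo_target, cv=demo_folds, scoring=None)
print(demo_scores, demo_scores.mean())
```

`demo_test_counts` should contain only ones, i.e., every sample is tested exactly once across the five splits.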

--- regression model metrics  
coefficient of determination (R^2)  
mean absolute error (MAE)  
mean squared error (MSE)

--- coding tricks within jupyter notebook  
**`!command`, e.g., `!dir`** runs terminal command within jupyter notebook  
**`sklearn.__version__`** displays version of installed module

#### Creating classification model

In [None]:
### imports
import numpy, pandas
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

In [None]:
### preparing data

### loading heart disease classification data into dataframe
heart_disease = pandas.read_csv(filepath_or_buffer="data-heart-disease.csv")

### splitting data features/target
features = heart_disease.drop(columns="target")
target = heart_disease.loc[:, "target"]

### splitting data train/test
numpy.random.seed(42)
features_train, features_test, target_train, target_test = train_test_split(features, target, test_size=0.2)

In [None]:
### random forest classifier training and prediction

### instantiating model
numpy.random.seed(42)
classifier = RandomForestClassifier(n_estimators=100)

### training model / prediction
classifier.fit(X=features_train, y=target_train)
target_preds = classifier.predict(X=features_test)

#### Evaluating classification model

--- classification model metrics  
`.score()` method, confusion matrix, classification report, ROC curve, AUC  
accuracy, precision, recall, f1-score, TNR, FPR, FNR, TPR  
cross validation

In [None]:
### imports
from matplotlib import pyplot
from sklearn.metrics import ConfusionMatrixDisplay, classification_report
from sklearn.metrics import roc_curve, RocCurveDisplay, roc_auc_score
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.model_selection import cross_val_score

--- `.score()` method  
default `.score()` and cross validation metric for classification algorithms is accuracy  
**accuracy:** correct predictions / all predictions

In [None]:
### evaluating algorithm with .score() method on training data
classifier.score(X=features_train, y=target_train)

In [None]:
### evaluating algorithm with .score() method on test data
classifier.score(X=features_test, y=target_test)

--- accuracy  
formula = correct predictions / all predictions  
accuracy is a good metric when classes are balanced

In [None]:
### evaluating algorithm with accuracy score
accuracy_score(y_true=target_test, y_pred=target_preds)
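
--- sketch: accuracy by hand, and why balance matters  
a small illustration on made-up labels (all `toy_` and `skewed_` names are assumptions, not course data)  
the second part shows a model that never predicts the positive class yet still reaches high accuracy on unbalanced data

```python
import numpy

### toy labels (assumption: made up for illustration)
toy_targets = numpy.array([1, 0, 1, 1, 0, 1, 0, 0])
toy_preds = numpy.array([1, 0, 0, 1, 0, 1, 1, 0])

### accuracy = correct predictions / all predictions
toy_accuracy = (toy_targets == toy_preds).mean()

### unbalanced case: 95 negatives, 5 positives, model always predicts negative
skewed_targets = numpy.array([0] * 95 + [1] * 5)
skewed_preds = numpy.zeros(100, dtype=int)
skewed_accuracy = (skewed_targets == skewed_preds).mean()
skewed_positives_found = int(((skewed_targets == 1) & (skewed_preds == 1)).sum())

print(toy_accuracy, skewed_accuracy, skewed_positives_found)
```

Here the balanced toy case scores 0.75, while the always-negative model scores 0.95 without finding a single positive.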

In [None]:
### evaluating algorithm with cross-validated default score (accuracy)
numpy.random.seed(42)
cv_accuracy = cross_val_score(estimator=classifier, X=features, y=target, cv=5, scoring=None)
cv_accuracy.mean()

--- confusion matrix  
two-dimensional array of targets (rows) vs predictions (columns)  
a quick way to compare targets to predictions  
gives an idea of where the algorithm is confused

In [None]:
### evaluating algorithm with confusion matrix
pandas.crosstab(index=target_test, rownames=["Targets"], columns=target_preds, colnames=["Predictions"])

In [None]:
### visualizing confusion matrix with sklearn
ConfusionMatrixDisplay.from_predictions(y_true=target_test, y_pred=target_preds);
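
--- sketch: reading the four cells of a binary confusion matrix  
a minimal example on made-up labels (the `toy_` names are assumptions)  
for 0/1 labels, `confusion_matrix` puts targets in rows and predictions in columns, so `.ravel()` yields tn, fp, fn, tp in that order

```python
import numpy
from sklearn.metrics import confusion_matrix

### toy labels (assumption: made up for illustration)
toy_targets = numpy.array([1, 0, 1, 1, 0, 1, 0, 0])
toy_preds = numpy.array([1, 0, 0, 1, 0, 1, 1, 0])

### rows = targets, columns = predictions
toy_matrix = confusion_matrix(y_true=toy_targets, y_pred=toy_preds)

### unpacking the cells: true negatives, false positives, false negatives, true positives
toy_tn, toy_fp, toy_fn, toy_tp = toy_matrix.ravel()
print(toy_matrix)
print(toy_tn, toy_fp, toy_fn, toy_tp)
```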

--- classification report  
a summary table of several classification metrics  
**precision:** correct within-class predictions / all within-class predictions  
**recall:** correct within-class predictions / all within-class targets  
**f1-score:** within-class harmonic mean of precision and recall  
**support:** number of within-class targets  
**accuracy:** correct predictions / all predictions  
**macro average:** across-class means of precision, recall, and f1-score  
**weighted average:** support-weighted across-class means of precision, recall, and f1-score

In [None]:
### evaluating algorithm with classification report
print(classification_report(y_true=target_test, y_pred=target_preds))

--- precision  
formula = true positive predictions / all positive predictions  
precision is a better metric than accuracy when classes are unbalanced  
precision becomes important when false positives are costly

In [None]:
### evaluating algorithm with precision score
precision_score(y_true=target_test, y_pred=target_preds)

In [None]:
### evaluating algorithm with cross-validated precision
numpy.random.seed(42)
cv_precision = cross_val_score(estimator=classifier, X=features, y=target, cv=5, scoring="precision")
cv_precision.mean()

--- recall  
formula = true positive predictions / all positive targets  
recall is a better metric than accuracy when classes are unbalanced  
recall becomes important when false negatives are costly

In [None]:
### evaluating algorithm with recall score
recall_score(y_true=target_test, y_pred=target_preds)

In [None]:
### evaluating algorithm with cross-validated recall
numpy.random.seed(42)
cv_recall = cross_val_score(estimator=classifier, X=features, y=target, cv=5, scoring="recall")
cv_recall.mean()

--- f1-score  
formula = harmonic mean of precision and recall = 2 * precision * recall / (precision + recall)  
f1-score is a better metric than accuracy when classes are unbalanced  
f1-score becomes important when both false positives and false negatives are costly

In [None]:
### evaluating algorithm with f1-score
f1_score(y_true=target_test, y_pred=target_preds)

In [None]:
### evaluating algorithm with cross-validated f1-score
numpy.random.seed(42)
cv_f1 = cross_val_score(estimator=classifier, X=features, y=target, cv=5, scoring="f1")
cv_f1.mean()
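
--- sketch: precision, recall, and f1-score from raw counts  
a hand computation on made-up labels (the `toy_` names are assumptions), following the formulas above  
it also shows that the harmonic mean used by f1-score sits below the arithmetic mean when precision and recall differ

```python
import numpy

### toy labels (assumption: made up for illustration)
toy_targets = numpy.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
toy_preds = numpy.array([1, 1, 0, 0, 1, 0, 0, 0, 0, 0])

### counting the outcomes that involve the positive class
toy_tp = int(((toy_targets == 1) & (toy_preds == 1)).sum())
toy_fp = int(((toy_targets == 0) & (toy_preds == 1)).sum())
toy_fn = int(((toy_targets == 1) & (toy_preds == 0)).sum())

### precision = tp / all positive predictions, recall = tp / all positive targets
toy_precision = toy_tp / (toy_tp + toy_fp)
toy_recall = toy_tp / (toy_tp + toy_fn)

### f1-score = harmonic mean of precision and recall
toy_f1 = 2 * toy_precision * toy_recall / (toy_precision + toy_recall)
toy_arithmetic_mean = (toy_precision + toy_recall) / 2

print(toy_precision, toy_recall, toy_f1, toy_arithmetic_mean)
```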

--- receiver operating characteristic (ROC) curve  
plots true positive rate (tpr) over false positive rate (fpr)  
visualizes algorithm performance across decision thresholds  
suitable for binary classification models  
**true negative rate = specificity:** true negative predictions / all negative targets  
**false positive rate = 1 - specificity:** false positive predictions / all negative targets  
**false negative rate = 1 - sensitivity:** false negative predictions / all positive targets  
**true positive rate = sensitivity = recall:** true positive predictions / all positive targets
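
--- sketch: tpr and fpr at a few decision thresholds  
a hand computation of single ROC points on made-up scores  
assumptions: `rates_at_threshold` is a hypothetical helper, and a sample counts as positive when its score is at or above the threshold

```python
import numpy

### toy targets and predicted probabilities (assumption: made up for illustration)
toy_targets = numpy.array([0, 0, 1, 1])
toy_scores = numpy.array([0.1, 0.4, 0.35, 0.8])

def rates_at_threshold(threshold):
    """Returns (tpr, fpr) when scores >= threshold are predicted positive (hypothetical helper)."""
    toy_preds = toy_scores >= threshold
    tp = int((toy_preds & (toy_targets == 1)).sum())
    fp = int((toy_preds & (toy_targets == 0)).sum())
    positives = int((toy_targets == 1).sum())
    negatives = int((toy_targets == 0).sum())
    return tp / positives, fp / negatives

### lowering the threshold moves the point up and to the right on the ROC plot
for toy_threshold in [0.8, 0.4, 0.35, 0.1]:
    print(toy_threshold, rates_at_threshold(toy_threshold))
```

Each printed pair is one point of the ROC curve; connecting the points for all thresholds gives the full curve.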

--- ROC curve for multiclass classification models  
a ROC curve works with binary output, so multiclass output must be binarized  
one-vs-rest binarization: comparing each class to all the others  
one-vs-one binarization: comparing every pairwise combination of classes
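
--- sketch: one-vs-rest AUC for a three-class problem  
a minimal illustration of one-vs-rest binarization, not part of the heart disease workflow  
assumptions: synthetic `make_classification` data with three classes, a small forest, and made-up `demo_` names  
each class is scored against all the others, then the per-class AUC values are averaged

```python
import numpy
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize

### synthetic three-class data (assumption: stand-in for a real multiclass dataset)
demo_features, demo_target = make_classification(n_samples=300, n_features=10, n_informative=5, n_classes=3, random_state=42)
demo_features_train, demo_features_test, demo_target_train, demo_target_test = train_test_split(
    demo_features, demo_target, test_size=0.3, stratify=demo_target, random_state=42
)

### training and predicting one probability column per class
demo_model = RandomForestClassifier(n_estimators=50, random_state=42)
demo_model.fit(X=demo_features_train, y=demo_target_train)
demo_probs = demo_model.predict_proba(X=demo_features_test)

### one-vs-rest binarization: one 0/1 column per class
demo_binarized = label_binarize(demo_target_test, classes=[0, 1, 2])

### binary AUC of each class against the rest
demo_auc_per_class = [
    roc_auc_score(y_true=demo_binarized[:, class_index], y_score=demo_probs[:, class_index])
    for class_index in range(3)
]

### sklearn shortcut: macro-averaged one-vs-rest AUC
demo_auc_macro = roc_auc_score(y_true=demo_target_test, y_score=demo_probs, multi_class="ovr", average="macro")
print(demo_auc_per_class, demo_auc_macro)
```

The macro-averaged value should equal the plain mean of the three per-class values.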


In [None]:
### function for plotting ROC curve

### function init
def plotRoc(plot_title, tpr, fpr):
    """
    Plots ROC curve, i.e., true positive rate (tpr) over false positive rate (fpr)
    """

    ### plotting ROC curve
    pyplot.plot(fpr, tpr, color="orange", label="ROC Curve")

    ### plotting baseline
    pyplot.plot([0,1], [0,1], color="blue", linestyle="--", label="Guessing")

    ### customizing plot
    pyplot.title(plot_title)
    pyplot.ylabel("True Positive Rate")
    pyplot.xlabel("False Positive Rate")
    pyplot.legend()

    ### rendering plot
    pyplot.show()

    ### function termination
    return

In [None]:
### evaluating model with ROC curve
target_probs_positive = classifier.predict_proba(X=features_test)[:, 1]
classifier_fpr, classifier_tpr, classifier_thresholds = roc_curve(y_true=target_test, y_score=target_probs_positive)
plotRoc("Receiver Operating Characteristic (ROC)", classifier_tpr, classifier_fpr);

In [None]:
### plotting ROC curve with sklearn
RocCurveDisplay.from_predictions(y_pred=target_probs_positive, y_true=target_test);

In [None]:
### perfect ROC curve
perfect_fpr, perfect_tpr, perfect_threshold = roc_curve(y_true=target_test, y_score=target_test)
plotRoc("Perfect ROC", perfect_tpr, perfect_fpr);

--- area under the ROC curve (AUC)  
area under (integral of) the ROC curve > ranges between 0.0 and 1.0, where 0.5 matches random guessing and 1.0 a perfect ranking  
used to compare the performance of different algorithms

In [None]:
### evaluating model with AUC score
classifier_auc = roc_auc_score(y_true=target_test, y_score=target_probs_positive)
perfect_auc = roc_auc_score(y_true=target_test, y_score=target_test)
classifier_auc, perfect_auc
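
--- sketch: AUC as the integral of the ROC curve  
a hand computation with the trapezoidal rule on made-up scores (the `toy_` names are assumptions)  
the result is compared against `roc_auc_score` to show that both describe the same area

```python
import numpy
from sklearn.metrics import roc_auc_score, roc_curve

### toy targets and predicted probabilities (assumption: made up for illustration)
toy_targets = numpy.array([0, 0, 1, 1])
toy_scores = numpy.array([0.1, 0.4, 0.35, 0.8])

### ROC points, sorted by increasing false positive rate
toy_fpr, toy_tpr, toy_thresholds = roc_curve(y_true=toy_targets, y_score=toy_scores)

### trapezoidal rule: width * average height between consecutive ROC points
toy_widths = numpy.diff(toy_fpr)
toy_heights = (toy_tpr[1:] + toy_tpr[:-1]) / 2
toy_auc_manual = float(numpy.sum(toy_widths * toy_heights))

### sklearn result for comparison
toy_auc_sklearn = roc_auc_score(y_true=toy_targets, y_score=toy_scores)
print(toy_auc_manual, toy_auc_sklearn)
```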

#### Creating regression model

In [None]:
### imports
import numpy, pandas
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor

In [None]:
### preparing data

### loading california housing dataset
housing_dict = fetch_california_housing()

### creating california housing dataframe
housing_df = pandas.DataFrame(data=housing_dict["data"], columns=housing_dict["feature_names"])
housing_df["MedHouseVal"] = housing_dict["target"]

### splitting data features/target
features = housing_df.drop(columns="MedHouseVal")
target = housing_df.loc[:, "MedHouseVal"]

### splitting data train/test
numpy.random.seed(42)
features_train, features_test, target_train, target_test = train_test_split(features, target, test_size=0.2)
target_test: pandas.Series  ### type hint only: train_test_split keeps the pandas type of its inputs

In [None]:
### random forest regressor training and prediction

### instantiating model
numpy.random.seed(42)
regressor = RandomForestRegressor(n_estimators=100)

### training model / prediction
regressor.fit(X=features_train, y=target_train)
target_preds = regressor.predict(X=features_test)

#### Evaluating regression model

In [None]:
### imports
from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error

--- `.score()` method  
default `.score()` and cross validation metric for regression models is [coefficient of determination (r^2)](https://en.wikipedia.org/wiki/Coefficient_of_determination)

In [None]:
### evaluating algorithm with .score() method on test data
regressor.score(X=features_test, y=target_test)

--- coefficient of determination (r2-score)  
when a model always predicts the mean of targets, its r2-score is 0.0 > a model worse than that scores below 0.0  
when a model perfectly predicts all targets, its r2-score is 1.0  
`numpy.full(shape=, fill_value=)` creates an array of `shape=` filled with `fill_value=`

In [None]:
### getting r2-score of 0.0
target_test_mean = numpy.full(shape=len(target_test), fill_value=target_test.mean())
r2_score(y_true=target_test, y_pred=target_test_mean)

In [None]:
### getting r2-score of 1.0
r2_score(y_true=target_test, y_pred=target_test)
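
--- sketch: r2-score by hand  
a hand computation of r2 = 1 - (residual sum of squares / total sum of squares) on made-up values  
assumptions: `r2_by_hand` is a hypothetical helper and the `toy_` arrays are not course data  
it reproduces the 0.0 and 1.0 cases above and adds a case that falls below 0.0

```python
import numpy

### toy targets and predictions (assumption: made up for illustration)
toy_targets = numpy.array([3.0, -0.5, 2.0, 7.0])
toy_preds = numpy.array([2.5, 0.0, 2.0, 8.0])

def r2_by_hand(y_true, y_pred):
    """Returns 1 - SS_res / SS_tot (hypothetical helper)."""
    ss_res = numpy.sum((y_true - y_pred) ** 2)
    ss_tot = numpy.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

### a reasonable model, the mean baseline, a perfect model, and a poor model
toy_r2 = r2_by_hand(toy_targets, toy_preds)
toy_r2_mean = r2_by_hand(toy_targets, numpy.full(shape=len(toy_targets), fill_value=toy_targets.mean()))
toy_r2_perfect = r2_by_hand(toy_targets, toy_targets)
toy_r2_poor = r2_by_hand(toy_targets, toy_targets[::-1])
print(toy_r2, toy_r2_mean, toy_r2_perfect, toy_r2_poor)
```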

In [None]:
### evaluating algorithm with cross-validated default score (r2-score)
numpy.random.seed(42)
cv_r2 = cross_val_score(estimator=regressor, X=features, y=target, cv=3, scoring=None)
cv_r2.mean()

--- mean absolute error (MAE)  
mean of absolute differences between predictions and targets  
represents the average magnitude of prediction error, in the units of the target

In [None]:
### computing mean absolute error with sklearn function
mean_absolute_error(y_true=target_test, y_pred=target_preds)

In [None]:
### computing mean absolute error step-by-step
error_df = pandas.DataFrame(data={"target preds": target_preds, "target test": target_test})
error_df["differences"] = numpy.abs(error_df["target preds"] - error_df["target test"])
error_df["differences"].mean()

In [None]:
### evaluating algorithm with cross-validated mean absolute error
### note: the scorer returns negated errors (higher is better), so the mean below is negative
numpy.random.seed(42)
cv_mae = cross_val_score(estimator=regressor, X=features, y=target, cv=3, scoring="neg_mean_absolute_error")
cv_mae.mean()

--- mean squared error (MSE)  
mean of squared differences between predictions and targets  
squaring emphasizes large errors and diminishes small errors  
root mean squared error (RMSE) is the square root of MSE, in the units of the target > see sklearn documentation

In [None]:
### computing mean squared error with sklearn function
mean_squared_error(y_true=target_test, y_pred=target_preds)

In [None]:
### computing mean squared error step-by-step
error_df["squared differences"] = numpy.square(error_df["differences"])
error_df["squared differences"].mean()
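
--- sketch: MAE, MSE, and RMSE side by side  
a hand computation on made-up values (the `toy_` and `outlier_` names are assumptions)  
the second part shows how a single large error moves MSE far more than MAE

```python
import numpy

### toy targets and predictions (assumption: made up for illustration)
toy_targets = numpy.array([3.0, -0.5, 2.0, 7.0])
toy_preds = numpy.array([2.5, 0.0, 2.0, 8.0])
toy_errors = toy_preds - toy_targets

### mean absolute error, mean squared error, root mean squared error
toy_mae = float(numpy.abs(toy_errors).mean())
toy_mse = float(numpy.square(toy_errors).mean())
toy_rmse = float(numpy.sqrt(toy_mse))

### same errors, except the last one is ten times larger
outlier_errors = numpy.array([-0.5, 0.5, 0.0, 10.0])
outlier_mae = float(numpy.abs(outlier_errors).mean())
outlier_mse = float(numpy.square(outlier_errors).mean())

print(toy_mae, toy_mse, toy_rmse)
print(outlier_mae, outlier_mse)
```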

--- cross validation  
cross validation scoring is also available for evaluating regression algorithms

In [None]:
### evaluating algorithm with cross-validated mean squared error
### note: the scorer returns negated errors (higher is better), so the mean below is negative
numpy.random.seed(42)
cv_mse = cross_val_score(estimator=regressor, X=features, y=target, cv=3, scoring="neg_mean_squared_error")
cv_mse.mean()
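
--- sketch: turning a `neg_` score back into an error  
sklearn scorers follow a "higher is better" convention, so error metrics are returned negated  
assumptions: synthetic `make_regression` data and a `LinearRegression` model swapped in for speed (not the random forest regressor used above); `demo_` names are made up

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

### synthetic stand-in data (assumption: not the california housing dataset)
demo_features, demo_target = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=42)

### one negated MSE per split
demo_neg_mse = cross_val_score(estimator=LinearRegression(), X=demo_features, y=demo_target, cv=3, scoring="neg_mean_squared_error")

### flipping the sign recovers the usual, non-negative MSE
demo_mse = -demo_neg_mse.mean()
print(demo_neg_mse, demo_mse)
```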