## Scikit-Learn (Sklearn) Course

<span>
0. sklearn workflow overview<br>
1. preparing data (exploring, cleaning, transforming, reducing, splitting)<br>
2. selecting the machine learning model / algorithm<br>
3. training the algorithm and making predictions<br>
4. evaluating the algorithm<br>
5. improving the model<br>
<span style="color:orange">6. saving and loading the algorithm</span><br>
7. putting it all together
</span>

## 6. Saving and Loading the Algorithm

#### General concepts

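--- saving / loading tools  
pickle: Python standard library module (pickle.dump and pickle.load)  
joblib: third-party library, installed as a dependency of sklearn (joblib.dump and joblib.load)  
joblib is usually more efficient for trained sklearn algorithms that hold large numpy arrays internally  
both formats can execute arbitrary code on load, so only load files from a trusted source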
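
As a minimal sketch of the dump / load round trip, the cell below pickles a plain dictionary to a temporary file and reads it back. The dictionary and the file name `example.pkl` are placeholders rather than course data; a trained sklearn algorithm is saved the same way.

```python
import os
import pickle
import tempfile

# hypothetical stand-in for a trained algorithm: any picklable Python object
example_object = {"n_estimators": 100, "max_depth": None}

with tempfile.TemporaryDirectory() as temp_dir:
    example_path = os.path.join(temp_dir, "example.pkl")

    # write the object in binary mode; the with block closes the file
    with open(example_path, mode="wb") as example_file:
        pickle.dump(example_object, example_file)

    # read the object back in binary mode
    with open(example_path, mode="rb") as example_file:
        restored_object = pickle.load(example_file)

# the restored object is an equal copy, not the same object in memory
print(restored_object == example_object)
```

Opening the file inside a `with` block guarantees the handle is closed even if pickling fails partway through.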

#### Creating and training the algorithm

In [None]:
### imports
import numpy, pandas
from sklearn.model_selection import cross_val_score, RandomizedSearchCV
from sklearn.ensemble import RandomForestClassifier

In [None]:
### preparing data

### loading heart disease data into dataframe
heart_disease = pandas.read_csv("data-heart-disease.csv")

### splitting data features <> target
features = heart_disease.drop(columns="target")
target = heart_disease.loc[:, "target"]

In [None]:
### running randomized grid search

### creating random grid
random_grid = {
    "max_depth": [None, 5, 10, 20, 30],
    "max_features": ["sqrt"],
    "min_samples_leaf": [1, 2, 4],
    "min_samples_split": [2, 4, 6],
    "n_estimators": [10, 100, 200, 500, 1000, 1200]}

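### creating randomized search object
### (samples n_iter=45 of the 270 combinations in random_grid, each scored with 5-fold cross-validation)
classifier_rscv = RandomizedSearchCV(
    estimator=RandomForestClassifier(n_jobs=-1),
    param_distributions=random_grid,
    n_iter=45, cv=5, verbose=True)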

### training randomized grid search object
numpy.random.seed(42)
classifier_rscv.fit(X=features, y=target);

In [None]:
### reading best parameters
classifier_rscv.best_params_
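
Seeding numpy's global generator works here because neither the search nor the forest was given a `random_state`, so both draw from that global generator. A possible alternative, sketched below, is to pass `random_state` explicitly to both objects so the result does not depend on global state. The synthetic data from `make_classification`, the small `demo_grid`, and the helper `run_search` are assumptions chosen to keep the example fast; they are not part of the course notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# synthetic stand-in for the heart disease data (assumption, not the course data)
demo_features, demo_target = make_classification(
    n_samples=120, n_features=8, random_state=0)

# deliberately small grid so the search finishes quickly
demo_grid = {
    "max_depth": [None, 3, 5],
    "min_samples_leaf": [1, 2, 4],
    "n_estimators": [10, 20]}

def run_search():
    # fixed random_state on both the forest and the search
    search = RandomizedSearchCV(
        estimator=RandomForestClassifier(random_state=42),
        param_distributions=demo_grid,
        n_iter=4, cv=3, random_state=42)
    search.fit(X=demo_features, y=demo_target)
    return search

first_search = run_search()
second_search = run_search()

# same seeds, same sampled candidates, same winning parameters
print(first_search.best_params_ == second_search.best_params_)
```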

In [None]:
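### classification algorithm evaluation function
### note: cross_val_score clones and refits the algorithm on every fold,
### so the metrics describe its hyperparameters, not its already-fitted state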
def evaluateAlgo(algorithm, features, target):
    numpy.random.seed(42)
    metrics_dict = {
        "Accuracy": cross_val_score(estimator=algorithm, X=features, y=target, cv=5, scoring="accuracy").mean(),
        "Precision": cross_val_score(estimator=algorithm, X=features, y=target, cv=5, scoring="precision").mean(),
        "Recall": cross_val_score(estimator=algorithm, X=features, y=target, cv=5, scoring="recall").mean(),
        "F1 Score": cross_val_score(estimator=algorithm, X=features, y=target, cv=5, scoring="f1").mean()}
    print(f"""Accuracy: {100.0 * metrics_dict["Accuracy"]:.3f}%""")
    print(f"""Precision: {100.0 * metrics_dict["Precision"]:.3f}%""")
    print(f"""Recall: {100.0 * metrics_dict["Recall"]:.3f}%""")
    print(f"""F1 Score: {100.0 * metrics_dict["F1 Score"]:.3f}%""")
    return metrics_dict
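
The function above runs four separate cross-validations, one per metric. As a sketch of a cheaper variant, `cross_validate` accepts a list of scorers and fits each fold only once. The synthetic data and the names `demo_algorithm`, `demo_scores` and `demo_metrics` are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate

# synthetic stand-in for the heart disease data (assumption, not the course data)
demo_features, demo_target = make_classification(
    n_samples=120, n_features=8, random_state=0)
demo_algorithm = RandomForestClassifier(n_estimators=20, random_state=42)

metric_names = ["accuracy", "precision", "recall", "f1"]

# one cross-validation run that scores every fold with all four metrics
demo_scores = cross_validate(
    estimator=demo_algorithm, X=demo_features, y=demo_target,
    cv=5, scoring=metric_names)

# cross_validate stores one array per metric under the key "test_<metric>"
demo_metrics = {
    name: demo_scores[f"test_{name}"].mean() for name in metric_names}

for name, value in demo_metrics.items():
    print(f"{name}: {100.0 * value:.3f}%")
```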

In [None]:
### evaluating best estimator
classifier_best = classifier_rscv.best_estimator_
classifier_metrics = evaluateAlgo(classifier_best, features, target)

#### Saving and loading the algorithm with pickle

In [None]:
### imports
import pickle

In [None]:
### saving trained algorithm (the with block closes the file handle after writing)
with open(file="algo-heart-disease.pkl", mode="wb") as pickle_file:
    pickle.dump(obj=classifier_best, file=pickle_file)

In [None]:
### loading and using trained algorithm
with open(file="algo-heart-disease.pkl", mode="rb") as pickle_file:
    classifier_pickle = pickle.load(file=pickle_file)
pickle_metrics = evaluateAlgo(classifier_pickle, features, target)
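
Cross-validation refits the algorithm on each fold, so it does not by itself show that the fitted state survived the round trip. A more direct check, sketched below under the assumption of synthetic data and a small forest, compares the predictions of the original and the restored algorithm. `pickle.dumps` and `pickle.loads` perform the same round trip in memory, which keeps the example free of files.

```python
import pickle

import numpy
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# synthetic stand-in for the heart disease data (assumption, not the course data)
demo_features, demo_target = make_classification(
    n_samples=120, n_features=8, random_state=0)

demo_algorithm = RandomForestClassifier(n_estimators=20, random_state=42)
demo_algorithm.fit(X=demo_features, y=demo_target)

# serialize to bytes and back; no refitting happens here
restored_algorithm = pickle.loads(pickle.dumps(demo_algorithm))

# predict and predict_proba use the stored trees as they are
same_labels = numpy.array_equal(
    demo_algorithm.predict(demo_features),
    restored_algorithm.predict(demo_features))
same_probabilities = numpy.allclose(
    demo_algorithm.predict_proba(demo_features),
    restored_algorithm.predict_proba(demo_features))

print(same_labels, same_probabilities)
```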

#### Saving and loading the algorithm with joblib

In [None]:
### imports
import joblib

In [None]:
### saving trained algorithm
joblib.dump(value=classifier_best, filename="algo-heart-disease.joblib");

In [None]:
### loading and using trained algorithm
classifier_joblib = joblib.load(filename="algo-heart-disease.joblib")
joblib_metrics = evaluateAlgo(classifier_joblib, features, target)
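
`joblib.dump` also takes a `compress` argument, which typically trades smaller files for slower saving and loading. The sketch below writes the same small forest with and without compression into a temporary directory and checks that the compressed copy still predicts identically. The synthetic data, the file names and the compression level 3 are assumptions for illustration.

```python
import os
import tempfile

import joblib
import numpy
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# synthetic stand-in for the heart disease data (assumption, not the course data)
demo_features, demo_target = make_classification(
    n_samples=120, n_features=8, random_state=0)

demo_algorithm = RandomForestClassifier(n_estimators=20, random_state=42)
demo_algorithm.fit(X=demo_features, y=demo_target)

with tempfile.TemporaryDirectory() as temp_dir:
    plain_path = os.path.join(temp_dir, "demo-plain.joblib")
    compressed_path = os.path.join(temp_dir, "demo-compressed.joblib")

    # same algorithm, saved without and with compression
    joblib.dump(value=demo_algorithm, filename=plain_path)
    joblib.dump(value=demo_algorithm, filename=compressed_path, compress=3)

    plain_size = os.path.getsize(plain_path)
    compressed_size = os.path.getsize(compressed_path)

    # load the compressed copy before the temporary directory is removed
    restored_algorithm = joblib.load(filename=compressed_path)

same_labels = numpy.array_equal(
    demo_algorithm.predict(demo_features),
    restored_algorithm.predict(demo_features))

print(f"plain: {plain_size} bytes, compressed: {compressed_size} bytes")
print(same_labels)
```

Files written by either pickle or joblib are generally tied to the library versions that produced them, so it is worth recording the sklearn version next to the saved file.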