## Scikit-Learn (sklearn) Course

<span>
0. sklearn workflow overview<br>
1. preparing data (collecting, exploring, cleaning, transforming, reducing, splitting)<br>
2. defining problem / selecting machine learning model<br>
3. training model and making predictions<br>
<span style="color:orange">4. evaluating model</span><br>
5. improving model<br>
6. saving and loading model<br>
7. putting it all together
</span>

## 4. Evaluating Model

#### Concepts

--- documentation  
[sklearn documentation / model evaluation](https://scikit-learn.org/stable/modules/model_evaluation.html)

--- sklearn built-in evaluation methods  
<span>1. model.score() method<br>
2. scoring parameter<br>
3. metric functions</span>
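The three approaches listed above can be compared side by side. The following is only a minimal sketch: the synthetic dataset from `make_classification`, the `demo_*` variable names, and the small `n_estimators=20` forest are assumptions chosen to keep the example self-contained, not part of the course data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score, train_test_split

# synthetic stand-in data (assumption: any labelled classification dataset would do)
demo_features, demo_target = make_classification(n_samples=200, n_features=8, random_state=42)
demo_train_x, demo_test_x, demo_train_y, demo_test_y = train_test_split(
    demo_features, demo_target, test_size=0.2, random_state=42
)

demo_model = RandomForestClassifier(n_estimators=20, random_state=42)
demo_model.fit(demo_train_x, demo_train_y)

# 1. model.score() method: uses the estimator's default metric (accuracy for classifiers)
score_from_method = demo_model.score(demo_test_x, demo_test_y)

# 2. scoring parameter: evaluation tools such as cross_val_score accept a metric name
scores_from_cv = cross_val_score(demo_model, demo_features, demo_target, cv=5, scoring="accuracy")

# 3. metric functions: sklearn.metrics functions compare true values with predictions
score_from_function = accuracy_score(demo_test_y, demo_model.predict(demo_test_x))

print(score_from_method, score_from_function, scores_from_cv.mean())
```

The first and third values should agree, since both measure accuracy on the same test split; the cross-validated mean will usually differ because it is computed over five different splits.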

--- default .score() metrics  
classification models: accuracy (correct predictions / all predictions)<br>
regression models: R² (coefficient of determination)
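To make the accuracy ratio concrete, here is a small worked example. The label arrays are hypothetical values invented for illustration; the sketch only checks that the hand-computed ratio matches `accuracy_score`.

```python
import numpy
from sklearn.metrics import accuracy_score

# hypothetical labels, invented for illustration only
true_labels = numpy.array([1, 0, 1, 1, 0, 1, 0, 0])
predicted_labels = numpy.array([1, 0, 0, 1, 0, 1, 1, 0])

# accuracy = number of correct predictions / number of all predictions
correct_count = numpy.sum(true_labels == predicted_labels)   # 6 of the 8 labels match
manual_accuracy = correct_count / len(true_labels)

# the same value from the sklearn metric function
library_accuracy = accuracy_score(true_labels, predicted_labels)

print(manual_accuracy, library_accuracy)   # 0.75 0.75
```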

#### Evaluating classification model

In [None]:
### imports ------------------------------------------------------------------------------------------------------------

import numpy, pandas

from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

In [None]:
### preparing data -----------------------------------------------------------------------------------------------------

### loading heart disease classification data into dataframe
heart_disease = pandas.read_csv("data-heart-disease.csv")

### splitting data features/target
features = heart_disease.drop(columns="target")
target = heart_disease.loc[:, "target"]

### splitting data train/test
numpy.random.seed(42)
features_train, features_test, target_train, target_test = train_test_split(features, target, test_size=0.2)

In [None]:
### creating random forest classifier ----------------------------------------------------------------------------------

### instantiating model
numpy.random.seed(42)
classifier = RandomForestClassifier(n_estimators=100)

### training model
classifier.fit(features_train, target_train);

In [None]:
### evaluating model with .score() method on training data -------------------------------------------------------------
classifier.score(features_train, target_train)

In [None]:
### evaluating model with .score() method on test data -----------------------------------------------------------------
classifier.score(features_test, target_test)

In [None]:
### predicting with predict() function ---------------------------------------------------------------------------------
target_prediction = classifier.predict(features_test)
target_prediction

In [None]:
### predicting with predict_proba() function ---------------------------------------------------------------------------
target_probabilities = classifier.predict_proba(features_test)
target_probabilities[:10]
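The relationship between `predict_proba()` and `predict()` can be sketched as follows. This is an illustration under stated assumptions: the data is synthetic (`make_classification`), the `demo_*` names are hypothetical, and the check relies on the random forest classifier returning the class with the highest averaged probability.

```python
import numpy
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# synthetic stand-in data (assumption: not the heart disease dataset)
demo_features, demo_target = make_classification(n_samples=200, n_features=8, random_state=42)

demo_classifier = RandomForestClassifier(n_estimators=20, random_state=42)
demo_classifier.fit(demo_features, demo_target)

# one row per sample, one column per class (column order follows classes_)
demo_probabilities = demo_classifier.predict_proba(demo_features[:5])

# picking the most probable class in each row reproduces predict()
labels_from_probabilities = demo_classifier.classes_[numpy.argmax(demo_probabilities, axis=1)]
demo_predictions = demo_classifier.predict(demo_features[:5])

print(demo_probabilities)
print(labels_from_probabilities, demo_predictions)
```

Each row of probabilities sums to 1, so for a binary problem the second column alone is enough to rank samples by how confident the model is in the positive class.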

In [None]:
### comparing predictions to true values / model.score method ----------------------------------------------------------
classifier.score(features_test, target_test)

In [None]:
### comparing predictions to true values / computing with numpy --------------------------------------------------------
numpy.mean(target_test == target_prediction)

In [None]:
### comparing predictions to true values / metrics.accuracy_score function ---------------------------------------------
accuracy_score(target_test, target_prediction)

#### Evaluating regression model

In [None]:
### imports ------------------------------------------------------------------------------------------------------------

from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

In [None]:
### preparing data -----------------------------------------------------------------------------------------------------

### loading california housing regression dataset
housing_dict = fetch_california_housing()

### creating california housing dataframe
housing_df = pandas.DataFrame(data=housing_dict["data"], columns=housing_dict["feature_names"])
housing_df["MedHouseVal"] = housing_dict["target"]

### splitting data features/target
features = housing_df.drop(columns="MedHouseVal")
target = housing_df.loc[:, "MedHouseVal"]

### splitting data train/test
numpy.random.seed(42)
features_train, features_test, target_train, target_test = train_test_split(features, target, test_size=0.2)

In [None]:
### creating random forest regressor -----------------------------------------------------------------------------------

### instantiating model
numpy.random.seed(42)
regressor = RandomForestRegressor()

### training model
regressor.fit(features_train, target_train);

In [None]:
### displaying test targets (true values) ------------------------------------------------------------------------------
numpy.array(target_test[:10])

In [None]:
### predicting with predict() function ---------------------------------------------------------------------------------
target_prediction = regressor.predict(features_test)
target_prediction[:10]

In [None]:
### comparing predictions to true values / metrics.mean_absolute_error function ----------------------------------------
mean_absolute_error(target_test, target_prediction)
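Mean absolute error is the average of the absolute differences between true and predicted values. A minimal sketch with hypothetical numbers (invented for illustration, not taken from the housing data) shows the hand computation next to the sklearn function:

```python
import numpy
from sklearn.metrics import mean_absolute_error

# hypothetical true and predicted values, invented for illustration only
true_values = numpy.array([3.0, 1.5, 2.0, 4.5])
predicted_values = numpy.array([2.5, 1.5, 3.0, 4.0])

# absolute errors per sample: 0.5, 0.0, 1.0, 0.5
absolute_errors = numpy.abs(true_values - predicted_values)
manual_mae = numpy.mean(absolute_errors)

# the same value from the sklearn metric function
library_mae = mean_absolute_error(true_values, predicted_values)

print(manual_mae, library_mae)   # 0.5 0.5
```

Because the errors are not squared, the result is in the same units as the target, which makes it straightforward to read as "the predictions are off by this much on average".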