In [7]:
from sklearn.model_selection import train_test_split
import pandas as pd
import numpy as np
np.set_printoptions(precision=4, suppress=True)

### 4.2.2 Regression model evaluation metrics

Model evaluation metrics documentation - https://scikit-learn.org/stable/modules/model_evaluation.html#regression-metrics 

The ones we're going to cover are:
1. R^2 (pronounced r-squared) or coefficient of determination
2. Mean absolute error (MAE)
3. Mean squared error (MSE)

**R^2**

What R-squared does: Compares your model's predictions to the mean of the targets. Values can range from negative infinity (a very poor model) to 1. For example, if all your model does is predict the mean of the targets, its R^2 value would be 0. And if your model perfectly predicts a range of numbers, its R^2 value would be 1.
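To make those three cases concrete, here is a minimal from-scratch sketch of the usual R^2 formula, `1 - SS_res / SS_tot`. The helper name `r2_from_scratch` and the toy values are made up for illustration and are not part of scikit-learn; in practice you would call `r2_score`.

```python
def r2_from_scratch(y_true, y_pred):
    """Illustrative helper: R^2 = 1 - SS_res / SS_tot.

    Assumes y_true is not constant (otherwise SS_tot is 0).
    """
    mean_true = sum(y_true) / len(y_true)
    # Squared error of the model's predictions
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Squared error of always predicting the mean of the targets
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0, 4.0, 5.0]  # toy targets
mean_value = sum(y_true) / len(y_true)

r2_perfect = r2_from_scratch(y_true, y_true)                         # perfect predictions
r2_mean = r2_from_scratch(y_true, [mean_value] * len(y_true))        # always predict the mean
r2_poor = r2_from_scratch(y_true, [5.0, 4.0, 3.0, 2.0, 1.0])         # worse than the mean

print(r2_perfect, r2_mean, r2_poor)  # 1.0 0.0 -3.0
```

The negative value in the last case shows why R^2 has no lower bound: a model can be arbitrarily worse than simply predicting the mean.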

In [8]:
from sklearn.datasets import fetch_california_housing

np.random.seed(42)
house_data = fetch_california_housing(as_frame=True)
house_df = house_data.frame
house_df

x = house_df.drop(columns=["MedHouseVal"])
y = house_df["MedHouseVal"]

In [9]:
x_train, x_test, y_train, y_test = train_test_split(x,y, test_size=0.2, random_state=42)

In [10]:
from sklearn.ensemble import RandomForestRegressor

model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(x_train, y_train)
y_pred = model.predict(x_test)

In [12]:
# For regressors, score() returns the R^2 of the predictions on the given data
model.score(x_test, y_test)

0.8051230593157366

In [None]:
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

print("MAE:", mean_absolute_error(y_test, y_pred))
print("MSE:", mean_squared_error(y_test, y_pred))
print("R^2:", r2_score(y_test, y_pred))

MAE: 0.32754256845930246
MSE: 0.2553684927247781
R^2: 0.8051230593157366


**Mean absolute error (MAE)**

MAE is the average of the absolute differences between predictions and actual values.

It gives you an idea of how wrong your model's predictions are on average, in the same units as the target.
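As a small illustration of why the absolute value matters, the sketch below computes MAE by hand on made-up toy values (not the housing data). Without the absolute value, positive and negative errors partly cancel out and understate how far off the predictions are.

```python
# Toy values for illustration only
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]

# Signed errors: actual minus predicted
errors = [t - p for t, p in zip(y_true, y_pred)]  # [0.5, 0.0, -2.0, -1.0]

# Averaging signed errors lets over- and under-predictions cancel
mean_signed_error = sum(errors) / len(errors)

# MAE averages the magnitudes, so every miss counts
mae = sum(abs(e) for e in errors) / len(errors)

print(mean_signed_error)  # -0.625
print(mae)                # 0.875
```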

In [16]:
df = pd.DataFrame({"Actual": y_test, "Predicted": y_pred})
df["difference"] = df["Actual"] - df["Predicted"]
df.head()

| | Actual | Predicted | difference |
|---|---|---|---|
| 20046 | 0.477 | 0.5095 | -0.0325 |
| 3024 | 0.458 | 0.74161 | -0.28361 |
| 15663 | 5.00001 | 4.923257 | 0.076753 |
| 20484 | 2.186 | 2.52961 | -0.34361 |
| 9814 | 2.78 | 2.27369 | 0.50631 |


In [17]:
df["difference"].abs().mean()

np.float64(0.32754256845930246)

**Mean squared error (MSE)**

MSE is the mean of the squared differences between actual and predicted values. Because each error is squared, large errors are penalized more heavily than small ones.
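The sketch below computes MSE by hand on made-up toy values (not the housing data) and compares how much the single largest error contributes to MSE versus MAE. It is only an illustration of the squaring effect, not a replacement for `mean_squared_error`.

```python
# Toy values for illustration only
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]

errors = [t - p for t, p in zip(y_true, y_pred)]  # [0.5, 0.0, -2.0, -1.0]
abs_errors = [abs(e) for e in errors]             # [0.5, 0.0, 2.0, 1.0]
squared_errors = [e ** 2 for e in errors]         # [0.25, 0.0, 4.0, 1.0]

mae = sum(abs_errors) / len(abs_errors)
mse = sum(squared_errors) / len(squared_errors)

# Share of the total contributed by the largest single error
largest_share_mae = max(abs_errors) / sum(abs_errors)
largest_share_mse = max(squared_errors) / sum(squared_errors)

print(mae, mse)  # 0.875 1.3125
print(round(largest_share_mae, 3), round(largest_share_mse, 3))  # 0.571 0.762
```

The largest miss accounts for a bigger share of MSE than of MAE, which is why MSE is the more sensitive of the two to outliers.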

In [20]:
df["squared_difference"] = df["difference"] ** 2
# Squared values are already non-negative, so no .abs() is needed
df["squared_difference"].mean()

np.float64(0.2553684927247781)

In [21]:
df

| | Actual | Predicted | difference | squared_difference |
|---|---|---|---|---|
| 20046 | 0.47700 | 0.509500 | -0.032500 | 0.001056 |
| 3024 | 0.45800 | 0.741610 | -0.283610 | 0.080435 |
| 15663 | 5.00001 | 4.923257 | 0.076753 | 0.005891 |
| 20484 | 2.18600 | 2.529610 | -0.343610 | 0.118068 |
| 9814 | 2.78000 | 2.273690 | 0.506310 | 0.256350 |
| ... | ... | ... | ... | ... |
| 15362 | 2.63300 | 2.267210 | 0.365790 | 0.133802 |
| 16623 | 2.66800 | 1.993650 | 0.674350 | 0.454748 |
| 18086 | 5.00001 | 4.758219 | 0.241791 | 0.058463 |
| 2144 | 0.72300 | 0.714090 | 0.008910 | 0.000079 |
