Activity: Research multiple linear regression

### Concepts:

- Root mean squared error (RMSE): The square root of the mean squared error of the regression. This is the most widely used metric for comparing regression models.

- Residual standard error (RSE): The same as the root mean squared error, but adjusted for degrees of freedom: the sum of squared residuals is divided by n - p - 1 (n observations, p predictors) instead of n.

- R-squared (coefficient of determination): The proportion of variance in the outcome explained by the model, from 0 to 1.

- t-statistic: A predictor's coefficient divided by the standard error of that coefficient, which gives a metric for comparing the importance of the variables in the model.

- Weighted regression: Regression in which the records carry different weights.
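
To make the last concept concrete, here is a minimal sketch of weighted regression using only NumPy. The data, the weights, and the helper name `weighted_least_squares` are made up for illustration. In practice, scikit-learn's `LinearRegression.fit` accepts a `sample_weight` argument and statsmodels provides `sm.WLS` for the same purpose.

```python
import numpy as np

def weighted_least_squares(X, y, weights):
    """Fit y ~ b0 + X @ b by minimizing sum(w_i * residual_i**2).

    Hypothetical helper for illustration; returns [b0, b1, ..., bp].
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    sqrt_w = np.sqrt(np.asarray(weights, dtype=float))
    # Prepend a column of ones so the first coefficient is the intercept.
    design = np.column_stack([np.ones(len(X)), X])
    # Scaling each row by sqrt(w_i) turns the weighted problem into ordinary least squares.
    beta, *_ = np.linalg.lstsq(design * sqrt_w[:, None], y * sqrt_w, rcond=None)
    return beta

# Toy data: y = 1 + 2*x1 + 3*x2 exactly, except for a corrupted last record.
X = np.array([[0, 1], [1, 0], [2, 2], [3, 1], [4, 3], [5, 0]], dtype=float)
y = 1 + 2 * X[:, 0] + 3 * X[:, 1]
y[-1] = 100.0  # corrupted record

equal_weights = np.ones(len(y))
downweighted = np.array([1, 1, 1, 1, 1, 0], dtype=float)  # ignore the corrupted record

beta_equal = weighted_least_squares(X, y, equal_weights)
beta_weighted = weighted_least_squares(X, y, downweighted)
print("equal weights:", beta_equal)
print("corrupted record weighted 0:", beta_weighted)
```

With equal weights the corrupted record pulls the coefficients away from (1, 2, 3). Giving that record a weight of 0 recovers them.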

In [20]:
# pip install statsmodels


In [22]:
import pandas as pd
from sklearn.linear_model import LinearRegression
import numpy as np
from sklearn.metrics import mean_squared_error
import statsmodels.api as sm

In [2]:
data = pd.read_csv("./datasets/insurance.csv")

In [3]:
data

Unnamed: 0,age,sex,bmi,children,smoker,region,charges
0,19,female,27.900,0,yes,southwest,16884.92400
1,18,male,33.770,1,no,southeast,1725.55230
2,28,male,33.000,3,no,southeast,4449.46200
3,33,male,22.705,0,no,northwest,21984.47061
4,32,male,28.880,0,no,northwest,3866.85520
...,...,...,...,...,...,...,...
1333,50,male,30.970,3,no,northwest,10600.54830
1334,18,female,31.920,0,no,northeast,2205.98080
1335,18,female,36.850,0,no,southeast,1629.83350
1336,21,female,25.800,0,no,southwest,2007.94500


In [4]:
#predictors = ['age','sex','smoker','children','bmi','region']
predictors = ['age','children','bmi']
outcome = 'charges'

In [5]:
insurance_lm = LinearRegression()
insurance_lm.fit(data[predictors],data[outcome])

In [6]:
# Intercept -> b0
insurance_lm.intercept_

-6916.243347787033

In [7]:
for name,coef in zip(predictors, insurance_lm.coef_):
    print(f'{name}: {coef}')

age: 239.9944742936463
children: 542.864652247018
bmi: 332.083364503448


In [8]:
insurance_lm.predict(pd.DataFrame({
    "age":[24,28,25],
    "children":[0,0,0],
    "bmi":[22.5,20,25]
}))

array([6315.49973659, 6445.2692225 , 7385.70262214])
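
As a sanity check on how the model produces these numbers, the sketch below recomputes the three predictions by hand from the intercept and coefficients printed above. The helper name `predict_by_hand` is hypothetical; the numeric values are copied from the notebook output.

```python
# Intercept and coefficients as printed by the fitted sklearn model above.
intercept = -6916.243347787033
coefs = {
    "age": 239.9944742936463,
    "children": 542.864652247018,
    "bmi": 332.083364503448,
}

def predict_by_hand(row):
    """Return b0 + sum(b_j * x_j) for one record given as a dict."""
    return intercept + sum(coefs[name] * row[name] for name in coefs)

# The same three records passed to insurance_lm.predict above.
rows = [
    {"age": 24, "children": 0, "bmi": 22.5},
    {"age": 28, "children": 0, "bmi": 20},
    {"age": 25, "children": 0, "bmi": 25},
]

manual = [predict_by_hand(row) for row in rows]
for row, value in zip(rows, manual):
    print(row, "->", round(value, 2))
```

Each prediction is the intercept plus each coefficient times its predictor value, so these results should agree with the array returned by `insurance_lm.predict` up to rounding.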

In [10]:
fitted = insurance_lm.predict(data[predictors])

In [11]:
fitted

array([ 6908.77753344,  9160.97706103, 12390.94691779, ...,
        9640.92917145,  6691.39141657, 17377.08299024])

In [16]:
RMSE = np.sqrt(np.sum((data[outcome]-fitted)**2)/fitted.size)

RMSE

11355.317901125973

In [19]:
RMSE_sklearn = np.sqrt(mean_squared_error(data[outcome],fitted))

RMSE_sklearn

11355.317901125973
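
The residual standard error differs from the RMSE only in its denominator. The sketch below shows both on a toy list of residuals and then converts the RMSE above into an RSE, assuming n = 1338 observations (the count statsmodels reports for this dataset) and p = 3 predictors. The function names are hypothetical.

```python
import math

def rmse(residuals):
    """Root mean squared error: divide the squared residuals by n."""
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))

def residual_standard_error(residuals, n_predictors):
    """Same as RMSE, but divide by the degrees of freedom n - p - 1."""
    dof = len(residuals) - n_predictors - 1
    return math.sqrt(sum(r * r for r in residuals) / dof)

# Toy residuals (made up): the squared residuals sum to 10.
toy_residuals = [1.0, -1.0, 2.0, -2.0, 0.0, 0.0]
toy_rmse = rmse(toy_residuals)                        # sqrt(10 / 6)
toy_rse = residual_standard_error(toy_residuals, 1)   # sqrt(10 / 4)
print("toy RMSE:", toy_rmse, "toy RSE:", toy_rse)

# Converting the notebook's RMSE: RSE = RMSE * sqrt(n / (n - p - 1)).
rmse_notebook = 11355.317901125973
n_obs, n_pred = 1338, 3  # assumed: observation count and number of predictors
rse_notebook = rmse_notebook * math.sqrt(n_obs / (n_obs - n_pred - 1))
print("RSE for the insurance model:", rse_notebook)
```

With 1338 records and only 3 predictors the adjustment is small (about 0.15%), which is why the two metrics are nearly interchangeable on large datasets.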

In [23]:
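# Note: sm.OLS does not add an intercept on its own. This fit has no constant term,
# so its coefficients differ from the sklearn fit above and R-squared is "uncentered".
# Passing sm.add_constant(data[predictors]) instead would include the intercept.
insurance_sm = sm.OLS(data[outcome], data[predictors])
results = insurance_sm.fit()
results.summary()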

Dep. Variable:,charges,R-squared (uncentered):,0.596
Model:,OLS,Adj. R-squared (uncentered):,0.595
Method:,Least Squares,F-statistic:,655.7
Date:,"Thu, 25 Aug 2022",Prob (F-statistic):,6.52e-262
Time:,18:34:05,Log-Likelihood:,-14400.0
No. Observations:,1338,AIC:,28810.0
Df Residuals:,1335,BIC:,28820.0
Df Model:,3,,
Covariance Type:,nonrobust,,

,coef,std err,t,P>|t|,[0.025,0.975]
age,205.5096,20.605,9.974,0.000,165.088,245.931
children,407.6827,257.331,1.584,0.113,-97.135,912.501
bmi,162.5084,28.007,5.802,0.000,107.566,217.451
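
The `t` column in this table follows directly from the definition given in the concepts list: each coefficient divided by its standard error. The short check below recomputes it from the rounded values printed in the table, so small differences in the last digit are possible.

```python
# (coef, std err, reported t) copied from the coefficient table above.
summary_rows = {
    "age": (205.5096, 20.605, 9.974),
    "children": (407.6827, 257.331, 1.584),
    "bmi": (162.5084, 28.007, 5.802),
}

t_stats = {}
for name, (coef, std_err, reported_t) in summary_rows.items():
    # t-statistic = coefficient / standard error of the coefficient
    t_stats[name] = coef / std_err
    print(f"{name}: t = {t_stats[name]:.3f} (reported {reported_t})")

# Rank predictors by the absolute size of their t-statistic.
ranking = sorted(t_stats, key=lambda k: abs(t_stats[k]), reverse=True)
print("ranking by |t|:", ranking)
```

By this measure `age` carries the most weight in this fit, followed by `bmi`. `children` has a small t-statistic, consistent with its p-value of 0.113 in the table.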

Omnibus:,369.789,Durbin-Watson:,2.018
Prob(Omnibus):,0.0,Jarque-Bera (JB):,750.403
Skew:,1.648,Prob(JB):,1.13e-163
Kurtosis:,4.61,Cond. No.,42.2
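
The summary reports "R-squared (uncentered)" because the model passed to `sm.OLS` has no constant column, so the fit is forced through the origin. The sketch below illustrates the effect with NumPy on synthetic data (made up for illustration, not the insurance dataset). It fits the same data with and without an intercept column and computes both versions of R-squared. In statsmodels, the intercept column is usually added with `sm.add_constant`.

```python
import numpy as np

# Synthetic data (made up for illustration): y = 5 + 2*x1 + 3*x2 exactly.
X = np.array([[1, 2], [2, 1], [3, 4], [4, 3], [5, 6], [6, 5]], dtype=float)
y = 5 + 2 * X[:, 0] + 3 * X[:, 1]

def ols(design, target):
    """Ordinary least squares via NumPy; returns the coefficient vector."""
    coefs, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coefs

# Without a constant column: the fitted plane must pass through the origin.
coefs_no_const = ols(X, y)
resid_no_const = y - X @ coefs_no_const
# Uncentered R-squared compares against sum(y**2), not the variance around the mean.
r2_uncentered = 1 - np.sum(resid_no_const**2) / np.sum(y**2)

# With a constant column: the first coefficient is the intercept.
X_const = np.column_stack([np.ones(len(X)), X])
coefs_const = ols(X_const, y)
resid_const = y - X_const @ coefs_const
r2_centered = 1 - np.sum(resid_const**2) / np.sum((y - y.mean())**2)

print("no intercept:  ", coefs_no_const, "uncentered R2:", r2_uncentered)
print("with intercept:", coefs_const, "centered R2:", r2_centered)
```

Only the fit with the constant column recovers the true coefficients (5, 2, 3). The same mechanism explains why the statsmodels coefficients above differ from the scikit-learn ones, since `LinearRegression` fits an intercept by default.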
