# Evaluation Methodologies

To avoid overfitting and obtain more reliable metrics, we can use several evaluation methodologies, all of which start from the same principle: define disjoint **training and test** subsets. The training data are used to fit the model, while the test examples simulate the presentation of new objects that were not seen during learning.
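
The effect described above can be sketched on synthetic data. Everything in this block is an assumption made for illustration (the `make_classification` data, the 20% label noise, the unconstrained decision tree); it is not the Titanic data used in the rest of the notebook. It only shows why a score measured on the training rows tends to be optimistic.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data (assumption, illustration only).
# flip_y adds label noise, which a fully grown tree will memorize.
X, y = make_classification(n_samples=500, n_features=10, flip_y=0.2, random_state=0)

# Disjoint training and test subsets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=0
)

# No depth limit: the tree is free to fit the training rows perfectly
model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)

train_acc = model.score(X_train, y_train)  # measured on rows the model has seen
test_acc = model.score(X_test, y_test)     # measured on unseen rows
print(f"train accuracy: {train_acc:.3f}")
print(f"test accuracy:  {test_acc:.3f}")
```

The gap between the two numbers is the part of the training score that does not carry over to new data.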



# Hold Out

In this methodology, the dataset is split into just two parts: training and test. The first is used to fit the model, while the second is used to evaluate it.

<center><img src="https://s3-sa-east-1.amazonaws.com/lcpi/4fa6e987-2508-400d-a751-a9c29279923d.png" alt="Drawing" style="height: 250px;"/></center>

In [1]:
import pandas as pd
import numpy as np

In [2]:
df = pd.read_csv('titanic.csv')
df.head()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S


In [4]:
# random sample for the training set (60% of the rows); the rest becomes the test set
df_treino = df.sample(frac=0.6, random_state=65)
df_treino.head()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
828,829,1,3,"McCormack, Mr. Thomas Joseph",male,,0,0,367228,7.75,,Q
396,397,0,3,"Olsson, Miss. Elina",female,31.0,0,0,350407,7.8542,,S
222,223,0,3,"Green, Mr. George Henry",male,51.0,0,0,21440,8.05,,S
278,279,0,3,"Rice, Master. Eric",male,7.0,4,1,382652,29.125,,Q
688,689,0,3,"Fischer, Mr. Eberhard Thelander",male,18.0,0,0,350036,7.7958,,S


In [6]:
df_treino.shape

(535, 12)

In [8]:
df_teste = df[~df['PassengerId'].isin(df_treino['PassengerId'])]
df_teste.head()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S
6,7,0,1,"McCarthy, Mr. Timothy J",male,54.0,0,0,17463,51.8625,E46,S
7,8,0,3,"Palsson, Master. Gosta Leonard",male,2.0,3,1,349909,21.075,,S
11,12,1,1,"Bonnell, Miss. Elizabeth",female,58.0,0,0,113783,26.55,C103,S


In [9]:
df_teste.shape

(356, 12)

In [11]:
df_teste.shape[0] + df_treino.shape[0] == df.shape[0]

True

## train_test_split

In [12]:
from sklearn.model_selection import train_test_split

In [14]:
# split the whole DataFrame into training and test DataFrames
df_treino, df_teste = train_test_split(df, test_size=0.4, random_state=78)

In [15]:
df_treino.shape

(534, 12)

In [16]:
# features and target

X = df.drop('Survived', axis=1)
y = df['Survived']


X_treino, X_teste, y_treino, y_teste = train_test_split(X, y, test_size=0.4)

In [17]:
X_treino.shape

(534, 11)

In [19]:
X_treino['Age'].isnull().sum()

112

In [20]:
X_teste['Age'].isnull().sum()

65

In [25]:
media_treino = X_treino['Age'].mean()
media_treino

29.491516587677665

In [22]:
X_treino = X_treino.copy()  # explicit copy, so the assignment below does not trigger SettingWithCopyWarning
X_treino.loc[X_treino['Age'].isnull(), 'Age'] = media_treino
X_treino['Age'].isnull().sum()

0

In [23]:
X_teste['Age'].isnull().sum()

65

In [26]:
X_teste['Age'].mean()

29.999143835616437

In [None]:
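# fill the test-set gaps with the *training* mean, so no test information leaks into preprocessing
X_teste = X_teste.copy()  # explicit copy, avoids SettingWithCopyWarning
X_teste.loc[X_teste['Age'].isnull(), 'Age'] = media_treino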
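
The same "learn on training, apply to test" rule can be written with scikit-learn's `SimpleImputer`. This is a minimal sketch under assumptions: the two small DataFrames and their values are hypothetical stand-ins for `X_treino` and `X_teste`, not rows from `titanic.csv`.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Toy frames standing in for X_treino / X_teste (hypothetical values)
X_train = pd.DataFrame({"Age": [22.0, np.nan, 30.0, 38.0]})
X_test = pd.DataFrame({"Age": [np.nan, 50.0]})

imputer = SimpleImputer(strategy="mean")
imputer.fit(X_train[["Age"]])  # the mean is learned from the training rows only

# The same learned mean fills the gaps in both sets
X_train["Age"] = imputer.transform(X_train[["Age"]]).ravel()
X_test["Age"] = imputer.transform(X_test[["Age"]]).ravel()

print(imputer.statistics_)  # mean learned from the training data
print(X_test)
```

Keeping the fitted imputer around also means the identical transformation can be applied later to any new data.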

In [27]:
df.head()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S


# Repeated Sampling

In this methodology, the dataset is also split into two parts, but this time the split is repeated N times, producing N models, each with its own evaluation score obtained on the corresponding test set. To get a single figure, we compute the mean and standard deviation of the measured performance. The samples can be drawn either with or without replacement.

<center><img src="https://s3-sa-east-1.amazonaws.com/lcpi/c82578f4-1372-4c18-8d3a-62be46829805.png" alt="Drawing" style="height: 250px;"/></center>

In [None]:
df.head()

In [None]:
# use 'Pclass', 'SibSp', 'Fare', 'Age' and compute roc_auc



In [None]:
# repeat N times


In [None]:
# mean and standard deviation
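
One possible way to fill in the empty cells above is sketched below. It is not the notebook's own solution. It assumes synthetic data from `make_classification` in place of the four Titanic columns, `n_repeats = 20`, and sampling without replacement via `train_test_split` with a different seed on each repetition.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for four numeric features (assumption, illustration only)
X, y = make_classification(
    n_samples=600, n_features=4, n_informative=3, n_redundant=1,
    n_clusters_per_class=1, random_state=0,
)

n_repeats = 20  # the "N" of the text; the value is arbitrary here
scores = []
for seed in range(n_repeats):
    # A different seed gives a different train/test split on each repetition
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=seed)
    model = LogisticRegression(max_iter=1000)
    model.fit(X_tr, y_tr)
    proba = model.predict_proba(X_te)[:, 1]  # probability of the positive class
    scores.append(roc_auc_score(y_te, proba))

scores = np.array(scores)
print(f"ROC AUC over {n_repeats} splits: {scores.mean():.3f} +/- {scores.std():.3f}")
```

scikit-learn also provides `ShuffleSplit`, which generates the same kind of repeated random splits and can be passed as the `cv` argument of `cross_validate`.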

# Cross-validation

In this methodology, the dataset is divided into K partitions (folds), producing K models. Each model is trained on K-1 partitions and tested on the remaining one, on which the performance measures are computed. Each partition is therefore used as the test set once and as part of the training set K-1 times. As in the repeated-sampling methodology above, the mean and standard deviation of the scores are computed.

<center><img src="https://s3-sa-east-1.amazonaws.com/lcpi/411b2d93-23ea-4cbb-a0d1-a5aafb64fd5b.png" style="height: 250px;"/></center>
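
The partitioning rule can be checked directly with `KFold`. The sketch below uses a toy array of 10 rows and K = 5, both chosen only for illustration. It counts how often each row lands in the test and training sides.

```python
import numpy as np
from sklearn.model_selection import KFold

n_samples, k = 10, 5
X = np.arange(n_samples).reshape(-1, 1)  # toy data, illustration only

test_count = np.zeros(n_samples, dtype=int)
train_count = np.zeros(n_samples, dtype=int)

kf = KFold(n_splits=k, shuffle=True, random_state=0)
for fold, (train_idx, test_idx) in enumerate(kf.split(X)):
    print(f"fold {fold}: test indices = {test_idx}")
    test_count[test_idx] += 1    # row used for testing in this fold
    train_count[train_idx] += 1  # row used for training in this fold

print("times used as test: ", test_count)
print("times used as train:", train_count)
```

Every row ends up in the test side exactly once and in the training side K-1 times, which is what distinguishes cross-validation from repeated random splits.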

In [28]:
df.head()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S


**Simple hold-out**

In [36]:
X = df[['Pclass','SibSp','Fare','Age']].copy()
y = df['Survived'].copy()

# first split: hold back 30% as the final test set
X_treino_, X_teste, y_treino_, y_teste = train_test_split(X, y, test_size=0.3, random_state=42)
# second split: carve a validation set out of the remaining training data
X_treino, X_val, y_treino, y_val = train_test_split(X_treino_, y_treino_, test_size=0.3, random_state=43)

# explicit copies, so the assignments below do not trigger SettingWithCopyWarning
X_treino, X_val = X_treino.copy(), X_val.copy()

# impute missing ages with the mean of the training portion only
media_treino = X_treino['Age'].mean()
X_treino.loc[X_treino['Age'].isnull(), 'Age'] = media_treino
X_val.loc[X_val['Age'].isnull(), 'Age'] = media_treino

In [37]:
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

In [40]:
model = LogisticRegression()

model.fit(X_treino, y_treino)

y_pred = model.predict_proba(X_val)

roc_auc_score(y_val, y_pred[:,1])

0.7477137913989125

In [41]:
from sklearn.model_selection import cross_validate, cross_val_score, cross_val_predict

In [46]:
model = LogisticRegression()

X_treino_ = X_treino_.copy()  # explicit copy, avoids SettingWithCopyWarning
media_treino = X_treino_['Age'].mean()
X_treino_.loc[X_treino_['Age'].isnull(), 'Age'] = media_treino

cv = cross_validate(estimator=model, X=X_treino_, y=y_treino_, cv=5, scoring='roc_auc')
cv

{'fit_time': array([0.01698613, 0.01716256, 0.01395655, 0.01496744, 0.01303554]),
 'score_time': array([0.00303578, 0.00432062, 0.00303149, 0.00199819, 0.00297594]),
 'test_score': array([0.69421713, 0.76995047, 0.70913594, 0.67460981, 0.72157191])}

In [48]:
cv['test_score'].mean()

0.713897050677048

In [49]:
cv['test_score'].std()

0.03210714461530668
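
In the cell above, the mean age is computed once over all of `X_treino_` before the folds are created, so each fold's test partition contributes to the value used to impute its own training partition. One way to keep the imputation inside each fold is to wrap it in a pipeline. The sketch below assumes synthetic data with artificially removed values, not the Titanic columns. The effect on the scores is usually small for a simple mean, but the pattern carries over to heavier preprocessing.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline

# Synthetic stand-in data (assumption, illustration only)
X, y = make_classification(
    n_samples=600, n_features=4, n_informative=3, n_redundant=1,
    n_clusters_per_class=1, random_state=0,
)
rng = np.random.default_rng(0)
X[rng.random(len(X)) < 0.2, 3] = np.nan  # remove roughly 20% of one column

# The imputer is refit on the training part of every fold
pipe = make_pipeline(SimpleImputer(strategy="mean"), LogisticRegression(max_iter=1000))

cv = cross_validate(pipe, X, y, cv=5, scoring="roc_auc")
print(cv["test_score"].mean(), cv["test_score"].std())
```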

## How to make predictions?

In [51]:
y_pred = cross_val_predict(estimator=model, X=X_treino_, y=y_treino_, cv=5)
y_pred

array([1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0,
       0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0,
       1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,
       1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0,
       1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0,
       1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0,
       0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0,
       0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0,
       0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1,
       1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
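
`cross_val_predict` returns, for each row, the prediction made by the model of the fold in which that row was held out. By default these are class labels. For a metric such as ROC AUC, probabilities can be requested with `method="predict_proba"`. A minimal sketch on synthetic data (an assumption, not the notebook's data):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict

# Synthetic stand-in data (assumption, illustration only)
X, y = make_classification(
    n_samples=600, n_features=4, n_informative=3, n_redundant=1,
    n_clusters_per_class=1, random_state=0,
)

model = LogisticRegression(max_iter=1000)

# Out-of-fold probabilities: one row per sample, one column per class
oof_proba = cross_val_predict(model, X, y, cv=5, method="predict_proba")[:, 1]
pooled_auc = roc_auc_score(y, oof_proba)

print(oof_proba.shape)
print(pooled_auc)
```

A score computed on these pooled predictions is not necessarily equal to the mean of the per-fold scores from `cross_validate`. Note also that `cross_val_predict` does not leave `model` fitted. To predict on genuinely new data, fit the estimator on the full training set first.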

## Multiple metrics

In [53]:
cv = cross_validate(estimator=model, X=X_treino_, y=y_treino_, cv=5, scoring=['roc_auc', 'precision', 'recall'])
cv

{'fit_time': array([0.01927567, 0.01605654, 0.01624846, 0.0120554 , 0.01096892]),
 'score_time': array([0.00900221, 0.00603223, 0.0079999 , 0.00487494, 0.00590777]),
 'test_roc_auc': array([0.69421713, 0.76995047, 0.70913594, 0.67460981, 0.72157191]),
 'test_precision': array([0.66666667, 0.65625   , 0.63157895, 0.6       , 0.58823529]),
 'test_recall': array([0.34042553, 0.45652174, 0.26086957, 0.32608696, 0.43478261])}

In [56]:
from sklearn.tree import DecisionTreeClassifier

model = DecisionTreeClassifier(max_depth=5)

cv = cross_validate(estimator=model, 
                    X=X_treino_, 
                    y=y_treino_, 
                    cv=5, 
                    scoring=['roc_auc', 'precision', 'recall'], 
                    return_train_score=True)
cv

{'fit_time': array([0.00405765, 0.00398517, 0.00528264, 0.00500607, 0.00400448]),
 'score_time': array([0.00885892, 0.00505185, 0.00602913, 0.00599504, 0.00600743]),
 'test_roc_auc': array([0.72504092, 0.7278481 , 0.72344524, 0.67238016, 0.73843367]),
 'train_roc_auc': array([0.8040882 , 0.79068302, 0.80162335, 0.80414873, 0.8226545 ]),
 'test_precision': array([0.83333333, 0.76      , 0.57894737, 0.66666667, 0.74193548]),
 'train_precision': array([0.81914894, 0.7979798 , 0.83809524, 0.93589744, 0.78861789]),
 'test_recall': array([0.42553191, 0.41304348, 0.23913043, 0.26086957, 0.5       ]),
 'train_recall': array([0.41847826, 0.42702703, 0.47567568, 0.39459459, 0.52432432])}

In [60]:
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(max_depth=5)

cv = cross_validate(estimator=model, 
                    X=X_treino_, 
                    y=y_treino_, 
                    cv=5, 
                    scoring=['roc_auc', 'precision', 'recall'], 
                    return_train_score=True)
cv

{'fit_time': array([0.14360142, 0.14635944, 0.14279366, 0.14403653, 0.1473012 ]),
 'score_time': array([0.02491975, 0.02582288, 0.02833676, 0.03115988, 0.02826452]),
 'test_roc_auc': array([0.75695581, 0.76912493, 0.65272427, 0.70638239, 0.78511706]),
 'train_roc_auc': array([0.85629846, 0.84828599, 0.86722217, 0.85112756, 0.85253056]),
 'test_precision': array([0.73333333, 0.70967742, 0.55      , 0.66666667, 0.76      ]),
 'train_precision': array([0.83783784, 0.81355932, 0.85576923, 0.84693878, 0.83838384]),
 'test_recall': array([0.46808511, 0.47826087, 0.23913043, 0.39130435, 0.41304348]),
 'train_recall': array([0.50543478, 0.51891892, 0.48108108, 0.44864865, 0.44864865])}
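
With `return_train_score=True`, a quick way to read these dictionaries is to compare the average training and test scores. The sketch below copies the `roc_auc` fold values from the `DecisionTreeClassifier(max_depth=5)` output shown earlier. The summary layout itself is just one possible choice.

```python
import numpy as np
import pandas as pd

# Fold scores copied from the DecisionTreeClassifier(max_depth=5) output
cv = {
    "test_roc_auc": np.array([0.72504092, 0.7278481, 0.72344524, 0.67238016, 0.73843367]),
    "train_roc_auc": np.array([0.8040882, 0.79068302, 0.80162335, 0.80414873, 0.8226545]),
}

results = pd.DataFrame(cv)                # one row per fold
summary = results.agg(["mean", "std"]).T  # one row per score; pandas std uses ddof=1

# Difference between average training and test performance
gap = summary.loc["train_roc_auc", "mean"] - summary.loc["test_roc_auc", "mean"]

print(summary)
print(f"train - test gap: {gap:.3f}")
```

A training score clearly above the test score, as here, suggests the model fits the training folds better than it generalizes.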

# Exercise

Train a model of your choice to classify Titanic survivors using `Hold-out` and `Cross-Validation` (5-fold), and evaluate it with `roc_auc`. Which score would you report as the expected performance once the model goes into production, and why?