# NBA draft classifier
The goal of the model is to use data from the 1996-97 through 2021-22 seasons to correctly classify whether a player is *undrafted*, that is, was never selected in the draft.

Step by step:
* Choose the problem ✔
* Split the data into training, validation, and test sets ✔
* Select algorithms to solve the problem ✔
* Add MLflow to model training ✔
* Run a hyperparameter selection tool
* Diagnose the best model and improve it based on that diagnosis

## Dependencies

In [None]:
!pip install mlflow

In [2]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import f1_score, roc_auc_score, accuracy_score
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier
import mlflow
import mlflow.sklearn
from urllib.parse import urlparse

In [3]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


## Loading the dataset

In [6]:
# Path Tales
path = '/content/drive/MyDrive/2022.1/TA GDI/projeto1/data/classification.csv'

In [7]:
dataset = pd.read_csv(path)

In [8]:
dataset.head()

| | player_name | team_abbreviation | age | player_height | player_weight | college | country | draft_year | draft_round | draft_number | ... | oreb_pct | dreb_pct | usg_pct | ts_pct | ast_pct | season | undrafted | season_start_year | season_end_year | gp_pct |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2071 | 16 | 0.153846 | 0.75 | 0.485714 | 65 | 72 | 0.568966 | 0.125 | 0.175758 | ... | 0.127 | 0.182 | 0.142 | 0.536 | 0.052 | 0 | 0.0 | 0.0 | 0.0 | 0.865854 |
| 1 | 1475 | 13 | 0.461538 | 0.678571 | 0.485714 | 117 | 72 | NaN | NaN | NaN | ... | 0.016 | 0.115 | 0.151 | 0.535 | 0.099 | 0 | 1.0 | 0.0 | 0.0 | 0.865854 |
| 2 | 1466 | 2 | 0.423077 | 0.714286 | 0.533333 | 227 | 72 | NaN | NaN | NaN | ... | 0.083 | 0.152 | 0.167 | 0.542 | 0.101 | 0 | 1.0 | 0.0 | 0.0 | 0.902439 |
| 3 | 1465 | 9 | 0.153846 | 0.642857 | 0.485714 | 189 | 72 | 0.568966 | 0.125 | 0.151515 | ... | 0.109 | 0.118 | 0.233 | 0.482 | 0.114 | 0 | 0.0 | 0.0 | 0.0 | 0.512195 |
| 4 | 1464 | 35 | 0.153846 | 0.535714 | 0.438095 | 249 | 72 | 0.551724 | 0.25 | 0.30303 | ... | 0.087 | 0.045 | 0.135 | 0.47 | 0.125 | 0 | 0.0 | 0.0 | 0.0 | 0.109756 |


In [9]:
dataset.groupby(['undrafted'])['player_name'].count()

undrafted
0.0    9629
1.0    1875
Name: player_name, dtype: int64

As the counts show, the dataset is imbalanced with respect to the 'undrafted' category: 9,629 rows belong to drafted players and only 1,875 (about 16%) to undrafted ones.

## Splitting into training, validation, and test sets

In [10]:
RANDOM_STATE = 42

In [11]:
dataset.columns

Index(['player_name', 'team_abbreviation', 'age', 'player_height',
       'player_weight', 'college', 'country', 'draft_year', 'draft_round',
       'draft_number', 'gp', 'pts', 'reb', 'ast', 'net_rating', 'oreb_pct',
       'dreb_pct', 'usg_pct', 'ts_pct', 'ast_pct', 'season', 'undrafted',
       'season_start_year', 'season_end_year', 'gp_pct'],
      dtype='object')

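We chose the height, weight, college, country, games played, and performance metric columns (pts, reb, ast, ...) as the feature set; the 'undrafted' column is the label. The draft columns (`draft_year`, `draft_round`, `draft_number`) are excluded: in the sample above they are empty exactly for the undrafted rows, so they would reveal the label.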

In [12]:
X = dataset[['player_height', 'player_weight', 'college', 'country', 
             'gp', 'pts', 'reb', 'ast', 'net_rating', 'oreb_pct',
             'dreb_pct', 'usg_pct', 'ts_pct', 'ast_pct']]
y = dataset['undrafted']
X.shape, y.shape

((11504, 14), (11504,))

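We assign 70% of the data to training, 10% to validation, and 20% to test. The second split uses `test_size=0.125` because 12.5% of the remaining 80% equals 10% of the full dataset.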

In [13]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=RANDOM_STATE)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.125, random_state=RANDOM_STATE)

X_train.shape, X_val.shape, X_test.shape

((8052, 14), (1151, 14), (2301, 14))
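
The split above does not pass `stratify`, so the share of undrafted players can drift slightly between the three sets. A minimal sketch of a stratified variant follows. It uses synthetic labels with the same class counts as the dataset; `X_demo` and `y_demo` are placeholders, not the notebook's data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

RANDOM_STATE = 42

# Synthetic stand-in with the class counts reported above (9629 drafted, 1875 undrafted).
y_demo = np.array([0] * 9629 + [1] * 1875)
X_demo = np.arange(len(y_demo)).reshape(-1, 1)  # placeholder feature column

# First split: hold out 20% for test, preserving the class ratio.
X_tmp, X_test_s, y_tmp, y_test_s = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=RANDOM_STATE, stratify=y_demo
)
# Second split: 12.5% of the remaining 80% is 10% of the full data.
X_train_s, X_val_s, y_train_s, y_val_s = train_test_split(
    X_tmp, y_tmp, test_size=0.125, random_state=RANDOM_STATE, stratify=y_tmp
)

for name, labels in [("train", y_train_s), ("val", y_val_s), ("test", y_test_s)]:
    print(f"{name}: n={len(labels)}, undrafted share={labels.mean():.3f}")
```

With `stratify`, each set keeps roughly the same 16% share of undrafted rows, which makes validation and test metrics easier to compare.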

## Algorithm selection
We selected the following algorithms for this classification problem:
* MLP
* RandomForest
* XGBoost
* LogisticRegression

Because the dataset is heavily imbalanced, we use the F1 score (computed as the class-weighted average) as the primary validation metric, alongside AUROC and accuracy.

In [14]:
def eval_metrics(y_true, y_pred, y_proba):
  f1 = f1_score(y_true, y_pred, average='weighted')
  acc = accuracy_score(y_true, y_pred)
  auroc = roc_auc_score(y_true, y_proba[:, 1])
  return acc, auroc, f1
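
Note that `average='weighted'` weights each class's F1 by its support, so the majority class dominates the score. As a worked check, consider a model that always predicts "drafted". The sketch assumes a validation set of 946 drafted and 205 undrafted rows; this composition is inferred from the accuracy of 0.8218940052128584 on 1,151 rows that several runs below report, and is not stated in the notebook.

```python
from sklearn.metrics import accuracy_score, f1_score

# Assumed validation composition: 946 drafted (0) and 205 undrafted (1) rows.
y_true = [0] * 946 + [1] * 205
# Constant baseline: always predict the majority class.
y_pred = [0] * len(y_true)

acc = accuracy_score(y_true, y_pred)
f1_weighted = f1_score(y_true, y_pred, average="weighted", zero_division=0)
f1_undrafted = f1_score(y_true, y_pred, pos_label=1, zero_division=0)

print(f"accuracy     = {acc:.4f}")           # 946 / 1151
print(f"weighted F1  = {f1_weighted:.4f}")
print(f"undrafted F1 = {f1_undrafted:.4f}")  # no undrafted player is ever found
```

The baseline's accuracy and weighted F1 coincide with the values logged for some of the runs below (accuracy 0.8219 with F1 0.7415), which suggests those models predict "drafted" for every row. Reporting the F1 of the undrafted class next to the weighted average would make this failure mode visible.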

### MLPs
For the MLP, we vary `hidden_layer_sizes` (the number of neurons in each hidden layer), `activation` (the neurons' activation function), and `solver` (the optimizer that updates the weights).
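
As a quick illustration of how `hidden_layer_sizes` maps to the network architecture, the sketch below fits two small MLPs on random data with 14 features (the same number of columns as `X`; the data itself is made up) and prints the shape of each weight matrix.

```python
import warnings

import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(42)
X_demo = rng.normal(size=(200, 14))    # made-up data with 14 features
y_demo = rng.integers(0, 2, size=200)  # made-up binary labels


def layer_shapes(hidden_layer_sizes):
    """Fit a tiny MLP and return the shape of each weight matrix."""
    mlp = MLPClassifier(hidden_layer_sizes=hidden_layer_sizes, max_iter=20, random_state=42)
    with warnings.catch_warnings():
        # The short run is not expected to converge; only the shapes matter here.
        warnings.simplefilter("ignore", category=ConvergenceWarning)
        mlp.fit(X_demo, y_demo)
    return [w.shape for w in mlp.coefs_]


print(layer_shapes((100,)))    # one hidden layer of 100 neurons
print(layer_shapes((10, 10)))  # two hidden layers of 10 neurons each
```

Each tuple entry adds one hidden layer, so `(10, 10)` is a deeper but much narrower network than `(100,)`.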

In [None]:
EPOCHS=500

#### Attempt 1

In [None]:
hidden_layer_sizes=(100,)
activation='relu'
solver='adam'

In [None]:
with mlflow.start_run():
  mlp = MLPClassifier(random_state=RANDOM_STATE, max_iter=EPOCHS, hidden_layer_sizes=hidden_layer_sizes, activation=activation, solver=solver)
  mlp.fit(X_train, y_train)

  y_pred = mlp.predict(X_val)
  y_proba = mlp.predict_proba(X_val)

  (acc, auroc, f1) = eval_metrics(y_val, y_pred, y_proba)

  mlflow.log_param("hidden_layer_sizes", hidden_layer_sizes)
  mlflow.log_param("activation", activation)
  mlflow.log_param("solver", solver)
  mlflow.log_metric("accuracy", acc)
  mlflow.log_metric("auroc", auroc)
  mlflow.log_metric("f1", f1)

  tracking_url_type_store = urlparse(mlflow.get_tracking_uri()).scheme

  # Model registry does not work with file store
  if tracking_url_type_store != "file":
    # Register the model
    # There are other ways to use the Model Registry, which depends on the use case,
    # please refer to the doc for more information:
    # https://mlflow.org/docs/latest/model-registry.html#api-workflow
    mlflow.sklearn.log_model(mlp, "model", registered_model_name="MLP_NBA_Undrafted")
  else:
    mlflow.sklearn.log_model(mlp, "model")
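
The same fit, evaluate, and log sequence is repeated for every attempt in this notebook. One way to factor it out is a helper like the hypothetical `run_experiment` below. It is a sketch rather than the notebook's actual code, and it relies on `mlflow.log_params` and `mlflow.log_metrics` to log whole dictionaries in one call.

```python
from urllib.parse import urlparse

from sklearn.metrics import accuracy_score, f1_score, roc_auc_score


def run_experiment(model, params, X_train, y_train, X_val, y_val, registered_model_name):
    """Fit `model`, log `params` and validation metrics to MLflow, and return the metrics.

    Hypothetical helper: it bundles the steps of the cell above so that each
    attempt only has to supply a model and its parameters.
    """
    # Imported lazily so the helper can be defined before MLflow is installed.
    import mlflow
    import mlflow.sklearn

    with mlflow.start_run():
        model.fit(X_train, y_train)
        y_pred = model.predict(X_val)
        y_proba = model.predict_proba(X_val)

        metrics = {
            "accuracy": accuracy_score(y_val, y_pred),
            "auroc": roc_auc_score(y_val, y_proba[:, 1]),
            "f1": f1_score(y_val, y_pred, average="weighted"),
        }
        mlflow.log_params(params)    # one call for all parameters
        mlflow.log_metrics(metrics)  # one call for all metrics

        # The model registry does not work with the file store.
        if urlparse(mlflow.get_tracking_uri()).scheme != "file":
            mlflow.sklearn.log_model(model, "model", registered_model_name=registered_model_name)
        else:
            mlflow.sklearn.log_model(model, "model")
    return metrics
```

A call would then look like `run_experiment(mlp, {"hidden_layer_sizes": hidden_layer_sizes, "activation": activation, "solver": solver}, X_train, y_train, X_val, y_val, "MLP_NBA_Undrafted")`, where `mlp` is an unfitted `MLPClassifier`.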

In [None]:
!cat /content/mlruns/0/b5f8ad22854f4b568521073bda44aa34/params/activation

relu

In [None]:
!cat /content/mlruns/0/b5f8ad22854f4b568521073bda44aa34/params/hidden_layer_sizes

(100,)

In [None]:
!cat /content/mlruns/0/b5f8ad22854f4b568521073bda44aa34/params/solver

adam

In [None]:
!cat /content/mlruns/0/b5f8ad22854f4b568521073bda44aa34/metrics/auroc

1662078783968 0.7446037229928325 0

In [None]:
!cat /content/mlruns/0/b5f8ad22854f4b568521073bda44aa34/metrics/f1

1662078783970 0.754230082735617 0

In [None]:
!cat /content/mlruns/0/b5f8ad22854f4b568521073bda44aa34/metrics/accuracy

1662078783967 0.8201563857515204 0

#### Attempt 2


```
hidden_layer_sizes = (10, 10)
```



In [None]:
hidden_layer_sizes=(10, 10)
activation='relu'
solver='adam'

In [None]:
with mlflow.start_run():
  mlp = MLPClassifier(random_state=RANDOM_STATE, max_iter=EPOCHS, hidden_layer_sizes=hidden_layer_sizes, activation=activation, solver=solver)
  mlp.fit(X_train, y_train)

  y_pred = mlp.predict(X_val)
  y_proba = mlp.predict_proba(X_val)

  (acc, auroc, f1) = eval_metrics(y_val, y_pred, y_proba)

  mlflow.log_param("hidden_layer_sizes", hidden_layer_sizes)
  mlflow.log_param("activation", activation)
  mlflow.log_param("solver", solver)
  mlflow.log_metric("accuracy", acc)
  mlflow.log_metric("auroc", auroc)
  mlflow.log_metric("f1", f1)

  tracking_url_type_store = urlparse(mlflow.get_tracking_uri()).scheme

  # Model registry does not work with file store
  if tracking_url_type_store != "file":
    # Register the model
    # There are other ways to use the Model Registry, which depends on the use case,
    # please refer to the doc for more information:
    # https://mlflow.org/docs/latest/model-registry.html#api-workflow
    mlflow.sklearn.log_model(mlp, "model", registered_model_name="MLP_NBA_Undrafted")
  else:
    mlflow.sklearn.log_model(mlp, "model")

In [None]:
!cat /content/mlruns/0/dac45b74013a437bae0dfae8d95fe6f1/params/activation

relu

In [None]:
!cat /content/mlruns/0/dac45b74013a437bae0dfae8d95fe6f1/params/hidden_layer_sizes

(10, 10)

In [None]:
!cat /content/mlruns/0/dac45b74013a437bae0dfae8d95fe6f1/params/solver

adam

In [None]:
!cat /content/mlruns/0/dac45b74013a437bae0dfae8d95fe6f1/metrics/auroc

1662078957608 0.7428969215696385 0


In [None]:
!cat /content/mlruns/0/dac45b74013a437bae0dfae8d95fe6f1/metrics/f1

1662078957610 0.7431908403435481 0


In [None]:
!cat /content/mlruns/0/dac45b74013a437bae0dfae8d95fe6f1/metrics/accuracy

1662078957607 0.8218940052128584 0
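
Each metric file in the MLflow file store holds one line per logged value, with three space-separated fields: a timestamp in milliseconds since the Unix epoch, the metric value, and the step. A small sketch of a parser for such a line follows; the `MetricEntry` type and the `parse_metric_line` helper are illustrative and not part of MLflow.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class MetricEntry(NamedTuple):
    logged_at: datetime  # when the metric was logged (UTC)
    value: float         # the metric value
    step: int            # the step passed to log_metric (0 by default)


def parse_metric_line(line: str) -> MetricEntry:
    """Parse one 'timestamp_ms value step' line from a file-store metric file."""
    timestamp_ms, value, step = line.split()
    logged_at = datetime.fromtimestamp(int(timestamp_ms) / 1000, tz=timezone.utc)
    return MetricEntry(logged_at, float(value), int(step))


entry = parse_metric_line("1662078957607 0.8218940052128584 0")
print(entry.logged_at.date(), entry.value, entry.step)
```

Only the middle field is the metric itself; the first and last fields are bookkeeping.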


#### Attempt 3

```
hidden_layer_sizes = (10, 10)
activation = 'logistic'
solver = 'sgd'
```




In [None]:
hidden_layer_sizes=(10, 10)
activation='logistic'
solver='sgd'

In [None]:
with mlflow.start_run():
  mlp = MLPClassifier(random_state=RANDOM_STATE, max_iter=EPOCHS, hidden_layer_sizes=hidden_layer_sizes, activation=activation, solver=solver)
  mlp.fit(X_train, y_train)

  y_pred = mlp.predict(X_val)
  y_proba = mlp.predict_proba(X_val)

  (acc, auroc, f1) = eval_metrics(y_val, y_pred, y_proba)

  mlflow.log_param("hidden_layer_sizes", hidden_layer_sizes)
  mlflow.log_param("activation", activation)
  mlflow.log_param("solver", solver)
  mlflow.log_metric("accuracy", acc)
  mlflow.log_metric("auroc", auroc)
  mlflow.log_metric("f1", f1)

  tracking_url_type_store = urlparse(mlflow.get_tracking_uri()).scheme

  # Model registry does not work with file store
  if tracking_url_type_store != "file":
    # Register the model
    # There are other ways to use the Model Registry, which depends on the use case,
    # please refer to the doc for more information:
    # https://mlflow.org/docs/latest/model-registry.html#api-workflow
    mlflow.sklearn.log_model(mlp, "model", registered_model_name="MLP_NBA_Undrafted")
  else:
    mlflow.sklearn.log_model(mlp, "model")

In [None]:
!cat /content/mlruns/0/c3f5c69195704f6f80d3a0b0c326c97d/params/activation

logistic

In [None]:
!cat /content/mlruns/0/c3f5c69195704f6f80d3a0b0c326c97d/params/hidden_layer_sizes

(10, 10)

In [None]:
!cat /content/mlruns/0/c3f5c69195704f6f80d3a0b0c326c97d/params/solver

sgd

In [None]:
!cat /content/mlruns/0/c3f5c69195704f6f80d3a0b0c326c97d/metrics/auroc

1662079054027 0.48951683597174234 0


In [None]:
!cat /content/mlruns/0/c3f5c69195704f6f80d3a0b0c326c97d/metrics/f1

1662079054028 0.7415467133346342 0


In [None]:
!cat /content/mlruns/0/c3f5c69195704f6f80d3a0b0c326c97d/metrics/accuracy

1662079054025 0.8218940052128584 0


### LogisticRegression
For LogisticRegression, we vary `penalty` (the norm used for the regularization penalty) and `solver` (the optimization algorithm).
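
Not every `solver` supports every `penalty`. The sketch below encodes the penalties accepted by the two solvers used in this notebook, plus scikit-learn's default `lbfgs`, following the scikit-learn documentation. The `SUPPORTED_PENALTIES` table and the `check_combination` helper are illustrative names, not scikit-learn API.

```python
# Penalties accepted by each solver, per the scikit-learn LogisticRegression docs.
# Only the solvers relevant to this notebook are listed.
SUPPORTED_PENALTIES = {
    "lbfgs": {"l2", None},
    "liblinear": {"l1", "l2"},
    "saga": {"l1", "l2", "elasticnet", None},
}


def check_combination(solver, penalty):
    """Return True if `solver` accepts `penalty` (illustrative helper)."""
    return penalty in SUPPORTED_PENALTIES[solver]


# The three attempts in this section all use valid combinations.
for solver, penalty in [("liblinear", "l2"), ("saga", "l1"), ("saga", "elasticnet")]:
    print(solver, penalty, check_combination(solver, penalty))

# An invalid one: liblinear has no elastic-net support.
print("liblinear", "elasticnet", check_combination("liblinear", "elasticnet"))
```

With `penalty='elasticnet'`, `l1_ratio` must also be set, which is why attempt 3 passes `l1_ratio=0.5`.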

#### Attempt 1


```
penalty = 'l2'
solver = 'liblinear'
```



In [15]:
penalty = 'l2'
solver = 'liblinear'

In [16]:
with mlflow.start_run():
  lr = LogisticRegression(random_state=RANDOM_STATE, penalty=penalty, solver=solver)
  lr.fit(X_train, y_train)

  y_pred = lr.predict(X_val)
  y_proba = lr.predict_proba(X_val)

  (acc, auroc, f1) = eval_metrics(y_val, y_pred, y_proba)

  mlflow.log_param("penalty", penalty)
  mlflow.log_param("solver", solver)
  mlflow.log_metric("accuracy", acc)
  mlflow.log_metric("auroc", auroc)
  mlflow.log_metric("f1", f1)

  tracking_url_type_store = urlparse(mlflow.get_tracking_uri()).scheme

  # Model registry does not work with file store
  if tracking_url_type_store != "file":
    # Register the model
    # There are other ways to use the Model Registry, which depends on the use case,
    # please refer to the doc for more information:
    # https://mlflow.org/docs/latest/model-registry.html#api-workflow
    mlflow.sklearn.log_model(lr, "model", registered_model_name="LR_NBA_Undrafted")
  else:
    mlflow.sklearn.log_model(lr, "model")

In [17]:
!cat /content/mlruns/0/8fe01fc13e704af8befc92a1931e42e2/params/penalty

l2

In [19]:
!cat /content/mlruns/0/8fe01fc13e704af8befc92a1931e42e2/params/solver

liblinear

In [20]:
!cat /content/mlruns/0/8fe01fc13e704af8befc92a1931e42e2/metrics/auroc

1662120201332 0.7512246686948899 0


In [21]:
!cat /content/mlruns/0/8fe01fc13e704af8befc92a1931e42e2/metrics/f1

1662120201334 0.7538235059016244 0


In [22]:
!cat /content/mlruns/0/8fe01fc13e704af8befc92a1931e42e2/metrics/accuracy

1662120201331 0.8218940052128584 0


#### Attempt 2

```
penalty = 'l1'
solver = 'saga'
```



In [27]:
penalty = 'l1'
solver = 'saga'

In [28]:
with mlflow.start_run():
  lr = LogisticRegression(random_state=RANDOM_STATE, penalty=penalty, solver=solver)
  lr.fit(X_train, y_train)

  y_pred = lr.predict(X_val)
  y_proba = lr.predict_proba(X_val)

  (acc, auroc, f1) = eval_metrics(y_val, y_pred, y_proba)

  mlflow.log_param("penalty", penalty)
  mlflow.log_param("solver", solver)
  mlflow.log_metric("accuracy", acc)
  mlflow.log_metric("auroc", auroc)
  mlflow.log_metric("f1", f1)

  tracking_url_type_store = urlparse(mlflow.get_tracking_uri()).scheme

  # Model registry does not work with file store
  if tracking_url_type_store != "file":
    # Register the model
    # There are other ways to use the Model Registry, which depends on the use case,
    # please refer to the doc for more information:
    # https://mlflow.org/docs/latest/model-registry.html#api-workflow
    mlflow.sklearn.log_model(lr, "model", registered_model_name="LR_NBA_Undrafted")
  else:
    mlflow.sklearn.log_model(lr, "model")



In [31]:
!cat /content/mlruns/0/0139f9f8834842c7bbb0421e078e6e5b/params/penalty

l1

In [32]:
!cat /content/mlruns/0/0139f9f8834842c7bbb0421e078e6e5b/params/solver

saga

In [33]:
!cat /content/mlruns/0/0139f9f8834842c7bbb0421e078e6e5b/metrics/auroc

1662120389437 0.663234156654463 0


In [34]:
!cat /content/mlruns/0/0139f9f8834842c7bbb0421e078e6e5b/metrics/f1

1662120389438 0.7415467133346342 0


In [35]:
!cat /content/mlruns/0/0139f9f8834842c7bbb0421e078e6e5b/metrics/accuracy

1662120389436 0.8218940052128584 0


#### Attempt 3

```
penalty = 'elasticnet'
solver = 'saga'
```



In [36]:
penalty = 'elasticnet'
solver = 'saga'

In [37]:
with mlflow.start_run():
  lr = LogisticRegression(random_state=RANDOM_STATE, penalty=penalty, solver=solver, l1_ratio=0.5, max_iter=1000)
  lr.fit(X_train, y_train)

  y_pred = lr.predict(X_val)
  y_proba = lr.predict_proba(X_val)

  (acc, auroc, f1) = eval_metrics(y_val, y_pred, y_proba)

  mlflow.log_param("penalty", penalty)
  mlflow.log_param("solver", solver)
  mlflow.log_metric("accuracy", acc)
  mlflow.log_metric("auroc", auroc)
  mlflow.log_metric("f1", f1)

  tracking_url_type_store = urlparse(mlflow.get_tracking_uri()).scheme

  # Model registry does not work with file store
  if tracking_url_type_store != "file":
    # Register the model
    # There are other ways to use the Model Registry, which depends on the use case,
    # please refer to the doc for more information:
    # https://mlflow.org/docs/latest/model-registry.html#api-workflow
    mlflow.sklearn.log_model(lr, "model", registered_model_name="LR_NBA_Undrafted")
  else:
    mlflow.sklearn.log_model(lr, "model")



In [38]:
!cat /content/mlruns/0/253e4d2f20f24254a7fc2d693159bfc1/params/penalty

elasticnet

In [39]:
!cat /content/mlruns/0/253e4d2f20f24254a7fc2d693159bfc1/params/solver

saga

In [40]:
!cat /content/mlruns/0/253e4d2f20f24254a7fc2d693159bfc1/metrics/auroc

1662120583300 0.709585933068633 0


In [41]:
!cat /content/mlruns/0/253e4d2f20f24254a7fc2d693159bfc1/metrics/f1

1662120583302 0.7415467133346342 0


In [42]:
!cat /content/mlruns/0/253e4d2f20f24254a7fc2d693159bfc1/metrics/accuracy

1662120583299 0.8218940052128584 0


### RandomForestClassifier
For RandomForest, we track the effect of `n_estimators` (the number of trees in the forest), `criterion` (the function that measures split quality), and `max_depth` (the maximum depth of each tree).
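
With `max_depth=None`, scikit-learn grows each tree until its leaves are pure or hold fewer than `min_samples_split` samples, so the trees can become deep. The sketch below makes this visible on made-up data by comparing the depths of the fitted trees with and without a cap; `tree_depths` is a hypothetical helper.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(42)
X_demo = rng.normal(size=(500, 14))    # made-up features
y_demo = rng.integers(0, 2, size=500)  # made-up labels (pure noise)


def tree_depths(max_depth):
    """Fit a small forest and return the depth of each of its trees."""
    rf = RandomForestClassifier(n_estimators=10, max_depth=max_depth, random_state=42)
    rf.fit(X_demo, y_demo)
    return [tree.get_depth() for tree in rf.estimators_]


print("max_depth=None:", tree_depths(None))  # trees grow until the leaves are pure
print("max_depth=5:   ", tree_depths(5))     # no tree goes deeper than 5
```

Capping the depth is one way to regularize the forest, at the cost of less expressive individual trees.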

#### Attempt 1

```
n_estimators = 100
criterion = 'gini'
max_depth = None
```



In [43]:
n_estimators = 100
criterion = 'gini'
max_depth = None

In [44]:
with mlflow.start_run():
  rf = RandomForestClassifier(random_state=RANDOM_STATE, n_estimators=n_estimators, criterion=criterion, max_depth=max_depth)
  rf.fit(X_train, y_train)

  y_pred = rf.predict(X_val)
  y_proba = rf.predict_proba(X_val)

  (acc, auroc, f1) = eval_metrics(y_val, y_pred, y_proba)

  mlflow.log_param("n_estimators", n_estimators)
  mlflow.log_param("criterion", criterion)
  mlflow.log_param("max_depth", max_depth)
  mlflow.log_metric("accuracy", acc)
  mlflow.log_metric("auroc", auroc)
  mlflow.log_metric("f1", f1)

  tracking_url_type_store = urlparse(mlflow.get_tracking_uri()).scheme

  # Model registry does not work with file store
  if tracking_url_type_store != "file":
    # Register the model
    # There are other ways to use the Model Registry, which depends on the use case,
    # please refer to the doc for more information:
    # https://mlflow.org/docs/latest/model-registry.html#api-workflow
    mlflow.sklearn.log_model(rf, "model", registered_model_name="RF_NBA_Undrafted")
  else:
    mlflow.sklearn.log_model(rf, "model")

In [45]:
!cat /content/mlruns/0/c3e9d3f5601c4cdabc0a4f0ba49e2541/params/n_estimators

100

In [46]:
!cat /content/mlruns/0/c3e9d3f5601c4cdabc0a4f0ba49e2541/params/criterion

gini

In [47]:
!cat /content/mlruns/0/c3e9d3f5601c4cdabc0a4f0ba49e2541/params/max_depth

None

In [48]:
!cat /content/mlruns/0/c3e9d3f5601c4cdabc0a4f0ba49e2541/metrics/auroc

1662121124478 0.8366240396019182 0


In [49]:
!cat /content/mlruns/0/c3e9d3f5601c4cdabc0a4f0ba49e2541/metrics/f1

1662121124479 0.7826776956198876 0


In [50]:
!cat /content/mlruns/0/c3e9d3f5601c4cdabc0a4f0ba49e2541/metrics/accuracy

1662121124478 0.8340573414422241 0


#### Attempt 2

```
n_estimators = 200
criterion = 'gini'
max_depth = None
```



In [51]:
n_estimators = 200
criterion = 'gini'
max_depth = None

In [53]:
with mlflow.start_run():
  rf = RandomForestClassifier(random_state=RANDOM_STATE, n_estimators=n_estimators, criterion=criterion, max_depth=max_depth)
  rf.fit(X_train, y_train)

  y_pred = rf.predict(X_val)
  y_proba = rf.predict_proba(X_val)

  (acc, auroc, f1) = eval_metrics(y_val, y_pred, y_proba)

  mlflow.log_param("n_estimators", n_estimators)
  mlflow.log_param("criterion", criterion)
  mlflow.log_param("max_depth", max_depth)
  mlflow.log_metric("accuracy", acc)
  mlflow.log_metric("auroc", auroc)
  mlflow.log_metric("f1", f1)

  tracking_url_type_store = urlparse(mlflow.get_tracking_uri()).scheme

  # Model registry does not work with file store
  if tracking_url_type_store != "file":
    # Register the model
    # There are other ways to use the Model Registry, which depends on the use case,
    # please refer to the doc for more information:
    # https://mlflow.org/docs/latest/model-registry.html#api-workflow
    mlflow.sklearn.log_model(rf, "model", registered_model_name="RF_NBA_Undrafted")
  else:
    mlflow.sklearn.log_model(rf, "model")

In [54]:
!cat /content/mlruns/0/37399a681ba84856928153aacde4623d/params/n_estimators

200

In [55]:
!cat /content/mlruns/0/37399a681ba84856928153aacde4623d/params/criterion

gini

In [56]:
!cat /content/mlruns/0/37399a681ba84856928153aacde4623d/params/max_depth

None

In [57]:
!cat /content/mlruns/0/37399a681ba84856928153aacde4623d/metrics/auroc

1662121391197 0.8431702160573403 0


In [58]:
!cat /content/mlruns/0/37399a681ba84856928153aacde4623d/metrics/f1

1662121391198 0.7866288627891993 0


In [59]:
!cat /content/mlruns/0/37399a681ba84856928153aacde4623d/metrics/accuracy

1662121391197 0.8366637706342311 0


#### Attempt 3

```
n_estimators = 200
criterion = 'entropy'
max_depth = 15
```



In [60]:
n_estimators = 200
criterion = 'entropy'
max_depth = 15

In [61]:
with mlflow.start_run():
  rf = RandomForestClassifier(random_state=RANDOM_STATE, n_estimators=n_estimators, criterion=criterion, max_depth=max_depth)
  rf.fit(X_train, y_train)

  y_pred = rf.predict(X_val)
  y_proba = rf.predict_proba(X_val)

  (acc, auroc, f1) = eval_metrics(y_val, y_pred, y_proba)

  mlflow.log_param("n_estimators", n_estimators)
  mlflow.log_param("criterion", criterion)
  mlflow.log_param("max_depth", max_depth)
  mlflow.log_metric("accuracy", acc)
  mlflow.log_metric("auroc", auroc)
  mlflow.log_metric("f1", f1)

  tracking_url_type_store = urlparse(mlflow.get_tracking_uri()).scheme

  # Model registry does not work with file store
  if tracking_url_type_store != "file":
    # Register the model
    # There are other ways to use the Model Registry, which depends on the use case,
    # please refer to the doc for more information:
    # https://mlflow.org/docs/latest/model-registry.html#api-workflow
    mlflow.sklearn.log_model(rf, "model", registered_model_name="RF_NBA_Undrafted")
  else:
    mlflow.sklearn.log_model(rf, "model")

In [62]:
!cat /content/mlruns/0/7bf1d8758b4d46f7b8497a9243c057f7/params/n_estimators

200

In [63]:
!cat /content/mlruns/0/7bf1d8758b4d46f7b8497a9243c057f7/params/criterion

entropy

In [64]:
!cat /content/mlruns/0/7bf1d8758b4d46f7b8497a9243c057f7/params/max_depth

15

In [65]:
!cat /content/mlruns/0/7bf1d8758b4d46f7b8497a9243c057f7/metrics/auroc

1662121542282 0.843335224049915 0


In [66]:
!cat /content/mlruns/0/7bf1d8758b4d46f7b8497a9243c057f7/metrics/f1

1662121542283 0.7746727489160805 0


In [67]:
!cat /content/mlruns/0/7bf1d8758b4d46f7b8497a9243c057f7/metrics/accuracy

1662121542281 0.8305821025195482 0


### XGBoost
For the XGBoost classifier, we vary `n_estimators` (as in RandomForest), `learning_rate` (shrinks the contribution of each new tree, which helps prevent overfitting), and `max_depth` (as in RandomForest).
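
For reference, the role of `learning_rate` can be written as the additive update used in gradient boosting. Here $\hat{y}_i^{(t)}$ is the raw prediction (log-odds, for binary classification) for sample $i$ after $t$ trees, $f_t$ is the tree added at round $t$, and $\eta$ is the learning rate. The notation is chosen for this note and does not come from the notebook.

$$
\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + \eta \, f_t(x_i), \qquad 0 < \eta \le 1
$$

A smaller $\eta$ shrinks each tree's contribution, so more trees (`n_estimators`) are typically needed to reach the same fit.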

#### Attempt 1

```
n_estimators = 100
learning_rate = 1e-1
max_depth = 6
```



In [68]:
n_estimators = 100
learning_rate = 1e-1
max_depth = 6

In [69]:
with mlflow.start_run():
  xgb = XGBClassifier(random_state=RANDOM_STATE, n_estimators=n_estimators, learning_rate=learning_rate, max_depth=max_depth)
  xgb.fit(X_train, y_train)

  y_pred = xgb.predict(X_val)
  y_proba = xgb.predict_proba(X_val)

  (acc, auroc, f1) = eval_metrics(y_val, y_pred, y_proba)

  mlflow.log_param("n_estimators", n_estimators)
  mlflow.log_param("lr", learning_rate)
  mlflow.log_param("max_depth", max_depth)
  mlflow.log_metric("accuracy", acc)
  mlflow.log_metric("auroc", auroc)
  mlflow.log_metric("f1", f1)

  tracking_url_type_store = urlparse(mlflow.get_tracking_uri()).scheme

  # Model registry does not work with file store
  if tracking_url_type_store != "file":
    # Register the model
    # There are other ways to use the Model Registry, which depends on the use case,
    # please refer to the doc for more information:
    # https://mlflow.org/docs/latest/model-registry.html#api-workflow
    mlflow.sklearn.log_model(xgb, "model", registered_model_name="XGB_NBA_Undrafted")
  else:
    mlflow.sklearn.log_model(xgb, "model")

In [70]:
!cat /content/mlruns/0/613bc45c18564a3ab87fa0188ebf3140/params/n_estimators

100

In [71]:
!cat /content/mlruns/0/613bc45c18564a3ab87fa0188ebf3140/params/lr

0.1

In [72]:
!cat /content/mlruns/0/613bc45c18564a3ab87fa0188ebf3140/params/max_depth

6

In [73]:
!cat /content/mlruns/0/613bc45c18564a3ab87fa0188ebf3140/metrics/auroc

1662122624530 0.849894291754757 0


In [74]:
!cat /content/mlruns/0/613bc45c18564a3ab87fa0188ebf3140/metrics/f1

1662122624531 0.8148418621868618 0


In [75]:
!cat /content/mlruns/0/613bc45c18564a3ab87fa0188ebf3140/metrics/accuracy

1662122624530 0.8488271068635969 0


#### Attempt 2

```
n_estimators = 200
learning_rate = 1e-2
max_depth = 6
```



In [76]:
n_estimators = 200
learning_rate = 1e-2
max_depth = 6

In [77]:
with mlflow.start_run():
  xgb = XGBClassifier(random_state=RANDOM_STATE, n_estimators=n_estimators, learning_rate=learning_rate, max_depth=max_depth)
  xgb.fit(X_train, y_train)

  y_pred = xgb.predict(X_val)
  y_proba = xgb.predict_proba(X_val)

  (acc, auroc, f1) = eval_metrics(y_val, y_pred, y_proba)

  mlflow.log_param("n_estimators", n_estimators)
  mlflow.log_param("lr", learning_rate)
  mlflow.log_param("max_depth", max_depth)
  mlflow.log_metric("accuracy", acc)
  mlflow.log_metric("auroc", auroc)
  mlflow.log_metric("f1", f1)

  tracking_url_type_store = urlparse(mlflow.get_tracking_uri()).scheme

  # Model registry does not work with file store
  if tracking_url_type_store != "file":
    # Register the model
    # There are other ways to use the Model Registry, which depends on the use case,
    # please refer to the doc for more information:
    # https://mlflow.org/docs/latest/model-registry.html#api-workflow
    mlflow.sklearn.log_model(xgb, "model", registered_model_name="XGB_NBA_Undrafted")
  else:
    mlflow.sklearn.log_model(xgb, "model")

In [78]:
!cat /content/mlruns/0/77d126395a034941bffff89e73f026d4/params/n_estimators

200

In [79]:
!cat /content/mlruns/0/77d126395a034941bffff89e73f026d4/params/lr

0.01

In [80]:
!cat /content/mlruns/0/77d126395a034941bffff89e73f026d4/params/max_depth

6

In [81]:
!cat /content/mlruns/0/77d126395a034941bffff89e73f026d4/metrics/auroc

1662122750497 0.7895787139689578 0


In [82]:
!cat /content/mlruns/0/77d126395a034941bffff89e73f026d4/metrics/f1

1662122750498 0.7717092944984212 0


In [83]:
!cat /content/mlruns/0/77d126395a034941bffff89e73f026d4/metrics/accuracy

1662122750496 0.8297132927888793 0


#### Attempt 3

```
n_estimators = 200
learning_rate = 1e-2
max_depth = 15
```



In [84]:
n_estimators = 200
learning_rate = 1e-2
max_depth = 15

In [85]:
with mlflow.start_run():
  xgb = XGBClassifier(random_state=RANDOM_STATE, n_estimators=n_estimators, learning_rate=learning_rate, max_depth=max_depth)
  xgb.fit(X_train, y_train)

  y_pred = xgb.predict(X_val)
  y_proba = xgb.predict_proba(X_val)

  (acc, auroc, f1) = eval_metrics(y_val, y_pred, y_proba)

  mlflow.log_param("n_estimators", n_estimators)
  mlflow.log_param("lr", learning_rate)
  mlflow.log_param("max_depth", max_depth)
  mlflow.log_metric("accuracy", acc)
  mlflow.log_metric("auroc", auroc)
  mlflow.log_metric("f1", f1)

  tracking_url_type_store = urlparse(mlflow.get_tracking_uri()).scheme

  # Model registry does not work with file store
  if tracking_url_type_store != "file":
    # Register the model
    # There are other ways to use the Model Registry, which depends on the use case,
    # please refer to the doc for more information:
    # https://mlflow.org/docs/latest/model-registry.html#api-workflow
    mlflow.sklearn.log_model(xgb, "model", registered_model_name="XGB_NBA_Undrafted")
  else:
    mlflow.sklearn.log_model(xgb, "model")

In [87]:
!cat /content/mlruns/0/5dcb815a2ad94bdfbd0ce1488408cd19/params/n_estimators

200

In [88]:
!cat /content/mlruns/0/5dcb815a2ad94bdfbd0ce1488408cd19/params/lr

0.01

In [86]:
!cat /content/mlruns/0/5dcb815a2ad94bdfbd0ce1488408cd19/params/max_depth

15

In [89]:
!cat /content/mlruns/0/5dcb815a2ad94bdfbd0ce1488408cd19/metrics/auroc

1662123052672 0.8274377352653018 0


In [90]:
!cat /content/mlruns/0/5dcb815a2ad94bdfbd0ce1488408cd19/metrics/f1

1662123052673 0.799668596929013 0


In [91]:
!cat /content/mlruns/0/5dcb815a2ad94bdfbd0ce1488408cd19/metrics/accuracy

1662123052671 0.8340573414422241 0
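
To compare the twelve runs side by side, the validation metrics reported above can be collected into a single ranking. The sketch below uses plain Python with the values copied from the logged outputs and rounded to four decimals; the `RUNS` list is assembled by hand, not read from MLflow.

```python
# (model, attempt, accuracy, auroc, weighted F1), copied from the runs above
# and rounded to four decimals.
RUNS = [
    ("MLP", 1, 0.8202, 0.7446, 0.7542),
    ("MLP", 2, 0.8219, 0.7429, 0.7432),
    ("MLP", 3, 0.8219, 0.4895, 0.7415),
    ("LogisticRegression", 1, 0.8219, 0.7512, 0.7538),
    ("LogisticRegression", 2, 0.8219, 0.6632, 0.7415),
    ("LogisticRegression", 3, 0.8219, 0.7096, 0.7415),
    ("RandomForest", 1, 0.8341, 0.8366, 0.7827),
    ("RandomForest", 2, 0.8367, 0.8432, 0.7866),
    ("RandomForest", 3, 0.8306, 0.8433, 0.7747),
    ("XGBoost", 1, 0.8488, 0.8499, 0.8148),
    ("XGBoost", 2, 0.8297, 0.7896, 0.7717),
    ("XGBoost", 3, 0.8341, 0.8274, 0.7997),
]

# Rank by the primary metric (weighted F1), best first.
ranked = sorted(RUNS, key=lambda run: run[4], reverse=True)

print(f"{'model':<20}{'attempt':>8}{'accuracy':>10}{'auroc':>8}{'f1':>8}")
for model, attempt, acc, auroc, f1 in ranked:
    print(f"{model:<20}{attempt:>8}{acc:>10.4f}{auroc:>8.4f}{f1:>8.4f}")

best = ranked[0]
```

On these numbers, XGBoost attempt 1 ranks first on all three metrics.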
