# Training Toy Models for Testing

The following code snippets train toy models on a tabular dataset. As long as the training and test datasets are defined generically as Pandas dataframes, any of these models can be trained and used interchangeably with the developer framework.

## Assumptions

We assume that a tabular dataset has been preprocessed and that the following dataframes are available:

```python
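from sklearn.model_selection import train_test_split

# `df` is the preprocessed dataset; hold out 20% of it for testing
train_df, test_df = train_test_split(df, test_size=0.20)

# 25% of the remaining 80% is 20% of the original data,
# which gives a 60/20/20 train/validation/test split
train_ds, val_ds = train_test_split(train_df, test_size=0.25)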

# For training
x_train = train_ds.drop("Exited", axis=1)
y_train = train_ds.loc[:, "Exited"].astype(int)
x_val = val_ds.drop("Exited", axis=1)
y_val = val_ds.loc[:, "Exited"].astype(int)

# For testing
x_test = test_df.drop("Exited", axis=1)
y_test = test_df.loc[:, "Exited"].astype(int)
```
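
The split arithmetic can be checked on a synthetic frame: holding out 20% leaves 80%, and 25% of that remainder is 20% of the original. The sketch below is only an illustration. The feature columns `f0` to `f2`, the row count, and the `random_state` values are assumptions; only the `Exited` target name comes from the snippet above.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the preprocessed dataset: three numeric
# features plus the binary "Exited" target used in this document.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(1000, 3)), columns=["f0", "f1", "f2"])
df["Exited"] = rng.integers(0, 2, size=len(df))

# Hold out 20% for testing, leaving 80%.
train_df, test_df = train_test_split(df, test_size=0.20, random_state=0)
# 25% of the remaining 80% is 20% of the original data.
train_ds, val_ds = train_test_split(train_df, test_size=0.25, random_state=0)

split_sizes = {"train": len(train_ds), "val": len(val_ds), "test": len(test_df)}
print(split_sizes)  # {'train': 600, 'val': 200, 'test': 200}
```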

## XGBoost Classifier

```python
import xgboost as xgb

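# Early stopping uses the last metric in `eval_metric` ("auc"),
# evaluated on the last entry of `eval_set` (the validation set)
model = xgb.XGBClassifier(
    early_stopping_rounds=10,
    eval_metric=["error", "logloss", "auc"],
)
model.fit(
    x_train,
    y_train,
    eval_set=[(x_train, y_train), (x_val, y_val)],
    verbose=False,
)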
```

## SKLearn Logistic Regression

```python
from sklearn.linear_model import LogisticRegression

model = LogisticRegression()
model.fit(x_train, y_train)
```

## SKLearn Random Forest

```python
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier()
model.fit(x_train, y_train)
```
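
Because both SKLearn estimators share the same `fit`/`predict` interface, they can be trained and scored in one loop. This is a minimal sketch under stated assumptions: the data comes from `make_classification` rather than the real dataset, and `max_iter=1000`, `random_state=0`, and the use of accuracy as the metric are illustrative choices that do not appear in the snippets above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic binary classification data (assumption, not the real dataset).
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=0
)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(random_state=0),
}

# Fit each model on the training data and score it on the held-out test set.
scores = {}
for name, model in models.items():
    model.fit(x_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(x_test))

print(scores)
```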

## statsmodels Logit (Logistic Regression)

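statsmodels uses a different interface from SKLearn: the trained model is the object returned by `model.fit()`, not the `model` instance itself. Note also that `sm.Logit` takes the target first and the features second, and that it does not add an intercept term automatically: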

```python
import statsmodels.api as sm

model = sm.Logit(y_train, x_train)
result = model.fit()
```

The `result` object should be passed to the `vm.init_model()` function:

```python
vm_model = vm.init_model(result)
```

## PyTorch Neural Network

There are multiple ways of training a model with PyTorch. We will use the `torch.nn` module to define a simple neural network:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

class Net(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.fc2 = nn.Linear(hidden_size, hidden_size)
        self.fc3 = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = torch.sigmoid(self.fc3(x))

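        return x

# input_size must match the number of columns in x_train (11 here)
net = Net(11, 5, 1)

criterion = nn.BCELoss()
optimizer = optim.Adam(net.parameters(), lr=0.001)

# Convert the dataframes to tensors once, outside the training loop
x_train_tensor = torch.tensor(x_train.values, dtype=torch.float32)
# BCELoss expects targets with the same shape as the outputs: (n_samples, 1)
y_train_tensor = torch.tensor(y_train.values, dtype=torch.float32).unsqueeze(1)

epochs = 10000
for epoch in range(epochs):
    optimizer.zero_grad()
    outputs = net(x_train_tensor)
    loss = criterion(outputs, y_train_tensor)
    loss.backward()
    optimizer.step()
    if epoch % 1000 == 0:
        print(f"Epoch: {epoch}, Loss: {loss.item()}")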
```

When calling `vm.init_model()`, we need to pass the instance of the `nn.Module`, i.e. the `net` object in our example:

```python
vm_model = vm.init_model(net)
```
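
Since the network's final layer is a sigmoid, its raw outputs are probabilities rather than class labels. A minimal sketch of turning them into labels follows. The `nn.Sequential` model is an untrained, hypothetical stand-in with the same layer sizes as `Net(11, 5, 1)`, the random input batch replaces real features, and the `0.5` threshold is an assumption.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical, untrained stand-in for `net` with the same layer sizes.
net = nn.Sequential(
    nn.Linear(11, 5), nn.ReLU(),
    nn.Linear(5, 5), nn.ReLU(),
    nn.Linear(5, 1), nn.Sigmoid(),
)

# Random batch of 4 rows with 11 features each (assumption).
x_batch = torch.randn(4, 11)

net.eval()  # switch to inference mode
with torch.no_grad():  # no gradients are needed for prediction
    probabilities = net(x_batch).squeeze(1)  # shape (4,)

# Threshold the probabilities to obtain 0/1 class labels.
labels = (probabilities >= 0.5).long()
```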