# Model Training

In this notebook, we will ask you a series of questions regarding model selection. Based on your responses, we will ask you to create the ML models that you've chosen. 

The bonus step is completely optional, but if you provide a sufficient third machine learning model in this project, we will add `1000` points to your Kahoot leaderboard score.

**Note**: Use the dataset that you've created in your previous data transformation step (not the original dataset).

## Questions
Is this a classification or regression task?  

This is a classification task because the goal is to predict whether a transaction is fraudulent.

Are you predicting for multiple classes or binary classes?  

There are only two possible outcomes (isFraud = 1 or isFraud = 0), so this is a binary classification problem.

Given these observations, which 2 (or possibly 3) machine learning models will you choose?  

Since this is a binary classification task, I would use **Logistic Regression** and **Random Forest**.

## First Model - Logistic Regression

### 1) Create a train-test split

In [None]:
from sklearn.model_selection import train_test_split

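# Separate the features from the target column
X = resampled_df.drop("isFraud", axis=1)
y = resampled_df["isFraud"]

# Hold out 20% of the rows for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)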
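As a side illustration of the split step, the sketch below shows how the optional `stratify` argument of `train_test_split` keeps the class ratio the same in both partitions. The `demo_df` frame and its `feature_a`/`feature_b` columns are hypothetical stand-ins for `resampled_df`, which is not reproduced here.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for resampled_df: 1,000 rows, 10% labelled as fraud.
rng = np.random.default_rng(42)
demo_df = pd.DataFrame({
    "feature_a": rng.normal(size=1000),
    "feature_b": rng.normal(size=1000),
    "isFraud": np.r_[np.ones(100, dtype=int), np.zeros(900, dtype=int)],
})

X_demo = demo_df.drop("isFraud", axis=1)
y_demo = demo_df["isFraud"]

# stratify=y_demo asks for the same class proportions in train and test.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

train_rate = y_tr.mean()
test_rate = y_te.mean()
print(f"fraud rate: train={train_rate:.3f}, test={test_rate:.3f}")
```

Whether stratification matters for the real data depends on how balanced `resampled_df` already is after the transformation step.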


### 2) Search for best hyperparameters
Use tools like GridSearchCV, RandomizedSearchCV, or model-specific tuning functions to find the best hyperparameters for your first model.

In [None]:
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

param_grid = {
    "C": [0.01, 0.1, 1, 10],
    "penalty": ["l2"],
    "solver": ["lbfgs", "liblinear"]
}

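# No scoring argument is passed, so candidates are ranked by accuracy (the default for classifiers)
grid_lr = GridSearchCV(LogisticRegression(max_iter=1000), param_grid, cv=5)
grid_lr.fit(X_train, y_train)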
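Once a grid search has been fitted, the winning combination and the per-candidate scores can be inspected. The following is a minimal sketch on synthetic data from `make_classification` (an assumption, since the notebook's own data is not available here); the names `X_demo`, `y_demo`, and `demo_grid` are illustrative. The grid omits `penalty` because L2 is already the `LogisticRegression` default.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic binary classification data standing in for X_train / y_train.
X_demo, y_demo = make_classification(n_samples=400, n_features=8, random_state=42)

demo_param_grid = {
    "C": [0.01, 0.1, 1, 10],
    "solver": ["lbfgs", "liblinear"],
}

demo_grid = GridSearchCV(LogisticRegression(max_iter=1000), demo_param_grid, cv=5)
demo_grid.fit(X_demo, y_demo)

# Best hyperparameter combination and its mean cross-validated score.
print("best params:", demo_grid.best_params_)
print("best mean CV score:", round(demo_grid.best_score_, 3))

# cv_results_ holds one row per candidate; sort by rank to compare them.
results = pd.DataFrame(demo_grid.cv_results_)
cols = ["param_C", "param_solver", "mean_test_score", "std_test_score", "rank_test_score"]
print(results[cols].sort_values("rank_test_score").to_string(index=False))
```

The same attributes (`best_params_`, `best_score_`, `cv_results_`) should be available on `grid_lr` after it has been fitted.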


### 3) Train your model
Select the model with the best hyperparameters and generate predictions on your test set. Evaluate your model's accuracy, precision, and recall (also called sensitivity).

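In [None]:
from sklearn.metrics import classification_report, confusion_matrix

# Use the best estimator found by the grid search to predict on the held-out test set
best_lr = grid_lr.best_estimator_
y_pred_lr = best_lr.predict(X_test)

# classification_report returns a string, so print it for readable output
print(classification_report(y_test, y_pred_lr))
print(confusion_matrix(y_test, y_pred_lr))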
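To make the metrics concrete, here is a small worked example on hand-made labels (the `y_true` and `y_pred` lists are invented for illustration, not taken from the notebook's results). It derives accuracy, precision, recall (sensitivity), and specificity from the four confusion-matrix counts.

```python
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score

# Toy labels: 6 legitimate transactions (0) and 4 fraudulent ones (1).
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 1, 0, 1, 1, 0, 1]

# For binary labels, ravel() yields the counts in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = accuracy_score(y_true, y_pred)    # (tp + tn) / total
precision = precision_score(y_true, y_pred)  # tp / (tp + fp)
recall = recall_score(y_true, y_pred)        # tp / (tp + fn), i.e. sensitivity
specificity = tn / (tn + fp)                 # true negative rate

print(f"tn={tn} fp={fp} fn={fn} tp={tp}")
print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} specificity={specificity:.3f}")
```

Here the counts are tn=5, fp=1, fn=1, tp=3, giving accuracy 0.8, precision 0.75, recall 0.75, and specificity of about 0.833.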



## Second Model - Random Forest

Create a second machine learning object and rerun steps (2) & (3) on this model. Compare accuracy metrics between these two models. Which handles the class imbalance more effectively?

Create as many code-blocks as needed.

In [None]:
from sklearn.ensemble import RandomForestClassifier

param_grid_rf = {
    "n_estimators": [100, 200],
    "max_depth": [None, 10, 20],
    "min_samples_split": [2, 5],
    "min_samples_leaf": [1, 2]
}

# Rank candidates by recall, using 3-fold cross-validation
grid_rf = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid_rf,
    cv=3,
    scoring="recall",
)
grid_rf.fit(X_train, y_train)


In [None]:
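# Predict on the test set with the best random forest found by the grid search
best_rf = grid_rf.best_estimator_
y_pred_rf = best_rf.predict(X_test)

# classification_report returns a string, so print it for readable output
print(classification_report(y_test, y_pred_rf))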
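One way to compare the two models is to collect the same metrics for each into a single table. The sketch below does this on synthetic, imbalanced data from `make_classification` with untuned models, so the numbers it prints say nothing about the actual fraud dataset; the names `models`, `rows`, and `comparison` are illustrative.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic data with roughly 10% positives, standing in for the fraud dataset.
X_demo, y_demo = make_classification(
    n_samples=1000, n_features=10, weights=[0.9, 0.1], random_state=42
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

# Fit each model and record the same metrics on the same test set.
rows = []
for name, model in models.items():
    model.fit(X_tr, y_tr)
    preds = model.predict(X_te)
    rows.append({
        "model": name,
        "accuracy": accuracy_score(y_te, preds),
        "precision": precision_score(y_te, preds, zero_division=0),
        "recall": recall_score(y_te, preds, zero_division=0),
        "f1": f1_score(y_te, preds, zero_division=0),
    })

comparison = pd.DataFrame(rows).set_index("model")
print(comparison.round(3))
```

In the notebook itself, the same table could be built from `y_pred_lr` and `y_pred_rf` against `y_test`, which keeps the comparison on identical test rows.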

## (Bonus/Optional) Third Model

Create a third machine learning model and rerun steps (2) & (3) on this model. Which model has the best predictive capabilities? 

Create as many code-blocks as needed.
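A possible shape for a third model is sketched below, assuming gradient boosting as the candidate and `RandomizedSearchCV` for tuning; neither choice comes from the notebook, and the parameter ranges are arbitrary illustrations. It runs on synthetic data so that it is self-contained; in the notebook, `X_train`, `X_test`, `y_train`, and `y_test` would take the place of the demo variables.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Synthetic imbalanced data standing in for the notebook's train/test split.
X_demo, y_demo = make_classification(
    n_samples=600, n_features=10, weights=[0.9, 0.1], random_state=42
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# Illustrative search space; these values are assumptions, not tuned choices.
param_dist = {
    "n_estimators": [50, 100],
    "learning_rate": [0.05, 0.1, 0.2],
    "max_depth": [2, 3],
}

# Step (2): sample 4 candidates at random and rank them by recall.
search_gb = RandomizedSearchCV(
    GradientBoostingClassifier(random_state=42),
    param_distributions=param_dist,
    n_iter=4,
    cv=3,
    scoring="recall",
    random_state=42,
)
search_gb.fit(X_tr, y_tr)

# Step (3): predict with the best candidate and report the metrics.
best_gb = search_gb.best_estimator_
y_pred_gb = best_gb.predict(X_te)

print("best params:", search_gb.best_params_)
print(classification_report(y_te, y_pred_gb, zero_division=0))
```

Scoring by recall mirrors the random forest search above, which keeps the three models' tuning criteria easier to compare.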