# Pre-Project Assignment

### Goals

- Create at least one function that automates fitting a model and reporting its metrics
- Understand the model object and how to retrieve information from it
- (Optional) Scripting!

### Imports & Sandbox Data

You can definitely add imports as you go along!

For testing your function, we'll use the [Banknote Authentication Dataset](https://archive.ics.uci.edu/ml/datasets/banknote+authentication). In this binary classification dataset, 0 is an **Authentic** banknote and 1 is a **Forged** banknote. The four features are Variance, Skewness, Kurtosis and Entropy, in that column order.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import pickle  # serializes Python objects, e.g. to save a fitted model (optional scripting section)

from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

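from sklearn.metrics import ConfusionMatrixDisplay, classification_report

# plot_confusion_matrix was removed in scikit-learn 1.2; this alias keeps the old name working
plot_confusion_matrix = ConfusionMatrixDisplay.from_estimator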

In [None]:
df = pd.read_csv('data_banknote_authentication.csv')
df.columns = ["Variance","Skewness", "Kurtosis", "Entropy", "Class"]

X = df.drop('Class', axis=1)
y = df['Class']
# random_state fixes the shuffle so the split (and your metrics) are reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42)

In [None]:
# feel free to inspect your data here!



## Fitting and Evaluating a Model

Let's start with a Decision Tree. We're going to fit a Decision Tree with default parameters and, for **both** the train set and the test set, output:
- Confusion Matrix: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.ConfusionMatrixDisplay.html (this replaces `plot_confusion_matrix`, which was removed in scikit-learn 1.2)
- Classification Report: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.classification_report.html




In [None]:
# instantiation
dtree = DecisionTreeClassifier()

# fitting
dtree.fit(X_train, y_train)

# predictions
y_hat_train = dtree.predict(X_train)
y_hat_test = dtree.predict(X_test)

In [None]:
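# label each report so train and test are easy to tell apart in the output
print('Train Classification Report')
print(classification_report(y_train, y_hat_train))
print('Test Classification Report')
print(classification_report(y_test, y_hat_test))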
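
Since one goal here is getting metrics out of a model, it may help to know that `classification_report` can also return a dictionary instead of a printed string. The sketch below is only an illustration: it uses synthetic data from `make_classification` as a stand-in for the banknote data, and the names `X_demo`, `y_demo` and `demo_tree` are made up for this example.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (an assumption; the notebook itself uses the banknote dataset)
X_demo, y_demo = make_classification(n_samples=200, n_features=4, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.20, random_state=0)

demo_tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)

# output_dict=True returns nested dictionaries instead of a formatted string
report = classification_report(y_te, demo_tree.predict(X_te), output_dict=True)

# Class labels become string keys; overall accuracy has its own key
test_accuracy = report["accuracy"]
recall_class_1 = report["1"]["recall"]
print(f"accuracy={test_accuracy:.3f}, recall(class 1)={recall_class_1:.3f}")
```

Pulling numbers out this way is one option if you want your function to return or compare metrics rather than only print them.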

In [None]:
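# ConfusionMatrixDisplay replaces plot_confusion_matrix (removed in scikit-learn 1.2)
from sklearn.metrics import ConfusionMatrixDisplay

fig, (ax0, ax1) = plt.subplots(1, 2, figsize=(18, 6))

ConfusionMatrixDisplay.from_estimator(dtree, X_train, y_train, ax=ax0)
ConfusionMatrixDisplay.from_estimator(dtree, X_test, y_test, ax=ax1)

ax0.set_title('Train Confusion Matrix')
ax1.set_title('Test Confusion Matrix')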

## Replicate with a Random Forest

Repeat the same steps with a random forest. Start to get a sense of how your function will look.

## Function Time!!

Now create a function with your own inputs that replicates this process for any instantiated model. Change up the arguments as you see fit, and make sure the function returns the model object if the model is fit inside it.

You can even play around with what information your function gets for you! Perhaps the amount of time it takes for your model to train?
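
If you want to try the timing idea, one possible approach is to wrap the call to `.fit` with `time.perf_counter()`. This is a minimal sketch, not the expected solution: `timed_fit` and `MajorityClassModel` are hypothetical names, and the stand-in class exists only so the snippet runs without any data. Any scikit-learn estimator could be passed instead, because the helper relies only on a `.fit(X, y)` method.

```python
import time


def timed_fit(model, X, y):
    """Fit `model` and return (fitted_model, seconds_elapsed)."""
    start = time.perf_counter()
    model.fit(X, y)
    elapsed = time.perf_counter() - start
    return model, elapsed


class MajorityClassModel:
    """Hypothetical stand-in estimator: remembers the most common label."""

    def fit(self, X, y):
        labels = list(y)
        self.prediction_ = max(set(labels), key=labels.count)
        return self


fitted, seconds = timed_fit(MajorityClassModel(), [[0], [1], [2]], [0, 1, 1])
print(f"training took {seconds:.6f} s")
```

`time.perf_counter()` is used rather than `time.time()` because it is intended for measuring short durations.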

In [None]:
def run_model(model, X_train, y_train, X_test, y_test):
    # your code here: fit the model, then print or plot the metrics you want

    return model # return the model object!!!

In [None]:
# import and instantiate a different sklearn model to test your function!
# after instantiating a model, the run_model function should run and output whatever you want it to output

test_model = None

run_model(test_model, X_train, y_train, X_test, y_test)

### Why `return` the model?

In short, it allows us to access the model's attributes, such as feature importances, via a variable.

In [None]:
dtree2 = DecisionTreeClassifier(max_depth=10)
dtree2.fit(X_train, y_train)

In [None]:
dtree2.feature_importances_ # for example, feature importances is an attribute of a fitted model

In [None]:
fi = sorted(zip(dtree2.feature_importances_, X_train.columns))  # ascending by importance
fi = pd.DataFrame(fi, columns=['impt', 'name'])
fi

In [None]:
plt.barh(fi.name, fi.impt)
plt.title('Feature Importances');

If you were to do:

```python
variable = run_model(tree, X_train, y_train, X_test, y_test)
```

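You'd be able to access feature importances through `variable.feature_importances_`, as long as you return the model object at the end of your function and the model exposes that attribute (tree-based models such as Decision Trees and Random Forests do; some other estimators do not). **Don't forget to return the model object!!!**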
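
To see why the `return` matters, here is a small sketch of what happens without it. A Python function with no `return` statement gives back `None`, so the fitted model is lost to the caller. `TinyModel`, `run_without_return` and `run_with_return` are hypothetical names used only for illustration.

```python
class TinyModel:
    """Hypothetical stand-in for an estimator; stores one fitted attribute."""

    def fit(self, X, y):
        self.n_samples_ = len(X)
        return self


def run_without_return(model, X, y):
    model.fit(X, y)  # no return statement, so the caller receives None


def run_with_return(model, X, y):
    model.fit(X, y)
    return model


lost = run_without_return(TinyModel(), [[0], [1]], [0, 1])
kept = run_with_return(TinyModel(), [[0], [1]], [0, 1])

print(lost)             # None
print(kept.n_samples_)  # 2
```

With `lost`, any attribute access such as `lost.n_samples_` would raise an `AttributeError`, which is the error you would hit when asking for `feature_importances_` on a function result that was never returned.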

## Scripting

We have talked briefly about scripting in the past! To try it out:

1. Create a Python file ending in `.py` (for example, `functions.py`) in the same folder as your notebook
2. Copy your import statements into the Python file, as well as your function
3. To test if your script works, run `from functions import *` in a notebook cell. You should then be able to use your function in any notebook in that folder!
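
As a rough idea of what such a file might contain, here is a sketch of a `functions.py`. The helper `rank_importances` is a hypothetical example, not part of the assignment; your own file would hold your imports and your `run_model` function instead.

```python
"""functions.py: helper functions shared across notebooks (hypothetical contents)."""


def rank_importances(importances, names):
    """Pair each importance with its feature name, largest importance first."""
    pairs = zip(names, importances)
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # This part runs only with `python functions.py`, not when the file is imported
    print(rank_importances([0.1, 0.6, 0.3], ["Entropy", "Variance", "Skewness"]))
```

In a notebook you could then write `from functions import rank_importances`, which makes it clearer where a name came from than `import *`. If you edit the `.py` file afterwards, restart the kernel (or use `importlib.reload` on the module) so the notebook picks up the changes.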