# `AA Workshop 7` - Coding Challenge

Complete the tasks below to practice the classification modeling techniques from `W7_Classification_Basic.ipynb`.

Guidelines:
- Work in order. Run each cell after editing with Shift+Enter.
- Keep answers short; focus on making things work.
- If a step fails, read the error and fix it.

By the end you will have exercised:
- implementing an SVM classifier
- understanding evaluation metrics such as precision-recall curves

## Task 1 - Detecting Forged Banknotes

To practice implementing a support vector machine (SVM) classification model, we will use a well-known dataset of features extracted from images of genuine and forged banknote-like specimens. You can find the raw data as `BankNote_Authentication.csv` in the `data` directory. Familiarize yourself with the data [via this link](https://archive.ics.uci.edu/dataset/267/banknote+authentication).
- Load and inspect the data. Create a pairplot to do so.
- Train an SVM model with `LinearSVC` on all available features.
- Evaluate model performance on held-out data using accuracy, precision and recall. A simple two-way (train/test) split suffices because we are not tuning hyperparameters.

In [None]:
# your code here




## Task 2 - Understanding Precision-Recall Curves

You have seen several metrics for evaluating classifier performance. While accuracy, precision and recall are usually quite straightforward to interpret, ROC and precision-recall curves require a bit more thinking. In the notebook, we calculated the precision-recall curve for our SVM model as follows:

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC
from sklearn.metrics import PrecisionRecallDisplay, precision_recall_curve

# prepare data
cancer_df = pd.read_csv("../data/breast_cancer.csv", index_col = "id")
X = np.array(cancer_df[['area_mean','concave points_mean']])
Y = cancer_df['diagnosis'].values
norm = StandardScaler()
X_norm = norm.fit_transform(X)

# model
model_SVM = LinearSVC(loss='hinge')
model_SVM.fit(X_norm, Y)


# generate curve
PrecisionRecallDisplay.from_estimator(model_SVM, X_norm, Y, plot_chance_level=True)
plt.show()
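
To make the plot less of a black box, the following sketch shows one way the points of a precision-recall curve can be derived: sweep a decision threshold over the classifier's scores and recompute precision and recall at each step. The labels and scores are made up, and `pr_points` is a hypothetical helper written for this illustration. It is conceptually similar to, but not a reproduction of, what `precision_recall_curve` does.

```python
def pr_points(y_true, scores):
    """Return (threshold, precision, recall) for each distinct score used as threshold.

    A sample is predicted positive when its score is >= the threshold.
    Assumes binary labels with 1 as the positive class.
    """
    points = []
    for thr in sorted(set(scores), reverse=True):
        pred = [s >= thr for s in scores]
        tp = sum(p and t == 1 for p, t in zip(pred, y_true))
        fp = sum(p and t == 0 for p, t in zip(pred, y_true))
        fn = sum((not p) and t == 1 for p, t in zip(pred, y_true))
        points.append((thr, tp / (tp + fp), tp / (tp + fn)))
    return points


# Toy data, invented for illustration: scores play the role of the signed
# distances that a linear SVM's decision_function would return.
y_true = [1, 1, 0, 1, 0, 0]
scores = [2.1, 0.8, 0.3, -0.2, -0.9, -1.5]

points = pr_points(y_true, scores)
for thr, precision, recall in points:
    print(f"threshold={thr:+.1f}  precision={precision:.2f}  recall={recall:.2f}")
```

Lowering the threshold can only keep recall the same or raise it, while precision may move in either direction.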

**Question**: Why is the chance-level precision not at 50%, as it is for ROC curves? Does it have to do with the data or the model? First, figure out the reason, then try to re-create the plot with a chance-level precision of 50%.

In [None]:
# your code here


