### _What is the ROC curve?_

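The ROC (receiver operating characteristic) curve summarizes the prediction performance of a classification model across all classification thresholds. Specifically, it plots the True Positive Rate (TPR) on the y-axis against the False Positive Rate (FPR) on the x-axis, with one point per threshold. Here TP, FP, TN and FN are the counts of true positives, false positives, true negatives and false negatives at a given threshold.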

$\text{TPR (Sensitivity)} = \frac{TP}{TP + FN}$

$\text{FPR (1 - Specificity)} = \frac{FP}{TN + FP}$
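
As a minimal sketch of the two formulas above, the snippet below counts TP, FN, FP and TN at a single threshold and derives TPR and FPR from them. The helper `tpr_fpr` and the toy labels and scores are hypothetical, made up for illustration; they are not part of the dataset built in this notebook.

```python
import numpy as np


def tpr_fpr(y_true, y_score, threshold):
    """Return (TPR, FPR) when scores >= threshold are predicted positive."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_score) >= threshold
    tp = np.sum(y_pred & (y_true == 1))   # positives predicted positive
    fn = np.sum(~y_pred & (y_true == 1))  # positives predicted negative
    fp = np.sum(y_pred & (y_true == 0))   # negatives predicted positive
    tn = np.sum(~y_pred & (y_true == 0))  # negatives predicted negative
    return tp / (tp + fn), fp / (tn + fp)


# Toy labels and scores (illustrative assumption, not the notebook's data)
toy_y = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]

toy_tpr, toy_fpr = tpr_fpr(toy_y, toy_scores, threshold=0.5)
print(f'TPR = {toy_tpr}, FPR = {toy_fpr}')
```

Repeating this calculation for every candidate threshold and plotting the resulting (FPR, TPR) pairs traces out the ROC curve.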

### Generate synthetic dataset

In [1]:
from sklearn.datasets import make_classification
import numpy as np

In [2]:
X, y = make_classification(n_samples=2000, n_features=10, n_classes=2, random_state=0)

### Add noisy features to make the problem harder

In [3]:
random_state = np.random.RandomState(0)
n_samples, n_features = X.shape
# Append 200 * n_features columns of Gaussian noise to the original features
X = np.c_[X, random_state.randn(n_samples, 200 * n_features)]

### Data Splitting

In [None]:
from sklearn.model_selection import train_test_split

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.368, random_state=0)

### Build classification model: _Random Forest_ and _Gaussian Naive Bayes_

In [None]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB

### Random Forest

In [None]:
rf = RandomForestClassifier(max_features=5, n_estimators=500)
rf.fit(X_train, y_train)

### Naive Bayes

In [None]:
clf = GaussianNB()
clf.fit(X_train, y_train)

### Prediction probabilities


In [None]:
rf_probs = rf.predict_proba(X_test)
clf_probs = clf.predict_proba(X_test)

In [None]:
# Keep only the predicted probability of the positive class (column 1)
rf_probs = rf_probs[:, 1]
clf_probs = clf_probs[:, 1]
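
The column slicing above relies on `predict_proba` returning one column per class, ordered as in the fitted model's `classes_` attribute. The sketch below illustrates this on a tiny made-up dataset; `toy_X`, `toy_y` and `toy_model` are hypothetical names used only here.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Tiny illustrative dataset (an assumption, not the notebook's data)
toy_X = np.array([[0.0], [0.2], [0.9], [1.1]])
toy_y = np.array([0, 0, 1, 1])
toy_model = GaussianNB().fit(toy_X, toy_y)

# One row per sample, one column per class, in the order of classes_
toy_proba = toy_model.predict_proba(toy_X)

# Look up the column of the positive class instead of hard-coding it
pos_col = list(toy_model.classes_).index(1)
toy_pos_probs = toy_proba[:, pos_col]
print(toy_model.classes_, toy_proba.shape)
```

For binary labels 0 and 1 the positive class ends up in column 1, which is why indexing with `[:, 1]` works in this notebook.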

### AUC ROC
#### ROC is the receiver operating characteristic; AUC ROC is the area under the ROC curve

In [None]:
from sklearn.metrics import roc_auc_score, roc_curve

In [None]:
rf_auc = roc_auc_score(y_test, rf_probs)
clf_auc = roc_auc_score(y_test, clf_probs)

In [None]:
print(f'Random Forest: AUC ROC = {rf_auc:.3f}')
print(f'Naive Bayes: AUC ROC = {clf_auc:.3f}')
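
To make the AUC values easier to interpret, the sketch below computes the AUC of a small made-up example in three ways: with `roc_auc_score`, as the trapezoidal area under the points returned by `roc_curve`, and as the fraction of (positive, negative) pairs in which the positive sample receives the higher score. The toy labels and scores are assumptions for illustration only.

```python
import numpy as np
from sklearn.metrics import auc, roc_auc_score, roc_curve

# Toy labels and scores (illustrative assumption, not the notebook's data)
toy_y = np.array([0, 0, 1, 1])
toy_scores = np.array([0.1, 0.4, 0.35, 0.8])

# 1) Direct computation
direct_auc = roc_auc_score(toy_y, toy_scores)

# 2) Trapezoidal area under the (FPR, TPR) points
toy_fpr, toy_tpr, _ = roc_curve(toy_y, toy_scores)
area_auc = auc(toy_fpr, toy_tpr)

# 3) Fraction of positive/negative pairs ranked correctly (ties count half)
pos = toy_scores[toy_y == 1]
neg = toy_scores[toy_y == 0]
diff = pos[:, None] - neg[None, :]
pairwise_auc = np.mean((diff > 0) + 0.5 * (diff == 0))

print(direct_auc, area_auc, pairwise_auc)
```

All three agree on this example, so an AUC of 0.5 corresponds to random ranking and an AUC of 1.0 to a perfect ranking of positives above negatives.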

### Calculate ROC Curve

In [None]:
rf_fpr, rf_tpr, _ = roc_curve(y_test, rf_probs)
clf_fpr, clf_tpr, _ = roc_curve(y_test, clf_probs)
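
The third value returned by `roc_curve`, discarded as `_` above, holds the decision threshold behind each (FPR, TPR) point. As a hedged sketch on made-up data, the snippet below keeps those thresholds and picks the one that maximizes TPR - FPR (Youden's J statistic). This is one common heuristic that the notebook itself does not use, and the toy labels and scores are assumptions.

```python
import numpy as np
from sklearn.metrics import roc_curve

# Toy labels and scores (illustrative assumption, not the notebook's data)
toy_y = np.array([0, 0, 1, 1])
toy_scores = np.array([0.1, 0.4, 0.35, 0.8])

# Keep the thresholds instead of discarding them
toy_fpr, toy_tpr, toy_thresholds = roc_curve(toy_y, toy_scores)

# Youden's J statistic: vertical distance above the chance diagonal
j_scores = toy_tpr - toy_fpr
best_idx = int(np.argmax(j_scores))  # first maximum if there are ties
best_threshold = toy_thresholds[best_idx]

print(f'Best threshold = {best_threshold}, '
      f'TPR = {toy_tpr[best_idx]}, FPR = {toy_fpr[best_idx]}')
```

The same pattern would apply to `rf_probs` or `clf_probs`, although the appropriate threshold depends on the relative cost of false positives and false negatives in the application.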

### Plot the ROC curves

In [None]:
import matplotlib.pyplot as plt

In [None]:
plt.figure(dpi=150)
plt.plot(rf_fpr, rf_tpr, marker='.', label=f'Random Forest (AUC ROC = {rf_auc:.3f})')
plt.plot(clf_fpr, clf_tpr, marker='.', label=f'Naive Bayes (AUC ROC = {clf_auc:.3f})')

# Title
plt.title('ROC Plot')
# Axis labels
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
# Show legend
plt.legend()
# Show plot
plt.show()