In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Scikit-learn imports for modeling and evaluation
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

### Load the Cleaned Dataset
First, let's load the `heart_clean.csv` file and inspect the first few rows to understand the structure and columns. This helps confirm the data is ready for modeling.

In [2]:
# Load cleaned dataset
df = pd.read_csv("heart_clean.csv")

# Show first 5 rows
df.head()

Unnamed: 0,Age,RestingBP,Cholesterol,FastingBS,MaxHR,Oldpeak,Sex_F,Sex_M,ChestPainType_ASY,ChestPainType_ATA,...,RestingECG_LVH,RestingECG_Normal,RestingECG_ST,ExerciseAngina_N,ExerciseAngina_Y,ST_Slope_Down,ST_Slope_Flat,ST_Slope_Up,HeartDisease_0,HeartDisease_1
0,40.0,140.0,289.0,0.0,172.0,0.0,0.0,1.0,0.0,1.0,...,0.0,1.0,0.0,1.0,0.0,0.0,0.0,1.0,1.0,0.0
1,49.0,160.0,180.0,0.0,156.0,1.0,1.0,0.0,0.0,0.0,...,0.0,1.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,1.0
2,37.0,130.0,283.0,0.0,98.0,0.0,0.0,1.0,0.0,1.0,...,0.0,0.0,1.0,1.0,0.0,0.0,0.0,1.0,1.0,0.0
3,48.0,138.0,214.0,0.0,108.0,1.5,1.0,0.0,1.0,0.0,...,0.0,1.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,1.0
4,54.0,150.0,195.0,0.0,122.0,0.0,0.0,1.0,0.0,0.0,...,0.0,1.0,0.0,1.0,0.0,0.0,0.0,1.0,1.0,0.0


### Define Features and Target
We'll now define our input features (X) and the target variable (y). For this analysis, we’ll use:
- `Age` and `Cholesterol` as numerical features
- `Sex_F` as a categorical feature  
Our target variable will be `HeartDisease_1`, which indicates presence of heart disease.

In [3]:
# Define input features (X) and target variable (y)
X = df[["Age", "Cholesterol", "Sex_F"]]
y = df["HeartDisease_1"]

# Preview the selected features and target
X.head(), y.head()

(    Age  Cholesterol  Sex_F
 0  40.0        289.0    0.0
 1  49.0        180.0    1.0
 2  37.0        283.0    0.0
 3  48.0        214.0    1.0
 4  54.0        195.0    0.0,
 0    0.0
 1    1.0
 2    0.0
 3    1.0
 4    0.0
 Name: HeartDisease_1, dtype: float64)

### Train-Test Split
To evaluate our models fairly, we split the dataset into a training set (80%) and a testing set (20%) using `train_test_split` from `sklearn.model_selection`. Scoring the models on data they never saw during training shows whether they learned patterns that generalize rather than memorizing the training set.
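
One optional refinement is a stratified split, which keeps the share of positive labels roughly equal in the training and testing sets. The sketch below is only an illustration: `X_demo` and `y_demo` are synthetic stand-ins with an arbitrary class ratio, not the heart disease data, and the row count of 917 simply mirrors the 733 + 184 rows of the split used in this notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): 917 rows, 3 features, arbitrary class ratio.
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(917, 3))
y_demo = (rng.random(917) < 0.55).astype(int)

# stratify=y_demo keeps the class proportions nearly identical in both splits.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

print("Shapes:", X_tr.shape, X_te.shape)
print("Positive share overall:", round(y_demo.mean(), 3))
print("Positive share in train:", round(y_tr.mean(), 3))
print("Positive share in test:", round(y_te.mean(), 3))
```

Whether stratification changes the results on the real data is not something this sketch can show; it only demonstrates the `stratify` parameter.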

In [4]:
from sklearn.model_selection import train_test_split

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Show the shapes of the resulting datasets
X_train.shape, X_test.shape, y_train.shape, y_test.shape

((733, 3), (184, 3), (733,), (184,))

### Train Logistic Regression Model

We'll start by training a **Logistic Regression** model using the training data. This model is a popular choice for binary classification tasks, such as predicting the presence or absence of heart disease.

We'll fit the model on `X_train` and `y_train`, then make predictions on the test set `X_test`.

In [5]:
from sklearn.linear_model import LogisticRegression

log_model = LogisticRegression()
log_model.fit(X_train, y_train)

# Predict on the test data
y_pred_log = log_model.predict(X_test)

# Show first 5 predictions vs actual labels
pd.DataFrame({
    'Predicted': y_pred_log[:5],
    'Actual': y_test[:5].values
})

Unnamed: 0,Predicted,Actual
0,0.0,0.0
1,1.0,1.0
2,1.0,1.0
3,1.0,1.0
4,1.0,0.0


#### Interpretation

The table above shows the **first five predictions** made by the Logistic Regression model on the test dataset. It compares the predicted labels (`Predicted`) with the actual labels (`Actual`):
- The model **correctly predicted** the outcome for 4 out of 5 samples.
- In row 4, the model predicted `1.0` (presence of heart disease), while the actual value was `0.0` (no heart disease). This is an example of a **false positive**.

Four correct predictions out of five is encouraging, but five samples are far too few to judge the model. To properly evaluate its effectiveness, we’ll analyze more detailed metrics such as **accuracy**, **precision**, **recall**, and **F1-score** in the next steps.

### Evaluate the Logistic Regression Model

To understand how well our model performs, we’ll evaluate it using the following metrics:

- **Accuracy**: Overall correctness of the model.
- **Precision**: Correct positive predictions out of total predicted positives.
- **Recall**: Correct positive predictions out of actual positives.
- **F1-score**: Harmonic mean of precision and recall.
- **Confusion Matrix**: A table that summarizes the true vs. predicted classifications.

These metrics give us a well-rounded view of how well the model detects heart disease.

In [6]:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix, classification_report

# Calculate evaluation metrics
accuracy = accuracy_score(y_test, y_pred_log)
precision = precision_score(y_test, y_pred_log)
recall = recall_score(y_test, y_pred_log)
f1 = f1_score(y_test, y_pred_log)
conf_matrix = confusion_matrix(y_test, y_pred_log)

# Print the results
print("Accuracy:", round(accuracy, 2))
print("Precision:", round(precision, 2))
print("Recall:", round(recall, 2))
print("F1 Score:", round(f1, 2))
print("\nConfusion Matrix:")
print(conf_matrix)

Accuracy: 0.7
Precision: 0.73
Recall: 0.79
F1 Score: 0.76

Confusion Matrix:
[[39 33]
 [23 89]]


#### Evaluation Results for Logistic Regression Model

The logistic regression model was evaluated using accuracy, precision, recall, and F1 score. Here’s what the results tell us:

- **Accuracy (0.70)**: The model correctly predicted 70% of the test data.
- **Precision (0.73)**: When the model predicted that a patient has heart disease, it was correct 73% of the time.
- **Recall (0.79)**: Out of all patients who actually had heart disease, the model correctly identified 79% of them.
- **F1 Score (0.76)**: This is the harmonic mean of precision and recall, providing a balanced performance measure.

#### Confusion Matrix:
[[39 33]
[23 89]]

- **True Negatives (39)**: Correctly predicted as not having heart disease.
- **False Positives (33)**: Incorrectly predicted as having heart disease.
- **False Negatives (23)**: Missed cases where heart disease was actually present.
- **True Positives (89)**: Correctly identified heart disease cases.
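
The breakdown above follows scikit-learn's layout, where rows are actual classes and columns are predicted classes. As a quick cross-check, the sketch below unpacks the reported matrix with `ravel()` and recomputes the metrics by hand. The variable names are illustrative, and specificity is an extra quantity not reported elsewhere in this notebook.

```python
import numpy as np

# Confusion matrix reported above for Logistic Regression.
# Layout: rows = actual class (0, 1), columns = predicted class (0, 1).
conf_matrix_lr = np.array([[39, 33],
                           [23, 89]])

# For a 2x2 matrix, ravel() returns the cells in the order TN, FP, FN, TP.
tn, fp, fn, tp = conf_matrix_lr.ravel()

accuracy_lr = (tp + tn) / conf_matrix_lr.sum()
precision_lr = tp / (tp + fp)
recall_lr = tp / (tp + fn)
f1_lr = 2 * precision_lr * recall_lr / (precision_lr + recall_lr)
specificity_lr = tn / (tn + fp)  # share of healthy patients correctly cleared

print("Accuracy:", round(accuracy_lr, 2))
print("Precision:", round(precision_lr, 2))
print("Recall:", round(recall_lr, 2))
print("F1 Score:", round(f1_lr, 2))
print("Specificity:", round(specificity_lr, 2))
```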

#### Conclusion:
The model identifies most heart disease cases (recall of 0.79), which makes it useful for early detection. Its moderate precision has a cost, though: 33 of the 72 healthy patients in the test set (about 46%) were flagged incorrectly. In healthcare, this trade-off can be acceptable, as it's better to catch potential risks early and follow up with proper medical evaluation.
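
The balance between precision and recall is not fixed: `predict` labels a sample positive when its estimated probability is at least 0.5, and lowering that cut-off trades precision for recall. The sketch below illustrates the idea on synthetic data (`X_demo`, `y_demo`, and the 0.3 threshold are assumptions chosen for demonstration, not values tuned for this dataset).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score

# Synthetic stand-in data (assumption): one informative feature plus noise.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(400, 2))
y_demo = (X_demo[:, 0] + rng.normal(scale=1.0, size=400) > 0).astype(int)

clf = LogisticRegression().fit(X_demo, y_demo)
proba = clf.predict_proba(X_demo)[:, 1]  # estimated probability of class 1

# Scored on the training data purely to keep the demonstration short.
results = {}
for threshold in (0.5, 0.3):
    preds = (proba >= threshold).astype(int)
    results[threshold] = (
        precision_score(y_demo, preds),
        recall_score(y_demo, preds),
    )
    print(f"threshold={threshold}: "
          f"precision={results[threshold][0]:.2f}, "
          f"recall={results[threshold][1]:.2f}")
```

Lowering the threshold can only add predicted positives, so recall never decreases; precision usually falls in exchange.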

### Train K-Nearest Neighbors (KNN) Model

Now we train a K-Nearest Neighbors (KNN) classifier. KNN is a non-parametric, instance-based learning algorithm that classifies new data points based on the majority label among their nearest neighbors. Because it makes no assumption about the shape of the decision boundary, it can handle classes that are not linearly separable.
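
Because KNN relies on distances, a feature measured in the hundreds (such as cholesterol) can outweigh a 0/1 indicator (such as `Sex_F`) when neighbors are chosen. A common remedy is to standardize the features first. The sketch below shows one way to do that with a pipeline; the data is synthetic and the feature scales are only rough guesses, so it illustrates the mechanics rather than any effect on this notebook's results.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): three features on very different scales.
rng = np.random.default_rng(0)
n = 200
X_demo = np.column_stack([
    rng.normal(54, 9, n),     # age-like values
    rng.normal(200, 50, n),   # cholesterol-like values
    rng.integers(0, 2, n),    # binary indicator
])
y_demo = rng.integers(0, 2, n)

# The scaler is fitted inside the pipeline, so it only ever sees the data passed to fit().
scaled_knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
scaled_knn.fit(X_demo, y_demo)

preds = scaled_knn.predict(X_demo[:5])
print("Predictions for the first five rows:", preds)
```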

In [7]:
from sklearn.neighbors import KNeighborsClassifier

# Initialize KNN with k=5
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)

# Predict on the test data
y_pred_knn = knn.predict(X_test)

# Show first 5 predictions vs actual labels
pd.DataFrame({
    'Predicted': y_pred_knn[:5],
    'Actual': y_test[:5].values
})

Unnamed: 0,Predicted,Actual
0,0.0,0.0
1,0.0,1.0
2,1.0,1.0
3,1.0,1.0
4,0.0,0.0


#### Interpretation

The table above displays the **first five predictions** made by the K-Nearest Neighbors (KNN) model on the test dataset, comparing the predicted labels (`Predicted`) with the actual labels (`Actual`):
- The model **correctly predicted** 4 out of 5 samples.
- In row 1, the model predicted `0.0` (no heart disease), but the actual label was `1.0` (presence of heart disease), which is a **false negative**.

These initial predictions look promising, but we'll need to assess the model’s full effectiveness using evaluation metrics like **accuracy**, **precision**, **recall**, and **F1-score**.

### Evaluate the KNN Model

We evaluate the KNN model using accuracy, precision, recall, F1-score, and the confusion matrix to understand its performance on the test set.

In [8]:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix

# Predict on test data
y_pred_knn = knn.predict(X_test)

# Evaluation metrics
accuracy_knn = accuracy_score(y_test, y_pred_knn)
precision_knn = precision_score(y_test, y_pred_knn)
recall_knn = recall_score(y_test, y_pred_knn)
f1_knn = f1_score(y_test, y_pred_knn)
conf_matrix_knn = confusion_matrix(y_test, y_pred_knn)

# Print results
print("KNN Accuracy:", round(accuracy_knn, 2))
print("KNN Precision:", round(precision_knn, 2))
print("KNN Recall:", round(recall_knn, 2))
print("KNN F1 Score:", round(f1_knn, 2))
print("KNN Confusion Matrix:\n", conf_matrix_knn)

KNN Accuracy: 0.65
KNN Precision: 0.69
KNN Recall: 0.75
KNN F1 Score: 0.72
KNN Confusion Matrix:
 [[35 37]
 [28 84]]


#### Evaluation Results for K-Nearest Neighbors (KNN) Model

The KNN model was evaluated using accuracy, precision, recall, and F1 score. Here's what the results tell us:

- **Accuracy (0.65)**: The model correctly predicted ~65% of the test data.
- **Precision (0.69)**: When the model predicted that a patient has heart disease, it was correct ~69% of the time.
- **Recall (0.75)**: Out of all patients who actually had heart disease, the model correctly identified 75% of them.
- **F1 Score (0.72)**: A balanced measure combining precision and recall.

#### Confusion Matrix:
[[35 37]
[28 84]]

- **True Negatives (35)**: Correctly predicted as not having heart disease.
- **False Positives (37)**: Incorrectly predicted as having heart disease.
- **False Negatives (28)**: Missed cases where heart disease was actually present.
- **True Positives (84)**: Correctly identified heart disease cases.

#### Conclusion:

The KNN model shows reasonably good recall and F1 score, indicating it can detect many true heart disease cases. However, it flags 37 of the 72 healthy patients in the test set (about 51%) as positive, which could lead to unnecessary concern or follow-up tests. In healthcare, this trade-off might still be acceptable to avoid missing critical diagnoses.


### Train Naive Bayes Model
Here, we train a Naive Bayes model on the same training data used earlier.

In [9]:
from sklearn.naive_bayes import GaussianNB

# Initialize the Naive Bayes model
nb_model = GaussianNB()
nb_model.fit(X_train, y_train)

# Predict on test data
y_pred_nb = nb_model.predict(X_test)

# Display first 5 predictions vs actual
pd.DataFrame({
    'Predicted': y_pred_nb[:5],
    'Actual': y_test[:5].values
})

Unnamed: 0,Predicted,Actual
0,0.0,0.0
1,0.0,1.0
2,1.0,1.0
3,1.0,1.0
4,1.0,0.0


#### Interpretation

The table above displays the **first five predictions** made by the Naive Bayes model:
- The model correctly predicted 3 out of 5 samples.
- Row 1 shows a **false negative** (predicted `0.0`, actual `1.0`), and
- Row 4 shows a **false positive** (predicted `1.0`, actual `0.0`).

Five samples are too few to draw conclusions: here the model gets three right and makes one error of each kind. We'll assess its full performance in the next step using evaluation metrics.

### Evaluate the Naive Bayes Model

We evaluate the Naive Bayes model using accuracy, precision, recall, F1-score, and the confusion matrix to understand its performance on the test set.

In [10]:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix

# Evaluate model performance
nb_accuracy = accuracy_score(y_test, y_pred_nb)
nb_precision = precision_score(y_test, y_pred_nb)
nb_recall = recall_score(y_test, y_pred_nb)
nb_f1 = f1_score(y_test, y_pred_nb)
nb_cm = confusion_matrix(y_test, y_pred_nb)

# Print the results
print("Naive Bayes Accuracy:", round(nb_accuracy, 2))
print("Naive Bayes Precision:", round(nb_precision, 2))
print("Naive Bayes Recall:", round(nb_recall, 2))
print("Naive Bayes F1 Score:", round(nb_f1, 2))
print("Naive Bayes Confusion Matrix:\n", nb_cm)

Naive Bayes Accuracy: 0.7
Naive Bayes Precision: 0.72
Naive Bayes Recall: 0.83
Naive Bayes F1 Score: 0.77
Naive Bayes Confusion Matrix:
 [[35 37]
 [19 93]]


#### Evaluation Results for Naive Bayes Model

The Naive Bayes model was evaluated using accuracy, precision, recall, and F1 score. Here's what the results tell us:

- **Accuracy (0.70)**: The model correctly predicted 70% of the test data.
- **Precision (0.72)**: When the model predicted that a patient has heart disease, it was correct 72% of the time.
- **Recall (0.83)**: Out of all patients who actually had heart disease, the model correctly identified 83% of them.
- **F1 Score (0.77)**: A balanced measure combining precision and recall.

#### Confusion Matrix:
[[35 37]
[19 93]]

- **True Negatives (35)**: Correctly predicted as not having heart disease.  
- **False Positives (37)**: Incorrectly predicted as having heart disease.  
- **False Negatives (19)**: Missed cases where heart disease was actually present.  
- **True Positives (93)**: Correctly identified heart disease cases.

#### Conclusion:
The Naive Bayes model demonstrates strong recall and F1 score, making it effective at identifying patients with heart disease. Although it still has a notable number of false positives (37), the reduced false negatives (19) make it suitable for healthcare settings where catching true cases is a priority.

### Train Support Vector Machine (SVM) Model
In this section, we train a Support Vector Machine (SVM) model to classify whether a patient has heart disease. SVM is a powerful supervised learning algorithm that attempts to find the optimal hyperplane that separates the classes with the largest margin. We’ll use the same training data as with our previous models to ensure a fair comparison.
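
With a linear kernel, the separating hyperplane can be read directly from the fitted model as a weight vector `w` and an intercept `b`, and a sample is classified by the sign of `w · x + b`. The sketch below demonstrates this on a synthetic two-cluster dataset (`X_demo` and `y_demo` are assumptions, not the heart disease features).

```python
import numpy as np
from sklearn.svm import SVC

# Synthetic stand-in data (assumption): two well-separated 2-D clusters.
rng = np.random.default_rng(2)
X_demo = np.vstack([
    rng.normal(loc=-2.0, size=(50, 2)),
    rng.normal(loc=2.0, size=(50, 2)),
])
y_demo = np.array([0] * 50 + [1] * 50)

svm_demo = SVC(kernel="linear").fit(X_demo, y_demo)

# coef_ and intercept_ are available only for the linear kernel.
w = svm_demo.coef_[0]
b = svm_demo.intercept_[0]

# The signed score w . x + b matches decision_function for a binary linear SVM.
manual_scores = X_demo @ w + b
print("Weights:", w)
print("Intercept:", b)
print("Matches decision_function:",
      np.allclose(manual_scores, svm_demo.decision_function(X_demo)))
```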

In [11]:
from sklearn.svm import SVC

# Initialize and train the SVM model
svm_model = SVC(kernel='linear')  # Using linear kernel for interpretability
svm_model.fit(X_train, y_train)

# Predict on the test data
y_pred_svm = svm_model.predict(X_test)

# Show first 5 predictions vs actual labels
pd.DataFrame({
    'Predicted': y_pred_svm[:5],
    'Actual': y_test[:5].values
})

Unnamed: 0,Predicted,Actual
0,0.0,0.0
1,1.0,1.0
2,1.0,1.0
3,1.0,1.0
4,1.0,0.0


#### Interpretation
The table above shows the **first five predictions** made by the SVM model on the test dataset. It compares the predicted labels (`Predicted`) with the actual labels (`Actual`):
- The model **correctly predicted** the outcome for 4 out of 5 samples.
- In row 4, the model predicted `1.0` (presence of heart disease), while the actual value was `0.0` (no heart disease). This is an example of a **false positive**.

These first predictions look reasonable, but five samples cannot validate the model; a deeper evaluation with performance metrics is needed.


### Evaluate the SVM Model

We evaluate the SVM model using accuracy, precision, recall, F1-score, and the confusion matrix to understand its performance on the test set.

In [12]:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix

# Predict on test set
y_pred_svm = svm_model.predict(X_test)

# Calculate evaluation metrics
# (svm_ prefix avoids overwriting the Logistic Regression variables from earlier)
svm_accuracy = accuracy_score(y_test, y_pred_svm)
svm_precision = precision_score(y_test, y_pred_svm)
svm_recall = recall_score(y_test, y_pred_svm)
svm_f1 = f1_score(y_test, y_pred_svm)
svm_cm = confusion_matrix(y_test, y_pred_svm)

# Print results
print("SVM Accuracy:", round(svm_accuracy, 2))
print("SVM Precision:", round(svm_precision, 2))
print("SVM Recall:", round(svm_recall, 2))
print("SVM F1 Score:", round(svm_f1, 2))
print("SVM Confusion Matrix:\n", svm_cm)

SVM Accuracy: 0.67
SVM Precision: 0.67
SVM Recall: 0.9
SVM F1 Score: 0.77
SVM Confusion Matrix:
 [[ 22  50]
 [ 11 101]]


#### Evaluation Results for Support Vector Machine (SVM) Model

The SVM model was evaluated using accuracy, precision, recall, and F1 score. Here's what the results indicate:

- **Accuracy (0.67)**: The model correctly predicted 67% of the test cases.
- **Precision (0.67)**: When the model predicted the presence of heart disease, it was correct 67% of the time.
- **Recall (0.90)**: Out of all actual heart disease cases, the model successfully identified 90% of them.
- **F1 Score (0.77)**: A strong balance between precision and recall.

#### Confusion Matrix:
[[22 50]
[11 101]]

- **True Negatives (22)**: Correctly predicted as not having heart disease.
- **False Positives (50)**: Incorrectly predicted as having heart disease.
- **False Negatives (11)**: Missed actual cases of heart disease.
- **True Positives (101)**: Correctly identified heart disease cases.

#### Conclusion:
The SVM model demonstrates excellent recall (0.90), meaning it catches most heart disease cases. However, it also flags 50 of the 72 healthy patients in the test set (about 69%) as positive, which could result in overdiagnosis. This may still be acceptable in medical settings where failing to detect illness is riskier than false alarms.


## Final Model Comparison: Naive Bayes vs Support Vector Machine (SVM)

After evaluating all four models, we selected **Naive Bayes** and **SVM** for final comparison based on their strong performance, particularly in **recall** and **F1 score**, which are critical in healthcare where identifying true cases is vital.

### Comparison Table

| Metric      | Naive Bayes | SVM     |
|-------------|-------------|---------|
| Accuracy    | 0.70        | 0.67    |
| Precision   | 0.72        | 0.67    |
| Recall      | 0.83        | 0.90    |
| F1 Score    | 0.77        | 0.77    |
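
For reference, the same metrics can be recomputed for all four models from the confusion matrices printed earlier. The sketch below does so with plain arithmetic and pandas; the dictionary and variable names are illustrative, and the counts are copied from the outputs above in the order TN, FP, FN, TP.

```python
import pandas as pd

# Confusion-matrix counts from the evaluation cells above: (TN, FP, FN, TP).
conf_matrices = {
    "Logistic Regression": (39, 33, 23, 89),
    "KNN": (35, 37, 28, 84),
    "Naive Bayes": (35, 37, 19, 93),
    "SVM": (22, 50, 11, 101),
}

rows = {}
for name, (tn, fp, fn, tp) in conf_matrices.items():
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    rows[name] = {
        "Accuracy": (tp + tn) / (tn + fp + fn + tp),
        "Precision": precision,
        "Recall": recall,
        "F1 Score": 2 * precision * recall / (precision + recall),
    }

# Outer keys become columns (models); inner keys become rows (metrics).
comparison = pd.DataFrame(rows).round(2)
print(comparison)
```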

### Key Insights

- **SVM achieved the highest recall (0.90)**, making it more effective at detecting patients with heart disease.
- **Naive Bayes offers higher precision (0.72)**, meaning fewer false positives (37 versus 50).
- **Both models share the same F1 score (0.77)**, showing a balanced overall performance.

### Final Takeaway

While both models perform well, **SVM holds a slight advantage** due to its higher recall, a critical factor in medical diagnosis where missing a positive case can be costly. That said, **Naive Bayes** remains a strong alternative when the goal is to reduce false alarms with a simple, easily interpreted model.

## Evaluation Metrics Explained

To evaluate our models, we used four primary metrics: **Accuracy**, **Precision**, **Recall**, and **F1 Score**.

### 1. Accuracy
The proportion of total correct predictions made by the model.

**Formula:**  
Accuracy = (TP + TN) / (TP + TN + FP + FN)

- **TP (True Positives):** Correctly predicted heart disease cases.  
- **TN (True Negatives):** Correctly predicted non-disease cases.  
- **FP (False Positives):** Incorrectly predicted heart disease (but actually healthy).  
- **FN (False Negatives):** Missed actual heart disease cases.  

---

### 2. Precision
Out of all the predicted positive cases, how many were actually positive.

**Formula:**  
Precision = TP / (TP + FP)

- High precision means fewer false alarms.

---

### 3. Recall (Sensitivity)
Out of all actual positive cases, how many the model was able to catch.

**Formula:**  
Recall = TP / (TP + FN)

- High recall means fewer missed disease cases, which is critical in healthcare.

---

### 4. F1 Score
Harmonic mean of precision and recall. It balances both metrics.

**Formula:**  
F1 Score = 2 × (Precision × Recall) / (Precision + Recall)

- Useful when there's an uneven class distribution or when both false positives and false negatives matter.
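
To put these formulas in context, it can help to compute them for a trivial reference point. The sketch below considers a hypothetical classifier that labels every test patient as having heart disease, using the class counts implied by the confusion matrices above (each has row sums of 72 and 112). This baseline is an illustration added here, not a model trained in this notebook.

```python
# Class counts in the test set, taken from the row sums of the confusion matrices above.
n_negative, n_positive = 72, 112

# Hypothetical baseline (assumption): predict "heart disease" for every patient.
tp, fp, fn, tn = n_positive, n_negative, 0, 0

baseline_accuracy = (tp + tn) / (tp + tn + fp + fn)
baseline_precision = tp / (tp + fp)
baseline_recall = tp / (tp + fn)
baseline_f1 = (2 * baseline_precision * baseline_recall
               / (baseline_precision + baseline_recall))

print("Baseline Accuracy:", round(baseline_accuracy, 2))
print("Baseline Precision:", round(baseline_precision, 2))
print("Baseline Recall:", round(baseline_recall, 2))
print("Baseline F1 Score:", round(baseline_f1, 2))
```

With these counts the baseline reaches perfect recall and an F1 score of about 0.76, so recall and F1 are best read alongside precision and accuracy rather than on their own.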

---

These metrics help us evaluate how well our models perform and which one is more reliable for **heart disease detection**.