<div align="center">

###### Lab 2

# National Tsing Hua University

#### Spring 2025

#### 11320IEEM 513600

#### Deep Learning and Industrial Applications
    
## Lab 2: Predicting Heart Disease with Deep Learning

</div>

### Introduction

In the realm of healthcare, early detection and accurate prediction of diseases play a crucial role in patient care and management. Heart disease remains one of the leading causes of mortality worldwide, making the development of effective diagnostic tools essential. This lab uses deep learning to predict the presence of heart disease in patients from a subset of 14 attributes of the Cleveland Heart Disease Database (13 clinical features plus the target label). The objective is to apply deep learning techniques to distinguish between patients with and without heart disease based on clinical parameters.

Throughout this lab, you'll engage with the following key activities:
- Use [Pandas](https://pandas.pydata.org) to process the CSV files.
- Use [PyTorch](https://pytorch.org) to build an Artificial Neural Network (ANN) to fit the dataset.
- Evaluate the trained model's accuracy and loss on the validation and test sets.

### Attribute Information

1. age: Age of the patient in years
2. sex: Sex of the patient (Male/Female)
3. cp: Chest pain type (4 types: low, medium, high, and severe)
4. trestbps: Resting blood pressure (mm Hg)
5. chol: Serum cholesterol in mg/dl
6. fbs: Fasting blood sugar > 120 mg/dl
7. restecg: Resting electrocardiographic results (values 0, 1, 2)
8. thalach: Maximum heart rate achieved
9. exang: Exercise-induced angina
10. oldpeak: ST depression induced by exercise relative to rest
11. slope: Slope of the peak exercise ST segment
12. ca: Number of major vessels (0-3) colored by fluoroscopy
13. thal: 3 = normal; 6 = fixed defect; 7 = reversible defect
14. target: Whether the patient has heart disease (1 = yes, 0 = no)

### References
- [UCI Heart Disease Data](https://www.kaggle.com/datasets/redwankarimsony/heart-disease-data) for the dataset we use in this lab.


## A. Checking and Preprocessing

In [None]:
import pandas as pd

df = pd.read_csv('heart_dataset_train_all.csv')
df

In [None]:
df.columns

In [None]:
df.info()

In [None]:
# checking for null values
df.isnull().sum()

In [None]:
df = df.dropna()

In [None]:
df.shape

In [None]:
# Mapping 'sex' descriptions to numbers
sex_description = {
    'Male': 0,
    'Female': 1,
}
df['sex'] = df['sex'].map(sex_description)  # assign the whole column so it becomes an integer dtype

# Mapping 'cp' (chest pain) descriptions to numbers
pain_description = {
    'low': 0,
    'medium': 1,
    'high': 2,
    'severe': 3
}
df['cp'] = df['cp'].map(pain_description)  # assign the whole column so it becomes an integer dtype

df
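`Series.map` returns `NaN` for any value that is not a key of the dictionary, so an unexpected label (for example `'male'` instead of `'Male'`) would silently introduce missing values after `dropna()` has already run. A minimal sanity check is sketched below; the helper name `find_unmapped` is hypothetical, and it assumes the same mapping dictionaries used in the cell above.

```python
import pandas as pd

def find_unmapped(series: pd.Series, mapping: dict) -> list:
    """Return the distinct non-null values in `series` that `mapping` does not cover."""
    # Series.map yields NaN for values that are absent from the dictionary
    mapped = series.map(mapping)
    return sorted(series[mapped.isna() & series.notna()].unique().tolist())

# Hypothetical toy column: 'male' (lowercase) is not a key in the mapping
sex_raw = pd.Series(['Male', 'Female', 'male', 'Female'])
print(find_unmapped(sex_raw, {'Male': 0, 'Female': 1}))  # ['male']
```

Because the cell above overwrites the columns, the check has to run on the raw text, for example `find_unmapped(pd.read_csv('heart_dataset_train_all.csv').dropna()['cp'], pain_description)`; an empty list means every label was encoded.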

In [None]:
df.describe()

In [None]:
df.corr()
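`df.corr()` shows the full pairwise matrix. When the question is mainly which features move with the label, a single column sorted by magnitude is easier to read. A minimal sketch, assuming every column is numeric after the mapping step and that the label column is named `target`; the function name `target_correlations` is hypothetical.

```python
import pandas as pd

def target_correlations(frame: pd.DataFrame, target: str = 'target') -> pd.Series:
    """Pearson correlation of every column with `target`, sorted by absolute value."""
    # Cast to float first: columns filled via .loc or .map may still have dtype object
    corr = frame.astype(float).corr()[target].drop(target)
    # Order by magnitude so strong negative correlations are not buried at the bottom
    return corr.reindex(corr.abs().sort_values(ascending=False).index)

# Hypothetical toy data: x matches the target exactly, z moves against it
toy = pd.DataFrame({'x': [0, 1, 0, 1], 'z': [3, 1, 2, 0], 'target': [0, 1, 0, 1]})
print(target_correlations(toy))
```

On the lab data the call would be `target_correlations(df)`. Correlation only captures linear, pairwise relationships, so a weak value does not mean a feature is useless to the network.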

#### Converting the DataFrame to a NumPy Array
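This step turns the preprocessed DataFrame into arrays that PyTorch can consume; the training cells in Section C do it with `df.values` and positional slicing. A minimal sketch, assuming the label column is named `target` as in the attribute list, selects columns by name instead and fixes the dtypes explicitly. The helper name `to_arrays` is hypothetical; `DataFrame.to_numpy` is the pandas method recommended over `.values`.

```python
import numpy as np
import pandas as pd

def to_arrays(frame: pd.DataFrame, target: str = 'target'):
    """Split a fully numeric DataFrame into (features, labels) NumPy arrays."""
    # float32 matches the default dtype of PyTorch model weights
    x = frame.drop(columns=[target]).to_numpy(dtype=np.float32)
    # int64 class indices are what nn.CrossEntropyLoss expects as targets
    y = frame[target].to_numpy(dtype=np.int64)
    return x, y

# Hypothetical two-row toy frame
toy = pd.DataFrame({'age': [63, 41], 'sex': [0, 1], 'target': [1, 0]})
x, y = to_arrays(toy)
print(x.shape, x.dtype, y.tolist())  # (2, 2) float32 [1, 0]
```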

## B. Defining Neural Networks

In PyTorch, we define custom neural network architectures as Python classes that subclass `nn.Module`. This gives our network the functionality it needs to work with PyTorch's other utilities (parameter tracking, moving to a device, saving and loading weights) and keeps the implementation organized.

- Neural networks are defined by subclassing `nn.Module`.
- The layers of the neural network are initialized in the `__init__` method.
- The forward pass operations on input data are defined in the `forward` method.

It's worth noting that while we only define the forward pass, PyTorch's autograd engine automatically derives the backward pass; the resulting gradients are used during training to update the model's weights.
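A minimal sketch of the three points above: layers are created in `__init__`, the computation is written in `forward`, and autograd produces the gradients when `loss.backward()` is called. `TinyNet` and its layer sizes are illustrative only (13 inputs to match the features, 2 output logits for the two classes); it is not the model trained later in this lab.

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):
    def __init__(self, in_features: int = 13, hidden: int = 16, num_classes: int = 2):
        super().__init__()
        # Layers assigned as attributes are registered, so nn.Module tracks their parameters
        self.fc1 = nn.Linear(in_features, hidden)
        self.act = nn.ReLU()
        self.fc2 = nn.Linear(hidden, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Only the forward computation is written by hand
        return self.fc2(self.act(self.fc1(x)))

model = TinyNet()
x = torch.randn(4, 13)              # a batch of 4 random feature vectors
y = torch.tensor([0, 1, 1, 0])      # class indices
loss = nn.CrossEntropyLoss()(model(x), y)
loss.backward()                     # autograd fills .grad for every parameter
print(model(x).shape)               # torch.Size([4, 2])
print(model.fc1.weight.grad.shape)  # torch.Size([16, 13])
```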

## C. Training the Neural Network

In [None]:
# Check your GPU status.
!nvidia-smi
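`!nvidia-smi` is an IPython shell escape that reports what the driver sees. The equivalent check from inside PyTorch is sketched below. Note that the training code in this lab never moves the model or tensors to a device, so it runs on the CPU unless `.to(device)` calls are added.

```python
import torch

# Prefer the GPU when PyTorch can see one; otherwise fall back to the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f'Using device: {device}')

if device.type == 'cuda':
    print(torch.cuda.get_device_name(0))

# A model and its inputs must live on the same device before a forward pass
layer = torch.nn.Linear(13, 2).to(device)
batch = torch.zeros(8, 13, device=device)
print(layer(batch).device)
```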

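The cells below contain the complete Lab 2 pipeline: preprocessing, train/validation split, test data loading, model definition, hyperparameter search, and export of the results.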

In [None]:
import pandas as pd
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset
import torch.nn as nn
import torch.optim as optim
from torch.optim.lr_scheduler import CosineAnnealingLR
from tqdm.auto import tqdm
import matplotlib.pyplot as plt



# Data preprocessing: drop rows with missing values and encode the text columns as integers
df = pd.read_csv('heart_dataset_train_all.csv').dropna()
df['sex'] = df['sex'].map({'Male': 0, 'Female': 1})
df['cp'] = df['cp'].map({'low': 0, 'medium': 1, 'high': 2, 'severe': 3})
np_data = df.values
np.random.shuffle(np_data)

In [None]:

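# Split the shuffled data: first 70% for training, remaining 30% for validation
# Columns 0-12 are the features, column 13 is the target label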
split = int(0.7 * len(np_data))
x_train, y_train = np_data[:split, :13], np_data[:split, 13]
x_val, y_val = np_data[split:, :13], np_data[split:, 13]

x_train = torch.tensor(x_train, dtype=torch.float32)
y_train = torch.tensor(y_train, dtype=torch.long)
x_val = torch.tensor(x_val, dtype=torch.float32)
y_val = torch.tensor(y_val, dtype=torch.long)

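# drop_last=True avoids a final batch of size 1, which BatchNorm1d cannot normalize in training mode
train_loader = DataLoader(TensorDataset(x_train, y_train), batch_size=32, shuffle=True, drop_last=True)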
val_loader = DataLoader(TensorDataset(x_val, y_val), batch_size=32)
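`np.random.shuffle` is called without a seed, so every run produces a different 70/30 split and validation numbers are not directly comparable across runs or hyperparameter settings. One way to make the split repeatable is to draw a permutation from a seeded `numpy.random.Generator`. The sketch below assumes the label sits in column 13, as in the cell above; the helper name `split_train_val` and the seed value are illustrative.

```python
import numpy as np

def split_train_val(data: np.ndarray, train_frac: float = 0.7, seed: int = 42):
    """Shuffle rows with a fixed seed and split into train/val (features, labels)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(data))          # same row order for the same seed
    split = int(train_frac * len(data))
    train, val = data[idx[:split]], data[idx[split:]]
    # Columns 0-12 are the features and column 13 is the label
    return (train[:, :13], train[:, 13]), (val[:, :13], val[:, 13])

# Hypothetical 10-row array with 14 columns
toy = np.arange(10 * 14, dtype=float).reshape(10, 14)
(a_x, a_y), (b_x, b_y) = split_train_val(toy)
(c_x, _), _ = split_train_val(toy)
print(a_x.shape, b_x.shape, np.array_equal(a_x, c_x))  # (7, 13) (3, 13) True
```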

In [None]:
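# Test data: encode the text columns the same way as the training data
test_df = pd.read_csv('heart_dataset_test.csv')
if not pd.api.types.is_numeric_dtype(test_df['sex']):  # only if stored as text labels
    test_df['sex'] = test_df['sex'].map({'Male': 0, 'Female': 1})
if not pd.api.types.is_numeric_dtype(test_df['cp']):
    test_df['cp'] = test_df['cp'].map({'low': 0, 'medium': 1, 'high': 2, 'severe': 3})
test_data = test_df.values.astype(np.float32)
x_test = torch.tensor(test_data[:, :13], dtype=torch.float32)
y_test = torch.tensor(test_data[:, 13], dtype=torch.long)
test_loader = DataLoader(TensorDataset(x_test, y_test), batch_size=1)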

In [None]:
# Model definition: three Linear-BatchNorm-LeakyReLU-Dropout blocks followed by a 2-class output layer
class Model(nn.Module):
    def __init__(self, hidden_size):
        super().__init__()
        self.model = nn.Sequential(
            nn.Linear(13, hidden_size),
            nn.BatchNorm1d(hidden_size),
            nn.LeakyReLU(),
            nn.Dropout(0.3),

            nn.Linear(hidden_size, hidden_size // 2),
            nn.BatchNorm1d(hidden_size // 2),
            nn.LeakyReLU(),
            nn.Dropout(0.3),

            nn.Linear(hidden_size // 2, hidden_size // 4),
            nn.BatchNorm1d(hidden_size // 4),
            nn.LeakyReLU(),
            nn.Dropout(0.3),

            nn.Linear(hidden_size // 4, 2)
        )

    def forward(self, x):
        return self.model(x)
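Pushing a dummy batch through a network is a quick way to sanity-check its shapes. The sketch below rebuilds a stand-in for one hidden block of `Model` so that it runs on its own, and illustrates a constraint worth knowing: `nn.BatchNorm1d` in training mode needs more than one sample per batch to compute batch statistics, so a training batch of size 1 raises a `ValueError`. Passing `drop_last=True` to the training `DataLoader` is one common way to avoid that; in evaluation mode the layer uses its running statistics, which is why the test loader's `batch_size=1` is fine after `model.eval()`.

```python
import torch
import torch.nn as nn

# Stand-in for one hidden block of Model: Linear -> BatchNorm1d -> LeakyReLU -> Dropout
block = nn.Sequential(
    nn.Linear(13, 128), nn.BatchNorm1d(128), nn.LeakyReLU(), nn.Dropout(0.3),
    nn.Linear(128, 2),
)

block.train()
print(block(torch.randn(32, 13)).shape)   # torch.Size([32, 2])

try:
    block(torch.randn(1, 13))             # a single sample in training mode
except ValueError as err:
    print('train mode, batch of 1:', err)

block.eval()                              # eval mode uses running statistics instead
print(block(torch.randn(1, 13)).shape)    # torch.Size([1, 2])
```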


In [None]:

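# Hyperparameter grid
# ====== Adjustable settings ======
EPOCHS = 200
PATIENCE = 100  # stop if validation accuracy has not improved for this many epochs
# =================================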
learning_rates = [0.01, 0.001, 0.0001]
hidden_sizes = [128, 256, 512]
results = []

for lr in learning_rates:
    for hidden_size in hidden_sizes:
        print(f'Training with LR={lr}, Hidden={hidden_size}')
        model = Model(hidden_size)
        criterion = nn.CrossEntropyLoss()
        optimizer = optim.Adam(model.parameters(), lr=lr)
        scheduler = CosineAnnealingLR(optimizer, T_max=EPOCHS)  # one cosine decay over the full run

        best_val_acc = -1
        best_result = {}
        wait = 0  # early stopping counter

        train_accuracies = []
        val_accuracies = []
        train_losses = []
        val_losses = []

        for epoch in tqdm(range(EPOCHS)):
            model.train()
            total_loss, correct, total = 0.0, 0, 0
            for x, y in train_loader:
                optimizer.zero_grad()
                outputs = model(x)
                loss = criterion(outputs, y)
                loss.backward()
                optimizer.step()
                total_loss += loss.item()
                correct += (outputs.argmax(1) == y).sum().item()
                total += y.size(0)
            train_acc = 100. * correct / total
            train_loss = total_loss / len(train_loader)

            model.eval()
            val_loss, val_correct, total = 0.0, 0, 0
            with torch.no_grad():
                for x, y in val_loader:
                    outputs = model(x)
                    loss = criterion(outputs, y)
                    val_loss += loss.item()
                    val_correct += (outputs.argmax(1) == y).sum().item()
                    total += y.size(0)
            val_acc = 100. * val_correct / total
            val_loss /= len(val_loader)

            train_accuracies.append(train_acc)
            val_accuracies.append(val_acc)
            train_losses.append(train_loss)
            val_losses.append(val_loss)

            # Early stopping check: save a checkpoint whenever validation accuracy improves
            if val_acc > best_val_acc:
                best_val_acc = val_acc
                wait = 0
                best_result = {
                    'train_loss': train_loss,
                    'train_acc': train_acc,
                    'val_loss': val_loss,
                    'val_acc': val_acc
                }
                torch.save(model.state_dict(), f'model_lr{lr}_h{hidden_size}.pth')
            else:
                wait += 1
                if wait >= PATIENCE:
                    print(f"Early stopping at epoch {epoch+1}")
                    break

            scheduler.step()

        # Test: reload the checkpoint with the best validation accuracy
        model.load_state_dict(torch.load(f'model_lr{lr}_h{hidden_size}.pth'))
        model.eval()
        test_loss, test_correct = 0.0, 0
        with torch.no_grad():
            for x, y in test_loader:
                outputs = model(x)
                loss = criterion(outputs, y)
                test_loss += loss.item()
                test_correct += (outputs.argmax(1) == y).sum().item()
        test_acc = 100. * test_correct / len(test_loader.dataset)  # count samples, not batches
        test_loss /= len(test_loader)

        results.append([
            lr, hidden_size,
            best_result['train_loss'], best_result['train_acc'],
            best_result['val_loss'], best_result['val_acc'],
            test_loss, test_acc
        ])

        # Plotting: accuracy and loss curves for this configuration
        fig, ax = plt.subplots(1, 2, figsize=(15, 5))
        ax[0].plot(train_accuracies, label='Train')
        ax[0].plot(val_accuracies, label='Val')
        ax[0].set_title('Model Accuracy')
        ax[0].set_xlabel('Epochs')
        ax[0].set_ylabel('Accuracy')
        ax[0].legend()

        ax[1].plot(train_losses, label='Train')
        ax[1].plot(val_losses, label='Val')
        ax[1].set_title('Model Loss')
        ax[1].set_xlabel('Epochs')
        ax[1].set_ylabel('Loss')
        ax[1].legend()

        plt.tight_layout()
        plt.suptitle(f'LR={lr}, Hidden={hidden_size}', fontsize=14, y=1.05)
        plt.show()
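`CosineAnnealingLR` lowers the learning rate along a cosine curve from its initial value to `eta_min` (default 0) over `T_max` scheduler steps. If training continues past `T_max`, the rate climbs back up along the same curve instead of staying at the minimum. The sketch below steps a throwaway optimizer and records the learning rate to make that visible; the two settings compared, `T_max=50` and `T_max=200` over 200 steps, are chosen to match `EPOCHS = 200` in this lab, and the helper name `lr_trace` is hypothetical.

```python
import torch
from torch.optim import Adam
from torch.optim.lr_scheduler import CosineAnnealingLR

def lr_trace(t_max: int, steps: int, base_lr: float = 0.01) -> list:
    """Record the learning rate in effect at the start of each of `steps` epochs."""
    param = torch.nn.Parameter(torch.zeros(1))  # dummy parameter for the optimizer
    optimizer = Adam([param], lr=base_lr)
    scheduler = CosineAnnealingLR(optimizer, T_max=t_max)
    lrs = []
    for _ in range(steps):
        lrs.append(optimizer.param_groups[0]['lr'])
        optimizer.step()    # step the optimizer before the scheduler, as in the training loop
        scheduler.step()
    return lrs

short = lr_trace(t_max=50, steps=200)
full = lr_trace(t_max=200, steps=200)
print(f'T_max=50:  epoch 50 -> {short[50]:.4f}, epoch 100 -> {short[100]:.4f}')
print(f'T_max=200: epoch 50 -> {full[50]:.4f}, epoch 199 -> {full[199]:.6f}')
```

With `T_max=50` the rate reaches zero at epoch 50 and is back near its initial value by epoch 100; with `T_max` equal to the number of epochs it decays once, monotonically, over the whole run.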




In [None]:
# Export the results to CSV
df_results = pd.DataFrame(results, columns=[
    'Learning Rate', 'Hidden Size',
    'Best Train Loss', 'Best Train Acc',
    'Best Val Loss', 'Best Val Acc',
    'Best Test Loss', 'Best Test Acc'
])
print(df_results)
df_results.to_csv('hyperparameter_experiment_results.csv', index=False)
print("✅ Results saved to hyperparameter_experiment_results.csv")
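When reporting a single configuration, it is common practice to choose it by validation accuracy rather than test accuracy, so that the test set remains an unbiased final check. A minimal sketch, assuming the `'Best Val Acc'` and `'Best Val Loss'` column names from the cell above; the helper name `pick_best` and the tie-breaking rule (lower validation loss wins) are illustrative choices, not part of the lab.

```python
import pandas as pd

def pick_best(results: pd.DataFrame) -> pd.Series:
    """Row with the highest validation accuracy; ties go to the lower validation loss."""
    ranked = results.sort_values(['Best Val Acc', 'Best Val Loss'], ascending=[False, True])
    return ranked.iloc[0]

# Made-up numbers for illustration only, not results from this lab
demo = pd.DataFrame({
    'Learning Rate': [0.01, 0.001, 0.0001],
    'Hidden Size': [128, 256, 512],
    'Best Val Loss': [0.50, 0.42, 0.38],
    'Best Val Acc': [80.0, 85.0, 85.0],
})
print(pick_best(demo))
```

In the notebook, the call would be `pick_best(df_results)`.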
