---
# Assignment 1

Name: DUONG SON THONG

Student Number: 223593948

Email: s223593948@deakin.edu.au

Postgraduate (SIT744)

---

## Introduction

Data link: https://www.kaggle.com/datasets/nanditapore/healthcare-diabetes


The Diabetes Prediction Dataset is a useful resource for researchers, data scientists, and healthcare professionals working on diabetes risk prediction. It contains a range of important health-related features that were carefully collected to support the development of deep learning models. These models can help identify individuals who may be at risk of developing diabetes, supporting efforts in early detection and more effective treatment planning.

## Dataset description

This dataset comprises medical diagnostic measurements for **2,768 samples**, primarily used for predictive modeling of diabetes. It contains **10 columns**, each representing patient attributes or outcomes related to diabetes diagnostics.

**Variables**

| Variable Name            | Description                                                            |
|--------------------------|------------------------------------------------------------------------|
| Pregnancies              | Number of times the individual has been pregnant                       |
| Glucose                  | Plasma glucose concentration (mg/dL)                                   |
| BloodPressure            | Diastolic blood pressure (mm Hg)                                       |
| SkinThickness            | Triceps skinfold thickness (mm)                                        |
| Insulin                  | 2-hour serum insulin (µU/mL)                                           |
| BMI                      | Body mass index (weight in kg / (height in m)^2)                       |
| DiabetesPedigreeFunction | A function representing diabetes likelihood based on family history    |
| Age                      | Age of the individual (years)                                          |
| Outcome                  | Binary indicator of diabetes presence (1 = diabetic, 0 = non-diabetic) |
| Id                       | Unique identifier for each record                                      |
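
As a rough check on the sample count, the sketch below works out how the 2,768 rows divide under the 80/20 fractions that the notebook later passes to `random_split`. It is a plain-Python illustration, not PyTorch's own code: `split_lengths` is a hypothetical helper that assumes the documented rule for fractional lengths (floor each share, then hand out any leftover rows one at a time, in order).

```python
import math


def split_lengths(n_samples, fractions):
    """Hypothetical helper: subset sizes for fractional splits.

    Assumes the rule documented for torch.utils.data.random_split:
    floor each share, then distribute leftover rows round-robin.
    """
    lengths = [math.floor(n_samples * frac) for frac in fractions]
    remainder = n_samples - sum(lengths)
    for i in range(remainder):
        lengths[i % len(lengths)] += 1
    return lengths


train_size, test_size = split_lengths(2768, [0.8, 0.2])
print(train_size, test_size)
```

Under that assumption the training subset holds 2,215 rows and the test subset 553.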

**Import package**

In [9]:
import torch
from torch import nn
import pandas as pd
from torch.utils.data import DataLoader, random_split, Dataset
from sklearn.preprocessing import StandardScaler


## **Set 1 Build a Simple Neural Network (P-Level Tasks)**

1. Determine a machine learning problem (e.g., Forecasting Energy Consumption in a Building) that you want to solve with neural networks. 

    This dataset contains medical diagnostic measurements from female patients. The task is to predict whether a patient has diabetes (binary classification) based on features such as glucose level, BMI, age, and more. Neural networks can be employed to model complex relationships between these health indicators and the likelihood of diabetes.

2. Describe the problem, objectives.

    Primary goal: develop a neural network that accurately classifies whether a patient has diabetes (Outcome = 1) or not (Outcome = 0).

3. Potential ethical concerns, including dataset biases.

In [30]:
datasetCsv = pd.read_csv('dataset.csv')
datasetCsv.head()

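| Unnamed: 0 | Id | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI  | DiabetesPedigreeFunction | Age | Outcome |
|------------|----|-------------|---------|---------------|---------------|---------|------|--------------------------|-----|---------|
| 0          | 1  | 6           | 148     | 72            | 35            | 0       | 33.6 | 0.627                    | 50  | 1       |
| 1          | 2  | 1           | 85      | 66            | 29            | 0       | 26.6 | 0.351                    | 31  | 0       |
| 2          | 3  | 8           | 183     | 64            | 0             | 0       | 23.3 | 0.672                    | 32  | 1       |
| 3          | 4  | 1           | 89      | 66            | 23            | 94      | 28.1 | 0.167                    | 21  | 0       |
| 4          | 5  | 0           | 137     | 40            | 35            | 168     | 43.1 | 2.288                    | 33  | 1       |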
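
The sample rows show features on very different scales (Insulin reaches 168 while DiabetesPedigreeFunction stays in the low single digits), which is the usual motivation for standardizing inputs before training. The sketch below applies the z-score formula by hand to the five Glucose values shown above. It uses the population standard deviation, which is what `StandardScaler` computes; the variable names are illustrative only.

```python
from statistics import fmean, pstdev

# Glucose values from the five rows printed by datasetCsv.head()
glucose = [148, 85, 183, 89, 137]

mean = fmean(glucose)   # column mean
std = pstdev(glucose)   # population standard deviation (ddof = 0)

# z-score: subtract the mean, divide by the standard deviation
scaled = [(g - mean) / std for g in glucose]

print(f"mean={mean:.1f}")
print([round(z, 3) for z in scaled])
```

After scaling, the column has mean 0 and standard deviation 1, so no single feature dominates the weighted sums in the first linear layer purely because of its units.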


In [None]:
device = torch.accelerator.current_accelerator().type if torch.accelerator.is_available() else "cpu"

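class CustomDataset(Dataset):
    def __init__(self, csv_name):
        datasetCsv = pd.read_csv(csv_name)
        # Every column except the last is a feature; the last column (Outcome) is the label.
        # Assuming the 10 columns listed above, this keeps Id as a feature (9 inputs in total).
        features = datasetCsv.iloc[:, :-1].values
        labels = datasetCsv.iloc[:, -1].values
        # Standardize the features only; the 0/1 labels must stay unscaled for nn.BCELoss.
        # fit_transform returns the scaled array, whereas fit alone returns the scaler object.
        scaler = StandardScaler()
        self.features = torch.tensor(scaler.fit_transform(features), dtype=torch.float32)
        self.labels = torch.tensor(labels, dtype=torch.float32).unsqueeze(1)  # shape [N, 1]

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, index):
        # Returns (features of shape [num_features], label of shape [1])
        return self.features[index], self.labels[index]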

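class NeuralNetworksda(nn.Module):
    def __init__(self):
        super().__init__()
        # 9 input features -> 9 hidden units -> 1 output probability
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(9, 9),
            nn.ReLU(),
            nn.Linear(9, 1),
            nn.Sigmoid()  # squashes the output into (0, 1), as nn.BCELoss expects
        )

    def forward(self, x):
        # Because of the final Sigmoid, these are probabilities rather than raw logits
        probs = self.linear_relu_stack(x)
        return probs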

def train(dataLoader, model, loss_fn, optimizer):
    size = len(dataLoader.dataset)
    model.train()  # switch to training mode (affects layers such as dropout and batch norm)
    for batch, (X, y) in enumerate(dataLoader):
        X, y = X.to(device), y.to(device)
        predict = model(X)
        loss = loss_fn(predict, y)
        # Backpropagation
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()  # clear gradients so they do not accumulate into the next batch
        if batch % 100 == 0:
            loss, current = loss.item(), (batch + 1) * len(X)
            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")
                
def test(dataloader, model, loss_fn):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    model.eval()  # switch to evaluation mode
    test_loss, correct = 0, 0
    with torch.no_grad():  # gradients are not needed for evaluation
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            # Threshold the predicted probability at 0.5 and count matches with the labels
            correct += ((pred >= 0.5).float() == y).sum().item()
    test_loss /= num_batches
    correct /= size
    # correct is a fraction in [0, 1]; multiply by 100 to report a percentage
    print(f"Test Error: \n Accuracy: {(100 * correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")
    
customDataset = CustomDataset('dataset.csv')

trainData, testData = random_split(customDataset, [0.8, 0.2])
print(len(trainData))
print(len(testData))
trainDataLoader = DataLoader(trainData, batch_size=64, shuffle=True)
testDataLoader = DataLoader(testData, batch_size=64, shuffle=False)
model = NeuralNetworksda().to(device)
loss_fn = nn.BCELoss()  # binary cross-entropy, since the target has two classes
optimizer = torch.optim.SGD(model.parameters(), lr=0.0001)  # plain SGD; torch.optim.Adam is a possible alternative

epochs = 100
for t in range(epochs):
    print(f'Epoch {t + 1} \n-----------------------------')
    train(trainDataLoader, model, loss_fn, optimizer)
    test(testDataLoader, model, loss_fn)
print("DONE!")
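
To make the reported numbers easier to interpret, here is a minimal plain-Python sketch of the two quantities the test loop prints: mean binary cross-entropy and accuracy at a 0.5 threshold. The probabilities and labels are made-up values for illustration, and `binary_cross_entropy` and `accuracy` are hypothetical helpers rather than PyTorch functions.

```python
import math


def binary_cross_entropy(probs, targets):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over the batch."""
    losses = [
        -(y * math.log(p) + (1 - y) * math.log(1 - p))
        for p, y in zip(probs, targets)
    ]
    return sum(losses) / len(losses)


def accuracy(probs, targets, threshold=0.5):
    """Fraction of samples whose thresholded prediction equals the label."""
    hits = sum(
        1 for p, y in zip(probs, targets)
        if (1 if p >= threshold else 0) == y
    )
    return hits / len(targets)


# Made-up model outputs (after the sigmoid) and true labels
probs = [0.9, 0.2, 0.6, 0.4]
targets = [1, 0, 0, 1]

loss = binary_cross_entropy(probs, targets)
acc = accuracy(probs, targets)
print(f"loss={loss:.4f}, accuracy={100 * acc:.1f}%")
```

For these four made-up predictions the script prints `loss=0.5403, accuracy=50.0%`: the two confident, correct predictions contribute little to the loss, while the two wrong ones dominate it.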
