# End-to-end PyTorch Artificial Neural Network training, validation, and testing tutorial for a TopologicPy Dataset

## This Script

1. Loads the dataset with the ANN helper class  
2. Builds a PyTorch ANN model (classification by default)  
3. Trains with a train/val split  
4. Evaluates on validation and test sets  
5. Visualizes learning curves, confusion matrix, and prints evaluation metrics  
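
Steps 3 and 4 rely on partitioning the samples into training, validation, and test subsets. A minimal sketch of such a partition follows, using only the standard library. The function name `split_indices`, the 70/15/15 ratios, and the seed are illustrative assumptions, not the values used by the ANN helper class.

```python
import random

def split_indices(n_samples, val_ratio=0.15, test_ratio=0.15, seed=42):
    """Shuffle sample indices and partition them into train/val/test lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # local RNG keeps the split reproducible
    n_val = round(n_samples * val_ratio)
    n_test = round(n_samples * test_ratio)
    val = indices[:n_val]
    test = indices[n_val:n_val + n_test]
    train = indices[n_val + n_test:]
    return train, val, test

train_idx, val_idx, test_idx = split_indices(100)
print(len(train_idx), len(val_idx), len(test_idx))
```

Keeping the three index lists disjoint is what makes the validation and test metrics meaningful: no sample used for fitting is reused for evaluation.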

---

## Notes

- **Requires:** `topologicpy >= 0.9.6`, `torch`, `pandas`, `pyyaml`, `numpy`, `plotly`, `scikit-learn`
- **Example datasets can be found at** `https://github.com/wassimj/topologicpy/tree/main/assets/MachineLearning`

### Installation Example

```bash
pip install torch pandas pyyaml numpy plotly scikit-learn
# then install torch-geometric following their official instructions for your OS/CUDA
```


In [2]:
# This cell is not needed if you have pip installed topologicpy
import sys
sys.path.append("C:/Users/sarwj/OneDrive - Cardiff University/Documents/GitHub/topologicpy/src")

### Import the Needed Libraries

In [17]:
from __future__ import annotations
import pandas as pd
from topologicpy.ANN import ANN
from topologicpy.Helper import Helper

### Check TopologicPy Version

In [4]:
print("The script is compatible with TopologicPy v0.9.6 or newer.")
print(Helper.Version())

The script is compatible with TopologicPy v0.9.6 or newer.
The version that you are using (0.9.6) is EQUAL TO the latest version available on PyPI.
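
The compatibility requirement can also be checked programmatically. The sketch below compares dotted version strings as integer tuples. `parse_version` and `MIN_VERSION` are hypothetical names, and the installed version string is hard-coded here rather than read from TopologicPy.

```python
MIN_VERSION = "0.9.6"  # minimum version stated in this tutorial

def parse_version(text):
    """Turn a dotted version string such as '0.9.6' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))

installed = "0.9.6"  # assumption: replace with the version reported on your machine
if parse_version(installed) < parse_version(MIN_VERSION):
    raise RuntimeError(f"TopologicPy {MIN_VERSION} or newer is required, found {installed}")
print("Version OK:", installed)
```

Comparing tuples rather than raw strings avoids the pitfall where `"0.10.0"` sorts before `"0.9.6"` alphabetically.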


### Specify the Location of the Training Dataset

In [5]:
path=r"C:\Users\sarwj\OneDrive - Cardiff University\Documents\GitHub\topologicpy\assets\MachineLearning\synthetic_small.csv"

### Load the CSV Dataset (The Example Has Categorical Labels, Task is Graph-level Classification)

In [6]:
# IMPORTANT: ANN.ByCSVPath expects a FILE path (not a folder)
ann = ANN.ByCSVPath(
    path=path,
    task="classification",
    featuresKeys=[f"x{i}" for i in range(50)],
    labelHeader="label"
)
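
To make the expected CSV layout concrete: each row holds the feature columns named in `featuresKeys` plus one label column named by `labelHeader`. The sketch below parses such a file with the standard library. The helper `load_features_and_labels` and the three-feature inline sample are illustrative assumptions and do not reproduce how `ANN.ByCSVPath` works internally.

```python
import csv
import io

def load_features_and_labels(handle, feature_keys, label_header):
    """Read a CSV and return (feature rows as float lists, integer labels)."""
    features, labels = [], []
    for row in csv.DictReader(handle):
        features.append([float(row[key]) for key in feature_keys])
        labels.append(int(row[label_header]))
    return features, labels

# Tiny in-memory stand-in for a dataset file (made-up values).
sample = io.StringIO("x0,x1,x2,label\n0.1,-2.3,2.7,2\n-0.6,1.8,3.1,3\n")
X, y = load_features_and_labels(sample, [f"x{i}" for i in range(3)], "label")
print(X, y)
```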

### Set Hyperparameters

In [7]:
# IMPORTANT: epochs/batch_size are set via SetHyperparameters (Train() takes no args)
ann.SetHyperparameters(
    epochs=125,
    batch_size=3,          # <-- FIX (avoids batch size 1 with 5 training samples)
    lr=1e-4,
    weight_decay=1e-4,
    hidden_dims=(64, 64),
    dropout=0.1,
    batch_norm=True,
    early_stopping=True,
    early_stopping_patience=20,
    verbose=True
)
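
The `batch_size` comment concerns the size of the last mini-batch. PyTorch's `BatchNorm1d` raises an error in training mode when a batch contains a single sample, so a trailing batch of size 1 is a problem when `batch_norm=True`. A short sketch of the arithmetic follows. `batch_sizes` is a hypothetical helper and assumes the last partial batch is kept rather than dropped.

```python
def batch_sizes(n_samples, batch_size):
    """Return the size of each mini-batch when the last partial batch is kept."""
    full, remainder = divmod(n_samples, batch_size)
    return [batch_size] * full + ([remainder] if remainder else [])

print(batch_sizes(5, 2))  # [2, 2, 1]  -> last batch has a single sample
print(batch_sizes(5, 3))  # [3, 2]     -> no batch of size 1
```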

### Train the Model

In [8]:
history = ann.Train()

Epoch 001/125: train_loss=1.576394, val_loss=1.406928
Epoch 002/125: train_loss=1.451666, val_loss=1.268080
Epoch 003/125: train_loss=1.387208, val_loss=1.186334
Epoch 004/125: train_loss=1.355674, val_loss=1.164682
Epoch 005/125: train_loss=1.330808, val_loss=1.118697
Epoch 006/125: train_loss=1.316929, val_loss=1.090494
Epoch 007/125: train_loss=1.317085, val_loss=1.067774
Epoch 008/125: train_loss=1.298381, val_loss=1.021253
Epoch 009/125: train_loss=1.297166, val_loss=1.033188
Epoch 010/125: train_loss=1.295044, val_loss=1.034929
Epoch 011/125: train_loss=1.280333, val_loss=0.964638
Epoch 012/125: train_loss=1.277552, val_loss=0.987840
Epoch 013/125: train_loss=1.268345, val_loss=0.974182
Epoch 014/125: train_loss=1.267703, val_loss=0.954397
Epoch 015/125: train_loss=1.268512, val_loss=0.967355
Epoch 016/125: train_loss=1.268156, val_loss=0.962473
Epoch 017/125: train_loss=1.260289, val_loss=0.917163
Epoch 018/125: train_loss=1.256380, val_loss=0.931854
Epoch 019/125: train_loss=1.
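
With `early_stopping=True` and `early_stopping_patience=20`, training can stop before the final epoch once the validation loss stops improving. The sketch below shows a common patience rule: stop after `patience` consecutive epochs without a new best validation loss. The function `early_stopping_epoch` is hypothetical, and the exact criterion used by the ANN class may differ.

```python
def early_stopping_epoch(val_losses, patience):
    """Return the 1-based epoch at which training stops, or None if it never does."""
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None

# Made-up losses: the best value occurs at epoch 3, then three epochs fail to improve.
print(early_stopping_epoch([1.0, 0.9, 0.8, 0.85, 0.82, 0.81], patience=3))  # 6
```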

### Validate the Model

In [9]:
val_metrics = ann.Validate()
print("\nValidation metrics:", {k: v for k, v in val_metrics.items() if k not in ("y_true", "y_pred", "y_prob")})


Validation metrics: {'accuracy': 0.8448, 'f1_macro': 0.8447565625373343, 'precision_macro': 0.8463216314544704, 'recall_macro': 0.8446003850262492}
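
The `_macro` suffix means each metric is computed per class and then averaged with equal weight per class. A pure-Python sketch of that averaging follows. `macro_metrics` is a hypothetical helper and the labels are made up; how the ANN class computes these values internally is not shown here.

```python
def macro_metrics(y_true, y_pred):
    """Compute accuracy and macro-averaged precision, recall, and F1."""
    classes = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(classes)
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    return {
        "accuracy": accuracy,
        "precision_macro": sum(precisions) / n,
        "recall_macro": sum(recalls) / n,
        "f1_macro": sum(f1s) / n,
    }

print(macro_metrics([0, 0, 1, 1], [0, 1, 1, 1]))
```

Because every class counts equally, macro averages expose poor performance on rare classes that overall accuracy can hide.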


### Test the Model

In [10]:
test_metrics = ann.Test()
print("Test metrics:", {k: v for k, v in test_metrics.items() if k not in ("y_true", "y_pred", "y_prob")})

Test metrics: {'accuracy': 0.84, 'f1_macro': 0.8396405847770108, 'precision_macro': 0.8400993114534995, 'recall_macro': 0.8398854902371073}


### Plot the Training and Validation Loss Curves

In [11]:
# Plot learning curves
fig_hist = ann.PlotHistory(title="ANN Learning Curves")
fig_hist.show()

### Plot the Confusion Matrix (For Categorical Labels)

In [12]:
# Plot confusion matrix
fig_cm = ann.PlotConfusionMatrix(split="test", normalize=False, title="ANN Confusion Matrix (Test)", backgroundColor="white", marginTop=100, marginLeft=100, marginRight=100, marginBottom=100)
fig_cm.show()
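
A confusion matrix counts how often each true class is predicted as each class; with `normalize=False` the cells hold raw counts. The sketch below places true classes on rows and predicted classes on columns. The helper `confusion_matrix_counts`, its row-wise normalization option, and the labels are illustrative assumptions rather than the plotting method's implementation.

```python
def confusion_matrix_counts(y_true, y_pred, n_classes, normalize=False):
    """Build an n_classes x n_classes matrix: rows are true labels, columns predictions."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    if normalize:
        # Divide each row by its total so every non-empty row sums to 1.
        matrix = [[cell / sum(row) if sum(row) else 0.0 for cell in row] for row in matrix]
    return matrix

print(confusion_matrix_counts([0, 0, 1, 2, 2], [0, 1, 1, 2, 0], n_classes=3))
# [[1, 1, 0], [0, 1, 0], [1, 0, 1]]
```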

### Save the Model

In [13]:
ann.SaveModel(r"C:\Users\sarwj\OneDrive - Cardiff University\Desktop\ann_model.pt")

## Phase 2: Prediction of an Unseen Dataset

### Load Testing Dataset

In [14]:
path=r"C:\Users\sarwj\OneDrive - Cardiff University\Documents\GitHub\topologicpy\assets\MachineLearning\synthetic_medium.csv"

### Load the Pre-trained Model

In [15]:
# IMPORTANT: ANN.ByCSVPath expects a FILE path (not a folder)
ann_2 = ANN.ByCSVPath(
    path=path,
    task="classification",
    featuresKeys=[f"x{i}" for i in range(50)],
    labelHeader="label"
)


# IMPORTANT: epochs/batch_size are set via SetHyperparameters (Train() takes no args)
ann_2.SetHyperparameters(
    epochs=125,
    batch_size=3,          # <-- FIX (avoids batch size 1 with 5 training samples)
    lr=1e-4,
    weight_decay=1e-4,
    hidden_dims=(64, 64),
    dropout=0.1,
    batch_norm=True,
    early_stopping=True,
    early_stopping_patience=20,
    verbose=True
)
ann_2.LoadModel(r"C:\Users\sarwj\OneDrive - Cardiff University\Desktop\ann_model.pt")
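
One plausible reason the hyperparameters are set again before loading: when a model is saved as a PyTorch `state_dict`, only the weights are stored, so the receiving network must first be rebuilt with the same layer sizes. Whether `SaveModel` and `LoadModel` work this way is an assumption. The sketch below shows the general PyTorch pattern with a hypothetical `build_mlp` helper and an in-memory buffer instead of a file.

```python
import io
import torch
from torch import nn

def build_mlp(in_dim, hidden_dims, out_dim, dropout=0.1, batch_norm=True):
    """Stack Linear (+ BatchNorm) + ReLU + Dropout blocks, then a final Linear layer."""
    layers, prev = [], in_dim
    for width in hidden_dims:
        layers.append(nn.Linear(prev, width))
        if batch_norm:
            layers.append(nn.BatchNorm1d(width))
        layers += [nn.ReLU(), nn.Dropout(dropout)]
        prev = width
    layers.append(nn.Linear(prev, out_dim))
    return nn.Sequential(*layers)

# 50 features and 4 classes mirror the example dataset; the layer layout is illustrative.
model = build_mlp(50, (64, 64), 4)
buffer = io.BytesIO()
torch.save(model.state_dict(), buffer)  # stores weights only, not the architecture

buffer.seek(0)
restored = build_mlp(50, (64, 64), 4)   # must match the saved architecture
restored.load_state_dict(torch.load(buffer))

model.eval()
restored.eval()  # eval mode disables dropout and uses running batch-norm statistics
x = torch.randn(8, 50)
with torch.no_grad():
    same = torch.allclose(model(x), restored(x))
print(same)
```

If `restored` were built with different `hidden_dims`, `load_state_dict` would fail with a size-mismatch error, which is why the two configurations are kept identical.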

### Predict the Dataset

In [18]:
# Optional: predict on the same dataset (or pass path=... for new csv)
pred_pkg = ann_2.Predict(return_proba=True, attach_to_df=True)
df_pred = pred_pkg.get("df", None)
if isinstance(df_pred, pd.DataFrame):
    print("\nPredictions head:")
    print(df_pred.head())


Predictions head:
         x0        x1        x2        x3        x4        x5        x6  \
0  0.091841 -2.377438  2.712184 -0.995119 -0.002728  1.288477  2.817108   
1 -0.689990  1.874402  3.109361 -4.182978  2.523524 -0.223920  3.606342   
2 -0.523490 -4.297341 -0.515422  1.019546  5.572912 -0.553417  3.173969   
3 -1.802325 -1.260066  0.840554  6.505499  0.235186  0.636305 -1.430029   
4 -0.597411 -0.307132  2.188670  0.057907 -0.304008 -1.482764 -6.192987   

          x7        x8        x9  ...       x48       x49  label  pred_label  \
0  -4.265892 -0.812775  0.569419  ...  2.535416  0.228359      2           2   
1  -1.461277  1.897154  1.042749  ...  0.033004  1.085809      3           3   
2 -11.362759 -1.483602  2.307871  ... -0.499515  0.587651      1           1   
3   2.222442 -0.844897 -0.840617  ...  2.132421 -0.182430      0           0   
4  -6.033078  6.868454 -5.909398  ... -3.012696  0.108499      1           1   

   pred   proba_0   proba_1   proba_2   proba_3  
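
The `proba_*` columns hold one probability per class, and the predicted label is the class with the highest probability. The sketch below shows how raw network outputs (logits) are commonly converted to such probabilities with a softmax. The helper name `softmax` and the logit values are made up, and the ANN class's internal conversion may differ.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]  # subtract max for numerical stability
    total = sum(exps)
    return [value / total for value in exps]

logits = [0.2, -1.0, 3.1, 0.5]          # one made-up sample, four classes
proba = softmax(logits)
pred_label = proba.index(max(proba))    # index of the most probable class
row = {f"proba_{i}": p for i, p in enumerate(proba)}
row["pred_label"] = pred_label
print(row)
```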

### Plot the Confusion Matrix (For Categorical Labels Only)

In [20]:
# Plot confusion matrix
fig_cm = ann_2.PlotConfusionMatrix(split="all", normalize=False, title="ANN Confusion Matrix (Test)", backgroundColor="white", marginTop=100, marginLeft=100, marginRight=100, marginBottom=100)
fig_cm.show()