# Final training

Now we will work out how to train the different models, select the one with the best results, and optimize its parameters.

To do so, we will follow this logic:

* From the set of all candidate models, we will select:
  * the ones with maximum interpretability
  * the ones with the lowest inference cost
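The inference criterion above can be made concrete by timing `predict` on a few interpretable candidates. The snippet below is only a sketch: it assumes "lowest inference" refers to prediction latency, and the two candidate models, the demo data (`X_demo`, `y_demo`), and the helper `mean_predict_time` are hypothetical choices for illustration, not the models this notebook will end up using.

```python
import time

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# Hypothetical demo data, unrelated to the real ECG matrix
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(400, 20))
y_demo = (X_demo[:, 0] + 0.5 * rng.normal(size=400) > 0).astype(int)

# Two interpretable candidates, chosen only for illustration
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "shallow_tree": DecisionTreeClassifier(max_depth=3, random_state=0),
}


def mean_predict_time(model, X, n_repeats=20):
    """Average wall-clock time of model.predict(X) over n_repeats calls."""
    start = time.perf_counter()
    for _ in range(n_repeats):
        model.predict(X)
    return (time.perf_counter() - start) / n_repeats


latency = {}
for name, model in candidates.items():
    model.fit(X_demo, y_demo)
    latency[name] = mean_predict_time(model, X_demo)

# Candidates ordered from fastest to slowest prediction
ranking = sorted(latency, key=latency.get)
print(ranking)
```

The measured times depend on the machine and on the batch size, so they are best read as a relative ranking rather than absolute numbers.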

* To optimize their parameters, we will probably run an optimization technique, e.g. GD or CGD, or another iterative method if the matrix is too large for an exact solution to be feasible.
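To illustrate the trade-off between an exact solution and an iterative one, here is a minimal sketch that compares plain gradient descent with the exact least-squares solution. The least-squares objective, the matrix `A`, the vector `b`, and the function `gradient_descent` are assumptions made for this example; the notebook does not yet fix the objective or the optimizer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical small problem: find w minimizing 0.5 * ||A w - b||^2
A = rng.normal(size=(200, 5))
w_true = rng.normal(size=5)
b = A @ w_true + 0.01 * rng.normal(size=200)

# Exact solution, feasible here because A is small
w_exact, *_ = np.linalg.lstsq(A, b, rcond=None)


def gradient_descent(A, b, n_iters=500):
    """Plain gradient descent on f(w) = 0.5 * ||A w - b||^2."""
    # Step size 1/L, where L is the squared spectral norm of A
    # (the Lipschitz constant of the gradient)
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    w = np.zeros(A.shape[1])
    for _ in range(n_iters):
        grad = A.T @ (A @ w - b)
        w -= step * grad
    return w


w_gd = gradient_descent(A, b)

# Largest coordinate-wise difference between the two solutions
gap = np.max(np.abs(w_gd - w_exact))
print(gap)
```

On a problem this small the exact solve is the obvious choice. The iterative version only needs matrix-vector products, which is what makes it attractive when the matrix is too large to factorize.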


In this notebook we assume that all the data has already been pre-processed. We receive as input a matrix X containing all ECGs plus demographic data and a matrix Y containing the final labels, and we split both into train and test sets.


In [2]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Parameters for synthetic data
n_samples = 500
n_ecg_features = 100  # e.g., ECG_0, ECG_1, ..., ECG_99

# Create synthetic numeric columns
np.random.seed(42)
df = pd.DataFrame({
    'Height': np.random.normal(170, 10, n_samples),
    'Weight': np.random.normal(70, 15, n_samples),
    'BMI': np.random.normal(24, 4, n_samples),
})

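# Add synthetic ECG features in one concat; inserting 100 columns one by one
# fragments the DataFrame and triggers pandas PerformanceWarning
ecg = pd.DataFrame(
    np.random.normal(0, 1, (n_ecg_features, n_samples)).T,
    columns=[f'ECG_{i}' for i in range(n_ecg_features)],
)
df = pd.concat([df, ecg], axis=1)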

# Add categorical/boolean columns
df['Gender'] = np.random.choice(['Male', 'Female'], n_samples)
df['Smoker'] = np.random.choice(['Yes', 'No'], n_samples)
df['HTA'] = np.random.choice(['Yes', 'No'], n_samples)
df['DM'] = np.random.choice(['Yes', 'No'], n_samples)
df['DLP'] = np.random.choice(['Yes', 'No'], n_samples)
df['COPD'] = np.random.choice(['Yes', 'No'], n_samples)
df['Sleep_apnea'] = np.random.choice(['Yes', 'No'], n_samples)

# Add a binary label column
df['Label'] = np.random.choice([0, 1], n_samples)

# Separate the labels from the features
Y = df['Label'].values

numeric_cols = ['Height', 'Weight', 'BMI'] + [col for col in df.columns if col.startswith('ECG_')]
categorical_cols = ['Gender', 'Smoker', 'HTA', 'DM', 'DLP', 'COPD', 'Sleep_apnea']

# One-hot encode the categorical columns (drop_first avoids redundant dummy columns)
X_categorical_encoded = pd.get_dummies(df[categorical_cols], drop_first=True)
X = pd.concat([df[numeric_cols], X_categorical_encoded], axis=1)

# Split before scaling so the scaler never sees the test set
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=42, stratify=Y)
X_train, X_test = X_train.copy(), X_test.copy()

# Fit the scaler on the training rows only, then apply it to both splits
scaler = MinMaxScaler()
X_train[numeric_cols] = scaler.fit_transform(X_train[numeric_cols])
X_test[numeric_cols] = scaler.transform(X_test[numeric_cols])

print("X_train shape:", X_train.shape)
print("X_test shape:", X_test.shape)
print("Y_train shape:", Y_train.shape)
print("Y_test shape:", Y_test.shape)

X_train shape: (400, 110)
X_test shape: (100, 110)
Y_train shape: (400,)
Y_test shape: (100,)


