### Import Libraries

This section imports essential libraries for data manipulation, model training, and evaluation.

- **pandas**: Used for data manipulation and reading CSV files.
- **sklearn.linear_model.LogisticRegression**: Implements logistic regression.
- **sklearn.model_selection.GridSearchCV**: Used for hyperparameter tuning.
- **sklearn.metrics**: Provides metrics to evaluate model performance.
- **xgboost.XGBClassifier**: Implements the XGBoost algorithm.
- **sklearn.svm.SVC**: Implements Support Vector Classification.

### Load Datasets

Reads the training and test datasets from specified file paths into pandas DataFrames.

In [None]:
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import accuracy_score, classification_report
from xgboost import XGBClassifier
from sklearn.svm import SVC

# Paths to the datasets
train_dataset = '/home/aghasemi/CompBio481/ML_classifiers/datasets/NC_vs_AD_train.csv'
test_dataset = '/home/aghasemi/CompBio481/ML_classifiers/datasets/NC_vs_AD_test.csv'

train_df = pd.read_csv(train_dataset)
test_df = pd.read_csv(test_dataset)

### Prepare Data

**Separate Features and Target Variable for Training Data:** 
Drops the identifier column `ID_1` and the label column `Diagnosis` from the training DataFrame to form the feature set `X_train`, and takes the `Diagnosis` column as the target variable `y_train`.

**Separate Features and Target Variable for Test Data:** 
Applies the same split to the test DataFrame to obtain `X_test` and `y_test`.

In [None]:
# Separate features and target variable for training data
X_train = train_df.drop(columns=['ID_1', 'Diagnosis'])
y_train = train_df['Diagnosis']

# Separate features and target variable for test data
X_test = test_df.drop(columns=['ID_1', 'Diagnosis'])
y_test = test_df['Diagnosis']
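
Before fitting, it can help to confirm that both splits expose the same feature columns in the same order. The following is a minimal sketch, assuming pandas is available. The helper name `split_features_target` and the toy frames are hypothetical and are not part of the notebook.

```python
import pandas as pd


def split_features_target(df, id_col="ID_1", target_col="Diagnosis"):
    """Return (X, y): the features without the ID and label columns, and the label column."""
    X = df.drop(columns=[id_col, target_col])
    y = df[target_col]
    return X, y


# Toy stand-ins for train_df / test_df (hypothetical values, for illustration only)
toy_train = pd.DataFrame(
    {"ID_1": [1, 2, 3], "f1": [0.1, 0.2, 0.3], "f2": [5, 6, 7], "Diagnosis": [0, 1, 1]}
)
toy_test = pd.DataFrame(
    {"ID_1": [4, 5], "f1": [0.4, 0.5], "f2": [8, 9], "Diagnosis": [1, 0]}
)

X_toy_train, y_toy_train = split_features_target(toy_train)
X_toy_test, y_toy_test = split_features_target(toy_test)

# True when both splits have the same feature names in the same order
same_columns = list(X_toy_train.columns) == list(X_toy_test.columns)
print(same_columns)
```

The same comparison could be run on `X_train.columns` and `X_test.columns` from the cell above.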

### Hyperparameter Tuning for Logistic Regression

**Define Parameter Grid:** 
Specifies the values to test for three logistic regression hyperparameters: `C` (the inverse of the regularization strength), `solver`, and `max_iter`.

**Grid Search:** 
Uses `GridSearchCV` to perform an exhaustive search over the specified parameter grid with 5-fold cross-validation.

**Fit and Retrieve Best Parameters:** 
Fits the model with every combination of parameters and retrieves the combination with the highest mean cross-validated accuracy.

In [None]:
# Logistic Regression
param_grid_lr = {
    'C': [0.1, 1, 10, 100],
    'solver': ['liblinear', 'saga'],
    'max_iter': [100, 200, 500]
}

lr = LogisticRegression()
grid_search_lr = GridSearchCV(estimator=lr, param_grid=param_grid_lr, cv=5, scoring='accuracy', n_jobs=-1)
grid_search_lr.fit(X_train, y_train)
best_params_lr = grid_search_lr.best_params_
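
To get a sense of the cost of the exhaustive search, the number of candidates and cross-validation fits can be worked out from the grid itself. This sketch uses only the standard library; the names `candidates`, `n_candidates`, and `n_cv_fits` are introduced here for illustration.

```python
from itertools import product

# Same grid as in the cell above
param_grid_lr = {
    "C": [0.1, 1, 10, 100],
    "solver": ["liblinear", "saga"],
    "max_iter": [100, 200, 500],
}
cv_folds = 5

# Every combination of the listed values is one candidate
candidates = [
    dict(zip(param_grid_lr, values)) for values in product(*param_grid_lr.values())
]
n_candidates = len(candidates)       # 4 * 2 * 3
n_cv_fits = n_candidates * cv_folds  # one fit per candidate per fold

print(n_candidates, n_cv_fits)  # 24 120
```

With the default `refit=True`, `GridSearchCV` performs one additional fit on the full training set using the best candidate.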

### Hyperparameter Tuning for XGBoost

**Define Parameter Grid:** 
Specifies the range of hyperparameters (`n_estimators`, `max_depth`, `learning_rate`) to test for the XGBoost classifier.

**Grid Search:** 
Uses `GridSearchCV` to find the best parameters with 5-fold cross-validation.

**Fit and Retrieve Best Parameters:** 
Fits the model with every combination of parameters and selects the combination with the highest mean cross-validated accuracy.

In [None]:
# XGBoost
param_grid_xgb = {
    'n_estimators': [100, 200, 300],
    'max_depth': [3, 4, 5],
    'learning_rate': [0.01, 0.1, 0.2]
}

# Note: use_label_encoder is deprecated in recent XGBoost releases and may emit a warning there
xgb = XGBClassifier(use_label_encoder=False, eval_metric='logloss')
grid_search_xgb = GridSearchCV(estimator=xgb, param_grid=param_grid_xgb, cv=5, scoring='accuracy', n_jobs=-1)
grid_search_xgb.fit(X_train, y_train)
best_params_xgb = grid_search_xgb.best_params_

### Hyperparameter Tuning for SVM

**Define Parameter Grid:** 
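Specifies the hyperparameters (`C`, `kernel`, `gamma`) to tune for the support vector classifier. `gamma` only affects the `rbf` kernel here, so each `linear` candidate is evaluated once per `gamma` value with identical results.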

**Grid Search:** 
Performs an exhaustive search with 5-fold cross-validation to find the best parameter values.

**Fit and Retrieve Best Parameters:** 
Fits the model with every combination of parameters and retrieves the combination with the highest mean cross-validated accuracy.


In [None]:
# SVM
param_grid_svm = {
    'C': [0.1, 1, 10, 100],
    'kernel': ['linear', 'rbf'],
    'gamma': ['scale', 'auto']
}

svm = SVC()
grid_search_svm = GridSearchCV(estimator=svm, param_grid=param_grid_svm, cv=5, scoring='accuracy', n_jobs=-1)
grid_search_svm.fit(X_train, y_train)
best_params_svm = grid_search_svm.best_params_
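
Support vector machines with an `rbf` kernel are sensitive to feature scale. The notebook does not show whether the features were standardized beforehand. If they were not, one option is to wrap a scaler and the classifier in a `Pipeline`, so that scaling is fitted inside each cross-validation fold. This is a sketch on synthetic data, not the notebook's method; `X_demo`, `y_demo`, `pipe`, and the reduced grid are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for X_train / y_train (not the notebook's data)
X_demo, y_demo = make_classification(n_samples=200, n_features=10, random_state=0)

pipe = Pipeline([
    ("scaler", StandardScaler()),  # standardizes using statistics of the training folds only
    ("svc", SVC()),
])

# Pipeline parameters are addressed as <step name>__<parameter>
param_grid_pipe = {
    "svc__C": [0.1, 1, 10],
    "svc__kernel": ["linear", "rbf"],
    "svc__gamma": ["scale", "auto"],
}

search_svm = GridSearchCV(pipe, param_grid=param_grid_pipe, cv=5, scoring="accuracy")
search_svm.fit(X_demo, y_demo)
print(search_svm.best_params_)
```

Keeping the scaler inside the pipeline avoids computing scaling statistics on the validation folds during the search.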

### Train and Evaluate Logistic Regression

In [12]:
# Train and evaluate the models with the best parameters
# Logistic Regression
lr_best = LogisticRegression(**best_params_lr)
lr_best.fit(X_train, y_train)
y_pred_lr = lr_best.predict(X_test)
print("Logistic Regression")
print("Best Parameters:", best_params_lr)
print("Accuracy:", accuracy_score(y_test, y_pred_lr))
print(classification_report(y_test, y_pred_lr))

Logistic Regression
Best Parameters: {'C': 0.1, 'max_iter': 100, 'solver': 'liblinear'}
Accuracy: 0.8435114503816794
              precision    recall  f1-score   support

           0       0.69      0.58      0.63        60
           1       0.88      0.92      0.90       202

    accuracy                           0.84       262
   macro avg       0.78      0.75      0.77       262
weighted avg       0.84      0.84      0.84       262
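
Because `GridSearchCV` defaults to `refit=True`, a fitted search object already holds a model retrained on the full training set with the best parameters, so building a new estimator from `best_params_` is optional. The sketch below shows this on synthetic data; `X_demo`, `y_demo`, `search`, and the small grid are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for X_train / y_train (not the notebook's data)
X_demo, y_demo = make_classification(n_samples=200, n_features=10, random_state=0)

search = GridSearchCV(
    LogisticRegression(solver="liblinear"),
    param_grid={"C": [0.1, 1, 10]},
    cv=5,
    scoring="accuracy",
)
search.fit(X_demo, y_demo)

# best_estimator_ is refitted on all of X_demo with best_params_ (refit=True by default)
best_model = search.best_estimator_
print(search.best_params_, round(search.best_score_, 3))
print(best_model.predict(X_demo[:5]))
```

In the notebook, `grid_search_lr.best_estimator_.predict(X_test)` would play the same role as `lr_best.predict(X_test)`.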



### Train and Evaluate XGBoost

In [13]:
# XGBoost
xgb_best = XGBClassifier(**best_params_xgb, use_label_encoder=False, eval_metric='logloss')
xgb_best.fit(X_train, y_train)
y_pred_xgb = xgb_best.predict(X_test)
print("\nXGBoost")
print("Best Parameters:", best_params_xgb)
print("Accuracy:", accuracy_score(y_test, y_pred_xgb))
print(classification_report(y_test, y_pred_xgb))


XGBoost
Best Parameters: {'learning_rate': 0.1, 'max_depth': 3, 'n_estimators': 300}
Accuracy: 0.8778625954198473
              precision    recall  f1-score   support

           0       0.80      0.62      0.70        60
           1       0.89      0.96      0.92       202

    accuracy                           0.88       262
   macro avg       0.85      0.79      0.81       262
weighted avg       0.87      0.88      0.87       262



### Train and Evaluate SVM

In [14]:
# SVM
svm_best = SVC(**best_params_svm)
svm_best.fit(X_train, y_train)
y_pred_svm = svm_best.predict(X_test)
print("\nSVM")
print("Best Parameters:", best_params_svm)
print("Accuracy:", accuracy_score(y_test, y_pred_svm))
print(classification_report(y_test, y_pred_svm))


SVM
Best Parameters: {'C': 10, 'gamma': 'auto', 'kernel': 'rbf'}
Accuracy: 0.8473282442748091
              precision    recall  f1-score   support

           0       0.69      0.62      0.65        60
           1       0.89      0.92      0.90       202

    accuracy                           0.85       262
   macro avg       0.79      0.77      0.78       262
weighted avg       0.84      0.85      0.84       262
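
The test set is imbalanced (60 samples of class 0 versus 202 of class 1 in the reports above), so the accuracies are easier to interpret next to a majority-class baseline. The arithmetic below uses only the support counts and accuracies printed above; the variable names are introduced here for illustration.

```python
# Support counts taken from the classification reports above
support = {0: 60, 1: 202}
total = sum(support.values())

# Accuracy obtained by always predicting the most frequent class
baseline_accuracy = max(support.values()) / total

# Accuracies as printed in the three evaluation cells
reported = {
    "Logistic Regression": 0.8435114503816794,
    "XGBoost": 0.8778625954198473,
    "SVM": 0.8473282442748091,
}

print(f"Majority-class baseline: {baseline_accuracy:.3f}")
for name, acc in reported.items():
    print(f"{name}: {acc:.3f} (+{acc - baseline_accuracy:.3f} over baseline)")
```

The baseline is 202/262, roughly 0.771, so all three models improve on it, while recall for class 0 stays between 0.58 and 0.62 in the reports.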

