**Zero-R (Zero-Rule) Algorithm**

*   The simplest classification model, used to establish a baseline accuracy.
*   Ignores all predictors and always predicts the most frequent class label in the training data.
*   The accuracy of this constant prediction (the share of rows that belong to the majority class) is the **baseline accuracy**: a naive but useful reference point for judging other models.
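A minimal, self-contained sketch of the rule described above, in plain Python. The class name `ZeroR` and its `fit`/`predict`/`score` methods are hypothetical (scikit-learn-style naming, not a library API), and the toy labels are made up. The baseline accuracy it reports is simply (count of the majority class) / (number of rows).

```python
from collections import Counter
from typing import Hashable, Sequence


class ZeroR:
    """Zero-Rule classifier (hypothetical helper): ignores all features and
    always predicts the most frequent class label seen during fitting."""

    def fit(self, X: Sequence, y: Sequence[Hashable]) -> "ZeroR":
        # X is accepted only for API symmetry; Zero-R never looks at it.
        if len(y) == 0:
            raise ValueError("cannot fit ZeroR on an empty label sequence")
        counts = Counter(y)
        top = max(counts.values())
        # Break ties by picking the smallest label (labels must be comparable),
        # mirroring pandas' Series.mode()[0], which returns sorted modes.
        self.majority_ = min(label for label, c in counts.items() if c == top)
        return self

    def predict(self, X: Sequence) -> list:
        # One identical prediction per input row.
        return [self.majority_ for _ in X]

    def score(self, X: Sequence, y: Sequence[Hashable]) -> float:
        # Accuracy of the constant prediction, i.e. the baseline accuracy.
        preds = self.predict(X)
        return sum(p == t for p, t in zip(preds, y)) / len(y)


# Toy example (hypothetical data): 7 of the 10 labels are 0.
X_train = [[0.0]] * 10
y_train = [0, 0, 0, 1, 0, 1, 0, 0, 1, 0]
model = ZeroR().fit(X_train, y_train)
print(model.majority_)                # 0
print(model.score(X_train, y_train))  # 0.7
```

For real projects, scikit-learn's `DummyClassifier(strategy="most_frequent")` implements the same idea.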

**Baseline Accuracies (Overall)**

import pandas as pd

# List of training-set file paths (one binary "condition vs NC" task per file)
file_paths = [
    '/content/ad_vs_nc_train.csv',
    '/content/dlb_vs_nc_train.csv',
    '/content/mci_vs_nc_train.csv',
    '/content/nph_vs_nc_train.csv',
    '/content/vad_vs_nc_train.csv'
]

# Function to calculate baseline accuracy
def calculate_baseline_accuracy(file_path):
    df = pd.read_csv(file_path)

    # Determine the most frequent class in the target column (Diagnosis)
    most_frequent_class = df['Diagnosis'].mode()[0]
    print(f"Most frequent class in {file_path}: {most_frequent_class}")

    # Calculate the baseline accuracy
    baseline_accuracy = (df['Diagnosis'] == most_frequent_class).mean()
    print(f"Baseline Accuracy for {file_path}: {baseline_accuracy:.4f}\n")

# Iterate over the file paths and calculate baseline accuracy for each
for file_path in file_paths:
    calculate_baseline_accuracy(file_path)

Most frequent class in /content/ad_vs_nc_train.csv: 1
Baseline Accuracy for /content/ad_vs_nc_train.csv: 0.7822

Most frequent class in /content/dlb_vs_nc_train.csv: 0
Baseline Accuracy for /content/dlb_vs_nc_train.csv: 0.6438

Most frequent class in /content/mci_vs_nc_train.csv: 0
Baseline Accuracy for /content/mci_vs_nc_train.csv: 0.9023

Most frequent class in /content/nph_vs_nc_train.csv: 0
Baseline Accuracy for /content/nph_vs_nc_train.csv: 0.7946

Most frequent class in /content/vad_vs_nc_train.csv: 0
Baseline Accuracy for /content/vad_vs_nc_train.csv: 0.7855
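Each printed value is the majority class's share of the `Diagnosis` column in that file. A sketch of the same computation that uses `value_counts(normalize=True)` and collects the results into a table instead of printing them; the helper name `baseline_from_labels` is hypothetical, and the toy DataFrames stand in for the CSV files.

```python
import pandas as pd


def baseline_from_labels(labels: pd.Series) -> tuple:
    """Hypothetical helper: return (majority class, baseline accuracy)."""
    # value_counts(normalize=True) gives each class's share of the rows,
    # so the majority class's share is exactly the Zero-R accuracy.
    shares = labels.value_counts(normalize=True)
    # mode() returns tied modes in sorted order, so [0] picks the smallest
    # label, matching the notebook's df['Diagnosis'].mode()[0].
    majority = labels.mode()[0]
    return majority, float(shares.loc[majority])


# Hypothetical stand-ins for the CSV files (same 'Diagnosis' column name).
datasets = {
    "toy_a": pd.DataFrame({"Diagnosis": [1, 1, 1, 0]}),
    "toy_b": pd.DataFrame({"Diagnosis": [0, 0, 1, 0, 1]}),
}

rows = []
for name, df in datasets.items():
    majority, acc = baseline_from_labels(df["Diagnosis"])
    rows.append({"dataset": name, "majority_class": majority,
                 "baseline_accuracy": acc})

summary = pd.DataFrame(rows)
print(summary)
```

To run it on the real data, `datasets` could be built as `{p: pd.read_csv(p) for p in file_paths}`. A table makes the bar for each task easy to compare: on `mci_vs_nc_train.csv`, for instance, always predicting class 0 already scores 0.9023 on the training data.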



**Baseline Accuracies (Sex-Specific)**

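Conceptually, the sex-specific baseline is Zero-R applied separately within each `Sex` group, so both the majority class and its share are recomputed per group. A compact sketch of that idea with `groupby`, assuming the same coding the per-file code in this section uses (`Sex` 1 = male, 0 = female); the helper name `baseline_by_group` and the toy data are hypothetical.

```python
import pandas as pd

# Assumed coding, mirroring the notebook: 1 = male, 0 = female.
SEX_LABELS = {1: "Male", 0: "Female"}


def baseline_by_group(df: pd.DataFrame, group_col: str = "Sex",
                      target_col: str = "Diagnosis") -> pd.DataFrame:
    """Hypothetical helper: Zero-R baseline computed inside each group."""
    rows = {}
    for key, group in df.groupby(group_col):
        labels = group[target_col]
        majority = labels.mode()[0]  # smallest label wins ties
        rows[key] = {"majority_class": majority,
                     "baseline_accuracy": (labels == majority).mean()}
    return pd.DataFrame.from_dict(rows, orient="index")


# Toy data: males have Diagnosis [1, 0, 1], females [0, 0, 1, 0].
toy = pd.DataFrame({"Sex": [1, 1, 1, 0, 0, 0, 0],
                    "Diagnosis": [1, 0, 1, 0, 0, 1, 0]})
result = baseline_by_group(toy)
result.index = result.index.map(SEX_LABELS)
print(result)
```

Because class balance can differ between groups, the per-group baselines need not match the overall one.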
import pandas as pd

# List of file paths
file_paths = [
    '/content/ad_vs_nc_train.csv',
    '/content/dlb_vs_nc_train.csv',
    '/content/mci_vs_nc_train.csv',
    '/content/nph_vs_nc_train.csv',
    '/content/vad_vs_nc_train.csv'
]

# Function to calculate baseline accuracy
def calculate_baseline_accuracy(df, group_name, file_path):
    # Determine the most frequent class in the target column (Diagnosis)
    most_frequent_class = df['Diagnosis'].mode()[0]
    print(f"Most frequent class in {group_name} of {file_path}: {most_frequent_class}")

    # Calculate the baseline accuracy
    baseline_accuracy = (df['Diagnosis'] == most_frequent_class).mean()
    print(f"Baseline Accuracy for {group_name} of {file_path}: {baseline_accuracy:.4f}\n")

# Function to process each file
def process_file(file_path):
    df = pd.read_csv(file_path)

    # Split by sex (this code treats Sex == 1 as male and Sex == 0 as female)
    male_df = df[df['Sex'] == 1]
    female_df = df[df['Sex'] == 0]

    # Calculate baseline accuracy for the male dataset
    if not male_df.empty:
        calculate_baseline_accuracy(male_df, 'Male', file_path)
    else:
        print(f"No male data in {file_path}\n")

    # Calculate baseline accuracy for the female dataset
    if not female_df.empty:
        calculate_baseline_accuracy(female_df, 'Female', file_path)
    else:
        print(f"No female data in {file_path}\n")

# Iterate over the file paths and process each file
for file_path in file_paths:
    process_file(file_path)

Most frequent class in Male of /content/ad_vs_nc_train.csv: 1
Baseline Accuracy for Male of /content/ad_vs_nc_train.csv: 0.6694

Most frequent class in Female of /content/ad_vs_nc_train.csv: 1
Baseline Accuracy for Female of /content/ad_vs_nc_train.csv: 0.8444

Most frequent class in Male of /content/dlb_vs_nc_train.csv: 0
Baseline Accuracy for Male of /content/dlb_vs_nc_train.csv: 0.7039

Most frequent class in Female of /content/dlb_vs_nc_train.csv: 0
Baseline Accuracy for Female of /content/dlb_vs_nc_train.csv: 0.5860

Most frequent class in Male of /content/mci_vs_nc_train.csv: 0
Baseline Accuracy for Male of /content/mci_vs_nc_train.csv: 0.9531

Most frequent class in Female of /content/mci_vs_nc_train.csv: 0
Baseline Accuracy for Female of /content/mci_vs_nc_train.csv: 0.8516

Most frequent class in Male of /content/nph_vs_nc_train.csv: 0
Baseline Accuracy for Male of /content/nph_vs_nc_train.csv: 0.7898

Most frequent class in Female of /content/nph_vs_nc_train.csv: 0
Baseline A