<a href="https://colab.research.google.com/github/v1t3ls0n/ml_intro_course_mmn11/blob/main/notebooks/mmn11_notebook_guy_vitelson.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Maman 11 By Guy Vitelson


## **If You Run This in Google Colab, Don't Worry!**
All missing Python files, directories, and modules are fetched automatically from my GitHub repository.

**My GitHub Profile** : https://github.com/v1t3ls0n

**The Repository:** https://github.com/v1t3ls0n/ml_intro_course_mmn11

# Overview



## MNIST Digit Classification Using Perceptron Learning Algorithm (PLA)

**Objective:**  
This notebook compares the performance of two variants of the Perceptron Learning Algorithm (PLA) on the MNIST digit classification task:
- **Clean PLA:** Standard perceptron without enhancements.
- **Pocket PLA:** Enhanced perceptron that stores the best-performing weights during training (using the Pocket algorithm).
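The internals of `MultiClassPerceptron` are not shown in this notebook. The sketch below illustrates the difference between the two variants for a single binary (one-vs-all) perceptron. It assumes labels in {-1, +1}, a bias column already in `X`, that `max_iter` counts full passes over the data, and a pocket check after every update (one common variant of the Pocket algorithm). `train_binary_pla` and `training_error` are hypothetical names.

```python
import numpy as np


def training_error(X, y, w):
    """Fraction of samples misclassified by sign(X @ w); a score of exactly 0 counts as an error."""
    return float(np.mean(np.sign(X @ w) != y))


def train_binary_pla(X, y, max_iter=100, use_pocket=False):
    """Binary perceptron for labels y in {-1, +1}; X already contains a bias column.

    Assumption for this sketch: max_iter counts full passes over the data.
    With use_pocket=True, the weights are scored on the whole training set after
    every update, and the best weights seen so far are returned.
    """
    w = np.zeros(X.shape[1])
    best_w, best_err = w.copy(), training_error(X, y, w)
    for _ in range(max_iter):
        updated = False
        for xi, yi in zip(X, y):
            if yi * (xi @ w) <= 0:        # misclassified or on the boundary
                w = w + yi * xi           # classic PLA update
                updated = True
                if use_pocket:
                    err = training_error(X, y, w)
                    if err < best_err:    # better than the pocket: swap it in
                        best_w, best_err = w.copy(), err
        if not updated:                   # a full pass with no mistakes: converged
            break
    return best_w if use_pocket else w


# Usage on XOR, which no line separates: Clean PLA keeps cycling, Pocket keeps its best weights.
X_xor = np.array([[1.0, 0, 0], [1.0, 1, 1], [1.0, 0, 1], [1.0, 1, 0]])
y_xor = np.array([-1, -1, 1, 1])
for pocket in (False, True):
    w = train_binary_pla(X_xor, y_xor, max_iter=20, use_pocket=pocket)
    print(f"use_pocket={pocket}: training error = {training_error(X_xor, y_xor, w):.2f}")
```

Because both variants follow the same update sequence, the pocket weights can never have a higher training error than the final Clean PLA weights; on separable data both reach zero error.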

**Dataset:**  
- MNIST dataset consisting of 60,000 training samples and 10,000 test samples.
- The images are normalized to the range [0, 1] and a bias term is added, resulting in input samples with 785 features.
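As a minimal sketch of this preprocessing step, assuming 8-bit pixel values (0 to 255) and a bias column prepended as the first feature, the 784 pixels plus one constant feature give the 785 inputs. `preprocess_sketch` is a hypothetical stand-in; the project's `preprocess_data` may place the bias column elsewhere.

```python
import numpy as np


def preprocess_sketch(X_raw, add_bias=True, normalize=True):
    """Scale 8-bit pixels to [0, 1] and prepend a constant-1 bias feature.

    Hypothetical stand-in for core.data.data_preprocessing.preprocess_data.
    """
    X = np.asarray(X_raw, dtype=np.float64)
    if normalize:
        X = X / 255.0                                    # 0..255 -> [0, 1]
    if add_bias:
        X = np.hstack([np.ones((X.shape[0], 1)), X])     # 784 pixels + 1 bias = 785
    return X


X_demo = np.random.default_rng(0).integers(0, 255, size=(5, 784), endpoint=True, dtype=np.uint8)
print(preprocess_sketch(X_demo).shape)   # (5, 785)
```

The constant feature lets each weight vector absorb the bias term, so the perceptron update needs no separate bias rule.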

**Evaluation Metrics:**  
- **Confusion Matrices:** Provides a detailed view of how well each digit is classified.
- **Overall Accuracy (ACC):** The fraction of test samples assigned the correct digit. For a single digit in the one-vs-all view, $\text{ACC} = \frac{TP + TN}{TP + TN + FP + FN}$.
- **Sensitivity (True Positive Rate, TPR):** For each digit, $\text{TPR} = \frac{TP}{TP + FN}$, the model’s ability to correctly identify that digit.
- **Selectivity (Specificity, TNR):** For each digit, $\text{TNR} = \frac{TN}{TN + FP}$, the model’s ability to correctly reject samples of the other digits.
- **Training and Testing Error Curves:** Visualized as a function of iteration for detailed analysis of learning dynamics.
- **Runtime:** The time taken to train the models.
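All of the per-digit metrics above can be read off a single confusion matrix. The sketch below assumes the convention `cm[i, j]` = number of samples with true digit `i` predicted as `j`; `per_class_metrics` is a hypothetical helper, not the project's `evaluate_model`.

```python
import numpy as np


def per_class_metrics(cm):
    """cm[i, j] = number of samples with true class i predicted as class j."""
    cm = np.asarray(cm, dtype=np.float64)
    total = cm.sum()
    tp = np.diag(cm)
    fn = cm.sum(axis=1) - tp          # true class k, predicted something else
    fp = cm.sum(axis=0) - tp          # predicted k, true class something else
    tn = total - tp - fn - fp         # everything not involving class k
    accuracy = tp.sum() / total       # overall multi-class accuracy
    tpr = tp / (tp + fn)              # sensitivity per class
    tnr = tn / (tn + fp)              # selectivity (specificity) per class
    return accuracy, tpr, tnr


cm_demo = [[5, 1, 0],
           [2, 6, 2],
           [0, 0, 4]]
acc, tpr, tnr = per_class_metrics(cm_demo)
print(acc, tpr.round(3), tnr.round(3))
```

Averaging the per-class `tpr` and `tnr` arrays (as the evaluation cell does with `np.mean`) gives macro-averaged sensitivity and selectivity. A class with no test samples would cause a division by zero here.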

**Goals:**  
- Evaluate and compare the model accuracy and robustness between Clean PLA and Pocket PLA.
- Analyze and visualize the performance through confusion matrices, error curves, and summary plots (accuracy, sensitivity, selectivity, and runtime vs. the number of iterations).
- Provide a comprehensive discussion on how training iterations affect the decision boundaries and the overall performance, particularly in the one-vs-all classification setup.

This notebook integrates detailed quantitative evaluation with comprehensive visualizations to thoroughly analyze the multi-class Perceptron performance on the MNIST dataset.
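In the one-vs-all setup, each digit gets its own binary perceptron trained on "+1 for this digit, -1 for all others", and the multi-class prediction is the digit whose classifier scores highest. The sketch assumes the ten weight vectors are stacked as rows of a matrix `W`; whether `MultiClassPerceptron` stores them this way is an assumption, and `ova_targets` / `ova_predict` are hypothetical names.

```python
import numpy as np


def ova_targets(y, num_classes):
    """Row k holds the binary labels for 'digit k vs. all': +1 where y == k, -1 elsewhere."""
    return np.where(y[None, :] == np.arange(num_classes)[:, None], 1, -1)


def ova_predict(W, X):
    """W: (num_classes, num_features), one weight vector per digit.

    The predicted digit is the classifier with the highest raw score w_k . x
    (ties go to the lowest index, as np.argmax does).
    """
    return np.argmax(X @ W.T, axis=1)


print(ova_targets(np.array([0, 2, 1]), 3))
```

Taking the argmax of raw scores resolves the cases where several binary classifiers claim a sample (or none do), which is where most one-vs-all confusions between similar digits arise.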

# Imports

## External Code Imports (pip packages)

In [None]:
import os
import shutil
import sys
import logging
import numpy as np # type: ignore
import matplotlib.pyplot as plt # type: ignore
import seaborn as sns # type: ignore


## Fetch Missing Files For Google Colab Env

In [None]:

# %%capture run_output
# %matplotlib inline

IN_COLAB = "google.colab" in sys.modules  # True only inside a Google Colab runtime

if IN_COLAB:
    from tqdm.notebook import tqdm  # type: ignore

    repo_url = "https://github.com/v1t3ls0n/ml_intro_course_mmn11"
    repo_name = "ml_intro_course_mmn11"

    # Clone the repository if it is not already present
    if not os.path.exists(repo_name):
        os.system(f"git clone {repo_url}")

    # Copy the requested directories from the clone into the working directory
    def extract_directories(source_dir, destination_dir, dir_names):
        for dir_name in dir_names:
            source_path = os.path.join(source_dir, dir_name)
            destination_path = os.path.join(destination_dir, dir_name)
            if os.path.exists(source_path):
                shutil.copytree(source_path, destination_path, dirs_exist_ok=True)

    repo_path = os.path.join(os.getcwd(), repo_name)
    extract_directories(repo_path, ".", ["core"])

    # 'core' now lives in the working directory, so make sure it is importable
    if os.getcwd() not in sys.path:
        sys.path.insert(0, os.getcwd())

    # Remove the clone and Colab's default sample data to keep the workspace clean
    if os.path.exists(repo_name):
        shutil.rmtree(repo_name)
    if os.path.exists("sample_data"):
        shutil.rmtree("sample_data")
else:
    from tqdm import tqdm  # type: ignore

    # Running locally from notebooks/: the project root (which contains 'core') is one level up
    project_root = os.path.abspath(os.path.join(os.getcwd(), ".."))
    sys.path.insert(0, project_root)


## Internal Code Imports (original code)

In [None]:

# ========== Internal Code Imports ==========

# Logger (importing the config module sets up the shared logger)
from core.logger.config import logger

# Data Preprocessing
from core.data.mnist_loader import load_mnist
from core.data.data_preprocessing import preprocess_data

# Models
from core.models.perceptron.multi_class_perceptron import MultiClassPerceptron
from core.models.logistic_regression.softmax_lregression import SoftmaxRegression
from core.models.linear_regression.linear_regression import LinearRegression

# Performance & Plotting
from core.analysis.evaluation_functions import (
    evaluate_model,
    aggregate_iteration_losses,
    aggregate_iteration_losses_softmax
)
from core.analysis.plotting import (
    plot_error_curves,
    plot_accuracy_vs_max_iter,
    plot_runtime_vs_max_iter,
    plot_performance_summary_extended,
    plot_confusion_matrix_annotated,
    plot_train_curves_three_models,
    plot_metric_vs_learning_rate,
    plot_accuracy_vs_max_iter_4models,
    plot_runtime_vs_max_iter_4models
)

logger = logging.getLogger("MyGlobalLogger") # configured in core/logger/config.py


# Choose Run Parameters **(Significant Effect On Model's Runtime!)**

In [None]:
#######################################################################
# SEPARATE RUN PARAMETERS FOR PERCEPTRONS vs. REGRESSIONS
#######################################################################
perceptron_max_iter_values = [50, 100]      # for Clean & Pocket
regression_max_iter_values = [5000, 50000]    # for Softmax & Linear

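lr_values_softmax = [0.0001, 0.0005]  # learning rates to sweep (the last one is the default LR)
lr_values_linear  = [0.0001, 0.0005]  # learning rates to sweep (the last one is the default LR)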

logger.info(f"perceptron_max_iter_values={perceptron_max_iter_values}")
logger.info(f"regression_max_iter_values={regression_max_iter_values}")
logger.info(f"lr_values_softmax={lr_values_softmax}, lr_values_linear={lr_values_linear}")


# Load and Preprocess the MNIST Dataset


In [None]:
'''
We'll load the MNIST dataset using our custom loader (`mnist_loader`) and then apply preprocessing (`data_preprocessing`).
The preprocessing step normalizes each image to the range [0, 1] and adds a bias term, resulting in input samples with 785 features.
This setup ensures that the training set contains 60,000 samples and the test set 10,000 samples, preparing the data for the subsequent classification tasks.
'''

# Load raw MNIST data (X: images, y: labels)
X_raw, y_raw = load_mnist()


logger.info("Raw MNIST data shapes: X_raw: %s, y_raw: %s", X_raw.shape, y_raw.shape)

# Preprocess (normalize & add bias = True)
X = preprocess_data(X_raw, add_bias=True, normalize=True)
logger.info("Preprocessed shape: %s", X.shape)

# Standard MNIST split: the first 60,000 samples form the training set, the last 10,000 the test set
X_train, y_train = X[:60000], y_raw[:60000]
X_test,  y_test  = X[60000:], y_raw[60000:]

logger.info("Train set: X_train: %s, y_train: %s", X_train.shape, y_train.shape)
logger.info("Test set: X_test: %s, y_test: %s", X_test.shape, y_test.shape)



# Train


In [None]:

# Dictionaries to store trained models
trained_models_clean   = {}
trained_models_pocket  = {}
trained_models_softmax = {}
trained_models_linear  = {}

#######################################################################
# TRAIN PERCEPTRON MODELS
#######################################################################

for max_iter in tqdm(perceptron_max_iter_values, desc="Train Clean & Pocket"):
    logger.info(f"--- Clean PLA, max_iter={max_iter} ---")
    clean_perc = MultiClassPerceptron(num_classes=10, max_iter=max_iter, use_pocket=False)
    clean_perc.fit(X_train, y_train)
    trained_models_clean[max_iter] = clean_perc

    logger.info(f"--- Pocket PLA, max_iter={max_iter} ---")
    pocket_perc = MultiClassPerceptron(num_classes=10, max_iter=max_iter, use_pocket=True)
    pocket_perc.fit(X_train, y_train)
    trained_models_pocket[max_iter] = pocket_perc

#######################################################################
# TRAIN REGRESSION MODELS (Softmax & Linear)
#######################################################################
for max_iter in tqdm(regression_max_iter_values, desc="Train Regressions"):
    # Softmax: loop over learning rates
    for lr_s in tqdm(lr_values_softmax, desc=f"Softmax LR loop (max_iter={max_iter})", leave=False):
        logger.info(f"--- Softmax: max_iter={max_iter}, learning_rate={lr_s} ---")
        s_model = SoftmaxRegression(num_classes=10, max_iter=max_iter, learning_rate=lr_s)
        s_model.fit(X_train, y_train)
        trained_models_softmax[(max_iter, lr_s)] = s_model

    # Linear: loop over learning rates
    for lr_lin in tqdm(lr_values_linear, desc=f"Linear LR loop (max_iter={max_iter})", leave=False):
        logger.info(f"--- Linear Regression: max_iter={max_iter}, learning_rate={lr_lin} ---")
        lin_model = LinearRegression(num_classes=10, max_iter=max_iter, learning_rate=lr_lin)
        lin_model.fit(X_train, y_train)
        trained_models_linear[(max_iter, lr_lin)] = lin_model

logger.info("Training complete for Clean, Pocket, Softmax, and Linear.")
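`SoftmaxRegression` is trained with a `learning_rate` and `max_iter`, which points to iterative gradient-based optimization, but its internals are not shown here. The following is a minimal sketch assuming full-batch gradient descent on the mean cross-entropy loss, with a bias column in `X` and weights of shape `(num_features, num_classes)`. The real class may use mini-batches, regularization, or a different layout; `softmax_gd_step` is a hypothetical name.

```python
import numpy as np


def softmax(Z):
    """Row-wise softmax; subtracting the row max avoids overflow in exp."""
    Z = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)


def softmax_gd_step(W, X, y, learning_rate):
    """One full-batch gradient-descent step on the mean cross-entropy loss.

    W: (num_features, num_classes); X: (n, num_features) with a bias column;
    y: integer labels. Returns the updated W and the loss *before* the step.
    """
    n, k = X.shape[0], W.shape[1]
    P = softmax(X @ W)                       # predicted class probabilities
    Y = np.eye(k)[y]                         # one-hot targets
    loss = -np.mean(np.log(P[np.arange(n), y] + 1e-12))
    grad = X.T @ (P - Y) / n                 # gradient of the mean cross-entropy
    return W - learning_rate * grad, loss


# Usage: a tiny 1-D problem with a bias column
X_toy = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y_toy = np.array([0, 0, 1, 1])
W_toy = np.zeros((2, 2))
for step in range(200):
    W_toy, loss = softmax_gd_step(W_toy, X_toy, y_toy, learning_rate=0.5)
print(f"loss after 200 steps: {loss:.4f}")
```

The learning rate trades speed for stability: too large a step makes the loss oscillate, while too small a step needs many more iterations, which is why the training cell sweeps both `learning_rate` and `max_iter`.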


# Evaluate

In [None]:
##################################################
# EVALUATE CELL
##################################################

#############################
# 1) Evaluate Clean & Pocket
#############################
accuracies_clean, accuracies_pocket = [], []
runtimes_clean,   runtimes_pocket   = [], []
sensitivities_clean, sensitivities_pocket = [], []
selectivities_clean, selectivities_pocket = [], []
conf_clean, conf_pocket = [], []
meta_clean, meta_pocket = [], []

for max_iter in tqdm(perceptron_max_iter_values, desc="Evaluate Clean & Pocket"):
    logger.info(f"Evaluating Clean & Pocket: max_iter={max_iter}")

    # Clean
    c_model = trained_models_clean[max_iter]
    cm_c, acc_c, s_c, sp_c, rt_c, ex_c = evaluate_model(
        c_model, X_test, y_test, classes=range(10), model_name="Clean PLA"
    )
    accuracies_clean.append(acc_c)
    runtimes_clean.append(rt_c)
    sensitivities_clean.append(np.mean(s_c))
    selectivities_clean.append(np.mean(sp_c))
    conf_clean.append(cm_c)
    cdict = {"max_iter": max_iter, "accuracy": acc_c, "method": "Clean PLA"}
    cdict.update(ex_c)
    meta_clean.append(cdict)

    # Pocket
    p_model = trained_models_pocket[max_iter]
    cm_p, acc_p, s_p, sp_p, rt_p, ex_p = evaluate_model(
        p_model, X_test, y_test, classes=range(10), model_name="Pocket PLA"
    )
    accuracies_pocket.append(acc_p)
    runtimes_pocket.append(rt_p)
    sensitivities_pocket.append(np.mean(s_p))
    selectivities_pocket.append(np.mean(sp_p))
    conf_pocket.append(cm_p)
    pdict = {"max_iter": max_iter, "accuracy": acc_p, "method": "Pocket PLA"}
    pdict.update(ex_p)
    meta_pocket.append(pdict)

# Aggregate the per-iteration training losses of the Clean and Pocket models (one curve per method)
clean_train_curve  = aggregate_iteration_losses([trained_models_clean[m] for m in perceptron_max_iter_values])
pocket_train_curve = aggregate_iteration_losses([trained_models_pocket[m] for m in perceptron_max_iter_values])


#############################
# 2) Evaluate Softmax
#############################
softmax_results_dict = {}
default_s_lr = lr_values_softmax[-1]

accuracies_softmax = []
runtimes_softmax   = []
sensitivities_soft = []
selectivities_soft = []
conf_soft          = []
meta_soft          = []

# (A) Evaluate Softmax with default lr
for max_iter in tqdm(regression_max_iter_values, desc="Evaluate Softmax (default LR)"):
    s_model = trained_models_softmax[(max_iter, default_s_lr)]
    cm_s, a_s, se_s, sp_s, r_s, ex_s = evaluate_model(
        s_model, X_test, y_test, classes=range(10), model_name="Softmax (def-lr)"
    )
    accuracies_softmax.append(a_s)
    runtimes_softmax.append(r_s)
    sensitivities_soft.append(np.mean(se_s))
    selectivities_soft.append(np.mean(sp_s))
    conf_soft.append(cm_s)
    ms = {"max_iter": max_iter, "accuracy": a_s, "method": "Softmax Regression"}
    ms.update(ex_s)
    meta_soft.append(ms)

softmax_list_def_lr = [trained_models_softmax[(m, default_s_lr)] for m in regression_max_iter_values]
softmax_train_curve = aggregate_iteration_losses_softmax(softmax_list_def_lr)

# (B) Evaluate all Softmax LRs
for lr_s in tqdm(lr_values_softmax, desc="Evaluate all Softmax LRs"):
    softmax_results_dict[lr_s] = {
        "max_iter": [],
        "acc": [],
        "rt": [],
        "sens": [],
        "spec": []
    }
    for max_iter in tqdm(regression_max_iter_values, desc=f"Softmax LR={lr_s}", leave=False):
        s_mod = trained_models_softmax[(max_iter, lr_s)]
        _, acc_s, s_s, sp_s, rt_s, _ = evaluate_model(
            s_mod, X_test, y_test, show_plots=False, classes=range(10)
        )
        softmax_results_dict[lr_s]["max_iter"].append(max_iter)
        softmax_results_dict[lr_s]["acc"].append(acc_s)
        softmax_results_dict[lr_s]["rt"].append(rt_s)
        softmax_results_dict[lr_s]["sens"].append(np.mean(s_s))
        softmax_results_dict[lr_s]["spec"].append(np.mean(sp_s))


#############################
# 3) Evaluate Linear
#############################
linear_results_dict = {}
default_lin_lr = lr_values_linear[-1]

accuracies_linear      = []
runtimes_linear        = []
sensitivities_linear   = []
selectivities_linear   = []
conf_linear            = []
meta_linear            = []

# (A) Evaluate linear at "default" LR
for max_iter in tqdm(regression_max_iter_values, desc="Evaluate Linear (default LR)"):
    lin_mod = trained_models_linear[(max_iter, default_lin_lr)]
    cm_l, a_l, se_l, sp_l, r_l, ex_l = evaluate_model(
        lin_mod, X_test, y_test, classes=range(10), model_name="Linear (def-lr)"
    )
    accuracies_linear.append(a_l)
    runtimes_linear.append(r_l)
    sensitivities_linear.append(np.mean(se_l))
    selectivities_linear.append(np.mean(sp_l))
    conf_linear.append(cm_l)
    ml = {"max_iter": max_iter, "accuracy": a_l, "method": "Linear Regression"}
    ml.update(ex_l)
    meta_linear.append(ml)

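# Optional: aggregate Linear Regression training curves (requires an aggregate_iteration_losses_linear helper, which is not imported above)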
# linear_list_def_lr = [trained_models_linear[(m, default_lin_lr)] for m in regression_max_iter_values]
# linear_train_curve = aggregate_iteration_losses_linear(linear_list_def_lr)

# (B) Evaluate all LR variants for linear
for lr_lin in tqdm(lr_values_linear, desc="Evaluate all Linear LRs"):
    linear_results_dict[lr_lin] = {
        "max_iter": [],
        "acc": [],
        "rt": [],
        "sens": [],
        "spec": []
    }
    for max_iter in tqdm(regression_max_iter_values, desc=f"Linear LR={lr_lin}", leave=False):
        lin_m = trained_models_linear[(max_iter, lr_lin)]
        _, a_l2, se_l2, sp_l2, r_l2, _ = evaluate_model(
            lin_m, X_test, y_test, classes=range(10), show_plots=False
        )
        linear_results_dict[lr_lin]["max_iter"].append(max_iter)
        linear_results_dict[lr_lin]["acc"].append(a_l2)
        linear_results_dict[lr_lin]["rt"].append(r_l2)
        linear_results_dict[lr_lin]["sens"].append(np.mean(se_l2))
        linear_results_dict[lr_lin]["spec"].append(np.mean(sp_l2))

logger.info("Evaluation complete for both Perceptrons & Regressions.")
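Scoring a Linear Regression model as a 10-digit classifier requires mapping its real-valued outputs to a label. One common approach, assumed here, regresses one-hot targets and predicts the digit with the largest output. This sketch solves the least-squares problem in closed form with `np.linalg.lstsq` rather than the gradient descent (`learning_rate`, `max_iter`) used by the project's `LinearRegression`. Both minimize the same squared error, so it serves only as a reference point. The function names are hypothetical.

```python
import numpy as np


def fit_linear_classifier(X, y, num_classes):
    """Closed-form least squares: W minimizes ||X W - Y||^2 for one-hot targets Y."""
    Y = np.eye(num_classes)[y]
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W


def predict_linear_classifier(W, X):
    """Predict the class whose regression output is largest."""
    return np.argmax(X @ W, axis=1)


# Usage: two well-separated 1-D clusters with a bias column
X_lin = np.array([[1.0, 0.0], [1.0, 0.1], [1.0, 5.0], [1.0, 5.1]])
y_lin = np.array([0, 0, 1, 1])
W_lin = fit_linear_classifier(X_lin, y_lin, num_classes=2)
print(predict_linear_classifier(W_lin, X_lin))
```

Because squared error also penalizes outputs that are "too correct" (far above 1), least-squares classifiers are typically less accurate than Softmax Regression on MNIST. That gap is what the Softmax vs. Linear comparison measures.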


# Visualize (Generate Plots, Confusion Matrices, etc.)


In [None]:
##################################################
# VISUALIZE CELL
##################################################

#############################################
# PART A: PERCEPTRONS (Clean & Pocket)
#############################################

print("=== Visualizing Perceptrons (Clean, Pocket) ===")

# 1) Confusion Matrices for Clean & Pocket
for idx, meta in tqdm(enumerate(meta_clean), total=len(meta_clean), desc="Plotting Clean Confusions"):
    title = f"Clean PLA (max_iter={meta['max_iter']}, Acc={meta['accuracy']*100:.2f}%)"
    plot_confusion_matrix_annotated(
        conf_clean[idx],
        classes=range(10),
        title=title,
        method=meta["method"],
        max_iter=meta["max_iter"]
    )

for idx, meta in tqdm(enumerate(meta_pocket), total=len(meta_pocket), desc="Plotting Pocket Confusions"):
    title = f"Pocket PLA (max_iter={meta['max_iter']}, Acc={meta['accuracy']*100:.2f}%)"
    plot_confusion_matrix_annotated(
        conf_pocket[idx],
        classes=range(10),
        title=title,
        method=meta["method"],
        max_iter=meta["max_iter"]
    )

# 2) Aggregated Train Curves (Clean, Pocket)
plot_train_curves_three_models(
    clean_train_curve=clean_train_curve,
    pocket_train_curve=pocket_train_curve,
    softmax_train_curve=None,
    title="Aggregated Train Curves (Clean, Pocket)",
    max_iter=perceptron_max_iter_values[-1]  # optional: the largest perceptron iteration budget
)

# 3) Clean vs. Pocket average training error (the helper's train/test arguments carry the two models' curves)
plot_error_curves(
    train_curve=clean_train_curve,
    test_curve=pocket_train_curve,
    title="Clean PLA vs. Pocket PLA (Avg. Train Error)"
)

# 4) Summary plots for the two perceptron variants (the Softmax series is left empty)
plot_accuracy_vs_max_iter(
    max_iter_values=perceptron_max_iter_values,
    accuracies_clean=accuracies_clean,
    accuracies_pocket=accuracies_pocket,
    accuracies_softmax=None
)

plot_runtime_vs_max_iter(
    max_iter_values=perceptron_max_iter_values,
    runtimes_clean=runtimes_clean,
    runtimes_pocket=runtimes_pocket,
    runtimes_softmax=None
)

# Extended summary (accuracy, sensitivity, selectivity, runtime); passing None
# leaves the third (Softmax) series empty
plot_performance_summary_extended(
    perceptron_max_iter_values,
    accuracies_clean,        accuracies_pocket,        None,
    sensitivities_clean,     sensitivities_pocket,     None,
    selectivities_clean,     selectivities_pocket,     None,
    runtimes_clean,          runtimes_pocket,          None
)


#############################################
# PART B: REGRESSIONS (Softmax & Linear)
#############################################
print("\n=== Visualizing Regressions (Softmax, Linear) ===")

# 1) Confusion Matrices for Softmax & Linear
for idx, meta in tqdm(enumerate(meta_soft), total=len(meta_soft), desc="Plotting Softmax Confusions"):
    title = f"Softmax (max_iter={meta['max_iter']}, Acc={meta['accuracy']*100:.2f}%)"
    plot_confusion_matrix_annotated(
        conf_soft[idx],
        classes=range(10),
        title=title,
        method=meta["method"],
        max_iter=meta["max_iter"]
    )

for idx, meta in tqdm(enumerate(meta_linear), total=len(meta_linear), desc="Plotting Linear Confusions"):
    title = f"Linear (max_iter={meta['max_iter']}, Acc={meta['accuracy']*100:.2f}%)"
    plot_confusion_matrix_annotated(
        conf_linear[idx],
        classes=range(10),
        title=title,
        method=meta["method"],
        max_iter=meta["max_iter"]
    )

# 2) Aggregated training curve for Softmax (default LR) only
# (no aggregate_iteration_losses_linear helper is imported, so Linear has no curve here)
plot_train_curves_three_models(
    clean_train_curve=None,
    pocket_train_curve=None,
    softmax_train_curve=softmax_train_curve,
    title="Aggregated Train Curves (Softmax, default LR)",
    max_iter=regression_max_iter_values[-1]
)

# 3) Summary plots for Softmax (default LR) only; the Linear default-LR arrays
# (accuracies_linear, runtimes_linear, ...) could be plotted the same way.
# The Softmax series is passed through the helpers' first ('clean') slot,
# so check the legend labels accordingly.
plot_accuracy_vs_max_iter(
    max_iter_values=regression_max_iter_values,
    accuracies_clean=accuracies_softmax,
    accuracies_pocket=None,
    accuracies_softmax=None
)

plot_runtime_vs_max_iter(
    max_iter_values=regression_max_iter_values,
    runtimes_clean=runtimes_softmax,
    runtimes_pocket=None,
    runtimes_softmax=None
)

plot_performance_summary_extended(
    regression_max_iter_values,
    accuracies_clean=accuracies_softmax,    accuracies_pocket=None,   accuracies_softmax=None,
    sensitivities_clean=sensitivities_soft, sensitivities_pocket=None, sensitivities_softmax=None,
    selectivities_clean=selectivities_soft, selectivities_pocket=None, selectivities_softmax=None,
    runtimes_clean=runtimes_softmax,        runtimes_pocket=None,     runtimes_softmax=None
)

# 4) 4-model summaries (plot_accuracy_vs_max_iter_4models, plot_runtime_vs_max_iter_4models)
# are skipped: the perceptrons use perceptron_max_iter_values while the regressions use
# regression_max_iter_values, which differ by orders of magnitude, so the four models do
# not share a meaningful x-axis. Combining them would require a shared iteration grid
# or separate axes.

#############################################
# 5) Metric vs. Learning Rate
#############################################
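# Test accuracy vs. learning rate at the largest regression iteration budget
softmax_acc_by_lr = [softmax_results_dict[lr]["acc"][-1] for lr in lr_values_softmax]
plot_metric_vs_learning_rate(
    lr_values_softmax,
    softmax_acc_by_lr,
    metric_name="Accuracy (Softmax)",
    use_log_scale=True
)

linear_acc_by_lr = [linear_results_dict[lr]["acc"][-1] for lr in lr_values_linear]
plot_metric_vs_learning_rate(
    lr_values_linear,
    linear_acc_by_lr,
    metric_name="Accuracy (Linear)",
    use_log_scale=True
)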

print("\n=== Visualization for Perceptrons & Regressions complete ===")


# Final Results Summary



**Observations:**
- **Pocket PLA** consistently outperforms Clean PLA in both accuracy and sensitivity (TPR) across all tested iteration counts.
- Increasing `max_iter` improves performance, though gains tend to plateau beyond roughly 50–100 iterations.
- **Runtime** increases nearly linearly with `max_iter` for both methods, highlighting a clear trade-off between higher accuracy and computational cost.
- Perfect separation is not achieved: even at the highest iteration counts, neither method reaches 100% accuracy, which suggests that the one-vs-all problems are not linearly separable in pixel space (or at least not within the tested iteration budget).

**Trade-off Analysis:**
- **Low Iterations (max_iter = 10–30):**  
  Fast training with modest accuracy and TPR, suitable for rapid prototyping or time-sensitive applications.
- **Medium Iterations (max_iter = 50–100):**  
  Balanced performance and runtime, capturing most achievable gains without excessive overhead.
- **High Iterations (max_iter > 100):**  
  Marginal performance improvements with significant runtime increase; diminishing returns for practical applications.

**Recommendations for Future Work:**
- Experiment with alternative update rules (e.g., adaptive learning rates) to accelerate convergence.
- Compare against more expressive models (e.g., SVMs, neural networks) in addition to the Softmax (multinomial logistic) and Linear Regression baselines trained above.
- Evaluate model robustness under noisy or adversarial conditions.

This analysis, including confusion matrices, error curves, and summary plots, characterizes the performance of the multi-class Perceptron on MNIST and informs the balance between training efficiency and classification performance.
