## 1 - Model 1: Logistic Regression

The main goals are to:
* Develop a baseline Logistic Regression model using lab code
* Train the model on the training set and evaluate on the validation set using Accuracy and F1-score as the primary metrics
* Analyze model coefficients to identify influential features (e.g., service, src_bytes, count)
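The coefficient analysis in the last goal could look roughly like the sketch below. For a multiclass problem, a fitted scikit-learn `LogisticRegression` exposes `coef_` with shape `(n_classes, n_features)`; one common heuristic is to rank features by their mean absolute coefficient across classes. The coefficient matrix and the three-feature list here are invented toy values, not results from this notebook, and `rank_features` is a hypothetical helper name. The ranking is only meaningful when the features are on comparable scales (for example, standardized).

```python
import numpy as np

# Hypothetical stand-in for `model.coef_` from a fitted multiclass
# LogisticRegression: one row per class, one column per feature.
# These numbers are invented for illustration only.
coef = np.array([
    [ 0.2, -1.5,  0.1],
    [-0.4,  2.1,  0.0],
    [ 0.3, -0.9, -0.2],
])

# Assumed feature names, in the same column order as the training matrix.
feature_names = ["service", "src_bytes", "count"]


def rank_features(coef, feature_names):
    """Return (name, score) pairs sorted by mean |coefficient| across classes."""
    importance = np.abs(coef).mean(axis=0)   # one score per feature
    order = np.argsort(importance)[::-1]     # indices, largest score first
    return [(feature_names[i], float(importance[i])) for i in order]


ranking = rank_features(coef, feature_names)
for name, score in ranking:
    print(f"{name:>10s}  {score:.3f}")
```

With the real model, `coef` would be replaced by the fitted estimator's `coef_` and `feature_names` by the column names used during preprocessing.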

The dataset is also heavily imbalanced: some attack types (like DoS) occur very frequently, while others (like U2R or R2L) are extremely rare. This is a problem because the model can learn to predict only the majority classes (“DoS” or “normal”) and still appear accurate while performing poorly on rare attacks. To counter this, class weights will be added during training. They penalize mistakes on minority classes more heavily, so those classes carry more importance in the loss.
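To make the imbalance argument concrete, here is a small pure-Python sketch on invented toy labels. It assumes the "balanced" weighting heuristic, which scikit-learn documents for `class_weight="balanced"` as `n_samples / (n_classes * class_count)`, and it assumes macro-averaged F1; the notebook may use different weights or a different averaging mode. The helper names `balanced_class_weights` and `macro_f1` are hypothetical.

```python
from collections import Counter

# Toy integer labels (invented): class 0 dominates, class 2 is rare.
y = [0] * 90 + [1] * 8 + [2] * 2


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * k) for c, k in sorted(counts.items())}


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1; a class with no true positives scores 0."""
    scores = []
    for c in sorted(set(y_true)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        scores.append(2 * tp / (2 * tp + fp + fn) if tp else 0.0)
    return sum(scores) / len(scores)


weights = balanced_class_weights(y)

# A degenerate "model" that always predicts the majority class.
y_pred = [0] * len(y)
accuracy = sum(t == p for t, p in zip(y, y_pred)) / len(y)
f1 = macro_f1(y, y_pred)

print(weights)
print(f"accuracy={accuracy:.2f}  macro-F1={f1:.2f}")
```

On this toy data the majority-only predictor reaches 0.90 accuracy but a macro-F1 of only about 0.32, and the rarest class receives a weight roughly 45 times larger than the dominant one. This is why F1 is tracked alongside accuracy and why rare classes are up-weighted.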


In [10]:
import numpy as np
from pathlib import Path
from joblib import load, dump
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, classification_report
from utils import *

DATA_DIR   = Path("data/processed")
MODELS_DIR = Path("models")
MODELS_DIR.mkdir(parents=True, exist_ok=True)

X_train, y_train = load_split("train", DATA_DIR)
X_val,   y_val   = load_split("val", DATA_DIR)
X_test,  y_test  = load_split("test", DATA_DIR)

print("First element in X_train:\n", X_train[:1])
print("Type of X_train:", type(X_train))

print("First element in y_train:\n", y_train[:1])
print("Type of y_train:", type(y_train))

First element in X_train:
 [[-1.10249223e-01 -1.24706157e-01  1.57670942e+00 -1.85174409e+00
  -7.76224074e-03 -4.91864438e-03 -1.40888118e-02 -8.94864220e-02
  -7.73598503e-03 -9.50756715e-02 -2.70228184e-02 -8.09261819e-01
  -1.16636426e-02 -3.66518691e-02 -2.44365073e-02 -1.23851504e-02
  -2.61800242e-02 -1.86098963e-02 -4.12211976e-02  0.00000000e+00
  -2.81749392e-03 -9.75309440e-02  1.85045728e+00 -1.34065042e-01
  -6.37209268e-01 -6.31929033e-01  2.74640280e+00  2.71536458e+00
  -1.36692173e+00 -1.69295977e-02 -3.74559704e-01  7.34342561e-01
  -8.82122628e-01 -1.00510998e+00 -6.85530203e-02 -4.80196848e-01
  -2.89103400e-01 -6.39531905e-01 -6.24870800e-01  2.87440955e+00
   2.75391372e+00]]
Type of X_train: <class 'numpy.ndarray'>
First element in y_train:
 [14]
Type of y_train: <class 'numpy.ndarray'>
