## **AIM**

**Implement linear regression with regularization without using sklearn (or an equivalent library), and simple and multiple linear regression with and without regularization using sklearn.**

**Apply these models to the datasets used in Experiment 3.**

**Compare the outcomes of Experiments 3 and 4 and draw conclusions.**


## **Without Sklearn and Without Regularization**

**Simple Linear Regression with Batch Gradient Descent**

In [16]:
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Generate fake dataset
np.random.seed(0)
m = 100
X = np.random.rand(m, 1) * 10
y = 4*X + 7 + np.random.randn(m, 1)  # true relation with noise

# ------------------------------
# Reference fit with sklearn LinearRegression
# (solves least squares directly; it does not run gradient descent)
# ------------------------------
model_batch = LinearRegression()
model_batch.fit(X, y)

theta0_batch = model_batch.intercept_[0]
theta1_batch = model_batch.coef_[0][0]
print("Batch GD (LinearRegression) parameters:")
print(f"Theta 0: {theta0_batch:.4f}")
print(f"Theta 1: {theta1_batch:.4f}")

# Predictions
y_pred_batch = model_batch.predict(X)

# RMSE
rmse_batch = np.sqrt(mean_squared_error(y, y_pred_batch))
print(f"RMSE (Batch GD) on training data: {rmse_batch:.4f}")

# Predict for X=7
pred_y_batch = model_batch.predict([[7]])
print(f"Predicted value for X=7 (Batch GD): {pred_y_batch[0,0]:.2f}")


Batch GD (LinearRegression) parameters:
Theta 0: 7.2222
Theta 1: 3.9937
RMSE (Batch GD) on training data: 0.9962
Predicted value for X=7 (Batch GD): 35.18
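
The cell above obtains its parameters from `LinearRegression`, which solves the least-squares problem directly instead of iterating. As a minimal sketch of the same fit without sklearn, the snippet below uses `np.linalg.lstsq` on the same synthetic data; the names `theta_ls` and `normal_eq_residual` are illustrative and do not appear in the original cell.

```python
import numpy as np

# Same synthetic data as the cell above (seed 0)
np.random.seed(0)
m = 100
X = np.random.rand(m, 1) * 10
y = 4 * X + 7 + np.random.randn(m, 1)

# Prepend a column of ones so theta[0] plays the role of the intercept
X_b = np.c_[np.ones((m, 1)), X]

# Least-squares solution of X_b @ theta ~= y
theta_ls, *_ = np.linalg.lstsq(X_b, y, rcond=None)

# At the least-squares optimum the normal equations hold,
# so X_b^T (X_b theta - y) should be numerically zero
normal_eq_residual = X_b.T @ (X_b @ theta_ls - y)

print("theta (intercept, slope):", np.round(theta_ls.ravel(), 4))
print("normal-equation residual:", normal_eq_residual.ravel())
```

Any gradient-descent variant run on this data should approach `theta_ls` as it converges, which makes it a convenient reference point for the hand-written loops.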


**Simple Linear Regression with Stochastic Gradient Descent**

In [15]:
import numpy as np

# Generate Fake Dataset 

np.random.seed(42)

m = 100  # number of samples
X = np.random.rand(m, 1) * 10  
y = 4*X + 7 + np.random.randn(m, 1)  # y = 4*X + 7 + noise

# Add bias column (x0 = 1)
X_b = np.c_[np.ones((m, 1)), X]


# Stochastic Gradient Descent

eta = 0.01
n_epochs = 50   

theta = np.random.randn(2, 1)  

for epoch in range(n_epochs):
    for i in range(m):
        xi = X_b[i:i+1]      
        yi = y[i:i+1]       
        gradient = 2 * xi.T.dot(xi.dot(theta) - yi)  
        theta = theta - eta * gradient

print("Final Theta (coefficients):\n", np.round(theta, 4))


# RMSE
y_pred = X_b.dot(theta)
rmse = np.sqrt(np.mean((y - y_pred)**2))
print(f"RMSE: {rmse:.4f}")


Final Theta (coefficients):
 [[7.1095]
 [3.9871]]
RMSE: 0.9048


**Simple Linear Regression with Mini Batch Gradient Descent**

In [105]:
import numpy as np


# Generate Fake Dataset

np.random.seed(42)

m = 100  # number of samples
X = np.random.rand(m, 1) * 10  
y = 4*X + 7 + np.random.randn(m, 1) 

# Add bias column (x0 = 1)
X_b = np.c_[np.ones((m, 1)), X]


# Mini-Batch Gradient Descent

eta = 0.001  # learning rate
n_epochs = 1000    # number of passes over dataset
batch_size = 15  # mini-batch size

theta = np.random.randn(2, 1)  # initialize randomly

for epoch in range(n_epochs):
    # Shuffle the data at the start of each epoch
    indices = np.random.permutation(m)
    X_b_shuffled = X_b[indices]
    y_shuffled = y[indices]
    
    # Loop over mini-batches
    for i in range(0, m, batch_size):
        X_mini = X_b_shuffled[i:i+batch_size]
        y_mini = y_shuffled[i:i+batch_size]
        gradient = 2/len(y_mini) * X_mini.T.dot(X_mini.dot(theta) - y_mini)
        theta = theta - eta * gradient

print("Final Theta (coefficients):\n", np.round(theta, 4))


# RMSE

y_pred = X_b.dot(theta)
rmse = np.sqrt(np.mean((y - y_pred)**2))
print(f"RMSE: {rmse:.4f}")


Final Theta (coefficients):
 [[7.0763]
 [3.9755]]
RMSE: 0.9011
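
Batch, stochastic, and mini-batch gradient descent differ only in how many samples feed each update. The sketch below folds them into one hypothetical helper, `gradient_descent`, controlled by `batch_size`. It is an illustration under stated assumptions rather than the code used for the results above: the feature is standardized first, the data come from a different random generator, and the learning rates are chosen only for this demo.

```python
import numpy as np

def gradient_descent(X_b, y, eta, n_epochs, batch_size, seed=0):
    """Mean-squared-error gradient descent.

    batch_size = m  -> batch GD
    batch_size = 1  -> stochastic GD
    otherwise       -> mini-batch GD
    """
    rng = np.random.default_rng(seed)
    m, n = X_b.shape
    theta = rng.standard_normal((n, 1))
    for _ in range(n_epochs):
        order = rng.permutation(m)  # reshuffle every epoch
        for start in range(0, m, batch_size):
            idx = order[start:start + batch_size]
            X_mini, y_mini = X_b[idx], y[idx]
            # Average gradient over the samples in this batch
            grad = (2 / len(idx)) * X_mini.T @ (X_mini @ theta - y_mini)
            theta -= eta * grad
    return theta

# Demo data (assumed for illustration): y = 4x + 7 + noise
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=(200, 1))
y = 4 * x + 7 + rng.standard_normal((200, 1))
x_std = (x - x.mean()) / x.std()  # standardize the single feature
X_b = np.c_[np.ones((200, 1)), x_std]

theta_ls, *_ = np.linalg.lstsq(X_b, y, rcond=None)  # reference solution
theta_batch = gradient_descent(X_b, y, eta=0.1, n_epochs=200, batch_size=200)
theta_sgd = gradient_descent(X_b, y, eta=0.01, n_epochs=50, batch_size=1)

print("least squares:", theta_ls.ravel())
print("batch GD:     ", theta_batch.ravel())
print("stochastic GD:", theta_sgd.ravel())
```

With a constant learning rate, the batch run settles on the least-squares solution, while the stochastic run keeps fluctuating around it. This is one reason the SGD parameters in this notebook differ slightly from the batch ones.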


## **Without Sklearn and Without Regularization**

**Multiple Linear Regression with Batch Gradient Descent**

In [2]:
import numpy as np
import pandas as pd

# -----------------------------
# 1️⃣ Load dataset
# -----------------------------
data = pd.read_csv("Student_Performance.csv") 

# Encode categorical feature
data['Extracurricular Activities'] = data['Extracurricular Activities'].map({'Yes': 1, 'No': 0})

# Features and target
X = data[['Hours Studied', 'Previous Scores', 'Extracurricular Activities', 
          'Sleep Hours', 'Sample Question Papers Practiced']].values
y = data[['Performance Index']].values

# -----------------------------
# 2️⃣ Feature scaling (manual standardization)
# -----------------------------
X_mean = np.mean(X, axis=0)
X_std = np.std(X, axis=0)
X_scaled = (X - X_mean) / X_std

# Add bias column
B = np.c_[np.ones((X_scaled.shape[0], 1)), X_scaled]

# -----------------------------
# 3️⃣ Initialize theta
# -----------------------------
np.random.seed(42)
curr_theta = np.random.randn(B.shape[1], 1)

# -----------------------------
# 4️⃣ Hyperparameters
# -----------------------------
eta = 0.0003
n_epochs = 5000
m = X.shape[0]

# -----------------------------
# 5️⃣ Batch Gradient Descent
# -----------------------------
for i in range(n_epochs):
    gradient = (2/m) * B.T.dot(B.dot(curr_theta) - y)
    curr_theta = curr_theta - eta * gradient

# -----------------------------
# 6️⃣ Output results
# -----------------------------
theta_list = [float(round(x, 4)) for x in curr_theta.flatten()]
print("θ =", theta_list)

# Predictions for training data
y_pred = B.dot(curr_theta)

# Compute RMSE
rmse = np.sqrt(np.mean((y - y_pred) ** 2))
print(f"RMSE = {rmse:.4f}")

# -----------------------------
# 7️⃣ Prediction for a new student
# -----------------------------
# Example: Hours Studied=5, Previous Scores=80, Extracurricular=Yes (1), Sleep=7, Sample Papers=3
new_student_raw = np.array([[5, 80, 1, 7, 3]])
new_student_scaled = (new_student_raw - X_mean) / X_std
new_student_b = np.c_[np.ones((1, 1)), new_student_scaled]

predicted_performance = new_student_b.dot(curr_theta)
print("Predicted Performance Index:", round(float(predicted_performance[0][0]), 4))


θ = [52.5025, 6.9812, 16.8005, 0.3908, 0.7847, 0.5547]
RMSE = 3.5309
Predicted Performance Index: 63.0521
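
The RMSE of this batch run is noticeably higher than that of the other runs on the same data, which suggests that 5000 steps at `eta = 0.0003` are not enough to converge. Because the features are standardized (zero mean), the intercept update decouples from the other weights: each step multiplies the remaining intercept error by `1 - 2*eta`. The sketch below estimates how much of that error is left. Its two constants are assumptions: `55.2248` is the intercept that sklearn's `LinearRegression` reports on the standardized features elsewhere in this notebook, and `0.4967` is approximately the first value drawn by `np.random.randn` under seed 42.

```python
def remaining_gap(eta, n_steps):
    """Fraction of the initial intercept error left after n_steps of batch GD,
    assuming zero-mean (standardized) features."""
    return (1.0 - 2.0 * eta) ** n_steps

# Assumed values, see the paragraph above
theta0_target = 55.2248   # converged intercept (mean of y)
theta0_init = 0.4967      # approximate initial theta0 under seed 42

frac = remaining_gap(eta=0.0003, n_steps=5000)
theta0_after = theta0_target - (theta0_target - theta0_init) * frac

print(f"fraction of intercept error remaining: {frac:.4f}")
print(f"estimated theta0 after 5000 steps:     {theta0_after:.4f}")
```

The estimate comes out at about 52.50, in line with the θ₀ printed above. Under these assumptions, a larger learning rate or more epochs would bring the batch result closer to the other runs.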


**Multiple Linear Regression with Stochastic Gradient Descent**

In [9]:
import numpy as np
import pandas as pd

# -----------------------------
# 1️⃣ Load dataset
# -----------------------------
data = pd.read_csv("Student_Performance.csv")

# Encode categorical feature
data['Extracurricular Activities'] = data['Extracurricular Activities'].map({'Yes': 1, 'No': 0})

# Features and target
X = data[['Hours Studied', 'Previous Scores', 'Extracurricular Activities', 
          'Sleep Hours', 'Sample Question Papers Practiced']].values
y = data[['Performance Index']].values

# -----------------------------
# 2️⃣ Feature scaling (manual standardization)
# -----------------------------
X_mean = np.mean(X, axis=0)
X_std = np.std(X, axis=0)
X_scaled = (X - X_mean) / X_std

# Add bias column
B = np.c_[np.ones((X_scaled.shape[0], 1)), X_scaled]

# -----------------------------
# 3️⃣ Initialize theta
# -----------------------------
np.random.seed(42)
curr_theta = np.random.randn(B.shape[1], 1)

# -----------------------------
# 4️⃣ Hyperparameters
# -----------------------------
eta = 0.005
n_epochs = 100
m = X.shape[0]

# -----------------------------
# 5️⃣ Stochastic Gradient Descent (SGD) without regularization
# -----------------------------
for epoch in range(n_epochs):
    for i in range(m):
        xi = B[i].reshape(1, -1)  # single sample
        yi = y[i].reshape(1, -1)
        gradient = 2 * xi.T.dot(xi.dot(curr_theta) - yi)
        curr_theta = curr_theta - eta * gradient

# -----------------------------
# 6️⃣ Output results
# -----------------------------
theta_list = [float(round(x, 4)) for x in curr_theta.flatten()]
print("θ (SGD) =", theta_list)

# Predictions for training data
y_pred = B.dot(curr_theta)

# Compute RMSE
rmse = np.sqrt(np.mean((y - y_pred) ** 2))
print(f"RMSE (SGD) = {rmse:.4f}")

# -----------------------------
# 7️⃣ Prediction for a new student
# -----------------------------
# Example: Hours Studied=5, Previous Scores=80, Extracurricular=Yes (1), Sleep=7, Sample Papers=3
new_student_raw = np.array([[5, 80, 1, 7, 3]])
new_student_scaled = (new_student_raw - X_mean) / X_std
new_student_b = np.c_[np.ones((1, 1)), new_student_scaled]

predicted_performance = new_student_b.dot(curr_theta)
print("Predicted Performance Index (SGD):", round(float(predicted_performance[0][0]), 4))


θ (SGD) = [55.4557, 7.2101, 17.6285, 0.2059, 0.8883, 0.3574]
RMSE (SGD) = 2.0722
Predicted Performance Index (SGD): 66.4605


**Multiple Linear Regression with Mini-Batch Gradient Descent**

In [7]:
import numpy as np
import pandas as pd

# -----------------------------
# 1️⃣ Load dataset
# -----------------------------
data = pd.read_csv("Student_Performance.csv")

# Encode categorical feature
data['Extracurricular Activities'] = data['Extracurricular Activities'].map({'Yes': 1, 'No': 0})

# Features and target
X = data[['Hours Studied', 'Previous Scores', 'Extracurricular Activities', 
          'Sleep Hours', 'Sample Question Papers Practiced']].values
y = data[['Performance Index']].values

# -----------------------------
# 2️⃣ Feature scaling (manual standardization)
# -----------------------------
X_mean = np.mean(X, axis=0)
X_std = np.std(X, axis=0)
X_scaled = (X - X_mean) / X_std

# Add bias column
B = np.c_[np.ones((X_scaled.shape[0], 1)), X_scaled]

# -----------------------------
# 3️⃣ Initialize theta
# -----------------------------
np.random.seed(42)
curr_theta = np.random.randn(B.shape[1], 1)

# -----------------------------
# 4️⃣ Hyperparameters
# -----------------------------
eta = 0.005
n_epochs = 50
batch_size = 10
m = X.shape[0]

# -----------------------------
# 5️⃣ Mini-Batch Gradient Descent (without regularization)
# -----------------------------
for epoch in range(n_epochs):
    indices = np.random.permutation(m)
    B_shuffled = B[indices]
    y_shuffled = y[indices]
    
    for start in range(0, m, batch_size):
        end = start + batch_size
        xi = B_shuffled[start:end]  # mini-batch
        yi = y_shuffled[start:end]
        gradient = 2 * xi.T.dot(xi.dot(curr_theta) - yi)
        curr_theta = curr_theta - eta * gradient

# -----------------------------
# 6️⃣ Output results
# -----------------------------
theta_list = [float(round(x, 4)) for x in curr_theta.flatten()]
print("θ (Mini-Batch GD) =", theta_list)

# Predictions for training data
y_pred = B.dot(curr_theta)

# Compute RMSE
rmse = np.sqrt(np.mean((y - y_pred) ** 2))
print(f"RMSE (Mini-Batch GD) = {rmse:.4f}")

# -----------------------------
# 7️⃣ Prediction for a new student
# -----------------------------
new_student_raw = np.array([[5, 80, 1, 7, 3]])
new_student_scaled = (new_student_raw - X_mean) / X_std
new_student_b = np.c_[np.ones((1, 1)), new_student_scaled]

predicted_performance = new_student_b.dot(curr_theta)
print("Predicted Performance Index (Mini-Batch GD):", round(float(predicted_performance[0][0]), 4))


θ (Mini-Batch GD) = [55.1077, 7.2528, 17.6219, 0.2279, 0.6351, 0.6667]
RMSE (Mini-Batch GD) = 2.0577
Predicted Performance Index (Mini-Batch GD): 65.8901


## **Without Sklearn and With Regularization**

**Stochastic Gradient Descent with Lasso Regularization**

In [18]:
import numpy as np

# Reproducibility
np.random.seed(42)

# Fake dataset
m = 100  
X = np.random.rand(m, 1) * 10
y = 4 * X + 7 + np.random.randn(m, 1)  # true relation + noise

# Add bias column (x0 = 1)
X_b = np.c_[np.ones((m, 1)), X]

# Hyperparameters
eta = 0.01      # learning rate
n_epochs = 1000  # epochs
lambda_reg = 0.1 # Lasso regularization strength

# Initialize theta randomly
theta = np.random.randn(2, 1)

# Stochastic Gradient Descent with Lasso
for epoch in range(n_epochs):
    indices = np.arange(m)
    np.random.shuffle(indices)
    
    for i in indices:
        Xi = X_b[i:i+1]        # single sample
        yi = y[i:i+1]
        
        # standard gradient
        gradient = (2/m) * Xi.T.dot(Xi.dot(theta) - yi)
        
        # Lasso penalty (sign(theta)), exclude bias
        lasso_term = (lambda_reg/m) * np.sign(theta)
        lasso_term[0] = 0
        
        gradient += lasso_term
        
        # update step
        theta -= eta * gradient

print("Final parameters (theta):")
print(f"Theta 0 (bias): {theta[0][0]:.4f}")
print(f"Theta 1 (slope): {theta[1][0]:.4f}")

# Prediction for X=7
pred = theta[0] + theta[1] * 7
print(f"Predicted value for X=7: {pred[0]:.2f}")

# RMSE on training data
y_pred = X_b.dot(theta)
rmse = np.sqrt(np.mean((y - y_pred) ** 2))
print(f"RMSE on training data: {rmse:.4f}")


Final parameters (theta):
Theta 0 (bias): 7.2156
Theta 1 (slope): 3.9531
Predicted value for X=7: 34.89
RMSE on training data: 0.8981
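
For reference, the per-sample update in the cell above is a subgradient step on the following cost, written out here from the code (the bias $\theta_0$ is left unpenalized, and $\lambda$ is `lambda_reg`):

$$
J_i(\theta) = \frac{1}{m}\left(\theta_0 + \theta_1 x_i - y_i\right)^2 + \frac{\lambda}{m}\,\lvert\theta_1\rvert
$$

$$
\frac{\partial J_i}{\partial \theta_0} = \frac{2}{m}\left(\theta_0 + \theta_1 x_i - y_i\right),
\qquad
\frac{\partial J_i}{\partial \theta_1} = \frac{2}{m}\left(\theta_0 + \theta_1 x_i - y_i\right)x_i + \frac{\lambda}{m}\,\operatorname{sign}(\theta_1)
$$

Summing $J_i$ over the $m$ samples of one epoch gives the mean squared error plus $\lambda\lvert\theta_1\rvert$. The $1/m$ factor inside each step therefore acts as an extra scaling of the learning rate.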


**Batch Gradient Descent with Ridge Regularization**

In [16]:
import numpy as np

np.random.seed(42)

m = 100  
X = np.random.rand(m, 1) * 10
y = 4 * X + 7 + np.random.randn(m, 1)  

# Add bias column (x0 = 1)
X_b = np.c_[np.ones((m, 1)), X]


eta = 0.01       # learning rate
n_iter = 1000    # iterations
lambda_reg = 0.1 # Ridge regularization strength

# Initialize theta randomly
theta = np.random.randn(2, 1)

# Batch Gradient Descent with Ridge
for iteration in range(n_iter):
    gradients = (2/m) * X_b.T.dot(X_b.dot(theta) - y)

    # Ridge penalty (only for slope, not bias)
    ridge_term = (lambda_reg/m) * theta
    ridge_term[0] = 0   # do not regularize bias

    gradients += ridge_term

    theta -= eta * gradients

print("Final parameters (theta):")
print(f"Theta 0 : {theta[0][0]:.4f}")
print(f"Theta 1 : {theta[1][0]:.4f}")

# Prediction for X=8
pred = theta[0] + theta[1] * 8
print(f"Predicted value for X=8: {pred[0]:.2f}")

# RMSE on training data
y_pred = X_b.dot(theta)
rmse = np.sqrt(np.mean((y - y_pred) ** 2))
print(f"RMSE on training data: {rmse:.4f}")


Final parameters (theta):
Theta 0 : 7.1906
Theta 1 : 3.9577
Predicted value for X=8: 38.85
RMSE on training data: 0.8982
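
Setting the ridge gradient used above to zero gives a linear system, which makes a handy reference for the iterative result. The sketch below solves it with NumPy on the same synthetic data. It assumes the same penalty convention as the loop (`(lambda_reg/m) * theta` added to the gradient, bias excluded); `D` and `theta_ridge` are illustrative names.

```python
import numpy as np

# Same synthetic data as the cell above (seed 42)
np.random.seed(42)
m = 100
X = np.random.rand(m, 1) * 10
y = 4 * X + 7 + np.random.randn(m, 1)
X_b = np.c_[np.ones((m, 1)), X]

lambda_reg = 0.1

# D selects the penalized parameters: the bias (index 0) is left out
D = np.eye(2)
D[0, 0] = 0.0

# (2/m) X^T (X theta - y) + (lambda/m) D theta = 0
#   =>  (X^T X + (lambda/2) D) theta = X^T y
theta_ridge = np.linalg.solve(X_b.T @ X_b + (lambda_reg / 2) * D, X_b.T @ y)

# Sanity check: the regularized gradient vanishes at theta_ridge
grad = (2 / m) * X_b.T @ (X_b @ theta_ridge - y) + (lambda_reg / m) * (D @ theta_ridge)

print("closed-form ridge theta:", np.round(theta_ridge.ravel(), 4))
print("gradient at solution:   ", grad.ravel())
```

Comparing `theta_ridge` with the parameters printed by the loop gives a rough indication of how far the 1000 iterations have progressed toward the optimum.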


**Mini-Batch Gradient Descent with Elastic Net Regularization**

In [7]:
import numpy as np

# Reproducibility
np.random.seed(42)

m = 100  
X = np.random.rand(m, 1) * 10
y = 4 * X + 7 + np.random.randn(m, 1)

# Add bias column (x0 = 1)
X_b = np.c_[np.ones((m, 1)), X]

# Hyperparameters
eta = 0.05          # learning rate
n_epochs = 1000     # epochs
batch_size = 15     # mini-batch size
alpha = 0.1         # overall regularization strength
r = 0.5      # 0 = ridge, 1 = lasso

# Initialize theta randomly
theta = np.random.randn(2, 1)

# Mini-Batch Gradient Descent with Elastic Net
for epoch in range(n_epochs):
    indices = np.arange(m)
    np.random.shuffle(indices)
    
    for start in range(0, m, batch_size):
        end = start + batch_size
        batch_idx = indices[start:end]
        
        Xi = X_b[batch_idx]
        yi = y[batch_idx]
        
        # Standard gradient
        gradient = (2/m) * Xi.T.dot(Xi.dot(theta) - yi)
        
        # Elastic Net penalty
        l1_term = (alpha * r / m) * np.sign(theta)   # Lasso
        l2_term = (alpha * (1 - r) / m) * theta      # Ridge
        penalty = l1_term + l2_term
        penalty[0] = 0  
        
        gradient += penalty
        
        theta -= eta * gradient

print("Final parameters (theta):")
print(f"Theta 0 (bias): {theta[0][0]:.4f}")
print(f"Theta 1 (slope): {theta[1][0]:.4f}")

# Prediction for X=7
pred = theta[0] + theta[1] * 7
print(f"Predicted value for X=7: {pred[0]:.2f}")

# RMSE on training data
y_pred = X_b.dot(theta)
rmse = np.sqrt(np.mean((y - y_pred) ** 2))
print(f"RMSE on training data: {rmse:.4f}")


Final parameters (theta):
Theta 0 (bias): 7.2187
Theta 1 (slope): 3.9597
Predicted value for X=7: 34.94
RMSE on training data: 0.8988
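
The mixing ratio `r` in the cell above interpolates between the two penalties. As a small sketch, the hypothetical helper below isolates the penalty term of that loop so the two extremes can be inspected directly; the example `theta` values are made up for illustration.

```python
import numpy as np

def elastic_net_penalty_grad(theta, alpha, r, m):
    """Penalty part of the gradient: r = 0 is pure ridge, r = 1 is pure lasso."""
    l1_term = (alpha * r / m) * np.sign(theta)    # lasso part
    l2_term = (alpha * (1 - r) / m) * theta       # ridge part
    penalty = l1_term + l2_term
    penalty[0] = 0.0                              # do not regularize the bias
    return penalty

theta = np.array([[7.0], [4.0]])  # illustrative values: [bias, slope]

pure_ridge = elastic_net_penalty_grad(theta, alpha=0.1, r=0.0, m=100)
pure_lasso = elastic_net_penalty_grad(theta, alpha=0.1, r=1.0, m=100)
mixed = elastic_net_penalty_grad(theta, alpha=0.1, r=0.5, m=100)

print("ridge only:", pure_ridge.ravel())
print("lasso only:", pure_lasso.ravel())
print("r = 0.5:   ", mixed.ravel())
```

The ridge part grows with the size of the weight, whereas the lasso part has a constant magnitude. With `r = 0.5` the penalty is the average of the two.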


## **Without Sklearn and With Regularization**

**Batch Gradient Descent with Ridge Regularization**

In [1]:
import numpy as np
import pandas as pd

data = pd.read_csv("Student_Performance.csv")

data['Extracurricular Activities'] = data['Extracurricular Activities'].map({'Yes': 1, 'No': 0})

# Features and target
X = data[['Hours Studied', 'Previous Scores', 'Extracurricular Activities', 
          'Sleep Hours', 'Sample Question Papers Practiced']].values
y = data[['Performance Index']].values


# Feature scaling (manual standardization)

X_mean = np.mean(X, axis=0)
X_std = np.std(X, axis=0)
X_scaled = (X - X_mean) / X_std

# Add bias column
B = np.c_[np.ones((X_scaled.shape[0], 1)), X_scaled]


np.random.seed(42)
curr_theta = np.random.randn(B.shape[1], 1)

eta = 0.0005       # learning rate
n_epochs = 5000
alpha = 0.01       # Ridge regularization strength
m = X.shape[0]


# Ridge Batch Gradient Descent

for i in range(n_epochs):
    # Compute gradient
    gradient = (2/m) * B.T.dot(B.dot(curr_theta) - y)
    gradient[1:] += (2 * alpha / m) * curr_theta[1:]  # regularize only weights, not bias
    # Update theta
    curr_theta = curr_theta - eta * gradient


# Final theta
theta_list = [float(round(x, 4)) for x in curr_theta.flatten()]
print("θ =", theta_list)

# Predictions for training data
y_pred = B.dot(curr_theta)

# Compute RMSE
rmse = np.sqrt(np.mean((y - y_pred) ** 2))
print(f"RMSE = {rmse:.4f}")


#  Prediction for a new student

# Example: Hours Studied=5, Previous Scores=80, Extracurricular=Yes (1), Sleep=7, Sample Papers=3
new_student_raw = np.array([[5, 80, 1, 7, 3]])

new_student_scaled = (new_student_raw - X_mean) / X_std
new_student_b = np.c_[np.ones((1, 1)), new_student_scaled]

predicted_performance = new_student_b.dot(curr_theta)
print("Predicted Performance Index:", round(float(predicted_performance[0][0]), 4))


θ = [54.857, 7.329, 17.544, 0.3201, 0.8129, 0.5591]
RMSE = 2.0746
Predicted Performance Index: 65.7938


**Stochastic Gradient Descent with Lasso Regularization**

In [1]:
import numpy as np
import pandas as pd


data = pd.read_csv("Student_Performance.csv")

data['Extracurricular Activities'] = data['Extracurricular Activities'].map({'Yes': 1, 'No': 0})

# Features and target
X = data[['Hours Studied', 'Previous Scores', 'Extracurricular Activities', 
          'Sleep Hours', 'Sample Question Papers Practiced']].values
y = data[['Performance Index']].values


# Feature scaling (manual standardization)

X_mean = np.mean(X, axis=0)
X_std = np.std(X, axis=0)
X_scaled = (X - X_mean) / X_std

# Add bias column
B = np.c_[np.ones((X_scaled.shape[0], 1)), X_scaled]


np.random.seed(42)
curr_theta = np.random.randn(B.shape[1], 1)

eta = 0.0001       # learning rate
n_epochs = 50
alpha = 0.1        # Lasso regularization strength
m = X.shape[0]


# Stochastic Gradient Descent with Lasso (L1)

for epoch in range(n_epochs):
    for i in range(m):
        xi = B[i].reshape(1, -1)  # single sample
        yi = y[i].reshape(1, -1)
        gradient = 2 * xi.T.dot(xi.dot(curr_theta) - yi)
        # L1 regularization (exclude bias)
        gradient[1:] += (alpha/m) * np.sign(curr_theta[1:])
        curr_theta = curr_theta - eta * gradient

theta_list = [float(round(x, 4)) for x in curr_theta.flatten()]
print("θ =", theta_list)

# Predictions for training data
y_pred = B.dot(curr_theta)

# Compute RMSE
rmse = np.sqrt(np.mean((y - y_pred) ** 2))
print(f"RMSE = {rmse:.4f}")

# Prediction for a new student
# Example: Hours Studied=5, Previous Scores=80, Extracurricular=Yes (1), Sleep=7, Sample Papers=3
new_student_raw = np.array([[5, 80, 1, 7, 3]])
new_student_scaled = (new_student_raw - X_mean) / X_std
new_student_b = np.c_[np.ones((1, 1)), new_student_scaled]

predicted_performance = new_student_b.dot(curr_theta)
print("Predicted Performance Index:", round(float(predicted_performance[0][0]), 4))


θ = [55.2368, 7.3879, 17.6651, 0.2918, 0.8224, 0.5564]
RMSE = 2.0376
Predicted Performance Index: 66.2231


**Mini-Batch Gradient Descent with Elastic Net Regularization**

In [1]:
import numpy as np
import pandas as pd


data = pd.read_csv("Student_Performance.csv")


data['Extracurricular Activities'] = data['Extracurricular Activities'].map({'Yes': 1, 'No': 0})

# Features and target
X = data[['Hours Studied', 'Previous Scores', 'Extracurricular Activities', 
          'Sleep Hours', 'Sample Question Papers Practiced']].values
y = data[['Performance Index']].values


# Feature scaling (manual standardization)

X_mean = np.mean(X, axis=0)
X_std = np.std(X, axis=0)
X_scaled = (X - X_mean) / X_std

# Add bias column
B = np.c_[np.ones((X_scaled.shape[0], 1)), X_scaled]


np.random.seed(42)
curr_theta = np.random.randn(B.shape[1], 1)


eta = 0.00001
n_epochs = 50
batch_size = 10
alpha1 = 0.05  # L1 strength (Lasso)
alpha2 = 0.05  # L2 strength (Ridge)
m = X.shape[0]


# Mini-Batch Gradient Descent with Elastic Net

for epoch in range(n_epochs):
    indices = np.random.permutation(m)
    B_shuffled = B[indices]
    y_shuffled = y[indices]
    
    for start in range(0, m, batch_size):
        end = start + batch_size
        xi = B_shuffled[start:end]
        yi = y_shuffled[start:end]
        
        gradient = 2 * xi.T.dot(xi.dot(curr_theta) - yi)
        # Elastic Net regularization (exclude bias)
        gradient[1:] += (2 * alpha2 / m) * curr_theta[1:] + (alpha1 / m) * np.sign(curr_theta[1:])
        
        curr_theta = curr_theta - eta * gradient

theta_list = [float(round(x, 4)) for x in curr_theta.flatten()]
print("θ =", theta_list)

# Predictions for training data
y_pred = B.dot(curr_theta)

# Compute RMSE
rmse = np.sqrt(np.mean((y - y_pred) ** 2))
print(f"RMSE = {rmse:.4f}")


# Prediction for a new student

new_student_raw = np.array([[5, 80, 1, 7, 3]])
new_student_scaled = (new_student_raw - X_mean) / X_std
new_student_b = np.c_[np.ones((1, 1)), new_student_scaled]

predicted_performance = new_student_b.dot(curr_theta)
print("Predicted Performance Index:", round(float(predicted_performance[0][0]), 4))


θ = [55.222, 7.3867, 17.6605, 0.3069, 0.8146, 0.5567]
RMSE = 2.0375
Predicted Performance Index: 66.2183


## **With Sklearn and Without Regularization**

**Simple Linear Regression with Batch Gradient Descent**

In [5]:
import numpy as np
from sklearn.metrics import mean_squared_error

# Generate fake dataset
np.random.seed(0)
m = 100
X = np.random.rand(m, 1) * 10
y = 4*X + 7 + np.random.randn(m, 1)  # true relation with noise

# Add bias column (x0 = 1)
X_b = np.c_[np.ones((m, 1)), X]

# Initialize parameters
theta = np.random.randn(2, 1)  # random initialization for [theta0, theta1]

# Hyperparameters
eta = 0.01
n_iter = 1000

# Batch Gradient Descent
for iteration in range(n_iter):
    gradients = 2/m * X_b.T.dot(X_b.dot(theta) - y)
    theta = theta - eta * gradients

# Final parameters
theta0, theta1 = theta[0, 0], theta[1, 0]
print("Final parameters (theta) using Batch Gradient Descent:")
print(f"Theta 0: {theta0:.4f}")
print(f"Theta 1: {theta1:.4f}")

# Predictions for training data
y_pred = X_b.dot(theta)

# RMSE
rmse = np.sqrt(mean_squared_error(y, y_pred))
print(f"RMSE on training data: {rmse:.4f}")

# Predict for X=7
x_new = np.array([[1, 7]])  # include bias term
pred_y = x_new.dot(theta)
print(f"Predicted value for X=7: {pred_y[0,0]:.2f}")


Final parameters (theta) using Batch Gradient Descent:
Theta 0: 7.1896
Theta 1: 3.9988
RMSE on training data: 0.9964
Predicted value for X=7: 35.18


**Simple Linear Regression with Stochastic Gradient Descent**

In [4]:
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.metrics import mean_squared_error

# Generate fake dataset
np.random.seed(0)
m = 100
X = np.random.rand(m, 1) * 10
y = 4*X + 7 + np.random.randn(m, 1)  # true relation with noise
y = y.ravel()  # SGDRegressor expects 1D target

# Create SGD Regressor
model = SGDRegressor(
    max_iter=1000,       # maximum number of epochs (may stop earlier via tol)
    learning_rate='constant',
    eta0=0.01,           # learning rate
    penalty=None,        # no regularization
    random_state=42
)

# Train model
model.fit(X, y)

# Get parameters
theta0 = model.intercept_[0]
theta1 = model.coef_[0]
print("Parameters using SGDRegressor:")
print(f"Theta 0: {theta0:.4f}")
print(f"Theta 1: {theta1:.4f}")

# Predictions
y_pred = model.predict(X)

# RMSE
rmse = np.sqrt(mean_squared_error(y, y_pred))
print(f"RMSE on training data: {rmse:.4f}")

# Predict for X=7
pred_y = model.predict([[7]])
print(f"Predicted value for X=7: {pred_y[0]:.2f}")


Parameters using SGDRegressor:
Theta 0: 7.1999
Theta 1: 4.0404
RMSE on training data: 1.0246
Predicted value for X=7: 35.48


**Simple Linear Regression with Mini-Batch Gradient Descent**

In [4]:
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.metrics import mean_squared_error

# Reproducibility
np.random.seed(42)

# Generate fake dataset
m = 100
X = np.random.rand(m, 1) * 10      # Single feature in [0, 10)
y = 4*X + 7 + np.random.randn(m, 1)  # True relation + noise
y = y.ravel()  # SGDRegressor expects 1D target

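# Initialize SGDRegressor; it is fed mini-batches through partial_fit below
# (each partial_fit call makes one pass of per-sample updates over its batch)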
model = SGDRegressor(
    max_iter=1,
    learning_rate='constant',
    eta0=0.01,
    penalty=None,        # no regularization
    random_state=42,
    warm_start=True
)

n_epochs = 1000
batch_size = 30

# Mini-Batch training loop
for epoch in range(n_epochs):
    indices = np.arange(m)
    np.random.shuffle(indices)  # shuffle data each epoch
    
    for start in range(0, m, batch_size):
        end = start + batch_size
        batch_idx = indices[start:end]
        X_batch = X[batch_idx]
        y_batch = y[batch_idx]
        
        model.partial_fit(X_batch, y_batch)

# Get parameters
theta0 = model.intercept_[0]
theta1 = model.coef_[0]
print("Final parameters (Mini-Batch GD on random data):")
print(f"Theta 0: {theta0:.4f}")
print(f"Theta 1: {theta1:.4f}")

# Predict value for X=7
pred_y = model.predict([[7]])
print(f"Predicted value for X=7: {pred_y[0]:.2f}")

# Compute RMSE on training data
y_pred = model.predict(X)
rmse = np.sqrt(mean_squared_error(y, y_pred))
print(f"RMSE on training data: {rmse:.4f}")


Final parameters (Mini-Batch GD on random data):
Theta 0: 7.2153
Theta 1: 4.0030
Predicted value for X=7: 35.24
RMSE on training data: 0.9384


## **With Sklearn and Without Regularization**

**Multiple Linear Regression with Batch Gradient Descent**

In [5]:
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import StandardScaler

# Load data
data = pd.read_csv("Student_Performance.csv")

# Encode categorical variable
data['Extracurricular Activities'] = data['Extracurricular Activities'].map({'Yes': 1, 'No': 0})

# Features and target
X = data[['Hours Studied', 'Previous Scores', 'Extracurricular Activities', 
          'Sleep Hours', 'Sample Question Papers Practiced']].values
y = data[['Performance Index']].values

# Standardize features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Linear Regression on standardized features
model = LinearRegression()
model.fit(X_scaled, y)

# Print coefficients (theta)
print("Coefficients (theta):", np.round(model.coef_[0], 4))
print("Intercept (theta0):", np.round(model.intercept_[0], 4))

# Predictions on training data
y_pred = model.predict(X_scaled)
rmse = np.sqrt(mean_squared_error(y, y_pred))
print("RMSE on training data:", np.round(rmse, 4))

# Prediction for a new student
new_student = np.array([[5, 80, 1, 7, 3]])
new_student_scaled = scaler.transform(new_student)
predicted_performance = model.predict(new_student_scaled)
print("Predicted Performance Index:", np.round(predicted_performance[0][0], 4))


Coefficients (theta): [ 7.3869 17.662   0.3064  0.8149  0.5557]
Intercept (theta0): 55.2248
RMSE on training data: 2.0375
Predicted Performance Index: 66.2223
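
The coefficients above refer to standardized features, so they are not directly comparable with coefficients fitted on raw columns. As a sketch, the hypothetical helper below maps an intercept and weights learned on standardized inputs back to the original feature scale. With a `StandardScaler`, `mean` and `std` would correspond to `scaler.mean_` and `scaler.scale_`; the data and parameter values here are made up for the demonstration.

```python
import numpy as np

def to_original_scale(theta0_scaled, coef_scaled, mean, std):
    """Convert parameters fitted on (X - mean) / std to parameters for raw X."""
    coef_orig = coef_scaled / std
    intercept_orig = theta0_scaled - np.sum(coef_scaled * mean / std)
    return intercept_orig, coef_orig

# Illustrative data and parameters (not taken from Student_Performance.csv)
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 3))
mean, std = X.mean(axis=0), X.std(axis=0)
X_scaled = (X - mean) / std

theta0_scaled = 55.0
coef_scaled = np.array([7.0, 17.0, 0.5])

intercept_orig, coef_orig = to_original_scale(theta0_scaled, coef_scaled, mean, std)

# Both parameterizations must give identical predictions
pred_scaled = theta0_scaled + X_scaled @ coef_scaled
pred_orig = intercept_orig + X @ coef_orig
print("max prediction difference:", np.max(np.abs(pred_scaled - pred_orig)))
```

The conversion changes only how the parameters are expressed. Predictions are identical, which the final line checks.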


**Multiple Linear Regression with Stochastic Gradient Descent**

In [1]:
import numpy as np
import pandas as pd
from sklearn.linear_model import SGDRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error

# Load data
data = pd.read_csv("Student_Performance.csv")

# Encode categorical variable
data['Extracurricular Activities'] = data['Extracurricular Activities'].map({'Yes': 1, 'No': 0})

# Features and target
X = data[['Hours Studied', 'Previous Scores', 'Extracurricular Activities', 
          'Sleep Hours', 'Sample Question Papers Practiced']].values
y = data['Performance Index'].values  # 1D array for SGDRegressor

# Standardize features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# SGD Regressor (Stochastic Gradient Descent) WITHOUT regularization
sgd_model = SGDRegressor(
    max_iter=1000,        # maximum number of epochs (may stop earlier via tol)
    learning_rate='constant',
    eta0=0.01,            # adjust as needed for convergence
    penalty=None,         # no regularization
    random_state=42,
    shuffle=True
)

# Fit model
sgd_model.fit(X_scaled, y)

# Coefficients and intercept
print("Coefficients (theta):", np.round(sgd_model.coef_, 4))
print("Intercept (theta0):", np.round(sgd_model.intercept_[0], 4))

# RMSE on training data
y_pred = sgd_model.predict(X_scaled)
rmse = np.sqrt(mean_squared_error(y, y_pred))
print("RMSE on training data:", np.round(rmse, 4))

# Prediction for a new student
new_student = np.array([[5, 80, 1, 7, 3]])
new_student_scaled = scaler.transform(new_student)
predicted_performance = sgd_model.predict(new_student_scaled)
print("Predicted Performance Index:", np.round(predicted_performance[0], 4))


Coefficients (theta): [ 7.6044 17.6104  0.2578  0.8618  0.3464]
Intercept (theta0): 54.9472
RMSE on training data: 2.0799
Predicted Performance Index: 65.9934


**Multiple Linear Regression with Mini-Batch Gradient Descent**

In [3]:
import numpy as np
import pandas as pd
from sklearn.linear_model import SGDRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error

# Load data
data = pd.read_csv("Student_Performance.csv")

# Encode categorical feature
data['Extracurricular Activities'] = data['Extracurricular Activities'].map({'Yes': 1, 'No': 0})

# Features and target
X = data[['Hours Studied', 'Previous Scores', 'Extracurricular Activities', 
          'Sleep Hours', 'Sample Question Papers Practiced']].values
y = data['Performance Index'].values

# Optional: scale features for better convergence
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# SGD Regressor fed with mini-batches through partial_fit
# (each partial_fit call makes one pass of per-sample updates over its batch)
sgd = SGDRegressor(max_iter=1,          # one epoch at a time
                   eta0=0.00001,        # learning rate
                   learning_rate='constant',
                   penalty=None,        # no regularization
                   warm_start=True,     # continue training over multiple calls to fit
                   shuffle=True)

batch_size = 10
n_epochs = 50
m = X.shape[0]

# Mini-Batch Training
for epoch in range(n_epochs):
    indices = np.random.permutation(m)
    X_shuffled = X_scaled[indices]
    y_shuffled = y[indices]
    
    for start in range(0, m, batch_size):
        end = start + batch_size
        xi = X_shuffled[start:end]
        yi = y_shuffled[start:end]
        sgd.partial_fit(xi, yi)

print("Coefficients:", sgd.coef_)
print("Intercept:", sgd.intercept_)

# Predict for new student
new_student = np.array([[5, 80, 1, 7, 3]])
new_student_scaled = scaler.transform(new_student)
predicted_performance = sgd.predict(new_student_scaled)
print("Predicted Performance Index:", predicted_performance[0])

# RMSE on training data
y_pred = sgd.predict(X_scaled)
rmse = np.sqrt(mean_squared_error(y, y_pred))
print("RMSE on Training Data:", rmse)

Coefficients: [ 7.32943194 17.54003076  0.30992607  0.81353109  0.56162621]
Intercept: [54.85273187]
Predicted Performance Index: 65.77569603867418
RMSE on Training Data: 2.0755248390066705


## **With Sklearn and With Regularization**

**Batch Gradient Descent with Ridge Regularization**

In [11]:
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

# Generate fake dataset
np.random.seed(0)
m = 100
X = np.random.rand(m, 1) * 10
y = 4*X + 7 + np.random.randn(m, 1)  # true relation with noise

# Train Ridge Regression model
ridge_reg = Ridge(alpha=0.01, fit_intercept=True)  # alpha = λ
ridge_reg.fit(X, y)

# Extract parameters as scalars (.item() works whether the arrays are 1-D or 2-D)
theta0 = ridge_reg.intercept_.item()  # bias
theta1 = ridge_reg.coef_.item()       # slope
print("Final parameters (theta) using sklearn Ridge:")
print(f"Theta 0 (bias): {theta0:.4f}")
print(f"Theta 1 (slope): {theta1:.4f}")

# Predictions for training data
y_pred = ridge_reg.predict(X)

# RMSE
rmse = np.sqrt(mean_squared_error(y, y_pred))
print(f"RMSE on training data: {rmse:.4f}")

# Predict for X=7
pred_y = ridge_reg.predict([[7]])
print(f"Predicted value for X=7: {pred_y.item():.2f}")


Final parameters (theta) using sklearn Ridge:
Theta 0 (bias): 7.2224
Theta 1 (slope): 3.9936
RMSE on training data: 0.9962
Predicted value for X=7: 35.18
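
The Ridge parameters above are almost identical to the unregularized ones (7.2222, 3.9937). A minimal sketch of why, assuming the usual ridge objective with an unpenalized intercept: the closed-form solution adds `alpha` to the diagonal of the centred Gram matrix, and with `alpha = 0.01` that term is tiny next to the sum of squares of `X`. The helper name `ridge_closed_form` is illustrative and not part of the original notebook.

```python
import numpy as np

# Same synthetic data as the cell above
np.random.seed(0)
m = 100
X = np.random.rand(m, 1) * 10
y = 4 * X + 7 + np.random.randn(m, 1)

def ridge_closed_form(X, y, alpha):
    """Ridge fit with an unpenalized intercept: centre, solve, un-centre."""
    x_mean, y_mean = X.mean(axis=0), y.mean(axis=0)
    Xc, yc = X - x_mean, y - y_mean
    n_features = X.shape[1]
    # Normal equations with the L2 term: (Xc^T Xc + alpha * I) w = Xc^T yc
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(n_features), Xc.T @ yc)
    b = y_mean - x_mean @ w
    return b.item(), w.ravel()

b_ols, w_ols = ridge_closed_form(X, y, alpha=0.0)      # plain least squares
b_ridge, w_ridge = ridge_closed_form(X, y, alpha=0.01)  # same alpha as above

print(f"OLS   : theta0 = {b_ols:.4f}, theta1 = {w_ols[0]:.4f}")
print(f"Ridge : theta0 = {b_ridge:.4f}, theta1 = {w_ridge[0]:.4f}")
print(f"Slope shrinkage: {w_ols[0] - w_ridge[0]:.6f}")
```

Raising `alpha` by a few orders of magnitude in this sketch makes the shrinkage of the slope visible.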


**Stochastic Gradient Descent with Lasso Regularization**

In [5]:
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import SGDRegressor
from sklearn.metrics import mean_squared_error

# ========================
# 1. Generate fake dataset
# ========================
np.random.seed(0)
m = 100
X = np.random.rand(m, 1) * 10
y = 4*X + 7 + np.random.randn(m, 1)
y = y.ravel()  # flatten

# ========================
# 2. Standardize features
# ========================
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# ========================
# 3. SGD Regressor for Lasso
# ========================
# penalty='l1' → Lasso, learning_rate='constant' for fixed learning rate
sgd_lasso = SGDRegressor(
    penalty='l1',       # Lasso
    alpha=0.01,         # regularization strength
    max_iter=1000,      # upper bound on epochs; training may stop earlier via tol
    learning_rate='constant',
    eta0=0.01,          # learning rate
    random_state=42
)

# Train the model
sgd_lasso.fit(X_scaled, y)

# ========================
# 4. Convert parameters to original scale
# ========================
theta1 = sgd_lasso.coef_[0] / scaler.scale_[0]
theta0 = sgd_lasso.intercept_[0] - theta1 * scaler.mean_[0]

print("Stochastic SGD Lasso (sklearn) final parameters:")
print(f"Theta 0 (bias): {theta0:.4f}")
print(f"Theta 1 (slope): {theta1:.4f}")

# ========================
# 5. RMSE on training data
# ========================
y_pred = sgd_lasso.predict(X_scaled)
rmse = np.sqrt(mean_squared_error(y, y_pred))
print(f"RMSE: {rmse:.4f}")

# ========================
# 6. Predict for X=7
# ========================
x_test_scaled = scaler.transform([[7]])
pred_y = sgd_lasso.predict(x_test_scaled)
print(f"Predicted value for X=7: {pred_y[0]:.2f}")


Stochastic SGD Lasso (sklearn) final parameters:
Theta 0 (bias): 7.3159
Theta 1 (slope): 3.9809
RMSE: 0.9975
Predicted value for X=7: 35.18
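
Step 4 of the cell above converts the parameters learned on the standardized feature back to the original scale. A short sketch of that algebra: if the model is `y = b + w * (x - mean) / scale`, then the slope on the raw feature is `w / scale` and the intercept is `b - (w / scale) * mean`. The values of `w_scaled` and `b_scaled` below are hypothetical placeholders, not the fitted values from the notebook.

```python
import numpy as np

np.random.seed(0)
X = np.random.rand(100, 1) * 10

# StandardScaler stores these as mean_ and scale_ (population std, ddof=0)
mu, sigma = X.mean(), X.std()

# Hypothetical parameters learned on the standardized feature
w_scaled, b_scaled = 11.5, 26.0

# Back-transform to the original feature scale
theta1 = w_scaled / sigma
theta0 = b_scaled - theta1 * mu

# Both parameterizations must give identical predictions
pred_scaled = b_scaled + w_scaled * (X - mu) / sigma
pred_original = theta0 + theta1 * X
print("Max difference between the two forms:", np.abs(pred_scaled - pred_original).max())
```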


**Mini-Batch Gradient Descent with Elastic Net Regularization**

In [9]:
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import SGDRegressor
from sklearn.metrics import mean_squared_error

# ========================
# 1. Generate fake dataset
# ========================
np.random.seed(0)
m = 100
X = np.random.rand(m, 1) * 10
y = 4*X + 7 + np.random.randn(m, 1)
y = y.ravel()  # flatten

# ========================
# 2. Train-test split (optional)
# ========================
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# ========================
# 3. Standardize features
# ========================
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# ========================
# 4. SGD Regressor for Elastic Net
# ========================
sgd_elastic = SGDRegressor(
    penalty='elasticnet',  # Elastic Net
    l1_ratio=0.5,          # ratio of L1 to L2 
    alpha=0.01,            # regularization strength
    max_iter=1000,         # upper bound on epochs; training may stop earlier via tol
    learning_rate='constant',
    eta0=0.01,             # learning rate
    random_state=42,
    shuffle=True           # reshuffle each epoch; updates are still per-sample, not mini-batch
)

# Train the model
sgd_elastic.fit(X_train_scaled, y_train)

# ========================
# 5. Convert parameters to original scale
# ========================
theta1 = sgd_elastic.coef_[0] / scaler.scale_[0]
theta0 = sgd_elastic.intercept_[0] - theta1 * scaler.mean_[0]

print("Mini-batch SGD Elastic Net final parameters:")
print(f"Theta 0 (bias): {theta0:.4f}")
print(f"Theta 1 (slope): {theta1:.4f}")



# ========================
# 6. RMSE on testing data
# ========================
y_test_pred = sgd_elastic.predict(X_test_scaled)
rmse_test = np.sqrt(mean_squared_error(y_test, y_test_pred))
print(f"RMSE on testing data: {rmse_test:.4f}")

# ========================
# 7. Predict for X=7
# ========================
x_test_scaled = scaler.transform([[7]])
pred_y = sgd_elastic.predict(x_test_scaled)
print(f"Predicted value for X=7: {pred_y[0]:.2f}")


Mini-batch SGD Elastic Net final parameters:
Theta 0 (bias): 7.3477
Theta 1 (slope): 3.9681
RMSE on testing data: 0.9550
Predicted value for X=7: 35.12
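
`SGDRegressor.fit` updates the weights one sample at a time, so the cell above is stochastic gradient descent with an Elastic Net penalty rather than mini-batch descent in the strict sense. The sketch below shows what an averaged-gradient mini-batch update could look like in plain NumPy. It assumes the penalty `alpha * (l1_ratio * |w| + (1 - l1_ratio) / 2 * w^2)` with an unpenalized intercept, uses its own random data (so the numbers will not match the outputs above), and all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 100
X = rng.random((m, 1)) * 10
y = 4 * X[:, 0] + 7 + rng.standard_normal(m)

# Standardize the feature (same idea as StandardScaler)
mu, sigma = X.mean(axis=0), X.std(axis=0)
Xs = (X - mu) / sigma

alpha, l1_ratio = 0.01, 0.5
eta, n_epochs, batch_size = 0.01, 200, 10

w = np.zeros(Xs.shape[1])
b = 0.0
for epoch in range(n_epochs):
    order = rng.permutation(m)                 # new shuffle every epoch
    for start in range(0, m, batch_size):
        idx = order[start:start + batch_size]
        err = Xs[idx] @ w + b - y[idx]         # residuals on this batch
        grad_w = Xs[idx].T @ err / len(idx)    # gradient averaged over the batch
        # Elastic Net (sub)gradient: L1 part uses sign(w), L2 part uses w
        grad_w += alpha * (l1_ratio * np.sign(w) + (1 - l1_ratio) * w)
        grad_b = err.mean()                    # intercept is not penalized
        w -= eta * grad_w
        b -= eta * grad_b

# Convert back to the original feature scale
theta1 = w[0] / sigma[0]
theta0 = b - theta1 * mu[0]
print(f"Theta 0 (bias): {theta0:.4f}")
print(f"Theta 1 (slope): {theta1:.4f}")
```

One update per batch of 10 samples is the defining difference from the per-sample updates inside `SGDRegressor`.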


## **With Sklearn and With Regularization (Multiple Linear Regression)**

**Batch Gradient Descent with Ridge Regularization**

In [1]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

# =======================
# 1. Load and Prepare Data
# =======================
data = pd.read_csv("Student_Performance.csv")

# Convert Yes/No to 1/0
data["Extracurricular Activities"] = data["Extracurricular Activities"].map({"Yes": 1, "No": 0})

X = data[[
    "Hours Studied",
    "Previous Scores",
    "Extracurricular Activities",
    "Sleep Hours",
    "Sample Question Papers Practiced"
]].values
y = data["Performance Index"].values

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# =======================
# 2. Standardize Features
# =======================
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled  = scaler.transform(X_test)

# =======================
# 3. Ridge Regression
# =======================
alpha = 0.1  # regularization parameter
ridge_model = Ridge(alpha=alpha, solver='auto', max_iter=1500, random_state=42)
ridge_model.fit(X_train_scaled, y_train)

# =======================
# 4. Get Coefficients (theta)
# =======================
theta = np.r_[ridge_model.intercept_, ridge_model.coef_]  # include bias as theta_0
print("Theta :", theta)

# =======================
# 5. Evaluate Model
# =======================
y_pred = ridge_model.predict(X_test_scaled)
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)

print(f" MSE: {mse:.4f}")
print(f" RMSE: {rmse:.4f}")


Theta : [55.3115      7.401246   17.63704981  0.30428766  0.81002151  0.54883857]
 MSE: 4.0827
 RMSE: 2.0206


**Stochastic Gradient Descent with Lasso Regularization**

In [2]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import SGDRegressor
from sklearn.metrics import mean_squared_error

# =======================
# 1. Load and Prepare Data
# =======================
data = pd.read_csv("Student_Performance.csv")

# Convert Yes/No to 1/0
data["Extracurricular Activities"] = data["Extracurricular Activities"].map({"Yes": 1, "No": 0})

X = data[[
    "Hours Studied",
    "Previous Scores",
    "Extracurricular Activities",
    "Sleep Hours",
    "Sample Question Papers Practiced"
]].values
y = data["Performance Index"].values

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# =======================
# 2. Standardize Features
# =======================
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled  = scaler.transform(X_test)

# =======================
# 3. Lasso Regression using SGD
# =======================
alpha = 0.1  # regularization strength (like lambda in manual Lasso)
n_epochs = 1500
learning_rate = 0.01

lasso_sgd = SGDRegressor(
    penalty='l1',        # Lasso
    alpha=alpha,
    max_iter=n_epochs,
    learning_rate='constant',
    eta0=learning_rate,
    random_state=42
)

lasso_sgd.fit(X_train_scaled, y_train)

# =======================
# 4. Get Coefficients (theta)
# =======================
theta = np.r_[lasso_sgd.intercept_, lasso_sgd.coef_]  # intercept + coefficients
print("Theta (intercept + coefficients):", theta)

# =======================
# 5. Evaluate Model
# =======================
y_test_pred = lasso_sgd.predict(X_test_scaled)

test_mse = mean_squared_error(y_test, y_test_pred)
test_rmse = np.sqrt(test_mse)

print(f"Test MSE: {test_mse:.4f}, Test RMSE: {test_rmse:.4f}")


Theta (intercept + coefficients): [55.30595436  7.48410452 17.54367898  0.13035214  0.49154551  0.74818848]
Test MSE: 4.2969, Test RMSE: 2.0729


**Mini-Batch Gradient Descent with Elastic Net Regularization**

In [14]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import SGDRegressor
from sklearn.metrics import mean_squared_error

# =======================
# 1. Load and Prepare Data
# =======================
data = pd.read_csv("Student_Performance.csv")
data["Extracurricular Activities"] = data["Extracurricular Activities"].map({"Yes": 1, "No": 0})

X = data[[
    "Hours Studied",
    "Previous Scores",
    "Extracurricular Activities",
    "Sleep Hours",
    "Sample Question Papers Practiced"
]].values
y = data["Performance Index"].values

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# =======================
# 2. Standardize Features
# =======================
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled  = scaler.transform(X_test)

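# =======================
# 3. Elastic Net Regression with SGD
# =======================
alpha = 0.1           # regularization strength
l1_ratio = 0.5        # L1 vs L2 ratio
learning_rate = 0.01
n_epochs = 300        # passes of the partial_fit loop below
batch_size = 64       # mini-batch size

elasticnet_sgd = SGDRegressor(
    penalty='elasticnet',
    alpha=alpha,
    l1_ratio=l1_ratio,
    max_iter=500,             # epoch cap for the initial fit() call only
    learning_rate='constant',
    eta0=learning_rate,
    random_state=42,
    shuffle=True
)
# Initial full fit (per-sample SGD over the whole training set)
elasticnet_sgd.fit(X_train_scaled, y_train)


# Continue training from the fitted weights, one mini-batch at a time.
# np.random.permutation is unseeded here, so results vary between runs.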
for epoch in range(n_epochs):
    permutation = np.random.permutation(len(X_train_scaled))
    X_shuffled = X_train_scaled[permutation]
    y_shuffled = y_train[permutation]
    
    for i in range(0, len(X_train_scaled), batch_size):
        X_batch = X_shuffled[i:i+batch_size]
        y_batch = y_shuffled[i:i+batch_size]
        elasticnet_sgd.partial_fit(X_batch, y_batch)


theta = np.r_[elasticnet_sgd.intercept_, elasticnet_sgd.coef_]  # intercept + coefficients
print("Theta (intercept + coefficients):", theta)


y_test_pred = elasticnet_sgd.predict(X_test_scaled)

test_mse = mean_squared_error(y_test, y_test_pred)
test_rmse = np.sqrt(test_mse)

print(f"Test MSE: {test_mse:.4f}, Test RMSE: {test_rmse:.4f}")



Theta (intercept + coefficients): [55.21669077  6.83117165 16.89344438  0.30812886  0.58569076  0.48996788]
Test MSE: 5.1998, Test RMSE: 2.2803


**Comparison**

### **Without Using Scikit-Learn — Without Regularization**

| Method                     | Simple Linear Regression (θ, RMSE)                     | Multiple Linear Regression (θ, RMSE)                                 |
|---------------------------|---------------------------------------------------------|-----------------------------------------------------------------------|
| **Batch Gradient Descent** | θ = [7.1896, 3.9988]<br>RMSE = 0.9964                  | θ = [52.5025, 6.9812, 16.8005, 0.3908, 0.7847, 0.5547]<br>RMSE = 3.5309 |
| **Stochastic Gradient Descent** | θ = [7.1095, 3.9871]<br>RMSE = 0.9048            | θ = [55.4557, 7.2101, 17.6285, 0.2059, 0.8883, 0.3574]<br>RMSE = 2.0722 |
| **Mini-Batch Gradient Descent** | θ = [7.0763, 3.9755]<br>RMSE = 0.9011            | θ = [55.1077, 7.2528, 17.6219, 0.2279, 0.6351, 0.6667]<br>RMSE = 2.0577 |


### **Without Using Scikit-Learn — With Regularization**

| Method                     | Simple Linear Regression (θ, RMSE)                      | Multiple Linear Regression (θ, RMSE)                                 |
|---------------------------|----------------------------------------------------------|-----------------------------------------------------------------------|
| **Batch Gradient Descent** | θ = [7.1906, 3.9577]<br>RMSE = 0.8982                   | θ = [54.857, 7.329, 17.544, 0.3201, 0.8129, 0.5591]<br>RMSE = 2.0746 |
| **Stochastic Gradient Descent** | θ = [7.2156, 3.9531]<br>RMSE = 0.8981             | θ = [55.2368, 7.3879, 17.6651, 0.2918, 0.8224, 0.5564]<br>RMSE = 2.0376 |
| **Mini-Batch Gradient Descent** | θ = [7.2187, 3.9597]<br>RMSE = 0.8988             | θ = [55.222, 7.3867, 17.6605, 0.3069, 0.8146, 0.5567]<br>RMSE = 2.0375 |


### **Using Scikit-Learn — Without Regularization**

| Method                     | Simple Linear Regression (θ, RMSE)                   | Multiple Linear Regression (θ, RMSE)                                  |
|---------------------------|-------------------------------------------------------|------------------------------------------------------------------------|
| **Batch Gradient Descent** | θ = [7.2222, 3.9937]<br>RMSE = 0.9962               | θ = [55.2248, 7.3869, 17.662, 0.3064, 0.8149, 0.5557]<br>RMSE = 2.0375 |
| **Stochastic Gradient Descent** | θ = [7.1999, 4.0404]<br>RMSE = 1.0246         | θ = [54.9472, 7.6044, 17.6104, 0.2578, 0.8618, 0.3464]<br>RMSE = 2.0799 |
| **Mini-Batch Gradient Descent** | θ = [7.2153, 4.0030]<br>RMSE = 0.9384         | θ = [54.8527, 7.3294, 17.5400, 0.3099, 0.8135, 0.5616]<br>RMSE = 2.0755 |


### **Using Scikit-Learn — With Regularization**

| Method                     | Simple Linear Regression (θ, RMSE)                     | Multiple Linear Regression (θ, RMSE)                                   |
|---------------------------|----------------------------------------------------------|-------------------------------------------------------------------------|
| **Batch Gradient Descent** | θ = [7.2224, 3.9936]<br>RMSE = 0.9962                  | θ = [55.3115, 7.4012, 17.6370, 0.3043, 0.8100, 0.5488]<br>RMSE = 2.0206 |
| **Stochastic Gradient Descent** | θ = [7.3159, 3.9809]<br>RMSE = 0.9975            | θ = [55.3060, 7.4841, 17.5437, 0.1304, 0.4915, 0.7482]<br>RMSE = 2.0729 |
| **Mini-Batch Gradient Descent** | θ = [7.3477, 3.9681]<br>RMSE = 0.9550            | θ = [55.2167, 6.8312, 16.8934, 0.3081, 0.5857, 0.4900]<br>RMSE = 2.2803 |


## **Conclusion**

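The experiments indicate that **regularization helps keep linear regression models stable**, although on these datasets the measured gain in RMSE is modest.

Without regularization, the from-scratch batch gradient descent run on the multiple-regression data produced a clearly **higher RMSE** (3.5309). Its intercept (52.50) is well short of the value of roughly 55 reached by the other runs, which points to incomplete convergence more than to overfitting. The remaining unregularized multiple-regression runs fall between about 2.04 and 2.08.

With regularization applied:

- RMSE stayed the same or dropped slightly in most configurations  
- Coefficient magnitudes were penalized, which limits model complexity  
- The risk of overfitting was reduced  
- The from-scratch batch, stochastic, and mini-batch runs gave more consistent results  

The best performance occurred in **Multiple Linear Regression using scikit-learn with regularization** (Ridge), achieving the lowest RMSE of **2.0206**; the unregularized scikit-learn fit is close behind at 2.0375.

Some RMSE values above are computed on training data and others on a held-out test split, so differences in the second decimal place should not be over-interpreted.

Overall, **regularization is a useful safeguard** for building accurate and robust linear regression models, even when a small, well-behaved feature set leaves little overfitting to remove.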
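
A like-for-like comparison is easier to read when every model is scored on the same held-out split. The following is a minimal sketch of such a comparison, not the notebook's actual pipeline: it uses a synthetic five-feature dataset as a stand-in for `Student_Performance.csv`, and the coefficients, noise level, and hyperparameters are assumptions chosen only for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, SGDRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real dataset (assumption: 5 features, linear target)
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
true_w = np.array([7.4, 17.6, 0.3, 0.8, 0.55])
y = 55.0 + X @ true_w + rng.normal(scale=2.0, size=1000)

# One split and one scaler shared by every model
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

sgd_args = dict(learning_rate='constant', eta0=0.01, max_iter=1000, random_state=42)
models = {
    "No regularization": LinearRegression(),
    "Ridge (L2)": Ridge(alpha=0.1),
    "Lasso via SGD (L1)": SGDRegressor(penalty='l1', alpha=0.1, **sgd_args),
    "Elastic Net via SGD": SGDRegressor(penalty='elasticnet', alpha=0.1, l1_ratio=0.5, **sgd_args),
}

# Fit each model on the training split and score it on the same test split
results = {}
for name, model in models.items():
    model.fit(X_train_s, y_train)
    rmse = np.sqrt(mean_squared_error(y_test, model.predict(X_test_s)))
    results[name] = rmse
    print(f"{name:22s} test RMSE = {rmse:.4f}")
```

Swapping the synthetic arrays for the real features and target would give one table in which every RMSE is a test RMSE.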
