## Simple Linear Regression From Scratch 

#### Goal: Find the best straight line through data points
#### Formula: y = wx + b

- Where:
  - w = slope (how steep the line is)
  - b = y-intercept (where the line crosses the y-axis)
  - x = input (independent variable)
  - y = output (dependent variable)
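
As a quick illustration of the formula, the sketch below evaluates y = wx + b for a few inputs. The values w = 2.5 and b = 3.0 are illustrative. They were picked only because they match the defaults of the data generator used later. The function name `line` is hypothetical.

```python
def line(x, w, b):
    """Evaluate the straight line y = w*x + b at a single input x."""
    return w * x + b


w, b = 2.5, 3.0  # illustrative slope and intercept
for x in [0, 2, 4]:
    print(f"x={x} -> y={line(x, w, b)}")
```

At x = 0 the output is just b, which is what "intercept" means. Each unit step in x adds w to y, which is what "slope" means.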

--------------------------------------------------------------------------------
### OOP Concepts Covered:
1. Class - Blueprint for creating regression models
2. Constructor (`__init__`) - Initializes a model when it is created
3. Instance Variables (`self.w`, `self.b`) - Data stored in each model
4. Methods (`fit`, `predict`, `get_params`) - Actions the model can perform
5. Encapsulation - Bundling data and methods together
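
The concepts above can be seen in a deliberately tiny class. `TinyModel` is hypothetical and is not part of this notebook. Each object gets its own copies of the instance variables, so changing one model leaves another untouched.

```python
class TinyModel:
    """Hypothetical minimal class used only to illustrate instance state."""

    def __init__(self, learning_rate=0.01):
        # Instance variables: each object stores its own copies
        self.lr = learning_rate
        self.w = 0.0

    def nudge(self, amount):
        # A method that modifies only this object's own state
        self.w += amount
        return self.w


fast = TinyModel(learning_rate=0.1)
slow = TinyModel()  # uses the default learning rate 0.01

fast.nudge(1.5)
print(fast.lr, fast.w)  # 0.1 1.5
print(slow.lr, slow.w)  # 0.01 0.0
```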

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import math
import random  # the only import used in this section
```

```python
class SimpleLinearRegression:
    """
    This class represents ONE linear regression model.
    Each model learns its own slope (w) and intercept (b).
    """

    # CONSTRUCTOR: initialize a new model
    def __init__(self, learning_rate=0.01, iterations=1000):
        """
        When SimpleLinearRegression() is called:
        1. Python creates a new object in memory.
        2. __init__ is called automatically.
        3. self.lr is set to learning_rate (default 0.01).
        4. self.iterations is set to iterations (default 1000).
        5. The other attributes are set to their starting values.
        """

        # Instance variables (attributes)
        # These belong to THIS specific model instance;
        # different models can hold different values.

        # Store hyperparameters (settings we choose)
        self.lr = learning_rate
        self.iterations = iterations

        # Model parameters (what the model LEARNS)
        self.w = 0.0
        self.b = 0.0

        # Training history (for visualization)
        self.cost_history = []  # cost at each iteration

        # Status flag
        self.is_trained = False  # has the model been trained yet?

        print(f"Learning rate: {self.lr}")
        print(f"Iterations: {self.iterations}")
        print(f"Initial Slope(w): {self.w}")
        print(f"Intial Intercept(b): {self.b}")
        print()

    # METHOD: fit() -----> train the model (learn w and b)
    def fit(self, X, y, verbose=False):
        """
        Train the model to find the best w and b with gradient descent.

        Parameters:
        X : list
            Input values (independent variable)
            e.g. [1, 2, 3, 4, 5]

        y : list
            Output values (dependent variable)
            e.g. [90, 80, 30, 40, 20]

        verbose : bool
            If True, print the cost and parameters on the first, every
            100th, and the last iteration (off by default).

        Gradient Descent Steps:
        1. Start with zero values for w and b (set in __init__)
        2. Make predictions with the current w and b
        3. Calculate how wrong we are (cost function)
        4. Calculate gradients (which direction to adjust w and b)
        5. Update w and b in the direction opposite to the gradients
        6. Repeat steps 2-5 for self.iterations iterations
        7. With a suitable learning rate, w and b converge toward the best fit
        """

        # Number of training examples
        n = len(X)
        print(f"Number of training examples: {n}\n")

        if verbose:
            print(f"{'Iter':<6} {'Cost':<15} {'w':<15} {'b':<15}")

        # Gradient descent loop
        for iteration in range(self.iterations):
            # Predictions with the current parameters: y_pred = w * x + b
            predictions = []
            for i in range(n):
                y_pred = self.w * X[i] + self.b
                predictions.append(y_pred)

            # Calculate cost (mean squared error)
            # Cost measures how bad our predictions are:
            # lower cost --> better predictions
            # Formula: Cost = (1/n) * Σ(y_pred - y_actual)²
            total_error = 0.0
            for i in range(n):
                error = predictions[i] - y[i]
                total_error += error ** 2
            cost = total_error / n
            self.cost_history.append(cost)

            # Calculate gradients
            # Gradients tell us HOW to change w and b to reduce the cost.
            # They are the partial derivatives of the cost:
            # ∂Cost/∂w = (2/n) * Σ(y_pred - y_actual) * x
            # ∂Cost/∂b = (2/n) * Σ(y_pred - y_actual)

            # Gradient for w (slope)
            gradient_w = 0.0
            for i in range(n):
                error = predictions[i] - y[i]
                gradient_w += error * X[i]
            gradient_w = (2 / n) * gradient_w

            # Gradient for b (intercept)
            gradient_b = 0.0
            for i in range(n):
                error = predictions[i] - y[i]
                gradient_b += error
            gradient_b = (2 / n) * gradient_b

            # Update parameters: take a step against the gradient
            self.w = self.w - self.lr * gradient_w
            self.b = self.b - self.lr * gradient_b

            # Optional progress report (uses the loop variable, not self.iterations)
            if verbose and (iteration == 0
                            or (iteration + 1) % 100 == 0
                            or iteration == self.iterations - 1):
                print(f"{iteration + 1:<6} {cost:<15.2f} {self.w:<15.2f} {self.b:<15.2f}")

        self.is_trained = True
        print("Training completed")
        print(f"Final Slope (w):\n{self.w:.2f}")
        print(f"Final Intercept (b):\n{self.b:.2f}")
        print(f"Final Cost:\n{cost:.4f}")
        print(f"Learned Equation: y = {self.w:.4f}x + {self.b:.4f}")
        print()

    # METHOD: predict()
    def predict(self, X):
        """
        Make predictions using the learned equation: y = wx + b

        Parameters:
        X : list or single value
            Input value(s) to make predictions for

        Returns:
        list or float : predicted y value(s)

        e.g.
        model.predict([6, 7, 8]) ---> predict for multiple values
        model.predict(6)         ---> predict for a single value

        How it works:
        1. Check that the model is trained (we can't predict without w and b)
        2. For each x, calculate y = wx + b
        3. Return the predictions
        """

        if not self.is_trained:
            raise RuntimeError("Model must be trained with fit() before making predictions")

        # Handle a single value vs. a list
        if isinstance(X, (int, float)):
            # Single value --> return a single prediction
            return self.w * X + self.b

        # List of values --> return a list of predictions
        return [self.w * x + self.b for x in X]

    # METHOD: get the model parameters
    def get_params(self):
        """
        Get the learned parameters.

        Returns:
        dict : {'w': slope, 'b': intercept}
        """
        return {'w': self.w, 'b': self.b}
```
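
The sketch below performs one pass of the `fit` loop by hand on a tiny, made-up dataset. The data are X = [1, 2, 3] and y = [2, 4, 6], so the true line is y = 2x. `gradient_step` is a hypothetical helper. It mirrors the prediction, cost, gradient and update steps of `fit`, written with list comprehensions instead of index loops.

```python
def gradient_step(X, y, w, b, lr):
    """One gradient-descent update for y = w*x + b under the MSE cost."""
    n = len(X)
    predictions = [w * x + b for x in X]
    errors = [p - t for p, t in zip(predictions, y)]

    cost = sum(e ** 2 for e in errors) / n                    # (1/n) * Σ error²
    grad_w = (2 / n) * sum(e * x for e, x in zip(errors, X))  # ∂Cost/∂w
    grad_b = (2 / n) * sum(errors)                            # ∂Cost/∂b

    # Move against the gradient
    return w - lr * grad_w, b - lr * grad_b, cost


X, y = [1, 2, 3], [2, 4, 6]
w, b, cost = gradient_step(X, y, w=0.0, b=0.0, lr=0.1)
print(f"cost={cost:.4f}, w={w:.4f}, b={b:.4f}")  # cost=18.6667, w=1.8667, b=0.8000
```

Starting from w = b = 0, every prediction is too low, so both gradients are negative. The update therefore pushes w and b upward. Repeated steps move them toward w = 2 and b = 0 on this data.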
            
```python
# Demonstration helper: generate synthetic data
def generate_linear_data(n_samples=50, w_true=2.5, b_true=3.0, noise=5.0):
    """
    Generate synthetic data with a linear relationship.

    Formula: y = w_true * x + b_true + noise_value

    Parameters:
    n_samples : int
        Number of data points to generate
    w_true : float
        True slope (what we want the model to learn)
    b_true : float
        True intercept
    noise : float
        Width of the uniform random noise added to each y (makes the data
        realistic); each noise value lies in [-noise/2, noise/2)

    Returns:
    X, y : lists of inputs (x in [0, 10)) and outputs
    """
    X = []
    y = []

    for _ in range(n_samples):
        x = random.random() * 10  # x in [0, 10)
        y_true = w_true * x + b_true

        # Add uniform random noise in [-noise/2, noise/2)
        noise_value = (random.random() - 0.5) * noise
        y_noisy = y_true + noise_value

        X.append(x)
        y.append(y_noisy)
    return X, y
```
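
The `noise` parameter is easy to misread as a standard deviation. Because the generator computes `(random.random() - 0.5) * noise`, each point's offset from the true line is uniform on [-noise/2, noise/2). The self-contained sketch below re-creates that sampling step with a seeded local generator and measures the residuals against the true line. The helper name `sample_noisy_line` and the seed are illustrative assumptions, not part of the notebook.

```python
import random


def sample_noisy_line(n_samples, w_true, b_true, noise, seed=0):
    """Same sampling scheme as generate_linear_data, with a fixed seed (hypothetical helper)."""
    rng = random.Random(seed)  # local generator so the check is reproducible
    X, y = [], []
    for _ in range(n_samples):
        x = rng.random() * 10                       # x in [0, 10)
        noise_value = (rng.random() - 0.5) * noise  # in [-noise/2, noise/2)
        X.append(x)
        y.append(w_true * x + b_true + noise_value)
    return X, y


X, y = sample_noisy_line(1000, w_true=2.5, b_true=3.0, noise=5.0)
residuals = [yi - (2.5 * xi + 3.0) for xi, yi in zip(X, y)]
mse_true_line = sum(r * r for r in residuals) / len(residuals)

print(f"min residual {min(residuals):.3f}, max residual {max(residuals):.3f}")
print(f"MSE of the true line: {mse_true_line:.3f} (theory: {5.0 ** 2 / 12:.3f})")
```

For uniform noise of width 5.0, the variance is 5.0²/12 ≈ 2.08. Even the true line therefore has a mean squared error of roughly 2.08 on data like this. That is the scale to compare against the final training cost of about 2.11 in the recorded output.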

```python
def main():
    """
    Main function demonstrating the complete workflow.
    """
    # Generate synthetic data
    print("Generating Synthetic data...")
    print()

    X, y = generate_linear_data(
        n_samples=100,
        w_true=2.5,
        b_true=3.0,
        noise=5.0,
    )

    print(f"Generated {len(X)} data points")
    for i in range(10):
        print(f" x={X[i]:.2f} -> y ={y[i]:.2f}")

    # Split into training and test sets (80/20).
    # The points are generated in random order, so a plain slice gives a random split.
    split_idx = int(0.8 * len(X))
    X_train = X[:split_idx]
    y_train = y[:split_idx]

    X_test = X[split_idx:]
    y_test = y[split_idx:]

    print(f"Training set: {len(X_train)} samples")
    print(f"Test set: {len(X_test)} samples")

    # Create and train the model
    model = SimpleLinearRegression(learning_rate=0.01, iterations=1000)
    model.fit(X_train, y_train)

    # Make predictions
    y_pred_train = model.predict(X_train)
    y_pred_test = model.predict(X_test)

    print("First 5 test predictions vs actual:")
    for i in range(min(5, len(y_test))):
        print(f" x={X_test[i]:.2f} -> Predicted: {y_pred_test[i]:.2f}, "
              f" Actual: {y_test[i]:.2f}, Error: {abs(y_pred_test[i] - y_test[i]):.2f}")


if __name__ == "__main__":
    main()
```

```text
Generating Synthetic data...

Generated 100 data points
 x=0.64 -> y =5.96
 x=5.10 -> y =16.99
 x=5.38 -> y =14.87
 x=6.73 -> y =18.35
 x=7.89 -> y =24.83
 x=8.88 -> y =27.45
 x=3.03 -> y =12.56
 x=8.42 -> y =25.42
 x=7.02 -> y =19.24
 x=5.14 -> y =17.08
Training set: 80 samples
Test set: 20 samples
Learning rate: 0.01
Iterations: 1000
Initial Slope(w): 0.0
Intial Intercept(b): 0.0

Number of training examples: 80

Training completed
Final Slope (w):
2.57
Final Intercept (b):
2.71
Final Cost:
2.1131
Learned Equation: y = 2.5730x + 2.7097

First 5 test predictions vs actual:
 x=3.77 -> Predicted: 12.42,  Actual: 12.13, Error: 0.29
 x=7.71 -> Predicted: 22.56,  Actual: 22.06, Error: 0.50
 x=2.99 -> Predicted: 10.40,  Actual: 10.36, Error: 0.03
 x=4.89 -> Predicted: 15.28,  Actual: 17.42, Error: 2.14
 x=2.73 -> Predicted: 9.74,  Actual: 8.34, Error: 1.40
```
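
Simple linear regression also has a closed-form least-squares solution, which can serve as a sanity check on the gradient-descent result:

- w = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)²
- b = ȳ - w·x̄

This is a different technique from the one used in `fit` and is shown here only for comparison. The helper name `closed_form_fit` is hypothetical. On the training data it gives the exact minimizer of the training MSE. Gradient descent should agree with it up to whatever optimization error remains after the chosen number of iterations.

```python
def closed_form_fit(X, y):
    """Ordinary least squares for y = w*x + b (hypothetical helper)."""
    n = len(X)
    x_mean = sum(X) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(X, y))
    sxx = sum((xi - x_mean) ** 2 for xi in X)
    w = sxy / sxx  # requires at least two distinct x values
    b = y_mean - w * x_mean
    return w, b


# Noise-free points on y = 2.5x + 3.0 are recovered exactly
w, b = closed_form_fit([0, 1, 2, 3, 4], [3.0, 5.5, 8.0, 10.5, 13.0])
print(f"w={w:.4f}, b={b:.4f}")  # w=2.5000, b=3.0000
```

Because `model` is local to `main()`, a comparison would go inside `main()` after `model.fit(...)`, for example by printing `closed_form_fit(X_train, y_train)` next to `model.get_params()`.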
