# Linear Regression

In this assignment, you will implement linear regression from scratch and evaluate its performance on the Boston Housing dataset.

### **Import Libraries**

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

### **Load the dataset**
Use the `pd.read_csv()` function to read the data from the 'HousingData.csv' file.

In [2]:
data = pd.read_csv('HousingData.csv')

### **Data Preparation**

We will split the dataset into training and testing data with an 80/20 split.

In [3]:
from sklearn.preprocessing import StandardScaler

X = data.drop(columns='MEDV').values    # All input features
y = data['MEDV'].values                 # Target variable

scaler = StandardScaler()
X = scaler.fit_transform(X)             # Standardize the inputs (zero mean, unit variance) to avoid overflow during gradient descent

# Split into train and test sets; fixing random_state makes the split identical on every run
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Drop training rows that contain NaN or Inf in any feature
train_mask = np.isfinite(X_train).all(axis=1)
X_train, y_train = X_train[train_mask], y_train[train_mask]

# Do the same for the test set
test_mask = np.isfinite(X_test).all(axis=1)
X_test, y_test = X_test[test_mask], y_test[test_mask]

### **Implement Linear Regression from Scratch**

The three major tasks in this process are:
* Fitting the model using gradient descent
* Predicting values for test data
* Calculating the Mean Squared Error (MSE)

Since this implementation handles multiple input features, it is called multiple linear regression (often loosely referred to as multivariate linear regression).
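
For reference, the gradient descent update that `fit` relies on can be sketched as follows. This is a minimal illustration on synthetic data rather than the required solution: the function name `gradient_descent_fit` and the toy data are assumptions made for this example, and the gradients shown are those of the MSE loss.

```python
import numpy as np

def gradient_descent_fit(X, y, lr=0.01, epochs=1000):
    """Hypothetical helper: fit weights and bias by batch gradient descent on MSE."""
    n_samples, n_features = X.shape
    weights = np.zeros(n_features)
    bias = 0.0
    for _ in range(epochs):
        y_pred = X @ weights + bias          # current predictions
        error = y_pred - y                   # residuals
        # Gradients of mean((y_pred - y)^2) with respect to weights and bias
        grad_w = (2.0 / n_samples) * (X.T @ error)
        grad_b = 2.0 * error.mean()
        weights -= lr * grad_w
        bias -= lr * grad_b
    return weights, bias

# Toy data (assumed for illustration): y = 3*x1 - 2*x2 + 5, with no noise
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(200, 2))
y_toy = 3.0 * X_toy[:, 0] - 2.0 * X_toy[:, 1] + 5.0

w_toy, b_toy = gradient_descent_fit(X_toy, y_toy, lr=0.05, epochs=2000)
```

On this noise-free toy problem the recovered weights and bias should be very close to the true values. The class in the next cell can follow the same pattern, storing the parameters as attributes instead of returning them.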

In [4]:
class LinearRegression:
    def __init__(self):
        # Initialize weights and bias
        pass

    def fit(self, X, y, lr=0.01, epochs=1000):
        # Initialize parameters

        # Gradient descent
        
        pass

    def predict(self, X):
        pass

    def mean_squared_error(self, y_true, y_pred):
        pass

### **Instantiate the model**
Create an instance of the class named *model* and fit it on the training data with **learning rate = 0.01** and **1000 iterations** (epochs).

In [5]:
# Initialize the model
model = None

# Train the model


### **Evaluate the Model**
Evaluate the model's performance on the test set using:
1. Mean Squared Error (MSE)
2. R-squared Score
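
As a quick illustration of what the two metrics measure, here is a small sketch on made-up values. The arrays and the `_toy` variable names are assumptions for this example only and are unrelated to the housing data.

```python
import numpy as np

# Toy values, assumed for illustration only
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.0, 5.0, 8.0, 9.0])

# MSE: mean of the squared residuals
mse_toy = np.mean((y_true - y_pred) ** 2)

# R-squared: 1 - (residual sum of squares / total sum of squares)
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_toy = 1 - ss_res / ss_tot

print(mse_toy, r2_toy)
```

Here the squared residuals sum to 2 over 4 samples, giving an MSE of 0.5, and the total sum of squares around the mean is 20, giving an R-squared of 0.9. A lower MSE and an R-squared closer to 1 both indicate a better fit.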

In [6]:
# Define the R2_score function
def R2_score(y_test, y_test_pred):
    ss_res = np.sum((y_test - y_test_pred) ** 2)      # residual sum of squares
    ss_tot = np.sum((y_test - y_test.mean()) ** 2)    # total sum of squares
    return 1 - ss_res / ss_tot

In [8]:
# Predict on test data
y_test_pred = None

# Calculate evaluation metrics
mse = None
r2 = None

# print(f"Mean Squared Error: {mse:.2f}")
# print(f"R-squared Score: {r2:.2f}")


### **Compare Your Results with sklearn's Linear Regression**
To validate your implementation, compare its results with those of sklearn's `LinearRegression` model.
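
sklearn's `LinearRegression` fits by ordinary least squares, so a well-converged gradient descent model should land close to the same weights and bias. As a rough sketch of what that reference solution looks like, the snippet below solves the least-squares problem directly with `np.linalg.lstsq` on synthetic data. The helper name `least_squares_fit` and the toy data are assumptions for illustration.

```python
import numpy as np

def least_squares_fit(X, y):
    """Hypothetical helper: closed-form least-squares weights and bias."""
    # Append a column of ones so the last coefficient acts as the bias
    X_aug = np.hstack([X, np.ones((X.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
    return coef[:-1], coef[-1]

# Toy data (assumed for illustration): known weights and bias, no noise
rng = np.random.default_rng(1)
X_toy = rng.normal(size=(100, 3))
y_toy = X_toy @ np.array([1.5, -0.5, 2.0]) + 4.0

w_ref, b_ref = least_squares_fit(X_toy, y_toy)
```

When comparing parameters from gradient descent against a reference like this, a tolerance-based check such as `np.allclose` is usually more informative than exact equality, since gradient descent only approaches the optimum.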

In [None]:
from sklearn.linear_model import LinearRegression as SklearnLinearRegression  # alias avoids shadowing the custom LinearRegression class

lr = SklearnLinearRegression()
lr.fit(X_train, y_train)

# Predict and evaluate
y_test_pred_sklearn = lr.predict(X_test)
mse_sklearn = mean_squared_error(y_test, y_test_pred_sklearn)
r2_sklearn = r2_score(y_test, y_test_pred_sklearn)

print(f"Sklearn Model's Mean Squared Error: {mse_sklearn:.2f}")
print(f"Sklearn Model's R-squared Score: {r2_sklearn:.2f}")

# Compare weights and bias
# print(f"Your Model's Weights: {model.weights}, Bias: {model.bias}")
# print(f"Sklearn Model's Weights: {lr.coef_}, Bias: {lr.intercept_}")