# Car Price Prediction using Custom Linear Regression

## Introduction

This project predicts car prices with a custom implementation of Linear Regression. The model uses three numeric features (engine size, wheelbase, and horsepower) to predict each car's price.

---

## Workflow

1. Import necessary libraries and load the dataset.
2. Explore and preprocess the dataset.
3. Split the data into training and testing sets.
4. Implement a custom `LinearRegression` class.
5. Train the model to calculate coefficients and intercept.
6. Make predictions on the test data.
7. Evaluate the model using metrics such as MAE, MSE, and R².
8. Visualize the results.

---

## Dataset Information

**Dataset Name:** CarPrice_Assignment.csv  

### Features:
1. **wheelbase**: The distance between the centers of the front and rear wheels.
2. **enginesize**: The displacement of the car's engine (values in this dataset, such as 130, appear to be in cubic inches rather than cubic centimeters).
3. **horsepower**: The power output of the car's engine.
4. **price**: The price of the car in dollars (target variable).

---

## Objective

To implement and evaluate a custom Linear Regression model for predicting car prices based on the selected features.
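
For reference, the `fit` method later in this notebook computes its parameters with the Normal Equation. A sketch of that formula follows, using notation introduced here as an assumption: $X$ is the $n \times (p+1)$ feature matrix with a leading column of ones, $y$ is the vector of prices, and $\hat{\beta}$ holds the intercept followed by the $p$ coefficients.

$$
\hat{\beta} = (X^\top X)^{-1} X^\top y, \qquad \hat{y} = X \hat{\beta}
$$

Here $\hat{\beta}_0$ corresponds to `intercept_` and $\hat{\beta}_1, \dots, \hat{\beta}_p$ correspond to `coef_`. The inverse exists only when the columns of $X$ are linearly independent.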


In [1]:
# Import necessary libraries
from sklearn.model_selection import train_test_split  # For splitting data into training and testing sets
import pandas as pd                                   # For data manipulation
import numpy as np                                    # For numerical operations

# For data visualization
import plotly.express as px                           # For creating interactive 3D scatter plots
import plotly.graph_objects as go                    # For advanced plot customization

# For evaluation metrics
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score  # For performance evaluation

# Load the dataset
# Focus only on relevant columns: 'enginesize', 'wheelbase', 'horsepower', and 'price'
dataset = pd.read_csv(r"Data/CarPrice_Assignment.csv", usecols=["enginesize", "wheelbase", "horsepower", "price"])

# View the first few rows of the dataset
dataset.head()

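|   | wheelbase | enginesize | horsepower | price |
|---|-----------|------------|------------|---------|
| 0 | 88.6 | 130 | 111 | 13495.0 |
| 1 | 88.6 | 130 | 111 | 16500.0 |
| 2 | 94.5 | 152 | 154 | 16500.0 |
| 3 | 99.8 | 109 | 102 | 13950.0 |
| 4 | 99.4 | 136 | 115 | 17450.0 |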
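
The workflow lists an explore-and-preprocess step. A minimal sketch of what such a check might look like is shown below. It runs on a small stand-in frame named `sample` (a hypothetical name) built from the five preview rows, so the numbers it prints describe only those rows, not the full dataset.

```python
import pandas as pd

# Stand-in data copied from the five preview rows; in the notebook itself
# the same calls would be applied to the `dataset` DataFrame.
sample = pd.DataFrame({
    "wheelbase": [88.6, 88.6, 94.5, 99.8, 99.4],
    "enginesize": [130, 130, 152, 109, 136],
    "horsepower": [111, 111, 154, 102, 115],
    "price": [13495.0, 16500.0, 16500.0, 13950.0, 17450.0],
})

# Number of missing values in each column
missing_per_column = sample.isna().sum()

# Count, mean, standard deviation, min, quartiles, and max per column
summary = sample.describe()

# Pearson correlation of each feature with the target
corr_with_price = sample.corr()["price"].drop("price")

print(missing_per_column)
print(summary)
print(corr_with_price)
```

Missing values matter here because the closed-form fit used later has no handling for `NaN` entries.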


In [2]:
# Custom implementation of a Linear Regression model
class LinearRegression:
    def __init__(self):
        """
        Initialize the model with placeholders for slope (coef_) and intercept.
        """
        self.coef_ = None  # Coefficients (slopes)
        self.intercept_ = None  # Intercept (bias term)
        
    def fit(self, X, y):
        """
        Train the Linear Regression model using the training data.
        Arguments:
        - X: Independent variable (features), expects a DataFrame or Series.
        - y: Dependent variable (target), expects a Series.
        """
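        # Convert inputs to NumPy arrays; squeeze() would turn a single-column
        # X into 1-D and make the axis=1 insert below fail
        X_values = np.asarray(X, dtype=float)
        y_values = np.asarray(y, dtype=float).ravel()
        if X_values.ndim == 1:
            X_values = X_values.reshape(-1, 1)  # A Series becomes one feature column
        # Add a column of ones to X for the intercept
        X_values = np.insert(X_values, 0, 1, axis=1)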
        X_transpose = X_values.T
        # Calculate Beta (model parameters) using the Normal Equation
        Beta = np.linalg.inv(np.dot(X_transpose, X_values)).dot(X_transpose).dot(y_values)
        self.intercept_ = Beta[0]  # The first value is the intercept
        self.coef_ = Beta[1:]  # Remaining values are the slopes (coefficients)
        
    def predict(self, X):
        """
        Predict the target values for given input features.
        Arguments:
        - X: Input features, can be a list, NumPy array, or DataFrame.
        Returns:
        - Predicted values as a NumPy array.
        """
        # Convert lists, Series, and DataFrames to a float NumPy array
        X = np.asarray(X, dtype=float)
        if X.ndim == 1:
            # 1-D input: infer the row layout from the number of fitted coefficients
            X = X.reshape(-1, self.coef_.size)
        # Add a column of ones for the intercept
        X = np.insert(X, 0, 1, axis=1).T
        Beta = np.insert(self.coef_, 0, self.intercept_)
        y_pred = np.dot(Beta, X)  # Predict using the model parameters
        return np.ravel(y_pred)
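
One way to sanity-check the Normal Equation used in `fit` is to compare it with NumPy's least-squares solver on data where the true parameters are known. The sketch below does that on synthetic data. The data and the names `X_design`, `beta_inv`, and `beta_lstsq` are assumptions made for illustration, not part of the notebook.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (not the car dataset): y = 4 + 2*x1 - 3*x2 + small noise
X = rng.normal(size=(200, 2))
y = 4.0 + X @ np.array([2.0, -3.0]) + rng.normal(scale=0.1, size=200)

# Design matrix with a leading column of ones for the intercept
X_design = np.column_stack([np.ones(len(X)), X])

# Normal Equation with an explicit inverse, mirroring the fit method
beta_inv = np.linalg.inv(X_design.T @ X_design) @ X_design.T @ y

# Least-squares solver: minimizes the same objective without forming the inverse
beta_lstsq, *_ = np.linalg.lstsq(X_design, y, rcond=None)

print("normal equation:", beta_inv)
print("lstsq:          ", beta_lstsq)
print("match:", np.allclose(beta_inv, beta_lstsq))
```

On well-conditioned data the two approaches agree closely. When features are nearly collinear, the explicit inverse can become numerically unstable, which is why a solver such as `np.linalg.lstsq` is often preferred in practice.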

In [3]:
# Define the features (independent variables) and target variable
X = dataset[["wheelbase", "enginesize", "horsepower"]]  # Features
y = dataset["price"]  # Target variable

# Split the data into training and testing sets (80% training, 20% testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=9)
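
To illustrate what `test_size` and `random_state` control, here is a small sketch on a hypothetical ten-row frame (`demo` is an illustrative name, not part of the notebook). It shows the 80/20 row counts and that repeating the call with the same seed selects the same rows.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical ten-row frame used only to show the split behaviour
demo = pd.DataFrame({"feature": np.arange(10), "target": np.arange(10) * 2})
X_demo, y_demo = demo[["feature"]], demo["target"]

# Two calls with the same seed
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=9)
X_tr2, X_te2, y_tr2, y_te2 = train_test_split(X_demo, y_demo, test_size=0.2, random_state=9)

print(len(X_tr), len(X_te))            # row counts of the training and test sets
print(X_te.index.equals(X_te2.index))  # whether both calls picked the same test rows
```

Fixing the seed keeps the reported metrics reproducible between runs. A different seed yields a different test set and therefore somewhat different scores.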

In [4]:
# Initialize the Linear Regression model
lr = LinearRegression()

# Train the model on the training data
lr.fit(X_train, y_train)

In [5]:
# Make predictions on the test set
y_pred = lr.predict(X_test)

In [6]:
# Evaluate the model using MAE, MSE, and R² metrics
mae = mean_absolute_error(y_test, y_pred)  # Mean Absolute Error
mse = mean_squared_error(y_test, y_pred)  # Mean Squared Error
r2 = r2_score(y_test, y_pred)             # R-squared Score

print("MAE:", mae)
print("MSE:", mse)
print("R² Score:", r2)

MAE: 2221.3576786830836
MSE: 8857447.338987635
R² Score: 0.8336004645348142
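
To make the three metrics concrete, the sketch below computes them by hand with NumPy on four made-up values (not taken from the car dataset). The names `y_true` and `y_hat` are illustrative. RMSE is included as an extra because it is expressed in the same units as the target.

```python
import numpy as np

# Small made-up example values
y_true = np.array([10.0, 12.0, 15.0, 20.0])
y_hat = np.array([11.0, 11.0, 16.0, 18.0])

residuals = y_true - y_hat  # prediction errors

mae = np.mean(np.abs(residuals))   # mean absolute error
mse = np.mean(residuals ** 2)      # mean squared error
rmse = np.sqrt(mse)                # root mean squared error, in the target's units

ss_res = np.sum(residuals ** 2)                  # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
r2 = 1 - ss_res / ss_tot                         # share of variance explained

print("MAE:", mae)
print("MSE:", mse)
print("RMSE:", rmse)
print("R²:", r2)
```

Applied to the MSE reported above, the square root gives an RMSE of roughly 2,976 in the units of `price`, which is easier to compare with the MAE than the squared value.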
