# Multiple Linear Regression Notebook

## Introduction

In this notebook, we'll explore Multiple Linear Regression, an extension of Simple Linear Regression. Multiple Linear Regression enables us to model the relationship between multiple independent variables and a dependent variable by fitting a linear equation to observed data.

### Key Concepts:
- Multiple Linear Regression Equation
- Coefficients (Slopes and Intercept)
- Assumptions of Linear Regression
- Evaluation Metrics: Mean Squared Error (MSE), R-squared

Let's get started!

## 1. Understanding Multiple Linear Regression

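Multiple Linear Regression models the relationship between several independent variables \( x_1, x_2, \dots, x_p \) and a dependent variable \( y \) using a linear equation:

\[ y = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_p x_p \]

- \( y \) is the dependent variable.
- \( x_1, x_2, \dots, x_p \) are the independent variables.
- \( b_0 \) is the intercept, the predicted value of \( y \) when every \( x_j \) is 0.
- \( b_1, b_2, \dots, b_p \) are the coefficients; each \( b_j \) is the expected change in \( y \) for a one-unit increase in \( x_j \) with the other variables held fixed.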
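
To make the equation concrete, the sketch below evaluates it for a single house. The function name `predict_one` and the coefficient values are made up for illustration; they are not fitted to any data in this notebook.

```python
def predict_one(intercept, coefficients, features):
    """Evaluate y = b_0 + b_1*x_1 + ... + b_p*x_p for one observation."""
    if len(coefficients) != len(features):
        raise ValueError("need exactly one coefficient per feature")
    return intercept + sum(b * x for b, x in zip(coefficients, features))


# Hypothetical values, chosen only for illustration (not fitted to any data).
b0 = 50_000.0                # intercept
b = [150.0, 10_000.0]        # coefficients for [size, bedrooms]

house = [1500, 3]            # size = 1500, bedrooms = 3
price = predict_one(b0, b, house)
print(price)                 # 50000 + 150*1500 + 10000*3 = 305000.0

# One more bedroom, same size: the prediction moves by the bedroom coefficient.
extra_bedroom = predict_one(b0, b, [1500, 4])
print(extra_bedroom - price)  # 10000.0
```

Increasing the bedroom count by one while keeping the size unchanged shifts the prediction by exactly the bedroom coefficient, which is how each \( b_j \) is usually read.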

## 2. Assumptions of Linear Regression

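The assumptions of Simple Linear Regression carry over:
1. **Linearity**: the expected value of \( y \) is a linear function of the independent variables.
2. **Independence**: the errors of different observations are independent of one another.
3. **Homoscedasticity**: the errors have constant variance across all values of the independent variables.
4. **Normality**: the errors are normally distributed.

With more than one independent variable, the predictors must also not be perfectly collinear. The `drop_first=True` option used when encoding `Location` in the example avoids exactly this.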
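
These assumptions are usually checked on the residuals after fitting. The sketch below uses synthetic data (not the house-price dataset) and plain NumPy to compute three rough numeric indicators. The variable names are illustrative, and what counts as "close enough" is a judgment call. Residual plots remain the more common tool, especially for linearity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data that satisfies the assumptions by construction
# (an assumption of this sketch, not the house-price data).
n = 200
X = rng.normal(size=(n, 2))
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(scale=0.5, size=n)

# Ordinary least squares with an explicit intercept column.
A = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
fitted = A @ coef
residuals = y - fitted

# Independence: lag-1 autocorrelation of the residuals should be near 0.
lag1_autocorr = np.corrcoef(residuals[:-1], residuals[1:])[0, 1]

# Homoscedasticity: residual spread should be similar for low and high
# fitted values, so this ratio should be near 1.
order = np.argsort(fitted)
spread_ratio = residuals[order[n // 2:]].std() / residuals[order[: n // 2]].std()

# Normality: skewness of the residuals should be near 0.
centered = residuals - residuals.mean()
skewness = np.mean(centered ** 3) / np.mean(centered ** 2) ** 1.5

print("coefficients [b0, b1, b2]:", np.round(coef, 2))
print("lag-1 autocorrelation:", round(float(lag1_autocorr), 3))
print("spread ratio (high/low fitted):", round(float(spread_ratio), 3))
print("skewness:", round(float(skewness), 3))
```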

## 3. Example: Predicting House Prices

Consider a dataset of houses with features such as size, number of bedrooms, and location. The goal is to predict each house's price from these features.

### Python Example

```python

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Load the dataset
data = pd.read_csv('house_prices.csv')
data.head()

# Output:
#    Size  Bedrooms  Location   Price
# 0  1500         3  Suburban  300000
# 1  2000         4     Urban  450000
# 2  1200         2     Rural  250000
# 3  2200         5  Suburban  550000
# 4  1800         3     Urban  400000


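# Perform one-hot encoding for the 'Location' column.
# drop_first=True drops the first category ('Rural'), which becomes the baseline.
# dtype=int keeps the dummies as 0/1 (pandas 2.0+ would otherwise produce booleans).
data = pd.get_dummies(data, columns=['Location'], drop_first=True, dtype=int)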

data.head()

# Output:
#    Size  Bedrooms   Price  Location_Suburban  Location_Urban
# 0  1500         3  300000                  1               0
# 1  2000         4  450000                  0               1
# 2  1200         2  250000                  0               0
# 3  2200         5  550000                  1               0
# 4  1800         3  400000                  0               1


# Split data into features (X) and target variable (y)
X = data.drop(columns=['Price'])
y = data['Price']

y

# Output:
# 0     300000
# 1     450000
# 2     250000
# 3     550000
# 4     400000
# 5     350000
# 6     600000
# 7     420000
# 8     580000
# 9     320000
# 10    480000
# 11    380000
# 12    500000
# 13    370000
# 14    270000
# 15    620000
# 16    340000
# 17    260000
# 18    480000
# 19    390000
# 20    360000
# 21    420000
# 22    550000
# Name: Price, dtype: int64

X

# Output (only the first 10 rows appear in the captured output):
#    Size  Bedrooms  Location_Suburban  Location_Urban
# 0  1500         3                  1               0
# 1  2000         4                  0               1
# 2  1200         2                  0               0
# 3  2200         5                  1               0
# 4  1800         3                  0               1
# 5  1600         4                  0               0
# 6  2500         5                  1               0
# 7  1900         4                  0               1
# 8  2300         5                  0               0
# 9  1400         3                  1               0


# Split data into training (80%) and testing (20%) sets; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create Linear Regression model
model = LinearRegression()

# Train the model
model.fit(X_train, y_train)

# Output:
# LinearRegression()

# Make predictions
y_pred = model.predict(X_test)

# Calculate evaluation metrics
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print("Mean Squared Error:", mse)
print("R-squared:", r2)

# Output:
# Mean Squared Error: 1772654438.272059
# R-squared: 0.9232749983434878
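
# Rough cross-check, not part of the original run: recompute both metrics by hand
# from the y_test and y_pred values displayed further down in this notebook.
# The *_shown and *_manual names are introduced here for illustration only.
actual_shown = [620000, 320000, 300000, 580000, 260000]
predicted_shown = [582934.20514923, 339452.39068242, 359834.49121369,
                   524717.61340423, 281787.49489165]

n_test = len(actual_shown)

# MSE is the average squared residual (actual minus predicted).
ss_res = sum((a - f) ** 2 for a, f in zip(actual_shown, predicted_shown))
mse_manual = ss_res / n_test

# R-squared compares the residual sum of squares with the total sum of squares
# around the mean of the actual values.
mean_actual = sum(actual_shown) / n_test
ss_tot = sum((a - mean_actual) ** 2 for a in actual_shown)
r2_manual = 1 - ss_res / ss_tot

# RMSE is the square root of MSE, so it is in the same units as Price.
rmse_manual = mse_manual ** 0.5

# These should agree with the values printed above, up to rounding of the inputs.
print("Manual MSE:", mse_manual)
print("Manual R-squared:", r2_manual)
print("Manual RMSE:", rmse_manual)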


X_test

# Output:
#     Size  Bedrooms  Location_Suburban  Location_Urban
# 15  2400         5                  1               0
# 9   1400         3                  1               0
# 0   1500         3                  1               0
# 8   2300         5                  0               0
# 17  1400         2                  0               0


# Predict the price of one new house; the columns must match those used for training
check = pd.DataFrame(
    {"Size": 2400, "Bedrooms": 2, "Location_Suburban": 1, "Location_Urban": 0},
    index=[1],
)

p = model.predict(check)

p

# Output:
# array([523442.99141813])

y_pred

# Output:
# array([582934.20514923, 339452.39068242, 359834.49121369, 524717.61340423,
#        281787.49489165])

y_test

# Output:
# 15    620000
# 9     320000
# 0     300000
# 8     580000
# 17    260000
# Name: Price, dtype: int64

# Residuals: actual price minus predicted price for each test row
R = y_test - y_pred

R

# Output:
# 15    37065.794851
# 9    -19452.390682
# 0    -59834.491214
# 8     55282.386596
# 17   -21787.494892
# Name: Price, dtype: float64
```