# Multiple Linear Regression with Dummies - Exercise

You are given a real estate dataset. 

Real estate is a staple example in regression courses because it is easy to understand and there is almost always a clear causal relationship to be found.

The data is located in the file: 'real_estate_price_size_year_view.csv'. 

You are expected to create a multiple linear regression (similar to the one in the lecture), using the new data. 

In this exercise, the dependent variable is 'price', while the independent variables are 'size', 'year', and 'view'.

#### Regarding the 'view' variable:
It takes two values: 'Sea view' and 'No sea view'. You are expected to create a dummy variable for 'view' and include it in the regression.
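
Written out, the model this exercise asks for has the form below. The coefficient names $\beta_0, \dots, \beta_3$ and the error term $\varepsilon$ are notation assumed here, not taken from the exercise text:

$$
\text{price} = \beta_0 + \beta_1\,\text{size} + \beta_2\,\text{year} + \beta_3\,\text{view} + \varepsilon, \qquad \text{view} \in \{0, 1\}
$$

If 'Sea view' is coded as 1, $\beta_3$ is the expected price difference between a property with a sea view and an otherwise identical one without.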

Good luck!

## Import the relevant libraries

In [4]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
import seaborn as sns
sns.set()

## Load the data

In [5]:
raw_data = pd.read_csv('real_estate_price_size_year_view.csv')

In [6]:
raw_data.head()

Unnamed: 0,price,size,year,view
0,234314.144,643.09,2015,No sea view
1,228581.528,656.22,2009,No sea view
2,281626.336,487.29,2018,Sea view
3,401255.608,1504.75,2015,No sea view
4,458674.256,1275.46,2009,Sea view


In [7]:
data = raw_data.copy()

## Create a dummy variable for 'view'

In [8]:
data['view'] = data['view'].map({'No sea view':0,'Sea view':1})

In [9]:
data.head()

Unnamed: 0,price,size,year,view
0,234314.144,643.09,2015,0
1,228581.528,656.22,2009,0
2,281626.336,487.29,2018,1
3,401255.608,1504.75,2015,0
4,458674.256,1275.46,2009,1
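
`Series.map` returns `NaN` for any label that is missing from the dictionary, so a quick check after the mapping can catch typos in the category names. The sketch below uses a small hand-made frame (the five `view` values shown above) instead of the CSV file; `sample`, `mapped`, `n_unmapped`, and `dummies` are illustrative names. It also shows `pd.get_dummies` as one possible alternative to the manual mapping.

```python
import pandas as pd

# Small stand-in for the 'view' column (first five rows of the dataset)
sample = pd.DataFrame({'view': ['No sea view', 'No sea view', 'Sea view',
                                'No sea view', 'Sea view']})

mapped = sample['view'].map({'No sea view': 0, 'Sea view': 1})

# map() leaves NaN wherever a label is not a key of the dictionary
n_unmapped = int(mapped.isna().sum())
print(n_unmapped)

# Alternative: get_dummies with drop_first keeps a single 0/1 column,
# here 'Sea view', because 'No sea view' sorts first and is dropped
dummies = pd.get_dummies(sample['view'], drop_first=True, dtype=int)
print(dummies['Sea view'].tolist())
```

Both approaches give the same 0/1 column here; the manual mapping makes the choice of reference category ('No sea view') explicit.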


## Create the regression

### Declare the dependent and the independent variables

In [11]:
y = data['price']
x1 = data[['size','year','view']]

### Regression

In [12]:
x = sm.add_constant(x1)
result = sm.OLS(y,x).fit()
result.summary()

Dep. Variable:,price,R-squared:,0.913
Model:,OLS,Adj. R-squared:,0.91
Method:,Least Squares,F-statistic:,335.2
Date:,"Wed, 22 May 2024",Prob (F-statistic):,1.02e-50
Time:,15:30:17,Log-Likelihood:,-1144.6
No. Observations:,100,AIC:,2297.0
Df Residuals:,96,BIC:,2308.0
Df Model:,3,,
Covariance Type:,nonrobust,,

,coef,std err,t,P>|t|,[0.025,0.975]
const,-5.398e+06,9.94e+05,-5.431,0.000,-7.37e+06,-3.43e+06
size,223.0316,7.838,28.455,0.000,207.473,238.590
year,2718.9489,493.502,5.510,0.000,1739.356,3698.542
view,5.673e+04,4627.695,12.258,0.000,4.75e+04,6.59e+04

Omnibus:,29.224,Durbin-Watson:,1.965
Prob(Omnibus):,0.0,Jarque-Bera (JB):,64.957
Skew:,1.088,Prob(JB):,7.85e-15
Kurtosis:,6.295,Cond. No.,942000.0
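
As a rough reading of the summary above, the fitted equation can be evaluated by hand. The sketch below hard-codes the rounded coefficients from the table, so its output is only approximate (the constant in particular is shown to four significant digits). `predict_price` is a hypothetical helper, not part of statsmodels; in the notebook itself, `result.predict` on a frame with the same columns as `x` would use the unrounded values.

```python
# Rounded coefficients copied from the summary table above
COEF = {'const': -5.398e+06, 'size': 223.0316,
        'year': 2718.9489, 'view': 5.673e+04}

def predict_price(size, year, sea_view):
    """Approximate fitted price; sea_view is 1 for 'Sea view', 0 otherwise."""
    return (COEF['const'] + COEF['size'] * size
            + COEF['year'] * year + COEF['view'] * sea_view)

# Same size and year as the first row of the data, with and without a sea view
no_view = predict_price(643.09, 2015, 0)
with_view = predict_price(643.09, 2015, 1)
print(round(no_view), round(with_view - no_view))
```

The gap between the two calls equals the `view` coefficient: holding size and year fixed, a sea view adds roughly 56,730 to the predicted price.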


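In [None]:
# Scatter price against size; 'RdYlGn' colours view=0 red and view=1 green
plt.scatter(data['size'], y, c=data['view'], cmap='RdYlGn')
# Fitted line per view category, with 'year' held at its sample mean
# so the three-variable model can be drawn in two dimensions
b = result.params
base = b['const'] + b['year'] * data['year'].mean()
yhat_no = base + b['size'] * data['size']
yhat_yes = base + b['view'] + b['size'] * data['size']
fig = plt.plot(data['size'], yhat_no, lw=2, c='red')
fig = plt.plot(data['size'], yhat_yes, lw=2, c='green')
plt.xlabel('Size', fontsize=20)
plt.ylabel('Price', fontsize=20)
plt.show()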