## Real Estate Price Prediction

This task is to predict real estate prices from property attributes: floor area, the number of bedrooms, bathrooms, and floors, the year the property was built, whether it has a garden or pool, the garage size, a location score, and the distance from the city center.

The goal is to build a regression model that predicts the Price of a property from these features.

Dataset Columns:

ID: A unique identifier for each property.

Square_Feet: The area of the property in square meters, despite the column name.

Num_Bedrooms: The number of bedrooms in the property.

Num_Bathrooms: The number of bathrooms in the property.

Num_Floors: The number of floors in the property.

Year_Built: The year the property was built.

Has_Garden: Indicates whether the property has a garden (1 for yes, 0 for no).

Has_Pool: Indicates whether the property has a pool (1 for yes, 0 for no).

Garage_Size: The size of the garage in square meters.

Location_Score: A score from 0 to 10 indicating the quality of the neighborhood (higher scores indicate better neighborhoods).

Distance_to_Center: The distance from the property to the city center in kilometers.

Price: The target variable that represents the price of the property. This is the value we aim to predict.
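
The ranges stated in the column descriptions above can be checked directly before modeling. The following is a minimal sketch, not part of the original notebook: `check_ranges` is a hypothetical helper, and the two sample rows are copied from the first records shown in this notebook's `dataset.head()` output.

```python
import pandas as pd


def check_ranges(df: pd.DataFrame) -> dict:
    """Hypothetical helper: test the value ranges given in the column descriptions."""
    non_negative = ["Square_Feet", "Garage_Size", "Distance_to_Center", "Price"]
    return {
        # Has_Garden and Has_Pool are described as 1 for yes, 0 for no
        "binary_flags_ok": bool(df[["Has_Garden", "Has_Pool"]].isin([0, 1]).all().all()),
        # Location_Score is described as a score from 0 to 10 (inclusive here)
        "location_score_ok": bool(df["Location_Score"].between(0, 10).all()),
        # Areas, distances, and prices should not be negative
        "non_negative_ok": bool((df[non_negative] >= 0).all().all()),
    }


# Two rows taken from the dataset.head() output in this notebook
sample = pd.DataFrame({
    "Square_Feet": [143.63503, 287.678577],
    "Has_Garden": [1, 0],
    "Has_Pool": [1, 1],
    "Garage_Size": [48, 37],
    "Location_Score": [8.297631, 6.061466],
    "Distance_to_Center": [5.935734, 10.827392],
    "Price": [602134.816747, 591425.135386],
})

report = check_ranges(sample)
print(report)
```

Running the same helper on the full `dataset` would flag any rows that contradict the documented ranges.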

Objective:
Develop a regression model that predicts the Price of a property from the other features. The model should learn the relationship between the features and the price and give accurate predictions on unseen data.

https://www.kaggle.com/datasets/denkuznetz/housing-prices-regression

In [1]:
import pandas as pd 
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np 

In [3]:
dataset = pd.read_csv('real_estate_dataset.csv')
dataset.head()

| | ID | Square_Feet | Num_Bedrooms | Num_Bathrooms | Num_Floors | Year_Built | Has_Garden | Has_Pool | Garage_Size | Location_Score | Distance_to_Center | Price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 143.63503 | 1 | 3 | 3 | 1967 | 1 | 1 | 48 | 8.297631 | 5.935734 | 602134.816747 |
| 1 | 2 | 287.678577 | 1 | 2 | 1 | 1949 | 0 | 1 | 37 | 6.061466 | 10.827392 | 591425.135386 |
| 2 | 3 | 232.998485 | 1 | 3 | 2 | 1923 | 1 | 0 | 14 | 2.911442 | 6.904599 | 464478.69688 |
| 3 | 4 | 199.664621 | 5 | 2 | 2 | 1918 | 0 | 0 | 17 | 2.070949 | 8.284019 | 583105.655996 |
| 4 | 5 | 89.00466 | 4 | 3 | 3 | 1999 | 1 | 0 | 34 | 1.523278 | 14.648277 | 619879.142523 |


In [4]:
# Check for missing values; the output shows none
dataset.isnull().sum()

ID                    0
Square_Feet           0
Num_Bedrooms          0
Num_Bathrooms         0
Num_Floors            0
Year_Built            0
Has_Garden            0
Has_Pool              0
Garage_Size           0
Location_Score        0
Distance_to_Center    0
Price                 0
dtype: int64

In [5]:
# Drop the ID column: it is an identifier, not a predictive feature
dataset.drop(['ID'], axis=1, inplace=True)

In [6]:
dataset.head()

| | Square_Feet | Num_Bedrooms | Num_Bathrooms | Num_Floors | Year_Built | Has_Garden | Has_Pool | Garage_Size | Location_Score | Distance_to_Center | Price |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 143.63503 | 1 | 3 | 3 | 1967 | 1 | 1 | 48 | 8.297631 | 5.935734 | 602134.816747 |
| 1 | 287.678577 | 1 | 2 | 1 | 1949 | 0 | 1 | 37 | 6.061466 | 10.827392 | 591425.135386 |
| 2 | 232.998485 | 1 | 3 | 2 | 1923 | 1 | 0 | 14 | 2.911442 | 6.904599 | 464478.69688 |
| 3 | 199.664621 | 5 | 2 | 2 | 1918 | 0 | 0 | 17 | 2.070949 | 8.284019 | 583105.655996 |
| 4 | 89.00466 | 4 | 3 | 3 | 1999 | 1 | 0 | 34 | 1.523278 | 14.648277 | 619879.142523 |


## Split into training and test sets

In [8]:
# Features are every column except the target; Price is the target
x = dataset.drop(['Price'], axis=1)
y = dataset['Price']

In [9]:
from sklearn.model_selection import train_test_split

# Hold out 20% of the rows for testing; random_state makes the split reproducible
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.20, random_state=50)
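
The effect of `test_size=0.20` and a fixed `random_state` can be checked on stand-in data. This is a minimal sketch under stated assumptions: `X_demo` and `y_demo` are synthetic (500 random rows, hypothetical column names), not the real estate dataset.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): 500 rows, 3 features
rng = np.random.default_rng(0)
X_demo = pd.DataFrame(rng.normal(size=(500, 3)), columns=["f1", "f2", "f3"])
y_demo = pd.Series(rng.normal(size=500), name="target")

# Same arguments as the notebook's split
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.20, random_state=50)

# Repeating the call with the same seed selects the same rows
X_tr2, X_te2, _, _ = train_test_split(X_demo, y_demo, test_size=0.20, random_state=50)

print(X_tr.shape, X_te.shape)          # 400 training rows and 100 test rows
print(X_te.index.equals(X_te2.index))  # the split is reproducible
print(len(set(X_tr.index) & set(X_te.index)))  # no row appears in both sets
```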

In [11]:
# Scale the features: fit the scaler on the training set only, then apply it to the test set
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
x_train_scaled = scaler.fit_transform(x_train)
x_test_scaled = scaler.transform(x_test)
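
Fitting the scaler on the training set only, as above, keeps test-set statistics out of preprocessing. One way to enforce this automatically is to bundle the scaler and the model in a scikit-learn `Pipeline`, which also makes cross-validation refit the scaler inside each fold. The sketch below is an alternative arrangement, not the notebook's method, and uses synthetic data (`X_demo`, `y_demo` are assumptions).

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): a noisy linear target on 4 unscaled features
rng = np.random.default_rng(0)
X_demo = rng.normal(loc=100.0, scale=25.0, size=(300, 4))
y_demo = X_demo @ np.array([3.0, -2.0, 0.5, 1.0]) + rng.normal(scale=5.0, size=300)

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.20, random_state=50)

# The pipeline fits the scaler on whatever data it is trained on, and nothing else
pipe = make_pipeline(StandardScaler(), Ridge())
pipe.fit(X_tr, y_tr)
test_r2 = pipe.score(X_te, y_te)  # R^2 on the held-out rows

# In cross-validation the scaler is refit on the training part of every fold
cv_r2 = cross_val_score(pipe, X_tr, y_tr, cv=5, scoring="r2")

print(f"test R2: {test_r2:.3f}, mean CV R2: {cv_r2.mean():.3f}")
```

With this arrangement, `pipe.predict` accepts unscaled features, so there is no separate `x_test_scaled` array to keep in sync.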

## Model Training

In [13]:
# Import several supervised regression models to compare on this problem
from sklearn.linear_model import LinearRegression, Ridge, Lasso, ElasticNet
from sklearn.svm import SVR  # support vector regression
from sklearn.tree import DecisionTreeRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor, AdaBoostRegressor

models = {
    "Linear regression": LinearRegression(),
    "Lasso Regression":Lasso(),
    "Ridge Regression":Ridge(),
    "ElasticNet Regression":ElasticNet(),
    "SVM Regression":SVR(),
    "Decision Tree Regression":DecisionTreeRegressor(),
    "KNeighbour Regression":KNeighborsRegressor(),
    "Random Forest Regression":RandomForestRegressor(),
    "Adaboost Regression":AdaBoostRegressor(),
    "Gradient Boosting Regressor":GradientBoostingRegressor()
}
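
All models in the dictionary above use default hyperparameters. For `SVR`, the defaults (`C=1.0`, `epsilon=0.1`) are tuned to targets of roughly unit scale, so a target measured in hundreds of thousands can leave the model predicting a near-constant value. This is a plausible explanation for a near-zero R2 from `SVR()` on Price, though the notebook does not confirm it. The sketch below illustrates the effect on synthetic data (an assumption) and shows one possible remedy, standardizing the target with `TransformedTargetRegressor`.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in data (assumption): a target on the scale of house prices
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(400, 3))
y_demo = (500_000 + 80_000 * X_demo[:, 0] + 40_000 * X_demo[:, 1]
          + rng.normal(scale=5_000, size=400))

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.20, random_state=50)

# Default SVR on scaled features but an unscaled target
plain_svr = make_pipeline(StandardScaler(), SVR())
plain_svr.fit(X_tr, y_tr)
r2_plain = r2_score(y_te, plain_svr.predict(X_te))

# Same model, but the target is standardized for fitting and
# predictions are mapped back to the original units automatically
scaled_target_svr = TransformedTargetRegressor(
    regressor=make_pipeline(StandardScaler(), SVR()),
    transformer=StandardScaler(),
)
scaled_target_svr.fit(X_tr, y_tr)
r2_scaled = r2_score(y_te, scaled_target_svr.predict(X_te))

print(f"SVR, raw target:    R2 = {r2_plain:.3f}")
print(f"SVR, scaled target: R2 = {r2_scaled:.3f}")
```

Raising `C` substantially is another common adjustment; either change would need to be validated on the real data before drawing conclusions about SVR here.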



In [26]:
from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error

# Fit each model on the scaled training set and report its test-set metrics
for name, model in models.items():
    model.fit(x_train_scaled, y_train)
    y_pred = model.predict(x_test_scaled)
    r2 = r2_score(y_test, y_pred)
    mae = mean_absolute_error(y_test, y_pred)
    mse = mean_squared_error(y_test, y_pred)
    print()
    print(name)
    print(f"R2 value:{r2}")
    print(f"Mean Absolute Error:{mae}")
    print(f"Mean Squared Error:{mse}")


Linear regression
R2 value:0.9766173228120267
Mean Absolute Error:16607.617586180757
Mean Squared Error:423906226.286499

Lasso Regression
R2 value:0.9766174167371021
Mean Absolute Error:16607.454015667965
Mean Squared Error:423904523.5120937

Ridge Regression
R2 value:0.9766499120482723
Mean Absolute Error:16552.917051131473
Mean Squared Error:423315413.69291216

ElasticNet Regression
R2 value:0.8449424431731216
Mean Absolute Error:42395.924152525215
Mean Squared Error:2811049532.2363806

SVM Regression
R2 value:-0.00637798380254706
Mean Absolute Error:109609.88093958284
Mean Squared Error:18244698410.795242

Decision Tree Regression
R2 value:0.6883146090815857
Mean Absolute Error:59159.885876384076
Mean Squared Error:5650566733.2573595

KNeighbour Regression
R2 value:0.7528646597359789
Mean Absolute Error:52170.36886029299
Mean Squared Error:4480334250.486724

Random Forest Regression
R2 value:0.8362810553370132
Mean Absolute Error:43046.85704932087
Mean Squared Error:2968072451.489