Project 1: House Price Prediction

The goal of this project is to build a model that predicts house prices more accurately than an individual's estimate.

The following properties are likely to help predict the price of a house:
1. Total area in square feet
2. Number of bedrooms, bathrooms, stories, etc.
3. Furnishing status
4. Basic amenities such as:
  a. Air conditioning
  b. Hot water heating
  c. Parking
  d. Basement

In [None]:
#IMPORTING PACKAGES NUMPY & PANDAS
import numpy as np
import pandas as pd

In [None]:
#STEP 1
#GATHERING DATA
#LOADING THE DATASET FROM A CSV FILE
data=pd.read_csv("/content/Housing (1).csv")

In [None]:
#First 5 Rows of Data Set
data.head()

Unnamed: 0,price,area,bedrooms,bathrooms,stories,mainroad,guestroom,basement,hotwaterheating,airconditioning,parking,prefarea,furnishingstatus
0,13300000,7420,4,2,3,yes,no,no,no,yes,2,yes,furnished
1,12250000,8960,4,4,4,yes,no,no,no,yes,3,no,furnished
2,12250000,9960,3,2,2,yes,no,yes,no,no,2,yes,semi-furnished
3,12215000,7500,4,2,2,yes,no,yes,no,yes,3,yes,furnished
4,11410000,7420,4,1,2,yes,yes,yes,no,yes,2,no,furnished


In [None]:
#INFORMATION ABOUT DATASET
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 545 entries, 0 to 544
Data columns (total 13 columns):
 #   Column            Non-Null Count  Dtype 
---  ------            --------------  ----- 
 0   price             545 non-null    int64 
 1   area              545 non-null    int64 
 2   bedrooms          545 non-null    int64 
 3   bathrooms         545 non-null    int64 
 4   stories           545 non-null    int64 
 5   mainroad          545 non-null    object
 6   guestroom         545 non-null    object
 7   basement          545 non-null    object
 8   hotwaterheating   545 non-null    object
 9   airconditioning   545 non-null    object
 10  parking           545 non-null    int64 
 11  prefarea          545 non-null    object
 12  furnishingstatus  545 non-null    object
dtypes: int64(6), object(7)
memory usage: 55.5+ KB
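
The summary above shows that some columns hold text rather than numbers. As a minimal sketch, the text columns and the missing-value counts can also be listed programmatically. The small `sample` frame below is a hypothetical stand-in for the housing data, not the real file.

```python
import pandas as pd

# Hypothetical miniature frame standing in for the housing data.
sample = pd.DataFrame({
    "price": [13300000, 12250000],
    "area": [7420, 8960],
    "mainroad": ["yes", "yes"],
    "furnishingstatus": ["furnished", "semi-furnished"],
})

# Non-numeric columns are the ones that need encoding before modelling.
categorical_cols = sample.select_dtypes(exclude="number").columns.tolist()

# Number of missing values in each column.
missing_per_column = sample.isna().sum()

print(categorical_cols)
print(missing_per_column.to_dict())
```

On the real data the same two calls can be applied to `data` directly.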


In [None]:
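#STEP 2
#CLEANING THE DATA
#WITH THE HELP OF FEATURE ENGINEERING
#WE WILL USE LABEL ENCODING (A DICTIONARY LOOKUP) TO CONVERT THE CATEGORICAL VARIABLES TO NUMERIC VALUES
#NOTE: THIS IS NOT ONE-HOT ENCODING, WHICH WOULD CREATE ONE INDICATOR COLUMN PER CATEGORY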
#mainroad yes=0 no=1
data['mainroad']=data['mainroad'].apply({'yes':0,'no':1}.get) 

#guestroom yes=0 no=1
data['guestroom']=data['guestroom'].apply({'yes':0,'no':1}.get) 

#basement yes=0 no=1
data['basement']=data['basement'].apply({'yes':0,'no':1}.get) 

#hotwaterheating yes=0 no=1
data['hotwaterheating']=data['hotwaterheating'].apply({'yes':0,'no':1}.get) 

#airconditioning yes=0 no=1
data['airconditioning']=data['airconditioning'].apply({'yes':0,'no':1}.get) 

#prefarea yes=0 no=1
data['prefarea']=data['prefarea'].apply({'yes':0,'no':1}.get) 

#furnishingstatus furnished:0,semi-furnished:1, unfurnished:2
data['furnishingstatus']=data['furnishingstatus'].apply({'furnished':0,'semi-furnished':1, 'unfurnished':2}.get) 
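
The cell above replaces each category with a single integer. For comparison, here is a rough sketch of what one-hot encoding looks like with `pd.get_dummies`. The three-row `sample` frame is hypothetical, and this is an alternative technique, not the one used in this notebook.

```python
import pandas as pd

# Hypothetical three-row frame with one binary and one three-level column.
sample = pd.DataFrame({
    "mainroad": ["yes", "no", "yes"],
    "furnishingstatus": ["furnished", "semi-furnished", "unfurnished"],
})

# Binary column: a dictionary lookup, the same idea as the notebook cell.
sample["mainroad"] = sample["mainroad"].map({"yes": 0, "no": 1})

# Three-level column: one-hot encoding creates one 0/1 indicator column per level,
# so no artificial ordering (0 < 1 < 2) is imposed on the categories.
encoded = pd.get_dummies(sample, columns=["furnishingstatus"], dtype=int)

print(encoded.columns.tolist())
```

With a linear model, the integer codes 0, 1, 2 assume equal price steps between furnishing levels, whereas indicator columns let each level have its own effect.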

In [None]:
#CHECKING THE ENCODED DATA: ALL COLUMNS ARE NOW NUMERIC
data.head()

Unnamed: 0,price,area,bedrooms,bathrooms,stories,mainroad,guestroom,basement,hotwaterheating,airconditioning,parking,prefarea,furnishingstatus
0,13300000,7420,4,2,3,0,1,1,1,0,2,0,0
1,12250000,8960,4,4,4,0,1,1,1,0,3,1,0
2,12250000,9960,3,2,2,0,1,0,1,1,2,0,1
3,12215000,7500,4,2,2,0,1,0,1,0,3,0,0
4,11410000,7420,4,1,2,0,0,0,1,0,2,1,0


In [None]:
#STEP 3
#NOW WE WILL DIVIDE THE DATA INTO INDEPENDENT AND DEPENDENT VARIABLES
#INDEPENDENT VARIABLES - area,bedrooms,bathrooms,stories,mainroad,guestroom,
  #basement,hotwaterheating,airconditioning,parking,prefarea,furnishingstatus

#DEPENDENT VARIABLE - price
x=data[['area','bedrooms','bathrooms','stories','mainroad','guestroom','basement','hotwaterheating','airconditioning','parking','prefarea','furnishingstatus']] #INDEPENDENT VARIABLE

y=data['price'] #DEPENDENT VARIABLE

In [None]:
#STEP 4
#SPLITTING THE DATA INTO TRAINING AND TESTING SET
#TRAINING - WE WILL MAKE THE MODEL LEARN
#WE WILL USE SKLEARN MODULE
from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.3) #test_size=0.3 means 30% will be used for testing purpose and
#70% will be used for training purpose to make the model learn
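
Because `train_test_split` shuffles the rows, each run of the cell above produces a different split and therefore slightly different results. A minimal sketch of a reproducible split follows. The arrays are hypothetical stand-in data, and the seed value 42 is an arbitrary choice that is not used in this notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 10 rows, 2 features.
x_demo = np.arange(20).reshape(10, 2)
y_demo = np.arange(10)

# random_state fixes the shuffle so repeated runs give the same split.
x_tr, x_te, y_tr, y_te = train_test_split(
    x_demo, y_demo, test_size=0.3, random_state=42
)

# Running the split again with the same seed returns identical rows.
x_tr2, x_te2, y_tr2, y_te2 = train_test_split(
    x_demo, y_demo, test_size=0.3, random_state=42
)

print(x_tr.shape, x_te.shape)
print(np.array_equal(x_te, x_te2))
```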

In [None]:
#STEP 5
#CREATING MACHINE LEARNING MODEL USING LINEAR REGRESSION ALGORITHM
from sklearn.linear_model import LinearRegression
#Creating an instance of the Linear Regression model
model=LinearRegression()  

In [None]:
#TRAINING THE MODEL ON THE TRAINING SET USING .fit()
model.fit(x_train,y_train)
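
After fitting, a linear regression model predicts the price as an intercept plus one coefficient per feature multiplied by that feature's value. The sketch below shows how the learned values can be inspected through `coef_` and `intercept_`. The `demo_x` and `demo_y` data are hypothetical and built from a known formula, so the recovered numbers can be checked.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical data generated from: price = 100*area + 50000*bedrooms + 200000
demo_x = pd.DataFrame({
    "area": [1000, 1500, 2000, 2500],
    "bedrooms": [2, 3, 3, 4],
})
demo_y = 100 * demo_x["area"] + 50000 * demo_x["bedrooms"] + 200000

demo_model = LinearRegression().fit(demo_x, demo_y)

# Pair each learned coefficient with its column name for readability.
coefficients = pd.Series(demo_model.coef_, index=demo_x.columns)

print(coefficients)
print(demo_model.intercept_)
```

In the notebook, the same inspection would be `pd.Series(model.coef_, index=x_train.columns)`.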

In [None]:
#PREDICTING ON THE TEST SET USING .predict()
y_pred=model.predict(x_test)

In [None]:
#Printing the predicted house prices for the test set
print(y_pred)

In [None]:
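#CHECKING THE FIT OF THE MODEL WITH .score()
#FOR LinearRegression, .score() RETURNS THE R^2 COEFFICIENT OF DETERMINATION, NOT A CLASSIFICATION ACCURACY
#NOTE: SCORING ON (x,y) INCLUDES THE TRAINING ROWS; model.score(x_test,y_test) GIVES A HELD-OUT ESTIMATE
accuracy=model.score(x,y)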

In [None]:
print('The Accuracy of the ML model using Linear Regression Algorithm is', accuracy)

The Accuracy of the ML model using Linear Regression Algorithm is 0.6732610146261582
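
The number printed above is an R^2 value: the fraction of the variance in price that the model explains. As a minimal sketch, the same quantity can be computed by hand with NumPy. The helper name `r_squared` is hypothetical and not part of scikit-learn.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Sum of squared errors of the predictions.
    ss_res = np.sum((y_true - y_pred) ** 2)
    # Sum of squared deviations from the mean of the true values.
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Perfect predictions explain all of the variance.
perfect = r_squared([1, 2, 3], [1, 2, 3])
# Always predicting the mean explains none of it.
baseline = r_squared([1, 2, 3], [2, 2, 2])

print(perfect, baseline)
```

To estimate performance on unseen houses, apply it to the held-out rows only, for example `r_squared(y_test, model.predict(x_test))`.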


In [None]:
#PREDICTING THE PRICE OF A NEW HOUSE WITH THE HELP OF THE LINEAR REGRESSION MODEL
#We will create a dictionary holding the new house details (using the same encoding as above)

data_new={'area':500,'bedrooms':1,'bathrooms':1,'stories':3,'mainroad':0,'guestroom':1,'basement':1,'hotwaterheating':1,'airconditioning':1,'parking':1,'prefarea':1,'furnishingstatus':1}

In [None]:
#CONVERTING THE DICTIONARY TO DATAFRAME
index=[1]
my_data=pd.DataFrame(data_new,index)

In [None]:
print(my_data)

   area  bedrooms  bathrooms  stories  mainroad  guestroom  basement  \
1   500         1          1        3         0          1         1   

   hotwaterheating  airconditioning  parking  prefarea  furnishingstatus  
1                1                1        1         1                 1  
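
The new row has to present the same columns, in the same order, as the data the model was trained on. Recent scikit-learn versions check the column names when a DataFrame is passed to `.predict()`. The sketch below shows one way to enforce that order with `reindex`. The names `feature_cols`, `new_house` and `aligned` are hypothetical helpers, not part of the notebook.

```python
import pandas as pd

# Column order used for training (copied from the x=data[[...]] cell).
feature_cols = ['area', 'bedrooms', 'bathrooms', 'stories', 'mainroad', 'guestroom',
                'basement', 'hotwaterheating', 'airconditioning', 'parking',
                'prefarea', 'furnishingstatus']

# Same values as data_new, deliberately written in a different key order.
new_house = {'furnishingstatus': 1, 'prefarea': 1, 'parking': 1, 'airconditioning': 1,
             'hotwaterheating': 1, 'basement': 1, 'guestroom': 1, 'mainroad': 0,
             'stories': 3, 'bathrooms': 1, 'bedrooms': 1, 'area': 500}

# A list holding one dict gives a one-row frame; reindex restores the training order.
aligned = pd.DataFrame([new_house]).reindex(columns=feature_cols)

# A key that was forgotten would appear as NaN after reindexing.
missing = aligned.columns[aligned.isna().any()].tolist()

print(aligned.columns.tolist())
print(missing)
```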


In [None]:
#PREDICTING THE PRICE OF THE NEW HOUSE
y_pred=model.predict(my_data)

In [None]:
print(y_pred)

[2743545.08217797]
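
A linear model will extrapolate for any input, so it can be worth checking whether a new house falls inside the range of values seen during training before trusting the prediction. The sketch below is one way to do that. The `train_x` values are hypothetical and are not taken from the real dataset.

```python
import pandas as pd

# Hypothetical training features; in the notebook this would be x_train.
train_x = pd.DataFrame({
    "area": [1650, 3000, 7420],
    "bedrooms": [1, 3, 4],
})

# Hypothetical new house, restricted to the same two columns.
new_row = pd.Series({"area": 500, "bedrooms": 1})

# Flag every feature that lies below the training minimum or above the maximum.
out_of_range = (new_row < train_x.min()) | (new_row > train_x.max())
flagged = out_of_range[out_of_range].index.tolist()

print(flagged)
```

In the notebook, the same check could be run with `my_data.iloc[0]` against `x_train.min()` and `x_train.max()`.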
