# House Price Prediction
<div style="text-align: center; width: 100%;">
    <img src="house-picture.png" style="max-width: 80%; height: auto;">
</div>

<p>When you ask someone to envision their ideal home, they rarely dive into details like basement ceiling height or the proximity to an east-west railroad. Yet, in the world of real estate, there's so much more to price negotiations than just the number of bedrooms or the presence of a white picket fence.</p>
<p>With an arsenal of 79 explanatory variables that comprehensively cover nearly every aspect of residential homes in the picturesque town of Ames, Iowa, we're on a mission to equip home buyers with a powerful tool. This tool will enable them to forecast the ultimate price of their cherished dream home.</p>

The outline for this notebook is:
- [Data Wrangling](#data_wrangling)
- Exploratory Data Analysis
- Data Preprocessing / Feature Engineering
- Model Training
- Model Evaluation
- Model Prediction

*NOTE*: Each section might have multiple versions, to demonstrate the iterative process used to reach our final result.

This project is based on the House Prices - Advanced Regression Techniques competition on <img src="Kaggle_logo.png" width="35" height="15">.

The Ames Housing dataset was compiled by Dean De Cock for use in data science education, and is available on the [Kaggle competition data page](https://www.kaggle.com/competitions/house-prices-advanced-regression-techniques/data).

<a name="data_wrangling"></a>
## Data Wrangling

#### Import required libraries

In [73]:
import pandas as pd
import numpy as np

# Show every column and row, and keep wide frames on one line
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
pd.set_option('display.expand_frame_repr', False)

### Data Gathering

In [87]:
df = pd.read_csv('train.csv')

In [88]:
df.head()

Unnamed: 0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,Alley,LotShape,LandContour,Utilities,LotConfig,LandSlope,Neighborhood,Condition1,Condition2,BldgType,HouseStyle,OverallQual,OverallCond,YearBuilt,YearRemodAdd,RoofStyle,RoofMatl,Exterior1st,Exterior2nd,MasVnrType,MasVnrArea,ExterQual,ExterCond,Foundation,BsmtQual,BsmtCond,BsmtExposure,BsmtFinType1,BsmtFinSF1,BsmtFinType2,BsmtFinSF2,BsmtUnfSF,TotalBsmtSF,Heating,HeatingQC,CentralAir,Electrical,1stFlrSF,2ndFlrSF,LowQualFinSF,GrLivArea,BsmtFullBath,BsmtHalfBath,FullBath,HalfBath,BedroomAbvGr,KitchenAbvGr,KitchenQual,TotRmsAbvGrd,Functional,Fireplaces,FireplaceQu,GarageType,GarageYrBlt,GarageFinish,GarageCars,GarageArea,GarageQual,GarageCond,PavedDrive,WoodDeckSF,OpenPorchSF,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,PoolQC,Fence,MiscFeature,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
0,1,60,RL,65.0,8450,Pave,,Reg,Lvl,AllPub,Inside,Gtl,CollgCr,Norm,Norm,1Fam,2Story,7,5,2003,2003,Gable,CompShg,VinylSd,VinylSd,BrkFace,196.0,Gd,TA,PConc,Gd,TA,No,GLQ,706,Unf,0,150,856,GasA,Ex,Y,SBrkr,856,854,0,1710,1,0,2,1,3,1,Gd,8,Typ,0,,Attchd,2003.0,RFn,2,548,TA,TA,Y,0,61,0,0,0,0,,,,0,2,2008,WD,Normal,208500
1,2,20,RL,80.0,9600,Pave,,Reg,Lvl,AllPub,FR2,Gtl,Veenker,Feedr,Norm,1Fam,1Story,6,8,1976,1976,Gable,CompShg,MetalSd,MetalSd,,0.0,TA,TA,CBlock,Gd,TA,Gd,ALQ,978,Unf,0,284,1262,GasA,Ex,Y,SBrkr,1262,0,0,1262,0,1,2,0,3,1,TA,6,Typ,1,TA,Attchd,1976.0,RFn,2,460,TA,TA,Y,298,0,0,0,0,0,,,,0,5,2007,WD,Normal,181500
2,3,60,RL,68.0,11250,Pave,,IR1,Lvl,AllPub,Inside,Gtl,CollgCr,Norm,Norm,1Fam,2Story,7,5,2001,2002,Gable,CompShg,VinylSd,VinylSd,BrkFace,162.0,Gd,TA,PConc,Gd,TA,Mn,GLQ,486,Unf,0,434,920,GasA,Ex,Y,SBrkr,920,866,0,1786,1,0,2,1,3,1,Gd,6,Typ,1,TA,Attchd,2001.0,RFn,2,608,TA,TA,Y,0,42,0,0,0,0,,,,0,9,2008,WD,Normal,223500
3,4,70,RL,60.0,9550,Pave,,IR1,Lvl,AllPub,Corner,Gtl,Crawfor,Norm,Norm,1Fam,2Story,7,5,1915,1970,Gable,CompShg,Wd Sdng,Wd Shng,,0.0,TA,TA,BrkTil,TA,Gd,No,ALQ,216,Unf,0,540,756,GasA,Gd,Y,SBrkr,961,756,0,1717,1,0,1,0,3,1,Gd,7,Typ,1,Gd,Detchd,1998.0,Unf,3,642,TA,TA,Y,0,35,272,0,0,0,,,,0,2,2006,WD,Abnorml,140000
4,5,60,RL,84.0,14260,Pave,,IR1,Lvl,AllPub,FR2,Gtl,NoRidge,Norm,Norm,1Fam,2Story,8,5,2000,2000,Gable,CompShg,VinylSd,VinylSd,BrkFace,350.0,Gd,TA,PConc,Gd,TA,Av,GLQ,655,Unf,0,490,1145,GasA,Ex,Y,SBrkr,1145,1053,0,2198,1,0,2,1,4,1,Gd,9,Typ,1,TA,Attchd,2000.0,RFn,3,836,TA,TA,Y,192,84,0,0,0,0,,,,0,12,2008,WD,Normal,250000


In [89]:
df.shape

(1460, 81)

### Data Assessing

In [90]:
df.head().T

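| | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| Id | 1 | 2 | 3 | 4 | 5 |
| MSSubClass | 60 | 20 | 60 | 70 | 60 |
| MSZoning | RL | RL | RL | RL | RL |
| LotFrontage | 65.0 | 80.0 | 68.0 | 60.0 | 84.0 |
| LotArea | 8450 | 9600 | 11250 | 9550 | 14260 |
| Street | Pave | Pave | Pave | Pave | Pave |
| Alley | NaN | NaN | NaN | NaN | NaN |
| LotShape | Reg | Reg | IR1 | IR1 | IR1 |
| LandContour | Lvl | Lvl | Lvl | Lvl | Lvl |
| Utilities | AllPub | AllPub | AllPub | AllPub | AllPub |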


In [91]:
# Columns where more than 10% of the values are missing
null_columns = df.columns[df.isnull().sum() / df.shape[0] * 100 > 10]


In [92]:
null_columns

Index(['LotFrontage', 'Alley', 'FireplaceQu', 'PoolQC', 'Fence',
       'MiscFeature'],
      dtype='object')
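
The 10% threshold used above can be checked on a small example. The sketch below is illustrative only: the frame `toy` and its values are made up rather than taken from the Ames data, and it uses `isnull().mean()`, which yields the same fraction as `isnull().sum() / df.shape[0]`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 'Alley' is 75% missing, 'LotArea' is complete.
toy = pd.DataFrame({
    "LotArea": [8450, 9600, 11250, 9550],
    "Alley": [np.nan, "Grvl", np.nan, np.nan],
})

# Percentage of missing values per column.
missing_pct = toy.isnull().mean() * 100

# Names of the columns whose missing share exceeds the 10% threshold.
sparse_columns = missing_pct[missing_pct > 10].index.tolist()
print(missing_pct)
print(sparse_columns)
```

Inspecting `missing_pct` before dropping anything makes it easier to see how far each column sits from the cutoff.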

In [93]:
df.drop(columns=null_columns, inplace=True)

In [94]:
categorical_columns = df.columns[df.dtypes == 'object']
numerical_columns = df.columns[df.dtypes != 'object']
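
As an alternative to comparing `df.dtypes` against `'object'`, pandas offers `select_dtypes`. The sketch below is not the notebook's method; it assumes a made-up frame `toy` and splits on "numeric versus everything else", which can differ from the split above if a frame contains boolean or datetime columns.

```python
import pandas as pd

# Hypothetical toy data with one text column and two numeric columns.
toy = pd.DataFrame({
    "MSZoning": ["RL", "RM"],
    "LotArea": [8450, 9600],
    "LotFrontage": [65.0, 80.0],
})

# Numeric columns (integers and floats).
numerical = toy.select_dtypes(include="number").columns

# Everything that is not numeric, treated here as categorical.
categorical = toy.select_dtypes(exclude="number").columns

print(list(numerical))
print(list(categorical))
```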

In [95]:
# Fill numerical null values with 0
df[numerical_columns] = df[numerical_columns].fillna(0)

# Drop rows with categorical null values
df.dropna(inplace=True)

In [96]:
df.shape

(1338, 75)
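
In the notebook, dropping the sparse columns and then cleaning the remaining nulls takes the frame from (1460, 81) to (1338, 75). To make the effect of the two cleaning steps concrete, here is a minimal sketch on a made-up frame (`toy` and its values are assumptions for illustration): numeric gaps become 0, and any row still holding a null, which can only be in a categorical column, is removed.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one numeric gap (row 1) and one categorical gap (row 2).
toy = pd.DataFrame({
    "MasVnrArea": [196.0, np.nan, 162.0],
    "BsmtQual": ["Gd", "TA", None],
})

# Step 1: fill missing numeric values with 0.
numeric_cols = toy.select_dtypes(include="number").columns
toy[numeric_cols] = toy[numeric_cols].fillna(0)

# Step 2: drop rows that still contain a null (categorical columns only).
cleaned = toy.dropna()

print(cleaned)
print(cleaned.shape)
```

Row 1 survives with `MasVnrArea` set to 0, while row 2 is dropped because its `BsmtQual` is missing.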

In [97]:
pd.DataFrame(df.isnull().sum())

| | 0 |
|---|---|
| Id | 0 |
| MSSubClass | 0 |
| MSZoning | 0 |
| LotArea | 0 |
| Street | 0 |
| LotShape | 0 |
| LandContour | 0 |
| Utilities | 0 |
| LotConfig | 0 |
| LandSlope | 0 |


In [104]:
df.describe().round()

Unnamed: 0,Id,MSSubClass,LotArea,OverallQual,OverallCond,YearBuilt,YearRemodAdd,MasVnrArea,BsmtFinSF1,BsmtFinSF2,BsmtUnfSF,TotalBsmtSF,1stFlrSF,2ndFlrSF,LowQualFinSF,GrLivArea,BsmtFullBath,BsmtHalfBath,FullBath,HalfBath,BedroomAbvGr,KitchenAbvGr,TotRmsAbvGrd,Fireplaces,GarageYrBlt,GarageCars,GarageArea,WoodDeckSF,OpenPorchSF,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,MiscVal,MoSold,YrSold,SalePrice
count,1338.0,1338.0,1338.0,1338.0,1338.0,1338.0,1338.0,1338.0,1338.0,1338.0,1338.0,1338.0,1338.0,1338.0,1338.0,1338.0,1338.0,1338.0,1338.0,1338.0,1338.0,1338.0,1338.0,1338.0,1338.0,1338.0,1338.0,1338.0,1338.0,1338.0,1338.0,1338.0,1338.0,1338.0,1338.0,1338.0,1338.0
mean,731.0,56.0,10706.0,6.0,6.0,1973.0,1986.0,110.0,464.0,49.0,582.0,1096.0,1176.0,357.0,4.0,1538.0,0.0,0.0,2.0,0.0,3.0,1.0,7.0,1.0,1979.0,2.0,501.0,99.0,48.0,21.0,4.0,16.0,3.0,43.0,6.0,2008.0,186762.0
std,422.0,41.0,10337.0,1.0,1.0,30.0,20.0,186.0,459.0,166.0,440.0,406.0,387.0,440.0,41.0,521.0,1.0,0.0,1.0,1.0,1.0,0.0,2.0,1.0,25.0,1.0,187.0,128.0,65.0,61.0,30.0,58.0,42.0,508.0,3.0,1.0,78914.0
min,1.0,20.0,1300.0,2.0,2.0,1880.0,1950.0,0.0,0.0,0.0,0.0,105.0,438.0,0.0,0.0,438.0,0.0,0.0,0.0,0.0,0.0,1.0,3.0,0.0,1900.0,1.0,160.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,2006.0,35311.0
25%,366.0,20.0,7744.0,5.0,5.0,1956.0,1968.0,0.0,0.0,0.0,248.0,820.0,894.0,0.0,0.0,1160.0,0.0,0.0,1.0,0.0,2.0,1.0,5.0,0.0,1962.0,1.0,378.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,5.0,2007.0,135000.0
50%,730.0,50.0,9600.0,6.0,5.0,1976.0,1994.0,0.0,413.0,0.0,489.0,1022.0,1098.0,0.0,0.0,1480.0,0.0,0.0,2.0,0.0,3.0,1.0,6.0,1.0,1980.0,2.0,484.0,6.0,28.0,0.0,0.0,0.0,0.0,0.0,6.0,2008.0,168500.0
75%,1099.0,70.0,11761.0,7.0,6.0,2001.0,2004.0,174.0,733.0,0.0,816.0,1324.0,1414.0,740.0,0.0,1792.0,1.0,0.0,2.0,1.0,3.0,1.0,7.0,1.0,2002.0,2.0,583.0,174.0,70.0,0.0,0.0,0.0,0.0,0.0,8.0,2009.0,220000.0
max,1460.0,190.0,215245.0,10.0,9.0,2010.0,2010.0,1600.0,5644.0,1474.0,2336.0,6110.0,4692.0,2065.0,572.0,5642.0,2.0,2.0,3.0,2.0,6.0,3.0,12.0,3.0,2010.0,4.0,1418.0,857.0,547.0,552.0,508.0,480.0,738.0,15500.0,12.0,2010.0,755000.0


In [101]:
df['BsmtUnfSF'].value_counts()

0       73
728      8
384      7
300      7
572      6
625      6
280      6
600      6
108      5
326      5
390      5
80       5
410      5
672      5
216      5
440      5
270      5
490      5
284      4
747      4
276      4
319      4
611      4
340      4
884      4
36       4
840      4
816      4
660      4
360      4
312      4
264      4
544      4
186      4
192      4
92       4
420      4
768      4
88       4
698      4
392      4
536      4
125      4
847      4
350      4
162      4
115      4
638      4
168      4
336      4
163      4
100      4
732      3
970      3
506      3
894      3
114      3
203      3
808      3
245      3
105      3
252      3
811      3
396      3
460      3
556      3
484      3
342      3
756      3
602      3
184      3
324      3
594      3
292      3
121      3
630      3
712      3
173      3
710      3
596      3
210      3
474      3
278      3
504      3
554      3
598      3
700      3
354      3
441      3
130      3
310      3