## Project 2: Ames housing price prediction
---
Project notebook organisation:<br>
<a href='./1 Data Cleaning and Feature Engineering.ipynb'>1 Data Cleaning and Feature Engineering</a><br>
<a href='./2 Regression Models.ipynb'>2 Regression Models</a><br>
**3 Visualisation and Insights** (current notebook)<br>


---
### This notebook's layout
<a href='#age'>How does house age affect sale price?</a><br>
<a href='#new'>Trends in new houses (2 years old and below)</a><br>
<a href='#trend'>House prices trend investigation</a><br>
<a href='#qual'>Overall house quality and sale price</a><br>
<a href='#nei'>Neighborhood price trends and popularity</a><br>
<a href='#zone'>Building type and housing zones</a><br>
<a href='#season'>Seasonality and house sale price</a><br>

---

In [24]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
%matplotlib inline

### Notebook intro

The original training data file was modified and written out under a new file name to make visualisation easier. The following changes were made:

1. Some features are abbreviated, so these columns were changed to be more descriptive and more intuitive to read in the visualisations. Features changed: MSZoning, BldgType and Neighborhood.

2. The best-fit regression model shows which features are the stronger predictors, so all columns not needed for visualisation were dropped.

Since this notebook is about visualisation, I wanted to reduce clutter and concentrate on that aspect, so the Python code for these modifications is not included here.
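
For readers who want to reproduce the relabelling step, here is a minimal sketch of how it could look. It is not the code that produced `train_tableau_cleaned.csv`: the raw codes, the zoning labels and the `relabel` helper are all assumptions made for illustration, and the rows are made up.

```python
import pandas as pd

# Hypothetical label maps. The raw codes are assumed to match the Ames
# training CSV; the zoning labels are illustrative guesses.
BLDG_TYPE_LABELS = {
    "1Fam": "Single-family Detached",
    "2fmCon": "Two-family_Conversion",
    "TwnhsE": "Townhouse End Unit",
    "Twnhs": "Townhouse Inside Unit",
}
MS_ZONING_LABELS = {
    "RL": "Residential Low Density",
    "RM": "Residential Medium Density",
    "RH": "Residential High Density",
}


def relabel(df, column, labels):
    """Return a copy of df with codes in `column` swapped for readable labels.

    Codes that are missing from `labels` are left unchanged.
    """
    out = df.copy()
    out[column] = out[column].replace(labels)
    return out


# Made-up rows standing in for the raw training data.
raw = pd.DataFrame({
    "BldgType": ["1Fam", "TwnhsE", "Duplex"],
    "MSZoning": ["RL", "RM", "C (all)"],
    "SalePrice": [200000, 150000, 120000],
})
clean = relabel(relabel(raw, "BldgType", BLDG_TYPE_LABELS), "MSZoning", MS_ZONING_LABELS)
print(clean)
```

Leaving unmapped codes untouched makes it easy to spot any category that still needs a label.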

In [25]:
tableau = pd.read_csv('../data/train_tableau_cleaned.csv')

In [26]:
# Exploring average and median house prices
print(tableau['SalePrice'].mean())
print(tableau['SalePrice'].median())

181479.01805758907
162500.0


<a id='age'></a>
### How does house age affect sale price?

In general, newer houses sell for higher prices, but two groups of houses more than 100 years old still fetched above-average prices.

![title](../misc/age_price.jpg)
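
The chart was built outside this notebook, but the same age-versus-price view can be approximated in pandas. The sketch below is only an illustration: the rows are made up and the age bands are assumed, not the ones used in the chart.

```python
import pandas as pd

# Made-up rows standing in for the cleaned training data.
houses = pd.DataFrame({
    "YearBuilt": [2006, 1995, 1960, 1900, 1880],
    "YrSold":    [2007, 2008, 2009, 2010, 2006],
    "SalePrice": [260000, 190000, 140000, 185000, 120000],
})
houses["age"] = houses["YrSold"] - houses["YearBuilt"]

# Assumed band edges; right=False makes each band include its left edge, e.g. [0, 25).
bins = [0, 25, 50, 100, 200]
houses["age_band"] = pd.cut(houses["age"], bins=bins, right=False)

# observed=True keeps only the bands that actually contain houses.
by_band = houses.groupby("age_band", observed=True)["SalePrice"].agg(["count", "mean"])
print(by_band)
```

Comparing each band's mean with the overall mean is one way to flag old houses that still sell above average.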

<a id='new'></a>
### Trends in new houses (2 years old and below)

New houses are not really popular.<br>
Only 295 houses were sold at 2 years of age or under, about 14% of the 2,049 sales counted in this notebook.<br>
Their average price (about 264k) is roughly 82k above the overall average of about 181k, which could explain their unpopularity.

![title](../misc/hse_2yrs.jpg)

In [27]:
tableau['age'] = tableau['YrSold']-tableau['YearBuilt']

In [28]:
# Fewer than 300 houses were sold within 2 years of being built.
tableau[(tableau['age']<=2)].describe()

statistic,Id,MSSubClass,LotFrontage,LotArea,OverallQual,OverallCond,YearBuilt,YearRemod/Add,TotalBsmtSF,1stFlrSF,2ndFlrSF,GrLivArea,BedroomAbvGr,TotRmsAbvGrd,MoSold,YrSold,SalePrice,totalSF,psf,age
count,295.0,295.0,295.0,295.0,295.0,295.0,295.0,295.0,295.0,295.0,295.0,295.0,295.0,295.0,295.0,295.0,295.0,295.0,295.0,295.0
mean,1727.705085,52.20339,73.80339,10033.135593,7.657627,5.020339,2006.298305,2006.650847,1374.166102,1388.820339,304.444068,1694.077966,2.701695,6.949153,6.227119,2007.240678,263902.379661,3068.244068,83.898305,0.942373
std,747.834328,42.091082,21.987365,4357.523052,1.101051,0.141397,1.261342,1.276471,426.67204,403.79375,453.405004,419.773575,0.704092,1.545497,2.908925,1.160448,97637.473902,710.559984,14.9899,0.690116
min,37.0,20.0,24.0,2522.0,5.0,5.0,2004.0,2004.0,192.0,520.0,0.0,848.0,1.0,3.0,1.0,2006.0,130000.0,1325.0,40.0,0.0
25%,1121.5,20.0,61.0,7998.5,7.0,5.0,2005.0,2006.0,1097.5,1118.5,0.0,1407.0,2.0,6.0,4.0,2006.0,191570.0,2571.0,73.0,0.0
50%,1780.0,20.0,74.0,9965.0,8.0,5.0,2006.0,2007.0,1392.0,1418.0,0.0,1612.0,3.0,7.0,6.0,2007.0,239799.0,3036.0,82.0,1.0
75%,2388.5,60.0,86.5,12339.0,8.0,5.0,2007.0,2007.0,1641.5,1659.0,728.0,1864.5,3.0,8.0,8.0,2008.0,319700.0,3461.5,94.0,1.0
max,2903.0,180.0,134.0,51974.0,10.0,6.0,2010.0,2010.0,3094.0,2464.0,1862.0,3390.0,5.0,12.0,12.0,2010.0,611657.0,5496.0,138.0,2.0
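
The share of new houses and their price gap to the overall average can be computed directly rather than read off the summary table. The helper below is a hypothetical sketch (the name `new_house_summary` and the sample rows are assumptions), using the same `YrSold - YearBuilt` definition of age as above.

```python
import pandas as pd


def new_house_summary(df, max_age=2):
    """Count, share of sales and mean-price gap for houses sold at `max_age` years or younger."""
    age = df["YrSold"] - df["YearBuilt"]
    is_new = age <= max_age
    new_mean = df.loc[is_new, "SalePrice"].mean()
    return {
        "count": int(is_new.sum()),
        "share": float(is_new.mean()),  # fraction of all sales
        "price_gap": float(new_mean - df["SalePrice"].mean()),
    }


# Made-up rows, not the real training data.
sample = pd.DataFrame({
    "YearBuilt": [2006, 2007, 1990, 1975],
    "YrSold":    [2007, 2007, 2008, 2009],
    "SalePrice": [250000, 270000, 160000, 120000],
})
summary = new_house_summary(sample)
print(summary)
```

On the real data, `new_house_summary(tableau)` should return the figures discussed in this section, assuming the columns are named as in the output above.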


<a id='trend'></a>
### House prices trend investigation

Ames house prices peaked in 2007 and, after the US housing market crashed in 2008,<br>
trended downward in the following years. House prices tend to appreciate when the economy recovers, <br>
so the period around 2010 could have been a good opportunity for buyers to pick up a house at a bargain price.

![title](../misc/price_trend.jpg)
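
A yearly price trend like the one in the chart can be sketched with a `groupby` on `YrSold`. The rows below are made up for illustration and do not reproduce the actual Ames figures.

```python
import pandas as pd

# Made-up rows standing in for the cleaned training data.
sample = pd.DataFrame({
    "YrSold":    [2006, 2006, 2007, 2007, 2008, 2009],
    "SalePrice": [180000, 190000, 200000, 210000, 170000, 165000],
})

# Sales count plus mean and median price per year sold.
yearly = sample.groupby("YrSold")["SalePrice"].agg(["count", "mean", "median"])

# Year-on-year change of the mean price, in percent (NaN for the first year).
yearly["pct_change"] = yearly["mean"].pct_change() * 100

peak_year = yearly["mean"].idxmax()
print(yearly)
print("Peak year:", peak_year)
```

Looking at the median alongside the mean helps check that a peak is not driven by a handful of very expensive sales.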

<a id='qual'></a>
### Overall house quality and sale price

Houses with higher overall quality are consistently more expensive. However, not every buyer is willing to pay top dollar for top-end quality: most buyers settle for a house at the mid-to-high level (5-7), in essence balancing the house price against reasonable quality. Few buyers choose low-quality houses.

Sellers should consider remodelling their house, but it might not be worthwhile to overspend, as few buyers are looking at the top quality range.

![title](../misc/overallqual2.jpg)

In [29]:
tableau.groupby('OverallQual')['SalePrice'].agg(['count','mean'])

OverallQual,count,mean
1,4,48725.0
2,9,51081.0
3,29,81309.103448
4,159,107744.037736
5,563,134963.64476
6,506,162891.102767
7,431,203430.285383
8,250,271437.044
9,77,370197.376623
10,21,440774.809524
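
To quantify how sales concentrate in the mid-to-high quality range, the counts from the table above can be turned into shares. This is a small sketch using the counts copied from that output; the variable names are my own.

```python
import pandas as pd

# Sale counts per OverallQual level, copied from the table above.
counts = pd.Series(
    [4, 9, 29, 159, 563, 506, 431, 250, 77, 21],
    index=pd.RangeIndex(1, 11, name="OverallQual"),
)
share = counts / counts.sum()

mid_high_share = share.loc[5:7].sum()  # label-based slice: includes 5, 6 and 7
top_share = share.loc[9:10].sum()
print(f"Quality 5-7: {mid_high_share:.1%}, quality 9-10: {top_share:.1%}")
# prints: Quality 5-7: 73.2%, quality 9-10: 4.8%
```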


<a id='nei'></a>
### Neighborhood price trends and popularity

In terms of neighborhood popularity, North Ames, College Creek and Old Town stand out. Two of these three have below-average sale prices; only College Creek is slightly above average. 
The priciest neighborhoods are Stone Brook, Northridge Heights and Northridge. All three have an average sale price above 300k, which is significantly higher than the 181k overall average. That said, they are not popular. 
Taken together, these two observations on neighborhood popularity and price point to house price as the most important determinant of house sales.

![title](../misc/neighbourhood.jpg)
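
The popularity-versus-price comparison can be sketched by aggregating sales count and mean price per neighborhood and flagging those above the overall mean. The rows are made up and the column names `sales`, `mean_price` and `above_average` are assumptions for this example.

```python
import pandas as pd

# Made-up rows standing in for the cleaned training data.
sample = pd.DataFrame({
    "Neighborhood": ["North Ames", "North Ames", "North Ames",
                     "Old Town", "Old Town", "Stone Brook"],
    "SalePrice":    [140000, 150000, 145000, 120000, 130000, 330000],
})
overall_mean = sample["SalePrice"].mean()

# Number of sales and mean price per neighborhood, most popular first.
by_nbhd = (
    sample.groupby("Neighborhood")["SalePrice"]
    .agg(sales="count", mean_price="mean")
    .sort_values("sales", ascending=False)
)
by_nbhd["above_average"] = by_nbhd["mean_price"] > overall_mean
print(by_nbhd)
```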

<a id='zone'></a>
### Building type and housing zones

House buyers favor single-family detached houses, which stand out from the rest at about 83% of sales (1,698 of 2,049). 
Personal privacy appears to be important to house buyers: houses in lower-density residential areas are preferred even though they command a slightly higher average price compared with medium- and high-density ones. 
Townhouses are the most expensive. Again, it is no surprise that they are not as popular.

![title](../misc/bldg_zones_pop1.jpg)

![title](../misc/bldg_zones_price1.jpg)
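
A table of average price by building type and zone, similar in spirit to the second chart, could be built with `pivot_table`. The rows and the zone labels below are made up for illustration; the labels in the cleaned file may differ.

```python
import pandas as pd

# Made-up rows; the zone labels are assumed, not taken from the cleaned file.
sample = pd.DataFrame({
    "BldgType": ["Single-family Detached", "Single-family Detached",
                 "Townhouse End Unit", "Duplex"],
    "MSZoning": ["Low Density", "Medium Density", "Medium Density", "Low Density"],
    "SalePrice": [200000, 150000, 180000, 130000],
})

# Mean sale price for every building type / zone combination.
# Combinations with no sales show up as NaN.
price_table = sample.pivot_table(
    index="BldgType", columns="MSZoning", values="SalePrice", aggfunc="mean"
)
print(price_table)
```

Passing `aggfunc="count"` instead gives the matching popularity table.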

In [30]:
# Exploring the most popular neighborhood in Ames
tableau['Neighborhood'].value_counts().head()

North Ames       310
College Creek    180
Old Town         163
Edwards          141
Somerset         130
Name: Neighborhood, dtype: int64

In [31]:
# Most popular house type in Ames is Single-family Detached
tableau['BldgType'].value_counts()

Single-family Detached    1698
Townhouse End Unit         161
Duplex                      75
Townhouse Inside Unit       69
Two-family_Conversion       46
Name: BldgType, dtype: int64
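
Raw counts are easier to compare as percentages. The sketch below recomputes the shares from the counts printed above; on the live DataFrame, `tableau['BldgType'].value_counts(normalize=True)` should give the same proportions.

```python
import pandas as pd

# Counts copied from the value_counts() output above.
counts = pd.Series({
    "Single-family Detached": 1698,
    "Townhouse End Unit": 161,
    "Duplex": 75,
    "Townhouse Inside Unit": 69,
    "Two-family_Conversion": 46,
}, name="BldgType")

# Share of each building type, as a percentage rounded to one decimal place.
shares = (counts / counts.sum() * 100).round(1)
print(shares)
```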

<a id='season'></a>
### Seasonality and house sale price

House sales peak during the summer months (May to July), so sellers should try to list their property during this period.

![title](../misc/season.jpg)

In [32]:
# Most houses are sold during the summer months
tableau['MoSold'].value_counts().head()

6    352
7    303
5    257
4    208
3    168
Name: MoSold, dtype: int64
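
To see how concentrated sales are in the peak months, the monthly counts can be put in calendar order and the May-to-July share computed. This is a sketch on made-up values; the peak months follow the text above.

```python
import pandas as pd

# Made-up months of sale standing in for tableau['MoSold'].
mo_sold = pd.Series([1, 5, 6, 6, 7, 7, 11, 6], name="MoSold")

# Sales per month in calendar order rather than by frequency.
monthly = mo_sold.value_counts().sort_index()

# Fraction of all sales that fall in May, June or July.
peak_months = [5, 6, 7]
peak_share = mo_sold.isin(peak_months).mean()

print(monthly)
print(f"Share of sales in May-July: {peak_share:.0%}")
```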