# Ames Housing Price Prediction Model (Brute)

This paper presents a data set describing the sale of individual residential properties in Ames, Iowa, from 2006 to 2010. The data set contains 2930 observations and 80 explanatory variables (23 nominal, 23 ordinal, 14 discrete, and 20 continuous) involved in assessing home values. I will discuss my previous use of the Boston Housing Data Set and suggest methods for incorporating this new data set as a final project in an undergraduate regression course.

# Problem Statement

<h1>Table of Contents<span class="tocSkip"></span></h1>

- [1 Initialization](#Initialization)
  - [1.1 Libraries Import](#Libraries-Import)
  - [1.2 Data Overview](#Data-Overview)
- [2 Data Cleaning](#Data-Cleaning)
  - [2.1 General Cleaning](#General-Cleaning)
    - [2.1.1 Column Name](#Column-Name)
    - [2.1.2 General Missing Value Handling](#General-Missing-Value-Handling)
    - [2.1.3 Lot Frontage](#Lot-Frontage)
    - [2.1.4 Masonry Veneer](#Masonry-Veneer)
    - [2.1.5 Basement](#Basement)
    - [2.1.6 Garage Year Built](#Garage-Year-Built)
- [3 Exploratory Data Analysis](#Exploratory-Data-Analysis)
  - [3.1 Numerical Columns](#Numerical-Columns)
  - [3.2 Polynomial Features](#Polynomial-Features)
  - [3.3 Categorical Columns](#Categorical-Columns)
- [4 Feature Engineering](#Feature-Engineering)
  - [4.1 Datetime](#Datetime)
  - [4.2 Missing Data Imputation using Regression](#Missing-Data-Imputation-using-Regression)
    - [4.2.1 Garage Age](#Garage-Age)
    - [4.2.2 Preprocessing](#Preprocessing)
    - [4.2.3 Feature Selection using Lasso Regression](#Feature-Selection-using-Lasso-Regression)
    - [4.2.4 Garage Age Prediction using Linear Regression](#Garage-Age-Prediction-using-Linear-Regression)
    - [4.2.5 Data Merging](#Data-Merging)
  - [4.3 Data Preprocessing](#Data-Preprocessing)
- [5 Modelling](#Modelling)
  - [5.1 Feature Selection](#Feature-Selection)
    - [5.1.1 Train-Test Split](#Train-Test-Split)
    - [5.1.2 Standard Scaling](#Standard-Scaling)
    - [5.1.3 Lasso Regression](#Lasso-Regression)
    - [5.1.4 ElasticNet Regression](#ElasticNet-Regression)
    - [5.1.5 Best Model](#Best-Model)
  - [5.2 Model Iteration](#Model-Iteration)
    - [5.2.1 Multiple Linear Regression](#Multiple-Linear-Regression)
    - [5.2.2 Ridge Regression](#Ridge-Regression)
    - [5.2.3 Lasso Regression](#Lasso-Regression)
    - [5.2.4 GridSearch](#GridSearch)
- [6 Prediction](#Prediction)
  - [6.1 Prediction with Best Model](#Prediction-with-Best-Model)
  - [6.2 Submission Data Export](#Submission-Data-Export)
- [7 Inferential Statistics](#Inferential-Statistics)
- [8 Conclusion](#Conclusion)

## Initialization

### Libraries Import

In [None]:
# Vanilla Libraries

# Pandas Setting

In [None]:
# Machine Learning Libraries

In [None]:
# Data Reading

### Data Overview

In [None]:
# Data Shape

In [None]:
# Sanity Check

## Data Cleaning

### General Cleaning

#### Column Name

In [None]:
# Rename columns to follow PEP 8 naming conventions (lowercase snake_case)
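
One way the renaming could look is sketched below. It assumes the raw headers contain spaces and mixed case (for example `Lot Frontage`); the helper name `to_snake_case` and the three-column sample frame are hypothetical stand-ins, not the notebook's actual code or data.

```python
import re

import pandas as pd


def to_snake_case(name: str) -> str:
    """Lowercase a header and collapse runs of non-alphanumerics into '_'."""
    return re.sub(r"[^0-9a-z]+", "_", name.strip().lower()).strip("_")


# Hypothetical sample with headers in the style of a raw housing file.
df = pd.DataFrame(
    {"Lot Frontage": [65.0], "Garage Yr Blt": [2003.0], "SalePrice": [208500]}
)

# rename() accepts a callable and applies it to every column label.
df = df.rename(columns=to_snake_case)
print(list(df.columns))
```

Passing a function to `rename` keeps the rule in one place, so the same helper can be reused on the test set.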

#### General Missing Value Handling

In [None]:
# Replace all empty strings with np.nan

# Cast numerical columns back to float
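
The two steps above might be sketched as follows, assuming that blank cells were read in as empty or whitespace-only strings and that the list of numeric columns is known in advance. The column names and values are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: numeric columns were read as text because of blank cells.
df = pd.DataFrame(
    {
        "lot_frontage": ["65", "", " ", "80"],
        "garage_area": ["548", "460", "", "608"],
        "street": ["Pave", "Pave", "", "Grvl"],
    }
)

# Treat empty or whitespace-only strings as missing values.
df = df.replace(r"^\s*$", np.nan, regex=True)

# Cast the columns assumed to be numeric back to float.
numeric_cols = ["lot_frontage", "garage_area"]
df[numeric_cols] = df[numeric_cols].apply(pd.to_numeric)

print(df.isna().sum())
```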

In [None]:
# Empty columns

#### Lot Frontage

#### Masonry Veneer

#### Basement

#### Garage Year Built

## Exploratory Data Analysis

### Numerical Columns

In [None]:
# Drop id & pid

In [None]:
# Cast mssubclass into categorical

In [None]:
# Correlation Analysis
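
A minimal sketch of a correlation analysis against the target is given below. The data are synthetic, and the column names (`gr_liv_area`, `unrelated`, `saleprice`) are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data: one informative feature and one pure-noise feature.
rng = np.random.default_rng(0)
n = 200
gr_liv_area = rng.normal(1500, 400, n)
df = pd.DataFrame(
    {
        "gr_liv_area": gr_liv_area,
        "unrelated": rng.normal(0, 1, n),
        "saleprice": 100 * gr_liv_area + rng.normal(0, 10_000, n),
    }
)

# Pearson correlation of every numeric column with the target.
corr = df.select_dtypes("number").corr()["saleprice"].drop("saleprice")

# Rank features by absolute correlation, strongest first.
ranked = corr.reindex(corr.abs().sort_values(ascending=False).index)
print(ranked)
```

Ranking by absolute value keeps strongly negative relationships near the top instead of burying them at the bottom of the list.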

### Polynomial Features

In [None]:
# Polynomial terms of the continuous features

### Categorical Columns

In [None]:
# Convert Ordinal columns to Numerical columns
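
As an illustration, quality ratings that use the Ames `Po`/`Fa`/`TA`/`Gd`/`Ex` scale can be mapped to integers that preserve their order. The selection of columns and the 1 to 5 coding are assumptions of this sketch.

```python
import pandas as pd

# Ordered quality scale: Poor < Fair < Typical/Average < Good < Excellent.
quality_scale = {"Po": 1, "Fa": 2, "TA": 3, "Gd": 4, "Ex": 5}

# Hypothetical sample of two ordinal columns.
df = pd.DataFrame(
    {"exter_qual": ["TA", "Gd", "Ex"], "kitchen_qual": ["Fa", "TA", "Gd"]}
)

ordinal_cols = ["exter_qual", "kitchen_qual"]
df[ordinal_cols] = df[ordinal_cols].apply(lambda col: col.map(quality_scale))

print(df)
```

Any label missing from the mapping becomes `NaN`, which makes unexpected categories easy to spot.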

In [None]:
# Correlation Analysis / Boxplot

## Feature Engineering

### Datetime

In [None]:
# Years, DateTime

In [None]:
# Drop the original column
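
A sketch of the year-based features follows, assuming the cleaned frame has `year_built`, `year_remod_add`, and `yr_sold` columns. The derived names `house_age` and `remod_age` are hypothetical.

```python
import pandas as pd

# Hypothetical sample rows.
df = pd.DataFrame(
    {
        "year_built": [1990, 2005],
        "year_remod_add": [2000, 2005],
        "yr_sold": [2008, 2010],
    }
)

# Express calendar years as ages at the time of sale.
df["house_age"] = df["yr_sold"] - df["year_built"]
df["remod_age"] = df["yr_sold"] - df["year_remod_add"]

# Drop the original year columns once the ages are derived.
df = df.drop(columns=["year_built", "year_remod_add"])

print(df)
```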

### Missing Data Imputation using Regression

#### Garage Age

In [None]:
# Garage Age

#### Preprocessing

In [None]:
# Standard Scaler

# Train-Test Split
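
The preprocessing could be sketched as below on synthetic data. Note that this sketch splits first and fits the scaler on the training portion only, so that no information from the held-out rows leaks into the scaling; the 80/20 split and the seed are arbitrary assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the feature matrix and target.
rng = np.random.default_rng(42)
X = rng.normal(50, 10, size=(100, 3))
y = X @ np.array([1.0, 2.0, 3.0]) + rng.normal(0, 1, 100)

# Hold out 20% of the rows for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit the scaler on the training rows, then apply it to both splits.
scaler = StandardScaler()
X_train_sc = scaler.fit_transform(X_train)
X_test_sc = scaler.transform(X_test)

print(X_train_sc.shape, X_test_sc.shape)
```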

#### Feature Selection using Lasso Regression

In [None]:
# Instantiation

# Data Training

# Metrics Evaluation

# Top Features for Prediction
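
The four steps above might look like the following sketch. It uses `LassoCV` to choose the penalty by cross-validation, which is an assumption, since the notebook does not say how alpha is set; the data and the `feature_i` names are synthetic.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

# Synthetic data: only feature_0 and feature_1 drive the target.
rng = np.random.default_rng(0)
X = pd.DataFrame(
    rng.normal(size=(200, 5)), columns=[f"feature_{i}" for i in range(5)]
)
y = 3.0 * X["feature_0"] - 2.0 * X["feature_1"] + rng.normal(0, 0.1, 200)

# Instantiation and training on standardized features.
X_sc = StandardScaler().fit_transform(X)
lasso = LassoCV(cv=5, random_state=0).fit(X_sc, y)

# Metrics evaluation (R^2 on the training data, for illustration only).
print(f"R^2: {lasso.score(X_sc, y):.3f}")

# Top features: non-zero coefficients ranked by absolute size.
coefs = pd.Series(lasso.coef_, index=X.columns)
top_features = coefs[coefs != 0].abs().sort_values(ascending=False)
print(top_features)
```

Because the features are standardized, the coefficient magnitudes are comparable and can serve as a rough importance ranking.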


#### Garage Age Prediction using Linear Regression

In [None]:
# Instantiation

# Data Training

# Metrics Evaluation


#### Data Merging

In [None]:
# Replace missing data with -1

# Replace -1 with prediction
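
The sentinel-then-predict approach can be sketched as follows. The sketch assumes that `-1` never occurs as a genuine garage age and uses a single hypothetical predictor, `house_age`, in place of the Lasso-selected features.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical sample with two missing garage ages.
df = pd.DataFrame(
    {
        "house_age": [10.0, 20.0, 30.0, 40.0, 50.0],
        "garage_age": [10.0, 20.0, np.nan, 40.0, np.nan],
    }
)

# Step 1: flag missing values with the sentinel -1.
df["garage_age"] = df["garage_age"].fillna(-1)
missing = df["garage_age"] == -1

# Fit on the rows where the garage age is known.
model = LinearRegression().fit(
    df.loc[~missing, ["house_age"]], df.loc[~missing, "garage_age"]
)

# Step 2: replace each sentinel with the model's prediction.
df.loc[missing, "garage_age"] = model.predict(df.loc[missing, ["house_age"]])

print(df)
```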

### Data Preprocessing

In [None]:
# One-Hot Encoding
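
One-hot encoding with `pandas.get_dummies` might look like this. Dropping the first level with `drop_first=True` is an assumption made here to avoid perfectly collinear dummy columns in a linear model; the sample rows are illustrative.

```python
import pandas as pd

# Hypothetical sample with one nominal column.
df = pd.DataFrame(
    {
        "neighborhood": ["NAmes", "CollgCr", "NAmes"],
        "lot_area": [8450, 9600, 11250],
    }
)

# Encode the nominal column; the first level (alphabetically) is dropped.
encoded = pd.get_dummies(
    df, columns=["neighborhood"], drop_first=True, dtype=int
)

print(encoded)
```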

## Modelling

### Feature Selection

#### Train-Test Split

#### Standard Scaling

#### Lasso Regression

In [None]:
# Instantiation

# Data Training

# Metrics Evaluation


#### ElasticNet Regression

In [1]:
# Instantiation

# Data Training

# Metrics Evaluation
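
A possible shape for these steps is sketched below with `ElasticNetCV`, which tunes both the penalty strength and the L1/L2 mix by cross-validation. The candidate `l1_ratio` values, the synthetic data, and the use of RMSE and R^2 as metrics are all assumptions.

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic data: three informative features out of eight.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 8))
true_coef = np.array([4.0, -3.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0])
y = X @ true_coef + rng.normal(0, 0.5, 300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1
)
scaler = StandardScaler().fit(X_train)
X_train_sc, X_test_sc = scaler.transform(X_train), scaler.transform(X_test)

# Instantiation: search over several L1/L2 mixes.
candidate_ratios = [0.1, 0.5, 0.9, 1.0]
enet = ElasticNetCV(l1_ratio=candidate_ratios, cv=5, random_state=1)

# Data training.
enet.fit(X_train_sc, y_train)

# Metrics evaluation on the held-out rows.
pred = enet.predict(X_test_sc)
rmse = np.sqrt(mean_squared_error(y_test, pred))
r2 = r2_score(y_test, pred)
print(f"l1_ratio={enet.l1_ratio_}, alpha={enet.alpha_:.4f}")
print(f"RMSE={rmse:.3f}, R^2={r2:.3f}")
```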


#### Best Model

### Model Iteration

#### Multiple Linear Regression

#### Ridge Regression

#### Lasso Regression

#### GridSearch
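
A grid search could be set up along these lines. The sketch tunes Ridge's `alpha` inside a `Pipeline` so that scaling is refit within each fold; the grid values, the scoring metric, and the synthetic data are assumptions.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data.
rng = np.random.default_rng(2)
X = rng.normal(size=(150, 4))
y = X @ np.array([1.5, -2.0, 0.5, 0.0]) + rng.normal(0, 0.3, 150)

# Scaling lives inside the pipeline, so each CV fold is scaled independently.
pipe = Pipeline([("scale", StandardScaler()), ("ridge", Ridge())])
param_grid = {"ridge__alpha": [0.01, 0.1, 1.0, 10.0, 100.0]}

search = GridSearchCV(
    pipe, param_grid, cv=5, scoring="neg_root_mean_squared_error"
)
search.fit(X, y)

print(search.best_params_)
print(f"CV RMSE: {-search.best_score_:.3f}")
```

scikit-learn reports error metrics as negative scores so that larger is always better, hence the sign flip when printing the RMSE.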

## Prediction

### Prediction with Best Model

### Submission Data Export

In [None]:
# Save to CSV for Kaggle Submission (without Index)
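
The export step might look like the sketch below. The `Id` and `SalePrice` column names, the placeholder values, and the temporary output path are assumptions; the competition's sample submission file defines the required layout.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Placeholder predictions; real values would come from the fitted model.
submission = pd.DataFrame(
    {"Id": [1, 2, 3], "SalePrice": [134500.0, 152000.0, 218000.0]}
)

# Hypothetical output location; index=False keeps the row index out of the file.
out_path = Path(tempfile.gettempdir()) / "submission.csv"
submission.to_csv(out_path, index=False)

# Read the file back to confirm its layout.
round_trip = pd.read_csv(out_path)
print(round_trip.head())
```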


## Inferential Statistics

In [None]:
# Interpret the model coefficients
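
Reading coefficients off a fitted linear model could be sketched as follows. The data are synthetic and unscaled, so each coefficient is the estimated change in price per one-unit change in the feature, holding the other fixed; the feature names and the true effects (100 and -500) are assumptions of the example.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic data with known effects: +100 per unit of area, -500 per year of age.
rng = np.random.default_rng(3)
X = pd.DataFrame(
    {
        "gr_liv_area": rng.normal(1500, 400, 200),
        "house_age": rng.normal(30, 10, 200),
    }
)
y = 100.0 * X["gr_liv_area"] - 500.0 * X["house_age"] + rng.normal(0, 1000, 200)

model = LinearRegression().fit(X, y)

# Label each coefficient with its feature and sort by absolute size.
coefs = pd.Series(model.coef_, index=X.columns).sort_values(
    key=np.abs, ascending=False
)
print(coefs)
```

On standardized features the same table would instead read as the change in price per one standard deviation of each feature.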


## Conclusion