# Inherited House Price Prediction

## Objectives

* Use the Inherited model to predict the prices of inherited properties

## Inputs

* Trained and validated ML model, inherited houses data file

## Outputs

* output/inherited_houses_prices/inherited_houses_prices.csv

---

# Change working directory

* The notebooks are assumed to be stored in a subfolder, so when running a notebook in the editor you need to change the working directory

We need to change the working directory from its current folder to its parent folder
* We access the current directory with os.getcwd()

In [None]:
import os
current_dir = os.getcwd()
current_dir

We want to make the parent of the current directory the new current directory
* os.path.dirname() gets the parent directory
* os.chdir() sets the new current directory

In [None]:
os.chdir(os.path.dirname(current_dir))
print("You set a new current directory")

Confirm the new current directory

In [None]:
current_dir = os.getcwd()
current_dir
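
The directory change above can also be wrapped in a small helper. This is only a sketch: `parent_of` and `move_to_parent` are hypothetical names that do not appear elsewhere in this project, and the helper is assumed to be called once per kernel session.

```python
import os
from pathlib import Path


def parent_of(path):
    """Return the parent folder of `path` as a string."""
    return str(Path(path).resolve().parent)


def move_to_parent():
    """Make the parent of the current working directory the new working directory."""
    new_dir = parent_of(os.getcwd())
    os.chdir(new_dir)
    return new_dir


# Usage (hypothetical): call once, right after the kernel starts
# new_dir = move_to_parent()
```

Note that every call moves one level up, so re-running the cell in the same session would leave the working directory one folder too high.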

Import the cleaned dataset and split it into train and test sets

In [None]:
import pandas as pd
from sklearn.model_selection import train_test_split

# read data
df = pd.read_csv("outputs/datasets/collection/house_price_records.csv")

# split data 80/20
X_train, X_test, y_train, y_test = train_test_split(
    df.drop(['SalePrice'], axis=1),
    df['SalePrice'],
    test_size=0.2,
    random_state=0,
)

print("* Train set:", X_train.shape, y_train.shape, "\n* Test set:",  X_test.shape, y_test.shape)
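
To make the roles of `test_size` and `random_state` more concrete, here is a minimal standard-library sketch of a seeded hold-out split. It is not scikit-learn's implementation and will not reproduce the same row assignment as `train_test_split`. The helper name `holdout_split` and the ten-row stand-in data are assumptions made for illustration, as is rounding the test count up.

```python
import math
import random


def holdout_split(rows, test_size=0.2, seed=0):
    """Shuffle row indices with a fixed seed, then hold out `test_size` of them."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    # Assumption: the number of test rows is rounded up
    n_test = math.ceil(test_size * len(rows))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return [rows[i] for i in train_idx], [rows[i] for i in test_idx]


rows = list(range(10))  # hypothetical stand-in for the rows of df
train, test = holdout_split(rows)
print("* Train rows:", len(train), "\n* Test rows:", len(test))
```

Because the seed is fixed, calling the function again returns the same split, which is the purpose of `random_state=0` in the cell above.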


Import the pipeline components and define the ML pipeline

In [12]:
from sklearn.pipeline import Pipeline

### Data Cleaning
from feature_engine.imputation import MeanMedianImputer
from feature_engine.selection import DropFeatures
from feature_engine.imputation import CategoricalImputer
from feature_engine.imputation import RandomSampleImputer

### Feature Engineering
from feature_engine.encoding import OrdinalEncoder
from feature_engine.outliers import Winsorizer

### Feature Scaling
from sklearn.preprocessing import StandardScaler

### Feature Selection
from sklearn.feature_selection import SelectFromModel

### ML Algorithm
from sklearn.ensemble import GradientBoostingRegressor

### Create pipeline
ml_pipeline = Pipeline([
    # Data cleaning (copied from Data Cleaning notebook)
    ('drop', DropFeatures(
        features_to_drop=['EnclosedPorch', 'GarageYrBlt', 'WoodDeckSF'])),
    ('categorical', CategoricalImputer(
        imputation_method='missing',
        fill_value='None',
        variables=['GarageFinish'])),
    ('random_sample', RandomSampleImputer(
        variables=['LotFrontage', 'BsmtFinType1', '2ndFlrSF', 'MasVnrArea'])),
    ('mean', MeanMedianImputer(
        imputation_method='mean',
        variables=['BedroomAbvGr'])),

    # Feature engineering (copied from Feature Engineering notebook)
    ('OrdinalCategoricalEncoder', OrdinalEncoder(
        encoding_method='arbitrary',
        variables=['BsmtExposure', 'BsmtFinType1', 'GarageFinish', 'KitchenQual'])),
    ("Winsoriser_iqr", Winsorizer(
        capping_method='iqr', fold=3, tail='both',
        variables=['1stFlrSF', 'GarageArea', 'GrLivArea', 'YearBuilt', 'TotalBsmtSF'])),

    # Feature scaling
    ("feat_scaling", StandardScaler()),

    # Feature selection: keep features with importance >= 0.75 * mean importance
    ("feat_selection", SelectFromModel(
        GradientBoostingRegressor(
            random_state=0,
            learning_rate=0.1,
            max_depth=3,
            min_samples_leaf=50,
            min_samples_split=2,
            n_estimators=140),
        threshold="0.75*mean")),

    # Final regressor (same hyperparameters as the selection estimator)
    ("model", GradientBoostingRegressor(
        random_state=0,
        learning_rate=0.1,
        max_depth=3,
        min_samples_leaf=50,
        min_samples_split=2,
        n_estimators=140)),
])
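
Two settings in the pipeline above are easy to misread: `capping_method='iqr'` with `fold=3` in the Winsorizer, and `threshold="0.75*mean"` in SelectFromModel. The standard-library sketch below illustrates the arithmetic behind each. It is not the feature_engine or scikit-learn code. The function names and sample numbers are hypothetical, and the quartiles are assumed to use linear interpolation.

```python
import statistics


def iqr_caps(values, fold=3):
    """Return (lower, upper) caps at Q1 - fold*IQR and Q3 + fold*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - fold * iqr, q3 + fold * iqr


def winsorise(values, fold=3):
    """Clip every value to the IQR-based caps (both tails)."""
    lower, upper = iqr_caps(values, fold)
    return [min(max(v, lower), upper) for v in values]


def keep_by_scaled_mean(importances, scale=0.75):
    """Flag each feature whose importance is >= scale * mean importance."""
    threshold = scale * statistics.mean(importances)
    return [imp >= threshold for imp in importances]


# Hypothetical living areas with one extreme value
areas = [900, 1000, 1100, 1200, 1300, 1400, 1500, 9000]
capped = winsorise(areas)

# Hypothetical feature importances for four features
flags = keep_by_scaled_mean([0.50, 0.30, 0.15, 0.05])
```

In this sketch only the extreme area is pulled down to the upper cap, and only the features whose importance reaches three quarters of the mean importance are flagged to keep.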