# Inherited House Price Prediction

## Objectives

* Use the trained ML pipeline to predict the sale prices of the inherited houses

## Inputs

* Trained and validated ML pipeline, and the inherited houses data file

## Outputs

* outputs/datasets/predict_sale_price/predicted_prices_for_inherited_houses.csv

---

# Change working directory

* We assume the notebooks are stored in a subfolder, so when running a notebook in the editor you need to change the working directory.

We need to change the working directory from its current folder to its parent folder.
* We access the current directory with os.getcwd()

In [3]:
import os
current_dir = os.getcwd()
current_dir

'/workspace/pp5_project_heritage_housing/jupyter_notebooks'

We want to make the parent of the current directory the new current directory.
* os.path.dirname() gets the parent directory
* os.chdir() sets the new current directory

In [4]:
os.chdir(os.path.dirname(current_dir))
print("You set a new current directory")

You set a new current directory
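Running the `os.chdir(os.path.dirname(current_dir))` cell a second time would move one level higher again, for example to `/workspace`. The sketch below guards against that by changing directory only while the current folder is still the notebooks folder. The helper name `move_to_project_root` and the default folder name `jupyter_notebooks` (taken from the path printed above) are illustrative assumptions, not part of the original project.

```python
import os


def move_to_project_root(notebook_folder: str = "jupyter_notebooks") -> str:
    """Hypothetical helper: move to the parent folder only if we are
    still inside the notebooks folder, so the step is safe to re-run."""
    current = os.getcwd()
    if os.path.basename(current) == notebook_folder:
        # same operation as the cell above: parent of the current directory
        os.chdir(os.path.dirname(current))
    return os.getcwd()
```

Calling it repeatedly leaves the working directory at the project root instead of climbing further.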


Confirm the new current directory

In [5]:
current_dir = os.getcwd()
current_dir

'/workspace/pp5_project_heritage_housing'

Load the cleaned dataset and split it into train and test sets

In [6]:
import pandas as pd
from sklearn.model_selection import train_test_split

# read data
df = pd.read_csv("outputs/datasets/collection/house_price_records.csv") 

# split data 80/20
X_train, X_test, y_train, y_test = train_test_split(
    df.drop(['SalePrice'], axis=1),
    df['SalePrice'],
    test_size=0.2,
    random_state=0,
)

print("* Train set:", X_train.shape, y_train.shape, "\n* Test set:",  X_test.shape, y_test.shape)


* Train set: (1168, 23) (1168,) 
* Test set: (292, 23) (292,)
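The shapes above follow from how scikit-learn sizes a split when `test_size` is a float: it rounds the test set size up (`ceil(test_size * n_samples)`) and gives the remaining rows to the train set. The total of 1460 rows is inferred from the printed shapes (1168 + 292). The function name `split_sizes` is a hypothetical illustration of that arithmetic, not a scikit-learn API.

```python
import math


def split_sizes(n_samples: int, test_size: float) -> tuple[int, int]:
    """Return (n_train, n_test) for a float test_size, mirroring
    scikit-learn's rule: round the test size up, train gets the rest."""
    n_test = math.ceil(test_size * n_samples)
    n_train = n_samples - n_test
    return n_train, n_test


print(split_sizes(1168 + 292, 0.2))  # (1168, 292)
```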


## Inherited house price prediction

In [7]:
from src.data_management import load_pkl_file

# load the latest ML pipeline (v5), which was optimised using only a few of the best features
version = "v5"
ml_pipeline = load_pkl_file(
    f"outputs/ml_pipeline/predict_price/{version}/regression_pipeline.pkl"
)

# load the best features from the train set saved with this pipeline version
best_features = pd.read_csv(
    f"outputs/ml_pipeline/predict_price/{version}/X_train.csv"
).columns.to_list()

# refit the pipeline on the train set, restricted to the best features
ml_pipeline.fit(X_train[best_features], y_train)

# load the inherited houses dataset
X_inherited_house_data = pd.read_csv(
    "outputs/datasets/collection/inherited_houses.csv"
)

# predict house prices
y_predicted_price = ml_pipeline.predict(X_inherited_house_data[best_features])

# ==== End of core ML logic ====

# append the predictions as a new column
X_inherited_house_data["Predicted Sale Price"] = y_predicted_price
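Predicting on `X_inherited_house_data[best_features]` assumes the inherited houses file contains every column the pipeline was trained on. The sketch below is a hypothetical pre-check, not part of the original notebook, that names any missing columns before prediction. The toy data reuses two columns and values from the inherited houses output shown further down.

```python
import pandas as pd


def check_features(df: pd.DataFrame, features: list[str]) -> None:
    """Hypothetical helper: raise a clear error listing any model
    features that are missing from df."""
    missing = [col for col in features if col not in df.columns]
    if missing:
        raise ValueError(f"Missing model features: {missing}")


# toy data with two columns from the inherited houses table
houses = pd.DataFrame({"1stFlrSF": [896, 1329], "OverallQual": [5, 6]})

check_features(houses, ["1stFlrSF", "OverallQual"])  # passes silently
try:
    check_features(houses, ["OverallQual", "TotalBsmtSF"])
except ValueError as err:
    print(err)  # Missing model features: ['TotalBsmtSF']
```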

In [8]:
os.makedirs("outputs/datasets/predict_sale_price", exist_ok=True)
X_inherited_house_data.to_csv("outputs/datasets/predict_sale_price/predicted_prices_for_inherited_houses.csv", index=False)

In [10]:
X_inherited_house_data


| Unnamed: 0 | 1stFlrSF | 2ndFlrSF | BedroomAbvGr | BsmtExposure | BsmtFinSF1 | BsmtFinType1 | BsmtUnfSF | EnclosedPorch | GarageArea | GarageFinish | ... | LotFrontage | MasVnrArea | OpenPorchSF | OverallCond | OverallQual | TotalBsmtSF | WoodDeckSF | YearBuilt | YearRemodAdd | Predicted Sale Price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 896 | 0 | 2 | No | 468.0 | Rec | 270.0 | 0 | 730.0 | Unf | ... | 80.0 | 0.0 | 0 | 6 | 5 | 882.0 | 140 | 1961 | 1961 | 129450.897903 |
| 1 | 1329 | 0 | 3 | No | 923.0 | ALQ | 406.0 | 0 | 312.0 | Unf | ... | 81.0 | 108.0 | 36 | 6 | 6 | 1329.0 | 393 | 1958 | 1958 | 157564.748372 |
| 2 | 928 | 701 | 3 | No | 791.0 | GLQ | 137.0 | 0 | 482.0 | Fin | ... | 74.0 | 0.0 | 34 | 5 | 5 | 928.0 | 212 | 1997 | 1998 | 166747.546848 |
| 3 | 926 | 678 | 3 | No | 602.0 | GLQ | 324.0 | 0 | 470.0 | Fin | ... | 78.0 | 20.0 | 36 | 6 | 6 | 926.0 | 360 | 1998 | 1998 | 177049.31788 |
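If the client also wants a single figure for the combined value of the inherited houses, it is the sum of the `Predicted Sale Price` column. The sketch below copies the four predicted values from the table above into a small DataFrame. In the notebook, the same `.sum()` call would apply directly to `X_inherited_house_data["Predicted Sale Price"]`.

```python
import pandas as pd

# Predicted Sale Price values copied from the four rows of the table above
predictions = pd.DataFrame(
    {"Predicted Sale Price": [129450.897903, 157564.748372, 166747.546848, 177049.31788]}
)

total = predictions["Predicted Sale Price"].sum()
print(f"Total predicted value of the inherited houses: {total:,.2f}")
# Total predicted value of the inherited houses: 630,812.51
```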


# Conclusion

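The predicted sale prices for the inherited houses are saved to outputs/datasets/predict_sale_price/predicted_prices_for_inherited_houses.csv and are now available for the client to review.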