# **Predict Sale Price**

## Objectives

* Create Regressor Pipeline
* Hyperparameter optimization
* Grid Search CV - Sklearn
* Train ML pipeline to predict property sale price.

## Inputs

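* outputs/datasets/cleaned/train_set.csv
* outputs/datasets/cleaned/test_set.csv
* outputs/datasets/cleaned/clean_house_price_records.csv (created in this notebook by merging the two sets above)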

## Outputs

* ML pipeline
* Feature Importance Plot



---

# Change working directory

We need to change the working directory from the current folder to its parent folder.
* We access the current directory with os.getcwd()

In [1]:
import os
current_dir = os.getcwd()
current_dir

'/workspaces/project-portfolio-5/jupyter_notebooks'

We want to make the parent of the current directory the new current directory.
* os.path.dirname() gets the parent directory
* os.chdir() sets the new current directory

In [2]:
os.chdir(os.path.dirname(current_dir))
print("You set a new current directory")

You set a new current directory


Confirm the new current directory

In [3]:
current_dir = os.getcwd()
current_dir

'/workspaces/project-portfolio-5'

# Import Packages

In [4]:
import numpy as np
import pandas as pd

# Load Data

First we need to merge train and test sets into a new dataset that we'll call clean_house_price_records.

In [7]:
train_data = pd.read_csv('outputs/datasets/cleaned/train_set.csv')
test_data = pd.read_csv('outputs/datasets/cleaned/test_set.csv')

merged_data = pd.concat([train_data, test_data], axis=0)

merged_data.to_csv('outputs/datasets/cleaned/clean_house_price_records.csv', index=False)

Now we can load the data.

In [8]:
df = pd.read_csv("outputs/datasets/cleaned/clean_house_price_records.csv")
df.head()

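| Unnamed: 0 | 1stFlrSF | 2ndFlrSF | BedroomAbvGr | BsmtExposure | BsmtFinSF1 | BsmtFinType1 | BsmtUnfSF | GarageArea | GarageFinish | GarageYrBlt | ... | LotArea | LotFrontage | MasVnrArea | OpenPorchSF | OverallCond | OverallQual | TotalBsmtSF | YearBuilt | YearRemodAdd | SalePrice |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1828 | 0 | 0 | Av | 48 | Unk | 1774 | 774 | Unf | 2007 | ... | 11694 | 90 | 452 | 108 | 5 | 9 | 1822 | 2007 | 2007 | 314813 |
| 1 | 894 | 0 | 2 | No | 0 | Unf | 894 | 308 | Unf | 1962 | ... | 6600 | 60 | 0 | 0 | 5 | 5 | 894 | 1962 | 1962 | 109500 |
| 2 | 964 | 0 | 2 | No | 713 | ALQ | 163 | 432 | Unf | 1921 | ... | 13360 | 80 | 0 | 0 | 7 | 5 | 876 | 1921 | 2006 | 163500 |
| 3 | 1689 | 0 | 3 | No | 1218 | GLQ | 350 | 857 | RFn | 2002 | ... | 13265 | 69 | 148 | 59 | 5 | 8 | 1568 | 2002 | 2002 | 271000 |
| 4 | 1541 | 0 | 3 | No | 0 | Unf | 1541 | 843 | RFn | 2001 | ... | 13704 | 118 | 150 | 81 | 5 | 7 | 1541 | 2001 | 2002 | 205000 |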
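
The `Unnamed: 0` column in the preview is the name pandas gives to a header-less first column, which usually appears when a CSV was written with its index and read back without `index_col`. Whether that is what happened to the train and test files here is an assumption, and the notebook does not say whether the column should be kept. The sketch below reproduces the effect on a small toy frame (hypothetical data, not the project files) and shows two common ways to avoid carrying the column into a model.

```python
import io

import pandas as pd

# Toy frame for illustration only; not the project dataset.
toy = pd.DataFrame({"LotArea": [11694, 6600], "SalePrice": [314813, 109500]})

# Writing with the default index=True stores the index as a first
# column whose header is empty.
buffer = io.StringIO()
toy.to_csv(buffer)

# Reading it back without index_col names that column "Unnamed: 0".
buffer.seek(0)
reloaded = pd.read_csv(buffer)
print(list(reloaded.columns))

# Option 1: tell pandas that the first column is the index.
buffer.seek(0)
with_index = pd.read_csv(buffer, index_col=0)

# Option 2: drop the stray column after loading.
dropped = reloaded.drop(columns=["Unnamed: 0"])
print(list(dropped.columns))
```

If the column is indeed a leftover index, writing the original files with `index=False` (as the merge cell above already does for the merged file) prevents it from appearing in the first place.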


## ML Regressor Pipeline

In [None]:
from sklearn.pipeline import Pipeline

### Feature Engineering
from feature_engine.encoding import OrdinalEncoder
from feature_engine.selection import SmartCorrelatedSelection
from feature_engine import transformation as vt

### Feat Scaling
from sklearn.preprocessing import StandardScaler

### Feat Selection
from sklearn.feature_selection import SelectFromModel

### ML algorithms
from sklearn.ensemble import AdaBoostRegressor
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from xgboost import XGBRegressor


def PipelineOptimization(model):
    pipeline_base = Pipeline([

      ("OrdinalCategoricalEncoder", OrdinalEncoder(encoding_method='arbitrary',
                                                   variables=['BsmtExposure',
                                                              'BsmtFinType1',
                                                              'GarageFinish',
                                                              'KitchenQual'])),

      ("NumericLogTransform", vt.LogTransformer(variables=['1stFlrSF',
                                                           'LotArea',
                                                           'GrLivArea'])),
      ("NumericPowerTransform", vt.PowerTransformer(variables=['MasVnrArea'])),
      ("NumericYeoJohnsonTransform",
       vt.YeoJohnsonTransformer(variables=['OpenPorchSF'])),

      ("SmartCorrelatedSelection",
       SmartCorrelatedSelection(variables=None,
                                method="spearman",
                                threshold=0.6,
                                selection_method="cardinality"
                                )),

      ("feat_scaling", StandardScaler()),

      ("feat_selection",  SelectFromModel(model)),

      ("model", model),

    ])

    return pipeline_base
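
The objectives mention hyperparameter optimization with Grid Search CV, so here is a minimal sketch of how a pipeline built this way could be passed to scikit-learn's `GridSearchCV`. It rests on several assumptions: `toy_pipeline` is a hypothetical, reduced stand-in for `PipelineOptimization` that keeps only the scaling, feature selection and model steps; the data is synthetic; and the parameter grid values are illustrative, not tuned choices from this project.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def toy_pipeline(model):
    # Hypothetical, simplified stand-in for PipelineOptimization:
    # the feature engineering steps are left out so the sketch
    # needs only scikit-learn.
    return Pipeline([
        ("feat_scaling", StandardScaler()),
        ("feat_selection", SelectFromModel(model)),
        ("model", model),
    ])


# Synthetic regression data, used only so the sketch runs on its own.
X, y = make_regression(n_samples=200, n_features=8, n_informative=4,
                       noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Grid keys follow the "<step name>__<parameter>" convention, so
# "model__n_estimators" targets the final "model" step.
param_grid = {
    "model__n_estimators": [25, 50],
    "model__max_depth": [3, None],
}

search = GridSearchCV(
    estimator=toy_pipeline(RandomForestRegressor(random_state=0)),
    param_grid=param_grid,
    cv=3,
    scoring="r2",
)
search.fit(X_train, y_train)

print(search.best_params_)
print(search.best_estimator_.score(X_test, y_test))
```

Because the search clones the whole pipeline for every fold, the preprocessing steps are refitted on each training split, which keeps information from the validation split out of the scaler and the feature selector.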

---


# Push files to Repo

* If you do not need to push files to Repo, you may replace this section with "Conclusions and Next Steps" and state your conclusions and next steps.

In [None]:
import os
try:
  # Create your folder here, for example:
  # os.makedirs(name='')
  pass  # placeholder so the otherwise empty try block is valid Python
except Exception as e:
  print(e)
