<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Data-Preparation-Part-1---NSW-Housing-Data" data-toc-modified-id="Data-Preparation-Part-1---NSW-Housing-Data-1">Data Preparation Part 1 - NSW Housing Data</a></span><ul class="toc-item"><li><span><a href="#STEP1.-Clean-Individual-Sales-and-Rent-File" data-toc-modified-id="STEP1.-Clean-Individual-Sales-and-Rent-File-1.1">STEP1. Clean Individual Sales and Rent File</a></span><ul class="toc-item"><li><span><a href="#1.1-Sales-Data" data-toc-modified-id="1.1-Sales-Data-1.1.1">1.1 Sales Data</a></span></li><li><span><a href="#1.2-Rent-Data" data-toc-modified-id="1.2-Rent-Data-1.1.2">1.2 Rent Data</a></span></li></ul></li><li><span><a href="#STEP2.-Merge-Sales-Files" data-toc-modified-id="STEP2.-Merge-Sales-Files-1.2">STEP2. Merge Sales Files</a></span></li><li><span><a href="#STEP3.-Merge-Rent-Files" data-toc-modified-id="STEP3.-Merge-Rent-Files-1.3">STEP3. Merge Rent Files</a></span></li><li><span><a href="#STEP4.-Create-Master-Stacked-Sales-&amp;-Rent-File" data-toc-modified-id="STEP4.-Create-Master-Stacked-Sales-&amp;-Rent-File-1.4">STEP4. Create Master Stacked Sales &amp; Rent File</a></span></li><li><span><a href="#STEP5:-Create-Master-Unstacked-Sales-&amp;-Rent-File" data-toc-modified-id="STEP5:-Create-Master-Unstacked-Sales-&amp;-Rent-File-1.5">STEP5: Create Master Unstacked Sales &amp; Rent File</a></span></li></ul></li></ul></div>

# Data Preparation Part 1 - NSW Housing Data

The purpose of this notebook is to:
1. Clean each individual quarterly sales and rent file from Q3 2017 to Q1 2021 before merging the sales and rent data.


2. Merge the cleaned sales and rent files into one complete housing dataset. This file is in the **'stacked'** format, i.e., each quarter's data is stacked on top of the previous quarter's, so the same postcode appears multiple times in the index. It is used mainly for exploring trends in the housing market.


3. Create the **unstacked** dataframe of the latest 5 quarters (Q1 2020 - Q1 2021) of data for modelling, which holds the housing statistics from previous quarters as columns, to be used as input variables together with demographic and economic features.

The end result of this notebook is two .csv data files:
* <i>Master_Sales_Rent_2017Q4_2021Q1.csv</i> (Stacked complete housing data Q4'17-Q1'21)
* <i>Pivot_Sales_Rent_5Quarters_Imputed.csv</i> (Unstacked complete housing data Q1'20-Q1'21)

Both are saved under `Files/Cleaned`.
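
To make the difference between the two formats concrete, here is a minimal sketch on a hypothetical toy dataset (the postcodes and prices are made up for illustration and are not taken from the NSW files). It pivots a stacked frame into an unstacked one and flattens the resulting two-level column index.

```python
import pandas as pd

# Hypothetical toy data: two postcodes observed over two quarters (stacked format).
stacked = pd.DataFrame({
    "postcode": [2000, 2000, 2010, 2010],
    "time_period": ["2020 Q4", "2021 Q1", "2020 Q4", "2021 Q1"],
    "median_price": [900.0, 950.0, 1200.0, 1250.0],
})

# Unstacked format: one row per postcode, one column per (statistic, quarter) pair.
unstacked = stacked.pivot_table(index="postcode",
                                columns="time_period",
                                values=["median_price"])

# Flatten the two-level column index into names such as "median_price 2020 Q4".
unstacked.columns = [" ".join(col) for col in unstacked.columns]

print(unstacked)
```

In the stacked frame each postcode appears once per quarter; in the unstacked frame it appears once, with earlier quarters available as separate columns.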


In [2]:
import pandas as pd
import numpy as np 

import glob
import re
%matplotlib inline

## STEP1. Clean Individual Sales and Rent File

### 1.1 Sales Data

Define `salesCleanFn` function as follows:

In [None]:
def salesCleanFn(dataFolderString,osString):
    # Getting all the file names from a specified folder.
    
    #__________________________________________________________
    # If operating system is windows
    if osString == 'windows':
        dataDir = "Files\\"
        # if dataFolderString does not end with a backslash, append one
        if dataFolderString[-1] != '\\':
            fileNames = glob.glob(dataDir+dataFolderString+'\\'+'*.xlsx')
        else:
            fileNames = glob.glob(dataDir+dataFolderString+'*.xlsx')

    #__________________________________________________________
    # If operating system is mac / linux 
    else:
        dataDir = "Files/"
        if dataFolderString[-1] != '/':
            fileNames = glob.glob(dataDir+dataFolderString+'/'+'*.xlsx')
        else:
            fileNames = glob.glob(dataDir+dataFolderString+'*.xlsx')
#     print(fileNames)
    
    #__________________________________________________________
    # Looping through the fileNames to read the excel sheets. 
    frames = []
    masterDF = []

    for i, fileString in enumerate(fileNames):
        for j in range(0,8):
            df = []
            df = pd.read_excel(fileString, sheet_name="Postcode", na_values='-', header=j)

            if df.columns[0] == 'Postcode': #  checking if the 'header' parameter has the correct j value...

                # _____________________________________________________________________________________
                # adding additional columns
                regex = re.compile(r'\d+') #finds all numbers in string
                fileNumbers = regex.findall(fileString) #only works if the format of the 
                # filename is consistent. Stores the 'regex' numbers in a list.
                
                df['key'] = 's'+fileNumbers[2] # fileNumbers type = string
                year = fileNumbers[3] #type = string
                
                # the following statement searches for the month in the filename
                # if the find() function does not find the month, it returns '-1' and moves on to the next line.
                # Thus the use of !=-1. 
                if fileString.find('mar') !=-1 or fileString.find('Mar') !=-1:
                    quarter = 'Q1'
                elif fileString.find('jun') !=-1 or fileString.find('Jun') !=-1:
                    quarter = 'Q2'
                elif fileString.find('sep') !=-1 or fileString.find('Sep') !=-1:
                    quarter = 'Q3'
                elif fileString.find('dec') !=-1 or fileString.find('Dec') !=-1:
                    quarter = 'Q4'
                df['time_period'] = year + ' ' + quarter

                df['year'] = year

                df['quarter'] = quarter
                
                # some of the columns in the files are not the same, so we fix them here
                column = 'Quarterly change in Median Sales Price'
                newColumns = {'Quarterly change in Median Sales Price':'Qtly change in Median',
                             'Annual change in Median Sales Price':'Annual change in Median',
                             'Quarterly change in Count':'Qtly change in Count'}
        
                if column in df.columns:
                    df.rename(columns = newColumns, inplace= True)
                
                # finally, putting the DF into a list: frames
                frames.append(df)
                break  # header row found; no need to try further header offsets
                
               


    # _____________________________________________________________________________________
    # putting all the DFs (frames) together to get a master DF
    masterDF = pd.concat(frames)
    # General cleaning
    rename_cols= {'Postcode':'postcode', 
             'Dwelling Type':'dwelling_type', 
             "First Quartile Sales Price\n$'000s" : '25%_price',
             "Median Sales Price\n$'000s" : 'median_price', 
             "Third Quartile Sales Price\n'000s" : '75%_price',
             "Mean Sales Price\n$'000s" : 'mean_price',
             'Sales\nNo.':'sales_no',
             'Qtly change in Median':'Qdelta_median',
             'Annual change in Median':'Adelta_median',
             'Qtly change in Count':'Qdelta_count',
             'Annual change in Count':'Adelta_count'}
    
    masterDF.rename(columns=rename_cols, inplace=True) #rename the columns for easier referencing



    masterDF = masterDF.drop(columns=['25%_price', '75%_price'], axis=1) # dropping unwanted columns
    
    # Impute NaN values with 5, the midpoint of the 0-10 range that null values represent in the dataset.
    masterDF.loc[masterDF['sales_no'].isnull(), 'sales_no'] = 5.0
    
    # fixing the NAN values in the median and mean columns 
    keys = list(masterDF['key'].unique())

    for k in keys:
    # Total
    # First, median
        k_impMedianTotal = masterDF.loc[(masterDF['median_price'].notna()) & 
                             (masterDF['dwelling_type']=='Total') &
                             (masterDF['key']==k),
                             'median_price'].median() # calculate imputer value 

        masterDF.loc[(masterDF['median_price'].isnull()) & 
                     (masterDF['dwelling_type']=='Total') &
                     (masterDF['key']==k),
                     'median_price']=k_impMedianTotal #impute

    # mean
        k_impMeanTotal = masterDF.loc[(masterDF['mean_price'].notna()) & 
                             (masterDF['dwelling_type']=='Total') &
                             (masterDF['key']==k),
                             'mean_price'].median() # median of the observed mean prices, as for Strata and Non Strata

        masterDF.loc[(masterDF['mean_price'].isnull()) & 
                 (masterDF['dwelling_type']=='Total') &
                 (masterDF['key']==k),
                 'mean_price']=k_impMeanTotal #impute
#         print(k_impMeanTotal)
#         print('')
#         print(k)

    # Strata
    # First, median
        k_impMedianStrata = masterDF.loc[(masterDF['median_price'].notna()) & 
                             (masterDF['dwelling_type']=='Strata') &
                             (masterDF['key']==k),
                             'median_price'].median()


        masterDF.loc[(masterDF['median_price'].isnull()) & 
                     (masterDF['dwelling_type']=='Strata') &
                     (masterDF['key']==k),
                     'median_price']=k_impMedianStrata

    # mean
        k_impMeanStrata = masterDF.loc[(masterDF['mean_price'].notna()) & 
                             (masterDF['dwelling_type']=='Strata') &
                             (masterDF['key']==k),
                             'mean_price'].median()

        masterDF.loc[(masterDF['mean_price'].isnull()) & 
                     (masterDF['dwelling_type']=='Strata') &
                     (masterDF['key']==k),
                     'mean_price']=k_impMeanStrata

    # Non-Strata
    # First, median
        k_impMedianNonStrata = masterDF.loc[(masterDF['median_price'].notna()) & 
                             (masterDF['dwelling_type']=='Non Strata') &
                             (masterDF['key']==k),
                             'median_price'].median()

        masterDF.loc[(masterDF['median_price'].isnull()) & 
                     (masterDF['dwelling_type']=='Non Strata') &
                     (masterDF['key']==k),
                     'median_price']=k_impMedianNonStrata

    # mean
        k_impMeanNonStrata = masterDF.loc[(masterDF['mean_price'].notna()) & 
                             (masterDF['dwelling_type']=='Non Strata') &
                             (masterDF['key']==k),
                             'mean_price'].median()

        masterDF.loc[(masterDF['mean_price'].isnull()) & 
                     (masterDF['dwelling_type']=='Non Strata') &
                     (masterDF['key']==k),
                     'mean_price']=k_impMeanNonStrata
        continue

    # Replace 's' with 20, the midpoint of 10 and 30, since there are quite a few such entries
    masterDF.loc[masterDF['sales_no'] == 's', 'sales_no'] = 20.0

    masterDF['sales_no'] = masterDF['sales_no'].astype(float) # Cast type as float

    total = masterDF.loc[masterDF['dwelling_type']=='Total'] # Separate dwelling types
    strata = masterDF.loc[masterDF['dwelling_type']=='Strata']
    nstrata = masterDF.loc[masterDF['dwelling_type']=='Non Strata'] 

    return masterDF, total, strata, nstrata

Next we'll call the function and save the dataframes as CSV.


Below, `old` denotes the files from **2017-2018** and `new` denotes the files from **2019-2021**. Sales files are cleaned in two batches (2017-2018 and 2019-2021) because the two batches differ slightly in the number of redundant rows before the actual table starts, which makes them hard to loop through in a single batch.

Rent files have the same issue and are cleaned in the same fashion.
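
As an illustration of the varying header position, the sketch below locates the header row in a sheet that has been read without a header. It is not the approach used in `salesCleanFn` (which retries `header=j` until the first column is `'Postcode'`); `find_header_row` is a hypothetical helper and the sheet content is invented for the example.

```python
import pandas as pd

def find_header_row(raw, label="Postcode", max_rows=8):
    """Return the index of the first row whose first cell equals `label`, or None."""
    for row_idx in range(min(max_rows, len(raw))):
        if raw.iloc[row_idx, 0] == label:
            return row_idx
    return None

# Hypothetical sheet as it might look when read with header=None:
# two redundant title rows, then the actual table.
raw = pd.DataFrame([
    ["Sales report", None, None],
    ["Some subtitle", None, None],
    ["Postcode", "Dwelling Type", "Sales\nNo."],
    [2000, "Total", 42],
])

header_idx = find_header_row(raw)

# Use the detected row as column names and keep only the rows below it.
table = raw.iloc[header_idx + 1:].reset_index(drop=True)
table.columns = raw.iloc[header_idx].tolist()

print(header_idx)
print(table)
```

With a real workbook, one option would be to read the first few rows once with `header=None`, find the index this way, and then pass it as the `header` argument of `pd.read_excel`, so each file is parsed only twice.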

In [None]:
salesOld, salesOld_total, salesOld_strata, salesOld_nStrata  = salesCleanFn('Sales/2017_2018', 'windows')

salesOld.to_csv('Files/sales_2017_2018')

# salesOld_total.to_csv('Files/salesTotal_2017_2018')
# salesOld_strata.to_csv('Files/salesStrata_2017_2018')
# salesOld_nStrata.to_csv('Files/salesNstrata_2017_2018')

In [None]:
salesNew, salesNew_total, salesNew_strata,  salesNew_nStrata =  salesCleanFn('Sales/2019_2021', 'windows')

salesNew.to_csv('Files/sales_2019_2021')

# salesNew_total.to_csv('Files/salesTotal_2019_2021')
# salesNew_strata.to_csv('Files/salesStrata_2019_2021')
# salesNew_nStrata.to_csv('Files/salesNstrata_2019_2021')

The resulting data files from the above process are:
* sales_2017_2018
* sales_2019_2021

Both are CSV files (written without a file extension) saved at the top level of the `Files` directory.
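
The per-quarter, per-dwelling-type median imputation inside `salesCleanFn` is written out as repeated `.loc` assignments. As a hedged alternative, the same idea can be sketched with `groupby(...).transform('median')`. The toy frame below is hypothetical and only reuses the cleaned column names.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame using the cleaned column names.
df = pd.DataFrame({
    "key": ["s122", "s122", "s122", "s122", "s123", "s123"],
    "dwelling_type": ["Total", "Total", "Total", "Strata", "Total", "Total"],
    "median_price": [500.0, np.nan, 700.0, 400.0, 800.0, np.nan],
})

# Median of the observed values within each (quarter key, dwelling type) group,
# broadcast back to the shape of the original column.
group_median = df.groupby(["key", "dwelling_type"])["median_price"].transform("median")

# Fill only the missing entries with their own group's median.
df["median_price"] = df["median_price"].fillna(group_median)

print(df)
```

The same two lines could be repeated for `mean_price`, which avoids copy-pasting one block per dwelling type.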

### 1.2 Rent Data

Define `rentCleanFn` function as follows:

In [None]:
def rentCleanFn(dataFolderString,osString):
        
    #__________________________________________________________
    # If operating system is windows
    if osString == 'windows':
        dataDir = "Files\\"
        # if dataFolderString does not end with a backslash, append one
        if dataFolderString[-1] != '\\':
            fileNames = glob.glob(dataDir+dataFolderString+'\\'+'*.xlsx')
        else:
            fileNames = glob.glob(dataDir+dataFolderString+'*.xlsx')

    #__________________________________________________________
    # If operating system is mac / linux 
    else:
        dataDir = "Files/"
        if dataFolderString[-1] != '/':
            fileNames = glob.glob(dataDir+dataFolderString+'/'+'*.xlsx')
        else:
            fileNames = glob.glob(dataDir+dataFolderString+'*.xlsx')
#     print(fileNames)
    
    #__________________________________________________________
    # Looping through the fileNames to read the excel sheets. 
    frames = []
    masterDF = []
    
    for i, fileString in enumerate(fileNames):
        for j in range(0,8):
            df = []
            df = pd.read_excel(fileString, sheet_name="Postcode", na_values='-', header=j)

            if df.columns[0] == 'Postcode': # ...if the 'header' parameter has the correct j value...

                # adding additional columns
                regex = re.compile(r'\d+') #finds all numbers in string
                fileNumbers = regex.findall(fileString)
                
                df['key'] = 'r'+fileNumbers[2] # fileNumbers type = string
    
                
        
                # some of the columns in the files are not the same, so we fix them here
                column = 'Bedroom Numbers'
                newColumns = {'Bedroom Numbers':'Number of Bedrooms'}
        
                if column in df.columns:
                    df.rename(columns = newColumns, inplace= True)
                
                
                
                frames.append(df)  # putting the DF into a list, frames
                break  # header row found; no need to try further header offsets

            
              



    masterDF = pd.concat(frames)
    
    # dropping this column as we've confirmed there's an issue with the raw file.
    if 'Unnamed: 10' in masterDF.columns:
        masterDF=  masterDF.drop(columns='Unnamed: 10')

    # Drop unwanted columns
    masterDF = masterDF.drop(columns=['First Quartile Weekly Rent for New Bonds\n$',
                          'Third Quartile Weekly Rent for New Bonds\n$'],
                axis=1)
    
    # Rename columns
    rename_cols= {'Postcode':'postcode',
                  'Dwelling Types':'dwelling_type', 
                  'Number of Bedrooms':'bed_number',
                  'Median Weekly Rent for New Bonds\n$': 'median_rent_newb',
                  'New Bonds Lodged\nNo.' : 'new_bonds_no',
                  'Total Bonds Held\nNo.': 'total_bonds_no',
                  'Quarterly change in Median Weekly Rent':'Qdelta_median_rent',
                  'Annual change in Median Weekly Rent':'Adelta_median_rent',
                  'Quarterly change in New Bonds Lodged':'Qdelta_new_bonds',
                  'Annual change in New Bonds Lodged':'Adelta_new_bonds'}
    
    masterDF.rename(columns=rename_cols,inplace=True)

    masterDF_ag = masterDF.loc[(masterDF['bed_number']=='Total') & (masterDF['dwelling_type']=='Total')]
    masterDF_ag = masterDF_ag.drop(columns=['bed_number','dwelling_type'], axis=1)
    
    # Impute 's' in 'new_bonds_no' and 'total_bonds_no' with 20
    masterDF_ag.loc[masterDF_ag['new_bonds_no']=='s','new_bonds_no'] = 20.0
    masterDF_ag.loc[masterDF_ag['total_bonds_no']=='s', 'total_bonds_no'] = 20.0

    # Impute na in 'new_bonds_no' and 'total_bonds_no' with 5
    masterDF_ag.loc[masterDF_ag['new_bonds_no'].isnull(),'new_bonds_no'] = 5.0
    masterDF_ag.loc[masterDF_ag['total_bonds_no'].isnull(), 'total_bonds_no'] = 5.0

    # Cast both variables as float (was object)
    masterDF_ag['new_bonds_no'] = masterDF_ag['new_bonds_no'].astype(float)
    masterDF_ag['total_bonds_no'] = masterDF_ag['total_bonds_no'].astype(float)

    # Impute na in 'median_rent_newb' with the median of the column
    masterDF_ag['median_rent_newb'] = masterDF_ag['median_rent_newb'].fillna(masterDF_ag['median_rent_newb'].median())
    

    # Set postcode as index
    
    masterDF_ag = masterDF_ag.set_index('postcode')
    return masterDF_ag

Calling the function and saving the dataframes as CSV.

As explained in the previous section, the rent files are also cleaned in two batches and saved as two separate CSV files (written without a file extension) for future use:
* rent_2017_2018
* rent_2019_2021
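
The count columns in the rent files (`new_bonds_no`, `total_bonds_no`) mix numbers, `'s'` markers and blanks. The sketch below wraps the imputation rule used in `rentCleanFn` (20 for `'s'`, 5 for missing) in a hypothetical helper, `impute_counts`. Note one assumption that differs from the original code: any other non-numeric text is also treated as missing.

```python
import numpy as np
import pandas as pd

def impute_counts(series, suppressed_value=20.0, missing_value=5.0):
    """Replace 's' markers and missing entries in a count column, returning floats."""
    is_suppressed = series.eq("s")
    # Blank out the 's' markers, then convert the rest to numbers.
    # Assumption: any other non-numeric text is coerced to NaN (treated as missing).
    numeric = pd.to_numeric(series.where(~is_suppressed), errors="coerce")
    numeric = numeric.fillna(missing_value)
    # Restore the suppressed entries with their own imputation value.
    numeric[is_suppressed] = suppressed_value
    return numeric.astype(float)

# Hypothetical example column with a number, an 's' marker and a blank.
counts = pd.Series([12, "s", np.nan, 30], dtype=object)
cleaned = impute_counts(counts)

print(cleaned.tolist())
```

Keeping the two imputation values as parameters makes the rule easy to reuse for `sales_no` in the sales files.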

In [None]:
rentOld = rentCleanFn('Rent/2017_2018', 'windows')
rentOld.to_csv('Files/rent_2017_2018')

In [None]:
rentNew = rentCleanFn('Rent/2019_2021', 'windows')
rentNew.to_csv('Files/rent_2019_2021')

## STEP2. Merge Sales Files

Merge the cleaned `sales 2017-2018` and `sales 2019-2021` from step 1.

In [3]:
# Read in sales 2017-2018

sales17_18 = "Files/sales_2017_2018"
sales17_18 = pd.read_csv(sales17_18, usecols=['postcode', 'dwelling_type', 'median_price', 'mean_price',
                                               'sales_no', 'Qdelta_median', 'Adelta_median', 'Qdelta_count',
                                               'Adelta_count', 'key', 'time_period', 'year', 'quarter'])

# Re-arrange column order
cols = ['postcode', 'key', 'time_period', 'year', 'quarter', 
        'dwelling_type', 'median_price', 'mean_price','sales_no', 
        'Qdelta_median', 'Adelta_median', 'Qdelta_count','Adelta_count' ]

sales17_18 = sales17_18[cols]
sales17_18.head(1)

In [None]:
# Read in sales 2019-2021

sales19_21 = "Files/sales_2019_2021"
sales19_21 = pd.read_csv(sales19_21, usecols=['postcode', 'dwelling_type', 'median_price', 'mean_price',
                                               'sales_no', 'Qdelta_median', 'Adelta_median', 'Qdelta_count',
                                               'Adelta_count', 'key', 'time_period', 'year', 'quarter'])
# Re-arrange column order
cols = ['postcode', 'key', 'time_period', 'year', 'quarter', 
        'dwelling_type', 'median_price', 'mean_price','sales_no', 
        'Qdelta_median', 'Adelta_median', 'Qdelta_count','Adelta_count' ]

sales19_21 = sales19_21[cols]
sales19_21.head(1)

In [None]:
# Concatenate the two sales files 
sales_full = pd.concat([sales17_18, sales19_21])

# Check if all quarters are present
print(sales_full.groupby('time_period').size())

In [None]:
# Check null values
sales_full.isnull().sum()

The delta variables will later be removed in the actual analysis but were kept here in case they're of any use. Hence, we were not too concerned about the nulls in them and didn't clean them up.

In [None]:
sales_full.head()

Save the complete sales dataset as .csv for easier reference later.

In [None]:
sales_full.to_csv('Files/Cleaned/Sales_2017Q3_2021Q1_Clean.csv', index=False)

## STEP3. Merge Rent Files

Merge the cleaned `rent 2017-2018` and `rent 2019-2021` from step 1.

In [None]:
# Read rent 2017-2018
rent17_18 = "Files/rent_2017_2018"
rent17_18 = pd.read_csv(rent17_18)

# Read rent 2019-2021
rent19_21 = "Files/rent_2019_2021"
rent19_21 = pd.read_csv(rent19_21)

# Concat both rent files
rent_full = pd.concat([rent17_18, rent19_21])
rent_full.head()

In [None]:
# Check null values
rent_full.isnull().sum()

Again, we don't really care about the null values in the delta variables as they'll later be dropped in the analysis.

In [None]:
# Check that all quarters are present
rent_full.groupby('key').size()

**NOTE:** rkey = skey-1 for the same quarter
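
Because the keys increase by one per quarter, the time period can also be derived arithmetically. The sketch below is a hypothetical helper, `key_to_time_period`, which assumes that `r121` corresponds to 2017 Q3 (as in the mapping used in this notebook) and that the sales key for the same quarter is one higher, per the note above.

```python
def key_to_time_period(key):
    """Convert a key such as 'r121' or 's122' to a label such as '2017 Q3'.

    Assumes keys are consecutive per quarter and that r121 is 2017 Q3.
    """
    number = int(key[1:])
    if key[0] == "s":
        number -= 1  # rkey = skey - 1 for the same quarter
    # r121 is the third quarter of 2017, i.e. two quarters after 2017 Q1.
    quarters_since_2017q1 = (number - 121) + 2
    year = 2017 + quarters_since_2017q1 // 4
    quarter = quarters_since_2017q1 % 4 + 1
    return f"{year} Q{quarter}"

print(key_to_time_period("r121"))
print(key_to_time_period("r135"))
print(key_to_time_period("s122"))
```

An explicit lookup table is easier to audit when there are only 15 quarters; the arithmetic version mainly helps if more quarters are added later.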

In [None]:
# Map keys to time_periods so that can merge it with sales data later

tp = ['2017 Q3', '2017 Q4', 
      '2018 Q1', '2018 Q2', '2018 Q3', '2018 Q4', 
      '2019 Q1', '2019 Q2', '2019 Q3', '2019 Q4', 
      '2020 Q1', '2020 Q2', '2020 Q3', '2020 Q4', 
      '2021 Q1']
rkeys = ['r121','r122',
         'r123','r124','r125','r126',
         'r127','r128','r129','r130',
         'r131','r132','r133','r134',
         'r135']

# Look up each rent key's time period; keys not listed in rkeys stay NaN
rent_full['time_period'] = rent_full['key'].map(dict(zip(rkeys, tp)))
    
rent_full.groupby('time_period').size()

In [None]:
# Update column name of 'key'
rent_full = rent_full.rename(columns={'key':'rkey'})


# Change columns order
cols = ['postcode', 'rkey', 'time_period', 'median_rent_newb', 'new_bonds_no', 'total_bonds_no',
       'Qdelta_median_rent', 'Qdelta_new_bonds', 'Adelta_median_rent',
       'Adelta_new_bonds']
rent_full = rent_full[cols]

rent_full.head()

In [None]:
# Save full rent data into csv
rent_full.to_csv('Files/Cleaned/Rent_2017Q4_2021Q2_Clean.csv', index=False)

## STEP4. Create Master Stacked Sales & Rent File

In [None]:
# Change the name of 'key' column in the sales files from 'key' to 'skey' 
# to differentiate from 'rkey' in rent
sales_full = sales_full.rename(columns={'key':'skey'})

In [None]:
# Join the full sales data DF and the full rent data DF

sales_rent_full = pd.merge(sales_full, rent_full, how='left',
                           left_on=['postcode','time_period'],
                           right_on=['postcode', 'time_period'])

sales_rent_full.head()

In [None]:
print(sales_rent_full.shape)
print(sales_rent_full.groupby('time_period').size())

In [None]:
print(sales_rent_full.isnull().sum())

In [None]:
# Check the 6 postcodes that are null in rent:
sales_rent_full.loc[sales_rent_full['rkey'].isnull()]

In [None]:
sales_rent_full.to_csv('Files/Cleaned/Master_Sales_Rent_2017Q4_2021Q1.csv', index=False)

In [None]:
sales_rent_full.head()

##  STEP5: Create Master Unstacked Sales & Rent File

We've decided to include only housing data from the previous four quarters (i.e. Q1 2020 to Q4 2020) as predictor variables for the Q1 2021 prices. Hence, the unstacked data file keeps only 5 quarters (incl. Q1 2021). 

In [None]:
# Create a subset that only contains 5 Quarters of data
subset = sales_rent_full.loc[sales_rent_full['time_period'].isin(['2020 Q1','2020 Q2','2020 Q3','2020 Q4','2021 Q1'])]

# Drop some of the (potentially) unnecessary variables
subset = subset.drop(columns=['Qdelta_median','Adelta_median','Qdelta_count','Adelta_count',
                              'Qdelta_median_rent', 'Adelta_median_rent','Qdelta_new_bonds','Adelta_new_bonds'],
                     axis=1)

# And only keep 'Total' dwelling type (i.e. get rid of Strata and Non-strata)
subset_total = subset.loc[subset['dwelling_type'] == 'Total']

print(subset_total.groupby('time_period').size(),'\n')
print(subset_total.groupby('dwelling_type').size(),'\n')
print(subset_total.shape)

In [None]:
print(subset_total.isnull().sum())

In [None]:
print("number of postcode in Q4 2020:",
      subset_total.loc[subset_total['time_period']=='2020 Q4', 'postcode'].nunique())
print("number of postcode in Q1 2021:",
      subset_total.loc[subset_total['time_period']=='2021 Q1', 'postcode'].nunique())

In [None]:
pivot = subset_total.pivot_table(index='postcode', columns='time_period', values=
                                 ['median_price', 'mean_price', 'sales_no', 
                                  'median_rent_newb','new_bonds_no', 'total_bonds_no']).round(2)

pivot.head()

In [None]:
pivot.columns = [' '.join(col) for col in pivot.columns]
pivot.head()

In [None]:
print(pivot.shape)
print(pivot.isnull().sum())

Because each quarterly table covers a different set of postcodes, with no exact match from quarter to quarter, pivoting generates additional null values wherever a postcode has data in some quarters but not in others. To preserve sample size, we impute these nulls with the median of each column.
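
A minimal sketch of this effect, on hypothetical toy data: postcode 2020 only has a record for 2021 Q1, so pivoting creates a null for 2020 Q4, which is then filled with that column's median.

```python
import pandas as pd

# Hypothetical stacked data: postcode 2020 has no record for 2020 Q4.
stacked = pd.DataFrame({
    "postcode": [2000, 2000, 2010, 2010, 2020],
    "time_period": ["2020 Q4", "2021 Q1", "2020 Q4", "2021 Q1", "2021 Q1"],
    "median_price": [900.0, 950.0, 1200.0, 1250.0, 700.0],
})

wide = stacked.pivot_table(index="postcode", columns="time_period", values="median_price")

# The missing (postcode, quarter) combination shows up as NaN after pivoting.
missing_before = int(wide.isnull().sum().sum())

# Fill each column's nulls with that column's median.
filled = wide.fillna(wide.median())

print(missing_before)
print(filled)
```

Column-median imputation keeps every postcode in the modelling sample, at the cost of assigning an identical value to all postcodes missing from a given quarter.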

In [None]:
pivot_fillna = pivot.fillna(pivot.median()).round(1)
pivot_fillna.isnull().sum()

In [None]:
pivot_fillna.reset_index(inplace=True)
pivot_fillna.to_csv('Files/Cleaned/Pivot_Sales_Rent_5Quarters_Imputed.csv', index=False)
