
**The objective of this notebook is to collate the code for cleaning the data below:**
1. Bond yields
2. Interest rate
3. Population
4. Construction
5. Weekly income
6. Household size

**and to merge all features into a complete feature set at the end.**

In [None]:
import pandas as pd
import numpy as np
import datetime as dt
import matplotlib.pyplot as plt
import seaborn as sns

## I. Feature Cleaning

### I-1. Bond Yields ###
`yields_join`

In [None]:
# read data in 
yields_data = "Files/Bond Yields/f02hist.xls"
yields = pd.read_excel(yields_data, sheet_name='Data', usecols='A:B,E', header=None, skiprows=range(0,12))
yields.columns = ['Date', '2yBonds%', '10yBonds%']
yields.head()

In [None]:
# split date column into year and quarter
dates = pd.to_datetime(yields["Date"])
yields["Year"] = dates.dt.year
yields["Quarter"] = dates.dt.quarter

# set datetime as index
yields.set_index('Date', inplace = True)

In [None]:
# average the yields within each quarter
yields_quarter_rates = yields.resample('QS').mean()
yields_quarter_rates.head(2)

In [None]:
# convert year and quarter to int
yields_quarter_rates["Year"] = yields_quarter_rates["Year"].astype(int)
yields_quarter_rates["Quarter"] = yields_quarter_rates["Quarter"].astype(int)

# create time period variable from 'Year' and 'Quarter'
yields_quarter_rates["time_period"] = yields_quarter_rates["Year"].map(str) + " Q" + yields_quarter_rates["Quarter"].map(str)

# Drop 'Year' and 'Quarter'
yields_join = yields_quarter_rates[["time_period", "2yBonds%", "10yBonds%"]] 
yields_join.head()

The resulting cleaned bond yield df is **yields_join**.
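
The quarterly averaging above can be checked on a toy series. This is a minimal sketch, assuming synthetic monthly data: the dates and values are invented for illustration and are not taken from `f02hist.xls`, and `toy` and `toy_q` are hypothetical names.

```python
import pandas as pd

# Synthetic monthly observations (illustrative values only)
toy = pd.DataFrame(
    {
        "Date": pd.to_datetime(
            ["2020-01-31", "2020-02-29", "2020-03-31",
             "2020-04-30", "2020-05-31", "2020-06-30"]
        ),
        "2yBonds%": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    }
).set_index("Date")

# Average within each quarter; 'QS' labels each bin by the quarter start date
toy_q = toy.resample("QS").mean()

# Build the 'YYYY Qn' join key directly from the resampled index
toy_q["time_period"] = [
    f"{year} Q{quarter}"
    for year, quarter in zip(toy_q.index.year, toy_q.index.quarter)
]
print(toy_q)
```

Building the key from the resampled index avoids averaging the helper `Year` and `Quarter` columns and casting them back to integers.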

----

### I-2. Interest Rate ###
`rates_join`  

In [None]:
# read data in
interest = "Files/Interest Rates/f01d.xls"
interest = pd.read_excel(interest, sheet_name = "Data", usecols = "A:B", header = None, skiprows = range(0,12))
interest.columns = ['Date', 'Rate']

# get date time
dates = pd.to_datetime(interest["Date"])
interest["Year"] = dates.dt.year
interest["Quarter"] = dates.dt.quarter

# set date as index
interest.set_index('Date', inplace = True)


# calculate the average rate per quarter
quarter_rates = interest.resample('QS').mean()

# convert year and quarter to int
quarter_rates["Year"] = quarter_rates["Year"].astype(int)
quarter_rates["Quarter"] = quarter_rates["Quarter"].astype(int)

# time period
quarter_rates["time_period"] = quarter_rates["Year"].map(str) + " Q" + quarter_rates["Quarter"].map(str)

# remove Year and Quarter
quarter_rates = quarter_rates[['Rate','time_period']]
quarter_rates.head()

In [None]:
# read data in
interest = "Files/Interest Rates/f01d.xls"

interest = pd.read_excel(interest, sheet_name = "Data", usecols = "A:B", header = None, skiprows = range(0,12))
interest.columns = ['Date', 'Rate']
interest.head()

In [None]:
# check data types
interest.dtypes

In [None]:
# split date column into year and quarter
dates = pd.to_datetime(interest["Date"])
interest["Year"] = dates.dt.year
interest["Quarter"] = dates.dt.quarter

# set datetime as index
interest.set_index('Date', inplace = True)

In [None]:
# average the rate within each quarter
quarter_rates = interest.resample('QS').mean()
quarter_rates.head(2)

In [None]:
# convert year and quarter to int
quarter_rates["Year"] = quarter_rates["Year"].astype(int)
quarter_rates["Quarter"] = quarter_rates["Quarter"].astype(int)

# create time period variable for join from 'Year' and 'Quarter'
quarter_rates["time_period"] = quarter_rates["Year"].map(str) + " Q" + quarter_rates["Quarter"].map(str)
quarter_rates.head()

# Drop 'Year' and 'Quarter'
rates_join = quarter_rates[["time_period", "Rate"]]
rates_join.head(2)

The resulting cleaned interest rate df is **rates_join**.

----

### I-3. Construction  
`df_cons_clean`

In [None]:
# --read file, --rename columns
construction_file = "Files/Construction/Quarterly, Building construction prices rose, due to Homebuilder grants and government infrastructure investment.xlsx"
df_cons = pd.read_excel(construction_file,header=1,usecols="A:B", skipfooter=2)
df_cons.columns=['date','constr_index']

# --convert to datetime
df_cons['date'] = pd.to_datetime(df_cons['date'],format='%b-%y')

# --get year and quarter, --concatenate as time_period format, --drop other columns
df_cons['year'] = df_cons.date.dt.year
df_cons['quarter'] = df_cons.date.dt.quarter
df_cons['time_period'] = df_cons.year.map(str) + " Q" + df_cons.quarter.map(str)
df_cons_clean = df_cons.drop(columns=['date', 'year', 'quarter'])
df_cons_clean.head()

### I-4. Weekly Income
`incp_gr`

In [None]:
# Read the raw data in
census_INCP = "Files/Census/POA (UR) by INCP Toal Personal Income (Weekly).csv"

incp_raw = pd.read_csv(census_INCP, skiprows=9, nrows=11142,
                       usecols=['POA (UR)', 'INCP Total Personal Income (weekly)', 'Count'])

# Rename column for easier referencing
incp_cols = {'POA (UR)':'postcode', 'INCP Total Personal Income (weekly)':'INCP_WK'}
incp_raw.rename(columns=incp_cols, inplace=True)

# Unstack
incp = incp_raw.groupby(['postcode','INCP_WK'])['Count'].sum().unstack()

# Remove the last row (grand total)
incp = incp[:-1]

incp.head(2)

In [None]:
# Remove 'NSW' in the index and cast postcode to int64
incp.reset_index(inplace=True)
incp['postcode'] = incp['postcode'].str.split(",").str.get(0)
incp['postcode'] = incp['postcode'].astype('int64')
incp = incp.set_index('postcode')

In [None]:
# Clean column names
income_cols = {'$1,000-$1,249 ($52,000-$64,999)': '$1000-1249',
               '$1,250-$1,499 ($65,000-$77,999)': '$1250-1499',
               '$1,500-$1,749 ($78,000-$90,999)': '$1500-1749',
               '$1,750-$1,999 ($91,000-$103,999)': '$1750-1999',
               '$1-$149 ($1-$7,799)': '$1-149',
               '$150-$299 ($7,800-$15,599)': '$150-299',
               '$2,000-$2,999 ($104,000-$155,999)': '$2000-2999',
               '$3,000 or more ($156,000 or more)': '>=$3000',
               '$300-$399 ($15,600-$20,799)': '$300-399',
               '$400-$499 ($20,800-$25,999)': '$400-499',
               '$500-$649 ($26,000-$33,799)': '$500-649',
               '$650-$799 ($33,800-$41,599)': '$650-799',
               '$800-$999 ($41,600-$51,999)': '$800-999'}
incp.rename(columns=income_cols, inplace=True)

# Combine 'Not applicable' and 'Not stated' into 'total_na'
incp['total_na'] = incp['Not applicable'] + incp['Not stated']

# Drop the two source columns and the 'Total' column
incp = incp.drop(columns=['Not applicable', 'Not stated', 'Total'])

# Reorder columns from the lowest to the highest income band
cols = ['$1-149', '$150-299', '$300-399', '$400-499', '$500-649', '$650-799',
        '$800-999', '$1000-1249', '$1250-1499', '$1500-1749',
        '$1750-1999', '$2000-2999', '>=$3000',
        'Negative income', 'Nil income', 'total_na']
incp = incp[cols]

incp.head(1)

In [None]:
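# Create income buckets and save into incp_gr
# (positions follow the column order set in the previous cell)
incp['INCP_LOW'] = incp.iloc[:, 0:6].sum(axis=1)        # $1-799 per week
incp['INCP_MID'] = incp.iloc[:, 6:10].sum(axis=1)       # $800-1749 per week
incp['INCP_HIGH'] = incp.iloc[:, 10:13].sum(axis=1)     # $1750 or more per week
incp['INCP_NEG_NIL'] = incp.iloc[:, 13:15].sum(axis=1)  # negative or nil income

# Keep the buckets only and reset the index so 'postcode' becomes a column for merging
incp_gr = incp[['INCP_LOW', 'INCP_MID', 'INCP_HIGH', 'INCP_NEG_NIL']].reset_index()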

incp_gr.head(1)

In [None]:
incp.iloc[:, 10:13]

*The resulting cleaned df is <b>incp_gr</b>*
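
Positional slices such as `iloc[:, 0:6]` depend on the column order set earlier in this section. As a hedged alternative, the sketch below does the same kind of bucketing by column label. The counts are invented, only a few income bands are included, and `toy`, `buckets` and `toy_gr` are hypothetical names.

```python
import pandas as pd

# Toy counts for one postcode (invented numbers, subset of the income bands)
toy = pd.DataFrame(
    {"$1-149": [1], "$650-799": [2], "$800-999": [3],
     "$1750-1999": [4], ">=$3000": [5]},
    index=pd.Index([2000], name="postcode"),
)

# Map each bucket to the labelled columns it should sum
buckets = {
    "INCP_LOW": ["$1-149", "$650-799"],
    "INCP_MID": ["$800-999"],
    "INCP_HIGH": ["$1750-1999", ">=$3000"],
}

# Sum the member columns of each bucket row by row
toy_gr = pd.DataFrame(
    {name: toy[members].sum(axis=1) for name, members in buckets.items()}
)
print(toy_gr)
```

With labels, a change in column order does not silently move a band into the wrong bucket.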

----

### I-5. Household size
`cprf`

In [None]:
# Read data
census_cprf = "Files/Census/POA by CPRF Count of Persons in Family by STATE.xlsx"
cprf = pd.read_excel(census_cprf, sheet_name="Data Sheet 0", skiprows=9, nrows=619)

# Remove redundant rows and columns 
cprf = cprf[1:] #remove the first row
cprf = cprf.drop(columns='CPRF Count of Persons in Family') # remove the first column

# Rename columns
cprf_cols= {'Unnamed: 1' : 'postcode', 
            'Two persons in family' : 'CPRF_2',
            'Three persons in family' : 'CPRF_3', 
            'Four persons in family': 'CPRF_4',
            'Five persons in family': 'CPRF_5', 
            'Six or more persons in family' : 'CPRF_6+',
            'Not applicable':'CPRF_na',
            'Total' :'CPRF_HHOLD_NO'}
cprf.rename(columns=cprf_cols, inplace=True)

cprf.head(1)

In [None]:
# Remove 'NSW' in the index and cast postcode to int64
cprf.reset_index(inplace=True)
cprf['postcode'] = cprf['postcode'].str.split(",").str.get(0)
cprf['postcode'] = cprf['postcode'].astype('int64')
cprf = cprf.set_index('postcode')
cprf = cprf.drop(columns='index', axis=1)

In [None]:
# Reset index for merging
cprf.reset_index(inplace=True)
cprf.head(1)

*The resulting cleaned df is <b>cprf</b>*
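
The same postcode clean-up is repeated for several census tables in this notebook. The sketch below shows how it could be factored into a helper. `clean_postcode` is a hypothetical name, and the `'NNNN, NSW'` label format is assumed from the cells above.

```python
import pandas as pd


def clean_postcode(df, col="postcode"):
    """Return a copy with 'NNNN, NSW'-style labels reduced to integer postcodes."""
    out = df.copy()
    # Keep the text before the first comma and cast it to int64
    out[col] = out[col].str.split(",").str.get(0).astype("int64")
    return out


# Toy input (invented rows)
toy = pd.DataFrame({"postcode": ["2000, NSW", "2007, NSW"], "CPRF_2": [10, 20]})
cleaned = clean_postcode(toy)
print(cleaned.dtypes)
```

Because the helper returns a copy, the raw frame stays available for checking.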

----

### I-7 Population by Age
`age_gr`

In [None]:
census_age5p = "Files/Census/POA (UR) by AGE5P - Age in Five Year Groups.xlsx"
age = pd.read_excel(census_age5p, sheet_name="Data Sheet 0", skiprows=8, nrows=619)

# Remove redundant rows and columns 
age = age[1:] #remove the first row
age = age.drop(columns='AGE5P - Age in Five Year Groups') # remove the first column

# Rename columns
age.rename(columns={"Unnamed: 1":"postcode"}, inplace=True)

age.head(1)

In [None]:
# Remove 'NSW' in the index and cast postcode to int64
age.reset_index(inplace=True)
age['postcode'] = age['postcode'].str.split(",").str.get(0)
age['postcode'] = age['postcode'].astype('int64')
age = age.set_index('postcode')
age = age.drop(columns='index', axis=1)

In [None]:
# Create age brackets 

age['0-4yo'] = age.iloc[:, 0:1].sum(axis=1)
age['5-14yo'] = age.iloc[:, 1:3].sum(axis=1)
age['15-24yo'] = age.iloc[:, 3:5].sum(axis=1)
age['25-34yo'] = age.iloc[:, 5:7].sum(axis=1)
age['35-54yo'] = age.iloc[:, 7:11].sum(axis=1)
age['55-64yo'] = age.iloc[:, 11:13].sum(axis=1)
age['65+yo'] = age.iloc[:, 13:21].sum(axis=1)
age['population_2016'] = age.iloc[:, 21]  # position 21: the column after the 21 age groups

age_gr = age[['0-4yo', '5-14yo', '15-24yo', '25-34yo', '35-54yo', '55-64yo','65+yo', 'population_2016']]

In [None]:
# Reset index for merging
age_gr.reset_index(inplace=True)
age_gr.head(1)

### I-8 Cultural Diversity
`cald`

#### I-8-1 Australian Citizenship
`citp`

In [None]:
census_citp = "Files/Census/POA (UR) by CITP Australian Citizenship by STATE (UR).xlsx"
citp = pd.read_excel(census_citp, sheet_name="Data Sheet 0", skiprows=9, nrows=619)


# Remove redundant rows and columns 
citp = citp[1:] #remove the first row
citp = citp.drop(columns='CITP Australian Citizenship') # remove the first column


# Rename columns
citp.rename(columns={"Unnamed: 1":"postcode",
                    "Australian":"citizen_AU",
                     "Not Australian":"citizen_non_AU"}, inplace=True)

# Remove 'not stated' and total
citp = citp.drop(columns=["Not stated",'Total'])
citp.tail(1)

#### I-8-2 Indigenous Status
`ing`

In [None]:
census_ing = "Files/Census/POA (UR) by INGP Indigenous Status by STATE (UR).xlsx"
ing = pd.read_excel(census_ing, sheet_name="Data Sheet 0", skiprows=9, nrows=619)

# Remove redundant rows and columns 
ing = ing[1:] #remove the first row
ing = ing.drop(columns=['INGP Indigenous Status',"Non-Indigenous","Not stated", "Total"])

# Rename columns
ing.rename(columns={"Unnamed: 1":"postcode"}, inplace=True)

ing.head(1)

In [None]:
# Create a total column for all Aboriginal and Torres Strait Islander people
cols = ['Aboriginal', 'Torres Strait Islander','Both Aboriginal and Torres Strait Islander']
ing['ATSI']=ing.loc[:,cols].sum(axis=1)

# Keep only the postcode and the ATSI total
ing = ing[['postcode', 'ATSI']]

ing.head()

#### I-8-3 Year of Arrival in Australia
`yarrp_gr`

In [None]:
census_yarrp = "Files/Census/POA (UR) by YARRP Year of Arrival in Australia.xlsx"
yarrp = pd.read_excel(census_yarrp, sheet_name="Data Sheet 0", skiprows=8, nrows=619)

# Remove redundant rows and columns 
yarrp = yarrp[1:] #remove the first row
yarrp = yarrp.drop(columns=['YARRP Year of Arrival in Australia (ranges)',
                        "Not stated", "Not applicable", "Total"])

# Rename columns
yarrp.rename(columns={"Unnamed: 1":"postcode","Arrived 1996-2005":"YARRP 1996-2005"}, inplace=True)

yarrp.head(1)

In [None]:
# Create year-of-arrival brackets

yarrp['YARRP <1975'] = yarrp.iloc[:, 1:5].sum(axis=1)
yarrp['YARRP 1976-1995'] = yarrp.iloc[:, 5:7].sum(axis=1)
yarrp['YARRP 2006-2016'] = yarrp.iloc[:, 8:10].sum(axis=1)

yarrp_gr = yarrp[['postcode','YARRP <1975', 'YARRP 1976-1995', 'YARRP 1996-2005', 'YARRP 2006-2016']]
yarrp_gr.head(1)

Now join the three DFs together:

In [None]:
cald = citp.merge(yarrp_gr, left_on='postcode',right_on='postcode')
cald = cald.merge(ing, left_on='postcode',right_on='postcode')

In [None]:
# Remove 'NSW' in the index and cast postcode to int64
cald.reset_index(inplace=True)
cald['postcode'] = cald['postcode'].str.split(",").str.get(0)
cald['postcode'] = cald['postcode'].astype('int64')
cald = cald.set_index('postcode')
cald = cald.drop(columns='index', axis=1)

# Reset index for merging
cald.reset_index(inplace=True)
cald.head(1)

----

## III. Group Features into Two DataFrames


1. Cross-section features using <u>postcodes</u> as key: taken from the 2016 census, `incp_gr` (weekly income), `cprf` (household size), `age_gr` (population by age group) and `cald` (cultural diversity). These features will be merged into **`features_postcode`** and then into the master data.


2. Time-series features using <u>time period</u> as key: ranging from 2011 to 2021, `yields_join` (bond yields), `rates_join` (interest rates) and `df_cons_clean` (construction activities). These features will be merged into **`features_timePeriod`**.

For help with *merge()*, see https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html

### III-2 `features_postcode`

Merge:
* `incp_gr` (weekly income)
* `cprf` (household size)
* `age_gr` (population by age group)
* `cald` (AU citizenship, year of arrival in AUS, and Indigenous status)

In [None]:
print("number of postcodes in incp:", len(set(incp_gr['postcode'])))
print("number of postcodes in cprf:", len(set(cprf['postcode'])))
print("number of postcodes in age_gr:", len(set(age_gr['postcode'])))
print("number of postcodes in cald:", len(set(cald['postcode'])))

In [None]:
# merging
features_postcode = pd.merge(incp_gr, cprf, on='postcode')
features_postcode = pd.merge(features_postcode, age_gr, on='postcode')
features_postcode = pd.merge(features_postcode, cald, on='postcode')

In [None]:
print("Number of postcodes in features_postcode:",len(set(features_postcode['postcode'])))
features_postcode.head()

In [None]:
features_postcode.to_csv("Files/Cleaned/Features/Features_postcode_demo_census2016.csv", index=False)

### III-3 `features_timePeriod`
Merge `yields_join` (bond yields), `rates_join` (interest rates) and `df_cons_clean` (construction activities).

In [None]:
print("number of time period in bond yields:", len(set(yields_join['time_period'])))
print("number of time period in interest rates:", len(set(rates_join['time_period'])))
print("number of time period in construction:", len(set(df_cons_clean['time_period'])))

We'll only keep the shared time periods in the three dataframes.
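
Keeping only the shared time periods is what an inner join does, and it is the default behaviour of `pd.merge`. This is a minimal sketch on invented values, where `a`, `b` and `shared` are hypothetical names:

```python
import pandas as pd

# Two toy feature tables with partly overlapping time periods (invented values)
a = pd.DataFrame({"time_period": ["2019 Q4", "2020 Q1", "2020 Q2"],
                  "Rate": [0.75, 0.50, 0.25]})
b = pd.DataFrame({"time_period": ["2020 Q1", "2020 Q2", "2020 Q3"],
                  "constr_index": [100.0, 101.0, 102.0]})

# how="inner" is the default: only keys present in both frames survive
shared = pd.merge(a, b, on="time_period")
print(shared)
```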

In [None]:
# Merge
features_timePeriod = pd.merge(yields_join, df_cons_clean, on='time_period')
features_timePeriod = pd.merge(features_timePeriod, rates_join, on='time_period')
features_timePeriod.head()

In [None]:
# Check the number of time period after merge
len(set(features_timePeriod['time_period']))

In [None]:
features_timePeriod['time_period'].unique()

Even though we started with more time periods and ended up with fewer, the result is sufficient to cover the time periods in the housing data we're interested in.

*Note: the merges above could be written as a single line of code, but I've left them as separate steps for clarity. See https://stackoverflow.com/questions/23668427/pandas-three-way-joining-multiple-dataframes-on-columns*
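
For reference, one common way to chain several merges in a single expression is `functools.reduce`. This is a sketch on invented frames, not the code used in this notebook; `frames` and `merged` are hypothetical names.

```python
from functools import reduce

import pandas as pd

# Three toy feature tables keyed on time_period (invented values)
frames = [
    pd.DataFrame({"time_period": ["2020 Q1", "2020 Q2"], "2yBonds%": [0.6, 0.3]}),
    pd.DataFrame({"time_period": ["2020 Q1", "2020 Q2"], "constr_index": [100.0, 101.0]}),
    pd.DataFrame({"time_period": ["2020 Q2", "2020 Q3"], "Rate": [0.25, 0.25]}),
]

# Fold the list pairwise with an inner merge on the shared key
merged = reduce(lambda left, right: pd.merge(left, right, on="time_period"), frames)
print(merged)
```

Only `2020 Q2` appears in all three toy frames, so the result has a single row.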

## IV Merge Features into the Master Housing Data (Stacked)
`master_mg1`

### IV-1 `features_postcode`$\xrightarrow{merge}$ master

In [None]:
# Import Master Housing DF

master = pd.read_csv("Files/Cleaned/Housing/Master_Sales_Rent_2017Q4_2021Q1.csv")
print(master.shape)
master.head(1)

In [None]:
print("Number of unique postcodes in features_postcode:", 
      features_postcode['postcode'].nunique())
print("Number of unique postcodes in the housing data", 
      master['postcode'].nunique())

In [None]:
master_mg1 = master.merge(features_postcode, left_on='postcode', right_on='postcode')
master_mg1.head(1)

In [None]:
print("master_mg1 shape:", master_mg1.shape)
print("Number of unique postcodes in master_mg1:", master_mg1['postcode'].nunique())

### IV-3 `features_timePeriod`$\xrightarrow{merge}$ master

In [None]:
features_timePeriod.head()

In [None]:
master_mg1 = master_mg1.merge(features_timePeriod, 
                              left_on='time_period', right_on='time_period')
master_mg1.head(1)

In [None]:
print("master_mg1 shape:", master_mg1.shape)
print("Time periods in master_mg1:\n", master_mg1.time_period.unique())

Saving as CSV

In [None]:
master_mg1.to_csv('Files/Cleaned/Postcode-based/Master_Sales_Rent_2017Q4_2021Q1_pcFeatures.csv', index=False)

In [None]:
master_mg1.head(1)

## VI Merge Features into the Master Housing Data (Unstacked Ver2)
**`unstacked_mg2`**

In [None]:
# Import unstacked Housing df2

unstacked2 = pd.read_csv("Files/Cleaned/Housing/Pivot_Sales_Rent_5Quarters_Imputed.csv")
print("Unstacked2 shape:", unstacked2.shape)
print("Number of postcodes in unstacked2:", unstacked2['postcode'].nunique())
unstacked2.head(1)

**`features_postcode` $\xrightarrow{merge}$ unstacked2**

In [None]:
# Merge features_postcode in
unstacked_mg2= unstacked2.merge(features_postcode, left_on='postcode', right_on='postcode')
print("unstacked_mg2 shape:", unstacked_mg2.shape)
print("Number of postcodes in the unstacked_mg2:", unstacked_mg2['postcode'].nunique())

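Again, we lost 4 postcodes after merging: `merge()` performs an inner join by default, so postcodes that are absent from `features_postcode` are dropped.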
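
One way to see which postcodes drop out of a merge like this is a left join with `indicator=True`, which adds a `_merge` column recording where each row was found. This is a minimal sketch on invented rows; `housing`, `features`, `check` and `missing` are hypothetical names.

```python
import pandas as pd

# Toy housing and feature tables (invented values); 2999 has no features
housing = pd.DataFrame({"postcode": [2000, 2007, 2999],
                        "median_price": [1.0, 2.0, 3.0]})
features = pd.DataFrame({"postcode": [2000, 2007], "INCP_LOW": [10, 20]})

# Keep every housing row and record whether a matching feature row exists
check = housing.merge(features, on="postcode", how="left", indicator=True)

# Rows marked 'left_only' are the ones an inner join would drop
missing = check.loc[check["_merge"] == "left_only", "postcode"].tolist()
print(missing)
```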

**`features_timePeriod` $\xrightarrow{merge}$ unstacked_mg2**

In [None]:
# Filter out relevant time periods (Q1 2020 - Q1 2021)

TP = ['2020 Q1','2020 Q2','2020 Q3','2020 Q4','2021 Q1']
features_timePeriod = features_timePeriod.loc[features_timePeriod.time_period.isin(TP)]
features_timePeriod

In [None]:
# Flatten the features DF
features_TP_pvt = pd.pivot_table(features_timePeriod, 
                                 index=features_timePeriod.index,
                                 columns='time_period',
                                 values=['2yBonds%', '10yBonds%', 'constr_index', 'Rate'])

# Join the two column levels into single names, e.g. 'Rate 2020 Q1'
features_TP_pvt.columns = [' '.join(col) for col in features_TP_pvt.columns]

# Collapse to one row: assuming one row per time period, each column holds a single
# non-NaN value, so the column sum recovers it.
# (DataFrame.append was removed in pandas 2.0, so the sum is turned into a one-row frame.)
features_TP_pvt = features_TP_pvt.sum(numeric_only=True).to_frame().T

# Display rounded to 2 decimal places (the stored values are left unrounded)
features_TP_pvt.round(2)

In [None]:
# Convert the one-row DF into a dictionary 
tp_dict = features_TP_pvt.to_dict("records")[0]
type(tp_dict)


In [None]:
unstacked_mg2 = unstacked_mg2.assign(**tp_dict)

print("unstacked_mg2 shape:", unstacked_mg2.shape)
unstacked_mg2.head(1)
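
The flattening step can also be written without a pivot. This is a minimal sketch on invented values, assuming one row per time period; `toy`, `long`, `toy_dict` and `wide` are hypothetical names, not objects from this notebook.

```python
import pandas as pd

# Toy time-series features (invented values)
toy = pd.DataFrame({"time_period": ["2020 Q1", "2020 Q2"],
                    "Rate": [0.75, 0.25],
                    "constr_index": [100.0, 101.5]})

# Long format: one row per (time_period, feature) pair
long = toy.melt(id_vars="time_period", var_name="feature")

# Same '<feature> <period>' naming as the pivoted columns
toy_dict = {
    f"{feature} {period}": value
    for period, feature, value in zip(long["time_period"], long["feature"], long["value"])
}

# Broadcast every feature value to all rows of a postcode-level frame
wide = pd.DataFrame({"postcode": [2000, 2007]}).assign(**toy_dict)
print(wide)
```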

In [None]:
# Save as CSV
unstacked_mg2.to_csv('Files/Cleaned/Postcode-based/Unstacked_Sales_Rent_5Quarters_Imputed_pcFeatures.csv', index=False)