# EPIC 4 - Become a Carer - T30.1 - Carer Status, Recipient, Disability Status, Sex, Age
This Jupyter Notebook prepares the 'Number of primary carers, by age and sex, 2018' source file (read from an XLS workbook, exported as CSV) for use in the Mo-Buddy Website Solution.
1. Read Raw Data
2. Clean Raw Data
3. Export Clean Data

- Table_29.1 - Carer status by sex, age
- Table_30.1 - Carer status, recipient, disability status, age, sex
- Table_31.1 - Carer status, by geographic location, age, sex
- Table_32.1 - 15-.. yo, carer status, sex 
- Table_33.1 - Employed 15-64 yo, carer status, sex 
- Table_34.1 - Primary Carer, recipient, age, sex
- Table_35.1 - Carer and recipient living in or other households, age
- Table_36.1 - Primary Carer, Time spending in care, sex  ****
- Table_37.1 - Primary Carer, age, time spending in care, disability status, ***** 
- Table_38.1 - Primary Carer, time spending in care, select recipient   ****
- Table_39.1 - Primary Carer, reason for taking a carer, sex   ****
- Table_40.1 - Primary Carer, reason for taking a carer, age of recipient ****
- Table_41.1 - Primary Carer, satisfaction of service received, sex, age  ****
- Table_42.1 - Primary Carer, social community participation with recipient, time spending in care, age  ****
- Table_43.1 - Primary Carer, social community participation without recipient, time spending in care, age  ****

In [1]:
# Import Packages
import pandas as pd
import itertools
import re

In [2]:
# Set option to display all columns
pd.set_option('display.max_columns', None)

## 1. Read in Raw Data from an XLS file

In [3]:
# Function for reading in raw data from an XLS file
def read_in_data(file_path, sheet_name):
    """
    Function for reading in raw data from an XLS file.
    Inputs: 
        - file_path, type: string, desc: XLS file path
        - sheet_name, type: string, desc: Sheet Name
    Outputs:
        - raw_data, type: dataframe, desc: Raw data
    """

    raw_data = pd.read_excel(io=file_path, sheet_name=sheet_name)
    
    return raw_data

In [4]:
# Read in data
filepath_raw_data = 'DataBases/44300do030.xls'
sheet_name = 'Table_30.1'
df_raw_carer_30 = read_in_data(filepath_raw_data, sheet_name)

In [5]:
# Check what the dataframe looks like
df_raw_carer_30

Unnamed: 0,Australian Bureau of Statistics,Unnamed: 1,Unnamed: 2,Unnamed: 3,Unnamed: 4,Unnamed: 5,Unnamed: 6,Unnamed: 7,Unnamed: 8,Unnamed: 9,Unnamed: 10,Unnamed: 11,Unnamed: 12,Unnamed: 13,Unnamed: 14,Unnamed: 15
0,"44300DO030_2018 Disability, Ageing and Carers,...",,,,,,,,,,,,,,,
1,Released at 11.30am (Canberra time) Thurs 24 O...,,,,,,,,,,,,,,,
2,"Table 30.1 All persons, living in households, ...",,,,,,,,,,,,,,,
3,,Males,,,,,Females,,,,,All persons,,,,
4,,Primary carer,"Carer, but not a primary carer",Total carers,Not a carer,Total,Primary carer,"Carer, but not a primary carer",Total carers,Not a carer,Total,Primary carer,"Carer, but not a primary carer",Total carers,Not a carer,Total
5,ESTIMATE ('000),,,,,,,,,,,,,,,
6,Persons aged 0-14 years,,,,,,,,,,,,,,,
7,All with reported disability,,3.7,3.7,226,230.1,,6.2,6.2,123,128.7,,9.3,9.3,345.9,357.5
8,No reported disability,,16.6,16.6,2147.9,2164,,22.8,22.8,2117.4,2138.5,,37.3,37.3,4265.2,4304.7
9,Total,,20.2,20.2,2372.5,2393.2,,27.3,27.3,2241.3,2267.2,,46.3,46.3,4612,4660.8


## 2. Clean up Raw Data

In [6]:
# Drop non-data rows (title lines and trailing rows) and take column names from the header row
col_names = df_raw_carer_30.iloc[4,:]
df_carer_30 = df_raw_carer_30.copy()
df_carer_30.drop(labels=[0,1,2,43,44,45], axis=0, inplace=True)
df_carer_30.reset_index(drop=True, inplace=True)
df_carer_30.columns = col_names
df_carer_30.columns.names = ['']
df_carer_30

Unnamed: 0,NaN,Primary carer,"Carer, but not a primary carer",Total carers,Not a carer,Total,Primary carer.1,"Carer, but not a primary carer.1",Total carers.1,Not a carer.1,Total.1,Primary carer.2,"Carer, but not a primary carer.2",Total carers.2,Not a carer.2,Total.2
0,,Males,,,,,Females,,,,,All persons,,,,
1,,Primary carer,"Carer, but not a primary carer",Total carers,Not a carer,Total,Primary carer,"Carer, but not a primary carer",Total carers,Not a carer,Total,Primary carer,"Carer, but not a primary carer",Total carers,Not a carer,Total
2,ESTIMATE ('000),,,,,,,,,,,,,,,
3,Persons aged 0-14 years,,,,,,,,,,,,,,,
4,All with reported disability,,3.7,3.7,226,230.1,,6.2,6.2,123,128.7,,9.3,9.3,345.9,357.5
5,No reported disability,,16.6,16.6,2147.9,2164,,22.8,22.8,2117.4,2138.5,,37.3,37.3,4265.2,4304.7
6,Total,,20.2,20.2,2372.5,2393.2,,27.3,27.3,2241.3,2267.2,,46.3,46.3,4612,4660.8
7,,,,,,,,,,,,,,,,
8,Persons aged 15-64 years,,,,,,,,,,,,,,,
9,Profound core activity limitation,2.5,5.4,11.3,90.3,101.4,5.7,10.5,15.5,82.9,99.8,6.8,18.9,26,173.6,198.7


In [7]:
# Reset index and prepare for adding multiindex (in column)
df_carer_30.rename(columns={df_carer_30.columns[0]:'Index'}, inplace=True)
df_carer_30.set_index(keys='Index', drop=True, inplace=True)
df_carer_30

Unnamed: 0_level_0,Primary carer,"Carer, but not a primary carer",Total carers,Not a carer,Total,Primary carer,"Carer, but not a primary carer",Total carers,Not a carer,Total,Primary carer,"Carer, but not a primary carer",Total carers,Not a carer,Total
Index,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1
,Males,,,,,Females,,,,,All persons,,,,
,Primary carer,"Carer, but not a primary carer",Total carers,Not a carer,Total,Primary carer,"Carer, but not a primary carer",Total carers,Not a carer,Total,Primary carer,"Carer, but not a primary carer",Total carers,Not a carer,Total
ESTIMATE ('000),,,,,,,,,,,,,,,
Persons aged 0-14 years,,,,,,,,,,,,,,,
All with reported disability,,3.7,3.7,226,230.1,,6.2,6.2,123,128.7,,9.3,9.3,345.9,357.5
No reported disability,,16.6,16.6,2147.9,2164,,22.8,22.8,2117.4,2138.5,,37.3,37.3,4265.2,4304.7
Total,,20.2,20.2,2372.5,2393.2,,27.3,27.3,2241.3,2267.2,,46.3,46.3,4612,4660.8
,,,,,,,,,,,,,,,
Persons aged 15-64 years,,,,,,,,,,,,,,,
Profound core activity limitation,2.5,5.4,11.3,90.3,101.4,5.7,10.5,15.5,82.9,99.8,6.8,18.9,26,173.6,198.7


In [8]:
# Adding multiIndex (in column)
first_level = ['Male', 'Female', 'All persons']
second_level = ['Primary carer', 'Carer, but not a primary carer', 'Total carers', 'Not a carer', 'Total']
levels = [first_level, second_level]
df_carer_30.columns = pd.MultiIndex.from_product(levels, names=["Gender", "Status"])
df_carer_30.reset_index(inplace=True)
df_carer_30

Gender,Index,Male,Male,Male,Male,Male,Female,Female,Female,Female,Female,All persons,All persons,All persons,All persons,All persons
Status,Unnamed: 1_level_1,Primary carer,"Carer, but not a primary carer",Total carers,Not a carer,Total,Primary carer,"Carer, but not a primary carer",Total carers,Not a carer,Total,Primary carer,"Carer, but not a primary carer",Total carers,Not a carer,Total
0,,Males,,,,,Females,,,,,All persons,,,,
1,,Primary carer,"Carer, but not a primary carer",Total carers,Not a carer,Total,Primary carer,"Carer, but not a primary carer",Total carers,Not a carer,Total,Primary carer,"Carer, but not a primary carer",Total carers,Not a carer,Total
2,ESTIMATE ('000),,,,,,,,,,,,,,,
3,Persons aged 0-14 years,,,,,,,,,,,,,,,
4,All with reported disability,,3.7,3.7,226,230.1,,6.2,6.2,123,128.7,,9.3,9.3,345.9,357.5
5,No reported disability,,16.6,16.6,2147.9,2164,,22.8,22.8,2117.4,2138.5,,37.3,37.3,4265.2,4304.7
6,Total,,20.2,20.2,2372.5,2393.2,,27.3,27.3,2241.3,2267.2,,46.3,46.3,4612,4660.8
7,,,,,,,,,,,,,,,,
8,Persons aged 15-64 years,,,,,,,,,,,,,,,
9,Profound core activity limitation,2.5,5.4,11.3,90.3,101.4,5.7,10.5,15.5,82.9,99.8,6.8,18.9,26,173.6,198.7
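The column MultiIndex above is built from hard-coded label lists. As an alternative, the two header rows of the sheet could be used directly. The following is a minimal sketch on a toy frame, not the notebook's method: `header_rows` is a hypothetical stand-in for the two header rows, and it assumes the group label (e.g. "Males") appears only above the first column of each group.

```python
import pandas as pd

# Hypothetical stand-in for the two header rows of the sheet.
header_rows = pd.DataFrame(
    [
        [None, "Males", None, "Females", None],
        [None, "Primary carer", "Total", "Primary carer", "Total"],
    ]
)

# The group label sits only above the first column of each group,
# so forward-fill it to cover the remaining columns of that group.
gender = header_rows.iloc[0, 1:].ffill()
status = header_rows.iloc[1, 1:]

# Pair the two rows position by position into a two-level column index.
columns = pd.MultiIndex.from_arrays([gender, status], names=["Gender", "Status"])
print(columns.tolist())
```

Note that this keeps the sheet's own labels ("Males", "Females"); the notebook uses "Male" and "Female", so a rename step would still be needed to match it.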


In [9]:
# Dropping useless rows
df_carer_30_1 = df_carer_30.copy()
df_carer_30_1.drop(labels=[0, 1, 2, 7, 18, 29], axis=0, inplace=True)
df_carer_30_1.reset_index(drop=True, inplace=True)
df_carer_30_1

Gender,Index,Male,Male,Male,Male,Male,Female,Female,Female,Female,Female,All persons,All persons,All persons,All persons,All persons
Status,Unnamed: 1_level_1,Primary carer,"Carer, but not a primary carer",Total carers,Not a carer,Total,Primary carer,"Carer, but not a primary carer",Total carers,Not a carer,Total,Primary carer,"Carer, but not a primary carer",Total carers,Not a carer,Total
0,Persons aged 0-14 years,,,,,,,,,,,,,,,
1,All with reported disability,,3.7,3.7,226.0,230.1,,6.2,6.2,123.0,128.7,,9.3,9.3,345.9,357.5
2,No reported disability,,16.6,16.6,2147.9,2164.0,,22.8,22.8,2117.4,2138.5,,37.3,37.3,4265.2,4304.7
3,Total,,20.2,20.2,2372.5,2393.2,,27.3,27.3,2241.3,2267.2,,46.3,46.3,4612.0,4660.8
4,Persons aged 15-64 years,,,,,,,,,,,,,,,
5,Profound core activity limitation,2.5,5.4,11.3,90.3,101.4,5.7,10.5,15.5,82.9,99.8,6.8,18.9,26.0,173.6,198.7
6,Severe core activity limitation,8.7,22.3,28.0,116.9,147.3,25.6,24.1,51.0,109.0,157.4,32.2,46.4,79.5,227.9,304.6
7,Moderate core activity limitation,6.2,21.1,27.3,106.2,135.9,26.7,25.1,53.0,115.6,167.0,33.1,47.4,79.6,219.0,301.9
8,Mild core activity limitation,19.1,51.9,70.0,253.4,324.4,43.5,54.3,98.2,241.9,341.9,63.1,105.2,170.1,495.8,662.7
9,Schooling or employment restriction,44.3,102.1,146.4,553.0,697.0,98.1,112.0,211.0,506.1,715.7,142.4,216.8,356.0,1059.5,1414.6
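Dropping rows by hard-coded position works for this sheet but breaks if the layout shifts. A possible alternative for the blank separator rows is to drop every row that has no label in the first column. The sketch below uses a toy frame with flat columns and made-up values; the column name `Index` mirrors the notebook, everything else is illustrative.

```python
import numpy as np
import pandas as pd

# Toy frame: one blank separator row between two age groups (values are made up).
toy = pd.DataFrame(
    {
        "Index": ["Persons aged 0-14 years", "Total", np.nan,
                  "Persons aged 15-64 years", "Total"],
        "Total carers": [np.nan, 1.0, np.nan, np.nan, 2.0],
    }
)

# Blank separator rows have no label, so dropping on the label column removes them
# while keeping group-header rows (which have a label but no values).
cleaned = toy.dropna(subset=["Index"]).reset_index(drop=True)
print(cleaned)
```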


In [10]:
levels_names = df_carer_30_1['Index'].unique()
levels_names

array(['Persons aged 0-14 years', 'All with reported disability',
       'No reported disability', 'Total', 'Persons aged 15-64 years',
       'Profound core activity limitation',
       'Severe core activity limitation',
       'Moderate core activity limitation',
       'Mild core activity limitation',
       'Schooling or employment restriction',
       'All with specific limitations or restrictions',
       'Persons aged 65 years and over', 'All persons'], dtype=object)

In [11]:
first_level = sorted(levels_names)[6:9] + [levels_names[-1]]
first_level

['Persons aged 0-14 years',
 'Persons aged 15-64 years',
 'Persons aged 65 years and over',
 'All persons']

In [12]:
second_level = list(levels_names[5:11]) + list(levels_names[1:4])
second_level

['Profound core activity limitation',
 'Severe core activity limitation',
 'Moderate core activity limitation',
 'Mild core activity limitation',
 'Schooling or employment restriction',
 'All with specific limitations or restrictions',
 'All with reported disability',
 'No reported disability',
 'Total']
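The age-group labels above are picked by slicing a sorted list, which depends on the alphabetical position of every label. A less position-dependent sketch, assuming the age-group labels always start with "Persons aged", selects them by prefix instead:

```python
import numpy as np

# Same labels as the levels_names array shown above.
levels_names = np.array(
    [
        "Persons aged 0-14 years", "All with reported disability",
        "No reported disability", "Total", "Persons aged 15-64 years",
        "Profound core activity limitation", "Severe core activity limitation",
        "Moderate core activity limitation", "Mild core activity limitation",
        "Schooling or employment restriction",
        "All with specific limitations or restrictions",
        "Persons aged 65 years and over", "All persons",
    ],
    dtype=object,
)

# Assumption: every age-group label starts with "Persons aged".
age_labels = [name for name in levels_names if name.startswith("Persons aged")]
first_level = age_labels + ["All persons"]
print(first_level)
```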

In [13]:
# Adding multiIndex (in index)
levels_1 = list(itertools.product([first_level[0]], second_level[6:]))
levels_2 = list(itertools.product(first_level[1:], second_level))
levels_all = levels_1 + levels_2
multi_index = pd.MultiIndex.from_tuples(levels_all, names=["Age", "Disability Status"])
df_carer_30_1.drop(labels=[0,4,14,24], axis=0,inplace=True)
df_carer_30_1.set_index(keys=multi_index, drop=True, inplace=True)
df_carer_30_1.drop(columns='Index', level=0, inplace=True)
df_carer_30_1

Unnamed: 0_level_0,Gender,Male,Male,Male,Male,Male,Female,Female,Female,Female,Female,All persons,All persons,All persons,All persons,All persons
Unnamed: 0_level_1,Status,Primary carer,"Carer, but not a primary carer",Total carers,Not a carer,Total,Primary carer,"Carer, but not a primary carer",Total carers,Not a carer,Total,Primary carer,"Carer, but not a primary carer",Total carers,Not a carer,Total
Age,Disability Status,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2,Unnamed: 11_level_2,Unnamed: 12_level_2,Unnamed: 13_level_2,Unnamed: 14_level_2,Unnamed: 15_level_2,Unnamed: 16_level_2
Persons aged 0-14 years,All with reported disability,,3.7,3.7,226.0,230.1,,6.2,6.2,123.0,128.7,,9.3,9.3,345.9,357.5
Persons aged 0-14 years,No reported disability,,16.6,16.6,2147.9,2164.0,,22.8,22.8,2117.4,2138.5,,37.3,37.3,4265.2,4304.7
Persons aged 0-14 years,Total,,20.2,20.2,2372.5,2393.2,,27.3,27.3,2241.3,2267.2,,46.3,46.3,4612.0,4660.8
Persons aged 15-64 years,Profound core activity limitation,2.5,5.4,11.3,90.3,101.4,5.7,10.5,15.5,82.9,99.8,6.8,18.9,26.0,173.6,198.7
Persons aged 15-64 years,Severe core activity limitation,8.7,22.3,28.0,116.9,147.3,25.6,24.1,51.0,109.0,157.4,32.2,46.4,79.5,227.9,304.6
Persons aged 15-64 years,Moderate core activity limitation,6.2,21.1,27.3,106.2,135.9,26.7,25.1,53.0,115.6,167.0,33.1,47.4,79.6,219.0,301.9
Persons aged 15-64 years,Mild core activity limitation,19.1,51.9,70.0,253.4,324.4,43.5,54.3,98.2,241.9,341.9,63.1,105.2,170.1,495.8,662.7
Persons aged 15-64 years,Schooling or employment restriction,44.3,102.1,146.4,553.0,697.0,98.1,112.0,211.0,506.1,715.7,142.4,216.8,356.0,1059.5,1414.6
Persons aged 15-64 years,All with specific limitations or restrictions,47.6,127.2,175.7,678.2,851.1,120.8,137.6,256.6,641.1,899.1,168.4,266.2,433.2,1319.3,1750.2
Persons aged 15-64 years,All with reported disability,55.2,144.6,202.5,801.5,1002.2,140.9,164.7,305.6,747.6,1054.9,196.6,308.8,507.6,1549.5,2057.5
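The row MultiIndex above relies on the rows appearing in exactly the order produced by `itertools.product`. One alternative is to derive the `Age` level from the group-header rows themselves. The sketch below is illustrative only: it uses a toy frame with flat columns and made-up values, and assumes a group-header row is one whose value columns are all empty.

```python
import numpy as np
import pandas as pd

# Toy frame: header rows have a label but no values (values are made up).
toy = pd.DataFrame(
    {
        "Index": ["Persons aged 0-14 years", "No reported disability", "Total",
                  "Persons aged 15-64 years", "No reported disability", "Total"],
        "Total carers": [np.nan, 1.0, 2.0, np.nan, 3.0, 4.0],
    }
)

# Assumption: a group-header row is one where every value column is NaN.
is_header = toy.drop(columns="Index").isna().all(axis=1)

# Keep the label only on header rows, then propagate it down to the rows beneath.
toy["Age"] = toy["Index"].where(is_header).ffill()

# Drop the header rows and use (Age, Disability Status) as the row index.
tidy = (
    toy.loc[~is_header]
    .rename(columns={"Index": "Disability Status"})
    .set_index(["Age", "Disability Status"])
)
print(tidy)
```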


In [14]:
# Impute missing 'Primary carer' values with 0
df_carer_30_1.loc[df_carer_30_1[('Male','Primary carer')].isna(), ('Male','Primary carer')] = 0
df_carer_30_1.loc[df_carer_30_1[('Female','Primary carer')].isna(), ('Female','Primary carer')] = 0
df_carer_30_1.loc[df_carer_30_1[('All persons','Primary carer')].isna(), ('All persons','Primary carer')] = 0
df_carer_30_1

Unnamed: 0_level_0,Gender,Male,Male,Male,Male,Male,Female,Female,Female,Female,Female,All persons,All persons,All persons,All persons,All persons
Unnamed: 0_level_1,Status,Primary carer,"Carer, but not a primary carer",Total carers,Not a carer,Total,Primary carer,"Carer, but not a primary carer",Total carers,Not a carer,Total,Primary carer,"Carer, but not a primary carer",Total carers,Not a carer,Total
Age,Disability Status,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2,Unnamed: 11_level_2,Unnamed: 12_level_2,Unnamed: 13_level_2,Unnamed: 14_level_2,Unnamed: 15_level_2,Unnamed: 16_level_2
Persons aged 0-14 years,All with reported disability,0.0,3.7,3.7,226.0,230.1,0.0,6.2,6.2,123.0,128.7,0.0,9.3,9.3,345.9,357.5
Persons aged 0-14 years,No reported disability,0.0,16.6,16.6,2147.9,2164.0,0.0,22.8,22.8,2117.4,2138.5,0.0,37.3,37.3,4265.2,4304.7
Persons aged 0-14 years,Total,0.0,20.2,20.2,2372.5,2393.2,0.0,27.3,27.3,2241.3,2267.2,0.0,46.3,46.3,4612.0,4660.8
Persons aged 15-64 years,Profound core activity limitation,2.5,5.4,11.3,90.3,101.4,5.7,10.5,15.5,82.9,99.8,6.8,18.9,26.0,173.6,198.7
Persons aged 15-64 years,Severe core activity limitation,8.7,22.3,28.0,116.9,147.3,25.6,24.1,51.0,109.0,157.4,32.2,46.4,79.5,227.9,304.6
Persons aged 15-64 years,Moderate core activity limitation,6.2,21.1,27.3,106.2,135.9,26.7,25.1,53.0,115.6,167.0,33.1,47.4,79.6,219.0,301.9
Persons aged 15-64 years,Mild core activity limitation,19.1,51.9,70.0,253.4,324.4,43.5,54.3,98.2,241.9,341.9,63.1,105.2,170.1,495.8,662.7
Persons aged 15-64 years,Schooling or employment restriction,44.3,102.1,146.4,553.0,697.0,98.1,112.0,211.0,506.1,715.7,142.4,216.8,356.0,1059.5,1414.6
Persons aged 15-64 years,All with specific limitations or restrictions,47.6,127.2,175.7,678.2,851.1,120.8,137.6,256.6,641.1,899.1,168.4,266.2,433.2,1319.3,1750.2
Persons aged 15-64 years,All with reported disability,55.2,144.6,202.5,801.5,1002.2,140.9,164.7,305.6,747.6,1054.9,196.6,308.8,507.6,1549.5,2057.5
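The three near-identical assignments above could be collapsed by selecting every column whose `Status` level is 'Primary carer'. A minimal sketch on a toy frame with made-up values:

```python
import numpy as np
import pandas as pd

# Toy frame with the same two column levels as the notebook (values are made up).
columns = pd.MultiIndex.from_product(
    [["Male", "Female"], ["Primary carer", "Total carers"]],
    names=["Gender", "Status"],
)
toy = pd.DataFrame(
    [[np.nan, 1.0, np.nan, 2.0], [3.0, 4.0, 5.0, 6.0]],
    columns=columns,
)

# Collect every (Gender, Status) column whose Status is 'Primary carer'.
primary_cols = [col for col in toy.columns if col[1] == "Primary carer"]

# Fill missing values in those columns only; other columns are left untouched.
toy[primary_cols] = toy[primary_cols].fillna(0)
print(toy)
```

Whether 0 is the right replacement depends on what an empty cell means in the source table, so that choice is worth checking against the table's notes.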


## 3. Export Clean Data to a CSV file

In [27]:
# Export full version
df_carer_30_1.to_csv('Output/EPIC4_T30_1_V1.csv', index=True)
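Because the exported frame has two column levels and two index levels, whoever reads the CSV needs to tell pandas about both. The sketch below shows a round trip through an in-memory buffer rather than the real output file; the toy frame and its values are made up.

```python
import io

import pandas as pd

# Toy frame shaped like the cleaned data: 2 index levels, 2 column levels.
index = pd.MultiIndex.from_tuples(
    [("Persons aged 0-14 years", "Total"), ("Persons aged 15-64 years", "Total")],
    names=["Age", "Disability Status"],
)
columns = pd.MultiIndex.from_product(
    [["Male", "Female"], ["Primary carer", "Total carers"]],
    names=["Gender", "Status"],
)
toy = pd.DataFrame(
    [[0.0, 1.0, 0.0, 2.0], [3.0, 4.0, 5.0, 6.0]],
    index=index,
    columns=columns,
)

# Write to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
toy.to_csv(buffer, index=True)
buffer.seek(0)

# The first two lines hold the column levels; the first two columns hold the index.
restored = pd.read_csv(buffer, header=[0, 1], index_col=[0, 1])
print(restored)
```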