# EPIC 4 - Become a Carer - T41.1 - Primary Carer, Satisfaction, Age, Sex
This Jupyter Notebook prepares the 'Number of primary carers, by age and sex, 2018' XLS file for use in the Mo-Buddy website solution. The steps are:
1. Read Raw Data
2. Clean Raw Data
3. Export Clean Data

- Table_29.1 - Carer status by sex, age
- Table_30.1 - Carer status, recipient, disability status, age, sex
- Table_31.1 - Carer status, by geographic location, age, sex
- Table_32.1 - 15-.. yo, carer status, sex 
- Table_33.1 - Employed 15-64 yo, carer status, sex 
- Table_34.1 - Primary Carer, recipient, age, sex
- Table_35.1 - Carer and recipient living in or other households, age
- Table_36.1 - Primary Carer, Time spending in care, sex  ****
- Table_37.1 - Primary Carer, age, time spending in care, disability status, ***** 
- Table_38.1 - Primary Carer, time spending in care, select recipient   ****
- Table_39.1 - Primary Carer, reason for taking a carer, sex   ****
- Table_40.1 - Primary Carer, reason for taking a carer, age of recipient ****
- Table_41.1 - Primary Carer, satisfaction of service received, sex, age  ****
- Table_42.1 - Primary Carer, social community participation with recipient, time spending in care, age  ****
- Table_43.1 - Primary Carer, social community participation without recipient, time spending in care, age  ****

In [1]:
# Import Packages
import pandas as pd
import itertools
# import re

In [2]:
# Set option to display all columns
pd.set_option('display.max_columns', None)

## 1. Read in Raw Data from an XLS file

In [3]:
# Function for reading in raw data from an XLS file
def read_in_data(file_path, sheet_name):
    """
    Function for reading in raw data from an XLS file.
    Inputs: 
        - file_path, type: string, desc: XLS file path
        - sheet_name, type: string, desc: Sheet Name
    Outputs:
        - raw_data, type: dataframe, desc: Raw data
    """

    raw_data = pd.read_excel(io=file_path, sheet_name=sheet_name)
    
    return raw_data

In [4]:
# Read in data
filepath_raw_data = 'DataBases/44300do030.xls'
sheet_name = 'Table_41.1'
df_raw_carer_41 = read_in_data(filepath_raw_data, sheet_name)

In [5]:
# Check what the dataframe looks like
df_raw_carer_41

Unnamed: 0,Australian Bureau of Statistics,Unnamed: 1,Unnamed: 2,Unnamed: 3,Unnamed: 4,Unnamed: 5,Unnamed: 6
0,"44300DO030_2018 Disability, Ageing and Carers,...",,,,,,
1,Released at 11.30am (Canberra time) Thurs 24 O...,,,,,,
2,"Table 41.1 Primary carers, living in household...",,,,,,
3,,Sex,,,Age group,,
4,,Males,Females,All persons,15–44 years,45–64 years,65 years and over
5,PROPORTION OF PERSONS (%),,,,,,
6,Satisfaction with quality of assistance receiv...,,,,,,
7,Satisified,20.3,30.3,27.6,30.5,26.4,27.1
8,Neither satisfied or dissatisfied,4.3,8.7,7.2,9.4,7.6,4
9,Dissatisfied,2.4,4.1,3.8,3.9,3.7,2.7


## 2. Clean up Raw Data

In [6]:
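# Drop non-data rows and take the column names from row 4
# (row 4 itself stays in the frame for now and is removed in a later cell)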
col_names = df_raw_carer_41.iloc[4,:]
df_carer_41 = df_raw_carer_41.copy()
df_carer_41.drop(labels=[0,1,2,3,5,6,11,12,17,18,23,24,29,30,40,42,43,44], axis=0, inplace=True)
df_carer_41.reset_index(drop=True, inplace=True)
df_carer_41.columns = col_names
df_carer_41.columns.names = ['']
df_carer_41

Unnamed: 0,NaN,Males,Females,All persons,15–44 years,45–64 years,65 years and over
0,,Males,Females,All persons,15–44 years,45–64 years,65 years and over
1,Satisified,20.3,30.3,27.6,30.5,26.4,27.1
2,Neither satisfied or dissatisfied,4.3,8.7,7.2,9.4,7.6,4
3,Dissatisfied,2.4,4.1,3.8,3.9,3.7,2.7
4,Has not received assistance from organised ser...,72.6,57.1,61.2,55.4,61.6,66.5
5,Satisfied,36.7,35.9,36.1,32.2,35,42.7
6,Neither satisfied or dissatisfied,21.3,25.4,24.1,25.7,26.2,18.6
7,Dissatisfied,8.1,13.6,12.1,16.1,13.4,6.1
8,Don't know the range of services available,34.1,25.4,27.7,24.8,25.6,33.4
9,Used respite care in the last three months,4.1,7.7,6.2,3.9,7.8,6.5
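
The hard-coded row labels in the `drop` call above depend on the exact layout of this sheet. A minimal sketch of an alternative follows, assuming (the source does not confirm this) that every unwanted row either has no numeric values or is the header row itself. It keeps only the rows whose value cells are all numeric. The toy frame mimics a few rows of the raw sheet, and `keep_data_rows` and the column names `label`, `c1`, `c2`, `c3` are hypothetical.

```python
import pandas as pd

# Toy frame mimicking a few rows of the raw sheet (values copied from the output above)
toy_raw = pd.DataFrame(
    [
        ["Table 41.1 Primary carers, living in household...", None, None, None],
        [None, "Males", "Females", "All persons"],
        ["PROPORTION OF PERSONS (%)", None, None, None],
        ["Satisified", 20.3, 30.3, 27.6],
        ["Dissatisfied", 2.4, 4.1, 3.8],
    ],
    columns=["label", "c1", "c2", "c3"],
)


def keep_data_rows(df, label_col):
    """Hypothetical helper: keep rows whose non-label cells are all numeric."""
    # Non-numeric cells (text, missing values) become NaN
    values = df.drop(columns=label_col).apply(pd.to_numeric, errors="coerce")
    mask = values.notna().all(axis=1)
    return df.loc[mask].reset_index(drop=True)


toy_clean = keep_data_rows(toy_raw, "label")
print(toy_clean)
```

On the real sheet this would need checking against the explicit label list before relying on it, since a row with partly missing values would also be dropped.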


In [7]:
# Name the label column 'Index' and use it as the row index
df_carer_41.rename(columns={df_carer_41.columns[0]:'Index'}, inplace=True)
df_carer_41.set_index(keys='Index', drop=True, inplace=True)
df_carer_41

Index,Males,Females,All persons,15–44 years,45–64 years,65 years and over
,Males,Females,All persons,15–44 years,45–64 years,65 years and over
Satisified,20.3,30.3,27.6,30.5,26.4,27.1
Neither satisfied or dissatisfied,4.3,8.7,7.2,9.4,7.6,4
Dissatisfied,2.4,4.1,3.8,3.9,3.7,2.7
Has not received assistance from organised services in the last six months,72.6,57.1,61.2,55.4,61.6,66.5
Satisfied,36.7,35.9,36.1,32.2,35,42.7
Neither satisfied or dissatisfied,21.3,25.4,24.1,25.7,26.2,18.6
Dissatisfied,8.1,13.6,12.1,16.1,13.4,6.1
Don't know the range of services available,34.1,25.4,27.7,24.8,25.6,33.4
Used respite care in the last three months,4.1,7.7,6.2,3.9,7.8,6.5


In [8]:
# Column MultiIndex template (kept from another table, not used here);
# this cell only moves 'Index' back to a regular column
# first_level = ['Persons aged 0-64 years', 'Persons aged 65 years and over', 'All persons']
# second_level = ['Primary carer', 'Carer, but not a primary carer', 'Total carers', 'Not a carer', 'Total']
# levels = [first_level, second_level]
# df_carer_31.columns = pd.MultiIndex.from_product(levels, names=["Age", "Status"])
df_carer_41.reset_index(inplace=True)
df_carer_41

Unnamed: 0,Index,Males,Females,All persons,15–44 years,45–64 years,65 years and over
0,,Males,Females,All persons,15–44 years,45–64 years,65 years and over
1,Satisified,20.3,30.3,27.6,30.5,26.4,27.1
2,Neither satisfied or dissatisfied,4.3,8.7,7.2,9.4,7.6,4
3,Dissatisfied,2.4,4.1,3.8,3.9,3.7,2.7
4,Has not received assistance from organised ser...,72.6,57.1,61.2,55.4,61.6,66.5
5,Satisfied,36.7,35.9,36.1,32.2,35,42.7
6,Neither satisfied or dissatisfied,21.3,25.4,24.1,25.7,26.2,18.6
7,Dissatisfied,8.1,13.6,12.1,16.1,13.4,6.1
8,Don't know the range of services available,34.1,25.4,27.7,24.8,25.6,33.4
9,Used respite care in the last three months,4.1,7.7,6.2,3.9,7.8,6.5


In [9]:
# Drop the duplicated header row (row 0)
df_carer_41_1 = df_carer_41.copy()
df_carer_41_1.drop(labels=[0] , axis=0, inplace=True)
df_carer_41_1.reset_index(drop=True, inplace=True)
df_carer_41_1

Unnamed: 0,Index,Males,Females,All persons,15–44 years,45–64 years,65 years and over
0,Satisified,20.3,30.3,27.6,30.5,26.4,27.1
1,Neither satisfied or dissatisfied,4.3,8.7,7.2,9.4,7.6,4.0
2,Dissatisfied,2.4,4.1,3.8,3.9,3.7,2.7
3,Has not received assistance from organised ser...,72.6,57.1,61.2,55.4,61.6,66.5
4,Satisfied,36.7,35.9,36.1,32.2,35.0,42.7
5,Neither satisfied or dissatisfied,21.3,25.4,24.1,25.7,26.2,18.6
6,Dissatisfied,8.1,13.6,12.1,16.1,13.4,6.1
7,Don't know the range of services available,34.1,25.4,27.7,24.8,25.6,33.4
8,Used respite care in the last three months,4.1,7.7,6.2,3.9,7.8,6.5
9,Used respite care but not in the last three mo...,5.2,5.9,5.7,4.9,6.4,4.7
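
Because the header row sat among the data until this step, the value columns may still be stored as `object` rather than `float64`. This is an assumption, since the notebook never prints `dtypes`. A minimal sketch of the conversion on a toy frame with the same column layout:

```python
import pandas as pd

# Toy frame with object-typed columns, standing in for df_carer_41_1
toy = pd.DataFrame(
    {
        "Index": ["Satisified", "Dissatisfied"],
        "Males": [20.3, 2.4],
        "Females": [30.3, 4.1],
    },
    dtype=object,
)

# Convert every column except the label column to a numeric dtype
value_cols = [c for c in toy.columns if c != "Index"]
for col in value_cols:
    toy[col] = pd.to_numeric(toy[col], errors="coerce")

print(toy.dtypes)
```

With `errors="coerce"`, any cell that cannot be parsed becomes `NaN` instead of raising, so it is worth counting missing values afterwards.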


In [10]:
levels_names = df_carer_41_1['Index'].unique()
levels_names

array(['Satisified', 'Neither satisfied or dissatisfied', 'Dissatisfied',
       'Has not received assistance from organised services in the last six months',
       'Satisfied', "Don't know the range of services available",
       'Used respite care in the last three months',
       'Used respite care but not in the last three months',
       'Total used respite care ', 'Never used respite care',
       'Previously used respite care and needs more',
       'Has never received respite care but needs it',
       'Total needing respite care ', 'Does not need respite care',
       'Does not need service',
       'Respite care services not available in area',
       'Available services not suited to needs',
       'No affordable services available',
       'Recipient does not want service',
       'Unaware of entitlements/services', 'Carer does not want service',
       'Other reason', 'Has used respite care',
       "Total number of persons ('000)"], dtype=object)

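Note that `unique()` returns each label once, in order of first appearance. Labels that repeat across sections, such as 'Neither satisfied or dissatisfied' and 'Dissatisfied', therefore appear only once, and positions in `levels_names` do not line up with row positions in the dataframe. A small illustration using the first eight row labels from the table above (one of them shortened):

```python
import pandas as pd

# First eight row labels of the table; the fourth label is shortened here
labels = pd.Series([
    "Satisified",
    "Neither satisfied or dissatisfied",
    "Dissatisfied",
    "Has not received assistance",
    "Satisfied",
    "Neither satisfied or dissatisfied",
    "Dissatisfied",
    "Don't know the range of services available",
])

unique_labels = labels.unique()
print(len(labels), len(unique_labels))
print(list(unique_labels))
```

The eight rows collapse to six distinct labels, which is why the second level of the index has to be assembled from hand-picked positions later on.
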
In [11]:
first_level = ['Satisfaction with quality of assistance received, in the last six months, to assist with caring role', 
               'Satisfaction with range of organised services available, to assist with caring role', 
               'Receipt of respite care for main recipient', 
               'Need of respite care for main recipient', 
               'Main reason has not used respite care', 
               'Total']
first_level

['Satisfaction with quality of assistance received, in the last six months, to assist with caring role',
 'Satisfaction with range of organised services available, to assist with caring role',
 'Receipt of respite care for main recipient',
 'Need of respite care for main recipient',
 'Main reason has not used respite care',
 'Total']

In [12]:
second_level = levels_names
second_level

array(['Satisified', 'Neither satisfied or dissatisfied', 'Dissatisfied',
       'Has not received assistance from organised services in the last six months',
       'Satisfied', "Don't know the range of services available",
       'Used respite care in the last three months',
       'Used respite care but not in the last three months',
       'Total used respite care ', 'Never used respite care',
       'Previously used respite care and needs more',
       'Has never received respite care but needs it',
       'Total needing respite care ', 'Does not need respite care',
       'Does not need service',
       'Respite care services not available in area',
       'Available services not suited to needs',
       'No affordable services available',
       'Recipient does not want service',
       'Unaware of entitlements/services', 'Carer does not want service',
       'Other reason', 'Has used respite care',
       "Total number of persons ('000)"], dtype=object)

In [13]:
# Build a two-level row MultiIndex: (section heading, response)
levels_1 = list(itertools.product([first_level[0]],second_level[0:4]))
# The second section uses the correctly spelled 'Satisfied' (position 4), as in the source rows
levels_2 = list(itertools.product([first_level[1]],[second_level[4]] + list(second_level[1:3]) + [second_level[5]]))
levels_3 = list(itertools.product([first_level[2]],second_level[6:10]))
levels_4 = list(itertools.product([first_level[3]],second_level[10:14]))
levels_5 = list(itertools.product([first_level[4]],second_level[14:-1]))
levels_6 = list(itertools.product([first_level[5]],[second_level[-1]]))
levels_all = levels_1 + levels_2 + levels_3 + levels_4 + levels_5 + levels_6
# levels_all
multi_index = pd.MultiIndex.from_tuples(levels_all, names=['Type', 'Satisfaction'])
df_carer_41_1.set_index(keys=multi_index, drop=True, inplace=True)
df_carer_41_1.drop(columns='Index', inplace=True)
df_carer_41_1

Type,Satisfaction,Males,Females,All persons,15–44 years,45–64 years,65 years and over
"Satisfaction with quality of assistance received, in the last six months, to assist with caring role",Satisified,20.3,30.3,27.6,30.5,26.4,27.1
"Satisfaction with quality of assistance received, in the last six months, to assist with caring role",Neither satisfied or dissatisfied,4.3,8.7,7.2,9.4,7.6,4.0
"Satisfaction with quality of assistance received, in the last six months, to assist with caring role",Dissatisfied,2.4,4.1,3.8,3.9,3.7,2.7
"Satisfaction with quality of assistance received, in the last six months, to assist with caring role",Has not received assistance from organised services in the last six months,72.6,57.1,61.2,55.4,61.6,66.5
"Satisfaction with range of organised services available, to assist with caring role",Satisfied,36.7,35.9,36.1,32.2,35.0,42.7
"Satisfaction with range of organised services available, to assist with caring role",Neither satisfied or dissatisfied,21.3,25.4,24.1,25.7,26.2,18.6
"Satisfaction with range of organised services available, to assist with caring role",Dissatisfied,8.1,13.6,12.1,16.1,13.4,6.1
"Satisfaction with range of organised services available, to assist with caring role",Don't know the range of services available,34.1,25.4,27.7,24.8,25.6,33.4
Receipt of respite care for main recipient,Used respite care in the last three months,4.1,7.7,6.2,3.9,7.8,6.5
Receipt of respite care for main recipient,Used respite care but not in the last three months,5.2,5.9,5.7,4.9,6.4,4.7
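
The slices used to build the index tie it to the exact row order of this sheet. A possible alternative is to forward-fill each section heading down to the rows beneath it. The sketch assumes that every section heading in the raw sheet is a row with a label but no values, as rows 5 and 6 of the raw output suggest. The toy frame, the shortened heading texts and the column name `label` are illustrative only.

```python
import pandas as pd

# Toy frame: heading rows have a label but no value (headings shortened for the example)
toy = pd.DataFrame({
    "label": ["Quality of assistance", "Satisified", "Dissatisfied",
              "Range of services", "Satisfied", "Dissatisfied"],
    "Males": [None, 20.3, 2.4, None, 36.7, 8.1],
})

# A heading row is one with no value; copy its label down to the rows below it
is_heading = toy["Males"].isna()
toy["Type"] = toy["label"].where(is_heading).ffill()

# Drop the heading rows and index the rest by (Type, Satisfaction)
tidy = (
    toy.loc[~is_heading]
    .rename(columns={"label": "Satisfaction"})
    .set_index(["Type", "Satisfaction"])
)
print(tidy)
```

This keeps repeated labels such as 'Dissatisfied' attached to the right section without relying on positions in `second_level`.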


## 3. Export Clean Data to a CSV file

In [14]:
# Export full version
df_carer_41_1.to_csv('Output/EPIC4_T41_1_V1.csv', index=True)
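
When the exported file is loaded again, the two index levels come back as ordinary columns unless `index_col` is given. A minimal round-trip sketch, using an in-memory buffer instead of the real output file and a toy frame standing in for `df_carer_41_1` (its values are made up for the example):

```python
import io

import pandas as pd

# Toy frame with the same index level names as the exported table
index = pd.MultiIndex.from_tuples(
    [("Total", "Persons"), ("Need", "Does not need respite care")],
    names=["Type", "Satisfaction"],
)
toy = pd.DataFrame({"Males": [1.5, 2.5]}, index=index)

# Write to an in-memory CSV, then read it back with both index columns restored
buffer = io.StringIO()
toy.to_csv(buffer, index=True)
buffer.seek(0)

restored = pd.read_csv(buffer, index_col=[0, 1])
print(restored.index.names)
```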