# EPIC 4 - Become a Carer - T37.1 - Primary Carer, Age Carer, Time Spending, Disability Status
This Jupyter Notebook prepares the 'Number of primary carers, by age and sex, 2018' XLS file for use in the Mo-Buddy Website Solution.
1. Read Raw Data
2. Clean Raw Data
3. Export Clean Data

- Table_29.1 - Carer status by sex, age
- Table_30.1 - Carer status, recipient, disability status, age, sex
- Table_31.1 - Carer status, by geographic location, age, sex
- Table_32.1 - 15-.. yo, carer status, sex 
- Table_33.1 - Employed 15-64 yo, carer status, sex 
- Table_34.1 - Primary Carer, recipient, age, sex
- Table_35.1 - Carer and recipient living in or other households, age
- Table_36.1 - Primary Carer, Time spending in care, sex  ****
- Table_37.1 - Primary Carer, age, time spending in care, disability status, ***** 
- Table_38.1 - Primary Carer, time spending in care, select recipient   ****
- Table_39.1 - Primary Carer, reason for taking a carer, sex   ****
- Table_40.1 - Primary Carer, reason for taking a carer, age of recipient ****
- Table_41.1 - Primary Carer, satisfaction of service received, sex, age  ****
- Table_42.1 - Primary Carer, social community participation with recipient, time spending in care, age  ****
- Table_43.1 - Primary Carer, social community participation without recipient, time spending in care, age  ****

In [1]:
# Import Packages
import pandas as pd
import itertools
# import re

In [2]:
# Set option to display all columns
pd.set_option('display.max_columns', None)

## 1. Read in Raw Data from an XLS file

In [3]:
# Function for reading in raw data from an XLS file
def read_in_data(file_path, sheet_name):
    """
    Function for reading in raw data from XLS file.
    Inputs: 
        - file_path, type: string, desc: XLS file path
        - sheet_name, type: string, desc: Sheet Name
    Outputs:
        - raw_data, type: dataframe, desc: Raw data
    """

    raw_data = pd.read_excel(io=file_path, sheet_name=sheet_name)
    
    return raw_data

In [4]:
# Read in data
filepath_raw_data = 'DataBases/44300do030.xls'
sheet_name = 'Table_37.1'
df_raw_carer_37 = read_in_data(filepath_raw_data, sheet_name)

In [5]:
# Check what the dataframe looks like
df_raw_carer_37

Unnamed: 0,Australian Bureau of Statistics,Unnamed: 1,Unnamed: 2,Unnamed: 3,Unnamed: 4
0,"44300DO030_2018 Disability, Ageing and Carers,...",,,,
1,Released at 11.30am (Canberra time) Thurs 24 O...,,,,
2,"Table 37.1 Primary carers, age of primary care...",,,,
3,,Average weekly hours spent caring,,,
4,,Less than 20 hours,20–39 hours,40 hours or more,Total
5,ESTIMATE ('000),,,,
6,Whether main recipient of care lives in same h...,,,,
7,Main recipient of care lives in same household,,,,
8,Primary carers aged 15-44 years,,,,
9,Recipient has profound core activity limitation,16,16.8,63.2,98


## 2. Clean up Raw Data

In [6]:
# Drop non-data rows by position and use row 4 (the hours-band labels) as column names
col_names = df_raw_carer_37.iloc[4,:]
df_carer_37 = df_raw_carer_37.copy()
df_carer_37.drop(labels=[0,1,2,5,12,17,22,27,29,30,31,32,33], axis=0, inplace=True)
df_carer_37.reset_index(drop=True, inplace=True)
df_carer_37.columns = col_names
df_carer_37.columns.names = ['']
df_carer_37

Unnamed: 0,NaN,Less than 20 hours,20–39 hours,40 hours or more,Total
0,,Average weekly hours spent caring,,,
1,,Less than 20 hours,20–39 hours,40 hours or more,Total
2,Whether main recipient of care lives in same h...,,,,
3,Main recipient of care lives in same household,,,,
4,Primary carers aged 15-44 years,,,,
5,Recipient has profound core activity limitation,16,16.8,63.2,98
6,Recipient has severe core activity limitation,42.5,14.9,22.3,83.9
7,All with reported disability,63.2,29.8,86,190.3
8,Primary carers aged 45-64 years,,,,
9,Recipient has profound core activity limitation,33.6,31.3,90.8,167.6
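
The row positions used in this cell are hard-coded for this particular sheet. As a minimal sketch, assuming the header row can be recognised by the value `Total` in its last column and that the frame still has its default integer index, the header could be located by content instead. The toy frame and the helper `find_header_row` are hypothetical and not part of the notebook.

```python
import pandas as pd

# Toy stand-in for the raw sheet: a title row, a banner row,
# the header row, one group-label row and one data row.
raw = pd.DataFrame(
    [
        ["Some title", None, None],
        [None, "Average weekly hours spent caring", None],
        [None, "Less than 20 hours", "Total"],
        ["Primary carers aged 15-44 years", None, None],
        ["Recipient has profound core activity limitation", 16.0, 98.0],
    ]
)


def find_header_row(df, marker="Total"):
    """Return the label of the first row whose last column equals `marker`."""
    hits = df.index[df.iloc[:, -1] == marker]
    return hits[0]


header_row = find_header_row(raw)

# Keep everything below the header and promote the header row to column names.
clean = raw.loc[header_row + 1:].reset_index(drop=True)
clean.columns = raw.loc[header_row].tolist()
```

Locating the header by content keeps the cell working if the publisher adds or removes a preamble row, at the cost of depending on the `Total` label staying unchanged.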


In [7]:
# Collect the labels of rows that carry no values (section and group headers)
miss_idx = df_carer_37['Less than 20 hours'].isna()
missing_idx = df_carer_37.loc[miss_idx,:].index.values
levels_lis = list(df_carer_37.iloc[missing_idx,0])
levels_lis

['Whether main recipient of care lives in same household',
 'Main recipient of care lives in same household',
 'Primary carers aged 15-44 years',
 'Primary carers aged 45-64 years',
 'Primary carers aged 65 years and over',
 'All primary carers']
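
The same `isna()` test can also attach each group label to the data rows beneath it, which avoids dropping rows by position. The following is a sketch only, on a toy frame with a single level of grouping (the real sheet nests age groups inside the household groups); the column names `label` and `Age` are assumptions made for the illustration.

```python
import pandas as pd

# Toy frame: group-label rows have no value in the "Total" column.
df = pd.DataFrame(
    {
        "label": [
            "Primary carers aged 15-44 years",
            "Recipient has profound core activity limitation",
            "Recipient has severe core activity limitation",
            "Primary carers aged 45-64 years",
            "Recipient has profound core activity limitation",
        ],
        "Total": [None, 98.0, 83.9, None, 167.6],
    }
)

# Rows without a value are group headers.
is_group_row = df["Total"].isna()

# Keep the label only on header rows, then carry it down to the rows below.
df["Age"] = df["label"].where(is_group_row).ffill()

# Drop the header rows now that their label lives in its own column.
tidy = df.loc[~is_group_row].reset_index(drop=True)
```

Each remaining row in `tidy` now carries its age group explicitly, so the grouping no longer depends on row order.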

In [8]:
# Drop the header and group-label rows, then use the first column as the index
df_carer_37.drop(labels=[0,1,2,3,4,8,12,16], axis=0, inplace=True)
df_carer_37.rename(columns={df_carer_37.columns[0]:'Index'}, inplace=True)
df_carer_37.set_index(keys='Index', drop=True, inplace=True)
df_carer_37

Unnamed: 0_level_0,Less than 20 hours,20–39 hours,40 hours or more,Total
Index,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
Recipient has profound core activity limitation,16.0,16.8,63.2,98.0
Recipient has severe core activity limitation,42.5,14.9,22.3,83.9
All with reported disability,63.2,29.8,86.0,190.3
Recipient has profound core activity limitation,33.6,31.3,90.8,167.6
Recipient has severe core activity limitation,50.3,23.9,26.1,107.2
All with reported disability,91.2,55.4,124.4,290.8
Recipient has profound core activity limitation,28.7,26.8,50.1,110.9
Recipient has severe core activity limitation,38.7,17.0,9.1,72.4
All with reported disability,75.9,45.9,65.1,199.3
Recipient has profound core activity limitation,75.9,74.4,206.8,377.7


In [9]:
# Add a two-level MultiIndex to the columns
first_level = ['Average weekly hours spent caring', 'Total']
second_level = list(col_names)[1:-1]
levels_1_1 = list(itertools.product([first_level[0]], second_level[0:3]))
levels_1_2 = list(itertools.product([first_level[1]], ['Total']))
levels = levels_1_1 + levels_1_2
df_carer_37.columns = pd.MultiIndex.from_tuples(levels, names=["Type", "Time"])
df_carer_37.reset_index(inplace=True)
df_carer_37

Type,Index,Average weekly hours spent caring,Average weekly hours spent caring,Average weekly hours spent caring,Total
Time,Unnamed: 1_level_1,Less than 20 hours,20–39 hours,40 hours or more,Total
0,Recipient has profound core activity limitation,16.0,16.8,63.2,98.0
1,Recipient has severe core activity limitation,42.5,14.9,22.3,83.9
2,All with reported disability,63.2,29.8,86.0,190.3
3,Recipient has profound core activity limitation,33.6,31.3,90.8,167.6
4,Recipient has severe core activity limitation,50.3,23.9,26.1,107.2
5,All with reported disability,91.2,55.4,124.4,290.8
6,Recipient has profound core activity limitation,28.7,26.8,50.1,110.9
7,Recipient has severe core activity limitation,38.7,17.0,9.1,72.4
8,All with reported disability,75.9,45.9,65.1,199.3
9,Recipient has profound core activity limitation,75.9,74.4,206.8,377.7
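
Once the columns carry two levels, a whole group or a single column can be selected by label. The sketch below rebuilds a two-row version of the table to show both forms of selection; the variable names `hours_only` and `total` are illustrative only.

```python
import itertools
import pandas as pd

hours = ["Less than 20 hours", "20–39 hours", "40 hours or more"]

# Pair the shared first-level label with each hours band, then add the total.
column_tuples = list(itertools.product(["Average weekly hours spent caring"], hours))
column_tuples.append(("Total", "Total"))

df = pd.DataFrame(
    [[16.0, 16.8, 63.2, 98.0], [42.5, 14.9, 22.3, 83.9]],
    columns=pd.MultiIndex.from_tuples(column_tuples, names=["Type", "Time"]),
)

# Selecting a first-level label returns every column beneath it.
hours_only = df["Average weekly hours spent caring"]

# Selecting a full tuple returns a single column as a Series.
total = df[("Total", "Total")]
```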


In [10]:
# Work on a copy; no further rows need dropping for this table
df_carer_37_1 = df_carer_37.copy()
df_carer_37_1

Type,Index,Average weekly hours spent caring,Average weekly hours spent caring,Average weekly hours spent caring,Total
Time,Unnamed: 1_level_1,Less than 20 hours,20–39 hours,40 hours or more,Total
0,Recipient has profound core activity limitation,16.0,16.8,63.2,98.0
1,Recipient has severe core activity limitation,42.5,14.9,22.3,83.9
2,All with reported disability,63.2,29.8,86.0,190.3
3,Recipient has profound core activity limitation,33.6,31.3,90.8,167.6
4,Recipient has severe core activity limitation,50.3,23.9,26.1,107.2
5,All with reported disability,91.2,55.4,124.4,290.8
6,Recipient has profound core activity limitation,28.7,26.8,50.1,110.9
7,Recipient has severe core activity limitation,38.7,17.0,9.1,72.4
8,All with reported disability,75.9,45.9,65.1,199.3
9,Recipient has profound core activity limitation,75.9,74.4,206.8,377.7


In [11]:
levels_names = df_carer_37_1['Index'].unique()
levels_names

array(['Recipient has profound core activity limitation',
       'Recipient has severe core activity limitation',
       'All with reported disability',
       'Main recipient of care lives elsewhere'], dtype=object)

In [12]:
first_level = [levels_lis[1] ,levels_names[-1]]
first_level

['Main recipient of care lives in same household',
 'Main recipient of care lives elsewhere']

In [13]:
second_level = levels_lis[2:]
second_level 

['Primary carers aged 15-44 years',
 'Primary carers aged 45-64 years',
 'Primary carers aged 65 years and over',
 'All primary carers']

In [14]:
third_level = list(levels_names[0:-1])
third_level

['Recipient has profound core activity limitation',
 'Recipient has severe core activity limitation',
 'All with reported disability']

In [15]:
# Add a three-level MultiIndex to the rows
levels_1 = list(itertools.product([first_level[0]],second_level, third_level))
levels_2 = list(itertools.product([first_level[1]],[second_level[3]], [first_level[1]]))
levels_all = levels_1 + levels_2
multi_index = pd.MultiIndex.from_tuples(levels_all, names=['Place', 'Age', 'Disability Status'])
df_carer_37_1.set_index(keys=multi_index, drop=True, inplace=True)
df_carer_37_1.drop(columns='Index', axis=1, inplace=True)
df_carer_37_1


Unnamed: 0_level_0,Unnamed: 1_level_0,Type,Average weekly hours spent caring,Average weekly hours spent caring,Average weekly hours spent caring,Total
Unnamed: 0_level_1,Unnamed: 1_level_1,Time,Less than 20 hours,20–39 hours,40 hours or more,Total
Place,Age,Disability Status,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2
Main recipient of care lives in same household,Primary carers aged 15-44 years,Recipient has profound core activity limitation,16.0,16.8,63.2,98.0
Main recipient of care lives in same household,Primary carers aged 15-44 years,Recipient has severe core activity limitation,42.5,14.9,22.3,83.9
Main recipient of care lives in same household,Primary carers aged 15-44 years,All with reported disability,63.2,29.8,86.0,190.3
Main recipient of care lives in same household,Primary carers aged 45-64 years,Recipient has profound core activity limitation,33.6,31.3,90.8,167.6
Main recipient of care lives in same household,Primary carers aged 45-64 years,Recipient has severe core activity limitation,50.3,23.9,26.1,107.2
Main recipient of care lives in same household,Primary carers aged 45-64 years,All with reported disability,91.2,55.4,124.4,290.8
Main recipient of care lives in same household,Primary carers aged 65 years and over,Recipient has profound core activity limitation,28.7,26.8,50.1,110.9
Main recipient of care lives in same household,Primary carers aged 65 years and over,Recipient has severe core activity limitation,38.7,17.0,9.1,72.4
Main recipient of care lives in same household,Primary carers aged 65 years and over,All with reported disability,75.9,45.9,65.1,199.3
Main recipient of care lives in same household,All primary carers,Recipient has profound core activity limitation,75.9,74.4,206.8,377.7
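
Because the row index is built with `itertools.product`, the labels are assigned purely by position, so the product order has to match the row order of the sheet. A small sketch on a four-row subset of the totals shows the order produced and how the resulting levels can be queried; the names `profound` and `young` are illustrative.

```python
import itertools
import pandas as pd

place = ["Main recipient of care lives in same household"]
age = ["Primary carers aged 15-44 years", "Primary carers aged 45-64 years"]
status = [
    "Recipient has profound core activity limitation",
    "Recipient has severe core activity limitation",
]

# product() varies the last list fastest: every status for the first
# age group, then every status for the second age group.
index = pd.MultiIndex.from_tuples(
    list(itertools.product(place, age, status)),
    names=["Place", "Age", "Disability Status"],
)

# Totals in sheet order for the two age groups.
totals = pd.Series([98.0, 83.9, 167.6, 107.2], index=index, name="Total")

# Cross-section: one disability status across all age groups.
profound = totals.xs(
    "Recipient has profound core activity limitation", level="Disability Status"
)

# Partial key: everything under one place and age group.
young = totals.loc[(place[0], age[0])]
```

Comparing `len(index)` with the number of data rows before assigning the index is a cheap guard against a silent mismatch between the two.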


## 3. Export Clean Data to a CSV file

In [16]:
# Export full version
df_carer_37_1.to_csv('Output/EPIC4_T37_1_V1.csv', index=True)
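
A CSV written from a frame with multi-level rows and columns needs matching `header` and `index_col` arguments when it is read back. The sketch below round-trips a two-row stand-in through an in-memory buffer rather than a file; the shortened labels are placeholders, not the labels used in the export.

```python
import io
import pandas as pd

columns = pd.MultiIndex.from_tuples(
    [("Average weekly hours spent caring", "Less than 20 hours"), ("Total", "Total")],
    names=["Type", "Time"],
)
index = pd.MultiIndex.from_tuples(
    [("Same household", "15-44", "Profound"), ("Same household", "15-44", "Severe")],
    names=["Place", "Age", "Disability Status"],
)
df = pd.DataFrame([[16.0, 98.0], [42.5, 83.9]], index=index, columns=columns)

# Write to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
df.to_csv(buffer, index=True)
buffer.seek(0)

# Two header rows for the column levels, three leading columns for the row levels.
restored = pd.read_csv(buffer, header=[0, 1], index_col=[0, 1, 2])
```

Whoever consumes the exported file would need the same two arguments, with the counts adjusted to the number of column and row levels actually written.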