# EPIC 4 - Become a Carer - T36.1 - Primary Carers, Time Spent Caring, Sex
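This Jupyter Notebook prepares Table 36.1 of the 2018 ABS Disability, Ageing and Carers XLS file (primary carers by relationship to the main recipient of care, time spent caring and sex) for use in the Mo-Buddy Website Solution. It has three steps: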
1. Read Raw Data
2. Clean Raw Data
3. Export Clean Data

- Table_29.1 - Carer status by sex, age
- Table_30.1 - Carer status, recipient, disability status, age, sex
- Table_31.1 - Carer status, by geographic location, age, sex
- Table_32.1 - 15-.. yo, carer status, sex
- Table_33.1 - Employed 15-64 yo, carer status, sex
- Table_34.1 - Primary Carer, recipient, age, sex
- Table_35.1 - Carer and recipient living in the same or other households, age
- Table_36.1 - Primary Carer, time spent caring, sex  ****
- Table_37.1 - Primary Carer, age, time spent caring, disability status  *****
- Table_38.1 - Primary Carer, time spent caring, selected recipient  ****
- Table_39.1 - Primary Carer, reason for taking on the caring role, sex  ****
- Table_40.1 - Primary Carer, reason for taking on the caring role, age of recipient  ****
- Table_41.1 - Primary Carer, satisfaction with services received, sex, age  ****
- Table_42.1 - Primary Carer, social and community participation with recipient, time spent caring, age  ****
- Table_43.1 - Primary Carer, social and community participation without recipient, time spent caring, age  ****

In [1]:
# Import Packages
import pandas as pd
import itertools
# import re

In [2]:
# Set option to display all columns
pd.set_option('display.max_columns', None)

## 1. Read in Raw Data from an XLS file

In [3]:
# Function for reading in raw data from an XLS file
def read_in_data(file_path, sheet_name):
    """
    Read raw data from one sheet of an XLS file.
    Inputs: 
        - file_path, type: string, desc: XLS file path
        - sheet_name, type: string, desc: Sheet Name
    Outputs:
        - raw_data, type: dataframe, desc: Raw data
    """

    # Legacy .xls files are read with the xlrd engine, which must be installed
    raw_data = pd.read_excel(io=file_path, sheet_name=sheet_name)
    
    return raw_data

In [4]:
# Read in data
filepath_raw_data = 'DataBases/44300do030.xls'
sheet_name = 'Table_36.1'
df_raw_carer_36 = read_in_data(filepath_raw_data, sheet_name)

In [5]:
# Check what the dataframe looks like
df_raw_carer_36

Unnamed: 0,Australian Bureau of Statistics,Unnamed: 1,Unnamed: 2,Unnamed: 3,Unnamed: 4,Unnamed: 5,Unnamed: 6,Unnamed: 7,Unnamed: 8,Unnamed: 9
0,"44300DO030_2018 Disability, Ageing and Carers,...",,,,,,,,,
1,Released at 11.30am (Canberra time) Thurs 24 O...,,,,,,,,,
2,"Table 36.1 Primary carers, time spent caring f...",,,,,,,,,
3,,Primary Carers,,,,,,,,
4,,Average weekly hours spent caring,,,Years of care provided to main recipient of care,,,,,Total
5,Relationship to main recipient of care,Less than 20 hours,20–39 hours,40 hours or more,Less than 2,2–4,5–9,10–24,25 or more,
6,ESTIMATE ('000),,,,,,,,,
7,Male primary carers,,,,,,,,,
8,Partner of recipient,55.7,28.6,38.2,10.5,40.8,29.7,36.9,8.7,129.6
9,Child of recipient,35.8,10,16.7,7.8,23.2,18.8,9.6,2.8,62.7


## 2. Clean up Raw Data

In [6]:
# Drop non-data rows by position (titles, sub-headings, notes); row 5 supplies the column names and is removed in a later step
col_names = df_raw_carer_36.iloc[5,:]
df_carer_36 = df_raw_carer_36.copy()
df_carer_36.drop(labels=[0,1,2,3,4,6,7,13,14,20,21,27,28,29], axis=0, inplace=True)
df_carer_36.reset_index(drop=True, inplace=True)
df_carer_36.columns = col_names
df_carer_36.columns.names = ['']
df_carer_36

Unnamed: 0,Relationship to main recipient of care,Less than 20 hours,20–39 hours,40 hours or more,Less than 2,2–4,5–9,10–24,25 or more,NaN
0,Relationship to main recipient of care,Less than 20 hours,20–39 hours,40 hours or more,Less than 2,2–4,5–9,10–24,25 or more,
1,Partner of recipient,55.7,28.6,38.2,10.5,40.8,29.7,36.9,8.7,129.6
2,Child of recipient,35.8,10,16.7,7.8,23.2,18.8,9.6,2.8,62.7
3,Parent of recipient,8.7,4.2,10.8,3.4,5.7,3.7,10.1,3.5,27.1
4,Other relationship to recipient,12.5,3.8,5.2,2.8,6.8,5.9,6.5,3,22.1
5,Total,112,47.2,72.4,28.1,74.8,54.8,62.3,16.4,241.9
6,Partner of recipient,68.2,39.6,66.2,19.7,36.2,41.1,62.7,19.9,185.7
7,Child of recipient,100.4,22.7,32.6,18.6,47.9,49.9,34.5,1.3,161.4
8,Parent of recipient,57.6,39.4,98,25.9,43.1,42.3,72.9,18.7,205.6
9,Other relationship to recipient,34.1,8,19.8,16.4,16.6,14,11.9,1.4,65.2
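The positional drop list above is tied to this particular sheet layout: if a later release adds or moves a row, the wrong rows are dropped without any error. A minimal sketch of a label-based alternative, assuming that data rows are the ones whose first cell is one of the relationship labels and that section headings (such as `Male primary carers`) have text in the first column only. `extract_data_rows` and the `Section` column are hypothetical names, and the small frame is made-up sample data shaped like the raw sheet.

```python
import pandas as pd

# Relationship labels used in the first column of the data rows of Table 36.1
RELATIONSHIPS = [
    "Partner of recipient",
    "Child of recipient",
    "Parent of recipient",
    "Other relationship to recipient",
    "Total",
]


def extract_data_rows(raw: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: keep data rows and tag each with the heading above it.

    Assumes a heading row has text in the first column and nothing elsewhere,
    and a data row starts with one of RELATIONSHIPS.
    """
    first_col = raw.iloc[:, 0]
    rest_empty = raw.iloc[:, 1:].isna().all(axis=1)
    is_data = first_col.isin(RELATIONSHIPS)
    is_heading = first_col.notna() & rest_empty & ~is_data
    # Carry the most recent heading down to the rows beneath it
    section = first_col.where(is_heading).ffill()
    out = raw.loc[is_data].copy()
    out.insert(0, "Section", section[is_data])
    return out.reset_index(drop=True)


# Made-up sample shaped like the raw sheet (title, header, two sex sections)
raw = pd.DataFrame(
    [
        ["Table 36.1 Primary carers", None, None],
        ["Relationship to main recipient of care", "Less than 20 hours", None],
        ["Male primary carers", None, None],
        ["Partner of recipient", 55.7, 129.6],
        ["Total", 112.0, 241.9],
        [None, None, None],
        ["Female primary carers", None, None],
        ["Partner of recipient", 68.2, 185.7],
    ]
)
clean = extract_data_rows(raw)
print(clean)
```

Applied to `df_raw_carer_36`, the `Section` column should carry the sheet's own sex headings, so the sex labels would no longer depend on the rows appearing in a fixed order.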


In [7]:
# Use the relationship labels as the row index, ready for adding the column MultiIndex
df_carer_36.rename(columns={df_carer_36.columns[0]:'Index'}, inplace=True)
df_carer_36.set_index(keys='Index', drop=True, inplace=True)
df_carer_36

Unnamed: 0_level_0,Less than 20 hours,20–39 hours,40 hours or more,Less than 2,2–4,5–9,10–24,25 or more,NaN
Index,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1
Relationship to main recipient of care,Less than 20 hours,20–39 hours,40 hours or more,Less than 2,2–4,5–9,10–24,25 or more,
Partner of recipient,55.7,28.6,38.2,10.5,40.8,29.7,36.9,8.7,129.6
Child of recipient,35.8,10,16.7,7.8,23.2,18.8,9.6,2.8,62.7
Parent of recipient,8.7,4.2,10.8,3.4,5.7,3.7,10.1,3.5,27.1
Other relationship to recipient,12.5,3.8,5.2,2.8,6.8,5.9,6.5,3,22.1
Total,112,47.2,72.4,28.1,74.8,54.8,62.3,16.4,241.9
Partner of recipient,68.2,39.6,66.2,19.7,36.2,41.1,62.7,19.9,185.7
Child of recipient,100.4,22.7,32.6,18.6,47.9,49.9,34.5,1.3,161.4
Parent of recipient,57.6,39.4,98,25.9,43.1,42.3,72.9,18.7,205.6
Other relationship to recipient,34.1,8,19.8,16.4,16.6,14,11.9,1.4,65.2


In [8]:
# Add a two-level column MultiIndex (Type, Time), then move the relationship labels back into a column
first_level = ['Average weekly hours spent caring', 'Years of care provided to main recipient of care', 'Total']
second_level = list(col_names)[1:-1]
levels_1_1 = list(itertools.product([first_level[0]], second_level[0:3]))
levels_1_2 = list(itertools.product([first_level[1]], second_level[3:8]))
levels_1_3 = list(itertools.product([first_level[2]], ['Total']))
levels = levels_1_1 + levels_1_2 + levels_1_3
df_carer_36.columns = pd.MultiIndex.from_tuples(levels, names=["Type", "Time"])
df_carer_36.reset_index(inplace=True)
df_carer_36

Type,Index,Average weekly hours spent caring,Average weekly hours spent caring,Average weekly hours spent caring,Years of care provided to main recipient of care,Years of care provided to main recipient of care,Years of care provided to main recipient of care,Years of care provided to main recipient of care,Years of care provided to main recipient of care,Total
Time,Unnamed: 1_level_1,Less than 20 hours,20–39 hours,40 hours or more,Less than 2,2–4,5–9,10–24,25 or more,Total
0,Relationship to main recipient of care,Less than 20 hours,20–39 hours,40 hours or more,Less than 2,2–4,5–9,10–24,25 or more,
1,Partner of recipient,55.7,28.6,38.2,10.5,40.8,29.7,36.9,8.7,129.6
2,Child of recipient,35.8,10,16.7,7.8,23.2,18.8,9.6,2.8,62.7
3,Parent of recipient,8.7,4.2,10.8,3.4,5.7,3.7,10.1,3.5,27.1
4,Other relationship to recipient,12.5,3.8,5.2,2.8,6.8,5.9,6.5,3,22.1
5,Total,112,47.2,72.4,28.1,74.8,54.8,62.3,16.4,241.9
6,Partner of recipient,68.2,39.6,66.2,19.7,36.2,41.1,62.7,19.9,185.7
7,Child of recipient,100.4,22.7,32.6,18.6,47.9,49.9,34.5,1.3,161.4
8,Parent of recipient,57.6,39.4,98,25.9,43.1,42.3,72.9,18.7,205.6
9,Other relationship to recipient,34.1,8,19.8,16.4,16.6,14,11.9,1.4,65.2
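The slices `second_level[0:3]` and `second_level[3:8]` hard-code how many sub-columns sit under each merged header cell. In the raw output, row 4 holds each merged label once followed by empty cells, so, as a sketch under that assumption, the first level can be recovered by forward-filling row 4, and a column with no sub-label (the final `Total`) can reuse its top label. `header_rows_to_multiindex` is a hypothetical helper.

```python
import pandas as pd


def header_rows_to_multiindex(top, bottom, names=("Type", "Time")):
    """Hypothetical helper: build a 2-level column index from two header rows.

    top    -- spanning header row, with NaN/None under each merged cell
    bottom -- sub-header row; a missing entry reuses the label above it
    """
    # list(...) resets any incoming index so both Series align position by position
    top = pd.Series(list(top), dtype="object").ffill()
    bottom = pd.Series(list(bottom), dtype="object").fillna(top)
    return pd.MultiIndex.from_arrays([top, bottom], names=list(names))


# Header cells copied from rows 4 and 5 of the raw sheet (first column excluded)
top = ["Average weekly hours spent caring", None, None,
       "Years of care provided to main recipient of care", None, None, None, None,
       "Total"]
bottom = ["Less than 20 hours", "20–39 hours", "40 hours or more",
          "Less than 2", "2–4", "5–9", "10–24", "25 or more", None]
columns = header_rows_to_multiindex(top, bottom)
print(columns)
```

With the notebook's frame the call would presumably be `header_rows_to_multiindex(df_raw_carer_36.iloc[4, 1:], df_raw_carer_36.iloc[5, 1:])`, which replaces the three `itertools.product` calls with one step that follows the sheet's own layout.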


In [9]:
# Drop the repeated header row (row 0); its labels now live in the column MultiIndex
df_carer_36_1 = df_carer_36.copy()
df_carer_36_1.drop(labels=[0], axis=0, inplace=True)
df_carer_36_1.reset_index(drop=True, inplace=True)
df_carer_36_1

Type,Index,Average weekly hours spent caring,Average weekly hours spent caring,Average weekly hours spent caring,Years of care provided to main recipient of care,Years of care provided to main recipient of care,Years of care provided to main recipient of care,Years of care provided to main recipient of care,Years of care provided to main recipient of care,Total
Time,Unnamed: 1_level_1,Less than 20 hours,20–39 hours,40 hours or more,Less than 2,2–4,5–9,10–24,25 or more,Total
0,Partner of recipient,55.7,28.6,38.2,10.5,40.8,29.7,36.9,8.7,129.6
1,Child of recipient,35.8,10.0,16.7,7.8,23.2,18.8,9.6,2.8,62.7
2,Parent of recipient,8.7,4.2,10.8,3.4,5.7,3.7,10.1,3.5,27.1
3,Other relationship to recipient,12.5,3.8,5.2,2.8,6.8,5.9,6.5,3.0,22.1
4,Total,112.0,47.2,72.4,28.1,74.8,54.8,62.3,16.4,241.9
5,Partner of recipient,68.2,39.6,66.2,19.7,36.2,41.1,62.7,19.9,185.7
6,Child of recipient,100.4,22.7,32.6,18.6,47.9,49.9,34.5,1.3,161.4
7,Parent of recipient,57.6,39.4,98.0,25.9,43.1,42.3,72.9,18.7,205.6
8,Other relationship to recipient,34.1,8.0,19.8,16.4,16.6,14.0,11.9,1.4,65.2
9,Total,262.5,109.6,213.8,82.6,140.0,148.4,185.1,43.3,618.8
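Because the sheet's header text was read into the same columns as the numbers, `read_excel` most likely gave these columns `object` dtype, and dropping rows does not re-infer dtypes. A sketch that converts every value column with `pd.to_numeric` while leaving the label column alone; `to_numeric_frame` is a hypothetical helper and the two-row frame is sample data laid out like `df_carer_36_1` at this step.

```python
import pandas as pd


def to_numeric_frame(df, label_cols, errors="raise"):
    """Hypothetical helper: convert every column except label_cols with pd.to_numeric.

    errors="raise" fails loudly on stray text; errors="coerce" would turn it into NaN.
    """
    out = df.copy()
    for col in list(out.columns):
        if col not in label_cols:
            out[col] = pd.to_numeric(out[col], errors=errors)
    return out


# Sample frame with object dtype and (Type, Time) column levels
sample = pd.DataFrame(
    {
        ("Index", ""): ["Partner of recipient", "Total"],
        ("Total", "Total"): [129.6, "241.9"],
    },
    dtype="object",
)
sample.columns.names = ["Type", "Time"]
converted = to_numeric_frame(sample, label_cols=[("Index", "")])
print(converted.dtypes)
```

In the notebook this would be `df_carer_36_1 = to_numeric_frame(df_carer_36_1, label_cols=[('Index', '')])`; keeping `errors="raise"` surfaces any unexpected text cell instead of silently exporting it.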


In [10]:
# Unique relationship labels, in order of first appearance
levels_names = df_carer_36_1['Index'].unique()
levels_names

array(['Partner of recipient', 'Child of recipient',
       'Parent of recipient', 'Other relationship to recipient', 'Total'],
      dtype=object)

In [11]:
first_level = ['Male', 'Female', 'All persons']
first_level

['Male', 'Female', 'All persons']

In [12]:
second_level = levels_names
second_level

array(['Partner of recipient', 'Child of recipient',
       'Parent of recipient', 'Other relationship to recipient', 'Total'],
      dtype=object)

In [13]:
# Add a two-level row MultiIndex (Gender, Relationship); the labels are assigned by position
levels_all = list(itertools.product(first_level, second_level))
multi_index = pd.MultiIndex.from_tuples(levels_all, names=['Gender', 'Relationship'])
df_carer_36_1.index = multi_index
# Drop the leftover 'Index' column by its full (Type, Time) label, which avoids
# pandas' PerformanceWarning about dropping from a non-lexsorted MultiIndex
df_carer_36_1.drop(columns=[('Index', '')], inplace=True)
df_carer_36_1


Unnamed: 0_level_0,Type,Average weekly hours spent caring,Average weekly hours spent caring,Average weekly hours spent caring,Years of care provided to main recipient of care,Years of care provided to main recipient of care,Years of care provided to main recipient of care,Years of care provided to main recipient of care,Years of care provided to main recipient of care,Total
Unnamed: 0_level_1,Time,Less than 20 hours,20–39 hours,40 hours or more,Less than 2,2–4,5–9,10–24,25 or more,Total
Gender,Relationship,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2
Male,Partner of recipient,55.7,28.6,38.2,10.5,40.8,29.7,36.9,8.7,129.6
Male,Child of recipient,35.8,10.0,16.7,7.8,23.2,18.8,9.6,2.8,62.7
Male,Parent of recipient,8.7,4.2,10.8,3.4,5.7,3.7,10.1,3.5,27.1
Male,Other relationship to recipient,12.5,3.8,5.2,2.8,6.8,5.9,6.5,3.0,22.1
Male,Total,112.0,47.2,72.4,28.1,74.8,54.8,62.3,16.4,241.9
Female,Partner of recipient,68.2,39.6,66.2,19.7,36.2,41.1,62.7,19.9,185.7
Female,Child of recipient,100.4,22.7,32.6,18.6,47.9,49.9,34.5,1.3,161.4
Female,Parent of recipient,57.6,39.4,98.0,25.9,43.1,42.3,72.9,18.7,205.6
Female,Other relationship to recipient,34.1,8.0,19.8,16.4,16.6,14.0,11.9,1.4,65.2
Female,Total,262.5,109.6,213.8,82.6,140.0,148.4,185.1,43.3,618.8
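The row labels above are attached purely by position, so they are only correct if the sheet lists the five relationships in the same order for every sex. A sketch of the same index built with `pd.MultiIndex.from_product` (equivalent to `from_tuples` over `itertools.product`), plus a check that the relationship labels already in the frame agree with the second level before they are replaced. `label_rows` is a hypothetical helper and the frame is made-up sample data.

```python
import itertools

import pandas as pd

sexes = ["Male", "Female", "All persons"]
relationships = [
    "Partner of recipient",
    "Child of recipient",
    "Parent of recipient",
    "Other relationship to recipient",
    "Total",
]

# from_product gives the same 15 (sex, relationship) pairs as from_tuples(itertools.product(...))
row_index = pd.MultiIndex.from_product([sexes, relationships], names=["Gender", "Relationship"])
assert list(row_index) == list(itertools.product(sexes, relationships))


def label_rows(df, index, label_col=("Index", "")):
    """Hypothetical helper: swap the label column for `index` after checking row order."""
    observed = df[label_col].tolist()
    expected = index.get_level_values("Relationship").tolist()
    if observed != expected:
        raise ValueError("row order does not match the expected (sex, relationship) layout")
    return df.drop(columns=[label_col]).set_axis(index, axis=0)


# Made-up sample laid out like df_carer_36_1 before the row index is set
sample = pd.DataFrame({("Index", ""): relationships * 3, ("Total", "Total"): range(15)})
sample.columns.names = ["Type", "Time"]
labelled = label_rows(sample, row_index)
print(labelled.head())
```

The check costs one list comparison, but it turns a silent mislabelling (for example, a sheet that lists `Parent` before `Child`) into an explicit error.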


## 3. Export Clean Data to a CSV file

In [14]:
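# Export full version (forward slashes also work on Windows); create Output/ if missing
import os
os.makedirs('Output', exist_ok=True)
df_carer_36_1.to_csv('Output/EPIC4_T36_1_V1.csv', index=True)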
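Because the exported frame has two header rows and two index columns, reading the CSV back needs `header=[0, 1]` and `index_col=[0, 1]` so that both levels are restored. A minimal round-trip sketch on a two-row stand-in for `df_carer_36_1` (values copied from the Male rows above), using an in-memory buffer instead of the exported file; `read_clean_csv` is a hypothetical helper name.

```python
import io

import pandas as pd


def read_clean_csv(path_or_buffer):
    """Hypothetical helper: read a CSV written from a frame with 2-level rows and columns."""
    return pd.read_csv(path_or_buffer, header=[0, 1], index_col=[0, 1])


# Two-row stand-in for df_carer_36_1
columns = pd.MultiIndex.from_tuples(
    [("Average weekly hours spent caring", "Less than 20 hours"), ("Total", "Total")],
    names=["Type", "Time"],
)
index = pd.MultiIndex.from_tuples(
    [("Male", "Partner of recipient"), ("Male", "Child of recipient")],
    names=["Gender", "Relationship"],
)
demo = pd.DataFrame([[55.7, 129.6], [35.8, 62.7]], index=index, columns=columns)

# Write and read back through memory rather than the Output folder
buffer = io.StringIO()
demo.to_csv(buffer, index=True)
buffer.seek(0)
restored = read_clean_csv(buffer)
print(restored)
```

A consumer of the exported file (for example the website's data loader) would need the same two-row header convention, or the frame could be flattened before export if a single header row is preferred.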