# Example Usage
- This notebook demonstrates how `table_builder_io` loads data from a range of TableBuilder export layouts.
- Only the CSV (comma-separated values) export option of ABS TableBuilder is supported.



## Test Data Setup
- `table_builder_io` is tested against a (hopefully) representative set of test files, which are truncated versions of standard ABS TableBuilder outputs.

In [23]:
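import sys
# Local path to the test data (not used below; adjust for your machine)
root = r"C:\Data\OtherRepositories\table_builder_io\test_data"
# Make the package and the test-case module importable from this notebook
sys.path.append("..")
sys.path.append("../test")

# Raw TableBuilder CSV exports, stored as strings
from csv_test_cases import MULTILEVEL_ROWS, SPATIAL_X_ATTR1, OD_DATA1, COL_MULTIINDEX_DATA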


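## Origin-destination (OD) data example
- The data shown is a subset of SA2 x SA2 home-to-work trip patterns: rows are the SA2 of the place of work (POW) and columns are the SA2 of usual residence (UR).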

In [30]:
from table_builder_io import TableBuilderReader

In [31]:
# Raw CSV Data (stored as a string)
print(OD_DATA1)

Australian Bureau of Statistics

"2016 Census - Counting Employed Persons, Place of Work (POW)"
"OCCP - 1 Digit Level by SA2 (POW) by SA2 (UR)"
"Counting: Persons Aged 15 Years and Over Place of Work"

Filters:
"Default Summation","Persons Aged 15 Years and Over Place of Work"


"SA2 (UR)","Alexandra Hills","Belmont - Gumdale","Birkdale","Capalaba","Thorneside",
"SA2 (POW)",
"Brisbane City",25,19,34,35,8,
"Fortitude Valley",5,4,11,18,3,
"Wynnum West - Hemmant",90,30,78,78,19,
"Total",2308,703,1777,2558,436,


"Data Source: Census of Population and Housing, 2016, TableBuilder"

"INFO","Cells in this table have been randomly adjusted to avoid the release of confidential data. No reliance should be placed on small cells."


"Copyright Commonwealth of Australia, 2018, see abs.gov.au/copyright"
"ABS data licensed under Creative Commons, see abs.gov.au/ccby"
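
To make the layout of the raw export concrete, here is a minimal sketch, using only the standard library, of how the data block could be pulled out of an export like `OD_DATA1`. It is not how `table_builder_io` is implemented: `parse_single_table` and `RAW` are hypothetical names, and the sketch assumes a single non-wafered table with one field per axis, separated from the surrounding metadata by blank lines.

```python
import csv

# Truncated copy of OD_DATA1 (fewer rows and columns), used as hypothetical input.
RAW = '''Australian Bureau of Statistics

"OCCP - 1 Digit Level by SA2 (POW) by SA2 (UR)"

Filters:
"Default Summation","Persons Aged 15 Years and Over Place of Work"


"SA2 (UR)","Alexandra Hills","Belmont - Gumdale",
"SA2 (POW)",
"Brisbane City",25,19,
"Total",2308,703,


"Data Source: Census of Population and Housing, 2016, TableBuilder"
'''


def parse_single_table(text):
    """Return (row axis name, column axis name, {row label: {column label: count}}).

    Assumes one table with a single field on each axis, laid out as: column-header
    row, row-axis label row, data rows, with blank lines above and below the block.
    """
    lines = text.splitlines()
    i = lines.index("Filters:") + 1
    # Skip the filter rows, then the blank lines between them and the table.
    while lines[i].strip():
        i += 1
    while not lines[i].strip():
        i += 1
    # The table is the next run of non-blank lines.
    block = []
    while i < len(lines) and lines[i].strip():
        block.append(lines[i])
        i += 1
    # Each row ends with a trailing comma, which csv reads as an empty last field.
    rows = [row[:-1] if row and row[-1] == "" else row for row in csv.reader(block)]
    header, row_axis, *data = rows
    table = {label: dict(zip(header[1:], map(int, values))) for label, *values in data}
    return row_axis[0], header[0], table


row_axis, col_axis, table = parse_single_table(RAW)
print(row_axis, "by", col_axis)  # SA2 (POW) by SA2 (UR)
print(table["Brisbane City"])    # {'Alexandra Hills': 25, 'Belmont - Gumdale': 19}
```

A real reader also has to cope with wafers and multi-level axes, which this sketch ignores.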



In [33]:
reader = TableBuilderReader.from_string(OD_DATA1)
# reader = TableBuilderReader.from_file(fpath)  # Reading from a file on disk is the typical use case
df = reader.read()
display(df)

SA2 (UR),Alexandra Hills,Belmont - Gumdale,Birkdale,Capalaba,Thorneside
SA2 (POW)
Brisbane City,25,19,34,35,8
Fortitude Valley,5,4,11,18,3
Wynnum West - Hemmant,90,30,78,78,19
Total,2308,703,1777,2558,436


By default, `table_builder_io` loads the row labels into the dataframe index, which is usually convenient. Sometimes, though (for example, when updating an older workflow that previously removed headers or fixed columns by hand), it is preferable to load the labels as an ordinary column instead:

In [36]:
df2 = reader.read(as_index=False)
display(df2)
# Note: in this format the SA2 (UR) name of the column axis is not kept; there is no logical place to store it.
# (You can also switch between the two formats with pandas directly, but it seemed worth supporting both.)

,SA2 (POW),Alexandra Hills,Belmont - Gumdale,Birkdale,Capalaba,Thorneside
0,Brisbane City,25,19,34,35,8
1,Fortitude Valley,5,4,11,18,3
2,Wynnum West - Hemmant,90,30,78,78,19
3,Total,2308,703,1777,2558,436
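
The difference between the two shapes can be sketched without pandas. Assuming the table has already been parsed into a hypothetical `{row label: {column label: count}}` mapping (roughly the index form), moving the labels into an ordinary field gives one flat record per row, and a flat record has no slot for the column-axis name:

```python
# Hypothetical parsed table in "index" form: row label -> {column label: count}.
ROW_AXIS = "SA2 (POW)"
COL_AXIS = "SA2 (UR)"
indexed = {
    "Brisbane City": {"Alexandra Hills": 25, "Belmont - Gumdale": 19},
    "Fortitude Valley": {"Alexandra Hills": 5, "Belmont - Gumdale": 4},
}


def to_records(indexed, row_axis_name):
    """Move row labels into an ordinary field, giving one flat record per row."""
    return [{row_axis_name: label, **counts} for label, counts in indexed.items()]


records = to_records(indexed, ROW_AXIS)
for record in records:
    print(record)
# COL_AXIS has nowhere to go in a flat record, which mirrors why the
# SA2 (UR) label is dropped when reading with as_index=False.
```

With pandas itself, `DataFrame.reset_index()` and `DataFrame.set_index()` move labels between the index and the columns.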


## Multilevel rows example
- When you add two or more fields to the rows or columns in TableBuilder, the labels are written in a ragged layout: each label is written only when it changes and is left blank on the rows that follow.
- In the example below, the rows are SEXP Sex, INCP Total Personal Income (weekly) and AGE10P - Age in Ten Year Groups.
- The row labels start as (Male, Negative income, 0-9 years) and continue as ( , , 10-19 years), with Male and Negative income implicit, until all age groups are exhausted.
- `table_builder_io` reads this layout and fills in the implicit labels, so every row gets its full (sex, income, age) label.
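
The fill step amounts to a forward fill over the label columns: keep the last label seen at each level and overwrite it whenever a row supplies a new one. This is a minimal standard-library sketch of the idea, not `table_builder_io`'s implementation; `fill_ragged_labels` is a hypothetical name, and the rows after the first two are hypothetical, with 999 as a placeholder count as in the test data.

```python
import csv

# Ragged row block in the style of MULTILEVEL_ROWS (trailing commas removed).
# Rows 3 and 4 are hypothetical, added to show changes at the outer levels.
RAGGED = '''"Male","Negative income","0-9 years",999,999
,,"10-19 years",999,999
,"Nil income","0-9 years",999,999
"Female","Negative income","0-9 years",999,999'''


def fill_ragged_labels(rows, n_levels):
    """Forward-fill blank row labels so each record carries its full label tuple."""
    current = [""] * n_levels
    filled = []
    for row in rows:
        labels, values = row[:n_levels], row[n_levels:]
        for level, label in enumerate(labels):
            if label:  # a non-blank label replaces the one carried down from above
                current[level] = label
        filled.append((tuple(current), [int(v) for v in values]))
    return filled


filled = fill_ragged_labels(csv.reader(RAGGED.splitlines()), n_levels=3)
for labels, values in filled:
    print(labels, values)
```

The label tuples are the kind of keys a pandas MultiIndex holds, one level per row field.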


In [37]:
print(MULTILEVEL_ROWS)

Australian Bureau of Statistics

"2016 Census - Counting Persons, Place of Usual Residence (MB)"
"SEXP Sex, INCP Total Personal Income (weekly) and AGE10P - Age in Ten Year Groups by Australia (UR)"
"Counting: Persons Place of Usual Residence"

Filters:
"Default Summation","Persons Place of Usual Residence"


,,"Australia (UR)","Australia","Total",
"SEXP Sex","INCP Total Personal Income (weekly)","AGE10P - Age in Ten Year Groups",
"Male","Negative income","0-9 years",999,999,
,,"10-19 years",999,999,
,,"20-29 years",999,999,
,,"30-39 years",999,999,
,,"80-89 years",999,999,
,,"90-99 years",999,999,
,,"100 years and over",999,999,
"Data Source: Census of Population and Housing, 2016, TableBuilder"

"INFO","Cells in this table have been randomly adjusted to avoid the release of confidential data. No reliance should be placed on small cells."


"Copyright Commonwealth of Australia, 2018, see abs.gov.au/copyright"
"ABS data licensed under Creative Commons, see abs.gov.au/ccby"



In [41]:
# Everything can be done in one line:
df = TableBuilderReader.from_string(MULTILEVEL_ROWS).read(as_index=True)
# With as_index=True the row labels become a pandas MultiIndex; the notebook display
# sparsifies repeated outer labels, so it looks similar to the ragged layout on disk
display(df)
print()
# Reading with as_index=False repeats the row labels out in full for each record.
display(TableBuilderReader.from_string(MULTILEVEL_ROWS).read(as_index=False))

,,Australia (UR),Australia,Total
SEXP Sex,INCP Total Personal Income (weekly),AGE10P - Age in Ten Year Groups
Male,Negative income,0-9 years,999,999
Male,Negative income,10-19 years,999,999
Male,Negative income,20-29 years,999,999
Male,Negative income,30-39 years,999,999
Male,Negative income,80-89 years,999,999
Male,Negative income,90-99 years,999,999
Male,Negative income,100 years and over,999,999





,SEXP Sex,INCP Total Personal Income (weekly),AGE10P - Age in Ten Year Groups,Australia,Total
0,Male,Negative income,0-9 years,999,999
1,Male,Negative income,10-19 years,999,999
2,Male,Negative income,20-29 years,999,999
3,Male,Negative income,30-39 years,999,999
4,Male,Negative income,80-89 years,999,999
5,Male,Negative income,90-99 years,999,999
6,Male,Negative income,100 years and over,999,999


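## Wafer data example
- Wafers are TableBuilder's third table dimension (after rows and columns): the export contains one rows x columns table per wafer category.
- The example data is SA2 x SA2 home-to-work counts, wafered by occupation (the subset of occupations corresponding to blue-collar jobs).
- Below, we can see that the first wafer is "Technicians and Trades Workers"; in the raw file the name has a leading space, which is stripped from the wafer names the reader returns.
- `table_builder_io` supports reading wafer data.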
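
Splitting a wafered export into per-wafer tables can be sketched with the standard library too. Only the first wafer header is visible in this notebook, so the layout here is an assumption: each wafer is taken to be a block of non-blank lines that starts with the quoted wafer name, with blocks separated by blank lines. `split_wafers` and `SAMPLE` are hypothetical; the "Labourers" counts are taken from the output further down, and 999 is a placeholder.

```python
import csv

# Hypothetical table region of a wafered export (metadata and most columns removed).
SAMPLE = '''" Technicians and Trades Workers"
"SA2 (UR)","Alexandra Hills","Birkdale",
"SA2 (POW)",
"Brisbane City",999,999,

" Labourers"
"SA2 (UR)","Alexandra Hills","Birkdale",
"SA2 (POW)",
"Brisbane City",20,6,
'''


def split_wafers(text):
    """Return {wafer name: list of csv rows}, assuming one blank-line-separated block per wafer."""
    wafers = {}
    name = None
    for line in text.splitlines():
        if not line.strip():
            name = None  # a blank line closes the current wafer block
            continue
        row = next(csv.reader([line]))
        if name is None:
            # First line of a block: the wafer name, with the leading space stripped.
            name = row[0].strip()
            wafers[name] = []
        else:
            wafers[name].append(row)
    return wafers


wafers = split_wafers(SAMPLE)
print(list(wafers))  # ['Technicians and Trades Workers', 'Labourers']
```

Each wafer's rows then have the same shape as an ordinary single table, which is why a dictionary keyed by wafer name is a natural return type.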

In [51]:
path = r"../test_data/sa2_pow_vs_sa2_ur_bne_bc_worker_total_wafer.csv"

In [58]:
with open(path, 'r') as f:
    print("".join(f.readlines()[:12]))

Australian Bureau of Statistics

"2016 Census - Counting Employed Persons, Place of Work (POW)"
"OCCP - 1 Digit Level by SA2 (POW) by SA2 (UR)"
"Counting: Persons Aged 15 Years and Over Place of Work"

Filters:
"Default Summation","Persons Aged 15 Years and Over Place of Work"

" Technicians and Trades Workers"
"SA2 (UR)","Alexandra Hills","Belmont - Gumdale","Birkdale","Capalaba","Thorneside","Wellington Point","Cleveland","Ormiston","Redland Bay","Sheldon - Mount Cotton","Thornlands","Victoria Point","Redland Islands","Brisbane Port - Lytton","Manly - Lota","Manly West","Murarrie","Tingalpa","Wakerley","Wynnum","Wynnum West - Hemmant","Bald Hills","Bridgeman Downs","Carseldine","Everton Park","McDowall","Aspley","Chermside","Chermside West","Geebung","Kedron - Gordon Park","Stafford","Stafford Heights","Wavell Heights","Boondall","Brisbane Airport","Eagle Farm - Pinkenba","Northgate - Virginia","Nudgee - Banyo","Nundah","Bracken Ridge","Brighton (Qld)","Deagon","Sandgate - Shorncliff

In [60]:
result = TableBuilderReader.from_file(path).read(as_index=True)
print(type(result))
print(result.keys())

<class 'dict'>
dict_keys(['Technicians and Trades Workers', 'Machinery Operators and Drivers', 'Labourers', 'Total'])


For wafer data, `TableBuilderReader.read` returns a dictionary of dataframes keyed by wafer name (including a "Total" wafer). For example, we can pull out the "Labourers" wafer:

In [62]:
df = result['Labourers']
display(df.head())

SA2 (UR),Alexandra Hills,Belmont - Gumdale,Birkdale,Capalaba,Thorneside,Wellington Point,Cleveland,Ormiston,Redland Bay,Sheldon - Mount Cotton,...,Wilston,Windsor,Wooloowin - Lutwyche,Ashgrove,Auchenflower,Bardon,Paddington - Milton,Red Hill (Qld),Toowong,Total
SA2 (POW)
Brisbane City,20,7,6,15,3,10,7,3,6,7,...,6,24,48,28,25,9,25,20,23,2959
Fortitude Valley,5,9,0,0,0,0,3,0,0,0,...,0,7,9,9,0,0,0,7,3,602
Highgate Hill,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,27
Kangaroo Point,0,5,0,0,4,0,0,0,0,4,...,0,0,7,0,0,0,0,0,0,242
New Farm,4,0,0,0,0,0,0,0,0,0,...,0,6,5,0,0,0,5,3,0,251
