# Getting Started

---

This repo showcases how I use the `Great Tables` package in daily data science projects. I have been using the `gt` (and `gtExtras`) R packages since May 2018. Now that I am transitioning to `Python`, I will be using the `Great Tables` library instead for my personal and professional tabulation needs.

In [119]:
# Load required libraries
import polars as pl
import polars.selectors as cs
from great_tables import GT, style, loc, md, html

In [110]:
# Import dataset
url = 'https://raw.githubusercontent.com/tongakuot/polars-orbit/main/data/ss_2008_census_data_raw.csv'

# Map the census age bands to broader age categories
age_mapping = {
    "0 to 4": "0-14",
    "5 to 9": "0-14",
    "10 to 14": "0-14",
    "15 to 19": "15-24",
    "20 to 24": "15-24",
    "25 to 29": "25-34",
    "30 to 34": "25-34",
    "35 to 39": "35-44",
    "40 to 44": "35-44",
    "45 to 49": "45-54",
    "50 to 54": "45-54",
    "55 to 59": "55-64",
    "60 to 64": "55-64",
    "65+": "65 and above",
}
census = (
    pl.read_csv(url, null_values='NA')
    .select(
        ["Region Name", "Variable Name", "Age Name", "2008"]
    )
    .rename({
            "Region Name": "state",
            "Variable Name": "gender",
            "Age Name": "age_category",
            "2008": "population",
    })
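    .with_columns(
        # Keep the second space-separated token, which holds the gender label
        gender=pl.col("gender").str.split(" ").list.get(1),
        # Collapse the census age bands into broader categories
        age_category=pl.col("age_category").replace(age_mapping),
    )
    # Drop the "Total" rows so that the sums below do not double count
    .filter(
        (pl.col("gender") != "Total") & (pl.col("age_category") != "Total")
    )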
    .group_by(['state', 'gender', 'age_category'])
    .agg(total=pl.col('population').sum())
    .sort('total', descending=True)
)

In [111]:
# Inspect the first few rows
print(census.head(5))

shape: (5, 4)
┌───────────────────┬────────┬──────────────┬────────┐
│ state             ┆ gender ┆ age_category ┆ total  │
│ ---               ┆ ---    ┆ ---          ┆ ---    │
│ str               ┆ str    ┆ str          ┆ i64    │
╞═══════════════════╪════════╪══════════════╪════════╡
│ Jonglei           ┆ Male   ┆ 0-14         ┆ 338443 │
│ Jonglei           ┆ Female ┆ 0-14         ┆ 263646 │
│ Central Equatoria ┆ Male   ┆ 0-14         ┆ 242247 │
│ Upper Nile        ┆ Male   ┆ 0-14         ┆ 237461 │
│ Warrap            ┆ Male   ┆ 0-14         ┆ 230854 │
└───────────────────┴────────┴──────────────┴────────┘


# Tabulating Data with the Great Tables Package

In [112]:
# Summarize data by state
state = (
    census
    .group_by(['state', 'gender'])
    .agg(total=pl.col('total').sum())
    .sort('total', descending=True)
    .pivot(index='state', columns='gender', values='total')
)

# Display output
print(state)

shape: (10, 3)
┌─────────────────────────┬────────┬────────┐
│ state                   ┆ Male   ┆ Female │
│ ---                     ┆ ---    ┆ ---    │
│ str                     ┆ i64    ┆ i64    │
╞═════════════════════════╪════════╪════════╡
│ Jonglei                 ┆ 734327 ┆ 624275 │
│ Central Equatoria       ┆ 581722 ┆ 521835 │
│ Upper Nile              ┆ 525430 ┆ 438923 │
│ Warrap                  ┆ 470734 ┆ 502194 │
│ Eastern Equatoria       ┆ 465187 ┆ 440974 │
│ Northern Bahr el Ghazal ┆ 348290 ┆ 372608 │
│ Lakes                   ┆ 365880 ┆ 329850 │
│ Western Equatoria       ┆ 318443 ┆ 300586 │
│ Unity                   ┆ 300247 ┆ 285554 │
│ Western Bahr el Ghazal  ┆ 177040 ┆ 156391 │
└─────────────────────────┴────────┴────────┘


In [128]:
# Tabulate state data frame
(
    GT(state, rowname_col='state')
    .tab_header(
        title='Population Distribution by State and Gender in South Sudan',
        subtitle=md(
            '**Insights from the 2008 Census:** <br> Jonglei State Leads with Over 1.35 Million Persons,<br> while Western Bahr el Ghazal State Has the Lowest Population'
        )
    )
    .fmt_integer(columns=cs.integer(), use_seps=True)
    .cols_align(align='center', columns=['Male', 'Female'])
    .opt_align_table_header(align="left")
    .cols_label(state='State')
    .tab_spanner(columns=cs.integer(), label='Population by Gender')
    .tab_source_note(source_note=html('<a href="http://southsudan.opendataforafrica.org/fvjqdpe/population-by-age-and-sex-2008-south-sudan"><strong>2008 Census: National Bureau of Statistics, South Sudan</strong></a>'))
)

**Population Distribution by State and Gender in South Sudan**

**Insights from the 2008 Census:** Jonglei State Leads with Over 1.35 Million Persons, while Western Bahr el Ghazal State Has the Lowest Population

| State | Male | Female |
| --- | --- | --- |
| Jonglei | 734327 | 624275 |
| Central Equatoria | 581722 | 521835 |
| Upper Nile | 525430 | 438923 |
| Warrap | 470734 | 502194 |
| Eastern Equatoria | 465187 | 440974 |
| Northern Bahr el Ghazal | 348290 | 372608 |
| Lakes | 365880 | 329850 |
| Western Equatoria | 318443 | 300586 |
| Unity | 300247 | 285554 |
| Western Bahr el Ghazal | 177040 | 156391 |


In [129]:
# Summarize data by state, gender, and age category
state_by_gender_and_age = (
    census
    .group_by(['state', 'gender', 'age_category'])
    .agg(total=pl.col('total').sum())
    .sort('total', descending=True)
    .pivot(
        index=['state', 'gender'], 
        columns='age_category', 
        values='total', 
        aggregate_function='sum'
    )
)

# Display output
print(state_by_gender_and_age)

shape: (20, 9)
┌────────────────────────┬────────┬────────┬────────┬───┬───────┬───────┬───────┬──────────────┐
│ state                  ┆ gender ┆ 0-14   ┆ 15-24  ┆ … ┆ 35-44 ┆ 45-54 ┆ 55-64 ┆ 65 and above │
│ ---                    ┆ ---    ┆ ---    ┆ ---    ┆   ┆ ---   ┆ ---   ┆ ---   ┆ ---          │
│ str                    ┆ str    ┆ i64    ┆ i64    ┆   ┆ i64   ┆ i64   ┆ i64   ┆ i64          │
╞════════════════════════╪════════╪════════╪════════╪═══╪═══════╪═══════╪═══════╪══════════════╡
│ Jonglei                ┆ Male   ┆ 338443 ┆ 142786 ┆ … ┆ 66018 ┆ 44620 ┆ 24530 ┆ 22658        │
│ Jonglei                ┆ Female ┆ 263646 ┆ 125241 ┆ … ┆ 66971 ┆ 35955 ┆ 15724 ┆ 12384        │
│ Central Equatoria      ┆ Male   ┆ 242247 ┆ 124513 ┆ … ┆ 59775 ┆ 32567 ┆ 15704 ┆ 11409        │
│ Upper Nile             ┆ Male   ┆ 237461 ┆ 99908  ┆ … ┆ 52518 ┆ 31790 ┆ 16976 ┆ 15746        │
│ Warrap                 ┆ Male   ┆ 230854 ┆ 79293  ┆ … ┆ 45602 ┆ 28227 ┆ 13867 ┆ 12345        │
│ …            

In [130]:
(
    GT(state_by_gender_and_age, rowname_col='gender', groupname_col='state')
    .tab_header(
        title='Population Distribution by State, Gender, and Age in South Sudan',
        subtitle=md(
            '**Insights from the 2008 Census:** <br> Jonglei State Leads with Over 1.35 Million Persons,<br> while Western Bahr el Ghazal State Has the Lowest Population'
        )
    )
    .fmt_integer(columns=cs.integer(), use_seps=True)
    .cols_align(align='center', columns=cs.all())
    .opt_align_table_header(align="left")
    .cols_label(state='State')
    .tab_spanner(columns=cs.integer(), label='Age Group')
    .tab_source_note(source_note=html('<a href="http://southsudan.opendataforafrica.org/fvjqdpe/population-by-age-and-sex-2008-south-sudan"><strong>2008 Census: National Bureau of Statistics, South Sudan</strong></a>'))
)

**Population Distribution by State, Gender, and Age in South Sudan**

**Insights from the 2008 Census:** Jonglei State Leads with Over 1.35 Million Persons, while Western Bahr el Ghazal State Has the Lowest Population

| | 0-14 | 15-24 | 25-34 | 35-44 | 45-54 | 55-64 | 65 and above |
| --- | --- | --- | --- | --- | --- | --- | --- |
| **Jonglei** | | | | | | | |
| Male | 338443 | 142786 | 95272 | 66018 | 44620 | 24530 | 22658 |
| Female | 263646 | 125241 | 104354 | 66971 | 35955 | 15724 | 12384 |
| **Central Equatoria** | | | | | | | |
| Male | 242247 | 124513 | 95507 | 59775 | 32567 | 15704 | 11409 |
| Female | 221216 | 115726 | 86092 | 50764 | 27182 | 12259 | 8596 |
| **Upper Nile** | | | | | | | |
| Male | 237461 | 99908 | 71031 | 52518 | 31790 | 16976 | 15746 |
| Female | 191018 | 86484 | 68857 | 46427 | 24277 | 11716 | 10144 |
| **Warrap** | | | | | | | |


In [131]:
# Reproducing great_tables example
# Import required libraries and illness dataset
from great_tables.data import illness

# Convert illness dataset into a polars dataframe (ensure that you've installed pyarrow)
illness_pl = pl.from_pandas(illness)

# Transform the dataset to include a nested or list column
illness_df = (
    illness_pl
    .join(
        illness_pl
        .select(
            'test', 
            values=pl.concat_str(
                pl.exclude('test', 'units'), 
                separator=' ', 
                ignore_nulls=True
            )
        ), 
        on='test'
    )
    .select(
        'test', 'units', 'day_3', 'day_4', 
        'day_5', 'day_6', 'day_7',
        'day_8', 'day_9', 'values'
    )
    .slice(1, 5)
)

# inspect output
illness_df.head()

| test | units | day_3 | day_4 | day_5 | day_6 | day_7 | day_8 | day_9 | values |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| str | str | f64 | f64 | f64 | f64 | f64 | f64 | f64 | str |
| WBC | x10^9 / L | 5.26 | 4.26 | 9.92 | 10.49 | 24.77 | 30.26 | 19.03 | 5.26 4.26 9.92 10.49 24.77 30.… |
| Neutrophils | x10^9 / L | 4.87 | 4.72 | 7.92 | 18.21 | 22.08 | 27.17 | 16.59 | 4.87 4.72 7.92 18.21 22.08 27.… |
| RBC | x10^12 / L | 5.72 | 5.98 | 4.23 | 4.83 | 4.12 | 2.68 | 3.32 | 5.72 5.98 4.23 4.83 4.12 2.68 … |
| Hb | g / L | 153.0 | 135.0 | 126.0 | 115.0 | 75.0 | 87.0 | 95.0 | 153.0 135.0 126.0 115.0 75.0 8… |
| PLT | x10^9 / L | 67.0 | 38.6 | 27.4 | 26.2 | 74.1 | 36.2 | 25.6 | 67.0 38.6 27.4 26.2 74.1 36.2 … |


In [127]:
# Tabulate the results
(
    GT(illness_df, rowname_col='test')
    # .cols_hide('units')
    .fmt_nanoplot(columns='values')
    .tab_header(md('My Reproduction of the Summary of Daily Tests<br>Performed on YF Patient'))
    .tab_stubhead(label=md('**Test**'))
    .cols_label(
        units='Units',
        day_3='3',
        day_4='4',
        day_5='5',
        day_6='6',
        day_7='7',
        day_8='8',
        day_9='9',
        values=md('*Progression* Trends')
    )
    .cols_align(columns=['units', 'values'], align='center')
    .tab_spanner(label='Day of Disease progression', columns=cs.starts_with('day'))
    .sub_missing()
    .tab_style(
        style=[
            style.fill(color='#F9E3D6'),
            style.text(style='italic')
        ],
        locations=loc.body(columns='units')
    )
    .tab_source_note(source_note='Measurements from Day 3 through to Day 9')
)

**My Reproduction of the Summary of Daily Tests Performed on YF Patient**

| Test | Units | 3 | 4 | 5 | 6 | 7 | 8 | 9 | Progression Trends |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| WBC | x10^9 / L | 5.26 | 4.26 | 9.92 | 10.49 | 24.77 | 30.26 | 19.03 | [nanoplot] |
| Neutrophils | x10^9 / L | 4.87 | 4.72 | 7.92 | 18.21 | 22.08 | 27.17 | 16.59 | [nanoplot] |
| RBC | x10^12 / L | 5.72 | 5.98 | 4.23 | 4.83 | 4.12 | 2.68 | 3.32 | [nanoplot] |
| Hb | g / L | 153.0 | 135.0 | 126.0 | 115.0 | 75.0 | 87.0 | 95.0 | [nanoplot] |
| PLT | x10^9 / L | 67.0 | 38.6 | 27.4 | 26.2 | 74.1 | 36.2 | 25.6 | [nanoplot] |

Columns 3 through 9 sit under the spanner "Day of Disease progression".

Measurements from Day 3 through to Day 9
