### Experimental code for interpolating the median age from grouped age category data, via the [Los Angeles Times Data Desk](https://github.com/datadesk).

### For more information, see: https://github.com/datadesk/census-data-aggregator
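
The interpolation used below can be summarized in one formula. The symbols are notation chosen for this note, not taken from the library: $N$ is the total count across all ranges, $m$ is the range that contains the midpoint $N/2$, $F_m$ is the cumulative count of all ranges below $m$, $n_m$ is the count in $m$, and $a_m$ and $b_m$ are its lower and upper bounds. Assuming observations are spread evenly within the range:

$$
\text{median} \approx a_m + (b_m - a_m)\,\frac{N/2 - F_m}{n_m}
$$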

In [1]:
import math

In [2]:
def approximate_median(range_list):
    """
    Returns the estimated median from a set of ranged totals.
    Useful for generating medians for measures like median household income and median age when aggregating census geographies.
    Expects a list of dictionaries with three keys:
        min: The minimum value in the range
        max: The maximum value in the range
        n: The number of people, households or other universe figure in the range
    """
    # Sort the list
    range_list.sort(key=lambda x: x['min'])

    # What is the total number of observations in the universe?
    n = sum([d['n'] for d in range_list])

    # What is the midpoint of the universe?
    midpoint = n / 2.0

    # For each range calculate its min and max value along the universe's scale
    cumulative_n = 0
    for range_ in range_list:
        range_['n_min'] = cumulative_n
        cumulative_n += range_['n']
        range_['n_max'] = cumulative_n

    # Now use those to determine which group contains the midpoint.
    try:
        midpoint_range = next(d for d in range_list if d['n_min'] <= midpoint <= d['n_max'])
    except StopIteration:
        # Raise ValueError rather than StopIteration, which can silently end an enclosing loop or generator
        raise ValueError("The midpoint of the total does not fall within a data range.")

    # How many observations from the midpoint range are needed to reach the midpoint?
    midrange_gap = midpoint - midpoint_range['n_min']

    # What is the proportion of the group that would be needed to get the midpoint?
    midrange_gap_percent = midrange_gap / midpoint_range['n']

    # Apply this proportion to the width of the midrange
    midrange_gap_adjusted = (midpoint_range['max'] - midpoint_range['min']) * midrange_gap_percent

    # Estimate the median
    estimated_median = midpoint_range['min'] + midrange_gap_adjusted

    # Return the result
    return estimated_median
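
As a quick sanity check, the steps above can be restated compactly and run on a small set of ranges. This is only a sketch: `interpolate_median` and `toy_ranges` are hypothetical names, the counts are made up for illustration, and unlike `approximate_median` this version leaves the input list and its dictionaries unmodified and skips empty ranges.

```python
def interpolate_median(range_list):
    """Hypothetical restatement of the steps in approximate_median."""
    # Work on a sorted copy so the caller's list is left untouched
    ranges = sorted(range_list, key=lambda d: d['min'])

    # The midpoint of the universe is half the total count
    midpoint = sum(d['n'] for d in ranges) / 2.0

    cumulative_n = 0
    for d in ranges:
        n_min, n_max = cumulative_n, cumulative_n + d['n']
        # Skip empty ranges (avoids dividing by zero), then look for the
        # first range whose cumulative span contains the midpoint
        if d['n'] > 0 and n_min <= midpoint <= n_max:
            # Share of this range needed to reach the midpoint
            share = (midpoint - n_min) / d['n']
            # Apply that share to the width of the range
            return d['min'] + (d['max'] - d['min']) * share
        cumulative_n = n_max

    raise ValueError("The midpoint of the total does not fall within a data range.")


# Made-up counts, for illustration only
toy_ranges = [
    {'min': 0, 'max': 10, 'n': 10},
    {'min': 10, 'max': 20, 'n': 30},
    {'min': 20, 'max': 30, 'n': 10},
]

print(interpolate_median(toy_ranges))  # 15.0
```

Here the total is 50, so the midpoint is 25. The cumulative counts are 0 to 10, 10 to 40 and 40 to 50, which places the midpoint in the second range. Reaching it takes 15 of that range's 30 observations, or half of the range, so the estimate is 10 + 10 × 0.5 = 15.0.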

### Estimate the median age of D.C.'s 2018 population, by race

In [3]:
import pandas as pd

In [4]:
dc = pd.read_csv('dc_age2018.csv')

In [5]:
for i in dc['race_cat'].unique():
    print(i)
    dc_race = dc.loc[dc['race_cat'] == i][['min', 'max', 'n']].to_dict('records')
    print(round(approximate_median(dc_race)))

White
32.0
Black
38.0
MultipleOther
26.0
Asian
32.0
Hispanic
31.0
