# Parsing US Census Age Data from NHGIS IPUMS

The following notebook parses census block-level data from the 2010 Decennial Census. The data comes from the [National Historical Geographic Information System (NHGIS)](https://www.nhgis.org/) at the University of Minnesota. Age data is grouped into three major age groups, and a GEOID is created for joining with census block GIS data. The GEOID is a 15-character [FIPS code](https://www.census.gov/geo/reference/geoidentifiers.html) made up of the state (2 digits), county (3), tract (6), and block (4) codes.
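
As a rough illustration of how a block-level GEOID is assembled, the sketch below zero-pads each component to its fixed width and concatenates them. The function name `build_geoid` and the sample codes are made up for this example and are not taken from the NHGIS extract.

```python
def build_geoid(state, county, tract, block):
    """Zero-pad each integer code to its FIPS width and concatenate.

    Widths: state 2, county 3, tract 6, block 4 (15 characters total).
    Hypothetical helper for illustration only.
    """
    return f"{state:02d}{county:03d}{tract:06d}{block:04d}"

# Made-up sample codes, used only to show the padding.
example_geoid = build_geoid(33, 1, 965100, 1000)
print(example_geoid)       # 330019651001000
print(len(example_geoid))  # 15
```

Keeping the result as a string matters because state codes below 10 start with a zero, which an integer column would drop.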

In [21]:
#Import the pandas library and read the CSV file into a DataFrame
import pandas as pd
df = pd.read_csv('nhgis0040_csv/nhgis0040_ds172_2010_block.csv')

In [22]:
#Create lists of headers for the columns to be summed for each age group.
#The file separates males and females, so those need to be added together.
UNDER18 = ['H760{0:02d}'.format(x) for x in range(3,7)] + ['H760{0:02d}'.format(x) for x in range(27,31)]
AGE18_64 = ['H760{0:02d}'.format(x) for x in range(7,20)] + ['H760{0:02d}'.format(x) for x in range(31,44)]
OVER64 = ['H760{0:02d}'.format(x) for x in range(20,26)] + ['H760{0:02d}'.format(x) for x in range(44,50)]
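
To make the list comprehensions easier to check, the sketch below expands the same ranges with a small helper and confirms that the three groups do not overlap. The helper name `h76_columns` is hypothetical. Codes 02 and 26 are skipped by the ranges, presumably because they hold the male and female subtotals; that reading is an assumption to verify against the NHGIS codebook.

```python
def h76_columns(start, stop):
    """Return H76 column names for codes start..stop-1 (hypothetical helper)."""
    return ['H760{0:02d}'.format(x) for x in range(start, stop)]

# Same ranges as the notebook cell: male columns first, then female columns.
under18_cols = h76_columns(3, 7) + h76_columns(27, 31)
age18_64_cols = h76_columns(7, 20) + h76_columns(31, 44)
over64_cols = h76_columns(20, 26) + h76_columns(44, 50)

print(under18_cols)
# ['H76003', 'H76004', 'H76005', 'H76006', 'H76027', 'H76028', 'H76029', 'H76030']

# Every column should appear in exactly one group: 8 + 26 + 12 = 46 names.
all_cols = under18_cols + age18_64_cols + over64_cols
assert len(all_cols) == len(set(all_cols)) == 46
```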

In [23]:
#Create new columns for each major age group.
df['UNDER18'] = df[UNDER18].sum(axis=1)
df['AGE18_64'] = df[AGE18_64].sum(axis=1)
df['OVER64'] = df[OVER64].sum(axis=1)

In [24]:
#Concatenate the geography FIPS codes to generate a 15-digit GEOID that can be used to join on
#Census block GIS data. Widths: state (2) + county (3) + tract (6) + block (4) = 15.
#Tract codes are 6 digits wide, so they are padded to 6.

df['GEOID'] = (df['STATEA'].apply(lambda x : '{0:02d}'.format(x)) +
               df['COUNTYA'].apply(lambda x : '{0:03d}'.format(x)) +
               df['TRACTA'].apply(lambda x : '{0:06d}'.format(x)) +
               df['BLOCKA'].apply(lambda x : '{0:04d}'.format(x))
              )
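
Because a block GEOID must be exactly 15 digits, a quick check after building the column can catch padding mistakes before the join. The following is a minimal sketch using made-up sample values. `invalid_geoids` is a hypothetical helper, and in the notebook it could be called on `df['GEOID']`.

```python
import re

# A block-level GEOID is exactly 15 ASCII digits.
GEOID_PATTERN = re.compile(r'[0-9]{15}')

def invalid_geoids(geoids):
    """Return the values that are not 15-digit strings (hypothetical helper)."""
    return [g for g in geoids if not GEOID_PATTERN.fullmatch(g)]

# Made-up sample: the second value is one digit short.
sample = ['330019651001000', '33001965101000']
bad = invalid_geoids(sample)
print(bad)  # ['33001965101000']
```

An empty list means every GEOID has the expected width.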

In [25]:
#Create a simplified dataframe for export as a csv file
ages = df[['GEOID','UNDER18','AGE18_64','OVER64']]
ages.to_csv('nh_blocks_ages.csv',index=False)
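
One thing to watch when the exported file is read back: pandas infers an all-digit GEOID column as an integer, which drops any leading zero. The sketch below uses a made-up one-row CSV held in memory rather than the real output file, and shows that passing `dtype={'GEOID': str}` to `pd.read_csv` keeps the full 15 characters.

```python
import io
import pandas as pd

# Made-up CSV text standing in for the exported file; the state code is 01.
csv_text = "GEOID,UNDER18,AGE18_64,OVER64\n010010201001000,3,10,2\n"

# Default type inference reads GEOID as an integer and loses the leading zero.
as_default = pd.read_csv(io.StringIO(csv_text))

# Forcing GEOID to str keeps all 15 characters.
as_text = pd.read_csv(io.StringIO(csv_text), dtype={'GEOID': str})

print(as_default['GEOID'].iloc[0])  # 10010201001000
print(as_text['GEOID'].iloc[0])     # 010010201001000
```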