# Data Processing

The full data dictionary is available [here](https://www2.census.gov/programs-surveys/popest/technical-documentation/file-layouts/2010-2018/cc-est2018-alldata.pdf).

<span style="float:left;">
    
|YEAR Code|Dates|
|:---|:---|
|1|4/1/2010 Census population|
|2|4/1/2010 population estimates base|
|3|7/1/2010 population estimate|
|4|7/1/2011 population estimate|
|5|7/1/2012 population estimate|
|6|7/1/2013 population estimate|
|7|7/1/2014 population estimate|
|8|7/1/2015 population estimate|
|9|7/1/2016 population estimate|
|10|7/1/2017 population estimate|
|11|7/1/2018 population estimate|
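
For readable output, the codes in the table above can be mapped to their reference dates. This is a minimal sketch; the names `YEAR_LABELS` and `year_label` are illustrative and not part of the original notebook.

```python
# Mapping from YEAR code to its reference date, copied from the table above.
YEAR_LABELS = {
    1: "4/1/2010 Census population",
    2: "4/1/2010 population estimates base",
    3: "7/1/2010 population estimate",
    4: "7/1/2011 population estimate",
    5: "7/1/2012 population estimate",
    6: "7/1/2013 population estimate",
    7: "7/1/2014 population estimate",
    8: "7/1/2015 population estimate",
    9: "7/1/2016 population estimate",
    10: "7/1/2017 population estimate",
    11: "7/1/2018 population estimate",
}


def year_label(code: int) -> str:
    """Return the description for a YEAR code; raises KeyError if unknown."""
    return YEAR_LABELS[code]


print(year_label(11))
```

With a pandas DataFrame, `df["YEAR"].map(YEAR_LABELS)` would apply the same lookup to a whole column.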

In [52]:
import pandas as pd
import numpy as np

### Read in data

In [53]:
df = pd.read_csv("../data/cc-est2018-alldata-36.csv")

In [54]:
df.head()

Unnamed: 0,SUMLEV,STATE,COUNTY,STNAME,CTYNAME,YEAR,AGEGRP,TOT_POP,TOT_MALE,TOT_FEMALE,...,HWAC_MALE,HWAC_FEMALE,HBAC_MALE,HBAC_FEMALE,HIAC_MALE,HIAC_FEMALE,HAAC_MALE,HAAC_FEMALE,HNAC_MALE,HNAC_FEMALE
0,50,36,1,New York,Albany County,1,0,304204,147076,157128,...,5236,5416,2125,2295,361,410,139,141,41,41
1,50,36,1,New York,Albany County,1,1,15286,7821,7465,...,488,514,288,277,44,40,18,12,5,3
2,50,36,1,New York,Albany County,1,2,16131,8224,7907,...,452,443,275,258,31,36,15,11,5,3
3,50,36,1,New York,Albany County,1,3,17639,9065,8574,...,448,435,230,204,33,28,13,12,2,1
4,50,36,1,New York,Albany County,1,4,23752,11925,11827,...,632,627,271,283,50,46,16,15,4,6


## Filter for Asian American Demographic

<span style="float:left;">

|Code|Demographic|
|:---|:---|
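|AAC_MALE|Asian alone or in combination male population|
|AAC_FEMALE|Asian alone or in combination female population|
|NHAAC_MALE|Not Hispanic, Asian alone or in combination male population|
|NHAAC_FEMALE|Not Hispanic, Asian alone or in combination female population|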

In [55]:
# Select the columns of interest: summary level, state name, county code, county name, year, age group,
# total population, and the male and female counts for AAC and not-Hispanic AAC (NHAAC)
df = df[["SUMLEV", "STNAME", "COUNTY", "CTYNAME", "YEAR", "AGEGRP", "TOT_POP", "AAC_MALE", "AAC_FEMALE", "NHAAC_MALE", "NHAAC_FEMALE"]]

In [56]:
df.shape

(12958, 11)
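
The row count can be checked by hand. The sketch below assumes the file covers all 62 counties of New York State (a county count not stated in this notebook) and uses the 11 YEAR codes and the 19 AGEGRP codes (0 through 18) from the data dictionary, with one row per county, year, and age group.

```python
# Assumption: the New York State file contains 62 counties.
N_COUNTIES = 62
N_YEAR_CODES = 11   # YEAR codes 1 through 11
N_AGE_GROUPS = 19   # AGEGRP codes 0 through 18

expected_rows = N_COUNTIES * N_YEAR_CODES * N_AGE_GROUPS
print(expected_rows)  # 12958
```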

## Filter for Ages 65+

<span style="float:left;">
    
|AGEGRP Code|Age|
|:---|:---|
|0|Total|
|1|Age 0 to 4 years|
|2|Age 5 to 9 years|
|3|Age 10 to 14 years|
|4|Age 15 to 19 years|
|5|Age 20 to 24 years|
|6|Age 25 to 29 years|
|7|Age 30 to 34 years|
|8|Age 35 to 39 years|
|9|Age 40 to 44 years|
|10|Age 45 to 49 years|
|11|Age 50 to 54 years|
|12|Age 55 to 59 years|
|13|Age 60 to 64 years|
|14|Age 65 to 69 years|
|15|Age 70 to 74 years|
|16|Age 75 to 79 years|
|17|Age 80 to 84 years|
|18|Age 85 years or older|

In [57]:
# Ages 65+ correspond to AGEGRP codes 14-18
# .copy() makes df_seniors an independent DataFrame, so adding a column later does not trigger SettingWithCopyWarning
df_seniors = df.loc[df["AGEGRP"] >= 14].copy()
df_seniors.shape

(3410, 11)

In [58]:
# Combine the male and female columns into one AAC column.
# NHAAC_* is a subset of AAC_*, so this sum counts non-Hispanic residents twice.
df_seniors["AAC"] = df_seniors["AAC_MALE"] + df_seniors["AAC_FEMALE"] + df_seniors["NHAAC_MALE"] + df_seniors["NHAAC_FEMALE"]


In [59]:
df_seniors.head()

Unnamed: 0,SUMLEV,STNAME,COUNTY,CTYNAME,YEAR,AGEGRP,TOT_POP,AAC_MALE,AAC_FEMALE,NHAAC_MALE,NHAAC_FEMALE,AAC
14,50,New York,1,Albany County,1,14,11968,159,187,157,185,688
15,50,New York,1,Albany County,1,15,8676,124,137,123,134,518
16,50,New York,1,Albany County,1,16,7712,63,77,63,77,280
17,50,New York,1,Albany County,1,17,6858,36,41,35,40,152
18,50,New York,1,Albany County,1,18,7100,17,35,17,35,104
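
The `AAC` values above are the sum of all four columns. In the Census data dictionary, `NHAAC` is the not-Hispanic portion of the Asian alone or in combination population, so those residents are already included in `AAC_MALE` and `AAC_FEMALE`. The sketch below uses the first two rows shown above to compare the notebook's sum with a male-plus-female total; `AAC_TOTAL` is a hypothetical column name introduced only for this comparison.

```python
import pandas as pd

# First two rows of the df_seniors.head() output above.
sample = pd.DataFrame({
    "AGEGRP": [14, 15],
    "AAC_MALE": [159, 124],
    "AAC_FEMALE": [187, 137],
    "NHAAC_MALE": [157, 123],
    "NHAAC_FEMALE": [185, 134],
})

# Sum used in the notebook: non-Hispanic residents are counted twice.
sample["AAC"] = sample[["AAC_MALE", "AAC_FEMALE", "NHAAC_MALE", "NHAAC_FEMALE"]].sum(axis=1)

# Hypothetical alternative: male + female only, each resident counted once.
sample["AAC_TOTAL"] = sample["AAC_MALE"] + sample["AAC_FEMALE"]

print(sample[["AGEGRP", "AAC", "AAC_TOTAL"]])
```

Because the `NHAAC` counts are close to the `AAC` counts in these rows, growth rates computed from either total should be similar, while the absolute values differ by roughly a factor of two.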


In [60]:
# Groupby Year Estimate
df_seniors.groupby('YEAR').sum(numeric_only=True)

YEAR,SUMLEV,COUNTY,AGEGRP,TOT_POP,AAC_MALE,AAC_FEMALE,NHAAC_MALE,NHAAC_FEMALE,AAC
1,15500,19220,4960,2617943,62624,75857,61505,74236,274222
2,15500,19220,4960,2617941,62624,75857,61505,74236,274222
3,15500,19220,4960,2628921,63592,77017,62420,75341,278370
4,15500,19220,4960,2672307,67526,81926,66254,80119,295825
5,15500,19220,4960,2767948,72395,87854,71047,85935,317231
6,15500,19220,4960,2845320,77860,94041,76395,92008,340304
7,15500,19220,4960,2923921,83578,100703,81992,98503,364776
8,15500,19220,4960,2998402,89668,107886,87970,105529,391053
9,15500,19220,4960,3073321,94998,114239,93164,111722,414123
10,15500,19220,4960,3143903,100464,120284,98559,117669,436976
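
Summing every numeric column also adds up identifier columns such as `SUMLEV`, `COUNTY`, and `AGEGRP`, whose totals carry no meaning. The sketch below, on toy data made up for illustration, selects only the count columns before summing.

```python
import pandas as pd

# Toy data, made up for illustration; not taken from the Census file.
toy = pd.DataFrame({
    "CTYNAME": ["A County", "A County", "B County", "B County"],
    "COUNTY": [1, 1, 3, 3],
    "YEAR": [1, 2, 1, 2],
    "TOT_POP": [100, 110, 200, 220],
    "AAC": [10, 12, 20, 25],
})

# Select the count columns explicitly so identifier columns are not summed.
yearly = toy.groupby("YEAR")[["TOT_POP", "AAC"]].sum()
print(yearly)
```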


## Filter for New York City

New York City comprises five counties: New York (Manhattan), Kings (Brooklyn), Bronx, Richmond (Staten Island), and Queens.

In [61]:
nyc = ["Bronx County", "Queens County", "New York County", "Richmond County", "Kings County"]

In [62]:
df_nyc = df_seniors[df_seniors["CTYNAME"].isin(nyc)]
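
County names are only unique within a state, so the name filter relies on the file containing New York State alone. The sketch below shows an equivalent filter on the `COUNTY` code, assuming the standard county FIPS codes for the five boroughs (5, 47, 61, 81, 85), which are not listed in this notebook. The rows are toy data with made-up populations.

```python
import pandas as pd

# Assumption: county FIPS codes for the five boroughs within New York State.
NYC_FIPS = {
    5: "Bronx County",
    47: "Kings County",
    61: "New York County",
    81: "Queens County",
    85: "Richmond County",
}

# Toy rows for illustration only; TOT_POP values are made up.
toy = pd.DataFrame({
    "COUNTY": [1, 5, 47, 61, 81, 85],
    "CTYNAME": ["Albany County", "Bronx County", "Kings County",
                "New York County", "Queens County", "Richmond County"],
    "TOT_POP": [10, 20, 30, 40, 50, 60],
})

by_code = toy[toy["COUNTY"].isin(list(NYC_FIPS))]
by_name = toy[toy["CTYNAME"].isin(list(NYC_FIPS.values()))]

# Both filters keep the same five rows.
print(by_code.equals(by_name))
```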

In [63]:
# Groupby Year Estimate
df_nyc.groupby('YEAR').sum(numeric_only=True)

YEAR,SUMLEV,COUNTY,AGEGRP,TOT_POP,AAC_MALE,AAC_FEMALE,NHAAC_MALE,NHAAC_FEMALE,AAC
1,1250,1395,400,993158,47426,58234,46512,56891,209063
2,1250,1395,400,993124,47426,58228,46512,56885,209051
3,1250,1395,400,998770,48061,58956,47074,57555,211646
4,1250,1395,400,1021168,50776,62474,49713,60955,223918
5,1250,1395,400,1059154,54248,66739,53129,65133,239249
6,1250,1395,400,1092676,58254,71284,57040,69586,256164
7,1250,1395,400,1126551,62512,76280,61196,74444,274432
8,1250,1395,400,1161518,67107,81669,65725,79712,294213
9,1250,1395,400,1192819,71081,86322,69597,84236,311236
10,1250,1395,400,1220395,75229,90916,73692,88762,328599


In [65]:
# Relative change in AAC for NYC residents aged 65+, from the YEAR 1 total (209063) to 346248
(346248-209063)/209063

0.6561897609811397
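
The calculation above is a relative change, (new - old) / old. A small helper makes the intent explicit; `relative_change` is a hypothetical name, and the inputs are the two totals used in the cell above.

```python
def relative_change(old: float, new: float) -> float:
    """Return (new - old) / old, the fractional change from old to new."""
    if old == 0:
        raise ValueError("old must be non-zero")
    return (new - old) / old


growth = relative_change(209063, 346248)
print(f"{growth:.1%}")  # 65.6%
```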