The ERS dataset consists of several files that describe counties and states in different ways. We consolidate this information into a single table with one row per jurisdiction.

In [1]:
!bq --location=US mk --dataset ers_modeled

Dataset 'alert-result-266803:ers_modeled' successfully created.


In [1]:
%%bigquery
create table ers_modeled.Jurisdiction_Demography as
select e.FIPS_Code as fipscode, e.State as state, e.Area_name as jname, e.__High_School as sub_hs_count, e.High_School_Only as hs_count, 
e.Some_College as some_college_count, e.College as college_count, e.__High_School__ as sub_hs_pct, e.High_School_Only__ as hs_pct,
e.Some_College__ as some_college_pct, e.College__ as college_pct, pop.Rural_urban_Continuum_Code_2013 as ruc, pop.Urban_Influence_Code_2013 as uic,
pop.Economic_typology_2015 as econ_type, pop.POP_ESTIMATE_2016 as pop_16, pop.POP_ESTIMATE_2018 as pop_18,
pop.N_POP_CHG_2016 as pop_chg_16, pop.N_POP_CHG_2018 as pop_chg_18, pop.INTERNATIONAL_MIG_2016 as int_mig_16,
pop.INTERNATIONAL_MIG_2018 as int_mig_18, pop.DOMESTIC_MIG_2016 as dom_mig_16, pop.DOMESTIC_MIG_2018 as dom_mig_18,
pop.NET_MIG_2016 as net_mig_16, pop.NET_MIG_2018 as net_mig_18, pop.R_INTERNATIONAL_MIG_2016 as int_mig_rate_16,
pop.R_INTERNATIONAL_MIG_2018 as int_mig_rate_18, pop.R_DOMESTIC_MIG_2016 as dom_mig_rate_16,
pop.R_DOMESTIC_MIG_2018 as dom_mig_rate_18, pop.R_NET_MIG_2016 as net_mig_rate_16, pop.R_NET_MIG_2018 as net_mig_rate_18,
u._Civilian_labor_force_2016_ as civ_labor_force_16, u.Civilian_labor_force_2018 as civ_labor_force_18,
u._Employed_2016_ as employed_count_16, u.Employed_2018 as employed_count_18, u._Unemployed_2016_ as unemployed_count_16,
u.Unemployed_2018 as unemployed_count_18, u.Unemployment_rate_2016 as unemployment_rate_16,
u.Unemployment_rate_2018 as unemployment_rate_18, u.Median_Household_Income_2018 as mhi_18, u.Med_HH_Income_Percent_of_State_Total_2018 as relative_mhi_18,
pov.POVALL_2018 as pov_count_18, pov.PCTPOVALL_2018 as pov_pct_18, pov.POV017_2018 as pov_minor_count_18, pov.PCTPOV017_2018 as pov_minor_pct_18, pov.POV517_2018 as pov_youth_count_18,
pov.PCTPOV517_2018 as pov_youth_pct_18, pov.POV04_2018 as pov_child_count_18, pov.PCTPOV04_2018 as pov_child_pct_18
from ers_staging.Education e
full join ers_staging.Population pop
on e.FIPS_Code = pop.FIPS
full join ers_staging.Unemployment u
on pop.FIPS = u.FIPS
full join ers_staging.Poverty pov
on u.FIPS = pov.FIPStxt


We use the FIPS code as the primary key for this table. A FIPS code is a numeric identifier assigned to each state and county.
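
As a small illustration of how these codes are structured, the sketch below splits a county-level code into its state and county parts. It assumes the standard five-digit layout (two state digits followed by three county digits) and assumes that state-level and national rows use a county part of 000; the helper names `split_fips` and `is_state_level` are hypothetical and not part of this notebook.

```python
def split_fips(code):
    """Split a county-level FIPS code into (state, county) parts.

    Assumes the 5-digit layout: 2 state digits followed by 3 county digits.
    Accepts ints or strings and restores leading zeros lost in numeric storage.
    """
    text = str(code).strip().zfill(5)
    if len(text) != 5 or not text.isdigit():
        raise ValueError(f"not a 5-digit FIPS code: {code!r}")
    return text[:2], text[2:]


def is_state_level(code):
    # Assumption: state-level and national rows carry county part "000".
    return split_fips(code)[1] == "000"


print(split_fips(1001))        # Autauga County, Alabama
print(split_fips("48453"))     # Travis County, Texas
print(is_state_level(48000))   # state-level row for Texas
```

Storing the code as an integer drops the leading zero of states such as Alabama (01), which is why the helper pads before splitting.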

In [2]:
%%bigquery
select count(*)
from ers_modeled.Jurisdiction_Demography;

Unnamed: 0,f0_
0,3287


In [3]:
%%bigquery
select count(distinct fipscode)
from ers_modeled.Jurisdiction_Demography;

Unnamed: 0,f0_
0,3283


Unfortunately, the table has 3287 rows but only 3283 distinct fipscode values, so this key is not yet unique across rows. We check for duplicates.

In [4]:
%%bigquery
select fipscode, count(*)
from ers_modeled.Jurisdiction_Demography
GROUP BY fipscode
HAVING count(*) > 1

Unnamed: 0,fipscode,f0_
0,,4
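
The result above is consistent with how SQL treats nulls: COUNT(DISTINCT ...) skips null values, while GROUP BY collects all nulls into a single group. A minimal sketch of that behavior, using Python's built-in sqlite3 module in place of BigQuery and a made-up `demo` table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (fipscode INTEGER, jname TEXT)")
conn.executemany(
    "INSERT INTO demo VALUES (?, ?)",
    [(1001, "Autauga County"), (1003, "Baldwin County"), (None, None), (None, None)],
)

total = conn.execute("SELECT COUNT(*) FROM demo").fetchone()[0]
distinct = conn.execute("SELECT COUNT(DISTINCT fipscode) FROM demo").fetchone()[0]
dupes = conn.execute(
    "SELECT fipscode, COUNT(*) FROM demo GROUP BY fipscode HAVING COUNT(*) > 1"
).fetchall()

print(total, distinct)  # 4 2 (the two null keys are not counted as distinct values)
print(dupes)            # [(None, 2)] (the nulls form one group)
```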


We then check which entries have a null fipscode.

In [5]:
%%bigquery
select * from ers_modeled.Jurisdiction_Demography
where fipscode is null

Unnamed: 0,fipscode,state,jname,sub_hs_count,hs_count,some_college_count,college_count,sub_hs_pct,hs_pct,some_college_pct,...,mhi_18,relative_mhi_18,pov_count_18,pov_pct_18,pov_minor_count_18,pov_minor_pct_18,pov_youth_count_18,pov_youth_pct_18,pov_child_count_18,pov_child_pct_18
0,,,,,,,,,,,...,,,,,,,,,,
1,,,,,,,,,,,...,,,,,,,,,,
2,,,,,,,,,,,...,,,,,,,,,,
3,,,,,,,,,,,...,,,,,,,,,,


The staging tables evidently contained a few entirely empty rows, which carried over into the modeled table. Although the preview above is truncated, we verified in BigQuery that every column in these rows is null. We chose full joins to retain national and state metrics and to keep counties with incomplete statistics available. We therefore remove the entirely empty rows rather than alter the joins.
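
One plausible mechanism, sketched below on made-up data, is that a null key never equals another null key in a join condition, so an empty row on either side of a full join survives unmatched. The example uses Python's built-in sqlite3 module rather than BigQuery, and it writes the full join as a LEFT JOIN plus the unmatched right-hand rows, since FULL JOIN support in SQLite depends on the installed version. The table and column names are toy stand-ins, not the real staging schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Toy stand-ins for two staging tables; every value here is made up.
conn.executescript("""
    CREATE TABLE education (fips INTEGER, college_pct REAL);
    CREATE TABLE poverty   (fips INTEGER, pov_pct REAL);
    INSERT INTO education VALUES (1001, 10.0), (1003, 20.0), (NULL, NULL);
    INSERT INTO poverty   VALUES (1001, 30.0), (1005, 40.0), (NULL, NULL);
""")

# Full outer join emulated as LEFT JOIN plus the unmatched right-hand rows.
rows = conn.execute("""
    SELECT e.fips AS e_fips, p.fips AS p_fips, e.college_pct, p.pov_pct
    FROM education e LEFT JOIN poverty p ON e.fips = p.fips
    UNION ALL
    SELECT e.fips, p.fips, e.college_pct, p.pov_pct
    FROM poverty p LEFT JOIN education e ON e.fips = p.fips
    WHERE e.fips IS NULL
""").fetchall()

empty_rows = [r for r in rows if all(v is None for v in r)]
print(len(rows), len(empty_rows))  # 5 2
```

Each input table contributes its own empty row to the result, which would be consistent with several empty rows appearing after joining several staging tables.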

In [6]:
%%bigquery
delete from ers_modeled.Jurisdiction_Demography
where fipscode is null
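
Deleting on a null key is safe here because the null-key rows were confirmed to be entirely empty. A stricter variant, sketched below with Python's built-in sqlite3 module and a made-up three-column table, deletes only rows in which every column is null, so a row with a missing key but real data would be kept for inspection. The column list comes from SQLite's PRAGMA table_info, which is SQLite-specific; the table name `jd` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jd (fipscode INTEGER, state TEXT, pop_18 INTEGER)")
conn.executemany(
    "INSERT INTO jd VALUES (?, ?, ?)",
    [(1001, "AL", 100), (None, None, None), (None, None, None), (None, "AK", 50)],
)

# Build "col1 IS NULL AND col2 IS NULL ..." from the table's own column list.
columns = [info[1] for info in conn.execute("PRAGMA table_info(jd)")]
all_null = " AND ".join(f'"{name}" IS NULL' for name in columns)

null_key = conn.execute("SELECT COUNT(*) FROM jd WHERE fipscode IS NULL").fetchone()[0]
entirely_empty = conn.execute(f"SELECT COUNT(*) FROM jd WHERE {all_null}").fetchone()[0]
print(null_key, entirely_empty)  # 3 2

# Deleting on the stricter predicate keeps the row that still carries data.
conn.execute(f"DELETE FROM jd WHERE {all_null}")
remaining = conn.execute("SELECT COUNT(*) FROM jd").fetchone()[0]
print(remaining)  # 2
```

When the two counts agree, as they did for our table, both predicates remove the same rows.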

Now we can check whether our choice of primary key is valid.

In [7]:
%%bigquery
select count(*)
from ers_modeled.Jurisdiction_Demography;

Unnamed: 0,f0_
0,3283


In [8]:
%%bigquery
select count(distinct fipscode)
from ers_modeled.Jurisdiction_Demography;

Unnamed: 0,f0_
0,3283


Since these counts now agree, the FIPS code is verified as the primary key for this table.
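
The same check can be packaged as a reusable helper. The sketch below is one possible shape for it, written against Python's built-in sqlite3 module instead of BigQuery; the function name `check_primary_key` and the toy table are assumptions for illustration. It also counts null keys explicitly, since a null key would otherwise be invisible to the distinct count.

```python
import sqlite3


def check_primary_key(conn, table, key):
    """Report row count, distinct key count, null key count, and validity."""
    # Table and column names are interpolated directly, so pass trusted names only.
    rows, distinct, nulls = conn.execute(
        f"SELECT COUNT(*), COUNT(DISTINCT {key}), "
        f"SUM(CASE WHEN {key} IS NULL THEN 1 ELSE 0 END) FROM {table}"
    ).fetchone()
    nulls = nulls or 0  # SUM over an empty table yields NULL
    return {"rows": rows, "distinct": distinct, "nulls": nulls,
            "valid": nulls == 0 and rows == distinct}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jd (fipscode INTEGER)")
conn.executemany("INSERT INTO jd VALUES (?)", [(1001,), (1003,), (1005,)])
before = check_primary_key(conn, "jd", "fipscode")
print(before["valid"])  # True

conn.execute("INSERT INTO jd VALUES (1003)")  # introduce a genuine duplicate
after = check_primary_key(conn, "jd", "fipscode")
print(after["valid"])   # False
```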

Since this dataset coalesces into a single modeled table, we do not check for foreign key (FK) relationships within it. However, we do claim an FK relationship between the Jurisdiction table in dataset1 and this table, which is violated only by inconsistent labeling of the Alaska jurisdictions. This relationship is the basis for many interesting queries, a few of which we describe in CROSS-DATASETS.txt.
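
A claim like this can be tested with an anti-join that lists keys present in one table but missing from the other. The sketch below shows the idea with Python's built-in sqlite3 module. The table layouts, the column name `fips` for the dataset1 table, and the helper `orphan_keys` are all assumptions, and the two Alaska-range codes (state prefix 02) are chosen only to illustrate a relabeling mismatch; the actual mismatching rows are not shown in this notebook.

```python
import sqlite3


def orphan_keys(conn, child, child_key, parent, parent_key):
    """Return keys in child that have no matching key in parent."""
    # Names are interpolated directly; pass trusted identifiers only.
    sql = (
        f"SELECT c.{child_key} FROM {child} c "
        f"LEFT JOIN {parent} p ON c.{child_key} = p.{parent_key} "
        f"WHERE p.{parent_key} IS NULL ORDER BY c.{child_key}"
    )
    return [row[0] for row in conn.execute(sql)]


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE jurisdiction (fips INTEGER);             -- hypothetical dataset1 table
    CREATE TABLE jurisdiction_demography (fipscode INTEGER);
    INSERT INTO jurisdiction VALUES (1001), (1003), (2270);
    INSERT INTO jurisdiction_demography VALUES (1001), (1003), (2158);
""")

missing_parent = orphan_keys(conn, "jurisdiction_demography", "fipscode", "jurisdiction", "fips")
missing_child = orphan_keys(conn, "jurisdiction", "fips", "jurisdiction_demography", "fipscode")
print(missing_parent)  # [2158]
print(missing_child)   # [2270]
```

Running the check in both directions shows which side carries each unmatched code.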

### Transforms ###

We perform the following transforms on ers_modeled.Jurisdiction_Demography. The first adds a net_change_in_unemployment_16_18 column, which gives the net change in the unemployed count between 2016 and 2018. We also calculate the number of adults in poverty in 2018 by subtracting the minor poverty count from the total poverty count. This statistic may prove useful because minors are ineligible to vote. We also calculate the population change between 2016 and 2018. Each difference is computed as the 2016 value minus the 2018 value, so a negative number implies an increase from 2016 to 2018. We also create a new table in this dataset modeling population and year for each jurisdiction. We decided to perform these transforms on the entire dataset using SQL.
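
To make the arithmetic and its sign concrete, here is a toy version of the three derived columns, run with Python's built-in sqlite3 module on two made-up jurisdictions. The expressions mirror the ones in the query that follows (2016 value minus 2018 value, and total poverty count minus minor poverty count); the table `jd` and all numbers are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE jd (jname TEXT, pop_16 INTEGER, pop_18 INTEGER,
    unemployed_count_16 INTEGER, unemployed_count_18 INTEGER,
    pov_count_18 INTEGER, pov_minor_count_18 INTEGER)""")
conn.executemany("INSERT INTO jd VALUES (?, ?, ?, ?, ?, ?, ?)", [
    ("Growing", 1000, 1100, 50, 40, 200, 60),
    ("Shrinking", 2000, 1900, 30, 45, 300, 100),
])

results = conn.execute("""
    SELECT jname,
           pop_16 - pop_18 AS pop_16_18,
           unemployed_count_16 - unemployed_count_18 AS net_change_in_unemployment_16_18,
           pov_count_18 - pov_minor_count_18 AS pov_adult_count_18
    FROM jd ORDER BY jname""").fetchall()

for row in results:
    print(row)
# ('Growing', -100, 10, 140)
# ('Shrinking', 100, -15, 200)
```

With this operand order, the jurisdiction whose population grew gets a negative pop_16_18, and the one whose unemployed count rose gets a negative net_change_in_unemployment_16_18.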

Below, we perform the transform on the entire dataset.

In [21]:
%%bigquery 
CREATE TABLE ers_modeled.Jurisdiction_Demography_SQL_Final AS
SELECT state, jname, fipscode, sub_hs_count, hs_count, college_count, some_college_count, sub_hs_pct, hs_pct, some_college_pct, college_pct, ruc, uic, econ_type, (pop_16 - pop_18) as pop_16_18, pop_16, pop_18, pop_chg_16, pop_chg_18, int_mig_16, int_mig_18, dom_mig_16, dom_mig_18, net_mig_16, net_mig_18, int_mig_rate_16, int_mig_rate_18, dom_mig_rate_16, dom_mig_rate_18, net_mig_rate_16, net_mig_rate_18, civ_labor_force_16, civ_labor_force_18, employed_count_16, employed_count_18, unemployment_rate_16, unemployment_rate_18, unemployed_count_16, unemployed_count_18, (unemployed_count_16 - unemployed_count_18) as net_change_in_unemployment_16_18, pov_count_18, pov_minor_count_18, mhi_18, relative_mhi_18, (pov_count_18 - pov_minor_count_18) as pov_adult_count_18, pov_pct_18, pov_minor_pct_18, pov_youth_count_18, pov_youth_pct_18, pov_child_count_18, pov_child_pct_18 
FROM ers_modeled.Jurisdiction_Demography  
Order by state, jname


Executing query with job ID: c539f58f-95d7-47b8-b25b-c8aae4d1b98a
Query executing: 0.72s

Conflict: 409 GET https://www.googleapis.com/bigquery/v2/projects/alert-result-266803/queries/c539f58f-95d7-47b8-b25b-c8aae4d1b98a?timeoutMs=400&location=US&maxResults=0: Already Exists: Table alert-result-266803:ers_modeled.Jurisdiction_Demography_SQL_Final
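
The 409 Conflict above indicates that the table already existed when this cell was rerun. BigQuery offers a CREATE OR REPLACE TABLE statement that makes such a cell safe to repeat, at the cost of discarding the existing table's contents. The sketch below shows the same idea with Python's built-in sqlite3 module, which lacks that statement, by dropping the table if it exists before recreating it; the helper `rebuild_table` and the toy tables are hypothetical.

```python
import sqlite3


def rebuild_table(conn, name, select_sql):
    """Recreate a table from a SELECT, replacing any previous version."""
    # name and select_sql are interpolated directly; trusted input only.
    conn.execute(f"DROP TABLE IF EXISTS {name}")
    conn.execute(f"CREATE TABLE {name} AS {select_sql}")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (fipscode INTEGER, pop_18 INTEGER)")
conn.executemany("INSERT INTO src VALUES (?, ?)", [(1001, 100), (1003, 200)])

rerun_failed = False
try:
    conn.execute("CREATE TABLE final AS SELECT * FROM src")
    conn.execute("CREATE TABLE final AS SELECT * FROM src")  # second run fails
except sqlite3.OperationalError as exc:
    rerun_failed = True
    print("rerun failed:", exc)

rebuild_table(conn, "final", "SELECT * FROM src")
rebuild_table(conn, "final", "SELECT * FROM src")  # safe to repeat
final_rows = conn.execute("SELECT COUNT(*) FROM final").fetchone()[0]
print(final_rows)  # 2
```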

### Key Verification ###

In [19]:
%%bigquery
SELECT COUNT(*)
FROM ers_modeled.Jurisdiction_Demography_SQL_Final

Unnamed: 0,f0_
0,3283


In [20]:
%%bigquery
SELECT COUNT(distinct fipscode)
FROM ers_modeled.Jurisdiction_Demography_SQL_Final

Unnamed: 0,f0_
0,3283


The counts from both queries match, indicating that there are no key violations. Here, fipscode is our primary key.