In [2]:
import pandas as pd
import numpy as np
import geopandas as gpd
import matplotlib.pyplot as plt
import seaborn as sns
import fidap
from config import api_key

# instantiate api connection
fidap = fidap.fidap_client(api_key=api_key)

### Social Determinants of Health  

This document aims to build a Minimum Viable Product (MVP) that documents the constituent components of Social Determinants of Health (SDOH). SDOH is broadly conceptualized as comprising measurements of:  
1. Crime Levels  
2. Educational Attainment  
3. Retail-Grocery Gap  
4. Environmental Factors  
5. Personal Infrastructure  
6. Climate Change  
7. Family  

The goal here is to approximate what the open data used to derive SDOH might look like.  
  
#### Minimum Viable Product (MVP)  
  
The proposed MVP should be able to take into account all of the factors listed above, with Climate Change and Environmental Factors amalgamated into one, leaving six in total. For the purpose of this MVP, we will primarily gather data at the spatial scale of zip codes, for Chicago, IL in 2018. [Chicago's zip codes](https://www.chicago.gov/content/dam/city/sites/covid/reports/2020-04-24/ChicagoCommunityAreaandZipcodeMap.pdf) also have the distinctive feature of starting with 606. In fact, all zip codes starting with 606 relate to Chicago.   
  
Much of the data can be obtained at the zip-code level, whereas data at the scale of census blocks is not always readily available. It is therefore preferable to aggregate at the scale of zip codes. Zip codes also have the benefit of being universally understood, as opposed to the rather esoteric FIPS codes of Census blocks or Census tracts. There are multiple problems with using zip codes, but we can address them in future iterations of this product. For the moment, the combined use of ZCTAs (ZIP Code Tabulation Areas) from the Census Bureau and zip codes is stable enough to provide an indicative MVP.   
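
The `606` prefix rule can also be applied after the data has been loaded. The sketch below is a minimal illustration and not part of the original pipeline: the helper name `chicago_zips` and the sample rows are hypothetical, and the populations are made up. It mirrors the SQL pattern `LIKE '606%'` and keeps zip codes as strings so that the prefix test behaves the same way as in the queries.

```python
import pandas as pd

def chicago_zips(df: pd.DataFrame, zip_col: str = "zip", prefix: str = "606") -> pd.DataFrame:
    """Keep rows whose zip code starts with `prefix` (mirrors SQL LIKE '606%')."""
    # Cast to string and left-pad to 5 characters in case zips were read as integers.
    zips = df[zip_col].astype(str).str.zfill(5)
    return df[zips.str.startswith(prefix)].copy()

# Hypothetical rows for illustration only; the populations are made up.
sample = pd.DataFrame({
    "zip": ["60601", "60614", "60201", "46060"],
    "total_pop": [100, 200, 300, 400],
})
chicago = chicago_zips(sample)
print(chicago)
```

Note that `46060` contains `606` but does not start with it, so a prefix test is used here and not a substring test.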
  
#### Family  
  
We define family broadly as basic demographic information such as the breakdown of the population by race, age group, gender, and household information. 
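
To make this breakdown concrete, the sketch below turns raw counts into shares of the total population. It assumes the ACS columns queried in this section have been loaded into a pandas DataFrame (how `fidap.sql` returns its results is not shown here, so that is an assumption). The helper name `add_shares` is hypothetical and the numbers are made up for illustration.

```python
import pandas as pd

# Hypothetical ACS-style rows; the numbers are made up for illustration.
demo = pd.DataFrame({
    "zip": ["60601", "60602"],
    "total_pop": [1000, 500],
    "male_pop": [480, 260],
    "female_pop": [520, 240],
    "male_under_5": [30, 10],
    "female_under_5": [20, 15],
})

def add_shares(df: pd.DataFrame) -> pd.DataFrame:
    """Add female and under-5 shares of the total population."""
    out = df.copy()
    # Treat zero-population areas as missing to avoid division by zero.
    denom = out["total_pop"].where(out["total_pop"] > 0)
    out["female_share"] = out["female_pop"] / denom
    out["under_5_share"] = (out["male_under_5"] + out["female_under_5"]) / denom
    return out

shares = add_shares(demo)
print(shares[["zip", "female_share", "under_5_share"]])
```

Shares make zip codes of very different sizes comparable, which raw counts do not.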

In [5]:
age_structure_query = fidap.sql("""
SELECT geo_id AS zip, total_pop, male_pop, female_pop, median_age,
  male_under_5, male_5_to_9, male_10_to_14, (male_15_to_17 + male_18_to_19) AS male_15_to_19, (male_20 + male_21 + male_22_to_24) AS male_20_to_24, male_25_to_29, male_30_to_34, male_35_to_39, male_40_to_44, male_45_to_49, male_50_to_54, male_55_to_59,
  (male_60_to_61 + male_62_to_64) AS male_60_to_64, (male_65_to_66 + male_67_to_69 + male_70_to_74 + male_75_to_79 + male_80_to_84 + male_85_and_over) AS male_65_and_over,
  female_under_5, female_5_to_9, female_10_to_14, (female_15_to_17 + female_18_to_19) AS female_15_to_19, (female_20 + female_21 + female_22_to_24) AS female_20_to_24, female_25_to_29, female_30_to_34, female_35_to_39, female_40_to_44, female_45_to_49, female_50_to_54, female_55_to_59,
  (female_60_to_61 + female_62_to_64) AS female_60_to_64, (female_65_to_66 + female_67_to_69 + female_70_to_74 + female_75_to_79 + female_80_to_84 + female_85_and_over) AS female_65_and_over
FROM bigquery-public-data.census_bureau_acs.zcta5_2018_5yr
WHERE geo_id LIKE '606%';
""")

In [6]:
family_structure_query = fidap.sql("""
SELECT geo_id AS zip, married_households, nonfamily_households, family_households 
FROM bigquery-public-data.census_bureau_acs.zcta5_2018_5yr
WHERE geo_id LIKE '606%';
""")

In [9]:
race_query = fidap.sql("""
SELECT geo_id AS zip, total_pop, black_pop, asian_pop, hispanic_pop, amerindian_pop, other_race_pop, white_pop
FROM bigquery-public-data.census_bureau_acs.zcta5_2018_5yr
WHERE geo_id LIKE '606%';
""")

#### Education  
  
Education can be defined in terms of educational attainment of the population.  
  
Counting the number of educational establishments within each zip code is another way to do this, but an establishment does not necessarily benefit the population in its surrounding area, who might not make use of it. Not everyone who lives around UChicago enjoys the benefit of a UChicago education. Proximity matters more at other levels of education, such as K-12, because most children attend schools near their place of residence. At those levels, the local availability of educational opportunities matters.    
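
As a rough illustration of attainment as a measure, the sketch below computes the share of each zip code's population holding a bachelor's or master's degree. It assumes the attainment columns queried in this section are available in a pandas DataFrame. The helper name `degree_share` is hypothetical and the numbers are made up. Dividing by `total_pop` gives only a rough share, since the total includes children who are too young to hold a degree.

```python
import pandas as pd

# Hypothetical attainment counts; the numbers are made up for illustration.
edu = pd.DataFrame({
    "zip": ["60601", "60602"],
    "total_pop": [1000, 400],
    "bachelors_degree": [300, 40],
    "masters_degree": [100, 10],
})

def degree_share(df: pd.DataFrame) -> pd.Series:
    """Share of total population with a bachelor's or master's degree."""
    # Treat zero-population areas as missing to avoid division by zero.
    denom = df["total_pop"].where(df["total_pop"] > 0)
    return (df["bachelors_degree"] + df["masters_degree"]) / denom

edu["bachelors_or_masters_share"] = degree_share(edu)
print(edu[["zip", "bachelors_or_masters_share"]])
```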

In [10]:
educational_attainment_query = fidap.sql("""
SELECT geo_id AS zip, total_pop, associates_degree,bachelors_degree,high_school_diploma,less_one_year_college,masters_degree,one_year_more_college
FROM bigquery-public-data.census_bureau_acs.zcta5_2018_5yr
WHERE geo_id LIKE '606%';
""")

#### Retail Grocery Gap  
  
What we want to measure here is the availability of fresh food, and in particular whether supermarkets are distributed equitably across zip codes. To this end, we would first like to obtain a list of supermarkets in Chicago, IL, and then group them by zip code. 
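
The grouping step could look something like the sketch below. It assumes the supermarket query in this section returns a long table with one row per point and tag, with columns `geometry`, `keys` and `tags`, and that geometries arrive as text; none of this is confirmed by the source. The helper name `supermarkets_per_zip` is hypothetical, and the store names and coordinates are made up.

```python
import pandas as pd

# Hypothetical long-format result: one row per (point, tag). All values are made up.
long_df = pd.DataFrame({
    "geometry": [
        "POINT(-87.63 41.88)", "POINT(-87.63 41.88)",
        "POINT(-87.65 41.92)", "POINT(-87.65 41.92)",
        "POINT(-87.62 41.89)", "POINT(-87.62 41.89)",
        "POINT(-87.60 41.79)",
    ],
    "keys": ["name", "addr:postcode", "name", "addr:postcode",
             "name", "addr:postcode", "name"],
    "tags": ["Store A", "60601", "Store B", "60614",
             "Store D", "60601", "Store C"],
})

def supermarkets_per_zip(df: pd.DataFrame) -> pd.Series:
    """Pivot tags to one row per point, then count points per postcode."""
    wide = df.pivot_table(index="geometry", columns="keys", values="tags", aggfunc="first")
    # Make sure both columns exist even if one tag never appears.
    wide = wide.reindex(columns=["name", "addr:postcode"])
    # Points without an addr:postcode tag cannot be assigned a zip this way.
    tagged = wide.dropna(subset=["addr:postcode"])
    return tagged.groupby("addr:postcode").size().rename("n_supermarkets")

counts = supermarkets_per_zip(long_df)
print(counts)
```

Supermarkets that lack an `addr:postcode` tag are dropped in this sketch. A spatial join of the points against ZCTA polygons would be one way to recover them.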

In [15]:
supermarket_query = fidap.sql("""
WITH bounding_area AS (SELECT geometry FROM bigquery-public-data.geo_openstreetmap.planet_features_multipolygons
  WHERE ('name:en', 'Chicago') IN (SELECT(key, value) FROM UNNEST(all_tags))
  AND ('boundary', 'administrative') IN (SELECT(key, value) FROM UNNEST(all_tags))
  AND ('admin_level', '8') IN  (SELECT(key, value) FROM UNNEST(all_tags))
)
SELECT pt.geometry, tags.value AS tags, tags.key AS keys
FROM bigquery-public-data.geo_openstreetmap.planet_features_points AS pt, bounding_area
JOIN UNNEST(all_tags) AS tags
WHERE (tags.key = 'name' OR tags.key = 'addr:postcode')
AND ('shop', 'supermarket') IN (SELECT(key, value) FROM UNNEST(all_tags))
AND ST_WITHIN(pt.geometry, bounding_area.geometry)
""")