### Fidap Demo  
  
In this demo notebook, we want to answer the following questions:  
1) Which counties have the lowest overall Covid-19 mortality rates?  
2) Which states have the lowest overall Covid-19 infection rates?

In [2]:
from fidap import fidap_client
from config import api_key
import pandas as pd
import altair as alt

# instantiate api connection
fidap = fidap_client(api_key=api_key)

We can start by pulling county-level data from the NYT dataset. The query below returns the 20 counties with the most confirmed cases as of 2021-07-07.

In [30]:
nyt_covid19 = fidap.sql("""
SELECT *
FROM bigquery-public-data.covid19_nyt.us_counties
WHERE date = CAST('2021-07-07' AS DATE)
ORDER BY confirmed_cases DESC
LIMIT 20;
""")

We can also pull county population figures from the American Community Survey (ACS); the table used here holds the 2018 5-year estimates.

In [35]:
acs_population = fidap.sql("""
SELECT total_pop, geo_id
FROM bigquery-public-data.census_bureau_acs.county_2018_5yr
LIMIT 20;
""")

**County-Level**

Joining the two tables on the county FIPS code gives us confirmed cases and deaths per 10,000 residents for each county in the United States.

In [5]:
# SQL query: county-level cases and deaths per 10,000 residents
infection_rate_query = fidap.sql("""
WITH covid_cases AS (
    SELECT *
    FROM bigquery-public-data.covid19_nyt.us_counties
    WHERE date = CAST('2021-07-07' AS DATE)
)

SELECT
    acs.total_pop,
    c.county,
    c.state_name,
    c.confirmed_cases,
    c.deaths,
    ROUND(10000 * c.confirmed_cases / acs.total_pop, 2) AS per_capita_county_infection_rate_10k,
    ROUND(10000 * c.deaths / acs.total_pop, 2) AS per_capita_county_death_rate_10k
FROM covid_cases AS c
INNER JOIN bigquery-public-data.census_bureau_acs.county_2018_5yr AS acs
    ON acs.geo_id = c.county_fips_code
ORDER BY per_capita_county_infection_rate_10k;
""")

In [16]:
infection_rate_query.head(n=10)

Unnamed: 0,total_pop,county,state_name,confirmed_cases,deaths,per_capita_county_infection_rate_10k,per_capita_county_death_rate_10k
0,71377,Kauai,Hawaii,361,2.0,50.58,0.28
1,2518,Haines Borough,Alaska,28,0.0,111.2,0.0
2,16473,San Juan,Washington,186,0.0,112.91,0.0
3,75,Kalawao,Hawaii,1,0.0,133.33,0.0
4,30856,Jefferson,Washington,448,4.0,145.19,1.3
5,197658,Hawaii,Hawaii,3253,55.0,164.58,2.78
6,74487,Clallam,Washington,1433,13.0,192.38,1.75
7,102,Loving,Texas,2,0.0,196.08,0.0
8,1061,Skagway Municipality,Alaska,21,0.0,197.93,0.0
9,2930,Sierra,California,58,0.0,197.95,0.0
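
Because the query uses an INNER JOIN, any case row whose FIPS code is missing or has no ACS match drops out of the result without warning. The NYT data documents a number of geographic exceptions, so it may be worth checking how many rows are lost. A minimal sketch of such a check in pandas, using a left merge with `indicator=True`; both tables and the "Unknown" row are hypothetical:

```python
import pandas as pd

# Hypothetical case rows; the second has no FIPS code.
cases = pd.DataFrame({
    "county": ["Kauai", "Unknown"],
    "county_fips_code": ["15007", None],
    "confirmed_cases": [361, 10],
})

# Hypothetical population table with a single county.
population = pd.DataFrame({
    "geo_id": ["15007"],
    "total_pop": [71377],
})

# A left merge keeps every case row; the _merge column records
# whether a population match was found.
checked = cases.merge(
    population,
    left_on="county_fips_code",
    right_on="geo_id",
    how="left",
    indicator=True,
)

# Rows that an inner join would have dropped.
unmatched = checked[checked["_merge"] == "left_only"]
print(unmatched[["county", "confirmed_cases"]])
```

If the unmatched rows carry a meaningful share of a state's cases, the state-level rates computed from the joined table will be understated.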


**State-Level**

In [6]:
# Aggregate county totals up to the state level
state_infection = infection_rate_query.groupby('state_name').agg(
    total_pop = ('total_pop', 'sum'),
    total_deaths = ('deaths', 'sum'),
    total_cases = ('confirmed_cases', 'sum')
).reset_index()

# Rates are per 10,000 residents, despite the "per_capita" column names
state_infection = state_infection.assign(
    total_deaths_per_capita = lambda x: 10000 * x.total_deaths/x.total_pop,
    total_infections_per_capita = lambda x: 10000 * x.total_cases/x.total_pop
).sort_values(by = 'total_infections_per_capita')
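
Note that the state rate here comes from pooled totals (sum the counts, then divide), not from averaging the county rates. The two can differ a lot when county populations are uneven, as this sketch with two hypothetical counties in a hypothetical state suggests:

```python
import pandas as pd

# Two hypothetical counties: one large, one very small.
counties = pd.DataFrame({
    "state_name": ["Example", "Example"],
    "total_pop": [1_000_000, 1_000],
    "confirmed_cases": [50_000, 10],
})
counties["rate_10k"] = 10000 * counties["confirmed_cases"] / counties["total_pop"]

by_state = counties.groupby("state_name").agg(
    total_pop=("total_pop", "sum"),
    total_cases=("confirmed_cases", "sum"),
    mean_county_rate=("rate_10k", "mean"),  # unweighted mean of county rates
)

# Pooled rate: total cases over total population, as in the cell above.
by_state["pooled_rate"] = 10000 * by_state["total_cases"] / by_state["total_pop"]

pooled_rate = by_state.loc["Example", "pooled_rate"]
mean_county_rate = by_state.loc["Example", "mean_county_rate"]
print(round(pooled_rate, 1), round(mean_county_rate, 1))  # 499.6 300.0
```

The pooled figure weights each county by its population, which is usually what a statewide rate is meant to describe.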

In [22]:
state_infection.head(n = 10)

Unnamed: 0,state_name,total_pop,total_deaths,total_cases,total_deaths_per_capita,total_infections_per_capita
11,Hawaii,1422029,513.0,36314,3.607521,255.367507
46,Vermont,624977,256.0,24382,4.096151,390.126357
39,Puerto Rico,3386941,0.0,168033,0.0,496.120245
37,Oregon,4081943,2796.0,208310,6.849679,510.320698
19,Maine,1332813,858.0,68989,6.437512,517.619501
48,Washington,7294336,5956.0,452960,8.165239,620.974959
8,District of Columbia,684498,1141.0,49315,16.66915,720.45499
29,New Hampshire,1343622,1370.0,99120,10.19632,737.70748
20,Maryland,6003435,9717.0,462398,16.185734,770.222381
47,Virginia,8413774,11402.0,679917,13.551588,808.099909


### Analysis  
  
With the state-level table in hand, we can go a little further and look at how deaths and infections per 10,000 residents are distributed across states.

In [15]:
deaths_hist = alt.Chart(
    state_infection,
    title = "Deaths per Capita Across the US").mark_bar(
    color = '#80b1d3').encode(
    x = alt.X("total_deaths_per_capita:Q", 
              bin = True,
              axis = alt.Axis(title = "Deaths per 10,000 residents")),
    y = alt.Y('count()',
              title = "No. of States")
)
deaths_hist_mean = alt.Chart(state_infection).mark_rule(color = '#fb8072').encode(
    x = alt.X('mean(total_deaths_per_capita):Q',
              title = ""),
    size = alt.value(3)
)
deaths_hist + deaths_hist_mean

In [14]:
infections_hist = alt.Chart(
    state_infection,
    title = "Infections per Capita Across the US").mark_bar(
    color = '#80b1d3').encode(
    x = alt.X("total_infections_per_capita:Q", 
              bin = True,
              axis = alt.Axis(title = "Infections per 10,000 residents")),
    y = alt.Y('count()',
             title = "No. of States")
)
infections_hist_mean = alt.Chart(state_infection).mark_rule(color = '#fb8072').encode(
    x = alt.X('mean(total_infections_per_capita):Q',
              title = ""),
    size = alt.value(3)
)
infections_hist + infections_hist_mean

We would expect states with more infections per capita to also have higher death rates. The scatter plot below lets us check.

In [17]:
alt.Chart(
    state_infection,
    title = "Infection and Death Rates Across the US").mark_circle(
    color = "#fdb462").encode(
    x = alt.X("total_infections_per_capita",
             title = "Infections per 10,000 residents"),
    y = alt.Y("total_deaths_per_capita",
              title = "Deaths per 10,000 residents"),
    tooltip = ['state_name',"total_infections_per_capita", "total_deaths_per_capita"]).interactive()
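
To put a number on the relationship in the scatter plot, one option is a Pearson correlation coefficient. The sketch below uses five rows copied (and rounded) from the state table earlier in this notebook, so the result is illustrative only; the same `Series.corr` call on the full `state_infection` frame would give the actual figure.

```python
import pandas as pd

# Five states copied (rounded) from the state-level output; not the full data.
sample = pd.DataFrame({
    "state_name": ["Hawaii", "Vermont", "Oregon", "Maine", "Washington"],
    "total_infections_per_capita": [255.37, 390.13, 510.32, 517.62, 620.97],
    "total_deaths_per_capita": [3.61, 4.10, 6.85, 6.44, 8.17],
})

# Series.corr defaults to the Pearson coefficient.
r = sample["total_infections_per_capita"].corr(sample["total_deaths_per_capita"])
print(f"Pearson r = {r:.2f}")
```

A value close to 1 would support the expectation that infection and death rates move together, though correlation across states says nothing about why.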

I find it striking that the epicenters of the US outbreak, such as Los Angeles and New York City, are in states that have not recorded particularly high per-capita infection and death rates. Instead, states like North Dakota, South Dakota, Arizona, and New Jersey show some of the highest death and infection rates.  
  
Recently, several NYT articles have discussed the divergence in vaccination rates across the US. Can we see its effect borne out in the case numbers?  