# Virus Tracking

The CDC tracks virus cases by state: each state reports its current count to the CDC.
I want to see the number of cases in Texas as they grow over time, broken down by city if possible.

I have found only one website that offers anything like this kind of data, and it shows only the *current* cases:
https://dshs.texas.gov/news/updates.shtm

DSHS updates its data source every day at noon, so if I *append* to my own data set every day at 1 pm (rather than replacing the current snapshot), I should be able to see the change over time.
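
Here's a minimal sketch of that append-instead-of-replace idea. It is not the actual loader script; `append_snapshot`, the county names, and the counts are all made up for illustration, and I'm assuming each daily scrape yields one row per county.

```python
import datetime as dt
import pandas as pd

def append_snapshot(history, snapshot, date):
    """Return `history` with the day's `snapshot` appended and date-stamped."""
    date_text = f"{date.month}/{date.day}/{date.year}"
    stamped = snapshot.assign(date=date_text)
    # Drop rows already stored for this date so a re-run does not double count.
    kept = history.loc[history["date"] != date_text]
    return pd.concat([kept, stamped], ignore_index=True)

# Hypothetical counties and counts, for illustration only.
history = pd.DataFrame({"county": ["Alpha"], "date": ["3/29/2020"], "num_cases": [5]})
snapshot = pd.DataFrame({"county": ["Alpha", "Beta"], "num_cases": [8, 2]})

history = append_snapshot(history, snapshot, dt.date(2020, 3, 30))
print(history)
```

Replacing any rows that already exist for the same date keeps the history clean if the loader accidentally runs twice in one day.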

If you are curious, you can find the source for this report here:
* https://github.com/treytomes/covid-19-tracking/blob/master/Data%20Loader-arcgis.ipynb
* https://github.com/treytomes/covid-19-tracking/blob/master/Virus%20Tracking.ipynb
* https://github.com/treytomes/covid-19-tracking/blob/master/data.csv

As of around March 24th I've been getting my data from this site:
* https://txdshs.maps.arcgis.com/apps/opsdashboard/index.html#/ed483ecd702b4298ab01e8b9cafc8b83

It looks like the New York Times has a GitHub repository with more extensive county-level data. I plan on checking it against my own data before adopting it myself, but you can find that data here:
* https://github.com/nytimes/covid-19-data
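
One way that cross-check could look is sketched below. I'm assuming the NYT county file has `date`, `county`, `state`, and `cases` columns with ISO-formatted dates; `compare_counts` is a hypothetical helper, and the sample rows are invented rather than real figures.

```python
import pandas as pd

def compare_counts(mine, nyt, state="Texas"):
    """Return county/date rows where the two sources disagree or one is missing."""
    mine = mine.assign(
        date=pd.to_datetime(mine["date"], format="%m/%d/%Y"),
        num_cases=mine["num_cases"].astype(int),
    )
    nyt = nyt.loc[nyt["state"] == state].assign(date=lambda d: pd.to_datetime(d["date"]))
    merged = mine.merge(
        nyt[["county", "date", "cases"]],
        on=["county", "date"],
        how="outer",
        indicator=True,  # adds a _merge column: left_only / right_only / both
    )
    mismatch = (merged["_merge"] != "both") | (merged["num_cases"] != merged["cases"])
    return merged.loc[mismatch]

# Invented sample rows, for illustration only.
mine = pd.DataFrame({"county": ["Alpha", "Beta"],
                     "date": ["3/30/2020", "3/30/2020"],
                     "num_cases": [10, 20]})
nyt = pd.DataFrame({"date": ["2020-03-30", "2020-03-30"],
                    "county": ["Alpha", "Beta"],
                    "state": ["Texas", "Texas"],
                    "cases": [10, 25]})

result = compare_counts(mine, nyt)
print(result)
```

An outer merge keeps rows that appear in only one source, so missing counties show up alongside counties whose counts differ.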

In [56]:
# Step 0: Import all the things!

import requests
from bs4 import BeautifulSoup
import pandas as pd
import datetime as dt
import geopy
from geopy.extra.rate_limiter import RateLimiter
from IPython.display import HTML, Markdown, clear_output
from ipywidgets import interact, interactive, fixed, interact_manual
import ipywidgets as widgets
import urllib3

%run controls.ipynb

def datestamp(date):
    """Convert a datetime object into a date stamp: M/D/YYYY (no zero padding)"""
    return f"{date.month}/{date.day}/{date.year}"

today = dt.datetime.today()
today_text = datestamp(today)

yesterday = today - dt.timedelta(days=1)
yesterday_text = datestamp(yesterday)

# Now that data loading is done in a separate script, this should be all I need to load the data.
df_num_cases = pd.read_csv("data.csv")
df_num_cases = df_num_cases.drop(["Unnamed: 0"], axis=1)

# Parse the M/D/YYYY strings before taking min/max; comparing the raw strings is
# lexicographic and would rank "4/10/2020" before "4/3/2020".
dates = pd.to_datetime(df_num_cases["date"], format="%m/%d/%Y")
min_date = dates.min().to_pydatetime()
max_date = dates.max().to_pydatetime()

min_date_text = datestamp(min_date)
max_date_text = datestamp(max_date)

total_num_counties = 254

# It's useful to have today's and yesterday's cases in their own DataFrames.
cases_today = df_num_cases.loc[df_num_cases["date"] == today_text]
cases_yesterday = df_num_cases.loc[df_num_cases["date"] == yesterday_text]

In [57]:
num_counties_added = len(cases_today) - len(cases_yesterday)

yesterday_counties = [row[0] for row in cases_yesterday.values]
today_counties = [row[0] for row in cases_today.values]

if num_counties_added > 0:
    display(Markdown(f"# *{num_counties_added}* counties were added from {yesterday_text} to {today_text}."))
    new_counties = [county for county in today_counties if county not in yesterday_counties]
elif num_counties_added < 0:
    display(Markdown(f"# *{-num_counties_added}* counties were **removed** from {yesterday_text} to {today_text}."))
    new_counties = [county for county in yesterday_counties if county not in today_counties]
else:
    # Without this branch, new_counties is undefined when the county count is unchanged.
    display(Markdown(f"# No counties were added or removed from {yesterday_text} to {today_text}."))
    new_counties = []

new_counties = pd.DataFrame(new_counties, columns=["county"])
display(new_counties)

# *2* counties were added from 4/2/2020 to 4/3/2020.

Unnamed: 0,county
0,Lampasas
1,Titus


In [58]:
total_output = widgets.Output()

def fct(date=dt.datetime.today()):
    date_text = f"{date.month}/{date.day}/{date.year}"
    
    prevdate = date - dt.timedelta(days=1)
    prevdate_text = f"{prevdate.month}/{prevdate.day}/{prevdate.year}"
    
    date_total = df_num_cases.loc[df_num_cases["date"] == date_text]["num_cases"].astype('int32').sum()
    prevdate_total = df_num_cases.loc[df_num_cases["date"] == prevdate_text]["num_cases"].astype('int32').sum()
    
    delta_total = date_total - prevdate_total
    # The "+d" format supplies the sign itself; prepending "-" to a negative number printed "--5".
    delta_total_text = f"{int(delta_total):+d}"
    
    num_counties_date = len(df_num_cases.loc[df_num_cases["date"] == date_text])
    
    with total_output:
        clear_output()
        display(Markdown(f"*{date_total}* cases found in Texas as of {date_text} ({delta_total_text} from the previous day)."))
        display(Markdown(f"{int((num_counties_date / total_num_counties) * 100)}% of Texas counties have confirmed cases."))

w=DatePicker(start=min_date_text,end=max_date_text,freq='D',fmt='%m/%d/%Y')
w.observe=fct
w.display()

display(total_output)

fct(max_date)


In [59]:
deltas = []
prevdate = min_date
currdate = prevdate + dt.timedelta(days=1)
while currdate <= max_date:
    currdate_text = datestamp(currdate)
    prevdate_text = datestamp(prevdate)
    
    currdate_total = df_num_cases.loc[df_num_cases["date"] == currdate_text]["num_cases"].astype('int32').sum()
    prevdate_total = df_num_cases.loc[df_num_cases["date"] == prevdate_text]["num_cases"].astype('int32').sum()
    
    deltas.append({
        "date": currdate_text,
        "cases_added": currdate_total - prevdate_total
    })
    
    prevdate = currdate
    currdate = prevdate + dt.timedelta(days=1)

df_deltas = pd.DataFrame(deltas)
display(df_deltas)

import altair as alt
alt.Chart(df_deltas, title="Cases Added per Day").mark_bar().encode(
    x=alt.X('date', bin=False),
    y='cases_added'
)

Unnamed: 0,date,cases_added
0,3/21/2020,110
1,3/22/2020,30
2,3/23/2020,18
3,3/24/2020,58
4,3/25/2020,564
5,3/26/2020,422
6,3/27/2020,335
7,3/28/2020,321
8,3/29/2020,500
9,3/30/2020,325


In [60]:
# Everybody loves a good chart.  This one should become more useful over time.

import altair as alt

totals = pd.DataFrame({'num_cases' : df_num_cases.astype({'num_cases': 'int'}).groupby("date")["num_cases"].sum()}).reset_index()

alt.Chart(totals, title="The Curve to Flatten").mark_line().encode(
    x='date:O',
    y='num_cases:Q'
).properties(
  width=800
).interactive(bind_y=False)

So far the graph isn't exactly exponential, but it's not exactly linear either.  We're flattening the curve, but the exact slope of the flattened curve is still being negotiated.
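
One rough way to put a number on "exponential or not" is to fit a straight line to the base-2 log of the daily totals and read off a doubling time. This is only a sketch: `doubling_time` is a hypothetical helper, and the series below is synthetic (constructed to double every 3 days), not the Texas data.

```python
import numpy as np

def doubling_time(totals):
    """Estimate the doubling time in days from daily cumulative totals.

    Fits a line to log2(total) versus day index; the slope is doublings per day.
    """
    totals = np.asarray(totals, dtype=float)
    days = np.arange(len(totals))
    slope, _intercept = np.polyfit(days, np.log2(totals), 1)
    return 1.0 / slope

# Synthetic series that doubles every 3 days, for illustration only.
synthetic = [100 * 2 ** (d / 3) for d in range(10)]
print(round(doubling_time(synthetic), 2))  # should come out close to 3
```

Running it over successive windows of the real totals would show whether the doubling time is getting longer, which is what a flattening curve looks like in these terms.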

In [61]:
import folium
from folium import plugins

def fct(date):
    county_map = folium.Map(
        location=[31.9686, -99.9018],
        tiles='cartodbpositron',
        zoom_start=6
    )
    
    date_text = f"{date.month}/{date.day}/{date.year}"
    
    df_cases_date = df_num_cases.loc[df_num_cases["date"] == date_text]
   
    # This bit will put circles on the map.
    #df_num_cases.loc[df_num_cases["date"] == date_text].apply(lambda row: folium.CircleMarker(location=[row["latitude"], row["longitude"]], radius = row["num_cases"]).add_to(county_map), axis=1)
    
    plugins.HeatMap(data=df_cases_date[['latitude', 'longitude', 'num_cases']].groupby(['latitude', 'longitude']).sum().reset_index().values.tolist(), radius=24, max_zoom=16).add_to(county_map)
    
    df_cases_date.apply(lambda row: folium.Marker(location=[row["latitude"], row["longitude"]],
                                                 icon=folium.DivIcon(html=f"<div style='font-size:1.3em'><b>{row['num_cases']}</b></div>")).add_to(county_map), axis=1)

    total_cases = sum([int(n) for n in df_num_cases.loc[df_num_cases["date"] == date_text]["num_cases"].to_list()])
    with map_output:
        clear_output()
        display(county_map)

w = DatePicker(start=min_date_text,end=max_date_text,freq='D',fmt='%m/%d/%Y')
w.observe = fct
w.display()

display(Markdown(f"This map shows the locations where the virus has been found as of *{today_text}*."))

map_output = widgets.Output()
display(map_output)
fct(max_date)



This map shows the locations where the virus has been found as of *4/3/2020*.


## One of the counties is showing up in Iowa.  I'm not sure which one yet.  I'll try to weed it out once the number of cases for that region is distinctive enough to identify it.
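
A possible shortcut for finding the stray county is to flag any row whose coordinates fall outside a rough bounding box for Texas. The box values below are my own approximations, `outside_texas` is a hypothetical helper, and the "Misplaced" row is invented to stand in for the bad geocode.

```python
import pandas as pd

# Approximate bounding box for Texas (assumed values, rounded outward).
TX_LAT = (25.8, 36.6)
TX_LON = (-106.7, -93.5)

def outside_texas(df):
    """Return the distinct counties whose coordinates fall outside the box."""
    in_lat = df["latitude"].between(*TX_LAT)
    in_lon = df["longitude"].between(*TX_LON)
    cols = ["county", "latitude", "longitude"]
    return df.loc[~(in_lat & in_lon), cols].drop_duplicates()

sample = pd.DataFrame({
    "county": ["Harris", "Misplaced"],   # "Misplaced" is a made-up row
    "latitude": [29.811977, 42.0],
    "longitude": [-95.374125, -93.6],
})
print(outside_texas(sample))
```

A bounding box will not catch a county geocoded to the wrong spot inside Texas, but it should be enough to catch one that lands in another state.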

In [48]:
cases_delta = []
for row_today in cases_today.values:
    # Grab the num_cases from yesterday.
    row_yesterday = [n for n in cases_yesterday.values if n[0] == row_today[0]]
    if len(row_yesterday) == 0:
        delta_cases = int(row_today[2])
    else:
        row_yesterday = row_yesterday[0]
        # Cast both sides so the subtraction also works if num_cases is stored as text.
        delta_cases = int(row_today[2]) - int(row_yesterday[2])
    cases_delta.append([row_today[0], delta_cases, row_today[3], row_today[4]])

df_cases_delta = pd.DataFrame(cases_delta)
df_cases_delta.columns = ["county", "num_cases", "latitude", "longitude"]
df_cases_delta = df_cases_delta.loc[df_cases_delta["num_cases"] > 0]

county_map = folium.Map(
    location=[31.9686, -99.9018],
    tiles='cartodbpositron',
    zoom_start=6,
)

df_cases_delta.apply(lambda row: folium.CircleMarker(location=[row["latitude"], row["longitude"]], radius = row["num_cases"]).add_to(county_map), axis=1)

df_cases_delta.apply(lambda row: folium.Marker(location=[row["latitude"], row["longitude"]],
                                               icon=folium.DivIcon(html=f"<div style='font-size:1.3em'><b>{row['num_cases']}</b></div>")).add_to(county_map), axis=1)

display(Markdown("This map shows how much growth each area has had in the last 24 hours.  Circle size corresponds to the number of new cases."))
display(county_map)

This map shows how much growth each area has had in the last 24 hours.  Circle size corresponds to the number of new cases.

+216 in Houston alone on 3/29/20!

In [40]:
# TODO: It would be interesting to make a county map showing the average growth rate for each county in the last 7 days.
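
As a starting point for that TODO, here is one possible reading of "average growth rate": the average number of new cases per day for each county over the last 7 days. `weekly_growth` is a hypothetical helper, the sample data is invented, and I'm assuming the same `county`, `date`, and `num_cases` columns as `data.csv`.

```python
import pandas as pd

def weekly_growth(df, days=7):
    """Average new cases per day for each county over the last `days` days."""
    df = df.assign(
        date=pd.to_datetime(df["date"], format="%m/%d/%Y"),
        num_cases=df["num_cases"].astype(int),
    )
    end = df["date"].max()
    start = end - pd.Timedelta(days=days)
    window = df.loc[df["date"].between(start, end)].sort_values("date")
    grouped = window.groupby("county")
    # Counties first reported inside the window are measured from their first
    # report; clip avoids dividing by zero when there is only one report.
    span_days = (grouped["date"].max() - grouped["date"].min()).dt.days.clip(lower=1)
    gained = grouped["num_cases"].last() - grouped["num_cases"].first()
    return (gained / span_days).rename("avg_new_cases_per_day")

# Invented data: one county gaining 10 cases a day, for illustration only.
sample = pd.DataFrame({
    "county": ["Alpha"] * 8,
    "date": [f"3/{d}/2020" for d in range(23, 31)],
    "num_cases": [10 * i for i in range(1, 9)],
})
print(weekly_growth(sample))
```

The resulting Series is indexed by county, so it could be joined back onto the latitude and longitude columns to drive the map markers.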

In [18]:
pd.options.display.max_rows = len(cases_today)
display(Markdown(f"# Cases were reported in each county as of {max_date_text}."))
display(cases_today)

# Cases were reported in each county as of 3/30/2020.

Unnamed: 0,county,date,num_cases,latitude,longitude
723,Harris,3/30/2020,526,29.811977,-95.374125
724,Dallas,3/30/2020,488,32.762041,-96.779007
725,Travis,3/30/2020,200,30.287857,-97.756139
726,Denton,3/30/2020,165,33.183879,-97.141342
727,Bexar,3/30/2020,157,29.426399,-98.510478
728,Tarrant,3/30/2020,139,32.751366,-97.335696
729,Collin,3/30/2020,134,33.160963,-96.606098
730,Fort Bend,3/30/2020,119,29.511218,-95.780735
731,Galveston,3/30/2020,70,29.387225,-94.992736
732,Montgomery,3/30/2020,66,30.301949,-95.506594
