# Virus Tracking

The CDC tracks virus cases by state: each state reports its current count to the CDC.
I want to see the number of cases in Texas as they grow over time, broken down by city if possible.

I have found only one website that offers this kind of data, and it shows only the "current" cases:
https://dshs.texas.gov/news/updates.shtm

DSHS updates its data every day at noon, so if I *append* to my own data set every day at 1 pm (rather than replacing the current data), I should be able to see the change over time.

In [70]:
import requests
from bs4 import BeautifulSoup

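# verify=False skips certificate checks because the state's SSL certificate is expired.
response = requests.get("https://dshs.texas.gov/news/updates.shtm", verify=False)
response.raise_for_status()  # stop here if the page did not load
source = BeautifulSoup(response.text, "html.parser")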



This part loads the current day's county data from the DSHS website.

In [81]:
import pandas as pd
import datetime as dt

tables = source.find_all("table")
county_table = [t for t in tables if t.has_attr("summary") and t.attrs["summary"] == "COVID-19 Cases in Texas Counties"][0]

row_groups = [tr.find_all("td") for tr in county_table.find_all("tr")][1:]

today = dt.datetime.today()
today_text = f"{today.month}/{today.day}/{today.year}"

num_cases = [{
    "county": td[0].text.strip(),
    "date": today_text,
    "num_cases": int(td[1].text.strip().replace(",", ""))  # store counts as integers, not text
} for td in row_groups]

df_num_cases = pd.DataFrame(num_cases)
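
To make the row extraction above easier to follow, here is a minimal sketch that runs the same steps against a small inline HTML table instead of the live page. The sample markup and the `parse_county_rows` helper are assumptions made for illustration; only the `summary` attribute value is taken from the cell above.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the real page layout may differ.
SAMPLE_HTML = """
<table summary="COVID-19 Cases in Texas Counties">
  <tr><th>County</th><th>Number of Cases</th></tr>
  <tr><td>Bell</td><td>2</td></tr>
  <tr><td>Bexar</td><td> 12 </td></tr>
</table>
"""


def parse_county_rows(html, date_text):
    """Return one dict per county row in the summary-tagged table."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", attrs={"summary": "COVID-19 Cases in Texas Counties"})
    rows = []
    for tr in table.find_all("tr"):
        cells = tr.find_all("td")
        if len(cells) < 2:
            continue  # header rows use <th>, so they have no <td> cells
        rows.append({
            "county": cells[0].get_text(strip=True),
            "date": date_text,
            "num_cases": int(cells[1].get_text(strip=True).replace(",", "")),
        })
    return rows


rows = parse_county_rows(SAMPLE_HTML, "3/20/2020")
print(rows)
```

Skipping rows with fewer than two `<td>` cells is one way to drop the header without relying on its position in the table.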

Next, we add the latitude and longitude of each county to the DataFrame.

In [82]:
import geopy
from geopy.extra.rate_limiter import RateLimiter

locator = geopy.Nominatim(user_agent="myGeocoder")
# Nominatim's usage policy allows at most one request per second.
geocode = RateLimiter(locator.geocode, min_delay_seconds=1)

df_num_cases["point"] = (df_num_cases["county"] + ", TX").apply(geocode)

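# Rows that fail to geocode (e.g. "Pending County Assignment") fall back to a point in the
# Gulf of Mexico: list(locator.geocode("Gulf of Mexico"))[1]
df_num_cases[['latitude', 'longitude']] = pd.DataFrame([
    (p.latitude, p.longitude) if p is not None else (28.082612, -94.936773)
    for p in df_num_cases["point"].tolist()])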
df_num_cases = df_num_cases.drop(["point"], axis=1)
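
The fallback used above can be isolated into a small helper, which makes it easier to check on its own. This is only a sketch: `Location` is a stand-in for whatever object the geocoder returns, and `coords_or_fallback` is a hypothetical name. The fallback coordinates are the ones from the cell above.

```python
from collections import namedtuple

# Stand-in for the object a geocoder returns; hypothetical, for illustration only.
Location = namedtuple("Location", ["latitude", "longitude"])

# Fallback point in the Gulf of Mexico, copied from the cell above.
FALLBACK_COORDS = (28.082612, -94.936773)


def coords_or_fallback(location, fallback=FALLBACK_COORDS):
    """Return (latitude, longitude), or the fallback when geocoding found nothing."""
    if location is None:
        return fallback
    return (location.latitude, location.longitude)


points = [Location(31.008166, -97.431441), None]
coords = [coords_or_fallback(p) for p in points]
print(coords)
```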


In [83]:
# Need to add today's data to yesterday's data.  Also need to make sure I only do this *once* per day...

#df_num_cases = pd.concat([pd.read_csv("data.csv"), df_num_cases])
display(df_num_cases)

Unnamed: 0.1,Unnamed: 0,county,date,num_cases,latitude,longitude
0,0.0,Bell,3/20/2020,2,31.008166,-97.431441
1,1.0,Bexar,3/20/2020,12,29.426399,-98.510478
2,2.0,Bowie,3/20/2020,1,33.419889,-94.447963
3,3.0,Brazoria,3/20/2020,3,29.181610,-95.499337
4,4.0,Brazos,3/20/2020,2,30.652157,-96.381114
...,...,...,...,...,...,...
36,,Travis,3/21/2020,22,30.287857,-97.756139
37,,Webb,3/21/2020,2,27.698362,-99.252358
38,,Wichita,3/21/2020,1,33.951653,-98.708889
39,,Williamson,3/21/2020,8,30.658093,-97.604165
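
The append step in the cell above is commented out because running it twice in one day would duplicate rows. Here is a minimal sketch of one way to make it safe to rerun, assuming `county` plus `date` identifies a row uniquely. `append_daily` is a hypothetical helper, and the second day's count is made up.

```python
import pandas as pd


def append_daily(history, today_rows):
    """Append today's rows, keeping one row per (county, date) pair."""
    combined = pd.concat([history, today_rows], ignore_index=True)
    # keep="last" lets a rerun on the same day overwrite instead of duplicate.
    deduped = combined.drop_duplicates(subset=["county", "date"], keep="last")
    return deduped.reset_index(drop=True)


history = pd.DataFrame({"county": ["Bell"], "date": ["3/20/2020"], "num_cases": [2]})
today_rows = pd.DataFrame({"county": ["Bell"], "date": ["3/21/2020"], "num_cases": [3]})  # made-up count

once = append_daily(history, today_rows)
twice = append_daily(once, today_rows)  # simulate running the cell a second time
print(len(once), len(twice))
```

In the notebook, `history` would come from `pd.read_csv("data.csv")` and `today_rows` would be `df_num_cases`.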


In [93]:
from IPython.display import HTML, Markdown
# Only add up *today's* cases.  It's still 50% more than yesterday.
total_cases = sum([int(n) for n in df_num_cases.loc[df_num_cases["date"] == today_text]["num_cases"].to_list()])
display(HTML(f"<h3><b>{total_cases}</b> cases found in Texas as of {today.month}/{today.day}/{today.year}.</h3>"))
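
The day-over-day comparison mentioned in the comment above can also be computed directly from the accumulated data. The following is a sketch with made-up counts (not the DSHS figures), assuming the same `date` and `num_cases` columns.

```python
import pandas as pd

# Made-up counts for illustration; not the DSHS figures.
df = pd.DataFrame({
    "county": ["A", "B", "A", "B"],
    "date": ["3/20/2020", "3/20/2020", "3/21/2020", "3/21/2020"],
    "num_cases": ["2", "2", "3", 3],  # scraped values may be strings
})

df["num_cases"] = pd.to_numeric(df["num_cases"])
# sort=False keeps the dates in the order they were appended; sorting these
# strings alphabetically would not be chronological.
totals = df.groupby("date", sort=False)["num_cases"].sum()
growth_pct = (totals["3/21/2020"] / totals["3/20/2020"] - 1) * 100
print(totals.to_dict(), growth_pct)
```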

In [96]:
Markdown(f"This map shows the locations where the virus has been found, with the size of each circle corresponding to the number of cases as of *{today_text}*.")

This map shows the locations where the virus has been found, with the size of each circle corresponding to the number of cases as of *3/21/2020*.

In [98]:
import folium

county_map = folium.Map(
    location=[31.9686, -99.9018],
    tiles='cartodbpositron',
    zoom_start=6,
)

# Draw one circle per county for today's rows only.
todays_rows = df_num_cases.loc[df_num_cases["date"] == today_text]
for _, row in todays_rows.iterrows():
    folium.CircleMarker(
        location=[row["latitude"], row["longitude"]],
        radius=int(row["num_cases"]),  # scraped counts may still be strings
    ).add_to(county_map)

county_map

* Note: That circle in the Gulf of Mexico represents the cases that are "Pending County Assignment".
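
The maps here pass the raw count as the radius in pixels, so a county with four times the cases gets a circle with sixteen times the area. One optional refinement, not used in this notebook, is to scale the radius by the square root of the count so that area tracks the count. `scaled_radius` and its default parameters are assumptions chosen for illustration.

```python
import math


def scaled_radius(num_cases, scale=3.0, min_radius=2.0):
    """Radius in pixels whose circle area grows linearly with the case count."""
    return max(min_radius, scale * math.sqrt(max(num_cases, 0)))


for n in (1, 4, 100):
    print(n, scaled_radius(n))
```

The minimum radius keeps counties with a single case visible at the default zoom level.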

The next map shows how much growth each area has had in the last 24 hours.  Circle size corresponds to the number of new cases.

In [122]:
yesterday = today - dt.timedelta(days=1)
yesterday_text = f"{yesterday.month}/{yesterday.day}/{yesterday.year}"

cases_today = df_num_cases.loc[df_num_cases["date"] == today_text]
cases_yesterday = df_num_cases.loc[df_num_cases["date"] == yesterday_text]

# Look up yesterday's count per county by name rather than by column position,
# so stray index columns from the CSV round trip cannot shift the fields.
yesterday_counts = {row["county"]: int(row["num_cases"]) for _, row in cases_yesterday.iterrows()}

cases_delta = []
for _, row in cases_today.iterrows():
    # A county with no row yesterday counts all of today's cases as new.
    delta_cases = int(row["num_cases"]) - yesterday_counts.get(row["county"], 0)
    cases_delta.append([row["county"], delta_cases, row["latitude"], row["longitude"]])

df_cases_delta = pd.DataFrame(cases_delta, columns=["county", "num_cases", "latitude", "longitude"])

county_map = folium.Map(
    location=[31.9686, -99.9018],
    tiles='cartodbpositron',
    zoom_start=6,
)

# Draw one circle per county, sized by the number of new cases.
for _, row in df_cases_delta.iterrows():
    folium.CircleMarker(
        location=[row["latitude"], row["longitude"]],
        radius=int(row["num_cases"]),
    ).add_to(county_map)

county_map
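
The same per-county difference can be expressed as a pandas merge instead of a loop. This is a sketch with made-up rows, assuming the column names used in this notebook. County "C" stands in for a county that had no row on the earlier day.

```python
import pandas as pd

# Made-up rows for illustration; county "C" has no row for the earlier day.
cases_yesterday = pd.DataFrame({"county": ["A", "B"], "num_cases": [2, 5]})
cases_today = pd.DataFrame({
    "county": ["A", "B", "C"],
    "num_cases": [3, 5, 1],
    "latitude": [31.0, 29.4, 33.4],
    "longitude": [-97.4, -98.5, -94.4],
})

merged = cases_today.merge(
    cases_yesterday[["county", "num_cases"]],
    on="county",
    how="left",  # keep counties that are new today
    suffixes=("", "_yesterday"),
)
# A missing match means no cases were recorded yesterday, so treat it as 0.
merged["new_cases"] = merged["num_cases"] - merged["num_cases_yesterday"].fillna(0).astype(int)
df_delta = merged[["county", "new_cases", "latitude", "longitude"]]
print(df_delta)
```

A left merge keeps every county present today, which matches the loop's behavior of counting all cases in a newly listed county as new.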

In [136]:
# Save the data and we're done.

#df_num_cases.columns = ["unnamed", "county", "date", "num_cases", "latitude", "longitude"]
#df_num_cases = df_num_cases.drop(columns=["unnamed"], axis=1)
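# index=False keeps the row index out of the file, which avoids the "Unnamed: 0" columns on reload.
df_num_cases.to_csv("data.csv", index=False)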