# Venue Types in High COVID-19 Rate Iowa Counties

## Introduction

### Is there a difference in venue types and venue volumes between Iowa counties with high COVID-19 rates and those with low COVID-19 rates?

I will explore Iowa COVID-19 data at the county level together with venue data from Foursquare to identify commonalities among counties with high and low positive COVID-19 rates. Iowa's governor has been restricting hours of operation for certain business types. This analysis may pinpoint additional or different business types to target in order to stem the spread of the infection.

## Data Needed

County-level COVID-19 data for the state of Iowa in the United States.
 * Obtained from the NY Times GitHub repository.
  * https://github.com/nytimes/covid-19-data
  * https://www.nytimes.com/interactive/2020/us/coronavirus-us-cases.html

County-level population data for the state of Iowa in the United States.
 * US Census Bureau data accessed via Iowa State University's Iowa Community Indicators Program

County-level venue data for the state of Iowa in the United States.
 * Foursquare

### County-level COVID-19 Data for Iowa, United States

I am sourcing my COVID-19 data from the NY Times GitHub repository. Since late January, the Times has been collecting data from state and local governments and health departments across the United States in an effort to provide an ongoing record of the outbreak. They have made this data publicly available.

[The New York Times COVID Outbreak](https://www.nytimes.com/interactive/2020/us/coronavirus-us-cases.html)

Each row of this data includes a FIPS code (a county-level identifier). This will help tie together population and venue data from other sources.
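Because the FIPS code is the join key across sources, it helps to store it in one consistent form. The sketch below is a minimal illustration, assuming the Times data loads `fips` as a float while other tables store it as an integer-like value; the helper name `normalize_fips` is hypothetical and not part of the original analysis.

```python
import pandas as pd

def normalize_fips(series: pd.Series) -> pd.Series:
    """Return county FIPS codes as zero-padded 5-character strings (hypothetical helper)."""
    # Coerce to numeric, then to pandas' nullable integer type so missing codes stay missing
    as_int = pd.to_numeric(series, errors="coerce").astype("Int64")
    # Zero-pad to 5 characters: 2-digit state code + 3-digit county code
    return as_int.astype("string").str.zfill(5)

# Toy values: a float-style code, an integer-style code, and a missing code
sample = pd.Series([19189.0, 19001, None])
print(normalize_fips(sample).tolist())
```

Keeping the code as a string also preserves the leading zero for states whose FIPS prefix starts with 0, which matters if the analysis is ever extended beyond Iowa.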

While the dataset contains information from across the US, I will be focused on the state of Iowa.

In [1]:
#Import libraries:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)


# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

import json # library to handle JSON files
import requests # library to handle requests
from pandas import json_normalize # transform JSON into a pandas DataFrame

!pip install folium
#!conda install folium
#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

!pip install geopy
#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

!pip install geopandas

print('Libraries imported.')






Libraries imported.


In [2]:
url = 'https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties.csv'
c19df = pd.read_csv(url, on_bad_lines='skip') # skip malformed rows (pandas >= 1.3; replaces the removed error_bad_lines=False)


c19df.head()


Unnamed: 0,date,county,state,fips,cases,deaths
0,2020-01-21,Snohomish,Washington,53061.0,1,0
1,2020-01-22,Snohomish,Washington,53061.0,1,0
2,2020-01-23,Snohomish,Washington,53061.0,1,0
3,2020-01-24,Cook,Illinois,17031.0,1,0
4,2020-01-24,Snohomish,Washington,53061.0,1,0


In [3]:
print(c19df.dtypes)

date       object
county     object
state      object
fips      float64
cases       int64
deaths      int64
dtype: object


In [4]:
print(c19df.describe())
print(c19df.describe(include=['object']))

                fips          cases         deaths
count  744591.000000  751749.000000  751749.000000
mean    31222.752582    1426.438910      45.459644
std     16284.064649    7756.791436     430.349411
min      1001.000000       0.000000       0.000000
25%     18177.000000      26.000000       0.000000
50%     29205.000000     149.000000       2.000000
75%     46095.000000     670.000000      15.000000
max     78030.000000  357541.000000   24188.000000
              date      county   state
count       751749      751749  751749
unique         305        1929      55
top     2020-10-18  Washington   Texas
freq          3248        7506   56588


In [5]:
iac19df = pd.DataFrame(c19df[c19df['state'] =='Iowa'])
iac19df.reset_index(drop=True,inplace=True)

In [6]:
iac19df.tail()

Unnamed: 0,date,county,state,fips,cases,deaths
23165,2020-11-20,Winnebago,Iowa,19189.0,855,23
23166,2020-11-20,Winneshiek,Iowa,19191.0,860,9
23167,2020-11-20,Woodbury,Iowa,19193.0,9706,110
23168,2020-11-20,Worth,Iowa,19195.0,318,0
23169,2020-11-20,Wright,Iowa,19197.0,1118,5


In [7]:
print(iac19df.describe())
print(iac19df.describe(include=['object']))

               fips         cases        deaths
count  22966.000000  23170.000000  23170.000000
mean   19099.166681    561.095339      9.083211
std       57.361880   1566.017872     25.148653
min    19001.000000      0.000000      0.000000
25%    19049.000000     27.000000      0.000000
50%    19099.000000    125.000000      1.000000
75%    19149.000000    438.000000      7.000000
max    19197.000000  30369.000000    319.000000
              date   county  state
count        23170    23170  23170
unique         258      100      1
top     2020-05-29  Johnson   Iowa
freq           100      258  23170
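The summary above shows fewer non-null `fips` values (22,966) than rows (23,170), and 100 distinct county names although Iowa has 99 counties. A plausible explanation, not verified here, is that some rows carry a placeholder county name without a FIPS code. Since `cases` and `deaths` in the Times data are cumulative, one way to get a single clean row per county is to drop rows without a FIPS code and keep the most recent date. This is a minimal sketch on toy data; the function name `latest_by_county`, the "Unknown" row, and the earlier-date counts are illustrative assumptions.

```python
import pandas as pd

def latest_by_county(df: pd.DataFrame) -> pd.DataFrame:
    """Keep the most recent row for each county that has a FIPS code (illustrative helper)."""
    valid = df.dropna(subset=["fips"]).copy()
    valid["date"] = pd.to_datetime(valid["date"])
    # Sort by date so the last row per FIPS code is the latest cumulative count
    latest = valid.sort_values("date").groupby("fips").tail(1)
    return latest.sort_values("fips").reset_index(drop=True)

# Toy frame: two counties over two days plus one row with no FIPS code
toy = pd.DataFrame({
    "date": ["2020-11-19", "2020-11-20", "2020-11-19", "2020-11-20", "2020-11-20"],
    "county": ["Worth", "Worth", "Wright", "Wright", "Unknown"],
    "state": ["Iowa"] * 5,
    "fips": [19195.0, 19195.0, 19197.0, 19197.0, None],
    "cases": [310, 318, 1100, 1118, 5],
    "deaths": [0, 0, 5, 5, 0],
})
result = latest_by_county(toy)
print(result)
```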


### County-level Population Data for Iowa, United States

Iowa State University's Iowa Community Indicators Program has compiled population data for the State of Iowa from the US Census Bureau. This data contains annual population estimates at the county (FIPS) level.

https://www.icip.iastate.edu/tables/population/counties-estimates

I plan to use this data to normalize positive case counts by each county's total population, so that counties of different sizes can be compared on a per-capita basis.
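One common form of this normalization is cases per 100,000 residents: rate = cases / population × 100,000. The sketch below assumes the latest cumulative case count per county and a population estimate are available in two frames keyed by FIPS code. The populations are the 2019 estimates for Adair and Adams counties from the population table; the case counts are made-up toy values, and the column names `population` and `cases_per_100k` are my own placeholders.

```python
import pandas as pd

# Toy inputs: real 2019 population estimates, made-up case counts
cases = pd.DataFrame({"fips": [19001, 19003], "cases": [400, 150]})
population = pd.DataFrame({"fips": [19001, 19003], "population": [7152, 3602]})

def add_rate_per_100k(cases_df: pd.DataFrame, pop_df: pd.DataFrame) -> pd.DataFrame:
    """Join cases to population on FIPS and add a cases-per-100,000 column."""
    # validate raises if either side has duplicate FIPS codes
    merged = cases_df.merge(pop_df, on="fips", how="inner", validate="one_to_one")
    merged["cases_per_100k"] = merged["cases"] / merged["population"] * 100_000
    return merged

rates = add_rate_per_100k(cases, population)
print(rates.sort_values("cases_per_100k", ascending=False))
```

An inner join silently drops counties missing from either table, so it is worth comparing the row count of the result against the expected number of counties.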

In [8]:
url = 'https://www.icip.iastate.edu/sites/default/files/uploads/tables/population/popest-annual.xls'
xls=pd.ExcelFile(url)
pop_iacnty = pd.read_excel(xls, 'Counties', skiprows=[0,1,2,3,4,5])


pop_iacnty.head()   

Unnamed: 0,Fips,Area,Estimates Base (4/1/2010),2010,2011,2012,2013,2014,2015,2016,2017,2018,2019
0,19,Iowa,3046871.0,3050745.0,3066336.0,3076190.0,3092997.0,3109350.0,3120960.0,3131371.0,3141550.0,3148618.0,3155070.0
1,19001,"Adair County, Iowa",7682.0,7679.0,7546.0,7468.0,7387.0,7368.0,7145.0,7005.0,7051.0,7074.0,7152.0
2,19003,"Adams County, Iowa",4029.0,4023.0,3994.0,3910.0,3891.0,3877.0,3754.0,3692.0,3657.0,3644.0,3602.0
3,19005,"Allamakee County, Iowa",14328.0,14378.0,14222.0,14149.0,14071.0,14062.0,13874.0,13851.0,13803.0,13852.0,13687.0
4,19007,"Appanoose County, Iowa",12887.0,12856.0,12848.0,12707.0,12654.0,12671.0,12577.0,12505.0,12353.0,12401.0,12426.0


In [9]:
print(pop_iacnty.dtypes)

Fips                          object
Area                          object
Estimates Base (4/1/2010)    float64
2010                         float64
2011                         float64
2012                         float64
2013                         float64
2014                         float64
2015                         float64
2016                         float64
2017                         float64
2018                         float64
2019                         float64
dtype: object


In [10]:
total_ia = pd.DataFrame(pop_iacnty[pop_iacnty['Fips'] ==19])
pop_iacnty = pd.DataFrame(pop_iacnty[pop_iacnty['Fips'] !=19])
pop_iacnty.reset_index(drop=True,inplace=True)

In [11]:
total_ia.head() 

Unnamed: 0,Fips,Area,Estimates Base (4/1/2010),2010,2011,2012,2013,2014,2015,2016,2017,2018,2019
0,19,Iowa,3046871.0,3050745.0,3066336.0,3076190.0,3092997.0,3109350.0,3120960.0,3131371.0,3141550.0,3148618.0,3155070.0


In [12]:
pop_iacnty.head() 

Unnamed: 0,Fips,Area,Estimates Base (4/1/2010),2010,2011,2012,2013,2014,2015,2016,2017,2018,2019
0,19001,"Adair County, Iowa",7682.0,7679.0,7546.0,7468.0,7387.0,7368.0,7145.0,7005.0,7051.0,7074.0,7152.0
1,19003,"Adams County, Iowa",4029.0,4023.0,3994.0,3910.0,3891.0,3877.0,3754.0,3692.0,3657.0,3644.0,3602.0
2,19005,"Allamakee County, Iowa",14328.0,14378.0,14222.0,14149.0,14071.0,14062.0,13874.0,13851.0,13803.0,13852.0,13687.0
3,19007,"Appanoose County, Iowa",12887.0,12856.0,12848.0,12707.0,12654.0,12671.0,12577.0,12505.0,12353.0,12401.0,12426.0
4,19009,"Audubon County, Iowa",6119.0,6098.0,6004.0,5865.0,5863.0,5771.0,5711.0,5626.0,5550.0,5471.0,5496.0


### Foursquare Venue Data

I am using Foursquare data to find the venue types in each county and determine which business types are prevalent in Iowa counties.

In [13]:
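# The code was removed by Watson Studio for sharing.
# It defines the Foursquare credentials and query settings used below:
# CLIENT_ID, CLIENT_SECRET, VERSION ('20180604') and LIMIT (30).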

In [14]:
address = 'Appanoose County, IA'

geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(latitude, longitude)

40.7455904 -92.8649061


In [15]:
radius = 10000

url2 = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    latitude, 
    longitude, 
    radius, 
    LIMIT)
url2

'https://api.foursquare.com/v2/venues/explore?client_id=<CLIENT_ID>&client_secret=<CLIENT_SECRET>&v=20180604&ll=40.7455904,-92.8649061&radius=10000&limit=30'
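Building the query string by hand and echoing it makes it easy to leak credentials into notebook output. As a hedged alternative, the standard library's `urlencode` can assemble the same parameters, with the credentials read from environment variables. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `build_explore_url` are assumptions for illustration; the endpoint and parameter names are the ones used in the cell above.

```python
import os
from urllib.parse import urlencode

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"

def build_explore_url(lat, lng, radius=10000, limit=30, version="20180604"):
    """Assemble the venues/explore URL (hypothetical helper)."""
    params = {
        # Assumed environment variable names; fall back to visible placeholders
        "client_id": os.environ.get("FOURSQUARE_CLIENT_ID", "<CLIENT_ID>"),
        "client_secret": os.environ.get("FOURSQUARE_CLIENT_SECRET", "<CLIENT_SECRET>"),
        "v": version,
        "ll": f"{lat},{lng}",
        "radius": radius,  # meters
        "limit": limit,
    }
    # safe="," keeps the comma in the ll value unescaped
    return f"{EXPLORE_ENDPOINT}?{urlencode(params, safe=',')}"

url = build_explore_url(40.7455904, -92.8649061)
# Print only the part of the URL that follows the credentials
print(url.split("&v=")[1])
```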

In [16]:
results = requests.get(url2).json()
results

{'meta': {'code': 200, 'requestId': '5fb951e5f80c210a28e89ff8'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Centerville',
  'headerFullLocation': 'Centerville',
  'headerLocationGranularity': 'city',
  'totalResults': 28,
  'suggestedBounds': {'ne': {'lat': 40.83559049000009,
    'lng': -92.74633397106352},
   'sw': {'lat': 40.65559030999991, 'lng': -92.98347822893648}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4bc119f4abf495217890c093',
       'name': "George & Nick's",
       'location': {'address': '111 E Van Buren St',
        'lat': 40.734908318428914,
        'lng': -92.87385457137748,
        'labeledLatLngs': [{'label': 'display',
          'lat': 40.73490831842

In [17]:
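def get_category_type(row):
    """Return the name of the first category listed for a venue row, or None."""
    # The column is named 'categories' or 'venue.categories' depending on the frame
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']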

In [18]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,George & Nick's,Steakhouse,40.734908,-92.873855
1,The Continental Hotel,Hotel,40.734077,-92.873104
2,Double R Dairy Bar,Ice Cream Shop,40.732746,-92.867385
3,Hy-Vee,Grocery Store,40.737573,-92.866358
4,Casey's General Store,Pizza Place,40.734607,-92.86771


In [19]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

28 venues were returned by Foursquare.
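To compare venue types across counties, the per-venue rows need to be reduced to counts per category. A minimal sketch, using the five venues shown above as toy data and a hypothetical `county` label column that the original frame does not have:

```python
import pandas as pd

# The five venues shown above, tagged with a hypothetical county label
toy_venues = pd.DataFrame({
    "county": ["Appanoose"] * 5,
    "name": ["George & Nick's", "The Continental Hotel", "Double R Dairy Bar",
             "Hy-Vee", "Casey's General Store"],
    "categories": ["Steakhouse", "Hotel", "Ice Cream Shop", "Grocery Store", "Pizza Place"],
})

# One row per county, one column per venue category, values are venue counts
category_counts = pd.crosstab(toy_venues["county"], toy_venues["categories"])
# Share of each category within a county, so counties with different venue totals are comparable
category_share = category_counts.div(category_counts.sum(axis=1), axis=0)

print(category_counts)
print(category_share.round(2))
```

A wide table like this, with one row per county, is the kind of input a clustering step such as the `KMeans` class imported earlier could take, though that choice is only one option.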
