# West Nile Virus Prediction Model and Cost Analysis (Part 3/3)

# 1 Initialization

In [8]:
# Disable Future Warning
import warnings
warnings.filterwarnings(action='ignore', category=FutureWarning)

import pandas as pd
import folium

In [9]:
train = pd.read_csv('./assets/train_final.csv', parse_dates=['Date'])

# Read in neighbourhood socio-economic data from the official City of Chicago archive
socio = pd.read_csv('https://data.cityofchicago.org/api/views/kn9c-c2s2/rows.csv?accessType=DOWNLOAD')

# Average mosquito count and WNV presence per neighbourhood
# (select the numeric columns before aggregating so non-numeric columns are not averaged)
neighbourhood = train.groupby('Neighbourhood')[['NumMosquitos', 'WnvPresent']].mean().reset_index()

# Merge the socio-economic data into the trainset data for further analysis
socioeco = pd.merge(neighbourhood, socio, how='left', left_on='Neighbourhood', right_on='COMMUNITY AREA NAME')

# 2 Socio-Economy and Cost Aspect

According to research, West Nile Virus (WNV) is particularly dangerous for older people, who account for a large share of the severe cases and deaths caused by the virus [[ScienceDaily, 2015]](https://www.sciencedaily.com/releases/2015/07/150723181102.htm). So instead of focusing only on reducing the number of mosquitos or WNV-positive traps, our main concern should be to reduce WNV-related cases and fatalities.

Ultimately, the objective of this project is to help the authorities spend their resources effectively, so it is insightful to look into the neighbourhoods that are at higher risk of WNV cases. So far, we have found that spraying was most likely reactive, and that some of it targeted areas with low WNV-positive trap counts.

Our strategy is to categorize each neighbourhood into a zone based on its risk of a WNV outbreak. Since older people are more likely to succumb to the illness, it is critical to bring in external data on the demographics of each neighbourhood's population; the dataset reports this as the percentage of residents aged under 18 or over 64. Areas with a higher percentage of senior citizens and high historical WNV-positive trap counts will be classified as high-risk zones, where the authorities should perform mandatory spraying each summer, when conditions are most suitable for mosquito proliferation.
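
The zoning rule described above can be sketched on toy data. This is a minimal illustration, not the notebook's actual pipeline: the neighbourhood names and values are made up, and the helper `assign_risk_zone` is a hypothetical name. It assumes the same thresholds the notebook applies (median for the dependent share, 75th percentile for the WNV-positive rate) and gives the WNV flag twice the weight.

```python
import pandas as pd

# Toy data: neighbourhood names and values are made up for illustration.
toy = pd.DataFrame({
    'Neighbourhood': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'],
    'Dependencies': [30.0, 32.0, 34.0, 36.0, 38.0, 40.0, 42.0, 44.0],
    'WnvPresent': [0.09, 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.10],
})

def assign_risk_zone(df, dep_q=0.5, wnv_q=0.75):
    """Return a 0-3 score: 1 point for a high dependent share, 2 for a high WNV rate."""
    dep_high = (df['Dependencies'] > df['Dependencies'].quantile(dep_q)).astype(int)
    wnv_high = (df['WnvPresent'] > df['WnvPresent'].quantile(wnv_q)).astype(int)
    return dep_high + 2 * wnv_high

toy['Risk_zone'] = assign_risk_zone(toy)
print(toy[['Neighbourhood', 'Risk_zone']])
```

Because the WNV flag is worth 2 and the dependency flag is worth 1, every combination of the two flags maps to a distinct score from 0 to 3.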

In [10]:
# Select only the required columns (copy so later assignments do not act on a view)
cols = ['Neighbourhood', 'WnvPresent', 'NumMosquitos', 'PERCENT AGED UNDER 18 OR OVER 64']
socioeco = socioeco[cols].copy()
socioeco = socioeco.rename(columns={'PERCENT AGED UNDER 18 OR OVER 64': 'Dependencies'})
# Neighbourhoods without a socio-economic match get a dependency share of 0
socioeco['Dependencies'] = socioeco['Dependencies'].fillna(0)

# Label-encoding dependencies (percentage of the population aged under 18 or over 64)
# Criteria:
# Dependencies - 1 if above the median (50th percentile)
# WNV samples  - 1 if above the upper quartile (75th percentile)

socioeco['Dependencies_above_median'] = (socioeco['Dependencies'] > socioeco['Dependencies'].quantile(0.5)).astype(int)
# Note: despite its name, this flag uses the 75th percentile as its threshold
socioeco['WnvPresent_above_median'] = (socioeco['WnvPresent'] > socioeco['WnvPresent'].quantile(0.75)).astype(int)

# The average WNV-positive rate carries twice the weight of Dependencies
socioeco['Risk_zone'] = socioeco['Dependencies_above_median'] + 2 * socioeco['WnvPresent_above_median']

In [11]:
# Retrieve the geo-location data we collected
neigh_df = pd.read_csv('./assets/neighbourhood.csv')

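# Rescale the area: if shape_area is in square feet, dividing by 10,000,000 gives
# roughly square kilometres (1 km² ≈ 10.76 million ft²), not square miles
neigh_df['shape_area'] = neigh_df['shape_area']/10_000_000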

# Retrieve spraying data with neighbourhood information
spray_with_neigh = pd.read_csv('./assets/spray_with_neigh.csv')
spray_with_neigh = spray_with_neigh.groupby('Neighbourhood').count()[['Date']].reset_index()

# Combine the spraying data with geo-spatial data
spray_with_neigh = pd.merge(neigh_df, spray_with_neigh, how='left', left_on='pri_neigh', right_on='Neighbourhood')
spray_with_neigh.drop('Neighbourhood', axis=1, inplace=True)
spray_with_neigh.fillna(0, inplace=True)
spray_with_neigh.columns = ['neighbourhood', 'area', 'lat', 'lon', 'spray_count']

# Flag neighbourhoods that have a spraying history
spray_with_neigh['is_sprayed'] = (spray_with_neigh['spray_count']>0).astype(int)

# Merging the socio-economy info with the primary dataset for analysis
socioeco = pd.merge(socioeco, spray_with_neigh, how='outer', left_on='Neighbourhood', right_on='neighbourhood').drop('neighbourhood', axis=1)

In [12]:
socioeco.groupby('is_sprayed')['area'].sum()

is_sprayed
0.0    342.959409
1.0    300.872494
Name: area, dtype: float64

In [13]:
socioeco['is_sprayed'].value_counts()

0.0    63
1.0    35
Name: is_sprayed, dtype: int64

In [14]:
perc_area_sprayed = socioeco.groupby('is_sprayed')['area'].sum().loc[1] / socioeco['area'].sum()
perc_neigh_sprayed = socioeco['is_sprayed'].value_counts(normalize=True)[1]
print("Percentage of Area sprayed : ", round(perc_area_sprayed, 2)*100, '%')
print("Percentage of Neighbourhood sprayed : ", round(perc_neigh_sprayed, 2)*100, '%')

Percentage of Area sprayed :  47.0 %
Percentage of Neighbourhood sprayed :  36.0 %


From the calculation above, close to half of the area of the City of Chicago was sprayed in 2011 and 2013. As mentioned before, some of these areas might not have required any spraying because of their relatively low risk. We will look at this in more detail by mapping it out in the next section.

In [15]:
# Convert Risk-Zone to integer
socioeco.dropna(inplace=True)
socioeco['Risk_zone'] = socioeco['Risk_zone'].astype(int)

# 3 WNV-Risk Neighbourhood Zoning

Research shows that advanced age remains a key risk factor for WNV infection *(Yao, Yi, and Ruth R Montgomery)*. The main reason is a weakened immune system, which makes older people more vulnerable, especially those who have certain health conditions.

With that, we mark four distinct zones based on the age demographics and the average rate of WNV-positive mosquito samples in each neighbourhood.

| Zone | Risk | Dependencies | Avg WNV-Positive Rate | Colour-Code |
| ---- | ---- | ------------ | --------------------- | ----------- |
| 3 | High | > Median | > 75th Percentile | Maroon |
| 2 | Medium | ≤ Median | > 75th Percentile | Scarlet |
| 1 | Low | > Median | ≤ 75th Percentile | Orange |
| 0 | Very Low | ≤ Median | ≤ 75th Percentile | Yellow |
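
As a small illustration of how the zone number encodes the two criteria in the table above, the sketch below splits a score back into its flags. The lookup `ZONES` and the helper `describe_zone` are hypothetical names introduced here; the labels and colours are copied from the table.

```python
# Lookup mirroring the zone table above (labels and colours copied from it).
ZONES = {
    3: ('High', 'Maroon'),
    2: ('Medium', 'Scarlet'),
    1: ('Low', 'Orange'),
    0: ('Very Low', 'Yellow'),
}

def describe_zone(score):
    """Split a 0-3 risk score into its two flags and attach the table's label."""
    wnv_high, dep_high = divmod(score, 2)  # score = 2 * wnv_high + dep_high
    risk, colour = ZONES[score]
    return {
        'risk': risk,
        'colour': colour,
        'dependencies_above_median': bool(dep_high),
        'wnv_above_75th_percentile': bool(wnv_high),
    }

for score in sorted(ZONES, reverse=True):
    print(score, describe_zone(score))
```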

In [23]:
round(socioeco.Risk_zone.value_counts(normalize=True).sort_index(), 2).mul(100).astype(str) + '%'

0    42.0%
1    33.0%
2     8.0%
3    17.0%
Name: Risk_zone, dtype: object

From the output above, only 17% of the neighbourhoods are considered high risk and 8% medium risk. Next, we mark the existing spraying areas on the map by neighbourhood to get a clearer picture of where spraying was done across the different zones.

In [24]:
chicago_map = folium.Map(location=[41.835, -87.53], zoom_start=10, 
                        min_zoom=10, max_zoom=14) 

# Create a choropleth map based on the risk zoning
# (folium.Choropleth replaces the deprecated Map.choropleth method)
folium.Choropleth(
    geo_data=r'https://data.cityofchicago.org/api/geospatial/bbvz-uum9?method=export&format=GeoJSON',
    data=socioeco,
    columns=['Neighbourhood','Risk_zone'],
    key_on='feature.properties.pri_neigh',
    fill_color='YlOrRd',
    fill_opacity=0.7,
    line_opacity=1,
    legend_name='Risk Zone',
    nan_fill_color='white',
    nan_fill_opacity=0.7
).add_to(chicago_map)

# Mark previously sprayed neighbourhoods with a circle marker and an info pop-up
sprayed_area = socioeco[socioeco['is_sprayed']==1]

for name, lat, lon in zip(sprayed_area['Neighbourhood'], sprayed_area['lat'], sprayed_area['lon']):
    label = folium.Popup(name, parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=3,
        popup=label,
        color='black',
        weight=1.6,
        fill=True,
        fill_color='white',
        fill_opacity=0.7).add_to(chicago_map)

chicago_map

The map above shows the geographic distribution of the risk zones. As discussed earlier, the hot zones for WNV are concentrated in the north-west and south-east corners of Chicago, which aligns with the distribution of the High and Medium-Risk zones on the map.

We can also see that there has been spraying in neighbourhoods that are yellow-coloured (very low risk). This may have been a preventive measure to stop mosquitoes from spreading to adjacent neighbourhoods. However, if spraying is deployed proactively, there will be less need for such preventive spraying, and the authorities' resources can be used more efficiently.
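
The visual comparison of spray markers against risk zones can also be expressed as a table. The sketch below uses made-up rows in place of the merged neighbourhood data and assumes only the `Risk_zone` and `is_sprayed` columns used in this notebook; the variable names `spray_by_zone` and `sprayed_share` are introduced here for illustration.

```python
import pandas as pd

# Made-up rows standing in for the merged neighbourhood table.
toy = pd.DataFrame({
    'Neighbourhood': ['A', 'B', 'C', 'D', 'E', 'F'],
    'Risk_zone': [0, 0, 1, 2, 3, 3],
    'is_sprayed': [1, 0, 1, 1, 1, 0],
})

# Rows: risk zone; columns: spray status; cells: number of neighbourhoods.
spray_by_zone = pd.crosstab(toy['Risk_zone'], toy['is_sprayed'])
print(spray_by_zone)

# Share of neighbourhoods sprayed within each zone.
sprayed_share = toy.groupby('Risk_zone')['is_sprayed'].mean()
print(sprayed_share)
```

A low sprayed share in Zone 3 alongside a non-zero share in Zone 0 would point to the kind of mismatch discussed above.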

In [25]:
risk_zone_df = socioeco.groupby('Risk_zone')[['area']].sum().astype(int)
risk_zone_df['percentage_of_area_in_chicago'] = round(risk_zone_df['area'] / risk_zone_df['area'].sum(), 2) * 100

In [26]:
risk_zone_df

| Risk_zone | area | percentage_of_area_in_chicago |
| --------- | ---- | ----------------------------- |
| 0 | 166 | 31.0 |
| 1 | 177 | 33.0 |
| 2 | 65 | 12.0 |
| 3 | 132 | 24.0 |


In [27]:
socioeco['WnvPresent'] = round(socioeco['WnvPresent'], 3) * 100
socioeco['area'] = round(socioeco['area'], 1)

As shown above, Zones 2 and 3 together account for only 36% of the area. Compared with the originally sprayed area of 47%, this is a reduction of 11 percentage points in coverage, which would be directly reflected in the spraying cost borne by the authorities. By taking a proactive stance and focusing on vulnerable communities, the savings could extend to healthcare expenses and productivity losses for WNV victims.
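
The coverage comparison can be checked with a few lines of arithmetic. The figures are taken from the outputs above (47% sprayed; 12% and 24% of the area in Zones 2 and 3); the variable names are introduced here for illustration.

```python
sprayed_share = 47      # % of city area sprayed historically (from the output above)
target_share = 12 + 24  # % of city area in Zone 2 and Zone 3 (from the table above)

point_reduction = sprayed_share - target_share
relative_reduction = point_reduction / sprayed_share

print(f"Targeted coverage: {target_share}% of city area")
print(f"Reduction: {point_reduction} percentage points "
      f"({relative_reduction:.0%} less area than before)")
```

Note the difference between the two measures: coverage falls by 11 percentage points, which is a little under a quarter of the previously sprayed area.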

## 3.1 Zone-3 Neighbourhood (High Risk)

In [0]:
socioeco.groupby('Risk_zone').get_group(3).sort_values('WnvPresent', ascending=False)[['Neighbourhood', 'area', 'WnvPresent', 'Dependencies']]

| | Neighbourhood | area | WnvPresent | Dependencies |
| --- | --- | --- | --- | --- |
| 44 | Norwood Park | 12.2 | 12.8 | 39.5 |
| 39 | Morgan Park | 9.2 | 10.1 | 40.3 |
| 22 | Garfield Ridge | 11.8 | 10.0 | 38.1 |
| 1 | Ashburn | 13.5 | 9.0 | 36.9 |
| 43 | North Park | 7.0 | 7.5 | 39.0 |
| 6 | Belmont Cragin | 10.9 | 7.0 | 37.3 |
| 23 | Grand Boulevard | 4.8 | 6.8 | 39.5 |
| 3 | Austin | 17.0 | 6.7 | 37.9 |
| 0 | Archer Heights | 5.6 | 6.3 | 39.2 |
| 50 | Riverdale | 9.8 | 5.9 | 51.5 |


## 3.2 Zone-2 Neighbourhood (Medium Risk)

In [0]:
socioeco.groupby('Risk_zone').get_group(2).sort_values('WnvPresent', ascending=False)[['Neighbourhood', 'area', 'WnvPresent', 'Dependencies']]

| | Neighbourhood | area | WnvPresent | Dependencies |
| --- | --- | --- | --- | --- |
| 17 | Edison Park | 3.2 | 10.5 | 35.3 |
| 45 | O'Hare | 37.2 | 9.0 | 30.3 |
| 14 | Dunning | 10.4 | 8.5 | 33.6 |
| 16 | Edgewater | 3.9 | 7.1 | 23.8 |
| 47 | Portage Park | 11.0 | 6.0 | 34.0 |


## 3.3 Zone-1 Neighbourhood (Low Risk)

In [0]:
socioeco.groupby('Risk_zone').get_group(1).sort_values('WnvPresent', ascending=False)[['Neighbourhood', 'area', 'WnvPresent', 'Dependencies']]

| | Neighbourhood | area | WnvPresent | Dependencies |
| --- | --- | --- | --- | --- |
| 15 | East Side | 8.3 | 5.8 | 42.8 |
| 41 | New City | 13.5 | 5.5 | 38.9 |
| 10 | Calumet Heights | 4.9 | 5.4 | 44.0 |
| 9 | Burnside | 1.7 | 5.3 | 42.7 |
| 13 | Clearing | 7.1 | 5.2 | 37.6 |
| 7 | Beverly | 8.9 | 5.0 | 40.5 |
| 61 | West Ridge | 9.8 | 4.9 | 38.5 |
| 40 | Mount Greenwood | 7.6 | 4.9 | 36.8 |
| 52 | Roseland | 13.4 | 4.8 | 41.2 |
| 18 | Englewood | 17.4 | 4.7 | 42.5 |


# 4 Cost Benefit Analysis
A cost benefit analysis (CBA) allows us to better determine whether spraying chemical pesticide is a good approach to reducing WNV cases while preserving savings over time.

On the cost side, we derive the estimated price of vector control from the WNV outbreak that occurred in Sacramento County, California. Six nights of aerial spraying over an area of `477` square kilometres cost `$701,790`, which is equivalent to `$1,471` per square kilometre *(Bellini, R., Zeller, H. & Van Bortel, W.)*.
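
The unit cost quoted above follows from a single division. The sketch below reproduces it and scales it to another area; the helper `spray_cost` and the 100 km² example are hypothetical, and the treated area is read as square kilometres.

```python
total_cost_usd = 701_790  # six nights of aerial spraying (figure quoted above)
area_km2 = 477            # treated area, read as square kilometres

cost_per_km2 = total_cost_usd / area_km2
print(f"Cost per square kilometre: ${cost_per_km2:,.0f}")

def spray_cost(area, unit_cost=cost_per_km2):
    """Scale the unit cost to an arbitrary treated area (same unit as area_km2)."""
    return area * unit_cost

# Hypothetical 100 km² programme, for illustration only.
print(f"Cost for 100 km²: ${spray_cost(100):,.0f}")
```

This assumes cost scales linearly with area, which ignores fixed costs such as aircraft mobilisation, so it is only a first-order estimate.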

Between 2012 and 2013, we observed a drop in positive cases of West Nile Virus (WNV) in Cook County from `174` to `60` *(Idph.state.il.us. 2020)*. In 2012, it was reported that several counties experienced sudden, substantial outbreaks of WNV cases. The authorities responded to the outbreak by increasing the spraying intensity, coupled with aerial spraying *(Ruktanonchai, Duke J et al)*. Assuming the same level of spraying continued into the following year, the drop of roughly `100` WNV cases in 2013 can be counted towards the medical and productivity savings for that year.

According to the CBA calculations, a reduction of `30` WNV cases is the break-even point for the cost of vector control. With a benefit-cost ratio of `3.34`, it is reasonable to conclude that spraying is a cost-efficient method to counter WNV in Chicago.
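
The relationship between the break-even point and the benefit-cost ratio can be sketched as follows. This is not the CBA's actual calculation: it assumes the benefit is proportional to the number of cases averted, and the function name `benefit_cost_ratio` is hypothetical. The per-case saving cancels out, so it does not need to be known.

```python
def benefit_cost_ratio(cases_averted, breakeven_cases):
    """BCR under the assumption that benefit is proportional to cases averted.

    At the break-even point benefit equals cost, so
    cost = breakeven_cases * saving_per_case, and therefore
    BCR = (cases_averted * saving_per_case) / cost = cases_averted / breakeven_cases.
    """
    if breakeven_cases <= 0:
        raise ValueError("breakeven_cases must be positive")
    return cases_averted / breakeven_cases

bcr = benefit_cost_ratio(cases_averted=100, breakeven_cases=30)
print(f"Approximate benefit-cost ratio: {bcr:.2f}")
```

With the rounded figures quoted above (100 cases averted, break-even at 30), this gives a value close to the reported `3.34`; the small gap is consistent with the break-even point having been rounded to a whole number of cases.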



<img src="./images/cost_analysis.png">

# 5 Business recommendations
The benefit-cost ratio of `3.34` shows that the benefits of spraying outweigh its costs, implying that spraying might be a cost-effective way to control WNV in Chicago.

However, according to our marked zones, some existing spraying was done in low-risk zones (Zone 1), which might lower the effectiveness of spraying. Furthermore, we found several areas in Zone 3 without any spraying activity. This might increase the possibility of WNV infection, as the older people residing in those areas are more prone to contracting the virus.

Hence, we propose that the council of Chicago prioritise spraying in the higher-risk zones (Zone 2 and Zone 3). It would be prudent to target specific areas, for example the remaining unsprayed Zone 3 neighbourhoods, to increase the effectiveness of vector control.
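
One way to turn this recommendation into a work list is to filter for higher-risk neighbourhoods that have not yet been sprayed. The sketch below uses made-up rows and a hypothetical helper, `spray_priorities`; it assumes the `Risk_zone`, `is_sprayed` and `WnvPresent` columns used earlier in this notebook.

```python
import pandas as pd

# Made-up rows standing in for the merged neighbourhood table.
toy = pd.DataFrame({
    'Neighbourhood': ['A', 'B', 'C', 'D', 'E'],
    'Risk_zone': [3, 3, 2, 1, 0],
    'is_sprayed': [1, 0, 0, 1, 0],
    'WnvPresent': [9.0, 12.5, 7.0, 5.0, 1.0],
})

def spray_priorities(df, min_zone=2):
    """Unsprayed neighbourhoods in zone >= min_zone, highest zone and WNV rate first."""
    mask = (df['Risk_zone'] >= min_zone) & (df['is_sprayed'] == 0)
    return (df.loc[mask]
              .sort_values(['Risk_zone', 'WnvPresent'], ascending=False)
              .reset_index(drop=True))

priorities = spray_priorities(toy)
print(priorities[['Neighbourhood', 'Risk_zone', 'WnvPresent']])
```

Setting `min_zone=3` restricts the list to the unsprayed high-risk neighbourhoods only.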


# 6 Limitations and further considerations
Chicago uses trucks to spray insecticide, hence spraying effectiveness is limited to areas near roads that the trucks can reach.

Spraying should be only one part of Chicago's toolbox for managing the vectors of West Nile virus.

There could also be other reasons for the decline in WNV presence, such as other mosquito control efforts like larvicide application, varying effectiveness of spraying in different ecological areas, and increased use of personal protective measures in sprayed areas.

Chicago can consider including data on larvicide use and personal protective measures to filter out their effects and improve the ability to determine spraying effectiveness.


# 7 References

1. Bellini, R., Zeller, H. & Van Bortel, W. A review of the vector management methods to prevent and control outbreaks of West Nile virus infection and the challenge for Europe. Parasites Vectors 7, 323 (2014). https://doi.org/10.1186/1756-3305-7-323

2. Idph.state.il.us. 2020. West Nile Virus In Illinois - Surveillance. [online] http://www.idph.state.il.us/envhealth/wnvsurveillance_humancases_13.htm

3. Ruktanonchai, Duke J et al. “Effect of aerial insecticide spraying on West Nile virus disease--north-central Texas, 2012.” The American Journal of Tropical Medicine and Hygiene vol. 91,2 (2014): 240-5. doi:10.4269/ajtmh.14-0072

4. Yao, Yi, and Ruth R Montgomery. “Role of Immune Aging in Susceptibility to West Nile Virus.” Methods in Molecular Biology (Clifton, N.J.) vol. 1435 (2016): 235-47. doi:10.1007/978-1-4939-3670-0_18