# Maps
Additional visualizations using maps and datasets.<br><br>
*WNV: West Nile Virus*

*Data sources: [Kaggle -- Link](https://www.kaggle.com/c/predict-west-nile-virus/)*

## Organization
**Across the Project**
1. [Cleaning, Exploratory Visualizations, and Export](./01_cleaning_eda.ipynb)
2. **Maps (Visualizations) (Current Notebook)**
3. [Models and Conclusions](./03_models.ipynb)

**Within this Notebook**
1. [Import](#Import)
1. [Mapping](#Mapping)
1. [Mapping: Dataframes](#Map-Dataframes)
1. [Map Plot 1: 2011 WNV vs Sprays](#Plot-1:-2011-WNV-vs-Sprays)
1. [Map Plot 2: 2013 WNV vs Sprays](#Plot-2:-2013-WNV-vs-Sprays)
1. [Map Plot 3: 2011 Number of Mosquitoes and WNV areas](#Plot-3:-2011-Number-of-Mosquitoes-(Blue)-vs-WNV-detected-(Cyan))
1. [Map Plot 4: 2013 Number of Mosquitoes and WNV areas](#Plot-4:-2013-Number-of-Mosquitoes-(Pink)-vs-WNV-detected-(Red))
1. [Map Plot 5: (GIF) 2013 mosquitoes over time](#Plot-5:-2013-Number-of-Mosquitoes-over-time)
1. [Map Plot 6: (GIF) 2013 Locations where WNV is detected over time](#Plot-6:-2013-Locations-where-WNV-is-detected-over-time)

## Summary of Findings

1. Maps are effective at expressing relative proximity and size. This was useful in illustrating mosquito vectors and the presence of West Nile virus clusters within these vectors.
1. Stitching map overlays into GIF loops also provides a way to interpret changes over time.
1. However, while chronological changes are easy to read, the underlying causality and correlation between the available datasets are less obvious. For example, when dealing with many clusters, maps are not as effective in helping readers discern correlation accurately.

#### Import

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
# !pip install Pillow
import PIL

In [None]:
# %reset

#### Import Datasets

In [None]:
# Import datasets
train_df = pd.read_csv('../datasets/final_train.csv', parse_dates=['date'], index_col=['date'])
spray_df = pd.read_csv('../datasets/spray_clean.csv', parse_dates=['date'], index_col=['date'])

In [None]:
train_df = train_df.drop(columns='Unnamed: 0')

In [None]:
train_df.head().T

In [None]:
# view spray df
spray_df.head().T

In [None]:
# check Datetime Index
train_df.index

In [None]:
# check Datetime Index
spray_df.index

[Back to top](#Organization)

### Mapping

#### Mapping function

In [None]:
# Define mapping function
# Adapted code; Credit: https://www.kaggle.com/jbobtaylor/show-map-image-in-python
def chicago_map_plot(lats1, longs1, fig_title, output_path,
                     color1='r', color2='b', 
                     lats2=[], longs2=[], 
                     size1=20, size2=20,
                     marker1='x', marker2=',',
                     alpha1=0.02, alpha2=0.02,
                     map_size=(16,10), 
                     map_base_filepath = '../datasets/mapdata_copyright_openstreetmap_contributors.txt',
                     intersection = [41.909614, -87.746134],  # intersection of IL64 / IL50 from Google Earth
                     **kwargs):
    # initialize map coordinates
    origin = [41.6, -88.0]      # lat/long of origin (lower left corner)
    upperRight = [42.1, -87.5]  # lat/long of upper right corner
    # load map
    mapdata = np.loadtxt(map_base_filepath)


    # if any data generation is needed:
    # numPoints = 50
    # lats = (upperRight[0] - origin[0]) * np.random.random_sample(numPoints) + origin[0]
    # longs = (upperRight[1] - origin[1]) * np.random.random_sample(numPoints) + origin[1]

    # plot the reference point (intersection) as a black square
    plt.figure(figsize=map_size)
    plt.scatter(x=intersection[1], y=intersection[0], c='black', s=60, marker='s', **kwargs)

    # generate plot
    
    plt.imshow(mapdata,
               cmap=plt.get_cmap('gray'),
               extent=[
                   origin[1],
                   upperRight[1],
                   origin[0],
                   upperRight[0]
               ])
    # plot first set of points (red by default)
    plt.scatter(x=longs1, y=lats1, c=color1, s=size1, marker=marker1, alpha=alpha1, **kwargs)

    # plot second set of points (blue by default)
    plt.scatter(x=longs2, y=lats2, c=color2, s=size2, marker=marker2, alpha=alpha2, **kwargs)

    #plt.show()
    plt.title(fig_title)
    plt.tight_layout()
    plt.savefig(output_path)

In [None]:
# creating gif-ify function
# Source credit: https://www.blog.pythonlibrary.org/2021/06/23/creating-an-animated-gif-with-python/

import glob
from PIL import Image

def make_gif(frame_folder,
             output_filepath,
             time_per_frame=1000,
             loops=0,
             gif_frame_filetype = 'jpg'):
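    # form list of Image objects, sorted by filename so frames play in date order
    frames = [Image.open(image)
              for image in sorted(glob.glob(f"{frame_folder}/*.{gif_frame_filetype}"))]
    # the first frame is the base image; append only the remaining frames
    frames[0].save(
        output_filepath,
        format="GIF",
        append_images=frames[1:],
        save_all=True,
        duration=time_per_frame,  # in ms
        loop=loops                # number of times to loop (0 = loop forever)
    )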
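
`glob.glob` does not guarantee any particular order, so the frame order in the GIF depends on how the file list is sorted. A minimal sketch of why date-stamped filenames help, using a hypothetical helper `list_frames_in_order` and empty placeholder files rather than real frames: because the names embed dates as `YYYY-MM-DD`, a plain alphabetical sort is also a chronological sort.

```python
import glob
import os
import tempfile


def list_frames_in_order(frame_folder, filetype="jpg"):
    """Return frame paths sorted by filename (hypothetical helper)."""
    pattern = os.path.join(frame_folder, f"*.{filetype}")
    return sorted(glob.glob(pattern))


# Create empty placeholder files, deliberately out of date order.
with tempfile.TemporaryDirectory() as tmp:
    for day in ["2013-08-01", "2013-06-07", "2013-07-12"]:
        path = os.path.join(tmp, f"total_mozzies_yr2013_{day}.jpg")
        open(path, "w").close()
    ordered_names = [os.path.basename(p) for p in list_frames_in_order(tmp)]

print(ordered_names)
```

This only works when every frame name uses the same zero-padded date format; mixed formats would need the dates parsed before sorting.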

[Back to top](#Organization)

#### Map Dataframes

Generate relevant dataframes for plotting.

In [None]:
wnv_train_df = train_df[['latitude', 'longitude', 'wnvpresent']][train_df['wnvpresent'] == 1]

In [None]:
wnv_train_df

In [None]:
totalmozzies_lat_long = train_df[['nummosquitos', 'latitude', 'longitude']]

In [None]:
totalmozzies_lat_long.sample(5)

In [None]:
# explore rows beyond map boundaries
spray_df[spray_df['latitude'] > 42.1]

In [None]:
# drop rows beyond the map's northern boundary
# (boolean mask rather than drop(index=...), because the date index is not unique)
spray_df_map = spray_df[~(spray_df['latitude'] > 42.1)]
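
The check above only looks at the northern edge of the map. A more general sketch, assuming the same corner coordinates as `chicago_map_plot` and using toy coordinates rather than the spray data, filters on all four edges at once. The helper name `within_map_bounds` is hypothetical.

```python
import pandas as pd

# Map extent as defined in chicago_map_plot: (lat, long) of each corner.
ORIGIN = (41.6, -88.0)
UPPER_RIGHT = (42.1, -87.5)


def within_map_bounds(df, origin=ORIGIN, upper_right=UPPER_RIGHT):
    """Keep rows whose coordinates fall inside the map extent (hypothetical helper)."""
    inside = (
        df["latitude"].between(origin[0], upper_right[0])
        & df["longitude"].between(origin[1], upper_right[1])
    )
    return df[inside]


# Toy coordinates for illustration only, not taken from the spray dataset.
toy = pd.DataFrame({
    "latitude": [41.9, 42.4, 41.7, 41.8],
    "longitude": [-87.7, -88.1, -87.6, -88.3],
})
toy_map = within_map_bounds(toy)
print(toy_map)
```

Because the filter is a boolean mask, it removes only the offending rows even when several rows share the same index value.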

[Back to top](#Organization)

## Visualizations

The code below saves each plot to an image file; the plots are linked from those files rather than embedded as cell outputs, to keep the notebook file size small.

#### Plot 1: 2011 WNV vs Sprays

- 2011 sprays were concentrated in only one area, suggesting the city may have focused its efforts on a single, arbitrarily chosen large WNV cluster.
- WNV clusters were spread out across Chicago, not just downtown.

In [None]:
chicago_map_plot(lats1=wnv_train_df.loc['2011']['latitude'],
                 longs1=wnv_train_df.loc['2011']['longitude'],
                 alpha1=1, size1=1000, marker1 = '$wnv$',
                 alpha2=0.02, size2=1000, marker2='x',
                 lats2=spray_df_map.loc['2011']['latitude'],
                 longs2=spray_df_map.loc['2011']['longitude'],
                 map_size=(16,20),
                 fig_title='WNV in 2011 (Red) vs Sprays in 2011 (Blue)',
                 output_path='../images/map_wnv_2011_vs_sprays_2011.png')

![WNV vs Sprays 2011](../images/map_wnv_2011_vs_sprays_2011.png "WNV vs Sprays 2011")

[Back to top](#Organization)

#### Plot 2: 2013 WNV vs Sprays

- 2013 sprays were spread out across more clusters, but it is not clear from the map visualizations that they were effective.
- Plots of spray times and mosquito populations in the previous [notebook](#Organization) likewise did not show the sprays to be clearly effective in managing mosquito vectors.

In [None]:
chicago_map_plot(lats1=wnv_train_df.loc['2013']['latitude'],
                 longs1=wnv_train_df.loc['2013']['longitude'],
                 alpha1=1, size1=1000, marker1 = '$wnv$',
                 alpha2=0.02, size2=1000, marker2='x',
                 lats2=spray_df_map.loc['2013']['latitude'],
                 longs2=spray_df_map.loc['2013']['longitude'],
                 map_size=(16,20),
                 fig_title='WNV in 2013 (Red) vs Sprays in 2013 (Blue)',
                 output_path='../images/map_wnv_2013_vs_sprays_2013.png')

![WNV vs Sprays 2013](../images/map_wnv_2013_vs_sprays_2013.png "WNV vs Sprays 2013")

[Back to top](#Organization)

#### Plot 3: 2011 Number of Mosquitoes (Blue) vs WNV detected (Cyan)

- WNV appears in a number of mosquito clusters, but there are still many clusters where WNV was not detected (at least in 2011).

In [None]:
chicago_map_plot(lats2=wnv_train_df.loc['2011']['latitude'],
                 longs2=wnv_train_df.loc['2011']['longitude'],
                 alpha2=1, size2=1000, marker2 = '$wnv$', color2='cyan',
                 alpha1=0.3, color1='blue', marker1='o',
                 size1=[
                     5*i for i in totalmozzies_lat_long.loc['2011']['nummosquitos']
                 ],
                 lats1=totalmozzies_lat_long.loc['2011']['latitude'],
                 longs1=totalmozzies_lat_long.loc['2011']['longitude'],
                 map_size=(16,20),
                 fig_title='2011 Number of Mosquitoes (Blue) vs WNV detected (Cyan)',
                 output_path='../images/map_2011nummozzies_vs_wnv.png')

![2011 Number of Mosquitoes (Blue) vs WNV detected (Cyan)](../images/map_2011nummozzies_vs_wnv.png "2011 Number of Mosquitoes (Blue) vs WNV detected (Cyan)")

[Back to top](#Organization)

#### Plot 4: 2013 Number of Mosquitoes (Pink) vs WNV detected (Red)

- WNV seems to have spread to most mosquito clusters since 2011.

In [None]:
chicago_map_plot(lats2=wnv_train_df.loc['2013']['latitude'],
                 longs2=wnv_train_df.loc['2013']['longitude'],
                 alpha2=1, size2=1000, marker2 = '$wnv$', color2='red',
                 alpha1=0.3, color1='pink', marker1='o',
                 size1=[
                     5*i for i in totalmozzies_lat_long.loc['2013']['nummosquitos']
                 ],
                 lats1=totalmozzies_lat_long.loc['2013']['latitude'],
                 longs1=totalmozzies_lat_long.loc['2013']['longitude'],
                 map_size=(16,20),
                 fig_title='2013 Number of Mosquitoes (Pink) vs WNV detected (Red)',
                 output_path='../images/map_2013nummozzies_vs_wnv.png')

![2013 Number of Mosquitoes (Pink) vs WNV detected (Red)](../images/map_2013nummozzies_vs_wnv.png "2013 Number of Mosquitoes (Pink) vs WNV detected (Red)")
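
The impression that WNV reached a larger share of mosquito clusters in 2013 than in 2011 could also be checked numerically. The sketch below is one possible approach, not the notebook's method: it uses toy records (not values from `train_df`) and a hypothetical helper `wnv_location_share` to compute, per year, the fraction of distinct trap locations where WNV was detected at least once. It assumes a datetime index and the `latitude`, `longitude`, and `wnvpresent` columns used above.

```python
import pandas as pd

# Toy records for illustration only; real values would come from train_df.
toy = pd.DataFrame(
    {
        "latitude": [41.90, 41.90, 41.75, 41.90, 41.75, 41.80],
        "longitude": [-87.70, -87.70, -87.60, -87.70, -87.60, -87.65],
        "wnvpresent": [0, 1, 0, 1, 1, 0],
    },
    index=pd.to_datetime(
        ["2011-07-01", "2011-08-01", "2011-08-01",
         "2013-07-01", "2013-08-01", "2013-08-01"]
    ),
)


def wnv_location_share(df):
    """Share of distinct locations per year with at least one WNV detection."""
    with_year = df.assign(year=df.index.year)
    # 1 if WNV was ever detected at that location in that year, else 0.
    detected = with_year.groupby(["year", "latitude", "longitude"])["wnvpresent"].max()
    # Averaging the 0/1 flags gives the share of locations with WNV.
    return detected.groupby(level="year").mean()


share = wnv_location_share(toy)
print(share)
```

Comparing the yearly shares side by side gives a number to set against the visual impression from the maps.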

[Back to top](#Organization)

#### Plot 5: 2013 Number of Mosquitoes over time

In [None]:
year = '2013'
for date in totalmozzies_lat_long.loc[year].index.unique():
    # list indexer always returns a DataFrame, even for a single row
    day_df = totalmozzies_lat_long.loc[[date]]
    chicago_map_plot(lats1=day_df['latitude'],
                     longs1=day_df['longitude'],
                     alpha1=1,
                     map_size=(16,20), marker1='o',
                     size1=3 * day_df['nummosquitos'],
                     fig_title=f'Mozzies (in red) on {str(date)[:10]}',
                     output_path=f'../images/gif_frames/total_mozzies_yr{year}_{str(date)[:10]}.jpg')

In [None]:
if __name__ == '__main__':
    make_gif(frame_folder='../images/gif_frames/',
             output_filepath='../images/total_mozzies_2013_over_time.gif')

Observations from the GIF loop of 2013 mosquito vector development:
- Growth in vectors peaks in July / August, then wanes towards the end of September.
- There are a few large inland mosquito vectors; these may be worth investigating further to see whether sprays would be effective in managing those clusters. Note that the mosquito population also wanes naturally towards winter.
- Vectors seem to have developed along the coast but this could also be related to how traps were laid.
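
The seasonal pattern described above could be cross-checked with a simple monthly aggregation. The sketch below uses toy counts (not values from the dataset) with a datetime index and a `nummosquitos` column, mirroring the layout of `totalmozzies_lat_long`; the variable names are illustrative.

```python
import pandas as pd

# Toy counts for illustration only; real values would come from train_df.
toy = pd.DataFrame(
    {"nummosquitos": [3, 10, 25, 30, 12, 4]},
    index=pd.to_datetime(
        ["2013-06-07", "2013-07-12", "2013-07-25",
         "2013-08-08", "2013-09-06", "2013-09-26"]
    ),
)

# Sum the mosquito counts by calendar month of the collection date.
monthly_total = toy["nummosquitos"].groupby(toy.index.month).sum()

# Month number with the highest total count.
peak_month = int(monthly_total.idxmax())
print(monthly_total)
print(peak_month)
```

Run against the real 2013 rows, the same aggregation would show whether the July / August peak seen in the GIF also holds in the totals.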

![2013 mozzies over time](../images/total_mozzies_2013_over_time.gif "2013 mozzies over time")

[Back to top](#Organization)

#### Plot 6: 2013 Locations where WNV is detected over time

- The number of locations where WNV is detected peaks in August and wanes towards September.

In [None]:
year = '2013'
for date in wnv_train_df.loc[year].index.unique():
    # list indexer always returns a DataFrame, even for a single row
    day_df = wnv_train_df.loc[[date]]
    chicago_map_plot(lats1=day_df['latitude'],
                     longs1=day_df['longitude'],
                     alpha1=1,
                     map_size=(16,20), marker1='$wnv$',
                     size1=500,
                     fig_title=f'WNV (in red) on {str(date)[:10]}',
                     output_path=f'../images/gif_frames2/wnv_over_time_{year}_{str(date)[:10]}.jpg')

In [None]:
if __name__ == '__main__':
    make_gif(frame_folder='../images/gif_frames2/',
             output_filepath='../images/wnv_over_time_2013.gif')

![2013 WNV over time](../images/wnv_over_time_2013.gif "2013 WNV over time")