# WEEK 4

## Part 2: Visualizing geo-data

It turns out that `plotly` (which we used during Week 3) is not the only way of working with geo-data. There are many different ways to go about it. (The hard-core PhD and PostDoc researchers in my group simply use matplotlib, since that provides more control. For an example of that kind of thing, check out [this tutorial](https://towardsdatascience.com/visualizing-geospatial-data-in-python-e070374fe621).)

Today, we'll try another library for geodata called [Folium](https://github.com/python-visualization/folium). It's good for you all to try out a few different libraries - remember that data visualization and analysis in Python is all about the ability to use many different tools. 

The exercise below is based on the code illustrated in this nice [tutorial](https://www.kaggle.com/daveianhickey/how-to-folium-for-maps-heatmaps-time-data), so let us start by taking a look at that one.

*Reading*. Read through the following tutorial
 * "How to: Folium for maps, heatmaps & time data". Get it here: https://www.kaggle.com/daveianhickey/how-to-folium-for-maps-heatmaps-time-data
 * (Optional) There are also some nice tricks in "Spatial Visualizations and Analysis in Python with Folium". Read it here: https://towardsdatascience.com/data-101s-spatial-visualizations-and-analysis-in-python-with-folium-39730da2adf

> *Exercise*: A different take on geospatial data. 
>
>A couple of weeks ago (Part 3 of Week 2), we worked with spatial data by using color-intensity of shapefiles to show the counts of certain crimes within those individual areas. Today, we look at studying geospatial data by plotting raw data points as well as heatmaps on top of actual maps.
> 
> * First start by plotting a map of San Francisco with a nice tight zoom. Simply use the command `folium.Map([lat, lon], zoom_start=13)`, where you'll have to look up San Francisco's longitude and latitude.
> * Next, use the coordinates for SF City Hall `37.77919, -122.41914` to indicate its location on the map with a nice, pop-up enabled marker. (In the screenshot below, I used the black & white Stamen tiles, because they look cool.)

> <img src="https://raw.githubusercontent.com/suneman/socialdata2022/main/files/city_hall_2022.png" alt="drawing" width="600"/>

In [1]:
# import packages
import pandas as pd
import folium 
from folium.plugins import HeatMap
from folium import plugins

# Import Data
df = pd.read_csv('Police_Department_Incident_Reports__Historical_2003_to_May_2018.csv') # load the csv data into pandas dataframe 
focuscrimes = ['WEAPON LAWS', 'PROSTITUTION', 'DRIVING UNDER THE INFLUENCE', 'ROBBERY', 'BURGLARY', 'ASSAULT', 'DRUNKENNESS', 'DRUG/NARCOTIC', 'TRESPASS', 'LARCENY/THEFT', 'VANDALISM', 'VEHICLE THEFT', 'STOLEN PROPERTY', 'DISORDERLY CONDUCT']

# Convert data to a datetime format
df.Date = pd.to_datetime(df.Date)
df.Time = pd.to_datetime(df.Time, format = '%H:%M')
# Make a subset without including 2018 
df = df[df.Date.dt.year != 2018]

# Create year and month variable.
df['Year'] = df.Date.dt.year
df['Month'] = df.Date.dt.month
df['HourInt'] = df.Time.dt.hour
df['Minute'] = df.Time.dt.minute
df['Weekday'] = df.Date.dt.weekday
df['Week_hour'] = df.Weekday*24 + df.HourInt
# df = df[df.Category.isin(focuscrimes)]

In [2]:
# Create San Francisco map
sf_coords = [37.77919, -122.41914]
sf_map = folium.Map(location=sf_coords, zoom_start=12, tiles="Stamen Toner")
sf_map

In [3]:
# Put marker for San Francisco City Hall
sf_city_hall = [37.77919, -122.41914]
folium.Marker(sf_city_hall, popup='City Hall').add_to(sf_map)
sf_map

> * Now, let's plot some more data (no need for pop-ups this time). Select a couple of months of data for `'DRUG/NARCOTIC'` and draw a little dot for each arrest for those two months. You could, for example, choose June-July 2016, but you can choose anything you like - the main concern is to not have too many points as this uses a lot of memory and makes Folium behave non-optimally. 
> We can call this kind of visualization a *point scatter plot*.
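
If the chosen window still contains too many incidents for Folium to handle comfortably, one option is to plot a random subsample instead. This goes beyond what the exercise asks for. The sketch below uses plain Python so that it stays self-contained; `thin_points` and the toy coordinates are hypothetical names and made-up data, not part of the dataset or of Folium.

```python
import random


def thin_points(points, max_points, seed=0):
    """Return at most `max_points` items from `points`, chosen at random.

    `points` is a list of [lat, lon] pairs; `seed` makes the draw repeatable.
    """
    if len(points) <= max_points:
        return list(points)
    rng = random.Random(seed)
    return rng.sample(points, max_points)


# Toy coordinates near San Francisco (made up for illustration)
toy_points = [[37.70 + 0.001 * i, -122.45 + 0.001 * i] for i in range(100)]
subset = thin_points(toy_points, max_points=25)
print(len(subset))  # 25
```

With a pandas DataFrame, `DataFrame.sample(n=..., random_state=...)` plays the same role. Keep in mind that a subsample thins out dense areas too, so it shows where incidents occur rather than how many there are.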



In [4]:
# Filter the DataFrame to only include rows with Category 'DRUG/NARCOTIC'
drug_data = df[df.Category=='DRUG/NARCOTIC']

# Further filter the DataFrame to only include rows with Date between '2016-06-01' and '2016-07-31'
drug_data = drug_data[(drug_data.Date >= '2016-06-01') & (drug_data.Date <= '2016-07-31')]

# Create a Folium map centered at location [37.77, -122.42] with zoom level 12 and Stamen Toner tiles
sf_map = folium.Map(location=[37.77, -122.42], zoom_start=12, tiles="Stamen Toner")

# Add one small red CircleMarker per incident (Y is latitude, X is longitude)
for _, row in drug_data.iterrows():
    folium.CircleMarker(location=[row["Y"], row["X"]], radius=1, color="red").add_to(sf_map)

# Display the map
sf_map

Ok. Time for a little break. Note that a nice thing about Folium is that you can zoom in and out of the maps.

> *Exercise*: Heatmaps.
> * Now, let's play with **heatmaps**. You can figure out the appropriate commands by grabbing code from the main [tutorial](https://www.kaggle.com/daveianhickey/how-to-folium-for-maps-heatmaps-time-data) and modifying it to suit your needs.
>    * To create your first heatmap, grab all arrests for the category `'SEX OFFENSES, NON FORCIBLE'` across all time. Play with parameters to get plots you like.

In [5]:
# Make subset of data for category 'SEX OFFENSES, NON FORCIBLE' across all time
sexoff_data = df[df.Category == "SEX OFFENSES, NON FORCIBLE"]
sexoff_data.head()

# Plot the number of incidents per location as a heatmap
sf_map = folium.Map(location=[37.77, -122.42], zoom_start=12, tiles="Stamen Toner")
# Count incidents per coordinate pair; each row becomes [lat, lon, count]
heat_points = sexoff_data[["Y", "X"]].groupby(["Y", "X"]).size().reset_index().values.tolist()
HeatMap(data=heat_points, radius=15, blur=15, max_zoom=13).add_to(sf_map)
sf_map
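
The `groupby(["Y", "X"]).size()` step in the heatmap cell above turns repeated coordinates into rows of the form `[lat, lon, count]`, which `HeatMap` reads as a point plus a weight. As a rough illustration of that shape, here is the same aggregation in plain Python on made-up coordinates. `to_weighted_points` is a hypothetical helper name, not a Folium or pandas function.

```python
from collections import Counter


def to_weighted_points(coords):
    """Collapse repeated (lat, lon) pairs into [lat, lon, count] rows."""
    counts = Counter((lat, lon) for lat, lon in coords)
    # Sort by coordinate so the output order is deterministic
    return [[lat, lon, n] for (lat, lon), n in sorted(counts.items())]


# Made-up coordinates: the first location occurs twice
toy_coords = [
    (37.7792, -122.4191),
    (37.7792, -122.4191),
    (37.7750, -122.4183),
]
weighted = to_weighted_points(toy_coords)
print(weighted)  # [[37.775, -122.4183, 1], [37.7792, -122.4191, 2]]
```

Passing unweighted `[lat, lon]` pairs also works. In that case every incident counts equally, and repeated locations simply stack on top of each other.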


>    * Now, comment on the differences between scatter plots and heatmaps. 
>      - What can you see using the scatter plots that you can't see using the heatmaps? 
>      - And *vice versa*: what do the heatmaps help you see that's difficult to distinguish in the scatter plots?
>    * Play around with the various parameters for heatmaps. You can find a list here: https://python-visualization.github.io/folium/plugins.html
>    * Comment on the effect of the various parameters for the heatmaps. How do they change the picture? (At least talk about `radius` and `blur`.)

**COMMENTS**

Differences between point scatter plots and heatmaps:
* The advantage of a point scatter plot on a map is that it is easy to see exactly where crimes have occurred, since a single incident stands out clearly. However, in areas with many incidents it is hard to judge the density, because the points lie on top of each other.
* Heatmaps, on the other hand, are very good at showing density, which makes it possible to see that many more incidents occur in one area than in another. However, where there are few incidents the density is very low and can be difficult to see on the map.
* The `radius` parameter for heatmaps controls the size of each data point. A larger radius makes each data point stand out more and overlap with neighboring points, which emphasizes the density even more. With a very low radius, the data points in areas with little density will not even show.
* The `blur` parameter is similar to the alpha parameter in matplotlib, and controls how blurry each data point appears. If it is set very low, just a few data points on top of each other will look very dense. If it is set very high, it will be difficult to see the data points at all.

For the final element of working with heatmaps, let's now use the cool Folium functionality `HeatMapWithTime` to create a visualization of how the patterns of your favorite crime-type change over time.

> *Exercise*: Heatmap movies. This exercise is a bit more independent than above - you get to make all the choices.
> * Start by choosing your favorite crimetype. Preferably one with spatial patterns that change over time (use your data-exploration from the previous lectures to choose a good one).
> * Now, choose a time-resolution. You could use daily, weekly, or monthly slices of the data as the frames of your movie. Again, the goal is to find interesting temporal patterns to display. We want at least 20 frames though.
> * Create the movie using `HeatMapWithTime`.
> * Comment on your results: 
>   - What patterns does your movie reveal?
>   - Motivate/explain the reasoning behind your choice of crimetype and time-resolution. 
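
`HeatMapWithTime` expects one list of `[lat, lon]` points per frame, so choosing a time-resolution amounts to choosing how to bucket the incidents. The sketch below shows one possible way to build monthly frames in plain Python. `monthly_frames` and the toy records are hypothetical, made up for illustration and not taken from the dataset.

```python
from collections import defaultdict
from datetime import date


def monthly_frames(records):
    """Group (date, lat, lon) records into one frame per calendar month.

    Returns (labels, frames): labels are 'YYYY-MM' strings in chronological
    order, and frames[i] is the list of [lat, lon] points for labels[i].
    """
    buckets = defaultdict(list)
    for day, lat, lon in records:
        buckets[(day.year, day.month)].append([lat, lon])
    keys = sorted(buckets)
    labels = [f"{year}-{month:02d}" for year, month in keys]
    frames = [buckets[key] for key in keys]
    return labels, frames


# Made-up incidents: two in January 2016, one in February 2016
toy_records = [
    (date(2016, 1, 3), 37.78, -122.42),
    (date(2016, 1, 20), 37.76, -122.41),
    (date(2016, 2, 5), 37.77, -122.43),
]
labels, frames = monthly_frames(toy_records)
print(labels)                           # ['2016-01', '2016-02']
print(len(frames[0]), len(frames[1]))   # 2 1
```

The result could then be passed along as `plugins.HeatMapWithTime(frames, index=labels)`, where `index` supplies the label shown for each frame. Months without any incidents are simply absent in this sketch, so add empty lists for them if every month should get its own frame.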

In [6]:
# Subset of vandalism incidents in December, across all years.
# .copy() gives an independent DataFrame, so adding a column below does not trigger SettingWithCopyWarning
vandalism_data = df[(df.Category == 'VANDALISM') & (df.Month == 12)].copy()

# Add a new column 'Day' containing the day of the month (Date is already a datetime column)
vandalism_data['Day'] = vandalism_data.Date.dt.day

# Create a base map for San Francisco with Stamen Terrain tiles
sf_map = folium.Map(location=sf_coords, zoom_start=13, tiles="Stamen Terrain")

# Create one frame per day of December (1-31), each a list of [Y, X] = [lat, lon] pairs
heat_data = [vandalism_data.loc[vandalism_data['Day'] == day, ['Y', 'X']].values.tolist() for day in range(1, 32)]

# Create a HeatMapWithTime plugin with the heat data, set to auto_play and with a specified max_opacity
hm = plugins.HeatMapWithTime(heat_data, auto_play=True, max_opacity=0.8)

# Add the HeatMapWithTime plugin to the base map
hm.add_to(sf_map)

# Display the map
sf_map



