# Mapping

Creating maps is a fun and informative way of handling geospatial data. In this notebook, we will explore basic techniques in mapping with Python.

## A. GeoPandas

GeoPandas is an open-source project that makes working with geospatial data in Python easier. It extends the datatypes used by pandas to allow spatial operations on geometric types.

To use GeoPandas, import the library first:

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import geopandas as gpd

GeoPandas can read many geospatial file formats, such as shapefiles (`.shp`). A GeoDataFrame stores the shape of each row in a geometry column. In this data, it is the `geometry` column. Load the data to check.

In [None]:
# Import the data
schools = gpd.read_file('./phl_schp_deped/phl_schp_deped.shp')
schools.head()

Although the result looks very similar to an ordinary DataFrame, its type is in fact different: it is a GeoDataFrame.

In [None]:
type(schools)

In some shapefiles, the longitude and latitude are already stored in separate columns. In this case, however, we still need to extract them from the geometry using the `centroid` attribute.

In [None]:
schools["x"] = schools.geometry.centroid.x
schools["y"] = schools.geometry.centroid.y

schools.head()

As you may have noticed, each row of the previous dataframe is a point. To create a heatmap by area, we also need the boundaries of the areas of interest, which means we need polygons. Load another shapefile containing the province polygons of the Philippines.

In [None]:
# Complete the code
shapefile = 
shapefile.head()

If you plot this dataframe, you will see the shape of the Philippines with the boundaries of each province.

In [None]:
shapefile.plot()

Our map is now ready! Let's add information to it. To do this, we need to merge the two datasets. Since we want to create a heatmap by province, it makes sense to merge them on the province name.

In [None]:
# Check province
print('schools df:\n', sorted(schools["Province"].unique()), '\n')
print('shapefile df:\n', sorted(shapefile["PROVINCE"].unique()))
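
A set difference can show the mismatched names more directly than comparing two long printed lists by eye. The sketch below uses short hypothetical name lists (not the real data) in place of `schools["Province"]` and `shapefile["PROVINCE"]`.

```python
# Hypothetical province names standing in for the two columns
school_names = ["Abra", "Ncr Second District", "CITY OF COTABATO"]
shape_names = ["Abra", "Metropolitan Manila", "Maguindanao"]

# Names present in one dataset but missing from the other
only_in_schools = sorted(set(school_names) - set(shape_names))
only_in_shapes = sorted(set(shape_names) - set(school_names))

print("Only in schools:", only_in_schools)
print("Only in shapefile:", only_in_shapes)
```

Any name that shows up in either list is a candidate for the cleaning step.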

As expected when working with names, some cleaning is needed. We have prepared the code below to clean the data for you.

In [None]:
# Create dictionary of those with discrepancy
# Keys are in title case because .str.title() is applied before .replace() below
province_dic = {'City Of Cotabato':'Maguindanao',
 'Manila, Ncr, First District':"Metropolitan Manila",
 'Ncr Fourth District':"Metropolitan Manila",
 'Ncr Second District':"Metropolitan Manila",
 'Ncr Third District':"Metropolitan Manila",
 'Western Sama':"Samar"}

In [None]:
# Replace province name
schools["Province"] = schools["Province"].str.title().replace(province_dic).str.replace("Del", 'del')
print(sorted(schools["Province"].unique()))
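
The order of the chained calls matters: `.str.title()` runs before `.replace()`, so the dictionary keys are compared against title-cased values. Here is a minimal sketch with hypothetical names (the `raw` and `fixes` variables are illustrative, not part of the notebook):

```python
import pandas as pd

# Hypothetical raw names, not taken from the real dataset
raw = pd.Series(["CITY OF COTABATO", "DAVAO DEL NORTE", "Ncr Second District"])

# Keys are written in title case because .str.title() runs first
fixes = {"City Of Cotabato": "Maguindanao",
         "Ncr Second District": "Metropolitan Manila"}

cleaned = (
    raw.str.title()                              # "DAVAO DEL NORTE" -> "Davao Del Norte"
       .replace(fixes)                           # whole-value replacement from the dictionary
       .str.replace("Del", "del", regex=False)   # plain substring replacement
)
print(cleaned.tolist())
```

Note that `Series.replace` with a dictionary matches whole values, while `.str.replace` works on substrings.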

In [None]:
# Sum the two numeric columns per province (note the list inside the brackets)
province_data = schools.groupby("Province")[["Total_Enro", "Total_Inst"]].sum().reset_index()
province_data

Now that the data is clean, we can merge them together. 

In [None]:
# Complete the code
merged_data = 
merged_data
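
When the key columns have different names, pandas lets you name each side with `left_on` and `right_on`. The sketch below uses small hypothetical frames (`shapes` and `totals` are stand-ins, not the real data) and adds `indicator=True` to reveal rows that found no match.

```python
import pandas as pd

# Hypothetical stand-ins for the shapefile attributes and the province totals
shapes = pd.DataFrame({"PROVINCE": ["Abra", "Samar", "Maguindanao"]})
totals = pd.DataFrame({"Province": ["Abra", "Samar", "Cebu"],
                       "Total_Enro": [100, 200, 300]})

# A left merge keeps every row of `shapes`; indicator=True adds a `_merge` column
check = shapes.merge(totals, left_on="PROVINCE", right_on="Province",
                     how="left", indicator=True)

# Rows marked "left_only" found no match in `totals`
unmatched = check.loc[check["_merge"] == "left_only", "PROVINCE"].tolist()
print(unmatched)
```

With real data, calling `.merge` on the GeoDataFrame (so that it is the left side) should keep the result a GeoDataFrame, which is what the plotting step needs.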

Once you have the final dataframe, plotting in geopandas is easy. You simply need to add arguments to change the colors.

In [None]:
# set a variable that will call whatever column we want to visualise on the map
variable = 'Total_Enro'
# set the range for the choropleth
vmin, vmax = merged_data[variable].min(), merged_data[variable].max()
# create figure and axes for Matplotlib
fig, ax = plt.subplots(1, figsize=(15, 10))

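# Draw the choropleth, shading each province by `variable`
merged_data.plot(column=variable, cmap='Oranges', linewidth=0.8, ax=ax, edgecolor='0.8', vmin=vmin, vmax=vmax)

# Build a colorbar that uses the same colormap and value range as the plot
sm = plt.cm.ScalarMappable(cmap='Oranges', norm=plt.Normalize(vmin=vmin, vmax=vmax))
sm.set_array([])
cbar = fig.colorbar(sm, ax=ax)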
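
To see what `vmin`, `vmax`, and the colormap do together, here is a small sketch outside of any map. The value range is hypothetical. `Normalize` rescales a value to the 0 to 1 range, and the colormap turns that number into an RGBA color.

```python
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize

# Hypothetical value range standing in for the column's min and max
vmin, vmax = 0, 200
norm = Normalize(vmin=vmin, vmax=vmax)
cmap = plt.get_cmap("Oranges")

# Rescale a value halfway through the range
midpoint = float(norm(100))

# Colors at the two ends of the range, as (red, green, blue, alpha) tuples
lightest = cmap(norm(vmin))
darkest = cmap(norm(vmax))
print(midpoint, lightest, darkest)
```

Passing the same `vmin` and `vmax` to both the plot and the colorbar keeps the two in agreement.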

In [None]:
# Ratio of total enrolment to Total_Inst for each province
merged_data["st_ratio"] = merged_data["Total_Enro"]/merged_data["Total_Inst"]

In [None]:
# Complete the code

# set a variable that will call whatever column we want to visualise on the map
variable = 
# set the range for the choropleth
vmin, vmax = 
# create figure and axes for Matplotlib
fig, ax = plt.subplots(1, figsize=(15, 10))

____.plot()

sm = 
sm.set_array([])
cbar = fig.colorbar(sm, ax=ax)

## B. Folium

Folium makes it easy to visualize data that’s been manipulated in Python on an interactive Leaflet map.

Again, let's import the library first.

In [None]:
import folium

A convenient feature of folium is that it supplies the basemap for you. When you call `folium.Map`, it prepares an interactive basemap drawn from map tiles (here, OpenStreetMap) that you can then edit.

In [None]:
# Coordinates to show
map_center = [14.583197, 121.051538]

# Styling the map
mymap = folium.Map(location=map_center, height=700, width=1000, tiles="OpenStreetMap", zoom_start=14)
mymap

To add points to the map, simply use `folium.Marker` and `.add_to()`.

In [None]:
# Coordinate of point
marker_coords = [14.583197, 121.051538]

# Overlay point in map
folium.Marker(marker_coords).add_to(mymap)
mymap

Let's explore adding more than one point. Let's try to plot all the schools in Quezon City. To do so, let's get a subset containing only schools in Quezon City.

***Be careful though! Adding too many points can crash your notebook***

In [None]:
city = "Quezon City"

df_city = schools[schools["Division"]==city]
df_city
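
Given the warning about too many points, one possible safeguard is to cap the number of rows before adding markers. This is only a sketch: `df_demo` is a hypothetical stand-in for `df_city`, and `MAX_MARKERS` is an arbitrary limit chosen for illustration.

```python
import pandas as pd

# Hypothetical subset standing in for df_city
df_demo = pd.DataFrame({"School": [f"School {i}" for i in range(1000)],
                        "x": 121.05, "y": 14.65})

MAX_MARKERS = 200  # hypothetical limit; tune to what your notebook can handle

# Take a reproducible random sample only when the subset is too large
if len(df_demo) > MAX_MARKERS:
    df_plot = df_demo.sample(n=MAX_MARKERS, random_state=0)
else:
    df_plot = df_demo

print(len(df_demo), "->", len(df_plot))
```

Clustering the markers is another way to keep a crowded map manageable.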

Now that we have this subset, we just need to loop over its rows and add each one to the map.

In [None]:
# Complete the code

for i in np.arange(len(df_city)):
    lat = df_city["y"].values[i]
    lon = df_city["x"].values[i]
    name = df_city["School"].values[i]
    folium.Marker([_____ popup=_____)._________
    
mymap
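
An alternative to positional indexing with `np.arange` is to iterate over the columns together with `zip`. The sketch below uses a hypothetical dictionary of columns in place of `df_city` and only collects the values. Adding the markers is left out.

```python
# Hypothetical columns standing in for df_city["y"], df_city["x"], df_city["School"]
columns = {
    "y": [14.65, 14.68],
    "x": [121.03, 121.05],
    "School": ["School A", "School B"],
}

collected = []
# zip walks the three columns in step, one row at a time
for lat, lon, name in zip(columns["y"], columns["x"], columns["School"]):
    collected.append((lat, lon, name))

print(collected)
```

The same pattern works on DataFrame columns, since iterating over a column yields its values in row order.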

When we have multiple points, there is a tendency for them to overlap depending on your zoom level. One way to handle this is to cluster the points together. You can import `MarkerCluster` for this.

In [None]:
from folium.plugins import MarkerCluster
mymap_cluster = folium.Map(location=map_center, height=700, width=1000, tiles="OpenStreetMap", zoom_start=13)
marker_cluster = MarkerCluster().add_to(mymap_cluster)

The syntax for clustering is very similar to the previous map. However, instead of adding each marker to the map directly, we add it to the `MarkerCluster` object.

In [None]:
# Complete the code

for i in np.arange(len(df_city)):
    lat = df_city["y"].values[i]
    lon = df_city["x"].values[i]
    name = df_city["School"].values[i]
    ____________________________________
    
mymap_cluster

There you have it! Try to do the same but for another city.