# Lab 6 - Mapping data

In this lab, we will use the [folium package](https://python-visualization.github.io/folium/quickstart.html) to create maps with markers on them and to make [choropleth maps](https://en.wikipedia.org/wiki/Choropleth_map).

### Section 1:  Installing the folium package

Follow the appropriate instructions for how you are running Python and Jupyter notebooks.

#### Jupyter Hub on Lehman 360: 
Run the following code and wait until the \[*\] beside the cell changes to a number before continuing (this may take 5-10 minutes; if it takes longer, try restarting the notebook).

In [4]:
!pip install --user folium



#### Google Colab
Folium should be installed, but there are other problems with getting the maps to display and uploading files.  Instead, use Jupyter Hub on Lehman 360 for this lab.

#### Anaconda on your own computer

1. In Anaconda Navigator, click on Environments in the left menu.
2. Select "Not installed" in the list.
3. Select folium and click "Apply" in the pop-up box.

[Instructions with images](https://docs.anaconda.com/anaconda/navigator/tutorials/manage-packages/#installing-a-package)

If you can't get folium installed, let me know what you have tried and any error messages as soon as possible, so we can figure out how to get it installed.

Now import folium and pandas.

In [1]:
import folium
import pandas as pd
%matplotlib inline

### Section 2:  Folium maps and markers

To create a folium map, we just need to provide a latitude and longitude.  Latitude specifies the north-south position on the globe, with north of the equator being positive and south of the equator being negative.  Longitude specifies the east-west position on the globe, relative to the meridian at Greenwich, UK.
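
Coordinates are often written with a hemisphere letter, as in 40.8733° N, 73.8941° W, while folium expects signed decimal degrees. A minimal sketch of the conversion follows; `to_signed` is a hypothetical helper written for illustration and is not part of folium.

```python
def to_signed(degrees, hemisphere):
    """Convert an unsigned coordinate plus hemisphere letter to a signed value."""
    hemisphere = hemisphere.upper()
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError("hemisphere must be one of N, S, E, W")
    # South latitudes and west longitudes are negative.
    return -degrees if hemisphere in ("S", "W") else degrees

# Lehman College: 40.8733° N, 73.8941° W
lehman = [to_signed(40.8733, "N"), to_signed(73.8941, "W")]
print(lehman)
```

The resulting list is in the `[latitude, longitude]` order that `folium.Map` and `folium.Marker` take for a location.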

In [3]:
folium.Map(location=[40.8747, -73.8951])

What is shown in the map?

We can also save the map as a variable and display it that way:

In [44]:
m = folium.Map(location=[40.8747, -73.8951])
m

Let's put a marker on the map `m` at Lehman College, which is located at 40.8733° N, 73.8941° W.

In [45]:
folium.Marker([40.8733, -73.8941],
              popup="Lehman College", 
              tooltip="Click me!").add_to(m)
m

What does the marker look like?

Try clicking on the marker.  In the `Marker()` function, what does the parameter `popup` do?  What does the parameter `tooltip` do?

Add another marker for City College, which is located at 40.8200° N, 73.9493° W.  Use either the popup or tooltip parameter (your choice) to label the marker as City College.

In [46]:
folium.Marker([40.8200, -73.9493], 
              popup="City College", 
              tooltip="Click me!").add_to(m)
m

We can also change the color and image on the marker.  For example, the following code marks the location of the Bronx Zoo with a pink marker with an info sign on it.

In [54]:
folium.Marker([40.8506, -73.8770], 
              popup='Zoo', 
              tooltip="Click for info",
              icon=folium.Icon(color='pink')
             ).add_to(m)
m

You can usually find the latitude and longitude of a New York landmark by searching for the landmark name together with the word "coordinates".  Find the coordinates for another landmark in New York and add that marker to your map.

### Section 3: Location of Recycling Bins

We'll now create a map of all public recycling bins in New York City from the data set of locations on NYC Open Data:  [https://data.cityofnewyork.us/Environment/Public-Recycling-Bins/sxx4-xhzg](https://data.cityofnewyork.us/Environment/Public-Recycling-Bins/sxx4-xhzg)

To download from the NYC Open Data site:

- click "View Data" (blue button in upper right)
- on the next page, click "Export" (in menu in upper right)
- click "CSV" to download
    
Or use this URL for the CSV file: [https://raw.githubusercontent.com/megan-owen/MAT328-Techniques_in_Data_Science/main/data/Public_Recycling_Bins.csv](https://raw.githubusercontent.com/megan-owen/MAT328-Techniques_in_Data_Science/main/data/Public_Recycling_Bins.csv)
    
Read the CSV file into a dataframe called `bins`.

In [9]:
bins = pd.read_csv("https://raw.githubusercontent.com/megan-owen/MAT328-Techniques_in_Data_Science/main/data/Public_Recycling_Bins.csv")

Is there any missing data?  

In [12]:
bins

Unnamed: 0,Borough,Site type,Park/Site Name,Address,Latitude,Longitude
0,Bronx,Subproperty,227th St. Plgd,E 227 St/Bronx River Pkway,40.890849,-73.864224
1,Bronx,Subproperty,Allerton Ballfields,Allerton Ave & Moshulu Pkway,40.848891,-73.877128
2,Bronx,Outdoor,Arthur Ave & E 187 St,Arthur Ave & 187 St,40.855570,-73.887565
3,Bronx,Outdoor,Barstow Mansion,"895 Shore Road, Pelham Bay Park",40.871864,-73.805549
4,Bronx,Subproperty,Bradley Playground,2001-2017 Bronx Park E,40.851889,-73.868549
...,...,...,...,...,...,...
540,Staten Island,Indoor,West Brighton Pool,899 Henderson Ave,40.637121,-74.119287
541,Staten Island,Outdoor,Willowbrook Park,Willowbrook Park,40.603832,-74.158697
542,Staten Island,Outdoor,Willowbrook Park,Willowbrook Park,40.603828,-74.161250
543,Staten Island,Outdoor,Wolfe's Pond,Wolfe's Pond,40.517368,-74.190913


In [14]:
bins.describe()

Unnamed: 0,Latitude,Longitude
count,544.0,544.0
mean,40.73746,-73.937484
std,0.08225,0.089999
min,40.505646,-74.235477
25%,40.694341,-73.997056
50%,40.729002,-73.945462
75%,40.804353,-73.877504
max,40.911295,-73.734291


In [15]:
bins.describe(include = ["O"])

Unnamed: 0,Borough,Site type,Park/Site Name,Address
count,545,545,545,540
unique,6,5,362,423
top,Manhattan,Outdoor,Hudson River Park,East River Bikeway
freq,184,305,14,12


There is some missing data, so let's drop all rows with a NaN value.  We have already covered dropping missing values, but calling `dropna()` with no arguments is an easier way to drop every row with at least one missing value.

In [16]:
bins = bins.dropna()
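
Before dropping rows, it can help to see which columns hold the gaps. The sketch below uses a small made-up dataframe (not the real `bins` data) to show `isna().sum()` for per-column counts and the effect of `dropna()`.

```python
import numpy as np
import pandas as pd

# Toy data for illustration only; the values are invented.
toy = pd.DataFrame({
    "Site": ["A", "B", "C", "D"],
    "Address": ["1 Main St", None, "3 Oak Ave", "4 Elm Rd"],
    "Latitude": [40.1, 40.2, np.nan, 40.4],
})

# Number of missing values in each column
missing_per_column = toy.isna().sum()
print(missing_per_column)

# Keep only rows with no missing values at all
complete = toy.dropna()
print(len(toy), "rows before,", len(complete), "rows after")
```

The same two calls should work on `bins` to show which columns account for the differing counts in the `describe()` output.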

In [17]:
bins

Unnamed: 0,Borough,Site type,Park/Site Name,Address,Latitude,Longitude
0,Bronx,Subproperty,227th St. Plgd,E 227 St/Bronx River Pkway,40.890849,-73.864224
1,Bronx,Subproperty,Allerton Ballfields,Allerton Ave & Moshulu Pkway,40.848891,-73.877128
2,Bronx,Outdoor,Arthur Ave & E 187 St,Arthur Ave & 187 St,40.855570,-73.887565
3,Bronx,Outdoor,Barstow Mansion,"895 Shore Road, Pelham Bay Park",40.871864,-73.805549
4,Bronx,Subproperty,Bradley Playground,2001-2017 Bronx Park E,40.851889,-73.868549
...,...,...,...,...,...,...
540,Staten Island,Indoor,West Brighton Pool,899 Henderson Ave,40.637121,-74.119287
541,Staten Island,Outdoor,Willowbrook Park,Willowbrook Park,40.603832,-74.158697
542,Staten Island,Outdoor,Willowbrook Park,Willowbrook Park,40.603828,-74.161250
543,Staten Island,Outdoor,Wolfe's Pond,Wolfe's Pond,40.517368,-74.190913


In [18]:
bins = bins.reset_index(drop = True)

In [19]:
bins

Unnamed: 0,Borough,Site type,Park/Site Name,Address,Latitude,Longitude
0,Bronx,Subproperty,227th St. Plgd,E 227 St/Bronx River Pkway,40.890849,-73.864224
1,Bronx,Subproperty,Allerton Ballfields,Allerton Ave & Moshulu Pkway,40.848891,-73.877128
2,Bronx,Outdoor,Arthur Ave & E 187 St,Arthur Ave & 187 St,40.855570,-73.887565
3,Bronx,Outdoor,Barstow Mansion,"895 Shore Road, Pelham Bay Park",40.871864,-73.805549
4,Bronx,Subproperty,Bradley Playground,2001-2017 Bronx Park E,40.851889,-73.868549
...,...,...,...,...,...,...
534,Staten Island,Indoor,West Brighton Pool,899 Henderson Ave,40.637121,-74.119287
535,Staten Island,Outdoor,Willowbrook Park,Willowbrook Park,40.603832,-74.158697
536,Staten Island,Outdoor,Willowbrook Park,Willowbrook Park,40.603828,-74.161250
537,Staten Island,Outdoor,Wolfe's Pond,Wolfe's Pond,40.517368,-74.190913


Create a new folium map variable called `bins_map` centered at 40.7128° N, 74.0060° W (New York City coordinates).

In [20]:
bins_map = folium.Map([40.7128, -74.0060])
bins_map

Recall that in a `for` loop over `range(n)`, the counter counts from 0 to n - 1.  In the example below, the counter is `i` and it will count from 0 to 4.

In [21]:
for i in range(5):
    print(i)

0
1
2
3
4


We can use this to loop through the rows in our dataframe `bins`.  Below we loop through the first 10 rows of `bins`, store the current row in the variable `row`, and use it to print the latitude and longitude.

In [22]:
for i in range(10):
    row = bins.loc[i]
    print("Coordinates: ", row["Latitude"], row["Longitude"] )

Coordinates:  40.890848989 -73.86422391800001
Coordinates:  40.8488907878 -73.8771283938
Coordinates:  40.85557 -73.88756499999998
Coordinates:  40.871864 -73.805549
Coordinates:  40.851889 -73.868549
Coordinates:  40.861526 -73.88065899999998
Coordinates:  40.860755 -73.88042299999998
Coordinates:  40.859644 -73.88047199999998
Coordinates:  40.862602 -73.880171
Coordinates:  40.826939 -73.922314
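
The loop above uses `bins.loc[i]`, which looks rows up by index label. That is why the index was reset after `dropna()`: dropped rows leave gaps in the labels. The sketch below shows this on a made-up dataframe, along with `itertuples()`, a pandas method that walks the rows without relying on the labels.

```python
import pandas as pd

# Toy data for illustration only; row 1 has a missing latitude.
toy = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "Latitude": [40.1, None, 40.3],
    "Longitude": [-73.1, -73.2, -73.3],
})

dropped = toy.dropna()
print(list(dropped.index))      # labels keep their old values, so 1 is missing

renumbered = dropped.reset_index(drop=True)
print(list(renumbered.index))   # labels are consecutive again

# Label-based lookup is now safe for every i in range(len(renumbered))
for i in range(len(renumbered)):
    row = renumbered.loc[i]
    print(row["Name"], row["Latitude"], row["Longitude"])

# Alternative: iterate directly over the rows, whatever the labels are
names = [r.Name for r in dropped.itertuples(index=False)]
print(names)
```

Without the reset, `dropped.loc[1]` would raise a `KeyError` because no row carries the label 1 any more.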


Now, let's plot the bins on the map.  First, change the loop so that it goes through all rows in `bins`, not just the first 10 (how can we find the number of rows?).  Then use the latitude and longitude from each row to create a new Marker for our map.

In [29]:
bins_map = folium.Map([40.7128, -74.0060])
for i in range(len(bins)):
    row = bins.loc[i]
    folium.Marker([row["Latitude"], row["Longitude"]],
                  tooltip = row["Park/Site Name"]
                 ).add_to(bins_map)
bins_map

Next, in this loop, replace the code 

`print("Coordinates: ", row["Latitude"], row["Longitude"] )`

with code that places a folium Marker at latitude `row["Latitude"]` and longitude `row["Longitude"]` on `bins_map`, and display the map.

Note:  You have to rerun the code creating `bins_map` if you make a mistake in adding the markers and want to clear them.

What's the closest bin to Lehman College?
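
One way to check the answer by calculation rather than by eye is the haversine formula for great-circle distance. The sketch below uses only the five Bronx rows displayed in the table earlier, so its result is the nearest of those five and not necessarily the nearest bin in the full dataset. `haversine_km` is a helper written for this example; it is not part of folium or pandas.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))

lehman = (40.8733, -73.8941)

# First five rows of the bins table, with coordinates as displayed
sample_bins = [
    ("227th St. Plgd", 40.890849, -73.864224),
    ("Allerton Ballfields", 40.848891, -73.877128),
    ("Arthur Ave & E 187 St", 40.855570, -73.887565),
    ("Barstow Mansion", 40.871864, -73.805549),
    ("Bradley Playground", 40.851889, -73.868549),
]

# Pick the sample bin with the smallest distance to Lehman College
nearest = min(
    sample_bins,
    key=lambda b: haversine_km(lehman[0], lehman[1], b[1], b[2]),
)
print(nearest[0])
```

To cover the whole dataset, the same function could be applied to the `Latitude` and `Longitude` of every row in `bins`.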

There's a column in the `bins` dataframe called "Park/Site Name".  Can you re-plot the map of the bins, but using the value in this column as the tooltip text?

<details><summary>Answer:</summary>
<code>
for i in range(bins.shape[0]):
    row = bins.iloc[i]
    folium.Marker([row["Latitude"], row["Longitude"]],
                 tooltip = row["Park/Site Name"]
                 ).add_to(bins_map)
bins_map
</code>
</details>



### Section 4: Choropleth Maps

A choropleth map is a map with areas shaded or colored in proportion to the mean (or some other summary statistic) of some property of that area (like income, population density, etc.).

For our first choropleth map, we'll use the NYC school district boundaries, originally downloaded from NYC Open Data.

Download the GeoJSON file from [https://github.com/megan-owen/MAT328-Techniques_in_Data_Science/blob/main/data/nyc_school_districts.json](https://github.com/megan-owen/MAT328-Techniques_in_Data_Science/blob/main/data/nyc_school_districts.json)


Alternatively, download the file from NYC Planning Open Data:

- go to [https://www1.nyc.gov/site/planning/data-maps/open-data/districts-download-metadata.page](https://www1.nyc.gov/site/planning/data-maps/open-data/districts-download-metadata.page)
- scroll down to "School, Police, Health & Fire"
- in the school district row, click the GeoJSON button to download in that format
- save the file as nyc_school_districts.json if necessary
- upload it to Jupyter Hub

The district math scores are at https://infohub.nyced.org/reports-and-policies/citywide-information-and-data/test-results  under Math Test Results 2013 to 2019.  They are only available as an Excel file, which then needs to be opened in Excel and the correct page saved as a CSV file.

Instead, download the CSV file directly from [http://comet.lehman.cuny.edu/owen/teaching/mat328/math_district.csv](http://comet.lehman.cuny.edu/owen/teaching/mat328/math_district.csv)

Create a New York City map called `school_map`:

In [30]:
school_map = folium.Map([40.7128, -74.0060])
school_map

Create a layer showing the school districts, and add it to your map.  You may get a warning that says "IOPub data rate exceeded." and the map may not be displayed.  This does not prevent us from saving the map in the next step and viewing it that way.

In [31]:
folium.Choropleth(geo_data ="nyc_school_districts.json",
                     fill_opacity=0.5, line_opacity=0.5
                     ).add_to(school_map)
school_map

Save the map as an HTML file in the same place (folder) as this lab.  You should be able to open the file in a web browser to view the map.

In [32]:
school_map.save(outfile='testScores.html')

Let's color the districts by the mean 8th grade math score in 2018.  Read the math district scores CSV file into a dataframe called `all_scores`:

In [33]:
all_scores = pd.read_csv("http://comet.lehman.cuny.edu/owen/teaching/mat328/math_district.csv")

In [34]:
all_scores

Unnamed: 0,District,Grade,Year,Category,Number Tested,Mean Scale Score,# Level 1,% Level 1,# Level 2,% Level 2,# Level 3,% Level 3,# Level 4,% Level 4,# Level 3+4,% Level 3+4
0,1,3,2013,All Students,887,307,249,28.1,266,30.0,179,20.2,193,21.8,372,41.9
1,1,3,2014,All Students,845,308,225,26.6,223,26.4,204,24.1,193,22.8,397,47.0
2,1,3,2015,All Students,765,309,221,28.9,175,22.9,168,22.0,201,26.3,369,48.2
3,1,3,2016,All Students,743,314,184,24.8,178,24.0,141,19.0,240,32.3,381,51.3
4,1,3,2017,All Students,726,312,185,25.5,166,22.9,152,20.9,223,30.7,375,51.7
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1563,32,All Grades,2015,All Students,6580,290,2859,43.4,2368,36.0,984,15.0,369,5.6,1353,20.6
1564,32,All Grades,2016,All Students,6226,289,2730,43.8,2236,35.9,898,14.4,362,5.8,1260,20.2
1565,32,All Grades,2017,All Students,5929,290,2624,44.3,1889,31.9,975,16.4,441,7.4,1416,23.9
1566,32,All Grades,2018,All Students,5384,594,2202,40.9,1575,29.3,1040,19.3,567,10.5,1607,29.8


Filter this dataset to only include scores from 2018 and 8th grade.  Call your new dataframe `scores_gr8_2018`:

In [38]:
y2018_filter = all_scores["Year"] == 2018
gr8_filter = all_scores["Grade"] == "8"
scores_gr8_2018 = all_scores[y2018_filter & gr8_filter]
scores_gr8_2018

Unnamed: 0,District,Grade,Year,Category,Number Tested,Mean Scale Score,# Level 1,% Level 1,# Level 2,% Level 2,# Level 3,% Level 3,# Level 4,% Level 4,# Level 3+4,% Level 3+4
40,1,8,2018,All Students,602,596,271,45.0,157,26.1,93,15.4,81,13.5,174,28.9
89,2,8,2018,All Students,1565,612,287,18.3,315,20.1,369,23.6,594,38.0,963,61.5
138,3,8,2018,All Students,621,594,284,45.7,224,36.1,69,11.1,44,7.1,113,18.2
187,4,8,2018,All Students,591,593,319,54.0,164,27.7,59,10.0,49,8.3,108,18.3
236,5,8,2018,All Students,693,589,431,62.2,183,26.4,59,8.5,20,2.9,79,11.4
285,6,8,2018,All Students,1340,599,523,39.0,426,31.8,217,16.2,174,13.0,391,29.2
334,7,8,2018,All Students,1118,590,637,57.0,325,29.1,98,8.8,58,5.2,156,14.0
383,8,8,2018,All Students,1917,596,868,45.3,589,30.7,256,13.4,204,10.6,460,24.0
432,9,8,2018,All Students,2258,594,1122,49.7,668,29.6,307,13.6,161,7.1,468,20.7
481,10,8,2018,All Students,3035,596,1400,46.1,927,30.5,453,14.9,255,8.4,708,23.3


In [37]:
all_scores.describe(include = ["O"])

Unnamed: 0,Grade,Category
count,1568,1568
unique,7,1
top,3,All Students
freq,224,1568
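
The `describe(include = ["O"])` output lists `Grade` among the object (text) columns, presumably because it mixes grade numbers with the label "All Grades". That is why the filter compares against the string `"8"` rather than the number 8. A minimal sketch on made-up rows:

```python
import pandas as pd

# Toy rows for illustration only; the scores are invented.
toy = pd.DataFrame({
    "Grade": ["8", "8", "All Grades", "3"],
    "Year": [2017, 2018, 2018, 2018],
    "Mean Scale Score": [300, 600, 598, 601],
})

# Comparing text to a number matches nothing
print((toy["Grade"] == 8).sum())

# Comparing text to text works; combine the two conditions with &
mask = (toy["Year"] == 2018) & (toy["Grade"] == "8")
print(toy[mask])
```

If a filter on `all_scores` unexpectedly returns an empty dataframe, checking the column types with `all_scores.dtypes` is a reasonable first step.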


Reset the `school_map` by creating it again: 

In [39]:
school_map = folium.Map([40.7128, -74.0060])
school_map

To add the shading to the map:

In [41]:
#Create a layer, shaded by test scores:
folium.Choropleth(geo_data ="nyc_school_districts.json",
                     fill_color='YlGn',
                     data = scores_gr8_2018,
                     key_on="feature.properties.SchoolDist",
                     columns = ['District', 'Mean Scale Score'],
                     fill_opacity=0.4, line_opacity=0.5
                     ).add_to(school_map)
school_map
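
`key_on` names the path inside each GeoJSON feature whose value is matched against the first entry of `columns`. The sketch below uses a tiny hand-made feature collection (the geometry is omitted and the district numbers are invented) to show how that path resolves and how to check that every district in the data has a matching feature. The helper `resolve` is hypothetical and only mimics the dotted-path lookup; it is not folium's own code.

```python
# Hand-made stand-in for the school district file; not real boundaries.
geo = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"SchoolDist": 1}, "geometry": None},
        {"type": "Feature", "properties": {"SchoolDist": 2}, "geometry": None},
    ],
}

def resolve(feature, key_on):
    """Follow a dotted path such as 'feature.properties.SchoolDist'."""
    value = feature
    for part in key_on.split(".")[1:]:   # skip the leading 'feature'
        value = value[part]
    return value

geo_keys = {resolve(f, "feature.properties.SchoolDist") for f in geo["features"]}
data_keys = {1, 2, 3}   # stand-in for the values in the 'District' column

# Districts present in the data but absent from the map file
unmatched = data_keys - geo_keys
print(unmatched)
```

The values on both sides need the same type as well as the same content: a district stored as the number 1 in the dataframe will not match a feature whose property is the text "1".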

Save your new map to a different .html file.

In [42]:
school_map.save(outfile='testScores2.html')