# EEC2006 - Data Science
# Project #2  Choropleth map: population of northeast states of Brazil
## Alex Furtunato
## Victor Hugo - 20171003230

## 1. Introduction

In data science, as in most fields of business or science, the presentation of data is very important: it is through presentation that the author conveys to readers the information the data carry. For this reason, new libraries are constantly being developed, especially in Python, to provide more effective forms of presentation.

In this notebook, we will show how to use thematic maps, more specifically a choropleth map, through the folium library. A thematic map "is a type of map specifically designed to show a particular theme connected with a specific geographic area" and the choropleth map "is a thematic map in which areas are shaded or patterned in proportion to the measurement of the statistical variable being displayed on the map".

We will use a choropleth map to analyze the 2017 population estimates for all the cities of the northeastern states of Brazil.

This notebook is organized as follows. Section 2 describes the dataset with the population of those cities. Section 3 describes the GeoJSON files that define the perimeters of all the cities of the northeastern states of Brazil. Section 4 introduces the folium library with a very basic example of how to draw a map. Section 5 shows two ways of drawing a choropleth map with folium, starting with the RN state alone and then drawing the whole northeast.

## 2. Population dataset

The dataset comes from [IBGE - Instituto Brasileiro de Geografia e Estatística](https://downloads.ibge.gov.br/downloads_estatisticas.htm). It's a .csv file with the population of all the cities of Brazil and it has the following columns:

- <span style="background-color: #F9EBEA; color:##C0392B">UF</span>:
Contains the initials of the state to which the city belongs.

- <span style="background-color: #F9EBEA; color:##C0392B">COD. UF</span>: 
Contains the code of the state, as a float number with 2 digits.

- <span style="background-color: #F9EBEA; color:##C0392B">COD. MUNIC</span>:
Contains the code of the city, as a float number with a maximum of 5 digits.

- <span style="background-color: #F9EBEA; color:##C0392B">NOME DO MUNICÍPIO</span>:
Contains the name of the city.

- <span style="background-color: #F9EBEA; color:##C0392B">POPULAÇÃO ESTIMADA</span>:
Contains the population of the city, as a float number.
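
As a quick sanity check of the layout described above, the sketch below parses a two-row, in-memory sample with the same column names. The two rows are copied from the first rows of the dataset displayed later in this notebook; `sample_csv` and `sample` are illustrative names only, and the real file may contain additional columns.

```python
import io

import pandas as pd

# Two rows with the column layout described above. The values come from the
# head of the IBGE dataset; the in-memory CSV itself is only an illustration.
sample_csv = io.StringIO(
    "UF,COD. UF,COD. MUNIC,NOME DO MUNICÍPIO,POPULAÇÃO ESTIMADA\n"
    "RO,11.0,15.0,Alta Floresta D'Oeste,25437.0\n"
    "RO,11.0,23.0,Ariquemes,107345.0\n"
)
sample = pd.read_csv(sample_csv)

# The numeric columns are parsed as floats, as described above.
print(sample.dtypes)

# Casting to int gives the codes in their usual integer form.
sample['COD. UF'] = sample['COD. UF'].astype(int)
sample['COD. MUNIC'] = sample['COD. MUNIC'].astype(int)
print(sample)
```

Casting the codes to integers is optional here, but it makes them easier to compare with the identifiers used in the GeoJSON files.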

In [1]:
import os
import folium
import json
import pandas as pd
import numpy as np
from shapely.geometry import Polygon
from shapely.geometry import Point

In [2]:
# dataset name
dataset_pop_2017 = os.path.join('data', 'population_2017.csv')

# read the data to a dataframe
data2017 = pd.read_csv(dataset_pop_2017)

# replace spaces with underscores in the column names
# this allows some columns to be accessed as attributes,
# e.g. data2017.POPULAÇÃO_ESTIMADA; names that still contain a dot
# (such as 'COD._UF') must be accessed as data2017['COD._UF']
data2017.columns = [cols.replace(' ', '_') for cols in data2017.columns]

#print the first five rows
data2017.head()

Unnamed: 0,UF,COD._UF,COD._MUNIC,NOME_DO_MUNICÍPIO,POPULAÇÃO_ESTIMADA
0,RO,11.0,15.0,Alta Floresta D'Oeste,25437.0
1,RO,11.0,23.0,Ariquemes,107345.0
2,RO,11.0,31.0,Cabixi,6224.0
3,RO,11.0,49.0,Cacoal,88507.0
4,RO,11.0,56.0,Cerejeiras,17934.0


In our first example of the use of folium, we will draw only the map of the RN state, so we filter the population data to keep only the cities of RN.

In [3]:
# filtering data about RN state
dataRN = data2017[data2017['UF'] == 'RN']

# sort dataset by city name
dataRN = dataRN.sort_values('NOME_DO_MUNICÍPIO')

#print first five rows
dataRN.head()


Unnamed: 0,UF,COD._UF,COD._MUNIC,NOME_DO_MUNICÍPIO,POPULAÇÃO_ESTIMADA
1075,RN,24.0,109.0,Acari,11333.0
1077,RN,24.0,307.0,Afonso Bezerra,11211.0
1079,RN,24.0,505.0,Alexandria,13827.0
1080,RN,24.0,604.0,Almino Afonso,4854.0
1081,RN,24.0,703.0,Alto do Rodrigues,14365.0


## 3. Geodata 

In order to draw a thematic map, as defined earlier, we need to specify the geographic areas. A common way to do that is with [GeoJSON](http://geojson.org/) files, which have a defined structure to represent the perimeters of the geographic areas as well as their properties.

In our case, we will use the GeoJSON files of the Brazilian states from the [Geodata BR - Brasil](http://geojson.org/) project, which provides one file per state containing the areas of all its cities.

Once loaded with the `json` module, a GeoJSON file becomes a Python dictionary. The files used in this project have the following structure.

#### Keys:

##### type:
The value of this key is 'FeatureCollection', indicating that the other key contains a collection (list) of features.

##### features:
The value of this key is a list of features. Each feature represents one city and follows the structure defined in the GeoJSON specification; once loaded, it is also a Python dictionary with the following structure:

- A <span style="background-color: #F9EBEA; color:##C0392B">geometry</span> key that holds the geometry of the perimeter of the geographic area as a list of points;

- A <span style="background-color: #F9EBEA; color:##C0392B">properties</span> key that holds three pieces of information:
    - description:
    Is the name of the city.
    - id:
    Is the identification code of the city. The value is the concatenation of the UF id (the same value of the <span style="background-color: #F9EBEA; color:##C0392B">COD. UF</span> in the IBGE dataset), with 2 digits, and the city id (the same value of the <span style="background-color: #F9EBEA; color:##C0392B">COD. MUNIC</span> in the IBGE dataset), with 5 digits.
    - name:
    Is the name of the city, the same value of the description.
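
The structure described above can be sketched as a small Python dictionary. This is an illustration only: the `id` is built from the codes of Acari in the IBGE dataset (state 24, city 109), the coordinates are a made-up square rather than a real perimeter, and the `'type': 'Feature'` entry is the one the GeoJSON specification requires for each feature.

```python
import json

# A minimal FeatureCollection with the structure described above.
# The id follows the rule "2-digit state code + 5-digit city code":
# 24 (RN) and 109 (Acari) give '2400109'.
# The coordinates are a made-up square, NOT the real perimeter of the city.
example_geojson = {
    'type': 'FeatureCollection',
    'features': [
        {
            'type': 'Feature',
            'geometry': {
                'type': 'Polygon',
                'coordinates': [[
                    [-36.7, -6.3], [-36.6, -6.3],
                    [-36.6, -6.2], [-36.7, -6.2],
                    [-36.7, -6.3],
                ]],
            },
            'properties': {
                'description': 'Acari',
                'id': '2400109',
                'name': 'Acari',
            },
        },
    ],
}

# Walk through the features and read the properties and the outer ring.
for feature in example_geojson['features']:
    props = feature['properties']
    n_points = len(feature['geometry']['coordinates'][0])
    print(props['id'], props['name'], n_points)

# The dictionary survives a round trip through JSON text unchanged.
round_trip = json.loads(json.dumps(example_geojson))
print(round_trip == example_geojson)
```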

In [4]:
# searching the files in geojson/geojs-xx-mun.json
# where xx is the code of the state (same as COD.UF of the IBGE dataset)
# the code of RN is 24
geo_json_rn_path = os.path.join('geojson', 'geojs-24-mun.json')

# load the data using the 'latin-1' encoding because of the accented characters
with open(geo_json_rn_path, encoding='latin-1') as f:
    geo_json_data_rn = json.load(f)

# print the keys of the GeoJSON file
print(geo_json_data_rn.keys())
# print the value of the 'type' key
print(geo_json_data_rn['type'])
# print the value of the 'features' key
geo_json_data_rn['features']

dict_keys(['type', 'features'])
FeatureCollection


[{'geometry': {'coordinates': [[[-36.6752824479, -6.2695704427],
     [-36.6721661976, -6.2748710057],
     [-36.6621971359, -6.2781206182],
     [-36.6544080838, -6.2718175581],
     [-36.6302770363, -6.2681148661],
     [-36.625658466, -6.2854823428],
     [-36.6151351174, -6.292907263],
     [-36.6042576558, -6.2899699655],
     [-36.595723618, -6.2878348922],
     [-36.592155387, -6.2965905174],
     [-36.5839051708, -6.300143773],
     [-36.5844180215, -6.3059773274],
     [-36.5641014276, -6.3124566042],
     [-36.5596074435, -6.3270377373],
     [-36.5544893832, -6.3304039133],
     [-36.5528973655, -6.3394789997],
     [-36.5476492661, -6.3439752057],
     [-36.5505221875, -6.3536185214],
     [-36.5505219847, -6.3614560301],
     [-36.545503695, -6.3634059823],
     [-36.5420902739, -6.3725037009],
     [-36.517493878, -6.3778133912],
     [-36.5040431445, -6.3860516598],
     [-36.5071574456, -6.3997173571],
     [-36.5306368773, -6.4516718256],
     [-36.5304905779, -6.47143

## 4. The folium library

In this notebook we will use the [folium](https://github.com/python-visualization/folium) Python library to make our thematic maps. This library basically brings Leaflet.js, a JavaScript library, to the Python ecosystem. 

This library is a work in progress and is constantly being updated. As of 02/11/2017, the official version is 0.5.0, but version 0.6.0 is already under development and has some interesting features, such as tooltips for markers.

In the code below, we show a basic example of drawing a map and adding a GeoJSON file to it.

The `folium.Map()` constructor takes these basic parameters:

- **location:**
The coordinates of the center of the map when it is first displayed.

- **zoom_start:**
The zoom level when the map is first displayed.

- **[tiles](http://wiki.openstreetmap.org/wiki/Tiles):**
The style of the map layer. The folium library supports many tiles, including custom tilesets. The default is <span style="background-color: #F9EBEA; color:##C0392B">OpenStreetMap</span>. We will use <span style="background-color: #F9EBEA; color:##C0392B">Stamen Terrain</span> in our maps because it shows the geography of the area, which we considered more suitable for our objective. Another very representative tileset is <span style="background-color: #F9EBEA; color:##C0392B">Mapbox Control Room</span>, which shows green areas whose size varies according to the population.

In [5]:
# Create a map object
m = folium.Map(
    location=[-5.826592, -35.212558],
    zoom_start=7,
    tiles='Stamen Terrain'
)

# Configure geojson layer
folium.GeoJson(geo_json_data_rn).add_to(m)

m

In [None]:
# Create a map object
m = folium.Map(
    location=[-5.826592, -35.212558],
    zoom_start=7,
    tiles='OpenStreetMap'
)

m

In [None]:
# Create a map object
m = folium.Map(
    location=[-5.826592, -35.212558],
    zoom_start=7,
    tiles='Mapbox Control Room'
)

m

## 5. Drawing the choropleth map

Now that we know how to import a GeoJSON file into a map, we can draw our choropleth map. There are two basic ways of doing that:

- Customizing the color (and other properties) of each feature of the GeoJSON file with the [styling function](http://python-visualization.github.io/folium/docs-master/quickstart.html#Styling-function) in the `folium.GeoJson()` method; and

- Using the `map.choropleth()` method and configuring it through its parameters.


### 5.1. Using the `folium.GeoJson()` method

First we need to create a function that maps a value to an RGB color (of the form <span style="background-color: #F9EBEA; color:##C0392B">#RRGGBB</span>). For this, we'll use the <span style="background-color: #F9EBEA; color:##C0392B">colormap tools</span> from `branca.colormap`, as imported in the cell below.

The `linear.color.scale()` method, where `color` stands for the name of the palette (for example `YlGn`), creates a linear scale of that palette from the given parameters: the first is the <span style="background-color: #F9EBEA; color:##C0392B">minimum value</span> and the second is the <span style="background-color: #F9EBEA; color:##C0392B">maximum value</span>.
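
To make the idea of a linear color scale concrete, here is a minimal pure-Python sketch of linear interpolation between two colors. It is not the implementation used by the library: the function names, the two endpoint colors and the population range are assumptions chosen for illustration, and the actual palette may interpolate through more than two color stops.

```python
def hex_to_rgb(color):
    """Convert '#RRGGBB' to a tuple of three integers."""
    color = color.lstrip('#')
    return tuple(int(color[i:i + 2], 16) for i in (0, 2, 4))


def linear_color(value, vmin, vmax, start='#ffffe5', end='#004529'):
    """Interpolate linearly between two colors and return '#RRGGBB'."""
    # position of the value inside [vmin, vmax], clamped to [0, 1]
    t = 0.0 if vmax == vmin else (value - vmin) / (vmax - vmin)
    t = min(max(t, 0.0), 1.0)
    channels = [
        round(a + t * (b - a))
        for a, b in zip(hex_to_rgb(start), hex_to_rgb(end))
    ]
    return '#{:02x}{:02x}{:02x}'.format(*channels)


# hypothetical population range, used only for illustration
print(linear_color(1000, 1000, 6000))   # minimum -> start color
print(linear_color(6000, 1000, 6000))   # maximum -> end color
print(linear_color(3500, 1000, 6000))   # halfway between the two
```

Values outside the range are clamped to the nearest endpoint in this sketch; whether a given colormap library clamps or raises an error should be checked in its documentation.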

In [6]:
from branca.colormap import linear

# colormap yellow and green (YlGn)
colormap_rn = linear.YlGn.scale(
    dataRN.POPULAÇÃO_ESTIMADA.min(),
    dataRN.POPULAÇÃO_ESTIMADA.max())

colormap_rn

After that, we need to convert our dataset into a dictionary-like structure that maps a feature (the city name) to its value (in our case, the population). Here we use a pandas Series indexed by city name, so the name of the city acts as the key and its population as the value.

In [7]:
population_dict_rn = dataRN.set_index('NOME_DO_MUNICÍPIO')['POPULAÇÃO_ESTIMADA']

Now we have all the parameters needed by the `folium.GeoJson()` method to turn it into a choropleth map. We can call it like this:

```python
folium.GeoJson(
    geo_json_data_rn,
    name='Population estimation of RN State in 2017',
    style_function=lambda feature: {
        'fillColor': colormap_rn(population_dict_rn[feature['properties']['description']]),
        'color': 'black',
        'weight': 1,
        'dashArray': '5, 5',
        'fillOpacity': 0.4,
    }
).add_to(m)
```

where,

- **`geo_json_data_rn`:** contains the GeoJSON file to draw in the map;

- **`name`:** is the name of the layer;

- **`style_function`:** a function that is evaluated for each feature (city) of the GeoJSON file. We can use the form `lambda feature: {}` to pass it. In our case, the name of the city stored in `feature['properties']['description']` is used as the key of our dictionary, which returns the population of that city; this value is then passed to the colormap defined previously. The resulting color fills the polygon that defines the city. The other properties are static and customize the appearance of the map.
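
Because the style function is called once per feature, a lookup with a key that is missing from the data raises an error. A minimal sketch of a more defensive style function is shown below. To stay self-contained it uses stand-ins that are assumptions, not the notebook's objects: a plain dictionary instead of the pandas Series, a toy two-color function instead of the colormap, and a gray fallback color for cities without data.

```python
# Stand-ins for the objects used in this notebook (assumptions for the sketch):
# a plain dict instead of the pandas Series, with two values from the RN data.
population_by_city = {'Acari': 11333.0, 'Afonso Bezerra': 11211.0}


def toy_colormap(value):
    # Hypothetical two-color scale, only to keep the sketch self-contained.
    return '#004529' if value > 11300 else '#ffffe5'


MISSING_COLOR = '#808080'  # assumed fallback color for cities without data


def safe_style(feature):
    """Style one feature, falling back to gray when the city has no data."""
    name = feature['properties']['description']
    population = population_by_city.get(name)  # None when the key is missing
    fill = MISSING_COLOR if population is None else toy_colormap(population)
    return {
        'fillColor': fill,
        'color': 'black',
        'weight': 1,
        'dashArray': '5, 5',
        'fillOpacity': 0.4,
    }


known = {'properties': {'description': 'Acari'}}
unknown = {'properties': {'description': 'Some city without data'}}
print(safe_style(known)['fillColor'])
print(safe_style(unknown)['fillColor'])
```

A function like `safe_style` could be passed as `style_function` in place of the lambda; a pandas Series also offers a `.get()` method, so the same pattern should carry over to the Series used in this notebook.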

Finally, we need to insert the legend of the map explicitly. We do this by adding the colormap to our map with `colormap_rn.add_to(m)`.


In [None]:
# Create a map object
m = folium.Map(
    location=[-5.826592, -35.212558],
    zoom_start=7,
    tiles='Stamen Terrain'
)

# customize the GeoJSON layer in order to make a choropleth map
folium.GeoJson(
    geo_json_data_rn,
    name='Population estimation of RN State in 2017',
    style_function=lambda feature: {
        'fillColor': colormap_rn(population_dict_rn[feature['properties']['description']]),
        'color': 'black',
        'weight': 1,
        'dashArray': '5, 5',
        'fillOpacity': 0.4,
    }
).add_to(m)

# add a legend
colormap_rn.caption = 'Population estimation of RN State in 2017'
colormap_rn.add_to(m)

# add a layer control
folium.LayerControl().add_to(m)

# print the map
m

As we can see, there is an error in the execution of the code above, because the city <span style="background-color: #F9EBEA; color:##C0392B">Presidente Juscelino</span> doesn't exist in the IBGE dataset and therefore is not in the dictionary we created. In this case, we are passing a key to the dictionary that doesn't exist.

This happens because the city <span style="background-color: #F9EBEA; color:##C0392B">Presidente Juscelino</span> changed its name to Serra Caiada. To fix it, we change the description of the feature to match the name in the dataset.

In [9]:
# http://cidades.ibge.gov.br/painel/historico.php?codmun=241030
# the city of Presidente Juscelino changed its name to Serra Caiada
geo_json_data_rn['features'][112]['properties']['description'] = 'Serra Caiada'
geo_json_data_rn['features'][112]['properties']['name'] = 'Serra Caiada'
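
The fix above depends on the feature being at position 112 of the list. A sketch of an alternative that searches by name instead is given below; `rename_city` is a hypothetical helper and the two-feature GeoJSON is a toy stand-in for the real file.

```python
def rename_city(geojson, old_name, new_name):
    """Rename every feature whose description equals old_name.

    Returns the number of features that were renamed.
    """
    renamed = 0
    for feature in geojson['features']:
        props = feature['properties']
        if props.get('description') == old_name:
            props['description'] = new_name
            props['name'] = new_name
            renamed += 1
    return renamed


# Toy GeoJSON with only the properties needed for this sketch.
toy_geojson = {
    'type': 'FeatureCollection',
    'features': [
        {'properties': {'description': 'Acari', 'name': 'Acari'}},
        {'properties': {'description': 'Presidente Juscelino',
                        'name': 'Presidente Juscelino'}},
    ],
}

count = rename_city(toy_geojson, 'Presidente Juscelino', 'Serra Caiada')
print(count, toy_geojson['features'][1]['properties'])
```

Checking the returned count is a cheap way to notice when the expected name is not present in the file.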

In [11]:
# Create a map object
m = folium.Map(
    location=[-5.826592, -35.212558],
    zoom_start=7,
    tiles='Stamen Terrain'
)

# customize the GeoJSON layer in order to make a choropleth map
folium.GeoJson(
    geo_json_data_rn,
    name='Population estimation of RN State in 2017',
    style_function=lambda feature: {
        'fillColor': colormap_rn(population_dict_rn[feature['properties']['description']]),
        'color': 'black',
        'weight': 1,
        'dashArray': '5, 5',
        'fillOpacity': 0.4,
    }
).add_to(m)

# add a legend
colormap_rn.caption = 'Population estimation of RN State in 2017'
colormap_rn.add_to(m)

# add a layer control
folium.LayerControl().add_to(m)

# print the map
m

### 5.2. Using the `map.choropleth()` method

In this subsection we show how to use the `map.choropleth()` method to draw a choropleth map. This time we draw all the states of the northeast of Brazil, with a separate choropleth map for each state. Each state therefore has its own color palette and scale, which also requires a separate legend for each state.

The `map.choropleth()` method has many parameters; the full description can be found in the [documentation](http://python-visualization.github.io/folium/docs-master/modules.html#module-folium.map). Below, we describe the parameters used in this notebook.

```python
m.choropleth(
    geo_data=geo_json_data[state],
    name='Population estimation of ' + state + ' State in 2017',
    data=data[state],
    columns=['NOME_DO_MUNICÍPIO', 'POPULAÇÃO_ESTIMADA'],
    key_on='feature.properties.description',
    fill_color= fill_colors[j],
    legend_name='Population estimation of ' + state + ' State in 2017',
    highlight=True,
    threshold_scale = threshold_scales[state]
)
```

- **geo_data**: the GeoJSON file to draw the choropleth;
- **name**: name of the layer;
- **data**: data to bind to the GeoJSON, as a Pandas Dataframe or Series;
- **columns**: set the columns to be bound. By definition, the first column acts as the key, while the second column is the value;
- **key_on**: variable in the GeoJSON file to bind the data to. Must always start with ‘feature’ and be in JavaScript object notation. In our case, the key is the name of the city, which is located in `feature.properties.description`;
- **fill_color**: area fill color;
- **legend_name**: the caption of the data legend;
- **highlight**: true means that when the mouse passes over the geometry of a feature, that feature is highlighted;
- **threshold_scale**: data range for D3 threshold scale. The threshold_scale can have a maximum of 6 values, which means that the choropleth map uses up to 6 color steps. It plays a role similar to the definition of the colormap in the `folium.GeoJson()` method. 
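
To illustrate how a dotted `key_on` path such as `feature.properties.description` points into a feature, the sketch below resolves such a path against a feature dictionary. This is only a conceptual illustration; `resolve_key_on` is a hypothetical helper and is not how folium implements the binding internally.

```python
def resolve_key_on(feature, key_on):
    """Follow a dotted path such as 'feature.properties.description'."""
    parts = key_on.split('.')
    if parts[0] != 'feature':
        raise ValueError("key_on must start with 'feature'")
    value = feature
    # every remaining part is a key of the nested dictionaries
    for part in parts[1:]:
        value = value[part]
    return value


# Toy feature with the properties described in section 3.
toy_feature = {
    'properties': {'description': 'Acari', 'id': '2400109', 'name': 'Acari'},
}

print(resolve_key_on(toy_feature, 'feature.properties.description'))
print(resolve_key_on(toy_feature, 'feature.properties.id'))
```

The value obtained this way is what gets compared with the first column listed in `columns`, which is why the two must use exactly the same spelling.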

In [None]:
# list of all the states of the northeast of Brazil
states = ['MA', 'PI', 'CE', 'RN', 'PB', 'PE', 'AL', 'SE', 'BA']

# creation of a dictionary containing the IBGE dataset for each state (with the UF being the key) 
data = {}
for i in states:
    data[i] = data2017[data2017['UF'] == i]


In [None]:
# list of color palettes to use in each choropleth map (one for each state)
fill_colors = ['BuGn', 'OrRd', 'PuBu', 'GnBu', 'OrRd', 'BuGn', 'PuBu', 'GnBu', 'PuBuGn']

In [None]:
# creation of a dictionary containing the GeoJSON data for each state (with the UF being the key) 
geo_json_data = {}
j = 21
for i in states:
    filename = 'geojs-' + str(j) +'-mun.json'
    path = os.path.join('geojson', filename)  
    
    geo_json_data[i] = json.load(open(path, encoding='latin-1'))
    
    j = j+1
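
The loop above relies on the order of the `states` list matching the consecutive codes 21 to 29. A sketch of an alternative that writes the state codes out explicitly is shown below; `state_codes` and `geojson_paths` are illustrative names, and the sketch only builds the file paths without opening any file.

```python
import os

# IBGE codes of the northeastern states, written out explicitly so that the
# result does not depend on the order of a list.
state_codes = {
    'MA': 21, 'PI': 22, 'CE': 23, 'RN': 24, 'PB': 25,
    'PE': 26, 'AL': 27, 'SE': 28, 'BA': 29,
}

# Path of the GeoJSON file of each state, following the same naming scheme.
geojson_paths = {
    uf: os.path.join('geojson', 'geojs-{}-mun.json'.format(code))
    for uf, code in state_codes.items()
}

print(geojson_paths['RN'])
```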
    


As can be observed in the description of the parameters of the `map.choropleth()` method, we basically don't need to manipulate our data; we only need to create a threshold_scale to scale the fill colors of the choropleth map.

In [None]:
# creation of a dictionary containing the threshold_scale for each state (with the UF being the key) 
threshold_scales = {}
for i in states:
    threshold_scales[i] = np.linspace(data[i]['POPULAÇÃO_ESTIMADA'].min(),
                              data[i]['POPULAÇÃO_ESTIMADA'].max(), 6, dtype=int).tolist()
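
The sketch below shows what `np.linspace` produces for a threshold scale and how the values fall into the resulting classes. The minimum, the maximum and the list of populations are hypothetical numbers chosen only to keep the arithmetic easy to follow.

```python
import numpy as np

# Hypothetical minimum and maximum populations, chosen only for illustration.
pop_min, pop_max = 1000, 6000

# Six evenly spaced thresholds between the minimum and the maximum.
thresholds = np.linspace(pop_min, pop_max, 6, dtype=int).tolist()
print(thresholds)

# Hypothetical populations: several small cities and one much larger city.
populations = [1000, 1200, 1500, 2000, 6000]

# Number of cities that fall between each pair of consecutive thresholds.
counts, _ = np.histogram(populations, bins=thresholds)
print(counts.tolist())
```

With evenly spaced thresholds, a single very large city stretches the scale and leaves most of the small cities in the first classes. This is a property of equal-width classes in general, not a claim about how the real data is distributed.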


In [None]:
# Create a map object
m = folium.Map(
    location = [-5.826592, -35.212558],
    zoom_start = 4,
    tiles='Stamen Terrain'
)

# Draw a choropleth map for each state
for j, state in enumerate(states):
    m.choropleth(
        geo_data=geo_json_data[state],
        name='Population estimation of ' + state + ' State in 2017',
        data=data[state],
        columns=['NOME_DO_MUNICÍPIO', 'POPULAÇÃO_ESTIMADA'],
        key_on='feature.properties.description',
        fill_color= fill_colors[j],
        legend_name='Population estimation of ' + state + ' State in 2017',
        highlight=True,
        threshold_scale = threshold_scales[state]
    )
    
    # Draw a CircleMarker inside each city. This would help in the identification of the cities.
    # Unfortunately, when we add this code, the output map is not drawn
    '''for city in geo_json_data[state]['features']:
        # get the name of neighborhood
        name = city['properties']['description']
        # take the coordinates (lat,log) of neighborhood
        geom = city['geometry']['coordinates']
        # create a polygon using all coordinates
        polygon = Polygon(geom[0])

        folium.CircleMarker([polygon.centroid.y, polygon.centroid.x],
                    radius=2,
                    popup=name,
                    tooltip=name,
                    color='red').add_to(m)'''

    

In [None]:
# add a layer control
folium.LayerControl().add_to(m)

# print the map
m   

As can be observed in the map drawn above, the many legends make the choropleth map difficult to analyze. The `folium` library doesn't allow (at least not yet) the customization of the legends. Therefore, it is better to draw either a unified choropleth map (with a single color scale) or separate choropleth maps.

Also, we didn't correct the mismatch in the name of the city Presidente Juscelino / Serra Caiada, yet there were no errors in the execution of the code above and the map was fully drawn. This happens because we no longer build a dictionary ourselves, so there is no lookup of a nonexistent key.

Even so, we know there is at least one mismatch between the GeoJSON file and the DataFrame. So the question is: what happens to the color of the city of Presidente Juscelino if there is no corresponding population value in the DataFrame passed?

To help answer this question, we add a CircleMarker to every city for which we found a difference between the name in the GeoJSON file and the name in the IBGE dataset.

In [None]:
# draw a CircleMarker in the cities of 'RN' that has divergences between the GeoJSON and the dataset
for city in geo_json_data['RN']['features']:
    if city['properties']['description'] == 'Presidente Juscelino':
        name = city['properties']['description']
        geom = city['geometry']['coordinates']
        polygon = Polygon(geom[0])
        
        folium.CircleMarker(
            location=[polygon.centroid.y, polygon.centroid.x],
            radius=1.5,
            popup=name,
            color='red'
            ).add_to(m)

# draw a CircleMarker in the cities of 'PB' that has divergences between the GeoJSON and the dataset
for city in geo_json_data['PB']['features']:
    if (city['properties']['description'] == 'Quixabá' or 
        city['properties']['description'] == 'Santarém' or 
        city['properties']['description'] == 'Seridó' or 
        city['properties']['description'] == 'Campo de Santana'):
        
        name = city['properties']['description']
        geom = city['geometry']['coordinates']
        polygon = Polygon(geom[0])
        
        folium.CircleMarker(
            location=[polygon.centroid.y, polygon.centroid.x],
            radius=1.5,
            popup=name,
            color='red'
            ).add_to(m)
        
# draw a CircleMarker in the cities of 'PE' that has divergences between the GeoJSON and the dataset
for city in geo_json_data['PE']['features']:
    if (city['properties']['description'] == 'Belém de São Francisco' or 
        city['properties']['description'] == 'Iguaraci' or 
        city['properties']['description'] == 'Lagoa do Itaenga'):
        
        name = city['properties']['description']
        geom = city['geometry']['coordinates']
        polygon = Polygon(geom[0])
        
        folium.CircleMarker(
            location=[polygon.centroid.y, polygon.centroid.x],
            radius=1.5,
            popup=name,
            color='red'
            ).add_to(m)
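
The cells above place each marker at the centroid of `Polygon(geom[0])`, which assumes that every geometry is a simple `Polygon`. In GeoJSON, a `MultiPolygon` has one more level of nesting, so `geom[0]` is then a whole polygon rather than a ring. The sketch below is a pure-Python stand-in for the shapely call that handles both cases; `outer_ring` and `ring_centroid` are hypothetical helpers, and for a `MultiPolygon` only the first part is used, which is a simplification.

```python
def outer_ring(geometry):
    """Return the outer ring of a Polygon, or of the first part of a MultiPolygon."""
    coords = geometry['coordinates']
    if geometry['type'] == 'Polygon':
        return coords[0]
    if geometry['type'] == 'MultiPolygon':
        return coords[0][0]
    raise ValueError('unsupported geometry type: ' + geometry['type'])


def ring_centroid(ring):
    """Area-weighted centroid (x, y) of a simple ring (shoelace formula).

    Raises ZeroDivisionError for a degenerate ring with zero area.
    """
    area2 = cx = cy = 0.0
    # pair every point with the next one, wrapping around at the end
    for p, q in zip(ring, ring[1:] + ring[:1]):
        cross = p[0] * q[1] - q[0] * p[1]
        area2 += cross
        cx += (p[0] + q[0]) * cross
        cy += (p[1] + q[1]) * cross
    return cx / (3.0 * area2), cy / (3.0 * area2)


# Made-up 2 x 2 square, used as a Polygon and as a one-part MultiPolygon.
square = [[0, 0], [2, 0], [2, 2], [0, 2], [0, 0]]
polygon = {'type': 'Polygon', 'coordinates': [square]}
multipolygon = {'type': 'MultiPolygon', 'coordinates': [[square]]}

print(ring_centroid(outer_ring(polygon)))
print(ring_centroid(outer_ring(multipolygon)))
```

The result is an `(x, y)` pair, that is, longitude first; a marker location expects `[latitude, longitude]`, so the two values have to be swapped, as the cells above do with `centroid.y` and `centroid.x`.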

In [None]:
# print the map
m

It's possible to see that all the cities with a CircleMarker (those whose name differs between the GeoJSON file and the dataset) are filled with the same default color, regardless of the palette of their state. Therefore, one possible explanation is that when a feature has no match in the data passed, its fill color is not applied. Another possibility is that a value of zero is assumed, but since the minimum value of the threshold_scale is greater than one, the fill color remains the default one.

Either way, it's safe to conclude that for the `map.choropleth()` method, the user needs to ensure that every feature has a corresponding match in the data passed.
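
One way to check this before drawing is to compare the set of names in the GeoJSON file with the set of names in the data. The sketch below does this with a hypothetical helper, `unmatched_names`, on a toy example built around the Presidente Juscelino / Serra Caiada case.

```python
def unmatched_names(geojson, data_names, key='description'):
    """Return (names only in the GeoJSON, names only in the data), both sorted."""
    geo_names = {f['properties'][key] for f in geojson['features']}
    data_names = set(data_names)
    return sorted(geo_names - data_names), sorted(data_names - geo_names)


# Toy inputs: the GeoJSON still has the old name, the data has the new one.
toy_geojson = {
    'type': 'FeatureCollection',
    'features': [
        {'properties': {'description': 'Acari'}},
        {'properties': {'description': 'Presidente Juscelino'}},
    ],
}
toy_data_names = ['Acari', 'Serra Caiada']

only_in_geojson, only_in_data = unmatched_names(toy_geojson, toy_data_names)
print(only_in_geojson)
print(only_in_data)
```

With the real objects, the second argument could be the `NOME_DO_MUNICÍPIO` column of the state's DataFrame. Two empty lists would mean that every feature has a match.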

Therefore, to eliminate the poor visualization caused by the number of legends and the name mismatches between the GeoJSON file and the IBGE dataset, we will draw another choropleth map, using the `folium.GeoJson()` method.

In this case, we use the GeoJSON file of Brazil as the base to create another GeoJSON structure with all the cities of the northeast, selected by the first two digits of `properties.id`, which correspond to the ID of the state.

In [None]:
# searching the files in geojson/geojs-xx-mun.json
br_path = os.path.join('geojson', 'geojs-100-mun.json')

# load the data using the 'latin-1' encoding because of the accented characters
with open(br_path, encoding='latin-1') as f:
    geo_json_br = json.load(f)

In [None]:
# creates a list containing all the cities of the northeast
# this is verified by the first two digits of the properties.id
northeast_codes = {'21', '22', '23', '24', '25', '26', '27', '28', '29'}
cities = []
for city in geo_json_br['features']:
    if city['properties']['id'][:2] in northeast_codes:
        cities.append(city)

# creates a dictionary and inserts all the cities as the value of the key 'features'
# together with the 'type' key, this is the format of a GeoJSON file
geo_json_NE = {'type': 'FeatureCollection'}
geo_json_NE['features'] = cities


We also create only one dataframe with the population of all the cities of the northeastern states of Brazil.

In [None]:
# filters the dataset of the IBGE for the population estimation of all the northeast states
# .copy() makes dataNE independent of data2017, so new columns can be added safely
northeast_states = ['MA', 'PI', 'CE', 'RN', 'PB', 'PE', 'AL', 'SE', 'BA']
dataNE = data2017[data2017['UF'].isin(northeast_states)].copy()

To eliminate the problems caused by mismatched city names between the GeoJSON file and the DataFrame, we create a new column in the DataFrame holding the full `id` of the city, with the same syntax as the `id` in the GeoJSON file: the concatenation of the <span style="background-color: #F9EBEA; color:##C0392B">state id (COD._UF) and the city id (COD._MUNIC)</span>, with 7 digits in total. The GeoJSON file and the DataFrame can then be joined on the `id`.

In [None]:
# function that concatenates the 'COD._UF' and the 'COD._MUNIC'
# in order to build the 'id' with the same syntax as the GeoJSON file
def column_id(row):
    return str(int(row['COD._UF'])) + str(int(row['COD._MUNIC'])).zfill(5)

# creates the column 'id' in the dataframe, in order to join it with the GeoJSON file
dataNE['id'] = dataNE.apply(column_id, axis='columns')
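
As a worked example of the rule above, the sketch below builds the 7-digit `id` from the two float codes of Acari (state 24, city 109), as they appear in the RN rows shown in section 2. `make_city_id` is a hypothetical stand-alone version of the function above that works on plain numbers instead of a DataFrame row.

```python
def make_city_id(state_code, city_code):
    """Concatenate a 2-digit state code and a zero-padded 5-digit city code."""
    return '{:02d}{:05d}'.format(int(state_code), int(city_code))


# Acari: COD._UF = 24.0 and COD._MUNIC = 109.0 in the IBGE dataset
acari_id = make_city_id(24.0, 109.0)
print(acari_id)
print(len(acari_id))
```

The zero padding is what keeps every `id` at 7 characters, so that the first two characters can always be read as the state code.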

In [None]:
# print the first five rows of the new dataNE
dataNE.head()

Now we create the dictionary with the new column `id` being the key.

In [None]:
population_dict_NE = dataNE.set_index('id')['POPULAÇÃO_ESTIMADA']

In [None]:
colormap_NE = linear.YlOrRd.scale(
                dataNE.POPULAÇÃO_ESTIMADA.min(),
                dataNE.POPULAÇÃO_ESTIMADA.max() )

In [None]:
m = folium.Map(
    location=[-5.826592, -35.212558],
    zoom_start=6,
    tiles='Stamen Terrain'
)

folium.GeoJson(
    geo_json_NE,
    name='Population of northeastern Brazilian states in 2017',
    style_function=lambda feature: {
        'fillColor': colormap_NE(population_dict_NE[feature['properties']['id']]),
        'color': 'black',
        'weight': 1,
        'dashArray': '3, 3',
        'fillOpacity': 0.7,
    }
).add_to(m)

# Draw a CircleMarker inside each city. This would help in the identification of the cities.
# Unfortunately, when we add this code, the output map is not drawn
'''for city in geo_json_NE['features']:
    # get the name of the city
    name = city['properties']['description']
    # take the coordinates of the city
    geom = city['geometry']['coordinates']
    # create a polygon using all coordinates
    polygon = Polygon(geom[0])

    folium.CircleMarker([polygon.centroid.y, polygon.centroid.x],
                radius=2,
                popup=name,
                tooltip=name,
                color='red').add_to(m)'''

colormap_NE.caption = 'Population of northeastern Brazilian states in 2017'
colormap_NE.add_to(m)


# add a layer control
folium.LayerControl().add_to(m)

# print the map
m   

## 6. Conclusion

As this notebook shows, creating a choropleth map can be really easy with the `folium` library, although there are a few tricky issues that the user needs to be aware of. 

As for the visualization of the data itself, the choropleth map makes the analysis of the population of a region really straightforward: it is easy to see that the state capitals have the largest populations, especially Salvador and Fortaleza. However, since there are no hint capabilities (which could show the name of the city as the mouse moves over it), the user may have trouble identifying each city. A workaround could be to insert a Marker in every city that displays the name of the city when the user clicks on it, but when we inserted these Markers the output did not show up in the notebook. Since that problem could be related to the computer configuration, the code for the Markers is left commented out in this notebook, so that others can try to insert them.