# EEC2006 - Data Science
# Project - Uber x Choropleth: mean wait time for neighborhoods in Natal - RN

## Alex Furtunato
## Victor Hugo - 20171003230

## 1. Introduction

In this notebook we analyze the mean Uber wait time in every neighborhood of Natal - RN. We visualize this information with a choropleth map, which gives an intuitive view of how the mean wait time varies across neighborhoods. To collect the necessary data, we use the Uber API, plus some other APIs to process the coordinates we query.

This notebook is organized as follows. Section 2 describes the GeoJSON file we use. Section 3 gives a brief explanation of the Uber API. Section 4 shows how the wait-time dataset was generated and discusses the other APIs we tested. Section 5 shows the generated choropleth maps and offers some hypotheses about the findings.

In [1]:
import os
import folium
import json
import pandas as pd
from branca.colormap import linear
import numpy as np
from shapely.geometry import Polygon
from shapely.geometry import Point
from numpy import random

from uber_rides.session import Session
from uber_rides.client import UberRidesClient
import csv
import datetime as dt

## 2. GeoJSON - Neighborhoods of Natal - RN

To draw the choropleth map, we first need a GeoJSON file of the area to be analyzed. To get it, we used [Overpass turbo](http://overpass-turbo.eu/) to query the neighborhoods of Natal - RN with the code below (written in the Overpass query language, not Python). The site allows the output to be downloaded as a GeoJSON file. More details about the Overpass turbo project can be found on its [wiki](http://wiki.openstreetmap.org/wiki/Overpass_turbo) page.

```
[out:json][timeout:25];
{{geocodeArea:Natal RN Brasil}}->.searchArea;
(
  relation["admin_level"="10"](area.searchArea);
);
out body;
>;
out skel qt;
```

The code below imports the GeoJSON file and prints some useful information.

In [2]:
# path to the GeoJSON file with the Natal neighborhoods
natal_neigh = os.path.join('geojson', 'natal.geojson')

# load the data using UTF-8 encoding (neighborhood names may contain accented characters)
with open(natal_neigh, encoding='UTF-8') as f:
    geo_json_natal = json.load(f)

In [None]:
# print the keys of the dictionary
print(geo_json_natal.keys())
# print the list of features (neighborhoods)
geo_json_natal['features']

In [29]:
neighborhood = []
# list all neighborhoods
for neigh in geo_json_natal['features']:
    neighborhood.append(neigh['properties']['name'])

In [None]:
# print the number of neighborhoods
len(neighborhood)

In [None]:
# print all neighborhoods
neighborhood

The map below displays the imported GeoJSON layer, which lets us visually confirm that the file is correct.

In [None]:
# Create a map object
m = folium.Map(
    location=[-5.802592, -35.212558],
    zoom_start=12,
    tiles='OpenStreetMap'
)

# Configure geojson layer
folium.GeoJson(geo_json_natal).add_to(m)

# print the map
m

## 3. Uber API

Now that we have the GeoJSON file defining all the neighborhoods of Natal - RN, which lets us draw the map, we need the mean wait times themselves. To get these estimates we have to query Uber's services, which Uber makes available to developers through its own [API](https://developer.uber.com/docs).

The initial steps to start using the Uber API are:

1. Create a user at https://developer.uber.com/
2. Create an app at https://developer.uber.com/
3. Install the `uber-rides` package:
```
!pip install uber-rides
```

Once you have created your app, you are given a `server_token`, a `client_id`, and a `client_secret`. These are used to authenticate your application and the rider when calling the API.
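The notebook reads these credentials from a local `keys.json` file (with `'uber'` and `'google'` entries) rather than hard-coding them. A minimal sketch of such a loader is shown below; the fallback to environment variables and the variable names `UBER_SERVER_TOKEN` and `GOOGLE_API_KEY` are hypothetical conventions, not something the Uber or Google client libraries require.

```python
import json
import os

# Hypothetical environment variable names used as a fallback for each key
ENV_FALLBACK = {'uber': 'UBER_SERVER_TOKEN', 'google': 'GOOGLE_API_KEY'}


def load_keys(path='keys.json'):
    """Load API credentials from a JSON file like {"uber": "...", "google": "..."}.

    Entries missing from the file (or a missing file) are filled from the
    hypothetical environment variables in ENV_FALLBACK, when they are set.
    """
    keys = {}
    if os.path.exists(path):
        with open(path, encoding='utf-8') as f:
            keys.update(json.load(f))
    for name, env_var in ENV_FALLBACK.items():
        if name not in keys and env_var in os.environ:
            keys[name] = os.environ[env_var]
    return keys
```

Usage would be `keys = load_keys()` followed by `Session(server_token=keys['uber'])`; keeping `keys.json` out of version control avoids publishing the tokens.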

In [20]:
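from uber_rides.session import Session
from uber_rides.client import UberRidesClient

# load the API credentials (the Uber server token is stored in keys.json)
with open('keys.json') as f:
    keys = json.load(f)

session = Session(server_token=keys['uber'])
client = UberRidesClient(session)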

### 3.1. Price Estimates and Time Estimates

We now show how the Uber API works using the two basic functions that are probably the most important ones: ***Price Estimates and Time Estimates***.

The [Price Estimates endpoint](https://developer.uber.com/docs/riders/references/api/v1.2/estimates-price-get) returns an estimated price range for each product offered at a given location. The request has the following form:

```python
response = client.get_price_estimates(
    start_latitude=-5.8323,
    start_longitude=-35.2054,
    end_latitude= -5.8734,
    end_longitude=-35.1776,
    seat_count=2
)
```

where,

- **`start_latitude`**: (float) Latitude component of start location.
- **`start_longitude`**: (float) Longitude component of start location.
- **`end_latitude`**: (float) Latitude component of end location.
- **`end_longitude`**: (float) Longitude component of end location.
- **`seat_count`** (optional): (int) The number of seats required for uberPOOL. Default and maximum value is 2.


The price estimate is provided as a formatted string with the full price range and the localized currency symbol. The response also includes low and high estimates, and the ISO 4217 currency code for situations requiring currency conversion. When surge is active for a particular product, its surge_multiplier will be greater than 1, but the price estimate already factors in this multiplier. 

The response object also exposes the parsed body through its `json` attribute, which is very handy. Each entry of the `prices` list in that body has the following fields:

- **`product_id`**: (string) Unique identifier representing a specific product for a given latitude & longitude. For example, uberX in San Francisco will have a different product_id than uberX in Los Angeles.
- **`currency_code`**: (string) ISO 4217 currency code.
- **`display_name`**: (string) Display name of product.
- **`localized_display_name`**: (string) Localized display name of product.
- **`estimate`**: (string) Formatted string of estimate in local currency of the start location. Estimate could be a range, a single number (flat rate) or “Metered” for TAXI.
- **`minimum`**: (int) Minimum price for product.
- **`low_estimate`**: (int) Lower bound of the estimated price.
- **`high_estimate`**: (int) Upper bound of the estimated price.
- **`surge_multiplier`**: (float) Expected surge multiplier. Surge is active if surge_multiplier is greater than 1. Price estimate already factors in the surge multiplier.
- **`duration`**: (int) Expected activity duration (in seconds). Always show duration in minutes.
- **`distance`**: (float) Expected activity distance (in miles).
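As an illustration of how these fields can be used, the sketch below summarizes a price-estimate body into the midpoint of the price range, a surge flag, and the duration in minutes. The `sample_response` values are made up for illustration (only their shape follows the field list above); with a live client you would pass `response.json` instead.

```python
# Hypothetical body shaped like response.json['prices']; all values are made up.
sample_response = {
    'prices': [
        {'display_name': 'uberX', 'currency_code': 'BRL', 'estimate': 'R$14-18',
         'low_estimate': 14, 'high_estimate': 18, 'surge_multiplier': 1.0,
         'duration': 900, 'distance': 4.1},
        {'display_name': 'UberSELECT', 'currency_code': 'BRL', 'estimate': 'R$20-26',
         'low_estimate': 20, 'high_estimate': 26, 'surge_multiplier': 1.3,
         'duration': 900, 'distance': 4.1},
    ]
}


def summarize_prices(body):
    """Return (name, midpoint price, surge active?, duration in minutes) per product."""
    summary = []
    for price in body.get('prices', []):
        low, high = price.get('low_estimate'), price.get('high_estimate')
        # Metered products (e.g. TAXI) may not carry numeric bounds
        midpoint = (low + high) / 2 if low is not None and high is not None else None
        # The estimate already includes the surge, so only flag it, do not multiply
        surge = price.get('surge_multiplier', 1.0) > 1
        minutes = price.get('duration', 0) / 60  # duration is reported in seconds
        summary.append((price['display_name'], midpoint, surge, minutes))
    return summary
```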


In [None]:
# makes a request to get the price estimate
response = client.get_price_estimates(
    start_latitude=-5.8323,
    start_longitude=-35.2054,
    end_latitude= -5.8734,
    end_longitude=-35.1776,
    seat_count=2
)

In [None]:
# print the type of the variable 'response'
type(response)

In [None]:
# print the json format of the response
response.json

In [None]:
# print the price estimate for each product
for price in response.json['prices']:
    print('Product: %s\nPrice estimate: %s\n' % (price['display_name'], price['estimate']))

The [Time Estimates endpoint](https://developer.uber.com/docs/riders/references/api/v1.2/estimates-time-get) returns ETAs (estimated times of arrival) for all products currently available at a given location, with each product's ETA expressed as an integer number of seconds. The request has the following form:

```python
response = client.get_pickup_time_estimates( 
                            start_latitude=-5.8323, 
                            start_longitude=-35.2054,
                            product_id='65cb1829-9761-40f8-acc6-92d700fe2924'
                            )
```

where,

- **`start_latitude`**: (float) Latitude component of start location.
- **`start_longitude`**: (float) Longitude component of start location.
- **`product_id`** (optional): (string) Unique identifier representing a specific product for a given latitude & longitude.

If a product returned by `get_products()` is not returned by this endpoint for a given latitude/longitude pair, then none of that product is currently available to request. Uber recommends calling this endpoint every minute to provide the most accurate, up-to-date ETAs. In some markets, the list of products returned by this endpoint may vary with the time of day, due to restrictions on when a product may be used.

The response has the following fields:

- **`product_id`**: (string) Unique identifier representing a specific product for a given latitude & longitude. For example, uberX in San Francisco will have a different product_id than uberX in Los Angeles.
- **`localized_display_name`**: (string) Localized display name of product.
- **`display_name`**: (string) Display name of product.
- **`estimate`**: (int) ETA for the product (in seconds).
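Uber's documentation treats a product that appears in the Products endpoint (`client.get_products()`) but is missing from the Time Estimates response as currently unavailable at that location. The sketch below applies that rule and converts the ETAs from seconds to minutes. The two bodies and their product IDs are hypothetical placeholders shaped like `response.json` for each endpoint, and the `display_name` field in the products body is an assumption.

```python
def unavailable_products(products_body, times_body):
    """Names of products listed by the Products endpoint but absent from the time estimates."""
    with_eta = {t['product_id'] for t in times_body.get('times', [])}
    return [p['display_name'] for p in products_body.get('products', [])
            if p['product_id'] not in with_eta]


def etas_in_minutes(times_body):
    """Map display name -> ETA in minutes (the API reports seconds)."""
    return {t['display_name']: t['estimate'] / 60 for t in times_body.get('times', [])}


# Hypothetical response bodies; IDs are placeholders, not real Uber product IDs
products_body = {'products': [
    {'product_id': 'id-x', 'display_name': 'uberX'},
    {'product_id': 'id-select', 'display_name': 'UberSELECT'},
]}
times_body = {'times': [
    {'product_id': 'id-x', 'display_name': 'uberX',
     'localized_display_name': 'uberX', 'estimate': 360},
]}
```

If the time-estimates list comes back empty for an unavailable product, indexing it with `[0]` (as the collection server in section 4.3 does) presumably raises an `IndexError`; checking availability explicitly, as sketched here, is one alternative.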


In [None]:
wait_time = client.get_pickup_time_estimates( 
                            start_latitude=-5.8323, 
                            start_longitude=-35.2054,
                            product_id='65cb1829-9761-40f8-acc6-92d700fe2924'
                            )
wait_time.json

## 4. Dataset

To build a representative dataset of the Uber wait time in all neighborhoods of Natal, we adopted the following assumptions:

- We must have data for every day of the week, including the weekend;
- We must have data for every period of the day;
- We must have as many points as possible;
- We must have data for every Uber product available at each point;
- The points must be chosen randomly inside each neighborhood;
- The chosen points must be **valid**. We define a point as **invalid** if it obviously cannot be reached by an Uber car, e.g., a point in the ocean.

### 4.1. Interval of the requests

The Uber API is limited to 2,000 requests per hour. Therefore, we need to choose a time interval between request periods that gives us as many points as possible while staying within the API limit.

We decided that each period (one round of requests covering all neighborhoods) should request two points per neighborhood. For each point, we make one request to get the products available at that point and one more request for each available product. Since at most 2 products are available in Natal - RN (uberX and UberSELECT), this gives at most 6 requests per neighborhood. With 36 neighborhoods, that is 216 requests per period. Therefore, we can run at most 2,000 / 216 ≈ 9.26 periods per hour, which corresponds to an interval of about 6.48 minutes. To be safe, we chose an interval of 7 minutes between periods.

In [None]:
from math import ceil

number_of_points_per_neigh = 2
max_requests_per_point = 3
max_requests_per_neigh = number_of_points_per_neigh * max_requests_per_point
print('Maximum number of requests per neighborhood: ', max_requests_per_neigh)

number_neigh = len(neighborhood)
print('Number of neighborhoods: ', number_neigh)

max_request_per_period = number_neigh * max_requests_per_neigh
print('Maximum number of requests per period: ', max_request_per_period)

max_number_request_per_hour = 2000
max_number_periods_per_hour = max_number_request_per_hour / max_request_per_period
print('Maximum number of periods per hour: ', max_number_periods_per_hour)

interval_time = 60 / max_number_periods_per_hour
print('Chosen interval time (in minutes): ', ceil(interval_time))


### 4.2. Choice of points

If we simply choose random points inside each neighborhood, we will inevitably get some "invalid" points, e.g., in the middle of a river or at the top of Morro do Careca. The code below shows this happening.

In [None]:
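# return `number` random points inside the polygon (rejection sampling: draw uniformly
# in the polygon's bounding box and keep the points that fall inside); x is longitude, y is latitude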
def generate_random(number, polygon, neighborhood):
    list_of_points = []
    minx, miny, maxx, maxy = polygon.bounds
    counter = 0
    while counter < number:
        x = random.uniform(minx, maxx)
        y = random.uniform(miny, maxy)
        pnt = Point(x, y)
        if polygon.contains(pnt):
            list_of_points.append([x,y,neighborhood])
            counter += 1
    return list_of_points
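`generate_random` is a rejection sampler: it draws uniformly in the bounding box and keeps only the points the polygon contains, so on average it needs about (bounding-box area / polygon area) draws per accepted point. The dependency-free sketch below shows the same idea with a hand-written even-odd (ray casting) point-in-polygon test standing in for shapely's `contains`; the function names are hypothetical and the triangle is a toy ring, not a Natal neighborhood.

```python
from random import Random


def point_in_polygon(x, y, ring):
    """Even-odd (ray casting) test of a point against one polygon ring.

    `ring` is a list of (x, y) vertices; for GeoJSON that means (lon, lat).
    A closing vertex equal to the first one is allowed.
    """
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge meets that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def sample_in_ring(number, ring, rng=None):
    """Rejection sampling in the ring's bounding box; returns (points, draws)."""
    if rng is None:
        rng = Random()
    xs = [p[0] for p in ring]
    ys = [p[1] for p in ring]
    minx, maxx, miny, maxy = min(xs), max(xs), min(ys), max(ys)
    points, draws = [], 0
    while len(points) < number:
        draws += 1
        x, y = rng.uniform(minx, maxx), rng.uniform(miny, maxy)
        if point_in_polygon(x, y, ring):
            points.append((x, y))
    return points, draws
```

For a right triangle filling half of its bounding box, roughly two draws are needed per accepted point; long, thin neighborhoods whose bounding box is mostly empty will need proportionally more.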

In [None]:
number_of_points = 10

# search all features
for feature in geo_json_natal['features']:
    # get the name of neighborhood
    neigh_name = feature['properties']['name']
    # take the coordinates of the neighborhood (GeoJSON order: lon, lat)
    geom = feature['geometry']['coordinates']
    # create a polygon from the outer ring of coordinates
    polygon = Polygon(geom[0])
    # return number_of_points random points as a list [[lon, lat, name], ...]
    points = generate_random(number_of_points, polygon, neigh_name)
    # iterate over all points and draw them on the map
    for i,value in enumerate(points):
        log, lat, name = value 
        # Draw a small circle
        folium.CircleMarker([lat,log],
                    radius=2,
                    popup='%s %s%d' % (name, '#', i),
                   color='red').add_to(m)

# print the map
m

Although Uber provides good documentation for its API, it does not explain what happens when we request an estimate for an "invalid" point. To investigate this scenario, we requested estimates for a few known points and created map markers showing the estimated wait time.

As the map below shows, the Uber API returns an estimated time even for points that are clearly in the middle of the ocean, and this estimate is considerably larger than the one for the closest point on a road. This suggests that the Uber API does not snap the request to the closest valid point before estimating the time. **Therefore, we need a way to tell whether a random point is valid.**

The map also shows that, for an "invalid" point close (<300 meters) to a valid one, the difference in the time estimate is negligible, as can be seen with points \#5 and \#6.

In [None]:
!pip install geopy

In [8]:
# import the geopy library (https://pypi.python.org/pypi/geopy)
# to calculate the distance between two coordinates;
# `vincenty` was removed in geopy 2.0, `geodesic` replaces it
from geopy.distance import geodesic

# two chosen points
point1 = [-5.876702, -35.176005]
point2 = [-5.863845, -35.148890]
# distance between two points, in meters
distance12 = geodesic(point1, point2).meters

# two chosen points
point3 = [-5.751032, -35.207708]
point4 = [-5.756198, -35.213887]
# distance between two points, in meters
distance34 = geodesic(point3, point4).meters

# two chosen points
point5 = [-5.783261, -35.247053]
point6 = [-5.783835, -35.249602]
# distance between two points, in meters
distance56 = geodesic(point5, point6).meters


points = [point1, point2, point3, point4, point5, point6]
distances = [0, distance12, 0, distance34, 0, distance56]
lat = 0
log = 1
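For a quick check without geopy, the great-circle (haversine) distance on a spherical Earth is typically within about 0.5% of an ellipsoidal geodesic such as the one geopy computes, which is plenty for distances of a few hundred meters. The sketch below is a stand-in for illustration, not what the notebook uses; the function name and the mean-radius constant are choices made here.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6371008.8  # mean Earth radius, in meters


def haversine_m(p, q):
    """Great-circle distance in meters between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(radians, p)
    lat2, lon2 = map(radians, q)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))
```

Applied to points \#5 and \#6 above, `haversine_m([-5.783261, -35.247053], [-5.783835, -35.249602])` gives a little under 300 meters, consistent with the "<300 meters" observation.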

In [None]:
# Create a map object
m = folium.Map(
    location=[-5.802592, -35.212558],
    zoom_start=12,
    tiles='OpenStreetMap'
)

# Configure geojson layer
folium.GeoJson(geo_json_natal).add_to(m)

for i, point in enumerate(points):
    # get the estimates times for each point
    wait_time = client.get_pickup_time_estimates( 
                            start_latitude=point[lat], 
                            start_longitude=point[log],
                            product_id='65cb1829-9761-40f8-acc6-92d700fe2924'
                            )

    # number the points from 1 so they match point1 ... point6
    popup = ('Point #' + str(i + 1) + '<br>' +
             ('Approximate distance to nearest road: %.2f meters' % distances[i]) + '<br>' +
             ('Wait time: %.2f seconds' % wait_time.json.get('times')[0]['estimate'])
            )
    
    # print a marker for each point with the corresponding wait time
    folium.CircleMarker([point[lat], point[log]],
                radius=2,
                popup=popup,
                color='red').add_to(m)

m

#### 4.2.1. Google Maps Roads API

Google offers a [Maps Roads API](https://developers.google.com/maps/documentation/roads/intro) with a method that returns the nearest road to a given point. The code below shows an example of its usage.

However, this API has a much stricter limit, allowing only 2,500 free requests per **day**, which by itself rules it out. Moreover, it only returns a nearest road if the given point is already very close to one. For example, for point \#6, which is about 300 meters from a road, `gmaps.nearest_roads()` does not return any road. **Therefore, this API does not fit our needs.**
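A back-of-the-envelope check of the quota argument, using the figures from section 4.1 (36 neighborhoods, 2 points per neighborhood, one period every 7 minutes). It counts only one nearest-road lookup per accepted point, so it is a lower bound: rejected candidates, such as points in the ocean, would need extra lookups.

```python
from math import ceil

INTERVAL_MIN = 7            # minutes between collection periods (section 4.1)
NEIGHBORHOODS = 36          # features in natal.geojson
POINTS_PER_NEIGH = 2        # points sampled per neighborhood per period
GOOGLE_FREE_PER_DAY = 2500  # free Roads API requests per day, as quoted above

# periods start at minute 0, 7, 14, ... within a 24-hour day
periods_per_day = ceil(24 * 60 / INTERVAL_MIN)
# lower bound: one lookup per accepted point
lookups_per_day = periods_per_day * NEIGHBORHOODS * POINTS_PER_NEIGH
days_of_quota_per_day = lookups_per_day / GOOGLE_FREE_PER_DAY

print(periods_per_day, lookups_per_day, round(days_of_quota_per_day, 1))
```

Even this lower bound is roughly six times the free daily quota.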

In [19]:
import googlemaps

keys = json.load(open('keys.json'))

gmaps = googlemaps.Client(key=keys['google'])

result1 = gmaps.nearest_roads((point2[0], point2[1]))
result2 = gmaps.nearest_roads((point4[0], point4[1]))
result3 = gmaps.nearest_roads((point6[0], point6[1]))

print(result1)
print(result2)
print(result3)

[]
[]
[]


#### 4.2.2. Project OSRM

The [Open Source Routing Machine Project](http://project-osrm.org/) maintains an [HTTP server](https://github.com/Project-OSRM/osrm-backend/blob/master/docs/http.md) that answers several kinds of requests. One of them is the `nearest` service, described below.

##### Nearest service

Snaps a coordinate to the street network and returns the nearest `n` matches.

```endpoint
GET http://{server}/nearest/v1/{profile}/{coordinates}.json?number={number}
```

Where `coordinates` only supports a single `{longitude},{latitude}` entry.

In addition to the general options (see the OSRM HTTP API documentation), the following options are supported for this service:

|Option      |Values                        |Description                                         |
|------------|------------------------------|----------------------------------------------------|
|number      |`integer >= 1` (default `1`)  |Number of nearest segments that should be returned. |

**Response**

- `code`: `Ok` if the request was successful; otherwise, see the service-dependent and general status codes.
- `waypoints` array of `Waypoint` objects sorted by distance to the input coordinate. Each object has at least the following additional properties:
  - `distance`: Distance in meters to the supplied input coordinate.
  
Below is an example of this service in use, for point \#6 from above.

In [3]:
import requests

# point 6
url = 'http://router.project-osrm.org/nearest/v1/car/-35.249602,-5.783835'
response = requests.get(url)
response_json = json.loads(response.text)
distance = response_json.get('waypoints')[0]['distance']

print(response.text)
print('\nDistance: %.2f' % distance)

{"waypoints":[{"hint":"M5vcgv___3_lAQAA2QIAAAAAAAAAAAAA8wAAAG0BAAAAAAAAAAAAAIFIAACHK-b9hsKn_z4i5v3lvqf_AADvCbhRRQE=","distance":282.60670294090954,"location":[-35.247225,-5.782906],"name":"Ponte Presidente Costa e Silva"}],"code":"Ok"}

Distance: 282.61
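The request URL and the response can also be handled with the standard library alone. The sketch below builds a Nearest-service URL following the endpoint format described above (including the optional `number` parameter) and extracts the waypoint distances, returning an empty list when `code` is not `Ok`. The helper names are hypothetical, and the sample text is the response printed above with its `hint` field shortened.

```python
import json
from urllib.parse import urlencode


def nearest_url(lon, lat, number=1, server='router.project-osrm.org', profile='car'):
    """Build a Nearest-service URL; OSRM expects coordinates as {longitude},{latitude}."""
    query = urlencode({'number': number})
    return f'http://{server}/nearest/v1/{profile}/{lon},{lat}.json?{query}'


def waypoint_distances(response_text):
    """Distances (meters) of the returned waypoints, or [] if the request failed."""
    body = json.loads(response_text)
    if body.get('code') != 'Ok':
        return []
    return [w['distance'] for w in body.get('waypoints', [])]


# Response printed above for point #6, with the "hint" field shortened
sample = ('{"waypoints":[{"hint":"...","distance":282.60670294090954,'
          '"location":[-35.247225,-5.782906],'
          '"name":"Ponte Presidente Costa e Silva"}],"code":"Ok"}')
```

With `number` greater than 1 the server returns several snapped candidates sorted by distance, so the first element is still the nearest road.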


With the OSRM Project we can therefore define our own criterion to classify a point as "valid" or not. Since some gated housing condominiums cover large areas and may be considerably far from a road, we chose 400 meters as the criterion, i.e., a point more than 400 meters from the nearest road is considered "invalid".

Below, we show an example of 20 points in each neighborhood, all classified as "valid" by this criterion.

In [4]:
# return the distance (in meters) from a given longitude and latitude
# to the nearest road, using the public OSRM Project server
def nearest_road_distance(lon, lat):
    # define the options of the request
    server = 'router.project-osrm.org'
    service = 'nearest'
    version = 'v1'
    profile = 'car'

    # build the request URL (OSRM expects longitude first)
    url = ('http://' + server + '/' + service + '/' + version +
           '/' + profile + '/' + str(lon) + ',' + str(lat))

    # try to get the response of the server
    try:
        # get the response of the server (time out instead of hanging forever)
        response = requests.get(url, timeout=10)
        # parse the response body as JSON
        response_json = response.json()

        # get the distance of the nearest road, in meters
        distance = response_json.get('waypoints')[0]['distance']

    # if the answer can't be obtained or parsed, return infinity
    # so that the point is treated as invalid
    except Exception:
        distance = float('inf')

    return distance

In [5]:
# return `number` random points that lie inside the polygon and within `max_distance` meters of a road
def generate_random_with_distance(number, polygon, neighborhood, max_distance):
    list_of_points = []
    minx, miny, maxx, maxy = polygon.bounds
    counter = 0
    while counter < number:
        x = random.uniform(minx, maxx)
        y = random.uniform(miny, maxy)
        pnt = Point(x, y)
        if polygon.contains(pnt) and nearest_road_distance(x, y) <= max_distance:
            list_of_points.append([x,y,neighborhood])
            counter += 1
    return list_of_points
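The order of the two tests in `generate_random_with_distance` matters: because `and` short-circuits, the slow network lookup only runs for candidates that already passed the cheap local polygon test. The sketch below isolates that pattern with injectable callables so it can be exercised without shapely or a network. The function name, the `max_draws` safeguard (which the notebook's version does not have; without it, a polygon with no valid area would loop forever) and the stubs in the usage are all hypothetical.

```python
import math
from random import Random

MAX_DISTANCE_M = 400  # validity threshold chosen in the text, in meters


def sample_valid_points(number, bounds, inside, road_distance,
                        max_distance=MAX_DISTANCE_M, rng=None, max_draws=10_000):
    """Rejection sampling with a cheap filter first and an expensive filter second.

    bounds is (minx, miny, maxx, maxy), like shapely's polygon.bounds;
    inside(x, y) is the local polygon test; road_distance(x, y) the lookup,
    which should return math.inf on failure so the candidate is rejected.
    """
    minx, miny, maxx, maxy = bounds
    if rng is None:
        rng = Random()
    points, draws = [], 0
    while len(points) < number:
        draws += 1
        if draws > max_draws:
            raise RuntimeError('too few valid points found; giving up')
        x, y = rng.uniform(minx, maxx), rng.uniform(miny, maxy)
        # `and` short-circuits: road_distance is only called for points inside
        if inside(x, y) and road_distance(x, y) <= max_distance:
            points.append((x, y))
    return points
```

With shapely, `bounds` would be `polygon.bounds`, `inside` would be `lambda x, y: polygon.contains(Point(x, y))`, and `road_distance` the `nearest_road_distance` function defined above.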

In [7]:
# Create a map object
m = folium.Map(
    location=[-5.802592, -35.212558],
    zoom_start=12,
    tiles='OpenStreetMap'
)

# Configure geojson layer
folium.GeoJson(geo_json_natal).add_to(m)

#define the number of points
number_of_points = 20

# search all features
for feature in geo_json_natal['features']:
    # get the name of neighborhood
    neigh_name = feature['properties']['name']
    # take the coordinates of the neighborhood (GeoJSON order: lon, lat)
    geom = feature['geometry']['coordinates']
    # create a polygon from the outer ring of coordinates
    polygon = Polygon(geom[0])
    # maximum distance (in meters) from a point to a road for it to be considered valid
    max_distance = 400
    # return number_of_points valid points as a list [[lon, lat, name], ...]
    points = generate_random_with_distance(number_of_points, polygon, neigh_name, max_distance)
    # iterate over all points and draw them on the map
    for i,value in enumerate(points):
        log, lat, name = value 
        # Draw a small circle
        folium.CircleMarker([lat,log],
                    radius=2,
                   color='red').add_to(m)

# print the map
m

### 4.3. Server to generate the dataset

We now have all the pieces needed to run our server and collect the data for a representative estimate of the mean Uber wait time in every neighborhood of Natal - RN. The server code is shown below.

```python
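# assumes the imports, `keys`, `geo_json_natal` and
# `generate_random_with_distance` from the notebook cells above
from time import sleep

#time interval between periods, in minutes
INTERVAL = 7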

print('Initializing server...')

#try to establish a connection with the Uber server
try:
    session = Session(server_token=keys['uber'])
    client = UberRidesClient(session)

    print('Uber client initialized')

except Exception:
    print('Unable to establish client connection with Uber.')
    # without a client there is nothing to collect, so stop the server
    raise

#initializing a counter to control the number of iterations
k = 0
# define the initial time
initial_time = dt.datetime.now()
while True:
    if dt.datetime.now() >= initial_time:
        k = k+1
        print('\n\nCollecting the data.')
        print('Iteration number: ', k )
        print('\n\n')

        #number of points for each neighborhood
        number_of_points = 2

        #open the file that acts as the database in append ('a') mode,
        #which lets us add rows each time we open it
        file = open('db.csv', 'a', newline='')
        writer = csv.writer(file)

        # search all features
        for feature in geo_json_natal['features']:
            # get the name of neighborhood
            neighborhood = feature['properties']['name']
            # take the coordinates of the neighborhood (GeoJSON order: lon, lat)
            geom = feature['geometry']['coordinates']
            # create a polygon from the outer ring of coordinates
            polygon = Polygon(geom[0])

            # maximum distance (in meters) from a point to a road for it to be considered valid
            max_distance = 400
            # return number_of_points valid points as a list [[lon, lat, name], ...]
            points = generate_random_with_distance(number_of_points, polygon, neighborhood, max_distance)
            # query the Uber API for each point
            for i,value in enumerate(points):
                log, lat, name = value

                #try to get the products for each point
                try:
                    response = client.get_products(lat,log)

                    # API - get/products
                    products = response.json.get('products')
                    #for each point, get the time estimates and write in the db file
                    for product in products:
                        #get the timestamp for insert into the db
                        now = dt.datetime.now()

                        #try to get the time estimates
                        try:
                            wait_time = client.get_pickup_time_estimates(lat,log,
                                                product['product_id'])

                            #build the row to be inserted in the db file
                            row = [wait_time.json.get('times')[0]['localized_display_name'],
                                   lat,
                                   log,
                                   neighborhood,
                                   now,
                                   wait_time.json.get('times')[0]['estimate']]

                            #write the row
                            writer.writerow(row)
                            #print the row so the user can follow what the server is doing
                            print(row)

                        #exceptions are deliberately ignored: missing a couple
                        #of points does not compromise the dataset
                        except Exception:
                            pass

                #exceptions are deliberately ignored: missing a couple
                #of points does not compromise the dataset
                except Exception:
                    pass

        # close the file
        file.close()

        # update the next initial time in order to obey the limitation of the Uber API
        initial_time += dt.timedelta(minutes=INTERVAL)

    #wait 10 seconds
    sleep(10)
```
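The server loop schedules periods by advancing `initial_time` by a fixed `INTERVAL`, instead of sleeping a fixed time after each period, so collection does not drift later and later. The sketch below replays that logic against a simulated clock (the iteration durations are hypothetical) to show the effect: a period that overruns delays only the next start, after which the schedule returns to the 7-minute grid.

```python
import datetime as dt


def simulate_starts(first_start, durations_min, interval_min=7, poll_s=10):
    """Start times produced by the server loop, replayed with a simulated clock.

    The loop runs a period when now >= next_start, then advances next_start by
    a fixed interval; it always sleeps poll_s seconds between checks.
    """
    poll = dt.timedelta(seconds=poll_s)
    starts = []
    now = next_start = first_start
    for duration in durations_min:
        # keep polling until the scheduled start time is reached
        while now < next_start:
            now += poll
        starts.append(now)
        now += dt.timedelta(minutes=duration)              # time spent collecting
        next_start += dt.timedelta(minutes=interval_min)   # fixed cadence
        now += poll                                        # sleep after the period
    return starts
```

For example, with `t0 = dt.datetime(2017, 10, 29, 3, 18)`, periods lasting 2 minutes start at 03:18, 03:25 and 03:32; if the first period takes 9 minutes, the second starts late (03:27:10), but the third is back on schedule at 03:32.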

The server ran from Sunday, October 29, 03:18, to Sunday, November 5, 09:49, i.e., about 7 consecutive days. The dataset loaded below contains 162,198 points, an average of approximately 4,500 points per neighborhood.

In [33]:
data = pd.read_csv('db.csv')
date = data['REQUEST_TIME']
print('Start date: ', date.min())
print('End date: ',date.max())
print('Number of points: ', len(date))
print('Mean points per neighborhood: ', len(date) / len(geo_json_natal['features']))

Start date:  2017-10-29 03:18:32.035788
End date:  2017-11-04 20:02:57.873885
Number of points:  162198
Mean points per neighborhood:  4505.5


In [32]:
#print the first 5 rows of the dataset
data.head()

| Unnamed: 0 | UBER_TYPE | LATITUDE | LONGITUDE | NEIGHBORHOOD | REQUEST_TIME | WAIT_TIME |
|---|---|---|---|---|---|---|
| 0 | uberX | -5.856011 | -35.237765 | Pitimbu | 2017-10-29 03:18:32.035788 | 360 |
| 1 | UberSELECT | -5.856011 | -35.237765 | Pitimbu | 2017-10-29 03:18:32.915881 | 600 |
| 2 | uberX | -5.866343 | -35.227756 | Pitimbu | 2017-10-29 03:18:34.364739 | 540 |
| 3 | UberSELECT | -5.866343 | -35.227756 | Pitimbu | 2017-10-29 03:18:35.112814 | 600 |
| 4 | uberX | -5.853383 | -35.266591 | Planalto | 2017-10-29 03:18:36.590419 | 300 |
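The choropleth maps aggregate this table with `pivot_table(index='NEIGHBORHOOD', values='WAIT_TIME', aggfunc=np.mean)`. The stdlib sketch below computes the same kind of grouped mean on the five rows displayed above (index column omitted), which makes one property of that choice visible: grouping by neighborhood alone pools uberX and UberSELECT requests, while grouping by neighborhood and product keeps them apart. The helper name is hypothetical.

```python
import csv
import io
from collections import defaultdict

# The five rows displayed above, in the db.csv column layout (index column omitted)
SAMPLE = """UBER_TYPE,LATITUDE,LONGITUDE,NEIGHBORHOOD,REQUEST_TIME,WAIT_TIME
uberX,-5.856011,-35.237765,Pitimbu,2017-10-29 03:18:32.035788,360
UberSELECT,-5.856011,-35.237765,Pitimbu,2017-10-29 03:18:32.915881,600
uberX,-5.866343,-35.227756,Pitimbu,2017-10-29 03:18:34.364739,540
UberSELECT,-5.866343,-35.227756,Pitimbu,2017-10-29 03:18:35.112814,600
uberX,-5.853383,-35.266591,Planalto,2017-10-29 03:18:36.590419,300
"""


def mean_wait_by(rows, *keys):
    """Mean WAIT_TIME (seconds) grouped by the given column names."""
    totals = defaultdict(lambda: [0, 0])  # group -> [sum, count]
    for row in rows:
        group = tuple(row[k] for k in keys)
        totals[group][0] += int(row['WAIT_TIME'])
        totals[group][1] += 1
    return {group: s / n for group, (s, n) in totals.items()}


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
```

Here Pitimbu's pooled mean is 525 s, while its uberX and UberSELECT means are 450 s and 600 s, which is why the notebook also draws one map per product.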


## 5. Choropleth Maps

## Initialization code for the 'database'

In [None]:
'''
session = Session(server_token=keys['uber'])
client = UberRidesClient(session)

file = open('db.csv','w')
writer = csv.writer(file)

header = ['UBER_TYPE', 'LATITUDE', 'LONGITUDE', 'NEIGHBORHOOD', 'REQUEST_TIME', 'WAIT_TIME']
writer.writerow(header)
file.close()
'''

In [None]:
# Server in file server.py

In [None]:
data = pd.read_csv('db_reduced.csv')

for row in data.itertuples():
    folium.CircleMarker([row.LATITUDE,row.LONGITUDE],
                radius=1,
                popup='%s %s %d' % (row.NEIGHBORHOOD, '#', row.WAIT_TIME),
                color='red').add_to(m)

m

In [None]:
import requests

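# query a locally hosted OSRM server instead of the public one
# (osrm-routed listens on port 5000 by default)
url = 'http://localhost:5000/nearest/v1/car/-35.155470,-5.787052'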
response = requests.get(url)
response_json = json.loads(response.text)
distance = response_json.get('waypoints')[0]['distance']

print(response.text)
print(distance)


In [None]:
import googlemaps

gmaps = googlemaps.Client(key=keys['google'])

result = gmaps.nearest_roads((-5.783295,-35.210242))

print(result)


In [None]:
print(len(result))

In [None]:
data = pd.read_csv('db_test.csv')
mean_wait_time = data.pivot_table(index='NEIGHBORHOOD', values='WAIT_TIME', aggfunc=np.mean)

mean_wait_time.reset_index(inplace=True)

m = folium.Map(
    location = [-5.826592, -35.212558],
    zoom_start = 11,
    tiles='OpenStreetMap'
)

threshold_scale = np.linspace(mean_wait_time['WAIT_TIME'].min(),
                              mean_wait_time['WAIT_TIME'].max(), 6, dtype=int).tolist()


m.choropleth(
    geo_data=geo_json_natal,
    data=mean_wait_time,
    name='All data',
    columns=['NEIGHBORHOOD', 'WAIT_TIME'],
    key_on='feature.properties.name',
    fill_color = 'OrRd',
    legend_name='Mean Uber wait time in the neighborhoods of Natal (in seconds)',
    highlight=True,
    threshold_scale = threshold_scale
)

for neighborhood in geo_json_natal['features']:
    # get the name of neighborhood
    name = neighborhood['properties']['name']
    # take the coordinates (lat,log) of neighborhood
    geom = neighborhood['geometry']['coordinates']
    # create a polygon using all coordinates
    polygon = Polygon(geom[0])

    folium.CircleMarker([polygon.centroid.y, polygon.centroid.x],
                radius=2,
                popup='%s %s %.2f %s' % (name, '#',
                                         mean_wait_time.loc[mean_wait_time['NEIGHBORHOOD'] == name, 'WAIT_TIME'].iloc[0],
                                         's'),
                tooltip=name,
                color='red').add_to(m)


folium.LayerControl().add_to(m)

m
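The `threshold_scale` above comes from `np.linspace(min, max, 6, dtype=int)`. Casting the edges to `int` drops their fractional part (floors them, for positive values), so whenever the largest neighborhood mean is fractional, the last edge ends up slightly below it and that neighborhood may be left outside the color scale. The sketch below reproduces the arithmetic with plain Python and shows one way to get integer edges that still cover the data (floor the minimum, ceil the maximum). The helper names and the means 300.4 s and 612.7 s are hypothetical.

```python
import math


def linear_edges(lo, hi, n_edges=6):
    """n_edges equally spaced values from lo to hi, like np.linspace(lo, hi, n_edges)."""
    step = (hi - lo) / (n_edges - 1)
    return [lo + k * step for k in range(n_edges)]


def truncated_edges(lo, hi, n_edges=6):
    """Edges after dropping the fractional part (what an int cast does to positive values)."""
    return [math.floor(e) for e in linear_edges(lo, hi, n_edges)]


def covering_int_edges(lo, hi, n_edges=6):
    """Integer edges that still cover [lo, hi]: floor the minimum, ceil the maximum."""
    return [round(e) for e in linear_edges(math.floor(lo), math.ceil(hi), n_edges)]
```

With the hypothetical means, `truncated_edges(300.4, 612.7)` ends at 612, below the 612.7 maximum, whereas `covering_int_edges(300.4, 612.7)` runs from 300 to 613.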


In [None]:
data = pd.read_csv('db_test.csv')
data = data[ data['UBER_TYPE'] == 'uberX' ]

mean_wait_time = data.pivot_table(index='NEIGHBORHOOD', values='WAIT_TIME', aggfunc=np.mean)
mean_wait_time.reset_index(inplace=True)

m = folium.Map(
    location = [-5.826592, -35.212558],
    zoom_start = 11,
    tiles='OpenStreetMap'
)

threshold_scale = np.linspace(mean_wait_time['WAIT_TIME'].min(),
                              mean_wait_time['WAIT_TIME'].max(), 6, dtype=int).tolist()


m.choropleth(
    geo_data=geo_json_natal,
    data=mean_wait_time,
    name='uberX',
    columns=['NEIGHBORHOOD', 'WAIT_TIME'],
    key_on='feature.properties.name',
    fill_color = 'OrRd',
    legend_name='Mean uberX wait time in the neighborhoods of Natal (in seconds)',
    highlight=True,
    threshold_scale = threshold_scale
)

for neighborhood in geo_json_natal['features']:
    # get the name of neighborhood
    name = neighborhood['properties']['name']
    # take the coordinates (lat,log) of neighborhood
    geom = neighborhood['geometry']['coordinates']
    # create a polygon using all coordinates
    polygon = Polygon(geom[0])

    folium.CircleMarker([polygon.centroid.y, polygon.centroid.x],
                radius=2,
                popup='%s %s %.2f %s' % (name, '#',
                                         mean_wait_time.loc[mean_wait_time['NEIGHBORHOOD'] == name, 'WAIT_TIME'].iloc[0],
                                         's'),
                tooltip=name,
                color='red').add_to(m)

folium.LayerControl().add_to(m)

m

In [None]:
data = pd.read_csv('db_test.csv')
data = data[ data['UBER_TYPE'] == 'UberSELECT' ]

mean_wait_time = data.pivot_table(index='NEIGHBORHOOD', values='WAIT_TIME', aggfunc=np.mean)
mean_wait_time.reset_index(inplace=True)

m = folium.Map(
    location = [-5.826592, -35.212558],
    zoom_start = 11,
    tiles='OpenStreetMap'
)

threshold_scale = np.linspace(mean_wait_time['WAIT_TIME'].min(),
                              mean_wait_time['WAIT_TIME'].max(), 6, dtype=int).tolist()


m.choropleth(
    geo_data=geo_json_natal,
    data=mean_wait_time,
    name='UberSELECT',
    columns=['NEIGHBORHOOD', 'WAIT_TIME'],
    key_on='feature.properties.name',
    fill_color = 'OrRd',
    legend_name='Mean UberSELECT wait time in the neighborhoods of Natal (in seconds)',
    highlight=True,
    threshold_scale = threshold_scale
)

for neighborhood in geo_json_natal['features']:
    # get the name of neighborhood
    name = neighborhood['properties']['name']
    # take the coordinates (lat,log) of neighborhood
    geom = neighborhood['geometry']['coordinates']
    # create a polygon using all coordinates
    polygon = Polygon(geom[0])

    folium.CircleMarker([polygon.centroid.y, polygon.centroid.x],
                radius=2,
                popup='%s %s %.2f %s' % (name, '#',
                                         mean_wait_time.loc[mean_wait_time['NEIGHBORHOOD'] == name, 'WAIT_TIME'].iloc[0],
                                         's'),
                tooltip=name,
                color='red').add_to(m)

folium.LayerControl().add_to(m)

m

In [None]:
data = pd.read_csv('db_test.csv')
data['TIME'] = pd.to_datetime( data['REQUEST_TIME'] ).dt.time

# daytime: keep requests made between 07:00 and 20:00 (both conditions must hold)
data = data[ ((data['TIME'] < dt.time(20,0,0)) & (data['TIME'] > dt.time(7,0,0))) ]

mean_wait_time = data.pivot_table(index='NEIGHBORHOOD', values='WAIT_TIME', aggfunc=np.mean)
mean_wait_time.reset_index(inplace=True)

m = folium.Map(
    location = [-5.826592, -35.212558],
    zoom_start = 11,
    tiles='OpenStreetMap'
)

threshold_scale = np.linspace(mean_wait_time['WAIT_TIME'].min(),
                              mean_wait_time['WAIT_TIME'].max(), 6, dtype=int).tolist()


m.choropleth(
    geo_data=geo_json_natal,
    data=mean_wait_time,
    name='Daytime',
    columns=['NEIGHBORHOOD', 'WAIT_TIME'],
    key_on='feature.properties.name',
    fill_color = 'OrRd',
    legend_name='Mean daytime (07:00-20:00) Uber wait time in the neighborhoods of Natal (in seconds)',
    highlight=True,
    threshold_scale = threshold_scale
)

folium.LayerControl().add_to(m)

m
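The daytime and night-time filters differ in more than the direction of the comparisons. A window that stays within one day (07:00 to 20:00) requires both bounds to hold, so the two conditions must be combined with `&`; a window that wraps past midnight (20:00 to 07:00) requires either one, so they are combined with `|`. The sketch below makes this explicit with plain `datetime.time` values (the helper names are hypothetical, and it uses half-open windows rather than the strict comparisons of the cells), and shows that joining the daytime bounds with "or" accepts every hour of the day.

```python
import datetime as dt

DAY_START = dt.time(7, 0)
DAY_END = dt.time(20, 0)


def in_window(t, start, end):
    """True if time t lies in [start, end); handles windows that wrap past midnight."""
    if start <= end:
        return start <= t < end      # same-day window: both bounds must hold
    return t >= start or t < end     # wrapping window: either bound is enough


def or_daytime(t):
    """Daytime bounds joined with 'or': every time of day satisfies one side."""
    return t < DAY_END or t > DAY_START
```

In pandas the same daytime window is `(data['TIME'] >= DAY_START) & (data['TIME'] < DAY_END)`.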



In [None]:
data = pd.read_csv('db_test.csv')
data['TIME'] = pd.to_datetime( data['REQUEST_TIME'] ).dt.time

data = data[ ((data['TIME'] > dt.time(20,0,0)) | (data['TIME'] < dt.time(7,0,0))) ]

mean_wait_time = data.pivot_table(index='NEIGHBORHOOD', values='WAIT_TIME', aggfunc=np.mean)
mean_wait_time.reset_index(inplace=True)

m = folium.Map(
    location = [-5.826592, -35.212558],
    zoom_start = 11,
    tiles='OpenStreetMap'
)

threshold_scale = np.linspace(mean_wait_time['WAIT_TIME'].min(),
                              mean_wait_time['WAIT_TIME'].max(), 6, dtype=int).tolist()


m.choropleth(
    geo_data=geo_json_natal,
    data=mean_wait_time,
    name='Night-time',
    columns=['NEIGHBORHOOD', 'WAIT_TIME'],
    key_on='feature.properties.name',
    fill_color = 'OrRd',
    legend_name='Mean night-time (20:00-07:00) Uber wait time in the neighborhoods of Natal (in seconds)',
    highlight=True,
    threshold_scale = threshold_scale
)

folium.LayerControl().add_to(m)

m