## Travel time prediction in Indian metro cities using Uber Movement data and OpenStreetMap


Uber provides anonymized and aggregated travel time data through the [Uber Movement](https://movement.uber.com/) platform for many cities across the world. For India, current and historical data is available for five cities: Bangalore, Hyderabad, New Delhi, Mumbai, and Kolkata. The platform also provides the ward boundaries for each city as a JSON file.

[OpenStreetMap](https://wiki.openstreetmap.org/wiki/About_OpenStreetMap) (OSM) is a free, editable map of the whole world that is being built by volunteers largely from scratch and released with an open-content license. OSM data includes a global navigable street network dataset. Several services exist that provide routing and network analysis on top of this data.

In this project, we combine the open travel time dataset from Uber with open-source routing services built on OpenStreetMap to build a fairly accurate model of travel time within each of the Indian metro cities. We show that the rich ecosystem of Python geospatial libraries makes it straightforward to consume, process, and visualize large amounts of geospatial data and to incorporate it into a machine learning model.

**Open datasets**
- Uber Movement - Travel times and ward boundaries
- OpenStreetMap

**Python libraries**
- geopandas
- shapely
- matplotlib
- folium
- scikit-learn (`sklearn`)

**Services**
- Open Source Routing Machine (OSRM)
- OpenRouteService (ORS) API 
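
To give a sense of how a routing service is queried, here is a minimal sketch of a request to OSRM's HTTP route endpoint. The base URL points at the public OSRM demo server, and the helper names (`build_route_url`, `parse_duration_seconds`, `fetch_travel_time`) and the sample coordinates are assumptions made for illustration, not part of this notebook's code. OSRM expects coordinates in longitude, latitude order and reports route duration in seconds.

```python
import requests

# Public OSRM demo server (an assumption for illustration); a self-hosted
# OSRM instance exposes the same endpoint layout.
OSRM_BASE_URL = "https://router.project-osrm.org"


def build_route_url(origin, destination, base_url=OSRM_BASE_URL):
    """Build an OSRM route URL. Points are (longitude, latitude) tuples."""
    coords = ";".join(f"{lon},{lat}" for lon, lat in (origin, destination))
    return f"{base_url}/route/v1/driving/{coords}?overview=false"


def parse_duration_seconds(response_json):
    """Return the duration of the first route in seconds, or None if routing failed."""
    if response_json.get("code") != "Ok" or not response_json.get("routes"):
        return None
    return response_json["routes"][0]["duration"]


def fetch_travel_time(origin, destination):
    """Query the routing service and return the estimated travel time in seconds."""
    response = requests.get(build_route_url(origin, destination), timeout=30)
    response.raise_for_status()
    return parse_duration_seconds(response.json())


# Hypothetical origin and destination points in Bangalore, as (longitude, latitude).
origin = (77.5946, 12.9716)
destination = (77.6412, 12.9352)
print(build_route_url(origin, destination))
```

Keeping URL construction and response parsing separate from the network call makes both easy to check without hitting the service.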

In [1]:
import pandas as pd
import geopandas as gpd
import requests
import shapely
import matplotlib.pyplot as plt
import datetime
%matplotlib inline

The Uber Movement Travel Times data comes as a CSV file for each quarter. Here we use the **Travel Times By Date By Hour Buckets (All Days)** dataset. It includes the arithmetic mean, the geometric mean, and the corresponding standard deviations of travel times between every pair of wards in the city, for every day of the quarter, aggregated into time-of-day buckets. This is a large dataset with over **7M rows**.
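
The difference between the arithmetic and geometric statistics matters for skewed data such as travel times. The sketch below uses the standard textbook definitions on hypothetical trip durations; the values are made up for illustration, and whether Uber uses the sample or population form of the standard deviation is not stated here, so the population form is an assumption.

```python
import math
import statistics

# Hypothetical trip durations in seconds for one ward pair;
# the last trip was caught in heavy traffic.
trip_seconds = [240, 260, 250, 270, 1200]

arithmetic_mean = statistics.mean(trip_seconds)
geometric_mean = statistics.geometric_mean(trip_seconds)

# Geometric standard deviation: exp of the standard deviation of the log durations.
# It is a unitless multiplicative factor and is always >= 1.
log_seconds = [math.log(t) for t in trip_seconds]
geometric_sd = math.exp(statistics.pstdev(log_seconds))

print(f"arithmetic mean:    {arithmetic_mean:.1f} s")
print(f"geometric mean:     {geometric_mean:.1f} s")
print(f"geometric std dev:  {geometric_sd:.2f}")
```

The geometric mean is pulled up less by the single slow trip than the arithmetic mean, which is why it is often the more robust summary for right-skewed travel times.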

We load the data into a pandas DataFrame and call `convert_dtypes()` to convert each column to the best available dtype that supports `pd.NA`.

In [2]:
travel_times = pd.read_csv('data/uber/bangalore-wards-2020-1-All-DatesByHourBucketsAggregate.csv')
travel_times = travel_times.convert_dtypes()
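
To see what `convert_dtypes()` changes, the following sketch applies it to a three-row sample taken from the output of the next cell, trimmed to a few columns. The in-memory CSV and the variable names are assumptions for illustration; the full file is not needed to observe the effect.

```python
import io

import pandas as pd

# Three rows copied from this notebook's output, trimmed to a few columns.
csv_text = """sourceid,dstid,month,day,start_hour,end_hour,mean_travel_time
102,97,3,13,10,16,322.80
98,55,2,11,0,7,933.90
148,111,1,11,16,19,1861.08
"""

sample = pd.read_csv(io.StringIO(csv_text))
converted = sample.convert_dtypes()

# Compare the dtypes before and after the conversion, column by column.
print(pd.DataFrame({"before": sample.dtypes, "after": converted.dtypes}))
```

The integer columns move from NumPy `int64` to the nullable `Int64` extension dtype, which can hold missing values as `pd.NA` without being cast to floating point.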

In [3]:
travel_times

| | sourceid | dstid | month | day | start_hour | end_hour | mean_travel_time | standard_deviation_travel_time | geometric_mean_travel_time | geometric_standard_deviation_travel_time |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 102 | 97 | 3 | 13 | 10 | 16 | 322.80 | 425.14 | 270.59 | 1.70 |
| 1 | 98 | 55 | 2 | 11 | 0 | 7 | 933.90 | 194.85 | 915.32 | 1.22 |
| 2 | 148 | 111 | 1 | 11 | 16 | 19 | 1861.08 | 317.79 | 1836.22 | 1.17 |
| 3 | 58 | 22 | 2 | 8 | 16 | 19 | 1463.30 | 455.25 | 1391.25 | 1.38 |
| 4 | 54 | 62 | 1 | 18 | 16 | 19 | 701.49 | 320.39 | 634.64 | 1.56 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 7648072 | 100 | 109 | 3 | 6 | 10 | 16 | 794.02 | 161.94 | 779.03 | 1.21 |
| 7648073 | 169 | 173 | 1 | 25 | 0 | 7 | 2539.70 | 317.88 | 2519.15 | 1.14 |
| 7648074 | 71 | 52 | 3 | 5 | 10 | 16 | 1072.40 | 273.84 | 1043.70 | 1.25 |
| 7648075 | 85 | 140 | 1 | 26 | 19 | 0 | 1298.86 | 347.86 | 1253.28 | 1.31 |
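
The `month` and `day` columns carry no year, and the hour buckets wrap around midnight (for example 19 to 0). As a minimal sketch of how these columns could be turned into usable features, the code below builds a proper date and a bucket length for three rows from the output above. The year 2020 is taken from the file name, and the `date` and `bucket_hours` column names are assumptions for illustration.

```python
import io

import pandas as pd

# Three rows copied from the output above, trimmed to a few columns.
csv_text = """sourceid,dstid,month,day,start_hour,end_hour,mean_travel_time
102,97,3,13,10,16,322.80
98,55,2,11,0,7,933.90
85,140,1,26,19,0,1298.86
"""
sample = pd.read_csv(io.StringIO(csv_text))

# The year is not a column in the data; 2020 comes from the file name (assumption).
date_parts = sample[["month", "day"]].assign(year=2020)
sample["date"] = pd.to_datetime(date_parts)

# Bucket length in hours; the modulo handles the bucket that ends at midnight (19 -> 0).
sample["bucket_hours"] = (sample["end_hour"] - sample["start_hour"]) % 24

print(sample[["month", "day", "date", "start_hour", "end_hour", "bucket_hours"]])
```

With a real date column, features such as day of week can be derived through the `.dt` accessor, for example `sample["date"].dt.dayofweek`.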
