# DSCI 511: Data Acquisition and Pre-Processing <br> Routes and Traffic

# Introduction

## Abstract

We are building a dataset that combines route information with the incidents (construction, collisions, etc.) occurring along that route. To do so, we need route data and incident data, which together let us estimate the time taken to travel between two locations.

## Data Source and Access Rights
The Traffic Service provides real-time traffic information related to markets, incidents, and flow. MapQuest updates all traffic information in real time; most of these updates come from people using its direction and routing applications.

Access to the data is unrestricted up to a cap of 15,000 transactions per month per account.


### Traffic Incidents API

This API provides incident information, such as road construction and traffic collisions, along with the location of each incident.

Source: https://developer.mapquest.com/documentation/traffic-api/incidents/get/


### Directions Route API

This API returns the route between a source and a destination, from which the bounding box, source location, and destination location can be extracted. The bounding box is then used to query the Traffic Incidents API, and the resulting incidents are merged with the route dataset so that we can identify whether an incident on the route might block traffic.

Source: https://developer.mapquest.com/documentation/directions-api/route/get/
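
As a minimal sketch of that hand-off, the helper below turns the `boundingBox` object of a route response into the comma-separated string sent to the incidents request. The name `bounding_box_param` is hypothetical, and the corner order (upper-left, then lower-right) and five-decimal rounding simply mirror what this notebook does; the sample coordinates are taken from the route output in this notebook.

```python
def bounding_box_param(bounding_box, ndigits=5):
    """Format a route bounding box as 'ul_lat,ul_lng,lr_lat,lr_lng'."""
    values = []
    # Upper-left corner first, then lower-right, matching the notebook's order
    for corner in (bounding_box['ul'], bounding_box['lr']):
        values.append(round(corner['lat'], ndigits))
        values.append(round(corner['lng'], ndigits))
    return ','.join(str(v) for v in values)


# Bounding box copied from the sample route response
sample_box = {
    'ul': {'lat': 42.376884, 'lng': -75.195023},
    'lr': {'lat': 39.956898, 'lng': -71.117455},
}
box_param = bounding_box_param(sample_box)
print(box_param)
```

Keeping the formatting in one small function makes it easy to change the rounding or corner order in a single place if the incidents request turns out to need something different.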

# Approach

## Libraries Used

In [5]:
import requests 
from datetime import datetime, timedelta
import re
from pprint import pprint

## Data Acquisition

### Function to get the Route, the Incidents within that route, and latitude/longitude details from the MapQuest API

In [10]:
def get_routes_incidents(source, destination):
    # The same API key is used for both MapQuest endpoints
    key = 'UjTK8AMKrZdpn2MsGDHxPo102X9vaq8r'
    # Request the route between the two addresses
    route_url = 'http://www.mapquestapi.com/directions/v2/route'
    route = requests.get(route_url, params={'key': key, 'from': source, 'to': destination}).json()
    # Build "ul_lat,ul_lng,lr_lat,lr_lng" from the route's bounding box, rounded to 5 decimals
    bbox = route['route']['boundingBox']
    boundingbox = ','.join(str(round(bbox[corner][axis], 5))
                           for corner in ('ul', 'lr') for axis in ('lat', 'lng'))
    # Request every incident that falls inside that bounding box
    inc_url = 'http://www.mapquestapi.com/traffic/v2/incidents'
    incidents = requests.get(inc_url, params={'key': key, 'boundingBox': boundingbox}).json()
    return (route, incidents, boundingbox)
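
The function returns the parsed JSON without checking whether the request succeeded. The sample route response carries an `info` block with `statuscode` and `messages`, so one possible guard is sketched below. It assumes that a `statuscode` of 0 means success, which matches the sample output but is not confirmed here; the helper name `ensure_ok` and the failing payload are hypothetical.

```python
def ensure_ok(payload):
    """Return the payload unchanged if info.statuscode is 0, else raise ValueError."""
    info = payload.get('info', {})
    status = info.get('statuscode')
    if status != 0:
        # Surface whatever messages the service attached to the response
        messages = '; '.join(str(m) for m in info.get('messages', []))
        raise ValueError(f'MapQuest request failed (statuscode={status}): {messages}')
    return payload


# Shape taken from the sample route response; the failure values are made up
ok_payload = {'info': {'statuscode': 0, 'messages': []}, 'route': {}}
bad_payload = {'info': {'statuscode': 500, 'messages': ['example error text']}}

ensure_ok(ok_payload)
try:
    ensure_ok(bad_payload)
except ValueError as exc:
    print(exc)
```

Wrapping each `.json()` result in such a check would make a bad address or an exhausted transaction quota fail early instead of surfacing later as a `KeyError`.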

### Getting the results for a Source and Destination using the above function

In [11]:
Source = '3619 Spring Garden St, Philadelphia, PA'
Destination = '1515 Massachusetts Ave, Cambridge, MA'


route, incidents,boundingbox = get_routes_incidents(Source,Destination)
pprint(route)

{'info': {'copyright': {'imageAltText': '© 2019 MapQuest, Inc.',
                        'imageUrl': 'http://api.mqcdn.com/res/mqlogo.gif',
                        'text': '© 2019 MapQuest, Inc.'},
          'messages': [],
          'statuscode': 0},
 'route': {'boundingBox': {'lr': {'lat': 39.956898, 'lng': -71.117455},
                           'ul': {'lat': 42.376884, 'lng': -75.195023}},
           'computedWaypoints': [],
           'distance': 306.187,
           'formattedTime': '05:23:10',
           'fuelUsed': 13.43,
           'hasAccessRestriction': False,
           'hasBridge': True,
           'hasCountryCross': False,
           'hasFerry': False,
           'hasHighway': True,
           'hasSeasonalClosure': False,
           'hasTimedRestriction': False,
           'hasTollRoad': True,
           'hasTunnel': True,
           'hasUnpaved': False,
           'legs': [{'destIndex': 11,
                     'destNarrative': 'Proceed to 1515 MASSACHUSETTS AVE.',
      

## Data Pre-Processing

### Cleaning the Route details between source and destination and extracting only the following variables

- Direction
- Distance in between flags (& Cumulative)
- Current Time and Time to reach (& Cumulative)
- Lat & Long of each flag (bounding box)
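
The reshaping step can be sketched on its own: a list of per-maneuver dictionaries is pivoted into one list per selected field, keeping only the first street name. This is an illustrative sketch rather than the notebook's exact code; the helper name `pivot_maneuvers` and the two trimmed sample maneuvers are assumptions, and the `index` key only stands in for any field that is not selected.

```python
def pivot_maneuvers(maneuvers, keys, no_street='No Street'):
    """Turn a list of maneuver dicts into a dict of equal-length lists."""
    columns = {k: [] for k in keys}
    for maneuver in maneuvers:
        for k in keys:
            value = maneuver.get(k)
            if k == 'streets':
                # Keep the first street name, or a placeholder when none is given
                value = value[0] if value else no_street
            columns[k].append(value)
    return columns


# Trimmed, illustrative maneuvers (not a full API response)
sample_maneuvers = [
    {'directionName': 'East', 'distance': 0.923, 'time': 174,
     'streets': ['Spring Garden St'], 'index': 0},
    {'directionName': '', 'distance': 0, 'time': 0, 'streets': []},
]
columns = pivot_maneuvers(sample_maneuvers,
                          ['directionName', 'distance', 'time', 'streets'])
print(columns)
```

Selecting the keys up front keeps every list the same length even if one maneuver lacks a field, which matters once the lists become columns of a table.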


In [12]:
Routeddict = {}

# Pivot the list of maneuvers into one list per field
for ma in route['route']['legs'][0]['maneuvers']:
    for k, v in ma.items():
        if k == 'streets':
            # Keep only the first street name; flag maneuvers that have none
            v = v[0] if len(v) > 0 else 'No Street'
        Routeddict.setdefault(k, []).append(v)
        


selectkeys = ['directionName','distance','formattedTime','streets','narrative','time','startPoint']

Routeddict = {k:Routeddict[k] for k in selectkeys}
pprint(Routeddict)



{'directionName': ['East',
                   'South',
                   'East',
                   'East',
                   'Northeast',
                   'Northeast',
                   'North',
                   'North',
                   'North',
                   'East',
                   'East',
                   'East',
                   'East',
                   'North',
                   'West',
                   'Northwest',
                   'Northeast',
                   'Northeast',
                   'North',
                   'Northwest',
                   ''],
 'distance': [0.923,
              0.087,
              1.497,
              0.383,
              93.729,
              2.88,
              72.304,
              37,
              2.14,
              40.743,
              52.48,
              0.572,
              0.053,
              0.362,
              0.275,
              0.088,
              0.05,
              0.342,
              0.214,
    

### Formatting Datetime and Calculating Time from the Current Time

Calculating the time to reach each flag, measured from the current time, until the destination
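
One way to express the same idea is to add the running total of per-step durations to a start `datetime`, which yields real timestamps that roll over midnight naturally. The sketch below is an alternative illustration, not the notebook's implementation; the helper name `arrival_times` and the start date are hypothetical, while the step durations reproduce the first cumulative values shown in this notebook.

```python
from datetime import datetime, timedelta
from itertools import accumulate


def arrival_times(start, step_seconds):
    """Return the clock time at the end of each step, starting from `start`."""
    # accumulate() yields the running total of the per-step durations
    return [start + timedelta(seconds=total) for total in accumulate(step_seconds)]


# Hypothetical start date; durations are per-step travel times in seconds
start = datetime(2019, 11, 6, 22, 56, 4)
steps = [174, 13, 121, 30]
arrivals = arrival_times(start, steps)
for arrival in arrivals:
    print(arrival)
```

Because the result is a list of `datetime` objects rather than offsets from midnight, it can be compared directly with incident start and end times.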

In [13]:
# Current time truncated to whole seconds
now = datetime.now().replace(microsecond=0)

cumsumt = []

# Running arrival time, expressed as a timedelta since midnight
now1 = timedelta(hours=now.hour, minutes=now.minute, seconds=now.second)
for i in Routeddict['formattedTime']:
    # Each maneuver duration is formatted as HH:MM:SS
    h, m, s = (int(part) for part in i.split(':'))
    now1 = now1 + timedelta(hours=h, minutes=m, seconds=s)
    cumsumt.append(now1)
Routeddict.update({'Currenttime':cumsumt})

def cumsum(para):
    vb= 0
    dist = []
    for ind,j in enumerate(para):
        vb+=j
        dist.append(round(vb,2))
    return dist

dist = cumsum(Routeddict['distance'])
time = cumsum(Routeddict['time'])
Routeddict.update({'Total_Distance':dist})
Routeddict.update({'Cumulative_time':time})




pprint(Routeddict)  


{'Cumulative_time': [174,
                     187,
                     308,
                     338,
                     6162,
                     6405,
                     11366,
                     13571,
                     13730,
                     16067,
                     19103,
                     19171,
                     19181,
                     19225,
                     19252,
                     19271,
                     19279,
                     19343,
                     19378,
                     19390,
                     19390],
 'Currenttime': [datetime.timedelta(seconds=82738),
                 datetime.timedelta(seconds=82751),
                 datetime.timedelta(seconds=82872),
                 datetime.timedelta(seconds=82902),
                 datetime.timedelta(days=1, seconds=2326),
                 datetime.timedelta(days=1, seconds=2569),
                 datetime.timedelta(days=1, seconds=7530),
                 datetime.timedelta(

### Incidents happening between the source and destination 

- Incident type
- Start time and end time of incidents (DateTime formats)
- Delay Time
- Lat and Long
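
For a single incident record, extracting these fields might look like the sketch below. The key names (`type`, `startTime`, `endTime`, `delayFromTypical`, `lat`, `lng`) are the ones this notebook reads from the response, and the `YYYY-MM-DDTHH:MM:SS` timestamp layout is assumed from the way the notebook parses it. The helper name `summarize_incident` and the sample record are hypothetical.

```python
from datetime import datetime

TIME_FORMAT = '%Y-%m-%dT%H:%M:%S'  # assumed timestamp layout


def summarize_incident(incident):
    """Pick out the fields used downstream and parse the timestamps."""
    return {
        'type': incident['type'],
        'start': datetime.strptime(incident['startTime'], TIME_FORMAT),
        'end': datetime.strptime(incident['endTime'], TIME_FORMAT),
        'delay_min': round(incident['delayFromTypical'], 2),
        'position': (incident['lat'], incident['lng']),
    }


# Illustrative record; the values are not a real API response
sample_incident = {
    'type': 1,
    'startTime': '2019-10-01T08:00:00',
    'endTime': '2019-10-30T04:00:00',
    'delayFromTypical': 0.98000001,
    'lat': 39.959789,
    'lng': -75.163719,
}
summary = summarize_incident(sample_incident)
print(summary)
```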


In [14]:
Incidentdict = {}

# Pivot the list of incidents into one list per field
for ma in incidents['incidents']:
    for k, v in ma.items():
        if k == 'streets':
            # Keep only the first street name; use a placeholder when none is given
            v = v[0] if len(v) > 0 else 'Reached'
        Incidentdict.setdefault(k, []).append(v)

# Parse the 'YYYY-MM-DDTHH:MM:SS' timestamps into datetime objects
Incidentdict['endTime'] = [datetime.strptime(str(i), "%Y-%m-%dT%H:%M:%S") for i in Incidentdict['endTime']]
Incidentdict['startTime'] = [datetime.strptime(str(i), "%Y-%m-%dT%H:%M:%S") for i in Incidentdict['startTime']]


pprint(Incidentdict)


{'delayFromFreeFlow': [0.0,
                       0.0,
                       0.9800000190734863,
                       0.0,
                       0.019999999552965164,
                       0.0,
                       0.0,
                       0.07999999821186066,
                       0.0,
                       0.7300000190734863,
                       0.0,
                       0.0,
                       0.0,
                       0.0,
                       0.0,
                       0.0,
                       0.0,
                       0.0,
                       0.10000000149011612,
                       0.0,
                       0.0,
                       0.0,
                       0.0,
                       0.0,
                       0.029999999329447746,
                       0.0,
                       0.0,
                       0.0,
                       0.0,
                       0.07000000029802322,
                       0.0,
                    

              'Road closed intermittently due to construction work on Rockland '
              'Ave both ways between Slayton Ave and Keating Pl.',
              'Road closed intermittently due to construction work on 10th St '
              'both ways from Avenue P to Quentin Rd.',
              'Road closed intermittently due to construction work on 78th St '
              'both ways from 7th Ave to 10th Ave.',
              'Road closed due to burst water main on 44th St Eastbound '
              'between 4th Ave and 7th Ave.',
              'Road closed due to burst water main on 5th Ave both ways '
              'between 43rd St and 45th St.',
              'Road closed intermittently due to construction work on Bergen '
              'St both ways from New York Ave to Mother Gaston Blvd.',
              'Road closed due to long-term construction on Aqueduct Rd '
              'Northbound from Lefferts Blvd to Rockaway Blvd.',
              'Road closed intermittently due to const

        '119752822',
        '119752806',
        '100056787',
        '119889294',
        '121821951',
        '121820324',
        '121821851',
        '121215516',
        '121820834',
        '121215494',
        '119953959',
        '121214890',
        '121817232',
        '120199313',
        '121352259',
        '120905268',
        '101179371',
        '121853274',
        '121816566',
        '92284254',
        '115449247',
        '121819993',
        '121825239',
        '121818013',
        '115702973',
        '121817875',
        '121820116',
        '90530274',
        '121826776',
        '121439008',
        '121439012',
        '121221464',
        '90533726',
        '90532035',
        '87038309',
        '95494304',
        '90533697',
        '115141355',
        '115141339',
        '90533398',
        '118378023',
        '120226715',
        '120790182',
        '117174796',
        '115450811',
        '105462396',
        '100650379',
        '100650369',


                              {'crossRoad1': 'Lefferts Blvd',
                               'crossRoad2': 'Rockaway Blvd ',
                               'direction': 'Northbound',
                               'eventText': 'Road closed, long-Term '
                                            'Construction',
                               'position1': 'from',
                               'position2': 'to',
                               'roadName': 'Aqueduct  Rd'},
                              {'crossRoad1': '116th Ave / 145th St',
                               'crossRoad2': 'Foch Blvd / 145th St',
                               'direction': 'both ways',
                               'eventText': 'Road closed intermittently, '
                                            'construction work',
                               'position1': 'from',
                               'position2': 'to',
                               'roadName': '145th St'},
                              {'

                               'roadName': '33rd St'},
                              {'crossRoad1': 'NY-135  Exits 14E,14W / NY-25 '
                                             'Jericho Tpke / Crossways Park Dr',
                               'crossRoad2': 'NY-135  Exit 9 / Plainview Rd / '
                                             'Broadway',
                               'direction': 'Southbound',
                               'eventText': 'Intermittent Ramp Closures, '
                                            'construction work',
                               'fromLocation': 'Locust Grove',
                               'position1': 'between',
                               'position2': 'and',
                               'roadName': 'NY-135',
                               'toLocation': 'Bethpage'},
                              {'crossRoad1': '5th Ave / 49th St',
                               'crossRoad2': '6th Ave / 49th St',
                               'directi

                               'direction': 'Westbound',
                               'eventText': 'Emergency Repairs, (Named) Lane '
                                            'Blocked',
                               'fromLocation': 'Sweyze',
                               'position1': 'between',
                               'position2': 'and',
                               'roadName': 'I-495',
                               'toLocation': 'Calverton'},
                              {'crossRoad1': 'I-495  Exit 72 / NY-25 / NY-25 '
                                             'Main St',
                               'crossRoad2': 'I-495  Exit 71 / NY-24 Edwards '
                                             'Ave / Edwards Ave',
                               'direction': 'Westbound',
                               'eventText': 'Emergency Repairs, lane (or '
                                            'Lanes) closed',
                               'fromLocation': 'Sweyze',
     

              1,
              1,
              3,
              2,
              1,
              1,
              3,
              3,
              2,
              2,
              2,
              3,
              3,
              0,
              3,
              3,
              3,
              3,
              1,
              1,
              1,
              2,
              2,
              3,
              0,
              0,
              3,
              3,
              1,
              3,
              0,
              3,
              3],
 'shortDesc': ['Callowhill St: new road layout between 17th St and 15th St',
               '15th St S/B: road closed between Hamilton St and Callowhill St',
               'I-95 S/B: exit ramp to E Girard Ave W/B closed.',
               'Dauphin St: road closed from American St to 3rd St',
               'US-1 S/B: entry ramp from W Cayuga St closed.',
               'Cheltenham Ave N/B: exit ramp closed at PA-611 Old York Rd',
    

               datetime.datetime(2019, 10, 30, 4, 0),
               datetime.datetime(2019, 11, 7, 5, 16, 26),
               datetime.datetime(2019, 8, 1, 5, 0),
               datetime.datetime(2019, 10, 30, 4, 0),
               datetime.datetime(2019, 10, 30, 4, 0),
               datetime.datetime(2019, 6, 17, 0, 0),
               datetime.datetime(2018, 2, 9, 22, 0),
               datetime.datetime(2019, 10, 30, 4, 0),
               datetime.datetime(2019, 10, 30, 4, 0),
               datetime.datetime(2019, 3, 5, 6, 11, 20),
               datetime.datetime(2019, 10, 31, 12, 38, 56),
               datetime.datetime(2019, 12, 3, 22, 0),
               datetime.datetime(2019, 12, 3, 22, 0),
               datetime.datetime(2019, 12, 3, 22, 0),
               datetime.datetime(2019, 12, 3, 21, 30),
               datetime.datetime(2019, 12, 3, 22, 0),
               datetime.datetime(2019, 12, 3, 21, 0),
               datetime.datetime(2019, 11, 1, 4, 42, 2),
               

### Merging Routes and Incidents based on Datetime, Latitude and Longitude

- The incident’s latitude and longitude lie along the route’s latitude and longitude
- The route’s current datetime falls between the incident’s start and end datetimes
- If the incident lies on the route, its delay time is added to the time to reach
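
These three rules can be sketched for one route segment and one incident as follows. This is a simplified illustration under stated assumptions: "in line with the route" is approximated by the rectangle spanned by two consecutive route points, and the corners are sorted so the test does not depend on the direction of travel. The function names and sample values are hypothetical.

```python
from datetime import datetime


def in_segment_box(start, end, lat, lng):
    """True if (lat, lng) lies strictly inside the box spanned by two route points."""
    lat_lo, lat_hi = sorted((start['lat'], end['lat']))
    lng_lo, lng_hi = sorted((start['lng'], end['lng']))
    return lat_lo < lat < lat_hi and lng_lo < lng < lng_hi


def delay_for_segment(start, end, incident, when):
    """Delay an incident adds to one segment, or 0 if it does not apply."""
    inside = in_segment_box(start, end, incident['lat'], incident['lng'])
    active = incident['startTime'] < when < incident['endTime']
    return incident['delayFromTypical'] if inside and active else 0


# Illustrative segment endpoints, incident, and departure time
seg_start = {'lat': 39.96, 'lng': -75.20}
seg_end = {'lat': 40.10, 'lng': -74.90}
sample_incident = {
    'lat': 39.973728, 'lng': -75.120430, 'delayFromTypical': 0.98,
    'startTime': datetime(2019, 10, 1), 'endTime': datetime(2019, 12, 3, 22, 0),
}
when = datetime(2019, 11, 6, 22, 56, 4)
print(delay_for_segment(seg_start, seg_end, sample_incident, when))
```

A rectangle test is coarse: on a long highway segment it can match incidents that are far from the road itself, which is one reason incidents may not align with the route.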


In [15]:
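def findcord(x1, y1, x2, y2, x, y):
    # True if (x, y) lies strictly inside the box spanned by (x1, y1) and (x2, y2).
    # The corners are sorted first, so segments heading south or west are handled too.
    return (min(x1, x2) < x < max(x1, x2) and
            min(y1, y2) < y < max(y1, y2))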
    
xycords = Routeddict['startPoint']
tstcords = [(Incidentdict['lat'][ind],Incidentdict['lng'][ind]) for ind in range(len(Incidentdict['lat']))]


# print(' From :',Source,'\n', 'To :', Destination,'\n', 'Start at:', now.time(),'\n', 'Reach At :', Routeddict['Currenttime'][-1],'\n',
#      'Total Distance :', Routeddict['Total_Distance'][-1],'miles','\n\n')
# for ind,i in enumerate(xycords): 
#     print(Routeddict['narrative'][ind],'[Drive for',Routeddict['distance'][ind], 'miles', ', Reach at: ',Routeddict['Currenttime'][ind],']','\n')
#     for lat,lon in tstcords:
#         if ind<len(xycords)-1:
#             if findcord(xycords[ind]['lat'],xycords[ind]['lng'],xycords[ind+1]['lat'],xycords[ind+1]['lng'],lat,lon):
# #                 print(xycords[ind+1]['lat'],xycords[ind+1]['lng'],xycords[ind]['lat'],xycords[ind]['lng'],lat,lon)
#                 if now > Incidentdict['startTime'][Incidentdict['lat'].index(lat)] and now < Incidentdict['endTime'][Incidentdict['lat'].index(lat)]:
#                     print('Incident (',Incidentdict['shortDesc'][Incidentdict['lat'].index(lat)],')','[Delay :',round(Incidentdict['delayFromTypical'][Incidentdict['lat'].index(lat)],2) ,'minutes]','\n')

source = []
destination = []
Startat = []
ttor = []
td = []
nar = []
dfor = []
Rat = []
st = []
inshd = []
dl = []
toin = []
inclat = []
inclong = []

for ind,i in enumerate(xycords):
    for lat,lon in tstcords:
        nar.append(Routeddict['narrative'][ind])
        dfor.append(Routeddict['distance'][ind])
        Rat.append(Routeddict['Currenttime'][ind])
        source.append(Source)
        destination.append(Destination)
        Startat.append(now.time())
        td.append(Routeddict['Total_Distance'][-1])
        ttor.append(Routeddict['Currenttime'][-1])
        st.append(Routeddict['streets'][ind])
        if ind<len(xycords)-1:
            
            if findcord(xycords[ind]['lat'],xycords[ind]['lng'],xycords[ind+1]['lat'],xycords[ind+1]['lng'],lat,lon):
#                 print(xycords[ind+1]['lat'],xycords[ind+1]['lng'],xycords[ind]['lat'],xycords[ind]['lng'],lat,lon)
                if now > Incidentdict['startTime'][Incidentdict['lat'].index(lat)] and now < Incidentdict['endTime'][Incidentdict['lat'].index(lat)]:
                    inclat.append(lat)
                    inclong.append(lon)
                    inshd.append(Incidentdict['shortDesc'][Incidentdict['lat'].index(lat)])
                    dl.append(round(Incidentdict['delayFromTypical'][Incidentdict['lat'].index(lat)],2))
                    toin.append(Incidentdict['type'][Incidentdict['lat'].index(lat)])
#                         print('Incident (',Incidentdict['shortDesc'][Incidentdict['lat'].index(lat)],')','[Delay :',round(Incidentdict['delayFromTypical'][Incidentdict['lat'].index(lat)],2) ,'minutes]','\n')
                else:
                    inclat.append(lat)
                    inclong.append(lon)
                    inshd.append(Incidentdict['shortDesc'][Incidentdict['lat'].index(lat)])
                    dl.append('Does not Interfere')
                    toin.append(Incidentdict['type'][Incidentdict['lat'].index(lat)])
            else:
                inclat.append(lat)
                inclong.append(lon)
                inshd.append(Incidentdict['shortDesc'][Incidentdict['lat'].index(lat)])
                dl.append(int(0))
                toin.append(Incidentdict['type'][Incidentdict['lat'].index(lat)])
        else:
            inclat.append(int(0))
            inclong.append(int(0))
            inshd.append("No Incidents")
            dl.append(int(0))
            toin.append('No Incidents')

toin1 = ['Construction' if i == 1 
         else 'Event' if i == 2 
         else 'Congestion' if i==3 
         else 'Accident' if i==4 
         else 'No Incidents' for i in toin]             

Dataset = {'Source':source,
          'Destination':destination,
          'Starting At':Startat,
          'Reaching At':ttor,
          'Total Distance':td,
          'Direction':nar,
          'Drive for_miles':dfor,
          'Reach at':Rat,
            'Street/Highway':st,
          'Description':inshd,
          'Delay_min':dl,
          'type of incident':toin1,
          'Incident Latitude':inclat,
          'incident Logintude':inclong}



### Exporting the Dataset as CSV

In [18]:
import pandas as pd
pd.DataFrame(Dataset).to_csv('Route.csv')
pd.DataFrame(Dataset)

Unnamed: 0,Source,Destination,Starting At,Reaching At,Total Distance,Direction,Drive for_miles,Reach at,Street/Highway,Description,Delay_min,type of incident,Incident Latitude,incident Logintude
0,"3619 Spring Garden St, Philadelphia, PA","1515 Massachusetts Ave, Cambridge, MA",22:56:04,1 days 04:19:14,306.19,Start out going east on Spring Garden St towar...,0.923,0 days 22:58:58,Spring Garden St,Callowhill St: new road layout between 17th St...,0.0,Construction,39.959789,-75.163719
1,"3619 Spring Garden St, Philadelphia, PA","1515 Massachusetts Ave, Cambridge, MA",22:56:04,1 days 04:19:14,306.19,Start out going east on Spring Garden St towar...,0.923,0 days 22:58:58,Spring Garden St,Callowhill St: new road layout between 17th St...,0.0,Construction,39.959789,-75.163719
2,"3619 Spring Garden St, Philadelphia, PA","1515 Massachusetts Ave, Cambridge, MA",22:56:04,1 days 04:19:14,306.19,Start out going east on Spring Garden St towar...,0.923,0 days 22:58:58,Spring Garden St,I-95 S/B: exit ramp to E Girard Ave W/B closed.,0.0,Construction,39.973728,-75.120430
3,"3619 Spring Garden St, Philadelphia, PA","1515 Massachusetts Ave, Cambridge, MA",22:56:04,1 days 04:19:14,306.19,Start out going east on Spring Garden St towar...,0.923,0 days 22:58:58,Spring Garden St,Dauphin St: road closed from American St to 3r...,0.0,Construction,39.985661,-75.138657
4,"3619 Spring Garden St, Philadelphia, PA","1515 Massachusetts Ave, Cambridge, MA",22:56:04,1 days 04:19:14,306.19,Start out going east on Spring Garden St towar...,0.923,0 days 22:58:58,Spring Garden St,US-1 S/B: entry ramp from W Cayuga St closed.,0.0,Construction,40.020618,-75.154716
5,"3619 Spring Garden St, Philadelphia, PA","1515 Massachusetts Ave, Cambridge, MA",22:56:04,1 days 04:19:14,306.19,Start out going east on Spring Garden St towar...,0.923,0 days 22:58:58,Spring Garden St,Cheltenham Ave N/B: exit ramp closed at PA-611...,0.0,Construction,40.061920,-75.136642
6,"3619 Spring Garden St, Philadelphia, PA","1515 Massachusetts Ave, Cambridge, MA",22:56:04,1 days 04:19:14,306.19,Start out going east on Spring Garden St towar...,0.923,0 days 22:58:58,Spring Garden St,Cheltenham Ave N/B: exit ramp closed at PA-611...,0.0,Construction,40.061920,-75.136642
7,"3619 Spring Garden St, Philadelphia, PA","1515 Massachusetts Ave, Cambridge, MA",22:56:04,1 days 04:19:14,306.19,Start out going east on Spring Garden St towar...,0.923,0 days 22:58:58,Spring Garden St,I-95 S/B: two lanes closed between Exit 39 PA-...,0.0,Construction,40.083244,-74.934753
8,"3619 Spring Garden St, Philadelphia, PA","1515 Massachusetts Ave, Cambridge, MA",22:56:04,1 days 04:19:14,306.19,Start out going east on Spring Garden St towar...,0.923,0 days 22:58:58,Spring Garden St,Tyson Ave: road closed from Custer Ave to Edge...,0.0,Construction,40.113899,-75.152473
9,"3619 Spring Garden St, Philadelphia, PA","1515 Massachusetts Ave, Cambridge, MA",22:56:04,1 days 04:19:14,306.19,Start out going east on Spring Garden St towar...,0.923,0 days 22:58:58,Spring Garden St,PA-132 Street Rd: intermittent lane closures b...,0.0,Construction,40.123859,-74.962601
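
The leading `Unnamed: 0` column in this preview appears to be the DataFrame index that was written along with the data. As a dependency-free sketch of the same export without that column, a dictionary of equal-length lists can be written row by row with the standard `csv` module. The helper name `dict_of_lists_to_csv` and the two-row sample are illustrative only.

```python
import csv
import io


def dict_of_lists_to_csv(columns, stream):
    """Write a dict of equal-length lists as CSV, one column per key."""
    writer = csv.writer(stream)
    writer.writerow(columns.keys())
    # zip(*values) walks the lists in parallel, yielding one row at a time
    writer.writerows(zip(*columns.values()))


# Two illustrative rows using column names from the dataset
sample = {'Source': ['A', 'A'], 'Drive for_miles': [0.923, 0.087]}
buffer = io.StringIO()
dict_of_lists_to_csv(sample, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write to disk instead of an in-memory buffer, the stream can be a file opened with `open('Route.csv', 'w', newline='')`.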


The size of the dataset depends on the source, the destination, and the time of the request.

## Who will be using the Dataset?

- Direction and routing applications such as Google Maps and Waze will be interested in this dataset.
- They can use this dataset to build an optimized route between two places.
- Personal applications can also be developed using this dataset.


## Limitations

- An incident’s latitude and longitude may not align with the route between source and destination.
- The Street/Highway information contains null values.