# Extracting a weighted drive network from INRIX

INRIX identifies road network links by TMC (Traffic Message Channel) codes. This notebook is a recipe for extracting nodes and weighted edges from INRIX data extracts for use with network analysis tools. We primarily run our analyses with Pandana, which requires five pieces of network data: `node_x`, `node_y`, `edge_start_node`, `edge_end_node`, and `edge_weight`.
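
As a rough picture of the target, the sketch below builds the five pieces from toy data. All coordinates and travel times are made up for illustration. To the best of our knowledge, `pandana.Network` takes these inputs in this order, with the weights passed as a DataFrame, but check the Pandana documentation before relying on that.

```python
import pandas as pd

# Toy nodes: the index is the node ID; x/y are projected coordinates (made-up values).
nodes = pd.DataFrame(
    {"x": [0.0, 100.0, 100.0], "y": [0.0, 0.0, 50.0]},
    index=pd.Index([0, 1, 2], name="node_id"),
)

# Toy directed edges with a travel-time weight in minutes (made-up values).
edges = pd.DataFrame(
    {"from": [0, 1, 1], "to": [1, 0, 2], "travel_time_minutes": [1.5, 1.4, 0.8]}
)

# The five pieces of network data named above.
node_x = nodes["x"]
node_y = nodes["y"]
edge_start_node = edges["from"]
edge_end_node = edges["to"]
edge_weight = edges[["travel_time_minutes"]]  # kept as a one-column DataFrame

# Every edge endpoint should refer to a known node ID.
endpoints_ok = (
    edge_start_node.isin(nodes.index).all() and edge_end_node.isin(nodes.index).all()
)
print(endpoints_ok)
```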

In [101]:
import requests
import pandas as pd
from pyproj import Proj, transform

## Alameda County

*N.B.* - For some reason, the file of travel times/speeds (i.e. edge weights) extracted from INRIX has fewer records (exactly 4,000) than the network file has edges (4,678). Is the travel time file getting clipped at 4k records? Either way, the inner merge of speeds with the network drops the unmatched edges: at least 678 of them (4,678 - 4,000).

### Load INRIX Data

When you generate a data extract from INRIX's [Massive Data Downloader](https://inrix.ritis.org/analytics/download/) tool, store the TMC configuration data (i.e. the network) in a file called TMC_Identification.csv.

In [148]:
df = pd.read_csv('/home/max/TMC_Identification.csv')

In [149]:
df.head()

Unnamed: 0,tmc,road,direction,intersection,state,county,zip,start_latitude,start_longitude,end_latitude,end_longitude,miles,road_order
0,105+22231,105TH AVE,EASTBOUND,CA-185/INTERNATIONAL BLVD/E 14TH ST,CA,ALAMEDA,94603,37.735439,-122.174808,37.739259,-122.166718,0.514912,1.0
1,105-22230,105TH AVE,WESTBOUND,SAN LEANDRO ST,CA,ALAMEDA,94603,37.739259,-122.166718,37.735439,-122.174808,0.514912,1.0
2,105P22223,106TH AVE,EASTBOUND,BANCROFT AVE/LINK ST,CA,ALAMEDA,94603,37.741482,-122.156987,37.741561,-122.156799,0.011632,1.0
3,105+22224,106TH AVE,EASTBOUND,MACARTHUR BLVD,CA,ALAMEDA,94603,37.741561,-122.156799,37.742957,-122.152482,0.256061,2.0
4,105+22225,106TH AVE,EASTBOUND,I-580,CA,ALAMEDA,94605,37.742957,-122.152482,37.744939,-122.148928,0.238327,3.0


In [150]:
len(df)

4678

In [151]:
# The name of this file is specified by the user in the INRIX data download portal
weights = pd.read_csv('/home/max/alameda_tmc_weights.csv')

In [117]:
weights.head()

Unnamed: 0,tmc_code,measurement_tstamp,speed,average_speed,reference_speed,travel_time_minutes,confidence_score,cvalue
0,105-04494,2018-06-04 09:00:00,45.29,43.0,59.0,2.15,30.0,100.0
1,105P10962,2018-06-04 09:00:00,24.59,23.0,28.0,0.5,30.0,72.46
2,105+13377,2018-06-04 09:00:00,14.02,14.0,17.0,1.09,24.29,41.25
3,105+10988,2018-06-04 09:00:00,13.98,14.0,13.0,0.24,26.79,67.86
4,105-13898,2018-06-04 09:00:00,35.29,35.0,39.0,1.31,21.96,17.86


In [119]:
len(weights)

4000
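
One way to see exactly which edges have no speed record is a left merge with `indicator=True`. The sketch below uses toy stand-ins for the two files. The TMC codes are copied from the samples above, but which ones are missing and the travel times are made up.

```python
import pandas as pd

# Toy stand-ins for the network and speed extracts (hypothetical contents).
network = pd.DataFrame({"tmc": ["105+22231", "105-22230", "105P22223"]})
speeds = pd.DataFrame(
    {"tmc_code": ["105+22231", "105P22223"], "travel_time_minutes": [1.5, 0.1]}
)

# indicator=True adds a "_merge" column: "both" for matched rows,
# "left_only" for network edges that have no speed record.
check = network.merge(
    speeds, left_on="tmc", right_on="tmc_code", how="left", indicator=True
)
missing = check.loc[check["_merge"] == "left_only", "tmc"].tolist()
print(missing)
```

Applied to the real `df` and `weights`, the length of `missing` would say how many edges the later merge drops.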

### Expand network file

In [61]:
exploded_nodes = pd.concat([
    df[['start_latitude', 'start_longitude']].rename(columns={'start_latitude':'latitude', 'start_longitude':'longitude'}), 
    df[['end_latitude', 'end_longitude']].rename(columns={'end_latitude':'latitude', 'end_longitude':'longitude'})])

### Extract nodes and edges

In [108]:
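# Drop repeated coordinates, then renumber so that every node gets a unique ID.
# (The index inherited from pd.concat repeats, so it cannot serve as an ID.)
nodes = exploded_nodes.drop_duplicates().reset_index(drop=True)
nodes.index.name = 'node_id'
nodes.reset_index(inplace=True)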

In [109]:
nodes.head()

Unnamed: 0,node_id,latitude,longitude
0,0,37.735439,-122.174808
1,1,37.739259,-122.166718
2,2,37.741482,-122.156987
3,3,37.741561,-122.156799
4,4,37.742957,-122.152482
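
Node IDs are only useful if each one refers to a single coordinate pair, so it is worth checking them for uniqueness. The toy example below (made-up coordinates) shows how an index carried over from `pd.concat` can repeat after `drop_duplicates`, and how renumbering avoids that.

```python
import pandas as pd

# Toy endpoints: edge 0 runs A -> B and edge 1 runs A -> C (made-up coordinates).
starts = pd.DataFrame({"latitude": [37.0, 37.0], "longitude": [-122.0, -122.0]})
ends = pd.DataFrame({"latitude": [37.1, 37.2], "longitude": [-122.1, -122.2]})

# Each half keeps its own 0..n-1 index, so labels repeat after stacking.
stacked = pd.concat([starts, ends]).drop_duplicates()
print(stacked.index.tolist())   # [0, 0, 1]: nodes A and B share the label 0
print(stacked.index.is_unique)  # False

# Renumbering gives each distinct coordinate pair its own ID.
renumbered = stacked.reset_index(drop=True)
renumbered.index.name = "node_id"
print(renumbered.index.is_unique)  # True
```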


#### Project nodes to x,y

In [99]:
inProj = Proj(init='epsg:4326')
outProj = Proj(init='epsg:2768')

In [134]:
x, y = [pd.Series(x) for x in transform(inProj, outProj, nodes['longitude'].values, nodes['latitude'].values)]
nodes['x'] = x
nodes['y'] = y

In [135]:
nodes.head()

Unnamed: 0,node_id,latitude,longitude,x,y
0,0.0,37.735439,-122.174808,1852389.0,638430.990963
1,1.0,37.739259,-122.166718,1853109.0,638842.128354
2,2.0,37.741482,-122.156987,1853971.0,639073.555769
3,3.0,37.741561,-122.156799,1853988.0,639082.028775
4,4.0,37.742957,-122.152482,1854371.0,639230.211297
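
`Proj(init=...)` and the module-level `transform` function were deprecated in pyproj 2.x releases. A minimal sketch of the same projection with the `Transformer` API follows, assuming a pyproj version that provides it. The two input coordinates are the first two nodes from the table above. Results may differ slightly from the table depending on how the installed PROJ version handles the datum shift.

```python
import numpy as np
from pyproj import Transformer

# always_xy=True fixes the axis order: (longitude, latitude) in, (x, y) out.
transformer = Transformer.from_crs("EPSG:4326", "EPSG:2768", always_xy=True)

# First two node coordinates from the table above.
lon = np.array([-122.174808, -122.166718])
lat = np.array([37.735439, 37.739259])

# transform() accepts arrays and returns one array per output axis.
x, y = transformer.transform(lon, lat)
print(x, y)
```

With the real data, `nodes['longitude'].values` and `nodes['latitude'].values` would take the place of `lon` and `lat`.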


In [66]:
edges = df[['tmc', 'start_latitude', 'start_longitude', 'end_latitude', 'end_longitude']]

In [67]:
edges.head()

Unnamed: 0,tmc,start_latitude,start_longitude,end_latitude,end_longitude
0,105+22231,37.735439,-122.174808,37.739259,-122.166718
1,105-22230,37.739259,-122.166718,37.735439,-122.174808
2,105P22223,37.741482,-122.156987,37.741561,-122.156799
3,105+22224,37.741561,-122.156799,37.742957,-122.152482
4,105+22225,37.742957,-122.152482,37.744939,-122.148928


### Get edge weights

In [142]:
merged = pd.merge(df, nodes, left_on=['start_latitude', 'start_longitude'], right_on=['latitude','longitude'], suffixes=('','_from'))

In [143]:
merged = pd.merge(merged, nodes, left_on=['end_latitude', 'end_longitude'], right_on=['latitude','longitude'], suffixes=('','_to'))

In [144]:
merged = merged[['tmc', 'node_id', 'node_id_to']].rename(columns={'node_id': 'from', 'node_id_to':'to'})

In [145]:
merged.head()

Unnamed: 0,tmc,from,to
0,105+22231,0.0,1.0
1,105-06481,436.0,1.0
2,105+08310,903.0,1.0
3,105-13400,0.0,980.0
4,105P10860,981.0,980.0


In [146]:
# Join on `merged`, which carries the from/to node IDs (the earlier `edges` frame does not)
edges = pd.merge(merged, weights, left_on='tmc', right_on='tmc_code')[['from', 'to', 'travel_time_minutes']]

In [147]:
edges.head()

Unnamed: 0,from,to,travel_time_minutes
0,0,1,1.52
1,436,1,1.44
2,903,1,3.98
3,0,980,2.25
4,981,980,0.08
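
The sample above shows a single `measurement_tstamp`. If an extract covered several timestamps (an assumption, not something shown here), the merge would return several rows per edge. One option is to collapse the weights to one value per TMC first. The sketch below uses made-up records, and the mean is an arbitrary choice of aggregate.

```python
import pandas as pd

# Toy speed records: one TMC measured at two timestamps (made-up values).
weights = pd.DataFrame({
    "tmc_code": ["105+22231", "105+22231", "105-22230"],
    "measurement_tstamp": [
        "2018-06-04 09:00:00",
        "2018-06-04 10:00:00",
        "2018-06-04 09:00:00",
    ],
    "travel_time_minutes": [1.5, 2.5, 1.4],
})

# Collapse to one weight per TMC before merging with the network.
per_tmc = weights.groupby("tmc_code", as_index=False)["travel_time_minutes"].mean()
print(per_tmc)
```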
