# CP255: Using Pandana to Measure Multi-Modal Accessibility with Automobile, Pedestrian, and Transit Networks

Created by Sam Blanchard (sablanchard@berkeley.edu) 11/7/2016

This script demonstrates the basic functionality of Pandana (Pandas Network Analysis): acquiring OpenStreetMap (OSM) street network data, and running nearest neighbor and cumulative accessibility queries on auto, pedestrian, and transit Pandana networks.

In [None]:
import pandana as pdna
# from pandana.loaders import osm
import pandas as pd
from ipywidgets import FloatSlider, interact
from IPython.display import display
from urbansim.utils import misc
import matplotlib.pyplot as plt
import matplotlib
from mpl_toolkits.basemap import Basemap

#Set number of unique Pandana networks that will be generated in this session
pdna.network.reserve_num_graphs(5)

%matplotlib inline

In [None]:
def plot_nodes(x_data=None,y_data=None,node_size=15,node_color='black',color_map=None,edge_color=None):
    fig, ax = plt.subplots()
    ax.scatter(x_data, y_data, s=node_size, c=node_color,
                   alpha=1, edgecolor=edge_color,cmap=color_map,
                   zorder=3)
    plt.show()
    return fig, ax

Pandana is a Python and C++ network analysis tool that computes network accessibility using: 1) shortest path queries between origins and destinations (ODs) for any number of nodes within a search radius; and 2) aggregation queries using a cumulative opportunities accessibility method.

A variety of statistics can be used, including sum, average, standard deviation, and count, along with a number of distance decay functions such as linear and exponential. Pandana requires: 1) a set of OD node coordinates (e.g. based on addresses or Census block centroids) between which accessibility will be computed, which can carry variables of interest, such as socioeconomic data or business establishments, to be queried or aggregated; and 2) a network of nodes and weighted edges used for network routing. The OD nodes are connected to the nearest node in the graph network.

Pandana calculates the shortest path (i.e. the lowest-cost path) between ODs over a hierarchical network using the contraction hierarchies algorithm. Contraction hierarchies preprocess the network so that shortest path queries run quickly over large geographic extents, which makes them faster than running the more traditional Dijkstra's shortest path algorithm for each query.
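
To make "lowest-cost path over weighted edges" concrete, here is a minimal sketch of a plain Dijkstra search on a toy network. It is not Pandana's contraction hierarchies implementation, only the baseline that contraction hierarchies speed up. The function name `shortest_path_costs` and the toy node ids are hypothetical.

```python
import heapq

def shortest_path_costs(edges, source):
    """Return the lowest total impedance from `source` to every reachable node.

    `edges` is a list of (from_id, to_id, weight) tuples, treated as directed.
    """
    # Build an adjacency list: node -> list of (neighbor, weight)
    graph = {}
    for u, v, w in edges:
        graph.setdefault(u, []).append((v, w))
        graph.setdefault(v, [])

    costs = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        cost, node = heapq.heappop(queue)
        if cost > costs.get(node, float('inf')):
            continue  # stale queue entry, a cheaper route was already found
        for neighbor, weight in graph.get(node, []):
            new_cost = cost + weight
            if new_cost < costs.get(neighbor, float('inf')):
                costs[neighbor] = new_cost
                heapq.heappush(queue, (new_cost, neighbor))
    return costs

# Hypothetical toy network: a direct A->C edge and a cheaper route through B
toy_edges = [('A', 'B', 2.0), ('B', 'C', 3.0), ('A', 'C', 10.0)]
toy_costs = shortest_path_costs(toy_edges, 'A')
```

In this toy case the route through B (cost 5.0) beats the direct edge (cost 10.0), which is the sense in which "shortest" means "lowest impedance" rather than "fewest edges".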

Pandana can be acquired as an open source library here: https://github.com/UDST/pandana

For more detailed information on Pandana see:

Fletcher Foti and Paul Waddell. 2014. "A Generalized Computational Framework for Accessibility: From the Pedestrian to the Metropolitan Scale"

Set data path

In [None]:
data_path = 'C:/Users/Sam/Dropbox/Work/github_projects/urbanaccess/demos/CP255_Pandana_Demo/data/'

# #1 Pandana networks

## Nodes

Pandana nodes consist of a unique "id" with spatial coordinates: "x" (longitude) and "y" (latitude). Nodes are the vertices of a graph network and represent street intersections.

In [None]:
nodes = pd.DataFrame({'id':[1,2,3,4],
                      'x':[-122.302578,-122.177008,-122.181374,-122.184170],
                      'y':[37.560184,37.481747,37.483689,37.484536]})

In [None]:
nodes

Plot the nodes on a map

In [None]:
plot_nodes(x_data=nodes['x'],
           y_data=nodes['y'],
           node_size=15,
           node_color='black',
           color_map=None,
           edge_color='#999999')

## Edges

Pandana edges consist of a "from" node id column and a "to" node id column, which together denote direction, plus an impedance (weight) column representing the friction of travel between the two nodes. Here the impedance is distance in meters, but it can also be travel time or a utility value. Edges are the connections between nodes and represent streets and pathways.

In [None]:
edges = pd.DataFrame({'from':[1,2,3],
                      'to':[2,3,4],
                      'distance_m':[3000,6000,8000]})

In [None]:
edges

You can convert the edge weights to represent any unit. For example, convert distance in meters into pedestrian travel time in minutes at 3 MPH.

In [None]:
SPEED_MPH = 3
edges['travel_time_min'] = (edges['distance_m']/1609.34) / SPEED_MPH * 60
edges
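
As a worked check of the conversion, the sketch below applies the same arithmetic (meters to miles, miles to hours, hours to minutes) to a single value. The helper name `travel_time_min` is hypothetical; the 1609.34 meters-per-mile factor is the one used in the cell above.

```python
METERS_PER_MILE = 1609.34

def travel_time_min(distance_m, speed_mph):
    """Convert a distance in meters to travel time in minutes at a constant speed."""
    miles = distance_m / METERS_PER_MILE
    hours = miles / float(speed_mph)
    return hours * 60.0

# First edge in the example table: 3000 m walked at 3 MPH
walk_minutes = travel_time_min(3000, 3)
```

A 3000 m edge works out to roughly 37 minutes of walking at 3 MPH.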

Pandana edges can be either "one way" or "two way". Let's convert the one way edge table above to a two way edge table.

In [None]:
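# rename() needs columns= to swap the column labels; without it the row index is targeted instead
edges = pd.concat([edges, edges.rename(columns={'from':'to','to':'from'})]).reset_index(drop=True)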
edges
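
The same idea can be sketched without pandas: a two way table is the one way table plus a reversed copy of every edge with the same weight. The helper name `make_two_way` is hypothetical, and the tuples reuse the example distances from the edge table.

```python
def make_two_way(one_way_edges):
    """Return the input edges plus a reversed copy of each, with the same weight."""
    reversed_edges = [(to_id, from_id, weight)
                      for from_id, to_id, weight in one_way_edges]
    return list(one_way_edges) + reversed_edges

# The three one way example edges as (from, to, distance_m)
one_way = [(1, 2, 3000), (2, 3, 6000), (3, 4, 8000)]
two_way = make_two_way(one_way)
```

The result has six rows: the original three, followed by their mirrors such as (2, 1, 3000).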

# #2 Nearest neighbor query

### Download street network data: Auto network

Use Pandana to acquire the auto street network within a bounding box for Alameda County, CA from OpenStreetMap. Then remove low connectivity nodes (here, nodes connected to fewer than 10 other nodes within 10,000 m) from the graph network and save it as a local HDF5 file. The "drive" network type extracts only the OSM street network components open to autos. "two_way=True" denotes that each edge in the network is traversable in both directions with the same weight.

Bounding box from: http://boundingbox.klokantech.com/

In [None]:
'''
%%time
h5file = 'osm_drive_2way_subset.h5'
network = pdna.loaders.osm.network_from_bbox(lat_min=37.454539, 
                                                lng_min=-122.342665, 
                                                lat_max=37.905668, 
                                                lng_max=-121.469214, 
                                                network_type='drive', 
                                                two_way=True)
lcn = network.low_connectivity_nodes(10000, 10, imp_name='distance')
network.save_hdf5(data_path+h5file, rm_nodes=lcn)
print 'OSM data save completed:', data_path, h5file
'''

Load previously generated auto network

In [None]:
%%time
h5file = 'osm_drive_2way_subset.h5'
# Open the store read-only once and close it after both tables are loaded
with pd.HDFStore(data_path+h5file, mode='r') as store:
    osm_drive_nodes = store.nodes
    osm_drive_edges = store.edges

## Inspect the network

### Nodes

In [None]:
print 'Total number of nodes:', len(osm_drive_nodes)
osm_drive_nodes.head()

### Edges

In [None]:
print 'Total number of edges:', len(osm_drive_edges)
osm_drive_edges.head()

## Initialize a Pandana network

Initialize the Pandana network object to be used in the nearest neighbor query

In [None]:
%%time
drive_net = pdna.Network(osm_drive_nodes["x"], 
                   osm_drive_nodes["y"], 
                   osm_drive_edges["from"], 
                   osm_drive_edges["to"],
                   osm_drive_edges[["distance"]],twoway=True)
print 'Network initialized'

## Load input data

### POI data

OSHPD Hospitals 2012 from: https://www.oshpd.ca.gov/documents/HWDD/GIS/HealthcareFacilities201210.zip  
CPAD Parks 2016 from: http://atlas.ca.gov/casil/planning/Land_Ownership/GreenInfoNetworkProject/CPAD-2016a-June2016/CPAD_2016a.zip

Shapefiles have been post-processed by converting each shapefile to a csv and adding explicit centroid coordinates columns.

Read data into a Pandas dataframe and subset for Alameda county and other attributes.

In [None]:
hospitals = pd.read_csv(data_path+'oshpd_points.csv')
hospitals = hospitals[hospitals['COUNTY'] == 'Alameda']
hospitals = hospitals[hospitals['TYPE'] == 'Hospital']
hospitals = hospitals[hospitals['FAC_STATUS'] == 'Open']
print 'Loaded', str(len(hospitals)), 'hospitals.'

In [None]:
hospitals.head()

In [None]:
plot_nodes(x_data=hospitals['X'],
           y_data=hospitals['Y'],
           node_size=15,
           node_color='black',
           color_map=None,
           edge_color='#999999')

In [None]:
parks = pd.read_csv(data_path+'cpad_points.csv')
parks = parks[parks['COUNTY'] == 'Alameda']
parks = parks[parks['ACCESS_TYP'] == 'Open Access']
print 'Loaded', str(len(parks)), 'parks.'

In [None]:
parks.head()

In [None]:
plot_nodes(x_data=parks['XCOORD'],
           y_data=parks['YCOORD'],
           node_size=15,
           node_color='black',
           color_map=None,
           edge_color='#999999')

## Calculate nearest neighbor

For each node on the network calculate the distance to the nearest 2 hospital and 2 park features within a 10 km network search radius.

Initialize the POIs on the network for two POI categories, with a maximum query distance of 10 km (10,000 m) and up to 2 nearest POIs per category. Then set the POIs on the network.

In [None]:
%%time
drive_net.init_pois(num_categories=2, max_dist=10000, max_pois=2)
drive_net.set_pois(category="parks", x_col=parks['XCOORD'], y_col=parks['YCOORD'])
drive_net.set_pois(category="hospitals", x_col=hospitals['X'], y_col=hospitals['Y'])

Calculate the distance between each network node and the nearest 2 POIs for each category within a 10 km network radius.

In [None]:
%%time
nearest_parks = drive_net.nearest_pois(distance=10000, category="parks", num_pois=2,max_distance=0)
nearest_hospitals = drive_net.nearest_pois(distance=10000, category="hospitals", num_pois=2,max_distance=0)
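
The shape of the result can be sketched for a single node: keep the k smallest network distances that fall within the search radius, and pad any missing slots with a fill value. This is a simplified stand-in, not Pandana's implementation. It assumes that max_distance acts as the fill value for slots with no POI in range, and the helper name and the distances are hypothetical.

```python
def nearest_k_distances(poi_distances, search_radius, k, fill_value):
    """Return the k smallest POI distances within `search_radius`.

    Slots with no POI inside the radius are padded with `fill_value`.
    """
    within = sorted(d for d in poi_distances if d <= search_radius)
    nearest = within[:k]
    return nearest + [fill_value] * (k - len(nearest))

# Hypothetical network distances (meters) from one node to every hospital
distances_to_hospitals = [12500.0, 4200.0, 9100.0, 15000.0]
both_found = nearest_k_distances(distances_to_hospitals, 10000, 2, 0)
one_found = nearest_k_distances(distances_to_hospitals, 5000, 2, 0)
```

With a 10 km radius both slots are filled; with a 5 km radius only the closest hospital qualifies and the second slot falls back to the fill value of 0.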

## View results

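Inspect results: network node id and network distance to the first and second closest hospital within 10 km. Because max_distance=0 was passed to the query above, nodes with no hospital within 10 km are reported with a distance of 0, so they are filtered out here.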

In [None]:
nearest_hospitals[nearest_hospitals[1]>0].head()

## View results on a map

### Oakland: Distance to nearest hospital within 10 km

In [None]:
bbox = (37.6991981,-122.3426649,37.8847249,-122.1149234) #oakland
drive_net.plot(nearest_hospitals[1], 
         bbox=bbox,
         plot_type='scatter',
         fig_kwargs={'figsize':[20,14]},
         bmap_kwargs={'epsg':'26943','resolution':'h'},
         plot_kwargs={'cmap':'BrBG_r','s':8,'edgecolor':'none'})

### Alameda County: Distance to nearest hospital within 10 km

In [None]:
drive_net.plot(nearest_hospitals[1], 
         plot_type='scatter',
         fig_kwargs={'figsize':[20,14]},
         bmap_kwargs={'epsg':'26943','resolution':'h'},
         plot_kwargs={'cmap':'gist_heat_r','s':8,'edgecolor':'none'})

Interactively change the distance threshold

In [None]:
def net_query(distance):
    nearest_hospitals = drive_net.nearest_pois(distance=distance, 
                                         category="hospitals", 
                                         num_pois=2,
                                         max_distance=0)
    drive_net.plot(nearest_hospitals[1], 
         plot_type='scatter',
         fig_kwargs={'figsize':[20,14]},
         bmap_kwargs={'epsg':'26943','resolution':'h'},
         plot_kwargs={'cmap':'gist_heat_r','s':8,'edgecolor':'none'})

In [None]:
v = interact(net_query,distance=FloatSlider(min=0, max=10000, step=1000,continuous_update=False))
display(v)

### Alameda County: Distance to nearest park within 10 km

In [None]:
drive_net.plot(nearest_parks[1], 
         plot_type='scatter',
         fig_kwargs={'figsize':[20,14]},
         bmap_kwargs={'epsg':'26943','resolution':'h'},
         plot_kwargs={'cmap':'gist_heat_r','s':8,'edgecolor':'none'})

# #3 Cumulative opportunities accessibility query

## Load input variable and network data

Data includes Census blocks for Alameda County with block level synthesized job and household data (as agents)

In [None]:
%%time
blocks = pd.read_csv(data_path+'blocks_subset.csv', index_col='block_id')
jobs = pd.read_csv(data_path+'jobs_subset.csv', index_col='job_id')
households = pd.read_csv(data_path+'households_subset.csv', index_col='block_id')
print 'Loaded',str(len(blocks)),'blocks',str(len(jobs)),'jobs',str(len(households)),'households'

## Inspect the data

In [None]:
blocks.head()

In [None]:
jobs.head()

In [None]:
households.head()

## Create accessibility variables of interest and set them on the network

Assign each block to its nearest network node, then aggregate job and worker agents to the block level.

In [None]:
blocks['node_id'] = drive_net.get_node_ids(blocks['x'], blocks['y'])
blocks['jobs'] = jobs.groupby(jobs.block_id).size()
blocks['workers'] = households.groupby(households.index).workers.sum()
blocks = blocks.fillna(0)
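
Snapping a block centroid to the network amounts to a nearest-node lookup. The sketch below does this by brute force with straight-line distance on the raw coordinates, which is only a rough stand-in for what get_node_ids does internally. The helper name `nearest_node_id` and the centroid are hypothetical; the node coordinates are taken from the "Nodes" example.

```python
import math

def nearest_node_id(x, y, nodes):
    """Return the id of the node closest to (x, y) by straight-line distance.

    `nodes` maps node id -> (x, y).
    """
    return min(nodes,
               key=lambda nid: math.hypot(nodes[nid][0] - x, nodes[nid][1] - y))

# Three of the example node coordinates from the "Nodes" section
example_nodes = {2: (-122.177008, 37.481747),
                 3: (-122.181374, 37.483689),
                 4: (-122.184170, 37.484536)}

# Hypothetical block centroid
snapped = nearest_node_id(-122.1810, 37.4835, example_nodes)
```

Every variable later set on the network is attached to the snapped node id, so blocks that share a nearest node contribute to the same node.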

Set the variables on the Pandana network

In [None]:
drive_net.set(blocks.node_id, variable = blocks.jobs, name='jobs')
drive_net.set(blocks.node_id, variable = blocks.workers, name='workers')

## Precompute the network

Precompute the distance query from 0 to 3000 m to speed up the query. Note: This is memory intensive.

In [None]:
%%time
drive_net.precompute(3000)

## Calculate cumulative accessibility

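For each network node, and therefore each block snapped to it, calculate the cumulative number of jobs accessible within a 1, 2, and 3 km network radius. Because decay='linear' is set, jobs are weighted by distance so that nearer jobs count more than jobs close to the edge of the radius.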

In [None]:
%%time
jobs_1000 = drive_net.aggregate(1000, 
                              type='sum', 
                              decay='linear', 
                              name = 'jobs')
jobs_2000 = drive_net.aggregate(2000, 
                              type='sum', 
                              decay='linear', 
                              name = 'jobs')
jobs_3000 = drive_net.aggregate(3000, 
                              type='sum', 
                              decay='linear', 
                              name = 'jobs')
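
The effect of the linear decay can be sketched for a single origin node. The weight used here, 1 - distance / radius, is an assumption about how a linear decay behaves and should be checked against the Pandana documentation. The function names and the job counts are hypothetical.

```python
def linear_decay_sum(opportunities, radius):
    """Sum opportunity counts weighted by a linear distance decay.

    `opportunities` is a list of (network_distance, count) pairs.
    """
    total = 0.0
    for distance, count in opportunities:
        if distance <= radius:
            # Assumed weight: 1 at the origin, falling to 0 at the radius
            total += count * (1.0 - distance / float(radius))
    return total

def flat_sum(opportunities, radius):
    """Sum opportunity counts within the radius with no decay."""
    return sum(count for distance, count in opportunities if distance <= radius)

# Hypothetical jobs at 0 m, 500 m, and 1500 m from one node
jobs_near_node = [(0, 100), (500, 40), (1500, 200)]
decayed_1000 = linear_decay_sum(jobs_near_node, 1000)
flat_1000 = flat_sum(jobs_near_node, 1000)
```

Under this assumption the 40 jobs at 500 m count as 20, so the decayed total (120) is lower than the undecayed count of jobs within 1 km (140).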

Do the same for the number of workers

In [None]:
%%time
workers_3000 = drive_net.aggregate(3000, 
                              type='sum', 
                              decay='linear', 
                              name = 'workers')

## View results

Combine all results into a Pandas dataframe

In [None]:
results_drive = pd.DataFrame({'jobs_1000':misc.reindex(jobs_1000, blocks.node_id),
                              'jobs_2000':misc.reindex(jobs_2000, blocks.node_id),
                              'jobs_3000':misc.reindex(jobs_3000, blocks.node_id),
                              'workers_3000':misc.reindex(workers_3000, blocks.node_id)})
results_drive.tail()

## View results on a map

### Alameda County: Total jobs within 1 km

In [None]:
drive_net.plot(jobs_1000, 
         plot_type='scatter',
         fig_kwargs={'figsize':[20,14]},
         bmap_kwargs={'epsg':'26943','resolution':'h'},
         plot_kwargs={'cmap':'gist_heat_r','s':8,'edgecolor':'none'})

### Alameda County: Total jobs within 2 km

In [None]:
drive_net.plot(jobs_2000, 
         plot_type='scatter',
         fig_kwargs={'figsize':[20,14]},
         bmap_kwargs={'epsg':'26943','resolution':'h'},
         plot_kwargs={'cmap':'gist_heat_r','s':8,'edgecolor':'none'})

### Alameda County: Total jobs within 3 km

In [None]:
drive_net.plot(jobs_3000, 
         plot_type='scatter',
         fig_kwargs={'figsize':[20,14]},
         bmap_kwargs={'epsg':'26943','resolution':'h'},
         plot_kwargs={'cmap':'gist_heat_r','s':8,'edgecolor':'none'})

### Alameda County: Total workers within 3 km

In [None]:
drive_net.plot(workers_3000, 
         plot_type='scatter',
         fig_kwargs={'figsize':[20,14]},
         bmap_kwargs={'epsg':'26943','resolution':'h'},
         plot_kwargs={'cmap':'gist_heat_r','s':8,'edgecolor':'none'})

# Transit+Pedestrian network

The transit+pedestrian network used here was created using UrbanAccess. UrbanAccess is a Python General Transit Feed Specification (GTFS) data acquisition, processing, and Pandana network creation tool designed to be used in tandem with Pandana for accessibility queries. UrbanAccess includes tools to: 1) connect to and search GTFS data APIs; 2) validate GTFS data; 3) create individual agency or metropolitan scale transit networks; 4) compute headways; and 5) penalize network impedance by transit mode.

The UrbanAccess library will soon be on UDST: https://github.com/UDST/

For more detailed information on UrbanAccess see:

Samuel D. Blanchard and Paul Waddell. Forthcoming. "UrbanAccess: A Generalized Methodology for Measuring Regional Accessibility with an Integrated Pedestrian and Transit Network" Transportation Research Record: Journal of the Transportation Research Board.

In [None]:
%%time
blocks = pd.read_csv(data_path+'blocks_subset.csv', index_col='block_id')
jobs = pd.read_csv(data_path+'jobs_subset.csv', index_col='job_id')
households = pd.read_csv(data_path+'households_subset.csv', index_col='block_id')

hdffile = 'transit_ped_network.h5'
# Open the store read-only once and close it after both tables are loaded
with pd.HDFStore(data_path+hdffile, mode='r') as store:
    transit_nodes = store.nodes
    transit_edges = store.edges
transit_edges.drop('id', axis=1, inplace=True)

print 'Loaded',str(len(blocks)),'blocks',str(len(jobs)),'jobs',str(len(households)),'households'
print 'Loaded',str(len(transit_nodes)),'nodes',str(len(transit_edges)),'edges'

The transit network represents the AM Peak scheduled network of AC Transit from 7 am to 10 am with edges weighted by travel time. Pedestrian to transit connector edges have been weighted by the average route stop headways to represent expected passenger wait time. This network has been integrated with the pedestrian network which uses a standard walking speed of 3 MPH to calculate the pedestrian travel time.
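
To illustrate what an average stop headway is, the sketch below takes the scheduled departure times at one stop and averages the gaps between consecutive departures. It is not UrbanAccess code, only the arithmetic behind the idea. The helper name `average_headway_min` and the departure times are hypothetical.

```python
def average_headway_min(departure_minutes):
    """Mean gap, in minutes, between consecutive departures at one stop."""
    ordered = sorted(departure_minutes)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    if not gaps:
        raise ValueError('need at least two departures to compute a headway')
    return sum(gaps) / float(len(gaps))

# Hypothetical departures at one stop, in minutes after midnight (7:00 to 8:00 am)
departures = [420, 435, 445, 465, 480]
headway = average_headway_min(departures)
```

Here the gaps are 15, 10, 20, and 15 minutes, so the connector edge for this stop would carry a 15 minute weight under the weighting described above.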

In [None]:
%%time
transit_net = pdna.Network(transit_nodes["x"], 
                           transit_nodes["y"], 
                           transit_edges["from"], 
                           transit_edges["to"],
                           transit_edges[["weight"]],twoway=False)

In [None]:
%%time
%%capture
blocks['node_id'] = transit_net.get_node_ids(blocks['x'], blocks['y'])
blocks['jobs'] = jobs.groupby(jobs.block_id).size()
transit_net.set(blocks.node_id, variable = blocks.jobs, name='transit_jobs')

Run an aggregation query to calculate the total number of jobs accessible within a 20 minute travel time along the transit+pedestrian network.

In [None]:
%%time
transit_jobs_20 = transit_net.aggregate(20, type='sum', decay='linear', name = 'transit_jobs')

In [None]:
results_transit = pd.DataFrame({'transit_jobs_20':misc.reindex(transit_jobs_20, blocks.node_id)})
results_transit.tail()

## View results on a map

### Alameda County

### 20 min

In [None]:
bbox = (37.454539,-122.342665,37.905668,-121.469214) # alameda county
transit_net.plot(transit_jobs_20, 
         bbox=bbox,
         plot_type='scatter',
         fig_kwargs={'figsize':[20,14]},
         bmap_kwargs={'epsg':'3310','resolution':'h'},
         plot_kwargs={'cmap':'BrBG','s':8,'edgecolor':'none'})

In [None]:
bbox = (37.454539,-122.342665,37.905668,-121.469214) # alameda county
transit_net.plot(transit_jobs_20, 
         bbox=bbox,
         plot_type='scatter',
         fig_kwargs={'figsize':[20,14]},
         bmap_kwargs={'epsg':'3310','resolution':'h'},
         plot_kwargs={'cmap':'gist_heat_r','s':8,'edgecolor':'none'})

### Oakland

In [None]:
bbox = (37.6991981,-122.3426649,37.8847249,-122.1149234) #oakland
transit_net.plot(transit_jobs_20, 
         bbox=bbox,
         plot_type='scatter',
         fig_kwargs={'figsize':[20,14]},
         bmap_kwargs={'epsg':'3310','resolution':'h','area_thresh':100000000000000000},
         plot_kwargs={'cmap':'BrBG','s':30,'edgecolor':'none'})

You can combine census data with accessibility metrics in order to investigate patterns of low or high access neighborhoods and socioeconomics.

In [None]:
households.head()

In this case, let's see what the relationship is between average household income and transit accessibility (using a 20 min travel time).

In [None]:
results_transit['av_inc'] = households.groupby(households.index).income.mean()
results_transit.head()

In [None]:
results_transit.plot.scatter(x='transit_jobs_20',y='av_inc')

# Compare accessibility using different networks: walking vs transit

# Pedestrian network

Download the pedestrian network from OSM. This includes all pedestrian-accessible pathways, such as paths and stairways, and omits auto-only roads such as limited access highways.

In [None]:
'''
%%time
h5file = 'osm_walk_2way_subset.h5'
network = pdna.loaders.osm.network_from_bbox(lat_min=37.454539, 
                                                lng_min=-122.342665, 
                                                lat_max=37.905668, 
                                                lng_max=-121.469214, 
                                                network_type='walk', 
                                                two_way=True)
lcn = network.low_connectivity_nodes(10000, 10, imp_name='distance')
network.save_hdf5(data_path+h5file, rm_nodes=lcn)
print 'OSM data save completed:', data_path, h5file
'''

Load the pedestrian network, convert edge weight to walking travel time, and calculate the total number of jobs accessible within a 20 min walk. Then combine these results with those of the transit network accessibility metric and compare.

In [None]:
%%time
blocks = pd.read_csv(data_path+'blocks_subset.csv', index_col='block_id')
jobs = pd.read_csv(data_path+'jobs_subset.csv', index_col='job_id')

h5file = 'osm_walk_2way_subset.h5'
# Open the store read-only once and close it after both tables are loaded
with pd.HDFStore(data_path+h5file, mode='r') as store:
    osm_walk_nodes = store.nodes
    osm_walk_edges = store.edges
print 'Loaded',str(len(osm_walk_nodes)),'nodes',str(len(osm_walk_edges)),'edges'

SPEED_MPH = 3
osm_walk_edges['travel_time_min'] = (osm_walk_edges['distance']/1609.34) / SPEED_MPH * 60
print 'Converted edge weight'

walk_net = pdna.Network(osm_walk_nodes["x"], 
                   osm_walk_nodes["y"], 
                   osm_walk_edges["from"], 
                   osm_walk_edges["to"],
                   osm_walk_edges[["travel_time_min"]],twoway=True)
print 'Network initialized'

blocks['node_id'] = walk_net.get_node_ids(blocks['x'], blocks['y'])
blocks['jobs'] = jobs.groupby(jobs.block_id).size()
walk_net.set(blocks.node_id, variable = blocks.jobs, name='jobs')
walk_jobs_20 = walk_net.aggregate(20, 
                              type='sum', 
                              decay='linear', 
                              name = 'jobs')
print 'Aggregation completed'

results_walk = pd.DataFrame({'walk_jobs_20':misc.reindex(walk_jobs_20, blocks.node_id)})
results_combined = results_transit.join(results_walk, how='left', sort=False)
results_combined = results_combined.join(blocks[['x','y']], how='left', sort=False)
results_combined['access_diff'] = results_combined['walk_jobs_20']-results_combined['transit_jobs_20']
results_combined.tail()

## View the difference between the two results on a map

Blue = the transit+pedestrian network provides access to more jobs than the pedestrian network alone (negative access_diff, since access_diff is walk minus transit).

In [None]:
fig, ax = plt.subplots(figsize=[20,14])

bbox = (
    results_combined.y.min(),
    results_combined.x.min(),
    results_combined.y.max(),
    results_combined.x.max())

bmap = Basemap(bbox[1], bbox[0], bbox[3], bbox[2], ax=ax)
x, y = bmap(results_combined.x.values, results_combined.y.values)
plot = bmap.scatter(x, y, c=results_combined.access_diff.values, cmap='bwr',s=8,edgecolor='none')
bmap, fig, ax

In [None]:
fig, ax = plt.subplots(figsize=[20,14])
bbox = (37.6991981,-122.3426649,37.8847249,-122.1149234) # Oakland

bmap = Basemap(bbox[1], bbox[0], bbox[3], bbox[2], ax=ax)
x, y = bmap(results_combined.x.values, results_combined.y.values)
# Use the same 'bwr' colormap as the county map so blue still marks negative access_diff
plot = bmap.scatter(x, y, c=results_combined.access_diff.values, cmap='bwr',s=8,edgecolor='none')
bmap, fig, ax