*This GOSTnets example workbook focuses on Dar Es Salaam. We will build a multi-modal network composed of the city streets from OSM and the transit network as represented by its GTFS feed. We will then work out the shortest-path travel time for each of 900 households, both with and without access to the transit network. We will close by calculating the effect on trip times that access to the GTFS-defined transit network affords the average citizen.*

        Author: Charles Fox, G.O.S.T  |  SD Chief Economist Front Office

### Library Import

In [1]:
import geopandas as gpd
import pandas as pd
import networkx as nx
import peartree as pt
import osmnx as ox
import sys, os, time, json, copy
ghub = r'C:\Users\charl\Documents\GitHub\GOST\NetworkAnalysis\GOSTNets'
sys.path.append(ghub) # Allows system to look in the 'ghub' folder for any python libraries. Here used to import GOSTnets
import GOSTnet as gn
import importlib # This library lets you re-load libraries if you make a change to the underlying python
from shapely.geometry import Point

peartree version: 0.6.0 
networkx version: 2.2 
matplotlib version: 2.2.2 
osmnx version: 0.8.2 


### Define paths, file names

In [2]:
pth = r'C:\Users\charl\Documents\GOST\DarEsSalaam'
pointz = r'workplace_geo.csv'
feed_path = os.path.join(pth, r'GTFS.zip')

### Preparing the MultiModal Network Step 1: The Walking Graph
This function searches OSM for objects called 'Dar Es Salaam'. It is functionally equivalent to going to www.openstreetmap.org and manually entering the search term 'Dar Es Salaam'. If you don't get the desired area first time (it's a search function...) you can modify the result number the function takes as the area. To do this, follow the documentation steps here: 
https://osmnx.readthedocs.io/en/stable/osmnx.html 


Here, 'Dar Es Salaam' returns the city relation, which is perfectly adequate for our purposes. We have imported the network type 'walk', which covers all of the walkable streets in OSM. Switch the keyword argument to 'drive' to get the driving network.

In [3]:
%time G_OSM = ox.graph_from_place('Dar Es Salaam', network_type = 'walk')

Wall time: 2min 59s


This line checks whether the graph is already simplified. It usually will be if it comes from osmnx, but this is a useful check for other networkx objects.

In [4]:
ox.simplify.is_simplified(G_OSM)

True

This GOSTnets command takes a graph object and makes an equivalent GeoDataFrame of the edges. The partner function, gn.node_gdf_from_graph, does the same thing for the nodes of a target graph object.

In [5]:
G_OSM_gdf = gn.edge_gdf_from_graph(G_OSM)

I wanted a summary of the roads broken down by highway type, for my personal interest. For this, each road must be labelled with only one highway type. A value such as "highway = ['primary', 'secondary']" is neither admissible nor helpful.

Hence, this function iterates through each row, checks whether or not the contents of the column are a list, and if true, returns the first object in the list. Otherwise, it returns what it found. This removes all list objects, leaving only text strings.

In [6]:
def check(x):
    # Edges merged during simplification can carry a list of highway tags; keep the first
    if isinstance(x.highway, list):
        return x.highway[0]
    else:
        return x.highway

G_OSM_gdf['highway'] = G_OSM_gdf.apply(check, axis = 1)
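
To see what this normalisation does without loading the full network, here is a small, dependency-free sketch that applies the same rule to a few hand-made tag values. The function name `first_if_list` and the sample values are illustrative assumptions, not part of GOSTnets or osmnx.

```python
def first_if_list(value):
    """Return the first element if value is a list, otherwise the value unchanged."""
    if isinstance(value, list):
        return value[0]
    return value

# Hypothetical 'highway' values, in the shapes they can take on simplified OSM edges
samples = ['residential', ['primary', 'secondary'], ['footway', 'path', 'steps']]

cleaned = [first_if_list(v) for v in samples]
print(cleaned)
```

Keeping only the first tag is a simplification: it discards the other labels, which is acceptable for a rough summary but may not suit every analysis.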

Print the total number of edges in the dataframe

In [7]:
len(G_OSM_gdf)

268850

...that's a lot of edges. Let's get the summary of edges by the highway type as labelled in OpenStreetMap:

In [8]:
G_OSM_gdf.highway.value_counts()

residential       125322
unclassified       55913
footway            38031
path               27887
service             7464
tertiary            6632
secondary           2626
primary             1562
trunk               1322
track               1241
pedestrian           217
road                 198
steps                153
trunk_link           110
yes                   92
primary_link          46
secondary_link        20
tertiary_link         12
bridleway              2
Name: highway, dtype: int64
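
As a quick sanity check, the counts above can be turned into shares of the total. The sketch below re-types the five largest classes from the output above into a plain dictionary and groups everything else under an 'other' key (that key, and the variable names, are assumptions made for illustration).

```python
# Total number of edges and the five largest highway classes, copied from the output above
total_edges = 268850
top_counts = {
    'residential': 125322,
    'unclassified': 55913,
    'footway': 38031,
    'path': 27887,
    'service': 7464,
}

# Everything not listed individually is grouped together
counts = dict(top_counts)
counts['other'] = total_edges - sum(top_counts.values())

# Share of all edges, in percent, rounded to one decimal place
shares = {name: round(100 * n / total_edges, 1) for name, n in counts.items()}
for name, share in shares.items():
    print(name, share)
```

Residential streets alone account for a little under half of all edges in the walking network.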

Send the GeoDataFrame to a .csv for visual inspection in QGIS / future reference. 

In [9]:
G_OSM_gdf.to_csv(os.path.join(pth, 'walkingnet.csv'))

In order for the network edges to be useful in calculating travel times, they need a value for how long it takes to 'traverse' that edge of the graph. There is a handy GOSTnets function for this, which automatically generates the traverse time for graphs which already include a 'length' property measured in metres. 

Users must specify what graph type they are working with. 

**'walk'** will return a traverse time measured in seconds, at the assumed walk speed (defined in km/h; the default is 4.5, an average human walking pace).

**'drive'** will define traverse times according to highway types. Although there is a built-in default dictionary of assumed speeds by highway type, it is best to pass an explicit dictionary of key:value pairs to the function to avoid unexpected results. Here, we are trying to generate travel time estimates for walking across this OSM network, so 'walk' is used, and I re-affirm the walk speed as the default, at 4.5 km/h.

In [10]:
Gwalk = G_OSM.copy()
%time Gwalk = gn.convert_network_to_time(Gwalk, distance_tag = 'length', graph_type = 'walk', speed_dict = None, walk_speed = 4.5)

Wall time: 2.27 s


This function prints out an example edge in (u, v, {data}) format, typical of Networkx edge objects

In [11]:
gn.example_edge(Gwalk, 1)

(4850712582, 4850712590, {'osmid': 493140573, 'highway': 'residential', 'oneway': False, 'length': 21.276, 'geometry': <shapely.geometry.linestring.LineString object at 0x0000020F6F2799E8>, 'time': 17.020799999999998, 'mode': 'walk'})
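
The 'time' value on the example edge can be reproduced by hand: 4.5 km/h is 1.25 m/s, so a 21.276 m edge takes 21.276 / 1.25, or about 17.02 seconds. The helper below is an illustrative sketch of that arithmetic, not the GOSTnets implementation; the function name is an assumption.

```python
def walk_time_seconds(length_m, walk_speed_kmh=4.5):
    """Traverse time in seconds for an edge of length_m metres at a given walking speed."""
    speed_m_per_s = walk_speed_kmh * 1000 / 3600  # convert km/h to m/s
    return length_m / speed_m_per_s

# Length of the example edge printed above, in metres
edge_time = walk_time_seconds(21.276)
print(round(edge_time, 4))
```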


### Preparing the MultiModal Network Step 2: The GTFS Graph

Here I am making use of the peartree library to import a GTFS feed as a representative network graph. It is important to define the time of day for which we want to get the graph. This is because public transport is not like a road network - the 'edges' along which people can travel only exist at certain times of day, i.e. when buses are in service, for example. Here, I follow the peartree documentation example and use 7am to 9am as my time stretch to model a morning commute. We load this GTFS feed into a networkx object with 'load_feed_as_graph'. This will form another part of our multi-modal network.

In [12]:
feed = pt.get_representative_feed(feed_path)
start = 7 * 60 * 60
end = 9 * 60 * 60
%time Gtransit = pt.load_feed_as_graph(feed, start, end)

Wall time: 40.6 s
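
The start and end values above are written as seconds after midnight (7 * 60 * 60 = 25200 and 9 * 60 * 60 = 32400). If a window is easier to think about as clock times, a small helper like the hypothetical one below makes the conversion explicit. It is not part of peartree.

```python
def clock_to_seconds(hhmm):
    """Convert an 'HH:MM' string to seconds after midnight."""
    hours, minutes = hhmm.split(':')
    return int(hours) * 3600 + int(minutes) * 60

# The morning commute window used in this notebook
start = clock_to_seconds('07:00')
end = clock_to_seconds('09:00')
print(start, end)
```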


I pass this to a GeoDataFrame and send to .csv for future reference / visual inspection

In [13]:
Gtransit_gdf = gn.edge_gdf_from_graph(Gtransit)
Gtransit_gdf.to_csv(os.path.join(pth, 'transitnet.csv'))

One thing we do need to do is make sure every edge on the GTFS network has a property called 'time' in its data dictionary, to ensure that there is one 'key' across the combined multimodal network which will always represent the traverse time of an edge, no matter where it came from.

This will act as our impedance later for calculating journey times. As such, here we iterate through all edges and make a new property 'time', equal to 'length'. At first sight this looks counterintuitive - 'length' isn't time, right? In this case it is: peartree loads GTFS feeds into graphs with the edge 'length' measured in seconds. Ergo, we have the value we need, it is just currently mis-labelled. As good practice we also set the edge 'mode' to 'GTFStransit' (overwriting peartree's default of 'transit'), so we can pick apart the GTFS edges easily later.

In [14]:
gn.example_edge(Gtransit, 1)

for u, v, data in Gtransit.edges(data = True):
    data['time'] = data['length']
    data['mode'] = 'GTFStransit'

gn.example_edge(Gtransit, 1)

('4IVQV_0', '4IVQV_1', {'length': 65.33333333333333, 'mode': 'transit'})
('4IVQV_0', '4IVQV_1', {'length': 65.33333333333333, 'mode': 'GTFStransit', 'time': 65.33333333333333})
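
Because every later step relies on the 'time' key, it can be worth confirming that no edge was missed. The sketch below runs that check on a hand-made edge list in the same (u, v, data) shape; with a real graph, the same function could be given `G.edges(data = True)`. The function name and the second sample edge are assumptions for illustration.

```python
def edges_missing_key(edges, key='time'):
    """Return the (u, v) pairs of all edges whose data dictionary lacks the given key."""
    return [(u, v) for u, v, data in edges if key not in data]

edges = [
    # The relabelled example edge shown above
    ('4IVQV_0', '4IVQV_1', {'length': 65.33333333333333, 'mode': 'GTFStransit', 'time': 65.33333333333333}),
    # A hypothetical edge that has not been relabelled yet
    ('stop_a', 'stop_b', {'length': 120.0, 'mode': 'transit'}),
]

missing = edges_missing_key(edges)
print(missing)
```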


### Bind the graphs

Having prepared the walking and transit graphs individually, we now bind these together with GOSTnets' 'bind_graphs' command. The order of the graphs is important - the first is the base graph onto which we want to bind the smaller graph (second). 

The connection_threshold parameter is the distance in metres within which we will accept nodes to be 'bound' via the creation of a new edge. If there are no nodes on the other graph within 50 m, no connection is made. The .crs of both objects is irrelevant, as this distance is always returned in projected metres, irrespective of the network objects passed to it.

The speed parameter is the speed assumed for the new binding edges - as these trips are expected to be made on foot, the default is the same as our default walking speed of 4.5 km/h.

This function takes some time to run for larger networks, hence the use of progress statements. These can be turned off by adding an optional parameter, 'verbose = False' to the function call.

In [15]:
importlib.reload(gn)
%time Gbound = gn.bind_graphs(Gwalk, Gtransit, name = 'GTFS', connection_threshold = 50, exempt_nodes = [], speed = 4.5)

peartree version: 0.6.0 
networkx version: 2.2 
matplotlib version: 2.2.2 
osmnx version: 0.8.2 
    finished binding 0 percent of nodes
    finished binding 10 percent of nodes
    finished binding 20 percent of nodes
    finished binding 30 percent of nodes
    finished binding 40 percent of nodes
    finished binding 50 percent of nodes
    finished binding 60 percent of nodes
    finished binding 70 percent of nodes
    finished binding 80 percent of nodes
    finished binding 90 percent of nodes
    finished binding 100 percent of nodes
Wall time: 10min 8s
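
To make the role of connection_threshold and speed concrete, here is a simplified, dependency-free sketch of the binding idea: for each transit stop, find the nearest walking node, and if it lies within the threshold, create a connecting edge whose 'time' is the distance divided by the walking speed. This is not the GOSTnets implementation; the function names, the use of haversine distance, the 'binding' mode label, and the sample coordinates are all assumptions made for illustration.

```python
import math

def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two lon/lat points (spherical Earth)."""
    r = 6371000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def bind_nodes(base_nodes, new_nodes, connection_threshold=50, speed=4.5):
    """Link each new node to its nearest base node if within connection_threshold metres."""
    speed_m_per_s = speed * 1000 / 3600  # km/h to m/s
    binding_edges = []
    for new_id, (nlon, nlat) in new_nodes.items():
        nearest_id, nearest_dist = None, float('inf')
        for base_id, (blon, blat) in base_nodes.items():
            d = haversine_m(nlon, nlat, blon, blat)
            if d < nearest_dist:
                nearest_id, nearest_dist = base_id, d
        if nearest_dist <= connection_threshold:
            data = {'length': nearest_dist, 'time': nearest_dist / speed_m_per_s, 'mode': 'binding'}
            binding_edges.append((new_id, nearest_id, data))
    return binding_edges

# Hypothetical (lon, lat) coordinates: one stop close to a walking node, one far from any
walk_nodes = {'w1': (39.2702, -6.7880), 'w2': (39.2800, -6.7880)}
stops = {'s1': (39.2700, -6.7880), 's2': (39.3000, -6.8500)}

binding = bind_nodes(walk_nodes, stops)
for u, v, data in binding:
    print(u, v, round(data['length'], 1), round(data['time'], 1))
```

Only the first stop is bound; the second has no walking node within 50 m and stays unconnected. The brute-force nearest-node search here scales poorly, which hints at why the real call takes several minutes on a network of this size.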


Once again, we send the bound network to .csv for visual inspection.

In [16]:
Gbound_gdf = gn.edge_gdf_from_graph(Gbound)
Gbound_gdf.to_csv(os.path.join(pth, 'multinet.csv'))

We also save the graphs we will use for travel time calculations in a handy format that will allow recall later

In [17]:
nx.write_gml(Gbound, os.path.join(pth, 'multinet.gml'))
nx.write_gml(Gwalk, os.path.join(pth, 'walknet.gml'))

NetworkXError: 'streets_per_node' is not a valid key
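
The GML export fails here, apparently because the GML writer in this networkx version rejects attribute keys such as 'streets_per_node'. One possible workaround, offered as a sketch rather than as the workflow used in this notebook, is to serialise the objects with Python's standard pickle module, which places no restriction on key names. The example below uses a plain dictionary as a stand-in for a graph and writes to a temporary directory; both are assumptions for illustration.

```python
import os
import pickle
import tempfile

# A stand-in for a graph: graph-level attributes plus one edge in (u, v, data) form
graph_like = {
    'graph': {'streets_per_node': {1: 3, 2: 1}, 'name': 'toy network'},
    'edges': [(1, 2, {'length': 21.276, 'time': 17.0208, 'mode': 'walk'})],
}

# Write the object to disk, then read it back
out_path = os.path.join(tempfile.mkdtemp(), 'walknet.pickle')
with open(out_path, 'wb') as f:
    pickle.dump(graph_like, f)
with open(out_path, 'rb') as f:
    restored = pickle.load(f)

print(restored == graph_like)
```

networkx graph objects can generally be pickled the same way. Pickle files should only be loaded from sources you trust, and they are not as portable across library versions as a text format.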

### Preparing the Journey File 
Having prepared the network, we now prepare the journey file which describes which trips will be made. 

We import the file of origins and destinations as a standard pandas dataframe using pd.read_csv

In [36]:
points = pd.read_csv(os.path.join(pth, pointz))

points.head(3)

Unnamed: 0,gpsLatitude,gpsLongitude,ID,job_latitude,job_longitude
0,-6.788069,39.270149,5,-6.815389,39.284908
1,-6.788004,39.269943,10,-6.78206,39.268879
2,-6.788637,39.245564,11,-6.867843,39.272415


Here, we define new columns, 'origin' and 'destination', which are shapely objects of the Lat / Long points for the origins and destinations. Longitude always comes first, because shapely expects (x, y) order. We initially generate a tuple, then pass the tuple to shapely's Point() constructor to generate shapely Point instances. These have geometric properties, as opposed to being plain tuples of numbers.

In [37]:
points['origin'] = list(zip(points.gpsLongitude, points.gpsLatitude))
points['origin'] = points['origin'].apply(lambda x: Point(x))
points['destination'] = list(zip(points.job_longitude, points.job_latitude))
points['destination'] = points['destination'].apply(lambda x: Point(x))

GOSTnets' snap_points_to_graph function requires a GeoDataFrame, so we generate one - by defining which column is the geometry (we will start with the origin col), and passing in the definition of the CRS (here, WGS 84). 

In [38]:
points = gpd.GeoDataFrame(points, crs = {'init':'epsg:4326'}, geometry = 'origin')

We run the point snapper, which returns the nearest node's ID in a new column called 'Nearest_node'

In [39]:
importlib.reload(gn)

%time points_nn = gn.snap_points_to_graph(Gwalk, points, geomcol = 'origin')

peartree version: 0.6.0 
networkx version: 2.2 
matplotlib version: 2.2.2 
osmnx version: 0.8.2 
Wall time: 14.3 s
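
Conceptually, snapping assigns each point the ID of the closest graph node. A brute-force sketch of that idea follows. It does not show how snap_points_to_graph is implemented, and the function name, node IDs, and node coordinates are made up for illustration; only the query point is taken from the first row of the points file. The sketch scales longitude by the cosine of the latitude, an approximation that is reasonable over a city-sized area.

```python
import math

def nearest_node(point, nodes):
    """Return the ID of the node closest to point; both use (lon, lat) in degrees."""
    lon, lat = point
    scale = math.cos(math.radians(lat))  # shrink longitude differences away from the equator

    def sq_dist(item):
        nlon, nlat = item[1]
        return ((nlon - lon) * scale) ** 2 + (nlat - lat) ** 2

    node_id, _ = min(nodes.items(), key=sq_dist)
    return node_id

# Hypothetical node coordinates
nodes = {'n1': (39.2700, -6.7885), 'n2': (39.2750, -6.7900), 'n3': (39.2450, -6.7880)}

origin = (39.270149, -6.788069)  # gpsLongitude, gpsLatitude of the first household
snapped = nearest_node(origin, nodes)
print(snapped)
```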


This isn't very helpful if the file contains both origins and destinations, as we will need the IDs of the closest node for both origin and destination points (one for each). Therefore, we rename the column we just calculated as 'Origin_node'.

In [40]:
points_nn = points_nn.rename(columns = {'Nearest_node':'Origin_node'})

In [41]:
points_nn.head(3)

Unnamed: 0,gpsLatitude,gpsLongitude,ID,job_latitude,job_longitude,origin,destination,Origin_node
0,-6.788069,39.270149,5,-6.815389,39.284908,POINT (39.270149 -6.7880688),POINT (39.284908 -6.815389200000001),3500168844
1,-6.788004,39.269943,10,-6.78206,39.268879,POINT (39.269943 -6.788003900000001),POINT (39.268879 -6.7820597),1910420787
2,-6.788637,39.245564,11,-6.867843,39.272415,POINT (39.245564 -6.7886367),POINT (39.272415 -6.8678427),3457193660


We re-define the geometry property of the GeoDataFrame as the destination column. For more info on why this is relevant and necessary, check out: http://geopandas.org/data_structures.html

In [42]:
points_nn = points_nn.set_geometry('destination')

We now re-run snap_points, asking it instead to snap the destination points to the graph and return the nearest node

In [43]:
importlib.reload(gn)

%time points_nn = gn.snap_points_to_graph(Gwalk, points_nn, geomcol = 'destination')

peartree version: 0.6.0 
networkx version: 2.2 
matplotlib version: 2.2.2 
osmnx version: 0.8.2 
Wall time: 14.1 s


We rename the resulting column as 'Destination_node' for ease of keeping track of what's going on

In [44]:
points_nn = points_nn.rename(columns = {'Nearest_node':'Destination_node'})

We check to make sure the dataframe looks how we want it to after the snapping process has been completed. We print the first 3 rows:

In [45]:
points_nn.head(3)

Unnamed: 0,gpsLatitude,gpsLongitude,ID,job_latitude,job_longitude,origin,destination,Origin_node,Destination_node
0,-6.788069,39.270149,5,-6.815389,39.284908,POINT (39.270149 -6.7880688),POINT (39.284908 -6.815389200000001),3500168844,3727522143
1,-6.788004,39.269943,10,-6.78206,39.268879,POINT (39.269943 -6.788003900000001),POINT (39.268879 -6.7820597),1910420787,4699740135
2,-6.788637,39.245564,11,-6.867843,39.272415,POINT (39.245564 -6.7886367),POINT (39.272415 -6.8678427),3457193660,4714173941


Save our prepared dataframe as a .csv

In [46]:
points_nn.to_csv(os.path.join(pth, 'preparedpoints.csv'))

### Running the travel time calculations

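Re-import the saved points file. The graph re-import lines are left commented out because the GML export above did not complete, so the in-memory Gbound and Gwalk objects are used instead.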

In [48]:
#Gbound = nx.read_gml(os.path.join(pth, 'multinet.gml'))
#Gwalk = nx.read_gml(os.path.join(pth, 'walknet.gml'))
points = pd.read_csv(os.path.join(pth, 'preparedpoints.csv'))

For each origin and destination we now have an approximate node start and end point. We also have a fully connected multi-modal network with a consistently labelled traverse time for each edge. We are ready to start calculating travel times at this point.

Networkx's shortest path calculation requires that we pass it the graph, the origin (source) node, the destination (target) node, and the name of the edge attribute to use as the weight, here 'time'. So, we iterate through our DataFrame like so:

In [53]:
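# Start with empty (NaN) travel-time columns so that failed trips stay missing
points_nn['TT_multi'] = float('nan')
points_nn['TT_walking'] = float('nan')

for i in range(0, len(points_nn)):
    origin = points_nn.Origin_node.loc[i]
    destination = points_nn.Destination_node.loc[i]
    # A single .loc[row, column] assignment avoids pandas' chained-assignment pitfalls
    try:
        points_nn.loc[i, 'TT_multi'] = nx.shortest_path_length(Gbound, source=origin, target=destination, weight='time')
    except (nx.NetworkXNoPath, nx.NodeNotFound):
        pass  # no route found: leave the value as NaN
    try:
        points_nn.loc[i, 'TT_walking'] = nx.shortest_path_length(Gwalk, source=origin, target=destination, weight='time')
    except (nx.NetworkXNoPath, nx.NodeNotFound):
        pass  # no route found: leave the value as NaN
    # Progress report every 100 trips (the original 'Analysis complete' branch could never run)
    if i % 100 == 0 and i != 0:
        print('%d trips done' % i)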

100 trips done
200 trips done
300 trips done
400 trips done
500 trips done
600 trips done
700 trips done
800 trips done
900 trips done
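
For readers unfamiliar with weighted shortest paths, the sketch below implements Dijkstra's algorithm over a toy graph whose edges carry a traverse time in seconds, and compares a walking-only network with one that also has a faster transit link. It is a stand-in for what a weighted shortest-path-length query computes, not the networkx code itself; the graph, node names, and times are invented.

```python
import heapq

def shortest_time(adjacency, source, target):
    """Dijkstra's algorithm: minimum total edge time from source to target, or None."""
    best = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        elapsed, node = heapq.heappop(queue)
        if node == target:
            return elapsed
        if elapsed > best.get(node, float('inf')):
            continue  # stale queue entry, a shorter route to this node was already found
        for neighbour, edge_time in adjacency.get(node, []):
            candidate = elapsed + edge_time
            if candidate < best.get(neighbour, float('inf')):
                best[neighbour] = candidate
                heapq.heappush(queue, (candidate, neighbour))
    return None  # target not reachable from source

# Walking-only network: node -> list of (neighbour, time in seconds)
walk_only = {
    'home': [('corner', 300.0)],
    'corner': [('home', 300.0), ('office', 1500.0)],
    'office': [('corner', 1500.0)],
}

# Multi-modal network: the same streets plus a bus link reached from 'corner'
multimodal = {node: list(edges) for node, edges in walk_only.items()}
multimodal['corner'].append(('stop_a', 60.0))   # binding edge, on foot
multimodal['stop_a'] = [('stop_b', 400.0)]      # transit edge
multimodal['stop_b'] = [('office', 90.0)]       # binding edge, on foot

tt_walking = shortest_time(walk_only, 'home', 'office')
tt_multi = shortest_time(multimodal, 'home', 'office')
print(tt_walking, tt_multi)
```

In this toy case the walking-only trip takes 1800 seconds, while the multi-modal trip takes 850 seconds by using the transit edge.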


Convert values in seconds to minutes; identify travel time reduction from utilizing transit network

In [55]:
out = points_nn.copy()
out['TT_walking'] = out['TT_walking'] / 60
out['TT_multi'] = out['TT_multi'] / 60
out['perf_improvement'] = 1 - (out['TT_multi'] / out['TT_walking'])
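
As a worked example of the last line: a trip that takes 60 minutes on foot and 36 minutes on the multi-modal network has perf_improvement = 1 - 36/60 = 0.4, a 40 percent reduction. The small function below, whose name and None-handling are assumptions for illustration, reproduces that arithmetic for single values.

```python
def perf_improvement(tt_multi, tt_walking):
    """Fractional travel-time reduction from using the multi-modal network."""
    if tt_multi is None or tt_walking is None or tt_walking == 0:
        return None  # undefined when a time is missing or the walking time is zero
    return 1 - (tt_multi / tt_walking)

example = perf_improvement(36.0, 60.0)
print(round(example, 2))
```

In the pandas version above, a trip whose origin and destination snap to the same node gives 0 / 0, which pandas records as NaN. Series.mean() skips NaN values by default, so such trips, like trips with no route, do not enter the average.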

Calculate average trip time reduction from use of the transit network

In [57]:
print('Average performance improvement: %d percent' % int(out['perf_improvement'].mean() * 100))

Average performance improvement: 38 percent


Send the results to .csv

In [58]:
out.to_csv(os.path.join(pth, 'output.csv'))

Comments? Questions? cfox1@worldbank.org