## Generate Results

In this notebook, we import and manipulate the OD matrix, which should have been calculated separately using the graphtool methodology. We assume this has been done and that an OD_matrix.csv file therefore exists in the graphtool folder.


In [3]:
import pandas as pd
import os, sys
sys.path.append(r'C:\Users\charl\Documents\GitHub\GOST_PublicGoods\GOSTNets\GOSTNets')
import GOSTnet as gn
import importlib
importlib.reload(gn)
import geopandas as gpd
import rasterio
from rasterio import features
from shapely.wkt import loads
import numpy as np

peartree version: 0.6.1 
networkx version: 2.3 
matplotlib version: 3.0.3 
osmnx version: 0.9 


Set path locations

In [59]:
basepth = r'C:\Users\charl\Documents\GOST\SierraLeone'
graphtool_pth = os.path.join(basepth, 'graphtool')
net_pth = basepth
pckle = r'final_G.pickle'
flood = 'flood'
if flood == 'flood':
    OD_name = r'OD_matrix_flood.csv'
else:
    OD_name = r'OD_matrix.csv'

Settings. The subset variable identifies which destination file we will use to subset the larger OD matrix, and is also used to name the output files.

In [60]:
walk_speed = 3
WGS = {'init':'epsg:4326'}
measure_crs = {'init':'epsg:3857'}
date = 'May2019'
dest_type = 'schools'
subset = r'%s_%s_%s' % (dest_type, date, flood)

### Import All-Destination OD

In [61]:
OD = pd.read_csv(os.path.join(graphtool_pth, OD_name))
OD = OD.rename(columns = {'Unnamed: 0':'O_ID'})
OD = OD.set_index('O_ID')
OD = OD.replace([np.inf, -np.inf], np.nan)

OD_original = OD
print(OD_original.shape)

(60905, 8309)
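
As a small illustration of what the inf-to-NaN replacement changes, the sketch below uses a made-up three-origin matrix (the `toy_od` values and node labels are hypothetical, not taken from the real OD_matrix.csv). One visible effect is that an origin with no reachable destination ends up with NaN rather than inf as its row minimum.

```python
import numpy as np
import pandas as pd

# Hypothetical toy OD matrix: rows are origin nodes, columns are destination nodes.
# Values are travel times in seconds; inf marks an unreachable destination.
toy_od = pd.DataFrame(
    {"101": [120.0, np.inf, np.inf], "102": [300.0, 450.0, np.inf]},
    index=pd.Index([1, 2, 3], name="O_ID"),
)

# Same replacement as in the cell above
cleaned = toy_od.replace([np.inf, -np.inf], np.nan)

# min() skips NaN by default; a row that is entirely NaN yields NaN
nearest_raw = toy_od.min(axis=1)
nearest_clean = cleaned.min(axis=1)

print(nearest_raw)
print(nearest_clean)
```

Origin 3 cannot reach any destination, so its minimum is inf before cleaning and NaN afterwards.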


### Optional: Subset to Accepted Nodes

Load the destination file which we are currently analyzing.

In [62]:
acceptable_df = pd.read_csv(os.path.join(graphtool_pth, '%s_snapped.csv' % dest_type))

Convert this pandas DataFrame to a geopandas GeoDataFrame

In [63]:
acceptable_df['geometry'] = acceptable_df['geometry'].apply(loads)
acceptable_gdf = gpd.GeoDataFrame(acceptable_df, geometry = 'geometry', crs = {'init':'epsg:4326'})

Generate a list of the unique snapped-to nodes in this destination file. Subset the entire OD-matrix accordingly

In [64]:
accepted_facilities = list(set(list(acceptable_df.NN)))
accepted_facilities_str = [str(i) for i in accepted_facilities]
OD = OD_original[accepted_facilities_str]
acceptable_df.to_csv(os.path.join(basepth,'Output','%s.csv' % subset))

In [65]:
OD.shape

(60905, 7470)
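
The string conversion in the subsetting step matters because pandas reads CSV header labels as strings, even when they look like integers. The following is a minimal sketch with a made-up CSV (the contents of `csv_text` and `snapped_nodes` are hypothetical) showing the same pattern on a tiny matrix.

```python
import io
import pandas as pd

# Hypothetical miniature OD_matrix.csv: the first, unnamed column holds origin node IDs;
# the remaining headers are destination node IDs.
csv_text = ",101,102,103\n1,120,300,90\n2,200,450,60\n"

toy_od = pd.read_csv(io.StringIO(csv_text))
toy_od = toy_od.rename(columns={"Unnamed: 0": "O_ID"}).set_index("O_ID")

# Snapped node IDs as they might appear in an NN column: integers, with a repeat
snapped_nodes = pd.Series([103, 101, 103])

# De-duplicate and convert to strings so the labels match the CSV headers
wanted = sorted({str(n) for n in snapped_nodes})
subset_od = toy_od[wanted]

print(list(toy_od.columns))
print(subset_od)
```

Indexing with the integer IDs directly would raise a KeyError here, since the column labels are the strings '101', '102' and '103'.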

### Add Walk Time from Final Node to Destination

Here, we collapse the acceptable DF (basically, the destinations) to just the nearest node and nearest node distance column. 

Then, we work out the approximate straight-line walking time from each destination to its nearest node, and add this value to the OD matrix values (which represent on-network times to that node).

Note that we do this in a rather clumsy way, because more than one destination may snap to a given node (e.g. in a rural area). We therefore move the entire O-D matrix onto the dest_df object using a for-loop, and take the minimum along this axis later, once the walk time from the network to the destination has been added to each OD matrix value. I know this is horrid, but I have tried and failed to find a better way.

For large calculations, this may require a lot of RAM.

In [66]:
# Take a copy so that the column edits below do not trigger a SettingWithCopyWarning
dest_df = acceptable_df[['NN','NN_dist']].copy()

dest_df = dest_df.set_index('NN')

dest_df['NN_dist'] = dest_df['NN_dist'] / 1000 * 3600 / walk_speed

dest_df.index = dest_df.index.map(str)

d_f = OD.transpose()

for i in d_f.columns:
    dest_df[i] = d_f[i]
    
for i in dest_df.columns:
    if i != 'NN_dist':
        dest_df[i] = dest_df[i] + dest_df['NN_dist']

dest_df = dest_df.drop('NN_dist', axis = 1)

dest_df = dest_df.transpose()

dest_df['min_time'] = dest_df.min(axis = 1)
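
One possible way to express the same calculation without the column-by-column loop is sketched below on toy data. It is only a sketch under stated assumptions: the names `toy_od`, `toy_dest`, `per_facility` and `totals` are illustrative, NN_dist is assumed to be in metres and walk_speed in km/h, and the approach has not been tested against the full Sierra Leone matrix. It still builds an origins-by-facilities array, so it does not remove the RAM requirement.

```python
import pandas as pd

walk_speed = 3  # km/h, as in the settings cell

# Toy OD matrix: rows are origin nodes, columns are destination nodes (string labels)
toy_od = pd.DataFrame(
    {"101": [100.0, 10.0], "102": [30.0, 500.0]},
    index=pd.Index([1, 2], name="O_ID"),
)

# Toy destinations: two facilities snap to node 101, one to node 102
toy_dest = pd.DataFrame({"NN": [101, 101, 102], "NN_dist": [50.0, 250.0, 100.0]})

# Walk time in seconds from each facility to its snapped node
walk_secs = toy_dest["NN_dist"] / 1000 * 3600 / walk_speed
node_labels = toy_dest["NN"].astype(str).tolist()

# Selecting with a list that repeats labels yields one column per facility
per_facility = toy_od[node_labels]

# Broadcast each facility's walk time down its column, then take the row-wise minimum
totals = pd.DataFrame(
    per_facility.to_numpy() + walk_secs.to_numpy(),
    index=toy_od.index,
)
min_time = totals.min(axis=1)

print(min_time)
```

Because each facility keeps its own column, two facilities sharing a node can still contribute different totals, which is the situation the for-loop was written to handle.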

### Prepare Origin Layer for Travel Time Binding

At this point, we have two of our three component times calculated:

1.) Walking time from the exact DESTINATION point, to the network; 

2.) Drive time on the network from each origin node to the nearest destination.

We still need to add the walking time from the ORIGIN to the network.

Now, we re-import our origin points, and set the snapped-to node as the index. Bear in mind this WILL involve duplicates.

In [67]:
grid_name = r'origins_100m_snapped.csv'
grid = pd.read_csv(os.path.join(graphtool_pth, grid_name))
grid = grid.rename(columns = {'NN':'O_ID','NN_dist':'walk_to_road_net'})
grid = grid.set_index(grid['O_ID'])

We add a field to each origin point called 'on_network_time'. This moves the OD matrix onto the grid, or at least its min() value: the on-network travel time from each origin node to its closest facility. Because the assignment aligns on the index, every grid cell that snapped to the same node receives the same value.

In [68]:
grid['on_network_time'] = dest_df['min_time']
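
The assignment above relies on pandas index alignment: the grid has a duplicated index (many cells per node), while the minimum-time series has one entry per node. A minimal sketch with made-up values (`toy_grid` and `min_time` are hypothetical stand-ins) shows how the values are broadcast.

```python
import pandas as pd

# Three grid cells; the first two snap to the same network node (O_ID 1)
toy_grid = pd.DataFrame({"O_ID": [1, 1, 2], "walk_to_road_net": [40.0, 90.0, 10.0]})
toy_grid = toy_grid.set_index(toy_grid["O_ID"])

# One minimum on-network time per origin node (unique index)
min_time = pd.Series([150.0, 70.0], index=pd.Index([1, 2], name="O_ID"))

# Assignment aligns on index labels, so both cells at node 1 receive 150.0
toy_grid["on_network_time"] = min_time

print(toy_grid)
```

This works because the series being assigned has a unique index; only the receiving frame carries duplicate labels.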

Next, we add walk time by translating the walk_to_road_net distance column into a time

In [69]:
grid['walk_to_road_net'] = grid['walk_to_road_net'] / 1000 * 3600 / walk_speed 
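
As a worked example of this conversion, the sketch below wraps the same arithmetic in a small helper. The function name `walk_seconds` is hypothetical, and the units (distance in metres, walk_speed in km/h, result in seconds) are inferred from the formula rather than stated in the source data.

```python
def walk_seconds(distance_m, walk_speed_kmh=3):
    """Convert a straight-line distance in metres to a walking time in seconds."""
    # metres -> kilometres, then hours at the given speed, expressed in seconds
    return distance_m / 1000 * 3600 / walk_speed_kmh


# 500 m at 3 km/h: 0.5 km * 3600 s/h / 3 km/h = 600 s, i.e. 10 minutes
print(walk_seconds(500))
```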

Then we add this to the On-network time to get the total travel time from any given origin node to its closest destination

In [70]:
grid['total_time_net'] = grid['on_network_time'] + grid['walk_to_road_net']

In some cases, it may be faster to simply walk directly to your destination, never using the road network. Allowing for this prevents spuriously high travel times, which arise when the model assumes a long snapping distance, a short on-network journey, then a long walk from the nearest node to the final destination.

In [71]:
grid['geometry'] = grid['geometry'].apply(loads)
# Use the same CRS definition as for the destinations ('epsg:4326', not the bare integer)
o_2_d = gpd.GeoDataFrame(grid, crs = {'init':'epsg:4326'}, geometry = 'geometry')

In [72]:
o_2_d = gn.pandana_snap_points(o_2_d, 
                               acceptable_gdf, 
                               source_crs='epsg:4326',
                               target_crs='epsg:3857',
                               add_dist_to_node_col = True)

o_2_d['walk_time_direct'] = o_2_d['NN_dist'] / 1000 * 3600 / walk_speed

grid['walk_time_direct'] = o_2_d['walk_time_direct']

Now, we want the time plotted to be the minimum of directly walking to your destination, and using the road network to get there. So, we take the minimum of the 'total network time' and the time it would take if you walked in a straight line to your destination. 

For ease of organization, we make it abundantly clear what the units are by including them in the column names.

In [73]:
grid['PLOT_TIME_SECS'] = grid[['walk_time_direct','total_time_net']].min(axis = 1)
grid['PLOT_TIME_MINS'] = grid['PLOT_TIME_SECS'] / 60

It is also useful to know which model is being used for any given cell (whether they are using the network, or not). We record these results in the 'choice' column, as defined below

In [74]:
def choice(x):
    if x.walk_time_direct < x.total_time_net:
        return 'walk'
    else:
        return 'net'

grid['choice'] = grid.apply(lambda x: choice(x), axis = 1)

Here, we can see whether the majority of people are using straight-line walking rather than the network (the assumption being that they use the most time-efficient method of getting from A to B):

In [75]:
grid['choice'].value_counts()

walk    5727218
net     2860687
Name: choice, dtype: int64
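
With several million rows, a row-wise apply can be slow. A vectorized equivalent is sketched below using numpy.where on toy data (the `toy` frame is hypothetical); it is intended to reproduce the same rule, including returning 'net' on ties, but it is offered as an alternative rather than as the method used to produce the counts above.

```python
import numpy as np
import pandas as pd

# Toy travel times in seconds for three cells
toy = pd.DataFrame(
    {
        "walk_time_direct": [300.0, 900.0, 600.0],
        "total_time_net": [450.0, 200.0, 600.0],
    }
)

# 'walk' only where direct walking is strictly faster; otherwise 'net'
toy["choice"] = np.where(toy["walk_time_direct"] < toy["total_time_net"], "walk", "net")

print(toy["choice"].value_counts())
```

Comparisons involving NaN evaluate to False, so cells with a missing time fall into 'net', which matches the behaviour of the choice function.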

### Generate Output Raster from Origin Tif

In this section, we take the original raster layer, and burn in the travel times as contained in the grid GeoDataFrame.

In [76]:
rst_fn = os.path.join(basepth, 'Origins', 'WP', 'SLE14adjv1.tif')
out_fn = os.path.join(basepth,'Output','%s.tif' % subset)

# Update metadata: copy the source raster's metadata, then switch to two float64 bands.
# The context manager closes the file handle once the metadata has been read.
with rasterio.open(rst_fn, 'r') as rst:
    meta = rst.meta.copy()
D_type = rasterio.float64
meta.update(compress='lzw', dtype = D_type, count = 2)

with rasterio.open(out_fn, 'w', **meta) as out:
    with rasterio.open(rst_fn, 'r') as pop:
        
        # this is where we create a generator of geom, value pairs to use in rasterizing
        shapes = ((geom,value) for geom, value in zip(grid.geometry, grid.PLOT_TIME_MINS))

        population = pop.read(1).astype(D_type)
        cpy = population.copy()

        travel_times = features.rasterize(shapes=shapes, fill=0, out=cpy, transform=out.transform)

        out.write_band(1, population)
        out.write_band(2, travel_times)

### OPTIONAL: Generate change rasters

When we have two similar scenarios that we want to compare, it is useful to generate a change raster to help us pick out the major differences. 

This process generates a three-band raster. The bands are as follows:

Band 1 - The basic input population. This is a copy of the original WorldPop information.

Band 2 - The travel time delta between the two scenarios, computed as the no_flood raster minus the flood raster.

Band 3 - The population-weighted change, i.e. the delta multiplied by the population. This lets us see, from a utilitarian perspective, where the largest utility changes are occurring between the two scenarios.

This block involves a series of custom inputs (e.g. choice of file names and file paths) - check it carefully before executing. 

In [5]:
for service_type in ['schools']:
    for second_type in ['Senior Secondary','Primary','Pre-Primary','Junior Secondary']:
        
        WGS = {'init':'epsg:4326'}

        basepth = r'C:\Users\charl\Documents\GOST\SierraLeone\Output'

        subset = r'%s_May2019_%s' % (service_type, second_type)

        pre_raster = os.path.join(basepth, '%s_%s' % (subset,'no_flood.tif'))
        post_raster = os.path.join(basepth, '%s_%s' % (subset,'flood.tif'))
        out_fn = os.path.join(basepth,'%s_%s_change.tif' % (service_type,second_type))

        # Read band 2 (travel time) from each scenario; the context managers close the files
        with rasterio.open(pre_raster, 'r') as pre:
            arr_pre = pre.read(2)
            meta = pre.meta.copy()

        with rasterio.open(post_raster, 'r') as post:
            arr_post = post.read(2)

        # Travel time delta: no_flood scenario minus flood scenario
        delta = arr_pre - arr_post

        # Update metadata
        rst_fn = pre_raster
        D_type = rasterio.float64
        meta.update(compress='lzw', dtype = D_type, count = 3)

        with rasterio.open(out_fn, 'w', **meta) as out:
            with rasterio.open(rst_fn, 'r') as pop:

                population = pop.read(1).astype(D_type)

                out.write_band(1, population)
                out.write_band(2, delta)
                out.write_band(3, delta * population)
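
The band arithmetic can be checked on a tiny example before running it on full rasters. The sketch below uses made-up 2x2 numpy arrays (`population`, `pre` and `post` are hypothetical values, not read from any file) in place of the raster bands.

```python
import numpy as np

# Hypothetical 2x2 stand-ins for the raster bands
population = np.array([[10.0, 0.0], [5.0, 20.0]])   # band 1 of the no_flood raster
pre = np.array([[30.0, 15.0], [60.0, 45.0]])        # travel time in minutes, no_flood
post = np.array([[30.0, 25.0], [90.0, 45.0]])       # travel time in minutes, flood

# Same sign convention as the cell above: negative where the flood scenario is slower
delta = pre - post

# Population-weighted change: cells with no population contribute nothing
weighted = delta * population

print(delta)
print(weighted)
```

Here the top-right cell is 10 minutes slower under the flood scenario but has zero population, so only the bottom-left cell (30 minutes slower, 5 people) shows up in the weighted band.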

### OPTIONAL: Automated processing

This block is a copy of the above, but designed to execute different scenarios in a loop. Only use it once you are comfortable with the processing steps above. There are no comments below, as it is a clone of the above process: follow that, and you can follow this.

In [5]:
basepth = r'C:\Users\charl\Documents\GOST\SierraLeone'
graphtool_pth = os.path.join(basepth, 'graphtool')
net_pth = basepth
pckle = r'final_G.pickle'
flood = 'no_flood'

if flood == 'flood':
    OD_name = r'OD_matrix_flood.csv'
else:
    OD_name = r'OD_matrix.csv'

OD = pd.read_csv(os.path.join(graphtool_pth, OD_name))
OD = OD.rename(columns = {'Unnamed: 0':'O_ID'})
OD = OD.set_index('O_ID')
OD = OD.replace([np.inf, -np.inf], np.nan)

walk_speed = 3
WGS = {'init':'epsg:4326'}
measure_crs = {'init':'epsg:3857'}
date = 'May2019'
dest_type = 'schools'

template_df = pd.read_csv(os.path.join(graphtool_pth, '%s_snapped.csv' % dest_type))
template_df['geometry'] = template_df['geometry'].apply(loads)

template_grid = pd.read_csv(os.path.join(graphtool_pth, r'origins_100m_snapped.csv'))
template_grid = template_grid.rename(columns = {'NN':'O_ID','NN_dist':'walk_to_road_net'})
template_grid = template_grid.set_index(template_grid['O_ID'])

for sub_type in template_df.SCHOOL_LEVEL.unique():

    subset = r'%s_%s_%s_%s' % (dest_type, date, sub_type, flood)

    OD_cut = OD.copy()
    acceptable_df = template_df.copy()
    grid = template_grid.copy()
    
    acceptable_df = acceptable_df.loc[acceptable_df.SCHOOL_LEVEL == sub_type]
    acceptable_gdf = gpd.GeoDataFrame(acceptable_df, geometry = 'geometry', crs = {'init':'epsg:4326'})

    accepted_facilities = list(set(list(acceptable_df.NN)))
    accepted_facilities_str = [str(i) for i in accepted_facilities]
    OD_cut = OD_cut[accepted_facilities_str]
    
    acceptable_df.to_csv(os.path.join(basepth,'Output','%s.csv' % subset))

    dest_df = acceptable_df[['NN','NN_dist']].copy()
    dest_df = dest_df.set_index('NN')
    dest_df['NN_dist'] = dest_df['NN_dist'] / 1000 * 3600 / walk_speed
    dest_df.index = dest_df.index.map(str)
    d_f = OD_cut.transpose()

    for i in d_f.columns:
        dest_df[i] = d_f[i]

    for i in dest_df.columns:
        if i != 'NN_dist':
            dest_df[i] = dest_df[i] + dest_df['NN_dist']

    dest_df = dest_df.drop('NN_dist', axis = 1)
    dest_df = dest_df.transpose()
    dest_df['min_time'] = dest_df.min(axis = 1)

    grid['on_network_time'] = dest_df['min_time']
    grid['walk_to_road_net'] = grid['walk_to_road_net'] / 1000 * 3600 / walk_speed 
    grid['total_time_net'] = grid['on_network_time'] + grid['walk_to_road_net']
    grid['geometry'] = grid['geometry'].apply(loads)
    o_2_d = gpd.GeoDataFrame(grid, crs = {'init':'epsg:4326'}, geometry = 'geometry')

    o_2_d = gn.pandana_snap_points(o_2_d, 
                                   acceptable_gdf, 
                                   source_crs='epsg:4326',
                                   target_crs='epsg:3857',
                                   add_dist_to_node_col = True)

    o_2_d['walk_time_direct'] = o_2_d['NN_dist'] / 1000 * 3600 / walk_speed

    grid['walk_time_direct'] = o_2_d['walk_time_direct']

    grid['PLOT_TIME_SECS'] = grid[['walk_time_direct','total_time_net']].min(axis = 1)
    grid['PLOT_TIME_MINS'] = grid['PLOT_TIME_SECS'] / 60

    def choice(x):
        if x.walk_time_direct < x.total_time_net:
            return 'walk'
        else:
            return 'net'

    grid['choice'] = grid.apply(lambda x: choice(x), axis = 1)

    rst_fn = os.path.join(basepth, 'Origins', 'WP', 'SLE14adjv1.tif')
    out_fn = os.path.join(basepth,'Output','%s.tif' % subset)

    with rasterio.open(rst_fn, 'r') as rst:
        meta = rst.meta.copy()
    D_type = rasterio.float64
    meta.update(compress='lzw', dtype = D_type, count = 2)

    with rasterio.open(out_fn, 'w', **meta) as out:
        with rasterio.open(rst_fn, 'r') as pop:

            # this is where we create a generator of geom, value pairs to use in rasterizing
            shapes = ((geom,value) for geom, value in zip(grid.geometry, grid.PLOT_TIME_MINS))

            population = pop.read(1).astype(D_type)
            cpy = population.copy()

            travel_times = features.rasterize(shapes=shapes, fill=0, out=cpy, transform=out.transform)

            out.write_band(1, population)
            out.write_band(2, travel_times)