### Extension B - Catchment by Clinic

This is the final analysis carried out for the Yemen project. For each clinic, it identifies the number of people who can reach it within a given time frame (its catchment), and also the number of unique users: the people who can ONLY reach that healthcare facility within the time frame, with no substitute clinic available.

This notebook is very closely modelled on Step 4 - Generate Results. Read that notebook first to get a feel for what is going on. The two diverge at the point labelled 'BREAK', and the annotations begin from there.

In [703]:
import pandas as pd
import os, sys
sys.path.append(r'/home/public/GOST_PublicGoods/GOSTNets/GOSTNets')
sys.path.append(r'C:\Users\charl\Documents\GitHub\GOST')
import GOSTnet as gn
import importlib
import geopandas as gpd
import rasterio as rt
from rasterio import features
from shapely.wkt import loads
import numpy as np
import networkx as nx
from shapely.geometry import box, Point, Polygon

### Settings

In [704]:
walking = 1 # set to 1 for walking
conflict = 1 # set to 1 to prevent people from crossing warfronts
zonal_stats = 1 # set to 1 to produce summary zonal stats layer
facility_type = 'HOS'   # Options: 'HOS' or 'PHC' or 'ALL'
year = 2018   # default = 2018
service_index = 0 # Set to 0 for all services / access to hospitals

services = ['ALL',
            'Antenatal',
            'BEmONC',
            'CEmONC',
            'Under_5',
            'Emergency_Surgery',
            'Immunizations',
            'Malnutrition',
            'Int_Outreach']

### Import All-Destination OD

In [705]:
basepth = r'/home/wb493355/data/yemen/Round 3'
pth = os.path.join(basepth, 'graphtool')
util_path = os.path.join(basepth, 'util_files')
srtm_pth = os.path.join(basepth, 'SRTM')

In [706]:
if conflict == 1: 
    conflict_tag = 'ConflictAdj'
    appendor = 'Jan24th'
else:
    conflict_tag = 'NoConflict'
    appendor = 'normal'
    
if walking == 1:
    type_tag = 'walking'
    net_name = r'walk_graph.pickle'
    appendor = 'normal'
else:
    type_tag = 'driving'
    net_name = r'G_salty_time_conflict_adj.pickle'

YEHNP = 1

OD_pth = pth
net_pth = pth

OD_name = r'OD_%s_%s_%s.csv' % (appendor, type_tag, year)

WGS = {'init':'epsg:4326'}
measure_crs = {'init':'epsg:32638'}

subset = r'%s_24th_HERAMS_%s_%s_%s_%s' % (type_tag, facility_type, services[service_index], conflict_tag, year)
    
if YEHNP == 1:
    subset = subset+'_YEHNP_only'
elif YEHNP == -1:
    subset = subset+'_Excl_YEHNP'
        
print("Output files will have name: ", subset)
print("network: ",net_name)
print("OD Matrix: ",OD_name)
print("Conflict setting: ",conflict_tag)
                                            
offroad_speed = 4

Output files will have name:  walking_24th_HERAMS_HOS_ALL_ConflictAdj_2018_YEHNP_only
network:  walk_graph.pickle
OD Matrix:  OD_normal_walking_2018.csv
Conflict setting:  ConflictAdj


In [707]:
OD = pd.read_csv(os.path.join(OD_pth, OD_name))
OD = OD.rename(columns = {'Unnamed: 0':'O_ID'})
OD = OD.set_index('O_ID')
OD = OD.replace([np.inf, -np.inf], np.nan)
OD_original = OD.copy()

### Optional: Subset to Accepted Nodes

In [708]:
acceptable_df = pd.read_csv(os.path.join(OD_pth, 'HeRAMS 2018 April_snapped.csv'))

# Adjust for facility type
if facility_type == 'HOS':
    acceptable_df = acceptable_df.loc[acceptable_df['Health Facility Type Coded'].isin(['1',1])]
elif facility_type == 'PHC':
    acceptable_df = acceptable_df.loc[acceptable_df['Health Facility Type Coded'].isin([2,'2',3,'3'])]
elif facility_type == 'ALL':
    pass
else:
    raise ValueError('unacceptable facility_type entry!')

# Adjust for YEHNP membership
if YEHNP == 1 and facility_type == 'HOS':
    acceptable_df = acceptable_df.loc[(acceptable_df['YEHNP_Hospitals'] == 1)]
elif YEHNP == 1 and facility_type == 'PHC':
    acceptable_df = acceptable_df.loc[(acceptable_df['YEHNP_PHCs'] == 1)]
elif YEHNP == -1 and facility_type == 'HOS':
    acceptable_df = acceptable_df.loc[(acceptable_df['YEHNP_Hospitals'] != 1)]
elif YEHNP == -1 and facility_type == 'PHC':
    acceptable_df = acceptable_df.loc[(acceptable_df['YEHNP_PHCs'] != 1)]

# Adjust for functionality in a given year
acceptable_df = acceptable_df.loc[acceptable_df['Functioning %s' % year].isin(['1','2',1,2])]

# Adjust for availability of service

SERVICE_DICT = {'Antenatal_2018':'ANC 2018',
               'Antenatal_2016':'Antenatal Care (P422) 2016',
               'BEmONC_2018':'Basic emergency obstetric care 2018',
               'BEmONC_2016':'Basic Emergency Obsteteric Care (P424) 2016',
               'CEmONC_2018':'Comprehensive emergency obstetric care 2018',
               'CEmONC_2016':'Comprehensive Emergency Obstetric Care (S424) 2016',
               'Under_5_2018':'Under 5 clinics 2018',
               'Under_5_2016':'Under-5 clinic services (P23) 2016',
               'Emergency_Surgery_2018':'Emergency and elective surgery 2018',
               'Emergency_Surgery_2016':'Emergency and Elective Surgery (S14) 2016',
               'Immunizations_2018':'EPI 2018',
               'Immunizations_2016':'EPI (P21a) 2016',
               'Malnutrition_2018':'Malnutrition services 2018',
               'Malnutrition_2016':'Malnutrition services (P25) 2016',
               'Int_Outreach_2018':'Integrated outreach (IMCI+EPI+ANC+Nutrition_Services) 2018',
               'Int_Outreach_2016':'Integrated Outreach (P22) 2016'}

if service_index == 0:
    pass
else:
    acceptable_df = acceptable_df.loc[acceptable_df[SERVICE_DICT['%s_%s' % (services[service_index],year)]].isin(['1',1])]
print(len(acceptable_df))

64


In [709]:
acceptable_df['geometry'] = acceptable_df['geometry'].apply(loads)
acceptable_gdf = gpd.GeoDataFrame(acceptable_df, geometry = 'geometry', crs = {'init':'epsg:4326'})
accepted_facilities = list(set(list(acceptable_df.NN)))
accepted_facilities_str = [str(i) for i in accepted_facilities]
OD = OD_original[accepted_facilities_str]
acceptable_df.to_csv(os.path.join(basepth,'output_layers','Round 3','%s.csv' % subset))
print(OD_original.shape)
print(OD.shape)

(36624, 3824)
(36624, 64)


### Define function to add elevation to a point GeoDataFrame

In [710]:
def add_elevation(df, x, y, srtm_pth):
    # walk all tiles, find path
    
    tiles = []
    for root, folder, files in os.walk(os.path.join(srtm_pth,'high_res')):
        for f in files:
            if f[-3:] == 'hgt':
                tiles.append(f[:-4])

    # load dictionary of tiles
    arrs = {}
    for t in tiles:
        arrs[t] = rt.open(os.path.join(srtm_pth, 'high_res', '{}.hgt'.format(t), '{}.hgt'.format(t)), 'r')

    # assign a code
    uniques = []
    df['code'] = 'placeholder'
    def tile_code(z):
        E = str(z[x])[:2]
        N = str(z[y])[:2]
        return 'N{}E0{}'.format(N, E)
    df['code'] = df.apply(lambda z: tile_code(z), axis = 1)
    unique_codes = list(set(df['code'].unique()))
    
    z = {}
    # Match on High Precision Elevation
    property_name = 'elevation'
    for code in unique_codes:
        
        df2 = df.copy()
        df2 = df2.loc[df2['code'] == code]
        dataset = arrs[code]
        b = dataset.bounds
        datasetBoundary = box(b[0], b[1], b[2], b[3])
        selKeys = []
        selPts = []
        for index, row in df2.iterrows():
            if Point(row[x], row[y]).intersects(datasetBoundary):
                selPts.append((row[x],row[y]))
                selKeys.append(index)
        raster_values = list(dataset.sample(selPts))
        raster_values = [x[0] for x in raster_values]

        # generate new dictionary of {node ID: raster values}
        z.update(zip(selKeys, raster_values))
        
    elev_df = pd.DataFrame.from_dict(z, orient='index')
    elev_df.columns = ['elevation']
    
    missing = elev_df.copy()
    missing = missing.loc[missing.elevation < 0]
    if len(missing) > 0:
        missing_df = df.copy()
        missing_df = missing_df.loc[missing.index]
        low_res_tifpath = os.path.join(srtm_pth, 'clipped', 'clipped_e20N40.tif')
        dataset = rt.open(low_res_tifpath, 'r')
        b = dataset.bounds
        datasetBoundary = box(b[0], b[1], b[2], b[3])
        selKeys = []
        selPts = []
        for index, row in missing_df.iterrows():
            if Point(row[x], row[y]).intersects(datasetBoundary):
                selPts.append((row[x],row[y]))
                selKeys.append(index)
        raster_values = list(dataset.sample(selPts))
        raster_values = [x[0] for x in raster_values]
        z.update(zip(selKeys, raster_values))

        elev_df = pd.DataFrame.from_dict(z, orient='index')
        elev_df.columns = ['elevation']
    df['point_elev'] = elev_df['elevation']
    df = df.drop('code', axis = 1)
    return df
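
As a quick illustration of the tile lookup inside `add_elevation`: the function builds an SRTM tile name from the first two characters of each coordinate. The helper name `tile_code_for` below is hypothetical; the logic mirrors `tile_code`, and it assumes positive coordinates with two-digit integer parts (true for the Yemen data used here), so it would not generalize to other regions as written.

```python
def tile_code_for(lon, lat):
    """Build an SRTM tile name such as 'N14E044' from a lon/lat pair.

    Mirrors the string slicing used in add_elevation; only valid when both
    coordinates are positive and have two-digit integer parts.
    """
    east = str(lon)[:2]
    north = str(lat)[:2]
    return 'N{}E0{}'.format(north, east)

# Coordinates of the first facility shown in the results table of this notebook
code = tile_code_for(44.37, 14.304533)
print(code)  # N14E044
```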

### Define function to convert distances to walk times

In [711]:
def generate_walktimes(df, start = 'point_elev', end = 'node_elev', dist = 'NN_dist', max_walkspeed = 6, min_speed = 0.1):
    # Tobler's hiking function: https://en.wikipedia.org/wiki/Tobler%27s_hiking_function
    def speed(incline_ratio, max_speed):
        walkspeed = max_speed * np.exp(-3.5 * abs(incline_ratio + 0.05)) 
        return walkspeed

    speeds = {}
    times = {}

    for index, data in df.iterrows():
        if data[dist] > 0:
            delta_elevation = data[end] - data[start]
            incline_ratio = delta_elevation / data[dist]
            speed_kmph = speed(incline_ratio = incline_ratio, max_speed = max_walkspeed)
            speed_kmph = max(speed_kmph, min_speed)
            speeds[index] = (speed_kmph)
            times[index] = (data[dist] / 1000 * 3600 / speed_kmph)

    speed_df = pd.DataFrame.from_dict(speeds, orient = 'index')
    time_df = pd.DataFrame.from_dict(times, orient = 'index')

    df['walkspeed'] = speed_df[0]
    df['walk_time'] = time_df[0]
    
    return df
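
To make the conversion concrete, here is a minimal standalone sketch of the same Tobler calculation for a single distance and elevation change. The function names `tobler_speed` and `walk_seconds` are illustrative and do not appear elsewhere in the project; the formula and the minimum-speed floor follow `generate_walktimes`.

```python
import numpy as np

def tobler_speed(incline_ratio, max_speed=6.0):
    """Walking speed in km/h for a slope given as rise over run."""
    return max_speed * np.exp(-3.5 * abs(incline_ratio + 0.05))

def walk_seconds(dist_m, delta_elev_m, max_speed=4.0, min_speed=0.1):
    """Seconds needed to walk dist_m metres with an elevation change of delta_elev_m metres."""
    speed_kmph = max(tobler_speed(delta_elev_m / dist_m, max_speed), min_speed)
    return dist_m / 1000 * 3600 / speed_kmph

flat = walk_seconds(1000, 0)               # level ground
uphill = walk_seconds(1000, 200)           # 20% climb
gentle_downhill = walk_seconds(1000, -50)  # 5% descent, where the function peaks
print(round(flat), round(uphill), round(gentle_downhill))
```

The speed peaks on a gentle 5% descent, which is why the downhill case comes out fastest; with `max_speed = 4` it takes exactly 900 seconds per kilometre.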

### Add elevation for destination nodes

In [712]:
dest_df = acceptable_df[['NN','NN_dist','Latitude','Longitude']]
dest_df = add_elevation(dest_df, 'Longitude','Latitude', srtm_pth).set_index('NN')

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy


### Add elevation from graph nodes (reference)

In [713]:
G = nx.read_gpickle(os.path.join(OD_pth, net_name))
G_node_df = gn.node_gdf_from_graph(G)
G_node_df = add_elevation(G_node_df, 'x', 'y', srtm_pth)
match_node_elevs = G_node_df[['node_ID','point_elev']].set_index('node_ID')
match_node_elevs.loc[match_node_elevs.point_elev < 0] = 0

### Match on node elevations for dest_df; calculate travel times to nearest node

In [714]:
dest_df['node_elev'] = match_node_elevs['point_elev']
dest_df = generate_walktimes(dest_df, start = 'node_elev', end = 'point_elev', dist = 'NN_dist', max_walkspeed = offroad_speed)
dest_df = dest_df.sort_values(by = 'walk_time', ascending = False)

### Add Walk Time to all travel times in OD matrix

In [715]:
dest_df = dest_df[['walk_time']]
dest_df.index = dest_df.index.map(str)

d_f = OD.transpose()

for i in d_f.columns:
    dest_df[i] = d_f[i]
    
for i in dest_df.columns:
    if i == 'walk_time':
        pass
    else:
        dest_df[i] = dest_df[i] + dest_df['walk_time']

dest_df = dest_df.drop('walk_time', axis = 1)

dest_df = dest_df.transpose()
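
The transpose-and-loop above amounts to adding each destination's walk time to every entry in that destination's column. A possible shorter equivalent, shown on toy data (the node IDs and times are made up for illustration), uses pandas broadcasting:

```python
import pandas as pd

# Toy OD matrix: rows are origin nodes, columns are destination node IDs (strings)
OD_toy = pd.DataFrame({'101': [600.0, 1200.0], '102': [900.0, None]},
                      index=[1, 2])

# Hypothetical walk time (seconds) between each destination node and its facility
walk_time = pd.Series({'101': 60.0, '102': 30.0})

# axis='columns' aligns the Series index with the DataFrame's column labels
adjusted = OD_toy.add(walk_time, axis='columns')
print(adjusted)
```

Missing travel times stay missing, because NaN plus a number is still NaN.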

### Import Shapefile Describing Regions of Control

In [716]:
if conflict == 1:
    conflict_file = r'merged_dists_%s.shp' % year
elif conflict == 0:
    conflict_file = r'NoConflict.shp'
merged_dists = gpd.read_file(os.path.join(util_path, conflict_file))
if merged_dists.crs != {'init':'epsg:4326'}:
    merged_dists = merged_dists.to_crs({'init':'epsg:4326'})
merged_dists = merged_dists.loc[merged_dists.geometry.type == 'Polygon']

### Factor in lines of Control - Import Areas of Control Shapefile

In [717]:
# Intersect points with merged districts shapefile, identify relationship

def AggressiveSpatialIntersect(points, polygons):
    import osmnx as ox
    spatial_index = points.sindex
    container = {}
    cut_geoms = []
    for index, row in polygons.iterrows():
        polygon = row.geometry
        if polygon.area > 0.5:
            geometry_cut = ox.quadrat_cut_geometry(polygon, quadrat_width=0.5)
            cut_geoms.append(geometry_cut)
            print('cutting geometry %s into %s pieces' % (index, len(geometry_cut)))
            index_list = []
            for P in geometry_cut:
                possible_matches_index = list(spatial_index.intersection(P.bounds))
                possible_matches = points.iloc[possible_matches_index]
                precise_matches = possible_matches[possible_matches.intersects(P)]
                if len(precise_matches) > 0:
                    index_list.append(precise_matches.index)
                flat_list = [item for sublist in index_list for item in sublist]
                container[index] = list(set(flat_list))
        else:
            possible_matches_index = list(spatial_index.intersection(polygon.bounds))
            possible_matches = points.iloc[possible_matches_index]
            precise_matches = possible_matches[possible_matches.intersects(polygon)]
            if len(precise_matches) > 0:
                container[index] = list(precise_matches.index)
    return container
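
The core idea in `AggressiveSpatialIntersect` is a two-stage filter: a cheap bounding-box test narrows the candidates, and the exact (expensive) intersection test runs only on the survivors. The sketch below shows that idea in plain Python on a toy triangle. It is not the GeoPandas implementation: the function names are hypothetical, a ray-casting test stands in for `intersects`, and the quadrat-cutting of large polygons is not covered.

```python
def point_in_polygon(pt, poly):
    """Ray-casting test: True if pt lies inside poly (a list of (x, y) vertices)."""
    x, y = pt
    inside = False
    n = len(poly)
    for k in range(n):
        x1, y1 = poly[k]
        x2, y2 = poly[(k + 1) % n]
        # Does a horizontal ray cast to the right of pt cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def two_stage_matches(points, poly):
    """Return indices of points inside poly, using a bounding-box prefilter."""
    xs = [v[0] for v in poly]
    ys = [v[1] for v in poly]
    minx, maxx, miny, maxy = min(xs), max(xs), min(ys), max(ys)
    # Stage 1: cheap bounding-box test (the role played by the spatial index)
    candidates = [i for i, (x, y) in enumerate(points)
                  if minx <= x <= maxx and miny <= y <= maxy]
    # Stage 2: exact test, run only on the surviving candidates
    return [i for i in candidates if point_in_polygon(points[i], poly)]

triangle = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
pts = [(1.0, 1.0), (3.5, 3.5), (5.0, 5.0)]
print(two_stage_matches(pts, triangle))  # [0]
```

The second point passes the bounding-box stage but fails the exact test, which is the case the precise step exists to catch.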

In [718]:
graph_node_gdf = gn.node_gdf_from_graph(G)

In [719]:
gdf = graph_node_gdf.copy()
gdf = gdf.set_index('node_ID')
possible_snap_nodes = AggressiveSpatialIntersect(graph_node_gdf, merged_dists)
print('**bag of possible node snapping locations has been successfully generated**')

cutting geometry 56 into 121 pieces
cutting geometry 59 into 47 pieces
cutting geometry 78 into 236 pieces
cutting geometry 79 into 16 pieces
**bag of possible node snapping locations has been successfully generated**


### Load Grid

In [721]:
# Match on network time from origin node (time travelling along network + walking to destination)
if year == 2018:
    year_raster = 2018
elif year == 2016:
    year_raster = 2015
grid_name = r'origins_1km_%s_snapped.csv' % year_raster

grid = pd.read_csv(os.path.join(OD_pth, grid_name))
grid = grid.rename({'Unnamed: 0':'PointID'}, axis = 1)
grid['geometry'] = grid['geometry'].apply(loads)
grid_gdf = gpd.GeoDataFrame(grid, crs = WGS, geometry = 'geometry')
grid_gdf = grid_gdf.set_index('PointID')

### Adjust Nearest Node snapping for War

In [722]:
origin_container = AggressiveSpatialIntersect(grid_gdf, merged_dists)
print('bag of possible origins locations has been successfully generated')

bundle = []
for key in origin_container.keys():
    origins = origin_container[key]
    possible_nodes = graph_node_gdf.loc[possible_snap_nodes[key]]
    origin_subset = grid_gdf.loc[origins]
    origin_subset_snapped = gn.pandana_snap_points(origin_subset, 
                                possible_nodes, 
                                source_crs = 'epsg:4326', 
                                target_crs = 'epsg:32638', 
                                add_dist_to_node_col = True)
    bundle.append(origin_subset_snapped)

grid_gdf_adjusted = pd.concat(bundle)

cutting geometry 56 into 121 pieces
cutting geometry 59 into 47 pieces
cutting geometry 78 into 236 pieces
cutting geometry 79 into 16 pieces
bag of possible origins locations has been successfully generated


  G_tree = spatial.KDTree(target_gdf[['x','y']].as_matrix())
  distances, indices = G_tree.query(source_gdf[['x','y']].as_matrix())


In [723]:
grid_gdf = grid_gdf_adjusted

In [724]:
# Add origin node distance to network - walking time
grid = grid_gdf
grid = add_elevation(grid, 'Longitude','Latitude', srtm_pth)
grid = grid.reset_index()
grid['O_ID'] = grid['NN']
grid = grid.set_index('NN')
grid['node_elev'] = match_node_elevs['point_elev']
grid = grid.set_index('PointID')
grid = generate_walktimes(grid, start = 'point_elev', end = 'node_elev', dist = 'NN_dist', max_walkspeed = offroad_speed)
grid = grid.rename({'node_elev':'nr_node_on_net_elev', 
                    'walkspeed':'walkspeed_to_net', 
                    'walk_time':'walk_time_to_net',
                   'NN_dist':'NN_dist_to_net',
                   'O_ID':'NN',
                   'Unnamed: 0.1':'PointID'}, axis = 1)

### Adjust acceptable destinations for each node for the war

In [725]:
gdf = graph_node_gdf.copy()
gdf['node_ID'] = gdf['node_ID'].astype('str')
gdf = gdf.loc[gdf.node_ID.isin(list(dest_df.columns))]
gdf = gdf.set_index('node_ID')

dest_container = AggressiveSpatialIntersect(gdf, merged_dists)

gdf = graph_node_gdf.copy()
gdf = gdf.loc[gdf.node_ID.isin(list(dest_df.index))]
gdf = gdf.set_index('node_ID')

origin_snap_container = AggressiveSpatialIntersect(gdf, merged_dists)

cutting geometry 56 into 121 pieces
cutting geometry 59 into 47 pieces
cutting geometry 78 into 236 pieces
cutting geometry 79 into 16 pieces
cutting geometry 56 into 121 pieces
cutting geometry 59 into 47 pieces
cutting geometry 78 into 236 pieces
cutting geometry 79 into 16 pieces


# BREAK

From this point the script diverges from Step 4 - Generate Results. 

In this cell, we DO NOT use a min function to work out the closest destination to each origin cell. Instead, we merge onto the grid the travel time to ALL relevant destinations that lie in the same polygon of homogeneous control as the origin, since only those are accessible. 
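
The difference between the two approaches can be sketched on a toy OD matrix (the origin labels, facility labels and times below are invented for illustration):

```python
import pandas as pd

# Toy travel times in seconds; rows are origin nodes, columns are facility node IDs
od = pd.DataFrame({'A': [600, 5000], 'B': [900, 700], 'C': [4000, 800]},
                  index=['o1', 'o2'])

# Closest-destination approach: collapse each row to a single minimum time
nearest_time = od.min(axis=1)

# This extension: keep every column so each facility can be assessed separately
within_30_min = od < 30 * 60
n_reachable = within_30_min.sum(axis=1)

print(nearest_time)
print(n_reachable)
```

Taking the minimum discards which facilities are reachable; keeping every column is what later allows catchments and unique users to be counted per facility.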

In [726]:
bundle = []

for key in origin_snap_container.keys():
    
    # print which polygon we are looking at
    print('\nregion:',key)
    
    # identify bundle of origin, dest nodes inside region
    origins = origin_snap_container[key]
    print('number of origin nodes in this reigon:',len(origins))
    destinations = dest_container[key]
    print('number of destination nodes in this reigon:',len(destinations))
    
    # get part of OD that is relevant
    relevant_dests = dest_df.copy()
    relevant_dests = relevant_dests[destinations].loc[origins]
    print('How many destination facilities in this region?',len(relevant_dests.columns))
    
    # get part of grid that is relevant
    relevant_grid = grid.copy()
    relevant_grid = relevant_grid.loc[origin_container[key]]
    print('How many origin grid cells in this region?',len(relevant_grid))
    
    # match on dest-df
    relevant_grid = relevant_grid.set_index('NN')
    for i in relevant_dests.columns:
        relevant_grid[i] = relevant_dests[i]
    
    # append to bundle for reconstruction
    bundle.append(relevant_grid)
    
combo_grid = pd.concat(bundle)
combo_grid['PointID_copy'] = combo_grid['PointID']
combo_grid = combo_grid.set_index('PointID')
print(len(combo_grid[dest_df.columns].loc[241487].unique()))


region: 56
number of origin nodes in this reigon: 20249
number of destination nodes in this reigon: 34
How many destination facilities in this region? 34
How many origin grid cells in this region? 145348

region: 59
number of origin nodes in this reigon: 1058
number of destination nodes in this reigon: 1
How many destination facilities in this region? 1
How many origin grid cells in this region? 4371

region: 70
number of origin nodes in this reigon: 328
number of destination nodes in this reigon: 2
How many destination facilities in this region? 2
How many origin grid cells in this region? 1172

region: 78
number of origin nodes in this reigon: 14015
number of destination nodes in this reigon: 25
How many destination facilities in this region? 25
How many origin grid cells in this region? 393576

region: 79
number of origin nodes in this reigon: 974
number of destination nodes in this reigon: 2
How many destination facilities in this region? 2
How many origin grid cells in this regio

of pandas will change to not sort by default.

To accept the future behavior, pass 'sort=False'.




26


Here, we add the walk time from each grid cell to the network onto the on-network travel time. This sum is the best way of representing the full travel time (the 'drive time') to each destination.

In [727]:
combo_grid2 = combo_grid.copy()

# add on walk time; NaN (unreachable) travel times stay NaN because NaN + number = NaN
for i in dest_df.columns:
    combo_grid2[i] = combo_grid2[i] + combo_grid2['walk_time_to_net']

    
combo_grid2 = combo_grid2.drop(['NN_dist_to_net','walk_time_to_net','walkspeed_to_net'], axis = 1)
grid = combo_grid2

print(len(combo_grid2[dest_df.columns].loc[241487].unique()))


26


### Calculate Direct Walking Time (not using road network), vs. network Time

The output of this block was not factored into the results in the end. Identifying the closest facility to each origin point is not useful when the aim is to retain the access time to ALL facilities in the same homogeneous region, so we do not use this section. 

In [728]:
bundle = []

W = graph_node_gdf.copy()
W['node_ID'] = W['node_ID'].astype(str)
W = W.set_index('node_ID')

locations_gdf = gpd.GeoDataFrame(acceptable_df, geometry = 'geometry', crs = {'init':'epsg:4326'})
locations_container = AggressiveSpatialIntersect(locations_gdf, merged_dists)

for key in origin_container.keys():
    origins = origin_container[key]
    origin_subset = grid.copy()
    origin_subset = origin_subset.loc[origins]
    locations = locations_gdf.loc[locations_container[key]]
    if len(locations) < 1:
        origin_subset['NN'] = None
        origin_subset['NN_dist'] = None
        bundle.append(origin_subset)
    else:
        origin_subset_snapped = gn.pandana_snap_points(origin_subset, 
                                locations, 
                                source_crs = 'epsg:4326', 
                                target_crs = 'epsg:32638', 
                                add_dist_to_node_col = True)
        bundle.append(origin_subset_snapped)

grid_gdf_adjusted = pd.concat(bundle)
grid = grid_gdf_adjusted
print(len(grid[dest_df.columns].loc[241487].unique()))

cutting geometry 56 into 121 pieces
cutting geometry 59 into 47 pieces
cutting geometry 78 into 236 pieces
cutting geometry 79 into 16 pieces
26


### Generate summary by Destination

Here, for each time threshold, we convert the OD matrix to binary form (1 if the travel time to the destination facility is beneath the threshold, 0 otherwise), weight it by population, and sum for each facility. We also identify origins for which only one facility is valid: these are the rows where exactly one destination falls beneath the threshold (VALID equals 1), and their population counts towards that facility's unique users. 
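
A minimal vectorized sketch of this logic on toy data, assuming a single 30-minute threshold. The facility labels and populations are made up; `VALUE` is the population column, as in the grid.

```python
import pandas as pd

# Toy grid: travel time in seconds to two facilities, plus population per cell
cells = pd.DataFrame({'F1': [600.0, 1500.0, 9000.0],
                      'F2': [1700.0, 7200.0, 9500.0],
                      'VALUE': [100, 50, 25]})
facilities = ['F1', 'F2']
thresh_minutes = 30

# 1 where the facility is reachable in under the threshold, else 0
reachable = ((cells[facilities] > 0) &
             (cells[facilities] < thresh_minutes * 60)).astype(int)
n_valid = reachable.sum(axis=1)

# Weight the binary flags by the population of each cell
pop = reachable.mul(cells['VALUE'], axis=0)

catchment = pop[n_valid > 0].sum()   # everyone who can reach the facility
unique = pop[n_valid == 1].sum()     # those with no alternative facility
print(catchment)
print(unique)
```

In this toy case the first cell can reach both facilities, so its 100 people count towards both catchments but towards neither facility's unique users; the second cell's 50 people are unique to `F1`.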

In [729]:
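# Destination node IDs (strings). Dest_IDs is not defined anywhere else in this notebook;
# we assume here that it is the list of destination columns carried over from dest_df.
Dest_IDs = list(dest_df.columns)

ag1 = grid.fillna(99999999999).copy()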

uniques, totals, test_frames_A, test_frames_B = {}, {}, {}, {}

for thresh in [30, 60, 120, 240]:
        
    ag2 = ag1.copy()
    
    def convert(x, thresh):
        if 0 < x < (thresh * 60):
            return 1
        else:
            return 0
    
    # identify pop under thresh
    for i in Dest_IDs:
        ag2[i] = ag2[i].fillna(-1)
        ag2[i] = ag2[i].apply(lambda x: convert(x, thresh))
    
    # add Valid column. counts up number of cells beneath travel time threshold. 
    ag2['VALID'] = ag2[Dest_IDs].sum(axis = 1)
    
    test_frames_A[thresh] = ag2.copy()
    
    # multiply through by population value
    for i in Dest_IDs:
        ag2[i] = ag2[i] * ag2['VALUE']
        
    # generate total of pop that can access a destination in under thresh mins
    ag2['total'] = ag2[Dest_IDs].sum(axis = 1)
    
    test_frames_B[thresh] = ag2.copy()
    
    # sum population per facility: 'unique' where exactly one facility is valid, 'total' where at least one is
    res_uniques, res_totals = [], []
    
    for i in Dest_IDs:
        ag3 = ag2.copy()
        res_uniques.append(ag3[i].loc[ag3['VALID'] == 1].sum())
        res_totals.append(ag3[i].loc[ag3['VALID'] > 0].sum())
                 
    uniques[thresh] = res_uniques
    totals[thresh] = res_totals

Finally, we generate the results DataFrame.

In [730]:
res_df = pd.DataFrame({'catchment_30':totals[30],
                      'catchment_60':totals[60],
                      'catchment_120':totals[120],
                      'catchment_240':totals[240],
                      'unique_30': uniques[30], 
                      'unique_60': uniques[60],
                      'unique_120': uniques[120],
                      'unique_240': uniques[240],
                      'NN':Dest_IDs},
                    index = Dest_IDs)

# Generate 'fraction of catchment that is uniquely served by this facility' statistics. 
res_df['pct_unique_30'] = res_df['unique_30'] / res_df['catchment_30']
res_df['pct_unique_60'] = res_df['unique_60'] / res_df['catchment_60']
res_df['pct_unique_120'] = res_df['unique_120'] / res_df['catchment_120']
res_df['pct_unique_240'] = res_df['unique_240'] / res_df['catchment_240']

acceptable_df_res = acceptable_df.copy()
acceptable_df_res['NN'] = acceptable_df_res['NN'].astype('str')
acceptable_df_res = acceptable_df_res.set_index('NN')
acceptable_df_res = acceptable_df_res.merge(res_df, on = 'NN')
acceptable_df_res = acceptable_df_res.sort_values(by = 'pct_unique_30', ascending = False)

We inspect the first rows here to make sure the result is what we want.

In [735]:
col_list = ['catchment_30','catchment_60','catchment_120','unique_30','unique_60','unique_120','pct_unique_30','pct_unique_60','pct_unique_120']
acceptable_df_res.head(30)

| Unnamed: 0.1 | NN | Unnamed: 0 | FID | Functionality | YEHNP_PHCs | YEHNP_Hospitals | Name of_Health_Facility | Latitude | Longitude | Name of Governorate | ... | catchment_120 | catchment_240 | unique_30 | unique_60 | unique_120 | unique_240 | pct_unique_30 | pct_unique_60 | pct_unique_120 | pct_unique_240 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 31142 | 39 | 1102001 | 1.0 |  | 1.0 | Yarim .H | 14.304533 | 44.37 | Ibb | ... | 106671.0 | 218656.0 | 44244.0 | 77271.0 | 106671.0 | 206411.0 | 1.0 | 1.0 | 1.0 | 0.943999 |
| 1 | 14344 | 227 | 1111011 | 1.0 |  | 1.0 | Aladin H | 13.9656 | 44.008767 | Ibb | ... | 42751.0 | 138371.0 | 5098.0 | 12578.0 | 42751.0 | 132366.0 | 1.0 | 1.0 | 1.0 | 0.956602 |
| 32 | 119219 | 2540 | 1924006 | 1.0 |  | 1.0 | Hajar Aljoul rural H (Al Faqeed Al Shadli H) | 14.460983 | 48.279683 | Hadramout | ... | 4263.0 | 9560.0 | 1321.0 | 2678.0 | 4263.0 | 9560.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| 34 | 55109 | 2749 | 2005004 | 1.0 |  | 1.0 | 26September rural H | 14.491833 | 44.016667 | Dhamar | ... | 33511.0 | 126731.0 | 3139.0 | 9390.0 | 33511.0 | 126731.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| 35 | 27210 | 2869 | 2008017 | 1.0 |  | 1.0 | Dhamar public H | 14.553317 | 44.391533 | Dhamar | ... | 257069.0 | 360203.0 | 89579.0 | 219683.0 | 257069.0 | 358011.0 | 1.0 | 1.0 | 1.0 | 0.993915 |
| 36 | 51916 | 3034 | 2107001 | 1.0 |  | 1.0 | Al Shaheed Al Defaiaah Hospital | 14.808703 | 45.719971 | Shabwah | ... | 29990.0 | 57195.0 | 12100.0 | 19264.0 | 29990.0 | 57195.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| 37 | 78261 | 3127 | 2113011 | 1.0 |  | 1.0 | Ataq general Hospital | 14.53915 | 46.830867 | Shabwah | ... | 21445.0 | 38469.0 | 7594.0 | 17307.0 | 21445.0 | 38469.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| 38 | 104045 | 3171 | 2116008 | 1.0 |  | 1.0 | Azzan Hospital | 14.329533 | 47.448367 | Shabwah | ... | 16992.0 | 44241.0 | 4639.0 | 7390.0 | 16992.0 | 44241.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| 39 | 79326 | 3347 | 2214008 | 1.0 |  | 1.0 | Kitaf Health Center | 17.034533 | 44.1084 | Sa'adah | ... | 4296.0 | 12847.0 | 1348.0 | 2648.0 | 4296.0 | 12847.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| 40 | 82462 | 3361 | 2215001 | 1.0 |  | 1.0 | Al Jumhoori Hospital | 16.940883 | 43.768117 | Sa'adah | ... | 121733.0 | 267776.0 | 36836.0 | 63296.0 | 121733.0 | 267776.0 | 1.0 | 1.0 | 1.0 | 1.0 |


...and we save our output down.

In [736]:
outi = os.path.join(basepth, 'output_layers', 'catchment')
acceptable_df_res.to_csv(os.path.join(outi, subset+'_catchment.csv'))