# Analysis of Emergency Obstetric Care (EmOC) in Kano
> Note: All notebooks need the [environment dependencies](https://github.com/GIScience/openrouteservice-examples#local-installation)
> as well as an [openrouteservice API key](https://openrouteservice.org/dev/#/signup) to run

prepare environment dependencies document

## Abstract
The rapid growth of urban areas has put substantial pressure on local services and infrastructure, particularly in African cities. With migrants moving into cities and transient households moving within cities, traditional means of collecting data (e.g., censuses and household surveys) are inadequate and often overlook informal settlements and households. As a consequence, there is a chronic lack of basic data about deprived households and entire settlements. Given that urban poor residents rely predominantly on private and informal service providers for healthcare and other services, they are rarely captured in routine service data, including health information management systems. This is even more critical for women in need of maternal health care. 

Considering the different phases of maternity (antenatal care, intrapartum or delivery, and postnatal care), the team decided to focus on the intrapartum or delivery phase, as it is the most critical. The intertwined relationship between maternal health care and urban deprivation has been documented and described in the literature ([Abascal et al., 2022](https://doi.org/10.1016/j.compenvurbsys.2022.101770)). The IDEAMAPS Data Ecosystem team aims to analyse how vulnerable communities relate to emergency obstetric care (EmOC) in the city of Kano. To do so, the analysis is divided into three main components: 
1. **EmOC Offer**: Based on the geospatial database of travel times [(Macharia et al., 2023)](https://doi.org/10.1038/s41597-023-02651-9) and the team's field validation, we characterised 145 HC facilities offering EmOC in Kano, their service levels and relative costs.
2. **EmOC Accessibility**: The team used different routing services, including the OSM-based openrouteservice API, to calculate the travel times to the nearest EmOC facility for each 100x100m grid cell in Kano. 
3. **EmOC Demand**: The team discussed a set of socio-economic factors that determine how communities from slums and other deprived areas demand or interact with EmOC services, such as available income, employment, education, age, medical practitioners' age and gender, as well as religious beliefs and social practices. Despite not having access to specific data, the team discussed the potential impact of these factors on demand for EmOC services in Kano.



### Workflow:

The notebook gives an overview of the distribution of centres offering EmOC in Kano, their classification and how they can be accessed by car. Open source data from OpenStreetMap and tools (such as the openrouteservice) were used to create accessibility measures such as travel times and isochrones. Spatial analysis and other data analytics functions led to generating outputs within the 100x100m grid cells that categorised them into three levels: low, medium, and high.

* **Preprocessing**: Get data for EmOC facilities.
* **Analysis for Offer**:
    * Filter and classify EmOC facilities based on discussed criteria.
    * Visualise EmOC facilities in their categories.
* **Analysis for Accessibility**:
    * Compute travel times to facilities using openrouteservice API or other routing services.
    * Generate areas for low, medium and high categories based on discussed criteria.
* **Analysis for Demand**:
    * Derive socio-economic descriptors based on discussed criteria.
* **Result**: Visualize results as maps and export model outputs.


### Datasets and Tools:
* [A geospatial database of close to reality travel times to obstetric emergency care in 15 Nigerian conurbations](https://figshare.com/s/8868db0bf3fd18a9585d) - A curated list of health care facilities offering EmOC in Nigeria [(Macharia et al., 2023)](https://doi.org/10.1038/s41597-023-02651-9).
* [openrouteservice](https://openrouteservice.org/) - generate isochrones on the OpenStreetMap road network


# Python Workflow

This study integrates several Python geospatial libraries to support spatial data processing, visualization, and isochrone generation. The `os` module is used to interact with the operating system, managing file paths and reading environment variables such as API keys. The `folium` library, along with its `MarkerCluster` plugin, facilitates the creation of interactive maps for visualizing large-scale geospatial data. The `openrouteservice` client serves as an interface to the OpenRouteService API, enabling the extraction of isochrones. The `pandas` library provides functions for analyzing, cleaning, exploring, and manipulating data, while `fiona` supports reading and writing real-world data in multi-layered GIS formats, such as shapefiles. The `shapely` package is used for the manipulation and analysis of planar geometric objects.

## Setting up the virtual environment

```bash
# Create a new virtual environment
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

## To run your notebook in VS Code

```bash
pip install -U ipykernel
python -m ipykernel install --user --name=.venv
```

In [1]:
import os
from IPython.display import display
import requests

import folium
from folium.plugins import MarkerCluster
import openrouteservice
import time
import pandas as pd
import numpy as np
import fiona as fn
import geopandas as gpd
from shapely.geometry import shape, mapping
from shapely.geometry import Point
from shapely.geometry import box
from scipy.spatial import cKDTree
from tqdm import tqdm

import rasterio
from rasterio.transform import xy
from rasterio.mask import mask
import rasterstats as rs

import math



## Preprocessing
To reproduce this study, first request an API key from the [OpenRouteService](https://openrouteservice.org/) platform, then interact with the OpenRouteService API by instantiating the OpenRouteService client. The OpenRouteService [API documentation](https://openrouteservice.org/dev/#/api-docs/introduction) referenced here is for ORS Core version 9.0.0.

To generate an [API key](https://openrouteservice.org/dev/#/home?tab=1) (token), sign up at the OpenRouteService dashboard with your e-mail address or your GitHub account. After logging in, go to the Dashboard by clicking on your profile icon and navigate to the API Keys section. Click "Create API Key" to generate a free key, then choose a service plan (the free plan has a limited number of requests per day). Copy the API key and store it securely.

OpenRouteService primarily uses API keys for authentication. When an endpoint is called directly over HTTP rather than through the client, the API key is sent in the `Authorization` header of the request. In this study, the key gave access to the geospatial functions used below, including isochrone generation.

### API Key
Make sure you have a .env file in the root directory with the following content:
```bash
    OPENROUTESERVICE_API_KEY='your_api_key'
```

In [3]:
# Read the api key from the .env file
from dotenv import load_dotenv
%load_ext dotenv
%dotenv
api_key = os.getenv('OPENROUTESERVICE_API_KEY')
client = openrouteservice.Client(key=api_key)

cannot find .env file


ValueError: No API key was specified. Please visit https://openrouteservice.org/sign-up to create one.
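
The `ValueError` shown above is raised when the client receives no key, which is what happens when the `.env` file is not found and `os.getenv` returns `None`. A minimal sketch of an earlier, more explicit check follows; the helper name `get_ors_api_key` and the demo variable `DEMO_ORS_KEY` are hypothetical and are not part of the notebook or of the openrouteservice package.

```python
import os


def get_ors_api_key(env_var="OPENROUTESERVICE_API_KEY"):
    """Return the API key stored in `env_var`, or raise a descriptive error."""
    key = os.getenv(env_var)
    if key is None or not key.strip():
        raise RuntimeError(
            f"{env_var} is not set. Create a .env file in the root directory "
            "or export the variable before starting the notebook."
        )
    return key.strip()


# Example with a placeholder value (not a real key) in a demo variable
os.environ["DEMO_ORS_KEY"] = "your_api_key"
print(get_ors_api_key("DEMO_ORS_KEY"))
```

Calling the helper before `openrouteservice.Client(key=...)` would surface a missing key with a message that points at the `.env` file.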

Several kinds of data were used for this study. The dataset on healthcare facilities comes from [Macharia, P.M. et al., 2023](https://doi.org/10.1038/s41597-023-02651-9), which provides a geospatial database of close-to-reality travel times to obstetric emergency care in 15 Nigerian conurbations. The dataset was filtered by state name to isolate facilities in Kano, and the CSV file was converted to a shapefile based on coordinates using [QGIS](https://qgis.org/).

The Level 2 administrative boundary data, sourced from the [Humanitarian Data Exchange](https://data.humdata.org/), were used to relate the isochrones and the distribution of healthcare facilities to specific administrative regions. The data were filtered by administrative region name (`lganame`) to focus the analysis on Kano.

Although official, administrative boundaries may not reflect the actual patterns of human settlement or economic activity. Therefore, the team used the Functional Urban Area (FUA) as a complementary definition of the study area. The FUA is defined by [the Joint Research Centre of the European Commission](https://commission.europa.eu/about/departments-and-executive-agencies/joint-research-centre_en) as the actual extent of urban sprawl and human activity, encompassing the core city and the surrounding regions that are economically or socially integrated with it. The FUA was obtained from the [Global Human Settlement Layer (GHSL)](https://human-settlement.emergency.copernicus.eu/) dataset, which provides spatial data for functional urban areas worldwide.

* [Datasets of health facilities](https://doi.org/10.6084/m9.figshare.22689667.v2) (15/07/2023)
* [Shapefile of district boundaries](https://data.humdata.org/dataset/nigeria-admin-level-2) - Admin Level 2 (data from Humanitarian Data Exchange, 25/11/2015)
* [Functional Urban Areas](https://human-settlement.emergency.copernicus.eu/download.php?ds=FUA) - data from Global Human Settlement Layer (2015)

In [2]:
# Set paths to access data
# Define directories
data_inputs = '../scripts/data_inputs/'
data_temp = '../scripts/data_temp/'
data_outputs = '../scripts/data_outputs/'

## 1. Data Collection

### Validated healthcare facilities
note: to describe the process to validate healthcare facilities

In [3]:
healthcare_facilities_validated = gpd.read_file(data_inputs + 'healthcare_facilities_validated_Mar2025.geojson')

### Population Grid Data (1km resolution) from WorldPop
note: explain the rationale for using the female population aged 15-49

In [4]:
study_area = gpd.read_file(data_inputs + '100mGrid.gpkg')
raster_path = data_inputs + 'nga_f_15_49_2015_1km.tif'

Clipping the population data to our study area

In [None]:
with rasterio.open(raster_path) as dataset:
    geometries = [study_area.geometry.unary_union.__geo_interface__]
    clipped_image, clipped_transform = mask(dataset, geometries, crop=True)
    band1 = clipped_image[0] # Read the first band of the raster

Calculating the centroids for grid cells

In [None]:
rows, cols = np.where(band1 > 0)
grid_cells = [clipped_transform * (col + 0.5, row + 0.5) for row, col in zip(rows, cols)]
population_values = band1[rows, cols]
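
The list comprehension above relies on the affine transform returned by `mask`: adding 0.5 to the column and row indices moves from a pixel's upper-left corner to its centre. As a rough illustration in plain Python, assuming a north-up raster with no rotation (the function name and the sample numbers are made up for this sketch):

```python
def pixel_centre(origin_x, origin_y, pixel_width, pixel_height, row, col):
    """Map a (row, col) pixel index to the x/y coordinate of the pixel centre.

    origin_x, origin_y: coordinates of the raster's upper-left corner.
    pixel_width, pixel_height: positive cell sizes; y decreases as row increases.
    """
    x = origin_x + (col + 0.5) * pixel_width
    y = origin_y - (row + 0.5) * pixel_height
    return x, y


# Hypothetical raster: upper-left corner at (10.0, 20.0), 0.5-degree cells
print(pixel_centre(10.0, 20.0, 0.5, 0.5, row=0, col=0))  # (10.25, 19.75)
print(pixel_centre(10.0, 20.0, 0.5, 0.5, row=2, col=3))  # (11.75, 18.75)
```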

In [None]:
grid_df = pd.DataFrame(grid_cells, columns=["longitude", "latitude"])
grid_df["population"] = population_values

grid_df["grid_code"] = np.random.choice(range(10000, 99999), size=len(grid_df), replace=False)
centroids_gdf = gpd.GeoDataFrame(grid_df, geometry=[Point(xy) for xy in zip(grid_df["longitude"], grid_df["latitude"])])
centroids_gdf.set_crs("EPSG:4326", inplace=True)

centroids_gdf.to_file(data_temp + "population_centroids.gpkg", driver="GPKG")
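
Because `np.random.choice` is called without a seed, the `grid_code` values change on every run, which makes saved outputs hard to compare across runs. If reproducibility matters, one option is a seeded generator. The sketch below assumes that goal; the helper name `make_grid_codes` and the seed value are hypothetical.

```python
import numpy as np


def make_grid_codes(n, seed=42, low=10000, high=99999):
    """Draw n unique integer codes in [low, high) from a seeded generator."""
    rng = np.random.default_rng(seed)
    return rng.choice(np.arange(low, high), size=n, replace=False)


codes = make_grid_codes(5)
print(codes)
```

Two calls with the same `n` and `seed` return the same codes, so a re-run of the notebook would reproduce the identifiers.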

In [None]:
centroids_gdf

Unnamed: 0,longitude,latitude,population,grid_code,geometry
0,8.472917,12.024583,3054.140869,53937,POINT (8.47292 12.02458)
1,8.481250,12.024583,3737.392578,17840,POINT (8.48125 12.02458)
2,8.489583,12.024583,5439.910156,54264,POINT (8.48958 12.02458)
3,8.497917,12.024583,4532.464355,18673,POINT (8.49792 12.02458)
4,8.506250,12.024583,1577.261353,76531,POINT (8.50625 12.02458)
...,...,...,...,...,...
138,8.539583,11.941250,1152.257324,76416,POINT (8.53958 11.94125)
139,8.547917,11.941250,1221.242432,63699,POINT (8.54792 11.94125)
140,8.556250,11.941250,1514.486938,35666,POINT (8.55625 11.94125)
141,8.564583,11.941250,1321.376831,62382,POINT (8.56458 11.94125)


## 2. Spatial Analysis Pipeline 
### Using OpenRouteService (ORS) Matrix API to calculate the travel time and distance from each population grid centroid to the healthcare facility 

note: this will generate a file 'OD_matrix_healthcare_pop_grid'

In [None]:
origin_gdf = centroids_gdf
origin_name_column = 'grid_code'
destination_gdf = healthcare_facilities_validated.dropna(subset=['geometry'])
destination_name_column = 'facility_name'

In [None]:
origins = list(zip(origin_gdf.geometry.x, origin_gdf.geometry.y))

In [None]:
destinations = list(zip(destination_gdf.geometry.x, destination_gdf.geometry.y))

In [None]:
locations = origins + destinations

In [None]:
origins_index = list(range(0, len(origins)))
destinations_index = list(range(len(origins), len(locations)))

In [None]:
body = {'locations': locations,
       'destinations': destinations_index,
       'sources': origins_index,
       'metrics': ['distance', 'duration']}

headers = {
    'Accept': 'application/json, application/geo+json, application/gpx+xml, img/png; charset=utf-8',
    'Authorization': api_key,
    'Content-Type': 'application/json; charset=utf-8'
}

response = requests.post('https://api.openrouteservice.org/v2/matrix/driving-car', json=body, headers=headers)
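
The request above sends every origin and destination in a single call. Hosted routing APIs typically cap the size of a matrix request, so larger grids may need to be split into batches. The following sketch only builds the request bodies and sends nothing; `build_matrix_bodies` and `max_origins` are hypothetical names, and the appropriate batch size depends on the limits of your plan.

```python
def build_matrix_bodies(origins, destinations, max_origins=50):
    """Yield one Matrix request body per batch of origins.

    origins, destinations: lists of (lon, lat) tuples.
    max_origins: hypothetical batch size; adjust to your plan's limits.
    """
    for start in range(0, len(origins), max_origins):
        batch = origins[start:start + max_origins]
        locations = [list(p) for p in batch] + [list(p) for p in destinations]
        yield {
            "locations": locations,
            "sources": list(range(len(batch))),
            "destinations": list(range(len(batch), len(locations))),
            "metrics": ["distance", "duration"],
        }


origins = [(8.47, 12.02), (8.48, 12.02), (8.49, 12.02)]
destinations = [(8.51, 12.00), (8.54, 11.97)]
bodies = list(build_matrix_bodies(origins, destinations, max_origins=2))
print(len(bodies))  # 2
print(bodies[1]["sources"], bodies[1]["destinations"])  # [0] [1, 2]
```

Each body can then be posted in turn, and the rows of the returned matrices concatenated in batch order.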

In [None]:
result = response.json()  # parse the response body once
distances = result.get('distances', [])
durations = result.get('durations', [])

In [None]:
distances_duration_matrix = []

# Iterate over each origin (grid cell); enumerate gives the row position in the ORS matrices
for position, (_, origin) in enumerate(origin_gdf.iterrows()):
    origin_name = origin[origin_name_column]
    origin_x = origin.geometry.x
    origin_y = origin.geometry.y
    origin_durations = durations[position]

    # Find the minimum duration and its position among the destinations
    min_duration = min(origin_durations)
    min_index = origin_durations.index(min_duration)

    # destinations was built from destination_gdf in row order,
    # so min_index is also the row position of the nearest facility
    destination_row = destination_gdf.iloc[min_index]
    dest_name = destination_row[destination_name_column]
    dest_x = destination_row.geometry.x
    dest_y = destination_row.geometry.y

    # Append the nearest facility and its travel time for this origin
    distances_duration_matrix.append([
        origin_name, origin_y, origin_x,
        dest_name, dest_y, dest_x,
        min_duration
    ])

# Convert the results into a DataFrame
matrix_df = pd.DataFrame(distances_duration_matrix, columns=[
    'grid_code','origin_lat', 'origin_lon',
    'destination_name', 'dest_lat', 'dest_lon','min_duration'
])
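
`min(origin_durations)` assumes that every duration is a number. If the response contains a null entry for a pair that cannot be routed (an assumption here; check the response for your own data), `min` would raise a `TypeError`. A defensive variant, with a hypothetical helper name, could look like this:

```python
def nearest_destination(durations_row):
    """Return (position, duration) of the smallest non-null duration, or None."""
    candidates = [
        (duration, position)
        for position, duration in enumerate(durations_row)
        if duration is not None
    ]
    if not candidates:
        return None  # no destination is reachable from this origin
    duration, position = min(candidates)
    return position, duration


print(nearest_destination([686.79, None, 395.79]))  # (2, 395.79)
print(nearest_destination([None, None]))            # None
```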

In [None]:
# Save to CSV
merged_df = pd.merge(matrix_df, grid_df[['grid_code', 'population']], on='grid_code', how='left')
merged_df.to_csv(data_temp + 'distance_duration_matrix_temp.csv', index=False)

In [None]:
merged_df

Unnamed: 0,grid_code,origin_lat,origin_lon,destination_name,dest_lat,dest_lon,min_duration,population
0,53937,12.024583,8.472917,Sabo Bakin Zuwo General Hospital,12.00065,8.50923,686.79,3054.140869
1,17840,12.024583,8.481250,Sabo Bakin Zuwo General Hospital,12.00065,8.50923,722.17,3737.392578
2,54264,12.024583,8.489583,Sabo Bakin Zuwo General Hospital,12.00065,8.50923,569.70,5439.910156
3,18673,12.024583,8.497917,Sabo Bakin Zuwo General Hospital,12.00065,8.50923,395.79,4532.464355
4,76531,12.024583,8.506250,Sabo Bakin Zuwo General Hospital,12.00065,8.50923,411.43,1577.261353
...,...,...,...,...,...,...,...,...
138,76416,11.941250,8.539583,Maxcare Clinic,11.96792,8.54314,543.74,1152.257324
139,63699,11.941250,8.547917,Maxcare Clinic,11.96792,8.54314,419.18,1221.242432
140,35666,11.941250,8.556250,Maxcare Clinic,11.96792,8.54314,295.33,1514.486938
141,62382,11.941250,8.564583,Maxcare Clinic,11.96792,8.54314,265.40,1321.376831


In [None]:
geometry = [Point(xy) for xy in zip(merged_df['dest_lon'], merged_df['dest_lat'])]
gdf = gpd.GeoDataFrame(merged_df, geometry=geometry, crs="EPSG:4326")

gpkg_path = data_temp + 'distance_duration_matrix_temp.gpkg'
gdf.to_file(gpkg_path, layer="duration_matrix", driver="GPKG")

## Processing OD Matrix

In [5]:
matrix_df = pd.read_csv(data_temp +'OD_matrix_healthcare_pop_grid.csv')

In [6]:
matrix_df

Unnamed: 0,origin_id,destination_id,duration_seconds,distance_km
0,43635923.0,1,1371.80,27.66
1,43635923.0,2,1407.28,28.16
2,43635923.0,3,1326.23,27.03
3,43635923.0,4,1334.81,27.34
4,43635923.0,5,1522.02,29.75
...,...,...,...,...
289995,73052674.0,1996,2661.90,49.59
289996,73052674.0,1997,2541.62,46.77
289997,73052674.0,1998,2427.48,46.60
289998,73052674.0,1999,2469.76,47.05


In [7]:
# NAN to be deleted after we process the OD matrix
matrix_df = matrix_df.dropna(subset=['origin_id'])

In [8]:
matrix_df

Unnamed: 0,origin_id,destination_id,duration_seconds,distance_km
0,43635923.0,1,1371.80,27.66
1,43635923.0,2,1407.28,28.16
2,43635923.0,3,1326.23,27.03
3,43635923.0,4,1334.81,27.34
4,43635923.0,5,1522.02,29.75
...,...,...,...,...
289995,73052674.0,1996,2661.90,49.59
289996,73052674.0,1997,2541.62,46.77
289997,73052674.0,1998,2427.48,46.60
289998,73052674.0,1999,2469.76,47.05


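We will select one facility for each grid cell. In this OD matrix, `destination_id` identifies the grid cell and `origin_id` identifies the healthcare facility, so grouping by `destination_id` keeps the fastest facility per cell.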

In [9]:
# 1. Calculate the minimum duration_seconds for each destination_id
shortest_times = matrix_df.groupby("destination_id")["duration_seconds"].min().reset_index()

# 2. Use idxmin() to ensure each destination_id only keeps the row with the shortest duration_seconds
idx_min_duration = matrix_df.groupby('destination_id')['duration_seconds'].idxmin()

# 3. Keep only the rows with the shortest duration_seconds for each destination_id
matrix_df_min = matrix_df.loc[idx_min_duration]

# 4. Merge the shortest times with matrix_df
matrix_df = matrix_df_min.merge(shortest_times, on=["destination_id", "duration_seconds"], how="inner")
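
The grouping step can be checked on a toy table. The values below are invented for illustration; only the column names follow the OD matrix.

```python
import pandas as pd

toy = pd.DataFrame({
    "origin_id": [101, 102, 101, 102],   # facility
    "destination_id": [1, 1, 2, 2],      # grid cell
    "duration_seconds": [300.0, 120.0, 90.0, 400.0],
})

# Row label of the fastest facility for each grid cell
idx = toy.groupby("destination_id")["duration_seconds"].idxmin()
nearest = toy.loc[idx].reset_index(drop=True)
print(nearest)
```

Grid cell 1 keeps facility 102 (120 s) and grid cell 2 keeps facility 101 (90 s), one row per cell.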

In [10]:
matrix_df

Unnamed: 0,origin_id,destination_id,duration_seconds,distance_km
0,47167671.0,1,1010.62,23.32
1,47167671.0,2,1046.10,23.81
2,47167671.0,3,965.04,22.68
3,47167671.0,4,973.63,22.99
4,47167671.0,5,1160.83,25.40
...,...,...,...,...
1995,56453455.0,1996,1427.79,24.33
1996,56453455.0,1997,1307.51,21.51
1997,56453455.0,1998,1193.38,21.34
1998,56453455.0,1999,1235.66,21.79


In [11]:
centroids_df = gpd.read_file(data_temp + 'population_centroids_ROWID.geojson')

In [12]:
centroids_df

Unnamed: 0,longitude,latitude,population,grid_code,rowid,geometry
0,8.514583,12.249583,245.035522,66407,1,POINT (8.51458 12.24958)
1,8.506250,12.241250,172.523621,92837,2,POINT (8.50625 12.24125)
2,8.514583,12.241250,390.546021,78968,3,POINT (8.51458 12.24125)
3,8.522917,12.241250,137.086502,91837,4,POINT (8.52292 12.24125)
4,8.497917,12.232917,119.791389,17598,5,POINT (8.49792 12.23292)
...,...,...,...,...,...,...
1995,8.556250,11.741250,106.068001,94821,1996,POINT (8.55625 11.74125)
1996,8.522917,11.732917,62.181351,77605,1997,POINT (8.52292 11.73292)
1997,8.531250,11.732917,129.968262,37248,1998,POINT (8.53125 11.73292)
1998,8.539583,11.732917,161.736633,86175,1999,POINT (8.53958 11.73292)


In [13]:
pop_centroids_closest_hcf = centroids_df.merge(matrix_df, left_on="rowid", right_on="destination_id", how="left")
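
A left merge keeps every centroid even when no OD row matches, in which case the facility columns are `NaN`. As a hedged sketch on invented data, the `validate` and `indicator` parameters of `DataFrame.merge` can make such gaps visible:

```python
import pandas as pd

cells = pd.DataFrame({"rowid": [1, 2, 3], "population": [245.0, 172.5, 390.5]})
od = pd.DataFrame({"destination_id": [1, 2], "duration_seconds": [1010.62, 1046.10]})

merged = cells.merge(
    od,
    left_on="rowid",
    right_on="destination_id",
    how="left",
    validate="one_to_one",   # raises if a grid cell appears more than once
    indicator=True,          # adds a _merge column: 'both' or 'left_only'
)
unmatched = merged[merged["_merge"] == "left_only"]
print(unmatched["rowid"].tolist())  # [3]
```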

In [14]:
pop_centroids_closest_hcf

Unnamed: 0,longitude,latitude,population,grid_code,rowid,geometry,origin_id,destination_id,duration_seconds,distance_km
0,8.514583,12.249583,245.035522,66407,1,POINT (8.51458 12.24958),47167671.0,1,1010.62,23.32
1,8.506250,12.241250,172.523621,92837,2,POINT (8.50625 12.24125),47167671.0,2,1046.10,23.81
2,8.514583,12.241250,390.546021,78968,3,POINT (8.51458 12.24125),47167671.0,3,965.04,22.68
3,8.522917,12.241250,137.086502,91837,4,POINT (8.52292 12.24125),47167671.0,4,973.63,22.99
4,8.497917,12.232917,119.791389,17598,5,POINT (8.49792 12.23292),47167671.0,5,1160.83,25.40
...,...,...,...,...,...,...,...,...,...,...
1995,8.556250,11.741250,106.068001,94821,1996,POINT (8.55625 11.74125),56453455.0,1996,1427.79,24.33
1996,8.522917,11.732917,62.181351,77605,1997,POINT (8.52292 11.73292),56453455.0,1997,1307.51,21.51
1997,8.531250,11.732917,129.968262,37248,1998,POINT (8.53125 11.73292),56453455.0,1998,1193.38,21.34
1998,8.539583,11.732917,161.736633,86175,1999,POINT (8.53958 11.73292),56453455.0,1999,1235.66,21.79


In [15]:
pop_centroids_closest_hcf = pop_centroids_closest_hcf.rename(columns={
    "longitude": "origin_lon",
    "latitude": "origin_lat",
    "rowid": "grid_id",
    "origin_id": "hcf_uid"
})
columns_to_keep = ["origin_lon", "origin_lat", "population", "grid_id", "geometry", "hcf_uid", "duration_seconds", "distance_km"]
pop_centroids_closest_hcf = pop_centroids_closest_hcf[columns_to_keep]

In [16]:
pop_centroids_closest_hcf

Unnamed: 0,origin_lon,origin_lat,population,grid_id,geometry,hcf_uid,duration_seconds,distance_km
0,8.514583,12.249583,245.035522,1,POINT (8.51458 12.24958),47167671.0,1010.62,23.32
1,8.506250,12.241250,172.523621,2,POINT (8.50625 12.24125),47167671.0,1046.10,23.81
2,8.514583,12.241250,390.546021,3,POINT (8.51458 12.24125),47167671.0,965.04,22.68
3,8.522917,12.241250,137.086502,4,POINT (8.52292 12.24125),47167671.0,973.63,22.99
4,8.497917,12.232917,119.791389,5,POINT (8.49792 12.23292),47167671.0,1160.83,25.40
...,...,...,...,...,...,...,...,...
1995,8.556250,11.741250,106.068001,1996,POINT (8.55625 11.74125),56453455.0,1427.79,24.33
1996,8.522917,11.732917,62.181351,1997,POINT (8.52292 11.73292),56453455.0,1307.51,21.51
1997,8.531250,11.732917,129.968262,1998,POINT (8.53125 11.73292),56453455.0,1193.38,21.34
1998,8.539583,11.732917,161.736633,1999,POINT (8.53958 11.73292),56453455.0,1235.66,21.79


In [17]:
distances_duration_matrix = pd.merge(pop_centroids_closest_hcf, healthcare_facilities_validated[['uid', 'facility_name', 'longitude', 'latitude']], 
                     left_on='hcf_uid', right_on='uid', how='left')

In [18]:
distances_duration_matrix = distances_duration_matrix.rename(columns={
    "longitude": "dest_lon",
    "latitude": "dest_lat"
})

In [19]:
distances_duration_matrix

Unnamed: 0,origin_lon,origin_lat,population,grid_id,geometry,hcf_uid,duration_seconds,distance_km,uid,facility_name,dest_lon,dest_lat
0,8.514583,12.249583,245.035522,1,POINT (8.51458 12.24958),47167671.0,1010.62,23.32,47167671.0,Sultan Clinic and Maternity,8.43129,12.091336
1,8.506250,12.241250,172.523621,2,POINT (8.50625 12.24125),47167671.0,1046.10,23.81,47167671.0,Sultan Clinic and Maternity,8.43129,12.091336
2,8.514583,12.241250,390.546021,3,POINT (8.51458 12.24125),47167671.0,965.04,22.68,47167671.0,Sultan Clinic and Maternity,8.43129,12.091336
3,8.522917,12.241250,137.086502,4,POINT (8.52292 12.24125),47167671.0,973.63,22.99,47167671.0,Sultan Clinic and Maternity,8.43129,12.091336
4,8.497917,12.232917,119.791389,5,POINT (8.49792 12.23292),47167671.0,1160.83,25.40,47167671.0,Sultan Clinic and Maternity,8.43129,12.091336
...,...,...,...,...,...,...,...,...,...,...,...,...
1995,8.556250,11.741250,106.068001,1996,POINT (8.55625 11.74125),56453455.0,1427.79,24.33,56453455.0,Kura General Hospital,8.42415,11.778010
1996,8.522917,11.732917,62.181351,1997,POINT (8.52292 11.73292),56453455.0,1307.51,21.51,56453455.0,Kura General Hospital,8.42415,11.778010
1997,8.531250,11.732917,129.968262,1998,POINT (8.53125 11.73292),56453455.0,1193.38,21.34,56453455.0,Kura General Hospital,8.42415,11.778010
1998,8.539583,11.732917,161.736633,1999,POINT (8.53958 11.73292),56453455.0,1235.66,21.79,56453455.0,Kura General Hospital,8.42415,11.778010


In [20]:
# Review and remove
origin_dest = distances_duration_matrix

## Enhanced Two-Step Floating Catchment Area (E2SFCA) method

In [71]:
# Gaussian decay parameter
# d: threshold travel time in seconds; try 10 or 30 minutes by car, since the
# selected data source may underestimate travel time and traffic conditions
d = 10 * 60
W = 0.5  # weight assigned to a trip that takes exactly d seconds
beta = -d ** 2 / math.log(W)
print(beta)

519370.2147200268
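
With this choice of `beta`, the Gaussian weight `exp(-t**2 / beta)` equals `W` when the travel time `t` equals the threshold `d`, and it approaches 1 as `t` approaches 0. A quick check, using a hypothetical helper `gaussian_weight` that is not part of the notebook:

```python
import math


def gaussian_weight(t, threshold=600.0, weight_at_threshold=0.5):
    """Gaussian distance-decay weight for a travel time t in seconds."""
    beta = -threshold ** 2 / math.log(weight_at_threshold)
    return math.exp(-t ** 2 / beta)


print(round(gaussian_weight(0), 6))        # 1.0
print(round(gaussian_weight(600), 6))      # 0.5
print(round(gaussian_weight(1010.62), 6))  # roughly 0.14 for a trip of about 17 minutes
```

A trip twice as long as the threshold gets a weight of `0.5 ** 4`, which shows how quickly the Gaussian form discounts distant facilities.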


In [72]:
print(origin_dest.head())

   origin_lon  origin_lat  population  grid_id                  geometry  \
0    8.514583   12.249583  245.035522        1  POINT (8.51458 12.24958)   
1    8.506250   12.241250  172.523621        2  POINT (8.50625 12.24125)   
2    8.514583   12.241250  390.546021        3  POINT (8.51458 12.24125)   
3    8.522917   12.241250  137.086502        4  POINT (8.52292 12.24125)   
4    8.497917   12.232917  119.791389        5  POINT (8.49792 12.23292)   

      hcf_uid  duration_seconds  distance_km         uid  \
0  47167671.0           1010.62        23.32  47167671.0   
1  47167671.0           1046.10        23.81  47167671.0   
2  47167671.0            965.04        22.68  47167671.0   
3  47167671.0            973.63        22.99  47167671.0   
4  47167671.0           1160.83        25.40  47167671.0   

                 facility_name  dest_lon   dest_lat    Weight     Pop_W  \
0  Sultan Clinic and Maternity   8.43129  12.091336  0.001455  0.356563   
1  Sultan Clinic and Maternity  

In [73]:
# Convert 'duration_seconds' to numeric, coercing errors to NaN
origin_dest['duration_seconds'] = pd.to_numeric(origin_dest['duration_seconds'], errors='coerce')

# Drop rows with NaN values in the 'duration_seconds' column
origin_dest = origin_dest.dropna(subset=['duration_seconds']).copy()
origin_dest['grid_id'] = pd.to_numeric(origin_dest['grid_id'], errors='coerce')

origin_dest_acc = origin_dest.copy()  # Backup: independent copy, so new columns are not added to origin_dest

In [74]:
# Apply Gaussian decay function to calculate the weight of each grid to healthcare 
# facilities based on the travel duration. d is the travel time and beta is the decay 
# parameter previously calculated.
# The weight decreases as the duration increases, meaning facilities that are further away have less impact.
origin_dest_acc['Weight'] = origin_dest_acc['duration_seconds'].apply(lambda d: round(math.exp(-d**2/beta), 8))

In [75]:
# Compute the Weighted Population (Pop_W), the population of each grid cell is multiplied 
# by the corresponding weight to calculate the weighted population.
origin_dest_acc['Pop_W'] = origin_dest_acc['population'] * origin_dest_acc['Weight']

In [76]:
origin_dest_acc

Unnamed: 0,origin_lon,origin_lat,population,grid_id,geometry,hcf_uid,duration_seconds,distance_km,uid,facility_name,dest_lon,dest_lat,Weight,Pop_W,unique_code
0,8.514583,12.249583,245.035522,1,POINT (8.51458 12.24958),47167671.0,1010.62,23.32,47167671.0,Sultan Clinic and Maternity,8.43129,12.091336,0.139943,34.290952,-8248856661633896388
1,8.506250,12.241250,172.523621,2,POINT (8.50625 12.24125),47167671.0,1046.10,23.81,47167671.0,Sultan Clinic and Maternity,8.43129,12.091336,0.121599,20.978765,-8248856661633896388
2,8.514583,12.241250,390.546021,3,POINT (8.51458 12.24125),47167671.0,965.04,22.68,47167671.0,Sultan Clinic and Maternity,8.43129,12.091336,0.166437,65.001371,-8248856661633896388
3,8.522917,12.241250,137.086502,4,POINT (8.52292 12.24125),47167671.0,973.63,22.99,47167671.0,Sultan Clinic and Maternity,8.43129,12.091336,0.161185,22.096307,-8248856661633896388
4,8.497917,12.232917,119.791389,5,POINT (8.49792 12.23292),47167671.0,1160.83,25.40,47167671.0,Sultan Clinic and Maternity,8.43129,12.091336,0.074680,8.946056,-8248856661633896388
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1995,8.556250,11.741250,106.068001,1996,POINT (8.55625 11.74125),56453455.0,1427.79,24.33,56453455.0,Kura General Hospital,8.42415,11.778010,0.019740,2.093782,-5045608071072975233
1996,8.522917,11.732917,62.181351,1997,POINT (8.52292 11.73292),56453455.0,1307.51,21.51,56453455.0,Kura General Hospital,8.42415,11.778010,0.037193,2.312687,-5045608071072975233
1997,8.531250,11.732917,129.968262,1998,POINT (8.53125 11.73292),56453455.0,1193.38,21.34,56453455.0,Kura General Hospital,8.42415,11.778010,0.064436,8.374640,-5045608071072975233
1998,8.539583,11.732917,161.736633,1999,POINT (8.53958 11.73292),56453455.0,1235.66,21.79,56453455.0,Kura General Hospital,8.42415,11.778010,0.052875,8.551854,-5045608071072975233


In [77]:
# Sum the Weighted Population
origin_dest_sum = origin_dest_acc.groupby(by='hcf_uid')['Pop_W'].sum().reset_index()

In [78]:
origin_dest_sum 

Unnamed: 0,hcf_uid,Pop_W
0,12757068.0,1164.947010
1,13205814.0,2787.920405
2,13448392.0,4705.999974
3,16202756.0,11915.193323
4,16293647.0,1851.375605
...,...,...
86,84503282.0,9114.442327
87,84956460.0,7430.452144
88,86996332.0,6932.792936
89,87191866.0,2026.073968


In [79]:
# Merge the Sum of Weighted Population Back into the Original Data
origin_dest_acc = origin_dest_acc.merge(origin_dest_sum, on='hcf_uid')

In [80]:
origin_dest_acc

Unnamed: 0,origin_lon,origin_lat,population,grid_id,geometry,hcf_uid,duration_seconds,distance_km,uid,facility_name,dest_lon,dest_lat,Weight,Pop_W_x,unique_code,Pop_W_y
0,8.514583,12.249583,245.035522,1,POINT (8.51458 12.24958),47167671.0,1010.62,23.32,47167671.0,Sultan Clinic and Maternity,8.43129,12.091336,0.139943,34.290952,-8248856661633896388,17892.707962
1,8.506250,12.241250,172.523621,2,POINT (8.50625 12.24125),47167671.0,1046.10,23.81,47167671.0,Sultan Clinic and Maternity,8.43129,12.091336,0.121599,20.978765,-8248856661633896388,17892.707962
2,8.514583,12.241250,390.546021,3,POINT (8.51458 12.24125),47167671.0,965.04,22.68,47167671.0,Sultan Clinic and Maternity,8.43129,12.091336,0.166437,65.001371,-8248856661633896388,17892.707962
3,8.522917,12.241250,137.086502,4,POINT (8.52292 12.24125),47167671.0,973.63,22.99,47167671.0,Sultan Clinic and Maternity,8.43129,12.091336,0.161185,22.096307,-8248856661633896388,17892.707962
4,8.497917,12.232917,119.791389,5,POINT (8.49792 12.23292),47167671.0,1160.83,25.40,47167671.0,Sultan Clinic and Maternity,8.43129,12.091336,0.074680,8.946056,-8248856661633896388,17892.707962
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1995,8.506250,11.757917,91.461205,1974,POINT (8.50625 11.75792),28638942.0,885.24,12.90,28638942.0,Kura Surgery and Maternity Clinic,8.42058,11.771390,0.221165,20.228015,-8549283594930020530,3599.635332
1996,8.422917,11.749583,195.097214,1981,POINT (8.42292 11.74958),28638942.0,103.57,2.45,28638942.0,Kura Surgery and Maternity Clinic,8.42058,11.771390,0.979558,191.109124,-8549283594930020530,3599.635332
1997,8.431250,11.749583,109.939705,1982,POINT (8.43125 11.74958),28638942.0,413.66,9.63,28638942.0,Kura Surgery and Maternity Clinic,8.42058,11.771390,0.719308,79.080516,-8549283594930020530,3599.635332
1998,8.439583,11.749583,87.887894,1983,POINT (8.43958 11.74958),28638942.0,247.91,4.05,28638942.0,Kura Surgery and Maternity Clinic,8.42058,11.771390,0.888399,78.079509,-8549283594930020530,3599.635332


In [81]:
# supply value is set to 1 for simplicity (capacity of HCF)
supply = 1
# in the future, we will link supply with ownership and EmOC service level
origin_dest_acc = origin_dest_acc.rename(columns={'Pop_W_y': 'Pop_W_S'})  # Pop_W_S: Population Weight Sum

In [82]:
# Compute the Supply-Demand Ratio (Rj)
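origin_dest_acc['supply_demand_ratio'] = supply / origin_dest_acc['Pop_W_S']
# Assign the result back instead of using inplace=True on a column selection
origin_dest_acc['supply_demand_ratio'] = origin_dest_acc['supply_demand_ratio'].replace([np.inf, np.nan], 0)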

In [83]:
# Calculate Rj * Weight for Each Grid Cell
origin_dest_acc['supply_W'] = origin_dest_acc['supply_demand_ratio'] * origin_dest_acc.Weight

In [84]:
# Compute Accessibility Index (Ai) for Each Grid Cell
origin_dest_acc['Accessibility'] = origin_dest_acc.groupby('grid_id')['supply_W'].transform('sum')
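
In this notebook each grid cell is paired only with its nearest facility, so the sum over facilities in the second step has a single term per cell. For comparison, the sketch below runs the same two steps on invented data in which one cell can reach two facilities. Column names mirror the notebook; the numbers are made up.

```python
import pandas as pd

od = pd.DataFrame({
    "grid_id":    [1, 2, 2],
    "hcf_uid":    ["A", "A", "B"],
    "population": [100.0, 200.0, 200.0],
    "Weight":     [1.0, 0.5, 1.0],
})
supply = 1  # one unit of capacity per facility, as a simplification

# Step 1: supply-to-demand ratio of each facility
od["Pop_W"] = od["population"] * od["Weight"]
od["Pop_W_S"] = od.groupby("hcf_uid")["Pop_W"].transform("sum")
od["supply_demand_ratio"] = supply / od["Pop_W_S"]

# Step 2: accessibility of each grid cell, summed over reachable facilities
od["supply_W"] = od["supply_demand_ratio"] * od["Weight"]
accessibility = od.groupby("grid_id")["supply_W"].sum()
print(accessibility)
```

Cell 2 reaches both facilities and therefore scores higher (0.0075) than cell 1 (0.005), even though its weight towards facility A is lower.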

In [85]:
# Normalize
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
origin_dest_acc['Accessibility_standard'] = scaler.fit_transform(origin_dest_acc[['Accessibility']])

In [86]:
origin_dest_acc

Unnamed: 0,origin_lon,origin_lat,population,grid_id,geometry,hcf_uid,duration_seconds,distance_km,uid,facility_name,dest_lon,dest_lat,Weight,Pop_W_x,unique_code,Pop_W_S,supply_demand_ratio,supply_W,Accessibility,Accessibility_standard
0,8.514583,12.249583,245.035522,1,POINT (8.51458 12.24958),47167671.0,1010.62,23.32,47167671.0,Sultan Clinic and Maternity,8.43129,12.091336,0.139943,34.290952,-8248856661633896388,17892.707962,0.000056,0.000008,0.000008,0.006441
1,8.506250,12.241250,172.523621,2,POINT (8.50625 12.24125),47167671.0,1046.10,23.81,47167671.0,Sultan Clinic and Maternity,8.43129,12.091336,0.121599,20.978765,-8248856661633896388,17892.707962,0.000056,0.000007,0.000007,0.005596
2,8.514583,12.241250,390.546021,3,POINT (8.51458 12.24125),47167671.0,965.04,22.68,47167671.0,Sultan Clinic and Maternity,8.43129,12.091336,0.166437,65.001371,-8248856661633896388,17892.707962,0.000056,0.000009,0.000009,0.007661
3,8.522917,12.241250,137.086502,4,POINT (8.52292 12.24125),47167671.0,973.63,22.99,47167671.0,Sultan Clinic and Maternity,8.43129,12.091336,0.161185,22.096307,-8248856661633896388,17892.707962,0.000056,0.000009,0.000009,0.007419
4,8.497917,12.232917,119.791389,5,POINT (8.49792 12.23292),47167671.0,1160.83,25.40,47167671.0,Sultan Clinic and Maternity,8.43129,12.091336,0.074680,8.946056,-8248856661633896388,17892.707962,0.000056,0.000004,0.000004,0.003435
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1995,8.506250,11.757917,91.461205,1974,POINT (8.50625 11.75792),28638942.0,885.24,12.90,28638942.0,Kura Surgery and Maternity Clinic,8.42058,11.771390,0.221165,20.228015,-8549283594930020530,3599.635332,0.000278,0.000061,0.000061,0.050631
1996,8.422917,11.749583,195.097214,1981,POINT (8.42292 11.74958),28638942.0,103.57,2.45,28638942.0,Kura Surgery and Maternity Clinic,8.42058,11.771390,0.979558,191.109124,-8549283594930020530,3599.635332,0.000278,0.000272,0.000272,0.224264
1997,8.431250,11.749583,109.939705,1982,POINT (8.43125 11.74958),28638942.0,413.66,9.63,28638942.0,Kura Surgery and Maternity Clinic,8.42058,11.771390,0.719308,79.080516,-8549283594930020530,3599.635332,0.000278,0.000200,0.000200,0.164680
1998,8.439583,11.749583,87.887894,1983,POINT (8.43958 11.74958),28638942.0,247.91,4.05,28638942.0,Kura Surgery and Maternity Clinic,8.42058,11.771390,0.888399,78.079509,-8549283594930020530,3599.635332,0.000278,0.000247,0.000247,0.203393


In [87]:
max(origin_dest_acc.Accessibility_standard)

1.0
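
`MinMaxScaler` rescales the column to the range 0 to 1 using `(x - min) / (max - min)`, which is why the maximum printed above is 1.0. The same calculation in plain pandas, on invented values:

```python
import pandas as pd

accessibility = pd.Series([0.000008, 0.000272, 0.000140])
value_range = accessibility.max() - accessibility.min()
scaled = (accessibility - accessibility.min()) / value_range
print(scaled.round(3).tolist())  # [0.0, 1.0, 0.5]
```

Because the scaling is relative to the cells in the study area, the standardised values rank cells against each other and are not comparable across cities or runs with different facility sets.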

In [88]:
geometry = [Point(xy) for xy in zip(origin_dest_acc['origin_lon'], origin_dest_acc['origin_lat'])]
gdf = gpd.GeoDataFrame(origin_dest_acc, geometry=geometry, crs="EPSG:4326")

gpkg_path = data_outputs + 'origin_dest_acc.gpkg'
gdf.to_file(gpkg_path, layer="origin_dest_acc", driver="GPKG")