# Analysis of Emergency Maternal Care Deprivation in Kano and Lagos, Nigeria
> Note: This notebook requires the local environment dependencies listed in our [requirements.txt](requirements.txt) file. Use this file to install the required packages in a virtual environment.

> To execute the OpenRouteService functions, install the [library dependencies](https://github.com/GIScience/openrouteservice-examples#local-installation). You need either an [openrouteservice API key](https://openrouteservice.org/dev/#/signup) or a local ORS environment to complete the analysis.

The model concepts and processes are described in our documentation. The [Dataset-interpretability](https://github.com/urbanbigdatacentre/ideamaps-models/blob/a4084fb650424ac575941cdacb71421aa882bae4/models/emergency-maternal-care/kano/dataset-interpretability.md) file describes the rationale behind this model.

## Workflow:
The notebook is divided into the following sections:

1. Initial Setup
2. Data Preparation
3. Travel time estimates
4. Two-step floating catchment area (2SFCA) analysis
5. Results

## 1. Initial Setup

## Setting up the virtual environment

```bash
# Create a new virtual environment
# It is recommended to create this virtual environment in the scripts folder
python -m venv .venv

# Activate the virtual environment
source .venv/bin/activate
pip install -r requirements.txt
```

## To run your notebook in VS Code

```bash
pip install -U ipykernel
python -m ipykernel install --user --name=.venv
```

In [1]:
import os
from IPython.display import display
import requests

import folium
from folium.plugins import MarkerCluster
import openrouteservice
import time

import pandas as pd
import numpy as np
import fiona as fn
import geopandas as gpd
from shapely.geometry import shape, mapping
from shapely.geometry import Point
from shapely.geometry import box
from scipy.spatial import cKDTree
from tqdm import tqdm

import rasterio
from rasterio.transform import xy
from rasterio.mask import mask
import rasterstats as rs
import math

from pathlib import Path
from shapely.geometry import Polygon

## Preprocessing
To use the hosted service, first request an ORS Matrix API key from the [OpenRouteService](https://openrouteservice.org/) platform, then interact with the API through an OpenRouteService client instance. The [API documentation](https://openrouteservice.org/dev/#/api-docs/introduction) referenced here is for ORS Core version 9.0.0.

To generate an [API key](https://openrouteservice.org/dev/#/home?tab=1) (token), sign up at the OpenRouteService dashboard with your e-mail address or your GitHub account. After logging in, open the Dashboard by clicking your profile icon and navigate to the API Keys section. Click "Create API Key" to generate a free key, then choose a service plan (the free plan limits the number of requests per day). Copy the API key and store it securely.

OpenRouteService primarily uses API keys for authentication. When you call an endpoint directly over HTTP, send the API key in the `Authorization` header of the request. This setup gives access to the geospatial analysis functions of the API, including isochrone generation.


### Option 1: Using an ORS API Key
Make sure you have a .env file in the root directory with the following content:
```bash
    OPENROUTESERVICE_API_KEY='your_api_key'
```

In [None]:
# %%
# Read the api key from the .env file
from dotenv import load_dotenv
%load_ext dotenv
%dotenv
api_key = os.getenv('OPENROUTESERVICE_API_KEY')
client = openrouteservice.Client(key=api_key)

### Option 2: Using a local ORS service
Make sure you have set a local service that runs the OSM-based ORS API. 
```r
    # Insert R code from the local ORS service
```

Several kinds of data were used in this study. The healthcare facility dataset comes from [Macharia, P.M. et al., 2023](https://doi.org/10.1038/s41597-023-02651-9), which provides a geospatial database of close-to-reality travel times to obstetric emergency care in 15 Nigerian conurbations. The dataset was filtered by state name to isolate facilities in Kano, and the CSV file was converted to a shapefile from its coordinates using [QGIS](https://qgis.org/).

The Level 2 administrative boundary data, sourced from [Humanitarian Data Exchange](https://data.humdata.org/), were used to relate the isochrones and the healthcare facility distribution to specific administrative regions. The data were filtered by administrative region name (`lganame`) to focus the analysis on Kano.

Although official, administrative boundaries may not reflect actual patterns of human settlement or economic activity. The team therefore used the Functional Urban Area (FUA) as a complementary definition of the study areas. [The Joint Research Centre of the European Commission](https://commission.europa.eu/about/departments-and-executive-agencies/joint-research-centre_en) defines the FUA as the actual extent of urban sprawl and human activity, encompassing the core city and the surrounding regions that are economically or socially integrated with it. The FUA was obtained from the [Global Human Settlement Layer (GHSL)](https://human-settlement.emergency.copernicus.eu/) dataset, which provides spatial data for functional urban areas worldwide.

* [Datasets of health facilities](https://doi.org/10.6084/m9.figshare.22689667.v2) (15/07/2023)
* [Shapefile of district boundaries](https://data.humdata.org/dataset/nigeria-admin-level-2) - Admin Level 2 (data from Humanitarian Data Exchange, 25/11/2015)
* [Functional Urban Areas](https://human-settlement.emergency.copernicus.eu/download.php?ds=FUA) - data from Global Human Settlement Layer (2015)

### Option 1: Kano
If you want to process data for the city of Kano, use the following code to set the input, temporary and output data paths.


In [7]:
# Set paths to access Kano data
# Define directories
data_inputs = '../scripts/Kano/data-inputs/'
data_temp = '../scripts/Kano/data-temp/'
model_outputs = '../kano/'

### Option 2: Lagos
If you want to process data for the city of Lagos, use the following code to set the input, temporary and output data paths.

In [2]:
# Set paths to access Lagos data
# Define directories
data_inputs = '../scripts/Lagos/data-inputs/'
data_temp = '../scripts/Lagos/data-temp/'
model_outputs = '../Lagos/'

## Data Collection

### 1. Validated healthcare facilities for Kano
note: to describe the process to validate healthcare facilities

In [3]:
healthcare_facilities_validated = gpd.read_file(data_inputs + 'healthcare_facilities.geojson')

In [4]:
healthcare_facilities_validated

Unnamed: 0,orig_order,state,lga,ward,urban_conurb,uid,facility_code,ontime_code,facility_name,reg_number,...,latitude,longitude,operation_status,registration_status,license_status,created,last_updated,last_updated_ontime,hcf_id,geometry
0,117,1,Agege,Dopemu,1,25250613.0,24/01/1/2/2/0001,100101001,Al-Imaan Hospital and Maternity Home,,...,6.61300,3.31549,Operational,Unknown,Unknown,2018-01-01 01:01:01,2018-01-01 01:01:01,28/09/2022 09:00,1,POINT (3.31549 6.613)
1,400,1,Agege,Keke,1,58886840.0,24/01/1/2/2/0009,100101002,J.A Lashman Hospital,,...,6.62262,3.33436,Operational,Registered,Licensed,2018-01-01 01:01:01,2020-02-28 10:09:17,28/09/2022 09:00,2,POINT (3.33436 6.62262)
2,399,1,Agege,Orile (Agege),1,70566198.0,24/01/1/2/1/0001,100101003,Orile Agege General Hospital,,...,6.63477,3.30360,Operational,Unknown,Unknown,2018-01-01 01:01:01,2018-01-01 01:01:01,28/09/2022 09:00,3,POINT (3.3036 6.63477)
3,118,1,Agege,Powerline,1,81930847.0,24/01/1/2/2/0016,100101004,Molayo Medical Centre,,...,6.63309,3.31291,Operational,Registered,Licensed,2018-01-01 01:01:01,2020-03-04 10:30:39,28/09/2022 09:00,4,POINT (3.31291 6.63309)
4,83,1,Agege,Papa Ashafa,1,18545103.0,24/01/1/1/1/0068,100101005,Mucas Hospital,,...,6.61240,3.30348,Operational,Registered,Licensed,2018-01-01 01:01:01,2020-03-04 11:16:07,28/09/2022 09:00,5,POINT (3.30348 6.6124)
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
786,2047,1,Surulere,Nuru Oniwo,1,56048382.0,24/20/1/1/1/0099,100120069,Fajip Hospital (Annex),,...,6.49779,3.34456,Operational,Registered,Licensed,2018-01-01 01:01:01,2020-02-25 11:24:32,28/09/2022 09:00,787,POINT (3.34456 6.49779)
787,2048,1,Surulere,Muniru Baruwa,1,51776203.0,24/08/1/2/2/0051,100120070,St Maria Hospital,,...,6.54848,3.23181,Operational,Unknown,Unknown,2018-01-01 01:01:01,2018-01-01 01:01:01,28/09/2022 09:00,788,POINT (3.23181 6.54848)
788,2049,1,Surulere,Airways,1,52229738.0,24/20/1/2/2/0068,100120071,Peace Way Hospital and Maternity,,...,6.55846,3.24249,Operational,Unknown,Unknown,2018-01-01 01:01:01,2018-01-01 01:01:01,28/09/2022 09:00,789,POINT (3.24249 6.55846)
789,762,4,Egor,Unknown,4,,,100401001,Total Health Trust Medical Centre,,...,6.54187,3.36499,Operational,,,NaT,NaT,28/09/2022 09:00,790,POINT (3.36499 6.54187)


### 2. Healthcare facilities in Lagos
note: Because no local expert validation is available for Lagos, the validation class is determined from the ownership recorded in the [datasets of health facilities](https://doi.org/10.6084/m9.figshare.22689667.v2).

In [5]:
healthcare_facilities_validated['Validation'] = healthcare_facilities_validated['owner'].map({
    1: 'Public Comprehensive EmOC',
    2: 'Private Comprehensive EmOC'
})

In [6]:
healthcare_facilities_validated

Unnamed: 0,orig_order,state,lga,ward,urban_conurb,uid,facility_code,ontime_code,facility_name,reg_number,...,longitude,operation_status,registration_status,license_status,created,last_updated,last_updated_ontime,hcf_id,geometry,Validation
0,117,1,Agege,Dopemu,1,25250613.0,24/01/1/2/2/0001,100101001,Al-Imaan Hospital and Maternity Home,,...,3.31549,Operational,Unknown,Unknown,2018-01-01 01:01:01,2018-01-01 01:01:01,28/09/2022 09:00,1,POINT (3.31549 6.613),Private Comprehensive EmOC
1,400,1,Agege,Keke,1,58886840.0,24/01/1/2/2/0009,100101002,J.A Lashman Hospital,,...,3.33436,Operational,Registered,Licensed,2018-01-01 01:01:01,2020-02-28 10:09:17,28/09/2022 09:00,2,POINT (3.33436 6.62262),Private Comprehensive EmOC
2,399,1,Agege,Orile (Agege),1,70566198.0,24/01/1/2/1/0001,100101003,Orile Agege General Hospital,,...,3.30360,Operational,Unknown,Unknown,2018-01-01 01:01:01,2018-01-01 01:01:01,28/09/2022 09:00,3,POINT (3.3036 6.63477),Public Comprehensive EmOC
3,118,1,Agege,Powerline,1,81930847.0,24/01/1/2/2/0016,100101004,Molayo Medical Centre,,...,3.31291,Operational,Registered,Licensed,2018-01-01 01:01:01,2020-03-04 10:30:39,28/09/2022 09:00,4,POINT (3.31291 6.63309),Private Comprehensive EmOC
4,83,1,Agege,Papa Ashafa,1,18545103.0,24/01/1/1/1/0068,100101005,Mucas Hospital,,...,3.30348,Operational,Registered,Licensed,2018-01-01 01:01:01,2020-03-04 11:16:07,28/09/2022 09:00,5,POINT (3.30348 6.6124),Private Comprehensive EmOC
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
786,2047,1,Surulere,Nuru Oniwo,1,56048382.0,24/20/1/1/1/0099,100120069,Fajip Hospital (Annex),,...,3.34456,Operational,Registered,Licensed,2018-01-01 01:01:01,2020-02-25 11:24:32,28/09/2022 09:00,787,POINT (3.34456 6.49779),Private Comprehensive EmOC
787,2048,1,Surulere,Muniru Baruwa,1,51776203.0,24/08/1/2/2/0051,100120070,St Maria Hospital,,...,3.23181,Operational,Unknown,Unknown,2018-01-01 01:01:01,2018-01-01 01:01:01,28/09/2022 09:00,788,POINT (3.23181 6.54848),Private Comprehensive EmOC
788,2049,1,Surulere,Airways,1,52229738.0,24/20/1/2/2/0068,100120071,Peace Way Hospital and Maternity,,...,3.24249,Operational,Unknown,Unknown,2018-01-01 01:01:01,2018-01-01 01:01:01,28/09/2022 09:00,789,POINT (3.24249 6.55846),Private Comprehensive EmOC
789,762,4,Egor,Unknown,4,,,100401001,Total Health Trust Medical Centre,,...,3.36499,Operational,,,NaT,NaT,28/09/2022 09:00,790,POINT (3.36499 6.54187),Private Comprehensive EmOC


### Population Grid Data (1km resolution) from WorldPop
note: explain the rationale for using the female population aged 15-49 years

In [8]:
study_area = gpd.read_file(data_inputs + '100mGrid.gpkg')
raster_path = data_inputs + 'nga_f_15_49_2015_1km.tif'

In [9]:
with rasterio.open(raster_path) as dataset:
    geometries = [study_area.geometry.unary_union.__geo_interface__]
    clipped_image, clipped_transform = mask(dataset, geometries, crop=True)
    band1 = clipped_image[0] # Read the first band of the raster

In [11]:
out_meta = dataset.meta.copy()
out_meta.update({
        "height": clipped_image.shape[1],
        "width": clipped_image.shape[2],
        "transform": clipped_transform
    })

In [12]:
with rasterio.open(data_inputs + 'Lagos_nga_f_15_49_2015_1km.tif', "w", **out_meta) as dest:
    dest.write(clipped_image)

## Adding population data at 1km grid to 100m grid

In [4]:
# reading in geotiff file as numpy array
def read_tif(file: Path):
    if not file.exists():
        raise FileNotFoundError(f'File {file} not found')

    with rasterio.open(file) as dataset:
        arr = dataset.read()  # (bands X height X width)
        transform = dataset.transform
        crs = dataset.crs

    return arr.transpose((1, 2, 0)), transform, crs

def raster2vector(arr, transform, crs) -> gpd.GeoDataFrame:
    height, width, bands = arr.shape

    # Generate pixel coordinates
    geometries = []
    pixel_values = []

    for row in range(height):
        for col in range(width):
            x_min, y_max = transform * (col, row)  # Top-left corner
            x_max, y_min = transform * (col + 1, row + 1)  # Bottom-right corner

            pixel_value = arr[row, col].tolist()[0]  # Value of the first band at this pixel
            polygon = Polygon([(x_min, y_max), (x_max, y_max), (x_max, y_min), (x_min, y_min)])

            geometries.append(polygon)
            pixel_values.append(pixel_value)

    # Convert to DataFrame
    gdf = gpd.GeoDataFrame({'pop_grid_pop': pixel_values, 'geometry': geometries}, crs=crs)

    return gdf

epsg = 'EPSG:32632'  # UTM zone 32N (covers Kano); Lagos lies in UTM zone 31N (EPSG:32631)

In [5]:
# Preparing grid
grid_file = data_inputs + '100mGrid.gpkg'
grid = gpd.read_file(grid_file)
grid = grid.to_crs(epsg)
grid['grid_id'] = range(len(grid))
grid = grid[['grid_id', 'geometry']].set_geometry('geometry')
grid

Unnamed: 0,grid_id,geometry
0,0,"POLYGON ((-184077.797 713168.196, -183966.398 ..."
1,1,"POLYGON ((-184079.195 713078.067, -183967.795 ..."
2,2,"POLYGON ((-184080.593 712987.938, -183969.193 ..."
3,3,"POLYGON ((-184081.99 712897.809, -183970.59 71..."
4,4,"POLYGON ((-184083.388 712807.68, -183971.988 7..."
...,...,...
325349,325349,"POLYGON ((-74134.769 717795.942, -74023.561 71..."
325350,325350,"POLYGON ((-74136.108 717705.962, -74024.9 7177..."
325351,325351,"POLYGON ((-74137.447 717615.981, -74026.239 71..."
325352,325352,"POLYGON ((-74138.785 717526.001, -74027.577 71..."


In [6]:
# Count buildings per grid cell

# Loading Google building footprints
building_file = data_inputs + 'Lagos_GOBv3.gpkg'
buildings = gpd.read_file(building_file)
buildings = buildings.to_crs(epsg)
buildings['centroid'] = buildings['geometry'].centroid

# Joining buildings to grid
grid_buildings = grid.sjoin(buildings.set_geometry('centroid').drop(columns='geometry'), how='inner', predicate='intersects')
grid_buildings = grid_buildings.groupby('grid_id')

# Counting buildings per grid
building_counts = grid_buildings.size().rename('bcount')

# Adding building count to grid cells
grid = grid.merge(building_counts, on='grid_id', how='left')

# Assign building count 0 to cells with no buildings (NaN)
grid['bcount'] = grid['bcount'].fillna(0)
grid

Unnamed: 0,grid_id,geometry,bcount
0,0,"POLYGON ((-184077.797 713168.196, -183966.398 ...",0.0
1,1,"POLYGON ((-184079.195 713078.067, -183967.795 ...",0.0
2,2,"POLYGON ((-184080.593 712987.938, -183969.193 ...",0.0
3,3,"POLYGON ((-184081.99 712897.809, -183970.59 71...",0.0
4,4,"POLYGON ((-184083.388 712807.68, -183971.988 7...",0.0
...,...,...,...
325349,325349,"POLYGON ((-74134.769 717795.942, -74023.561 71...",0.0
325350,325350,"POLYGON ((-74136.108 717705.962, -74024.9 7177...",1.0
325351,325351,"POLYGON ((-74137.447 717615.981, -74026.239 71...",0.0
325352,325352,"POLYGON ((-74138.785 717526.001, -74027.577 71...",0.0


In [7]:
# Adding population data at 1km grid to finer grid
from pathlib import Path

data_path = Path(data_inputs)

# Loading coarse pop data
pop_file = data_path / 'Lagos_nga_f_15_49_2015_1km.tif'
pop_raster, transform, crs = read_tif(pop_file)

# Converting the raster grid to vector data
pop_grid = raster2vector(pop_raster, transform, crs)
pop_grid = pop_grid.to_crs(epsg)
pop_grid['pop_grid_id'] = range(len(pop_grid))
# pop_grid.to_parquet(data_path / 'sanity_check_pop.parquet')

# Assign coarse population data to finer grid based on the centroid locations of the finer grid cells
grid['centroid'] = grid['geometry'].centroid
grid = gpd.sjoin(grid.set_geometry('centroid'), pop_grid, how='left', predicate='within')
print(grid.columns)
grid = grid[['grid_id', 'bcount', 'pop_grid_id', 'geometry']]
grid.head()

Index(['grid_id', 'geometry', 'bcount', 'centroid', 'index_right',
       'pop_grid_pop', 'pop_grid_id'],
      dtype='object')


Unnamed: 0,grid_id,bcount,pop_grid_id,geometry
0,0,0.0,8520,"POLYGON ((-184077.797 713168.196, -183966.398 ..."
1,1,0.0,8520,"POLYGON ((-184079.195 713078.067, -183967.795 ..."
2,2,0.0,8520,"POLYGON ((-184080.593 712987.938, -183969.193 ..."
3,3,0.0,8640,"POLYGON ((-184081.99 712897.809, -183970.59 71..."
4,4,0.0,8640,"POLYGON ((-184083.388 712807.68, -183971.988 7..."


In [8]:
print(grid['bcount'].sum())

3871094.0


In [9]:
# Calculate population weight (fraction of total population count that should be assigned to cell based on its building count)
grid_grouped_pop = grid.groupby('pop_grid_id')
building_count_pop = grid_grouped_pop['bcount'].sum().rename('pop_grid_bcount')
grid = grid.merge(building_count_pop, on='pop_grid_id', how='left')
grid['pop_weight'] = grid['bcount'] / grid['pop_grid_bcount']

# Compute disaggregated population count based on weight and building count at coarser cell level
grid = grid.merge(pop_grid, on='pop_grid_id', how='left')
grid['pop'] = grid['pop_grid_pop'] * grid['pop_weight']
grid.head()

Unnamed: 0,grid_id,bcount,pop_grid_id,geometry_x,pop_grid_bcount,pop_weight,pop_grid_pop,geometry_y,pop
0,0,0.0,8520,"POLYGON ((-184077.797 713168.196, -183966.398 ...",0.0,,-3.4028230000000003e+38,"POLYGON ((-184838.209 713821.761, -183911.413 ...",
1,1,0.0,8520,"POLYGON ((-184079.195 713078.067, -183967.795 ...",0.0,,-3.4028230000000003e+38,"POLYGON ((-184838.209 713821.761, -183911.413 ...",
2,2,0.0,8520,"POLYGON ((-184080.593 712987.938, -183969.193 ...",0.0,,-3.4028230000000003e+38,"POLYGON ((-184838.209 713821.761, -183911.413 ...",
3,3,0.0,8640,"POLYGON ((-184081.99 712897.809, -183970.59 71...",0.0,,-3.4028230000000003e+38,"POLYGON ((-184849.423 712895.275, -183922.612 ...",
4,4,0.0,8640,"POLYGON ((-184083.388 712807.68, -183971.988 7...",0.0,,-3.4028230000000003e+38,"POLYGON ((-184849.423 712895.275, -183922.612 ...",
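
The rows above show two things worth handling before the population is used downstream: `pop_grid_pop` carries what appears to be the raster's nodata value (about -3.4e38), and `pop_weight` is NaN wherever a 1 km cell contains no buildings (0 / 0). The following is a minimal sketch on toy data, not the notebook's method. The `NODATA` constant and the decision to assign zero population to cells without buildings or without data are assumptions.

```python
import numpy as np
import pandas as pd

# Assumption: the raster nodata value is the float32 minimum, as the output above suggests
NODATA = float(np.finfo(np.float32).min)

# Toy fine grid: two coarse cells (10 and 11) with building counts per fine cell
toy = pd.DataFrame({
    'grid_id': [0, 1, 2, 3],
    'pop_grid_id': [10, 10, 11, 11],
    'bcount': [3.0, 1.0, 0.0, 0.0],
    'pop_grid_pop': [200.0, 200.0, NODATA, NODATA],
})

# Treat nodata (any negative count) as missing
toy['pop_grid_pop'] = toy['pop_grid_pop'].where(toy['pop_grid_pop'] >= 0)

# Building-count share within each coarse cell; a zero total becomes NaN instead of 0 / 0
coarse_bcount = toy.groupby('pop_grid_id')['bcount'].transform('sum')
toy['pop_weight'] = toy['bcount'] / coarse_bcount.replace(0, np.nan)

# Assumption: cells without buildings or without population data receive zero population
toy['pop'] = (toy['pop_grid_pop'] * toy['pop_weight']).fillna(0.0)
print(toy[['grid_id', 'pop_weight', 'pop']])
```

With this approach the population of each coarse cell that has buildings is preserved when summed over its fine cells.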


In [10]:
# Drop the duplicate coarse-grid geometry before saving
grid = grid.drop(columns=["geometry_y"])
grid.head()


Unnamed: 0,grid_id,bcount,pop_grid_id,geometry_x,pop_grid_bcount,pop_weight,pop_grid_pop,pop
0,0,0.0,8520,"POLYGON ((-184077.797 713168.196, -183966.398 ...",0.0,,-3.4028230000000003e+38,
1,1,0.0,8520,"POLYGON ((-184079.195 713078.067, -183967.795 ...",0.0,,-3.4028230000000003e+38,
2,2,0.0,8520,"POLYGON ((-184080.593 712987.938, -183969.193 ...",0.0,,-3.4028230000000003e+38,
3,3,0.0,8640,"POLYGON ((-184081.99 712897.809, -183970.59 71...",0.0,,-3.4028230000000003e+38,
4,4,0.0,8640,"POLYGON ((-184083.388 712807.68, -183971.988 7...",0.0,,-3.4028230000000003e+38,


In [11]:
grid = grid.set_geometry("geometry_x")
grid = grid.to_crs(4326)
grid.to_file(data_temp + 'pop_grid.gpkg', driver='GPKG')

## Spatial Analysis Pipeline 
### Using OpenRouteService (ORS) Matrix API to calculate the travel time and distance from each population grid centroid to the healthcare facility 

note: this will generate a file 'OD_matrix_healthcare_pop_grid'

In [None]:
origin_gdf = centroids_df
origin_name_column = 'grid_code'
destination_gdf = healthcare_facilities.dropna(subset=['geometry'])
destination_name_column = 'facility_name'

In [None]:
origins = list(zip(origin_gdf.geometry.x, origin_gdf.geometry.y))

In [None]:
destinations = list(zip(destination_gdf.geometry.x, destination_gdf.geometry.y))

In [None]:
locations = origins + destinations

In [None]:
origins_index = list(range(0, len(origins)))
destinations_index = list(range(len(origins), len(locations)))

In [None]:
body = {'locations': locations,
       'destinations': destinations_index,
       'sources': origins_index,
       'metrics': ['distance', 'duration']}

In [None]:
headers = {
    'Accept': 'application/json, application/geo+json, application/gpx+xml, img/png; charset=utf-8',
    'Authorization': api_key,
    'Content-Type': 'application/json; charset=utf-8'
}

In [None]:
response = requests.post('https://api.openrouteservice.org/v2/matrix/driving-car', json=body, headers=headers)
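
For a large grid, a single request with every centroid as a source is likely to exceed the matrix size the hosted API accepts. One option is to split the origins into batches and send one request per batch. The sketch below only builds the request bodies and makes no network call. The helper name `iter_matrix_bodies` and the `max_routes` default are hypothetical placeholders; check the limits of your ORS plan or local instance.

```python
from typing import Iterator

def iter_matrix_bodies(origins, destinations, max_routes=3500) -> Iterator[dict]:
    """Yield matrix request bodies whose sources x destinations stay within max_routes.

    max_routes is an assumed placeholder, not a documented constant.
    """
    if not destinations:
        raise ValueError('at least one destination is required')
    batch_size = max(1, max_routes // len(destinations))
    for start in range(0, len(origins), batch_size):
        batch = origins[start:start + batch_size]
        locations = list(batch) + list(destinations)
        yield {
            'locations': [list(p) for p in locations],
            'sources': list(range(len(batch))),
            'destinations': list(range(len(batch), len(locations))),
            'metrics': ['distance', 'duration'],
        }

# Toy coordinates (lon, lat): 5 origins, 2 destinations, at most 4 routes per request
toy_origins = [(3.30 + 0.01 * i, 6.60) for i in range(5)]
toy_destinations = [(3.31549, 6.613), (3.33436, 6.62262)]
bodies = list(iter_matrix_bodies(toy_origins, toy_destinations, max_routes=4))
print(len(bodies), [len(b['sources']) for b in bodies])
```

Each body can be posted in the same way as the single request above, and the rows of the returned `durations` concatenated in batch order.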

In [None]:
distances = response.json().get('distances', [])
durations = response.json().get('durations', [])

In [None]:
distances_duration_matrix = []

In [None]:
# Iterate over each origin (grid cell); row i of the ORS matrix corresponds to the i-th origin
for origin_pos, (_, origin) in enumerate(origin_gdf.iterrows()):
    origin_name = origin[origin_name_column]
    origin_x = origin.geometry.x
    origin_y = origin.geometry.y
    origin_durations = durations[origin_pos]

    # Find the fastest reachable facility (ORS may return null for pairs it cannot route)
    reachable = [(t, j) for j, t in enumerate(origin_durations) if t is not None]
    if not reachable:
        continue
    min_duration, min_index = min(reachable)

    # Destinations were built from destination_gdf in order, so min_index is its row position
    destination_row = destination_gdf.iloc[min_index]
    dest_name = destination_row[destination_name_column]
    dest_x, dest_y = destination_row.geometry.x, destination_row.geometry.y

    # Append the origin, its nearest facility and the travel time between them
    distances_duration_matrix.append([
        origin_name, origin_y, origin_x,
        dest_name, dest_y, dest_x,
        min_duration
    ])

In [None]:
# Convert the results into a DataFrame
matrix_df = pd.DataFrame(distances_duration_matrix, columns=[
    'grid_code','origin_lat', 'origin_lon',
    'destination_name', 'dest_lat', 'dest_lon','min_duration'
])

In [None]:
# Attach the population of each grid cell, then save to CSV
merged_df = pd.merge(matrix_df, grid_df[['grid_code', 'population']], on='grid_code', how='left')
merged_df.to_csv(data_temp + 'distance_duration_matrix_temp.csv', index=False)

In [None]:
merged_df

In [None]:
geometry = [Point(xy) for xy in zip(merged_df['dest_lon'], merged_df['dest_lat'])]
gdf = gpd.GeoDataFrame(merged_df, geometry=geometry, crs="EPSG:4326")

In [None]:
gpkg_path = data_temp + 'distance_duration_matrix_temp.gpkg'
gdf.to_file(gpkg_path, layer="duration_matrix", driver="GPKG")

## Processing OD Matrix

In [None]:
matrix_df = pd.read_csv(data_temp +'OD-matrix-100m.csv')

In [None]:
matrix_df

We will select the closest facilities for each grid cell.

In [None]:
centroids_df = gpd.read_file(data_temp + 'pop-grid.gpkg')

In [None]:
centroids_df

In [None]:
pop_centroids_hcf = pd.merge(matrix_df, centroids_df[['rowid', 'longitude', 'latitude', 'lon_min', 'lat_min', 'lon_max', 'lat_max','bcount','pop_grid_bcount', 'pop_grid_pop', 'pop', 'geometry']], 
                     left_on='destination_id', right_on='rowid', how='left')

In [None]:
pop_centroids_hcf

In [None]:
pop_centroids_hcf = pop_centroids_hcf.rename(columns={
    "longitude": "origin_lon",
    "latitude": "origin_lat",
    "lon_min": "origin_lon_min",
    "lat_min": "origin_lat_min",
    "lon_max": "origin_lon_max",
    "lat_max": "origin_lat_max",
    "rowid": "grid_id",
    "origin_id": "hcf_uid",
    "pop": "population"
})
columns_to_keep = ["grid_id", "origin_lon", "origin_lat", "origin_lon_min","origin_lat_min","origin_lon_max","origin_lat_max","population", "bcount","pop_grid_bcount", "pop_grid_pop","geometry", "hcf_uid", "duration_seconds", "distance_km"]
pop_centroids_hcf = pop_centroids_hcf[columns_to_keep]

In [None]:
pop_centroids_hcf

In [None]:
distances_duration_matrix = pd.merge(pop_centroids_hcf, healthcare_facilities_validated[['hcf_id','facility_name', 'longitude', 'latitude', 'Local_Validation']], 
                     left_on='hcf_uid', right_on='hcf_id', how='left')

In [None]:
distances_duration_matrix = distances_duration_matrix.rename(columns={
    "longitude": "dest_lon",
    "latitude": "dest_lat"
})
distances_duration_matrix = distances_duration_matrix.drop(columns=['hcf_uid'])

In [None]:
distances_duration_matrix

In [None]:
category_counts = healthcare_facilities_validated['Local_Validation'].value_counts()
print(category_counts)

In [None]:
distances_duration_matrix['Local_Validation'] = distances_duration_matrix['Local_Validation'].replace({
    'Public/Private Basic EmOC': 'Private Basic EmOC',
    'Public/Private comprehensive EmOC (missionary Hospital)': 'Private Comprehensive EmOC'
})

In [None]:
selected_categories = ['Public Comprehensive EmOC', 'Private Comprehensive EmOC', 
                       'Private Basic EmOC', 'Public Basic EmOC']

In [None]:
distances_duration_matrix = distances_duration_matrix[
    distances_duration_matrix['Local_Validation'].isin(selected_categories)
]

In [None]:
distances_duration_matrix

In [None]:
# Create subsets based on the categories of 'Validation of HCFs Categorization'
categories = {
    "public_comprehensive_EmOC": ["Public Comprehensive EmOC"],
    "private_comprehensive_EmOC": ["Private Comprehensive EmOC"],
    "private_basic_EmOC": ["Private Basic EmOC"],
    "public_basic_EmOC": ["Public Basic EmOC"]
}

In [None]:
subsets = {
    key: distances_duration_matrix[
        distances_duration_matrix['Local_Validation'].str.contains('|'.join(values), na=False)
    ]
    for key, values in categories.items()
}

In [None]:
public_CEmOC = subsets["public_comprehensive_EmOC"]
private_CEmOC = subsets["private_comprehensive_EmOC"]
public_BEmOC = subsets["public_basic_EmOC"]
private_BEmOC = subsets["private_basic_EmOC"]

In [None]:
public_CEmOC

In [None]:
# Define a function that keeps the 3 rows with the smallest duration_seconds per grid_id
def get_closest_3(df, n=3):
    return df.groupby('grid_id').apply(lambda x: x.nsmallest(n, 'duration_seconds')).reset_index(drop=True)

In [None]:
# If the subsets are already created for each category, we apply the function to each subset:
public_CEmOC_closest_3 = get_closest_3(public_CEmOC)
private_CEmOC_closest_3 = get_closest_3(private_CEmOC)
public_BEmOC_closest_3 = get_closest_3(public_BEmOC)
private_BEmOC_closest_3 = get_closest_3(private_BEmOC)

In [None]:
# Concatenate the filtered results into a single DataFrame
distances_duration_matrix = pd.concat([
    public_CEmOC_closest_3, private_CEmOC_closest_3,
    public_BEmOC_closest_3, private_BEmOC_closest_3
])

distances_duration_matrix = distances_duration_matrix.groupby('grid_id').apply(lambda x: x.nsmallest(3, 'duration_seconds')).reset_index(drop=True)
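
On a grid with several hundred thousand cells, `groupby(...).apply(...)` with `nsmallest` can be slow because it runs a Python function once per group. The sketch below shows a sort-then-`head` alternative on toy data. It assumes `duration_seconds` has no missing values, and the helper name `closest_n` is illustrative rather than part of the notebook.

```python
import pandas as pd

def closest_n(df: pd.DataFrame, n: int = 3) -> pd.DataFrame:
    """Keep the n rows with the smallest duration_seconds for each grid_id."""
    return (
        df.sort_values(['grid_id', 'duration_seconds'])
          .groupby('grid_id')
          .head(n)
          .reset_index(drop=True)
    )

# Toy OD table: grid cell 1 has four candidate facilities, grid cell 2 has one
toy = pd.DataFrame({
    'grid_id': [1, 1, 1, 1, 2],
    'hcf_id': [10, 11, 12, 13, 10],
    'duration_seconds': [400.0, 120.0, 900.0, 300.0, 60.0],
})
result = closest_n(toy)
print(result)
```

Grid cells with fewer than `n` candidate facilities simply keep all of their rows.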

In [None]:
distances_duration_matrix

In [None]:
geometry = [Point(xy) for xy in zip(distances_duration_matrix['origin_lon'], distances_duration_matrix['origin_lat'])]
gdf = gpd.GeoDataFrame(distances_duration_matrix, geometry=geometry, crs="EPSG:4326")

In [None]:
gpkg_path = data_temp + 'distances_duration_3_closet_Emoc.gpkg'
gdf.to_file(gpkg_path, layer="distances_duration_3_closet_Emoc", driver="GPKG")

In [None]:
# Review and remove
origin_dest = distances_duration_matrix

## Enhanced Two-Step Floating Catchment Area (E2SFCA) method

In [None]:
# Gaussian decay parameters
from math import log
d = 10 * 60  # maximum travel time in seconds; try 5, 10, 15 and 20 min by car, since the selected data source tends to underestimate travel time and traffic conditions
W = 0.01  # weight at the maximum travel time; try 0.1, 0.05, 0.01, 0.75
beta = - d ** 2 / log(W)
print(beta)
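
The decay parameter follows from requiring the Gaussian weight $w(t) = \exp(-t^2 / \beta)$ to equal $W$ at the threshold travel time $d$, which gives $\beta = -d^2 / \ln W$. The short check below confirms this numerically for the values used in the cell above; `gaussian_weight` is an illustrative helper name, not a function from the notebook.

```python
import math

def gaussian_weight(t_seconds: float, beta: float) -> float:
    """Gaussian distance-decay weight exp(-t^2 / beta)."""
    return math.exp(-t_seconds ** 2 / beta)

d = 10 * 60   # threshold travel time in seconds
W = 0.01      # weight assigned at the threshold
beta = -d ** 2 / math.log(W)

w_zero = gaussian_weight(0, beta)       # no travel time: full weight
w_half = gaussian_weight(d / 2, beta)   # half the threshold: W ** 0.25
w_at_d = gaussian_weight(d, beta)       # at the threshold: W by construction
print(w_zero, w_half, w_at_d)
```

A smaller `W` makes the weight fall off faster within the same threshold, which is why several combinations of `d` and `W` are compared later in the notebook.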

In [None]:
print(origin_dest.head())

In [None]:
# Convert 'duration' to numeric, coercing errors to NaN
origin_dest = origin_dest.copy()
origin_dest['duration_seconds'] = pd.to_numeric(origin_dest['duration_seconds'], errors='coerce')

In [None]:
# Drop rows with NaN values in 'duration' column
origin_dest = origin_dest.dropna(subset=['duration_seconds'])
origin_dest['grid_id'] = pd.to_numeric(origin_dest['grid_id'], errors='coerce')

In [None]:
origin_dest_acc = origin_dest

In [None]:
# Apply Gaussian decay function to calculate the weight of each grid to healthcare 
# facilities based on the travel duration. d is the travel time and beta is the decay 
# parameter previously calculated.
# The weight decreases as the duration increases, meaning facilities that are further away have less impact.
origin_dest_acc['Weight'] = origin_dest_acc['duration_seconds'].apply(lambda d: round(math.exp(-d**2/beta), 8))

In [None]:
# Compute the Weighted Population (Pop_W), the population of each grid cell is multiplied 
# by the corresponding weight to calculate the weighted population.
origin_dest_acc['Pop_W'] = origin_dest_acc['population'] * origin_dest_acc['Weight']

In [None]:
origin_dest_acc

In [None]:
# Sum the Weighted Population
origin_dest_sum = origin_dest_acc.groupby(by='hcf_id')['Pop_W'].sum().reset_index()

In [None]:
origin_dest_sum

In [None]:
# Merge the Sum of Weighted Population Back into the Original Data
origin_dest_acc = origin_dest_acc.merge(origin_dest_sum, on='hcf_id')

In [None]:
origin_dest_acc

In [None]:
# supply value is set to 1 for simplicity (capacity of HCF)
# supply = 1
# in the future, we will link supply with ownership and EmOC service level
origin_dest_acc = origin_dest_acc.rename(columns={'Pop_W_y': 'Pop_W_S'})  # Pop_W_S: Population Weight Sum

# Compute the Supply-Demand Ratio (Rj)
origin_dest_acc['supply_demand_ratio'] = 1 / origin_dest_acc.Pop_W_S
origin_dest_acc['supply_demand_ratio'] = origin_dest_acc['supply_demand_ratio'].replace([np.inf, np.nan], 0)

In [None]:
supply_map = {
    'Public Comprehensive EmOC': 1,
    'Private Comprehensive EmOC': 0.7,
    'Public Basic EmOC': 0.5,
    'Private Basic EmOC': 0.35
}

In [None]:
origin_dest_acc['supply'] = origin_dest_acc['Local_Validation'].map(supply_map)
origin_dest_acc['supply_demand_ratio'] = origin_dest_acc['supply'] / origin_dest_acc['Pop_W_S']
origin_dest_acc['supply_demand_ratio'] = origin_dest_acc['supply_demand_ratio'].replace([np.inf, -np.inf, np.nan], 0)

In [None]:
# Calculate Rj * Weight for Each Grid Cell
origin_dest_acc['supply_W'] = origin_dest_acc['supply_demand_ratio'] * origin_dest_acc.Weight

In [None]:
# Compute Accessibility Index (Ai) for Each Grid Cell
origin_dest_acc['Accessibility'] = origin_dest_acc.groupby('grid_id')['supply_W'].transform('sum')
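
The two steps above can be traced by hand on a small table. The sketch below uses toy values (two grid cells, two facilities, made-up weights and supply) and the same column names as the notebook. It uses `transform('sum')` in place of the merge and rename, which is an illustrative shortcut rather than the notebook's code.

```python
import numpy as np
import pandas as pd

# Toy origin-destination table: all values are illustrative
od = pd.DataFrame({
    'grid_id':    [1, 1, 2],
    'hcf_id':     ['A', 'B', 'B'],
    'population': [100.0, 100.0, 50.0],
    'Weight':     [1.0, 0.5, 1.0],
    'supply':     [1.0, 0.7, 0.7],
})

# Step 1: weighted demand per facility and supply-demand ratio R_j
od['Pop_W'] = od['population'] * od['Weight']
od['Pop_W_S'] = od.groupby('hcf_id')['Pop_W'].transform('sum')
od['supply_demand_ratio'] = (od['supply'] / od['Pop_W_S']).replace([np.inf, -np.inf, np.nan], 0)

# Step 2: accessibility A_i = sum over a cell's facilities of R_j * weight
od['supply_W'] = od['supply_demand_ratio'] * od['Weight']
accessibility = od.groupby('grid_id')['supply_W'].sum()
print(accessibility)
```

Here both facilities face a weighted demand of 100, so facility A has a ratio of 0.01 and facility B a ratio of 0.007. Grid cell 1 reaches both facilities and therefore scores higher than grid cell 2, which only reaches B.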

In [None]:
# Normalize
from sklearn.preprocessing import MinMaxScaler

In [None]:
scaler = MinMaxScaler()
origin_dest_acc['Accessibility_standard'] = scaler.fit_transform(origin_dest_acc[['Accessibility']])

In [None]:
origin_dest_acc

In [None]:
max(origin_dest_acc.Accessibility_standard)

In [None]:
gdf = gpd.GeoDataFrame(origin_dest_acc, geometry='geometry', crs="EPSG:4326")
gpkg_path = data_outputs + 'acc_score_3_closet_Emoc_d10_w0.01_supply_weighted.gpkg'
gdf.to_file(gpkg_path, layer="acc_score_3_closet_Emoc_d10_w0.01_supply_weighted", driver="GPKG")

### Distribution Diagram

In [None]:
origin_dest_acc = gpd.read_file(data_outputs + 'acc_score_3_closet_Emoc_d10_w0.01_supply_weighted.gpkg')

In [None]:
# 1. distribution plot of duration
import seaborn as sns
import matplotlib.pyplot as plt

In [None]:
sns.displot(origin_dest_acc['duration_seconds']/60, kde=True)

In [None]:
plt.title('Distribution of Duration')
plt.xlabel('Travel time (minutes)')
plt.ylabel('Frequency')
plt.show()          

In [None]:
# 2. Scatter plot of population against accessibility score
sns.scatterplot(x='Accessibility_standard', y='population', data=origin_dest_acc)

In [None]:
plt.xlabel('Accessibility Score')
plt.ylabel('Population')

In [None]:
plt.show()

In [None]:
# Count of origin-destination rows per facility category
plt.figure(figsize=(10, 6))
sns.histplot(data=origin_dest_acc, x='Local_Validation')
plt.title('Histogram of Local Validation')
plt.xlabel('Local Validation')
plt.ylabel('Count')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

In [None]:
# Count of origin-destination rows per facility
plt.figure(figsize=(20, 7))
sns.histplot(
    data=origin_dest_acc,
    x='facility_name',
    discrete=True,
    color='skyblue',
    edgecolor='black'
)
plt.title('Facility Name Distribution', fontsize=16)
plt.xlabel('Facility Name', fontsize=12)
plt.ylabel('Count', fontsize=12)
plt.xticks(rotation=60, ha='right', fontsize=10)
plt.tight_layout()
plt.show()

In [None]:
import geopandas as gpd
import seaborn as sns
import matplotlib.pyplot as plt
import re
import os
from matplotlib import gridspec

In [None]:
data_folder = 'data_outputs/acc_score_3_closest_Emoc'

In [None]:
file_names = [f for f in os.listdir(data_folder) if f.endswith('.gpkg')] 

In [None]:
y_labels = ['5min', '10min', '15min', '20min']
x_labels = [0.75, 0.5, 0.1, 0.05, 0.01]

In [None]:
# Build the whole figure in one cell so that tight_layout() and savefig() act on this figure
fig = plt.figure(figsize=(20, 15))
gs = gridspec.GridSpec(4, 5, figure=fig)

for file_name in file_names:
    # Parse the d value and the w value from the file name
    d_match = re.search(r'd(\d+)', file_name)
    w_match = re.search(r'w(\d+\.\d+)', file_name)
    if not (d_match and w_match):
        continue  # skip files that do not follow the naming pattern

    d_value = int(d_match.group(1))
    w_value = float(w_match.group(1))
    if d_value not in (5, 10, 15, 20) or w_value not in x_labels:
        continue  # skip parameter values that have no panel in the 4 x 5 grid

    y_position = 3 - (d_value // 5 - 1)  # d = 20 on the top row, d = 5 on the bottom row
    x_position = x_labels.index(w_value)

    gdf = gpd.read_file(os.path.join(data_folder, file_name))
    ax = fig.add_subplot(gs[y_position, x_position])
    sns.scatterplot(x='Accessibility_standard', y='population', data=gdf, ax=ax)
    ax.set_title(f'{y_labels[d_value // 5 - 1]}, w = {w_value}')
    ax.set_xlabel('Accessibility Score')
    ax.set_ylabel('Population')

plt.tight_layout()
output_image_path = 'data_outputs/output_image.png'
plt.savefig(output_image_path, bbox_inches='tight', dpi=300)
plt.show()

## 5. Results: Grouping by grid ID to prepare the final output file
Note: this part of the code still needs to be updated.

In [None]:
# Read the GeoPackage file (if starting from this section)
results_grid = gpd.read_file(data_outputs + 'acc_score_3_closet_Emoc_d10_w0.01_supply_weighted.gpkg')
results_grid = results_grid[['grid_id', 'origin_lon', 'origin_lat', 'origin_lon_min', 'origin_lat_min', 'origin_lon_max', 'origin_lat_max', 'Accessibility_standard', 'geometry']]

# Keep one row per grid cell. Accessibility_standard is identical on every row of a grid cell,
# so dropping duplicates is enough and, unlike groupby(...).count(), it keeps the geometry column intact.
results_grid = results_grid.drop_duplicates(['grid_id', 'origin_lon', 'origin_lat', 'origin_lon_min', 'origin_lat_min', 'origin_lon_max', 'origin_lat_max', 'Accessibility_standard']).reset_index(drop=True)

In [None]:
type(results_grid)

In [None]:
results_grid

### Setting values for Low, Medium and High categories

We started from an equal-value division and then adjusted the thresholds to values that are more legible and easier to interpret. Each model should have its own thresholds for the three categories, based on the distribution of its data.

Note: For Kano, we excluded grid cells with index values of 0.000001 or below, which indicate a very low population and a small number of buildings.

In [None]:
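# Categories are assigned from the lowest threshold up, so later lines overwrite earlier ones
results_grid['result'] = -1  # -1: excluded (index value of 0.000001 or below)
results_grid.loc[results_grid['Accessibility_standard'] > 0.000001, 'result'] = 2  # lowest accessibility scores
results_grid.loc[results_grid['Accessibility_standard'] > 0.005, 'result'] = 1
results_grid.loc[results_grid['Accessibility_standard'] > 0.02, 'result'] = 0  # highest accessibility scores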
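The threshold logic in the cell above can also be read as a single function applied to each score. The sketch below is a plain-Python illustration using the Kano thresholds from that cell. `classify` and its parameter names are hypothetical and not part of the notebook.

```python
def classify(score, exclude_at=0.000001, lower=0.005, upper=0.02):
    """Return the result code for one standardised accessibility score.

    0: score above `upper`, 1: score above `lower`, 2: score above `exclude_at`,
    -1: excluded. Comparisons are strict, as in the notebook cell.
    """
    if score > upper:
        return 0
    if score > lower:
        return 1
    if score > exclude_at:
        return 2
    return -1


# Illustrative scores only
codes = [classify(s) for s in (0.03, 0.01, 0.001, 0.0)]
print(codes)
```

Keeping the thresholds as parameters makes it easier to try city-specific values without editing several lines of `.loc` assignments.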

### Setting values for focus areas

We defined the focus areas as narrow bands of index values around each threshold. We aim for participants to help us confirm the selection of the city-specific thresholds.

In [None]:
results_grid['focused'] = 0
# Focus areas between the High category and the excluded cells due to low population or no buildings
results_grid.loc[(results_grid['Accessibility_standard'] > 0.000001) & (results_grid['Accessibility_standard'] < 0.0000015), 'focused'] = 1
# Focus areas between the Medium and High categories
results_grid.loc[(results_grid['Accessibility_standard'] > 0.003) & (results_grid['Accessibility_standard'] < 0.006), 'focused'] = 1
# Focus areas between the Low and Medium categories
results_grid.loc[(results_grid['Accessibility_standard'] > 0.019) & (results_grid['Accessibility_standard'] < 0.03), 'focused'] = 1
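One way to keep the focus bands in a single place is to store them as a list of (lower, upper) pairs and test each score against all of them. This is only a sketch: `FOCUS_BANDS` and `is_focus` are hypothetical names, and the band limits are copied from the cell above.

```python
# (lower, upper) limits of each focus band, copied from the notebook cell
FOCUS_BANDS = [
    (0.000001, 0.0000015),
    (0.003, 0.006),
    (0.019, 0.03),
]


def is_focus(score, bands=FOCUS_BANDS):
    """Return 1 if the score lies strictly inside any focus band, else 0."""
    return int(any(lo < score < hi for lo, hi in bands))


# Illustrative scores only
flags = [is_focus(s) for s in (0.0000012, 0.005, 0.01, 0.02)]
print(flags)
```

The comparisons are strict on both sides, matching the `>` and `<` used in the notebook, so a score exactly equal to a band limit is not flagged.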

In [None]:
results_grid

In [None]:
results_grid = results_grid.loc[results_grid['result'] != -1]

In [None]:
results_grid = results_grid.rename(columns={
    'origin_lon': 'longitude',
    'origin_lat': 'latitude',
    'origin_lon_min': 'lon_min',
    'origin_lat_min': 'lat_min',
    'origin_lon_max': 'lon_max',
    'origin_lat_max': 'lat_max'
})

In [None]:
results_grid

In [None]:
# Save the results to a new GeoPackage file
output_gpkg_path = data_outputs + 'emergency-maternal-care-deprivation-access.gpkg'
results_grid.to_file(output_gpkg_path, layer='emergency-maternal-care-deprivation-access', driver='GPKG')

In [None]:
# Save the results to a CSV file in the format required by the IDEAMAPS data ecosystem
results_table = results_grid.drop(columns=['Accessibility_standard', 'grid_id', 'geometry'])
results_table.to_csv(model_outputs + 'model-output.csv', index=False)

In [None]:
results_table