# Analysis of Emergency Obstetric Care (EmOC) in Nairobi, Kenya
> Note: This notebook requires the [environment dependencies](requirements.txt) to be installed
> as well as either an [openrouteservice API key](https://openrouteservice.org/dev/#/signup) or a local instance of the ORS server.


## Model Summary:

This notebook provides the means to generate a dataset that is described in the [model documentation](../nairobi/dataset-interpretability.md).

## Workflow Summary:

The notebook gives an overview of the distribution of centres offering EmOC in the city, their classification and how they can be accessed during an emergency. Open source data from OpenStreetMap and tools (such as the openrouteservice) were used to create accessibility measures. Spatial analysis and other data analytics functions produce outputs at the level of 100x100m grid cells, each categorised into one of three levels: low, medium, and high.

* **Preprocessing**: Get data for EmOC facilities.
* **Analysis for Offer**:
    * Filter or classify EmOC facilities based on discussed criteria.
    * Visualise EmOC facilities in their categories.
* **Analysis for Accessibility**:
    * Compute travel times to facilities using openrouteservice API or other routing services.
    * Generate areas for low, medium and high categories based on discussed criteria.
* **Analysis for Demand**:
    * Downscale the population data to the 100x100m grid cells.
    * Derive socio-economic descriptors based on discussed criteria.

* **Result**: Generate results as GIS-compatible files.


### Datasets and Tools:
* [openrouteservice](https://openrouteservice.org/) - generate isochrones on the OpenStreetMap road network


#  Workflow

Make sure you have the required packages installed. You can install them using pip:

```bash
pip install -r requirements.txt
```

This study combines several Python geospatial libraries for spatial data processing, visualization, and isochrone generation. The `os` module interacts with the operating system, managing file paths and reading environment variables such as API keys. The `folium` library, along with its `MarkerCluster` plugin, creates interactive maps for visualizing large-scale geospatial data. `openrouteservice.client` is the interface to the OpenRouteService API and enables the extraction of isochrones. `pandas` provides functions for analyzing, cleaning, exploring, and manipulating data, while `fiona` reads and writes real-world data in multi-layered GIS formats such as shapefiles. The `shapely` package handles the manipulation and analysis of planar geometric objects.

## Setting up the virtual environment

```bash
# Create a new virtual environment
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

## To run your notebook in VS Code

```bash
pip install -U ipykernel
python -m ipykernel install --user --name=.venv
```

In [7]:
import geopandas as gpd
import os
import numpy as np
import pandas as pd

import openrouteservice
from dotenv import load_dotenv

import rasterio
from rasterio.mask import mask

from shapely.geometry import Point

from pathlib import Path
from shapely.geometry import Polygon

import requests
import math
from math import *
from sklearn.preprocessing import MinMaxScaler

## Preprocessing


### Setting up the public API Key from OpenRouteService
To use the public API, obtain an ORS API key from the [OpenRouteService](https://openrouteservice.org/) platform and then interact with the OpenRouteService API by instantiating the OpenRouteService client. See the OpenRouteService [API documentation](https://openrouteservice.org/dev/#/api-docs/introduction) for ORS Core-Version 9.0.0.

To generate an [API key](https://openrouteservice.org/dev/#/home?tab=1) (token), sign up at the OpenRouteService dashboard with your e-mail address or your GitHub account. After logging in, go to the Dashboard by clicking on your profile icon and navigate to the API Keys section. Click "Create API Key" to generate a free key, then choose a service plan (the free plan has a limited number of requests per day). Copy the API key and store it securely.

OpenRouteService primarily uses API keys for authentication. However, if a token is required for certain endpoints, you can send a request with your API key in the Authorization header. This process facilitated various geospatial analysis functions, including isochrone generation.

### Option 1: Using an ORS API Key
Make sure you have a .env file in the root directory with the following content:
```bash
    OPENROUTESERVICE_API_KEY='your_api_key'
```

In [None]:
# Read the api key from the .env file
%load_ext dotenv
%dotenv
api_key = os.getenv('OPENROUTESERVICE_API_KEY')
client = openrouteservice.Client(key=api_key)
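
If the `.env` file is missing or the variable name is misspelled, `os.getenv` returns `None` and the problem only surfaces later as an authentication error. A minimal sketch of an early check follows; `require_env` is a hypothetical helper introduced here for illustration and is not part of the notebook or of the openrouteservice package.

```python
import os

def require_env(name, environ=None):
    """Return the value of an environment variable or raise a clear error.

    `require_env` is a hypothetical helper used for illustration only.
    """
    environ = os.environ if environ is None else environ
    value = environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to the .env file or export it in the shell"
        )
    return value

# Example with an explicit mapping instead of the real environment
demo_env = {"OPENROUTESERVICE_API_KEY": "your_api_key"}
api_key_demo = require_env("OPENROUTESERVICE_API_KEY", demo_env)
```

In the notebook, the call would be `require_env('OPENROUTESERVICE_API_KEY')` after the `.env` file has been loaded.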

### Setting up relevant processing folders

Several data sources are used across the notebook. To handle these datasets, we recommend three directories for input, temporary and output data. The input files include healthcare facilities and population data. The healthcare facilities data is usually the result of gathering global or national datasets and then validating them against the local context.

Despite being official, administrative boundaries may not reflect the actual patterns of human settlement or economic activity. Therefore, the team used the Functional Urban Area (FUA) as a complementary definition of the study areas. The FUA is defined by [the Joint Research Centre of the European Commission](https://commission.europa.eu/about/departments-and-executive-agencies/joint-research-centre_en) as the actual urban sprawl and human activities, encompassing the core city and economically or socially integrated surrounding regions. The FUA was obtained from [the Global Human Settlement Layer (GHSL)](https://human-settlement.emergency.copernicus.eu/) dataset, which provides spatial data for functional urban areas worldwide.

The following datasets are considered as input data for the analysis:

* [Datasets of health facilities](../scripts/Nairobi/data-inputs/helthcare_facilities.geojson) 
* [Population: Women in childbearing age](../scripts/Nairobi/data-inputs/population.geojson) from [WorldPop](https://hub.worldpop.org/geodata/summary?id=18401)
* [Study Area](../../../docs/study-areas/grid-boundary-nairobi.gpkg) defined by the IDEAMAPS team

In [3]:
# Set paths to access data
# Define directories
data_inputs = '../scripts/Nairobi/data-inputs/'
data_temp = '../scripts/Nairobi/data-temp/'
data_outputs = '../nairobi/'
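
Writing to `data_temp` or `data_outputs` fails if those folders do not exist yet. The following sketch, which assumes a hypothetical helper `ensure_dirs`, creates the folders on demand. The demonstration runs inside a temporary directory so that it does not touch the project tree.

```python
from pathlib import Path
import tempfile

def ensure_dirs(*dirs):
    """Create each directory (and its parents) if it does not exist yet."""
    paths = [Path(d) for d in dirs]
    for path in paths:
        path.mkdir(parents=True, exist_ok=True)
    return paths

# Demonstration inside a throwaway folder rather than the project tree
with tempfile.TemporaryDirectory() as tmp:
    created = ensure_dirs(Path(tmp) / "data-inputs", Path(tmp) / "data-temp")
    all_exist = all(p.is_dir() for p in created)
```

In the notebook, the equivalent call would be `ensure_dirs(data_inputs, data_temp, data_outputs)`.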

## 1. Data Collection

### Validated healthcare facilities - (Supply/Offer)
For Nairobi, the classification for validation was determined with the assistance of local experts, based on data obtained from [the Kenya Master Health Facility Registry (KMHFR)](https://kmhfr.health.go.ke/) website.

In [4]:
healthcare_facilities_validated = gpd.read_file(data_inputs + 'helthcare_facilities.geojson')

In [5]:
healthcare_facilities_validated

Unnamed: 0,fid,field_1_1,Admin1,Facility_n,Facility_t,Ownership,Lat,Long,LL_source,Ward,Sub County,Level,Offer EmOC (Y/N),Type of EmOC (basic/comprehensive),Total In-patient beds,Maternity beds,Maternity theatres,Operating time,Comments,geometry
0,1,Kenya,Nairobi,Dandora II Health Centre,Health Centre,Local authority,-1.257610,36.887750,GPS,Dandora Area III,Embakasi North,Level 3,True,BMOC,15,,,24hrs,,POINT (36.88775 -1.25761)
1,2,Kenya,Nairobi,Embakasi Health Centre,Health Centre,Local authority,-1.306200,36.914700,GPS,Embakasi,Embakasi East,Level 3,True,CeMOC & BMOC,,,,24hrs,,POINT (36.9147 -1.3062)
2,3,Kenya,Nairobi,Imara Health Centre,Health Centre,FBO,-1.324300,36.888700,Google Earth,Kware,Embakasi South,Level 3,True,BMOC,6,,,24hrs & weekends,,POINT (36.8887 -1.3243)
3,4,Kenya,Nairobi,Jonalifa Clinic,Clinic,NGO,-1.285900,36.739880,Google Earth,Kawangware,Dagoretti South,Level 2,True,BMOC,8,,,24hrs & weekends,,POINT (36.73988 -1.2859)
4,5,Kenya,Nairobi,Kangemi Health Centre,Health Centre,Local authority,-1.264600,36.749300,GPS,Kitisuru,Westlands,Level 3,True,BMOC,,,,24hrs,,POINT (36.7493 -1.2646)
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
148,150,Kenya,Nairobi,Nairobi Hospital,Comprehensive Teaching & Tertiary Referral Hos...,Private,-1.294743,36.800247,KMHFR,Kilimani,Dagoretti North,Level 5,True,CeMOC&BMOC,350,,,24hrs+weekends,,POINT (36.80025 -1.29474)
149,151,Kenya,Nairobi,Al-Basrah Hospital Limited,Primary care hospitals,Private,-1.255881,36.876857,KMHFR,Kiamaiko,Mathare,Level 4,True,BMOC,,10,1,24hrs+weekends,,POINT (36.87686 -1.25588)
150,152,Kenya,Nairobi,St. Oswald Medical Centre,Medical Center,Private,-1.290367,36.729885,KMHFR,Riruta,Dagoretti South,Level 3,True,BMOC,12,2,,24hrs+weekends,,POINT (36.72989 -1.29037)
151,153,Kenya,Nairobi,Nairobi West Hospital,Primary Care hospitals,Private,-1.306840,36.825760,KMHFR,Nairobi West,Langata,Level 4,True,BMOC,152,,,24hrs+weekends,,POINT (36.82576 -1.30684)


### Population Grid Data (Demand)
This data originally comes as a 1km-resolution grid from [WorldPop](https://hub.worldpop.org/geodata/summary?id=18401). To transform it into a 100x100m grid, we use the procedure explained below.

Note: explain the process to scale down the population data.
Note: explain the rationale for using the female population aged 15-49.

In [None]:
study_area = gpd.read_file(data_inputs + '100mGrid.gpkg')
raster_path = data_inputs + 'ken_f_15_49_2015_1km.tif'

Clipping the population data to our study area

In [None]:
with rasterio.open(raster_path) as dataset:
    geometries = [study_area.geometry.unary_union.__geo_interface__]
    clipped_image, clipped_transform = mask(dataset, geometries, crop=True)
    band1 = clipped_image[0] # Read the first band of the raster

In [None]:
out_meta = dataset.meta.copy()
out_meta.update({
        "height": clipped_image.shape[1],
        "width": clipped_image.shape[2],
        "transform": clipped_transform
    })

In [None]:
with rasterio.open(data_inputs + 'nairobi_nga_f_15_49_2015_1km.tif', "w", **out_meta) as dest:
    dest.write(clipped_image)

Calculating the centroids for grid cells

In [None]:
rows, cols = np.where(band1 > 0)
grid_cells = [clipped_transform * (col + 0.5, row + 0.5) for row, col in zip(rows, cols)]
population_values = band1[rows, cols]
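
The expression `clipped_transform * (col + 0.5, row + 0.5)` converts a pixel index into the coordinates of the pixel centre. For a north-up raster without rotation, this reduces to simple arithmetic, sketched below with a hypothetical origin and pixel size (the values are made up and do not come from the WorldPop raster).

```python
def pixel_centre(x_origin, y_origin, pixel_width, pixel_height, row, col):
    """Centre of pixel (row, col) for a north-up raster without rotation.

    (x_origin, y_origin) is the top-left corner of the raster; y decreases
    as the row index grows.
    """
    x = x_origin + (col + 0.5) * pixel_width
    y = y_origin - (row + 0.5) * pixel_height
    return x, y

# Hypothetical raster: top-left corner at (36.0, -1.0), 0.01 degree pixels
centre = pixel_centre(36.0, -1.0, 0.01, 0.01, row=0, col=0)
```

The `+ 0.5` offset is what moves the point from the top-left corner of the pixel to its centre.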

In [None]:
grid_df = pd.DataFrame(grid_cells, columns=["longitude", "latitude"])
grid_df["population"] = population_values

grid_df["grid_code"] = range(len(grid_df))
population_centroids_gdf = gpd.GeoDataFrame(grid_df, geometry=[Point(xy) for xy in zip(grid_df["longitude"], grid_df["latitude"])])
population_centroids_gdf.set_crs("EPSG:4326", inplace=True)

population_centroids_gdf.to_file(data_temp + "population_centroids.gpkg", driver="GPKG")

In [None]:
population_centroids_gdf

Unnamed: 0,longitude,latitude,population,grid_code,geometry
0,8.472917,12.024583,3054.140869,53937,POINT (8.47292 12.02458)
1,8.481250,12.024583,3737.392578,17840,POINT (8.48125 12.02458)
2,8.489583,12.024583,5439.910156,54264,POINT (8.48958 12.02458)
3,8.497917,12.024583,4532.464355,18673,POINT (8.49792 12.02458)
4,8.506250,12.024583,1577.261353,76531,POINT (8.50625 12.02458)
...,...,...,...,...,...
138,8.539583,11.941250,1152.257324,76416,POINT (8.53958 11.94125)
139,8.547917,11.941250,1221.242432,63699,POINT (8.54792 11.94125)
140,8.556250,11.941250,1514.486938,35666,POINT (8.55625 11.94125)
141,8.564583,11.941250,1321.376831,62382,POINT (8.56458 11.94125)


### Adding population data at 1km grid to 100m grid

In [None]:
# reading in geotiff file as numpy array
def read_tif(file: Path):
    if not file.exists():
        raise FileNotFoundError(f'File {file} not found')

    with rasterio.open(file) as dataset:
        arr = dataset.read()  # (bands X height X width)
        nodata = dataset.nodata
        transform = dataset.transform
        crs = dataset.crs

    # Replace NoData value with NaN
    if nodata is not None:
        arr[arr == nodata] = np.nan

    return arr.transpose((1, 2, 0)), transform, crs

def raster2vector(arr, transform, crs) -> gpd.GeoDataFrame:
    height, width, bands = arr.shape

    # Generate pixel coordinates
    geometries = []
    pixel_values = []

    for row in range(height):
        for col in range(width):
            x_min, y_max = transform * (col, row)  # Top-left corner
            x_max, y_min = transform * (col + 1, row + 1)  # Bottom-right corner

            pixel_value = arr[row, col].tolist()[0]  # Convert numpy array to list
            polygon = Polygon([(x_min, y_max), (x_max, y_max), (x_max, y_min), (x_min, y_min)])

            geometries.append(polygon)
            pixel_values.append(pixel_value)

    # Convert to DataFrame
    gdf = gpd.GeoDataFrame({'pop_grid_pop': pixel_values, 'geometry': geometries}, crs=crs)

    return gdf

epsg = 'EPSG:32737'  # WGS 84 / UTM zone 37S, the projected CRS that covers Nairobi

In [None]:
# Preparing grid
grid_file = data_inputs + '100mGrid.gpkg'
grid = gpd.read_file(grid_file)
grid = grid.to_crs(epsg)
grid['grid_id'] = range(len(grid))
grid = grid[['grid_id', 'geometry','rowid', 'latitude', 'lat_min', 'lat_max', 'longitude', 'lon_min','lon_max']].set_geometry('geometry')
grid

Building footprint data is used to estimate population distribution within each 1km cell. Building centroids are spatially joined to 100m grid cells. The number of buildings per 100m cell (bcount) is calculated.

In [None]:
# Count buildings per grid cell

# Loading Google building footprints
building_file = data_inputs + 'Nairobi_GOBv3.gpkg'
buildings = gpd.read_file(building_file)
buildings = buildings.to_crs(epsg)
buildings['centroid'] = buildings['geometry'].centroid

# Joining buildings to grid
grid_buildings = grid.sjoin(buildings.set_geometry('centroid').drop(columns='geometry'), how='inner', predicate='intersects')
grid_buildings = grid_buildings.groupby('grid_id')

# Counting buildings per grid
building_counts = grid_buildings.size().rename('bcount')

# Adding building count to grid cells
grid = grid.merge(building_counts, on='grid_id', how='left')

# Assign building count 0 to cells with no buildings (NaN)
grid['bcount'] = grid['bcount'].fillna(0)
grid

The population of each 1km grid is distributed to underlying 100m cells proportionally based on building density. Each 100m grid is assigned a weight equal to its share of the total building count within the 1km grid.

In [None]:
# Adding population data at 1km grid to finer grid
data_path = Path(data_inputs)

# Loading coarse pop data
pop_file = data_path / 'nairobi_nga_f_15_49_2015_1km.tif'
pop_raster, transform, crs = read_tif(pop_file)

# Converting the raster grid to vector data
pop_grid = raster2vector(pop_raster, transform, crs)
pop_grid = pop_grid.to_crs(epsg)
pop_grid['pop_grid_id'] = range(len(pop_grid))

# Assign coarse population data to finer grid based on the centroid locations of the finer grid cells
grid['centroid'] = grid['geometry'].centroid
grid = gpd.sjoin(grid.set_geometry('centroid'), pop_grid, how='left', predicate='within')
print(grid.columns)
grid = grid[['grid_id', 'bcount', 'pop_grid_id', 'geometry','rowid', 'latitude', 'lat_min', 'lat_max',
       'longitude', 'lon_min', 'lon_max']]
grid.head()

In [None]:
# Calculate population weight (fraction of total population count that should be assigned to cell based on its building count)
grid_grouped_pop = grid.groupby('pop_grid_id')
building_count_pop = grid_grouped_pop['bcount'].sum().rename('pop_grid_bcount')
grid = grid.merge(building_count_pop, on='pop_grid_id', how='left')
grid['pop_weight'] = grid['bcount'] / grid['pop_grid_bcount']

# Compute disaggregated population count based on weight and building count at coarser cell level
grid = grid.merge(pop_grid, on='pop_grid_id', how='left')
grid['pop'] = grid['pop_grid_pop'] * grid['pop_weight']
grid.head()
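
The weighting rule can be checked on a small example: each 100m cell receives the population of its 1km parent cell multiplied by its share of the parent's buildings. The sketch below uses plain Python and made-up numbers. Returning 0 for a parent cell without any buildings is an assumption of this sketch; in the pandas version above, that case yields `NaN` because of the division by zero.

```python
def disaggregate(parent_population, building_counts):
    """Split a coarse-cell population across fine cells by building share.

    Returns one value per fine cell. When the coarse cell has no buildings
    at all, every fine cell gets 0.0 (an assumption of this sketch; the
    coarse population is then not redistributed).
    """
    total = sum(building_counts)
    if total == 0:
        return [0.0 for _ in building_counts]
    return [parent_population * count / total for count in building_counts]

# Made-up example: 120 people in a 1km cell, four 100m cells with buildings
shares = disaggregate(120.0, [0, 3, 9, 12])
```

Because the weights of the cells within one parent sum to 1, the disaggregated values add up to the parent population whenever the parent contains at least one building.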

In [None]:
# Saving to file
grid = grid.drop(columns=["geometry_y"])
grid.head()

In [None]:
grid = grid.set_geometry("geometry_x")
grid = grid.to_crs(4326)
grid.to_file(data_temp + 'pop-grid-nairobi-centroids.gpkg', driver='GPKG')

## 2. Spatial Analysis Pipeline 



### Travel time and distance calculation using OpenRouteService (ORS)
We use the OpenRouteService (ORS) Matrix API to calculate the travel time and distance from each population grid centroid to each healthcare facility. There are two options for these calculations: the public ORS API or a local instance of the ORS server.

note: this will generate a file 'OD_matrix_healthcare_pop_grid'

In [None]:
origin_gdf = population_centroids_gdf
origin_name_column = 'grid_code'
destination_gdf = healthcare_facilities_validated.dropna(subset=['geometry'])
destination_name_column = 'Facility_n'  # name column as it appears in the facilities dataset

In [None]:
origins = list(zip(origin_gdf.geometry.x, origin_gdf.geometry.y))

In [None]:
destinations = list(zip(destination_gdf.geometry.x, destination_gdf.geometry.y))

In [None]:
locations = origins + destinations

In [None]:
origins_index = list(range(0, len(origins)))
destinations_index = list(range(len(origins), len(locations)))

In [None]:
body = {'locations': locations,
       'destinations': destinations_index,
       'sources': origins_index,
       'metrics': ['distance', 'duration']}

headers = {
    'Accept': 'application/json, application/geo+json, application/gpx+xml, img/png; charset=utf-8',
    'Authorization': api_key,
    'Content-Type': 'application/json; charset=utf-8'
}

response = requests.post('https://api.openrouteservice.org/v2/matrix/driving-car', json=body, headers=headers)
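
When the number of origins grows (for example, if the 100m grid centroids were used as origins), a single request may exceed what the public API accepts. The exact limits depend on the plan and should be checked in the ORS documentation. The following sketch splits the origins into batches and builds one request body per batch in the same layout as above. `build_matrix_bodies` and `batch_size` are hypothetical names, and the coordinates are made up.

```python
def build_matrix_bodies(origins, destinations, batch_size):
    """Yield one request body per batch of origins.

    Each body lists the batch origins first and all destinations after them,
    mirroring the `locations`/`sources`/`destinations` layout used above.
    """
    for start in range(0, len(origins), batch_size):
        batch = origins[start:start + batch_size]
        locations = list(batch) + list(destinations)
        yield {
            "locations": locations,
            "sources": list(range(len(batch))),
            "destinations": list(range(len(batch), len(locations))),
            "metrics": ["distance", "duration"],
        }

# Hypothetical coordinates as (lon, lat) pairs
demo_origins = [(36.80, -1.28), (36.81, -1.28), (36.82, -1.28)]
demo_destinations = [(36.88775, -1.25761), (36.9147, -1.3062)]
bodies = list(build_matrix_bodies(demo_origins, demo_destinations, batch_size=2))
```

Each body can then be posted separately, and the resulting rows of `durations` and `distances` concatenated in batch order.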

In [None]:
distances = response.json().get('distances', [])
durations = response.json().get('durations', [])

In [None]:
distances_duration_matrix = []

# Iterate over each origin (grid)
for origin_index, origin in origin_gdf.iterrows():
    origin_name = origin[origin_name_column]
    origin_x = origin.geometry.x
    origin_y = origin.geometry.y
    origin_distances = distances[origin_index]
    origin_durations = durations[origin_index]

    # Find the minimum duration and the position of the closest facility
    min_duration = min(origin_durations)
    min_index = origin_durations.index(min_duration)
    destination_index = destinations_index[min_index]
    dest_x, dest_y = locations[destination_index]
    # Look the facility up in destination_gdf, the same frame the mask is built from
    filtered = destination_gdf[(destination_gdf.geometry.x == dest_x) & (destination_gdf.geometry.y == dest_y)]
    destination_row = filtered.iloc[0]
    dest_name = destination_row[destination_name_column]

    # Append the closest facility and its travel duration for this origin
    distances_duration_matrix.append([
        origin_name, origin_y, origin_x,
        dest_name, dest_y, dest_x,
        min_duration
    ])
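
If the routing service cannot connect a pair of locations, the matrix may contain missing entries (JSON `null`, parsed as `None`), and `min()` on such a row raises a `TypeError`. A minimal sketch of a selection that skips missing entries follows; `nearest_destination` is a hypothetical helper and the durations are made up.

```python
def nearest_destination(durations_row):
    """Return (index, duration) of the smallest non-missing duration.

    Returns (None, None) when every entry is missing.
    """
    candidates = [
        (duration, index)
        for index, duration in enumerate(durations_row)
        if duration is not None
    ]
    if not candidates:
        return None, None
    duration, index = min(candidates)
    return index, duration

# Made-up durations in seconds; None marks a pair without a route
nearest = nearest_destination([686.79, None, 395.79, 411.43])
```

The returned index refers to the position within the row, so it can be used with `destinations_index` in the same way as `min_index` in the loop above.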

In [None]:
# Convert the results into a DataFrame
matrix_df = pd.DataFrame(distances_duration_matrix, columns=[
    'grid_code','origin_lat', 'origin_lon',
    'destination_name', 'dest_lat', 'dest_lon','min_duration'
])

In [None]:
# Save to CSV
merged_df = pd.merge(matrix_df, grid_df[['grid_code', 'population']], on='grid_code', how='left')
merged_df.to_csv(data_temp + 'distance_duration_matrix_temp.csv', index=False)

In [None]:
merged_df

Unnamed: 0,grid_code,origin_lat,origin_lon,destination_name,dest_lat,dest_lon,min_duration,population
0,53937,12.024583,8.472917,Sabo Bakin Zuwo General Hospital,12.00065,8.50923,686.79,3054.140869
1,17840,12.024583,8.481250,Sabo Bakin Zuwo General Hospital,12.00065,8.50923,722.17,3737.392578
2,54264,12.024583,8.489583,Sabo Bakin Zuwo General Hospital,12.00065,8.50923,569.70,5439.910156
3,18673,12.024583,8.497917,Sabo Bakin Zuwo General Hospital,12.00065,8.50923,395.79,4532.464355
4,76531,12.024583,8.506250,Sabo Bakin Zuwo General Hospital,12.00065,8.50923,411.43,1577.261353
...,...,...,...,...,...,...,...,...
138,76416,11.941250,8.539583,Maxcare Clinic,11.96792,8.54314,543.74,1152.257324
139,63699,11.941250,8.547917,Maxcare Clinic,11.96792,8.54314,419.18,1221.242432
140,35666,11.941250,8.556250,Maxcare Clinic,11.96792,8.54314,295.33,1514.486938
141,62382,11.941250,8.564583,Maxcare Clinic,11.96792,8.54314,265.40,1321.376831


In [None]:
geometry = [Point(xy) for xy in zip(merged_df['dest_lon'], merged_df['dest_lat'])]
gdf = gpd.GeoDataFrame(merged_df, geometry=geometry, crs="EPSG:4326")

gpkg_path = data_temp + 'distance_duration_matrix_temp.gpkg'
gdf.to_file(gpkg_path, layer="duration_matrix", driver="GPKG")

### Option 2: Using a local ORS service
Make sure you have set a local service that runs the OSM-based ORS API. 
```r
# Insert R code from the local ORS service
```


### Procedure for Computing the OD Matrix Using a Local Docker Environment

This section outlines the steps required to compute the Origin-Destination (OD) matrix using a local Docker environment. 

1. **Set Up Docker Environment**:

2. **Prepare Input Data**:

3. **Run the OD Matrix Computation Script**:

4. **Monitor the Process**:

5. **Retrieve and Validate Output**:


## Processing OD Matrix

Population data is the result of combining 1km grid data with 100m grid data. See [Section 2]() for more details.

In [17]:
# If not loaded yet, read from the temporary folder
centroids_df = gpd.read_file(data_temp +'pop-grid-nairobi-centroids.gpkg')
centroids_df

Unnamed: 0,grid_id,bcount,pop_grid_id,pop_grid_bcount,pop_weight,pop_grid_pop,pop,geometry
0,0.0,0.0,756225.0,72.0,0.000000,13.660351,0.000000,"POLYGON ((36.93906 -1.11694, 36.93905 -1.11613..."
1,1.0,0.0,756225.0,72.0,0.000000,13.660351,0.000000,"POLYGON ((36.94005 -1.11694, 36.94005 -1.11613..."
2,2.0,0.0,756225.0,72.0,0.000000,13.660351,0.000000,"POLYGON ((36.94105 -1.11694, 36.94105 -1.11613..."
3,3.0,0.0,756225.0,72.0,0.000000,13.660351,0.000000,"POLYGON ((36.94205 -1.11694, 36.94204 -1.11613..."
4,4.0,3.0,756225.0,72.0,0.041667,13.660351,0.569181,"POLYGON ((36.94305 -1.11694, 36.94304 -1.11613..."
...,...,...,...,...,...,...,...,...
110458,110458.0,0.0,790694.0,56.0,0.000000,10.455681,0.000000,"POLYGON ((36.784 -1.4154, 36.78399 -1.41459, 3..."
110459,110459.0,2.0,790694.0,56.0,0.035714,10.455681,0.373417,"POLYGON ((36.785 -1.4154, 36.78499 -1.41459, 3..."
110460,110460.0,10.0,790695.0,116.0,0.086207,9.803167,0.845101,"POLYGON ((36.786 -1.4154, 36.78599 -1.41459, 3..."
110461,110461.0,0.0,790695.0,116.0,0.000000,9.803167,0.000000,"POLYGON ((36.787 -1.4154, 36.78699 -1.41459, 3..."


In [19]:
# If not loaded yet, read from the temporary folder
matrix_df = pd.read_csv(data_temp + 'OD-matrix-nairobi-access-emoc.csv')
matrix_df

Unnamed: 0,origin_id,destination_id,duration_seconds,distance_km
0,0.0,1,1879.01,23.50
1,0.0,2,2324.84,30.55
2,0.0,3,2550.01,39.14
3,0.0,4,2545.13,39.87
4,0.0,5,2140.52,37.05
...,...,...,...,...
16900834,110462.0,150,2372.74,23.04
16900835,110462.0,151,3031.99,36.25
16900836,110462.0,152,2740.29,28.02
16900837,110462.0,153,2337.02,22.80


**GRID CELLS WITHOUT TRAVEL TIME ESTIMATE**

If a grid cell has a NULL value in the travel time estimate, we remove it from the analysis, because the two-step floating catchment area (2SFCA) measure cannot be calculated without a travel time estimate.

In [None]:
# Removing rows with NaN values in the 'duration_seconds' column
matrix_df = matrix_df.dropna(subset=['duration_seconds'])
matrix_df

To process the OD matrix, we merge it with the healthcare facilities and the population grid to create one integrated dataset. To do so, we use the pandas library and its join functions, based on the id columns of the datasets.

In [None]:
pop_centroids_hcf = pd.merge(matrix_df, centroids_df[['grid_id', 'bcount', 'pop_grid_bcount', 'pop_grid_pop', 'pop', 'geometry']], 
                     left_on='origin_id', right_on='grid_id', how='left')
pop_centroids_hcf

Unnamed: 0,origin_id,destination_id,duration_seconds,distance_km,grid_id,bcount,pop_grid_bcount,pop_grid_pop,pop,geometry
0,0.0,1,1879.01,23.50,0.0,0.0,72.0,13.660351,0.000000,"POLYGON ((36.93906 -1.11694, 36.93905 -1.11613..."
1,0.0,2,2324.84,30.55,0.0,0.0,72.0,13.660351,0.000000,"POLYGON ((36.93906 -1.11694, 36.93905 -1.11613..."
2,0.0,3,2550.01,39.14,0.0,0.0,72.0,13.660351,0.000000,"POLYGON ((36.93906 -1.11694, 36.93905 -1.11613..."
3,0.0,4,2545.13,39.87,0.0,0.0,72.0,13.660351,0.000000,"POLYGON ((36.93906 -1.11694, 36.93905 -1.11613..."
4,0.0,5,2140.52,37.05,0.0,0.0,72.0,13.660351,0.000000,"POLYGON ((36.93906 -1.11694, 36.93905 -1.11613..."
...,...,...,...,...,...,...,...,...,...,...
16900834,110462.0,150,2372.74,23.04,110462.0,15.0,116.0,9.803167,1.267651,"POLYGON ((36.78799 -1.4154, 36.78799 -1.41459,..."
16900835,110462.0,151,3031.99,36.25,110462.0,15.0,116.0,9.803167,1.267651,"POLYGON ((36.78799 -1.4154, 36.78799 -1.41459,..."
16900836,110462.0,152,2740.29,28.02,110462.0,15.0,116.0,9.803167,1.267651,"POLYGON ((36.78799 -1.4154, 36.78799 -1.41459,..."
16900837,110462.0,153,2337.02,22.80,110462.0,15.0,116.0,9.803167,1.267651,"POLYGON ((36.78799 -1.4154, 36.78799 -1.41459,..."


Next, we merge the dataframe that contains the OD matrix and the population data with the full information about the healthcare facilities.

In [55]:
distances_duration_matrix = pd.merge(pop_centroids_hcf, 
                                     healthcare_facilities_validated[['fid', 'Facility_n', 'Facility_t', 'Ownership', 
                                                                      'Lat', 'Long', 'Level', 'Offer EmOC (Y/N)',
                                                                      'Total In-patient beds', 'Maternity beds', 
                                                                      'Maternity theatres', 'Maternity beds', 
                                                                      'Maternity theatres', 'Operating time', 
                                                                      'Type of EmOC (basic/comprehensive)', 'geometry']], 
                     left_on='destination_id', right_on='fid', how='left')
distances_duration_matrix

Unnamed: 0,origin_id,destination_id,duration_seconds,distance_km,grid_id,bcount,pop_grid_bcount,pop_grid_pop,pop,geometry_x,...,Level,Offer EmOC (Y/N),Total In-patient beds,Maternity beds,Maternity theatres,Maternity beds.1,Maternity theatres.1,Operating time,Type of EmOC (basic/comprehensive),geometry_y
0,0.0,1,1879.01,23.50,0.0,0.0,72.0,13.660351,0.000000,"POLYGON ((36.93906 -1.11694, 36.93905 -1.11613...",...,Level 3,True,15,,,,,24hrs,BMOC,POINT (36.88775 -1.25761)
1,0.0,2,2324.84,30.55,0.0,0.0,72.0,13.660351,0.000000,"POLYGON ((36.93906 -1.11694, 36.93905 -1.11613...",...,Level 3,True,,,,,,24hrs,CeMOC & BMOC,POINT (36.9147 -1.3062)
2,0.0,3,2550.01,39.14,0.0,0.0,72.0,13.660351,0.000000,"POLYGON ((36.93906 -1.11694, 36.93905 -1.11613...",...,Level 3,True,6,,,,,24hrs & weekends,BMOC,POINT (36.8887 -1.3243)
3,0.0,4,2545.13,39.87,0.0,0.0,72.0,13.660351,0.000000,"POLYGON ((36.93906 -1.11694, 36.93905 -1.11613...",...,Level 2,True,8,,,,,24hrs & weekends,BMOC,POINT (36.73988 -1.2859)
4,0.0,5,2140.52,37.05,0.0,0.0,72.0,13.660351,0.000000,"POLYGON ((36.93906 -1.11694, 36.93905 -1.11613...",...,Level 3,True,,,,,,24hrs,BMOC,POINT (36.7493 -1.2646)
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
16900834,110462.0,150,2372.74,23.04,110462.0,15.0,116.0,9.803167,1.267651,"POLYGON ((36.78799 -1.4154, 36.78799 -1.41459,...",...,Level 5,True,350,,,,,24hrs+weekends,CeMOC&BMOC,POINT (36.80025 -1.29474)
16900835,110462.0,151,3031.99,36.25,110462.0,15.0,116.0,9.803167,1.267651,"POLYGON ((36.78799 -1.4154, 36.78799 -1.41459,...",...,Level 4,True,,10,1,10,1,24hrs+weekends,BMOC,POINT (36.87686 -1.25588)
16900836,110462.0,152,2740.29,28.02,110462.0,15.0,116.0,9.803167,1.267651,"POLYGON ((36.78799 -1.4154, 36.78799 -1.41459,...",...,Level 3,True,12,2,,2,,24hrs+weekends,BMOC,POINT (36.72989 -1.29037)
16900837,110462.0,153,2337.02,22.80,110462.0,15.0,116.0,9.803167,1.267651,"POLYGON ((36.78799 -1.4154, 36.78799 -1.41459,...",...,Level 4,True,152,,,,,24hrs+weekends,BMOC,POINT (36.82576 -1.30684)


In [57]:
def local_validation(row):
    if row["Ownership"] in ["Local authority", "MoH"] and row["Type of EmOC (basic/comprehensive)"] == "BMOC":
        return "Public Basic EmOC"
    elif row["Ownership"] in ["Local authority", "MoH"] and row["Type of EmOC (basic/comprehensive)"] != "BMOC":
        return "Public Comprehensive EmOC"
    elif row["Ownership"] not in ["Local authority", "MoH"] and row["Type of EmOC (basic/comprehensive)"] == "BMOC":
        return "Private Basic EmOC"
    elif row["Ownership"] not in ["Local authority", "MoH"] and row["Type of EmOC (basic/comprehensive)"] != "BMOC":
        return "Private Comprehensive EmOC"
    else:
        return "Other"

distances_duration_matrix["Local_Validation"] = distances_duration_matrix.apply(local_validation, axis=1)
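
The EmOC type column contains several spellings of the same label (for example `CeMOC & BMOC` and `CeMOC&BMOC`), and the function above treats every value other than `BMOC`, including missing ones, as comprehensive. The sketch below shows one possible stricter variant. `classify_facility` is a hypothetical helper, and sending missing or unrecognised types to "Other" is an assumption of this sketch, not the notebook's rule.

```python
def classify_facility(ownership, emoc_type, public_owners=("Local authority", "MoH")):
    """Combine ownership and EmOC type into one label.

    Missing or unrecognised EmOC types map to "Other" instead of being
    counted as comprehensive (an assumption of this sketch).
    """
    sector = "Public" if ownership in public_owners else "Private"
    if not isinstance(emoc_type, str):
        return "Other"
    label = emoc_type.replace(" ", "").upper()  # "CeMOC & BMOC" -> "CEMOC&BMOC"
    if label == "BMOC":
        return f"{sector} Basic EmOC"
    if "CEMOC" in label:
        return f"{sector} Comprehensive EmOC"
    return "Other"

examples = [
    classify_facility("Local authority", "BMOC"),
    classify_facility("Private", "CeMOC&BMOC"),
    classify_facility("FBO", float("nan")),
]
```

As in the notebook, any ownership other than "Local authority" or "MoH" is grouped under "Private".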

In [58]:
distances_duration_matrix

Unnamed: 0,origin_id,destination_id,duration_seconds,distance_km,grid_id,bcount,pop_grid_bcount,pop_grid_pop,pop,geometry_x,...,Offer EmOC (Y/N),Total In-patient beds,Maternity beds,Maternity theatres,Maternity beds.1,Maternity theatres.1,Operating time,Type of EmOC (basic/comprehensive),geometry_y,Local_Validation
0,0.0,1,1879.01,23.50,0.0,0.0,72.0,13.660351,0.000000,"POLYGON ((36.93906 -1.11694, 36.93905 -1.11613...",...,True,15,,,,,24hrs,BMOC,POINT (36.88775 -1.25761),Public Basic EmOC
1,0.0,2,2324.84,30.55,0.0,0.0,72.0,13.660351,0.000000,"POLYGON ((36.93906 -1.11694, 36.93905 -1.11613...",...,True,,,,,,24hrs,CeMOC & BMOC,POINT (36.9147 -1.3062),Public Comprehensive EmOC
2,0.0,3,2550.01,39.14,0.0,0.0,72.0,13.660351,0.000000,"POLYGON ((36.93906 -1.11694, 36.93905 -1.11613...",...,True,6,,,,,24hrs & weekends,BMOC,POINT (36.8887 -1.3243),Private Basic EmOC
3,0.0,4,2545.13,39.87,0.0,0.0,72.0,13.660351,0.000000,"POLYGON ((36.93906 -1.11694, 36.93905 -1.11613...",...,True,8,,,,,24hrs & weekends,BMOC,POINT (36.73988 -1.2859),Private Basic EmOC
4,0.0,5,2140.52,37.05,0.0,0.0,72.0,13.660351,0.000000,"POLYGON ((36.93906 -1.11694, 36.93905 -1.11613...",...,True,,,,,,24hrs,BMOC,POINT (36.7493 -1.2646),Public Basic EmOC
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
16900834,110462.0,150,2372.74,23.04,110462.0,15.0,116.0,9.803167,1.267651,"POLYGON ((36.78799 -1.4154, 36.78799 -1.41459,...",...,True,350,,,,,24hrs+weekends,CeMOC&BMOC,POINT (36.80025 -1.29474),Private Comprehensive EmOC
16900835,110462.0,151,3031.99,36.25,110462.0,15.0,116.0,9.803167,1.267651,"POLYGON ((36.78799 -1.4154, 36.78799 -1.41459,...",...,True,,10,1,10,1,24hrs+weekends,BMOC,POINT (36.87686 -1.25588),Private Basic EmOC
16900836,110462.0,152,2740.29,28.02,110462.0,15.0,116.0,9.803167,1.267651,"POLYGON ((36.78799 -1.4154, 36.78799 -1.41459,...",...,True,12,2,,2,,24hrs+weekends,BMOC,POINT (36.72989 -1.29037),Private Basic EmOC
16900837,110462.0,153,2337.02,22.80,110462.0,15.0,116.0,9.803167,1.267651,"POLYGON ((36.78799 -1.4154, 36.78799 -1.41459,...",...,True,152,,,,,24hrs+weekends,BMOC,POINT (36.82576 -1.30684),Private Basic EmOC


In [59]:
# Step 1: Create subsets based on the categories in 'Local_Validation'
categories = {
    "public_comprehensive_EmOC": ["Public Comprehensive EmOC"],
    "private_comprehensive_EmOC": ["Private Comprehensive EmOC"],
    "private_basic_EmOC": ["Private Basic EmOC"],
    "public_basic_EmOC": ["Public Basic EmOC"]
}

subsets = {
    key: distances_duration_matrix[
        distances_duration_matrix['Local_Validation'].str.contains('|'.join(values), na=False)
    ]
    for key, values in categories.items()
}

public_CEmOC = subsets["public_comprehensive_EmOC"]
private_CEmOC = subsets["private_comprehensive_EmOC"]
public_BEmOC = subsets["public_basic_EmOC"]
private_BEmOC = subsets["private_basic_EmOC"]

We select the 3 closest facilities in each category for each grid cell.

In [None]:
# Step 2: Define a function to get 3 smallest duration_seconds per grid_id for each category
def get_closest_3(df, n=3):
    return df.groupby('grid_id').apply(lambda x: x.nsmallest(n, 'duration_seconds')).reset_index(drop=True)
                      
# If the subsets are already created for each category, we apply the function to each subset:
public_CEmOC_closest_3 = get_closest_3(public_CEmOC)
private_CEmOC_closest_3 = get_closest_3(private_CEmOC)
public_BEmOC_closest_3 = get_closest_3(public_BEmOC)
private_BEmOC_closest_3 = get_closest_3(private_BEmOC)

# Step 3: Concatenate the filtered results into a single DataFrame
distances_duration_matrix = pd.concat([
    public_CEmOC_closest_3, private_CEmOC_closest_3,
    public_BEmOC_closest_3, private_BEmOC_closest_3
])
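
The per-cell selection can be illustrated on a toy table. This is a minimal sketch rather than the notebook's pipeline: the values are made up, and `closest_n` is a hypothetical helper that uses `sort_values` followed by `groupby().head(n)`, which should give the same rows as `groupby().apply(nsmallest)` up to tie-breaking.

```python
import pandas as pd

# Toy origin-destination table (made-up values, for illustration only)
toy = pd.DataFrame({
    "grid_id": [0, 0, 0, 0, 1, 1],
    "hcf_id": [10, 11, 12, 13, 10, 11],
    "duration_seconds": [900.0, 300.0, 1200.0, 600.0, 450.0, 150.0],
})

def closest_n(df, n=3):
    """Return the n rows with the smallest duration_seconds per grid_id."""
    return (
        df.sort_values(["grid_id", "duration_seconds"])
          .groupby("grid_id")
          .head(n)
          .reset_index(drop=True)
    )

closest = closest_n(toy, n=3)
print(closest)
```

Grid cells with fewer than `n` reachable facilities simply keep all of their rows, as grid cell 1 does here.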

In [65]:
distances_duration_matrix

Unnamed: 0,origin_id,destination_id,duration_seconds,distance_km,grid_id,bcount,pop_grid_bcount,pop_grid_pop,pop,geometry_x,...,Offer EmOC (Y/N),Total In-patient beds,Maternity beds,Maternity theatres,Maternity beds.1,Maternity theatres.1,Operating time,Type of EmOC (basic/comprehensive),geometry_y,Local_Validation
0,0.0,28,1756.83,28.36,0.0,0.0,72.0,13.660351,0.000000,"POLYGON ((36.93906 -1.11694, 36.93905 -1.11613...",...,True,268,,2,,2,24hrs+weekends,CeMOC & BMOC,POINT (36.84552 -1.28028),Public Comprehensive EmOC
1,0.0,11,1925.29,27.34,0.0,0.0,72.0,13.660351,0.000000,"POLYGON ((36.93906 -1.11694, 36.93905 -1.11613...",...,True,188,17,3,17,3,24hrs+weekends,CeMOC & BMOC,POINT (36.89899 -1.27385),Public Comprehensive EmOC
2,0.0,14,2151.95,33.91,0.0,0.0,72.0,13.660351,0.000000,"POLYGON ((36.93906 -1.11694, 36.93905 -1.11613...",...,True,200,20,1,20,1,24hrs+weekends,CeMOC & BMOC,POINT (36.8022 -1.3079),Public Comprehensive EmOC
3,1.0,28,1757.05,28.36,1.0,0.0,72.0,13.660351,0.000000,"POLYGON ((36.94005 -1.11694, 36.94005 -1.11613...",...,True,268,,2,,2,24hrs+weekends,CeMOC & BMOC,POINT (36.84552 -1.28028),Public Comprehensive EmOC
4,1.0,11,1925.52,27.34,1.0,0.0,72.0,13.660351,0.000000,"POLYGON ((36.94005 -1.11694, 36.94005 -1.11613...",...,True,188,17,3,17,3,24hrs+weekends,CeMOC & BMOC,POINT (36.89899 -1.27385),Public Comprehensive EmOC
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
331384,110461.0,47,2189.94,21.97,110461.0,0.0,116.0,9.803167,0.000000,"POLYGON ((36.787 -1.4154, 36.78699 -1.41459, 3...",...,True,,,,,,24hrs & weekends,BMOC,POINT (36.8135 -1.31681),Private Basic EmOC
331385,110461.0,103,2254.74,21.56,110461.0,0.0,116.0,9.803167,0.000000,"POLYGON ((36.787 -1.4154, 36.78699 -1.41459, 3...",...,True,4,4,0,4,0,24hrs & weekends,BMOC,POINT (36.786 -1.29655),Private Basic EmOC
331386,110462.0,38,2149.30,19.90,110462.0,15.0,116.0,9.803167,1.267651,"POLYGON ((36.78799 -1.4154, 36.78799 -1.41459,...",...,True,15,,,,,24hrs+weekends,BMOC,POINT (36.77929 -1.31234),Private Basic EmOC
331387,110462.0,47,2167.98,21.88,110462.0,15.0,116.0,9.803167,1.267651,"POLYGON ((36.78799 -1.4154, 36.78799 -1.41459,...",...,True,,,,,,24hrs & weekends,BMOC,POINT (36.8135 -1.31681),Private Basic EmOC


In [91]:
distances_duration_matrix = distances_duration_matrix.rename(columns={
    "geometry_x": "geometry",
    "Long": "longitude_hcf",
    "Lat": "latitude_hcf",
    "Facility_n": "facility_name",
    "pop": "population",
    "Local_Validation": "local_validation",
    "geometry_y": "geometry_hcf",
    "fid": "hcf_id"
})
columns_to_keep = ["grid_id", "hcf_id",  "longitude_hcf", "latitude_hcf", "facility_name", "local_validation", 
                   "population", "duration_seconds", "distance_km", "geometry"]
pop_centroids_closest_hcf = distances_duration_matrix[columns_to_keep]

In [87]:
pop_centroids_closest_hcf

Unnamed: 0,grid_id,hcf_id,longitude_hcf,latitude_hcf,facility_name,local_validation,population,duration_seconds,distance_km,geometry
0,0.0,28,36.845520,-1.280280,Pumwani Maternity Hospital,Public Comprehensive EmOC,0.000000,1756.83,28.36,"POLYGON ((36.93906 -1.11694, 36.93905 -1.11613..."
1,0.0,11,36.898990,-1.273848,Mama Lucy Kibaki Hospital,Public Comprehensive EmOC,0.000000,1925.29,27.34,"POLYGON ((36.93906 -1.11694, 36.93905 -1.11613..."
2,0.0,14,36.802200,-1.307900,Mbagathi County Referral Hospital,Public Comprehensive EmOC,0.000000,2151.95,33.91,"POLYGON ((36.93906 -1.11694, 36.93905 -1.11613..."
3,1.0,28,36.845520,-1.280280,Pumwani Maternity Hospital,Public Comprehensive EmOC,0.000000,1757.05,28.36,"POLYGON ((36.94005 -1.11694, 36.94005 -1.11613..."
4,1.0,11,36.898990,-1.273848,Mama Lucy Kibaki Hospital,Public Comprehensive EmOC,0.000000,1925.52,27.34,"POLYGON ((36.94005 -1.11694, 36.94005 -1.11613..."
...,...,...,...,...,...,...,...,...,...,...
331384,110461.0,47,36.813500,-1.316810,Amref Medical Centre (Wilson Airport),Private Basic EmOC,0.000000,2189.94,21.97,"POLYGON ((36.787 -1.4154, 36.78699 -1.41459, 3..."
331385,110461.0,103,36.785995,-1.296547,Marie Stopes Kenya-Kilimani Premier Clinic,Private Basic EmOC,0.000000,2254.74,21.56,"POLYGON ((36.787 -1.4154, 36.78699 -1.41459, 3..."
331386,110462.0,38,36.779293,-1.312338,CFK Tabitha Maternity Home,Private Basic EmOC,1.267651,2149.30,19.90,"POLYGON ((36.78799 -1.4154, 36.78799 -1.41459,..."
331387,110462.0,47,36.813500,-1.316810,Amref Medical Centre (Wilson Airport),Private Basic EmOC,1.267651,2167.98,21.88,"POLYGON ((36.78799 -1.4154, 36.78799 -1.41459,..."


In [92]:
# Review and remove
origin_dest = pop_centroids_closest_hcf

## Enhanced Two-Step Floating Catchment Area (E2SFCA) method

In [93]:
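# Gaussian decay parameters
# d: threshold travel time in seconds. We try a maximum duration of 10 minutes by car (30 minutes is an alternative).
# Travel times are likely underestimated because traffic conditions are not fully captured by the selected data source.
d = 10 * 60
W = 0.5  # weight assigned to a facility exactly d seconds away
beta = - d ** 2 / log(W)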
print(beta)

519370.2147200268
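
The choice of `beta` above means that the Gaussian weight `exp(-t**2 / beta)` equals `W` exactly at the threshold travel time `d`. The sketch below checks this with the standard library only; `gaussian_weight` is a hypothetical helper name and the sample travel times are arbitrary.

```python
import math

d = 10 * 60          # threshold travel time in seconds (10 minutes, as above)
W = 0.5              # weight assigned at the threshold
beta = -d ** 2 / math.log(W)

def gaussian_weight(t, beta):
    """Gaussian distance-decay weight for a travel time t in seconds."""
    return math.exp(-t ** 2 / beta)

# Arbitrary sample travel times: 0, 5, 10 and 20 minutes
weights = {t: gaussian_weight(t, beta) for t in (0, 300, 600, 1200)}
for t, w in weights.items():
    print(f"{t:>5} s -> weight {w:.4f}")
```

The weight halves at 10 minutes and drops to 0.0625 at 20 minutes, so facilities well beyond the threshold contribute very little.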


In [94]:
print(origin_dest.head())

   grid_id  hcf_id  longitude_hcf  latitude_hcf  \
0      0.0      28       36.84552     -1.280280   
1      0.0      11       36.89899     -1.273848   
2      0.0      14       36.80220     -1.307900   
3      1.0      28       36.84552     -1.280280   
4      1.0      11       36.89899     -1.273848   

                       facility_name           local_validation  population  \
0         Pumwani Maternity Hospital  Public Comprehensive EmOC         0.0   
1          Mama Lucy Kibaki Hospital  Public Comprehensive EmOC         0.0   
2  Mbagathi County Referral Hospital  Public Comprehensive EmOC         0.0   
3         Pumwani Maternity Hospital  Public Comprehensive EmOC         0.0   
4          Mama Lucy Kibaki Hospital  Public Comprehensive EmOC         0.0   

   duration_seconds  distance_km  \
0           1756.83        28.36   
1           1925.29        27.34   
2           2151.95        33.91   
3           1757.05        28.36   
4           1925.52        27.34   

 

In [None]:
# Convert 'duration_seconds' to numeric, coercing errors to NaN
origin_dest = origin_dest.copy()
origin_dest['duration_seconds'] = pd.to_numeric(origin_dest['duration_seconds'], errors='coerce')

In [None]:
# Drop rows with NaN values in the 'duration_seconds' column
origin_dest = origin_dest.dropna(subset=['duration_seconds'])
origin_dest['grid_id'] = pd.to_numeric(origin_dest['grid_id'], errors='coerce')
# Work on an independent copy so that later column assignments do not trigger SettingWithCopyWarning
origin_dest_acc = origin_dest.copy()

In [96]:
# Apply Gaussian decay function to calculate the weight of each grid to healthcare 
# facilities based on the travel duration. d is the travel time and beta is the decay 
# parameter previously calculated.
# The weight decreases as the duration increases, meaning facilities that are further away have less impact.
origin_dest_acc['Weight'] = origin_dest_acc['duration_seconds'].apply(lambda d: round(math.exp(-d**2/beta), 8))


In [97]:
# Compute the weighted population (Pop_W): multiply the population of each grid cell
# by the corresponding decay weight.
origin_dest_acc['Pop_W'] = origin_dest_acc['population'] * origin_dest_acc['Weight']


In [98]:
origin_dest_acc

Unnamed: 0,grid_id,hcf_id,longitude_hcf,latitude_hcf,facility_name,local_validation,population,duration_seconds,distance_km,geometry,Weight,Pop_W
0,0.0,28,36.845520,-1.280280,Pumwani Maternity Hospital,Public Comprehensive EmOC,0.000000,1756.83,28.36,"POLYGON ((36.93906 -1.11694, 36.93905 -1.11613...",0.002625,0.000000
1,0.0,11,36.898990,-1.273848,Mama Lucy Kibaki Hospital,Public Comprehensive EmOC,0.000000,1925.29,27.34,"POLYGON ((36.93906 -1.11694, 36.93905 -1.11613...",0.000795,0.000000
2,0.0,14,36.802200,-1.307900,Mbagathi County Referral Hospital,Public Comprehensive EmOC,0.000000,2151.95,33.91,"POLYGON ((36.93906 -1.11694, 36.93905 -1.11613...",0.000134,0.000000
3,1.0,28,36.845520,-1.280280,Pumwani Maternity Hospital,Public Comprehensive EmOC,0.000000,1757.05,28.36,"POLYGON ((36.94005 -1.11694, 36.94005 -1.11613...",0.002621,0.000000
4,1.0,11,36.898990,-1.273848,Mama Lucy Kibaki Hospital,Public Comprehensive EmOC,0.000000,1925.52,27.34,"POLYGON ((36.94005 -1.11694, 36.94005 -1.11613...",0.000794,0.000000
...,...,...,...,...,...,...,...,...,...,...,...,...
331384,110461.0,47,36.813500,-1.316810,Amref Medical Centre (Wilson Airport),Private Basic EmOC,0.000000,2189.94,21.97,"POLYGON ((36.787 -1.4154, 36.78699 -1.41459, 3...",0.000098,0.000000
331385,110461.0,103,36.785995,-1.296547,Marie Stopes Kenya-Kilimani Premier Clinic,Private Basic EmOC,0.000000,2254.74,21.56,"POLYGON ((36.787 -1.4154, 36.78699 -1.41459, 3...",0.000056,0.000000
331386,110462.0,38,36.779293,-1.312338,CFK Tabitha Maternity Home,Private Basic EmOC,1.267651,2149.30,19.90,"POLYGON ((36.78799 -1.4154, 36.78799 -1.41459,...",0.000137,0.000174
331387,110462.0,47,36.813500,-1.316810,Amref Medical Centre (Wilson Airport),Private Basic EmOC,1.267651,2167.98,21.88,"POLYGON ((36.78799 -1.4154, 36.78799 -1.41459,...",0.000117,0.000149


In [99]:
# Sum the weighted population per healthcare facility (hcf_id)
origin_dest_sum = origin_dest_acc.groupby(by='hcf_id')['Pop_W'].sum().reset_index()

In [100]:
origin_dest_sum

Unnamed: 0,hcf_id,Pop_W
0,1,397197.585039
1,2,184086.822093
2,3,25551.013508
3,4,65381.088394
4,5,172161.883581
...,...,...
148,150,116847.832777
149,151,78333.911869
150,152,23463.020524
151,153,27716.077094


In [101]:
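# Merge the per-facility sum of weighted population back into the original data.
# Both tables have a 'Pop_W' column, so the merge yields Pop_W_x (per pair) and Pop_W_y (per-facility sum).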
origin_dest_acc = origin_dest_acc.merge(origin_dest_sum, on='hcf_id')

In [102]:
origin_dest_acc

Unnamed: 0,grid_id,hcf_id,longitude_hcf,latitude_hcf,facility_name,local_validation,population,duration_seconds,distance_km,geometry,Weight,Pop_W_x,Pop_W_y
0,0.0,28,36.845520,-1.280280,Pumwani Maternity Hospital,Public Comprehensive EmOC,0.000000,1756.83,28.36,"POLYGON ((36.93906 -1.11694, 36.93905 -1.11613...",0.002625,0.000000,409797.554863
1,0.0,11,36.898990,-1.273848,Mama Lucy Kibaki Hospital,Public Comprehensive EmOC,0.000000,1925.29,27.34,"POLYGON ((36.93906 -1.11694, 36.93905 -1.11613...",0.000795,0.000000,388074.310705
2,0.0,14,36.802200,-1.307900,Mbagathi County Referral Hospital,Public Comprehensive EmOC,0.000000,2151.95,33.91,"POLYGON ((36.93906 -1.11694, 36.93905 -1.11613...",0.000134,0.000000,193653.908873
3,1.0,28,36.845520,-1.280280,Pumwani Maternity Hospital,Public Comprehensive EmOC,0.000000,1757.05,28.36,"POLYGON ((36.94005 -1.11694, 36.94005 -1.11613...",0.002621,0.000000,409797.554863
4,1.0,11,36.898990,-1.273848,Mama Lucy Kibaki Hospital,Public Comprehensive EmOC,0.000000,1925.52,27.34,"POLYGON ((36.94005 -1.11694, 36.94005 -1.11613...",0.000794,0.000000,388074.310705
...,...,...,...,...,...,...,...,...,...,...,...,...,...
1324791,110461.0,47,36.813500,-1.316810,Amref Medical Centre (Wilson Airport),Private Basic EmOC,0.000000,2189.94,21.97,"POLYGON ((36.787 -1.4154, 36.78699 -1.41459, 3...",0.000098,0.000000,48066.357072
1324792,110461.0,103,36.785995,-1.296547,Marie Stopes Kenya-Kilimani Premier Clinic,Private Basic EmOC,0.000000,2254.74,21.56,"POLYGON ((36.787 -1.4154, 36.78699 -1.41459, 3...",0.000056,0.000000,46508.529351
1324793,110462.0,38,36.779293,-1.312338,CFK Tabitha Maternity Home,Private Basic EmOC,1.267651,2149.30,19.90,"POLYGON ((36.78799 -1.4154, 36.78799 -1.41459,...",0.000137,0.000174,43405.489442
1324794,110462.0,47,36.813500,-1.316810,Amref Medical Centre (Wilson Airport),Private Basic EmOC,1.267651,2167.98,21.88,"POLYGON ((36.78799 -1.4154, 36.78799 -1.41459,...",0.000117,0.000149,48066.357072


In [None]:

# In the future, we will link supply with ownership and EmOC service level
origin_dest_acc = origin_dest_acc.rename(columns={'Pop_W_y': 'Pop_W_S'})  # Pop_W_S: Population Weight Sum

In [None]:
# Supply is based on the categories of ownership and service level of the healthcare facilities
supply_map = {
    'Public Comprehensive EmOC': 1,
    'Private Comprehensive EmOC': 0.7,
    'Public Basic EmOC': 0.5,
    'Private Basic EmOC': 0.35
}

In [112]:
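# Compute the supply-demand ratio (Rj). The ratio currently uses a unit supply (1 / Pop_W_S);
# the 'supply' column is stored but not yet applied to the ratio.
origin_dest_acc['supply'] = origin_dest_acc['local_validation'].map(supply_map)
origin_dest_acc['supply_demand_ratio'] = 1 / origin_dest_acc.Pop_W_S
# Assign the result back instead of using inplace=True on a column selection
origin_dest_acc['supply_demand_ratio'] = origin_dest_acc['supply_demand_ratio'].replace([np.inf, np.nan], 0)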


In [113]:
# Calculate Rj * Weight for Each Grid Cell
origin_dest_acc['supply_W'] = origin_dest_acc['supply_demand_ratio'] * origin_dest_acc.Weight

In [114]:
# Compute Accessibility Index (Ai) for Each Grid Cell
origin_dest_acc['Accessibility'] = origin_dest_acc.groupby('grid_id')['supply_W'].transform('sum')
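
The two E2SFCA steps can be traced on a toy example. This is a sketch under stated assumptions: the four grid-to-facility pairs and their weights are made up, each facility has a unit supply (as in the ratio computed above), and `groupby().transform("sum")` replaces the sum-and-merge used in the notebook, which avoids the `Pop_W_x`/`Pop_W_y` suffixes.

```python
import pandas as pd

# Toy grid-to-facility pairs (made-up values); Weight is the Gaussian decay weight
pairs = pd.DataFrame({
    "grid_id":    [0, 0, 1, 1],
    "hcf_id":     ["A", "B", "A", "B"],
    "population": [100.0, 100.0, 50.0, 50.0],
    "Weight":     [1.0, 0.5, 0.5, 1.0],
})

# Step 1: weighted demand per facility, then the ratio with a unit supply
pairs["Pop_W"] = pairs["population"] * pairs["Weight"]
pairs["Pop_W_S"] = pairs.groupby("hcf_id")["Pop_W"].transform("sum")
pairs["supply_demand_ratio"] = 1 / pairs["Pop_W_S"]

# Step 2: per grid cell, sum the decay-weighted ratios of its facilities
pairs["supply_W"] = pairs["supply_demand_ratio"] * pairs["Weight"]
accessibility = pairs.groupby("grid_id")["supply_W"].sum()
print(accessibility)
```

Facility A serves a weighted demand of 125 and facility B of 100, so grid cell 0 scores 1/125 + 0.5/100 = 0.013 and grid cell 1 scores 0.5/125 + 1/100 = 0.014.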

In [None]:
# Normalize the accessibility index to the range [0, 1] with min-max scaling

scaler = MinMaxScaler()
origin_dest_acc['Accessibility_standard'] = scaler.fit_transform(origin_dest_acc[['Accessibility']])
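
With its default `feature_range`, `MinMaxScaler` rescales each value as `(x - min) / (max - min)`. The sketch below reproduces that formula in plain pandas on made-up accessibility values, which can be handy for checking the scaler's output.

```python
import pandas as pd

# Made-up accessibility values, for illustration only
acc = pd.Series([2.0e-6, 5.0e-4, 1.2e-3, 2.2e-3])

# Min-max scaling: the smallest value maps to 0 and the largest to 1
acc_standard = (acc - acc.min()) / (acc.max() - acc.min())
print(acc_standard)
```

Because the result depends only on the observed minimum and maximum, a single extreme grid cell can compress all other values towards 0.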

In [120]:
origin_dest_acc

Unnamed: 0,grid_id,hcf_id,longitude_hcf,latitude_hcf,facility_name,local_validation,population,duration_seconds,distance_km,geometry,Weight,Pop_W_x,Pop_W_S,supply_demand_ratio,supply_W,Accessibility,supply,Accessibility_standard
0,0.0,28,36.845520,-1.280280,Pumwani Maternity Hospital,Public Comprehensive EmOC,0.000000,1756.83,28.36,"POLYGON ((36.93906 -1.11694, 36.93905 -1.11613...",0.002625,0.000000,409797.554863,0.000002,6.405553e-09,3.139204e-06,1.00,0.001373
1,0.0,11,36.898990,-1.273848,Mama Lucy Kibaki Hospital,Public Comprehensive EmOC,0.000000,1925.29,27.34,"POLYGON ((36.93906 -1.11694, 36.93905 -1.11613...",0.000795,0.000000,388074.310705,0.000003,2.048937e-09,3.139204e-06,1.00,0.001373
2,0.0,14,36.802200,-1.307900,Mbagathi County Referral Hospital,Public Comprehensive EmOC,0.000000,2151.95,33.91,"POLYGON ((36.93906 -1.11694, 36.93905 -1.11613...",0.000134,0.000000,193653.908873,0.000005,6.928856e-10,3.139204e-06,1.00,0.001373
3,1.0,28,36.845520,-1.280280,Pumwani Maternity Hospital,Public Comprehensive EmOC,0.000000,1757.05,28.36,"POLYGON ((36.94005 -1.11694, 36.94005 -1.11613...",0.002621,0.000000,409797.554863,0.000002,6.396036e-09,3.135401e-06,1.00,0.001372
4,1.0,11,36.898990,-1.273848,Mama Lucy Kibaki Hospital,Public Comprehensive EmOC,0.000000,1925.52,27.34,"POLYGON ((36.94005 -1.11694, 36.94005 -1.11613...",0.000794,0.000000,388074.310705,0.000003,2.045433e-09,3.135401e-06,1.00,0.001372
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1324791,110461.0,47,36.813500,-1.316810,Amref Medical Centre (Wilson Airport),Private Basic EmOC,0.000000,2189.94,21.97,"POLYGON ((36.787 -1.4154, 36.78699 -1.41459, 3...",0.000098,0.000000,48066.357072,0.000021,2.031983e-09,4.996821e-08,0.35,0.000022
1324792,110461.0,103,36.785995,-1.296547,Marie Stopes Kenya-Kilimani Premier Clinic,Private Basic EmOC,0.000000,2254.74,21.56,"POLYGON ((36.787 -1.4154, 36.78699 -1.41459, 3...",0.000056,0.000000,46508.529351,0.000022,1.206015e-09,4.996821e-08,0.35,0.000022
1324793,110462.0,38,36.779293,-1.312338,CFK Tabitha Maternity Home,Private Basic EmOC,1.267651,2149.30,19.90,"POLYGON ((36.78799 -1.4154, 36.78799 -1.41459,...",0.000137,0.000174,43405.489442,0.000023,3.159739e-09,5.893964e-08,0.35,0.000026
1324794,110462.0,47,36.813500,-1.316810,Amref Medical Centre (Wilson Airport),Private Basic EmOC,1.267651,2167.98,21.88,"POLYGON ((36.78799 -1.4154, 36.78799 -1.41459,...",0.000117,0.000149,48066.357072,0.000021,2.443081e-09,5.893964e-08,0.35,0.000026


In [121]:
max(origin_dest_acc.Accessibility_standard)

0.9999999999999999

In [124]:

origin_dest_acc_gdf = gpd.GeoDataFrame(origin_dest_acc, geometry='geometry', crs="EPSG:4326")
gpkg_path = data_temp + 'acc_score_3closest.gpkg'
origin_dest_acc_gdf.to_file(gpkg_path, layer="acc_score_3closest", driver="GPKG")

# 4. Grouping by grid ID to prepare the final output file

In [252]:
# Read the GeoPackage file (if starting from this section)
results_grid = gpd.read_file(data_temp + 'acc_score_3closest.gpkg')


In [253]:
# Keep one row per grid cell by dropping duplicate (grid_id, Accessibility_standard, geometry) rows.
# The commented-out groupby below is not used.
# results_grid = results_grid.groupby(['grid_id', 'origin_lon', 'origin_lat', 'origin_lon_min', 'origin_lat_min', 'origin_lon_max', 'origin_lat_max', 'Accessibility_standard']).count().reset_index()

results_grid = results_grid.drop_duplicates(['grid_id', 'Accessibility_standard', 'geometry'])
type(results_grid)

geopandas.geodataframe.GeoDataFrame

In [254]:
results_grid

Unnamed: 0,grid_id,hcf_id,longitude_hcf,latitude_hcf,facility_name,local_validation,population,duration_seconds,distance_km,Weight,Pop_W_x,Pop_W_S,supply_demand_ratio,supply_W,Accessibility,supply,Accessibility_standard,geometry
0,0.0,28,36.845520,-1.280280,Pumwani Maternity Hospital,Public Comprehensive EmOC,0.000000,1756.83,28.36,0.002625,0.000000,409797.554863,0.000002,6.405553e-09,0.000003,1.00,0.001373,"POLYGON ((36.93906 -1.11694, 36.93905 -1.11613..."
3,1.0,28,36.845520,-1.280280,Pumwani Maternity Hospital,Public Comprehensive EmOC,0.000000,1757.05,28.36,0.002621,0.000000,409797.554863,0.000002,6.396036e-09,0.000003,1.00,0.001372,"POLYGON ((36.94005 -1.11694, 36.94005 -1.11613..."
6,2.0,28,36.845520,-1.280280,Pumwani Maternity Hospital,Public Comprehensive EmOC,0.000000,1598.22,28.11,0.007313,0.000000,409797.554863,0.000002,1.784569e-08,0.000007,1.00,0.003060,"POLYGON ((36.94105 -1.11694, 36.94105 -1.11613..."
9,3.0,28,36.845520,-1.280280,Pumwani Maternity Hospital,Public Comprehensive EmOC,0.000000,1606.37,28.18,0.006954,0.000000,409797.554863,0.000002,1.697048e-08,0.000007,1.00,0.002943,"POLYGON ((36.94205 -1.11694, 36.94204 -1.11613..."
12,4.0,28,36.845520,-1.280280,Pumwani Maternity Hospital,Public Comprehensive EmOC,0.569181,1789.88,28.55,0.002095,0.001192,409797.554863,0.000002,5.111402e-09,0.000003,1.00,0.001148,"POLYGON ((36.94305 -1.11694, 36.94304 -1.11613..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1238191,81593.0,99,36.905864,-1.309319,Summit Orthopaedic and Surgical Hospital,Private Basic EmOC,12.591538,132.76,0.56,0.966634,12.171404,874.715736,0.001143,1.105083e-03,0.002210,0.35,0.966892,"POLYGON ((36.90878 -1.31348, 36.90877 -1.31267..."
1239165,81918.0,99,36.905864,-1.309319,Summit Orthopaedic and Surgical Hospital,Private Basic EmOC,18.187777,174.57,0.77,0.943012,17.151293,874.715736,0.001143,1.078078e-03,0.002156,0.35,0.943264,"POLYGON ((36.90779 -1.31429, 36.90778 -1.31348..."
1239167,81919.0,99,36.905864,-1.309319,Summit Orthopaedic and Surgical Hospital,Private Basic EmOC,15.389658,160.52,0.70,0.951599,14.644786,874.715736,0.001143,1.087895e-03,0.002176,0.35,0.951854,"POLYGON ((36.90879 -1.31429, 36.90878 -1.31348..."
1239169,81920.0,99,36.905864,-1.309319,Summit Orthopaedic and Surgical Hospital,Private Basic EmOC,16.788718,168.34,0.71,0.946899,15.897219,874.715736,0.001143,1.082522e-03,0.002165,0.35,0.947152,"POLYGON ((36.90979 -1.31429, 36.90978 -1.31348..."


### Setting values for low, medium, and high categories

We started from an equal-value division and then adjusted the thresholds to values that are more legible and easier to interpret. Each model should set its own thresholds, based on how its data are distributed across the three categories.

Note: For Nairobi, we excluded grid cells with index values equal to or below 0, which indicate very low population and a small number of buildings.

In [255]:
results_grid['result'] = -1
results_grid.loc[results_grid['Accessibility_standard'] > 0, 'result'] = 2
results_grid.loc[results_grid['Accessibility_standard'] > pow(10, -1.3605), 'result'] = 1
results_grid.loc[results_grid['Accessibility_standard'] > pow(10, -1.1293), 'result'] = 0
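
The successive `.loc` assignments above amount to binning the standardized score with half-open intervals. As a sketch of an equivalent formulation, `pd.cut` can express the same rule in one call; the sample scores are made up, and the two thresholds are roughly 0.0436 and 0.0743.

```python
import pandas as pd

t_low = 10 ** -1.3605    # about 0.0436
t_high = 10 ** -1.1293   # about 0.0743

# Made-up standardized scores, one per class
scores = pd.Series([0.0, 0.001, 0.05, 0.5])

# Right-closed bins: (-inf, 0] -> -1, (0, t_low] -> 2, (t_low, t_high] -> 1, (t_high, inf) -> 0
bins = [float("-inf"), 0.0, t_low, t_high, float("inf")]
result = pd.cut(scores, bins=bins, labels=[-1, 2, 1, 0], right=True).astype(int)
print(result.tolist())
```

Keeping the bin edges in a single list makes it easier to swap in city-specific thresholds later.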

In [256]:
category_counts = results_grid['result'].value_counts()
print(category_counts)

result
 2    80881
 1    16114
 0    13431
-1       37
Name: count, dtype: int64


### Setting values for focus areas

We defined the focus areas based on value ranges around the thresholds. We aim for participants to help us confirm the selection of the city-specific thresholds.

In [280]:
category_counts = results_grid['focused'].value_counts()
print(category_counts)

focused
0    94624
1    15802
Name: count, dtype: int64


In [279]:
results_grid['focused'] = 0
# Focus areas: a band of values around the threshold between the high and medium categories
results_grid.loc[(results_grid['Accessibility_standard'] < pow(10, -1.26)) & (results_grid['Accessibility_standard'] > pow(10, -1.46)), 'focused'] = 1

results_grid

Unnamed: 0,grid_id,hcf_id,longitude_hcf,latitude_hcf,facility_name,local_validation,population,duration_seconds,distance_km,Weight,...,Accessibility,supply,Accessibility_standard,geometry,result,focused,lon_min,lat_min,lon_max,lat_max
0,0.0,28,36.845520,-1.280280,Pumwani Maternity Hospital,Public Comprehensive EmOC,0.000000,1756.83,28.36,0.002625,...,0.000003,1.00,0.001373,"POLYGON ((36.93906 -1.11694, 36.93905 -1.11613...",2,0,36.938052,-1.116936,36.939056,-1.116127
3,1.0,28,36.845520,-1.280280,Pumwani Maternity Hospital,Public Comprehensive EmOC,0.000000,1757.05,28.36,0.002621,...,0.000003,1.00,0.001372,"POLYGON ((36.94005 -1.11694, 36.94005 -1.11613...",2,0,36.939050,-1.116936,36.940054,-1.116127
6,2.0,28,36.845520,-1.280280,Pumwani Maternity Hospital,Public Comprehensive EmOC,0.000000,1598.22,28.11,0.007313,...,0.000007,1.00,0.003060,"POLYGON ((36.94105 -1.11694, 36.94105 -1.11613...",2,0,36.940048,-1.116936,36.941052,-1.116127
9,3.0,28,36.845520,-1.280280,Pumwani Maternity Hospital,Public Comprehensive EmOC,0.000000,1606.37,28.18,0.006954,...,0.000007,1.00,0.002943,"POLYGON ((36.94205 -1.11694, 36.94204 -1.11613...",2,0,36.941046,-1.116936,36.942050,-1.116127
12,4.0,28,36.845520,-1.280280,Pumwani Maternity Hospital,Public Comprehensive EmOC,0.569181,1789.88,28.55,0.002095,...,0.000003,1.00,0.001148,"POLYGON ((36.94305 -1.11694, 36.94304 -1.11613...",2,0,36.942044,-1.116936,36.943048,-1.116127
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1238191,81593.0,99,36.905864,-1.309319,Summit Orthopaedic and Surgical Hospital,Private Basic EmOC,12.591538,132.76,0.56,0.966634,...,0.002210,0.35,0.966892,"POLYGON ((36.90878 -1.31348, 36.90877 -1.31267...",0,0,36.907775,-1.313483,36.908780,-1.312674
1239165,81918.0,99,36.905864,-1.309319,Summit Orthopaedic and Surgical Hospital,Private Basic EmOC,18.187777,174.57,0.77,0.943012,...,0.002156,0.35,0.943264,"POLYGON ((36.90779 -1.31429, 36.90778 -1.31348...",0,0,36.906784,-1.314292,36.907790,-1.313483
1239167,81919.0,99,36.905864,-1.309319,Summit Orthopaedic and Surgical Hospital,Private Basic EmOC,15.389658,160.52,0.70,0.951599,...,0.002176,0.35,0.951854,"POLYGON ((36.90879 -1.31429, 36.90878 -1.31348...",0,0,36.907782,-1.314292,36.908788,-1.313483
1239169,81920.0,99,36.905864,-1.309319,Summit Orthopaedic and Surgical Hospital,Private Basic EmOC,16.788718,168.34,0.71,0.946899,...,0.002165,0.35,0.947152,"POLYGON ((36.90979 -1.31429, 36.90978 -1.31348...",0,0,36.908780,-1.314292,36.909785,-1.313483
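
The focus-area flag marks scores that fall strictly inside a band of roughly 0.0347 to 0.0550. As a sketch of an alternative spelling, `Series.between` with `inclusive="neither"` (available in pandas 1.3 and later) expresses the same open interval; the sample scores are made up.

```python
import pandas as pd

lower = 10 ** -1.46   # about 0.0347
upper = 10 ** -1.26   # about 0.0550

# Made-up standardized scores: below, inside, inside, above the band
scores = pd.Series([0.01, 0.04, 0.05, 0.06])

# 1 if the score lies strictly between the two bounds, else 0
focused = scores.between(lower, upper, inclusive="neither").astype(int)
print(focused.tolist())
```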


In [263]:
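# Drop excluded cells; .copy() avoids SettingWithCopyWarning when new columns are added later
results_grid = results_grid.loc[results_grid['result'] != -1].copy()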


In [None]:
# Extract bounding box coordinates and create columns for min and max coordinates
bounds = results_grid.geometry.bounds  # DataFrame with columns minx, miny, maxx, maxy
results_grid['origin_lon_min'] = bounds['minx']
results_grid['origin_lat_min'] = bounds['miny']
results_grid['origin_lon_max'] = bounds['maxx']
results_grid['origin_lat_max'] = bounds['maxy']


In [283]:
# Calculate centroid coordinates for each geometry and add as columns
results_grid['longitude'] = results_grid.geometry.centroid.x
results_grid['latitude'] = results_grid.geometry.centroid.y




In [260]:
results_grid = results_grid.rename(columns={
    'origin_lon': 'longitude',
    'origin_lat': 'latitude',
    'origin_lon_min': 'lon_min',
    'origin_lat_min': 'lat_min',
    'origin_lon_max': 'lon_max',
    'origin_lat_max': 'lat_max'
})

results_grid

Unnamed: 0,grid_id,hcf_id,longitude_hcf,latitude_hcf,facility_name,local_validation,population,duration_seconds,distance_km,Weight,...,Accessibility,supply,Accessibility_standard,geometry,result,focused,lon_min,lat_min,lon_max,lat_max
0,0.0,28,36.845520,-1.280280,Pumwani Maternity Hospital,Public Comprehensive EmOC,0.000000,1756.83,28.36,0.002625,...,0.000003,1.00,0.001373,"POLYGON ((36.93906 -1.11694, 36.93905 -1.11613...",2,0,36.938052,-1.116936,36.939056,-1.116127
3,1.0,28,36.845520,-1.280280,Pumwani Maternity Hospital,Public Comprehensive EmOC,0.000000,1757.05,28.36,0.002621,...,0.000003,1.00,0.001372,"POLYGON ((36.94005 -1.11694, 36.94005 -1.11613...",2,0,36.939050,-1.116936,36.940054,-1.116127
6,2.0,28,36.845520,-1.280280,Pumwani Maternity Hospital,Public Comprehensive EmOC,0.000000,1598.22,28.11,0.007313,...,0.000007,1.00,0.003060,"POLYGON ((36.94105 -1.11694, 36.94105 -1.11613...",2,0,36.940048,-1.116936,36.941052,-1.116127
9,3.0,28,36.845520,-1.280280,Pumwani Maternity Hospital,Public Comprehensive EmOC,0.000000,1606.37,28.18,0.006954,...,0.000007,1.00,0.002943,"POLYGON ((36.94205 -1.11694, 36.94204 -1.11613...",2,0,36.941046,-1.116936,36.942050,-1.116127
12,4.0,28,36.845520,-1.280280,Pumwani Maternity Hospital,Public Comprehensive EmOC,0.569181,1789.88,28.55,0.002095,...,0.000003,1.00,0.001148,"POLYGON ((36.94305 -1.11694, 36.94304 -1.11613...",2,0,36.942044,-1.116936,36.943048,-1.116127
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1238191,81593.0,99,36.905864,-1.309319,Summit Orthopaedic and Surgical Hospital,Private Basic EmOC,12.591538,132.76,0.56,0.966634,...,0.002210,0.35,0.966892,"POLYGON ((36.90878 -1.31348, 36.90877 -1.31267...",0,0,36.907775,-1.313483,36.908780,-1.312674
1239165,81918.0,99,36.905864,-1.309319,Summit Orthopaedic and Surgical Hospital,Private Basic EmOC,18.187777,174.57,0.77,0.943012,...,0.002156,0.35,0.943264,"POLYGON ((36.90779 -1.31429, 36.90778 -1.31348...",0,0,36.906784,-1.314292,36.907790,-1.313483
1239167,81919.0,99,36.905864,-1.309319,Summit Orthopaedic and Surgical Hospital,Private Basic EmOC,15.389658,160.52,0.70,0.951599,...,0.002176,0.35,0.951854,"POLYGON ((36.90879 -1.31429, 36.90878 -1.31348...",0,0,36.907782,-1.314292,36.908788,-1.313483
1239169,81920.0,99,36.905864,-1.309319,Summit Orthopaedic and Surgical Hospital,Private Basic EmOC,16.788718,168.34,0.71,0.946899,...,0.002165,0.35,0.947152,"POLYGON ((36.90979 -1.31429, 36.90978 -1.31348...",0,0,36.908780,-1.314292,36.909785,-1.313483


In [281]:
# Save the results to a new GeoPackage file
output_gpkg_path = data_temp + 'emergency-maternal-care-deprivation-access.gpkg'
results_grid.to_file(output_gpkg_path, layer='emergency-maternal-care-deprivation-access', driver='GPKG')



In [284]:
# Save the results to a CSV file in the format required by the IDEAMAPS data ecosystem
columns_to_keep = ["longitude", "latitude", "lon_min", "lat_min", "lon_max", "lat_max",
                   "focused", "result"]
results_table = results_grid[columns_to_keep]

results_table.to_csv(data_outputs + 'model-output.csv', index=False)