# Automating Study Area Generation for Expansion Cities

## Model Summary:

This notebook provides a workflow to generate the study area dataset described in the [Study Area definition](../docs/study-areas/README.md).

## Workflow Summary:

The notebook generates study areas for expansion cities by combining population data with urban boundary definitions. Instead of administrative boundaries, the target city is selected by its Functional Urban Area (FUA) boundary, which better reflects actual patterns of human settlement and economic activity. A 1-kilometre buffer is applied around the FUA to avoid discontinuities at the edge of the urban boundary. The GHS-POP reference grid, which provides a common framework for making datasets on deprivation interoperable, is then clipped to the buffered FUA. The result is a harmonised geospatial layer for each expansion city that can be used in further demographic and urban expansion analyses.

* **Preprocessing**: Select the target city by its FUA boundary and load the GHS-POP reference grid.

* **Study Area Definition**: Apply the FUA boundary with a 1-kilometre buffer to ensure spatial continuity.

* **Spatial Clipping**: Clip the population reference grid to the buffered FUA boundary.

* **Outputs**: Export the reference grid of the selected city as GeoPackage and CSV.
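The steps above can be illustrated on a toy example. This is a minimal sketch, not the notebook's implementation: it assumes (for illustration only) that the FUA is an axis-aligned rectangle and that the reference grid is a regular square grid in a projected CRS measured in metres. The helper names `buffer_box`, `boxes_intersect` and `clip_grid` are hypothetical.

```python
# Toy sketch of the workflow: buffer the FUA, then keep grid cells that overlap it.
# Assumption: the FUA is an axis-aligned rectangle (minx, miny, maxx, maxy) in metres.

def buffer_box(bounds, distance):
    """Expand (minx, miny, maxx, maxy) by `distance` on every side.

    For an axis-aligned rectangle this matches a mitre-joined buffer.
    """
    minx, miny, maxx, maxy = bounds
    return (minx - distance, miny - distance, maxx + distance, maxy + distance)


def boxes_intersect(a, b):
    """True if two boxes share interior area (touching edges do not count)."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def clip_grid(grid_origin, cell_size, n_cols, n_rows, area):
    """Return the bounds of every grid cell that overlaps `area`."""
    x0, y0 = grid_origin  # lower-left corner of the grid
    cells = []
    for row in range(n_rows):
        for col in range(n_cols):
            cell = (x0 + col * cell_size, y0 + row * cell_size,
                    x0 + (col + 1) * cell_size, y0 + (row + 1) * cell_size)
            if boxes_intersect(cell, area):
                cells.append(cell)
    return cells


# Hypothetical 2 km x 2 km FUA on a 10 km x 10 km grid of 1 km cells.
fua_bounds = (4000, 4000, 6000, 6000)
study_area = buffer_box(fua_bounds, 1000)  # (3000, 3000, 7000, 7000)
cells = clip_grid((0, 0), 1000, 10, 10, study_area)
print(len(cells))  # 16: a 4 x 4 block of cells
```

Note that `boxes_intersect` ignores cells that only touch the buffered edge, whereas shapely's `intersects` (used later in the notebook) also counts boundary contact.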


### Datasets:
* [Administrative Area](https://gadm.org/) - an initial input to define the study areas and to provide a framework for the analysis.
* [Functional Urban Area (FUA)](https://human-settlement.emergency.copernicus.eu/) - delineates each city by its urban centre and commuting zone, capturing the actual extent of urban settlement and activity.
* [GHS-POP grid](https://data.jrc.ec.europa.eu/dataset/2ff68a52-5b5b-4a22-8f40-c41da8332cfe) - Global Human Settlement Layer population grid, multitemporal (1975-2030).

In [2]:
# Import Libraries
import geopandas as gpd
import rasterio
import rasterio.transform  # provides rasterio.transform.xy, used to get pixel corners
from rasterio.mask import mask
from shapely.geometry import mapping, box

In [3]:
# Set paths to access data
# Define directories
data_inputs = '../scripts/data-inputs/'

### Functional Urban Area (FUA)
The FUA boundaries are taken from the Global Human Settlement Layer (GHSL) GHS-FUA dataset (2015 epoch), which delineates functional urban areas worldwide.

In [4]:
# Load Functional Urban Areas (FUA) data
GHS_FUA = gpd.read_file(data_inputs + 'GHS_FUA_2015_GLOBE.gpkg')

In [5]:
# Select Expansion City
CITY = "Kisumu"    # Target city, as spelled in the eFUA_name column
COUNTRY = "Kenya"  # Country, as spelled in the Cntry_name column (disambiguates duplicate city names)

In [6]:
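# Filter FUA for target city (match on country too, since city names are not unique worldwide)
fua_city = GHS_FUA[(GHS_FUA["eFUA_name"] == CITY) & (GHS_FUA["Cntry_name"] == COUNTRY)]
fua_city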

| | eFUA_ID | UC_num | UC_IDs | eFUA_name | Commuting | Cntry_ISO | Cntry_name | FUA_area | UC_area | FUA_p_2015 | UC_p_2015 | Com_p_2015 | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5934 | 974.0 | 1.0 | 4608 | Kisumu | 1.0 | KEN | Kenya | 134.0 | 43.0 | 369934.301125 | 308524.08395 | 61410.217175 | MULTIPOLYGON (((3483000 -2000, 3484000 -2000, ... |
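Because city names are not unique worldwide, it can help to confirm that the selection returns exactly one FUA (in the notebook, for example, `assert len(fua_city) == 1` after the filter cell). The sketch below shows the same check in plain Python; the records and the "Newtown" entries are hypothetical, and only the two selection columns are modelled.

```python
# Hypothetical FUA attribute records: only the two selection columns are shown,
# and the "Newtown" entries are invented to illustrate a duplicated city name.
records = [
    {"eFUA_name": "Kisumu", "Cntry_name": "Kenya"},
    {"eFUA_name": "Newtown", "Cntry_name": "Country A"},
    {"eFUA_name": "Newtown", "Cntry_name": "Country B"},
]


def select_fua(rows, city, country):
    """Return the single record matching both city and country, or raise."""
    matches = [
        r for r in rows
        if r["eFUA_name"] == city and r["Cntry_name"] == country
    ]
    if len(matches) != 1:
        raise ValueError(
            f"Expected one FUA for {city}, {country}; found {len(matches)}"
        )
    return matches[0]


print(select_fua(records, "Newtown", "Country B"))
```

Filtering on the name alone would return two "Newtown" records here; adding the country narrows it to one.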


In [7]:
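# Create a buffered copy of the FUA
fua_city_buffered = fua_city.copy()
# Buffer by 1000 m; this assumes the FUA layer uses a projected CRS in metres.
# join_style=2 (mitre) keeps the right-angled corners of the grid-based FUA outline.
fua_city_buffered["geometry"] = fua_city_buffered.geometry.buffer(1000, join_style=2)
# Merge all parts into a single geometry (GeoSeries.union_all requires geopandas >= 1.0)
fua_union = fua_city_buffered.geometry.union_all()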
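The FUA vertices in the output above (e.g. 3483000, 3484000) appear to lie on a 1 km lattice in metres, so the outline is made of right angles. The choice of `join_style` changes how those corners are buffered. The worked example below compares the area of a single 1 km square buffered by 1 km with mitre joins (`join_style=2`, square corners) and with round joins (quarter-circles at the corners). It is a geometric illustration only; the function names are hypothetical.

```python
import math


def mitre_buffer_area(width, height, d):
    """Area of an axis-aligned rectangle buffered by d with mitre joins."""
    # The result is still a rectangle, d larger on every side.
    return (width + 2 * d) * (height + 2 * d)


def round_buffer_area(width, height, d):
    """Area of the same rectangle buffered by d with round joins."""
    # Original area + four side strips + four quarter-circles at the corners.
    return width * height + 2 * d * (width + height) + math.pi * d ** 2


# A single 1 km x 1 km cell buffered by 1 km (units in metres).
print(mitre_buffer_area(1000, 1000, 1000))         # 9000000
print(round(round_buffer_area(1000, 1000, 1000)))  # 8141593
```

With mitre joins the buffered outline keeps the grid's square shape, so the corner cells of the surrounding ring are fully inside the study area rather than clipped by a curve.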

### GHS-POP Reference Grid

In [8]:
# Path to the GHS population raster (2025 epoch)
GHS_POP = data_inputs + 'GHS_POP_2025_GLOBE.tif'

In [9]:
with rasterio.open(GHS_POP) as pop_raster:

    # Ensure the FUA geometry is in the same CRS as the population raster
    fua_union = gpd.GeoSeries([fua_union], crs=fua_city.crs).to_crs(pop_raster.crs).iloc[0] 
    
    # Mask the population raster with the buffered FUA geometry
    out_image, out_transform = mask(pop_raster, [mapping(fua_union)], crop=True)

    nodata = pop_raster.nodata # keep the nodata value from the original raster
    height, width = out_image.shape[1], out_image.shape[2]
    crs = pop_raster.crs
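With `crop=True`, `rasterio.mask.mask` trims the output to the block of pixels covering the geometry's bounds and then sets pixels outside the polygon to the nodata value. The sketch below shows only the first part, converting bounds to a pixel window through the inverse of a north-up affine transform. It is a simplified illustration under stated assumptions (no rotation, hypothetical transform values); rasterio's own window computation may round or pad differently.

```python
import math


def bounds_to_window(bounds, transform):
    """Return (row_start, row_stop, col_start, col_stop) covering `bounds`.

    Simplified sketch for a north-up raster: transform follows the affine
    ordering (a, b, c, d, e, f) with b == d == 0, a > 0 and e < 0.
    """
    minx, miny, maxx, maxy = bounds
    a, b, c, d, e, f = transform
    if b != 0 or d != 0 or a <= 0 or e >= 0:
        raise ValueError("sketch only handles north-up rasters")
    col_start = math.floor((minx - c) / a)
    col_stop = math.ceil((maxx - c) / a)
    # e < 0, so the top edge (maxy) maps to the first row.
    row_start = math.floor((maxy - f) / e)
    row_stop = math.ceil((miny - f) / e)
    return row_start, row_stop, col_start, col_stop


# Hypothetical raster: 1-unit pixels with the upper-left corner at (0, 10).
transform = (1.0, 0.0, 0.0, 0.0, -1.0, 10.0)
window = bounds_to_window((2.5, 3.2, 5.1, 7.9), transform)
print(window)  # (2, 7, 2, 6): rows 2-6 and columns 2-5
```

Floor and ceil are used so that partially covered edge pixels stay inside the window.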

In [10]:
pixels = []

# Iterate over every pixel (row, col) of the cropped raster
for row in range(height):
    for col in range(width):
        val = out_image[0, row, col] # Extract the pixel value
        if val == nodata:
            continue
        
        # Compute the spatial coordinates of the pixel corners
        minx, maxy = rasterio.transform.xy(out_transform, row, col, offset='ul')
        maxx, miny = rasterio.transform.xy(out_transform, row, col, offset='lr')
        pixel_geom = box(minx, miny, maxx, maxy)

        # Only keep pixels that intersect with the FUA polygon
        if not pixel_geom.intersects(fua_union):
            continue
        
        pixels.append({
            'geometry': pixel_geom, 
            'longitude': pixel_geom.centroid.x,  # longitude of pixel centroid
            'latitude': pixel_geom.centroid.y,
            'lon_min': minx,
            'lon_max': maxx,
            'lat_min': miny,
            'lat_max': maxy
        })

# Convert the list of pixels to a GeoDataFrame
study_area_gdf = gpd.GeoDataFrame(pixels, crs=crs)
study_area_gdf


| | geometry | longitude | latitude | lon_min | lon_max | lat_min | lat_max |
|---|---|---|---|---|---|---|---|
| 0 | POLYGON ((34.73375 -0.00042, 34.73375 0.00792,... | 34.729583 | 0.003750 | 34.725417 | 34.733750 | -0.000416 | 0.007917 |
| 1 | POLYGON ((34.74208 -0.00042, 34.74208 0.00792,... | 34.737917 | 0.003750 | 34.733750 | 34.742083 | -0.000416 | 0.007917 |
| 2 | POLYGON ((34.75042 -0.00042, 34.75042 0.00792,... | 34.746250 | 0.003750 | 34.742083 | 34.750417 | -0.000416 | 0.007917 |
| 3 | POLYGON ((34.75875 -0.00042, 34.75875 0.00792,... | 34.754583 | 0.003750 | 34.750417 | 34.758750 | -0.000416 | 0.007917 |
| 4 | POLYGON ((34.76708 -0.00042, 34.76708 0.00792,... | 34.762917 | 0.003750 | 34.758750 | 34.767083 | -0.000416 | 0.007917 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 263 | POLYGON ((34.73375 -0.15875, 34.73375 -0.15042... | 34.729583 | -0.154583 | 34.725417 | 34.733750 | -0.158750 | -0.150416 |
| 264 | POLYGON ((34.74208 -0.15875, 34.74208 -0.15042... | 34.737917 | -0.154583 | 34.733750 | 34.742083 | -0.158750 | -0.150416 |
| 265 | POLYGON ((34.75042 -0.15875, 34.75042 -0.15042... | 34.746250 | -0.154583 | 34.742083 | 34.750417 | -0.158750 | -0.150416 |
| 266 | POLYGON ((34.75875 -0.15875, 34.75875 -0.15042... | 34.754583 | -0.154583 | 34.750417 | 34.758750 | -0.158750 | -0.150416 |
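The pixel loop relies on the raster's affine transform: `rasterio.transform.xy` with `offset='ul'` and `offset='lr'` returns the upper-left and lower-right corners of each pixel. The plain-Python sketch below reproduces that arithmetic. The transform is an assumption: 30 arc-second pixels (1/120 degree) with the upper-left corner approximated from the first row of the output table above, not read from the raster file.

```python
def pixel_bounds(transform, row, col):
    """Return (minx, miny, maxx, maxy) of pixel (row, col).

    transform uses the affine ordering (a, b, c, d, e, f):
    x = a * col + b * row + c,  y = d * col + e * row + f.
    The two corners describe the pixel box only when b == d == 0 (no rotation).
    """
    a, b, c, d, e, f = transform
    # Upper-left corner of the pixel sits at offset (col, row) ...
    ulx, uly = a * col + b * row + c, d * col + e * row + f
    # ... and the lower-right corner at offset (col + 1, row + 1).
    lrx = a * (col + 1) + b * (row + 1) + c
    lry = d * (col + 1) + e * (row + 1) + f
    # min/max orders the corners regardless of the sign of a and e.
    return min(ulx, lrx), min(uly, lry), max(ulx, lrx), max(uly, lry)


# Assumed transform: 1/120-degree pixels, upper-left corner from the output table.
size = 1 / 120
transform = (size, 0.0, 34.725417, 0.0, -size, 0.007917)

minx, miny, maxx, maxy = pixel_bounds(transform, 0, 0)
centroid = ((minx + maxx) / 2, (miny + maxy) / 2)
print(round(minx, 6), round(miny, 6), round(maxx, 6), round(maxy, 6))
print(round(centroid[0], 6), round(centroid[1], 6))
```

Up to the rounding of the assumed origin, these values match the `lon_min`, `lat_min`, `lon_max`, `lat_max`, `longitude` and `latitude` columns of row 0.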


In [10]:
# Convert CRS to EPSG:4326
study_area = study_area_gdf.to_crs(epsg=4326)

# Save the study area as a GeoPackage
study_area.to_file(f'../grid-boundary-{CITY.lower()}.gpkg', driver="GPKG")

In [None]:
# Save the study area as a CSV file without geometry
# (the coordinate columns were computed in the raster CRS, before reprojection)
study_area_gdf.drop(columns='geometry').to_csv(f'../grid-boundary-{CITY.lower()}.csv', index=False)
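Because the CSV export drops the geometry column, downstream tools can rebuild each grid cell as a rectangle from its `lon_min`, `lat_min`, `lon_max` and `lat_max` columns. The sketch below shows one way to do this with the standard `csv` module; `read_cell_bounds` is a hypothetical helper, and the sample text reuses the header and first data row from the notebook output. For a real file, pass an object opened with `open(path, newline='')`.

```python
import csv
import io


def read_cell_bounds(csv_file):
    """Rebuild (lon_min, lat_min, lon_max, lat_max) tuples from the exported CSV."""
    reader = csv.DictReader(csv_file)
    return [
        (float(r["lon_min"]), float(r["lat_min"]),
         float(r["lon_max"]), float(r["lat_max"]))
        for r in reader
    ]


# Header and first data row as they appear in the notebook output.
sample = io.StringIO(
    "longitude,latitude,lon_min,lon_max,lat_min,lat_max\n"
    "34.729583,0.003750,34.725417,34.733750,-0.000416,0.007917\n"
)
cells = read_cell_bounds(sample)
lon_min, lat_min, lon_max, lat_max = cells[0]

# The stored centroid should sit midway between the stored bounds.
print(abs((lon_min + lon_max) / 2 - 34.729583) < 1e-6)
```

Reading the columns by name rather than by position keeps the helper robust if the column order changes.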