# Methodology

This notebook shows the methodology for the paper "Measuring OpenStreetMap building footprint completeness using human settlement layers".

## Setup

We import the required packages and download the datasets.

For reference, here are the original download links for the datasets:
1. High Resolution Settlement Layer (HRSL) ([Philippines](https://data.humdata.org/dataset/philippines-high-resolution-population-density-maps-demographic-estimates)) ([Madagascar](https://data.humdata.org/dataset/highresolutionpopulationdensitymaps-mdg))
2. Administrative Boundaries ([Philippines](https://data.humdata.org/dataset/philippines-administrative-levels-0-to-3)) ([Madagascar](https://data.humdata.org/dataset/madagascar-administrative-level-0-4-boundaries))
3. OpenStreetMap (OSM) ([Philippines](https://download.geofabrik.de/asia/philippines.html)) ([Madagascar](https://download.geofabrik.de/africa/madagascar.html))

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import shapely
import geopandas as gpd
import rasterio
import rasterio.features

import wget

import os
import glob
from zipfile import ZipFile

In [None]:
# Create the download folder; do nothing if it already exists.
os.makedirs("../download_data", exist_ok=True)

### HRSL download

Uncomment the cells below if you have not yet downloaded the HRSL datasets.

In [None]:
# hrsl_mdg_url = "https://data.humdata.org/dataset/9e7ff424-7b9c-42cc-b869-5756fcad0956/resource/1fafdd04-8e0b-4c2a-b4dc-8f3ff39e3015/download/population_mdg_2018-10-01.zip"
# hrsl_phl_men_url = "https://data.humdata.org/dataset/6d9f35c0-4764-49ee-b364-329db0b7a47d/resource/5a13bb60-4506-42a5-a08a-7ccf20413179/download/phl_men_2019-06-01_geotiff.zip"
# hrsl_phl_women_url = "https://data.humdata.org/dataset/6d9f35c0-4764-49ee-b364-329db0b7a47d/resource/4aff438c-43d9-47d0-853f-5a6b6ae28223/download/phl_women_2019-06-01_geotiff.zip"

In [None]:
# wget.download(hrsl_mdg_url, '../download_data/mdg_hrsl_oct_2018.zip')

In [None]:
# wget.download(hrsl_phl_men_url, '../download_data/phl_hrsl_men_jun_2019.zip')

In [None]:
# wget.download(hrsl_phl_women_url, '../download_data/phl_hrsl_women_jun_2019.zip')
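
As an alternative to uncommenting cells by hand, a download can be skipped automatically when the target file is already on disk. The helper below is a minimal sketch of that idea and is not part of the original notebook: `download_if_missing` and its `fetch` parameter are hypothetical names, and the default downloader is the standard library's `urllib.request.urlretrieve` rather than `wget`.

```python
import os
import urllib.request


def download_if_missing(url, dest, fetch=urllib.request.urlretrieve):
    """Download url to dest unless dest already exists.

    `fetch` is any callable taking (url, dest); it is a parameter only so
    that another downloader (for example wget.download) can be swapped in.
    Returns True if a download happened, False if it was skipped.
    """
    if os.path.exists(dest):
        return False
    # Make sure the target directory exists before writing into it.
    os.makedirs(os.path.dirname(dest) or ".", exist_ok=True)
    fetch(url, dest)
    return True
```

With the URL variables above uncommented, a call such as `download_if_missing(hrsl_mdg_url, '../download_data/mdg_hrsl_oct_2018.zip')` would then be safe to re-run.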

### Admin boundary download

Uncomment the cells below if you have not yet downloaded the admin boundary datasets.

In [None]:
# adm_mdg_url = "https://data.humdata.org/dataset/26fa506b-0727-4d9d-a590-d2abee21ee22/resource/ed94d52e-349e-41be-80cb-62dc0435bd34/download/mdg_adm_bngrc_ocha_20181031_shp.zip"
# adm_phl_url = "https://data.humdata.org/dataset/caf116df-f984-4deb-85ca-41b349d3f313/resource/0340dc57-7563-4d3c-9d15-09e3c7f6dfdc/download/phl_admbnda_adm1_psa_namria_20180130.zip"

In [None]:
# wget.download(adm_mdg_url, '../download_data/mdg_adm_all.zip')

In [None]:
# wget.download(adm_phl_url, '../download_data/phl_adm_level1.zip')

### OSM download

Note that the OSM download links used here differ from the OSM links listed above. The methodology uses OSM datasets from an earlier date, whereas the links listed above only provide the current data.

Uncomment the cells below if you have not yet downloaded the OSM datasets.

In [None]:
# osm_mdg_url = "https://storage.googleapis.com/osm-completeness-thinkingmachines/mdg_osm_jan_2020_buildings.gpkg.zip"
# osm_phl_url = "https://storage.googleapis.com/osm-completeness-thinkingmachines/phl_osm_jan_2020_buildings.gpkg.zip"

In [None]:
# wget.download(osm_mdg_url, '../download_data/mdg_osm_jan_2020_buildings.gpkg.zip')

In [None]:
# wget.download(osm_phl_url, '../download_data/phl_osm_jan_2020_buildings.gpkg.zip')

## Methodology

### Unzip all datasets

In [None]:
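for zip_path in glob.glob("../download_data/*.zip"):
    # Extract each archive into a folder named after it,
    # skipping archives that were already extracted.
    target_dir = os.path.splitext(zip_path)[0]
    if not os.path.isdir(target_dir):
        with ZipFile(zip_path) as archive:
            archive.extractall(target_dir)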

### Madagascar

#### Load HRSL dataset

In [None]:
# hrsl_mdg = rasterio.open(
#     "../download_data/mdg_hrsl_oct_2018/population_mdg_2018-10-01.tif"
# )

In [None]:
# hrsl_mdg_crs = hrsl_mdg.crs

In [None]:
# hrsl_mdg_band1_mask = hrsl_mdg.read_masks(1)

In [None]:
# hrsl_mdg_rand = np.random.rand(
#     np.shape(hrsl_mdg_band1_mask)[0], np.shape(hrsl_mdg_band1_mask)[1]
# )
# hrsl_mdg_rand = hrsl_mdg_rand.astype("float32")

In [None]:
# hrsl_mdg_band1_poly = list(
#     rasterio.features.shapes(
#         hrsl_mdg_rand, transform=hrsl_mdg.transform, mask=hrsl_mdg_band1_mask
#     )
# )

In [None]:
# hrsl_mdg_geom = []
# for geom, value in hrsl_mdg_band1_poly:
#     geom = shapely.geometry.shape(geom)
#     hrsl_mdg_geom.append(geom)

In [None]:
# hrsl_mdg_gdf = pd.DataFrame(hrsl_mdg_geom)
# hrsl_mdg_gdf = gpd.GeoDataFrame(hrsl_mdg_gdf, geometry=hrsl_mdg_gdf[0], crs="EPSG:4326")
# hrsl_mdg_gdf = hrsl_mdg_gdf[["geometry"]]
# hrsl_mdg_gdf.reset_index(level=0, inplace=True)

In [None]:
# hrsl_mdg_gdf.to_file('../data/hrsl_mdg.gpkg', driver='GPKG')

In [None]:
# hrsl_mdg_gdf.head()

If you have already run the cells above and saved their output, run only the cell below to load it.

In [None]:
hrsl_mdg_gdf = gpd.read_file('../data/hrsl_mdg.gpkg', driver='GPKG')
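
The commented cells above appear to turn every valid HRSL pixel into its own polygon. `rasterio.features.shapes` groups connected pixels that share a value, so filling the array with random values presumably keeps neighbouring pixels from being merged into one shape. The sketch below illustrates the same per-pixel idea without rasterio. The function name `pixel_squares`, the mask, the origin and the pixel size are all hypothetical, and bounding boxes stand in for polygons; it is not the notebook's implementation.

```python
def pixel_squares(mask, origin_x, origin_y, pixel_size):
    """Return {pixel_id: (minx, miny, maxx, maxy)} for every valid pixel.

    mask is a list of rows; a truthy entry marks a pixel with data.
    (origin_x, origin_y) is the top-left corner of the grid and rows run
    north to south, as in a north-up raster.
    """
    squares = {}
    pixel_id = 0
    for row, values in enumerate(mask):
        for col, valid in enumerate(values):
            if not valid:
                continue
            minx = origin_x + col * pixel_size
            maxy = origin_y - row * pixel_size
            squares[pixel_id] = (minx, maxy - pixel_size, minx + pixel_size, maxy)
            pixel_id += 1
    return squares


# Hypothetical 2x3 mask: 255 marks valid pixels, 0 marks nodata.
mask = [[255, 0, 255],
        [0, 255, 0]]
squares = pixel_squares(mask, origin_x=0.0, origin_y=2.0, pixel_size=1.0)
```

Each valid pixel gets a sequential id, which plays the role of the `index` column used in the later steps.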

#### Load OSM dataset

In [None]:
# osm_mdg = gpd.read_file(
#     "../download_data/mdg_osm_jan_2020_buildings.gpkg/mdg_osm_jan_2020_buildings.gpkg",
#     driver="GPKG",
# )

#### Spatial join and get mapped pixels

In [None]:
# mdg_pixels_with_buildings = gpd.sjoin(
#     hrsl_mdg_gdf, osm_mdg, how="inner", predicate="intersects"  # use op= on geopandas < 0.10
# )

In [None]:
# mdg_pixels_with_buildings = mdg_pixels_with_buildings.drop_duplicates(subset='index')

In [None]:
# mdg_pixels_with_buildings = mdg_pixels_with_buildings[['index', 'geometry']]

In [None]:
# mdg_pixels_with_buildings.to_file('../data/mdg_pixels_with_buildings.gpkg', driver='GPKG')

If you have already run the cells above and saved their output, run only the cell below to load it.

In [None]:
mdg_pixels_with_buildings = gpd.read_file('../data/mdg_pixels_with_buildings.gpkg', driver='GPKG')
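
An inner spatial join returns one row per intersecting pixel-building pair, so a pixel that contains several buildings appears several times; deduplicating on the pixel id leaves one row per mapped pixel. The sketch below shows this on toy data, using axis-aligned boxes in place of real geometries. All names and coordinates are hypothetical, and the brute-force loop is only for illustration (geopandas uses a spatial index instead).

```python
def boxes_intersect(a, b):
    """True if two (minx, miny, maxx, maxy) boxes touch or overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def inner_join(pixels, buildings):
    """Return one (pixel_id, building_id) row per intersecting pair."""
    return [
        (pid, bid)
        for pid, pbox in pixels.items()
        for bid, bbox in buildings.items()
        if boxes_intersect(pbox, bbox)
    ]


# Hypothetical data: three pixels and two buildings that both sit in pixel 0.
pixels = {0: (0, 0, 1, 1), 1: (1.5, 0, 2.5, 1), 2: (3, 0, 4, 1)}
buildings = {"a": (0.1, 0.1, 0.3, 0.3), "b": (0.6, 0.6, 0.8, 0.8)}

rows = inner_join(pixels, buildings)                 # pixel 0 appears twice
mapped_pixel_ids = sorted({pid for pid, _ in rows})  # one entry per pixel
```

Counting `rows` instead of `mapped_pixel_ids` would overstate the number of mapped pixels, which is why the deduplication step matters for the completeness figure.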

#### Get unmapped pixels

In [None]:
# mdg_pixels_no_buildings = pd.merge(hrsl_mdg_gdf, mdg_pixels_with_buildings, how='outer', indicator=True)

In [None]:
# mdg_pixels_no_buildings = mdg_pixels_no_buildings[mdg_pixels_no_buildings['_merge'] == 'left_only']

In [None]:
# mdg_pixels_no_buildings = mdg_pixels_no_buildings[['index', 'geometry']]

In [None]:
# mdg_pixels_no_buildings.to_file('../data/mdg_pixels_no_buildings.gpkg', driver='GPKG')

If you have already run the cells above and saved their output, run only the cell below to load it.

In [None]:
mdg_pixels_no_buildings = gpd.read_file('../data/mdg_pixels_no_buildings.gpkg', driver='GPKG')
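
The outer merge with `indicator=True` followed by the `left_only` filter is an anti-join: it keeps the HRSL pixels that have no match among the mapped pixels. A minimal sketch of the same logic on plain pixel ids, with hypothetical names and values:

```python
def unmapped_ids(all_pixel_ids, mapped_pixel_ids):
    """Return the pixel ids that never matched a building, in input order."""
    mapped = set(mapped_pixel_ids)
    return [pid for pid in all_pixel_ids if pid not in mapped]


# Hypothetical ids: four settlement pixels, two of which contain buildings.
all_ids = [0, 1, 2, 3]
mapped = [0, 2]
unmapped = unmapped_ids(all_ids, mapped)
```

Mapped and unmapped ids together should account for every settlement pixel exactly once.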

#### Calculate percentage completeness

In [26]:
len(mdg_pixels_with_buildings) / (len(mdg_pixels_with_buildings) + len(mdg_pixels_no_buildings)) * 100

10.890343831145868
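
The expression above computes completeness as the share of settlement pixels that contain at least one OSM building: mapped / (mapped + unmapped) × 100. The same calculation can be wrapped in a small function so that it can be reused per country; `completeness_percent` is a hypothetical name, and the guard against an empty input is an added assumption.

```python
def completeness_percent(n_mapped, n_unmapped):
    """Percentage of pixels that contain at least one mapped building."""
    total = n_mapped + n_unmapped
    if total == 0:
        raise ValueError("no pixels to evaluate")
    return n_mapped / total * 100


# Hypothetical counts: 1 mapped pixel out of 4 in total.
example = completeness_percent(1, 3)
```

In the notebook this would be called as `completeness_percent(len(mdg_pixels_with_buildings), len(mdg_pixels_no_buildings))`.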

### Philippines

#### Load HRSL dataset

#### Load OSM dataset

In [None]:
osm_phl = gpd.read_file(
    "../download_data/phl_osm_jan_2020_buildings.gpkg/phl_osm_jan_2020_buildings.gpkg",
    driver="GPKG",
)

#### Spatial join

#### Calculate percentage completeness