# Methodology

This notebook shows the methodology for the paper "Measuring OpenStreetMap building footprint completeness using human settlement layers".

The methodology has four main parts:
1. Download relevant data
2. Get intersection of HRSL pixels and OSM buildings
3. Aggregate mapped/unmapped pixels to admin boundaries
4. Calculate percent completeness on admin boundaries

## Setup

We import all of the relevant packages as well as download the datasets.

For reference, here are the original download links for the datasets:
1. *High Resolution Settlement Layer* (HRSL) ([Philippines](https://data.humdata.org/dataset/philippines-high-resolution-population-density-maps-demographic-estimates)) ([Madagascar](https://data.humdata.org/dataset/highresolutionpopulationdensitymaps-mdg))
2. *Administrative Boundaries* ([Philippines](https://data.humdata.org/dataset/philippines-administrative-levels-0-to-3)) ([Madagascar](https://data.humdata.org/dataset/madagascar-administrative-level-0-4-boundaries))
- To search for other countries, the latest HRSL and admin boundary datasets can be found on the [Humanitarian Data Exchange website](https://data.humdata.org/).
3. *OpenStreetMap (OSM)* ([Philippines](https://download.geofabrik.de/asia/philippines.html)) ([Madagascar](https://download.geofabrik.de/africa/madagascar.html))

The notebook will use Madagascar/Philippines data for the download links. You may change this for your own use case!

In [40]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import shapely
import geopandas as gpd
import rasterio
import rasterio.features

import wget

import os
import glob
from zipfile import ZipFile

from datetime import datetime

Create two folders in your working directory: `download_data`, which will contain all raw downloaded data, and `data`, which will contain all processed data.

In [3]:
# Create the folders if they do not exist yet
os.makedirs("../download_data", exist_ok=True)
os.makedirs("../data", exist_ok=True)

### Filename set-up

In the following cell, define filename labels for the country, admin level, month, and year. This will be reflected in the output files.

In [48]:
# Change these to your desired filename labels
country_code = "mdg"
adm_code = "adm4"

# Time labels
# Add leading zero as needed
month = "july"
month_num = "07" #Two digits expected
year = "2021" 
day = "12" # Two digits expected
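
As a rough illustration of how these labels are used, the sketch below builds one of the output filenames saved at the end of this notebook. Deriving the date labels from a `datetime.date` is an assumption added here for convenience (the notebook sets them by hand), and `run_date` is a hypothetical name.

```python
from datetime import date

# Hypothetical run date; the notebook sets these labels by hand
run_date = date(2021, 7, 12)

country_code = "mdg"
adm_code = "adm4"

# Zero-padded, two-digit month and day labels
year = str(run_date.year)
month_num = "{:02d}".format(run_date.month)
day = "{:02d}".format(run_date.day)

# Same pattern as the final GeoPackage output of this notebook
filename = "../data/mapthegap-{}-{}-{}-{}-{}.gpkg".format(
    country_code, adm_code, year, month_num, day
)
print(filename)  # ../data/mapthegap-mdg-adm4-2021-07-12.gpkg
```

Formatting the numbers with `{:02d}` keeps the leading zero that the filename pattern expects.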

### HRSL download

For the HRSL layer, we get the distribution of the whole population of Madagascar for 2018. Change this to the relevant HRSL layer for your target country.

Run the cells below only if you have not yet downloaded the HRSL datasets.

In [7]:
# Change this url for your own country
hrsl_url = "https://data.humdata.org/dataset/9e7ff424-7b9c-42cc-b869-5756fcad0956/resource/1fafdd04-8e0b-4c2a-b4dc-8f3ff39e3015/download/population_mdg_2018-10-01.zip"

In [16]:
# Download and save to ../download_data
wget.download(hrsl_url, '../download_data/hrsl_{}_{}_{}.zip'.format(country_code,month,year))

'../download_data/hrsl_mdg_july_2021.zip'

### Admin boundary download

Run the cells below only if you have not yet downloaded the admin boundary datasets.

In [2]:
# Change this url for your own country
adm_url = "https://data.humdata.org/dataset/26fa506b-0727-4d9d-a590-d2abee21ee22/resource/ed94d52e-349e-41be-80cb-62dc0435bd34/download/mdg_adm_bngrc_ocha_20181031_shp.zip"

In [11]:
# Download and save to ../download_data
wget.download(adm_url, '../download_data/{}_adm_all.zip'.format(country_code))

'../download_data/mdg_adm_all (1).zip'

### OSM download

Run the cells below only if you have not yet downloaded the OSM datasets.

In [12]:
# Change this url for your own country
osm_url = "https://download.geofabrik.de/africa/madagascar-latest-free.shp.zip"

In [14]:
# Download and save to ../download_data
wget.download(osm_url, '../download_data/{}-latest-free.shp.zip'.format(country_code))

'../download_data/mdg-latest-free.shp.zip'

### Unzip all datasets

In [18]:
for i in glob.glob("../download_data/*.zip"):
    if os.path.isdir(os.path.splitext(i)[0]):
        pass
    else:
        with ZipFile(i) as myzip:
            myzip.extractall(os.path.splitext(i)[0])

## Get intersection of HRSL pixels and OSM buildings

We use Facebook’s High Resolution Settlement Layer (HRSL) as a proxy ground truth for building footprints. We then measure data completeness as the “percentage completeness”: the percentage of all HRSL pixels that intersect an OSM building footprint.

![](../assets/formula.png)

Pixels that intersect OSM buildings are *mapped*.

Pixels that do not intersect OSM buildings are *unmapped*.
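
As a minimal sketch of the calculation, assuming the formula is the share of mapped pixels among all HRSL pixels (which is what the code later in this notebook computes), the helper below turns two pixel counts into a percentage. `percent_completeness` is a hypothetical name, not a function from the paper.

```python
def percent_completeness(mapped, unmapped):
    """Return the share of mapped pixels, in percent."""
    total = mapped + unmapped
    if total == 0:
        # No HRSL pixels at all: completeness is undefined
        return float("nan")
    return mapped / total * 100


# Example counts: 110 mapped pixels and 7 unmapped pixels
print(round(percent_completeness(110, 7), 6))  # 94.017094
```

Returning `nan` for an empty region is a design choice made for this sketch; it avoids a division by zero without pretending the region is fully mapped or fully unmapped.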

In [19]:
# Open tiff file
hrsl_layer = rasterio.open(
    "../download_data/hrsl_{}_{}_{}/population_mdg_2018-10-01.tif"\
    .format(country_code,month,year)
)

In [20]:
# Read the validity mask of the first band
# (nonzero where the pixel holds data, 0 where it is nodata)
hrsl_band1_mask = hrsl_layer.read_masks(1)

### Convert HRSL dataset from raster to vector

In [21]:
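# rasterio.features.shapes merges connected pixels that share a value,
# so we give every pixel a random value to keep each pixel as its own polygon
hrsl_rand = np.random.rand(*hrsl_band1_mask.shape)
hrsl_rand = hrsl_rand.astype("float32")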

In [22]:
hrsl_band1_poly = list(
    rasterio.features.shapes(
        hrsl_rand, transform=hrsl_layer.transform, mask=hrsl_band1_mask
    )
)

In [23]:
hrsl_geom = []
for geom, value in hrsl_band1_poly:
    geom = shapely.geometry.shape(geom)
    hrsl_geom.append(geom)

In [24]:
hrsl_gdf = pd.DataFrame(hrsl_geom)
hrsl_gdf = gpd.GeoDataFrame(hrsl_gdf, geometry=hrsl_gdf[0], crs="EPSG:4326")
hrsl_gdf.drop(columns=[0], inplace=True)
hrsl_gdf.reset_index(level=0, inplace=True)

In [25]:
hrsl_gdf.to_file('../data/hrsl_{}.gpkg'.format(country_code), driver='GPKG')

### Load OSM dataset

In [26]:
# The file format is inferred from the .shp extension
osm_gdf = gpd.read_file(
    "../download_data/{}-latest-free.shp/gis_osm_buildings_a_free_1.shp".format(country_code)
)

In [27]:
osm_gdf.to_file('../data/{}-latest-free.gpkg'.format(country_code),driver = 'GPKG')

### Get mapped pixels

In [28]:
# Just run this so you don't have to rerun everything up top
hrsl_gdf = gpd.read_file('../data/hrsl_{}.gpkg'.format(country_code), driver='GPKG')

In [29]:
# 2021 OSM File
osm_gdf = gpd.read_file('../data/{}-latest-free.gpkg'.format(country_code),driver = 'GPKG')

In [30]:
# `predicate` replaces the deprecated `op` argument (geopandas >= 0.10)
pixels_with_buildings = gpd.sjoin(
    hrsl_gdf, osm_gdf, how="inner", predicate="intersects"
)

In [31]:
pixels_with_buildings = pixels_with_buildings.drop_duplicates(subset='index')

In [35]:
# Show the result
pixels_with_buildings.head(5)

Unnamed: 0,index,geometry
1,1,"POLYGON ((49.27153 -11.95597, 49.27153 -11.956..."
5,5,"POLYGON ((49.25264 -11.99097, 49.25264 -11.991..."
6,6,"POLYGON ((49.22847 -12.04597, 49.22847 -12.046..."
7,7,"POLYGON ((49.22847 -12.04625, 49.22847 -12.046..."
9,9,"POLYGON ((49.23264 -12.05403, 49.23264 -12.054..."


In [34]:
# Drop unnecessary columns
pixels_with_buildings.drop(columns=['index_right', 'osm_id', 'code', 'fclass', 'name', 'type'], inplace=True)

In [36]:
# Save to file
pixels_with_buildings.to_file('../data/{}_pixels_with_buildings.gpkg'.format(country_code), driver='GPKG')

### Get unmapped pixels

In [37]:
pixels_no_buildings = pd.merge(hrsl_gdf, pixels_with_buildings, how='outer', indicator=True)

In [38]:
pixels_no_buildings = pixels_no_buildings[pixels_no_buildings['_merge'] == 'left_only']

In [40]:
pixels_no_buildings.drop(columns=['_merge'], inplace=True)
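
The three cells above amount to an anti-join: keep the rows of `hrsl_gdf` that have no match in `pixels_with_buildings`. A minimal sketch of the same pattern, using plain DataFrames with made-up values instead of the GeoDataFrames used here (`all_pixels`, `mapped`, and `unmapped` are hypothetical names):

```python
import pandas as pd

# Toy data (hypothetical values): four pixels, two of them mapped
all_pixels = pd.DataFrame({"index": [0, 1, 2, 3]})
mapped = pd.DataFrame({"index": [1, 3]})

# indicator=True adds a `_merge` column telling where each row came from
merged = pd.merge(all_pixels, mapped, how="outer", indicator=True)

# Rows found only in the left table are the unmapped pixels
unmapped = merged[merged["_merge"] == "left_only"].drop(columns=["_merge"])
print(unmapped["index"].tolist())  # [0, 2]
```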

In [41]:
# 2021
pixels_no_buildings.to_file('../data/{}_pixels_no_buildings.gpkg'.format(country_code), driver='GPKG')

### Calculate percent completeness

In [42]:
len(pixels_with_buildings) / (len(pixels_with_buildings) + len(pixels_no_buildings)) * 100

17.415377698182457

## Aggregate to different admin boundaries

In this section, we label each pixel based on which administrative region it belongs to. This data will allow us to calculate percent completeness per region.

### Turn mapped pixels from a polygon layer to a point layer

In [8]:
# # Just run this so you don't have to rerun everything up top
# pixels_with_buildings = gpd.read_file(
#     "../data/{}_pixels_with_buildings.gpkg".format(country_code), driver="GPKG"
# )

In [43]:
pixels_with_buildings["geometry"] = pixels_with_buildings["geometry"].centroid



### Turn unmapped pixels from a polygon layer to a point layer

In [10]:
# # Just run this so you don't have to rerun everything up top
# pixels_no_buildings = gpd.read_file(
#     "../data/{}_pixels_no_buildings.gpkg".format(country_code), driver="GPKG"
# )

In [44]:
pixels_no_buildings["geometry"] = pixels_no_buildings["geometry"].centroid



### Load admin boundary

For this section, we aggregate the pixels using the level 4 admin boundaries.

Note: It is generally preferable to use the **lowest/most granular** boundary level. For admin boundaries available in HDX, the lowest-level boundary also contains data on which higher-level boundaries it belongs to. This allows us to tag each pixel with multiple admin levels in one `.sjoin()`. However, code runtime will be much longer compared to using higher levels.

In [46]:
# Change this link depending on the file name of your 
# chosen admin boundary
adm_gdf = gpd.read_file(
    "../download_data/{}_adm_all/mdg_admbnda_adm4_BNGRC_OCHA_20181031.shp".format(country_code)
)

In [48]:
# Show the admin boundaries structure
adm_gdf.head()

Unnamed: 0,ADM0_PCODE,ADM0_EN,ADM1_PCODE,ADM1_EN,ADM1_TYPE,ADM2_PCODE,ADM2_EN,ADM2_TYPE,ADM3_PCODE,ADM3_EN,ADM3_TYPE,ADM4_PCODE,ADM4_EN,ADM4_TYPE,PROV_CODE_,OLD_PROVIN,PROV_TYPE,NOTES,SOURCE,geometry
0,MG,Madagascar,MG22,Amoron I Mania,Region,MG22203,Ambositra,District,MG22203090,Andina,Commune,MG22203090008,Ampasina,Fokontany,2,Fianarantsoa,Old Provinces/Faritany dissolved in 2007,,BNGRC (National Disaster Management Office) Fo...,"POLYGON ((47.10540 -20.51447, 47.11485 -20.514..."
1,MG,Madagascar,MG22,Amoron I Mania,Region,MG22203,Ambositra,District,MG22203090,Andina,Commune,MG22203090011,Antanifotsy,Fokontany,2,Fianarantsoa,Old Provinces/Faritany dissolved in 2007,,BNGRC (National Disaster Management Office) Fo...,"POLYGON ((47.13580 -20.54133, 47.13600 -20.544..."
2,MG,Madagascar,MG24,Ihorombe,Region,MG24216,Ihosy,District,MG24216192,Andiolava,Commune,MG24216192006,Amboloando,Fokontany,2,Fianarantsoa,Old Provinces/Faritany dissolved in 2007,,BNGRC (National Disaster Management Office) Fo...,"POLYGON ((45.58350 -22.53003, 45.59102 -22.534..."
3,MG,Madagascar,MG24,Ihorombe,Region,MG24216,Ihosy,District,MG24216192,Andiolava,Commune,MG24216192002,Vatambe Nanarena,Fokontany,2,Fianarantsoa,Old Provinces/Faritany dissolved in 2007,,BNGRC (National Disaster Management Office) Fo...,"POLYGON ((45.51825 -22.27433, 45.51900 -22.274..."
4,MG,Madagascar,MG24,Ihorombe,Region,MG24216,Ihosy,District,MG24216192,Andiolava,Commune,MG24216192003,Vohimary,Fokontany,2,Fianarantsoa,Old Provinces/Faritany dissolved in 2007,,BNGRC (National Disaster Management Office) Fo...,"POLYGON ((45.62246 -22.40084, 45.62173 -22.414..."


In [49]:
# Check for unique values
adm_gdf.nunique()

ADM0_PCODE        1
ADM0_EN           1
ADM1_PCODE       22
ADM1_EN          22
ADM1_TYPE         1
ADM2_PCODE      119
ADM2_EN         119
ADM2_TYPE         1
ADM3_PCODE     1579
ADM3_EN        1429
ADM3_TYPE         1
ADM4_PCODE    17465
ADM4_EN       10977
ADM4_TYPE         1
PROV_CODE_        6
OLD_PROVIN        6
PROV_TYPE         1
NOTES             1
SOURCE           10
geometry      17465
dtype: int64

Note: Make sure that the target PCODEs for the geometries are unique or 1:1. If not, you may need to create a unique index to label the boundaries.
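
In the output above, `ADM4_PCODE` has as many unique values as there are geometries (17465), while `ADM4_EN` does not (10977), so names alone cannot label the boundaries. A minimal sketch of such a check on toy data with made-up values; `boundary_id` is a hypothetical fallback column, not part of the HDX data:

```python
import pandas as pd

# Toy boundary table (hypothetical values): names repeat, PCODEs do not
adm = pd.DataFrame({
    "ADM4_PCODE": ["P001", "P002", "P003"],
    "ADM4_EN": ["Name A", "Name A", "Name B"],
})

target = "ADM4_PCODE"
if adm[target].is_unique:
    print("{} can be used as a 1:1 label".format(target))
else:
    # Fall back to a surrogate key built from the row position
    adm["boundary_id"] = range(len(adm))

# Names are not unique in this toy table
print(adm["ADM4_EN"].is_unique)
```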

### Find intersection of mapped pixels and level 4 admin boundary

In [50]:
# `predicate` replaces the deprecated `op` argument (geopandas >= 0.10)
pixels_with_buildings_sjoin_adm = gpd.sjoin(
    pixels_with_buildings, adm_gdf, how="left", predicate="within"
)

In [55]:
# Drop all columns except for geometry and PCODEs
pixels_with_buildings_sjoin_adm.drop(
    columns=[
        'index_right', 
        'ADM0_PCODE',
        'ADM0_EN',
        'ADM1_EN',
        'ADM1_TYPE',
        'ADM2_EN',
        'ADM2_TYPE',
        'ADM3_EN',
        'ADM3_TYPE',
        'ADM4_EN',
        'ADM4_TYPE',
        'PROV_CODE_',
        'OLD_PROVIN',
        'PROV_TYPE',
        'NOTES',
        'SOURCE'
    ],
    inplace=True,
)

In [56]:
# Show the output
pixels_with_buildings_sjoin_adm.head()

Unnamed: 0,index,geometry,ADM1_PCODE,ADM2_PCODE,ADM3_PCODE,ADM4_PCODE
1,1,POINT (49.27167 -11.95611),MG71,MG71713,MG71713170,MG71713170002
5,5,POINT (49.25278 -11.99111),MG71,MG71713,MG71713170,MG71713170002
6,6,POINT (49.22861 -12.04611),MG71,MG71713,MG71713170,MG71713170001
7,7,POINT (49.22861 -12.04639),MG71,MG71713,MG71713170,MG71713170001
9,9,POINT (49.23278 -12.05417),MG71,MG71713,MG71713170,MG71713170001


In [58]:
pixels_with_buildings_sjoin_adm.to_file(
    "../data/{}_pixels_with_buildings_sjoin_{}.gpkg".format(country_code,adm_code), driver="GPKG"
)

### Find intersection of unmapped pixels and level 4 admin boundary

In [60]:
# `predicate` replaces the deprecated `op` argument (geopandas >= 0.10)
pixels_no_buildings_sjoin_adm = gpd.sjoin(
    pixels_no_buildings, adm_gdf, how="left", predicate="within"
)

In [62]:
# Drop all columns except for geometry and PCODEs
pixels_no_buildings_sjoin_adm.drop(
    columns=[
        'index_right', 
        'ADM0_PCODE',
        'ADM0_EN',
        'ADM1_EN',
        'ADM1_TYPE',
        'ADM2_EN',
        'ADM2_TYPE',
        'ADM3_EN',
        'ADM3_TYPE',
        'ADM4_EN',
        'ADM4_TYPE',
        'PROV_CODE_',
        'OLD_PROVIN',
        'PROV_TYPE',
        'NOTES',
        'SOURCE'
    ],
    inplace=True,
)

In [63]:
# Show the output
pixels_no_buildings_sjoin_adm.head()

Unnamed: 0,index,geometry,ADM1_PCODE,ADM2_PCODE,ADM3_PCODE,ADM4_PCODE
0,0,POINT (49.24500 -11.95583),MG71,MG71713,MG71713170,MG71713170002
2,2,POINT (49.26056 -11.97472),MG71,MG71713,MG71713170,MG71713170002
3,3,POINT (49.26472 -11.98000),MG71,MG71713,MG71713170,MG71713170002
4,4,POINT (49.26472 -11.98028),MG71,MG71713,MG71713170,MG71713170002
8,8,POINT (49.22889 -12.04639),MG71,MG71713,MG71713170,MG71713170001


In [64]:
pixels_no_buildings_sjoin_adm.to_file(
    "../data/{}_pixels_no_buildings_sjoin_{}.gpkg".format(country_code,adm_code), driver="GPKG"
)

## Saving pixel output

The final pixel output will have each row as a pixel, labelled by admin boundary PCODE and mapped/unmapped status.

In [None]:
# # Just run this so you don't have to rerun everything up top
# pixels_with_buildings_sjoin_adm = gpd.read_file(
#     "../data/{}_pixels_with_buildings_sjoin_{}.gpkg".format(country_code,adm_code), driver="GPKG"
# )

In [None]:
# # Just run this so you don't have to rerun everything up top
# pixels_no_buildings_sjoin_adm = gpd.read_file(
#     "../data/{}_pixels_no_buildings_sjoin_{}.gpkg".format(country_code,adm_code), driver="GPKG"
# )

In [65]:
# Drop nan values in the dataframe
# Specify columns you want to check
pixels_no_buildings_sjoin_adm.dropna(
    subset=["ADM4_PCODE","ADM3_PCODE","ADM2_PCODE","ADM1_PCODE"], inplace=True
)

pixels_with_buildings_sjoin_adm.dropna(
    subset=["ADM4_PCODE","ADM3_PCODE","ADM2_PCODE","ADM1_PCODE"], inplace=True
)

In [66]:
# Adding a column to indicate that these pixels are mapped/unmapped
pixels_with_buildings_sjoin_adm["status"] = "mapped"
pixels_no_buildings_sjoin_adm["status"] = "unmapped"

In [67]:
# Join the mapped and unmapped pixels together
pixels_all = pd.concat(
    [pixels_with_buildings_sjoin_adm, pixels_no_buildings_sjoin_adm]
)

The final output looks like this:

In [68]:
pixels_all.head()

Unnamed: 0,index,geometry,ADM1_PCODE,ADM2_PCODE,ADM3_PCODE,ADM4_PCODE,status
1,1,POINT (49.27167 -11.95611),MG71,MG71713,MG71713170,MG71713170002,mapped
5,5,POINT (49.25278 -11.99111),MG71,MG71713,MG71713170,MG71713170002,mapped
6,6,POINT (49.22861 -12.04611),MG71,MG71713,MG71713170,MG71713170001,mapped
7,7,POINT (49.22861 -12.04639),MG71,MG71713,MG71713170,MG71713170001,mapped
9,9,POINT (49.23278 -12.05417),MG71,MG71713,MG71713170,MG71713170001,mapped


In [70]:
# Save all columns (including geometry) to gpkg
pixels_all.to_file("../data/{}_pixels_all_{}_{}.gpkg".format(country_code,month,year), driver="GPKG")

In [71]:
# Drop geometry column
pixels_all.drop(columns=["geometry"], inplace=True)

In [72]:
# Save remaining columns as csv
pixels_all.to_csv("../data/{}_pixels_all_{}_{}.csv".format(country_code,month,year), index=False)

## Calculate percent completeness per admin boundary level

In the following code snippets, we use a pivot table to get the number of mapped and unmapped pixels for each administrative region, then calculate the percent completeness.

We use `ADM4_PCODE` as the index for each admin region. Change this depending on your own use case.
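
To illustrate changing the index, the sketch below counts mapped and unmapped pixels per `ADM1_PCODE` instead. It uses toy data with made-up values and a `groupby`/`unstack` in place of the pivot table used in this notebook; `level` and `counts` are hypothetical names.

```python
import pandas as pd

# Toy pixel table (hypothetical values)
pixels = pd.DataFrame({
    "index": [1, 5, 6, 7, 9, 0],
    "ADM1_PCODE": ["MG71", "MG71", "MG71", "MG71", "MG72", "MG72"],
    "status": ["mapped", "mapped", "unmapped", "mapped", "unmapped", "unmapped"],
})

# Change this to the admin level you want to aggregate on
level = "ADM1_PCODE"

# One row per region, one column per status, zero where a status is absent
counts = pixels.groupby([level, "status"]).size().unstack(fill_value=0)

counts["percent_completeness"] = (
    counts["mapped"] / (counts["mapped"] + counts["unmapped"]) * 100
)
print(counts)
```

Because every pixel row carries the PCODEs of all admin levels, only the `level` label needs to change to aggregate at a coarser level.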

In [5]:
# Just run this so you don't have to rerun everything up top
pixels_all = pd.read_csv("../data/{}_pixels_all_{}_{}.csv".format(country_code,month,year))

In [7]:
pixels_all.head()

Unnamed: 0,index,ADM1_PCODE,ADM2_PCODE,ADM3_PCODE,ADM4_PCODE,status
0,1,MG71,MG71713,MG71713170,MG71713170002,mapped
1,5,MG71,MG71713,MG71713170,MG71713170002,mapped
2,6,MG71,MG71713,MG71713170,MG71713170001,mapped
3,7,MG71,MG71713,MG71713170,MG71713170001,mapped
4,9,MG71,MG71713,MG71713170,MG71713170001,mapped


In [14]:
# Load admin boundaries with relevant information
adm_gdf = gpd.read_file(
    "../download_data/{}_adm_all/mdg_admbnda_adm4_BNGRC_OCHA_20181031.shp".format(country_code)
)

In [26]:
adm_gdf.head()

Unnamed: 0,ADM0_PCODE,ADM0_EN,ADM1_PCODE,ADM1_EN,ADM1_TYPE,ADM2_PCODE,ADM2_EN,ADM2_TYPE,ADM3_PCODE,ADM3_EN,ADM3_TYPE,ADM4_PCODE,ADM4_EN,ADM4_TYPE,PROV_CODE_,OLD_PROVIN,PROV_TYPE,NOTES,SOURCE,geometry
0,MG,Madagascar,MG22,Amoron I Mania,Region,MG22203,Ambositra,District,MG22203090,Andina,Commune,MG22203090008,Ampasina,Fokontany,2,Fianarantsoa,Old Provinces/Faritany dissolved in 2007,,BNGRC (National Disaster Management Office) Fo...,"POLYGON ((47.10540 -20.51447, 47.11485 -20.514..."
1,MG,Madagascar,MG22,Amoron I Mania,Region,MG22203,Ambositra,District,MG22203090,Andina,Commune,MG22203090011,Antanifotsy,Fokontany,2,Fianarantsoa,Old Provinces/Faritany dissolved in 2007,,BNGRC (National Disaster Management Office) Fo...,"POLYGON ((47.13580 -20.54133, 47.13600 -20.544..."
2,MG,Madagascar,MG24,Ihorombe,Region,MG24216,Ihosy,District,MG24216192,Andiolava,Commune,MG24216192006,Amboloando,Fokontany,2,Fianarantsoa,Old Provinces/Faritany dissolved in 2007,,BNGRC (National Disaster Management Office) Fo...,"POLYGON ((45.58350 -22.53003, 45.59102 -22.534..."
3,MG,Madagascar,MG24,Ihorombe,Region,MG24216,Ihosy,District,MG24216192,Andiolava,Commune,MG24216192002,Vatambe Nanarena,Fokontany,2,Fianarantsoa,Old Provinces/Faritany dissolved in 2007,,BNGRC (National Disaster Management Office) Fo...,"POLYGON ((45.51825 -22.27433, 45.51900 -22.274..."
4,MG,Madagascar,MG24,Ihorombe,Region,MG24216,Ihosy,District,MG24216192,Andiolava,Commune,MG24216192003,Vohimary,Fokontany,2,Fianarantsoa,Old Provinces/Faritany dissolved in 2007,,BNGRC (National Disaster Management Office) Fo...,"POLYGON ((45.62246 -22.40084, 45.62173 -22.414..."


### Pivot table

In [11]:
adm_pivot = pd.pivot_table(pixels_all[["ADM4_PCODE","status", "index"]], index = ["ADM4_PCODE"], columns = ["status"], aggfunc="count", fill_value=0)

In [12]:
adm_pivot.head()

Unnamed: 0_level_0,index,index
status,mapped,unmapped
ADM4_PCODE,Unnamed: 1_level_2,Unnamed: 2_level_2
MG11101001001,110,7
MG11101001002,28,6
MG11101001003,69,22
MG11101001004,98,35
MG11101001005,47,39


In [13]:
adm_pivot.info()

<class 'pandas.core.frame.DataFrame'>
Index: 17395 entries, MG11101001001 to MG72716350005
Data columns (total 2 columns):
 #   Column             Non-Null Count  Dtype
---  ------             --------------  -----
 0   (index, mapped)    17395 non-null  int64
 1   (index, unmapped)  17395 non-null  int64
dtypes: int64(2)
memory usage: 407.7+ KB


The resulting dataframe has MultiIndex columns (the counted column `index` on top, `status` below), which we will flatten in the next code blocks.
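
To see where the two column levels come from, here is a small sketch on toy data with made-up values. It also shows one possible alternative, passing a single column name as `values`, which returns flat columns directly; `two_level` and `flat` are hypothetical names.

```python
import pandas as pd

# Toy pixel table (hypothetical values)
pixels = pd.DataFrame({
    "index": [1, 5, 6, 7],
    "ADM4_PCODE": ["A", "A", "B", "B"],
    "status": ["mapped", "unmapped", "mapped", "mapped"],
})

# Without `values`, the columns get two levels: (counted column, status)
two_level = pd.pivot_table(
    pixels, index=["ADM4_PCODE"], columns=["status"], aggfunc="count", fill_value=0
)
print(two_level.columns.nlevels)  # 2

# Keeping only the second level leaves plain status names
two_level.columns = two_level.columns.get_level_values(1)
print(list(two_level.columns))  # ['mapped', 'unmapped']

# Alternative: a single column name as `values` gives flat columns directly
flat = pd.pivot_table(
    pixels, values="index", index=["ADM4_PCODE"], columns=["status"],
    aggfunc="count", fill_value=0,
)
print(flat.columns.nlevels)  # 1
```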

In [16]:
# Flatten the column MultiIndex, keeping only the status level
adm_pivot.columns = adm_pivot.columns.get_level_values(1)
# Move ADM4_PCODE from the row index into a regular column
adm_pivot = adm_pivot.reset_index()

In [21]:
# Dataframe now has a regular index!
adm_pivot.head()

status,ADM4_PCODE,mapped,unmapped
0,MG11101001001,110,7
1,MG11101001002,28,6
2,MG11101001003,69,22
3,MG11101001004,98,35
4,MG11101001005,47,39


In [23]:
# Renaming mapped and unmapped columns
adm_pivot.rename(columns = {
    'mapped':'pixels_withbuilding_{}{}'.format(month,year),
    'unmapped':'pixels_nobuilding_{}{}'.format(month,year)
    },
    inplace=True
)

# Adding a column for percent completeness
mapped_col = 'pixels_withbuilding_{}{}'.format(month,year)
unmapped_col = 'pixels_nobuilding_{}{}'.format(month,year)
adm_pivot['percentage_completeness_{}{}'.format(month,year)] = (
    adm_pivot[mapped_col] / (adm_pivot[mapped_col] + adm_pivot[unmapped_col])
) * 100

In [24]:
adm_pivot.head()

status,ADM4_PCODE,pixels_withbuilding_july2021,pixels_nobuilding_july2021,percentage_completeness_july2021
0,MG11101001001,110,7,94.017094
1,MG11101001002,28,6,82.352941
2,MG11101001003,69,22,75.824176
3,MG11101001004,98,35,73.684211
4,MG11101001005,47,39,54.651163


By this point, we have calculated the percentage completeness for each region! Each row corresponds to a unique boundary, with columns for the mapped and unmapped pixel counts and the corresponding percent completeness.

Finally, we will join the calculated data with the admin boundaries information. Take note of which columns you would require for the final output (usually PCODE, name, and geometry).

In [35]:
# Create new dataframe that will store adm_gdf
# with percentage completeness output
adm_gdf_with_output = adm_gdf

# Get only the columns we need to identify region
# Add/subtract from the columns list as needed
adm_gdf_with_output = adm_gdf_with_output[[
    'ADM4_EN',
    'ADM4_PCODE',
    'geometry'
 ]]

adm_gdf_with_output.head()

Unnamed: 0,ADM4_EN,ADM4_PCODE,geometry
0,Ampasina,MG22203090008,"POLYGON ((47.10540 -20.51447, 47.11485 -20.514..."
1,Antanifotsy,MG22203090011,"POLYGON ((47.13580 -20.54133, 47.13600 -20.544..."
2,Amboloando,MG24216192006,"POLYGON ((45.58350 -22.53003, 45.59102 -22.534..."
3,Vatambe Nanarena,MG24216192002,"POLYGON ((45.51825 -22.27433, 45.51900 -22.274..."
4,Vohimary,MG24216192003,"POLYGON ((45.62246 -22.40084, 45.62173 -22.414..."


In [36]:
# Left joining percentage completeness values
# to their respective regions
adm_gdf_with_output = pd.merge(adm_gdf_with_output,adm_pivot,how="left", on = "ADM4_PCODE")

In [37]:
adm_gdf_with_output.head()

Unnamed: 0,ADM4_EN,ADM4_PCODE,geometry,pixels_withbuilding_july2021,pixels_nobuilding_july2021,percentage_completeness_july2021
0,Ampasina,MG22203090008,"POLYGON ((47.10540 -20.51447, 47.11485 -20.514...",13.0,54.0,19.402985
1,Antanifotsy,MG22203090011,"POLYGON ((47.13580 -20.54133, 47.13600 -20.544...",0.0,45.0,0.0
2,Amboloando,MG24216192006,"POLYGON ((45.58350 -22.53003, 45.59102 -22.534...",0.0,221.0,0.0
3,Vatambe Nanarena,MG24216192002,"POLYGON ((45.51825 -22.27433, 45.51900 -22.274...",0.0,140.0,0.0
4,Vohimary,MG24216192003,"POLYGON ((45.62246 -22.40084, 45.62173 -22.414...",0.0,46.0,0.0


In [38]:
adm_gdf_with_output.info()

<class 'geopandas.geodataframe.GeoDataFrame'>
Int64Index: 17465 entries, 0 to 17464
Data columns (total 6 columns):
 #   Column                            Non-Null Count  Dtype   
---  ------                            --------------  -----   
 0   ADM4_EN                           17465 non-null  object  
 1   ADM4_PCODE                        17465 non-null  object  
 2   geometry                          17465 non-null  geometry
 3   pixels_withbuilding_july2021      17395 non-null  float64 
 4   pixels_nobuilding_july2021        17395 non-null  float64 
 5   percentage_completeness_july2021  17395 non-null  float64 
dtypes: float64(3), geometry(1), object(2)
memory usage: 955.1+ KB
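
The `info()` output above reports 17395 non-null values out of 17465 rows, so 70 boundaries received no pixel counts from the left join. This is also why the count columns are stored as floats. A minimal sketch of how such rows could be listed, on toy data with made-up values (`boundaries`, `counts`, and `no_pixels` are hypothetical names):

```python
import pandas as pd

# Toy data (hypothetical values): three boundaries, counts for only two
boundaries = pd.DataFrame({"ADM4_PCODE": ["A", "B", "C"]})
counts = pd.DataFrame({
    "ADM4_PCODE": ["A", "B"],
    "mapped": [3, 0],
    "unmapped": [1, 2],
})

joined = pd.merge(boundaries, counts, how="left", on="ADM4_PCODE")

# Boundaries without any pixel counts show up as NaN after the left join
no_pixels = joined[joined["mapped"].isna()]
print(no_pixels["ADM4_PCODE"].tolist())  # ['C']
```

Whether to keep these rows as missing or treat them separately depends on your use case; this notebook keeps them as is.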


### Saving percent completeness per region

In [51]:
# To gpkg
filename = "../data/mapthegap-{}-{}-{}-{}-{}.gpkg"\
.format(country_code,adm_code,year,month_num,day)
adm_gdf_with_output.to_file(filename,driver='GPKG')
filename

'../data/mapthegap-mdg-adm4-2021-07-12.gpkg'

In [52]:
# To csv
filename = "../data/mapthegap-{}-{}-{}-{}-{}.csv"\
.format(country_code,adm_code,year,month_num,day)
adm_gdf_with_output.to_csv(filename,index=False)
filename

'../data/mapthegap-mdg-adm4-2021-07-12.csv'

# Finish!
By this point, you should already have the percent completeness for the whole country and per admin region.

To convert this into tilesets for plotting, refer to `2_CreateTileset.ipynb`