# Methodology

This notebook shows the methodology for the paper "Measuring OpenStreetMap building footprint completeness using human settlement layers".

## Setup

We import the required packages and download the datasets.

For reference, here are the original download links for the datasets:
1. High Resolution Settlement Layer (HRSL) ([Philippines](https://data.humdata.org/dataset/philippines-high-resolution-population-density-maps-demographic-estimates)) ([Madagascar](https://data.humdata.org/dataset/highresolutionpopulationdensitymaps-mdg))
2. Administrative Boundaries ([Philippines](https://data.humdata.org/dataset/philippines-administrative-levels-0-to-3)) ([Madagascar](https://data.humdata.org/dataset/madagascar-administrative-level-0-4-boundaries))
3. OpenStreetMap (OSM) ([Philippines](https://download.geofabrik.de/asia/philippines.html)) ([Madagascar](https://download.geofabrik.de/africa/madagascar.html))

In [5]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import shapely
import geopandas as gpd
import rasterio
import rasterio.features

import wget

import os
import glob
from zipfile import ZipFile


In [6]:
!which python

/opt/conda/bin/python


In [3]:
# Create the download folder; exist_ok=True makes this a no-op if it already exists
os.makedirs("../download_data", exist_ok=True)

### HRSL download

Run the cells below only if you have not yet downloaded the HRSL datasets. Re-running them saves a second copy, as the `(1)` suffix in the outputs below shows.

In [5]:
hrsl_phl_men_url = "https://data.humdata.org/dataset/6d9f35c0-4764-49ee-b364-329db0b7a47d/resource/5a13bb60-4506-42a5-a08a-7ccf20413179/download/phl_men_2019-06-01_geotiff.zip"
hrsl_phl_women_url = "https://data.humdata.org/dataset/6d9f35c0-4764-49ee-b364-329db0b7a47d/resource/4aff438c-43d9-47d0-853f-5a6b6ae28223/download/phl_women_2019-06-01_geotiff.zip"

In [6]:
wget.download(hrsl_phl_men_url, '../download_data/phl_hrsl_men_jun_2019.zip')

'../download_data/phl_hrsl_men_jun_2019 (1).zip'

In [7]:
wget.download(hrsl_phl_women_url, '../download_data/phl_hrsl_women_jun_2019.zip')

'../download_data/phl_hrsl_women_jun_2019 (1).zip'
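
The `(1)` suffixes in the outputs above indicate that the archives were already present when the cells were re-run. A minimal sketch of a guard that skips the download when the target file exists is shown below. The helper name `download_if_missing` and its `fetch` parameter are hypothetical, and the sketch uses the standard library's `urllib.request.urlretrieve` instead of `wget`.

```python
import os
import urllib.request


def download_if_missing(url, dest, fetch=urllib.request.urlretrieve):
    """Download url to dest unless dest already exists; return dest.

    `fetch` is a hypothetical hook (any callable taking url and dest) so the
    download step can be swapped out, for example in a dry run.
    """
    if os.path.exists(dest):
        return dest  # already downloaded, skip the network call
    parent = os.path.dirname(dest)
    if parent:
        os.makedirs(parent, exist_ok=True)
    fetch(url, dest)
    return dest
```

With this in place, a call such as `download_if_missing(hrsl_phl_men_url, "../download_data/phl_hrsl_men_jun_2019.zip")` could replace the corresponding `wget.download` call.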

### Admin boundary download

The level 4 (barangay) admin boundary dataset for the Philippines is not available on the HDX website. We provide an external download link so that our results can still be reproduced.

Run the cells below only if you have not yet downloaded the admin boundary datasets.

In [8]:
adm_phl_url = "https://data.humdata.org/dataset/caf116df-f984-4deb-85ca-41b349d3f313/resource/12457689-6a86-4474-8032-5ca9464d38a8/download/phl_adm_psa_namria_20200529_shp.zip"

In [9]:
wget.download(adm_phl_url, '../download_data/phl_adm_all.zip')

'../download_data/phl_adm_all (1).zip'

In [10]:
adm4_phl_url = "https://storage.googleapis.com/osm-completeness-thinkingmachines/phl_adm_2015_level4_barangay.gpkg.zip"

In [11]:
wget.download(adm4_phl_url, '../download_data/phl_adm_2015_level4_barangay.gpkg.zip')

'../download_data/phl_adm_2015_level4_barangay.gpkg (1).zip'

### OSM download

Run the cells below only if you have not yet downloaded the OSM datasets. The 2020 snapshot cells are commented out; the active cells fetch the latest Geofabrik extract.

In [19]:
# #### 2020 OSM Dataset
# osm_phl_url = "https://storage.googleapis.com/osm-completeness-thinkingmachines/phl_osm_jan_2020_buildings.gpkg.zip"

In [13]:
##### 2021 OSM Dataset
osm_phl_url = "https://download.geofabrik.de/asia/philippines-latest-free.shp.zip"

In [20]:
# wget.download(osm_phl_url, '../download_data/phl_osm_jan_2020_buildings.gpkg.zip')

'../download_data/phl_osm_jan_2020_buildings.gpkg.zip'

In [15]:
wget.download(osm_phl_url, '../download_data/philippines-latest-free.shp.zip')

'../download_data/philippines-latest-free.shp (1).zip'

### Unzip all datasets

In [21]:
for zip_path in glob.glob("../download_data/*.zip"):
    target_dir = os.path.splitext(zip_path)[0]
    # Skip archives that were already extracted
    if not os.path.isdir(target_dir):
        with ZipFile(zip_path) as myzip:
            myzip.extractall(target_dir)

## What do we want our final output to look like?

In [17]:
wget.download('https://storage.googleapis.com/osm-completeness-thinkingmachines/mapthegap-phl-adm2-2021-05-29.csv', '../download_data/mapthegap-phl-adm2-2021-05-29.csv')
wget.download('https://storage.googleapis.com/osm-completeness-thinkingmachines/mapthegap-phl-adm3-2021-05-29.csv', '../download_data/mapthegap-phl-adm3-2021-05-29.csv')
wget.download('https://storage.googleapis.com/osm-completeness-thinkingmachines/mapthegap-phl-adm4-2021-05-29.csv', '../download_data/mapthegap-phl-adm4-2021-05-29.csv')

'../download_data/mapthegap-phl-adm4-2021-05-29 (1).csv'

In [18]:
province_output = pd.read_csv('../download_data/mapthegap-phl-adm2-2021-05-29.csv')

In [19]:
province_output.head()

Unnamed: 0,ADM2_EN,ADM2_PCODE,ADM2_REF,ADM2ALT1EN,ADM2ALT2EN,pixels_withbuilding_june2020,pixels_nobuilding_june2020,percentage_completeness_june2020,pixels_withbuilding_may2021,pixels_nobuilding_may2021,percentage_completeness_may2021
0,Abra,PH140100000,,,,13790,5152,72.801183,13862,5080,73.18129
1,Agusan del Norte,PH160200000,,,,4202,35688,10.533968,15586,24304,39.072449
2,Agusan del Sur,PH160300000,,,,2953,38147,7.184915,3422,37678,8.326034
3,Aklan,PH060400000,,,,14693,31978,31.482077,21519,25152,46.107861
4,Albay,PH050500000,,,,27168,31532,46.282794,32756,25944,55.802385


In [20]:
citymuni_output = pd.read_csv('../download_data/mapthegap-phl-adm3-2021-05-29.csv')

In [21]:
citymuni_output.head()

Unnamed: 0,ADM3_EN,ADM3_PCODE,ADM3_REF,ADM3ALT1EN,ADM3ALT2EN,pixels_withbuilding_june2020,pixels_nobuilding_june2020,percentage_completeness_june2020,pixels_withbuilding_may2021,pixels_nobuilding_may2021,percentage_completeness_may2021
0,Aborlan,PH175301000,,,,594,3599,14.166468,652,3541,15.549726
1,Abra de Ilog,PH175101000,,,,1315,729,64.334638,1316,728,64.383562
2,Abucay,PH030801000,,,,1996,646,75.548827,1993,649,75.435276
3,Abulug,PH021501000,,,,3396,927,78.556558,3405,918,78.764747
4,Abuyog,PH083701000,,,,1456,1005,59.162942,1473,988,59.853718


In [22]:
brgy_output = pd.read_csv('../download_data/mapthegap-phl-adm4-2021-05-29.csv')

In [23]:
brgy_output.head()

Unnamed: 0,Reg_Code,Reg_Name,Pro_Code,Pro_Name,Mun_Code,Mun_Name,Bgy_Code,Bgy_Name,RURBAN,ADM4_PCODE_NAME,pixels_withbuilding_june2020,pixels_nobuilding_june2020,percentage_completeness_june2020,ADM4_PCODE,pixels_withbuilding_may2021,pixels_nobuilding_may2021,percentage_completeness_may2021
0,110000000,REGION XI (DAVAO REGION),118200000,COMPOSTELA VALLEY,118206000.0,MAWAB,118206010.0,Sawangan,R,118206010_Sawangan,3,71,4.054054,118206010.0,6,68,8.108108
1,180000000,NEGROS ISLAND REGION (NIR),184500000,NEGROS OCCIDENTAL,184501000.0,BACOLOD CITY (Capital),184501048.0,Felisa,U,184501048_Felisa,2,272,0.729927,184501048.0,3,271,1.094891
2,120000000,REGION XII (SOCCSKSARGEN),126300000,SOUTH COTABATO,126311000.0,NORALA,126311020.0,Simsiman,R,126311020_Simsiman,116,90,56.31068,126311020.0,147,59,71.359223
3,150000000,AUTONOMOUS REGION IN MUSLIM MINDANAO (ARMM),157000000,TAWI-TAWI,157001000.0,PANGLIMA SUGALA (BALIMBING),157001001.0,Balimbing Proper,R,157001001_Balimbing Proper,0,93,0.0,157001001.0,5,88,5.376344
4,150000000,AUTONOMOUS REGION IN MUSLIM MINDANAO (ARMM),157000000,TAWI-TAWI,157001000.0,PANGLIMA SUGALA (BALIMBING),157001002.0,Batu-batu (Pob.),R,157001002_Batu-batu (Pob.),11,182,5.699482,157001002.0,25,168,12.953368


## Get intersection of HRSL pixels and OSM buildings

We use Facebook’s High Resolution Settlement Layer (HRSL) as a proxy ground truth for building footprints. We measure data completeness as the “percentage completeness”: the percentage of HRSL pixels that intersect at least one OSM building footprint.

![](../assets/formula.png)

Pixels that intersect OSM buildings are *mapped*.

Pixels that do not intersect OSM buildings are *unmapped*.
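
As a small worked example of the definition above, the sketch below computes percentage completeness from the two pixel counts. The function name is hypothetical, and returning NaN for an area with no settlement pixels is an assumption, not something the notebook specifies.

```python
def percentage_completeness(mapped, unmapped):
    """Share of HRSL pixels that intersect at least one OSM building, in percent."""
    total = mapped + unmapped
    if total == 0:
        return float("nan")  # assumption: no settlement pixels means undefined
    return mapped / total * 100


# Abra province, May 2021 counts from the sample province table
print(round(percentage_completeness(13862, 5080), 2))  # 73.18
```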

### Philippines

Run the cells below if you have not yet saved the `with_buildings` and `no_buildings` GPKG files.

#### Load HRSL dataset

In [24]:
hrsl_phl_men = rasterio.open(
    "../download_data/phl_hrsl_men_jun_2019/PHL_men_2019-06-01.tif"
)

hrsl_phl_women = rasterio.open(
    "../download_data/phl_hrsl_women_jun_2019/PHL_women_2019-06-01.tif"
)

In [25]:
# hrsl_phl_crs = hrsl_phl_men.crs

#### Add male and female population to get total population

In [26]:
hrsl_phl_men_band1_mask = hrsl_phl_men.read_masks(1)

In [27]:
hrsl_phl_women_band1_mask = hrsl_phl_women.read_masks(1)

In [28]:
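# A pixel is kept if either layer has data; the masks are uint8 (0 or 255), so combine them with a bitwise OR
hrsl_phl_band1_mask = hrsl_phl_men_band1_mask | hrsl_phl_women_band1_mask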
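
Note that this step combines the validity masks of the two rasters, not the population values themselves: a pixel counts as settled if either layer has data. The toy arrays below are made up and stand in for the two `read_masks(1)` results, following rasterio's convention of 255 for valid pixels and 0 for nodata. The sketch compares adding the masks with a bitwise OR.

```python
import numpy as np

# Toy stand-ins for the two read_masks() results: 255 = valid pixel, 0 = nodata
men_mask = np.array([[255, 0], [255, 0]], dtype="uint8")
women_mask = np.array([[255, 255], [0, 0]], dtype="uint8")

added = men_mask + women_mask   # uint8 addition wraps around: 255 + 255 -> 254
either = men_mask | women_mask  # bitwise OR keeps the 0 / 255 convention

print(added.tolist())   # [[254, 255], [255, 0]]
print(either.tolist())  # [[255, 255], [255, 0]]
print(np.array_equal(added > 0, either > 0))  # True
```

Both results are nonzero for exactly the same pixels, so either works as a mask here; the OR form simply avoids relying on the wrapped value staying nonzero.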

#### Convert HRSL dataset from raster to vector

In [29]:
hrsl_phl_rand = np.random.rand(
    np.shape(hrsl_phl_band1_mask)[0], np.shape(hrsl_phl_band1_mask)[1]
)
hrsl_phl_rand = hrsl_phl_rand.astype("float32")

In [30]:
hrsl_phl_band1_poly = list(
    rasterio.features.shapes(
        hrsl_phl_rand, transform=hrsl_phl_men.transform, mask=hrsl_phl_band1_mask
    )
)
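
The random raster is there because `rasterio.features.shapes` yields one polygon per connected region of equal value. With random values, neighboring pixels almost never share a value, so each valid pixel becomes its own square. The sketch below reproduces the intended result in plain Python without rasterio. The function `pixel_squares` is hypothetical, and the transform is passed as six affine coefficients `(a, b, c, d, e, f)`.

```python
def pixel_squares(mask, transform):
    """Return {(row, col): [corner, ...]} for every nonzero mask cell.

    transform is (a, b, c, d, e, f) with x = a*col + b*row + c and
    y = d*col + e*row + f, the usual affine coefficient order.
    """
    a, b, c, d, e, f = transform

    def to_xy(col, row):
        return (a * col + b * row + c, d * col + e * row + f)

    squares = {}
    for row, line in enumerate(mask):
        for col, valid in enumerate(line):
            if valid:
                # Corners in order: upper-left, upper-right, lower-right, lower-left
                squares[(row, col)] = [
                    to_xy(col, row), to_xy(col + 1, row),
                    to_xy(col + 1, row + 1), to_xy(col, row + 1),
                ]
    return squares


# 2 x 2 toy mask, 1-degree pixels, upper-left corner at (120, 15)
toy = pixel_squares([[255, 0], [255, 255]], (1.0, 0.0, 120.0, 0.0, -1.0, 15.0))
print(len(toy))     # 3
print(toy[(0, 0)])  # [(120.0, 15.0), (121.0, 15.0), (121.0, 14.0), (120.0, 14.0)]
```

In this toy mask the three valid pixels touch each other, so a constant-valued raster would give a single L-shaped polygon instead of three squares.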

In [31]:
hrsl_phl_geom = []
for geom, value in hrsl_phl_band1_poly:
    geom = shapely.geometry.shape(geom)
    hrsl_phl_geom.append(geom)

In [32]:
# One row per pixel polygon; reset_index() adds an "index" column that serves as the pixel ID
hrsl_phl_gdf = gpd.GeoDataFrame(geometry=hrsl_phl_geom, crs="EPSG:4326")
hrsl_phl_gdf.reset_index(level=0, inplace=True)

In [33]:
# Create the output folder if it does not exist yet
os.makedirs("../data", exist_ok=True)

In [34]:
hrsl_phl_gdf.to_file('../data/hrsl_phl.gpkg', driver='GPKG')

#### Load OSM dataset

In [23]:
# The 2021 zip file contains many shapefiles for points of interest, landuse, waterways, etc.
# We are only interested in loading the building footprint data
osm_phl = gpd.read_file(
    "../download_data/philippines-latest-free.shp/gis_osm_buildings_a_free_1.shp",
    driver="shp",
)

In [24]:
osm_phl

Unnamed: 0,osm_id,code,fclass,name,type,geometry
0,4350725,1500,building,,,"POLYGON ((120.99286 14.59423, 120.99300 14.594..."
1,4392200,1500,building,,,"POLYGON ((121.09786 14.63354, 121.09815 14.633..."
2,4418475,1500,building,Greenbelt 3,,"POLYGON ((121.02102 14.55166, 121.02105 14.552..."
3,4872564,1500,building,Manila Film Center,,"POLYGON ((120.98150 14.55072, 120.98236 14.551..."
4,4913806,1500,building,,,"POLYGON ((121.00756 14.45639, 121.00758 14.456..."
...,...,...,...,...,...,...
9501609,955087886,1500,building,,,"POLYGON ((124.59483 12.07848, 124.59488 12.078..."
9501610,955087887,1500,building,,,"POLYGON ((124.59450 12.07758, 124.59457 12.077..."
9501611,955087888,1500,building,,,"POLYGON ((124.59394 12.07685, 124.59402 12.076..."
9501612,955087889,1500,building,,,"POLYGON ((124.59396 12.07692, 124.59405 12.076..."


In [25]:
osm_phl.to_file('../data/philippines-latest-free.gpkg',driver = 'GPKG')

#### Get mapped pixels

In [5]:
# Just run this so you don't have to rerun everything up top
hrsl_phl_gdf = gpd.read_file('../data/hrsl_phl.gpkg', driver='GPKG')
osm_phl = gpd.read_file('../data/philippines-latest-free.gpkg',driver = 'GPKG')

In [33]:
phl_pixels_with_buildings = gpd.sjoin(
    hrsl_phl_gdf, osm_phl, how="inner", op="intersects"
)

In [34]:
# Uncomment .head() commands in final output
phl_pixels_with_buildings.info()

<class 'geopandas.geodataframe.GeoDataFrame'>
Int64Index: 7769975 entries, 1 to 5515916
Data columns (total 8 columns):
 #   Column       Dtype   
---  ------       -----   
 0   index        int64   
 1   geometry     geometry
 2   index_right  int64   
 3   osm_id       object  
 4   code         int64   
 5   fclass       object  
 6   name         object  
 7   type         object  
dtypes: geometry(1), int64(3), object(4)
memory usage: 533.5+ MB


In [35]:
phl_pixels_with_buildings = phl_pixels_with_buildings.drop_duplicates(subset='index')
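
The inner join emits one row per pixel-building pair, so a pixel that touches several buildings appears several times. That is why the row count falls from 7,769,975 to 2,155,197 after dropping duplicates. A toy illustration with made-up rows, using plain pandas:

```python
import pandas as pd

# Toy join result: pixel 7 touches three buildings, pixel 9 touches one
joined = pd.DataFrame(
    {"index": [7, 7, 7, 9], "osm_id": ["a", "b", "c", "d"]}
)

# Keep one row per pixel ID, regardless of how many buildings it touches
mapped = joined.drop_duplicates(subset="index")
print(len(joined), len(mapped))  # 4 2
print(mapped["index"].tolist())  # [7, 9]
```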

In [37]:
phl_pixels_with_buildings.drop(columns=['index_right', 'osm_id', 'code', 'fclass', 'name', 'type'], inplace=True)

In [38]:
# Uncomment .head() commands in final output
phl_pixels_with_buildings.info()

<class 'geopandas.geodataframe.GeoDataFrame'>
Int64Index: 2155197 entries, 1 to 5515916
Data columns (total 2 columns):
 #   Column    Dtype   
---  ------    -----   
 0   index     int64   
 1   geometry  geometry
dtypes: geometry(1), int64(1)
memory usage: 49.3 MB


In [39]:
phl_pixels_with_buildings.to_file('../data/phl_pixels_with_buildings.gpkg', driver='GPKG')

#### Get unmapped pixels

In [4]:
# Just run this so you don't have to rerun everything up top
phl_pixels_with_buildings = gpd.read_file(
    "../data/phl_pixels_with_buildings.gpkg", driver="GPKG"
)

In [6]:
phl_pixels_no_buildings = pd.merge(hrsl_phl_gdf, phl_pixels_with_buildings, how='outer', indicator=True)

In [7]:
phl_pixels_no_buildings = phl_pixels_no_buildings[phl_pixels_no_buildings['_merge'] == 'left_only']

In [8]:
phl_pixels_no_buildings.drop(columns=['_merge'], inplace=True)
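
The three cells above form an anti-join: an outer merge with `indicator=True` labels each row by where it came from, and the `left_only` rows are the pixels with no match in the mapped set. A minimal sketch on made-up pixel IDs is shown below. It merges on `index` explicitly, whereas the notebook omits `on`, so pandas merges on all shared columns.

```python
import pandas as pd

all_pixels = pd.DataFrame({"index": [0, 1, 2, 3, 4]})
mapped = pd.DataFrame({"index": [1, 3]})

# indicator=True adds a "_merge" column: left_only, right_only, or both
merged = pd.merge(all_pixels, mapped, how="outer", on="index", indicator=True)
unmapped = merged[merged["_merge"] == "left_only"].drop(columns=["_merge"])

print(sorted(unmapped["index"].tolist()))  # [0, 2, 4]
```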

In [9]:
phl_pixels_no_buildings.to_file('../data/phl_pixels_no_buildings.gpkg', driver='GPKG')

#### Calculate percentage completeness

In 2020, percentage completeness for the Philippines was 31.39%. The cell below computes the current value.

In [10]:
len(phl_pixels_with_buildings) / (len(phl_pixels_with_buildings) + len(phl_pixels_no_buildings)) * 100

39.072296358124056

## Aggregate to different admin boundaries

### Philippines

#### Turn mapped pixels from a polygon layer to a point layer

In [5]:
# Just run this so you don't have to rerun everything up top
phl_pixels_with_buildings = gpd.read_file(
    "../data/phl_pixels_with_buildings.gpkg", driver="GPKG"
)

In [3]:
# .centroid warns because the layer is in a geographic CRS (EPSG:4326);
# the pixels are tiny squares, so we may safely ignore the warning
phl_pixels_with_buildings["geometry"] = phl_pixels_with_buildings["geometry"].centroid

#### Turn unmapped pixels from a polygon layer to a point layer

In [4]:
# Just run this so you don't have to rerun everything up top
phl_pixels_no_buildings = gpd.read_file(
    "../data/phl_pixels_no_buildings.gpkg", driver="GPKG"
)

In [5]:
phl_pixels_no_buildings["geometry"] = phl_pixels_no_buildings["geometry"].centroid

#### Load level 4 admin boundary

In [3]:
phl_adm4 = gpd.read_file(
    "../download_data/phl_adm_2015_level4_barangay.gpkg/phl_adm_2015_level4_barangay.gpkg"
)

In [5]:
phl_adm4.nunique()

Reg_Code       18
Reg_Name       18
Pro_Code       87
Pro_Name       87
Mun_Code     1648
Mun_Name     1445
Bgy_Code    42037
Bgy_Name    27092
RURBAN          4
geometry    42058
dtype: int64

#### Create index for level 4 admin boundary

The `nunique()` result shows 42,037 unique `Bgy_Code` values but 42,058 unique geometries, so barangay codes alone cannot serve as a unique index. We work around this by adding a new column that combines `Bgy_Code` and `Bgy_Name`.

In [7]:
# Create new column indicating both the pcode and name of the barangay
phl_adm4["ADM4_PCODE_NAME"] = phl_adm4["Bgy_Code"] + "_" + phl_adm4["Bgy_Name"]
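
The sketch below shows the idea on a made-up table in which two differently named areas share one code. All rows are synthetic, and only the column names come from the dataset. The string concatenation assumes the codes are stored as strings; numeric codes would first need `.astype(str)`.

```python
import pandas as pd

# Synthetic boundary table: the code "000000002" is shared by two areas
adm4 = pd.DataFrame(
    {
        "Bgy_Code": ["000000001", "000000002", "000000002"],
        "Bgy_Name": ["Alpha", "Beta", "Gamma"],
    }
)

print(adm4["Bgy_Code"].is_unique)  # False

# Combining code and name gives each row a distinct key in this toy table
adm4["ADM4_PCODE_NAME"] = adm4["Bgy_Code"] + "_" + adm4["Bgy_Name"]
print(adm4["ADM4_PCODE_NAME"].is_unique)  # True
```

Running `is_unique` on the real combined column is a cheap way to confirm that the workaround actually yields a unique key.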

#### Find intersection of mapped pixels and level 4 admin boundary

In [8]:
# Get all the pixels that are found within the level 4 admin boundaries
phl_pixels_with_buildings_sjoin_adm4 = gpd.sjoin(
    phl_pixels_with_buildings, phl_adm4, how="left", op="within"
)

In [9]:
# Remove all columns besides index, geometry, RURBAN, and ADM4_PCODE_NAME
phl_pixels_with_buildings_sjoin_adm4.drop(
    columns=[
        "index_right",
        "Reg_Code",
        "Reg_Name",
        "Pro_Code",
        "Pro_Name",
        "Mun_Code",
        "Mun_Name",
        "Bgy_Code",
        "Bgy_Name",
    ],
    inplace=True,
)

In [10]:
# Save to file
phl_pixels_with_buildings_sjoin_adm4.to_file(
    "../data/phl_pixels_with_buildings_sjoin_adm4.gpkg", driver="GPKG"
)

#### Find intersection of unmapped pixels and level 4 admin boundary

In [11]:
phl_pixels_no_buildings_sjoin_adm4 = gpd.sjoin(
    phl_pixels_no_buildings, phl_adm4, how="left", op="within"
)

In [12]:
phl_pixels_no_buildings_sjoin_adm4.drop(
    columns=[
        "index_right",
        "Reg_Code",
        "Reg_Name",
        "Pro_Code",
        "Pro_Name",
        "Mun_Code",
        "Mun_Name",
        "Bgy_Code",
        "Bgy_Name",
    ],
    inplace=True,
)

In [27]:
phl_pixels_with_buildings_sjoin_adm4.head()

Unnamed: 0,index,geometry,RURBAN,ADM4_PCODE_NAME
0,1,POINT (121.84083 20.79833),R,020902010_Santa Rosa (Kaynatuan)
1,3,POINT (121.84389 20.79083),R,020902010_Santa Rosa (Kaynatuan)
2,4,POINT (121.84389 20.79056),R,020902010_Santa Rosa (Kaynatuan)
3,5,POINT (121.84222 20.79028),R,020902010_Santa Rosa (Kaynatuan)
4,8,POINT (121.84306 20.79000),R,020902010_Santa Rosa (Kaynatuan)


In [13]:
phl_pixels_no_buildings_sjoin_adm4.to_file(
    "../data/phl_pixels_no_buildings_sjoin_adm4.gpkg", driver="GPKG"
)

#### Load level 3 admin boundary

In [14]:
phl_adm3 = gpd.read_file(
    "../download_data/phl_adm_all/phl_admbnda_adm3_psa_namria_20200529.shp"
)

#### Find intersection of mapped pixels and level 3 admin boundary

In [15]:
phl_pixels_with_buildings_sjoin_adm3 = gpd.sjoin(
    phl_pixels_with_buildings, phl_adm3, how="left", op="within"
)

In [16]:
phl_pixels_with_buildings_sjoin_adm3.drop(
    columns=[
        "index_right",
        "Shape_Leng",
        "Shape_Area",
        "ADM3_EN",
        "ADM3_REF",
        "ADM3ALT1EN",
        "ADM3ALT2EN",
        "ADM2_EN",
        "ADM2_PCODE",
        "ADM1_EN",
        "ADM1_PCODE",
        "ADM0_EN",
        "ADM0_PCODE",
        "date",
        "validOn",
        "validTo",
    ],
    inplace=True,
)

In [17]:
phl_pixels_with_buildings_sjoin_adm3.to_file(
    "../data/phl_pixels_with_buildings_sjoin_adm3.gpkg", driver="GPKG"
)

#### Find intersection of unmapped pixels and level 3 admin boundary

In [18]:
phl_pixels_no_buildings_sjoin_adm3 = gpd.sjoin(
    phl_pixels_no_buildings, phl_adm3, how="left", op="within"
)

In [19]:
phl_pixels_no_buildings_sjoin_adm3.drop(
    columns=[
        "index_right",
        "Shape_Leng",
        "Shape_Area",
        "ADM3_EN",
        "ADM3_REF",
        "ADM3ALT1EN",
        "ADM3ALT2EN",
        "ADM2_EN",
        "ADM2_PCODE",
        "ADM1_EN",
        "ADM1_PCODE",
        "ADM0_EN",
        "ADM0_PCODE",
        "date",
        "validOn",
        "validTo",
    ],
    inplace=True,
)

In [20]:
phl_pixels_no_buildings_sjoin_adm3.to_file(
    "../data/phl_pixels_no_buildings_sjoin_adm3.gpkg", driver="GPKG"
)

#### Load level 2 admin boundary

In [21]:
phl_adm2 = gpd.read_file(
    "../download_data/phl_adm_all/phl_admbnda_adm2_psa_namria_20200529.shp"
)

#### Find intersection of mapped pixels and level 2 admin boundary

In [29]:
phl_pixels_with_buildings_sjoin_adm2 = gpd.sjoin(
    phl_pixels_with_buildings, phl_adm2, how="left", op="within"
)

In [31]:
phl_pixels_with_buildings_sjoin_adm2.drop(
    columns=[
        "index_right",
        "Shape_Leng",
        "Shape_Area",
        "ADM2_EN",
        "ADM2_REF",
        "ADM2ALT1EN",
        "ADM2ALT2EN",
        "ADM1_EN",
        "ADM1_PCODE",
        "ADM0_EN",
        "ADM0_PCODE",
        "date",
        "validOn",
        "validTo",
    ],
    inplace=True,
)

In [34]:
phl_pixels_with_buildings_sjoin_adm2.to_file(
    "../data/phl_pixels_with_buildings_sjoin_adm2.gpkg", driver="GPKG"
)

#### Find intersection of unmapped pixels and level 2 admin boundary

In [35]:
phl_pixels_no_buildings_sjoin_adm2 = gpd.sjoin(
    phl_pixels_no_buildings, phl_adm2, how="left", op="within"
)

In [36]:
phl_pixels_no_buildings_sjoin_adm2.drop(
    columns=[
        "index_right",
        "Shape_Leng",
        "Shape_Area",
        "ADM2_EN",
        "ADM2_REF",
        "ADM2ALT1EN",
        "ADM2ALT2EN",
        "ADM1_EN",
        "ADM1_PCODE",
        "ADM0_EN",
        "ADM0_PCODE",
        "date",
        "validOn",
        "validTo",
    ],
    inplace=True,
)

In [38]:
phl_pixels_no_buildings_sjoin_adm2.to_file(
    "../data/phl_pixels_no_buildings_sjoin_adm2.gpkg", driver="GPKG"
)

#### Merge and concatenate dataframes

Uncomment the cell below if you have not yet loaded these variables.

In [16]:
# # You can run this so you don't have to rerun everything up top
# phl_pixels_with_buildings_sjoin_adm4 = gpd.read_file(
#     "../data/phl_pixels_with_buildings_sjoin_adm4.gpkg", driver="GPKG"
# )
# phl_pixels_with_buildings_sjoin_adm3 = gpd.read_file(
#     "../data/phl_pixels_with_buildings_sjoin_adm3.gpkg", driver="GPKG"
# )
# phl_pixels_with_buildings_sjoin_adm2 = gpd.read_file(
#     "../data/phl_pixels_with_buildings_sjoin_adm2.gpkg", driver="GPKG"
# )
# phl_pixels_no_buildings_sjoin_adm4 = gpd.read_file(
#     "../data/phl_pixels_no_buildings_sjoin_adm4.gpkg", driver="GPKG"
# )
# phl_pixels_no_buildings_sjoin_adm3 = gpd.read_file(
#     "../data/phl_pixels_no_buildings_sjoin_adm3.gpkg", driver="GPKG"
# )
# phl_pixels_no_buildings_sjoin_adm2 = gpd.read_file(
#     "../data/phl_pixels_no_buildings_sjoin_adm2.gpkg", driver="GPKG"
# )

In [49]:
phl_pixels_with_buildings_sjoin_adm2.head()

Unnamed: 0,index,ADM2_PCODE,geometry
0,1,PH020900000,POINT (121.84083 20.79833)
1,3,PH020900000,POINT (121.84389 20.79083)
2,4,PH020900000,POINT (121.84389 20.79056)
3,5,PH020900000,POINT (121.84222 20.79028)
4,8,PH020900000,POINT (121.84306 20.79000)
...,...,...,...
2155192,5515845,PH157000000,POINT (119.47556 4.66139)
2155193,5515841,PH157000000,POINT (119.39306 4.66139)
2155194,5515843,PH157000000,POINT (119.39472 4.66139)
2155195,5515846,PH157000000,POINT (119.47556 4.66111)


In [50]:
phl_pixels_with_buildings_sjoin_adm3.head()

Unnamed: 0,index,ADM3_PCODE,geometry
0,1,PH020902000,POINT (121.84083 20.79833)
1,3,PH020902000,POINT (121.84389 20.79083)
2,4,PH020902000,POINT (121.84389 20.79056)
3,5,PH020902000,POINT (121.84222 20.79028)
4,8,PH020902000,POINT (121.84306 20.79000)


In [51]:
phl_pixels_with_buildings_sjoin_adm4.head()

Unnamed: 0,index,RURBAN,ADM4_PCODE_NAME,geometry
0,1,R,020902010_Santa Rosa (Kaynatuan),POINT (121.84083 20.79833)
1,3,R,020902010_Santa Rosa (Kaynatuan),POINT (121.84389 20.79083)
2,4,R,020902010_Santa Rosa (Kaynatuan),POINT (121.84389 20.79056)
3,5,R,020902010_Santa Rosa (Kaynatuan),POINT (121.84222 20.79028)
4,8,R,020902010_Santa Rosa (Kaynatuan),POINT (121.84306 20.79000)


We merge the level 2, 3, and 4 joins on the pixel `index`, then concatenate the mapped and unmapped pixels into one table.

In [21]:
# Merging adm3 boundaries with adm4
phl_pixels_with_buildings_sjoin = phl_pixels_with_buildings_sjoin_adm3.merge(
    phl_pixels_with_buildings_sjoin_adm4[["index", "RURBAN", "ADM4_PCODE_NAME"]],
    how="left",
    left_on="index",
    right_on="index",
)

# Additionally merging adm2 to adm3 and adm4
phl_pixels_with_buildings_sjoin = phl_pixels_with_buildings_sjoin.merge(
    phl_pixels_with_buildings_sjoin_adm2[["index", "ADM2_PCODE"]],
    how="left",
    left_on="index",
    right_on="index",
)

In [26]:
# Drop nan values in the dataframe
phl_pixels_with_buildings_sjoin.dropna(
    subset=["ADM3_PCODE", "RURBAN", "ADM4_PCODE_NAME","ADM2_PCODE"], inplace=True
)

In [28]:
# Adding a column to indicate that these pixels are mapped
phl_pixels_with_buildings_sjoin["status"] = "mapped"

In [31]:
# Merging adm3 boundaries with adm4
phl_pixels_no_buildings_sjoin = phl_pixels_no_buildings_sjoin_adm3.merge(
    phl_pixels_no_buildings_sjoin_adm4[["index", "RURBAN", "ADM4_PCODE_NAME"]],
    how="left",
    left_on="index",
    right_on="index",
)

# Additionally merging adm2 to adm3 and adm4
phl_pixels_no_buildings_sjoin = phl_pixels_no_buildings_sjoin.merge(
    phl_pixels_no_buildings_sjoin_adm2[["index", "ADM2_PCODE"]],
    how="left",
    left_on="index",
    right_on="index",
)

In [32]:
# Drop nan values in the dataframe
phl_pixels_no_buildings_sjoin.dropna(
    subset=["ADM3_PCODE", "RURBAN", "ADM4_PCODE_NAME", "ADM2_PCODE"], inplace=True
)

In [33]:
# Adding a column to indicate that these pixels are unmapped
phl_pixels_no_buildings_sjoin["status"] = "unmapped"

In [44]:
# Join the mapped and unmapped pixels together
phl_pixels_all = pd.concat(
    [phl_pixels_with_buildings_sjoin, phl_pixels_no_buildings_sjoin]
)

The final output of this notebook should look like this!

In [45]:
phl_pixels_all.head()

Unnamed: 0,index,ADM3_PCODE,geometry,RURBAN,ADM4_PCODE_NAME,ADM2_PCODE,status
0,1,PH020902000,POINT (121.84083 20.79833),R,020902010_Santa Rosa (Kaynatuan),PH020900000,mapped
1,3,PH020902000,POINT (121.84389 20.79083),R,020902010_Santa Rosa (Kaynatuan),PH020900000,mapped
2,4,PH020902000,POINT (121.84389 20.79056),R,020902010_Santa Rosa (Kaynatuan),PH020900000,mapped
3,5,PH020902000,POINT (121.84222 20.79028),R,020902010_Santa Rosa (Kaynatuan),PH020900000,mapped
4,8,PH020902000,POINT (121.84306 20.79000),R,020902010_Santa Rosa (Kaynatuan),PH020900000,mapped


In [46]:
# Save all columns (including geometry) to GPKG
phl_pixels_all.to_file("../data/phl_pixels_all.gpkg", driver="GPKG")

In [47]:
# Drop geometry column
phl_pixels_all.drop(columns=["geometry"], inplace=True)

In [48]:
# Save remaining columns as csv 
phl_pixels_all.to_csv("../data/phl_pixels_all.csv", index=False)
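
The per-pixel table saved above can be rolled up into per-area counts like those in the sample output tables at the start of the notebook. The sketch below is one possible way to do that with a pandas `groupby`, not necessarily the procedure used to produce the published CSVs. The rows and the `PH01`/`PH02` codes are made up; `ADM2_PCODE` and `status` are the column names from the table above.

```python
import pandas as pd

# Toy stand-in for phl_pixels_all.csv (made-up rows, real column names)
pixels = pd.DataFrame(
    {
        "ADM2_PCODE": ["PH01", "PH01", "PH01", "PH02", "PH02"],
        "status": ["mapped", "unmapped", "mapped", "unmapped", "unmapped"],
    }
)

# Count pixels per province and status, one column per status
counts = (
    pixels.groupby(["ADM2_PCODE", "status"])
    .size()
    .unstack(fill_value=0)
    .reindex(columns=["mapped", "unmapped"], fill_value=0)
)
counts["percentage_completeness"] = (
    counts["mapped"] / (counts["mapped"] + counts["unmapped"]) * 100
)
print(counts.round(2))
```

Swapping `ADM2_PCODE` for `ADM3_PCODE` or `ADM4_PCODE_NAME` would give the same roll-up at the other admin levels.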

## Madagascar

The following code repeats the same steps for Madagascar and saves the output with `mdg_pixels_all.to_csv`.

### HRSL download

Run the cells below only if you have not yet downloaded the HRSL datasets.

In [8]:
hrsl_mdg_url = "https://data.humdata.org/dataset/9e7ff424-7b9c-42cc-b869-5756fcad0956/resource/1fafdd04-8e0b-4c2a-b4dc-8f3ff39e3015/download/population_mdg_2018-10-01.zip"

In [15]:
wget.download(hrsl_mdg_url, '../download_data/mdg_hrsl_oct_2018.zip')

'../download_data/mdg_hrsl_oct_2018.zip'

### Admin boundary download

Run the cells below only if you have not yet downloaded the admin boundary datasets.

In [40]:
adm_mdg_url = "https://data.humdata.org/dataset/26fa506b-0727-4d9d-a590-d2abee21ee22/resource/ed94d52e-349e-41be-80cb-62dc0435bd34/download/mdg_adm_bngrc_ocha_20181031_shp.zip"

In [43]:
wget.download(adm_mdg_url, '../download_data/mdg_adm_all.zip')

'../download_data/mdg_adm_all.zip'

### OSM download

Run the cells below only if you have not yet downloaded the OSM datasets. Here the 2020 snapshot cells are active and the 2021 (latest Geofabrik) cells are commented out.

In [65]:
##### 2020 OSM Dataset
osm_mdg_url = "https://storage.googleapis.com/osm-completeness-thinkingmachines/mdg_osm_jan_2020_buildings.gpkg.zip"

In [66]:
wget.download(osm_mdg_url, '../download_data/mdg_osm_jan_2020_buildings.gpkg.zip')

'../download_data/mdg_osm_jan_2020_buildings.gpkg.zip'

In [10]:
##### 2021 OSM Dataset
# osm_mdg_url = "https://download.geofabrik.de/africa/madagascar-latest-free.shp.zip"

In [11]:
# wget.download(osm_mdg_url, '../download_data/madagascar-latest-free.shp.zip')

'../download_data/madagascar-latest-free.shp.zip'

### Unzip all datasets

In [67]:
for zip_path in glob.glob("../download_data/*.zip"):
    target_dir = os.path.splitext(zip_path)[0]
    # Skip archives that were already extracted
    if not os.path.isdir(target_dir):
        with ZipFile(zip_path) as myzip:
            myzip.extractall(target_dir)

### Get intersection of HRSL pixels and OSM Buildings

#### Load HRSL dataset

Note: unlike the Philippines, Madagascar has no separate male and female layers, so the "Add male and female population to get total population" step is skipped.

In [3]:
hrsl_mdg = rasterio.open(
    "../download_data/mdg_hrsl_oct_2018/population_mdg_2018-10-01.tif"
)

In [69]:
hrsl_mdg_band1_mask = hrsl_mdg.read_masks(1)

#### Convert HRSL dataset from raster to vector

In [70]:
hrsl_mdg_rand = np.random.rand(
    np.shape(hrsl_mdg_band1_mask)[0], np.shape(hrsl_mdg_band1_mask)[1]
)
hrsl_mdg_rand = hrsl_mdg_rand.astype("float32")

In [71]:
hrsl_mdg_band1_poly = list(
    rasterio.features.shapes(
        hrsl_mdg_rand, transform=hrsl_mdg.transform, mask=hrsl_mdg_band1_mask
    )
)

In [72]:
hrsl_mdg_geom = []
for geom, value in hrsl_mdg_band1_poly:
    geom = shapely.geometry.shape(geom)
    hrsl_mdg_geom.append(geom)

In [73]:
# One row per pixel polygon; reset_index() adds an "index" column that serves as the pixel ID
hrsl_mdg_gdf = gpd.GeoDataFrame(geometry=hrsl_mdg_geom, crs="EPSG:4326")
hrsl_mdg_gdf.reset_index(level=0, inplace=True)

In [74]:
hrsl_mdg_gdf.to_file('../data/hrsl_mdg.gpkg', driver='GPKG')

#### Load OSM dataset

In [75]:
# 2020 OSM File
osm_mdg = gpd.read_file(
    "../download_data/mdg_osm_jan_2020_buildings.gpkg/mdg_osm_jan_2020_buildings.gpkg",
    driver="GPKG",
)

In [28]:
# The 2021 zip file contains many shapefiles for points of interest, landuse, waterways, etc.
# We are only interested in loading the building footprint data
# osm_mdg = gpd.read_file(
#     "../download_data/madagascar-latest-free.shp/gis_osm_buildings_a_free_1.shp",
#     driver="shp",
# )

In [29]:
osm_mdg.to_file('../data/madagascar-latest-free.gpkg',driver = 'GPKG')

#### Get mapped pixels

In [30]:
# Just run this so you don't have to rerun everything up top
hrsl_mdg_gdf = gpd.read_file('../data/hrsl_mdg.gpkg', driver='GPKG')

In [30]:
# 2021 OSM File
osm_mdg = gpd.read_file('../data/madagascar-latest-free.gpkg',driver = 'GPKG')

In [76]:
mdg_pixels_with_buildings = gpd.sjoin(
    hrsl_mdg_gdf, osm_mdg, how="inner", op="intersects"
)

In [77]:
mdg_pixels_with_buildings = mdg_pixels_with_buildings.drop_duplicates(subset='index')

In [78]:
mdg_pixels_with_buildings.drop(columns=['index_right', 'osm_id', 'code', 'fclass', 'name', 'type'], inplace=True)

In [79]:
# 2020
mdg_pixels_with_buildings.to_file('../data/mdg_pixels_with_buildings_jan2020.gpkg', driver='GPKG')

In [46]:
# 2021
# mdg_pixels_with_buildings.to_file('../data/mdg_pixels_with_buildings.gpkg', driver='GPKG')

#### Get unmapped pixels

In [80]:
mdg_pixels_no_buildings = pd.merge(hrsl_mdg_gdf, mdg_pixels_with_buildings, how='outer', indicator=True)

In [81]:
mdg_pixels_no_buildings = mdg_pixels_no_buildings[mdg_pixels_no_buildings['_merge'] == 'left_only']

In [82]:
mdg_pixels_no_buildings.drop(columns=['_merge'], inplace=True)

In [83]:
# 2020
mdg_pixels_no_buildings.to_file('../data/mdg_pixels_no_buildings_jan2020.gpkg', driver='GPKG')

In [38]:
# 2021
# mdg_pixels_no_buildings.to_file('../data/mdg_pixels_no_buildings.gpkg', driver='GPKG')

#### Calculate percentage completeness

In [84]:
len(mdg_pixels_with_buildings) / (len(mdg_pixels_with_buildings) + len(mdg_pixels_no_buildings)) * 100

10.89030131380777

### Aggregate to different admin boundaries

#### Turn mapped pixels from a polygon layer to a point layer

In [8]:
# # 2020
# # Just run this so you don't have to rerun everything up top
# mdg_pixels_with_buildings = gpd.read_file(
#     "../data/mdg_pixels_with_buildings_jan2020.gpkg", driver="GPKG"
# )

In [8]:
# # 2021
# # Just run this so you don't have to rerun everything up top
# mdg_pixels_with_buildings = gpd.read_file(
#     "../data/mdg_pixels_with_buildings.gpkg", driver="GPKG"
# )

In [85]:
mdg_pixels_with_buildings["geometry"] = mdg_pixels_with_buildings["geometry"].centroid

#### Turn unmapped pixels from a polygon layer to a point layer

In [10]:
# # 2020
# # Just run this so you don't have to rerun everything up top
# mdg_pixels_no_buildings = gpd.read_file(
#     "../data/mdg_pixels_no_buildings_jan2020.gpkg", driver="GPKG"
# )

In [10]:
# # 2021
# # Just run this so you don't have to rerun everything up top
# mdg_pixels_no_buildings = gpd.read_file(
#     "../data/mdg_pixels_no_buildings.gpkg", driver="GPKG"
# )

In [86]:
mdg_pixels_no_buildings["geometry"] = mdg_pixels_no_buildings["geometry"].centroid

#### Load level 4 admin boundary

In [12]:
mdg_adm4 = gpd.read_file(
    "../download_data/mdg_adm_all/mdg_admbnda_adm4_BNGRC_OCHA_20181031.shp"
)

In [44]:
mdg_adm4.head()

Unnamed: 0,ADM0_PCODE,ADM0_EN,ADM1_PCODE,ADM1_EN,ADM1_TYPE,ADM2_PCODE,ADM2_EN,ADM2_TYPE,ADM3_PCODE,ADM3_EN,ADM3_TYPE,ADM4_PCODE,ADM4_EN,ADM4_TYPE,PROV_CODE_,OLD_PROVIN,PROV_TYPE,NOTES,SOURCE,geometry
0,MG,Madagascar,MG22,Amoron I Mania,Region,MG22203,Ambositra,District,MG22203090,Andina,Commune,MG22203090008,Ampasina,Fokontany,2,Fianarantsoa,Old Provinces/Faritany dissolved in 2007,,BNGRC (National Disaster Management Office) Fo...,"POLYGON ((47.10540 -20.51447, 47.11485 -20.514..."
1,MG,Madagascar,MG22,Amoron I Mania,Region,MG22203,Ambositra,District,MG22203090,Andina,Commune,MG22203090011,Antanifotsy,Fokontany,2,Fianarantsoa,Old Provinces/Faritany dissolved in 2007,,BNGRC (National Disaster Management Office) Fo...,"POLYGON ((47.13580 -20.54133, 47.13600 -20.544..."
2,MG,Madagascar,MG24,Ihorombe,Region,MG24216,Ihosy,District,MG24216192,Andiolava,Commune,MG24216192006,Amboloando,Fokontany,2,Fianarantsoa,Old Provinces/Faritany dissolved in 2007,,BNGRC (National Disaster Management Office) Fo...,"POLYGON ((45.58350 -22.53003, 45.59102 -22.534..."
3,MG,Madagascar,MG24,Ihorombe,Region,MG24216,Ihosy,District,MG24216192,Andiolava,Commune,MG24216192002,Vatambe Nanarena,Fokontany,2,Fianarantsoa,Old Provinces/Faritany dissolved in 2007,,BNGRC (National Disaster Management Office) Fo...,"POLYGON ((45.51825 -22.27433, 45.51900 -22.274..."
4,MG,Madagascar,MG24,Ihorombe,Region,MG24216,Ihosy,District,MG24216192,Andiolava,Commune,MG24216192003,Vohimary,Fokontany,2,Fianarantsoa,Old Provinces/Faritany dissolved in 2007,,BNGRC (National Disaster Management Office) Fo...,"POLYGON ((45.62246 -22.40084, 45.62173 -22.414..."


In [87]:
mdg_adm4.nunique()

ADM0_PCODE        1
ADM0_EN           1
ADM1_PCODE       22
ADM1_EN          22
ADM1_TYPE         1
ADM2_PCODE      119
ADM2_EN         119
ADM2_TYPE         1
ADM3_PCODE     1579
ADM3_EN        1429
ADM3_TYPE         1
ADM4_PCODE    17465
ADM4_EN       10977
ADM4_TYPE         1
PROV_CODE_        6
OLD_PROVIN        6
PROV_TYPE         1
NOTES             1
SOURCE           10
geometry      17465
dtype: int64

Note: Unlike the Philippine (PH) boundaries, every Madagascar ADM4 region has a unique PCODE (17,465 distinct `ADM4_PCODE` values for 17,465 geometries), so there is no need to create a separate unique index for level 4.
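The `nunique()` output reports as many distinct `ADM4_PCODE` values as geometries, which is what makes the code usable as a key, while `ADM4_EN` has far fewer distinct values and so cannot serve as one. A minimal sketch of checking this programmatically follows; the helper name `is_unique_key` and the toy frame are hypothetical (the PCODEs are taken from the `head()` output, and the repeated name is made up to show the failing case).

```python
import pandas as pd


def is_unique_key(df, column):
    """True when the column has no missing values and no duplicates."""
    return bool(df[column].notna().all() and df[column].is_unique)


# Toy frame: PCODEs from the head() output above; the duplicated
# name in ADM4_EN is invented for illustration.
toy_adm4 = pd.DataFrame(
    {
        "ADM4_PCODE": ["MG22203090008", "MG22203090011", "MG24216192006"],
        "ADM4_EN": ["Ampasina", "Antanifotsy", "Ampasina"],
    }
)

print(is_unique_key(toy_adm4, "ADM4_PCODE"))
print(is_unique_key(toy_adm4, "ADM4_EN"))
```

On the real data the equivalent call would be `is_unique_key(mdg_adm4, "ADM4_PCODE")`.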

#### Find intersection of mapped pixels and level 4 admin boundary

In [88]:
mdg_pixels_with_buildings_sjoin_adm4 = gpd.sjoin(
    mdg_pixels_with_buildings, mdg_adm4, how="left", op="within"
)
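The left join with a "within" test keeps every pixel point and attaches the attributes of the polygon that contains it; points outside all polygons keep `NaN` in the joined columns. The toy example below sketches that behaviour. The geometries, the `pixel_id` column, and the PCODE values `"A"` and `"B"` are hypothetical. It also assumes that newer GeoPandas releases take the spatial test through the `predicate` keyword rather than `op`, so it tries `predicate` first and falls back to `op` for older versions.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Two adjacent toy polygons standing in for admin boundaries.
polys = gpd.GeoDataFrame(
    {"ADM4_PCODE": ["A", "B"]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1)],
    crs="EPSG:4326",
)

# Three toy points: one in each polygon and one outside both.
pts = gpd.GeoDataFrame(
    {"pixel_id": [0, 1, 2]},
    geometry=[Point(0.5, 0.5), Point(1.5, 0.5), Point(5, 5)],
    crs="EPSG:4326",
)

try:
    # Keyword used by newer GeoPandas releases.
    joined = gpd.sjoin(pts, polys, how="left", predicate="within")
except TypeError:
    # Older releases only know the `op` keyword.
    joined = gpd.sjoin(pts, polys, how="left", op="within")

print(joined[["pixel_id", "ADM4_PCODE"]])
```

The third point comes back with a missing `ADM4_PCODE`, which is the same situation as pixels that fall outside every admin boundary.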

In [89]:
mdg_pixels_with_buildings_sjoin_adm4.drop(
    columns=[
        'index_right', 
        'ADM0_PCODE',
        'ADM0_EN',
        'ADM1_EN',
        'ADM1_TYPE',
        'ADM2_EN',
        'ADM2_TYPE',
        'ADM3_EN',
        'ADM3_TYPE',
        'ADM4_EN',
        'ADM4_TYPE',
        'PROV_CODE_',
        'OLD_PROVIN',
        'PROV_TYPE',
        'NOTES',
        'SOURCE'
    ],
    inplace=True,
)

In [90]:
mdg_pixels_with_buildings_sjoin_adm4.head()

Unnamed: 0,index,geometry,ADM1_PCODE,ADM2_PCODE,ADM3_PCODE,ADM4_PCODE
1,1,POINT (49.27167 -11.95611),MG71,MG71713,MG71713170,MG71713170002
5,5,POINT (49.25278 -11.99111),MG71,MG71713,MG71713170,MG71713170002
6,6,POINT (49.22861 -12.04611),MG71,MG71713,MG71713170,MG71713170001
7,7,POINT (49.22861 -12.04639),MG71,MG71713,MG71713170,MG71713170001
9,9,POINT (49.23278 -12.05417),MG71,MG71713,MG71713170,MG71713170001


In [91]:
# 2020
mdg_pixels_with_buildings_sjoin_adm4.to_file(
    "../data/mdg_pixels_with_buildings_sjoin_adm4_jan2020.gpkg", driver="GPKG"
)

In [26]:
# # 2021
# mdg_pixels_with_buildings_sjoin_adm4.to_file(
#     "../data/mdg_pixels_with_buildings_sjoin_adm4.gpkg", driver="GPKG"
# )

#### Find intersection of unmapped pixels and level 4 admin boundary

In [92]:
mdg_pixels_no_buildings_sjoin_adm4 = gpd.sjoin(
    mdg_pixels_no_buildings, mdg_adm4, how="left", op="within"
)

In [93]:
mdg_pixels_no_buildings_sjoin_adm4.drop(
    columns=[
        'index_right', 
        'ADM0_PCODE',
        'ADM0_EN',
        'ADM1_EN',
        'ADM1_TYPE',
        'ADM2_EN',
        'ADM2_TYPE',
        'ADM3_EN',
        'ADM3_TYPE',
        'ADM4_EN',
        'ADM4_TYPE',
        'PROV_CODE_',
        'OLD_PROVIN',
        'PROV_TYPE',
        'NOTES',
        'SOURCE'
    ],
    inplace=True,
)

In [94]:
mdg_pixels_no_buildings_sjoin_adm4.info()

<class 'geopandas.geodataframe.GeoDataFrame'>
Int64Index: 1411110 entries, 0 to 1583564
Data columns (total 6 columns):
 #   Column      Non-Null Count    Dtype   
---  ------      --------------    -----   
 0   index       1411110 non-null  int64   
 1   geometry    1411110 non-null  geometry
 2   ADM1_PCODE  1410004 non-null  object  
 3   ADM2_PCODE  1410004 non-null  object  
 4   ADM3_PCODE  1410004 non-null  object  
 5   ADM4_PCODE  1410004 non-null  object  
dtypes: geometry(1), int64(1), object(4)
memory usage: 75.4+ MB
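The `info()` output shows 1,411,110 rows but only 1,410,004 non-null PCODE values, so 1,106 unmapped pixels (roughly 0.08%) did not fall within any ADM4 polygon during the left join. The sketch below reproduces that arithmetic and shows one way to count such rows on a frame; the helper name `count_unmatched` and the toy frame are hypothetical.

```python
import pandas as pd

# Figures copied from the info() output above.
total_rows = 1_411_110
matched_rows = 1_410_004
unmatched_rows = total_rows - matched_rows
print(unmatched_rows)  # 1106
print(100 * unmatched_rows / total_rows)  # share of unmatched pixels, in percent


def count_unmatched(df, column="ADM4_PCODE"):
    """Number of rows whose join column is missing."""
    return int(df[column].isna().sum())


# Toy frame with one pixel that matched no polygon.
toy_join = pd.DataFrame(
    {"index": [0, 1, 2], "ADM4_PCODE": ["MG71713170002", None, "MG71713170001"]}
)
print(count_unmatched(toy_join))  # 1
```

On the real data, `count_unmatched(mdg_pixels_no_buildings_sjoin_adm4)` should agree with the difference computed from `info()`.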


In [95]:
# 2020
mdg_pixels_no_buildings_sjoin_adm4.to_file(
    "../data/mdg_pixels_no_buildings_sjoin_adm4_jan2020.gpkg", driver="GPKG"
)

In [27]:
# # 2021
# mdg_pixels_no_buildings_sjoin_adm4.to_file(
#     "../data/mdg_pixels_no_buildings_sjoin_adm4.gpkg", driver="GPKG"
# )

### Saving final output

In [11]:
# 2020
# Reload the saved joins so you don't have to rerun everything above
mdg_pixels_no_buildings_sjoin_adm4 = gpd.read_file(
    "../data/mdg_pixels_no_buildings_sjoin_adm4_jan2020.gpkg", driver="GPKG"
)
mdg_pixels_with_buildings_sjoin_adm4 = gpd.read_file(
    "../data/mdg_pixels_with_buildings_sjoin_adm4_jan2020.gpkg", driver="GPKG"
)

In [12]:
# Drop pixels that did not fall within any admin boundary (NaN PCODEs)
mdg_pixels_no_buildings_sjoin_adm4.dropna(
    subset=["ADM4_PCODE","ADM3_PCODE","ADM2_PCODE","ADM1_PCODE"], inplace=True
)

mdg_pixels_with_buildings_sjoin_adm4.dropna(
    subset=["ADM4_PCODE","ADM3_PCODE","ADM2_PCODE","ADM1_PCODE"], inplace=True
)

In [13]:
# Adding a column to indicate that these pixels are mapped/unmapped
mdg_pixels_with_buildings_sjoin_adm4["status"] = "mapped"
mdg_pixels_no_buildings_sjoin_adm4["status"] = "unmapped"

In [14]:
# Join the mapped and unmapped pixels together
mdg_pixels_all = pd.concat(
    [mdg_pixels_with_buildings_sjoin_adm4, mdg_pixels_no_buildings_sjoin_adm4]
)

Final output for Madagascar should be as follows:

In [15]:
mdg_pixels_all.head()

Unnamed: 0,index,ADM1_PCODE,ADM2_PCODE,ADM3_PCODE,ADM4_PCODE,geometry,status
0,1,MG71,MG71713,MG71713170,MG71713170002,POINT (49.27167 -11.95611),mapped
1,5,MG71,MG71713,MG71713170,MG71713170002,POINT (49.25278 -11.99111),mapped
2,6,MG71,MG71713,MG71713170,MG71713170001,POINT (49.22861 -12.04611),mapped
3,7,MG71,MG71713,MG71713170,MG71713170001,POINT (49.22861 -12.04639),mapped
4,9,MG71,MG71713,MG71713170,MG71713170001,POINT (49.23278 -12.05417),mapped


In [16]:
# 2020
# Save all columns (including geometry) to gpkg
mdg_pixels_all.to_file("../data/mdg_pixels_all_jan2020.gpkg", driver="GPKG")

In [17]:
# 2020
# Drop geometry column
mdg_pixels_all.drop(columns=["geometry"], inplace=True)

In [18]:
# 2020
# Save remaining columns as csv 
mdg_pixels_all.to_csv("../data/mdg_pixels_all_jan2020.csv", index=False)
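The saved CSV holds one row per pixel with its admin codes and a `status` of `mapped` or `unmapped`. One plausible way to summarise it, assuming completeness is taken as the share of mapped pixels per admin unit (an assumption for illustration, not necessarily the exact metric used in the paper), is sketched below. The helper name `mapped_share` and the toy rows are hypothetical.

```python
import pandas as pd


def mapped_share(pixels, by="ADM4_PCODE"):
    """Count mapped and unmapped pixels per admin unit and
    compute mapped / (mapped + unmapped)."""
    counts = pixels.groupby([by, "status"]).size().unstack(fill_value=0)
    # Make sure both columns exist even if one status is absent.
    counts = counts.reindex(columns=["mapped", "unmapped"], fill_value=0)
    counts["total"] = counts["mapped"] + counts["unmapped"]
    counts["mapped_share"] = counts["mapped"] / counts["total"]
    return counts.reset_index()


# Toy rows in the same layout as the CSV (values invented).
toy_pixels = pd.DataFrame(
    {
        "ADM4_PCODE": [
            "MG71713170001", "MG71713170001",
            "MG71713170002", "MG71713170002",
            "MG71713170002", "MG71713170002",
        ],
        "status": ["mapped", "unmapped", "mapped", "mapped", "mapped", "unmapped"],
    }
)

summary = mapped_share(toy_pixels)
print(summary)
```

Passing `by="ADM3_PCODE"` (or any other code column present in the frame) would aggregate at a coarser admin level.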

In [20]:
! gsutil cp ~/osm-completeness/data/mdg_pixels_all.gpkg gs://tm-ardie/2021-07-13-osm-completeness-madagascar/

Copying file:///home/jupyter/osm-completeness/data/mdg_pixels_all.gpkg [Content-Type=application/octet-stream]...
==> NOTE: You are uploading one or more large file(s), which would run          
significantly faster if you enable parallel composite uploads. This
feature can be enabled by editing the
"parallel_composite_upload_threshold" value in your .boto
configuration file. However, note that if you do this large files will
be uploaded as `composite objects
<https://cloud.google.com/storage/docs/composite-objects>`_,which
means that any user who downloads such objects will need to have a
compiled crcmod installed (see "gsutil help crcmod"). This is because
without a compiled crcmod, computing checksums on composite objects is
so slow that gsutil disables downloads of composite objects.

| [1 files][222.1 MiB/222.1 MiB]                                                
Operation completed over 1 objects/222.1 MiB.                                    


In [22]:
test_gdf = gpd.read_file("../data/mdg_pixels_all.gpkg",driver='GPKG')
test_gdf.head()

Unnamed: 0,index,ADM1_PCODE,ADM2_PCODE,ADM3_PCODE,ADM4_PCODE,status,geometry
0,1,MG71,MG71713,MG71713170,MG71713170002,mapped,POINT (49.27167 -11.95611)
1,5,MG71,MG71713,MG71713170,MG71713170002,mapped,POINT (49.25278 -11.99111)
2,6,MG71,MG71713,MG71713170,MG71713170001,mapped,POINT (49.22861 -12.04611)
3,7,MG71,MG71713,MG71713170,MG71713170001,mapped,POINT (49.22861 -12.04639)
4,9,MG71,MG71713,MG71713170,MG71713170001,mapped,POINT (49.23278 -12.05417)


In [62]:
# 2021
# Save all columns (including geometry) to gpkg
# mdg_pixels_all.to_file("../data/mdg_pixels_all.gpkg", driver="GPKG")

In [63]:
# 2021
# Drop geometry column
# mdg_pixels_all.drop(columns=["geometry"], inplace=True)

In [64]:
# 2021
# Save remaining columns as csv 
# mdg_pixels_all.to_csv("../data/mdg_pixels_all.csv", index=False)
