Temporary notebook for running and manually testing the OSM feature generation function, `osm.add_osm_features`.

PHL cluster-level data is taken from [GDrive](https://drive.google.com/drive/u/0/folders/1N9vSFX05bDWGolfsRjTTBUxGtuVP-77s).

In [1]:
%reload_ext autoreload
%autoreload 2

In [2]:
import sys

import geopandas as gpd
import pandas as pd

# Add the directory three levels up to the import path so that
# `povertymapping` can be imported from this notebook.
sys.path.append("../../../")

from povertymapping import osm, settings

Load the temporary cluster-level data for PHL.

In [3]:
# Load ground truth data as a DataFrame first
GROUND_TRUTH_CSV = settings.DATA_DIR/"phl_dhs_cluster_level.csv"
df = pd.read_csv(GROUND_TRUTH_CSV)

# Some coordinates in the data are invalid. Keep only rows where both
# longitude and latitude are positive, which holds for every point in PHL.
df = df[(df.longitude > 0) & (df.latitude > 0)]

# Create a GeoDataFrame from the longitude, latitude columns.
gdf = gpd.GeoDataFrame(df, geometry=gpd.points_from_xy(df.longitude, df.latitude), crs="epsg:4326")

print(f"There are {len(gdf):,} clusters.")
gdf.head()

There are 1,213 clusters.


Unnamed: 0,DHSCLUST,Wealth Index,DHSID,longitude,latitude,geometry
0,1,-31881.6087,PH201700000001,122.109807,6.674652,POINT (122.10981 6.67465)
1,2,-2855.375,PH201700000002,122.132027,6.662256,POINT (122.13203 6.66226)
2,3,-57647.04762,PH201700000003,122.179496,6.621822,POINT (122.17950 6.62182)
3,4,-54952.66667,PH201700000004,122.137965,6.485298,POINT (122.13796 6.48530)
5,6,-80701.69565,PH201700000006,121.916094,6.629457,POINT (121.91609 6.62946)
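
The positive-coordinate filter used above is enough for PHL, but it would also discard valid points in the southern or western hemisphere. Below is a minimal sketch of a more general check. It assumes that a missing location is stored as exactly (0, 0), which this notebook does not confirm. `drop_invalid_coordinates` is a hypothetical helper, not part of `povertymapping`, and the middle row of the sample is invented for illustration.

```python
import pandas as pd


def drop_invalid_coordinates(df, lon_col="longitude", lat_col="latitude"):
    """Hypothetical helper: keep rows whose coordinates look usable."""
    lon, lat = df[lon_col], df[lat_col]
    # Valid geographic ranges; NaN values fail this test and are dropped.
    in_range = lon.between(-180, 180) & lat.between(-90, 90)
    # Assumption: a missing location is stored as exactly (0, 0).
    at_origin = (lon == 0) & (lat == 0)
    return df[in_range & ~at_origin]


# Rows 1 and 3 are copied from the output above; row 2 is made up.
sample = pd.DataFrame(
    {
        "DHSCLUST": [1, 2, 3],
        "longitude": [122.109807, 0.0, 122.179496],
        "latitude": [6.674652, 0.0, 6.621822],
    }
)

valid = drop_invalid_coordinates(sample)
print(f"Kept {len(valid)} of {len(sample)} rows.")
```

Unlike the original filter, this version keeps points with negative coordinates, so it could be reused for countries outside the north-eastern hemisphere.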


Try the OSM feature generation function.

In [4]:
%%time
gdf_with_features = osm.add_osm_features(gdf, "philippines", cache_dir=settings.CACHE_DIR)
gdf_with_features.head()

2023-01-06 17:09:49.588 | INFO     | povertymapping.osm:download_osm_country_data:86 - OSM Data: Cached data available for philippines at /home/alron/unicef-ai4d-poverty-mapping/data/data_cache/osm/philippines? True


CPU times: user 6.62 s, sys: 198 ms, total: 6.82 s
Wall time: 6.82 s


Unnamed: 0,DHSCLUST,Wealth Index,DHSID,longitude,latitude,geometry,poi_count,restaurant_count,restaurant_nearest,school_count,school_nearest,bank_count,bank_nearest,supermarket_count,supermarket_nearest,mall_count,mall_nearest,atm_count,atm_nearest
0,1,-31881.6087,PH201700000001,122.109807,6.674652,POINT (122.10981 6.67465),0.0,0.0,5151.143562,0.0,3392.458241,0.0,3762.475794,0.0,999999.0,0.0,999999.0,0.0,999999.0
1,2,-2855.375,PH201700000002,122.132027,6.662256,POINT (122.13203 6.66226),0.0,0.0,7787.461326,0.0,5600.808072,0.0,965.901202,0.0,999999.0,0.0,999999.0,0.0,999999.0
2,3,-57647.04762,PH201700000003,122.179496,6.621822,POINT (122.17950 6.62182),0.0,0.0,999999.0,0.0,2641.308143,0.0,6025.856477,0.0,999999.0,0.0,999999.0,0.0,999999.0
3,4,-54952.66667,PH201700000004,122.137965,6.485298,POINT (122.13796 6.48530),0.0,0.0,999999.0,0.0,4287.469155,0.0,999999.0,0.0,999999.0,0.0,999999.0,0.0,999999.0
5,6,-80701.69565,PH201700000006,121.916094,6.629457,POINT (121.91609 6.62946),0.0,0.0,999999.0,0.0,5930.52947,0.0,999999.0,0.0,999999.0,0.0,999999.0,0.0,999999.0
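
Many of the `*_nearest` columns above contain 999999.0. This looks like a placeholder written when no matching POI was found, although the notebook does not say so. The sketch below works under that assumption: it measures how often the placeholder occurs and replaces it with NaN so that summary statistics are not skewed. The two sample rows are copied from the output above, and `NO_POI_PLACEHOLDER` is a name introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Assumption: 999999.0 marks "no POI found", not a measured distance.
NO_POI_PLACEHOLDER = 999999.0

# Two rows from the output above (restaurant and school columns only).
features = pd.DataFrame(
    {
        "restaurant_nearest": [5151.143562, 999999.0],
        "school_nearest": [3392.458241, 2641.308143],
    }
)

nearest_cols = [c for c in features.columns if c.endswith("_nearest")]

# Share of clusters per feature that carry the placeholder.
placeholder_share = (features[nearest_cols] == NO_POI_PLACEHOLDER).mean()

# Replace the placeholder with NaN so that mean/median ignore it.
cleaned = features.copy()
cleaned[nearest_cols] = cleaned[nearest_cols].replace(NO_POI_PLACEHOLDER, np.nan)

print(placeholder_share)
```

Whether NaN or the raw placeholder is the better input depends on the downstream model, so this is a diagnostic step rather than a recommended preprocessing rule.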


Sanity check: confirm that the helper function `osm.download_osm_country_data` can download and cache OSM data for each of the DHS countries.

In [5]:
countries = [
    "philippines", "cambodia", "east-timor", "myanmar"
]

for country in countries:
    osm.download_osm_country_data(country, cache_dir=settings.CACHE_DIR)

2023-01-06 17:09:56.519 | INFO     | povertymapping.osm:download_osm_country_data:86 - OSM Data: Cached data available for philippines at /home/alron/unicef-ai4d-poverty-mapping/data/data_cache/osm/philippines? True
2023-01-06 17:09:56.521 | INFO     | povertymapping.osm:download_osm_country_data:86 - OSM Data: Cached data available for cambodia at /home/alron/unicef-ai4d-poverty-mapping/data/data_cache/osm/cambodia? True
2023-01-06 17:09:56.523 | INFO     | povertymapping.osm:download_osm_country_data:86 - OSM Data: Cached data available for east-timor at /home/alron/unicef-ai4d-poverty-mapping/data/data_cache/osm/east-timor? True
2023-01-06 17:09:56.525 | INFO     | povertymapping.osm:download_osm_country_data:86 - OSM Data: Cached data available for myanmar at /home/alron/unicef-ai4d-poverty-mapping/data/data_cache/osm/myanmar? True
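
The log lines above report the cache status one country at a time. A minimal sketch of the same check done programmatically follows. It assumes the cache layout `<cache_dir>/osm/<country>`, inferred from the paths in the log. `missing_osm_caches` is a hypothetical helper, not part of `povertymapping`, and the demo runs against a throwaway directory rather than the real cache.

```python
import tempfile
from pathlib import Path


def missing_osm_caches(cache_dir, countries):
    """Hypothetical helper: list countries with no <cache_dir>/osm/<country> directory."""
    osm_dir = Path(cache_dir) / "osm"
    return [country for country in countries if not (osm_dir / country).is_dir()]


# Demo on a temporary directory that only holds a "philippines" cache.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "osm" / "philippines").mkdir(parents=True)
    result = missing_osm_caches(tmp, ["philippines", "cambodia"])

print(result)  # ['cambodia']
```

In the notebook, the call would presumably be `missing_osm_caches(settings.CACHE_DIR, countries)`, with an empty list indicating that every country is cached.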
