# Launch INFRA SAP
The World Bank's Geospatial Operational Support team (GOST), in collaboration with the Infrastructure Chief Economist's office, has developed a diagnostic toolkit for assessing the state of infrastructure in a country by examining infrastructure, access, connectivity, and commodity flows.

This notebook launches the data preparation step of the INFRA SAP toolkit. It is designed principally to run on the GOST team's high-compute cluster, but has been made as flexible as possible to facilitate replication elsewhere. The data processing steps require the following inputs:
1. Administrative boundaries of interest (defines total extent of analysis and level of aggregation)
2. Country ISO3 code

From these inputs we extract the following datasets. **These extraction steps are specific to the World Bank's data schema; the datasets can instead be supplied directly to later functions if necessary.**

1. Open Street Map
2. WorldPop 2020 gridded population data
3. International airports (from OSM)
4. Major ports (from OSM)
5. Official Border Crossings (from ???)

Once these data have been extracted or processed, we run the following analyses:

1. Calculate urban and rural following the GURBA process - LINK
2. Attempt to identify/name urban areas
3. (optional) Re-sample population to 1km
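
The optional re-sampling step can be illustrated with a minimal sketch. It assumes the source population grid holds counts per cell at roughly 100 m resolution, so that summing 10 x 10 blocks gives roughly 1 km cells; the function name `aggregate_population` and the use of `None` for no-data are illustrative assumptions, not part of the INFRA SAP code. The sketch is written in plain Python so it runs without raster libraries; a real workflow would more likely operate on arrays read with `rasterio`.

```python
def aggregate_population(grid, factor):
    """Sum non-overlapping factor x factor blocks of a 2D population grid.

    Counts are summed rather than averaged so that the total population
    is preserved. Cells equal to None are treated as no-data and skipped.
    """
    n_rows = len(grid)
    n_cols = len(grid[0]) if n_rows else 0
    # Ceiling division keeps partial blocks along the bottom and right edges.
    out_rows = -(-n_rows // factor)
    out_cols = -(-n_cols // factor)
    out = [[0.0] * out_cols for _ in range(out_rows)]
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            if value is not None:
                out[r // factor][c // factor] += value
    return out


# Toy 4 x 4 grid aggregated by a factor of 2 (hypothetical values).
fine = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, None, 12],
    [13, 14, 15, 16],
]
coarse = aggregate_population(fine, 2)
print(coarse)
```

Summing is the important design choice here: averaging or nearest-neighbour resampling of a count raster would change the national population total.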

Following these data preparation steps, a sanity check should be performed on the extracted data to ensure that major POIs are not missed and that all data have been properly extracted.
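
The mechanical part of that sanity check might look like the sketch below, which only confirms that each expected output exists and is not empty. Apart from `admin.shp`, the file names are hypothetical placeholders, and `check_outputs` is an illustrative helper rather than a toolkit function. Whether major POIs are actually present still needs a visual or manual review.

```python
import os
import tempfile


def check_outputs(folder, expected_files):
    """Map each expected file name to 'ok', 'empty', or 'missing'."""
    report = {}
    for name in expected_files:
        path = os.path.join(folder, name)
        if not os.path.exists(path):
            report[name] = "missing"
        elif os.path.getsize(path) == 0:
            report[name] = "empty"
        else:
            report[name] = "ok"
    return report


# Hypothetical output names; only admin.shp appears in this notebook.
EXPECTED = ["admin.shp", "airports.shp", "ports.shp"]

# Demonstrate on a temporary folder instead of a real project directory.
with tempfile.TemporaryDirectory() as demo_folder:
    with open(os.path.join(demo_folder, "admin.shp"), "wb") as f:
        f.write(b"placeholder bytes")
    open(os.path.join(demo_folder, "airports.shp"), "wb").close()
    demo_report = check_outputs(demo_folder, EXPECTED)

print(demo_report)
```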

In [7]:
import sys, os, importlib
import rasterio

import geopandas as gpd
import pandas as pd

sys.path.append("../")

import infrasap.osm_extractor as osm

In [15]:
iso3 = "ARG"
base_out = r"J:\Data\PROJECTS\INFRA_SAP"
out_folder = os.path.join(base_out, iso3)
os.makedirs(out_folder, exist_ok=True)

# select out admin2 from global boundaries dataset
global_admin2 = r"R:\GLOBAL\ADMIN\Official Bank Borders\Polygons\Admin2_Polys.shp"
focal_admin2 = os.path.join(out_folder, "admin.shp")
if not os.path.exists(focal_admin2):
    in_bounds = gpd.read_file(global_admin2)
    out_bounds = in_bounds.loc[in_bounds['ISO3'] == iso3]
    # reproject to WGS84; the {'init': ...} dict syntax is deprecated in recent pyproj/GeoPandas
    out_bounds = out_bounds.to_crs("EPSG:4326")
    out_bounds.to_file(focal_admin2)
    
# define other base data
global_osm = "/home/public/Data/GLOBAL/OSM/GLOBAL/planet-latest.osm.pbf"
focal_osm = os.path.join(base_out, "national_complete.osm.pbf")
if not os.path.exists(focal_osm):
    extractor = osm.osmExtraction(
        osmosisCmd="/home/wb411133/Code/Osmosis/bin/osmosis",
        tempFile="/home/wb411133/temp/temp_execution.bat",
    )
    print(extractor.extractBoundingBox(global_osm, focal_admin2, focal_osm, execute=False))

KeyError: 3
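
For readers replicating the OSM extraction without the `infrasap` module, the sketch below shows one way to build an Osmosis bounding-box command from the extent of the admin boundaries. It is an assumption about what such a step could look like, not a description of how `extractBoundingBox` works internally. The helper name `osmosis_bbox_command` and the example bounds are illustrative; the bounds tuple follows the `(minx, miny, maxx, maxy)` order returned by `GeoDataFrame.total_bounds`.

```python
def osmosis_bbox_command(osmosis_cmd, in_pbf, out_pbf, bounds):
    """Build an Osmosis command that clips in_pbf to a bounding box.

    bounds is (min_x, min_y, max_x, max_y) in EPSG:4326 degrees.
    The command is returned as a string and is not executed here.
    """
    min_x, min_y, max_x, max_y = bounds
    return (
        f'{osmosis_cmd} --read-pbf file="{in_pbf}" '
        f"--bounding-box top={max_y} left={min_x} bottom={min_y} right={max_x} "
        f'--write-pbf file="{out_pbf}"'
    )


# Approximate, illustrative bounds only; in the notebook they would come
# from the admin boundaries, e.g. gpd.read_file(focal_admin2).total_bounds.
example_bounds = (-73.6, -55.1, -53.6, -21.8)
command = osmosis_bbox_command(
    "osmosis", "planet-latest.osm.pbf", "national_complete.osm.pbf", example_bounds
)
print(command)
```

Returning the command as a string mirrors the `execute=False` call above, which prints the command for inspection rather than running it.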

In [None]:
wp_dataset = ""