The Relative Wealth Index (RWI) is an effective way to measure wealth in low- and middle-income countries at a very granular level, but in its current form each score is relative only to the country as a whole. Two refinements are useful. First, poor areas where more people live should stand out more than sparsely populated ones. Second, population density varies across a country, so density should be judged against the surrounding administrative area rather than the whole nation: a tile that is the most populated in a small, sparsely populated area should not be labelled low-density just because it is small by national standards.
This short notebook gives a better view of a micro-area's wealth relative to the more specific administrative area that contains it.
The code is heavily based on Meta's tutorial on the subject (https://dataforgood.facebook.com/dfg/docs/tutorial-calculating-population-weighted-relative-wealth-index) and should not be taken as the original work of the notebook's author.

In [1]:
import pandas as pd 
import geopandas as gpd 
from shapely.geometry import Point 
from shapely.geometry.polygon import Polygon 
from pyquadkey2 import quadkey
import os
import fsspec

The code computes population-weighted RWI scores at a chosen level of granularity within a nation, going from gid = 0, the coarsest level, towards more granular units as gid increases (in GADM, level 0 is normally the country itself and level 1 its first-level divisions, such as regions).
The code requires the original RWI output for the nation, the nation's population density estimates (both available at https://data.humdata.org/dataset), and the nation's administrative boundaries (available at https://gadm.org/data.html). The administrative boundary data comes as a .zip file containing several shapefiles, one per level of subdivision. Use only the shapefile with the highest level number: it has the finest granularity and still carries the identifiers of the coarser levels.
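
As a minimal sketch of that selection step, the helper below picks the highest-level shapefile from a list of archive member names. The naming pattern `gadm41_<ISO3>_<level>.shp` and the helper name `highest_level_shapefile` are assumptions made for illustration; check the contents of the downloaded archive before relying on them.

```python
import re

def highest_level_shapefile(member_names):
    """Hypothetical helper: return the .shp member with the largest trailing level number."""
    pattern = re.compile(r"_(\d+)\.shp$")
    levelled = []
    for name in member_names:
        match = pattern.search(name)
        if match:
            levelled.append((int(match.group(1)), name))
    if not levelled:
        raise ValueError("no levelled .shp members found")
    # max() compares the integer level first, so level 10 beats level 9
    return max(levelled)[1]

# Assumed member names, for illustration only
members = [
    "gadm41_ALB_0.shp", "gadm41_ALB_0.dbf",
    "gadm41_ALB_1.shp", "gadm41_ALB_2.shp", "gadm41_ALB_3.shp",
]
print(highest_level_shapefile(members))
```

With a real archive, the member list could come from `zipfile.ZipFile(path).namelist()`.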

The function reads the administrative division file and builds a GeoDataFrame of the administrative areas at the desired level together with their geometries. Each RWI estimate is then joined to the administrative area that contains it (if no area is found, the estimate is dropped from the calculation). Next, the population of each zoom-14 quadkey tile is estimated as the sum of the population of all points that fall in the tile. The population of an administrative unit is obtained by joining the population and RWI datasets and summing over all quadkey tiles in the unit. Finally, each tile's population weight is its population divided by the population of its administrative area, and the weighted RWI of the area is the sum, over its tiles, of the original RWI times the tile's population weight.
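
The weighting arithmetic can be illustrated on made-up numbers. The sketch below uses three hypothetical tiles in two hypothetical areas (`A` and `B`); the column names mirror those used in `rwi_pond`, the values are invented purely to show the calculation, and `transform` stands in for the merge step as a shortcut.

```python
import pandas as pd

# Invented tiles: two in area A, one in area B
tiles = pd.DataFrame({
    "geo_id":   ["A", "A", "B"],
    "rwi":      [-0.5, 0.5, 0.2],
    "pop_2020": [300.0, 100.0, 50.0],
})

# Population of each administrative area = sum of its tiles' populations
tiles["geo_2020"] = tiles.groupby("geo_id")["pop_2020"].transform("sum")

# Share of the area's population living in each tile
tiles["pop_weight"] = tiles["pop_2020"] / tiles["geo_2020"]

# Each tile's contribution to the area's weighted RWI
tiles["rwi_weight"] = tiles["rwi"] * tiles["pop_weight"]

# Population-weighted RWI per area
area_rwi = tiles.groupby("geo_id")["rwi_weight"].sum()
print(area_rwi)
```

Area A has an unweighted mean RWI of 0, but its population-weighted value is -0.25 because three quarters of its people live in the poorer tile.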

In [6]:
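def get_point_in_polygon(lat, lon, polygons):
    """Return the id of the first polygon that contains the point.

    @param lat: double
    @param lon: double
    @param polygons: dict mapping geo_id to a shapely geometry
    @return geo_id: str, or 'null' if no polygon contains the point
    """
    point = Point(lon, lat)  # shapely points are (x, y), i.e. (lon, lat)
    for geo_id in polygons:
        polygon = polygons[geo_id]
        if polygon.contains(point):
            return geo_id

    return 'null'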

In [7]:
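def rwi_pond(country: str, gid: int):
    """Attach to each RWI tile the population-weighted RWI of its administrative area.

    country: three-letter ISO code in uppercase (e.g. 'ALB')
    gid: GADM administrative level used to group the tiles
    """
    # Folder holding the downloaded inputs
    base_dir = "C:\\Users\\Luca\\Downloads"

    # Administrative boundaries: map each area id at the chosen level to its geometry
    path = os.path.join(base_dir, f'gadm41_{country}_shp.zip')
    with fsspec.open(path) as file:
        shapefile = gpd.read_file(file)

    # Caveat: if the shapefile is finer than the requested level, several rows share
    # the same GID_{gid} and dict() keeps only the last geometry per id; dissolving
    # the shapefile by that column first (shapefile.dissolve(by=...)) would avoid this.
    polygons = dict(zip(shapefile[f'GID_{gid}'], shapefile['geometry']))
    print(f'shapefile shape: {shapefile.shape}')

    # RWI estimates; quadkeys are read as strings so leading zeros are not lost
    rwi = pd.read_csv(
        os.path.join(base_dir, 'Relative_Wealth_Index', f'{country}_relative_wealth_index.csv'),
        dtype={'quadkey': str},
    )

    # Assign each tile to its administrative area and drop tiles outside every area
    rwi['geo_id'] = rwi.apply(lambda x: get_point_in_polygon(x['latitude'], x['longitude'], polygons), axis=1)
    rwi = rwi[rwi['geo_id'] != 'null']
    print(f'rwi shape: {rwi.shape}')

    # Population points, each mapped to its zoom-14 quadkey tile
    population = pd.read_csv(os.path.join(base_dir, f'{country.lower()}_general_2020.csv'))
    population = population.rename(columns={f'{country.lower()}_general_2020': 'pop_2020'})
    population['quadkey'] = population.apply(lambda x: str(quadkey.from_geo((x['latitude'], x['longitude']), 14)), axis=1)
    print(f'population shape: {population.shape}')

    # Population per tile
    bing_tile_z14_pop = population.groupby('quadkey', as_index=False)['pop_2020'].sum()

    # Keep only tiles that have both an RWI estimate and a population value
    rwi_pop = rwi.merge(bing_tile_z14_pop, on='quadkey', how='inner')

    # Population per administrative area
    geo_pop = rwi_pop.groupby('geo_id', as_index=False)['pop_2020'].sum()
    geo_pop = geo_pop.rename(columns={'pop_2020': 'geo_2020'})
    rwi_pop = rwi_pop.merge(geo_pop, on='geo_id', how='inner')

    # Tile weight within its area, and the tile's contribution to the area's RWI
    rwi_pop['pop_weight'] = rwi_pop['pop_2020'] / rwi_pop['geo_2020']
    rwi_pop['rwi_weight'] = rwi_pop['rwi'] * rwi_pop['pop_weight']

    # Population-weighted RWI per area, attached back to every tile of that area
    geo_rwi = rwi_pop.groupby('geo_id', as_index=False)['rwi_weight'].sum()
    rwi = rwi.merge(geo_rwi, on='geo_id', how='left')

    return rwi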
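
To make the tile-matching step more concrete, here is a pure-Python sketch of how a latitude/longitude pair maps to a zoom-14 quadkey under the Bing Maps tile system. It is not the `pyquadkey2` implementation: the function name `latlon_to_quadkey` is hypothetical, the pixel-level rounding of the reference algorithm is simplified away, and the assumption is that `quadkey.from_geo` follows the same scheme. The example uses the coordinates of the first row of the Albania output shown further down.

```python
import math

def latlon_to_quadkey(lat, lon, level=14):
    """Hypothetical helper: Bing Maps tile-system quadkey for a WGS84 point."""
    # Clamp latitude to the range covered by the Web Mercator projection
    lat = max(min(lat, 85.05112878), -85.05112878)
    sin_lat = math.sin(math.radians(lat))

    # Normalised map coordinates in [0, 1], origin at the top-left corner
    x = (lon + 180.0) / 360.0
    y = 0.5 - math.log((1 + sin_lat) / (1 - sin_lat)) / (4 * math.pi)

    # Tile column and row at the requested zoom level
    n_tiles = 1 << level
    tile_x = min(max(int(x * n_tiles), 0), n_tiles - 1)
    tile_y = min(max(int(y * n_tiles), 0), n_tiles - 1)

    # Interleave the bits of column and row, most significant first
    digits = []
    for i in range(level, 0, -1):
        mask = 1 << (i - 1)
        digit = (1 if tile_x & mask else 0) + (2 if tile_y & mask else 0)
        digits.append(str(digit))
    return "".join(digits)

print(latlon_to_quadkey(41.483891, 20.401611))  # 12023332300002
```

Because every population point and every RWI tile centre at zoom 14 resolves to one such string, summing population by quadkey and merging on it lines the two datasets up tile by tile.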

To call the function, pass the nation's three-letter ISO code in uppercase and the desired granularity level. The output is a dataset containing, for each tile, the original RWI estimate and the population-weighted RWI of its administrative area.
Below is an example for Albania. Note that rwi_weight is computed per administrative area, so tiles in the same area share the same value (rows 2 and 4, both in ALB.1.1_1). The third tile (row 2) has a lower raw RWI than the first (row 0), but after weighting by population the first tile's area (ALB.2.1_1, -0.172) comes out poorer than the third tile's area (ALB.1.1_1, -0.093).

In [8]:
albania = rwi_pond('ALB', 2)
print(albania.shape)
albania.head()

shapefile shape: (378, 17)
rwi shape: (344, 6)
population shape: (2356844, 4)
(344, 7)


Unnamed: 0,quadkey,latitude,longitude,rwi,error,geo_id,rwi_weight
0,12023332300002,41.483891,20.401611,-0.4,0.453,ALB.2.1_1,-0.172335
1,12023321313221,42.317939,19.544678,0.11,0.465,ALB.10.1_1,-0.005419
2,12201110021131,40.672306,20.028076,-0.549,0.431,ALB.1.1_1,-0.093296
3,12201110130133,40.655638,20.906982,-0.407,0.471,ALB.7.1_1,0.116587
4,12201110021331,40.605612,20.028076,-0.802,0.44,ALB.1.1_1,-0.093296
