The Relative Wealth Index (RWI) is a well-suited way to assess how wealth is distributed across areas within a country, provided the country falls in the low- or middle-income category. Because the RWI is asset-based rather than built on the measures typically used for high-income countries, the model is not very accurate for higher-income countries, and results cannot be compared across countries (an area's asset wealth is expressed relative to its own country's average asset wealth).
To allow comparisons between countries and to predict a high-income country's wealth distribution more accurately, we use the Absolute Wealth Estimate (AWE). It estimates a per-capita GDP for each area from the country's own average GDP per capita, its Gini index as a measure of dispersion, and an inverse cumulative distribution function of wealth (ICDF) based on a mixture of Pareto and log-normal distributions, as per Hruschka's proposed model (https://keep-dev.lib.asu.edu/system/files/c160/Hruschka_2015.pdf).
Below we illustrate the conversion process for two countries, using the Pareto and log-normal distributions separately. Note that the estimates are unreliable if the country's true wealth distribution is far from the assumed distribution.

We take the RWI data for Albania, derived from Meta's original work.

In [1]:
import pandas as pd
import os

file = "C:\\Users\\Luca\\Downloads\\ALB_relative_wealth_index.csv"

data = pd.read_csv(file)
data

Unnamed: 0,quadkey,latitude,longitude,rwi,error
0,12023330223311,42.090069,20.028076,-0.461,0.400
1,12201110020221,40.605612,19.720459,-0.251,0.453
2,12201110130222,40.588928,20.753174,-0.030,0.468
3,12201110010212,40.888600,20.093994,-0.566,0.434
4,12023332022323,41.516803,19.808350,-0.076,0.481
...,...,...,...,...,...
3497,12023332031011,41.763117,20.291748,-0.322,0.422
3498,12201110223131,40.002371,20.028076,-0.646,0.381
3499,12023332023330,41.533254,20.006103,0.082,0.459
3500,12023332322133,41.054501,20.555420,0.192,0.482


In [2]:
data.rwi.describe()

count    3502.000000
mean       -0.031131
std         0.442059
min        -1.255000
25%        -0.354750
50%        -0.082000
75%         0.241750
max         1.866000
Name: rwi, dtype: float64

The AWE uses a rank system, with the AWE increasing as the rank gets higher, so we sort the zoom-level-14 quadkey tiles by their RWI and assign the highest rank to the (relatively) wealthiest tile.

In [3]:
# Sort from wealthiest to poorest tile and discard the old index
data = data.sort_values(by='rwi', ascending=False).reset_index(drop=True)
data

Unnamed: 0,quadkey,latitude,longitude,rwi,error
0,12023332202311,41.302571,19.852295,1.866,0.503
1,12201110021011,40.705627,19.940185,1.788,0.470
2,12023323113213,41.812267,19.588623,1.771,0.583
3,12023332202113,41.352072,19.852295,1.628,0.540
4,12023332202100,41.368564,19.786377,1.589,0.532
...,...,...,...,...,...
3497,12023332201113,41.483891,20.028076,-1.004,0.409
3498,12201101130010,40.705627,19.390869,-1.066,0.472
3499,12023332233110,41.104190,20.357666,-1.132,0.433
3500,12023332033320,41.533254,20.313721,-1.142,0.484


In [4]:
import numpy as np
# Integer ranks from len(data) (wealthiest tile) down to 1 (poorest tile)
rank = np.arange(len(data), 0, -1)

In [5]:
data['rank'] = rank

In [6]:
data

Unnamed: 0,quadkey,latitude,longitude,rwi,error,rank
0,12023332202311,41.302571,19.852295,1.866,0.503,3502
1,12201110021011,40.705627,19.940185,1.788,0.470,3501
2,12023323113213,41.812267,19.588623,1.771,0.583,3500
3,12023332202113,41.352072,19.852295,1.628,0.540,3499
4,12023332202100,41.368564,19.786377,1.589,0.532,3498
...,...,...,...,...,...,...
3497,12023332201113,41.483891,20.028076,-1.004,0.409,5
3498,12201101130010,40.705627,19.390869,-1.066,0.472,4
3499,12023332233110,41.104190,20.357666,-1.132,0.433,3
3500,12023332033320,41.533254,20.313721,-1.142,0.484,2
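
The ranking step above can also be sketched with pandas' own `rank` method, which avoids building the rank vector by hand. This is a minimal illustration on hypothetical toy values, not on the Albania file, and the tie-breaking rule (`method="first"`) is an assumption that may order tied tiles differently from the cells above.

```python
import pandas as pd

# Toy RWI values (hypothetical, not taken from the Albania file)
toy = pd.DataFrame({"rwi": [0.5, -1.2, 1.9, -0.1]})

# method="first" breaks ties by order of appearance, so every rank is a unique integer
toy["rank"] = toy["rwi"].rank(method="first", ascending=True).astype(int)

# Put the wealthiest tile first, as in the table above
toy = toy.sort_values("rwi", ascending=False).reset_index(drop=True)
print(toy)
```

With this approach the highest RWI still receives the highest rank, and the rank column stays attached to its row even if the frame is later re-sorted.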


We now need Albania's GDP per capita, as a measure of central tendency, and its Gini index, as a measure of dispersion. The data come from the World Bank (https://data.worldbank.org/indicator/ny.gdp.pcap.cd).

In [8]:
GDP = 6810.1
Gini = 0.294

Our first choice is to use the Pareto distribution to calculate the ICDF for Albania. The Pareto distribution has a shape parameter alpha, equal to (1 + Gini) / (2 * Gini), and a scale parameter threshold, equal to 1 - 1/alpha, which sets the mean of the distribution to 1.
In the calculation we use the ICDF values accumulated over all quantiles (the variable `mean_icdf` in the cells below holds their sum).

In [9]:
import numpy as np
from scipy.stats import pareto

# Define the parameters of the Pareto distribution
alpha = (1 + Gini)/(2*Gini)  # shape parameter
threshold = 1 - (1/alpha)  # scale parameter (scipy's default is 1)

print(f'alpha: {alpha}, scale: {threshold}')
print('-'*100)
# Quantile values
quantiles = np.linspace(0,0.99, num = len(data))

# Compute the ICDF
icdf_values = pareto.ppf(quantiles, alpha, scale=threshold)

print("ICDF values at quantiles:", icdf_values)

alpha: 2.2006802721088436, scale: 0.545595054095827
----------------------------------------------------------------------------------------------------
ICDF values at quantiles: [0.54559505 0.54566517 0.54573532 ... 4.31343813 4.36694071 4.42262704]
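
As a check on the parametrisation above, the short sketch below recovers the Gini index from alpha and confirms that the chosen threshold gives the Pareto distribution a mean of 1. It relies on the standard Pareto identities Gini = 1 / (2*alpha - 1) and mean = alpha * threshold / (alpha - 1); the variable names `gini_back` and `mean_value` are introduced here for illustration only.

```python
from scipy.stats import pareto

gini = 0.294  # Albania's Gini index, as used above

alpha = (1 + gini) / (2 * gini)  # shape parameter
threshold = 1 - 1 / alpha        # scale parameter (minimum value of the distribution)

# For a Pareto distribution the Gini index is 1 / (2*alpha - 1)
gini_back = 1 / (2 * alpha - 1)

# The mean is alpha * threshold / (alpha - 1), which is 1 for this threshold
mean_value = pareto.mean(alpha, scale=threshold)

print(gini_back, mean_value)
```

Because the full distribution has mean 1, the ICDF values can be read as multiples of the country average; truncating the quantiles at 0.99, as done above, drops the upper tail and lowers the average of the sampled values slightly.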


In [10]:
# Accumulate the ICDF values; despite its name, mean_icdf holds their sum
mean_icdf = 0

for icdf in icdf_values:
    mean_icdf += icdf
mean_icdf

3252.192667518527

The constant in the equation is thus the ratio of GDP per capita to the accumulated ICDF value, which is then multiplied by each tile's rank.

In [11]:
GDP / mean_icdf

2.094002630291954

In [12]:
# Vectorised: scale every rank by the constant GDP / mean_icdf
data['awe_pareto'] = data['rank'] * GDP / mean_icdf

In [13]:
data

Unnamed: 0,quadkey,latitude,longitude,rwi,error,rank,awe_pareto
0,12023332202311,41.302571,19.852295,1.866,0.503,3502,7333.197211
1,12201110021011,40.705627,19.940185,1.788,0.470,3501,7331.103209
2,12023323113213,41.812267,19.588623,1.771,0.583,3500,7329.009206
3,12023332202113,41.352072,19.852295,1.628,0.540,3499,7326.915203
4,12023332202100,41.368564,19.786377,1.589,0.532,3498,7324.821201
...,...,...,...,...,...,...,...
3497,12023332201113,41.483891,20.028076,-1.004,0.409,5,10.470013
3498,12201101130010,40.705627,19.390869,-1.066,0.472,4,8.376011
3499,12023332233110,41.104190,20.357666,-1.132,0.433,3,6.282008
3500,12023332033320,41.533254,20.313721,-1.142,0.484,2,4.188005


In [14]:
data.awe_pareto.describe()

count    3502.000000
mean     3667.645607
std      2117.213914
min         2.094003
25%      1834.869805
50%      3667.645607
75%      5500.421409
max      7333.197211
Name: awe_pareto, dtype: float64
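
The estimate above grows linearly with the rank. A different way to bring the ICDF into the estimate, sketched here purely as an assumption and not as the method used in this notebook, would be to evaluate the Pareto ICDF at each tile's rank-based quantile and rescale so that the average estimate equals GDP per capita. The helper name `awe_from_quantiles` and the mid-point quantile rule are hypothetical.

```python
import numpy as np
from scipy.stats import pareto

def awe_from_quantiles(ranks, gdp, gini):
    """Hypothetical helper: GDP-scaled Pareto ICDF evaluated at each rank's quantile."""
    ranks = np.asarray(ranks, dtype=float)
    n = len(ranks)
    alpha = (1 + gini) / (2 * gini)   # shape parameter
    threshold = 1 - 1 / alpha         # scale parameter
    # Mid-point quantiles keep every value strictly inside (0, 1)
    q = (ranks - 0.5) / n
    icdf = pareto.ppf(q, alpha, scale=threshold)
    # Rescale so that the average estimate equals GDP per capita
    return gdp * icdf / icdf.mean()

# Toy example: four tiles ranked 4 (wealthiest) to 1 (poorest)
awe = awe_from_quantiles([4, 3, 2, 1], gdp=6810.1, gini=0.294)
print(awe)
```

Under this variant the spread between tiles follows the shape of the assumed distribution rather than the rank itself, so the Gini index influences the result directly.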

We now repeat the process using the log-normal distribution to define the ICDF. The distribution is based on a normal distribution with parameters sigma and mu, the standard deviation and the mean of the logarithm of the variable; sigma is obtained from the Gini index through a probit function, and mu depends on both sigma and the GDP.

In [19]:
import numpy as np
from scipy.stats import lognorm
import scipy.stats as stats

# Parameters of the log-normal distribution
sigma = np.sqrt(2) * stats.norm.ppf((Gini +1)/2)  # sigma depends on a probit function
mean = np.log(GDP) - ((sigma**2) / 2)  # mean of the logarithm of the variable (mu)

print(f'sigma: {sigma}, mean: {mean}')
print('-'*100)
# Compute the ICDF
# NOTE: scipy's lognorm defines scale as exp(mu); this call passes mu itself
icdf_values_logn = lognorm.ppf(quantiles, sigma, scale=mean)

print("ICDF values at quantiles:", icdf_values_logn)

sigma: 0.5334888970076086, mean: 8.68385688170836
----------------------------------------------------------------------------------------------------
ICDF values at quantiles: [ 0.          1.38016036  1.52895181 ... 29.71035924 29.87299505
 30.04050184]
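
For reference, SciPy parametrises the log-normal with `s = sigma` and `scale = exp(mu)`. The sketch below, which is an illustration under that parametrisation and not a re-run of the cell above, builds the distribution this way and checks that its mean equals the GDP and that sigma reproduces the Gini index. The names `dist` and `gini_back` are introduced here for illustration.

```python
import numpy as np
from scipy.stats import lognorm, norm

gdp, gini = 6810.1, 0.294  # Albania values used above

sigma = np.sqrt(2) * norm.ppf((gini + 1) / 2)  # probit-based dispersion
mu = np.log(gdp) - sigma**2 / 2                # mean of log-wealth

# SciPy's lognorm takes s = sigma and scale = exp(mu)
dist = lognorm(s=sigma, scale=np.exp(mu))

# Gini index of a log-normal: 2 * Phi(sigma / sqrt(2)) - 1
gini_back = 2 * norm.cdf(sigma / np.sqrt(2)) - 1

print(dist.mean(), gini_back)
```

With this scale the ICDF values are expressed in GDP units (mean equal to the GDP) rather than as multiples of the average, so they are not directly comparable with the mean-1 Pareto values without rescaling.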


In [20]:
mean_icdf_logn = 0

for icdf in icdf_values_logn:
    mean_icdf_logn += icdf
mean_icdf_logn

34127.867938395

In [21]:
# Vectorised: scale every rank by the constant GDP / mean_icdf_logn
data['awe_logn'] = data['rank'] * GDP / mean_icdf_logn

In [22]:
data

Unnamed: 0,quadkey,latitude,longitude,rwi,error,rank,awe_pareto,awe_logn
0,12023332202311,41.302571,19.852295,1.866,0.503,3502,7333.197211,698.812192
1,12201110021011,40.705627,19.940185,1.788,0.470,3501,7331.103209,698.612645
2,12023323113213,41.812267,19.588623,1.771,0.583,3500,7329.009206,698.413099
3,12023332202113,41.352072,19.852295,1.628,0.540,3499,7326.915203,698.213552
4,12023332202100,41.368564,19.786377,1.589,0.532,3498,7324.821201,698.014006
...,...,...,...,...,...,...,...,...
3497,12023332201113,41.483891,20.028076,-1.004,0.409,5,10.470013,0.997733
3498,12201101130010,40.705627,19.390869,-1.066,0.472,4,8.376011,0.798186
3499,12023332233110,41.104190,20.357666,-1.132,0.433,3,6.282008,0.598640
3500,12023332033320,41.533254,20.313721,-1.142,0.484,2,4.188005,0.399093


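The log-normal estimates are around 10 times smaller than the Pareto ones, because the accumulated log-normal ICDF value is about 10 times larger than the Pareto one (roughly 34,128 versus 3,252). We choose to use the Pareto estimates.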

The same process is illustrated below for Angola.

In [23]:
ago = pd.read_csv("C:\\Users\\Luca\\Downloads\\AGO_relative_wealth_index.csv")
ago.rwi.describe()

count    29611.000000
mean        -0.410375
std          0.313908
min         -1.462000
25%         -0.605000
50%         -0.443000
75%         -0.263000
max          1.905000
Name: rwi, dtype: float64

In [24]:
# Sort from wealthiest to poorest tile and discard the old index
ago = ago.sort_values(by='rwi', ascending=False).reset_index(drop=True)
ago

Unnamed: 0,latitude,longitude,rwi,error
0,-9.069551,13.414307,1.905,0.556
1,-8.939340,13.260498,1.902,0.540
2,-9.546583,16.358643,1.895,0.529
3,-8.961045,13.304443,1.856,0.515
4,-8.787368,13.326416,1.762,0.559
...,...,...,...,...
29606,-16.983248,16.182861,-1.338,0.357
29607,-16.983248,16.226807,-1.338,0.363
29608,-16.119708,12.843018,-1.407,0.356
29609,-15.125159,14.117432,-1.423,0.420


In [25]:
# Integer ranks from len(ago) (wealthiest tile) down to 1 (poorest tile)
rank_ago = np.arange(len(ago), 0, -1)

In [26]:
ago['rank'] = rank_ago

In [27]:
GDP_ago = 3000.4
Gini_ago = 0.513

In [28]:
import numpy as np
from scipy.stats import pareto

# Define the parameters of the Pareto distribution
alpha = (1 + Gini_ago)/(2*Gini_ago)  # shape parameter
threshold = 1 - (1/alpha)  # scale parameter (scipy's default is 1)

print(f'alpha: {alpha}, scale: {threshold}')
print('-'*100)
# Quantile values
quantiles = np.linspace(0,0.99, num = len(ago))

# Compute the ICDF
icdf_values = pareto.ppf(quantiles, alpha, scale=threshold)

print("ICDF values at quantiles:", icdf_values)

alpha: 1.4746588693957114, scale: 0.3218770654329147
----------------------------------------------------------------------------------------------------
ICDF values at quantiles: [0.32187707 0.32188436 0.32189166 ... 7.27734766 7.29378366 7.31031182]


In [29]:
mean_icdf = 0

for icdf in icdf_values:
    mean_icdf += icdf
mean_icdf

23120.104410490305

In [30]:
# Vectorised: scale every rank by the constant GDP_ago / mean_icdf
ago['awe_pareto'] = ago['rank'] * GDP_ago / mean_icdf
ago

Unnamed: 0,latitude,longitude,rwi,error,rank,awe_pareto
0,-9.069551,13.414307,1.905,0.556,29611,3842.752733
1,-8.939340,13.260498,1.902,0.540,29610,3842.622958
2,-9.546583,16.358643,1.895,0.529,29609,3842.493184
3,-8.961045,13.304443,1.856,0.515,29608,3842.363409
4,-8.787368,13.326416,1.762,0.559,29607,3842.233635
...,...,...,...,...,...,...
29606,-16.983248,16.182861,-1.338,0.357,5,0.648873
29607,-16.983248,16.226807,-1.338,0.363,4,0.519098
29608,-16.119708,12.843018,-1.407,0.356,3,0.389324
29609,-15.125159,14.117432,-1.423,0.420,2,0.259549


In [31]:
import numpy as np
from scipy.stats import lognorm
import scipy.stats as stats

# Parameters of the log-normal distribution
sigma = np.sqrt(2) * stats.norm.ppf((Gini_ago +1)/2)  # sigma depends on a probit function
mean = np.log(GDP_ago) - ((sigma**2) / 2)  # mean of the logarithm of the variable (mu)

print(f'sigma: {sigma}, mean: {mean}')
print('-'*100)
# Compute the ICDF
# NOTE: scipy's lognorm defines scale as exp(mu); this call passes mu itself
icdf_values_logn = lognorm.ppf(quantiles, sigma, scale=mean)

print("ICDF values at quantiles:", icdf_values_logn)

sigma: 0.9830032254456962, mean: 7.52335322147716
----------------------------------------------------------------------------------------------------
ICDF values at quantiles: [ 0.          0.14936267  0.17612882 ... 73.87394949 73.96470681
 74.05584045]


In [32]:
mean_icdf_logn = 0

for icdf in icdf_values_logn:
    mean_icdf_logn += icdf
mean_icdf_logn

332149.10661319335

In [33]:
# Vectorised: scale every rank by the constant GDP_ago / mean_icdf_logn
ago['awe_logn'] = ago['rank'] * GDP_ago / mean_icdf_logn
ago

Unnamed: 0,latitude,longitude,rwi,error,rank,awe_pareto,awe_logn
0,-9.069551,13.414307,1.905,0.556,29611,3842.752733,267.484821
1,-8.939340,13.260498,1.902,0.540,29610,3842.622958,267.475788
2,-9.546583,16.358643,1.895,0.529,29609,3842.493184,267.466755
3,-8.961045,13.304443,1.856,0.515,29608,3842.363409,267.457721
4,-8.787368,13.326416,1.762,0.559,29607,3842.233635,267.448688
...,...,...,...,...,...,...,...
29606,-16.983248,16.182861,-1.338,0.357,5,0.648873,0.045166
29607,-16.983248,16.226807,-1.338,0.363,4,0.519098,0.036133
29608,-16.119708,12.843018,-1.407,0.356,3,0.389324,0.027100
29609,-15.125159,14.117432,-1.423,0.420,2,0.259549,0.018067
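
Since the Albania and Angola cells repeat the same steps, they could be wrapped in a single function. The sketch below mirrors the Pareto computation shown above (sort, rank, accumulate the ICDF, scale by rank); the function name `add_awe_pareto` and the `top_quantile` parameter are hypothetical, and the toy frame stands in for the real CSV files. Note that `.sum()` may differ from the explicit loop in the last few digits.

```python
import numpy as np
import pandas as pd
from scipy.stats import pareto

def add_awe_pareto(df, gdp, gini, top_quantile=0.99):
    """Hypothetical helper mirroring the Pareto cells above."""
    out = df.sort_values("rwi", ascending=False).reset_index(drop=True)
    n = len(out)
    out["rank"] = np.arange(n, 0, -1)  # n for the wealthiest tile, 1 for the poorest
    alpha = (1 + gini) / (2 * gini)    # shape parameter
    threshold = 1 - 1 / alpha          # scale parameter
    quantiles = np.linspace(0, top_quantile, num=n)
    # Accumulated ICDF value, as in the cells above
    icdf_sum = pareto.ppf(quantiles, alpha, scale=threshold).sum()
    out["awe_pareto"] = out["rank"] * gdp / icdf_sum
    return out

# Toy frame standing in for a country's RWI file
toy = pd.DataFrame({"rwi": [0.2, -0.4, 1.1]})
result = add_awe_pareto(toy, gdp=6810.1, gini=0.294)
print(result)
```

Calling the helper once per country with that country's GDP and Gini index would keep the two analyses from drifting apart.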


We can see that, on average, an area in Albania is almost twice as rich as an area in Angola. This seems reasonable given that Albania's GDP per capita is more than twice that of Angola, which however has a higher dispersion in its distribution.

In [34]:
data.awe_pareto.mean(), ago.awe_pareto.mean()

(3667.645606956357, 1921.4412535197503)
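
The comparison can be made explicit with a little arithmetic on the figures reported above. The variable names are introduced here for illustration; the numbers are copied from the notebook outputs.

```python
# Values copied from the outputs above
mean_awe_albania = 3667.645606956357
mean_awe_angola = 1921.4412535197503
gdp_albania, gdp_angola = 6810.1, 3000.4

# Ratio of the average Pareto-based AWE and ratio of GDP per capita
awe_ratio = mean_awe_albania / mean_awe_angola
gdp_ratio = gdp_albania / gdp_angola

print(round(awe_ratio, 2), round(gdp_ratio, 2))
```

The AWE ratio comes out at roughly 1.9 and the GDP ratio at roughly 2.3, which is the "almost twice" versus "more than twice" relationship described above.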