# Demystifying ICP Purchasing Power Parity (PPP) Calculations

###  Authors: William Vigil-Oliver and Shriya Chahuan. 


This notebook provides the accompanying code for the World Bank blog "Demystifying ICP Purchasing Power Parity (PPP) Calculations". 

The published blog is available at: http://worldbank.org/xxxxx

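Contents
- [Load required Python libraries](#Load-required-Python-libraries)
- [Load input price data](#Load-input-price-data)
- [Calculate basic heading PPPs](#Calculate-basic-heading-PPPs)
- [Calculate above-basic heading PPPs](#Calculate-above-basic-heading-PPPs)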

## Load required Python libraries
The code requires the following well-known Python libraries.

```python
## Load required libraries
import pandas as pd
import numpy as np
import statsmodels.api as sm
```

## Load input price data

Next, let us load the input dataset containing mock price data.

This example contains four countries and three basic headings (garment, rice and pork), each containing different varieties of garment, rice and pork. The number of items is not the same in every basic heading: it ranges from three (garment) to two (rice and pork). In addition, not all countries report prices for all items, a situation that resembles the one in the actual ICP.
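Both features are easy to inspect in a long-format price table: count the distinct items per basic heading, and pivot to an item-by-country grid in which unreported prices appear as NaN. The sketch below assumes only the column names used in this notebook (`country`, `bh`, `item`, `price`); the small `toy` table and its values are hypothetical and are not the notebook's data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy records in the same long format as price_data.csv
toy = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B", "C"],
    "bh":      ["rice", "rice", "pork", "rice", "pork", "rice"],
    "item":    ["rice1", "rice2", "pork1", "rice1", "pork1", "rice2"],
    "price":   [10.0, 12.0, 30.0, 20.0, np.nan, 25.0],
})

# Number of distinct items defined in each basic heading
items_per_bh = toy.groupby("bh")["item"].nunique()

# Item x country grid of prices; NaN marks a price that was not reported
coverage = toy.pivot(index="item", columns="country", values="price")

print(items_per_bh)
print(coverage)
print("Missing country-item prices:", int(coverage.isna().sum().sum()))
```

Applied to `prices`, the same two calls would give the item count per heading and the coverage grid, provided each country-item pair appears at most once (which `pivot` requires).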

```python
## Load input price data
# country: country identifier (country1 to country4); the same items are priced in each country
# bh:      basic heading (garment, pork or rice)
# item:    a uniquely defined item; similar items belong to the same basic heading
# price:   observed price of the item in the country's local currency
# imp:     importance indicator, used as a regression weight to show how important the item
#          is in that country within its basic heading (3 = important, 1 = unimportant)
data = "price_data.csv"
prices = pd.read_csv(data)
prices  # show the dataset
```

| | country | bh | item | price | imp |
|---|---|---|---|---|---|
| 0 | country1 | garment | garment1 | 4500.38 | 3.0 |
| 1 | country1 | garment | garment2 | 11583.39 | 3.0 |
| 2 | country1 | garment | garment3 | 7000.94 | 1.0 |
| 3 | country1 | pork | pork1 | 2500.71 | 1.0 |
| 4 | country1 | pork | pork2 | 3561.45 | 1.0 |
| 5 | country1 | rice | rice1 | 1020.22 | 1.0 |
| 6 | country1 | rice | rice2 | 1000.0 | 1.0 |
| 7 | country2 | garment | garment1 | 700.566 | 1.0 |
| 8 | country2 | garment | garment2 | 877.95 | 3.0 |
| 9 | country2 | garment | garment3 | 616.87 | 3.0 |


## Calculate basic heading PPPs

In the first stage, PPPs are estimated for more than one hundred groups of similar items, so-called ‘basic headings’. The basic heading is also the level of aggregation for which ICP-participating countries can typically provide national accounts expenditure values (as opposed to, say, the item level, for which expenditure values are rarely available). The result of this stage is a set of PPPs for each country, one per basic heading.

The procedure averages price relatives for individual items across countries to obtain basic heading PPPs, using the weighted country product dummy (CPD-W) method.

The CPD-W is carried out separately within each basic heading by regressing the logarithm of the observed item prices on item and country dummies. Each participating country classifies every item it prices as important or unimportant within the basic heading, and these classifications enter the regression as weights: the ICP Technical Advisory Group (TAG) recommended a weight of 3 for items identified as important and a weight of 1 for items deemed unimportant.
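For one basic heading, the regression can be written compactly as below. The notation is introduced here for illustration: $p_{ij}$ is the price of item $i$ in country $j$, $w_{ij}$ its importance weight, $\alpha_j$ the country effect, $\eta_i$ the item effect, and $r$ the numeraire, whose country dummy is dropped ($\alpha_r = 0$). In the code below, $\alpha_j$ corresponds to the coefficient on the `c_` dummy of country $j$ and $\eta_i$ to the coefficient on the `i_` dummy of item $i$; no intercept is estimated.

$$
\begin{aligned}
\ln p_{ij} &= \alpha_j + \eta_i + \varepsilon_{ij}, \qquad \alpha_r = 0,\\
(\hat{\alpha}, \hat{\eta}) &= \arg\min_{\alpha,\,\eta} \sum_{(i,j)\ \text{observed}} w_{ij}\,\bigl(\ln p_{ij} - \alpha_j - \eta_i\bigr)^2, \qquad w_{ij} \in \{1, 3\},\\
\mathrm{PPP}_j &= \exp(\hat{\alpha}_j), \qquad \mathrm{PPP}_r = 1.
\end{aligned}
$$

Since $\alpha_r = 0$, $\exp(\hat{\eta}_i)$ is the fitted price of item $i$ in the numeraire's currency.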

### Select the reference or numeraire currency
This refers to the country/currency against which all the estimated PPP values will be compared.

```python
## Select the reference or numeraire currency
numeraire = 'country2'  # for global results, the numeraire is the United States/USD
```
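Because every PPP is expressed relative to the numeraire, choosing a different reference country only rescales the results: dividing each PPP by the new reference country's PPP re-expresses it against that country, and the ratio between any two countries is unchanged. For the CPD-W, re-running the regression with another numeraire gives the same result as this rescaling, since only the normalisation changes. A minimal sketch, with a hypothetical `rebase` helper and made-up PPP values:

```python
import pandas as pd


def rebase(ppps: pd.Series, new_numeraire: str) -> pd.Series:
    """Re-express PPPs relative to another country (hypothetical helper)."""
    return ppps / ppps[new_numeraire]


# Hypothetical PPPs for one basic heading, expressed against country2 (= 1)
ppp_vs_country2 = pd.Series(
    {"country1": 10.0, "country2": 1.0, "country3": 20.0, "country4": 0.1}
)

# Same PPPs expressed against country1 (= 1)
ppp_vs_country1 = rebase(ppp_vs_country2, "country1")
print(ppp_vs_country1)
```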

### Prepare the input dataset for the CPD-W

```python
## Prepare the input data
# Drop country-item observations without a price
prices = prices[prices['price'].notnull()]

## Build the design matrix
# Country dummies, dropping the numeraire so that its PPP is normalised to 1
d_country = pd.get_dummies(prices['country'], dtype=float)
d_country = d_country.drop(columns=numeraire).add_prefix('c_')
# Item dummies: keep all of them (no intercept is estimated)
d_item = pd.get_dummies(prices['item'], drop_first=False, dtype=float).add_prefix('i_')
prices = pd.concat([prices, d_country, d_item], axis=1)  # append the dummy columns

## Empty lists to store results
l_coef = []  # exp(beta_hat) for each basic heading
l_bh = []    # basic heading labels
```


###  Run the CPD-W on each basic heading and store results

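```python
for bh in prices.bh.unique():
    tempdf = prices[prices.bh == bh]
    # Regressors: all country (c_) and item (i_) dummy columns
    X = tempdf.loc[:, [x for x in tempdf.columns if x.startswith(('c_', 'i_'))]]
    y = np.log(tempdf['price'])  # log prices
    wts = tempdf['imp']          # importance weights (3 or 1)

    # Weighted least squares of log price on country and item dummies (CPD-W)
    res = sm.WLS(y, X, weights=wts).fit()
    res_eparams = np.exp(res.params)  # exponentiate: country PPPs and item price levels

    print("\n", "Basic Heading:", bh, "\n")
    print('Exponentiated Parameters: ', res_eparams)

    l_coef.append(res_eparams)
    l_bh.append(bh)

coef = np.array(l_coef, dtype=float)
coef = np.round(coef, 4)  # round to 4 decimals
cols = list(X)            # column labels of X (the same for every basic heading)
# Dummies for items (or countries) absent from a basic heading are all-zero columns,
# so their coefficient is 0 and exp(0) = 1; replace these placeholders with NaN.
# Note: a genuine estimate that rounds to exactly 1.0000 would also be replaced.
coef[coef == 1] = np.nan
```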




```
 Basic Heading: garment 

Exponentiated Parameters:  c_country1      9.743516
c_country3     20.360644
c_country4      0.094657
i_garment1    598.805582
i_garment2    985.325043
i_garment3    579.168568
i_pork1         1.000000
i_pork2         1.000000
i_rice1         1.000000
i_rice2         1.000000
dtype: float64

 Basic Heading: pork 

Exponentiated Parameters:  c_country1     13.874905
c_country3     18.985054
c_country4      0.091746
i_garment1      1.000000
i_garment2      1.000000
i_garment3      1.000000
i_pork1       181.782469
i_pork2       254.494367
i_rice1         1.000000
i_rice2         1.000000
dtype: float64

 Basic Heading: rice 

Exponentiated Parameters:  c_country1    14.084683
c_country3    10.511296
c_country4     0.067242
i_garment1     1.000000
i_garment2     1.000000
i_garment3     1.000000
i_pork1        1.000000
i_pork2        1.000000
i_rice1       73.227162
i_rice2       70.230777
dtype: float64
```
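The same CPD-W regression can be reproduced without statsmodels, which makes the weighting explicit: weighted least squares is ordinary least squares after multiplying each row of `y` and `X` by the square root of its weight. The sketch below is an illustrative re-implementation, not the ICP production code; `cpd_w` and the toy data are hypothetical, and it assumes the numeraire prices at least one item in the heading. Because it builds dummies only for the countries and items present in the heading, no all-zero columns arise. For complete data with two countries and equal weights, the CPD PPP reduces to the geometric mean of the item price relatives; in the toy example these are 2, 3 and 2, so the PPP is 12^(1/3) ≈ 2.2894.

```python
import numpy as np
import pandas as pd


def cpd_w(df: pd.DataFrame, numeraire: str) -> pd.Series:
    """Weighted CPD for ONE basic heading (illustrative sketch).

    df: long format with columns country, item, price, imp (one row per observed
    price). Assumes the numeraire prices at least one item in this heading.
    Returns exp(country coefficients), with the numeraire set to 1.
    """
    countries = sorted(df["country"].unique())
    others = [c for c in countries if c != numeraire]
    items = sorted(df["item"].unique())
    # Design matrix: country dummies (numeraire dropped) + all item dummies, no intercept
    X = np.column_stack(
        [(df["country"] == c).to_numpy(dtype=float) for c in others]
        + [(df["item"] == i).to_numpy(dtype=float) for i in items]
    )
    y = np.log(df["price"].to_numpy(dtype=float))
    sw = np.sqrt(df["imp"].to_numpy(dtype=float))
    # WLS = OLS on rows scaled by sqrt(weight)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    ppp = pd.Series(np.exp(beta[: len(others)]), index=others)
    ppp[numeraire] = 1.0
    return ppp.reindex(countries)


# Hypothetical basic heading: two countries, three items, equal weights
toy_bh = pd.DataFrame({
    "country": ["A"] * 3 + ["B"] * 3,
    "item": ["x", "y", "z"] * 2,
    "price": [10.0, 20.0, 40.0, 20.0, 60.0, 80.0],
    "imp": [1.0] * 6,
})
ppp = cpd_w(toy_bh, numeraire="A")
print(ppp)
```

Applied to one heading of the notebook's data, e.g. `cpd_w(prices[prices.bh == 'garment'], 'country2')`, it would be expected to agree with the exponentiated `c_` coefficients printed above for the countries present in that heading.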


###  Display the estimated basic heading PPPs 

```python
# Create a dataframe of PPP results from the NumPy array
# dimension = "# BHs" x "# coefficients"
df_bhppp = pd.DataFrame(data=coef, index=l_bh, columns=cols)
numeraire_col = f"c_{numeraire}"  # separate name, so `numeraire` is not overwritten
df_bhppp.insert(0, numeraire_col, 1.000)  # insert a column of 1s for the numeraire
```

```python
# Keep only the country-level PPPs (c_ columns)
df_bhppp = df_bhppp.loc[:, [x for x in df_bhppp.columns if x.startswith('c_')]]
# Strip the c_ prefix; regex=True is needed because '^c_' is a regular expression
df_bhppp.columns = df_bhppp.columns.str.replace('^c_', '', regex=True)
df_bhppp = df_bhppp.reindex(sorted(df_bhppp.columns), axis=1)  # sort columns alphabetically
df_bhppp
```

| | country1 | country2 | country3 | country4 |
|---|---|---|---|---|
| garment | 9.7435 | 1.0 | 20.3606 | 0.0947 |
| pork | 13.8749 | 1.0 | 18.9851 | 0.0917 |
| rice | 14.0847 | 1.0 | 10.5113 | 0.0672 |




```python
# Shape is (number of basic headings, number of countries)
print("\n", "Matrix of BH PPP Results (headings x countries):\n", df_bhppp.shape)
```

```

 Matrix of BH PPP Results (headings x countries):
 (3, 4)
```


## Calculate above-basic heading PPPs

In the second stage, the basic heading PPPs are aggregated, using national accounts expenditures as weights, into a set containing only one PPP for each country. This PPP can refer to any expenditure level above the basic heading, including major GDP aggregates such as total household consumption.

The ICP method uses the Fisher ideal index to construct bilateral PPPs for each pair of countries, using basic heading expenditure weights from each country in turn. These bilateral PPPs are then averaged using the Gini-Éltető-Köves-Szulc (GEKS) approach to arrive at a final set of above-basic heading PPPs, containing one PPP for each country relative to the numeraire.
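In formulas, and matching the loops below (entry `[k][j]` of each matrix is the PPP of country $k$ relative to base country $j$), the steps are as follows. The notation is introduced here for illustration: $\mathrm{PPP}_{b,k}$ is the basic heading PPP of country $k$ for heading $b$, $E_{b,j}$ the nominal expenditure of country $j$ on heading $b$ in local currency, $s_{b,j}$ the corresponding expenditure share, and $N$ the number of countries.

$$
\begin{aligned}
s_{b,j} &= \frac{E_{b,j}}{\sum_{b'} E_{b',j}}, \qquad
P^{L}_{k/j} = \sum_{b} s_{b,j}\,\frac{\mathrm{PPP}_{b,k}}{\mathrm{PPP}_{b,j}},\\
P^{P}_{k/j} &= \frac{1}{P^{L}_{j/k}} = \Bigl[\sum_{b} s_{b,k}\,\frac{\mathrm{PPP}_{b,j}}{\mathrm{PPP}_{b,k}}\Bigr]^{-1},\\
P^{F}_{k/j} &= \sqrt{P^{L}_{k/j}\,P^{P}_{k/j}},\\
\mathrm{GEKS}_{k/j} &= \prod_{l=1}^{N}\Bigl(\frac{P^{F}_{k/l}}{P^{F}_{j/l}}\Bigr)^{1/N}.
\end{aligned}
$$

Because the Fisher PPPs satisfy $P^{F}_{j/l} = 1/P^{F}_{l/j}$, each GEKS factor equals the indirect comparison $P^{F}_{k/l}\,P^{F}_{l/j}$ through country $l$.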

###  Load the basic heading level expenditure values
As a first step let us load the basic heading level national accounts expenditure values for each country.

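```python
# Load basic heading expenditure data (national accounts, in local currency units)
# Expected layout: one row per basic heading (index column icp_bh)
# and one column per country, prefixed with c_
code = "bhdata_exp.csv"
df_bh = pd.read_csv(code, index_col="icp_bh")
df_bh = df_bh.reindex(sorted(df_bh.columns), axis=1)  # sort columns alphabetically
df_bh
```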

| icp_bh | c_country1 | c_country2 | c_country3 | c_country4 |
|---|---|---|---|---|
| bhppp_rice | 12055360000.0 | 2120000000000.0 | 19456580000.0 | 6900940414 |
| bhppp_beef | 81607990000.0 | 19714540000.0 | 27687870000.0 | 7189988899 |
| bhppp_garment | 710876100.0 | 1270000000000.0 | 100000000000.0 | 1446676800 |


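Check that the basic heading PPP matrix and the basic heading expenditure matrix have the same dimensions. The Laspeyres calculation below pairs the two matrices by position rather than by label, so their rows must also list the basic headings in the same order, and their columns the countries in the same order. In this mock dataset the expenditure rows are labelled `bhppp_rice`, `bhppp_beef` and `bhppp_garment`, whereas the PPP matrix is ordered garment, pork, rice, so the rows should be relabelled and reordered to match before the results are relied on.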

```python
# Keep only the country columns (prefix c_)
df_bhexp = df_bh.loc[:, [x for x in df_bh.columns if x.startswith('c_')]]

print("Dimensions of Matrices (no. of headings x no. of countries):", "\n")
print("BH Purchasing Power Parities (PPPs)  = ", df_bhppp.shape)
print("BH Nominal Expenditures in LCUs      = ", df_bhexp.shape)
```

```
Dimensions of Matrices (no. of headings x no. of countries): 

BH Purchasing Power Parities (PPPs)  =  (3, 4)
BH Nominal Expenditures in LCUs      =  (3, 4)
```


###  Calculate binary PPPs (Laspeyres-, Paasche-, and Fisher-type)


```python
# Laspeyres-type bilateral PPPs
# lp[row][col]: PPP of country `row` relative to base country `col`, averaging the
# basic heading PPP relatives with the base country's expenditures as weights
shape = (len(df_bhexp.columns), len(df_bhexp.columns))
lp = np.zeros(shape)  # square matrix: country x country
nrow = len(lp)        # number of rows
ncol = len(lp[0])     # number of columns

for row in range(nrow):
    for col in range(ncol):
        # Expenditure-weighted mean over basic headings (rows matched by position)
        lp[row][col] = np.average(df_bhppp.iloc[:, row] / df_bhppp.iloc[:, col],
                                  weights=df_bhexp.iloc[:, col])

lp_ppp = lp
```

Square ('country x country') matrix of bilateral (Laspeyres-type) PPPs

```python
print("Laspeyres-type binary PPPs:", "\n", lp_ppp)
```

```
Laspeyres-type binary PPPs: 
 [[1.00000000e+00 1.13843332e+01 1.11143732e+00 1.35229214e+02]
 [7.59682297e-02 1.00000000e+00 8.10602828e-02 1.11218715e+01]
 [1.45576582e+00 1.66841256e+01 1.00000000e+00 2.05860193e+02]
 [6.99251485e-03 8.44398614e-02 5.86867217e-03 1.00000000e+00]]
```


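```python
# Paasche-type bilateral PPPs: the Paasche PPP of k relative to j (weights of k)
# is the reciprocal of the Laspeyres PPP of j relative to k, hence 1/lp transposed
pa_ppp = np.transpose(np.reciprocal(lp_ppp))
```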


Square ('country x country') matrix of bilateral (Paasche-type) PPPs

```python
print("Paasche-type binary PPPs:", "\n", pa_ppp)
```

```
Paasche-type binary PPPs: 
 [[1.00000000e+00 1.31633974e+01 6.86923671e-01 1.43010065e+02]
 [8.78400155e-02 1.00000000e+00 5.99372137e-02 1.18427480e+01]
 [8.99735844e-01 1.23364978e+01 1.00000000e+00 1.70396296e+02]
 [7.39485183e-03 8.99129254e-02 4.85766570e-03 1.00000000e+00]]
```


```python
# Geometric mean that ignores NaNs
def nangmean(arr, axis=None):
    arr = np.asarray(arr)
    # An all-NaN slice triggers divide-by-zero warnings and returns NaN
    inverse_valids = 1. / np.sum(~np.isnan(arr), axis=axis)
    rhs = inverse_valids * np.nansum(np.log(arr), axis=axis)
    return np.exp(rhs)

# Fisher-type bilateral PPPs: geometric mean of the Laspeyres and Paasche PPPs
fi = np.zeros(shape)
nrow = len(fi)
ncol = len(fi[0])

for row in range(nrow):
    for col in range(ncol):
        fi[row][col] = nangmean([lp_ppp[row][col], pa_ppp[row][col]])

fi_ppp = fi
```

Square ('country x country') matrix of bilateral (Fisher-type) PPPs

```python
print("Fisher-type binary PPPs:", "\n", fi_ppp)
```

```
Fisher-type binary PPPs: 
 [[1.00000000e+00 1.22415890e+01 8.73769196e-01 1.39065232e+02]
 [8.16887414e-02 1.00000000e+00 6.97031383e-02 1.14766511e+01]
 [1.14446699e+00 1.43465563e+01 1.00000000e+00 1.87290722e+02]
 [7.19086999e-03 8.71334320e-02 5.33929279e-03 1.00000000e+00]]
```
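The three double loops can also be written as array operations. The sketch below is a vectorised alternative with a hypothetical helper, `bilateral_ppps`, and made-up two-heading, three-country data. Unlike the loops above, which match rows by position, it aligns the expenditure matrix to the PPP matrix by label with `reindex`, so a different row order is handled and unmatched labels produce NaN instead of silently pairing the wrong headings. Applying it to `df_bhppp` and `df_bhexp` would first require giving both frames the same basic-heading labels and the same country labels (the expenditure columns carry a `c_` prefix).

```python
import numpy as np
import pandas as pd


def bilateral_ppps(bh_ppp: pd.DataFrame, bh_exp: pd.DataFrame):
    """Laspeyres, Paasche and Fisher bilateral PPP matrices (sketch).

    bh_ppp, bh_exp: basic headings x countries, with matching labels.
    Entry [k, j] is the PPP of country k relative to base country j.
    """
    # Align on labels rather than position
    bh_exp = bh_exp.reindex(index=bh_ppp.index, columns=bh_ppp.columns)
    p = bh_ppp.to_numpy(dtype=float)
    s = bh_exp.to_numpy(dtype=float)
    s = s / s.sum(axis=0)                       # expenditure shares of each country
    ratio = p[:, :, None] / p[:, None, :]       # ratio[b, k, j] = PPP_bk / PPP_bj
    lasp = np.einsum("bj,bkj->kj", s, ratio)    # weights from base country j
    paas = 1.0 / lasp.T                         # P^P_{k/j} = 1 / P^L_{j/k}
    fish = np.sqrt(lasp * paas)                 # Fisher = geometric mean
    return lasp, paas, fish


# Hypothetical data; expenditure rows deliberately listed in a different order
toy_ppp = pd.DataFrame({"A": [1.0, 1.0], "B": [2.0, 4.0], "C": [0.5, 0.25]},
                       index=["food", "clothing"])
toy_exp = pd.DataFrame({"A": [30.0, 10.0], "B": [100.0, 100.0], "C": [5.0, 15.0]},
                       index=["clothing", "food"])

lasp, paas, fish = bilateral_ppps(toy_ppp, toy_exp)
print(np.round(fish, 4))
```

For instance, country A's expenditure shares are 0.25 (food) and 0.75 (clothing), so B's Laspeyres PPP relative to A is 0.25 × 2 + 0.75 × 4 = 3.5. The Fisher matrix also satisfies country reversal: `fish[k, j] * fish[j, k]` equals 1.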


###  Calculate GEKS PPPs

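Finally, let us calculate the GEKS PPPs.

The GEKS PPP of country k relative to country j is the geometric mean, taken over all countries l, of the Fisher-type PPP of k relative to l divided by the Fisher-type PPP of j relative to l. The resulting PPPs are transitive. Note that the last step of the cell below expresses them relative to the first country in the expenditure matrix (country1), not the numeraire selected earlier (country2); because GEKS PPPs are transitive, dividing the vector by any country's entry rebases it to that country.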

```python
# GEKS multilateral PPPs (uses the nangmean function defined earlier)
geks = np.zeros(shape)  # 'country x country' matrix
nrow = len(geks)        # number of rows
ncol = len(geks[0])     # number of columns

for row in range(nrow):
    for col in range(ncol):
        # Geometric mean over l of F[row, l] / F[col, l]
        geks[row][col] = nangmean(fi_ppp[row] / fi_ppp[col])

# One PPP per country: column 0 of the GEKS matrix, i.e. each country's PPP
# relative to the first country (country1), rebased so that country1 = 1
geks_vec = np.zeros(shape=(1, len(df_bhexp.columns)))
j = len(geks_vec[0])
for col in range(j):
    geks_vec[:, col] = geks[col, 0] / geks[0, 0]

geks_ppp = np.array(geks_vec)
geks_ppp = np.round(geks_ppp, 4)  # round to 4 decimals
```

Vector containing one GEKS PPP per country (country1 = 1)

```python
geks_ppp = pd.DataFrame(geks_ppp)
geks_ppp.columns = df_bhexp.columns
geks_ppp = round(geks_ppp, 4)
geks_ppp
```

| | c_country1 | c_country2 | c_country3 | c_country4 |
|---|---|---|---|---|
| 0 | 1.0 | 0.0814 | 1.1991 | 0.0069 |
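The GEKS step can likewise be vectorised, and the final vector can be read off against any reference country: column j of the GEKS matrix gives every country's PPP relative to country j. The sketch below assumes a Fisher matrix laid out like `fi_ppp` (entry [k, j] is the PPP of k relative to j); `geks_matrix` and `geks_vector` are hypothetical helpers, and the 3 × 3 matrix is made up (it satisfies country reversal but is not transitive). For the numeraire chosen at the start of this notebook, one could use, for example, `geks_vector(fi_ppp, list(df_bhexp.columns).index("c_country2"))`.

```python
import numpy as np


def geks_matrix(fisher: np.ndarray) -> np.ndarray:
    """GEKS matrix from a Fisher matrix F[k, j] = PPP of k relative to j (sketch)."""
    logf = np.log(fisher)
    # log GEKS[k, j] = mean over l of (log F[k, l] - log F[j, l])
    return np.exp((logf[:, None, :] - logf[None, :, :]).mean(axis=2))


def geks_vector(fisher: np.ndarray, numeraire_idx: int) -> np.ndarray:
    """One GEKS PPP per country, expressed so that the numeraire equals 1."""
    return geks_matrix(fisher)[:, numeraire_idx]


# Hypothetical Fisher matrix: F[k, j] * F[j, k] = 1, but 2 * 3 != 5 (not transitive)
F = np.array([[1.0, 2.0, 5.0],
              [0.5, 1.0, 3.0],
              [0.2, 1 / 3, 1.0]])

G = geks_matrix(F)
v = geks_vector(F, numeraire_idx=1)  # country in position 1 as numeraire
print(np.round(v, 4))
```

The GEKS matrix is transitive, so `G[0, 1] * G[1, 2]` equals `G[0, 2]` up to rounding, and when the Fisher matrix is already transitive GEKS leaves it unchanged.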
