# Exploratory analysis on health expenditure

Data source: https://databank.worldbank.org/source/world-development-indicators

Indicators used:

- Domestic general government health expenditure (% of GDP) (SH.XPD.GHED.GD.ZS): public expenditure on health from domestic sources as a share of the economy, as measured by GDP. The most recent data are from 2018.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from io import BytesIO
from zipfile import ZipFile
import requests
import xml.etree.ElementTree as et

In [2]:
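def get_WB_indicator(indicator, year):
    """Download one World Bank indicator and return the rows for a single year."""
    url_base = 'https://api.worldbank.org/v2/en/indicator/'
    response = requests.get(url_base + indicator + '?downloadformat=xml')
    response.raise_for_status()  # fail early on an HTTP error
    # The download is a zip archive; its first member is the XML data file.
    with ZipFile(BytesIO(response.content), 'r') as zf:
        data = zf.read(zf.namelist()[0])
    dataroot = et.fromstring(data)
    columns = ['Country or Area', 'Item', 'Year', 'Value']
    records = []
    for m in dataroot[0]:
        if m.tag == 'record':
            # Each child element carries its column name in the 'name' attribute.
            d = {x.attrib['name']: x.text for x in m}
            if int(d['Year']) == year:
                records.append(d)
    # Build the frame in one step; DataFrame.append is not available in pandas 2.0+.
    df = pd.DataFrame(records, columns=columns)
    df = df.rename(columns={'Value': indicator}).drop(['Item'], axis=1)
    return df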

df=get_WB_indicator('SH.XPD.GHED.GD.ZS', 2018).set_index('Country or Area')
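To make the parsing step in `get_WB_indicator` easier to follow without a network call, here is a minimal sketch that applies the same record-to-dictionary logic to an inline XML string. The sample document, the country names, and the helper name `parse_records` are hypothetical; the layout (a root element whose first child holds `record` elements, each with `field` children named by a `name` attribute) is an assumption inferred from how the function above walks the tree.

```python
import xml.etree.ElementTree as et

# Hypothetical two-record sample; the layout is assumed from the parsing code above.
SAMPLE_XML = """<Root>
  <data>
    <record>
      <field name="Country or Area" key="AAA">Country A</field>
      <field name="Item" key="SH.XPD.GHED.GD.ZS">Health expenditure</field>
      <field name="Year">2018</field>
      <field name="Value">3.5</field>
    </record>
    <record>
      <field name="Country or Area" key="BBB">Country B</field>
      <field name="Item" key="SH.XPD.GHED.GD.ZS">Health expenditure</field>
      <field name="Year">2017</field>
      <field name="Value" />
    </record>
  </data>
</Root>"""


def parse_records(xml_text, year):
    """Return one dict per <record> whose Year field equals `year`."""
    root = et.fromstring(xml_text)
    rows = []
    for record in root[0]:  # root[0] is the <data> element
        if record.tag != 'record':
            continue
        # Map each field's 'name' attribute to its text content.
        row = {field.attrib['name']: field.text for field in record}
        if int(row['Year']) == year:
            rows.append(row)
    return rows


rows_2018 = parse_records(SAMPLE_XML, 2018)
print(rows_2018)
```

An empty `Value` element yields `None` as its text, which is how missing observations can end up as NaN once the column is converted to float.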

In [3]:
df['SH.XPD.GHED.GD.ZS']=df['SH.XPD.GHED.GD.ZS'].astype(float)
df.describe()

| | SH.XPD.GHED.GD.ZS |
|---|---|
| count | 187.0 |
| mean | 3.529218 |
| std | 2.413863 |
| min | 0.210372 |
| 25% | 1.676775 |
| 50% | 3.018888 |
| 75% | 4.636415 |
| max | 15.212634 |
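The mean (3.53) sits above the median (3.02), and the maximum (15.21) is far above the upper quartile, which suggests a right-skewed distribution. One common way to check this, sketched below, is Tukey's 1.5 × IQR rule applied to the quartiles reported above. The choice of this rule is an assumption made for illustration, not part of the original analysis.

```python
# Quartiles and maximum copied from the describe() output above.
q1 = 1.676775
q3 = 4.636415
maximum = 15.212634

# Interquartile range and the upper fence of Tukey's 1.5 * IQR rule.
iqr = q3 - q1
upper_fence = q3 + 1.5 * iqr

# True when the largest observation lies beyond the upper fence.
max_is_outlier = maximum > upper_fence

print(f"IQR: {iqr:.6f}")
print(f"Upper fence: {upper_fence:.6f}")
print(f"Maximum beyond fence: {max_is_outlier}")
```

Under this rule, any value above roughly 9.08% of GDP would be flagged for a closer look, and the maximum is one such value.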


In [4]:
print('Missing values:',df['SH.XPD.GHED.GD.ZS'].isna().sum())

Missing values: 79
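The 187 non-missing values counted by `describe()` and the 79 missing values together account for 266 rows. A minimal sketch of how the missing entries could be listed and set aside follows. It uses a small hypothetical frame (`toy`, with placeholder labels) in place of `df`, so the numbers it prints are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df: five entries, two without a reported value.
toy = pd.DataFrame(
    {'SH.XPD.GHED.GD.ZS': [3.5, np.nan, 1.2, np.nan, 7.8]},
    index=pd.Index(['A', 'B', 'C', 'D', 'E'], name='Country or Area'),
)
col = toy['SH.XPD.GHED.GD.ZS']

n_missing = int(col.isna().sum())  # rows with no value
n_present = int(col.count())       # rows with a value (what describe() counts)

# Index labels of the rows with a missing value.
missing_names = col[col.isna()].index.tolist()

# Keep only the rows that have a value for the indicator.
complete = toy.dropna(subset=['SH.XPD.GHED.GD.ZS'])

print('Missing:', n_missing, 'Present:', n_present)
print('Entries without data:', missing_names)
```

Applying the same calls to `df` would show which entries lack a 2018 value before deciding whether to drop them or fall back to an earlier year.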
