<a href="https://colab.research.google.com/github/tonyhollaar/projects/blob/main/Visualizations_Bar_Chart_Race.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Bar Chart Race Visualization with Python
<br> API source package: https://pypi.org/project/wbgapi/
<br> Description: wbgapi provides a comprehensive interface to the World Bank's data and metadata APIs.

# 1. Import Libraries

In [2]:
# source: https://pypi.org/project/wbgapi/
!pip install wbgapi

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting wbgapi
  Downloading wbgapi-1.0.12-py3-none-any.whl (36 kB)
Installing collected packages: wbgapi
Successfully installed wbgapi-1.0.12


In [3]:
import pandas
import wbgapi as wb

##################
# optional help
##################
# help(wb)  
# help(wb.series)  
#wb.source.info()

# search on indicator e.g. GDP
##################
#wb.search('GDP')
# SL.GDP.PCAP.EM.KD

# for row in wb.data.fetch('SL.GDP.PCAP.EM.KD', 'USA'): # all years
#     print(row)

# 2. Look up an indicator in the World Bank database, e.g. 'total population'

In [5]:
wb.search('total population')
# indicator: SP.POP.TOTL 
# description: Total population is based on the de facto definition of population, which counts all residents 

ID,Name,Field,Value
AG.LND.TOTL.RU.K2,Rural land area (sq. km),Longdefinition,"...or approximated urban extents based on buffered settlement points for which the total population is greater than 5,000 persons...."
AG.LND.TOTL.RU.K2,Rural land area (sq. km),Statisticalconceptandmethodology,"...or approximated urban extents based on buffered settlement points for which the total population is greater than 5,000 persons. This dataset is produced by the Columbia..."
AG.LND.TOTL.UR.K2,Urban land area (sq. km),Longdefinition,"...or approximated urban extents based on buffered settlement points for which the total population is greater than 5,000 persons...."
AG.LND.TOTL.UR.K2,Urban land area (sq. km),Statisticalconceptandmethodology,"...or approximated urban extents based on buffered settlement points for which the total population is greater than 5,000 persons. This dataset is produced by the Columbia..."
DT.ODA.ODAT.PC.ZS,Net ODA received per capita (current US$),Statisticalconceptandmethodology,"...Total population is based on the de facto definition of population, which counts all residents..."
EG.CFT.ACCS.ZS,Access to clean fuels and technologies for cooking (% of population),Longdefinition,...Access to clean fuels and technologies for cooking is the proportion of total population primarily using clean cooking fuels and technologies for cooking. Under WHO...
EN.ATM.PM25.MC.T1.ZS,"PM2.5 pollution, population exposed to levels exceeding WHO Interim Target-1 value (% of total)",Statisticalconceptandmethodology,"...value, in this case 10 micrograms per cubic meter, and then dividing by total population...."
EN.ATM.PM25.MC.T2.ZS,"PM2.5 pollution, population exposed to levels exceeding WHO Interim Target-2 value (% of total)",Statisticalconceptandmethodology,"...value, in this case 10 micrograms per cubic meter, and then dividing by total population...."
EN.ATM.PM25.MC.T3.ZS,"PM2.5 pollution, population exposed to levels exceeding WHO Interim Target-3 value (% of total)",Statisticalconceptandmethodology,"...value, in this case 10 micrograms per cubic meter, and then dividing by total population...."
EN.ATM.PM25.MC.ZS,"PM2.5 air pollution, population exposed to levels exceeding WHO guideline value (% of total)",Statisticalconceptandmethodology,"...value, in this case 10 micrograms per cubic meter, and then dividing by total population...."


# 3. Create a DataFrame of the indicator, e.g. total population

In [8]:
# create dataframe of indicator
df = wb.data.DataFrame('SP.POP.TOTL', time='all', labels=True).reset_index()

In [9]:
# transpose dataset e.g. with years as index
df_transposed = df.T

In [10]:
# preview the first rows of the transposed dataset
df_transposed.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,256,257,258,259,260,261,262,263,264,265
economy,ZWE,ZMB,YEM,PSE,VIR,VNM,VEN,VUT,UZB,URY,...,EMU,TEA,EAP,EAS,EAR,CEB,CSS,ARB,AFW,AFE
Country,Zimbabwe,Zambia,"Yemen, Rep.",West Bank and Gaza,Virgin Islands (U.S.),Vietnam,"Venezuela, RB",Vanuatu,Uzbekistan,Uruguay,...,Euro area,East Asia & Pacific (IDA & IBRD countries),East Asia & Pacific (excluding high income),East Asia & Pacific,Early-demographic dividend,Central Europe and the Baltics,Caribbean small states,Arab World,Africa Western and Central,Africa Eastern and Southern
YR1960,3806310.0,3119430.0,5542459.0,,32500.0,32718461.0,8156937.0,64608.0,8372311.0,2529021.0,...,265244987.0,884811163.0,896482332.0,1043333636.0,979461502.0,91401764.0,4209141.0,93359407.0,97256290.0,130692579.0
YR1961,3925952.0,3219451.0,5646668.0,,34300.0,33621982.0,8453106.0,66462.0,8692048.0,2561153.0,...,267560575.0,884080493.0,896012881.0,1045203037.0,1004319366.0,92232738.0,4289429.0,95760348.0,99314028.0,134169237.0
YR1962,4049778.0,3323427.0,5753386.0,,35000.0,34533889.0,8754082.0,68391.0,9038222.0,2592441.0,...,269908375.0,895683380.0,907880207.0,1059600211.0,1029962253.0,93009498.0,4366420.0,98268683.0,101445032.0,137835590.0


In [11]:
# use the country names (row 1) as column headers, then drop the 'economy' and 'Country' rows
df_transposed.columns = df_transposed.iloc[1]
df_transposed = df_transposed[2:]
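
The transpose-then-relabel steps can also be done in one pass by setting the country names as the index before transposing. The following is a minimal sketch on a synthetic stand-in for the wbgapi result (the name `df_demo` is an assumption for illustration; the values are copied from the preview above), not the notebook's own method.

```python
import pandas as pd

# Synthetic stand-in for the wbgapi result (two economies, two years).
df_demo = pd.DataFrame({
    "economy": ["ZWE", "ZMB"],
    "Country": ["Zimbabwe", "Zambia"],
    "YR1960": [3806310.0, 3119430.0],
    "YR1961": [3925952.0, 3219451.0],
})

# Drop the code column, use country names as the row index, then transpose
# so that years become rows and countries become columns.
wide = df_demo.drop(columns="economy").set_index("Country").T

print(wide)
print(wide.dtypes)
```

Because only numeric columns remain at the time of the transpose, the columns keep a numeric dtype instead of falling back to `object`.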

In [44]:
# remove the 'YR' prefix from the index and keep the year only (format yyyy)
# a regex anchored at the start is used because str.lstrip('YR') strips characters, not a prefix
df_transposed.index = df_transposed.index.str.replace('^YR', '', regex=True)
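
Stripping a label prefix is easy to get subtly wrong, because `str.lstrip` takes a set of characters rather than a prefix. A minimal sketch of the difference, using a hypothetical helper `strip_prefix` that is not part of the notebook:

```python
def strip_prefix(label: str, prefix: str = "YR") -> str:
    """Remove `prefix` from the start of `label` only if it is present."""
    return label[len(prefix):] if label.startswith(prefix) else label

labels = ["YR1960", "YR1961", "YR2020"]
years = [int(strip_prefix(label)) for label in labels]
print(years)

# lstrip removes any leading 'Y' or 'R' characters, not the literal prefix 'YR'.
print("YRYEAR".lstrip("YR"))   # EAR
print(strip_prefix("YRYEAR"))  # YEAR
```

For labels such as `YR1960` both approaches give the same result, since the year digits are not in the stripped character set.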

In [45]:
df_transposed

Country,Zimbabwe,Zambia,"Yemen, Rep.",West Bank and Gaza,Virgin Islands (U.S.),Vietnam,"Venezuela, RB",Vanuatu,Uzbekistan,Uruguay,...,Euro area,East Asia & Pacific (IDA & IBRD countries),East Asia & Pacific (excluding high income),East Asia & Pacific,Early-demographic dividend,Central Europe and the Baltics,Caribbean small states,Arab World,Africa Western and Central,Africa Eastern and Southern
1960,3806310,3119430,5542459,0,32500,32718461,8156937,64608,8372311,2529021,...,265244987,884811163,896482332,1043333636,979461502,91401764,4209141,93359407,97256290,130692579
1961,3925952,3219451,5646668,0,34300,33621982,8453106,66462,8692048,2561153,...,267560575,884080493,896012881,1045203037,1004319366,92232738,4289429,95760348,99314028,134169237
1962,4049778,3323427,5753386,0,35000,34533889,8754082,68391,9038222,2592441,...,269908375,895683380,907880207,1059600211,1029962253,93009498,4366420,98268683,101445032,137835590
1963,4177931,3431381,5860197,0,39800,35526727,9059953,70400,9394588,2622936,...,272291865,918644019,931136006,1085398906,1056327420,93840016,4443544,100892507,103667517,141630546
1964,4310332,3542764,5973803,0,40800,36509166,9371333,72493,9758147,2652376,...,274700502,941205716,954010411,1110819272,1083430197,94715795,4520592,103618568,105959979,145605995
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2017,14751101,17298054,30034389,4454805,107281,94033048,30563433,290239,32388600,3422200,...,341246081,2055414680,2080968782,2327134580,3252529883,102740078,7303634,423664839,431138704,632746296
2018,15052184,17835893,30790513,4569087,107001,94914330,29825653,297298,32956100,3427042,...,342065158,2068898629,2094573278,2341387076,3294298709,102538451,7374650,432545676,442646825,649756874
2019,15354608,18380477,31546691,4685306,106669,95776716,28971683,304404,33580350,3428409,...,342452734,2080648616,2106439246,2353862247,3335463995,102398537,7424102,441467739,454306063,667242712
2020,15669666,18927715,32284046,4803269,106290,96648685,28490453,311685,34232050,3429086,...,342913447,2090523535,2116424876,2363940425,3375134276,102180124,7444768,449228296,466189102,685112705


In [46]:
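df_transposed.head()
# note: range(1960,20) is empty, so this reindex returns a frame with no rows (see output);
# the result is not assigned, so df_transposed itself is unchanged
df_transposed.reindex(index=range(1960,20))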

Country,Zimbabwe,Zambia,"Yemen, Rep.",West Bank and Gaza,Virgin Islands (U.S.),Vietnam,"Venezuela, RB",Vanuatu,Uzbekistan,Uruguay,...,Euro area,East Asia & Pacific (IDA & IBRD countries),East Asia & Pacific (excluding high income),East Asia & Pacific,Early-demographic dividend,Central Europe and the Baltics,Caribbean small states,Arab World,Africa Western and Central,Africa Eastern and Southern
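
As a sketch of how `reindex` behaves over a range of years, assuming the index has been converted to integers (the frame `pop` below is synthetic, with one year deliberately left out):

```python
import pandas as pd

# Tiny frame with an integer year index and one missing year (1962).
pop = pd.DataFrame(
    {"Zimbabwe": [3806310, 3925952, 4177931]},
    index=[1960, 1961, 1963],
)

# range(start, stop) excludes stop, so stop must be the last year + 1.
full = pop.reindex(range(1960, 1964))

print(len(range(1960, 20)))  # 0: a stop below the start yields no labels
print(full)
```

Years absent from the original index appear as rows of NaN, and labels must match the index type: integer labels will not match string labels such as `'1960'`.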


In [47]:
# collect all column names (countries and regions) in a sorted list
country_list = sorted(df_transposed.columns)

In [48]:
# list of countries or regions
country_list

['Afghanistan',
 'Africa Eastern and Southern',
 'Africa Western and Central',
 'Albania',
 'Algeria',
 'American Samoa',
 'Andorra',
 'Angola',
 'Antigua and Barbuda',
 'Arab World',
 'Argentina',
 'Armenia',
 'Aruba',
 'Australia',
 'Austria',
 'Azerbaijan',
 'Bahamas, The',
 'Bahrain',
 'Bangladesh',
 'Barbados',
 'Belarus',
 'Belgium',
 'Belize',
 'Benin',
 'Bermuda',
 'Bhutan',
 'Bolivia',
 'Bosnia and Herzegovina',
 'Botswana',
 'Brazil',
 'British Virgin Islands',
 'Brunei Darussalam',
 'Bulgaria',
 'Burkina Faso',
 'Burundi',
 'Cabo Verde',
 'Cambodia',
 'Cameroon',
 'Canada',
 'Caribbean small states',
 'Cayman Islands',
 'Central African Republic',
 'Central Europe and the Baltics',
 'Chad',
 'Channel Islands',
 'Chile',
 'China',
 'Colombia',
 'Comoros',
 'Congo, Dem. Rep.',
 'Congo, Rep.',
 'Costa Rica',
 "Cote d'Ivoire",
 'Croatia',
 'Cuba',
 'Curacao',
 'Cyprus',
 'Czechia',
 'Denmark',
 'Djibouti',
 'Dominica',
 'Dominican Republic',
 'Early-demographic dividend',
 'East

In [49]:
# original datatypes are object (columns show int64 once the conversion in the next cell has been run)
df_transposed.dtypes

Country
Zimbabwe                          int64
Zambia                            int64
Yemen, Rep.                       int64
West Bank and Gaza                int64
Virgin Islands (U.S.)             int64
                                  ...  
Central Europe and the Baltics    int64
Caribbean small states            int64
Arab World                        int64
Africa Western and Central        int64
Africa Eastern and Southern       int64
Length: 266, dtype: object

In [50]:
# convert all NaN values to 0 (years without data will then show as a population of 0)
df_transposed = df_transposed.fillna(0)

# convert all columns datatypes to integer
df_transposed = df_transposed.astype(int)
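
Filling with 0 makes missing years indistinguishable from a true zero. A possible alternative, sketched here on synthetic data, is pandas' nullable `Int64` dtype, which keeps missing values as `<NA>`. Whether downstream plotting accepts that dtype is not verified here, so treat this as an assumption to test.

```python
import pandas as pd

# Synthetic frame with one missing value, mimicking West Bank and Gaza in 1960.
raw = pd.DataFrame(
    {
        "West Bank and Gaza": [float("nan"), 4454805.0],
        "Vanuatu": [64608.0, 290239.0],
    },
    index=["1960", "2017"],
)

# Approach used in the notebook: missing becomes 0, then cast to plain int.
filled = raw.fillna(0).astype(int)

# Alternative: the nullable integer dtype keeps missing values as <NA>.
nullable = raw.astype("Int64")

print(filled)
print(nullable)
```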

In [51]:
# source: https://www.dexplo.org/bar_chart_race/installation/
!pip install bar_chart_race

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [52]:
# select a subset of countries to compare in the bar chart race
my_df = df_transposed[['United States', 
                       'Russian Federation', 
                       'Germany', 
                       'France', 
                       'Belgium', 
                       'Netherlands', 
                       'Spain',
                       'Italy',
                       'Greece']]

In [53]:
# source: https://github.com/marcellusruben/medium-resources/blob/main/Bar%20Chart%20Race/bar%20chart%20race.ipynb
# import package for visualization of the bar chart race
import bar_chart_race as bcr
bcr.bar_chart_race(df = my_df, 
                   title='Yearly Population by Country', 
                   orientation='h', # horizontal barchart
                   sort='desc', # descending
                   n_bars=10, # maximum number of bars shown (my_df has 9 countries)
                   steps_per_period=10, 
                   period_length=500)
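
A bar chart race orders the bars by value at each period. The ordering can be previewed with plain pandas before rendering. This is a rough sketch on synthetic data (values copied from the tables above), not a description of how `bar_chart_race` computes ranks internally.

```python
import pandas as pd

demo = pd.DataFrame(
    {
        "Zimbabwe": [3806310, 15669666],
        "Zambia": [3119430, 18927715],
        "Vanuatu": [64608, 311685],
    },
    index=["1960", "2020"],
)

# Rank within each year (row); rank 1 is the largest population.
ranks = demo.rank(axis=1, ascending=False, method="first").astype(int)

# Names of the two largest columns per year, largest first.
top2 = {year: row.nlargest(2).index.tolist() for year, row in demo.iterrows()}

print(ranks)
print(top2)
```

In this sample Zambia overtakes Zimbabwe between 1960 and 2020, which is the kind of crossing the animation makes visible.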