<a href="https://colab.research.google.com/github/tonyhollaar/projects/blob/main/Visualizations_Bar_Chart_Race.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Bar Chart Race Visualization with Python
<b> Goal: </b> Example of a Bar Chart Race built from World Bank data retrieved through an API
<br> <b> Author: </b> Tony Hollaar

<br><b> Special Thanks </b> 
- To Marcellus Ruben for notes on wbgapi, see article: https://github.com/marcellusruben/medium-resources/blob/main/Bar%20Chart%20Race/bar%20chart%20race.ipynb

<br> <b> Notes: </b>
- API source package: https://pypi.org/project/wbgapi/
- <b> wbgapi </b> provides a comprehensive interface to the World Bank's data and metadata APIs


# 0) Install Packages

In [1]:
# API to retrieve World Bank's Data and Metadata
# source: https://pypi.org/project/wbgapi/
!pip install wbgapi

# Visualization package for bar chart race i.e. animated graph
# source: https://www.dexplo.org/bar_chart_race/installation/
!pip install bar_chart_race

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


# 1) Import Libraries

In [2]:
import pandas
# API to retrieve World Bank's Data and Metadata
import wbgapi as wb
# package for visualizing the bar chart race
import bar_chart_race as bcr
# regular expressions (standard library; not used further in this notebook)
import re

##################
# optional help
##################
# help(wb)  
# help(wb.series)  
#wb.source.info()

# 2) Lookup indicator in World Bank Database e.g. 'total population'

In [3]:
# search on indicator e.g. total population
wb.search('total population')
# indicator: SP.POP.TOTL 
# description: Total population is based on the de facto definition of population, which counts all residents 

ID,Name,Field,Value
AG.LND.TOTL.RU.K2,Rural land area (sq. km),Longdefinition,"...or approximated urban extents based on buffered settlement points for which the total population is greater than 5,000 persons...."
AG.LND.TOTL.RU.K2,Rural land area (sq. km),Statisticalconceptandmethodology,"...or approximated urban extents based on buffered settlement points for which the total population is greater than 5,000 persons. This dataset is produced by the Columbia..."
AG.LND.TOTL.UR.K2,Urban land area (sq. km),Longdefinition,"...or approximated urban extents based on buffered settlement points for which the total population is greater than 5,000 persons...."
AG.LND.TOTL.UR.K2,Urban land area (sq. km),Statisticalconceptandmethodology,"...or approximated urban extents based on buffered settlement points for which the total population is greater than 5,000 persons. This dataset is produced by the Columbia..."
DT.ODA.ODAT.PC.ZS,Net ODA received per capita (current US$),Statisticalconceptandmethodology,"...Total population is based on the de facto definition of population, which counts all residents..."
EG.CFT.ACCS.ZS,Access to clean fuels and technologies for cooking (% of population),Longdefinition,...Access to clean fuels and technologies for cooking is the proportion of total population primarily using clean cooking fuels and technologies for cooking. Under WHO...
EN.ATM.PM25.MC.T1.ZS,"PM2.5 pollution, population exposed to levels exceeding WHO Interim Target-1 value (% of total)",Statisticalconceptandmethodology,"...value, in this case 10 micrograms per cubic meter, and then dividing by total population...."
EN.ATM.PM25.MC.T2.ZS,"PM2.5 pollution, population exposed to levels exceeding WHO Interim Target-2 value (% of total)",Statisticalconceptandmethodology,"...value, in this case 10 micrograms per cubic meter, and then dividing by total population...."
EN.ATM.PM25.MC.T3.ZS,"PM2.5 pollution, population exposed to levels exceeding WHO Interim Target-3 value (% of total)",Statisticalconceptandmethodology,"...value, in this case 10 micrograms per cubic meter, and then dividing by total population...."
EN.ATM.PM25.MC.ZS,"PM2.5 air pollution, population exposed to levels exceeding WHO guideline value (% of total)",Statisticalconceptandmethodology,"...value, in this case 10 micrograms per cubic meter, and then dividing by total population...."


# 3) Create a dataframe of indicator e.g. total population

In [4]:
def create_df(id, time='all'):
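  """
  return a dataframe with years as index and countries/regions as columns
  for the given indicator (id), e.g. id='SP.POP.TOTL',
  and timeframe, e.g. time='all'
  """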
  # create dataframe of indicator
  # pass time by keyword: the second positional argument of wb.data.DataFrame is the economy
  df = wb.data.DataFrame(id, time=time, labels=True).reset_index()
  
  ########################################################################
  # Data Transformation
  ######################################################################## 
  # transpose dataset e.g. with years as index
  df_transposed = df.T

  ########################################################################
  # Data Cleaning
  ########################################################################
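  # use the row of country names as the column header
  df_transposed.columns = df_transposed.iloc[1]
  # drop the first two rows (economy code and country name) so only the year rows remain
  df_transposed = df_transposed[2:]
  # remove the 'YR' prefix from the index and keep the year only, format yyyy
  # (a prefix check is used because str.lstrip('YR') strips a set of characters, not a prefix)
  df_transposed.index = df_transposed.index.map(lambda x: x[2:] if x.startswith('YR') else x)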
  # convert all NaN's to 0
  df_transposed = df_transposed.fillna(0)
  # convert all columns datatypes to integer
  df_transposed = df_transposed.astype(int)
  ######################################################################## 
  return df_transposed

In [5]:
df_transposed = create_df(id='SP.POP.TOTL', time='all')
df_transposed

Country,Zimbabwe,Zambia,"Yemen, Rep.",West Bank and Gaza,Virgin Islands (U.S.),Vietnam,"Venezuela, RB",Vanuatu,Uzbekistan,Uruguay,...,Euro area,East Asia & Pacific (IDA & IBRD countries),East Asia & Pacific (excluding high income),East Asia & Pacific,Early-demographic dividend,Central Europe and the Baltics,Caribbean small states,Arab World,Africa Western and Central,Africa Eastern and Southern
1960,3806310,3119430,5542459,0,32500,32718461,8156937,64608,8372311,2529021,...,265244987,884811163,896482332,1043333636,979461502,91401764,4209141,93359407,97256290,130692579
1961,3925952,3219451,5646668,0,34300,33621982,8453106,66462,8692048,2561153,...,267560575,884080493,896012881,1045203037,1004319366,92232738,4289429,95760348,99314028,134169237
1962,4049778,3323427,5753386,0,35000,34533889,8754082,68391,9038222,2592441,...,269908375,895683380,907880207,1059600211,1029962253,93009498,4366420,98268683,101445032,137835590
1963,4177931,3431381,5860197,0,39800,35526727,9059953,70400,9394588,2622936,...,272291865,918644019,931136006,1085398906,1056327420,93840016,4443544,100892507,103667517,141630546
1964,4310332,3542764,5973803,0,40800,36509166,9371333,72493,9758147,2652376,...,274700502,941205716,954010411,1110819272,1083430197,94715795,4520592,103618568,105959979,145605995
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2017,14751101,17298054,30034389,4454805,107281,94033048,30563433,290239,32388600,3422200,...,341246081,2055414680,2080968782,2327134580,3252529883,102740078,7303634,423664839,431138704,632746296
2018,15052184,17835893,30790513,4569087,107001,94914330,29825653,297298,32956100,3427042,...,342065158,2068898629,2094573278,2341387076,3294298709,102538451,7374650,432545676,442646825,649756874
2019,15354608,18380477,31546691,4685306,106669,95776716,28971683,304404,33580350,3428409,...,342452734,2080648616,2106439246,2353862247,3335463995,102398537,7424102,441467739,454306063,667242712
2020,15669666,18927715,32284046,4803269,106290,96648685,28490453,311685,34232050,3429086,...,342913447,2090523535,2116424876,2363940425,3375134276,102180124,7444768,449228296,466189102,685112705
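
The year index shown above starts out as labels such as `YR1960`. One detail worth knowing when removing the prefix: `str.lstrip('YR')` treats its argument as a set of characters to strip, not as a literal prefix. It happens to give the right result for labels like `YR1960`, but a minimal sketch of an explicit prefix check follows. The helper name `strip_year_prefix` and the label `YRY2K` are hypothetical and only serve as illustration.

```python
def strip_year_prefix(label: str, prefix: str = "YR") -> str:
    """Remove the prefix once if present; otherwise return the label unchanged."""
    return label[len(prefix):] if label.startswith(prefix) else label


# Both approaches agree on the labels used in this notebook
print(strip_year_prefix("YR1960"))   # 1960
print("YR1960".lstrip("YR"))         # 1960

# They differ once the remainder starts with 'Y' or 'R' (hypothetical label)
print("YRY2K".lstrip("YR"))          # 2K
print(strip_year_prefix("YRY2K"))    # Y2K
```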


# 4) List all countries and regions to choose a subset for the bar chart race

In [6]:
def country_and_region_list(df):
  """
  return an alphabetically sorted list of all regions/countries, i.e.
  all columns/headers within the dataframe specified
  """
  # use the dataframe passed in (df) rather than the global df_transposed
  # and sort the countries/regions A-Z
  return sorted(df.columns)

In [7]:
# apply custom function to return all available countries
country_and_region_list(df_transposed)

['Afghanistan',
 'Africa Eastern and Southern',
 'Africa Western and Central',
 'Albania',
 'Algeria',
 'American Samoa',
 'Andorra',
 'Angola',
 'Antigua and Barbuda',
 'Arab World',
 'Argentina',
 'Armenia',
 'Aruba',
 'Australia',
 'Austria',
 'Azerbaijan',
 'Bahamas, The',
 'Bahrain',
 'Bangladesh',
 'Barbados',
 'Belarus',
 'Belgium',
 'Belize',
 'Benin',
 'Bermuda',
 'Bhutan',
 'Bolivia',
 'Bosnia and Herzegovina',
 'Botswana',
 'Brazil',
 'British Virgin Islands',
 'Brunei Darussalam',
 'Bulgaria',
 'Burkina Faso',
 'Burundi',
 'Cabo Verde',
 'Cambodia',
 'Cameroon',
 'Canada',
 'Caribbean small states',
 'Cayman Islands',
 'Central African Republic',
 'Central Europe and the Baltics',
 'Chad',
 'Channel Islands',
 'Chile',
 'China',
 'Colombia',
 'Comoros',
 'Congo, Dem. Rep.',
 'Congo, Rep.',
 'Costa Rica',
 "Cote d'Ivoire",
 'Croatia',
 'Cuba',
 'Curacao',
 'Cyprus',
 'Czechia',
 'Denmark',
 'Djibouti',
 'Dominica',
 'Dominican Republic',
 'Early-demographic dividend',
 'East

In [8]:
# create subset of countries as pandas dataframe
my_df = df_transposed[['United States', 
                       'Russian Federation', 
                       'Germany', 
                       'France', 
                       'Belgium', 
                       'Netherlands', 
                       'Spain',
                       'Italy',
                       'Greece']]
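
Selecting columns by name raises a `KeyError` if any name does not match the World Bank spelling exactly (for example `'Russian Federation'` rather than `'Russia'`). A minimal sketch of checking the requested names first is shown here; the helper `split_known_unknown` is hypothetical, and the short `available` list stands in for the full output of `country_and_region_list(df_transposed)`.

```python
def split_known_unknown(requested, available):
    """Split requested names into those present in `available` and those missing."""
    available_set = set(available)
    known = [name for name in requested if name in available_set]
    unknown = [name for name in requested if name not in available_set]
    return known, unknown


# Stand-in for the full list of column names in the dataframe
available = ['United States', 'Russian Federation', 'Germany', 'Netherlands']
requested = ['United States', 'Russia', 'Germany']

known, unknown = split_known_unknown(requested, available)
print(known)    # ['United States', 'Germany']
print(unknown)  # ['Russia']
```

In the notebook, `known` could then be used as `df_transposed[known]`, and `unknown` reviewed against the list from step 4.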

# 5) Create the Bar Chart Race and specify parameters

- <b> n_bars: </b> int, default None
    `Choose the maximum number of bars to display on the graph. By default, use all bars. New bars entering the race will appear from the edge of the axes.`

- <b> steps_per_period: </b> int, default 10
    `The number of steps to go from one time period to the next. The bars will grow linearly between each period.`

- <b> period_length: </b> int, default 500
    `Number of milliseconds to animate each period (row). Default is 500ms (half of a second)`

- <b> figsize: </b> two-item tuple of numbers, default (6, 3.5)
    `matplotlib figure size in inches. Will be overridden if figure supplied to fig.`

- <b> filename: </b> None or str, default None
    `If None return animation as an HTML5 string. If a string, save animation to that filename location. Use .mp4, .gif, .html, .mpeg, .mov and any other extensions supported by ffmpeg or ImageMagick.`

-  For more parameters, see source: https://www.dexplo.org/bar_chart_race/api/
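
To get a feel for how `steps_per_period` and `period_length` interact, the following rough estimate may help. It is a sketch under the assumption that the animation draws `steps_per_period` frames for each transition between two consecutive rows and spends `period_length` milliseconds on each transition; the exact frame count produced by the library may differ slightly. The function name `estimate_animation` is hypothetical.

```python
def estimate_animation(n_periods, steps_per_period=10, period_length=500):
    """Rough estimate of frame count, frames per second and duration in seconds.

    Assumes steps_per_period frames per transition between consecutive rows
    and period_length milliseconds per transition.
    """
    transitions = max(n_periods - 1, 0)
    frames = transitions * steps_per_period + 1 if n_periods > 0 else 0
    fps = steps_per_period * 1000 / period_length
    seconds = transitions * period_length / 1000
    return frames, fps, seconds


# 61 yearly rows (if 1960 through 2020 are all present), with the values used below
frames, fps, seconds = estimate_animation(61, steps_per_period=10, period_length=300)
print(frames)          # 601
print(round(fps, 1))   # 33.3
print(seconds)         # 18.0
```

Under these assumptions, lowering `period_length` shortens the video, while raising `steps_per_period` makes the motion smoother without changing its length.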


In [9]:
# inspect the function signature to see its required and optional parameters
bcr.bar_chart_race

<function bar_chart_race._make_chart.bar_chart_race(df, filename=None, orientation='h', sort='desc', n_bars=None, fixed_order=False, fixed_max=False, steps_per_period=10, period_length=500, interpolate_period=False, label_bars=True, bar_size=0.95, period_label=True, period_fmt=None, period_summary_func=None, perpendicular_bar_func=None, figsize=(6, 3.5), cmap=None, title=None, title_size=None, bar_label_size=7, tick_label_size=7, shared_fontdict=None, scale='linear', writer=None, fig=None, dpi=144, bar_kwargs=None, filter_column_colors=False)>

In [15]:
# render the animation inline, e.g. in a Google Colab notebook
bcr.bar_chart_race(df = my_df, 
                   title='Yearly Population by Country', 
                   orientation='h', # horizontal barchart
                   sort='desc', # descending
                   n_bars=len(my_df.columns), # number of bars/countries in list 
                   steps_per_period=10, 
                   period_length=300,
                   dpi=150,
                   #figsize=(6,3.5),
                   fixed_max=True)

In [14]:
# save to file in Google Colab
bcr.bar_chart_race(df = my_df, 
                   title='Yearly Population by Country', 
                   orientation='h', # horizontal barchart
                   sort='desc', # descending
                   n_bars=len(my_df.columns), # number of bars/countries in list 
                   steps_per_period=10, 
                   period_length=300,
                   dpi=150,
                   filename='population.mp4',
                   #figsize=(6,3.5),
                   fixed_max=True)
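
Saving to `.mp4` relies on an external writer such as ffmpeg, as noted in the `filename` parameter description. A minimal sketch for checking that the executable is on the `PATH` before starting a long render is given below; the helper name `writer_available` is hypothetical, and whether ffmpeg is preinstalled depends on the environment.

```python
import shutil


def writer_available(executable: str = "ffmpeg") -> bool:
    """Return True if the given executable can be found on the PATH."""
    return shutil.which(executable) is not None


if writer_available():
    print("ffmpeg found; saving to population.mp4 should be possible")
else:
    print("ffmpeg not found on PATH; install it before saving to .mp4")
```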