# Wikipedia Notable Life Expectancies

# [Notebook 4 of 4: Data Cleaning](https://github.com/teresahanak/wikipedia-life-expectancy/blob/main/wp_life_expect_data_clean3_thanak_2022_06_23.ipynb)

## Context

The


## Objective

The

### Data Dictionary

- Feature: Description

## Importing Necessary Libraries

In [1]:
# To structure code automatically
%load_ext nb_black

# To import/export sqlite databases
import sqlite3 as sql

# To save/open python objects in pickle file
import pickle

# To help with reading, cleaning, and manipulating data
import pandas as pd
import numpy as np
import re

# To define maximum number of columns to be displayed in a dataframe
pd.set_option("display.max_columns", None)
# To define the maximum number of rows to be displayed in a dataframe
pd.set_option("display.max_rows", 200)

# To suppress warnings
# import warnings

# warnings.filterwarnings("ignore")

# To set some visualization attributes
pd.set_option("max_colwidth", 150)

# To play auditory cue when cell has executed, has warning, or has error and set chime theme
import chime

chime.theme("zelda")


## Data Overview

### Reading, Sampling, and Checking Data Shape

In [2]:
# Reading the dataset
conn = sql.connect("wp_life_expect_clean2.db")
data = pd.read_sql("SELECT * FROM wp_life_expect_clean2", conn)

# Making a working copy
df = data.copy()

# Checking the shape
print(f"There are {df.shape[0]} rows and {df.shape[1]} columns.")

# Checking first 2 rows of the data
df.head(2)

There are 132652 rows and 21 columns.


Unnamed: 0,day,name,info,link,num_references,year,month,info_parenth,info_1,info_2,info_3,info_4,info_5,info_6,info_7,info_8,info_9,info_10,info_11,age,cause_of_death
0,1,William Chappell,", 86, British dancer, ballet designer and director.",https://en.wikipedia.org/wiki/William_Chappell_(dancer),21,1994,January,,,British dancer,ballet designer and director,,,,,,,,,86.0,
1,1,Raymond Crotty,", 68, Irish economist, writer, and academic.",https://en.wikipedia.org/wiki/Raymond_Crotty,12,1994,January,,,Irish economist,writer,and academic,,,,,,,,68.0,



In [3]:
# Checking last 2 rows of the data
df.tail(2)

Unnamed: 0,day,name,info,link,num_references,year,month,info_parenth,info_1,info_2,info_3,info_4,info_5,info_6,info_7,info_8,info_9,info_10,info_11,age,cause_of_death
132650,9,Oleg Moliboga,", 69, Russian volleyball player, Olympic champion and coach.",https://en.wikipedia.org/wiki/Oleg_Moliboga,2,2022,June,(1980),,Russian volleyball player,Olympic champion and coach,,,,,,,,,69.0,
132651,9,Zou Jing,", 86, Chinese engineer, member of the Chinese Academy of Engineering.",https://en.wikipedia.org/wiki/Zou_Jing_(engineer),3,2022,June,,,Chinese engineer,member of the Chinese Academy of Engineering,,,,,,,,,86.0,



In [4]:
# Checking a sample of the data
df.sample(5)

Unnamed: 0,day,name,info,link,num_references,year,month,info_parenth,info_1,info_2,info_3,info_4,info_5,info_6,info_7,info_8,info_9,info_10,info_11,age,cause_of_death
27833,24,"John Vivian, 4th Baron Swansea",", 80, British peer and sports shooter.","https://en.wikipedia.org/wiki/John_Vivian,_4th_Baron_Swansea",4,2005,June,,,British peer and sports shooter,,,,,,,,,,80.0,
57835,5,Keith Ripley,", 77, English footballer.",https://en.wikipedia.org/wiki/Keith_Ripley_(footballer_born_1935),3,2012,November,,,English footballer,,,,,,,,,,77.0,
119546,16,Si Spencer,", 59, British comic book writer .",https://en.wikipedia.org/wiki/Si_Spencer,5,2021,February,"(, )",,British comic book writer,,,,,,,,,,59.0,
24473,6,Francesco Scavullo,", 82, American fashion photographer.",https://en.wikipedia.org/wiki/Francesco_Scavullo,6,2004,January,,,American fashion photographer,,,,,,,,,,82.0,
21154,3,Edward Brodney,", 92, American artist, known for his drawings and paintings of World War II.",https://en.wikipedia.org/wiki/Edward_Brodney,4,2002,August,,,American artist,known for his drawings and paintings of World War II,,,,,,,,,92.0,



### Checking Data Types, Duplicates, and Null Values

In [5]:
# Checking data types and null values
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 132652 entries, 0 to 132651
Data columns (total 21 columns):
 #   Column          Non-Null Count   Dtype  
---  ------          --------------   -----  
 0   day             132652 non-null  object 
 1   name            132652 non-null  object 
 2   info            132652 non-null  object 
 3   link            132652 non-null  object 
 4   num_references  132652 non-null  object 
 5   year            132652 non-null  int64  
 6   month           132652 non-null  object 
 7   info_parenth    49830 non-null   object 
 8   info_1          35 non-null      object 
 9   info_2          132604 non-null  object 
 10  info_3          62571 non-null   object 
 11  info_4          12605 non-null   object 
 12  info_5          1497 non-null    object 
 13  info_6          216 non-null     object 
 14  info_7          31 non-null      object 
 15  info_8          6 non-null       object 
 16  info_9          1 non-null       object 
 17  info_10   


#### Loading `nation_country_dict` from Pickle File

In [6]:
# Load the nation_country_dict
with open("nation_country_dict.pkl", "rb") as f:
    nation_country_dict = pickle.load(f)


## Extracting Nationality Continued
Here is the approach we will take:
- We will save the country name, rather than the nationality, in new `place_1` and `place_2` columns, because one standardized country name covers all of its associated nationality values.
- First, we will update the keys and values in `nation_country_dict` by replacing hyphens with a single space.
- Then we will remove "-born" from the column we are searching, and replace "-" and "/" each with a single space. In this step, we can also remove leading and trailing periods and whitespace.
- We will then search the numbered `info` columns in order, checking as follows:
    1. if column value starts with a nationality in the dictionary (dictionary keys):
        - save country to `place_1` and remove value from searched column.
    2. if `place_1` value has been found:
        - if updated column value starts with a nationality in the dictionary:
            - save country to `place_2` and remove value from searched column.
    3. Repeat steps 1 and 2, but comparing with country (dictionary values).
    4. Check the column's remaining unique values that start with a capital letter.
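
Steps 1 and 2 can be sketched as a single reusable function. This is a minimal illustration on toy data, not the notebook's own code: `extract_place`, its `require` parameter, `toy_dict`, and `toy` are hypothetical names introduced here. It also assumes at most one match per row per call, tries longer keys first, and removes only the matched prefix.

```python
import pandas as pd


def extract_place(df, column, extract_to, lookup, require=None):
    """Move a leading lookup key from `column` into `extract_to` as its country."""
    if extract_to not in df.columns:
        df[extract_to] = None
    # Longest keys first, so a multi-word key is tried before any shorter key
    keys = sorted(lookup, key=len, reverse=True)
    # Rows that have text to search and no value yet in the target column
    mask = df[column].notna() & df[extract_to].isna()
    if require is not None:
        # Step 2: only search rows where the earlier place column was filled
        mask &= df[require].notna()
    for index in df[mask].index:
        item = df.loc[index, column]
        for key in keys:
            if item.startswith(key):
                df.loc[index, extract_to] = lookup[key]
                # Drop only the matched prefix, not later occurrences of the key
                df.loc[index, column] = item[len(key):].strip()
                break
    return df


# Toy stand-ins for nation_country_dict and df (hypothetical data)
toy_dict = {
    "American": "United States of America",
    "Canadian": "Canada",
    "New Zealand": "New Zealand",
    "Maori": "New Zealand",
}
toy = pd.DataFrame(
    {
        "info_2": [
            "American Canadian football player",
            "New Zealand Maori leader",
            "politician",
            None,
        ]
    }
)

extract_place(toy, "info_2", "place_1", toy_dict)
extract_place(toy, "info_2", "place_2", toy_dict, require="place_1")
print(toy)
```

Slicing off the prefix with `item[len(key):]` is a deliberate choice in this sketch: `str.replace` would also remove any later occurrence of the same word in the entry.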

#### Removing "-" and "." from `nation_country_dict`

In [7]:
# Replacing hyphens with single spaces in nation_country_dict keys and values
nation_country_dict = {
    key.replace("-", " "): value.replace("-", " ")
    for key, value in nation_country_dict.items()
}

# Removing periods from nation_country_dict
nation_country_dict = {
    key.replace(".", ""): value.replace(".", " ")
    for key, value in nation_country_dict.items()
}


#### Removing or Replacing Extra Characters in Numbered `info` Columns

In [8]:
%%time

# List of columns to treat
cols_lst = [
    "info_1",
    "info_2",
    "info_3",
    "info_4",
    "info_5",
    "info_6",
    "info_7",
    "info_8",
    "info_9",
    "info_10",
    "info_11",
]

# Dictionary of keys to find and values to replace keys
replace_dict = {'-born': '', '–born': '', '-': ' ', '–': ' ', '/': ' ', '.': ' '}

# For loop to find and replace characters in replace_dict in columns in cols_lst
# and strip any leading or trailing periods or whitespace
for column in cols_lst:
    for key, value in replace_dict.items():
        # Iterate only over rows that have a value in the column
        for index in df[df[column].notna()].index:
            item = df.loc[index, column]
            if item:
                df.loc[index, column] = item.replace(key, value).strip(' .')
                
# Chime notification when cell successfully executes
chime.success()

CPU times: total: 2min 31s
Wall time: 2min 31s
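
Most of that runtime comes from visiting every cell once per replacement. As a possible alternative, the same replacements can be applied a whole column at a time with the pandas string accessor. The sketch below uses a small toy frame (`toy` is a hypothetical stand-in for `df`) and strips once at the end rather than after every replacement, which should give the same result on typical entries.

```python
import pandas as pd

# Same find/replace pairs as replace_dict above
replace_dict = {"-born": "", "–born": "", "-": " ", "–": " ", "/": " ", ".": " "}

# Toy stand-in for df (hypothetical data)
toy = pd.DataFrame(
    {"info_2": ["American-born Canadian football player.", "U.S. politician", None]}
)

for column in ["info_2"]:
    cleaned = toy[column]
    for key, value in replace_dict.items():
        # regex=False treats "." and "-" as literal characters
        cleaned = cleaned.str.replace(key, value, regex=False)
    # Strip leading and trailing periods and whitespace; missing values stay missing
    toy[column] = cleaned.str.strip(" .")

print(toy)
```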



#### Checking `info_1` for `place_1`

In [9]:
# Column to check
column = "info_1"

# Extract to column
extract_to = "place_1"

# Dataframe to check
dataframe = df[(df[column].notna())]

# For loop to extract nation data to place column
for nationality, country in nation_country_dict.items():
    for index in dataframe.index:
        item = df.loc[index, column]
        if item.startswith(nationality):
            df.loc[index, extract_to] = country
            df.loc[index, column] = (
                df.loc[index, column].replace(nationality, "").strip()
            )

# Check a sample of treated rows
df[df[extract_to].notna()].sample(2)

Unnamed: 0,day,name,info,link,num_references,year,month,info_parenth,info_1,info_2,info_3,info_4,info_5,info_6,info_7,info_8,info_9,info_10,info_11,age,cause_of_death,place_1
104644,22,Wayne See,", 95 American basketball player .",https://en.wikipedia.org/wiki/Wayne_See,3,2019,July,(Waterloo Hawks),basketball player,,,,,,,,,,,95.0,,United States of America
20388,8,Helen Gilbert,", 80 American artist.",https://en.wikipedia.org/wiki/Helen_Gilbert,4,2002,April,,artist,,,,,,,,,,,80.0,,United States of America



#### Observations:
- `info_1` provides us a nice small sample on which to test code.
- We successfully extracted those `place_1` values. Now we will do the same on the treated rows for `place_2`.

#### Checking `info_1` for `place_2`

In [10]:
# Column to check
column = "info_1"

# Extract to column
extract_to = "place_2"

# Dataframe to check
dataframe = df[(df[column].notna()) & (df["place_1"].notna())]

# For loop to extract nation data to place column
for nationality, country in nation_country_dict.items():
    for index in dataframe.index:
        item = df.loc[index, column]
        if item.startswith(nationality):
            df.loc[index, extract_to] = country
            df.loc[index, column] = (
                df.loc[index, column].replace(nationality, "").strip()
            )

# Check a sample of rows
df.sample(2)

Unnamed: 0,day,name,info,link,num_references,year,month,info_parenth,info_1,info_2,info_3,info_4,info_5,info_6,info_7,info_8,info_9,info_10,info_11,age,cause_of_death,place_1
48493,21,Anne Mathams,", 97, Scottish education and disability campaigner.",https://en.wikipedia.org/wiki/Anne_Mathams,2,2011,February,,,Scottish education and disability campaigner,,,,,,,,,,97.0,,
67124,9,Nazario Moreno González,", 44, Mexican drug lord, shot.",https://en.wikipedia.org/wiki/Nazario_Moreno_Gonz%C3%A1lez,94,2014,March,,,Mexican drug lord,shot,,,,,,,,,44.0,,



#### Observations:
- Here we can see that the new column `place_2` has not yet been added as there were not any matching values.
- Let us confirm by checking the remaining unique values in `info_1`.

#### Checking Remaining Unique Values in `info_1`

In [11]:
# Checking unique values
df["info_1"].unique()

array([None, 'politician', 'Olympic sprinter', 'gridiron football player',
       'writer', 'businessman', 'social psychologist', 'King of Nepal',
       'Maori leader', 'artist', 'English sports journalist',
       'Jules Engel', 'early', 'aka', 'Jr', 'professional wrestler',
       'automotive engineer', 'materials scientist', 'weightlifter',
       'common chimpanzee', '', 'Olympic athlete', 'actor',
       'Olympic gymnast', 'broadcaster and writer', 'Olympic swimmer',
       'Olympic boxer', 'Olympic wrestler', 'Olympic sailor',
       'basketball player', 'college basketball coach',
       'choral conductor', 'Tree of the Year'], dtype=object)


#### Observations:
- Neither "English" nor "Maori" is a key in the current dictionary.
- Maori is an ethnicity within the country of New Zealand, so for now, we will add it as a key to our dictionary with the country value of New Zealand. If we have matching first and second countries, we can later remove the second value.
- We will also add the key "English" with the country value 'United Kingdom of Great Britain and Northern Ireland'.
- Then, we can rerun the above code for `place_1` and `place_2`.
- The country value of "Nepal" is also present. We will hold off on extracting country names until we have first exhausted matching nationalities, as the Wikipedia field called for nationalities.

#### Updating `nation_country_dict`

In [12]:
# Adding key: country pairs to nation_country_dict
nation_country_dict["English"] = nation_country_dict["British"]
nation_country_dict["Maori"] = nation_country_dict["New Zealand"]


#### Re-checking `info_1` for `place_1`

In [13]:
# Column to check
column = "info_1"

# Extract to column
extract_to = "place_1"

# Dataframe to check
dataframe = df[(df[column].notna()) & (df[extract_to].isna())]

# For loop to extract nation data to place column
for nationality, country in nation_country_dict.items():
    for index in dataframe.index:
        item = df.loc[index, column]
        if item.startswith(nationality):
            df.loc[index, extract_to] = country
            df.loc[index, column] = (
                df.loc[index, column].replace(nationality, "").strip()
            )

# Check a sample of rows
df[df[extract_to].notna()].sample(2)

Unnamed: 0,day,name,info,link,num_references,year,month,info_parenth,info_1,info_2,info_3,info_4,info_5,info_6,info_7,info_8,info_9,info_10,info_11,age,cause_of_death,place_1
61945,1,Basil Soper,", British actor, 74–75.",https://en.wikipedia.org/wiki/Basil_Soper,0,2013,June,,actor,,,,,,,,,,,74.5,,United Kingdom of Great Britain and Northern Ireland
8861,13,George Strugar,", 63. American gridiron football player, lung cancer.",https://en.wikipedia.org/wiki/George_Strugar,0,1997,June,,gridiron football player,lung cancer,,,,,,,,,,63.0,,United States of America



#### Re-checking `info_1` for `place_2`

In [14]:
# Column to check
column = "info_1"

# Extract to column
extract_to = "place_2"

# Dataframe to check
dataframe = df[(df[column].notna()) & (df["place_1"].notna())]

# For loop to extract nation data to place column
for nationality, country in nation_country_dict.items():
    for index in dataframe.index:
        item = df.loc[index, column]
        if item.startswith(nationality):
            df.loc[index, extract_to] = country
            df.loc[index, column] = (
                df.loc[index, column].replace(nationality, "").strip()
            )

# Checking rows
df[df["place_2"].notna()]

Unnamed: 0,day,name,info,link,num_references,year,month,info_parenth,info_1,info_2,info_3,info_4,info_5,info_6,info_7,info_8,info_9,info_10,info_11,age,cause_of_death,place_1,place_2
19580,20,Dame Miraka Szászy,", 80. New Zealand Maori leader.",https://en.wikipedia.org/wiki/Mira_Sz%C3%A1szy,21,2001,December,,leader,,,,,,,,,,,80.0,,New Zealand,New Zealand



#### Observations:
- Our code appears to be finding the matching values and assigning the corresponding country to the correct place column.
- We see "New Zealand" added to both `place_1` and `place_2` here, which was expected as both New Zealand and Maori are in the description.
- Now we can proceed to do the same extraction on `info_2`.
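
The later cleanup of matching first and second countries could look something like the following. This is only a sketch on toy data (`toy` and `same_place` are hypothetical names), and it assumes a duplicated `place_2` should simply be cleared.

```python
import pandas as pd

# Toy stand-in for the place columns (hypothetical data)
toy = pd.DataFrame(
    {
        "place_1": ["New Zealand", "Italy", "Australia"],
        "place_2": ["New Zealand", "Chile", None],
    }
)

# Rows where place_2 is present and repeats place_1
same_place = toy["place_2"].notna() & (toy["place_1"] == toy["place_2"])

# Clear the redundant second value
toy.loc[same_place, "place_2"] = None

print(toy)
```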

#### Checking `info_2` for `place_1`


In [15]:
%%time

# Column to check
column = "info_2"

# Extract to column
extract_to = "place_1"

# Dataframe to check
dataframe = df[(df[column].notna()) & (df[extract_to].isna())]

# For loop to extract nation data to place column
for nationality, country in nation_country_dict.items():
    for index in dataframe.index:
        item = df.loc[index, column]
        if item.startswith(nationality):
            df.loc[index, extract_to] = country
            df.loc[index, column] = (
                df.loc[index, column].replace(nationality, "").strip()
            )

# Chime notification when cell successfully executes
chime.success()

CPU times: total: 5min 37s
Wall time: 5min 37s
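
The nested loop checks every dictionary key against every row, which likely accounts for most of this runtime. One possible alternative is to build a single anchored regular expression from the dictionary keys and let pandas match whole columns at once. The sketch below runs on toy data; `lookup`, `toy`, `pattern`, and `matched` are hypothetical names, and sorting keys longest-first is an assumption made here so that longer keys are preferred over shorter ones sharing a prefix.

```python
import re

import pandas as pd

# Toy stand-ins for nation_country_dict and df (hypothetical data)
lookup = {
    "British": "United Kingdom of Great Britain and Northern Ireland",
    "Italian": "Italy",
    "Chilean": "Chile",
}
toy = pd.DataFrame(
    {"info_2": ["British naval pilot", "Italian Chilean actress", "naval pilot", None]}
)

# One alternation of all keys, anchored at the start of the string
keys = sorted(lookup, key=len, reverse=True)
pattern = "^(" + "|".join(re.escape(key) for key in keys) + ")"

# Leading nationality per row (NaN where nothing matches), mapped to its country
matched = toy["info_2"].str.extract(pattern, expand=False)
toy["place_1"] = matched.map(lookup)

# Remove the matched prefix from the searched column
toy["info_2"] = toy["info_2"].str.replace(pattern, "", n=1, regex=True).str.strip()

print(toy)
```

Running the same steps again on rows where `place_1` is filled, writing to `place_2` instead, would cover the second extraction.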



In [16]:
# Check a sample of rows
df[df[extract_to].notna()].sample(2)

Unnamed: 0,day,name,info,link,num_references,year,month,info_parenth,info_1,info_2,info_3,info_4,info_5,info_6,info_7,info_8,info_9,info_10,info_11,age,cause_of_death,place_1,place_2
80257,11,Ellison Kelly,", 80, American-born Canadian football player , heart failure.",https://en.wikipedia.org/wiki/Ellison_Kelly,2,2016,February,"(Hamilton Tiger-Cats, Toronto Argonauts)",,Canadian football player,heart failure,,,,,,,,,80.0,,United States of America,
45237,4,David Foster,", 90, British naval pilot.",https://en.wikipedia.org/wiki/David_Foster_(Royal_Navy_officer),1,2010,June,,,naval pilot,,,,,,,,,,90.0,,United Kingdom of Great Britain and Northern Ireland,



#### Checking `info_2` for `place_2`

In [17]:
%%time

# Column to check
column = "info_2"

# Extract to column
extract_to = "place_2"

# Dataframe to check
dataframe = df[
    (df[column].notna()) & (df[extract_to].isna()) & (df["place_1"].notna())
]

# For loop to extract nation data to place column
for nationality, country in nation_country_dict.items():
    for index in dataframe.index:
        item = df.loc[index, column]
        if item.startswith(nationality):
            df.loc[index, extract_to] = country
            df.loc[index, column] = (
                df.loc[index, column].replace(nationality, "").strip()
            )
            
# Chime notification when cell successfully executes
chime.success()

CPU times: total: 5min 31s
Wall time: 5min 31s



In [18]:
# Check a sample of rows
df[df[extract_to].notna()].sample(2)

Unnamed: 0,day,name,info,link,num_references,year,month,info_parenth,info_1,info_2,info_3,info_4,info_5,info_6,info_7,info_8,info_9,info_10,info_11,age,cause_of_death,place_1,place_2
14850,1,Betty Archdale,", 92, English-Australian educationalist and cricketer.",https://en.wikipedia.org/wiki/Betty_Archdale,10,2000,January,,,educationalist and cricketer,,,,,,,,,,92.0,,United Kingdom of Great Britain and Northern Ireland,Australia
96506,10,Liliana Ross,", 79, Italian-born Chilean actress .",https://en.wikipedia.org/wiki/Liliana_Ross,8,2018,June,"(, )",,actress,,,,,,,,,,79.0,,Italy,Chile



#### Checking Remaining Missing Values for `place_1` and Number of Rows with a `place_2` Value

In [19]:
# Checking number of remaining missing values for place_1 and number of captured values for place_2
print(f'There are {df["place_1"].isna().sum()} remaining missing values for place_1.\n')
print(f'{df["place_2"].notna().sum()} entries have a value for place_2, thus far.')

There are 2394 remaining missing values for place_1.

2251 entries have a value for place_2, thus far.



#### Observations:
- We have captured the `place_1` value for the vast majority of entries.
- Relatively few entries have `place_2` values, which we would expect.
- Let us check the remaining rows with missing `place_1` for possible values that are not yet in our reference dictionary.

In [22]:
# Column to check
column = "info_2"

# Dataframe to check
dataframe = df[(df[column].notna()) & (df["place_1"].isna())]

# Checking set of first words in info_2 where place_1 is missing
# (skipping empty strings, which have no first character to test)
{item.split()[0] for item in dataframe[column] if item and item[0].isupper()}

{'AIDS',
 'ANC',
 'Abkhaz',
 'Abkhazian',
 'Aboriginal',
 'Actress',
 'African',
 'Afrikaans',
 'Afrikaner',
 'Afro',
 'Air',
 'Alfa',
 'All',
 'Alyawarre',
 'Amateur',
 'America',
 "America's",
 'Amrican',
 'Anglican',
 'Anglo',
 'Anguillan',
 'Antigua',
 'Arabic',
 'Archbishop',
 'Archdeacon',
 'Argentinian',
 'Aruba',
 'Aruban',
 'Assamese',
 'Associate',
 'Assyrian',
 'Athletics',
 'Aussie',
 'Australia',
 "Australia's",
 'Austria',
 'Austro',
 'Avarian',
 'Azerbaijan',
 'Azorean',
 'BBC',
 'Baltic',
 'Bangladesh',
 'Barbados',
 'Basque',
 'Bavarian',
 'Belarus',
 'Belarussian',
 'Belgium',
 'Benedictine',
 'Benin',
 'Bermudan',
 'Bermudian',
 'Bessarabian',
 'Bletchley',
 'Bodo',
 'Bosnia',
 'Botswana',
 'Braziliam',
 'Breton',
 'Brigadier',
 "Britain's",
 'Britsih',
 'Bulgaria',
 'California',
 'Californian',
 'Calypso',
 'Canada',
 'Cantonese',
 'Caribbean',
 'Catalan',
 'Catholic',
 'Caymanian',
 'Ceylon',
 'Ceylonese',
 'Chagossian',
 'Chairman',
 'Chechen',
 'Cherokee',
 'Chi


#### Observations:
- We can see there are some remaining variations on how nationality was entered that are not yet in `nation_country_dict`.
- Let us add those now, then do another iteration for searching `info_2`.
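
Because each new variant points at an existing key, a misspelled or missing target key would raise a `KeyError` partway through a long cell. One way to guard against that is a small helper that adds aliases in bulk and reports any whose target is not in the dictionary. This is a sketch only: `add_aliases`, `toy_dict`, and `missing` are hypothetical names, and the toy dictionary stands in for `nation_country_dict`.

```python
def add_aliases(lookup, aliases):
    """Point each alias at the country of an existing key; return unresolved pairs."""
    missing = []
    for alias, existing_key in aliases.items():
        if existing_key in lookup:
            lookup[alias] = lookup[existing_key]
        else:
            # Collect the pair instead of raising KeyError mid-update
            missing.append((alias, existing_key))
    return missing


# Toy stand-in for nation_country_dict (hypothetical data)
toy_dict = {"Australian": "Australia", "Belarusian": "Belarus"}

missing = add_aliases(
    toy_dict,
    {"Aussie": "Australian", "Belarussian": "Belarusian", "Britsih": "British"},
)

print(toy_dict)
print(missing)
```

Variants whose country is not reachable through an existing key would still need a direct assignment.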

In [37]:
# Adding nationality variants found above to nation_country_dict
nation_country_dict["Abkhaz"] = nation_country_dict["Georgian"]
nation_country_dict["Abkhazian"] = nation_country_dict["Georgian"]
nation_country_dict["Aboriginal"] = nation_country_dict["Australian"]
nation_country_dict["African"] = "Africa"
nation_country_dict["Afrikaans"] = nation_country_dict["African"]
nation_country_dict["Afrikaner"] = nation_country_dict["African"]
nation_country_dict["Afro"] = nation_country_dict["African"]
nation_country_dict["Alyawarre"] = nation_country_dict["Australian"]
nation_country_dict["America"] = nation_country_dict["US"]
nation_country_dict["America's"] = nation_country_dict["US"]
nation_country_dict["Amrican"] = nation_country_dict["US"]
nation_country_dict["Anguillan"] = "Anguilla"
nation_country_dict["Antigua"] = nation_country_dict["Antiguan"]
nation_country_dict["Arabic"] = "Arab world"
nation_country_dict["Argentinian"] = nation_country_dict["Argentine"]
nation_country_dict["Aruba"] = "Aruba"
nation_country_dict["Aruban"] = nation_country_dict["Aruba"]
nation_country_dict["Assamese"] = nation_country_dict["Indian"]
nation_country_dict["Assyrian"] = "Middle East"
nation_country_dict["Aussie"] = nation_country_dict["Australian"]
nation_country_dict["Australia"] = nation_country_dict["Australian"]
nation_country_dict["Austria"] = nation_country_dict["Austrian"]
nation_country_dict["Austro"] = nation_country_dict["Austrian"]
nation_country_dict["Avarian"] = nation_country_dict["Russian"]
nation_country_dict["Azerbaijan"] = nation_country_dict["Azerbaijani"]
nation_country_dict["Azorean"] = nation_country_dict["Portuguese"]
nation_country_dict["Azeri"] = nation_country_dict["Azerbaijani"]
nation_country_dict["Baltic"] = "Baltic states"
nation_country_dict["Bangladesh"] = nation_country_dict["Bangladeshi"]
nation_country_dict["Barbados"] = "Barbados"
nation_country_dict["Basque"] = "Western Continental Europe"
nation_country_dict["Bavarian"] = nation_country_dict["German"]
nation_country_dict["Belarus"] = nation_country_dict["Belarusian"]


In [None]:
df[df["place_1"].isna()].head(100)

In [None]:
nation_country_dict["Korean"]

In [36]:
nation_country_dict["Belarusian"]

'Belarus'


In [27]:
nation_country_dict

{'Afghan': 'Afghanistan',
 'Albanian': 'Albania',
 'Algerian': 'Algeria',
 'Andorran': 'Andorra',
 'Angolan': 'Angola',
 'Antiguan': 'Antigua and Barbuda',
 'Barbudan': 'Antigua and Barbuda',
 'Argentine': 'Argentina',
 'Armenian': 'Armenia',
 'Australian': 'Australia',
 'Austrian': 'Austria',
 'Azerbaijani': 'Azerbaijan',
 'Azeri': 'Azerbaijan',
 'Bahamian': 'The Bahamas',
 'Bahraini': 'Bahrain',
 'Bengali': 'Bangladesh',
 'Barbadian': 'Barbados',
 'Belarusian': 'Belarus',
 'Belgian': 'Belgium',
 'Belizean': 'Belize',
 'Beninese': 'Benin',
 'Beninois': 'Benin',
 'Bhutanese': 'Bhutan',
 'Bolivian': 'Bolivia',
 'Bosnian': 'Bosnia and Herzegovina',
 'Herzegovinian': 'Bosnia and Herzegovina',
 'Motswana': 'Botswana',
 'Botswanan': 'Botswana',
 'Brazilian': 'Brazil',
 'Bruneian': 'Brunei',
 'Bulgarian': 'Bulgaria',
 'Burkinabé': 'Burkina Faso',
 'Burmese': 'Burma',
 'Burundian': 'Burundi',
 'Cabo Verdean': 'Cabo Verde',
 'Cambodian': 'Cambodia',
 'Cameroonian': 'Cameroon',
 'Canadian': 'Ca
