Both graphs have been implemented in Dash plotly to create interactive web apps. As we use Jupiter notebook some changes were needed like to import 'jupyter plotly dash' library and create our app on JupyterDash rather than dash as it normally is. Those changes do not effect the outcome of the app and which runs in the web browser as its original format. Therefore the output of the two graphs is in http servers, which we have to click in order to see and interact with the full graph.

## Packages needed to be installed for the code to run in jupyter note book:
    
1. conda install -c plotly plotly=4.14.3
2. pip install dash
3. conda install "notebook>=5.3" "ipywidgets>=7.5"
4. pip install dashserve
5. pip install jupyter_plotly_dash
6. pip install jupyter-dash
7. conda install -c conda-forge -c plotly jupyter-dash
8. pip install xlrd
9. conda install -c anaconda openpyxl
10. conda install -c conda-forge pycountry
11. conda install -c conda-forge dash-bootstrap-components
12. jupyter serverextension enable --sys-prefix jupyter_server_proxy
13. conda install nb_conda

In [70]:
#%tb

In [3]:
conda list 

# packages in environment at /Applications/anaconda3:
#
# Name                    Version                   Build  Channel
_anaconda_depends         2021.11                  py39_0  
_ipyw_jlab_nb_ext_conf    0.1.0            py39hecd8cb5_0  
aiohttp                   3.8.1            py39h89e85a6_0    conda-forge
aiosignal                 1.2.0              pyhd8ed1ab_0    conda-forge
alabaster                 0.7.12             pyhd3eb1b0_0  
anaconda                  custom                   py39_1  
anaconda-client           1.9.0            py39hecd8cb5_0  
anaconda-navigator        2.1.1                    py39_0  
anaconda-project          0.10.1             pyhd3eb1b0_0  
ansi2html                 1.7.0                    pypi_0    pypi
anyio                     2.2.0            py39hecd8cb5_1  
appdirs                   1.4.4              pyhd3eb1b0_0  
applaunchservices         0.2.1              pyhd3eb1b0_0  
appnope                   0.1.2           py39hec


Note: you may need to restart the kernel to use updated packages.


In [4]:
#conda install -c plotly plotly=4.14.3
#pip install dash
#conda install "notebook>=5.3" "ipywidgets>=7.5"
#pip install dashserve
#pip install jupyter_plotly_dash
#pip install jupyter-dash
#conda install -c conda-forge -c plotly jupyter-dash
#pip install xlrd
#conda install -c anaconda openpyxl
#conda install -c conda-forge pycountry
#conda install -c conda-forge dash-bootstrap-components
#conda install -c conda-forge jupyter-server-proxy

In [5]:
#Liabraries needed for the graphs

#from jupyter_plotly_dash import JupyterDash

from jupyter_dash import JupyterDash
import dash
import dash_core_components as dcc
import dash_html_components as html
from dash.dependencies import Input, Output, State
from dash.exceptions import PreventUpdate
import dash_bootstrap_components as dbc

import plotly.graph_objects as go
import plotly.express as px
import plotly.io as pio
pio.templates.default = "seaborn+presentation"

import pandas as pd
import numpy as np

import xlrd
import pycountry




# First Graph (Choropleth Map)

Original Dataset: https://www.rug.nl/ggdc/blog/maddison-project-database-2020-04-11-2020?lang=en

The chosen dataset is a very well structured dataset which allowed us to create the graph with minimum preprocessing on the data.


##  The graph shows how the Gross domestic product per capita is changing through the years ( from 1970 to 2018) for each country.

The color gradation depicted for each country implys the GDP per capita movement. The pallet main color is blue. The color hue moves from lighter to darker showing a smaller to bigger increase to GDP per capita value for each country.
The submit button allows the user to change the year of interest and display for this year the value of the GDP per capita of each county.

In [6]:
# Read the data
df = pd.read_excel('mpd2020.xlsx', sheet_name='Full data',engine='openpyxl')

In [7]:
df.head()

Unnamed: 0,countrycode,country,year,gdppc,pop
0,AFG,Afghanistan,1820,,3280.0
1,AFG,Afghanistan,1870,,4207.0
2,AFG,Afghanistan,1913,,5730.0
3,AFG,Afghanistan,1950,1156.0,8150.0
4,AFG,Afghanistan,1951,1170.0,8284.0


In [8]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 21682 entries, 0 to 21681
Data columns (total 5 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   countrycode  21682 non-null  object 
 1   country      21682 non-null  object 
 2   year         21682 non-null  int64  
 3   gdppc        19706 non-null  float64
 4   pop          17199 non-null  float64
dtypes: float64(2), int64(1), object(2)
memory usage: 847.1+ KB


In [9]:
#Find the distribution of the years.
yeatlist = df.year.value_counts()
yeatlist

2007    169
1991    169
1997    169
1996    169
1995    169
       ... 
1262      1
1267      1
1266      1
1265      1
1268      1
Name: year, Length: 772, dtype: int64

In [10]:
yeatlist.index[:70]

Int64Index([2007, 1991, 1997, 1996, 1995, 1994, 1993, 1992, 1990, 1982, 1989,
            1988, 1987, 1986, 1985, 1984, 1998, 1999, 2000, 2001, 2002, 2003,
            2004, 2005, 2006, 2017, 2008, 2009, 2010, 2011, 2012, 2013, 2014,
            1983, 1981, 2016, 1956, 1962, 1961, 1960, 1959, 1958, 1957, 1955,
            1980, 1954, 1953, 1952, 1951, 1950, 2018, 1963, 1964, 1965, 1966,
            1967, 1968, 1969, 1970, 1971, 1972, 1973, 1974, 1975, 1976, 1977,
            1978, 1979, 2015, 1820],
           dtype='int64')

In [11]:
#From 2016 to 1970 the distribution of the values of the years is equal.
for i in range(0,70):
    print(yeatlist.iat[i])

169
169
169
169
169
169
169
169
169
169
169
169
169
169
169
169
169
169
169
169
169
169
169
169
169
169
169
169
169
169
169
169
169
169
169
169
169
169
169
169
169
169
169
169
169
169
169
169
169
169
169
169
169
169
169
169
169
169
169
169
169
169
169
169
169
169
169
169
169
90


In [12]:
df.gdppc.isnull().sum()

1976

In [13]:
# New dataframe with the needed columns
dff = df[['countrycode','country','year','gdppc']]

In [14]:
#Rename gdppc to a self exploratory name
#GDP per capita  gross domestic product (GDP) per capita
dff = dff.rename(columns={"gdppc": "GDP per capita"})
#Drop rows with any NaN value.
dff=dff.dropna()

In [15]:
dff.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 19706 entries, 3 to 21681
Data columns (total 4 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   countrycode     19706 non-null  object 
 1   country         19706 non-null  object 
 2   year            19706 non-null  int64  
 3   GDP per capita  19706 non-null  float64
dtypes: float64(1), int64(1), object(2)
memory usage: 769.8+ KB


In [16]:
#Create a dataframe from 1970 to 2018
dff = dff[dff['year'] >= 1970]

In [17]:
dff['countrycode'] = dff['countrycode'].astype(str)

In [18]:
dff.year.unique().tolist()

[1970,
 1971,
 1972,
 1973,
 1974,
 1975,
 1976,
 1977,
 1978,
 1979,
 1980,
 1981,
 1982,
 1983,
 1984,
 1985,
 1986,
 1987,
 1988,
 1989,
 1990,
 1991,
 1992,
 1993,
 1994,
 1995,
 1996,
 1997,
 1998,
 1999,
 2000,
 2001,
 2002,
 2003,
 2004,
 2005,
 2006,
 2007,
 2008,
 2009,
 2010,
 2011,
 2012,
 2013,
 2014,
 2015,
 2016,
 2017,
 2018]

In [21]:
app = JupyterDash('First Graph')

colors = {
    
    'background': '#FFFFFF',
    'text': '#7FDBFF'
}

#App Layout 


app.layout = html.Div([
    
    
    html.Div([
        dcc.Graph( id='graph1', style={ 'height': 600,'width': '70%', 'display': 'flex', 'text-align': 'center'}),
    ]),
    

    html.Div([
        html.P("Change the year form 1970 to 2018"),

        dcc.Input( id='input_state', type='number', inputMode='numeric', value=2000,
                   max=2018, min=1970, step=1, required=True ),
        html.Button( id='apply_button', n_clicks=0, children='Apply' ),
        html.Div( id='output' ),
    ] , style={'width': '45%', 'text-align': 'center', 'fontColor' : 'black'} ),


])

#conencting the input state with the output state
   
@app.callback(
    [Output( 'output', 'children' ), Output( component_id='graph1', component_property='figure' )],
    [Input( component_id='apply_button', component_property='n_clicks')],
    [State( component_id='input_state', component_property='value' )]
)


def update_output(nclicks, value):
    if value == None:
        raise PreventUpdate
    else:
        #filter the data
        dfilter = dff.query("year=={}".format(value))

        choropleth = px.choropleth( dfilter, locations="countrycode",
                               color="GDP per capita",
                               hover_name="country",
                               projection= 'natural earth',
                               scope= 'world',
                               height=600,
                               width=800,
                               title='Yearly GDP per capita value by Country',
                               color_continuous_scale=px.colors.sequential.PuBu
                               
                             )

        choropleth.update_layout( title=dict( font=dict( size=26 ), x=0.55, xanchor='center' ),
                             margin=dict( l=65, r=65, t=55, b=55 ))
        
        

        return ('The year of {} is displayed on the map and {} different years has been applied so far.'.format( value, nclicks ), choropleth)


# Run app and display result inline in the notebook
app.run_server( host="localhost",port=8054)
#app.run_server()

Dash app running on http://localhost:8054/


# Second Graph (dashboard)

Original Dataset:
Supported Dataset: 'https://www.kaggle.com/hamzael1/world-countries-income-class-2020'




## Shows how the Birthrate, Deathrate and Infant Mortality are changing depending on different general predictors for each country.

The chosen predictors are the gross domestic product per capita, regions and countries. Those three predictors were chosen as based on the literature review they affect the most the three dependent variables. Three separate graphs were implemented with the x-axis having the values of each separate predictor and the y-axis having the value of one of three dependent variables. Therefore, the graph is a dashboard of one scatter plot for the GDP and two bar plots for the other two predictors. The different colours inside each graph represent the hue values and helps the viewer to dibranchiate the graph elements but at the same time to group them. The viewer can also choose to visualise the dashboard depending on the population as the first slider indicates.


### First dataset(original)

In [22]:
#Load the dataset
data12 = pd.read_csv('countries of the world.csv')

In [23]:
#Change the name of the columns
def to_snakecase (cols):
    map_dict = {}
    for col in cols:
        map_dict[col] = col.lower().strip().replace(' ', '_')
    return map_dict

data12.rename(to_snakecase(data12.columns), axis=1, inplace=True)

In [24]:
data12.head()

Unnamed: 0,country,region,population,area_(sq._mi.),pop._density_(per_sq._mi.),coastline_(coast/area_ratio),net_migration,infant_mortality_(per_1000_births),gdp_($_per_capita),literacy_(%),phones_(per_1000),arable_(%),crops_(%),other_(%),climate,birthrate,deathrate,agriculture,industry,service
0,Afghanistan,ASIA (EX. NEAR EAST),31056997,647500,480,0,2306,16307,700.0,360,32,1213,22,8765,1,466,2034,38.0,24.0,38.0
1,Albania,EASTERN EUROPE,3581655,28748,1246,126,-493,2152,4500.0,865,712,2109,442,7449,3,1511,522,232.0,188.0,579.0
2,Algeria,NORTHERN AFRICA,32930091,2381740,138,4,-39,31,6000.0,700,781,322,25,9653,1,1714,461,101.0,6.0,298.0
3,American Samoa,OCEANIA,57794,199,2904,5829,-2071,927,8000.0,970,2595,10,15,75,2,2246,327,,,
4,Andorra,WESTERN EUROPE,71201,468,1521,0,66,405,19000.0,1000,4972,222,0,9778,3,871,625,,,


In [25]:
data12.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 227 entries, 0 to 226
Data columns (total 20 columns):
 #   Column                              Non-Null Count  Dtype  
---  ------                              --------------  -----  
 0   country                             227 non-null    object 
 1   region                              227 non-null    object 
 2   population                          227 non-null    int64  
 3   area_(sq._mi.)                      227 non-null    int64  
 4   pop._density_(per_sq._mi.)          227 non-null    object 
 5   coastline_(coast/area_ratio)        227 non-null    object 
 6   net_migration                       224 non-null    object 
 7   infant_mortality_(per_1000_births)  224 non-null    object 
 8   gdp_($_per_capita)                  226 non-null    float64
 9   literacy_(%)                        209 non-null    object 
 10  phones_(per_1000)                   223 non-null    object 
 11  arable_(%)                          225 non-n

In [26]:
#Create new datadrame only with columns that are going to be used for the graphs
data3 = data12[['country', 'population' ,'region' ,'infant_mortality_(per_1000_births)' , 'gdp_($_per_capita)' , 'birthrate' , 'deathrate' ]]

In [27]:
data3.head()

Unnamed: 0,country,population,region,infant_mortality_(per_1000_births),gdp_($_per_capita),birthrate,deathrate
0,Afghanistan,31056997,ASIA (EX. NEAR EAST),16307,700.0,466,2034
1,Albania,3581655,EASTERN EUROPE,2152,4500.0,1511,522
2,Algeria,32930091,NORTHERN AFRICA,31,6000.0,1714,461
3,American Samoa,57794,OCEANIA,927,8000.0,2246,327
4,Andorra,71201,WESTERN EUROPE,405,19000.0,871,625


In [28]:
#Drop all rows that have at least one NaN value
data3 = data3.dropna()

In [29]:
#View the values of the region 
data3.region.unique()

array(['ASIA (EX. NEAR EAST)         ',
       'EASTERN EUROPE                     ',
       'NORTHERN AFRICA                    ',
       'OCEANIA                            ',
       'WESTERN EUROPE                     ',
       'SUB-SAHARAN AFRICA                 ', 'LATIN AMER. & CARIB    ',
       'C.W. OF IND. STATES ', 'NEAR EAST                          ',
       'NORTHERN AMERICA                   ',
       'BALTICS                            '], dtype=object)

In [30]:
#Change the values of the regions in a form that is more well recognized and self explanatory

data3['region'] = data3['region'].astype(str).str.rstrip()

data3['region'] = data3['region'].str.replace('ASIA (EX. NEAR EAST)', 'ASIA')
data3['region'] = data3['region'].str.replace('EASTERN EUROPE', 'EAST EUROPE')
data3['region'] = data3['region'].str.replace('NORTHERN AFRICA', 'NORTH AFRICA')
data3['region'] = data3['region'].str.replace('OCEANIA', 'OCEANIA')
data3['region'] = data3['region'].str.replace('WESTERN EUROPE', 'WEST EUROPE')
data3['region'] = data3['region'].str.replace('SUB-SAHARAN AFRICA', 'SUB-SAHARAN AFRICA')
data3['region'] = data3['region'].str.replace('LATIN AMER. & CARIB', 'LATIN AMERICA & CARIB')
data3['region'] = data3['region'].str.replace('C.W. OF IND. STATES',  'C.W OF INDEP. STATES')
data3['region'] = data3['region'].str.replace('NEAR EAST', 'NEAR EAST')
data3['region'] = data3['region'].str.replace('NORTHERN AMERICA', 'NORTH AMERICA')
data3['region'] = data3['region'].str.replace('BALTICS', 'BALTICS')


The default value of regex will change from True to False in a future version.


The default value of regex will change from True to False in a future version.


The default value of regex will change from True to False in a future version.



In [31]:
#Verify the changes
data3.region.unique()

array(['ASIA (EX. NEAR EAST)', 'EAST EUROPE', 'NORTH AFRICA', 'OCEANIA',
       'WEST EUROPE', 'SUB-SAHARAN AFRICA', 'LATIN AMERICA & CARIB',
       'C.W OF INDEP. STATES', 'NEAR EAST', 'NORTH AMERICA', 'BALTICS'],
      dtype=object)

In [32]:
#Change the type of the three y variables of our graphs
    
data3['infant_mortality_(per_1000_births)'] = data3['infant_mortality_(per_1000_births)'].astype(str).str.replace(',', '.').astype(float)
data3['birthrate'] = data3['birthrate'].astype(str).str.replace(',', '.').astype(float)
data3['deathrate'] = data3['deathrate'].astype(str).str.replace(',', '.').astype(float)


In [33]:
data3.head()

Unnamed: 0,country,population,region,infant_mortality_(per_1000_births),gdp_($_per_capita),birthrate,deathrate
0,Afghanistan,31056997,ASIA (EX. NEAR EAST),163.07,700.0,46.6,20.34
1,Albania,3581655,EAST EUROPE,21.52,4500.0,15.11,5.22
2,Algeria,32930091,NORTH AFRICA,31.0,6000.0,17.14,4.61
3,American Samoa,57794,OCEANIA,9.27,8000.0,22.46,3.27
4,Andorra,71201,WEST EUROPE,4.05,19000.0,8.71,6.25


In [34]:
data3.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 223 entries, 0 to 226
Data columns (total 7 columns):
 #   Column                              Non-Null Count  Dtype  
---  ------                              --------------  -----  
 0   country                             223 non-null    object 
 1   population                          223 non-null    int64  
 2   region                              223 non-null    object 
 3   infant_mortality_(per_1000_births)  223 non-null    float64
 4   gdp_($_per_capita)                  223 non-null    float64
 5   birthrate                           223 non-null    float64
 6   deathrate                           223 non-null    float64
dtypes: float64(4), int64(1), object(2)
memory usage: 13.9+ KB


In [35]:
#Check for duplicate values in the country column. There are no duplicate values.
data3.country.nunique()

223

In [36]:
#Find the min population which is going to be used for the slider of the dashboard.
data3.population.min()

7026

In [37]:
#Find the max population which is going to be used for the slider of the dashboard.
data3.population.max()

1313973713

In [38]:
#Strip out the white spaces that country' values have at the end.
data3['country'] = data3['country'].astype(str).str.rstrip()

In [39]:
data3.country.unique()

array(['Afghanistan', 'Albania', 'Algeria', 'American Samoa', 'Andorra',
       'Angola', 'Anguilla', 'Antigua & Barbuda', 'Argentina', 'Armenia',
       'Aruba', 'Australia', 'Austria', 'Azerbaijan', 'The Bahamas',
       'Bahrain', 'Bangladesh', 'Barbados', 'Belarus', 'Belgium',
       'Belize', 'Benin', 'Bermuda', 'Bhutan', 'Bolivia',
       'Bosnia and Herzegovina', 'Botswana', 'Brazil',
       'British Virgin Islands', 'Brunei', 'Bulgaria', 'Burkina Faso',
       'Burma', 'Burundi', 'Cambodia', 'Cameroon', 'Canada', 'Cape Verde',
       'Cayman Islands', 'Central African Republic', 'Chad', 'Chile',
       'China', 'Colombia', 'Comoros', 'Congo, Dem. Rep.',
       'Congo, Repub. of the', 'Costa Rica', "Cote d'Ivoire", 'Croatia',
       'Cuba', 'Cyprus', 'Czech Republic', 'Denmark', 'Djibouti',
       'Dominica', 'Dominican Republic', 'East Timor', 'Ecuador', 'Egypt',
       'El Salvador', 'Equatorial Guinea', 'Eritrea', 'Estonia',
       'Ethiopia', 'Faroe Islands', 'Fiji', 'Finlan

In [40]:
# Find the iso code alpha_3 for each country in order to connected with the new data base
def search(country):
    try:
        result = pycountry.countries.search_fuzzy( country )
    except Exception:
        return np.nan
    else:
        return result[0].alpha_3


iso2_code2 = {i: search(i) for i in data3["country"].unique()}



In [41]:
# Some values were not have the corect country code. Based on the 'https://www.iso.org/obp/ui/#home' we change
#the four countries with incorrect alpha_3 code.

iso2_code2.update({ 'Niger' : 'NER', 'Mayotte': 'MYT', 'Guadeloupe': 'GLP' ,  'Virgin Islands': 'VIR'})

iso2_code2

{'Afghanistan': 'AFG',
 'Albania': 'ALB',
 'Algeria': 'DZA',
 'American Samoa': 'ASM',
 'Andorra': 'AND',
 'Angola': 'AGO',
 'Anguilla': 'AIA',
 'Antigua & Barbuda': nan,
 'Argentina': 'ARG',
 'Armenia': 'ARM',
 'Aruba': 'ABW',
 'Australia': 'AUS',
 'Austria': 'AUT',
 'Azerbaijan': 'AZE',
 'The Bahamas': 'BHS',
 'Bahrain': 'BHR',
 'Bangladesh': 'BGD',
 'Barbados': 'BRB',
 'Belarus': 'BLR',
 'Belgium': 'BEL',
 'Belize': 'BLZ',
 'Benin': 'BEN',
 'Bermuda': 'BMU',
 'Bhutan': 'BTN',
 'Bolivia': 'BOL',
 'Bosnia and Herzegovina': 'BIH',
 'Botswana': 'BWA',
 'Brazil': 'BRA',
 'British Virgin Islands': 'VGB',
 'Brunei': 'BRN',
 'Bulgaria': 'BGR',
 'Burkina Faso': 'BFA',
 'Burma': nan,
 'Burundi': 'BDI',
 'Cambodia': 'KHM',
 'Cameroon': 'CMR',
 'Canada': 'CAN',
 'Cape Verde': nan,
 'Cayman Islands': 'CYM',
 'Central African Republic': 'CAF',
 'Chad': 'TCD',
 'Chile': 'CHL',
 'China': 'CHN',
 'Colombia': 'COL',
 'Comoros': 'COM',
 'Congo, Dem. Rep.': nan,
 'Congo, Repub. of the': nan,
 'Costa Ri

In [42]:
#Map the country codes to the country names.
data3["code"] = data3["country"].map(iso2_code2)  # column with the iso_code to be recognized

In [43]:
#Make the code column string.
data3['code'] = data3['code'].astype(str)

In [44]:
#Check for valid outcome.
data3.code.unique()

array(['AFG', 'ALB', 'DZA', 'ASM', 'AND', 'AGO', 'AIA', 'nan', 'ARG',
       'ARM', 'ABW', 'AUS', 'AUT', 'AZE', 'BHS', 'BHR', 'BGD', 'BRB',
       'BLR', 'BEL', 'BLZ', 'BEN', 'BMU', 'BTN', 'BOL', 'BIH', 'BWA',
       'BRA', 'VGB', 'BRN', 'BGR', 'BFA', 'BDI', 'KHM', 'CMR', 'CAN',
       'CYM', 'CAF', 'TCD', 'CHL', 'CHN', 'COL', 'COM', 'CRI', 'CIV',
       'HRV', 'CUB', 'CYP', 'CZE', 'DNK', 'DJI', 'DMA', 'DOM', 'ECU',
       'EGY', 'SLV', 'GNQ', 'ERI', 'EST', 'ETH', 'FRO', 'FJI', 'FIN',
       'FRA', 'GUF', 'PYF', 'GAB', 'GEO', 'DEU', 'GHA', 'GIB', 'GRC',
       'GRL', 'GRD', 'GLP', 'GUM', 'GTM', 'GGY', 'GIN', 'GNB', 'GUY',
       'HTI', 'HND', 'HKG', 'HUN', 'ISL', 'IND', 'IDN', 'IRN', 'IRQ',
       'IRL', 'IMN', 'ISR', 'ITA', 'JAM', 'JPN', 'JEY', 'JOR', 'KAZ',
       'KEN', 'KIR', 'KWT', 'KGZ', 'LVA', 'LBN', 'LSO', 'LBR', 'LBY',
       'LIE', 'LTU', 'LUX', 'MKD', 'MDG', 'MWI', 'MYS', 'MDV', 'MLI',
       'MLT', 'MHL', 'MTQ', 'MRT', 'MUS', 'MYT', 'MEX', 'MDA', 'MCO',
       'MNG', 'MSR',

In [45]:
#Drop 'nan' as they will interfere with the connection of the new dataset and will not give us any useful information.
data3.drop(index=data3[data3['code'] == 'nan'].index, inplace=True)

In [46]:
data3.head()

Unnamed: 0,country,population,region,infant_mortality_(per_1000_births),gdp_($_per_capita),birthrate,deathrate,code
0,Afghanistan,31056997,ASIA (EX. NEAR EAST),163.07,700.0,46.6,20.34,AFG
1,Albania,3581655,EAST EUROPE,21.52,4500.0,15.11,5.22,ALB
2,Algeria,32930091,NORTH AFRICA,31.0,6000.0,17.14,4.61,DZA
3,American Samoa,57794,OCEANIA,9.27,8000.0,22.46,3.27,ASM
4,Andorra,71201,WEST EUROPE,4.05,19000.0,8.71,6.25,AND


In [47]:
#Check for valid outcome.
data3.code.value_counts()

AFG    1
PAK    1
NPL    1
NLD    1
NCL    1
      ..
GRL    1
GRD    1
GLP    1
GUM    1
ZWE    1
Name: code, Length: 205, dtype: int64

In [48]:
data3.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 205 entries, 0 to 226
Data columns (total 8 columns):
 #   Column                              Non-Null Count  Dtype  
---  ------                              --------------  -----  
 0   country                             205 non-null    object 
 1   population                          205 non-null    int64  
 2   region                              205 non-null    object 
 3   infant_mortality_(per_1000_births)  205 non-null    float64
 4   gdp_($_per_capita)                  205 non-null    float64
 5   birthrate                           205 non-null    float64
 6   deathrate                           205 non-null    float64
 7   code                                205 non-null    object 
dtypes: float64(4), int64(1), object(3)
memory usage: 14.4+ KB


### Second dataset (external)

In [49]:
#We want to find the income group that each country belongs. Threfore  from 
#'https://www.kaggle.com/hamzael1/world-countries-income-class-2020' dataset we will use the income_group column and
#conect it with our original dataset.
new = pd.read_csv('countries_income_group.csv')

In [50]:
new.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 275 entries, 0 to 274
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   Unnamed: 0    275 non-null    int64 
 1   Economy       265 non-null    object
 2   Code          265 non-null    object
 3   Region        219 non-null    object
 4   Income group  219 non-null    object
dtypes: int64(1), object(4)
memory usage: 10.9+ KB


In [51]:
new.head()

Unnamed: 0.1,Unnamed: 0,Economy,Code,Region,Income group
0,0,x,x,x,x
1,1,Afghanistan,AFG,South Asia,Low income
2,2,Albania,ALB,Europe & Central Asia,Upper middle income
3,3,Algeria,DZA,Middle East & North Africa,Upper middle income
4,4,American Samoa,ASM,East Asia & Pacific,Upper middle income


In [52]:
#Create a new dataframe with the needed columns.
df_new_0 = new[['Code','Income group']]    

In [53]:
#Leave out the first row as it does not contain any information.
df_new = df_new_0[1:]

In [54]:
#Drop all rows that have NaN values
df_new = df_new.dropna()

In [55]:
#Rename the columuns
df_new = df_new.rename(columns={"Code": "code", "Income group": "income_group"})

In [56]:
#Change the type of column code as we want to have the same fromat as the original column because base on them we will
#condact a left out join to the original dataset to create a new column with the income group.
df_new['code'] = df_new['code'].astype(str)

In [57]:
df_new.code.unique()

array(['AFG', 'ALB', 'DZA', 'ASM', 'AND', 'AGO', 'ATG', 'ARG', 'ARM',
       'ABW', 'AUS', 'AUT', 'AZE', 'BHS', 'BHR', 'BGD', 'BRB', 'BLR',
       'BEL', 'BLZ', 'BEN', 'BMU', 'BTN', 'BOL', 'BIH', 'BWA', 'BRA',
       'VGB', 'BRN', 'BGR', 'BFA', 'BDI', 'CPV', 'KHM', 'CMR', 'CAN',
       'CYM', 'CAF', 'TCD', 'CHI', 'CHL', 'CHN', 'COL', 'COM', 'COD',
       'COG', 'CRI', 'CIV', 'HRV', 'CUB', 'CUW', 'CYP', 'CZE', 'DNK',
       'DJI', 'DMA', 'DOM', 'ECU', 'EGY', 'SLV', 'GNQ', 'ERI', 'EST',
       'SWZ', 'ETH', 'FRO', 'FJI', 'FIN', 'FRA', 'PYF', 'GAB', 'GMB',
       'GEO', 'DEU', 'GHA', 'GIB', 'GRC', 'GRL', 'GRD', 'GUM', 'GTM',
       'GIN', 'GNB', 'GUY', 'HTI', 'HND', 'HKG', 'HUN', 'ISL', 'IND',
       'IDN', 'IRN', 'IRQ', 'IRL', 'IMN', 'ISR', 'ITA', 'JAM', 'JPN',
       'JOR', 'KAZ', 'KEN', 'KIR', 'PRK', 'KOR', 'XKX', 'KWT', 'KGZ',
       'LAO', 'LVA', 'LBN', 'LSO', 'LBR', 'LBY', 'LIE', 'LTU', 'LUX',
       'MAC', 'MDG', 'MWI', 'MYS', 'MDV', 'MLI', 'MLT', 'MHL', 'MRT',
       'MUS', 'MEX',

In [58]:
#Drop the columns with 'nan' values as they will affect the join of the two datasets and they not contain any
#useful information.
df_new.drop(index=df_new[df_new['code'] == 'nan'].index, inplace=True)

In [59]:
#Check for valid outcome.
df_new.code.value_counts()

AFG    1
PAK    1
NPL    1
NLD    1
NCL    1
      ..
GRC    1
GRL    1
GRD    1
GUM    1
ZWE    1
Name: code, Length: 218, dtype: int64

In [60]:
df_new.head()

Unnamed: 0,code,income_group
1,AFG,Low income
2,ALB,Upper middle income
3,DZA,Upper middle income
4,ASM,Upper middle income
5,AND,High income


In [61]:
df_new.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 218 entries, 1 to 218
Data columns (total 2 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   code          218 non-null    object
 1   income_group  218 non-null    object
dtypes: object(2)
memory usage: 13.2+ KB


In [62]:
#Check for valid outcome.
list1= data3['code'].tolist()
list2=df_new['code'].tolist()
common= set(list1) & set(list2)  
len(common)

195

### Final dataset of the dashboard

In [63]:
# Merge the two datasets with outer left join to keep the original dataset values untouched and add the income group
#only if the two 'code' column are matched. The code columns contain only unique values.
data_final = pd.merge(data3, df_new, how='left', on=['code'])

In [64]:
data_final.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 205 entries, 0 to 204
Data columns (total 9 columns):
 #   Column                              Non-Null Count  Dtype  
---  ------                              --------------  -----  
 0   country                             205 non-null    object 
 1   population                          205 non-null    int64  
 2   region                              205 non-null    object 
 3   infant_mortality_(per_1000_births)  205 non-null    float64
 4   gdp_($_per_capita)                  205 non-null    float64
 5   birthrate                           205 non-null    float64
 6   deathrate                           205 non-null    float64
 7   code                                205 non-null    object 
 8   income_group                        195 non-null    object 
dtypes: float64(4), int64(1), object(4)
memory usage: 16.0+ KB


In [65]:
#Drop all rows with NaN as they do not contain useful information.
data_final= data_final.dropna()

In [66]:
data_final.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 195 entries, 0 to 204
Data columns (total 9 columns):
 #   Column                              Non-Null Count  Dtype  
---  ------                              --------------  -----  
 0   country                             195 non-null    object 
 1   population                          195 non-null    int64  
 2   region                              195 non-null    object 
 3   infant_mortality_(per_1000_births)  195 non-null    float64
 4   gdp_($_per_capita)                  195 non-null    float64
 5   birthrate                           195 non-null    float64
 6   deathrate                           195 non-null    float64
 7   code                                195 non-null    object 
 8   income_group                        195 non-null    object 
dtypes: float64(4), int64(1), object(4)
memory usage: 15.2+ KB


In [68]:
app = JupyterDash('Second Graph')

colors = {
    'background': '#FFFFFF',
    'text': '#7FDBFF'
}

# Reading The Dataset 



#App Layout 

app.layout = html.Div(style={'backgroundColor': colors['background']}, children=[
      html.H1('Studying how Birthrate, Deathrate and Infant Mortality are changing depending on different general predictors for each country.', style={'textAlign':'center'}),
      html.Div([
          html.Div([ 
              html.Label('Population: Please select the population value that is in your interest.'),
              dcc.Slider(
                  id='slider-population',
                  min=data_final.population.min(),
                  max=data_final.population.max(),
                  marks={
                    1: '',
                    10000000 : '10M',
                    40000000 : '40M',  
                    70000000 : '70M',
                    100000000 : '100M',
                    150000000 : '150M',
                    300000000 : '300M',
                    600000000 : '600M',  
                    1000000000 : '1B',
                    1313973713 : '1.3B' 
                  },
                  value=data_final.population.min(),
                  step=10000,
                  updatemode='drag'
              )
              
              
          ]),
          html.Div([
              html.Label('Interest Variable (Dependent variable of Y-axis)'),
              dcc.Dropdown(
                  id='dependent-variable',
                  options=[{'label':'Infant Mortality (per_1000_births)', 'value':'infant_mortality_(per_1000_births)'},
                           {'label': 'Birthrate', 'value':'birthrate'},
                           {'label': 'Deathrate', 'value':'deathrate'}],
                  value='deathrate' 
     
              )
          ])
      ], style = {'width':'99%','margin':'auto'}),  
    
      
            html.Div([ 
                      dcc.Graph(
                          id='dependent-variable-gdppc',
                      ),   


                     html.Div( dcc.Graph(
                          id='dependent-variable-region',
                      ), style = {'width': '50%', 'display': 'inline-block'}),


                 html.Div(
                      dcc.Graph(
                          id='dependent-variable-country',
                      ),style = {'width': '50%', 'display': 'inline-block'})


              ], style = {'width':'99%','margin':'auto'})

])



def axisy_variable(var):
    if var == 'infant_mortality_(per_1000_births)':
        x = 'Infant Mortality'
    elif var == 'birthrate':
        x = 'Birthrate of the total population'
    elif var == 'deathrate':
        x='Deathrate of the total population'
    return x

  
    



#Conencting the input state with the output state
    
    
#Graph 1: Dependent Variable VS GDP per capita (scatter plot)

@app.callback(Output('dependent-variable-gdppc', 'figure'),
              [Input('slider-population', 'value'),
               Input('dependent-variable', 'value')]) 


def update_scatterplot_graph_one(population_value, dependent_variable):


    sorted = data_final[data_final.population <= population_value]
    fig = px.scatter(sorted,
                      x='gdp_($_per_capita)',
                      y= dependent_variable,
                      size='population',
                      color='income_group',
                      hover_name='country',
                      template='plotly_white',
                      labels={'gdp_($_per_capita)':'Gross Domestic Product per catita',
                              'y': axisy_variable(dependent_variable)},
                      title='Interest Variable VS Gross Domestic Product per capita')
    fig.update_layout(transition_duration=500)
    return fig





#Graph 2: Dependent Variable VS region (bar plot)

@app.callback(Output('dependent-variable-region', 'figure'),
              [Input('slider-population', 'value'),
               Input('dependent-variable', 'value')])  


def update_barplot_graph_two(population_value, dependent_variable):
    sorted = data_final[data_final.population <= population_value].groupby(by='region').sum().reset_index()
    fig = px.bar(sorted,
                  x='region',
                  y=dependent_variable,
                  color='region', 
                  template='plotly_white',
                  labels={'region':'Regions',
                        'infant_mortality_(per_1000_births)':'Infant Mortality',
                        'birthrate':'Total Birthrate',
                        'deathrate':'Total Deathrate'},
                 
                  title='Interest Variable VS Region')
    fig.update_layout()
    return fig







#Graph 3: Dependent Variable VS country (bar plot)

@app.callback(Output('dependent-variable-country', 'figure'),
              [Input('slider-population', 'value'),
               Input('dependent-variable', 'value')])

def update_barplot_graph_three(population_value, dependent_variable):
    sorted = data_final[data_final.population <= population_value]
    fig = px.bar(sorted, 
                x='country', 
                y= dependent_variable , 
                color='income_group',
                template='plotly_white',
                labels={'country':'Country',
                        'infant_mortality_(per_1000_births)':'Infant Mortality',
                        'birthrate':'Total Birthrate',
                        'deathrate':'Total Deathrate'},
                title='Interest Variable VS Countries')
    fig.update_layout()
    return fig




# Run app and display result inline in the notebook
app.run_server(host="localhost",port=8053)
#app.run_server()

Dash app running on http://localhost:8053/
