### Introduction

Life is precious. People die around the world every minute, yet few of us stop to ask how many infants' lives are lost each day. In this notebook we analyze and visualize the data to answer two primary questions: **Which five countries had the highest infant mortality rates from 2009 to 2019? Is the infant mortality rate higher for males than for females?**

### Data Source

[This dataset](https://www.kaggle.com/komalkhetlani/infant-mortality) is drawn from publicly available UNICEF data. It contains the infant mortality rate for individual countries, by year and by gender.

Infant mortality is the death of an infant before his or her first birthday. The infant mortality rate is the number of infant deaths for every 1,000 live births.
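
As a quick illustration of the definition above, the rate can be computed from two counts. This is a minimal sketch: the function name `infant_mortality_rate` and the figures passed to it are made up for the example and do not come from the dataset.

```python
def infant_mortality_rate(infant_deaths: int, live_births: int) -> float:
    """Return the number of infant deaths per 1,000 live births."""
    if live_births <= 0:
        raise ValueError("live_births must be positive")
    return 1000 * infant_deaths / live_births


# Hypothetical counts, chosen only to show the arithmetic
rate = infant_mortality_rate(infant_deaths=43, live_births=1000)
print(rate)
```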

In [106]:
# import essential libraries
import pandas as pd 
import plotly.express as px
from bubbly.bubbly import bubbleplot 
import plotly.graph_objects as go
from plotly.offline import iplot
import matplotlib.pyplot as plt

In [105]:
# read in csv
infant_df = pd.read_csv('data/InfantMortalityRate.csv', encoding='ISO-8859-1')
infant_df.head()

Unnamed: 0,Country,Infant Mortality Rate,Gender,Year
0,Afghanistan,43.050731,Female,2019.0
1,Angola,44.851045,Female,2019.0
2,Albania,7.659442,Female,2019.0
3,Andorra,2.555451,Female,2019.0
4,United Arab Emirates,5.716825,Female,2019.0


In [19]:
# get info
infant_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7625 entries, 0 to 7624
Data columns (total 4 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   Country                7623 non-null   object 
 1   Infant Mortality Rate  7623 non-null   float64
 2   Gender                 7623 non-null   object 
 3   Year                   7623 non-null   float64
dtypes: float64(2), object(2)
memory usage: 238.4+ KB


In [17]:
# check unique_values
infant_df.nunique()

Country                   229
Infant Mortality Rate    7553
Gender                      3
Year                       11
dtype: int64

In [117]:
infant_df['Country'].value_counts()

sub-Saharan Africa                  66
North America                       66
Malta                               33
Bolivia (Plurinational State of)    33
Suriname                            33
                                    ..
Chile                               33
World Bank (low income)             33
Papua New Guinea                    33
Seychelles                          33
Colombia                            33
Name: Country, Length: 229, dtype: int64

In [115]:
infant_df.groupby(by=['Country', 'Year', 'Gender']).sum()['Infant Mortality Rate'].to_frame()

Country,Year,Gender,Infant Mortality Rate
Afghanistan,2009.0,Female,62.316809
Afghanistan,2009.0,Male,70.513185
Afghanistan,2009.0,Total,66.525823
Afghanistan,2010.0,Female,60.022910
Afghanistan,2010.0,Male,67.945090
...,...,...,...
sub-Saharan Africa,2018.0,Male,116.443601
sub-Saharan Africa,2018.0,Total,106.447435
sub-Saharan Africa,2019.0,Female,93.468053
sub-Saharan Africa,2019.0,Male,113.589196


In [112]:
infant_df.loc[infant_df['Country'] == 'sub-Saharan Africa']

Unnamed: 0,Country,Infant Mortality Rate,Gender,Year
191,sub-Saharan Africa,46.545764,Female,2019.0
211,sub-Saharan Africa,46.922289,Female,2019.0
422,sub-Saharan Africa,47.779412,Female,2018.0
442,sub-Saharan Africa,48.166960,Female,2018.0
653,sub-Saharan Africa,49.057202,Female,2017.0
...,...,...,...,...
7141,sub-Saharan Africa,63.815406,Total,2011.0
7352,sub-Saharan Africa,65.211275,Total,2010.0
7372,sub-Saharan Africa,65.761902,Total,2010.0
7583,sub-Saharan Africa,67.233529,Total,2009.0
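
The output above shows two rows for the same country, year, and gender, which would explain why some entries have 66 rows instead of 33. One way to surface such repeated keys is sketched below. The `sample` frame is a small stand-in built from a few rows shown in the outputs above; on the real data the same calls would be applied to `infant_df`.

```python
import pandas as pd

# Stand-in for infant_df, using a few rows copied from the outputs above
sample = pd.DataFrame({
    'Country': ['Albania', 'Andorra', 'sub-Saharan Africa', 'sub-Saharan Africa'],
    'Year': [2019.0, 2019.0, 2019.0, 2019.0],
    'Gender': ['Female', 'Female', 'Female', 'Female'],
    'Infant Mortality Rate': [7.659442, 2.555451, 46.545764, 46.922289],
})

key = ['Country', 'Year', 'Gender']

# keep=False flags every row whose key occurs more than once
dupes = sample[sample.duplicated(subset=key, keep=False)]
print(dupes)

# Count rows per key and keep only the keys that repeat
counts = sample.groupby(key).size()
repeated = counts[counts > 1]
print(repeated)
```

Whether the repeated rows should be dropped, averaged, or treated as separate regional definitions depends on how the source defines them, so this sketch only reports them.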


In [13]:
# check gender
infant_df['Gender'].value_counts()

Total     2541
Female    2541
Male      2541
Name: Gender, dtype: int64

In [22]:
# remove total
infant_df = infant_df[infant_df['Gender'].isin(['Male', 'Female'])]

In [24]:
# check unique_values again
infant_df.nunique()

Country                   229
Infant Mortality Rate    5038
Gender                      2
Year                       11
dtype: int64

In [93]:
# read in population
population_df = pd.read_csv('data/population_by_country_2020.csv')
population_df = population_df[['Country (or dependency)', 'Population (2020)']]
population_df.columns = ['Country', 'Population']
population_df

Unnamed: 0,Country,Population
0,China,1438207241
1,India,1377233523
2,United States,330610570
3,Indonesia,272931713
4,Pakistan,219992900
...,...,...
230,Montserrat,4991
231,Falkland Islands,3458
232,Niue,1624
233,Tokelau,1354


In [99]:
# merge infant_df and population_df
df = pd.merge(infant_df, population_df, how='inner', on='Country')
df = df.groupby(by='Country').sum()['Infant Mortality Rate'].reset_index()
df

Unnamed: 0,Country,Infant Mortality Rate
0,Afghanistan,1836.316824
1,Albania,317.305845
2,Algeria,723.774164
3,Andorra,117.197459
4,Angola,2061.312619
...,...,...
172,Uzbekistan,739.441060
173,Vanuatu,786.850621
174,Yemen,1432.194180
175,Zambia,1575.709710
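
An inner merge keeps only names that appear in both tables. The merged result above has 176 rows, while `infant_df` has 229 unique `Country` values, so some names did not match. The sketch below shows one way to list unmatched names using the `indicator` option of `pd.merge`. The two small frames are hypothetical stand-ins; in particular, the spelling `'Bolivia'` and the population numbers are assumptions made for the example.

```python
import pandas as pd

# Hypothetical miniature versions of infant_df and population_df
rates = pd.DataFrame({
    'Country': ['Afghanistan', 'Bolivia (Plurinational State of)', 'sub-Saharan Africa'],
    'Infant Mortality Rate': [43.0, 20.0, 46.5],
})
population = pd.DataFrame({
    'Country': ['Afghanistan', 'Bolivia', 'China'],
    'Population': [1000, 2000, 3000],  # placeholder values
})

# A left merge with indicator=True adds a '_merge' column that records
# whether each row was found in both frames or only in the left one
checked = pd.merge(rates, population, how='left', on='Country', indicator=True)
unmatched = checked.loc[checked['_merge'] == 'left_only', 'Country'].tolist()
print(unmatched)
```

Names that differ only in spelling could then be mapped by hand before merging, while regional aggregates would simply have no population entry.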


In [68]:
# top 10 countries with highest infant mortality rate
(infant_df.groupby(by='Country')['Infant Mortality Rate']
          .sum()
          .reset_index()
          .sort_values(by='Infant Mortality Rate', ascending=False)
          .head(10))

Unnamed: 0,Country,Infant Mortality Rate
228,sub-Saharan Africa,2595.89022
176,Sierra Leone,2089.044569
33,Central African Republic,2016.639642
182,Somalia,1878.761008
141,Nigeria,1762.09817
36,Chad,1704.474006
50,Democratic Republic of the Congo,1665.65384
111,Lesotho,1575.744909
217,West and Central Africa,1551.863752
63,Equatorial Guinea,1549.337901
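
A summed rate depends on how many rows each entry has, and the `value_counts()` output earlier shows some entries with 66 rows rather than 33. The sketch below, on hypothetical numbers, illustrates how a sum can rank an entry first simply because it has more rows, while a mean does not. The names `Country A` and `Region X` and all values are invented for the example.

```python
import pandas as pd

# Hypothetical data: 'Region X' has twice as many rows as 'Country A'
sample = pd.DataFrame({
    'Country': ['Country A'] * 2 + ['Region X'] * 4,
    'Infant Mortality Rate': [80.0, 90.0, 50.0, 50.0, 60.0, 60.0],
})

grouped = sample.groupby('Country')['Infant Mortality Rate']

# Sum: 'Region X' totals 220 against 170 and ranks first
by_sum = grouped.sum().sort_values(ascending=False)

# Mean: 'Country A' averages 85 against 55 and ranks first
by_mean = grouped.mean().sort_values(ascending=False)

print(by_sum)
print(by_mean)
```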


In [69]:
# top 10 countries with lowest infant mortality rate
(infant_df.groupby(by='Country')['Infant Mortality Rate']
          .sum()
          .reset_index()
          .sort_values(by='Infant Mortality Rate')
          .head(10))

Unnamed: 0,Country,Infant Mortality Rate
88,Iceland,40.160808
170,San Marino,42.466743
97,Japan,45.906678
179,Slovenia,47.322903
72,Finland,47.49811
177,Singapore,47.757848
147,Norway,50.812804
193,Sweden,51.478725
115,Luxembourg,52.207147
46,Cyprus,52.531803


In [27]:
# make subsets for male and female infants
male_infants = infant_df.loc[infant_df['Gender'] == 'Male']
female_infants = infant_df.loc[infant_df['Gender'] == 'Female']
female_infants

Unnamed: 0,Country,Infant Mortality Rate,Gender,Year
0,Afghanistan,43.050731,Female,2019.0
1,Angola,44.851045,Female,2019.0
2,Albania,7.659442,Female,2019.0
3,Andorra,2.555451,Female,2019.0
4,United Arab Emirates,5.716825,Female,2019.0
...,...,...,...,...
2536,Samoa,15.318312,Female,2009.0
2537,Yemen,40.831265,Female,2009.0
2538,South Africa,33.492393,Female,2009.0
2539,Zambia,49.467612,Female,2009.0


In [58]:
# summed male infant mortality rate per year
mi_year = male_infants.groupby('Year')['Infant Mortality Rate'].sum().reset_index()

# summed female infant mortality rate per year
fi_year = female_infants.groupby('Year')['Infant Mortality Rate'].sum().reset_index()

In [59]:
# Line chart 
fig = go.Figure()
fig.add_trace(go.Scatter(x=mi_year['Year'], 
                         y=mi_year['Infant Mortality Rate'],
                         mode='lines+markers',
                         name='Male',
                         line=dict(color='#30beff', width=2)
                        ))

fig.add_trace(go.Scatter(x=fi_year['Year'], 
                         y=fi_year['Infant Mortality Rate'],
                         mode='lines+markers',
                         name='Female',
                         line=dict(color='#ff78e2', width=2)
                        ))

fig.update_layout(
    title='Male and Female Infant Mortality Rate',
    xaxis_tickfont_size=14,
    yaxis=dict(
        # 'titlefont' is deprecated in Plotly; set the font through the title object
        title=dict(text='Infant Mortality Rate', font=dict(size=16)),
        tickfont_size=14,
    ),
    legend=dict(
        x=0.88,
        y=1.0,
        bgcolor='rgba(255, 255, 255, 0)',
        bordercolor='rgba(255, 255, 255, 0)'
    )
)
fig.show()

In [100]:
# df holds only Country and the summed rate, so merge Population back in before plotting
plot_df = pd.merge(df, population_df, how='inner', on='Country')
fig = px.scatter(plot_df, x='Population', y='Infant Mortality Rate', color='Country', 
                 height=700, hover_name='Country', log_x=True, log_y=True, 
                 title='Population vs Infant Mortality Rate',
                 color_discrete_sequence=px.colors.qualitative.Vivid)
fig.update_traces(textposition='top center')
# fig.update_layout(showlegend=False)
# fig.update_layout(xaxis_rangeslider_visible=True)
fig.show()