In [1]:
import pandas as pd
import numpy as np

## 2. Data Exploration

This section explores simple aggregates and descriptive summaries for each of the three datasets.

### Dataset 1 - Income Inequality

In [2]:
inequality = pd.read_csv("income_inequality_cleaned.csv")

In [3]:
inequality.head()

Unnamed: 0,continent,country,year,gini_index,democracy_index,gdp_per_capita,invest_%_gdp,tax_%_gdp
0,Asia,Afghanistan,2006,36.8,30.6,1120.0,23.4,6.88
1,Asia,Afghanistan,2007,36.8,30.4,1250.0,19.9,5.23
2,Asia,Afghanistan,2008,36.8,30.2,1270.0,18.9,6.04
3,Asia,Afghanistan,2009,36.8,27.5,1500.0,17.9,8.44
4,Asia,Afghanistan,2010,36.8,24.8,1670.0,17.9,9.12


This dataset has 2535 rows and 8 columns.

In [4]:
inequality.shape

(2535, 8)

There are 195 unique countries in this dataset.

In [5]:
len(inequality["country"].unique())

195

In [6]:
inequality["country"].unique()

array(['Afghanistan', 'Albania', 'Algeria', 'Andorra', 'Angola',
       'Antigua and Barbuda', 'Argentina', 'Armenia', 'Australia',
       'Austria', 'Azerbaijan', 'Bahamas', 'Bahrain', 'Bangladesh',
       'Barbados', 'Belarus', 'Belgium', 'Belize', 'Benin', 'Bhutan',
       'Bolivia', 'Bosnia and Herzegovina', 'Botswana', 'Brazil',
       'Brunei', 'Bulgaria', 'Burkina Faso', 'Burundi', 'Cambodia',
       'Cameroon', 'Canada', 'Cape Verde', 'Central African Republic',
       'Chad', 'Chile', 'China', 'Colombia', 'Comoros',
       'Congo, Dem. Rep.', 'Congo, Rep.', 'Costa Rica', "Cote d'Ivoire",
       'Croatia', 'Cuba', 'Cyprus', 'Czech Republic', 'Denmark',
       'Djibouti', 'Dominica', 'Dominican Republic', 'Ecuador', 'Egypt',
       'El Salvador', 'Equatorial Guinea', 'Eritrea', 'Estonia',
       'Ethiopia', 'Fiji', 'Finland', 'France', 'Gabon', 'Gambia',
       'Georgia', 'Germany', 'Ghana', 'Greece', 'Grenada', 'Guatemala',
       'Guinea', 'Guinea-Bissau', 'Guyana', 'Haiti', '

It covers a 13-year period, from 2006 to 2018.

In [7]:
inequality["year"].unique()

array([2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016,
       2017, 2018], dtype=int64)

**1.1 Which countries had the greatest income inequality in 2010? (highest GINI index)**

In [8]:
# Filtering for 2010
i2010 = inequality[["continent", "country", "year", "gini_index"]]
i2010 = i2010[i2010["year"] == 2010]

# Using nlargest to find the top 5 GINI index scores
i2010.nlargest(5, "gini_index")

Unnamed: 0,continent,country,year,gini_index
2045,Africa,South Africa,2010,63.3
1538,Africa,Namibia,2010,61.0
290,Africa,Botswana,2010,60.8
2162,Americas,Suriname,2010,60.8
420,Africa,Central African Republic,2010,56.2


In 2010, South Africa had the greatest income inequality, with a GINI index of 63.3. Four of the five countries with the greatest income inequality are in Africa.
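The continent split of that top five can be tallied rather than counted by eye. The sketch below is only illustrative: it re-enters the five rows from the output above by hand, and the frame name `top5_2010` is an assumption (in the notebook it would be the result of `i2010.nlargest(5, "gini_index")`).

```python
import pandas as pd

# The five rows from the nlargest output above, re-entered by hand
# (top5_2010 is an illustrative name, not a variable from the notebook)
top5_2010 = pd.DataFrame({
    "continent": ["Africa", "Africa", "Africa", "Americas", "Africa"],
    "country": ["South Africa", "Namibia", "Botswana", "Suriname",
                "Central African Republic"],
    "gini_index": [63.3, 61.0, 60.8, 60.8, 56.2],
})

# Number of top-5 countries per continent
continent_counts = top5_2010["continent"].value_counts()
print(continent_counts)
```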

**1.2 Which countries had the lowest income inequality in 2010? (lowest GINI index)**

In [9]:
# Filtering for 2010
i2010 = inequality[["continent", "country", "year", "gini_index"]]
i2010 = i2010[i2010["year"] == 2010]

# Using nsmallest to find the bottom 5 GINI index scores
i2010.nsmallest(5, "gini_index")

Unnamed: 0,continent,country,year,gini_index
2006,Europe,Slovenia,2010,24.8
2383,Europe,Ukraine,2010,25.2
1668,Europe,Norway,2010,26.0
589,Europe,Czech Republic,2010,26.3
1993,Europe,Slovak Republic,2010,26.6


In 2010, Slovenia had the lowest income inequality, with a GINI index of 24.8. All five countries with the lowest income inequality are in Europe.

**1.3 Which countries had the highest GDP per capita in 2012?**

In [10]:
# Filtering for 2012
i2012 = inequality[["continent", "country", "year", "gdp_per_capita"]]
i2012 = i2012[i2012["year"] == 2012]

# Using nlargest to find the top 5 GDP per capita
i2012.nlargest(5, "gdp_per_capita")

Unnamed: 0,continent,country,year,gdp_per_capita
1826,Asia,Qatar,2012,120000.0
1293,Europe,Luxembourg,2012,89500.0
318,,Brunei,2012,82200.0
1163,Asia,Kuwait,2012,78700.0
1982,Asia,Singapore,2012,77500.0


In 2012, Qatar had the highest GDP per capita at \$120,000, followed by Luxembourg (\$89,500), Brunei (\$82,200), Kuwait (\$78,700) and Singapore (\$77,500).
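The Brunei row in the output above has an empty `continent` value, so any later filter on `continent` would silently skip it. A quick way to surface such gaps is sketched below on the five rows shown above, re-entered by hand. The frame name `top5_gdp_2012` is an assumption, and the same `isna()` check could be run on the full `inequality` frame.

```python
import pandas as pd

# The five rows from the output above; Brunei's continent is missing
# (top5_gdp_2012 is an illustrative name, not a variable from the notebook)
top5_gdp_2012 = pd.DataFrame({
    "continent": ["Asia", "Europe", None, "Asia", "Asia"],
    "country": ["Qatar", "Luxembourg", "Brunei", "Kuwait", "Singapore"],
    "gdp_per_capita": [120000.0, 89500.0, 82200.0, 78700.0, 77500.0],
})

# Rows whose continent is missing
missing_continent = top5_gdp_2012[top5_gdp_2012["continent"].isna()]
print(missing_continent["country"].tolist())
```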

**1.4 For countries in Asia, what was the greatest difference in income inequality for the year 2018?**

In [11]:
# Filtering for 2018
i2018 = inequality[["continent", "country", "year", "gini_index"]]
i2018 = i2018[i2018["year"] == 2018]

# Filtering for Asia
i2018_asia = i2018[i2018["continent"] == "Asia"]

In [12]:
greatest_ie = i2018_asia.nlargest(1, "gini_index")
greatest_ie

Unnamed: 0,continent,country,year,gini_index
1793,Asia,Philippines,2018,42.2


The Philippines had the greatest income inequality in Asia in 2018 with a GINI index of 42.2.

In [13]:
lowest_ie = i2018_asia.nsmallest(1, "gini_index")
lowest_ie

Unnamed: 0,continent,country,year,gini_index
1182,Asia,Kyrgyz Republic,2018,26.8


Kyrgyzstan (officially the Kyrgyz Republic) had the lowest income inequality in Asia in 2018, with a GINI index of 26.8.

In [14]:
difference = greatest_ie.iloc[0]["gini_index"] - lowest_ie.iloc[0]["gini_index"]
difference

15.400000000000002

For countries in Asia in 2018, the greatest difference in income inequality was between the Philippines and the Kyrgyz Republic, with a GINI index difference of 15.4.
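The same gap can be computed directly as the column's maximum minus its minimum, with rounding to hide the floating-point noise in the raw result. This is a minimal sketch using only the two 2018 rows shown above. The frame `asia_2018_sample` is an assumption standing in for `i2018_asia`, indexed by country so that `idxmax` and `idxmin` return names.

```python
import pandas as pd

# Only the two extreme rows shown above, indexed by country
# (asia_2018_sample is an illustrative stand-in for i2018_asia)
asia_2018_sample = pd.DataFrame(
    {"gini_index": [42.2, 26.8]},
    index=["Philippines", "Kyrgyz Republic"],
)

gini = asia_2018_sample["gini_index"]

# Range of the GINI index, rounded to one decimal place
gini_range = round(gini.max() - gini.min(), 1)

# Labels of the most and least unequal countries
most_unequal, least_unequal = gini.idxmax(), gini.idxmin()
print(most_unequal, least_unequal, gini_range)
```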

### Dataset 2 - Perceived Crime

In [15]:
crime = pd.read_csv("perceived_crime_cleaned.csv")

In [16]:
crime.head()

Unnamed: 0,year,rank,country,crime_index,safety_index
0,2012,1,Venezuela,84.74,15.26
1,2012,2,South Africa,78.12,21.88
2,2012,3,Puerto Rico,73.06,26.94
3,2012,4,Malaysia,70.88,29.12
4,2012,5,United States,64.93,35.07


This dataset has 1072 rows and 5 columns.

In [17]:
crime.shape

(1072, 5)

There are 149 unique countries in this dataset.

In [18]:
len(crime["country"].unique())

149

In [19]:
crime["country"].unique()

array(['Venezuela', 'South Africa', 'Puerto Rico', 'Malaysia',
       'United States', 'Algeria', 'Mexico', 'Peru', 'Lebanon',
       'Bangladesh', 'Brazil', 'Pakistan', 'Italy', 'Chile', 'Ecuador',
       'Ukraine', 'Philippines', 'Greece', 'Iran', 'Russia', 'Argentina',
       'Colombia', 'Slovakia', 'New Zealand', 'Ireland', 'United Kingdom',
       'India', 'Denmark', 'France', 'China', 'Egypt', 'Austria',
       'Uruguay', 'Bulgaria', 'Israel', 'Portugal', 'Belgium',
       'Australia', 'Bosnia And Herzegovina', 'Canada', 'Thailand',
       'Lithuania', 'Poland', 'Azerbaijan', 'Jamaica', 'Czech Republic',
       'Sweden', 'United Arab Emirates', 'Slovenia', 'Morocco', 'Iceland',
       'Spain', 'Saudi Arabia', 'Panama', 'Croatia', 'Georgia', 'Hungary',
       'Cyprus', 'Albania', 'Romania', 'Serbia', 'Netherlands', 'Turkey',
       'Switzerland', 'Estonia', 'Indonesia', 'Germany', 'Malta',
       'Singapore', 'Jordan', 'Norway', 'Finland', 'Hong Kong', 'Taiwan',
       'Japan', 'G

It covers a 9-year period, from 2012 to 2020.

In [20]:
crime["year"].unique()

array([2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020], dtype=int64)

**2.1 Which countries were the most dangerous in 2020?**

In [21]:
# Filtering for 2020
c2020 = crime[crime["year"] == 2020]

# Using nsmallest to find the top 5 ranked for crime index
c2020.nsmallest(5, "rank")

Unnamed: 0,year,rank,country,crime_index,safety_index
943,2020,1,Venezuela,84.49,15.51
944,2020,2,Papua New Guinea,81.93,18.07
945,2020,3,South Africa,77.49,22.51
946,2020,4,Afghanistan,76.23,23.77
947,2020,5,Honduras,76.11,23.89


In 2020, Venezuela ranked as the most dangerous country, with a crime index of 84.49, followed by Papua New Guinea (81.93), South Africa (77.49), Afghanistan (76.23) and Honduras (76.11).

**2.2 Which countries were the least dangerous in 2020?**

In [22]:
# Filtering for 2020
c2020 = crime[crime["year"] == 2020]

# Using nlargest to find the bottom 5 ranked for crime index
c2020.nlargest(5, "rank")

Unnamed: 0,year,rank,country,crime_index,safety_index
1071,2020,129,Qatar,11.86,88.14
1070,2020,128,Taiwan,15.65,84.35
1069,2020,127,United Arab Emirates,15.7,84.3
1068,2020,126,Georgia,20.21,79.79
1067,2020,125,Japan,20.66,79.34


In 2020, Qatar ranked as the least dangerous country, with a crime index of 11.86, followed by Taiwan (15.65), the United Arab Emirates (15.70), Georgia (20.21) and Japan (20.66).

**2.3 Which country was the most dangerous in each year?**

In [23]:
# Filtering for rank = 1
highest_crime = crime[crime["rank"] == 1]

# Sorting by year
highest_crime.sort_values(by = ["year"])

Unnamed: 0,year,rank,country,crime_index,safety_index
0,2012,1,Venezuela,84.74,15.26
75,2013,1,Venezuela,85.7,14.3
193,2014,1,Afghanistan,82.51,17.49
321,2015,1,South Sudan,85.32,14.68
468,2016,1,Venezuela,84.44,15.56
585,2017,1,Venezuela,85.28,14.72
710,2018,1,Venezuela,82.59,17.41
825,2019,1,Venezuela,83.23,16.77
943,2020,1,Venezuela,84.49,15.51


Venezuela had the highest crime index in seven of the nine years (2012-2013 and 2016-2020). Afghanistan was the most dangerous country in 2014 with a crime index of 82.51, and South Sudan was the most dangerous country in 2015 with a crime index of 85.32.
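How often each country held rank 1 can be counted with `value_counts`. The sketch below re-enters the rank-1 rows from the output above by hand. The frame name `rank1_by_year` is an assumption; in the notebook the same call could be applied to `highest_crime["country"]`.

```python
import pandas as pd

# Rank-1 country per year, copied from the output above
# (rank1_by_year is an illustrative name, not a variable from the notebook)
rank1_by_year = pd.DataFrame({
    "year": [2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020],
    "country": ["Venezuela", "Venezuela", "Afghanistan", "South Sudan",
                "Venezuela", "Venezuela", "Venezuela", "Venezuela",
                "Venezuela"],
})

# Number of years each country had the highest crime index
years_at_rank1 = rank1_by_year["country"].value_counts()
print(years_at_rank1)
```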

**2.4 What was the average crime index for Australia over the years 2012-2020?**

In [24]:
# Filtering for Australia
aussie = crime[crime["country"] == "Australia"]

# Averaging the crime index values across all the years
avg_ci = aussie["crime_index"].mean()
avg_ci

42.29555555555556

Australia had an average crime index of 42.30 over the years 2012-2020.

### Dataset 3 - Human Freedom Index

In [25]:
freedom = pd.read_csv("human_freedom_cleaned.csv")

In [26]:
freedom.head()

Unnamed: 0,country,year,hf_score,hf_rank,hf_quartile,pf_rol_procedural,pf_rol_civil,pf_rol_criminal,pf_rol,pf_ss_homicide,...,ef_regulation_business_adm,ef_regulation_business_bureaucracy,ef_regulation_business_start,ef_regulation_business_bribes,ef_regulation_business_licensing,ef_regulation_business_compliance,ef_regulation_business,ef_regulation,ef_score,ef_rank
0,Albania,2008,7.68,45.0,2.0,5.7,5.1,4.1,4.9,8.8,...,4.0,6.7,9.7,4.3,4.8,7.3,6.1,6.6,7.28,46.0
1,Albania,2009,7.74,44.0,2.0,5.7,5.1,4.1,4.9,8.9,...,5.0,6.7,9.7,4.9,4.8,6.0,6.2,6.4,7.32,42.0
2,Albania,2010,7.71,46.0,2.0,5.7,5.1,4.1,4.9,8.3,...,5.7,6.5,9.5,4.8,4.8,5.8,6.2,6.8,7.37,39.0
3,Albania,2011,7.52,53.0,2.0,5.7,5.1,4.1,4.9,8.1,...,5.2,6.5,9.6,4.2,4.8,6.0,6.1,7.3,7.37,42.0
4,Albania,2012,7.44,55.0,2.0,5.0,4.9,3.6,4.5,7.8,...,4.8,6.0,9.6,3.4,,6.0,6.0,7.2,7.31,48.0


This dataset has 1620 rows and 118 columns.

In [27]:
freedom.shape

(1620, 118)

There are 162 unique countries in this dataset.

In [28]:
len(freedom["country"].unique())

162

In [29]:
freedom["country"].unique()

array(['Albania', 'Algeria', 'Angola', 'Argentina', 'Armenia',
       'Australia', 'Austria', 'Azerbaijan', 'Bahamas', 'Bahrain',
       'Bangladesh', 'Barbados', 'Belarus', 'Belgium', 'Belize', 'Benin',
       'Bhutan', 'Bolivia', 'Bosnia and Herzegovina', 'Botswana',
       'Brazil', 'Brunei Darussalam', 'Bulgaria', 'Burkina Faso',
       'Burundi', "C?te d'Ivoire", 'Cambodia', 'Cameroon', 'Canada',
       'Cape Verde', 'Central Afr. Rep.', 'Chad', 'Chile', 'China',
       'Colombia', 'Congo, Dem. R.', 'Congo, Rep. Of', 'Costa Rica',
       'Croatia', 'Cyprus', 'Czech Rep.', 'Denmark', 'Dominican Rep.',
       'Ecuador', 'Egypt', 'El Salvador', 'Estonia', 'Eswatini',
       'Ethiopia', 'Fiji', 'Finland', 'France', 'Gabon', 'Gambia, The',
       'Georgia', 'Germany', 'Ghana', 'Greece', 'Guatemala', 'Guinea',
       'Guinea-Bissau', 'Guyana', 'Haiti', 'Honduras', 'Hong Kong',
       'Hungary', 'Iceland', 'India', 'Indonesia', 'Iran', 'Iraq',
       'Ireland', 'Israel', 'Italy', 'Jamaic
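The country labels in this output do not always match those in Dataset 1 (for example 'Czech Rep.' versus 'Czech Republic', 'Brunei Darussalam' versus 'Brunei', and the mis-encoded "C?te d'Ivoire"), and Dataset 2 uses 'Bosnia And Herzegovina'. If the datasets are later compared by country, the labels would need to be aligned first. The mapping below is a hypothetical sketch built only from names visible in the outputs above; `COUNTRY_ALIASES` and `harmonise_countries` are illustrative names, and the mapping is not exhaustive.

```python
import pandas as pd

# Hypothetical mapping from labels seen in Datasets 2 and 3
# to the labels used in Dataset 1 (not exhaustive)
COUNTRY_ALIASES = {
    "Bosnia And Herzegovina": "Bosnia and Herzegovina",
    "Brunei Darussalam": "Brunei",
    "C?te d'Ivoire": "Cote d'Ivoire",
    "Central Afr. Rep.": "Central African Republic",
    "Congo, Dem. R.": "Congo, Dem. Rep.",
    "Congo, Rep. Of": "Congo, Rep.",
    "Czech Rep.": "Czech Republic",
    "Dominican Rep.": "Dominican Republic",
    "Gambia, The": "Gambia",
}


def harmonise_countries(names: pd.Series) -> pd.Series:
    """Replace known alias labels; unknown labels pass through unchanged."""
    return names.replace(COUNTRY_ALIASES)


sample = pd.Series(["Czech Rep.", "Brunei Darussalam", "Albania"])
harmonised = harmonise_countries(sample)
print(harmonised.tolist())
```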

It covers a 10-year period, from 2008 to 2017.

In [30]:
freedom["year"].unique()

array([2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017],
      dtype=int64)

**3.1 What were the top 5 countries with the highest human freedom scores in 2017?**

In [31]:
# Filtering for 2017
f2017 = freedom[["country", "year", "hf_score", "hf_rank"]]
f2017 = f2017[f2017["year"] == 2017]

# Filtering for ranks 1-5 human freedom
f2017_top_5 = f2017[f2017["hf_rank"] <= 5]

# Or can use nsmallest
# f2017.nsmallest(5, "hf_rank")

# Sorting by rank
f2017_top_5.sort_values(by = ["hf_rank"])

Unnamed: 0,country,year,hf_score,hf_rank
1079,New Zealand,2017,8.88,1.0
1409,Switzerland,2017,8.82,2.0
649,Hong Kong,2017,8.81,3.0
289,Canada,2017,8.65,4.0
59,Australia,2017,8.62,5.0


New Zealand had the highest human freedom score of 8.88 in 2017, followed by Switzerland (8.82), Hong Kong (8.81), Canada (8.65) and Australia (8.62).

**3.2 What were the countries with the highest human freedom scores each year?**

In [32]:
# Filtering for rank 1 human freedom
highest_freedom = freedom[["country", "year", "hf_score", "hf_rank"]]
highest_freedom = highest_freedom[highest_freedom["hf_rank"] == 1]

# Sorting by year
highest_freedom.sort_values(by = ["year"])

Unnamed: 0,country,year,hf_score,hf_rank
640,Hong Kong,2008,9.12,1.0
641,Hong Kong,2009,9.06,1.0
642,Hong Kong,2010,9.02,1.0
643,Hong Kong,2011,9.02,1.0
644,Hong Kong,2012,8.99,1.0
645,Hong Kong,2013,8.94,1.0
1076,New Zealand,2014,8.95,1.0
1077,New Zealand,2015,8.89,1.0
1078,New Zealand,2016,8.9,1.0
1079,New Zealand,2017,8.88,1.0


Hong Kong had the highest human freedom score every year from 2008 to 2013, after which New Zealand held the top rank from 2014 to 2017.
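The first and last year each country held the top rank can be summarised with a grouped aggregation. The sketch below re-enters the country and year columns from the output above. The frame name `top_hf` is an assumption; in the notebook the same aggregation could be applied to `highest_freedom`.

```python
import pandas as pd

# Rank-1 country per year, copied from the output above
# (top_hf is an illustrative name, not a variable from the notebook)
top_hf = pd.DataFrame({
    "country": ["Hong Kong"] * 6 + ["New Zealand"] * 4,
    "year": list(range(2008, 2018)),
})

# First year, last year and number of years at rank 1, per country
spans = top_hf.groupby("country")["year"].agg(
    first_year="min", last_year="max", n_years="count"
)
print(spans)
```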

**3.3 Which 5 countries had the lowest personal freedom scores in 2016?**

In [33]:
# Filtering for 2016
p2016 = freedom[["country", "year", "pf_score", "pf_rank"]]
p2016 = p2016[p2016["year"] == 2016]

# Using nlargest to find the bottom 5 ranking in personal freedom
p2016.nlargest(5, "pf_rank")

Unnamed: 0,country,year,pf_score,pf_rank
1598,"Yemen, Rep.",2016,2.22,162.0
1418,Syria,2016,2.45,161.0
708,Iraq,2016,3.21,160.0
448,Egypt,2016,3.7,159.0
1378,Sudan,2016,3.82,158.0


Yemen had the lowest personal freedom score of 2.22 in 2016, followed by Syria (2.45), Iraq (3.21), Egypt (3.70) and Sudan (3.82).

**3.4 What were the top 5 countries with the highest economic freedom scores in 2015?**

In [34]:
# Filtering for 2015
e2015 = freedom[["country", "year", "ef_score", "ef_rank"]]
e2015 = e2015[e2015["year"] == 2015]

# Using nsmallest to find the top 5 ranking in economic freedom
e2015.nsmallest(5, "ef_rank")

Unnamed: 0,country,year,ef_score,ef_rank
647,Hong Kong,2015,8.88,1.0
1317,Singapore,2015,8.68,2.0
1077,New Zealand,2015,8.54,3.0
1407,Switzerland,2015,8.42,4.0
717,Ireland,2015,8.32,5.0


Hong Kong had the highest economic freedom score of 8.88 in 2015, followed by Singapore (8.68), New Zealand (8.54), Switzerland (8.42) and Ireland (8.32).

**3.5 What was the average human freedom score for Australia over the years 2008-2017?**

In [35]:
# Filtering for Australia
ozzie = freedom[freedom["country"] == "Australia"]

# Averaging the human freedom scores across all the years
avg_hf = ozzie["hf_score"].mean()
avg_hf

8.644000000000002

Australia had an average human freedom score of 8.64 over the years 2008-2017.