## Index of Economic Freedom

The data set is hosted by The Heritage Foundation and is available at https://www.heritage.org/index/explore

I added two columns, “Latitude” and “Longitude”, to the original data set, which had 15 columns. The coordinates were taken from https://developers.google.com/public-data/docs/canonical/countries_csv

With “Latitude” and “Longitude” added, the data set has 17 columns and 5152 rows, which matches the output of `df_freedom.shape` in section 02.
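The notebook does not show how the coordinates were joined to the scores, so the following is only a minimal sketch of one way it could be done in pandas. The two small frames are toy stand-ins; the lower-case column names `name`, `latitude` and `longitude` for the coordinates table are assumptions, and Angola is deliberately left out of the toy coordinates to show how an unmatched country name would surface.

```python
import pandas as pd

# Toy stand-in for the Heritage Foundation scores (2022 values from this notebook).
scores = pd.DataFrame({
    "Name": ["Albania", "Algeria", "Angola"],
    "Index Year": [2022, 2022, 2022],
    "Overall Score": [66.6, 45.8, 52.6],
})

# Toy stand-in for the coordinates table; column names are assumed.
coords = pd.DataFrame({
    "name": ["Albania", "Algeria"],
    "latitude": [41.153332, 28.033886],
    "longitude": [20.168331, 1.659626],
})

# Align the column names with the score table before joining.
coords = coords.rename(
    columns={"name": "Name", "latitude": "Latitude", "longitude": "Longitude"}
)

# A left join keeps every score row, even when no coordinates are found.
merged = scores.merge(coords, on="Name", how="left", validate="many_to_one")

# Country names that did not match between the two sources.
unmatched = merged.loc[merged["Latitude"].isna(), "Name"].unique()

print(merged.shape)
print(unmatched)
```

Listing the unmatched names after the join is useful because the two sources may spell some country names differently.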

### Contents of this Notebook
#### 01. Importing libraries
#### 02. Importing data frame
#### 03. Data consistency checks
#### 03.1 Checking datatypes
#### 03.2 Addressing missing values
#### 03.3 Looking for duplicates
#### 04. Basic statistics

### 01. Importing libraries

In [1]:
import pandas as pd
import numpy as np
import os

### 02. Importing data frame

In [10]:
df_freedom = pd.read_excel(r'C:\Users\veren\Python Data\Economic Freedom Index\02 Data\Prepared Data\data-economic-freedom-lat-long-noform.xlsx', index_col=False)
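Since `os` is imported but not used above, the file path could also be assembled from its parts, which makes it easier to move the project folder. This is a sketch only; `project_root` is a hypothetical variable name, and the `read_excel` call is left commented out because it depends on the local file.

```python
import os

# Hypothetical variable holding the project folder; adjust it to the local machine.
project_root = r"C:\Users\veren\Python Data\Economic Freedom Index"

# Build the path from its components instead of writing one long string.
file_path = os.path.join(
    project_root,
    "02 Data",
    "Prepared Data",
    "data-economic-freedom-lat-long-noform.xlsx",
)
print(file_path)

# The data frame would then be loaded with:
# df_freedom = pd.read_excel(file_path)
```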

In [11]:
# Checking how big the data frame is

df_freedom.shape

(5152, 17)

In [13]:
# Displaying all rows (max_rows=None removes the display limit on rows)

pd.options.display.max_rows=None
df_freedom.head(5152)

|  | Name | Latitude | Longitude | Index Year | Overall Score | Property Rights | Government Integrity | Judicial Effectiveness | Tax Burden | Government Spending | Fiscal Health | Business Freedom | Labor Freedom | Monetary Freedom | Trade Freedom | Investment Freedom | Financial Freedom |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Afghanistan | 33.93911 | 67.709953 | 2022 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | Albania | 41.153332 | 20.168331 | 2022 | 66.6 | 55.5 | 35.6 | 49.8 | 89.1 | 72.1 | 70.6 | 70.7 | 51.1 | 82.0 | 82.6 | 70.0 | 70.0 |
| 2 | Algeria | 28.033886 | 1.659626 | 2022 | 45.8 | 27.9 | 30.1 | 29.7 | 67.2 | 57.1 | 38.6 | 50.0 | 51.5 | 80.1 | 57.4 | 30.0 | 30.0 |
| 3 | Angola | -11.202692 | 17.873887 | 2022 | 52.6 | 39.8 | 20.6 | 25.3 | 86.6 | 86.4 | 80.0 | 37.6 | 53.9 | 61.2 | 70.0 | 30.0 | 40.0 |
| 4 | Argentina | -38.416097 | -63.616672 | 2022 | 50.1 | 35.1 | 45.1 | 57.9 | 73.3 | 53.0 | 16.8 | 55.1 | 51.0 | 37.9 | 60.6 | 55.0 | 60.0 |
| 5 | Armenia | 40.069099 | 45.038189 | 2022 | 65.3 | 50.4 | 50.8 | 33.1 | 86.9 | 78.9 | 75.5 | 64.9 | 47.2 | 77.5 | 73.6 | 75.0 | 70.0 |
| 6 | Australia | -25.274398 | 133.775136 | 2022 | 77.7 | 91.7 | 87.0 | 95.2 | 62.5 | 51.6 | 52.0 | 84.6 | 64.2 | 83.2 | 90.0 | 80.0 | 90.0 |
| 7 | Austria | 47.516231 | 14.550072 | 2022 | 73.8 | 98.4 | 82.9 | 94.6 | 45.5 | 20.3 | 71.7 | 82.3 | 78.4 | 82.3 | 79.2 | 80.0 | 70.0 |
| 8 | Azerbaijan | 40.143105 | 47.576927 | 2022 | 61.6 | 53.6 | 28.6 | 15.9 | 87.7 | 62.7 | 99.1 | 64.6 | 55.9 | 74.5 | 66.6 | 70.0 | 60.0 |
| 9 | Bahrain | 25.930414 | 50.637772 | 2022 | 62.0 | 65.9 | 41.6 | 27.4 | 99.9 | 65.5 | 0.0 | 60.2 | 54.9 | 81.1 | 83.0 | 85.0 | 80.0 |


### 03. Data consistency checks

#### 03.1 Checking datatypes

In [16]:
df_freedom.dtypes

Name                       object
Latitude                  float64
Longitude                 float64
Index Year                  int64
Overall Score             float64
Property Rights           float64
Government Integrity      float64
Judicial Effectiveness    float64
Tax Burden                float64
Government Spending       float64
Fiscal Health             float64
Business Freedom          float64
Labor Freedom             float64
Monetary Freedom          float64
Trade Freedom             float64
Investment Freedom        float64
Financial Freedom         float64
dtype: object
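`dtypes` reports one type per column, but an `object` column such as `Name` can still hold values of different Python types. A minimal sketch of a mixed-type check follows; the toy frame and its stray integer are invented purely for illustration.

```python
import pandas as pd

# Toy frame: the integer 3 in "Name" is a deliberate error for demonstration.
df = pd.DataFrame({
    "Name": ["Albania", "Algeria", 3],
    "Overall Score": [66.6, 45.8, 52.6],
})

# A column is flagged when its values have more than one Python type.
mixed_type_cols = [col for col in df.columns if df[col].map(type).nunique() > 1]

print(mixed_type_cols)
```

Note that a missing value in an `object` column is stored as a float `NaN`, so a text column with gaps would also be flagged by this check.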

#### 03.2 Addressing missing values

A check of the data in Excel, together with a review of the relevant material on the website of The Heritage Foundation, showed that the columns “Fiscal Health” and “Judicial Effectiveness” hold no data (N/A) for 2016 and earlier.

Additionally, Afghanistan, Iraq, Libya, Syria, Yemen, Somalia and Liechtenstein have no score data at all (every score column is N/A in every year).

Several other countries have many missing values in the earlier years: Bhutan, Brunei, Comoros, Dominica, Eritrea, Kiribati, Saint Lucia, Sao Tome and Principe, Serbia, Seychelles, Solomon Islands, Tonga and Vanuatu.

For now, the missing values are kept as N/A, because the absence of data for these countries or years is itself valuable information.
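The observations above came from Excel and the source website. As a rough sketch, the same checks could be repeated in pandas. The toy frame below reuses three 2022 rows from this notebook, and `score_cols` is an illustrative name for a shortened list of score columns.

```python
import numpy as np
import pandas as pd

# Toy frame built from three 2022 rows shown earlier in this notebook.
df = pd.DataFrame({
    "Name": ["Afghanistan", "Albania", "Algeria"],
    "Index Year": [2022, 2022, 2022],
    "Overall Score": [np.nan, 66.6, 45.8],
    "Fiscal Health": [np.nan, 70.6, 38.6],
})

# Number of missing values in each column.
missing_per_column = df.isnull().sum()

# Illustrative subset of score columns; the real frame has 13 of them.
score_cols = ["Overall Score", "Fiscal Health"]

# True for a row when every score column is missing in that row.
row_all_missing = df[score_cols].isnull().all(axis=1)

# A country has no data at all when this holds for each of its rows.
country_all_missing = row_all_missing.groupby(df["Name"]).all()
countries_without_data = country_all_missing[country_all_missing].index.tolist()

print(missing_per_column)
print(countries_without_data)
```

Grouping by `Index Year` instead of `Name` would give the same kind of overview per year, for example to confirm from which year a column starts to hold data.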

#### 03.3 Looking for duplicates

In [17]:
df_dups = df_freedom[df_freedom.duplicated()]

In [18]:
df_dups

|  | Name | Latitude | Longitude | Index Year | Overall Score | Property Rights | Government Integrity | Judicial Effectiveness | Tax Burden | Government Spending | Fiscal Health | Business Freedom | Labor Freedom | Monetary Freedom | Trade Freedom | Investment Freedom | Financial Freedom |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|


There are no full duplicates within the data frame.
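`duplicated()` without arguments only flags rows that are identical in every column. A stricter check, sketched here as a possible extension, looks for repeated combinations of country and year. The toy frame contains a deliberately conflicting second Albania row with an invented score.

```python
import pandas as pd

# Toy frame: two rows share the same country and year but differ in score.
df = pd.DataFrame({
    "Name": ["Albania", "Albania", "Algeria"],
    "Index Year": [2022, 2022, 2022],
    "Overall Score": [66.6, 66.5, 45.8],  # 66.5 is invented for the demo
})

# Full duplicates: identical in all columns (none in this toy frame).
full_dups = df[df.duplicated()]

# Key duplicates: same country and year, regardless of the other columns.
key_dups = df[df.duplicated(subset=["Name", "Index Year"], keep=False)]

print(len(full_dups))
print(key_dups)
```

With `keep=False`, every row of a repeated country-year pair is returned, so conflicting entries can be compared side by side.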

### 04. Basic statistics

In [19]:
# Generating all the descriptive statistics

df_freedom.describe()

|  | Latitude | Longitude | Index Year | Overall Score | Property Rights | Government Integrity | Judicial Effectiveness | Tax Burden | Government Spending | Fiscal Health | Business Freedom | Labor Freedom | Monetary Freedom | Trade Freedom | Investment Freedom | Financial Freedom |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 5152.0 | 5152.0 | 5152.0 | 4618.0 | 4655.0 | 4671.0 | 1099.0 | 4634.0 | 4650.0 | 1092.0 | 4667.0 | 3173.0 | 4656.0 | 4641.0 | 4656.0 | 4634.0 |
| mean | 19.268094 | 19.108538 | 2008.543284 | 59.627891 | 48.894651 | 41.144444 | 46.210828 | 73.485283 | 64.597269 | 66.768223 | 63.931412 | 60.241506 | 72.892955 | 69.41991 | 53.643686 | 49.715149 |
| std | 23.880525 | 65.503617 | 8.080626 | 11.711224 | 23.554082 | 22.354421 | 20.95764 | 15.327823 | 24.392674 | 31.065428 | 16.033107 | 15.966463 | 15.333376 | 15.740793 | 21.445657 | 19.857163 |
| min | -40.900557 | -175.198242 | 1995.0 | 1.0 | 0.0 | 0.0 | 3.9 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 4.454079 | -7.375578 | 2002.0 | 53.0 | 30.0 | 25.5 | 30.0 | 65.5 | 51.0 | 49.075 | 55.0 | 49.9 | 69.8 | 61.2 | 40.0 | 30.0 |
| 50% | 18.109581 | 20.939444 | 2009.0 | 59.6 | 50.0 | 34.6 | 43.6 | 75.6 | 70.65 | 78.95 | 64.8 | 60.2 | 76.3 | 72.2 | 50.0 | 50.0 |
| 75% | 39.399872 | 50.637772 | 2016.0 | 67.4 | 70.0 | 52.0 | 60.0 | 83.4 | 84.1 | 92.125 | 73.3 | 71.2 | 81.3 | 80.8 | 70.0 | 70.0 |
| max | 64.963051 | 179.414413 | 2022.0 | 90.5 | 100.0 | 100.0 | 98.0 | 100.0 | 99.3 | 100.0 | 100.0 | 100.0 | 95.4 | 95.0 | 95.0 | 90.0 |
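In the statistics above, every score column has a minimum of at least 0 and a maximum of at most 100. A small range check can make that expectation explicit. The sketch below assumes the scores are meant to lie on a 0 to 100 scale; the toy frame includes one deliberately invalid value.

```python
import pandas as pd

# Toy frame: 139.8 is a deliberately invalid value for demonstration.
df = pd.DataFrame({
    "Name": ["Albania", "Algeria", "Angola"],
    "Overall Score": [66.6, 45.8, 52.6],
    "Property Rights": [55.5, 27.9, 139.8],
})

# Illustrative subset of score columns; the real frame has 13 of them.
score_cols = ["Overall Score", "Property Rights"]

# Flag values below 0 or above 100; missing values compare as False and are not flagged.
out_of_range = df[score_cols].lt(0) | df[score_cols].gt(100)

# Rows with at least one out-of-range score.
bad_rows = df[out_of_range.any(axis=1)]

print(bad_rows)
```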
