# APEX STATS Dataset
Prepared by David Schuster

## Source Attribution

Author: Data Copyright &copy; 2021 by The World Bank Group, All Rights Reserved

Title: Health Nutrition and Population Statistics

Source: <a href="https://datacatalog.worldbank.org/search/dataset/0037652">The World Bank Data Catalog</a>

License: Creative Commons Attribution 4.0 (CC-BY 4.0)

Changes: Data have been adapted for APEX STATS by David Schuster; cases with missing data have been removed from the example version


## Description of the Original Data

The World Bank has gathered health, nutrition, and population statistics for many countries, drawing on a variety of sources. These data can inform comparisons of international differences and similarities in surgery, health, disease, medicine, nutrition, population dynamics, reproductive health, and healthcare.


## Description of the Example

Access this example using the file `example.csv`.

The example file is a subset of the original data. Only life expectancy at birth, for all genders, in years, is included.

The unit of analysis (each row) is a country.

The following variables are included:

y: The name of the country  
x: The life expectancy at birth for the year 2018, the latest data available at the time of this example  
x1: The life expectancy for 2008 (i.e., ten years before 2018)  
x2: The life expectancy for 1998  
x3: The life expectancy for 1988  
x4: The life expectancy for 1978  
x5: The life expectancy for 1968  
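
The column-to-year mapping above can be made explicit in code. The following is a minimal sketch, assuming the first two rows of the dataset preview typed in by hand so that it runs offline; the names `YEAR_BY_COLUMN`, `country`, `year`, and `life_expectancy` are hypothetical labels chosen for this illustration and are not part of the data file.

```python
import pandas as pd

# Two rows copied by hand from the dataset preview (not the full file).
preview = pd.DataFrame({
    "y": ["Afghanistan", "Albania"],
    "x": [64.833, 78.573],
    "x1": [60.484, 76.221],
    "x2": [55.376, 73.587],
    "x3": [49.64, 71.86],
    "x4": [42.585, 69.991],
    "x5": [36.9, 66.689],
})

# Mapping taken from the variable list: column name -> year of measurement.
YEAR_BY_COLUMN = {"x": 2018, "x1": 2008, "x2": 1998, "x3": 1988, "x4": 1978, "x5": 1968}

# Reshape from wide (one column per year) to long (one row per country-year).
long_data = preview.rename(columns={"y": "country"}).melt(
    id_vars="country", var_name="column", value_name="life_expectancy"
)
long_data["year"] = long_data["column"].map(YEAR_BY_COLUMN)
long_data = (
    long_data.drop(columns="column")
    .sort_values(["country", "year"])
    .reset_index(drop=True)
)

print(long_data)
```

The long layout is one possible starting point for analyses that treat year as a variable, such as looking for a trend over time.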

Because data were missing for one or more of these years, listwise deletion removed the following countries from the data file:

- American Samoa
- Andorra
- Bermuda
- British Virgin Islands
- Cayman Islands
- Curacao
- Dominica
- Faroe Islands
- Gibraltar
- Greenland
- Isle of Man
- Kosovo
- Liechtenstein
- Marshall Islands
- Monaco
- Nauru
- Northern Mariana Islands
- Palau
- San Marino
- Serbia
- Seychelles
- Sint Maarten (Dutch part)
- St. Martin (French part)
- St. Kitts and Nevis
- Turks and Caicos Islands
- Tuvalu
- West Bank and Gaza

The resulting data file has life expectancies for 195 countries for the years 1968, 1978, 1988, 1998, 2008, and 2018.
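
Listwise deletion as described above can be sketched with pandas `dropna`. This is an illustration only, not the script used to prepare `example.csv`: the three rows, their values, and the country labels are hypothetical toy data, and only three of the six year columns are shown.

```python
import numpy as np
import pandas as pd

# Hypothetical toy values; NOT taken from the World Bank data.
raw = pd.DataFrame({
    "y": ["Country A", "Country B", "Country C"],
    "x": [70.0, 65.0, np.nan],
    "x1": [68.0, np.nan, 60.0],
    "x2": [66.0, 61.0, 58.0],
})

year_columns = ["x", "x1", "x2"]

# Listwise deletion: keep a row only if every year column has a value.
complete = raw.dropna(subset=year_columns, how="any")

# Countries dropped because at least one year is missing.
removed = raw.loc[~raw.index.isin(complete.index), "y"].tolist()

print(complete)
print("Removed:", removed)
```

A single missing year is enough to remove a country, which is why the list of removed countries is long relative to the amount of missing data.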

The secondary file, `example2.csv`, provides similar data organized in regional categories that are not mutually exclusive. Note that this secondary file has not been cleaned and contains missing data. It is provided for manually pulling test statistics (e.g., the worldwide average).

## Discipline(s) Represented

- Social Science
- Political Science
- Health Science

## Dataset Preview

```python
#@title Setup Example Data: Health Nutrition

# Import library
import pandas as pd

# Read data file: Health Nutrition
data = pd.read_csv('https://raw.githubusercontent.com/vectrlab/apex-stats-datasets/main/healthnutritionandpopulation/example.csv')

# Preview data
data.head()
```

| Unnamed: 0 | y | x | x1 | x2 | x3 | x4 | x5 |
|---|---|---|---|---|---|---|---|
| 0 | Afghanistan | 64.833 | 60.484 | 55.376 | 49.64 | 42.585 | 36.9 |
| 1 | Albania | 78.573 | 76.221 | 73.587 | 71.86 | 69.991 | 66.689 |
| 2 | Algeria | 76.88 | 74.644 | 70.183 | 66.554 | 56.909 | 49.982 |
| 3 | Angola | 61.147 | 54.311 | 46.093 | 45.324 | 43.931 | 40.546 |
| 4 | Antigua and Barbuda | 77.016 | 75.683 | 73.705 | 71.267 | 68.154 | 65.616 |


## Exploratory Analyses (untested)

- Estimate worldwide life expectancy in 2018
- Descriptively examine life expectancy by country
- Describe the distribution of life expectancy for 2018
- Look for a linear relationship between year and life expectancy
- Compare the life expectancy between two years
- Repeated measures analysis of variance for life expectancy
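
Several of the analyses listed above can be started with a few lines of pandas and NumPy. The sketch below is untested against the full file: it assumes only the five preview rows, typed in by hand, so the numbers it prints describe those five countries and not the world. The mean is unweighted across countries, which is not the same as a population-weighted worldwide average.

```python
import numpy as np
import pandas as pd

# Five rows copied by hand from the dataset preview (not the full file).
preview = pd.DataFrame({
    "y": ["Afghanistan", "Albania", "Algeria", "Angola", "Antigua and Barbuda"],
    "x": [64.833, 78.573, 76.88, 61.147, 77.016],
    "x1": [60.484, 76.221, 74.644, 54.311, 75.683],
    "x2": [55.376, 73.587, 70.183, 46.093, 73.705],
    "x3": [49.64, 71.86, 66.554, 45.324, 71.267],
    "x4": [42.585, 69.991, 56.909, 43.931, 68.154],
    "x5": [36.9, 66.689, 49.982, 40.546, 65.616],
})

# Column name -> year, following the variable list in this document.
year_by_column = {"x": 2018, "x1": 2008, "x2": 1998, "x3": 1988, "x4": 1978, "x5": 1968}
year_columns = list(year_by_column)

# Unweighted mean life expectancy across countries for 2018.
mean_2018 = preview["x"].mean()

# Descriptive summary of the 2018 distribution (count, mean, SD, quartiles).
summary_2018 = preview["x"].describe()

# Compare two years: per-country change from 2008 to 2018.
gain_2008_to_2018 = preview["x"] - preview["x1"]
mean_gain = gain_2008_to_2018.mean()

# Linear relationship: least-squares line through the yearly means.
years = np.array([year_by_column[c] for c in year_columns], dtype=float)
yearly_means = preview[year_columns].mean().to_numpy()
slope, intercept = np.polyfit(years, yearly_means, deg=1)

print(f"Mean life expectancy, 2018: {mean_2018:.3f}")
print(summary_2018)
print(f"Mean gain 2008 to 2018: {mean_gain:.3f} years")
print(f"Estimated change per calendar year: {slope:.3f} years")
```

Replacing `preview` with the `data` frame loaded in the Dataset Preview section should apply the same steps to all countries. Inferential tests, such as a paired t test or a repeated measures analysis of variance, need a statistics package and are left out of this sketch.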

## Potential Activity Starters

- Data exclusion and cleaning; why were data missing for some countries?
- Variability in life expectancy; how do we quantify and interpret variability in life expectancy, and what is its implication?
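
As a starting point for the variability question above, the sketch below computes three common measures of spread for 2018 and 1968. It assumes only the five preview rows, typed in by hand, so the results describe those five countries; the column labels `sd`, `iqr`, and `range` are hypothetical names chosen for this illustration.

```python
import pandas as pd

# Five rows copied by hand from the dataset preview; only 2018 (x) and 1968 (x5).
preview = pd.DataFrame({
    "y": ["Afghanistan", "Albania", "Algeria", "Angola", "Antigua and Barbuda"],
    "x": [64.833, 78.573, 76.88, 61.147, 77.016],
    "x5": [36.9, 66.689, 49.982, 40.546, 65.616],
})

year_columns = ["x", "x5"]
values = preview[year_columns]

# One row per year column, one column per measure of spread.
variability = pd.DataFrame({
    "sd": values.std(),  # sample standard deviation (ddof=1)
    "iqr": values.quantile(0.75) - values.quantile(0.25),  # interquartile range
    "range": values.max() - values.min(),  # maximum minus minimum
})

print(variability)
```

Comparing the rows of `variability` shows whether these countries were more or less alike in 2018 than in 1968, which can open a discussion of what a shrinking or growing spread implies.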
