# Analyzing the CIA Factbook Data

In this project, we'll work with data from the [CIA World Factbook](https://www.cia.gov/library/publications/the-world-factbook/), a compendium of statistics about all of the countries on Earth. 

The Factbook contains demographic information like:

* population - The population as of 2015.
* population_growth - The annual population growth rate, as a percentage.
* area - The total land and water area.

## Connect to the Database

In [1]:
%%capture
%load_ext sql
%sql sqlite:///factbook.db

'Connected: None@factbook.db'
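
Outside the `%sql` magic, the same kind of connection can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration, not the notebook's method: it assumes an in-memory database with a stub `facts` table so that it runs without the `factbook.db` file, and it lists the tables the database contains.

```python
import sqlite3

# Assumption: an in-memory database stands in for factbook.db so the sketch
# runs anywhere; with the real file, use sqlite3.connect("factbook.db").
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (id INTEGER PRIMARY KEY, code TEXT, name TEXT)")

# sqlite_master holds one row per database object; keep only the tables.
tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
print(tables)
conn.close()
```

Listing the tables first is a quick way to confirm the connection points at the intended database before running any queries.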

## Exploring the Database

Display the first 5 rows from the 'facts' table.

In [5]:
%%sql
SELECT *
FROM facts
LIMIT 5;

Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
1,af,Afghanistan,652230,652230,0,32564342,2.32,38.57,13.89,1.51
2,al,Albania,28748,27398,1350,3029278,0.3,12.92,6.58,3.3
3,ag,Algeria,2381741,2381741,0,39542166,1.84,23.67,4.31,0.92
4,an,Andorra,468,468,0,85580,0.12,8.13,6.96,0.0
5,ao,Angola,1246700,1246700,0,19625353,2.78,38.78,11.49,0.46


## Calculating the Summary Statistics

Descriptions for some of the columns:

- name - The name of the country.
- area - The country's total area (both land and water) in square kilometers.
- area_land - The country's land area in square kilometers.
- area_water - The country's water area in square kilometers.
- population - The country's population.
- population_growth - The country's population growth as a percentage.
- birth_rate - The country's birth rate, or the number of births a year per 1,000 people.
- death_rate - The country's death rate, or the number of deaths a year per 1,000 people.
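
The column list can also be read from the database itself. The sketch below uses SQLite's `PRAGMA table_info` through Python's `sqlite3` module. The column types in the `CREATE TABLE` statement are assumptions inferred from the preview rows; the real `factbook.db` schema may declare them differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: column types are guesses based on the preview rows; the real
# factbook.db schema may declare them differently.
conn.execute(
    """
    CREATE TABLE facts (
        id INTEGER PRIMARY KEY, code TEXT, name TEXT,
        area INTEGER, area_land INTEGER, area_water INTEGER,
        population INTEGER, population_growth REAL,
        birth_rate REAL, death_rate REAL, migration_rate REAL
    )
    """
)

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk)
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(facts)")]
for name, col_type in columns:
    print(f"{name:<18} {col_type}")
conn.close()
```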

Let's start by calculating some summary statistics and looking for any outlier countries.

In [9]:
%%sql
SELECT MIN(population) AS smallest_pop, 
       MAX(population) AS biggest_pop,
       MIN(population_growth) AS smallest_pop_growth,
       MAX(population_growth) AS biggest_pop_growth
FROM facts;

Done.


smallest_pop,biggest_pop,smallest_pop_growth,biggest_pop_growth
0,7256490011,0.0,4.02


<b>Comment:</b> 

Two things stand out from the summary statistics:

- There's a country with a population of 0.
- There's a country with a population of 7,256,490,011 (more than 7.2 billion people).

## Exploring the Outlier Countries 

### Countries with the minimum population 

In [12]:
%%sql
SELECT name AS country
FROM facts
WHERE population = (SELECT MIN(population) FROM facts);

Done.


country
Antarctica


<b>Comment:</b> Antarctica is the only row with a population of 0. This makes sense, as it has no citizens and is inhabited mainly by an international community (i.e. scientists).

### Countries with the maximum population 

In [13]:
%%sql
SELECT name AS country
FROM facts
WHERE population = (SELECT MAX(population) FROM facts);

Done.


country
World


<b>Comment:</b> This row is not a country. It is an aggregate of all the countries, stored in a single row under the name 'World'. We should exclude it to make our analysis more accurate.

## Recomputing the Summary Statistics 

In [14]:
%%sql
SELECT MIN(population) AS smallest_pop, 
       MAX(population) AS biggest_pop,
       MIN(population_growth) AS smallest_pop_growth,
       MAX(population_growth) AS biggest_pop_growth
FROM facts
WHERE name <> 'World';

Done.


smallest_pop,biggest_pop,smallest_pop_growth,biggest_pop_growth
0,1367485388,0.0,4.02


## Average Population and Area 

In [15]:
%%sql
SELECT AVG(population) AS avg_pop, AVG(area) AS avg_area
FROM facts;

Done.


avg_pop,avg_area
62094928.32231405,555093.546184739
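
The query above has no `WHERE` clause, so its averages include the 'World' aggregate row identified earlier. The sketch below illustrates how much that single row can shift `AVG(population)`. It uses Python's `sqlite3` module with only the five preview rows plus the World population from the summary statistics, so the numbers it prints are not the notebook's. The World row's area is not shown in the notebook and is set to NULL here as an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, area INTEGER, population INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("Afghanistan", 652230, 32564342),
        ("Albania", 28748, 3029278),
        ("Algeria", 2381741, 39542166),
        ("Andorra", 468, 85580),
        ("Angola", 1246700, 19625353),
        # Assumption: the World row's area is unknown here, so it is NULL;
        # AVG() ignores NULL values.
        ("World", None, 7256490011),
    ],
)

query = "SELECT AVG(population), AVG(area) FROM facts"
avg_pop_all, avg_area_all = conn.execute(query).fetchone()
avg_pop_countries, avg_area_countries = conn.execute(
    query + " WHERE name <> 'World'"
).fetchone()

print(f"avg_pop with World:    {avg_pop_all:,.1f}")
print(f"avg_pop without World: {avg_pop_countries:,.1f}")
conn.close()
```

Adding `WHERE name <> 'World'` to the averages, as in the recomputed summary statistics, keeps the aggregate row from inflating the population threshold.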


## Finding Densely Populated Countries 

Lastly, we'll build on the query from the previous cell to find countries that are densely populated.

We'll identify countries that have:
- Above average values for population.
- Below average values for area.
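
These two criteria are a proxy for density, not a direct measure of it. For comparison, the sketch below ranks countries by people per square kilometer of land. It assumes Python's `sqlite3` module and loads only the five preview rows shown earlier into an in-memory table. The `people_per_km2` alias is introduced here for illustration and does not appear in the notebook.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, area_land INTEGER, population INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("Afghanistan", 652230, 32564342),
        ("Albania", 27398, 3029278),
        ("Algeria", 2381741, 39542166),
        ("Andorra", 468, 85580),
        ("Angola", 1246700, 19625353),
    ],
)

# CAST forces floating-point division; the WHERE clause skips rows whose
# land area is zero or NULL, where a density would be undefined.
density = conn.execute(
    """
    SELECT name, CAST(population AS REAL) / area_land AS people_per_km2
    FROM facts
    WHERE area_land > 0
    ORDER BY people_per_km2 DESC
    """
).fetchall()

for name, people_per_km2 in density:
    print(f"{name:<12} {people_per_km2:8.1f}")
conn.close()
```

A ratio like this also ranks small, crowded countries that the average-based filter misses because their populations are below average. The notebook itself uses the average-based filter.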

In [21]:
%%sql
SELECT name AS countries
FROM facts
WHERE population > (SELECT AVG(population) FROM facts)
    AND area < (SELECT AVG(area) FROM facts);

Done.


countries
Bangladesh
Germany
Japan
Philippines
Thailand
United Kingdom
Vietnam


<b>Comment:</b> These results look plausible: all seven countries are known to be densely populated, especially Japan and the UK. This means we should