# CIA Factbook Data Analysis (Using SQL)

In this project, we analyze data from the [CIA World Factbook](https://www.cia.gov/the-world-factbook/), a compendium of statistics about all countries on Earth. It contains demographic information such as population, population growth rate, and total land and water area.

We first connect our Jupyter Notebook to the database file and query information about its tables.

In [1]:
%%capture
%load_ext sql
%sql sqlite:///factbook.db

In [2]:
%%sql
SELECT *
FROM sqlite_master
WHERE type = 'table';

 * sqlite:///factbook.db
Done.


type,name,tbl_name,rootpage,sql
table,sqlite_sequence,sqlite_sequence,3,"CREATE TABLE sqlite_sequence(name,seq)"
table,facts,facts,47,"CREATE TABLE ""facts"" (""id"" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, ""code"" varchar(255) NOT NULL, ""name"" varchar(255) NOT NULL, ""area"" integer, ""area_land"" integer, ""area_water"" integer, ""population"" integer, ""population_growth"" float, ""birth_rate"" float, ""death_rate"" float, ""migration_rate"" float)"
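
Outside the notebook, the same inspection can be sketched with Python's standard `sqlite3` module. The snippet below is a minimal sketch, not the notebook's own method: it assumes an in-memory database built from the `facts` schema shown above rather than opening `factbook.db`, so it runs without the file.

```python
import sqlite3

# Assumption: an in-memory database stands in for factbook.db so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE "facts" (
        "id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
        "code" varchar(255) NOT NULL,
        "name" varchar(255) NOT NULL,
        "area" integer, "area_land" integer, "area_water" integer,
        "population" integer, "population_growth" float,
        "birth_rate" float, "death_rate" float, "migration_rate" float
    )
    """
)

# sqlite_master lists every schema object; keep only the tables.
tables = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]
print(sorted(tables))
```

The `sqlite_sequence` table appears alongside `facts` because SQLite creates it automatically for tables that use `AUTOINCREMENT`.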


In [3]:
%%sql
SELECT *
FROM facts
LIMIT 5;

 * sqlite:///factbook.db
Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
1,af,Afghanistan,652230,652230,0,32564342,2.32,38.57,13.89,1.51
2,al,Albania,28748,27398,1350,3029278,0.3,12.92,6.58,3.3
3,ag,Algeria,2381741,2381741,0,39542166,1.84,23.67,4.31,0.92
4,an,Andorra,468,468,0,85580,0.12,8.13,6.96,0.0
5,ao,Angola,1246700,1246700,0,19625353,2.78,38.78,11.49,0.46


## Descriptions of some columns in `facts`

* `name` - name of country
* `area` - total area (land and water, in square kilometers)
* `area_land` - land area (in square kilometers)
* `area_water` - water area (in square kilometers)
* `population` - population
* `population_growth` - population growth as a percentage
* `birth_rate` - number of births per year per 1000 people
* `death_rate` - number of deaths per year per 1000 people

In [4]:
%%sql
SELECT MIN(population) AS population_min
,MAX(population) AS population_max
,MIN(population_growth) AS population_growth_min
,MAX(population_growth) AS population_growth_max
FROM facts;

 * sqlite:///factbook.db
Done.


population_min,population_max,population_growth_min,population_growth_max
0,7256490011,0.0,4.02


In [5]:
%%sql
SELECT name AS country
,population
FROM facts
WHERE population = (SELECT MIN(population) FROM facts);

 * sqlite:///factbook.db
Done.


country,population
Antarctica,0


In [6]:
%%sql
SELECT name AS country
,population
FROM facts
WHERE population = (SELECT MAX(population) FROM facts);

 * sqlite:///factbook.db
Done.


country,population
World,7256490011


We can see from the queries above that the table:
1. Contains a row for the whole world (`World`), which is not a country, with a population of 7.26 billion
2. Contains a row for Antarctica, which has the lowest population (zero)

Let's analyze some summary statistics without these two rows:

In [7]:
%%sql
SELECT MIN(population) AS population_min
,MAX(population) AS population_max
,MIN(population_growth) AS population_growth_min
,MAX(population_growth) AS population_growth_max
FROM facts
WHERE name NOT IN ('Antarctica', 'World');

 * sqlite:///factbook.db
Done.


population_min,population_max,population_growth_min,population_growth_max
48,1367485388,0.0,4.02


In [8]:
%%sql
SELECT name AS country
,population
FROM facts
WHERE population = (SELECT MIN(population) FROM facts WHERE name != 'Antarctica');

 * sqlite:///factbook.db
Done.


country,population
Pitcairn Islands,48


In [9]:
%%sql
SELECT name AS country
,population
FROM facts
WHERE population = (SELECT MAX(population) FROM facts WHERE name != 'World');

 * sqlite:///factbook.db
Done.


country,population
China,1367485388


In [10]:
%%sql
SELECT CAST(AVG(population) AS INT) AS population_avg
,CAST(AVG(area) AS INT) AS area_avg
FROM facts
WHERE name NOT IN ('Antarctica', 'World');

 * sqlite:///factbook.db
Done.


population_avg,area_avg
32377011,555093
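
In SQLite, `AVG` returns a floating-point value, and `CAST(... AS INT)` truncates toward zero rather than rounding. The following minimal sketch uses Python's `sqlite3` module and a hypothetical three-row table `t` (illustrative values, not taken from `factbook.db`) to contrast truncation with `ROUND`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table with illustrative values; not part of factbook.db.
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (2,), (5,)])

# AVG(x) is 8/3 (about 2.667): CAST drops the fraction, ROUND picks the nearest whole number.
truncated, rounded = conn.execute(
    "SELECT CAST(AVG(x) AS INT), CAST(ROUND(AVG(x)) AS INT) FROM t"
).fetchone()
print(truncated, rounded)
```

Because the averages here are positive, the `CAST` in the query above rounds them down to the nearest whole number.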


## Summary of observations

- The country with the lowest population (48) is the Pitcairn Islands
- The country with the highest population (1.4 billion) is China
- The average country population is 32 million
- The average country area is 555,093 square kilometers

Finally, we look at countries with above-average population and above-average area:

In [11]:
%%sql
SELECT *
FROM facts
WHERE population > (SELECT AVG(population) FROM facts WHERE name != 'WORLD')
AND area > (SELECT AVG(area) FROM facts WHERE name != 'WORLD')

 * sqlite:///factbook.db
Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
24,br,Brazil,8515770,8358140.0,157630.0,204259812,0.77,14.46,6.58,0.14
37,ch,China,9596960,9326410.0,270550.0,1367485388,0.45,12.49,7.53,0.44
40,cg,"Congo, Democratic Republic of the",2344858,2267048.0,77810.0,79375136,2.45,34.88,10.07,0.27
53,eg,Egypt,1001450,995450.0,6000.0,88487396,1.79,22.9,4.77,0.19
58,et,Ethiopia,1104300,,104300.0,99465819,2.89,37.27,8.19,0.22
61,fr,France,643801,640427.0,3374.0,66553766,0.43,12.38,9.16,1.09
77,in,India,3287263,2973193.0,314070.0,1251695584,1.22,19.55,7.32,0.04
78,id,Indonesia,1904569,1811569.0,93000.0,255993674,0.92,16.72,6.37,1.16
79,ir,Iran,1648195,1531595.0,116600.0,81824270,1.2,17.99,5.94,0.07
114,mx,Mexico,1964375,1943945.0,20430.0,121736809,1.18,18.78,5.26,1.68
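
String comparison with `=` and `!=` is case-sensitive by default in SQLite, so a filter such as `name != 'WORLD'` does not remove a row named `World`, and that aggregate row then inflates the average used as the threshold. The sketch below illustrates the effect with Python's `sqlite3` module on a small in-memory sample. The five country rows and the `World` population are taken from the outputs earlier in this notebook; the table name `sample`, the helper `above_average`, and the reduced column set are assumptions for illustration, and the helper also filters the outer query, which the notebook query does not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table holding only two of the columns in facts.
conn.execute("CREATE TABLE sample (name TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO sample VALUES (?, ?)",
    [
        ("Afghanistan", 32564342),
        ("Albania", 3029278),
        ("Algeria", 39542166),
        ("Andorra", 85580),
        ("Angola", 19625353),
        ("World", 7256490011),
    ],
)


def above_average(excluded):
    """Return names whose population exceeds the average over rows not named `excluded`."""
    query = """
        SELECT name FROM sample
        WHERE name != :excluded
          AND population > (SELECT AVG(population) FROM sample WHERE name != :excluded)
        ORDER BY name
    """
    return [row[0] for row in conn.execute(query, {"excluded": excluded})]


upper = above_average("WORLD")  # matches no row, so World stays in the average
exact = above_average("World")  # exact match removes the aggregate row
print(upper)
print(exact)
```

Matching the stored spelling exactly, as in `name NOT IN ('Antarctica', 'World')` from the earlier summary queries, or adding `COLLATE NOCASE` to the comparison, avoids the mismatch.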
