# Introduction

In this project we will work with data from the [CIA World Factbook](https://www.cia.gov/the-world-factbook/), commonly referred to as the Factbook. The dataset contains statistical and demographic information about the countries of the world, with each attribute stored in its own column. Some of the columns are:

| column name | description |
| :--- | :--- |
| name | the name of the country. |
| area | the country's total area (both land and water) in square kilometers. |
| area_land | the country's land area in square kilometers. |
| area_water | the country's water area in square kilometers. |
| population | the country's population. |
| population_growth | the country's population growth as a percentage. |
| birth_rate | the country's birth rate, or the number of births per year per 1,000 people. |
| death_rate | the country's death rate, or the number of deaths per year per 1,000 people. |

- population - the total population of the country.
- population_growth - the annual population growth rate, as a percentage.
- area - the total land and water area covered by the country.

The data is stored in a table within a SQLite database.

### Aim
Our aim is to use SQL to analyze the data in the database and come up with valuable insights.

In [76]:
# connect Jupyter notebook to database
# %%capture
%load_ext sql
%sql sqlite:///factbook.db

The sql extension is already loaded. To reload it, use:
  %reload_ext sql


'Connected: None@factbook.db'

In [77]:
%%sql
SELECT * 
FROM sqlite_master

Done.


type,name,tbl_name,rootpage,sql
table,sqlite_sequence,sqlite_sequence,3,"CREATE TABLE sqlite_sequence(name,seq)"
table,facts,facts,47,"CREATE TABLE ""facts"" (""id"" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, ""code"" varchar(255) NOT NULL, ""name"" varchar(255) NOT NULL, ""area"" integer, ""area_land"" integer, ""area_water"" integer, ""population"" integer, ""population_growth"" float, ""birth_rate"" float, ""death_rate"" float, ""migration_rate"" float)"
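For readers working outside the notebook, the same catalogue lookup can be sketched with Python's built-in `sqlite3` module. The in-memory database and the trimmed `facts` schema below are assumptions made for illustration; they stand in for `factbook.db`, which is not needed to run the snippet.

```python
import sqlite3

# Assumption: an in-memory database with a shortened version of the facts schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE facts ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, "
    "code varchar(255) NOT NULL, "
    "name varchar(255) NOT NULL, "
    "population integer)"
)

# sqlite_master is SQLite's schema catalogue; each table has one row in it.
tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
print(tables)
conn.close()
```

As in the notebook output, `sqlite_sequence` appears next to `facts` because SQLite creates it automatically for tables that use `AUTOINCREMENT`.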


In [78]:
%%sql
SELECT * 
    FROM facts
    LIMIT 5;

Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
1,af,Afghanistan,652230,652230,0,32564342,2.32,38.57,13.89,1.51
2,al,Albania,28748,27398,1350,3029278,0.3,12.92,6.58,3.3
3,ag,Algeria,2381741,2381741,0,39542166,1.84,23.67,4.31,0.92
4,an,Andorra,468,468,0,85580,0.12,8.13,6.96,0.0
5,ao,Angola,1246700,1246700,0,19625353,2.78,38.78,11.49,0.46


### Summary statistics

Let's try to identify some unusual countries by calculating some summary statistics.

In [79]:
%%sql
SELECT MIN(population) as min_population,
MAX(population) as max_population,
MIN(population_growth) as min_population_growth,
MAX(population_growth) as max_population_growth
FROM facts;

Done.


min_population,max_population,min_population_growth,max_population_growth
0,7256490011,0.0,4.02


### Analysing outlier countries

Can you identify some odd values? Yes: the min_population and min_population_growth columns both have a value of 0.

Also, there is a "country" with a population of more than 7.2 billion people. I didn't know such a country existed.

Next, let's find out which countries have 0 inhabitants and why. Let's also identify the country with the population of 7.2 billion.

In [80]:
%%sql 
SELECT *
FROM facts
WHERE population = (SELECT MIN(population)
                       FROM facts);

Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
250,ay,Antarctica,,280000,,0,,,,


Let's look at the country with the minimum population growth as well.

In [81]:
%%sql 
SELECT *
FROM facts
WHERE population = (SELECT MIN(population_growth)
                       FROM facts);

Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
250,ay,Antarctica,,280000,,0,,,,


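Both queries return Antarctica, the one entry with no inhabitants. Note that the second query compares `population`, not `population_growth`, with the minimum growth rate of 0.0, so it matched Antarctica only through its population of 0; Antarctica's own `population_growth` is empty. According to [Wikipedia](https://en.wikipedia.org/wiki/Antarctica), Antarctica is the coldest, driest, and windiest continent, and has the highest average elevation of all the continents. These conditions make permanent human habitation extremely difficult.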
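The second query above filters on `population` rather than `population_growth`. The sketch below runs both variants on a five-row sample so the difference is visible. The rows are copied from outputs elsewhere in this notebook, and the in-memory table is an assumption that stands in for `factbook.db`.

```python
import sqlite3

# Assumption: a small sample of rows taken from this notebook's outputs.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE facts (name TEXT, population INTEGER, population_growth FLOAT)"
)
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("Antarctica", 0, None),  # growth is missing (NULL), not 0
        ("Pitcairn Islands", 48, 0.0),
        ("Holy See (Vatican City)", 842, 0.0),
        ("Andorra", 85580, 0.12),
        ("Afghanistan", 32564342, 2.32),
    ],
)

min_growth = "(SELECT MIN(population_growth) FROM facts)"

# Filter used in the notebook: population compared with the minimum growth rate.
by_population = [
    r[0]
    for r in conn.execute(
        f"SELECT name FROM facts WHERE population = {min_growth} ORDER BY name"
    )
]

# Filter on the growth column itself; MIN() skips the NULL for Antarctica.
by_growth = [
    r[0]
    for r in conn.execute(
        f"SELECT name FROM facts WHERE population_growth = {min_growth} ORDER BY name"
    )
]

print(by_population)
print(by_growth)
conn.close()
```

On this sample, the first filter finds only Antarctica, while the second finds the rows whose growth rate is exactly 0.0. The full database may contain further rows with zero growth.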

Now that we understand the reason behind the country with 0 population, let's look into the country with a population of more than a billion.

In [82]:
%%sql 
SELECT *
FROM facts
WHERE population = (SELECT MAX(population)
                       FROM facts);

Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
261,xx,World,,,,7256490011,1.08,18.6,7.8,


Oops! It looks like we have found our "country": the world. Of course, the world is not a country. The team that compiled the dataset simply included a row for the whole world.

Now that we have found our two exceptions, let's recompute the summary statistics above while excluding them.

In [83]:
%%sql
SELECT MIN(population) as min_population,
MAX(population) as max_population,
MIN(population_growth) as min_population_growth,
MAX(population_growth) as max_population_growth
FROM facts
WHERE name NOT IN ('World', 'Antarctica');

Done.


min_population,max_population,min_population_growth,max_population_growth
48,1367485388,0.0,4.02
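The effect of the `NOT IN` filter can be reproduced on a handful of rows. This is a minimal sketch, assuming a five-row sample copied from the outputs in this notebook and loaded into a throwaway in-memory table rather than `factbook.db`.

```python
import sqlite3

# Assumption: sample rows only; the real table has many more entries.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?)",
    [
        ("Antarctica", 0),
        ("World", 7256490011),
        ("Pitcairn Islands", 48),
        ("Andorra", 85580),
        ("China", 1367485388),
    ],
)

query = "SELECT MIN(population), MAX(population) FROM facts"

# Same aggregate, with and without the two exceptional rows.
with_outliers = conn.execute(query).fetchone()
without_outliers = conn.execute(
    query + " WHERE name NOT IN ('World', 'Antarctica')"
).fetchone()

print(with_outliers)
print(without_outliers)
conn.close()
```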


In [84]:
%%sql 
SELECT *
FROM facts
WHERE population = 48;

Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
238,pc,Pitcairn Islands,47,47,0,48,0.0,,,


The least populated country aside from Antarctica is the Pitcairn Islands, with a population of 48 people.
You can read more about it here: [Pitcairn Islands](https://www.government.pn/)

There is a country with a population of about 1.4 billion. We can all guess which one, right? China.

## Exploring Population Density

Let's determine the average population and the average area of a country.

In [85]:
%%sql
SELECT AVG(population) as avg_population,
AVG(area) as avg_area
FROM facts
WHERE name NOT IN ('World');

Done.


avg_population,avg_area
32242666.56846473,555093.546184739


One interesting calculation we can do now is to identify countries that are densely populated.

We can do that by identifying countries:
 - whose area is below the average
 - whose population is above the average

In [86]:
%%sql
SELECT * 
FROM facts
WHERE area < (SELECT AVG(area)
                    FROM facts
                    WHERE name NOT IN ('World'))
AND population > (SELECT AVG(population)
                    FROM facts
                    WHERE name NOT IN ('World'));

Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
14,bg,Bangladesh,148460,130170,18290,168957745,1.6,21.14,5.61,0.46
65,gm,Germany,357022,348672,8350,80854408,0.17,8.47,11.42,1.24
80,iz,Iraq,438317,437367,950,37056169,2.93,31.45,3.77,1.62
83,it,Italy,301340,294140,7200,61855120,0.27,8.74,10.19,4.1
85,ja,Japan,377915,364485,13430,126919659,0.16,7.93,9.51,0.0
91,ks,"Korea, South",99720,96920,2800,49115196,0.14,8.19,6.75,0.0
120,mo,Morocco,446550,446300,250,33322699,1.0,18.2,4.81,3.36
138,rp,Philippines,300000,298170,1830,100998376,1.61,24.27,6.11,2.09
139,pl,Poland,312685,304255,8430,38562189,0.09,9.74,10.19,0.46
163,sp,Spain,505370,498980,6390,48146134,0.89,9.64,9.04,8.31


The result looks plausible, since most of these countries are generally considered densely populated.
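A complementary check is to compute the density directly as population divided by area. The sketch below does this for five of the rows returned above. The in-memory table and the `people_per_sq_km` alias are assumptions for illustration, and the area is assumed to be in square kilometers, as in the column descriptions.

```python
import sqlite3

# Assumption: five rows copied from the query output above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, area INTEGER, population INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("Bangladesh", 148460, 168957745),
        ("Germany", 357022, 80854408),
        ("Iraq", 438317, 37056169),
        ("Japan", 377915, 126919659),
        ("Korea, South", 99720, 49115196),
    ],
)

# CAST avoids SQLite's integer division, which would truncate the result.
density = conn.execute(
    "SELECT name, CAST(population AS REAL) / area AS people_per_sq_km "
    "FROM facts "
    "WHERE area > 0 "
    "ORDER BY people_per_sq_km DESC"
).fetchall()

for name, value in density:
    print(f"{name}: {value:.1f}")
conn.close()
```

Ranking by this ratio separates the rows more finely than the two average-based thresholds: on this sample Bangladesh comes out far ahead of the others.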

### Exploring the land to water area proportion

Let's look at the various countries and identify those whose total area is mainly composed of water.

In [87]:
%%sql
SELECT *, (area_water - area_land) as difference_water_land 
FROM facts
ORDER BY difference_water_land DESC
LIMIT 10 ;


Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate,difference_water_land
228,io,British Indian Ocean Territory,54400,60,54340,,,,,,54280
247,vq,Virgin Islands,1910,346,1564,103574.0,0.59,10.31,8.54,7.67,1218
190,vt,Holy See (Vatican City),0,0,0,842.0,0.0,,,,0
117,mn,Monaco,2,2,0,30535.0,0.12,6.65,9.24,3.83,-2
201,cr,Coral Sea Islands,3,3,0,,,,,,-3
198,at,Ashmore and Cartier Islands,5,5,0,,,,,,-5
244,bq,Navassa Island,5,5,0,,,,,,-5
253,pg,Spratly Islands,5,5,0,,,,,,-5
208,ip,Clipperton Island,6,6,0,,,,,,-6
233,gi,Gibraltar,6,6,0,29258.0,0.24,14.08,8.37,3.28,-6


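The British Indian Ocean Territory's water area is about 900 times larger than its land area.
Only it and the Virgin Islands have more water than land; the remaining rows have a water area of 0 and appear in the top 10 only because their land area is tiny. Most of the entries are islands, which is not surprising.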
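The difference `area_water - area_land` favours small territories, so a ratio can be a clearer measure. The sketch below computes water area divided by land area on a few rows from the output above. The in-memory table and the `water_to_land` alias are assumptions for illustration.

```python
import sqlite3

# Assumption: five rows copied from the query output above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, area_land INTEGER, area_water INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("British Indian Ocean Territory", 60, 54340),
        ("Virgin Islands", 346, 1564),
        ("Holy See (Vatican City)", 0, 0),
        ("Monaco", 2, 0),
        ("Gibraltar", 6, 0),
    ],
)

# Keep only rows with more water than land; area_land > 0 guards the division.
ratios = conn.execute(
    "SELECT name, CAST(area_water AS REAL) / area_land AS water_to_land "
    "FROM facts "
    "WHERE area_land > 0 AND area_water > area_land "
    "ORDER BY water_to_land DESC"
).fetchall()

for name, ratio in ratios:
    print(f"{name}: {ratio:.2f}")
conn.close()
```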

### Exploring population increase

Which country do you think will have the highest increase in population by next year?
Let's find out together.

In [88]:
%%sql
SELECT *, (population + population*population_growth) as population_increase 
FROM facts
WHERE name NOT IN ('World')
ORDER BY population_increase DESC
LIMIT 10 ;

Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate,population_increase
77,in,India,3287263,2973193.0,314070.0,1251695584,1.22,19.55,7.32,0.04,2778764196.48
37,ch,China,9596960,9326410.0,270550.0,1367485388,0.45,12.49,7.53,0.44,1982853812.6
197,ee,European Union,4324782,,,513949445,0.25,10.2,10.2,2.5,642436806.25
129,ni,Nigeria,923768,910768.0,13000.0,181562056,2.45,37.64,12.9,0.22,626389093.2
186,us,United States,9826675,9161966.0,664709.0,321368864,0.78,12.49,8.15,3.86,572036577.9200001
78,id,Indonesia,1904569,1811569.0,93000.0,255993674,0.92,16.72,6.37,1.16,491507854.08
132,pk,Pakistan,796095,770875.0,25220.0,199085847,1.46,22.58,6.49,1.54,489751183.62
14,bg,Bangladesh,148460,130170.0,18290.0,168957745,1.6,21.14,5.61,0.46,439290137.0
58,et,Ethiopia,1104300,,104300.0,99465819,2.89,37.27,8.19,0.22,386922035.91
24,br,Brazil,8515770,8358140.0,157630.0,204259812,0.77,14.46,6.58,0.14,361539867.24


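The results surprised me at first: the world's two most populated countries, India and China, top the list.

At the same time, it is understandable given their large populations.

Note, however, that `population_growth` is a percentage, so `population + population*population_growth` overstates the figures. The expected one-year increase is `population * population_growth / 100`. Among the rows shown, India and China still come first under that formula, but the order of the remaining rows changes.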
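Because `population_growth` is stored as a percentage, the absolute one-year increase can be estimated as `population * population_growth / 100`. The sketch below applies that formula to five rows from the output above. The in-memory table and the `yearly_increase` alias are assumptions for illustration.

```python
import sqlite3

# Assumption: five rows copied from the query output above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE facts (name TEXT, population INTEGER, population_growth FLOAT)"
)
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("India", 1251695584, 1.22),
        ("China", 1367485388, 0.45),
        ("European Union", 513949445, 0.25),
        ("Nigeria", 181562056, 2.45),
        ("United States", 321368864, 0.78),
    ],
)

# Dividing by 100.0 converts the percentage into a fraction of the population.
increase = conn.execute(
    "SELECT name, population * population_growth / 100.0 AS yearly_increase "
    "FROM facts "
    "WHERE name NOT IN ('World') "
    "ORDER BY yearly_increase DESC"
).fetchall()

for name, value in increase:
    print(f"{name}: {value:,.0f}")
conn.close()
```

On this sample, Nigeria moves ahead of the European Union and the United States, because a high growth rate now outweighs a larger starting population.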



Let's determine the countries with the highest death rate among those whose death rate exceeds their birth rate, and then the countries with the highest growth rate.

In [89]:
%%sql
SELECT * 
FROM facts
WHERE name NOT IN ('World') AND death_rate > birth_rate 
ORDER BY death_rate DESC
LIMIT 10 ;

Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
183,up,Ukraine,603550,579330,24220,44429471,0.6,10.72,14.46,2.25
26,bu,Bulgaria,110879,108489,2390,7186893,0.58,8.92,14.44,0.29
96,lg,Latvia,64589,62249,2340,1986705,1.06,10.0,14.31,6.26
102,lh,Lithuania,65300,62680,2620,2884433,1.04,10.1,14.27,6.27
143,rs,Russia,17098242,16377742,720500,142423773,0.04,11.6,13.69,1.69
153,ri,Serbia,77474,77474,0,7176794,0.46,9.08,13.66,0.0
16,bo,Belarus,207600,202900,4700,9589689,0.2,10.7,13.36,0.7
75,hu,Hungary,93028,89608,3420,9897541,0.22,9.16,12.73,1.33
116,md,Moldova,33851,32891,960,3546847,1.03,12.0,12.59,9.67
57,en,Estonia,45228,42388,2840,1265420,0.55,10.51,12.4,3.6


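Among the countries whose death rate is higher than their birth rate, Ukraine has the highest death rate. Because the query filters out every country with a higher birth rate, it does not show whether Ukraine's death rate is also the highest worldwide.

What do you think the reason might be?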
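One way to express the death rate relative to the birth rate is the gap between the two, `birth_rate - death_rate`, per 1,000 people. The sketch below computes it for four rows from the output above. The in-memory table and the `natural_change` alias are assumptions for illustration, not part of the dataset.

```python
import sqlite3

# Assumption: four rows copied from the query output above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, birth_rate FLOAT, death_rate FLOAT)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("Ukraine", 10.72, 14.46),
        ("Bulgaria", 8.92, 14.44),
        ("Latvia", 10.0, 14.31),
        ("Serbia", 9.08, 13.66),
    ],
)

# Most negative gap first: deaths exceed births by the widest margin.
gaps = conn.execute(
    "SELECT name, birth_rate - death_rate AS natural_change "
    "FROM facts "
    "WHERE death_rate > birth_rate "
    "ORDER BY natural_change ASC"
).fetchall()

for name, gap in gaps:
    print(f"{name}: {gap:.2f}")
conn.close()
```

Sorted this way, Bulgaria rather than Ukraine leads the sample, since its birth rate is lower while its death rate is almost the same.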

#### Now, let's look into countries with the highest population growth rate

In [90]:
%%sql
SELECT * 
FROM facts
WHERE name NOT IN ('World')
ORDER BY population_growth DESC
LIMIT 10 ;

Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
162,od,South Sudan,644329.0,,,12042910,4.02,36.91,8.18,11.47
106,mi,Malawi,118484.0,94080.0,24404.0,17964697,3.32,41.56,8.41,0.0
29,by,Burundi,27830.0,25680.0,2150.0,10742276,3.28,42.01,9.27,0.0
128,ng,Niger,,1266700.0,300.0,18045729,3.25,45.45,12.42,0.56
182,ug,Uganda,241038.0,197100.0,43938.0,37101745,3.24,43.79,10.69,0.74
141,qa,Qatar,11586.0,11586.0,0.0,2194817,3.07,9.84,1.53,22.39
27,uv,Burkina Faso,274200.0,273800.0,400.0,18931686,3.03,42.03,11.72,0.0
109,ml,Mali,1240192.0,1220190.0,20002.0,16955536,2.98,44.99,12.89,2.26
219,cw,Cook Islands,236.0,236.0,0.0,9838,2.95,14.33,8.03,
80,iz,Iraq,438317.0,437367.0,950.0,37056169,2.93,31.45,3.77,1.62


South Sudan has the highest population growth rate worldwide.
We can also notice that most of the countries with the highest population growth are in Africa; Qatar, the Cook Islands, and Iraq are the exceptions.

## End of Analysis