## Inner join

Although this course focuses on PostgreSQL, the joins and other material covered here apply to other SQL dialects as well.

Throughout this course, you'll be working with the `countries` database containing information about the most populous world cities as well as country-level economic data, population data, and geographic data. This `countries` database also contains information on languages spoken in each country.

Before you continue with the course, look through each table in this database to get a sense of the data it contains. Take note of the fields that appear to be shared across the tables.

Recall from the video the basic syntax for an `INNER JOIN`, here including all columns in **both** tables:

```
SELECT *
FROM left_table
INNER JOIN right_table
ON left_table.id = right_table.id;
```
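To see what an `INNER JOIN` keeps and what it drops, here is a minimal sketch that runs the same pattern in SQLite through Python's built-in `sqlite3` module rather than in PostgreSQL. The tables are tiny, hypothetical stand-ins for `cities` and `countries` with only a couple of columns; the city `Atlantis` and the code `XXX` are made up so that one row has no match.

```python
import sqlite3

# In-memory SQLite database with two tiny, hypothetical tables
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cities (name TEXT, country_code TEXT);
    CREATE TABLE countries (code TEXT, name TEXT);
    INSERT INTO cities VALUES ('Accra', 'GHA'), ('Abuja', 'NGA'), ('Atlantis', 'XXX');
    INSERT INTO countries VALUES ('GHA', 'Ghana'), ('NGA', 'Nigeria'), ('FRA', 'France');
""")

# Same shape as the template above, plus ORDER BY for a stable result
rows = conn.execute("""
    SELECT cities.name, countries.name
    FROM cities
    INNER JOIN countries
    ON cities.country_code = countries.code
    ORDER BY cities.name;
""").fetchall()

print(rows)  # [('Abuja', 'Nigeria'), ('Accra', 'Ghana')]
```

Only rows whose keys match on both sides survive: the unmatched city (`Atlantis`) and the country without any city (`France`) are both dropped.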

You'll start off with a `SELECT` statement and then build up to an `INNER JOIN` with the `cities` and `countries` tables. Let's get to it!

Instructions

1. Begin by selecting all columns from the `cities` table.
2. 
    1. Inner join the `cities` table on the left to the `countries` table on the right, keeping all of the fields in both tables.
    2. You should match the tables on the `country_code` field in `cities` and the `code` field in `countries`.
    3. **Do not** alias your tables here or in the next step. Using `cities` and `countries` is fine for now.
3. 
    1. Modify the `SELECT` statement to keep only the name of the city, the name of the country, and the name of the region the country resides in.
    2. Alias the name of the city `AS city` and the name of the country `AS country`.

In [None]:
-- Select all columns from cities
SELECT *
FROM cities;

# name          country_code   city_proper_pop   metroarea_pop   urbanarea_pop
# Abidjan       CIV            4765000           null            4765000
# Abu Dhabi     ARE            1145000           null            1145000
# Abuja         NGA            1235880           6000000         1235880
# Accra         GHA            2070460           4010050         2070460
# Addis Ababa   ETH            3103670           4567860         3103670
# ...

In [None]:
SELECT * 
FROM cities
-- Inner join to countries
INNER JOIN countries
-- Match on the country codes
ON cities.country_code = countries.code;

# name          country_code   city_proper_pop   metroarea_pop   urbanarea_pop   code   name                   continent   region           surface_area   indep_year   local_name                           gov_form             capital        cap_long   cap_lat
# Abidjan       CIV            4765000           null            4765000         CIV    Cote d'Ivoire          Africa      Western Africa   322463         1960         Cote d'Ivoire                        Republic             Yamoussoukro   -4.0305    5.332
# Abu Dhabi     ARE            1145000           null            1145000         ARE    United Arab Emirates   Asia        Middle East      83600          1971         Al-Imarat al-´Arabiya al-Muttahida   Emirate Federation   Abu Dhabi      54.3705    24.4764
# Abuja         NGA            1235880           6000000         1235880         NGA    Nigeria                Africa      Western Africa   923768         1960         Nigeria                              Federal Republic     Abuja          7.48906    9.05804
# Accra         GHA            2070460           4010050         2070460         GHA    Ghana                  Africa      Western Africa   238533         1957         Ghana                                Republic             Accra          -0.20795   5.57045
# Addis Ababa   ETH            3103670           4567860         3103670         ETH    Ethiopia               Africa      Eastern Africa   1104300        -1000        YeItyop´iya                          Republic             Addis Ababa    38.7468    9.02274
# ...

In [None]:
-- Select name fields (with alias) and region 
SELECT cities.name AS city,
       countries.name AS country,
       region
FROM cities
INNER JOIN countries
ON cities.country_code = countries.code;

# city          country                region
# Abidjan       Cote d'Ivoire          Western Africa
# Abu Dhabi     United Arab Emirates   Middle East
# Abuja         Nigeria                Western Africa
# Accra         Ghana                  Western Africa
# Addis Ababa   Ethiopia               Eastern Africa
# ...

## Inner join (2)

Instead of writing out full table names, you can use table aliases as a shortcut. As with columns, you alias a table with `AS`, placed immediately after the table name and followed by the alias. Check out the aliasing of `cities` and `countries` below.

```
SELECT c1.name AS city, c2.name AS country
FROM cities AS c1
INNER JOIN countries AS c2
ON c1.country_code = c2.code;
```

Notice that when a field name appears in more than one table (such as `name` here), you need to tell SQL which table you mean by prefixing the field with the table name or alias and a `.` (e.g. `c1.name`). Otherwise, the reference is ambiguous and the query fails.
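A minimal sketch of that ambiguity, again in SQLite via Python's `sqlite3` with two hypothetical one-row tables. Both tables have a `name` column, so the unqualified query is rejected; PostgreSQL reports a similar error (`column reference "name" is ambiguous`).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cities (name TEXT, country_code TEXT);
    CREATE TABLE countries (code TEXT, name TEXT);
    INSERT INTO cities VALUES ('Accra', 'GHA');
    INSERT INTO countries VALUES ('GHA', 'Ghana');
""")

# Qualified with aliases: each `name` is tied to exactly one table
qualified = conn.execute("""
    SELECT c1.name AS city, c2.name AS country
    FROM cities AS c1
    INNER JOIN countries AS c2
    ON c1.country_code = c2.code;
""").fetchall()
print(qualified)  # [('Accra', 'Ghana')]

# Unqualified: `name` exists in both tables, so SQLite refuses the query
error = None
try:
    conn.execute("""
        SELECT name
        FROM cities AS c1
        INNER JOIN countries AS c2
        ON c1.country_code = c2.code;
    """)
except sqlite3.OperationalError as exc:
    error = str(exc)

print(error)  # an "ambiguous column name" message
```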

You'll now explore a way to get data from both the `countries` and `economies` tables to examine the inflation rate for both 2010 and 2015.

Sometimes it's easier to write SQL code out of order: write the `FROM` and `JOIN` clauses first, then fill in the `SELECT` list once you know which tables and aliases are available.

Instructions

1. Join the tables `countries` (left) and `economies` (right) aliasing `countries AS c` and `economies AS e`.
2. Specify the field to match the tables `ON`.
3. From this join, `SELECT`:
    1. `c.code`, aliased as `country_code`.
    2. `name`, `year`, and `inflation_rate`, not aliased.

In [None]:
-- Select fields with aliases
SELECT c.code AS country_code,
       name, year, inflation_rate
FROM countries AS c
-- Join to economies (alias e)
INNER JOIN economies AS e
-- Match on code
ON c.code = e.code;

# country_code   name          year   inflation_rate
# AFG            Afghanistan   2010   2.179
# AFG            Afghanistan   2015   -1.549
# AGO            Angola        2010   14.48
# AGO            Angola        2015   10.287
# ALB            Albania       2010   3.605
# ...

## Inner join (3)

The ability to combine multiple joins in a single query is a powerful feature of SQL, e.g.:

```
SELECT *
FROM left_table
INNER JOIN right_table
ON left_table.id = right_table.id
INNER JOIN another_table
ON left_table.id = another_table.id;
```

As you can see, writing out long table names in every join condition quickly becomes tedious. This is where aliasing each table by the first letter of its name (e.g. `countries AS c`) becomes useful. This is common practice, and whenever you choose to alias tables in this course, or an exercise asks you to, you should follow this convention.
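The chained-join pattern can be sketched in SQLite via Python's `sqlite3`. The three tables below are hypothetical: they borrow the course's table names and first-letter aliases but hold one made-up row per country, and `Country C` is deliberately missing from `economies`.

```python
import sqlite3

# Hypothetical tables with one row per country; CCC has no economies row
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE countries (code TEXT, name TEXT);
    CREATE TABLE populations (country_code TEXT, fertility_rate REAL);
    CREATE TABLE economies (code TEXT, unemployment_rate REAL);
    INSERT INTO countries VALUES ('AAA', 'Country A'), ('BBB', 'Country B'), ('CCC', 'Country C');
    INSERT INTO populations VALUES ('AAA', 1.5), ('BBB', 2.0), ('CCC', 2.5);
    INSERT INTO economies VALUES ('AAA', 5.0), ('BBB', 7.5);
""")

# Two INNER JOINs chained onto the left table
rows = conn.execute("""
    SELECT c.name, p.fertility_rate, e.unemployment_rate
    FROM countries AS c
    INNER JOIN populations AS p
    ON c.code = p.country_code
    INNER JOIN economies AS e
    ON c.code = e.code
    ORDER BY c.name;
""").fetchall()

print(rows)  # [('Country A', 1.5, 5.0), ('Country B', 2.0, 7.5)]
```

Each `INNER JOIN` filters the running result further, so a country must have a match in every joined table to appear.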

Now, for each country, you want to get the country name, its region, the fertility rate, and the unemployment rate for both 2010 and 2015.

Unless an exercise specifies otherwise, your queries in this course should produce the same results with or without table aliases.

Instructions

1. 
    1. Inner join `countries` (left) and `populations` (right) on the `code` and `country_code` fields respectively.
    2. Alias `countries AS c` and `populations AS p`.
    3. Select `code`, `name`, and `region` from `countries` and also select `year` and `fertility_rate` from `populations` (5 fields in total).
2. 
    1. Add an additional `INNER JOIN` with `economies` to your previous query by joining on `code`.
    2. Include the `unemployment_rate` column that became available through joining with `economies`.
    3. Note that `year` appears in both `populations` and `economies`, so you have to explicitly use `e.year` instead of `year` as you did before.
3. 
    1. Scroll down the query result and take a look at the results for Albania from your previous query. Does something seem off to you?
    2. The trouble with doing your last join on `c.code = e.code` and not also including `year` is that e.g. the 2010 value for `fertility_rate` is also paired with the 2015 value for `unemployment_rate`.
    3. Fix your previous query: in your last `ON` clause, use `AND` to add an additional joining condition. In addition to joining on `code` in `c` and `e`, also join on `year` in `e` and `p`.
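A minimal sketch of the problem described in step 3, in SQLite via Python's `sqlite3`, using one hypothetical country (`AAA`) with made-up 2010 and 2015 rows in both `populations` and `economies`. Joining `economies` on `code` alone pairs every population year with every economy year; adding the `year` condition keeps only the matching pairs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE countries (code TEXT, name TEXT);
    CREATE TABLE populations (country_code TEXT, year INTEGER, fertility_rate REAL);
    CREATE TABLE economies (code TEXT, year INTEGER, unemployment_rate REAL);
    INSERT INTO countries VALUES ('AAA', 'Country A');
    INSERT INTO populations VALUES ('AAA', 2010, 2.0), ('AAA', 2015, 1.8);
    INSERT INTO economies VALUES ('AAA', 2010, 10.0), ('AAA', 2015, 12.0);
""")

# Joined on code only: 2 population rows x 2 economy rows = 4 pairings
code_only = conn.execute("""
    SELECT p.year, p.fertility_rate, e.year, e.unemployment_rate
    FROM countries AS c
    INNER JOIN populations AS p
    ON c.code = p.country_code
    INNER JOIN economies AS e
    ON c.code = e.code
    ORDER BY p.year, e.year;
""").fetchall()

# Joined on code AND year: each population year meets only its own economy year
code_and_year = conn.execute("""
    SELECT p.year, p.fertility_rate, e.year, e.unemployment_rate
    FROM countries AS c
    INNER JOIN populations AS p
    ON c.code = p.country_code
    INNER JOIN economies AS e
    ON c.code = e.code AND e.year = p.year
    ORDER BY p.year;
""").fetchall()

print(len(code_only))   # 4
print(code_and_year)    # [(2010, 2.0, 2010, 10.0), (2015, 1.8, 2015, 12.0)]
```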

In [None]:
-- Select fields
SELECT code, name, region,
       year, fertility_rate
-- From countries (alias as c)
FROM countries AS c
-- Join with populations (as p)
INNER JOIN populations AS p
-- Match on country code
ON c.code = p.country_code;

# code   name          region                      year   fertility_rate
# ABW    Aruba         Caribbean                   2010   1.704
# ABW    Aruba         Caribbean                   2015   1.647
# AFG    Afghanistan   Southern and Central Asia   2010   5.746
# AFG    Afghanistan   Southern and Central Asia   2015   4.653
# AGO    Angola        Central Africa              2010   6.416
# ...

In [None]:
-- Select fields
SELECT c.code, name, region,
       e.year, fertility_rate,
       unemployment_rate
-- From countries (alias as c)
FROM countries AS c
-- Join to populations (as p)
INNER JOIN populations AS p
-- Match on country code
ON c.code = p.country_code
-- Join to economies (as e)
INNER JOIN economies AS e
-- Match on country code
ON e.code = c.code;

# code   name          region                      year   fertility_rate   unemployment_rate
# AFG    Afghanistan   Southern and Central Asia   2015   5.746            null
# AFG    Afghanistan   Southern and Central Asia   2010   5.746            null
# AFG    Afghanistan   Southern and Central Asia   2015   4.653            null
# AFG    Afghanistan   Southern and Central Asia   2010   4.653            null
# AGO    Angola        Central Africa              2015   6.416            null
# AGO    Angola        Central Africa              2010   6.416            null
# AGO    Angola        Central Africa              2015   5.996            null
# AGO    Angola        Central Africa              2010   5.996            null
# ALB    Albania       Southern Europe             2015   1.663            17.1
# ALB    Albania       Southern Europe             2010   1.663            14
# ALB    Albania       Southern Europe             2015   1.793            17.1
# ALB    Albania       Southern Europe             2010   1.793            14
# ...

In [None]:
-- Select fields
SELECT c.code, name, region, 
       fertility_rate, 
       e.year, unemployment_rate
-- From countries (alias as c)
FROM countries AS c
-- Join to populations (as p)
INNER JOIN populations AS p
-- Match on country code
ON c.code = p.country_code
-- Join to economies (as e)
INNER JOIN economies AS e
-- Match on country code and year
ON c.code = e.code AND e.year = p.year;

# code   name          region                      fertility_rate   year   unemployment_rate
# AFG    Afghanistan   Southern and Central Asia   5.746            2010   null
# AFG    Afghanistan   Southern and Central Asia   4.653            2015   null
# AGO    Angola        Central Africa              6.416            2010   null
# AGO    Angola        Central Africa              5.996            2015   null
# ALB    Albania       Southern Europe             1.663            2010   14
# ALB    Albania       Southern Europe             1.793            2015   17.1
# ...

## Review inner join using on

Why does the following code result in an error?

```
SELECT c.name AS country, 
       l.name AS language
FROM countries AS c
INNER JOIN languages AS l;
```

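The `INNER JOIN` has no `ON` (or `USING`) clause. An `INNER JOIN` requires a join condition that specifies the key field (or fields) to match in each table, so PostgreSQL rejects this query with a syntax error. Adding `ON c.code = l.code` fixes it.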
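The join condition is what limits the result to matching pairs. Without one, every row of `countries` would be paired with every row of `languages`, which is what an explicit `CROSS JOIN` asks for. The sketch below compares the two in SQLite via Python's `sqlite3`, on small hypothetical tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE countries (code TEXT, name TEXT);
    CREATE TABLE languages (code TEXT, name TEXT);
    INSERT INTO countries VALUES ('AAA', 'Country A'), ('BBB', 'Country B'), ('CCC', 'Country C');
    INSERT INTO languages VALUES ('AAA', 'Language 1'), ('AAA', 'Language 2'), ('BBB', 'Language 3');
""")

# No condition: every country paired with every language (3 x 3 rows)
cross_count = conn.execute(
    "SELECT COUNT(*) FROM countries AS c CROSS JOIN languages AS l;"
).fetchone()[0]

# With a condition: only languages that belong to the country
matched = conn.execute("""
    SELECT c.name AS country, l.name AS language
    FROM countries AS c
    INNER JOIN languages AS l
    ON c.code = l.code
    ORDER BY c.name, l.name;
""").fetchall()

print(cross_count)  # 9
print(matched)  # [('Country A', 'Language 1'), ('Country A', 'Language 2'), ('Country B', 'Language 3')]
```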

## Inner join with using

When joining tables with a common field name, e.g.

```
SELECT *
FROM countries
INNER JOIN economies
ON countries.code = economies.code;
```

you can use `USING` as a shortcut:

```
SELECT *
FROM countries
INNER JOIN economies
USING(code);
```
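The sketch below, in SQLite via Python's `sqlite3` with hypothetical data, shows one practical difference between the two forms: with `ON`, `SELECT *` returns both `code` columns, while with `USING` the shared column appears only once. PostgreSQL's documentation describes the same behavior for `JOIN ... USING`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE countries (code TEXT, name TEXT);
    CREATE TABLE economies (code TEXT, year INTEGER, inflation_rate REAL);
    INSERT INTO countries VALUES ('AAA', 'Country A');
    INSERT INTO economies VALUES ('AAA', 2010, 1.0), ('AAA', 2015, 2.0);
""")

def column_names(query):
    """Return the result column names of a query."""
    return [col[0] for col in conn.execute(query).description]

# ON keeps code from both tables: 2 + 3 = 5 columns
on_cols = column_names(
    "SELECT * FROM countries INNER JOIN economies ON countries.code = economies.code;"
)
# USING merges the shared code column: 5 - 1 = 4 columns
using_cols = column_names(
    "SELECT * FROM countries INNER JOIN economies USING(code);"
)

print(len(on_cols), len(using_cols))  # 5 4
```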

You'll now explore how this can be done with the `countries` and `languages` tables.

Instructions

1. Inner join `countries` on the left and `languages` on the right with `USING(code)`.
2. Select the fields corresponding to:
    1. country name `AS country`,
    2. continent name,
    3. language name `AS language`, and
    4. whether or not the language is official.

Remember to alias your tables using the first letter of their names.

In [None]:
-- Select fields
SELECT c.name AS country,
       continent,
       l.name AS language,
       official
-- From countries (alias as c)
FROM countries AS c
-- Join to languages (as l)
INNER JOIN languages AS l
-- Match using code
USING(code);

# country       continent   language   official
# Afghanistan   Asia        Dari       true
# Afghanistan   Asia        Pashto     true
# Afghanistan   Asia        Turkic     false
# Afghanistan   Asia        Other      false
# Albania       Europe      Albanian   true
# ...

## Self-join

In this exercise, you'll use the `populations` table to perform a self-join to calculate the percentage increase in population from 2010 to 2015 for each country code!

Since you'll be joining the `populations` table to itself, you can alias one copy of `populations` as `p1` and the other as `p2`. Numbered aliases like these are also good practice whenever two tables you are aliasing share the same first letter. Note that aliases are required in a self-join: without them, SQL cannot tell the two copies of the table apart.
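A minimal sketch of a self-join in SQLite via Python's `sqlite3`, on a hypothetical `populations` table with two countries and two years each. The aliases `p1` and `p2` let the query refer to the two copies separately; joining on `country_code` alone pairs every year with every year.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE populations (country_code TEXT, year INTEGER, size INTEGER);
    INSERT INTO populations VALUES
        ('AAA', 2010, 100), ('AAA', 2015, 110),
        ('BBB', 2010, 200), ('BBB', 2015, 190);
""")

# p1 and p2 are two aliases for the same table
year_pairs = conn.execute("""
    SELECT p1.year, p2.year
    FROM populations AS p1
    INNER JOIN populations AS p2
    ON p1.country_code = p2.country_code
    WHERE p1.country_code = 'AAA'
    ORDER BY p1.year, p2.year;
""").fetchall()

print(year_pairs)  # [(2010, 2010), (2010, 2015), (2015, 2010), (2015, 2015)]
```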

Instructions

1. 
    1. Join `populations` with itself `ON` `country_code`.
    2. Select the `country_code` from `p1` and the `size` field from both `p1` and `p2`. Both `size` fields have the same name, so alias `p1.size` as `size2010` and `p2.size` as `size2015` to tell them apart.
2. 
    1. Notice from the result that for each `country_code` you have four entries laying out all combinations of 2010 and 2015.
    2. Extend the `ON` in your query to include only those records where the `p1.year` (2010) matches with `p2.year - 5` (2015 - 5 = 2010). This will omit the three entries per `country_code` that you aren't interested in.
3. As you just saw, you can also use SQL to calculate values like `p2.year - 5` for you. With two fields like `size2010` and `size2015`, you may want to determine the percentage increase from one field to the next: 

    With two numeric fields A and B, the percentage growth from A to B can be calculated as `(B - A) / A * 100.0`.

    Add a new field to `SELECT`, aliased as `growth_perc`, that calculates the percentage population growth from 2010 to 2015 for each country, using `p2.size` and `p1.size`.
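The finished query can be sketched in SQLite via Python's `sqlite3`. One caveat worth checking: in both PostgreSQL and SQLite, dividing an integer by an integer truncates the result. The course output shows fractional `growth_perc` values, so `size` is presumably stored as a non-integer type there. In the hypothetical table below `size` is declared `INTEGER`, which shows how the instructed expression collapses to 0 and how multiplying by `100.0` before dividing avoids that.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE populations (country_code TEXT, year INTEGER, size INTEGER);
    INSERT INTO populations VALUES
        ('AAA', 2010, 100), ('AAA', 2015, 110),
        ('BBB', 2010, 200), ('BBB', 2015, 190);
""")

rows = conn.execute("""
    SELECT p1.country_code,
           p1.size AS size2010,
           p2.size AS size2015,
           -- integer / integer is truncated before * 100.0 is applied
           (p2.size - p1.size) / p1.size * 100.0 AS growth_int_div,
           -- multiplying by 100.0 first makes the division non-integer
           (p2.size - p1.size) * 100.0 / p1.size AS growth_perc
    FROM populations AS p1
    INNER JOIN populations AS p2
    ON p1.country_code = p2.country_code
    AND p1.year = p2.year - 5
    ORDER BY p1.country_code;
""").fetchall()

print(rows)  # [('AAA', 100, 110, 0.0, 10.0), ('BBB', 200, 190, 0.0, -5.0)]
```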

In [None]:
-- Select fields with aliases
SELECT p1.country_code,
       p1.size AS size2010,
       p2.size AS size2015
-- From populations (alias as p1)
FROM populations AS p1
-- Join to itself (alias as p2)
INNER JOIN populations AS p2
-- Match on country code
ON p1.country_code = p2.country_code;

# country_code   size2010   size2015
# ABW            101597     103889
# ABW            101597     101597
# ABW            103889     103889
# ABW            103889     101597
# AFG            27962200   32526600
# AFG            27962200   27962200
# AFG            32526600   32526600
# AFG            32526600   27962200
# ...

In [None]:
-- Select fields with aliases
SELECT p1.country_code,
       p1.size AS size2010,
       p2.size AS size2015
-- From populations (alias as p1)
FROM populations AS p1
-- Join to itself (alias as p2)
INNER JOIN populations AS p2
-- Match on country code
ON p1.country_code = p2.country_code
-- and year (with calculation)
AND p1.year = p2.year - 5;

# country_code   size2010   size2015
# ABW            101597     103889
# AFG            27962200   32526600
# AGO            21220000   25022000
# ALB            2913020    2889170
# AND            84419      70473
# ...

In [None]:
-- Select fields with aliases
SELECT p1.country_code,
       p1.size AS size2010, 
       p2.size AS size2015,
       -- Calculate growth_perc
       ((p2.size - p1.size)/p1.size * 100.0) AS growth_perc
-- From populations (alias as p1)
FROM populations AS p1
-- Join to itself (alias as p2)
INNER JOIN populations AS p2
-- Match on country code
ON p1.country_code = p2.country_code
-- and year (with calculation)
AND p1.year = p2.year - 5;

# country_code   size2010   size2015   growth_perc
# ABW            101597     103889     2.25597210228443
# AFG            27962200   32526600   16.32329672575
# AGO            21220000   25022000   17.9171919822693
# ALB            2913020    2889170    -0.818874966353178
# AND            84419      70473      -16.5199771523476
# ...

## Case when and then

Often it's useful to look at a numerical field not as raw data, but instead as being in different categories or groups.

You can use `CASE` with `WHEN`, `THEN`, `ELSE`, and `END` to define a new grouping field.

Instructions

1. Using the `countries` table, create a new field `AS geosize_group` that groups the countries into three groups:
    1. If `surface_area` is greater than 2 million, `geosize_group` is `'large'`.
    2. If `surface_area` is greater than 350 thousand but not larger than 2 million, `geosize_group` is `'medium'`.
    3. Otherwise, `geosize_group` is `'small'`.
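`CASE` checks the `WHEN` conditions in order and returns the result of the first one that is true, which is why the second condition doesn't need to say `surface_area <= 2000000`. A minimal sketch in SQLite via Python's `sqlite3`, with hypothetical surface areas chosen to sit on the boundaries:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE countries (name TEXT, surface_area REAL);
    INSERT INTO countries VALUES
        ('Country A', 2500000), ('Country B', 2000000),
        ('Country C', 350001), ('Country D', 350000);
""")

# First matching WHEN wins; ELSE catches everything left over
rows = conn.execute("""
    SELECT name,
           CASE WHEN surface_area > 2000000 THEN 'large'
                WHEN surface_area > 350000 THEN 'medium'
                ELSE 'small' END AS geosize_group
    FROM countries
    ORDER BY name;
""").fetchall()

print(rows)
# [('Country A', 'large'), ('Country B', 'medium'),
#  ('Country C', 'medium'), ('Country D', 'small')]
```

Because the comparisons use `>`, an area of exactly 2,000,000 is `'medium'` and exactly 350,000 is `'small'`.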

In [None]:
SELECT name, continent, code, surface_area,
-- First case
CASE WHEN surface_area > 2000000 THEN 'large'
    -- Second case
    WHEN surface_area > 350000 THEN 'medium'
    -- Else clause + end
    ELSE 'small' END
    -- Alias name
    AS geosize_group
-- From table
FROM countries;

# name             continent   code   surface_area   geosize_group
# Afghanistan      Asia        AFG    652090         medium
# Netherlands      Europe      NLD    41526          small
# Albania          Europe      ALB    28748          small
# Algeria          Africa      DZA    2381740        large
# American Samoa   Oceania     ASM    199            small
# ...

## Inner challenge

The table you created with the added `geosize_group` field has been loaded for you here with the name `countries_plus`. Notice the `INTO` clause used to create this `countries_plus` table, and where it is placed: after the `SELECT` list and before `FROM`.

```
SELECT name, continent, code, surface_area,
CASE WHEN surface_area > 2000000 THEN 'large'
    WHEN surface_area > 350000 THEN 'medium'
    ELSE 'small' END
    AS geosize_group
INTO countries_plus
FROM countries;
```
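SQLite has no `SELECT ... INTO` for creating tables, so the sketch below uses `CREATE TABLE ... AS SELECT`, which PostgreSQL also supports and which its documentation recommends over `SELECT INTO`. It runs in SQLite via Python's `sqlite3` on a hypothetical three-row `countries` table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE countries (name TEXT, code TEXT, surface_area REAL);
    INSERT INTO countries VALUES
        ('Country A', 'AAA', 2500000),
        ('Country B', 'BBB', 500000),
        ('Country C', 'CCC', 1000);

    -- Save the query result as a new table (SQLite's equivalent of SELECT ... INTO)
    CREATE TABLE countries_plus AS
    SELECT name, code, surface_area,
           CASE WHEN surface_area > 2000000 THEN 'large'
                WHEN surface_area > 350000 THEN 'medium'
                ELSE 'small' END AS geosize_group
    FROM countries;
""")

rows = conn.execute(
    "SELECT code, geosize_group FROM countries_plus ORDER BY code;"
).fetchall()
print(rows)  # [('AAA', 'large'), ('BBB', 'medium'), ('CCC', 'small')]
```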

You will now explore the relationship between the size of a country in terms of surface area and in terms of population using grouping fields created with `CASE`.

By the end of this exercise, you'll be writing two queries back-to-back in a single script. You got this!

Instructions

1. Using the `populations` table, filtered to the `year` 2015, create a new field aliased as `popsize_group` to organize population `size` into
    - `'large'` (> 50 million),
    - `'medium'` (> 1 million), and
    - `'small'` groups.
    
   Select only the country code, population size, and this new `popsize_group` as fields.

2. 
    1. Use `INTO` to save the result of the previous query as `pop_plus`. You can see an example of this in the `countries_plus` code in the assignment text. Make sure to include a `;` at the end of your `WHERE` clause!
    2. Then, below your first query, add a second query, `SELECT * FROM pop_plus;`, to display all the records in `pop_plus` in the query result.
3. 
    1. Keep the first query intact that creates `pop_plus` using `INTO`.
    2. Write a query to join `countries_plus AS c` on the left with `pop_plus AS p` on the right matching on the country code fields.
    3. Sort the data based on `geosize_group`, in ascending order so that `large` appears on top.
    4. Select the `name`, `continent`, `geosize_group`, and `popsize_group` fields.
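Sorting a text column in ascending order is alphabetical, and `'large'` < `'medium'` < `'small'` alphabetically, which is why `large` comes first. A minimal sketch of the final join in SQLite via Python's `sqlite3`, with small hypothetical stand-ins for `countries_plus` and `pop_plus`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE countries_plus (name TEXT, continent TEXT, code TEXT, geosize_group TEXT);
    CREATE TABLE pop_plus (country_code TEXT, popsize_group TEXT);
    INSERT INTO countries_plus VALUES
        ('Country A', 'Continent X', 'AAA', 'small'),
        ('Country B', 'Continent Y', 'BBB', 'large'),
        ('Country C', 'Continent X', 'CCC', 'medium');
    INSERT INTO pop_plus VALUES ('AAA', 'medium'), ('BBB', 'large'), ('CCC', 'small');
""")

rows = conn.execute("""
    SELECT name, continent, geosize_group, popsize_group
    FROM countries_plus AS c
    INNER JOIN pop_plus AS p
    ON c.code = p.country_code
    ORDER BY geosize_group;
""").fetchall()

print([r[2] for r in rows])  # ['large', 'medium', 'small']
```

If the labels did not happen to sort in size order, you would need an explicit ordering expression instead. Also note that `ORDER BY geosize_group` says nothing about the order of rows within a group, so rows that share a group can come back in any order.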

In [None]:
SELECT country_code, size,
-- First case
CASE WHEN size > 50000000 THEN 'large'
    -- Second case
    WHEN size > 1000000 THEN 'medium'
    -- Else clause + end
    ELSE 'small' END
    -- Alias name (popsize_group)
    AS popsize_group
-- From table
FROM populations
-- Focus on 2015
WHERE year = 2015;

# country_code   size       popsize_group
# ABW            103889     small
# AFG            32526600   medium
# AGO            25022000   medium
# ALB            2889170    medium
# AND            70473      small
# ...

In [None]:
SELECT country_code, size,
CASE WHEN size > 50000000 THEN 'large'
    WHEN size > 1000000 THEN 'medium'
    ELSE 'small' END
    AS popsize_group
-- Into table
INTO pop_plus
FROM populations
WHERE year = 2015;

-- Select all columns of pop_plus
SELECT *
FROM pop_plus;

# country_code   size       popsize_group
# ABW            103889     small
# AFG            32526600   medium
# AGO            25022000   medium
# ALB            2889170    medium
# AND            70473      small
# ...

In [None]:
-- Drop pop_plus in case an earlier cell already created it
DROP TABLE IF EXISTS pop_plus;

SELECT country_code, size,
CASE WHEN size > 50000000 THEN 'large'
    WHEN size > 1000000 THEN 'medium'
    ELSE 'small' END
    AS popsize_group
INTO pop_plus       
FROM populations
WHERE year = 2015;

-- Select fields
SELECT name, continent, geosize_group, popsize_group
-- From countries_plus (alias as c)
FROM countries_plus AS c
-- Join to pop_plus (alias as p)
INNER JOIN pop_plus AS p
-- Match on country code
ON c.code = p.country_code
-- Order the table    
ORDER BY geosize_group;

# name            continent       geosize_group   popsize_group
# Canada          North America   large           medium
# United States   North America   large           large
# Greenland       North America   large           small
# Argentina       South America   large           medium
# Kazakhstan      Asia            large           medium
# ...