## Subquery inside where

You'll now try to figure out which countries had high average life expectancies (at the country level) in 2015.

Instructions

1. Begin by calculating the average life expectancy across all countries for 2015.
2. Recall that you can use SQL to do calculations for you. Suppose you wanted only 2015 records with a life expectancy above `1.15 * 100`:

```sql
SELECT *
FROM populations
WHERE life_expectancy > 1.15 * 100
AND year = 2015;
```

Select all fields from `populations` for 2015 records whose life expectancy is larger than 1.15 times the average you calculated in the first task. In other words, replace the `100` in the example above with a subquery.

```sql
-- Select average life_expectancy
SELECT AVG(life_expectancy)
-- From populations
FROM populations
-- Where year is 2015
WHERE year = 2015;
```

```
avg
71.6763415481105
```

```sql
-- Select fields
SELECT *
-- From populations
FROM populations
-- Where life_expectancy is greater than
WHERE life_expectancy >
    -- 1.15 * subquery
    1.15 * (SELECT AVG(life_expectancy)
            FROM populations
            WHERE year = 2015)
AND year = 2015;
```

```
pop_id   country_code   year   fertility_rate   life_expectancy   size
21       AUS            2015   1.833            82.4512           23789800
376      CHE            2015   1.54             83.1976           8281430
356      ESP            2015   1.32             83.3805           46444000
134      FRA            2015   2.01             82.6707           66538400
170      HKG            2015   1.195            84.278            7305700
...
```
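You can try the steps above in a minimal runnable sketch. It uses Python's built-in `sqlite3` module as a stand-in for the course's PostgreSQL database, and the rows are hypothetical toy values, not the course data. The scalar subquery does not depend on the outer row, so it produces a single number (70.0 here) that takes the place of the literal `100`.

```python
import sqlite3

# In-memory SQLite database with hypothetical toy rows (not the course data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE populations (country_code TEXT, year INTEGER, life_expectancy REAL);
INSERT INTO populations VALUES
    ('AAA', 2015, 50.0),
    ('BBB', 2015, 60.0),
    ('CCC', 2015, 70.0),
    ('DDD', 2015, 100.0),
    ('EEE', 2014, 95.0);
""")

# The subquery yields the 2015 average (70.0), so the threshold is 1.15 * 70.0.
query = """
SELECT country_code
FROM populations
WHERE life_expectancy > 1.15 * (SELECT AVG(life_expectancy)
                                FROM populations
                                WHERE year = 2015)
  AND year = 2015;
"""
above_threshold = [row[0] for row in conn.execute(query)]
print(above_threshold)  # ['DDD'] (EEE clears the bar but is from 2014)
```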

## Subquery inside where (2)

Use your knowledge of subqueries in `WHERE` to get the urban area population for only capital cities.

Instructions

1. Make use of the `capital` field in the `countries` table in your subquery.
2. Select the city name, country code, and urban area population fields.

```sql
-- Select fields
SELECT name, country_code, urbanarea_pop
-- From cities
FROM cities
-- Where city name in the field of capital cities
WHERE name IN
    -- Subquery
    (SELECT capital
     FROM countries)
ORDER BY urbanarea_pop DESC;
```

```
name       country_code   urbanarea_pop
Beijing    CHN            21516000
Dhaka      BGD            14543100
Tokyo      JPN            13513700
Moscow     RUS            12197600
Cairo      EGY            10230400
...
```
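Here is a small sketch of how `IN` with a subquery filters rows. It assumes Python's `sqlite3` module as a stand-in for PostgreSQL, and it uses hypothetical toy tables that keep only the columns the query needs.

```python
import sqlite3

# Hypothetical toy tables; only the columns the query needs are modelled.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE countries (code TEXT, capital TEXT);
CREATE TABLE cities (name TEXT, country_code TEXT, urbanarea_pop INTEGER);
INSERT INTO countries VALUES ('AAA', 'Alpha'), ('BBB', 'Beta');
INSERT INTO cities VALUES
    ('Alpha', 'AAA', 500),
    ('Gamma', 'AAA', 900),
    ('Beta',  'BBB', 300);
""")

# IN keeps a city when its name appears anywhere in the capital column.
query = """
SELECT name, country_code, urbanarea_pop
FROM cities
WHERE name IN (SELECT capital FROM countries)
ORDER BY urbanarea_pop DESC;
"""
capitals = conn.execute(query).fetchall()
print(capitals)  # [('Alpha', 'AAA', 500), ('Beta', 'BBB', 300)]
```

The match uses the city name alone. A non-capital city that shares its name with some country's capital would therefore also be returned. The exercise accepts this simplification.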

## Subquery inside select

In this exercise, you'll see how some queries can be written using either a join or a subquery.

You have seen previously how to use `GROUP BY` with aggregate functions and an inner join to get summarized information from multiple tables.

The first query below selects the top nine countries by the number of cities they have in the `cities` table. Recall that this table lists the most populous cities in the world. Your task is to complete the second, commented-out query so that it returns the same result using a subquery instead of a join.

Instructions

1. Submit the code to view the result of the provided query.
2. 
    1. Convert the `GROUP BY` query to use a subquery inside `SELECT` by filling in the blanks, so that its result matches the one from the first query.
    2. Again, sort the result by `cities_num` descending and then by `country` ascending.

```sql
SELECT countries.name AS country,
       COUNT(*) AS cities_num
FROM cities
INNER JOIN countries
ON countries.code = cities.country_code
GROUP BY country
ORDER BY cities_num DESC, country
LIMIT 9;

/* 
SELECT ___ AS ___,
  (SELECT ___
   FROM ___
   WHERE countries.code = cities.country_code) AS cities_num
FROM ___
ORDER BY ___ ___, ___
LIMIT 9;
*/
```

```
country    cities_num
China      36
India      18
Japan      11
Brazil     10
Pakistan   9
...
```

```sql
/*
SELECT countries.name AS country,
       COUNT(*) AS cities_num
FROM cities
INNER JOIN countries
ON countries.code = cities.country_code
GROUP BY country
ORDER BY cities_num DESC, country
LIMIT 9;
*/

SELECT countries.name AS country,
    -- Subquery
    (SELECT COUNT(*)
     FROM cities
     WHERE countries.code = cities.country_code) AS cities_num
FROM countries
ORDER BY cities_num DESC, country
LIMIT 9;
```

```
country    cities_num
China      36
India      18
Japan      11
Brazil     10
Pakistan   9
...
```
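The sketch below runs both forms side by side. It assumes Python's `sqlite3` module as a stand-in for PostgreSQL and uses hypothetical toy rows. Country `Gamma` deliberately has no cities, which exposes a difference between the two queries.

```python
import sqlite3

# Hypothetical toy data: Gamma (CCC) deliberately has no rows in cities.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE countries (code TEXT, name TEXT);
CREATE TABLE cities (name TEXT, country_code TEXT);
INSERT INTO countries VALUES ('AAA', 'Alpha'), ('BBB', 'Beta'), ('CCC', 'Gamma');
INSERT INTO cities VALUES ('a1', 'AAA'), ('a2', 'AAA'), ('b1', 'BBB');
""")

join_query = """
SELECT countries.name AS country, COUNT(*) AS cities_num
FROM cities
INNER JOIN countries ON countries.code = cities.country_code
GROUP BY countries.name
ORDER BY cities_num DESC, country;
"""

# Correlated subquery: logically evaluated once per row of the outer countries table.
subquery_query = """
SELECT countries.name AS country,
       (SELECT COUNT(*)
        FROM cities
        WHERE countries.code = cities.country_code) AS cities_num
FROM countries
ORDER BY cities_num DESC, country;
"""

join_rows = conn.execute(join_query).fetchall()
subquery_rows = conn.execute(subquery_query).fetchall()
print(join_rows)      # [('Alpha', 2), ('Beta', 1)]
print(subquery_rows)  # [('Alpha', 2), ('Beta', 1), ('Gamma', 0)]
```

The two forms agree only for countries that have at least one city. The inner join drops `Gamma`, while the correlated subquery reports 0 for it. With `LIMIT 9` on the course data, every top row has cities, so the results match.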

## Subquery inside from

The last type of subquery you will work with is one inside of `FROM`.

You will use this to determine the number of languages spoken in each country, identified by the country's local name! (Note that this may differ from the `name` field; it is stored in the `local_name` field.)

Instructions

1. 
    1. Begin by determining for each country code how many languages are listed in the `languages` table using `SELECT`, `FROM`, and `GROUP BY`.
    2. Alias the aggregated field as `lang_num`.
2. 
    1. Include the previous query (aliased as `subquery`) as a subquery in the `FROM` clause of a new query.
    2. Select the local name of the country from `countries`.
    3. Also, select `lang_num` from `subquery`.
    4. Make sure to use `WHERE` appropriately to match `code` in `countries` and in `subquery`.
    5. Sort by `lang_num` in descending order.

```sql
-- Select fields (with aliases)
SELECT code, COUNT(*) AS lang_num
-- From languages
FROM languages
-- Group by code
GROUP BY code;
```

```
code   lang_num
PRY    2
NRU    3
MDG    3
ASM    5
TZA    4
...
```

```sql
SELECT local_name, subquery.lang_num
FROM countries,
    (SELECT code, COUNT(*) AS lang_num
     FROM languages
     GROUP BY code) AS subquery
WHERE countries.code = subquery.code
ORDER BY lang_num DESC;
```

```
local_name     lang_num
Zambia         19
YeItyop´iya    16
Zimbabwe       16
Nepal          14
Bharat/India   14
...
```
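The comma-plus-`WHERE` form above is an implicit inner join against the derived table. The sketch below shows that an explicit `INNER JOIN ... ON` gives the same rows. It assumes Python's `sqlite3` module as a stand-in for PostgreSQL, and the rows are hypothetical.

```python
import sqlite3

# Hypothetical toy rows; Gama (CCC) has no languages and drops out of both joins.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE countries (code TEXT, local_name TEXT);
CREATE TABLE languages (code TEXT, name TEXT);
INSERT INTO countries VALUES ('AAA', 'Alfa'), ('BBB', 'Beto'), ('CCC', 'Gama');
INSERT INTO languages VALUES
    ('AAA', 'L1'), ('AAA', 'L2'), ('AAA', 'L3'),
    ('BBB', 'L1');
""")

# Derived table in FROM, joined with the comma syntax plus WHERE (as in the exercise).
comma_query = """
SELECT local_name, subquery.lang_num
FROM countries,
     (SELECT code, COUNT(*) AS lang_num
      FROM languages
      GROUP BY code) AS subquery
WHERE countries.code = subquery.code
ORDER BY lang_num DESC;
"""

# The same derived table joined with an explicit INNER JOIN ... ON.
join_query = """
SELECT local_name, subquery.lang_num
FROM countries
INNER JOIN (SELECT code, COUNT(*) AS lang_num
            FROM languages
            GROUP BY code) AS subquery
ON countries.code = subquery.code
ORDER BY lang_num DESC;
"""

comma_rows = conn.execute(comma_query).fetchall()
join_rows = conn.execute(join_query).fetchall()
print(comma_rows)  # [('Alfa', 3), ('Beto', 1)]
```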

## Advanced subquery

You can also nest multiple subqueries to answer even more specific questions.

In this exercise, you'll use multiple subqueries to find, for each of the six continents listed in 2015, which country had the maximum inflation rate and how high it was. The result of your final query should look something like the following, where anything between `< >` will be filled in with appropriate values:

```
+------------+---------------+-------------------+
| name       | continent     | inflation_rate    |
|------------+---------------+-------------------|
| <country1> | North America | <max_inflation1>  |
| <country2> | Africa        | <max_inflation2>  |
| <country3> | Oceania       | <max_inflation3>  |
| <country4> | Europe        | <max_inflation4>  |
| <country5> | South America | <max_inflation5>  |
| <country6> | Asia          | <max_inflation6>  |
+------------+---------------+-------------------+
```

Again, there are multiple ways to reach this solution using only joins, but the focus here is on introducing advanced subqueries.

Instructions

1. 
    1. Create an inner join with `countries` on the left and `economies` on the right with `USING`, without aliasing your tables or columns.
    2. Retrieve the country name, continent, and inflation rate for 2015.
2. 
    1. Select the maximum inflation rate in 2015 `AS max_inf` grouped by continent, using the previous step's query as a subquery in the `FROM` clause.
    2. Thus, in your subquery you should:
        1. Create an inner join with `countries` on the left and `economies` on the right with `USING` (without aliasing your tables or columns).
        2. Retrieve the country name, continent, and inflation rate for 2015.
        3. Alias the subquery as `subquery`.
    3. This will return the six maximum 2015 inflation rates, one per continent, as a single-column table. Make sure not to include `continent` in the outer `SELECT` statement.
3. 
    1. Now it's time to append your second query to your first query using `AND` and `IN` to obtain the name of the country, its continent, and the maximum inflation rate for each continent in 2015.
    2. For the sake of practice, change all joining conditions to use `ON` instead of `USING` (based upon the same column, `code`).

```sql
-- Select fields
SELECT name, continent, inflation_rate
-- From countries
FROM countries
-- Join to economies
INNER JOIN economies
-- Match on code
USING(code)
-- Where year is 2015
WHERE year = 2015;
```

```
name                   continent       inflation_rate
Afghanistan            Asia            -1.549
Angola                 Africa          10.287
Albania                Europe          1.896
United Arab Emirates   Asia            4.07
Argentina              South America   null
...
```

```sql
-- Select the maximum inflation rate as max_inf
SELECT MAX(inflation_rate) AS max_inf
-- Subquery using FROM (alias as subquery)
FROM (SELECT name, continent, inflation_rate
      FROM countries
      INNER JOIN economies
      USING(code)
      WHERE year = 2015) AS subquery
-- Group by continent
GROUP BY continent;
```

```
max_inf
21.858
39.403
121.738
7.524
48.684
9.784
```

```sql
-- Select fields
SELECT name, continent, inflation_rate
-- From countries
FROM countries
-- Join to economies
INNER JOIN economies
-- Match on code
ON countries.code = economies.code
-- Where year is 2015
WHERE year = 2015
-- And inflation rate in subquery (alias as subquery)
AND inflation_rate IN (SELECT MAX(inflation_rate) AS max_inf
                       FROM (SELECT name, continent, inflation_rate
                             FROM countries
                             INNER JOIN economies
                             ON countries.code = economies.code
                             WHERE year = 2015) AS subquery
                       -- Group by continent
                       GROUP BY continent);
```

```
name        continent       inflation_rate
Haiti       North America   7.524
Malawi      Africa          21.858
Nauru       Oceania         9.784
Ukraine     Europe          48.684
Venezuela   South America   121.738
Yemen       Asia            39.403
```
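`inflation_rate IN (...)` compares each country against the whole set of continental maxima. A country can therefore match a maximum that belongs to a different continent. The six-row result above shows that no such collision happened with the course data. The sketch below builds a deliberate collision and contrasts the exercise's query with a correlated alternative that compares each row only against its own continent's maximum. It assumes Python's `sqlite3` module as a stand-in for PostgreSQL, and all rows are hypothetical.

```python
import sqlite3

# Hypothetical rows: Euro Two's 5.0 equals Asia's maximum but not Europe's (8.0).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE countries (code TEXT, name TEXT, continent TEXT);
CREATE TABLE economies (code TEXT, year INTEGER, inflation_rate REAL);
INSERT INTO countries VALUES
    ('A1', 'Asia One', 'Asia'), ('A2', 'Asia Two', 'Asia'),
    ('E1', 'Euro One', 'Europe'), ('E2', 'Euro Two', 'Europe');
INSERT INTO economies VALUES
    ('A1', 2015, 5.0), ('A2', 2015, 2.0),
    ('E1', 2015, 8.0), ('E2', 2015, 5.0),
    ('E2', 2014, 9.0);
""")

# The exercise's pattern: compare against the set of all per-continent maxima.
nested_query = """
SELECT name, continent, inflation_rate
FROM countries
INNER JOIN economies ON countries.code = economies.code
WHERE year = 2015
  AND inflation_rate IN (SELECT MAX(inflation_rate)
                         FROM (SELECT name, continent, inflation_rate
                               FROM countries
                               INNER JOIN economies
                               ON countries.code = economies.code
                               WHERE year = 2015) AS subquery
                         GROUP BY continent)
ORDER BY name;
"""

# Correlated alternative: compare only against the maximum of the row's own continent.
correlated_query = """
SELECT c.name, c.continent, e.inflation_rate
FROM countries AS c
INNER JOIN economies AS e ON c.code = e.code
WHERE e.year = 2015
  AND e.inflation_rate = (SELECT MAX(e2.inflation_rate)
                          FROM countries AS c2
                          INNER JOIN economies AS e2 ON c2.code = e2.code
                          WHERE e2.year = 2015
                            AND c2.continent = c.continent)
ORDER BY c.name;
"""

nested_rows = conn.execute(nested_query).fetchall()
correlated_rows = conn.execute(correlated_query).fetchall()
print(nested_rows)
# [('Asia One', 'Asia', 5.0), ('Euro One', 'Europe', 8.0), ('Euro Two', 'Europe', 5.0)]
print(correlated_rows)
# [('Asia One', 'Asia', 5.0), ('Euro One', 'Europe', 8.0)]
```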

## Subquery challenge

Let's test your understanding of subqueries with a challenge problem! Use a subquery to get 2015 economic data for countries that do **not** have

- a `gov_form` of `'Constitutional Monarchy'` or
- `'Republic'` anywhere in their `gov_form`.

Here, `gov_form` stands for the form of the government for each country. Review the different entries for `gov_form` in the `countries` table.

Instructions

1. Select the country code, inflation rate, and unemployment rate.
2. Order by inflation rate ascending.
3. Do not use table aliasing in this exercise.

```sql
-- Select fields
SELECT code, inflation_rate, unemployment_rate
-- From economies
FROM economies
-- Where year is 2015 and code is not in
WHERE year = 2015 AND code NOT IN
    -- Subquery
    (SELECT code
     FROM countries
     WHERE (gov_form = 'Constitutional Monarchy' OR gov_form LIKE '%Republic%'))
-- Order by inflation rate
ORDER BY inflation_rate;
```

```
code   inflation_rate   unemployment_rate
AFG    -1.549           null
CHE    -1.14            3.178
PRI    -0.751           12
ROU    -0.596           6.812
BRN    -0.423           6.9
...
```
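Here is a minimal sketch of the same `NOT IN` filter on hypothetical toy rows. It assumes Python's `sqlite3` module as a stand-in for PostgreSQL. It also shows a general `NOT IN` caveat: if the subquery returns a NULL, SQL's three-valued logic makes every comparison unknown and no rows come back. That cannot happen in the exercise as long as `countries.code` is never NULL.

```python
import sqlite3

# Hypothetical toy rows; EEE has economic data but no entry in countries.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE countries (code TEXT, gov_form TEXT);
CREATE TABLE economies (code TEXT, year INTEGER,
                        inflation_rate REAL, unemployment_rate REAL);
INSERT INTO countries VALUES
    ('AAA', 'Constitutional Monarchy'), ('BBB', 'Federal Republic'),
    ('CCC', 'Absolute Monarchy'), ('DDD', 'Republic');
INSERT INTO economies VALUES
    ('AAA', 2015, 1.0, 5.0), ('BBB', 2015, 2.0, 6.0),
    ('CCC', 2015, 3.0, 7.0), ('DDD', 2015, -1.0, 8.0),
    ('EEE', 2015, 0.5, NULL), ('CCC', 2014, 9.0, 1.0);
""")

query = """
SELECT code, inflation_rate, unemployment_rate
FROM economies
WHERE year = 2015 AND code NOT IN
    (SELECT code
     FROM countries
     WHERE gov_form = 'Constitutional Monarchy' OR gov_form LIKE '%Republic%')
ORDER BY inflation_rate;
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('EEE', 0.5, None), ('CCC', 3.0, 7.0)]

# NULL trap: a NULL in the NOT IN list makes the condition unknown for every row.
null_trap = conn.execute(
    "SELECT code FROM economies WHERE code NOT IN (SELECT NULL)").fetchall()
print(null_trap)  # []
```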

## Subquery review

Within which SQL clause are subqueries most frequently found?

`WHERE`.

## Final challenge

Welcome to the end of the course! The next three exercises will test your knowledge of the content covered in this course and apply many of the ideas you've seen to difficult problems. Good luck!

Read carefully over the instructions and solve them step-by-step, thinking about how the different clauses work together.

In this exercise, you'll need to get the country names and other 2015 data in the `economies` table and the `countries` table for **Central American countries with an official language**.

Instructions

1. Select unique country names. Also select the total investment and imports fields.
2. Use a left join with `countries` on the left. (An inner join would also work, but please use a left join here.)
3. Match on `code` in the two tables `AND` use a subquery inside of `ON` to choose the appropriate `languages` records.
4. Order by country name ascending.
5. Use table aliasing but **not** field aliasing in this exercise.

```sql
-- Select fields
SELECT DISTINCT name, total_investment, imports
-- From table (with alias)
FROM countries AS c
-- Join with table (with alias)
LEFT JOIN economies AS e
-- Match on code
ON (c.code = e.code
    -- and code in Subquery
    AND c.code IN (SELECT l.code
                   FROM languages AS l
                   WHERE official = 'true'))
-- Where region and year are correct
WHERE region = 'Central America' AND year = 2015
-- Order by field
ORDER BY name;
```

```
name          total_investment   imports
Belize        22.014             6.743
Costa Rica    20.218             4.629
El Salvador   13.983             8.193
Guatemala     13.433             15.124
Honduras      24.633             9.353
Nicaragua     31.862             11.665
Panama        46.557             5.898
```
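The instructions say an inner join would also work. The reason is that a country failing the `ON` condition keeps its row in the left join but gets NULL economy columns, and `WHERE ... year = 2015` then discards that row. The sketch below shows both stages. It assumes Python's `sqlite3` module as a stand-in for PostgreSQL and uses hypothetical rows. Because SQLite has no separate boolean type, `official` is modelled as an INTEGER 0/1 and tested with `official = 1` in place of the course's `official = 'true'`.

```python
import sqlite3

# Hypothetical rows: Beta (BBB) has no official language, Gamma is outside the region.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE countries (code TEXT, name TEXT, region TEXT);
CREATE TABLE economies (code TEXT, year INTEGER, total_investment REAL, imports REAL);
CREATE TABLE languages (code TEXT, name TEXT, official INTEGER);
INSERT INTO countries VALUES
    ('AAA', 'Alpha', 'Central America'),
    ('BBB', 'Beta',  'Central America'),
    ('CCC', 'Gamma', 'Caribbean');
INSERT INTO economies VALUES
    ('AAA', 2015, 20.0, 5.0), ('BBB', 2015, 10.0, 4.0), ('CCC', 2015, 30.0, 3.0);
INSERT INTO languages VALUES
    ('AAA', 'L1', 1), ('AAA', 'L2', 1), ('BBB', 'L3', 0), ('CCC', 'L4', 1);
""")

# {extra} lets the same query run with and without the year filter.
base = """
SELECT DISTINCT name, total_investment, imports
FROM countries AS c
LEFT JOIN economies AS e
ON (c.code = e.code
    AND c.code IN (SELECT l.code
                   FROM languages AS l
                   WHERE official = 1))
WHERE region = 'Central America' {extra}
ORDER BY name;
"""

without_year = conn.execute(base.format(extra="")).fetchall()
with_year = conn.execute(base.format(extra="AND year = 2015")).fetchall()
print(without_year)  # [('Alpha', 20.0, 5.0), ('Beta', None, None)]
print(with_year)     # [('Alpha', 20.0, 5.0)]
```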

## Final challenge (2)

Whoofta! That was challenging, huh?

Let's ease up a bit and calculate the average fertility rate for each region in 2015.

Instructions

1. Include the name of region, its continent, and average fertility rate aliased as `avg_fert_rate`.
2. Sort based on `avg_fert_rate` ascending.
3. Remember that you'll need to `GROUP BY` every field in `SELECT` that isn't inside an aggregate function.

```sql
-- Select fields
SELECT region, continent, AVG(fertility_rate) AS avg_fert_rate
-- From left table
FROM countries AS c
-- Join to right table
INNER JOIN populations AS p
-- Match on join condition
ON c.code = p.country_code
-- Where specific records matching some condition
WHERE year = 2015
-- Group appropriately
GROUP BY region, continent
-- Order appropriately
ORDER BY avg_fert_rate;
```

```
region             continent   avg_fert_rate
Southern Europe    Europe      1.42610000371933
Eastern Europe     Europe      1.49088890022702
Baltic Countries   Europe      1.60333331425985
Western Europe     Europe      1.6325000077486
Eastern Asia       Asia        1.69166668256124
...
```
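Here is a minimal sketch of the grouping on hypothetical toy rows, assuming Python's `sqlite3` module as a stand-in for PostgreSQL. Dialects differ on the `GROUP BY` rule. SQLite tolerates a non-aggregated column that is missing from `GROUP BY` and takes its value from some row in the group. PostgreSQL generally rejects such a query. Listing every non-aggregated column keeps the query portable.

```python
import sqlite3

# Hypothetical rows; the 2014 row for AAA must not affect the 2015 averages.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE countries (code TEXT, region TEXT, continent TEXT);
CREATE TABLE populations (country_code TEXT, year INTEGER, fertility_rate REAL);
INSERT INTO countries VALUES
    ('AAA', 'Region One', 'Europe'),
    ('BBB', 'Region One', 'Europe'),
    ('CCC', 'Region Two', 'Asia');
INSERT INTO populations VALUES
    ('AAA', 2015, 1.0), ('BBB', 2015, 2.0), ('CCC', 2015, 3.0),
    ('AAA', 2014, 9.0);
""")

# region and continent are not aggregated, so both appear in GROUP BY.
query = """
SELECT region, continent, AVG(fertility_rate) AS avg_fert_rate
FROM countries AS c
INNER JOIN populations AS p ON c.code = p.country_code
WHERE year = 2015
GROUP BY region, continent
ORDER BY avg_fert_rate;
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('Region One', 'Europe', 1.5), ('Region Two', 'Asia', 3.0)]
```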

## Final challenge (3)

Welcome to the last challenge problem. By now you're a query warrior! Remember that these challenges are designed to take you to the limit to solidify your SQL knowledge! Take a deep breath and solve this step-by-step.

You are now tasked with determining the top 10 capital cities in Europe and the Americas in terms of a calculated percentage using `city_proper_pop` and `metroarea_pop` in `cities`.

Do not use table aliasing in this exercise.

Instructions

1. Select the city name, country code, city proper population, and metro area population.
2. Calculate the percentage of metro area population composed of city proper population for each city in `cities`, aliased as `city_perc`.
3. Focus only on capital cities in Europe and the Americas in a subquery.
4. Make sure to exclude records with missing data on metro area population.
5. Order the result by `city_perc` descending.
6. Then determine the top 10 capital cities in Europe and the Americas in terms of this `city_perc` percentage.

```sql
-- Select fields
SELECT name, country_code, city_proper_pop, metroarea_pop,
       -- Calculate city_perc
       city_proper_pop / metroarea_pop * 100 AS city_perc
-- From appropriate table
FROM cities
-- Where name is a capital in Europe or the Americas
WHERE name IN
    -- Subquery
    (SELECT capital
     FROM countries
     WHERE (continent = 'Europe'
            OR continent LIKE '%America'))
-- And metro area population is not missing
AND metroarea_pop IS NOT NULL
-- Order appropriately
ORDER BY city_perc DESC
-- Limit amount
LIMIT 10;
```

```
name         country_code   city_proper_pop   metroarea_pop   city_perc
Lima         PER            8852000           10750000        82.3441863059998
Bogota       COL            7878780           9800000         80.3957462310791
Moscow       RUS            12197600          16170000        75.4334926605225
Vienna       AUT            1863880           2600000         71.6877281665802
Montevideo   URY            1305080           1947600         67.0096158981323
Caracas      VEN            1943900           2923960         66.4818167686462
Rome         ITA            2877220           4353780         66.0855233669281
Brasilia     BRA            2556150           3919860         65.2101457118988
London       GBR            8673710           13879800        62.4918222427368
Budapest     HUN            1759410           2927940         60.090184211731
```
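The output above shows fractional percentages, so the population columns in the course database evidently are not integer-typed. If they were, `city_proper_pop / metroarea_pop` would truncate to 0 before the `* 100`, and PostgreSQL behaves the same way here. The sketch below reproduces that pitfall with deliberately INTEGER toy columns and fixes it by starting the expression with `100.0`. It assumes Python's `sqlite3` module as a stand-in for PostgreSQL, and all rows are hypothetical.

```python
import sqlite3

# Hypothetical rows with INTEGER population columns; Delta has no metro figure.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE countries (code TEXT, capital TEXT, continent TEXT);
CREATE TABLE cities (name TEXT, country_code TEXT,
                     city_proper_pop INTEGER, metroarea_pop INTEGER);
INSERT INTO countries VALUES
    ('AAA', 'Alpha', 'Europe'), ('BBB', 'Beta', 'South America'),
    ('CCC', 'Gamma', 'Asia'), ('DDD', 'Delta', 'North America');
INSERT INTO cities VALUES
    ('Alpha', 'AAA', 500, 1000), ('Beta', 'BBB', 300, 400),
    ('Gamma', 'CCC', 900, 1000), ('Delta', 'DDD', 100, NULL);
""")

# {expr} is swapped between the truncating and the floating-point calculation.
template = """
SELECT name, {expr} AS city_perc
FROM cities
WHERE name IN (SELECT capital
               FROM countries
               WHERE continent = 'Europe' OR continent LIKE '%America')
  AND metroarea_pop IS NOT NULL
ORDER BY city_perc DESC
LIMIT 10;
"""

# With INTEGER columns, / truncates: 500 / 1000 is 0 before the * 100.
truncated = conn.execute(
    template.format(expr="city_proper_pop / metroarea_pop * 100")).fetchall()
# Starting from 100.0 makes the whole expression floating point.
fixed = conn.execute(
    template.format(expr="100.0 * city_proper_pop / metroarea_pop")).fetchall()
print(fixed)  # [('Beta', 75.0), ('Alpha', 50.0)]
```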