## Union

You have two new tables, `economies2010` and `economies2015`, available to you. The `economies` table is also included for reference.

Instructions

1. Combine the two new tables into one table containing all of the fields in `economies2010`.
2. Sort this resulting single table by country code and then by year, both in ascending order.

In [None]:
-- Select fields from 2010 table
SELECT *
-- From 2010 table
FROM economies2010
-- Set theory clause
UNION
-- Select fields from 2015 table
SELECT *
-- From 2015 table
FROM economies2015
-- Order by code and year
ORDER BY code, year;

# code   year   income_group          gross_savings
# AFG    2010   Low income            37.133
# AFG    2015   Low income            21.466
# AGO    2010   Upper middle income   23.534
# AGO    2015   Upper middle income   -0.425
# ALB    2010   Upper middle income   20.011
# ALB    2015   Upper middle income   13.84
# ...

## Union (2)

`UNION` can also be used to determine all values of a field that occur in either of two tables. Try out this exercise with no starter code.

Instructions

1. Determine all (non-duplicated) country codes in either the `cities` or the `currencies` table. The result should be a table with only one field called `country_code`.
2. Sort by `country_code` in alphabetical order.

In [None]:
-- Select field
SELECT country_code
-- From cities
FROM cities
-- Set theory clause
UNION
-- Select field
SELECT code
-- From currencies
FROM currencies
-- Order by country_code
ORDER BY country_code;

# country_code
# ABW
# AFG
# AGO
# AIA
# ALB
# ...
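
The behaviour of `UNION` here can be reproduced on a small scale. The following is a minimal sketch that assumes Python's built-in `sqlite3` module and a handful of invented rows (not the course data). It illustrates two points: duplicates across the two tables are removed, and the result column takes its name from the first `SELECT`, which is why `ORDER BY country_code` works even though the second table calls the field `code`.

```python
import sqlite3

# Toy rows invented for illustration; the real tables are much larger.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cities (name TEXT, country_code TEXT);
    CREATE TABLE currencies (code TEXT, basic_unit TEXT);
    INSERT INTO cities VALUES
        ('Kabul', 'AFG'), ('Luanda', 'AGO'),
        ('Sydney', 'AUS'), ('Melbourne', 'AUS');
    INSERT INTO currencies VALUES
        ('ABW', 'Aruban florin'), ('AFG', 'Afghan afghani');
""")

cursor = conn.execute("""
    SELECT country_code FROM cities
    UNION
    SELECT code FROM currencies
    ORDER BY country_code
""")

# The column name comes from the first SELECT in the compound query.
union_columns = [col[0] for col in cursor.description]
# AFG (in both tables) and AUS (twice in cities) each appear only once.
union_codes = [row[0] for row in cursor.fetchall()]

print(union_columns)
print(union_codes)
```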

## Union all

As you saw in the previous two exercises, `UNION` removes duplicates from its result.

To include duplicates, you can use `UNION ALL`.

Instructions

1. Determine all combinations (including duplicates) of country code and year that exist in either the `economies` or the `populations` tables. Order by `code` then `year`.
2. The result of the query should only have two columns/fields. Think about how many records this query should result in.
3. You'll use code very similar to this in your next exercise after the video. Make note of this code after completing it.

In [None]:
-- Select fields
SELECT code, year
-- From economies
FROM economies
-- Set theory clause
UNION ALL
-- Select fields
SELECT country_code, year
-- From populations
FROM populations
-- Order by code, year
ORDER BY code, year;

# code   year
# ABW    2010
# ABW    2015
# AFG    2010
# AFG    2010
# AFG    2015
# ...
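
To get a feel for how many records to expect, it can help to run both operators on the same small input. This sketch again assumes `sqlite3` and invented rows. With `UNION ALL` the row count is simply the sum of the two inputs, while `UNION` collapses repeated code/year pairs.

```python
import sqlite3

# Toy rows invented for illustration; not the course data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE economies (code TEXT, year INTEGER);
    CREATE TABLE populations (country_code TEXT, year INTEGER);
    INSERT INTO economies VALUES ('AFG', 2010), ('AFG', 2015), ('AGO', 2010);
    INSERT INTO populations VALUES ('ABW', 2010), ('AFG', 2010), ('AFG', 2015);
""")

# Same query shape as the exercise, with the set operator left as a placeholder.
query = """
    SELECT code, year FROM economies
    {set_operator}
    SELECT country_code, year FROM populations
    ORDER BY code, year
"""

union_all_rows = conn.execute(query.format(set_operator="UNION ALL")).fetchall()
union_rows = conn.execute(query.format(set_operator="UNION")).fetchall()

print(len(union_all_rows), union_all_rows)  # 3 + 3 input rows, duplicates kept
print(len(union_rows), union_rows)          # repeated pairs collapsed
```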

## Intersect

`UNION ALL` returns all records from both tables, whereas `INTERSECT` returns only the records that the two tables have in common. In this exercise, you will write a query similar to the previous one, but this time you will look at the records in common for country code and year in the `economies` and `populations` tables.

Compare the number of records returned by this query with the 814 records returned by the similar `UNION ALL` query.

Instructions

1. Use `INTERSECT` to determine the records in common for country code and year for the `economies` and `populations` tables.
2. Again, order by `code` and then by `year`, both in ascending order.

In [None]:
-- Select fields
SELECT code, year
-- From economies
FROM economies
-- Set theory clause
INTERSECT
-- Select fields
SELECT country_code, year
-- From populations
FROM populations
-- Order by code and year
ORDER BY code, year;

# code   year
# AFG    2010
# AFG    2015
# AGO    2010
# AGO    2015
# ALB    2010
# ...
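
`INTERSECT` compares whole records, not individual fields. The sketch below (assuming `sqlite3` and invented rows) shows that a country code present in both tables only survives when the year matches too.

```python
import sqlite3

# Toy rows invented for illustration; not the course data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE economies (code TEXT, year INTEGER);
    CREATE TABLE populations (country_code TEXT, year INTEGER);
    INSERT INTO economies VALUES ('AFG', 2010), ('AFG', 2015), ('AGO', 2010);
    INSERT INTO populations VALUES ('ABW', 2010), ('AFG', 2010), ('AGO', 2015);
""")

# Intersect on (code, year): both fields must match.
common_pairs = conn.execute("""
    SELECT code, year FROM economies
    INTERSECT
    SELECT country_code, year FROM populations
    ORDER BY code, year
""").fetchall()

# Intersect on code alone: AGO now matches, since the year is ignored.
common_codes = [row[0] for row in conn.execute("""
    SELECT code FROM economies
    INTERSECT
    SELECT country_code FROM populations
    ORDER BY code
""")]

print(common_pairs)
print(common_codes)
```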

## Intersect (2)

As you think about major world cities and their corresponding countries, you may ask _which countries also have a city with the same name as their country name?_

Instructions

1. Use `INTERSECT` to answer this question with `countries` and `cities`!

In [None]:
-- Select fields
SELECT name
-- From countries
FROM countries
-- Set theory clause
INTERSECT
-- Select fields
SELECT name
-- From cities
FROM cities;

# name
# Singapore

## Review union and intersect

Instructions 

1. Which of the following combinations of terms and definitions is correct?

`INTERSECT`: returns only records appearing in both tables.

## Except

Get the names of cities in `cities` that are not listed as capital cities in `countries`, as a single-field result.

Note that there are some countries in the world that are not included in the `countries` table, which will result in some cities not being labeled as capital cities when in fact they are.

Instructions

1. Order the resulting field in ascending order.
2. Can you spot the city/cities that are actually capital cities which this query misses?

In [None]:
-- Select field
SELECT name
-- From cities
FROM cities
-- Set theory clause
EXCEPT
-- Select field
SELECT capital
-- From countries
FROM countries
-- Order by result
ORDER BY name;

# name
# Abidjan
# Ahmedabad
# Alexandria
# Almaty
# Auckland
# ...

## Except (2)

Now you will complete the previous query in reverse!

Determine the names of capital cities that are **not** listed in the `cities` table.

Instructions

1. Order by `capital` in ascending order.
2. The `cities` table contains information about 236 of the world's most populous cities. The result of your query may surprise you in terms of the number of capital cities that **do not** appear in this list!

In [None]:
-- Select field
SELECT capital
-- From countries
FROM countries
-- Set theory clause
EXCEPT
-- Select field
SELECT name
-- From cities
FROM cities
-- Order by ascending capital
ORDER BY capital;

# capital
# Agana
# Amman
# Amsterdam
# Andorra la Vella
# Antananarivo
# ...
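
The two `EXCEPT` exercises give different answers because `EXCEPT` is not symmetric: it keeps records from the first query that are absent from the second. A minimal sketch, assuming `sqlite3` and a few invented rows:

```python
import sqlite3

# Toy rows invented for illustration; not the course data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cities (name TEXT);
    CREATE TABLE countries (name TEXT, capital TEXT);
    INSERT INTO cities VALUES ('Sydney'), ('Canberra'), ('Auckland');
    INSERT INTO countries VALUES
        ('Australia', 'Canberra'), ('New Zealand', 'Wellington');
""")

# Cities that are not capitals.
non_capital_cities = [row[0] for row in conn.execute("""
    SELECT name FROM cities
    EXCEPT
    SELECT capital FROM countries
    ORDER BY name
""")]

# Capitals that are not in the cities table: the same query in reverse.
unlisted_capitals = [row[0] for row in conn.execute("""
    SELECT capital FROM countries
    EXCEPT
    SELECT name FROM cities
    ORDER BY capital
""")]

print(non_capital_cities)
print(unlisted_capitals)
```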

## Semi-join

You are now going to use the concept of a semi-join to identify languages spoken in the Middle East.

Instructions

1. Begin by selecting all country codes in the Middle East as a single field result using `SELECT`, `FROM`, and `WHERE`.
2. 
    1. Below the commented code, select only the unique language names (`name`) appearing in the `languages` table.
    2. Order the resulting single field table by `name` in ascending order.
3. 
    1. Combine the previous two queries into one query by adding a `WHERE IN` statement to the `SELECT DISTINCT` query to determine the unique languages spoken in the Middle East.
    2. Order the result by `name` in ascending order.

In [None]:
-- Select code
SELECT code
-- From countries
FROM countries
-- Where region is Middle East
WHERE region = 'Middle East';

# code
# ARE
# ARM
# AZE
# BHR
# GEO
# ...

In [None]:
-- Query from step 1:
/*
SELECT code
FROM countries
WHERE region = 'Middle East';
*/

-- Select field
SELECT DISTINCT name
-- From languages
FROM languages
-- Order by name
ORDER BY name;

# name
# Afar
# Afrikaans
# Akyem
# Albanian
# Alsatian
# ...

In [None]:
-- Query from step 2
SELECT DISTINCT name
FROM languages
-- Where in statement
WHERE code IN
    -- Query from step 1
    -- Subquery
    (SELECT code
    FROM countries
    WHERE region = 'Middle East')
-- Order by name
ORDER BY name;

# name
# Arabic
# Aramaic
# Armenian
# Azerbaijani
# Azeri
# ...
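
The three steps above can be traced on a toy dataset. This sketch assumes `sqlite3` and invented rows. It first runs the subquery on its own, then uses it inside `WHERE ... IN` so that only languages whose `code` appears in the subquery result are kept.

```python
import sqlite3

# Toy rows invented for illustration; not the course data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE countries (code TEXT, region TEXT);
    CREATE TABLE languages (code TEXT, name TEXT);
    INSERT INTO countries VALUES
        ('ARE', 'Middle East'), ('ARM', 'Middle East'),
        ('BHR', 'Middle East'), ('AUS', 'Australia and New Zealand');
    INSERT INTO languages VALUES
        ('ARE', 'Arabic'), ('BHR', 'Arabic'),
        ('ARM', 'Armenian'), ('AUS', 'English');
""")

# Step 1: the subquery on its own.
middle_east_codes = [row[0] for row in conn.execute(
    "SELECT code FROM countries WHERE region = 'Middle East' ORDER BY code"
)]

# Step 3: the semi-join. countries only filters languages;
# none of its columns appear in the output.
middle_east_languages = [row[0] for row in conn.execute("""
    SELECT DISTINCT name
    FROM languages
    WHERE code IN
        (SELECT code
         FROM countries
         WHERE region = 'Middle East')
    ORDER BY name
""")]

print(middle_east_codes)
print(middle_east_languages)
```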

## Relating semi-join to a tweaked inner join

Let's revisit the code from the previous exercise, which retrieves languages spoken in the Middle East.

```
SELECT DISTINCT name
FROM languages
WHERE code IN
    (SELECT code
    FROM countries
    WHERE region = 'Middle East')
ORDER BY name;
```

Sometimes problems solved with semi-joins can also be solved using an inner join.

```
SELECT languages.name AS language
FROM languages
INNER JOIN countries
ON languages.code = countries.code
WHERE region = 'Middle East'
ORDER BY language;
```

This inner join isn't quite right. What is missing from this second code block to get it to match with the correct answer produced by the first block?

`DISTINCT`.
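
To see why `DISTINCT` matters, the sketch below (assuming `sqlite3` and invented rows) runs the inner join with and without it. The join returns one row per matching language/country pair, so a language spoken in two Middle Eastern countries shows up twice unless duplicates are removed.

```python
import sqlite3

# Toy rows invented for illustration; not the course data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE countries (code TEXT, name TEXT, region TEXT);
    CREATE TABLE languages (code TEXT, name TEXT);
    INSERT INTO countries VALUES
        ('ARE', 'United Arab Emirates', 'Middle East'),
        ('BHR', 'Bahrain', 'Middle East'),
        ('ARM', 'Armenia', 'Middle East');
    INSERT INTO languages VALUES
        ('ARE', 'Arabic'), ('BHR', 'Arabic'), ('ARM', 'Armenian');
""")

# Same join as the second code block, with an optional DISTINCT.
join_query = """
    SELECT {distinct} languages.name AS language
    FROM languages
    INNER JOIN countries
    ON languages.code = countries.code
    WHERE region = 'Middle East'
    ORDER BY language
"""

plain_join = [r[0] for r in conn.execute(join_query.format(distinct=""))]
distinct_join = [r[0] for r in conn.execute(join_query.format(distinct="DISTINCT"))]

print(plain_join)     # Arabic appears once per matching country
print(distinct_join)  # matches the semi-join result
```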

## Diagnosing problems using anti-join

Another powerful join in SQL is the anti-join. It is particularly useful in identifying which records are causing an incorrect number of records to appear in join queries.

You will also see another example of a subquery here, as you saw in the first exercise on semi-joins. Your goal is to identify the currencies used in Oceanian countries!

Instructions

1. Begin by determining the number of countries in `countries` that are listed in Oceania using `SELECT`, `FROM`, and `WHERE`.
2. 
    1. Complete an inner join with `countries AS c1` on the left and `currencies AS c2` on the right to get the different currencies used in the countries of Oceania.
    2. Match `ON` the `code` field in the two tables.
    3. Include the country `code`, country `name`, and `basic_unit AS currency`.
    4. Observe the query result and make note of how many _different_ countries are listed here.
3. 
    1. Note that not all countries in Oceania were listed in the resulting inner join with `currencies`. Use an anti-join to determine which countries were not included!
    2. Use `NOT IN` and `(SELECT code FROM currencies)` as a subquery to get the country code and country name for the Oceanian countries that are not included in the `currencies` table.

In [None]:
-- Select statement
SELECT COUNT(name)
-- From countries
FROM countries
-- Where continent is Oceania
WHERE continent = 'Oceania';

# count
# 19

In [None]:
-- Select fields (with aliases)
SELECT c1.code, name, continent, basic_unit AS currency
-- From countries (alias as c1)
FROM countries AS c1
-- Join with currencies (alias as c2)
INNER JOIN currencies AS c2
-- Match on code
ON c1.code = c2.code
-- Where continent is Oceania
WHERE c1.continent = 'Oceania';

# code   name               continent   currency
# AUS    Australia          Oceania     Australian dollar
# PYF    French Polynesia   Oceania     CFP franc
# KIR    Kiribati           Oceania     Australian dollar
# MHL    Marshall Islands   Oceania     United States dollar
# NRU    Nauru              Oceania     Australian dollar
# ...

In [None]:
-- Select fields
SELECT code, name
-- From countries
FROM countries
-- Where continent is Oceania
WHERE continent = 'Oceania'
-- And code not in
AND code NOT IN
-- Subquery
    (SELECT code
    FROM currencies);
    
# code   name
# ...
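
The diagnostic idea can be checked with a simple count: every Oceanian country either matches in the inner join or is returned by the anti-join. The sketch below assumes `sqlite3`. Its rows are invented for illustration and do not reflect which countries are actually missing from the real `currencies` table.

```python
import sqlite3

# Toy rows invented for illustration; not the course data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE countries (code TEXT, name TEXT, continent TEXT);
    CREATE TABLE currencies (code TEXT, basic_unit TEXT);
    INSERT INTO countries VALUES
        ('AUS', 'Australia', 'Oceania'), ('FJI', 'Fiji Islands', 'Oceania'),
        ('GUM', 'Guam', 'Oceania'), ('KIR', 'Kiribati', 'Oceania'),
        ('AFG', 'Afghanistan', 'Asia');
    INSERT INTO currencies VALUES
        ('AUS', 'Australian dollar'), ('KIR', 'Australian dollar'),
        ('AFG', 'Afghan afghani');
""")

# Step 1: how many countries are in Oceania?
n_oceania = conn.execute(
    "SELECT COUNT(name) FROM countries WHERE continent = 'Oceania'"
).fetchone()[0]

# Step 2: countries that survive the inner join with currencies.
joined_codes = [row[0] for row in conn.execute("""
    SELECT DISTINCT c1.code
    FROM countries AS c1
    INNER JOIN currencies AS c2
    ON c1.code = c2.code
    WHERE c1.continent = 'Oceania'
    ORDER BY c1.code
""")]

# Step 3: the anti-join returns the countries the inner join dropped.
missing = conn.execute("""
    SELECT code, name
    FROM countries
    WHERE continent = 'Oceania'
    AND code NOT IN
        (SELECT code
         FROM currencies)
    ORDER BY code
""").fetchall()

print(n_oceania, joined_codes, missing)
```

One caveat worth knowing: if the subquery can return `NULL`, `NOT IN` returns no rows at all, so this pattern assumes `currencies.code` is never `NULL`.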

## Set theory challenge

Congratulations! You've now made your way to the challenge problem for this third chapter. Your task here will be to incorporate two of `UNION`/`UNION ALL`/`INTERSECT`/`EXCEPT` to solve a challenge involving three tables.

In addition, you will use a subquery as you have in the last two exercises! This will be great practice as you hop into subqueries more in Chapter 4!

Instructions

1. Identify the country codes that are included in either `economies` or `currencies` but not in `populations`.
2. Use that result to determine the names of cities in the countries that match the specification in the previous instruction.

In [None]:
-- Select the city name
SELECT name
-- Alias the table where city name resides
FROM cities AS c1
-- Choose only records matching the result of multiple set theory clauses
WHERE country_code IN
    (-- Select appropriate field from economies AS e
    SELECT e.code
    FROM economies AS e
    -- Get all additional (unique) values of the field from currencies AS c2  
    UNION
    SELECT c2.code
    FROM currencies AS c2
    -- Exclude those appearing in populations AS p
    EXCEPT
    SELECT p.country_code
    FROM populations AS p);
    
# name
# Bucharest
# Kaohsiung
# New Taipei City
# Taichung
# Tainan
# ...
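
The subquery chains two set operations. `UNION` and `EXCEPT` are applied left to right, so the codes from `economies` and `currencies` are combined first and the codes in `populations` are then removed. A minimal sketch, assuming `sqlite3` and invented rows, runs the subquery on its own before using it as a filter:

```python
import sqlite3

# Toy rows invented for illustration; not the course data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE economies (code TEXT);
    CREATE TABLE currencies (code TEXT);
    CREATE TABLE populations (country_code TEXT);
    CREATE TABLE cities (name TEXT, country_code TEXT);
    INSERT INTO economies VALUES ('AFG'), ('ROM');
    INSERT INTO currencies VALUES ('AFG'), ('TWN');
    INSERT INTO populations VALUES ('AFG');
    INSERT INTO cities VALUES
        ('Kabul', 'AFG'), ('Bucharest', 'ROM'), ('Taichung', 'TWN');
""")

# The subquery alone: (economies UNION currencies) EXCEPT populations.
set_theory_clauses = """
    SELECT e.code FROM economies AS e
    UNION
    SELECT c2.code FROM currencies AS c2
    EXCEPT
    SELECT p.country_code FROM populations AS p
"""
selected_codes = [row[0] for row in conn.execute(
    set_theory_clauses + " ORDER BY code"
)]

# The full query: keep cities whose country code is in that set.
city_names = [row[0] for row in conn.execute(
    "SELECT name FROM cities AS c1 WHERE country_code IN ("
    + set_theory_clauses
    + ") ORDER BY name"
)]

print(selected_codes)
print(city_names)
```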