**These are SQL notes, not Python code.**

# Introduction to joins


### Inner Joins

**inner join**
```
SELECT *
FROM left_table
INNER JOIN right_table
ON left_table.id = right_table.id;
```
**aliasing**
```
SELECT c1.name AS city, c2.name AS country
FROM cities AS c1
INNER JOIN countries AS c2
ON c1.country_code = c2.code;
```
**multiple joins**
```
SELECT *
FROM left_table
  INNER JOIN right_table
    ON left_table.id = right_table.id
  INNER JOIN another_table
    ON left_table.id = another_table.id;
```

In [0]:
-- 6. Select fields
SELECT c.code, name, region, e.year, fertility_rate, unemployment_rate
  -- 1. From countries (alias as c)
  FROM countries AS c
  -- 2. Join to populations (as p)
  INNER JOIN populations AS p
    -- 3. Match on country code
    ON c.code = p.country_code
  -- 4. Join to economies (as e)
  INNER JOIN economies AS e
    -- 5. Match on country code and year
    ON e.code = c.code AND e.year = p.year;

Note: the join must also match on year. Without it, every populations row for a country is paired with every economies row for that country, producing meaningless year combinations.
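
A minimal sketch of why the year condition matters, assuming PostgreSQL. The inline sample rows are made up for illustration and are not from the course dataset.

```sql
-- Hypothetical two-year sample for a single country.
WITH populations(country_code, year, fertility_rate) AS (
  VALUES ('AUS', 2010, 1.9), ('AUS', 2015, 1.8)
),
economies(code, year, unemployment_rate) AS (
  VALUES ('AUS', 2010, 5.2), ('AUS', 2015, 6.1)
)
SELECT p.country_code, p.year AS pop_year, e.year AS econ_year,
       fertility_rate, unemployment_rate
FROM populations AS p
  INNER JOIN economies AS e
    ON e.code = p.country_code
    -- Remove the next line and the result grows from 2 rows to 4,
    -- pairing 2010 population data with 2015 economic data and vice versa.
    AND e.year = p.year
ORDER BY pop_year, econ_year;
```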

**using**
- If the matching key in ON has the same name in both tables, the USING clause can be used instead.


When joining tables with a common field name, e.g.
```
SELECT *
FROM countries
  INNER JOIN economies
    ON countries.code = economies.code
```
You can use USING as a shortcut:
```
SELECT *
FROM countries
  INNER JOIN economies
    USING(code)
```
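
One visible difference between ON and USING is the shape of `SELECT *`. The sketch below assumes PostgreSQL and uses made-up inline rows in place of the real tables.

```sql
-- Hypothetical sample rows standing in for countries and economies.
WITH countries(code, name) AS (
  VALUES ('AUS', 'Australia'), ('NZL', 'New Zealand')
),
economies(code, year) AS (
  VALUES ('AUS', 2015), ('NZL', 2015)
)
SELECT *
FROM countries
  INNER JOIN economies
    USING (code);
-- Columns returned: code, name, year (code appears once).
-- With ON countries.code = economies.code, SELECT * would return
-- code, name, code, year.
```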

In [0]:
-- 4. Select fields
select c.name as country, continent, l.name as language, official
  -- 1. From countries (alias as c)
  from countries as c
  -- 2. Join to languages (as l)
  inner join languages as l
    -- 3. Match using code
    using (code)

### Self-ish joins
- Extend the ON in your query to include only those records where the p1.year (2010) matches with p2.year - 5 (2015 - 5 = 2010). This will omit the three entries per country_code that you aren't interested in.

In [0]:
SELECT p1.country_code,
       p1.size AS size2010, 
       p2.size AS size2015,
       -- 1. calculate growth_perc
       ((p2.size - p1.size)/p1.size * 100.0) AS growth_perc
-- 2. From populations (alias as p1)
FROM populations AS p1
  -- 3. Join to itself (alias as p2)
  INNER JOIN populations AS p2
    -- 4. Match on country code
    ON p1.country_code = p2.country_code
        -- 5. and year (with calculation)
        AND p1.year = p2.year - 5;
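
A small self-contained version of the self-join, assuming PostgreSQL. The rows are made up, and size is deliberately stored as an integer here (an assumption; the course table may use a floating-point type) to show why the order of the arithmetic can matter.

```sql
-- Hypothetical sample: one country measured in 2010 and 2015.
WITH populations(country_code, year, size) AS (
  VALUES ('ABC', 2010, 1000), ('ABC', 2015, 1100)
)
SELECT p1.country_code,
       p1.size AS size2010,
       p2.size AS size2015,
       -- Multiplying by 100.0 first forces numeric division; with integer
       -- sizes, (p2.size - p1.size) / p1.size would truncate to 0.
       (p2.size - p1.size) * 100.0 / p1.size AS growth_perc
FROM populations AS p1
  INNER JOIN populations AS p2
    ON p1.country_code = p2.country_code
    AND p1.year = p2.year - 5;
-- One row is returned: the 2010 record paired with the 2015 record.
```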

### Case when and then
Often it's useful to look at a numerical field not as raw data, but instead as being in different categories or groups.

You can use CASE with WHEN, THEN, ELSE, and END to define a new grouping field.

In [0]:
SELECT name, continent, code, surface_area,
    -- 1. First case
    CASE WHEN surface_area > 2000000 THEN 'large'
        -- 2. Second case
        WHEN surface_area > 350000 THEN 'medium'
        -- 3. Else clause + end
        ELSE 'small' END
        -- 4. Alias name
        AS geosize_group
-- 5. From table
FROM countries;

Note: geosize_group is added to the result as a new column.
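
A sketch of how the CASE branches are evaluated in order and how the new field can be used for grouping. It assumes PostgreSQL (which allows GROUP BY on an output alias) and uses made-up surface areas.

```sql
-- Hypothetical sample rows; names and areas are illustrative only.
WITH countries(name, surface_area) AS (
  VALUES ('A', 3000000), ('B', 500000), ('C', 400000), ('D', 1000)
)
SELECT CASE WHEN surface_area > 2000000 THEN 'large'
            -- 'A' also satisfies this test, but the first matching
            -- WHEN wins, so it stays 'large'.
            WHEN surface_area > 350000 THEN 'medium'
            ELSE 'small' END AS geosize_group,
       COUNT(*) AS n_countries
FROM countries
GROUP BY geosize_group
ORDER BY n_countries DESC, geosize_group;
-- medium: 2, large: 1, small: 1
```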

### Inner challenge
The table you created with the added geosize_group field has been loaded for you here with the name countries_plus. Observe the use of (and the placement of) the INTO command to create this countries_plus table:
```
SELECT name, continent, code, surface_area,
    CASE WHEN surface_area > 2000000
            THEN 'large'
       WHEN surface_area > 350000
            THEN 'medium'
       ELSE 'small' END
       AS geosize_group
INTO countries_plus
FROM countries;
```

# Outer joins and cross joins


Outer joins
  - left
  - right
  - full
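
The three outer joins can be compared on two tiny tables. This is a sketch assuming PostgreSQL; left_table and right_table hold made-up rows.

```sql
-- Hypothetical sample: id 2 is the only id present in both tables.
WITH left_table(id, left_val) AS (
  VALUES (1, 'L1'), (2, 'L2')
),
right_table(id, right_val) AS (
  VALUES (2, 'R2'), (3, 'R3')
)
SELECT id, left_val, right_val
FROM left_table
  FULL JOIN right_table
    USING (id)
ORDER BY id;
-- FULL JOIN:  ids 1, 2, 3 (NULLs fill the side with no match)
-- LEFT JOIN:  ids 1, 2
-- RIGHT JOIN: ids 2, 3
```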

### left join

In [0]:
SELECT c1.name AS city, code, c2.name AS country,
       region, city_proper_pop
FROM cities AS c1
  -- 1. Join right table (with alias)
  LEFT JOIN countries AS c2
    -- 2. Match on country code
    ON c1.country_code = c2.code
-- 3. Order by descending country code
ORDER BY code DESC;

### right join

In [0]:
-- convert this code to use RIGHT JOINs instead of LEFT JOINs
/*
SELECT cities.name AS city, urbanarea_pop, countries.name AS country,
       indep_year, languages.name AS language, percent
FROM cities
  LEFT JOIN countries
    ON cities.country_code = countries.code
  LEFT JOIN languages
    ON countries.code = languages.code
ORDER BY city, language;
*/

SELECT cities.name AS city, urbanarea_pop, countries.name AS country,
       indep_year, languages.name AS language, percent
FROM languages
  RIGHT JOIN countries
    ON countries.code = languages.code
  RIGHT JOIN cities
    ON cities.country_code = countries.code
ORDER BY city, language;

```
Table order in each version of the query:

LEFT JOIN version  |  RIGHT JOIN version
cities             |  languages
countries          |  countries
languages          |  cities
```

### full join

In [0]:
SELECT name AS country, code, region, basic_unit
-- 3. From countries
FROM countries
  -- 4. Join to currencies
  FULL JOIN currencies
    -- 5. Match on code
    USING (code)
-- 1. Where region is North America or null
WHERE region = 'North America' OR region IS NULL
-- 2. Order by region
ORDER BY region;

In [0]:
-- 7. Select fields (with aliases)
SELECT c1.name AS country, region, l.name as language,
       basic_unit, frac_unit
-- 1. From countries (alias as c1)
FROM countries AS c1
  -- 2. Join with languages (alias as l)
  FULL JOIN languages AS l
    -- 3. Match on code
    USING (code)
  -- 4. Join with currencies (alias as c2)
  FULL JOIN currencies AS c2
    -- 5. Match on code
    USING (code)
-- 6. Where region like Melanesia and Micronesia
WHERE region LIKE 'M%esia';

### cross join
- Cross joins do not use ON or USING.
- A cross join produces every possible combination of rows from the two tables.

In [0]:
-- 4. Select fields
SELECT c.name AS city, l.name AS language
-- 1. From cities (alias as c)
FROM cities AS c        
  -- 2. Join to languages (alias as l)
  CROSS JOIN languages AS l
-- 3. Where c.name like Hyderabad
WHERE c.name LIKE 'Hyder%';

# Set theory clauses

### unions:
- UNION: returns every record that appears in either table, with duplicates removed.
- UNION ALL: same as UNION, but duplicates are kept, so records in the intersection appear twice.
- Unions do not look up or match keys the way joins do; they just stack records on top of each other.
- The field names of the first SELECT are used as the field names of the result.
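
A minimal sketch of the difference between UNION and UNION ALL, assuming PostgreSQL. The tables a and b are made up; the value 3 is the only one present in both.

```sql
-- Hypothetical single-column tables.
WITH a(id) AS (VALUES (1), (2), (3)),
     b(id) AS (VALUES (3), (4))
SELECT id FROM a
UNION ALL
SELECT id FROM b
ORDER BY id;
-- UNION ALL returns 1, 2, 3, 3, 4; plain UNION would return 1, 2, 3, 4.
```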

![alt text](sets.png)

In [0]:
-- Select fields from 2010 table
select *
  -- From 2010 table
  from economies2010
	-- Set theory clause
	union
-- Select fields from 2015 table
select *
  -- From 2015 table
  from economies2015
-- Order by code and year
order by code,year;

UNION can also be used to determine all occurrences of a field across multiple tables.

In [0]:
-- Select fields
SELECT code, year
  -- From economies
  FROM economies
	-- Set theory clause
	union all
-- Select fields
SELECT country_code, year
  -- From populations
  FROM populations
-- Order by code, year
ORDER BY code, year;

### Intersect
- Includes only the records that appear in both tables, matched on every selected field.
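
A sketch of how INTERSECT compares whole rows, assuming PostgreSQL and made-up inline data.

```sql
-- Hypothetical sample: the code matches in every row, the year only once.
WITH economies(code, year) AS (
  VALUES ('AUS', 2010), ('AUS', 2015)
),
populations(country_code, year) AS (
  VALUES ('AUS', 2010), ('AUS', 2020)
)
SELECT code, year FROM economies
INTERSECT
SELECT country_code, year FROM populations;
-- Returns only ('AUS', 2010). Selecting the code alone in both queries
-- would return 'AUS', because the year no longer has to match.
```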

In [0]:
-- Select fields
SELECT code, year
  -- From economies
  FROM economies
	-- Set theory clause
	intersect
-- Select fields
SELECT country_code, year
  -- From populations
  FROM populations
-- Order by code, year
ORDER BY code, year;

### Except
- Includes the records that are in the first table but not in the second.
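
EXCEPT is directional, which a tiny example makes visible. This sketch assumes PostgreSQL; the city and capital names are sample values only.

```sql
-- Hypothetical sample rows.
WITH cities(name) AS (
  VALUES ('Sydney'), ('Canberra'), ('Auckland')
),
countries(capital) AS (
  VALUES ('Canberra'), ('Wellington')
)
SELECT name FROM cities
EXCEPT
SELECT capital FROM countries
ORDER BY name;
-- Returns Auckland and Sydney. Wellington is not returned: EXCEPT only
-- keeps rows from the first query.
```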

In [0]:
-- Get the names of cities in cities which are not noted as capital cities in countries as a single field result.
-- Select field
SELECT name
  -- From cities
  FROM cities
	-- Set theory clause
	except
-- Select field
SELECT capital
  -- From countries
  FROM countries
-- Order by result
ORDER BY name;

### semi-join
- Uses a subquery.
- Chooses records from the first table where the condition is met in the second table (the subquery).

### anti-join
- Chooses records from the first table where the condition is not met in the subquery.
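
Besides IN and NOT IN, semi-joins and anti-joins can be written with EXISTS and NOT EXISTS. The sketch below assumes PostgreSQL and uses made-up rows. One reason to consider NOT EXISTS for anti-joins: NOT IN returns no rows at all when the subquery produces a NULL.

```sql
-- Hypothetical sample rows.
WITH countries(code, region) AS (
  VALUES ('ARE', 'Middle East'), ('AUS', 'Australia and New Zealand')
),
languages(code, name) AS (
  VALUES ('ARE', 'Arabic'), ('AUS', 'English')
)
-- Semi-join: keep languages whose country is in the Middle East.
SELECT name
FROM languages AS l
WHERE EXISTS
  (SELECT 1
   FROM countries AS c
   WHERE c.code = l.code
     AND c.region = 'Middle East');
-- Returns Arabic. Replacing EXISTS with NOT EXISTS gives the anti-join,
-- which returns English instead.
```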

You are now going to use the concept of a semi-join to identify languages spoken in the Middle East.


In [0]:
-- Select distinct fields
select distinct name
  -- From languages
  from languages
-- Where in statement
WHERE code IN
  -- Subquery
  (select code
   from countries
   where region='Middle East')
-- Order by name
order by name;

### Relating semi-join to a tweaked inner join
Let's revisit the code from the previous exercise, which retrieves languages spoken in the Middle East.
```
SELECT DISTINCT name
FROM languages
WHERE code IN
  (SELECT code
   FROM countries
   WHERE region = 'Middle East')
ORDER BY name;
```
Sometimes problems solved with semi-joins can also be solved using an inner join.
```
SELECT languages.name AS language
FROM languages
INNER JOIN countries
ON languages.code = countries.code
WHERE region = 'Middle East'
ORDER BY language;
```
This inner join isn't quite right. What is missing from this second code block to get it to match with the correct answer produced by the first block?

```
  Ans: DISTINCT
```
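
A sketch of why DISTINCT is needed, assuming PostgreSQL and made-up rows in which one language is spoken in two Middle Eastern countries.

```sql
-- Hypothetical sample rows.
WITH countries(code, region) AS (
  VALUES ('ARE', 'Middle East'), ('QAT', 'Middle East')
),
languages(code, name) AS (
  VALUES ('ARE', 'Arabic'), ('QAT', 'Arabic')
)
SELECT languages.name AS language
FROM languages
  INNER JOIN countries
    ON languages.code = countries.code
WHERE region = 'Middle East'
ORDER BY language;
-- Returns 'Arabic' twice, once per country. SELECT DISTINCT collapses
-- the duplicates to a single row, matching the semi-join result.
```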

Not all countries in Oceania were listed in the resulting inner join with currencies. Use an anti-join to determine which countries were not included!

In [0]:
-- 3. Select fields
select code,name
  -- 4. From Countries
  from countries
  -- 5. Where continent is Oceania
  where continent='Oceania'
  	-- 1. And code not in
  	and code not in
  	-- 2. Subquery
  	(select code
  	 from currencies);

Identify the country codes that are included in either economies or currencies but not in populations.
Use that result to determine the names of cities in the countries that match the specification in the previous instruction.

In [0]:
-- Select the city name
select name
  -- Alias the table where city name resides
  from cities AS c1
  -- Choose only records matching the result of multiple set theory clauses
  WHERE country_code IN
(
    -- Select appropriate field from economies AS e
    SELECT e.code
    FROM economies AS e
    -- Get all additional (unique) values of the field from currencies AS c2  
    union
    SELECT c2.code
    FROM currencies AS c2
    -- Exclude those appearing in populations AS p
    except
    SELECT p.country_code
    FROM populations AS p
);
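
In PostgreSQL, UNION and EXCEPT have the same precedence and are evaluated left to right, so the subquery above means (economies UNION currencies) EXCEPT populations. Parentheses can make that grouping explicit, as in this sketch with made-up codes.

```sql
-- Hypothetical sample codes.
WITH economies(code) AS (VALUES ('AAA'), ('BBB')),
     currencies(code) AS (VALUES ('BBB'), ('CCC')),
     populations(country_code) AS (VALUES ('AAA'), ('BBB'))
(SELECT code FROM economies
 UNION
 SELECT code FROM currencies)
EXCEPT
SELECT country_code FROM populations;
-- Returns 'CCC': the only code in economies or currencies
-- that is missing from populations.
```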

# Subqueries

### Subquery inside where
You'll now try to figure out which countries had high average life expectancies (at the country level) in 2015.

Select all fields from populations for the 2015 records whose life expectancy is more than 1.15 times the 2015 average calculated in the first task.

In [0]:
-- Select fields
select *
  -- From populations
  from populations
-- Where life_expectancy is greater than
where year = 2015
  -- 1.15 * subquery
  and life_expectancy > 1.15 *
    (select avg(life_expectancy)
     from populations
     where year = 2015);

### Subquery inside select
In this exercise, you'll see how some queries can be written using either a join or a subquery.

You have seen previously how to use GROUP BY with aggregate functions and an inner join to get summarized information from multiple tables.

The code given in query.sql selects the top nine countries in terms of number of cities appearing in the cities table. Recall that this corresponds to the most populous cities in the world. Your task will be to convert the commented out code to get the same result as the code shown.

In [0]:
SELECT countries.name AS country, COUNT(*) AS cities_num
  FROM cities
    INNER JOIN countries
    ON countries.code = cities.country_code
GROUP BY country
ORDER BY cities_num DESC, country
LIMIT 9;

/* 
SELECT ___ AS ___,
  (SELECT ___
   FROM ___
   WHERE countries.code = cities.country_code) AS cities_num
FROM ___
ORDER BY ___ ___, ___
LIMIT 9;
*/

In [0]:
-- Ans

SELECT countries.name AS country,
  (SELECT count(*)
   FROM cities
   WHERE countries.code = cities.country_code) AS cities_num
FROM countries
ORDER BY cities_num DESC, country
LIMIT 9;
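
The join and subquery forms agree for the top nine countries, but they are not equivalent in general. The sketch below assumes PostgreSQL and uses made-up rows in which one country has no cities.

```sql
-- Hypothetical sample rows: Beta has no entry in cities.
WITH countries(code, name) AS (
  VALUES ('AAA', 'Alpha'), ('BBB', 'Beta')
),
cities(name, country_code) AS (
  VALUES ('City 1', 'AAA'), ('City 2', 'AAA')
)
SELECT countries.name AS country,
  (SELECT COUNT(*)
   FROM cities
   WHERE countries.code = cities.country_code) AS cities_num
FROM countries
ORDER BY cities_num DESC, country;
-- Alpha 2, Beta 0. The INNER JOIN + GROUP BY form would return only
-- Alpha, because Beta has no matching city rows to join to.
```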

### Subquery inside from
The last type of subquery you will work with is one inside of FROM.

You will use this to determine the number of languages spoken for each country, identified by the country's local name! (Note this may be different than the name field and is stored in the local_name field.)

In [0]:
-- Select fields
select local_name, lang_num
  -- From countries
  from countries,
  	-- Subquery (alias as subquery)
  	(select code, count(*) as lang_num
  	 from languages
  	 group by code) AS subquery
  -- Where codes match
  where countries.code=subquery.code
-- Order by descending number of languages
order by lang_num desc;
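
The comma in FROM plus the WHERE match is an implicit inner join. The same query can be sketched with an explicit INNER JOIN to the subquery; this assumes PostgreSQL, and the inline rows are made up.

```sql
-- Hypothetical sample rows.
WITH countries(code, local_name) AS (
  VALUES ('AAA', 'Alfa'), ('BBB', 'Beta')
),
languages(code, name) AS (
  VALUES ('AAA', 'x'), ('AAA', 'y'), ('BBB', 'z')
)
SELECT local_name, lang_num
FROM countries
  INNER JOIN
    (SELECT code, COUNT(*) AS lang_num
     FROM languages
     GROUP BY code) AS subquery
    USING (code)
ORDER BY lang_num DESC;
-- Alfa 2, Beta 1
```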

Determine the maximum inflation rate for each continent in 2015.

In [0]:
-- Select fields
select max(inflation_rate) as max_inf
  -- Subquery using FROM (alias as subquery)
  FROM (select name, continent, inflation_rate
        from countries
      	inner join economies
        using(code)
        where year=2015) AS subquery
-- Group by continent
group by continent;

Obtain the name of the country, its continent, and the maximum inflation rate for each continent in 2015. 

In [0]:
-- Select fields
select name, continent, inflation_rate
  -- From countries
  from countries
	-- Join to economies
	inner join economies
	-- Match on code
	on countries.code=economies.code
  -- Where year is 2015
  where year=2015 and inflation_rate in
    -- And inflation rate in subquery (alias as subquery)
    (select max(inflation_rate) as max_inf
      FROM (select name, continent, inflation_rate
            from countries
          	inner join economies
            using(code)
            where year=2015) AS subquery
    group by continent);
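
One caveat with matching on inflation_rate alone: a country whose rate happens to equal another continent's maximum would also be returned. A possible variant, sketched here assuming PostgreSQL and made-up rows, matches on the (continent, inflation_rate) pair instead. This is an alternative, not the course solution.

```sql
-- Hypothetical sample: Beta's rate equals Europe's maximum, not Asia's.
WITH countries(code, name, continent) AS (
  VALUES ('AAA', 'Alpha', 'Asia'),
         ('BBB', 'Beta', 'Asia'),
         ('CCC', 'Gamma', 'Europe')
),
economies(code, year, inflation_rate) AS (
  VALUES ('AAA', 2015, 9.0), ('BBB', 2015, 4.0), ('CCC', 2015, 4.0)
)
SELECT name, continent, inflation_rate
FROM countries
  INNER JOIN economies
    USING (code)
WHERE year = 2015
  AND (continent, inflation_rate) IN
    (SELECT continent, MAX(inflation_rate)
     FROM countries
       INNER JOIN economies
         USING (code)
     WHERE year = 2015
     GROUP BY continent)
ORDER BY continent;
-- Returns Alpha (Asia, 9.0) and Gamma (Europe, 4.0).
-- Matching on inflation_rate alone would also return Beta.
```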

### subquery in ON statement
In this exercise, you'll need to get the country names and other 2015 data in the economies table and the countries table for Central American countries with an official language.

In [0]:
-- Select fields
SELECT DISTINCT name, total_investment, imports
  -- From table (with alias)
  FROM countries AS c
    -- Join with table (with alias)
    LEFT JOIN economies AS e
      -- Match on code
      ON (c.code = e.code
      -- and code in Subquery
        AND c.code IN (
          SELECT l.code
          FROM languages AS l
          WHERE official = 'true'
        ) )
  -- Where region and year are correct
  WHERE region = 'Central America' AND year = 2015 
-- Order by field
ORDER BY name;
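
Note that a WHERE filter on a field from the right table makes a LEFT JOIN behave like an inner join for the unmatched rows. The sketch below assumes PostgreSQL and uses made-up rows to show the effect.

```sql
-- Hypothetical sample: Beta has no row in economies.
WITH countries(code, name, region) AS (
  VALUES ('AAA', 'Alpha', 'Central America'),
         ('BBB', 'Beta', 'Central America')
),
economies(code, year, imports) AS (
  VALUES ('AAA', 2015, 1.5)
)
SELECT name, imports
FROM countries AS c
  LEFT JOIN economies AS e
    ON c.code = e.code
WHERE region = 'Central America' AND year = 2015
ORDER BY name;
-- Returns only Alpha: Beta has no economies row, so its year is NULL
-- and the WHERE filter drops it. Moving year = 2015 into the ON clause
-- would keep Beta with a NULL imports value.
```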

You are now tasked with determining the top 10 capital cities in Europe and the Americas in terms of a calculated percentage using city_proper_pop and metroarea_pop in cities.

In [0]:
SELECT name, country_code, city_proper_pop, metroarea_pop,  
      -- Calculate city_perc
      city_proper_pop / metroarea_pop * 100 AS city_perc
  -- From appropriate table
  FROM cities
  -- Where 
  WHERE name IN
    -- Subquery
    (SELECT capital
     FROM countries
     WHERE continent = 'Europe'
        OR continent LIKE '%America%')
    -- Keep only cities with a known metro area population
    AND metroarea_pop IS NOT NULL
-- Order appropriately
ORDER BY city_perc DESC
-- Limit amount
LIMIT 10;