# Subqueries

Subqueries are queries embedded in another query. They are also called inner queries or nested queries.

- they are part of another query, called the outer query
- a subquery should always be placed within parentheses
- a subquery may return a single value (a scalar), a single row, a single column, or an entire table
- you can have more than one subquery in your outer query
- subqueries allow for better structuring of the outer query
    - thus, each inner query can be thought of in isolation
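
As a rough sketch of the result shapes listed above, the script below builds a small demo table and uses a scalar, a single-column, and a table subquery against it. The table name demo_populations and all of its rows are made up for illustration, and the SQL is assumed to run on an engine such as PostgreSQL or SQLite.

~~~~sql
-- Hypothetical demo table; names and values are illustrative only
CREATE TABLE demo_populations (
    country_code    VARCHAR(3),
    year            INTEGER,
    life_expectancy NUMERIC(5, 1)
);

INSERT INTO demo_populations (country_code, year, life_expectancy) VALUES
    ('AAA', 2015, 80.0),
    ('BBB', 2015, 60.0),
    ('CCC', 2015, 70.0),
    ('AAA', 2010, 78.0);

-- Scalar subquery: one row and one column, so it can be compared with >
SELECT country_code
FROM demo_populations
WHERE year = 2015
  AND life_expectancy > (SELECT AVG(life_expectancy)
                         FROM demo_populations
                         WHERE year = 2015);

-- Single-column subquery: many rows, one column, so it is used with IN
SELECT DISTINCT country_code
FROM demo_populations
WHERE country_code IN (SELECT country_code
                       FROM demo_populations
                       WHERE year = 2010);

-- Table subquery: many rows and columns, so it goes in FROM with an alias
SELECT sub.country_code, sub.life_expectancy
FROM (SELECT country_code, life_expectancy
      FROM demo_populations
      WHERE year = 2015) AS sub
ORDER BY sub.life_expectancy DESC;
~~~~

With these rows the 2015 average is 70, so the first query returns only AAA, and the second returns AAA because it is the only code with a 2010 record.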

1. conceptually, the SQL engine starts by running the inner query
2. then it uses the returned output, which is an intermediate result, to execute the outer query
3. it is possible to nest inner queries within other inner queries
    - in that case, the SQL engine executes the innermost query first, then each enclosing query in turn, until it runs the outermost query last
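
The evaluation order described above can be sketched with two levels of nesting. The tables nest_countries and nest_populations and their rows are hypothetical, and the order shown in the comments is the logical one; a real query optimizer may choose a different physical plan.

~~~~sql
-- Hypothetical demo tables; names and values are illustrative only
CREATE TABLE nest_countries (
    code VARCHAR(3),
    name VARCHAR(50)
);

CREATE TABLE nest_populations (
    country_code    VARCHAR(3),
    year            INTEGER,
    life_expectancy NUMERIC(5, 1)
);

INSERT INTO nest_countries (code, name) VALUES
    ('AAA', 'Alphaland'),
    ('BBB', 'Betaland'),
    ('CCC', 'Gammaland');

INSERT INTO nest_populations (country_code, year, life_expectancy) VALUES
    ('AAA', 2015, 80.0),
    ('BBB', 2015, 60.0),
    ('CCC', 2015, 70.0);

-- Step 3 (outermost): look up the names for the codes found in step 2
SELECT name
FROM nest_countries
WHERE code IN (
    -- Step 2 (middle): codes whose 2015 value is above the average
    SELECT country_code
    FROM nest_populations
    WHERE year = 2015
      AND life_expectancy > (
          -- Step 1 (innermost): the 2015 average, a single value
          SELECT AVG(life_expectancy)
          FROM nest_populations
          WHERE year = 2015));
~~~~

For this data the innermost query yields 70, the middle query yields the code AAA, and the outer query returns Alphaland.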

In the following example, we select all fields from populations for records whose life_expectancy is larger than 1.15 times the average life_expectancy, considering only data for the year 2015.

~~~sql
-- Select fields
SELECT *
  -- From populations
  FROM populations
-- Where life_expectancy is greater than
WHERE life_expectancy > 1.15 *
  -- 1.15 * subquery
  (SELECT AVG(life_expectancy)
   FROM populations
   WHERE year = 2015)
  AND year = 2015;
~~~
 
## Subqueries with EXISTS - NOT EXISTS Nested Inside WHERE

EXISTS checks whether a subquery returns any rows
- this check is conducted row by row for the outer query
- it returns a Boolean value
    - if the subquery returns at least one row, EXISTS returns TRUE, and the corresponding record of the outer query is extracted
    - if the subquery returns no rows, EXISTS returns FALSE, and the corresponding record of the outer query is not extracted
- NOT EXISTS inverts the result, so it keeps the outer records for which the subquery returns no rows
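
A minimal sketch of both operators follows. The tables ex_countries and ex_languages and their rows are hypothetical, chosen only so that one country has no language recorded.

~~~~sql
-- Hypothetical demo tables; names and values are illustrative only
CREATE TABLE ex_countries (
    code VARCHAR(3),
    name VARCHAR(50)
);

CREATE TABLE ex_languages (
    code VARCHAR(3),
    name VARCHAR(50)
);

INSERT INTO ex_countries (code, name) VALUES
    ('AAA', 'Alphaland'),
    ('BBB', 'Betaland'),
    ('CCC', 'Gammaland');

INSERT INTO ex_languages (code, name) VALUES
    ('AAA', 'Alphan'),
    ('AAA', 'Common'),
    ('BBB', 'Betan');

-- Countries with at least one language recorded
SELECT c.name
FROM ex_countries AS c
WHERE EXISTS (SELECT 1
              FROM ex_languages AS l
              WHERE l.code = c.code);

-- Countries with no language recorded
SELECT c.name
FROM ex_countries AS c
WHERE NOT EXISTS (SELECT 1
                  FROM ex_languages AS l
                  WHERE l.code = c.code);
~~~~

With these rows the first query returns Alphaland and Betaland, and the second returns Gammaland. Alphaland appears only once even though it has two languages, because EXISTS only tests whether a matching row is present; a join would return it twice.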
    
# Subqueries nested in SELECT and FROM
The following query determines the number of languages spoken in each country, identified by the country's local name, by combining countries with a subquery in FROM.
~~~~sql
-- Select fields
SELECT local_name, subquery.lang_num
  -- From countries
  FROM countries,
    -- Subquery (alias as subquery)
    (SELECT code, COUNT(name) AS lang_num
     FROM languages
     GROUP BY code) AS subquery
  -- Where codes match
  WHERE countries.code = subquery.code
-- Order by descending number of languages
ORDER BY lang_num DESC;
~~~~

# CASE statements
CASE statements are SQL's version of an "IF this THEN that" statement. A CASE statement has three parts: a WHEN clause, a THEN clause, and an optional ELSE clause.

~~~~sql
CASE WHEN x = 1 THEN 'a'
     WHEN x = 2 THEN 'b'
     ELSE 'c' END AS new_column
~~~~

- When you have completed your statement, be sure to include the keyword END and give the resulting column an alias.
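
The fragment above can be tried as a complete query with a sketch like the one below. The table case_demo and its column x are hypothetical. The second CASE leaves out ELSE to show what happens to rows that match no WHEN condition.

~~~~sql
-- Hypothetical demo table; names and values are illustrative only
CREATE TABLE case_demo (x INTEGER);

INSERT INTO case_demo (x) VALUES (1), (2), (3);

SELECT x,
       -- Every row gets a label because ELSE catches the rest
       CASE WHEN x = 1 THEN 'a'
            WHEN x = 2 THEN 'b'
            ELSE 'c' END AS with_else,
       -- Without ELSE, rows that match no WHEN get NULL
       CASE WHEN x = 1 THEN 'a'
            WHEN x = 2 THEN 'b'
            END AS without_else
FROM case_demo
ORDER BY x;
~~~~

For x = 3 the first column is 'c' and the second is NULL.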

~~~~sql
-- Identify the home team as Bayern Munich, Schalke 04, or neither
SELECT
    CASE WHEN hometeam_id = 10189 THEN 'FC Schalke 04'
         WHEN hometeam_id = 9823 THEN 'FC Bayern Munich'
         ELSE 'Other' END AS home_team,
    COUNT(id) AS total_matches
FROM matches_germany
-- Group by the CASE statement alias
GROUP BY home_team;
~~~~

## CASE WHEN ... AND THEN
- Add multiple logical conditions to your WHEN clause!
~~~~sql
SELECT date, hometeam_id, awayteam_id,
    CASE WHEN hometeam_id = 8455 AND home_goal > away_goal THEN 'Chelsea home win!'
         WHEN awayteam_id = 8455 AND home_goal < away_goal THEN 'Chelsea away win!'
         ELSE 'Loss or tie :(' END AS outcome
FROM match
WHERE hometeam_id = 8455 OR awayteam_id = 8455;
~~~~

Several CASE statements can be used in the same SELECT. The next query labels the home team, the away team, and the outcome of each match between FC Barcelona (ID 8634) and Real Madrid CF (ID 8633):


~~~~sql
SELECT
    date,
    CASE WHEN hometeam_id = 8634 THEN 'FC Barcelona'
         ELSE 'Real Madrid CF' END AS home,
    CASE WHEN awayteam_id = 8634 THEN 'FC Barcelona'
         ELSE 'Real Madrid CF' END AS away,
    -- Identify all possible match outcomes
    CASE WHEN home_goal > away_goal AND hometeam_id = 8634 THEN 'Barcelona win!'
         WHEN home_goal > away_goal AND hometeam_id = 8633 THEN 'Real Madrid win!'
         WHEN home_goal < away_goal AND awayteam_id = 8634 THEN 'Barcelona win!'
         WHEN home_goal < away_goal AND awayteam_id = 8633 THEN 'Real Madrid win!'
         ELSE 'Tie!' END AS outcome
FROM matches_spain
WHERE (awayteam_id = 8634 OR hometeam_id = 8634)
      AND (awayteam_id = 8633 OR hometeam_id = 8633);
~~~~

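Using CASE to exclude games that Bologna did not win:
a CASE without an ELSE clause returns NULL for rows that match no WHEN condition, so we put the CASE WHEN in the WHERE clause and test it with IS NOT NULL to keep only the matching rows.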
~~~~sql
-- Select the season, date, home_goal, and away_goal columns
SELECT
    season,
    date,
    home_goal,
    away_goal
FROM matches_italy
WHERE
    -- Exclude games not won by Bologna
    CASE WHEN hometeam_id = 9857 AND home_goal > away_goal THEN 'Bologna Win'
         WHEN awayteam_id = 9857 AND away_goal > home_goal THEN 'Bologna Win'
         END IS NOT NULL;
~~~~

## CASE WHEN with aggregate functions
~~~~sql
SELECT season,
       COUNT(CASE WHEN hometeam_id = 8560 AND home_goal > away_goal THEN id END) as home_wins,
       COUNT(CASE WHEN awayteam_id = 8560 AND home_goal < away_goal THEN id END) as away_wins
FROM match
GROUP BY season;
~~~~
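
The query above works because COUNT skips NULL values: a CASE with no ELSE returns NULL for every row that is not a win, so only the wins are counted. The sketch below checks this on a hypothetical table agg_matches with made-up rows, and compares it with an equivalent SUM over 1 and 0.

~~~~sql
-- Hypothetical demo table; names and values are illustrative only
CREATE TABLE agg_matches (
    id          INTEGER,
    season      VARCHAR(9),
    hometeam_id INTEGER,
    awayteam_id INTEGER,
    home_goal   INTEGER,
    away_goal   INTEGER
);

INSERT INTO agg_matches (id, season, hometeam_id, awayteam_id, home_goal, away_goal) VALUES
    (1, '2011/2012', 8560, 1,    2, 0),  -- home win
    (2, '2011/2012', 8560, 2,    0, 1),  -- home loss
    (3, '2011/2012', 3,    8560, 1, 3),  -- away win
    (4, '2012/2013', 8560, 4,    1, 1);  -- tie

SELECT season,
       -- Non-wins become NULL and are ignored by COUNT
       COUNT(CASE WHEN hometeam_id = 8560 AND home_goal > away_goal THEN id END) AS home_wins,
       -- Same result written as a sum of 1s and 0s
       SUM(CASE WHEN hometeam_id = 8560 AND home_goal > away_goal THEN 1 ELSE 0 END) AS home_wins_sum,
       COUNT(CASE WHEN awayteam_id = 8560 AND home_goal < away_goal THEN id END) AS away_wins
FROM agg_matches
GROUP BY season
ORDER BY season;
~~~~

For these rows, 2011/2012 has 1 home win and 1 away win, and 2012/2013 has 0 of each; the COUNT and SUM columns agree.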

### Percentages with CASE and AVG 
~~~~sql
SELECT season,
       ROUND(AVG(CASE WHEN hometeam_id = 8455 AND home_goal > away_goal THEN 1
                      WHEN hometeam_id = 8455 AND home_goal < away_goal THEN 0 END), 2) AS pct_homewins,
       ROUND(AVG(CASE WHEN awayteam_id = 8455 AND away_goal > home_goal THEN 1
                      WHEN awayteam_id = 8455 AND away_goal < home_goal THEN 0 END), 2) AS pct_awaywins
FROM match
GROUP BY season;
~~~~
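
Averaging a column of 1s and 0s gives the share of 1s, and AVG skips NULL values, so rows that match no WHEN (ties, in the query above) are left out of the denominator. The sketch below shows the difference on a hypothetical table pct_matches with made-up rows. The literals are written as 1.0 and 0.0 as a precaution, since some engines (SQL Server, for example) return an integer average for integer inputs.

~~~~sql
-- Hypothetical demo table; names and values are illustrative only
CREATE TABLE pct_matches (
    id          INTEGER,
    season      VARCHAR(9),
    hometeam_id INTEGER,
    home_goal   INTEGER,
    away_goal   INTEGER
);

-- All four rows are home games of team 8455
INSERT INTO pct_matches (id, season, hometeam_id, home_goal, away_goal) VALUES
    (1, '2011/2012', 8455, 2, 0),  -- home win
    (2, '2011/2012', 8455, 3, 1),  -- home win
    (3, '2011/2012', 8455, 0, 1),  -- home loss
    (4, '2011/2012', 8455, 1, 1);  -- tie

SELECT season,
       -- The tie matches no WHEN, becomes NULL, and is skipped: (1 + 1 + 0) / 3
       ROUND(AVG(CASE WHEN hometeam_id = 8455 AND home_goal > away_goal THEN 1.0
                      WHEN hometeam_id = 8455 AND home_goal < away_goal THEN 0.0 END), 2) AS pct_excluding_ties,
       -- With ELSE 0.0 every row counts: (1 + 1 + 0 + 0) / 4
       ROUND(AVG(CASE WHEN hometeam_id = 8455 AND home_goal > away_goal THEN 1.0
                      ELSE 0.0 END), 2) AS pct_all_matches
FROM pct_matches
GROUP BY season;
~~~~

Here the first column comes out at about 0.67 and the second at 0.5, so whether ties should count against the team decides which form to use.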

# Subqueries Again

## Simple subqueries
- A simple subquery does not reference the outer query, so it can be evaluated independently and is only processed once in the entire statement

### Subqueries in the WHERE clause
~~~~sql
SELECT home_goal
FROM match
WHERE home_goal > (
    SELECT AVG(home_goal)
    FROM match);
~~~~

~~~~sql
SELECT  team_long_name,  team_short_name AS abbr 
FROM team
WHERE team_api_id IN (
    SELECT hometeam_id
    FROM match
    WHERE country_id = 15722);
~~~~

### Subqueries in the FROM clause
- Useful tool to restructure and transform your data
    - transforming data from long to wide before selecting
    - prefiltering data
- Calculating aggregates of aggregates

~~~~sql
SELECT
    -- Select country name and the count of match IDs
    c.name AS country_name,
    COUNT(sub.id) AS matches
FROM country AS c
-- Inner join the subquery onto country
-- Select the country id and match id columns
INNER JOIN (SELECT country_id, id
            FROM match
            -- Filter the subquery by matches with 10+ goals
            WHERE (home_goal + away_goal) >= 10) AS sub
ON c.id = sub.country_id
GROUP BY country_name;
~~~~

~~~~sql
SELECT
    -- Select country, date, home, and away goals from the subquery
    country,
    date,
    home_goal,
    away_goal
FROM
    -- Select country name, date, and total goals in the subquery
    (SELECT c.name AS country,
            m.date,
            m.home_goal,
            m.away_goal,
            (m.home_goal + m.away_goal) AS total_goals
     FROM match AS m
     LEFT JOIN country AS c
     ON m.country_id = c.id) AS subq
-- Filter by total goals scored in the main query
WHERE total_goals >= 10;
~~~~
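
The "aggregates of aggregates" use mentioned in this section can be sketched as follows: the subquery in FROM computes one total per season, and the outer query averages those totals. The table fa_matches and its rows are hypothetical.

~~~~sql
-- Hypothetical demo table; names and values are illustrative only
CREATE TABLE fa_matches (
    id        INTEGER,
    season    VARCHAR(9),
    home_goal INTEGER,
    away_goal INTEGER
);

INSERT INTO fa_matches (id, season, home_goal, away_goal) VALUES
    (1, '2011/2012', 2, 1),
    (2, '2011/2012', 0, 3),
    (3, '2012/2013', 1, 1),
    (4, '2012/2013', 2, 0);

-- Outer query: average of the per-season totals
SELECT AVG(per_season.season_goals) AS avg_goals_per_season
FROM (
    -- Inner query: total goals scored in each season
    SELECT season, SUM(home_goal + away_goal) AS season_goals
    FROM fa_matches
    GROUP BY season) AS per_season;
~~~~

With these rows the season totals are 6 and 4, so the outer query returns their average, 5. Nesting is needed here because an aggregate such as AVG(SUM(...)) cannot be written directly in a single grouped query.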

### Subqueries in SELECT
- Subqueries in SELECT are used to return a single aggregated value
~~~~sql
SELECT date,  (home_goal + away_goal) AS goals,
       (home_goal + away_goal) -
       (SELECT AVG(home_goal + away_goal)
        FROM match
        WHERE season = '2011/2012') AS diff 
FROM match 
WHERE season = '2011/2012'; 
~~~~

- A subquery in SELECT needs to return a SINGLE value; it will generate an error otherwise
- Make sure you have all filters in the right places
    - Properly filter both the main query and the subquery!
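
To illustrate why the filters matter, the sketch below computes the difference from the average twice: once with the season filter repeated inside the subquery, and once without it. The table sel_matches and its rows are hypothetical.

~~~~sql
-- Hypothetical demo table; names and values are illustrative only
CREATE TABLE sel_matches (
    id        INTEGER,
    season    VARCHAR(9),
    home_goal INTEGER,
    away_goal INTEGER
);

INSERT INTO sel_matches (id, season, home_goal, away_goal) VALUES
    (1, '2011/2012', 1, 1),  -- 2 goals
    (2, '2011/2012', 3, 1),  -- 4 goals
    (3, '2012/2013', 4, 2);  -- 6 goals

SELECT id,
       (home_goal + away_goal) AS goals,
       -- Subquery filtered like the main query: compares with the 2011/2012 average (3)
       (home_goal + away_goal) -
       (SELECT AVG(home_goal + away_goal)
        FROM sel_matches
        WHERE season = '2011/2012') AS diff_season,
       -- Subquery without the filter: compares with the all-season average (4)
       (home_goal + away_goal) -
       (SELECT AVG(home_goal + away_goal)
        FROM sel_matches) AS diff_all
FROM sel_matches
WHERE season = '2011/2012'
ORDER BY id;
~~~~

For match 1 the two differences are -1 and -2, and for match 2 they are 1 and 0. Both queries run without error, so a missing filter in the subquery produces a silently different answer.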
   
### Best Practices
- Multiple subqueries can be included in SELECT, FROM, and WHERE
- FORMAT YOUR QUERIES!
- Annotate your queries: explain what each one does

# Correlated Subqueries
Correlated subqueries are a special kind of subquery that uses values from the outer query in order to generate the final results.
- Conceptually, the subquery is re-executed for each row of the outer query, in order to properly generate each new piece of information.
- Correlated subqueries are used for special types of calculations, such as advanced joining, filtering, and evaluating data in the database.
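
A minimal sketch of a correlated subquery follows: it keeps the matches whose total goals are above the average of their own season. The table cor_matches and its rows are hypothetical. The condition s.season = m.season is the correlation, because it refers to the row currently being processed by the outer query.

~~~~sql
-- Hypothetical demo table; names and values are illustrative only
CREATE TABLE cor_matches (
    id        INTEGER,
    season    VARCHAR(9),
    home_goal INTEGER,
    away_goal INTEGER
);

INSERT INTO cor_matches (id, season, home_goal, away_goal) VALUES
    (1, '2011/2012', 1, 1),  -- 2 goals
    (2, '2011/2012', 3, 1),  -- 4 goals
    (3, '2012/2013', 4, 2),  -- 6 goals
    (4, '2012/2013', 1, 0),  -- 1 goal
    (5, '2012/2013', 1, 1);  -- 2 goals

SELECT m.id, m.season, (m.home_goal + m.away_goal) AS goals
FROM cor_matches AS m
WHERE (m.home_goal + m.away_goal) >
      -- Re-evaluated for each outer row: average of that row's season
      (SELECT AVG(s.home_goal + s.away_goal)
       FROM cor_matches AS s
       WHERE s.season = m.season)
ORDER BY m.id;
~~~~

Both seasons average 3 goals per match in this data, so the query returns matches 2 and 3. Unlike a simple subquery, the inner query cannot be run on its own, since m.season is only defined inside the outer query.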