# Joining Data in SQL

Joins are the SQL tool that allows us to construct a relationship between objects (tables).

- a join returns a result set containing fields drawn from two or more tables
- we must find a related column in the two tables that contains the same type of data
- once the tables are joined, we are free to add columns from either table to our output
- the columns you use to relate tables must represent the same object, such as an id
- the tables you are considering need to be logically adjacent (directly related through such a column)

### JOIN + WHERE
- JOIN: connects 'table_a' and 'table_b'; its ON clause states which columns link the two tables
- WHERE: adds the condition or conditions that filter the joined rows (in the older comma-separated FROM syntax, WHERE also carries the connecting condition between the two tables)

## Inner Join
Inner joins extract only records in which the values in the related columns match. NULL values, and values that appear in only one of the two tables, are not displayed.
The result is empty when no matching values exist.

The SQL keyword for an inner join can be either JOIN or INNER JOIN; the two are equivalent.
~~~~sql
SELECT
table_1.column_name(s), table_2.column_name(s)
FROM
table_1
JOIN
table_2 ON table_1.column_name = table_2.column_name;
~~~~
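
As a concrete illustration of the template above, here is a minimal sketch. The tables `demo_authors` and `demo_books`, their columns, and their rows are all hypothetical, invented only for this example.

~~~~sql
-- Hypothetical tables, invented purely for illustration.
CREATE TABLE demo_authors (author_id INT, name VARCHAR(50));
CREATE TABLE demo_books   (book_id INT, author_id INT, title VARCHAR(50));

INSERT INTO demo_authors VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Linus');
INSERT INTO demo_books   VALUES (10, 1, 'Notes'), (11, 1, 'Letters'), (12, NULL, 'Anonymous');

-- Only rows whose author_id matches in both tables are returned:
-- Ada appears twice (once per book).
-- Grace and Linus (no books) and 'Anonymous' (NULL author_id) are left out.
SELECT demo_authors.name, demo_books.title
FROM demo_authors
JOIN demo_books ON demo_authors.author_id = demo_books.author_id;
~~~~
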
##### Using Aliases
~~~~sql
SELECT
t1.column_name, t1.column_name, …, t2.column_name, …
FROM
 -- table_1 t1 means table_1 as t1
table_1 t1
JOIN
table_2 t2 ON t1.column_name = t2.column_name;
~~~~

#### Dealing with duplicates
Never assume that your data contains no duplicate rows. You can use GROUP BY to collapse them. Standard SQL requires every selected column that is not inside an aggregate function to appear in the GROUP BY clause.
~~~~sql
-- Every column in SELECT also appears in GROUP BY.
SELECT t1.column_name, t2.column_name
FROM table_1 t1
JOIN table_2 t2 ON t1.column_name = t2.column_name
GROUP BY t1.column_name, t2.column_name;
~~~~
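
To see the effect of a duplicate on a join, consider this small sketch. The tables `demo_depts` and `demo_staff` and their contents are hypothetical.

~~~~sql
-- Hypothetical tables; the row (1, 'Sales') was loaded twice by mistake.
CREATE TABLE demo_depts (dept_id INT, dept_name VARCHAR(20));
CREATE TABLE demo_staff (staff_id INT, dept_id INT);

INSERT INTO demo_depts VALUES (1, 'Sales'), (1, 'Sales'), (2, 'Ops');
INSERT INTO demo_staff VALUES (7, 1), (8, 2);

-- Without GROUP BY, staff 7 would appear twice (once per duplicate dept row).
-- Grouping by every selected column leaves one row per staff member.
SELECT s.staff_id, d.dept_name
FROM demo_staff s
JOIN demo_depts d ON s.dept_id = d.dept_id
GROUP BY s.staff_id, d.dept_name;
~~~~
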

### INNER JOIN via USING
When joining tables with a common field name you can use USING as a shortcut:
~~~~sql
SELECT *
FROM countries
INNER JOIN economies
USING (code);
~~~~

## Self Join
A self join is applied when a table must be joined to itself.
- if you would like to combine certain rows of a table with other rows of the same table, you need a self-join
- the self-join references the table twice, under two different aliases, and treats the two references as separate tables in its operations
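
A minimal sketch of a self-join, assuming a hypothetical `demo_employees` table in which `manager_id` points at another row's `emp_id` (table, columns, and data are invented for illustration):

~~~~sql
-- Hypothetical table: each employee may have a manager in the same table.
CREATE TABLE demo_employees (emp_id INT, name VARCHAR(50), manager_id INT);

INSERT INTO demo_employees VALUES (1, 'Ana', NULL), (2, 'Ben', 1), (3, 'Cleo', 1);

-- The same table appears twice under different aliases:
-- e plays the role of the employee, m the role of the manager.
-- Ana has no manager (NULL), so the inner join leaves her out.
SELECT e.name AS employee, m.name AS manager
FROM demo_employees e
JOIN demo_employees m ON e.manager_id = m.emp_id;
~~~~
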

### CASE WHEN and THEN
Often it's useful to look at a numerical field not as raw data, but instead as being in different categories or groups.
You can use CASE with WHEN, THEN, ELSE, and END to define a new grouping field.

Ex:
Using the countries table, create a new field AS geosize_group that groups the countries into three groups:

- If surface_area is greater than 2 million, geosize_group is 'large'.
- If surface_area is greater than 350 thousand but not larger than 2 million, geosize_group is 'medium'.
- Otherwise, geosize_group is 'small'.

~~~~sql
SELECT name, continent, code, surface_area,
    -- 1. First case
    CASE WHEN surface_area > 2000000 THEN 'large'
        -- 2. Second case
        WHEN surface_area > 350000  THEN 'medium'
        -- 3. Else clause + end
        ELSE 'small' END
        -- 4. Alias name
        AS geosize_group
-- 5. From table
FROM countries;
~~~~

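If we want to save the results in a new table, we can use INTO. SELECT ... INTO creates a table in PostgreSQL and SQL Server; engines such as MySQL use CREATE TABLE ... AS SELECT instead.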

~~~~sql
SELECT name, continent, code, surface_area,
    CASE WHEN surface_area > 2000000
            THEN 'large'
       WHEN surface_area > 350000
            THEN 'medium'
       ELSE 'small' END
       AS geosize_group
INTO countries_plus
FROM countries;

SELECT country_code, size,
  CASE WHEN size > 50000000
            THEN 'large'
       WHEN size > 1000000
            THEN 'medium'
       ELSE 'small' END
       AS popsize_group
INTO pop_plus       
FROM populations
WHERE year = 2015;

-- 5. Select fields
SELECT c.name, c.continent, c.geosize_group, p.popsize_group
-- 1. From countries_plus (alias as c)
FROM countries_plus c
  -- 2. Join to pop_plus (alias as p)
  INNER JOIN pop_plus p
    -- 3. Match on country code
    ON c.code = p.country_code
-- 4. Order the table    
ORDER BY geosize_group;
~~~~
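
Not every engine creates a table from SELECT ... INTO. As a hedged alternative, the sketch below saves a CASE result with CREATE TABLE ... AS, which PostgreSQL, MySQL, and SQLite accept. The `demo_areas` table and its rows are hypothetical.

~~~~sql
-- Hypothetical source table, invented for illustration.
CREATE TABLE demo_areas (name VARCHAR(20), surface_area INT);

INSERT INTO demo_areas VALUES ('A', 3000000), ('B', 500000), ('C', 1000);

-- Store the query result, including the new grouping field, in a new table.
CREATE TABLE demo_areas_plus AS
SELECT name, surface_area,
    CASE WHEN surface_area > 2000000 THEN 'large'
         WHEN surface_area > 350000  THEN 'medium'
         ELSE 'small' END AS geosize_group
FROM demo_areas;

-- A is 'large', B is 'medium', C is 'small'.
SELECT name, geosize_group
FROM demo_areas_plus;
~~~~
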


## LEFT AND RIGHT JOIN
A LEFT JOIN returns all records from the left table, including those that do not match any row in the right table; for such records the right table's columns are NULL.

A RIGHT JOIN works identically to a LEFT JOIN, the only difference being that the direction of the operation is inverted: all records from the right table are kept.
- right joins are seldom applied in practice, since any right join can be rewritten as a left join by swapping the table order.
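
The following sketch shows both directions on two hypothetical tables, `demo_customers` and `demo_orders` (names and data are assumptions made for this example). Note that some engines, such as older SQLite versions, do not support RIGHT JOIN.

~~~~sql
-- Hypothetical tables, invented for illustration.
CREATE TABLE demo_customers (customer_id INT, name VARCHAR(50));
CREATE TABLE demo_orders    (order_id INT, customer_id INT);

INSERT INTO demo_customers VALUES (1, 'Ana'), (2, 'Ben');
INSERT INTO demo_orders    VALUES (100, 1), (101, 3);

-- LEFT JOIN: every customer is kept; Ben has no order, so his order_id is NULL.
-- Order 101 (customer 3 does not exist) is not returned.
SELECT c.name, o.order_id
FROM demo_customers c
LEFT JOIN demo_orders o ON c.customer_id = o.customer_id;

-- The same result written as a RIGHT JOIN with the table order swapped.
SELECT c.name, o.order_id
FROM demo_orders o
RIGHT JOIN demo_customers c ON c.customer_id = o.customer_id;
~~~~
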


## FULL JOINS
A FULL JOIN combines a LEFT JOIN and a RIGHT JOIN: it brings in all records from both the left and the right table and fills the columns of the missing side with NULL.
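
A minimal sketch, using two hypothetical tables `demo_left` and `demo_right` (invented for this example). FULL JOIN is not available everywhere; MySQL, for instance, does not support it.

~~~~sql
-- Hypothetical tables, invented for illustration.
CREATE TABLE demo_left  (id INT, left_val VARCHAR(10));
CREATE TABLE demo_right (id INT, right_val VARCHAR(10));

INSERT INTO demo_left  VALUES (1, 'L1'), (2, 'L2');
INSERT INTO demo_right VALUES (2, 'R2'), (3, 'R3');

-- Three rows come back:
--   id 1: present only on the left, right-side columns are NULL
--   id 2: matched on both sides
--   id 3: present only on the right, left-side columns are NULL
SELECT l.id AS left_id, l.left_val, r.id AS right_id, r.right_val
FROM demo_left l
FULL JOIN demo_right r ON l.id = r.id;
~~~~
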

## CROSS JOINS
A CROSS JOIN takes every value from one table and connects it with every value from the tables we want to join it with.
- connects all the values, not just those that match
- the result is the Cartesian product of the values of two or more sets
- particularly useful when the tables in a database are not well connected
- cross joins do not use ON or USING
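
A short sketch of the Cartesian product, assuming two hypothetical tables `demo_sizes` and `demo_colors` (names and data invented for illustration):

~~~~sql
-- Hypothetical tables, invented for illustration.
CREATE TABLE demo_sizes  (size VARCHAR(10));
CREATE TABLE demo_colors (color VARCHAR(10));

INSERT INTO demo_sizes  VALUES ('S'), ('M'), ('L');
INSERT INTO demo_colors VALUES ('red'), ('blue');

-- 3 sizes x 2 colors = 6 rows; there is no ON or USING clause.
SELECT s.size, c.color
FROM demo_sizes s
CROSS JOIN demo_colors c;
~~~~

Because the row count is the product of the table sizes, cross joins on large tables grow quickly.
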


### Tips and Tricks for JOINS
- look for key columns that are common to the tables involved in the analysis and are necessary to solve the task at hand
- these columns do not need to be foreign or primary keys

# SET THEORY CLAUSES

## UNIONs
Used to combine several SELECT statements into a single output
- you can think of it as a tool that allows you to unify tables
### UNION
UNION displays only distinct values in the output
- UNION uses more computational resources (power and storage space) than UNION ALL, because duplicates have to be detected and removed

### UNION ALL
UNION ALL retrieves the duplicates as well

Both follow the same pattern (drop ALL for a plain UNION):
~~~~sql
SELECT N columns
FROM table_1
UNION ALL SELECT N columns
FROM table_2;
~~~~

It is important to know that:
- We have to select the same number of columns from each table.
- These columns should be in the same order and contain related data types. The names do not have to match; the output takes its column names from the first SELECT.
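
The difference between the two can be sketched on a pair of hypothetical tables, `demo_2019` and `demo_2020`, that share one value (tables and data are invented for illustration):

~~~~sql
-- Hypothetical tables, invented for illustration; 'Oslo' appears in both.
CREATE TABLE demo_2019 (city VARCHAR(20));
CREATE TABLE demo_2020 (city VARCHAR(20));

INSERT INTO demo_2019 VALUES ('Lima'), ('Oslo');
INSERT INTO demo_2020 VALUES ('Oslo'), ('Cairo');

-- UNION removes duplicates: 3 rows (Lima, Oslo, Cairo).
SELECT city FROM demo_2019
UNION
SELECT city FROM demo_2020;

-- UNION ALL keeps them: 4 rows, with Oslo listed twice.
SELECT city FROM demo_2019
UNION ALL
SELECT city FROM demo_2020;
~~~~
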

## INTERSECT
INTERSECT includes only those records that are common to both queries across all selected fields.
- INTERSECT looks for whole RECORDS in common, not individual key fields as a join does when matching.
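
A small sketch of record-level matching, assuming two hypothetical tables `demo_a` and `demo_b` (invented for this example):

~~~~sql
-- Hypothetical tables, invented for illustration.
CREATE TABLE demo_a (code VARCHAR(3), year INT);
CREATE TABLE demo_b (code VARCHAR(3), year INT);

INSERT INTO demo_a VALUES ('AUS', 2010), ('BRA', 2010);
INSERT INTO demo_b VALUES ('AUS', 2010), ('BRA', 2015);

-- Whole records are compared, so only ('AUS', 2010) is returned.
-- ('BRA', 2010) and ('BRA', 2015) share a code but differ in year.
SELECT code, year FROM demo_a
INTERSECT
SELECT code, year FROM demo_b;
~~~~

A join on `code` alone would have matched both countries; INTERSECT matches only the record that is identical in every selected field.
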

## EXCEPT
EXCEPT allows you to include only the records that are in one table, but not the other.
- Only the records that appear in the left table BUT DO NOT appear in the right table are included.
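
As a sketch, with two hypothetical tables `demo_all_cities` and `demo_capitals` (names and data invented for illustration):

~~~~sql
-- Hypothetical tables, invented for illustration.
CREATE TABLE demo_all_cities (name VARCHAR(20));
CREATE TABLE demo_capitals   (name VARCHAR(20));

INSERT INTO demo_all_cities VALUES ('Sydney'), ('Canberra'), ('Perth');
INSERT INTO demo_capitals   VALUES ('Canberra');

-- Rows from the first query that do not appear in the second:
-- Sydney and Perth.
SELECT name FROM demo_all_cities
EXCEPT
SELECT name FROM demo_capitals;
~~~~
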

## Semi-joins and Anti-joins
Semi-joins and anti-joins use the right table only to determine which records to keep in the left table.
- Semi-joins: instead of combining the two tables, we filter the left table with a WHERE clause whose condition is a subquery on the right table; only left-table records with a match in the subquery are kept.

~~~~sql
-- Select distinct fields
SELECT DISTINCT name
  -- From languages
  FROM languages
-- Where in statement
WHERE country_code IN
  -- Subquery
  (SELECT code
   FROM countries
   WHERE region = 'Middle East')
-- Order by name
ORDER BY name;
~~~~

- Anti-joins: add NOT in front of IN, so that only the left-table records absent from the subquery result are kept.
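
A minimal anti-join sketch that mirrors the semi-join query above. The tables `demo_countries` and `demo_languages`, their columns, and their rows are hypothetical stand-ins, not the real dataset.

~~~~sql
-- Hypothetical tables, invented for illustration.
CREATE TABLE demo_countries (code VARCHAR(3), region VARCHAR(30));
CREATE TABLE demo_languages (country_code VARCHAR(3), name VARCHAR(30));

INSERT INTO demo_countries VALUES ('EGY', 'Middle East'), ('FRA', 'Western Europe');
INSERT INTO demo_languages VALUES ('EGY', 'Arabic'), ('FRA', 'French');

-- Anti-join: keep languages whose country is NOT in the subquery result.
-- Only 'French' is returned.
SELECT DISTINCT name
FROM demo_languages
WHERE country_code NOT IN
  (SELECT code
   FROM demo_countries
   WHERE region = 'Middle East')
ORDER BY name;
~~~~

One caveat: if the subquery can return NULL, NOT IN yields no rows at all, so filtering out NULLs in the subquery (or using NOT EXISTS) is a common safeguard.
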