# JOIN statements
© Explore Data Science Academy

## Learning Objectives

In this train we will explore different SQL JOIN statements:

- Inner JOIN 
- Left JOIN
- Cross JOIN
- Full outer JOIN
- Union operator

## Outline
This train is structured as follows:

- Inner JOIN - Finding common information between tables
- Left JOIN - Checking for missing information
- Cross JOIN - Finding combinations of table rows
- Union operator - Stacking the rows of similar tables
- Full outer JOIN - Finding the column-wise Union of two tables

### Loading the database
Load SQL magic commands

In [1]:
%load_ext sql

Load Chinook SQLite database

In [2]:
%%sql 

sqlite:///chinook.db

Chinook database ER diagram:

<img src="https://github.com/Explore-AI/Pictures/blob/master/sqlite-sample-database-color.jpg?raw=true" width=70%/>

_[Image source](https://www.sqlitetutorial.net/sqlite-sample-database/)_

## JOIN Statements

A JOIN statement is a query that combines records from two or more tables into a single result table. Joins are performed using **keys** as the link. There are four main join statements, namely: the left, right, inner and outer join. In this train, we explore them in detail and provide examples of the SQL queries that implement them.
Previously, to write queries spanning multiple tables, we connected the tables by adding conditions to the `WHERE` clause that aligned records sharing the same value in a given column. JOIN statements express the same connection explicitly, which keeps the join logic separate from the row filters and makes multi-table queries easier to read and maintain. We also don't have to track the in-between connections among the tables inside the `WHERE` clause.

JOIN statements in SQL are closely related to **set operations**: we regard each of the tables to be joined as a set, and then define the JOIN as a set operation on the overlapping data between the two tables.

The general syntax of a JOIN statement is as follows:

```sql
SELECT column(s)
FROM table1
<join_type> JOIN table2
ON <Join condition>
```
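To make the comparison with the older `WHERE`-based approach concrete, here is a minimal sketch using Python's built-in `sqlite3` module. It assumes a toy in-memory database whose rows are made up for illustration (it is not the Chinook file), and the variable names are hypothetical.

```python
import sqlite3

# Toy in-memory database; the rows are illustrative, not Chinook data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE artists (ArtistId INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE albums  (AlbumId INTEGER PRIMARY KEY, Title TEXT, ArtistId INTEGER);
    INSERT INTO artists VALUES (1, 'AC/DC'), (2, 'Accept');
    INSERT INTO albums  VALUES (10, 'Let There Be Rock', 1), (11, 'Balls to the Wall', 2);
""")

# Older style: list both tables and align them in the WHERE clause.
implicit_rows = conn.execute("""
    SELECT al.Title, ar.Name
    FROM albums AS al, artists AS ar
    WHERE al.ArtistId = ar.ArtistId
    ORDER BY al.AlbumId
""").fetchall()

# JOIN style: the link between the tables is stated in the ON clause.
explicit_rows = conn.execute("""
    SELECT al.Title, ar.Name
    FROM albums AS al
    INNER JOIN artists AS ar
    ON al.ArtistId = ar.ArtistId
    ORDER BY al.AlbumId
""").fetchall()

print(explicit_rows)
print(implicit_rows == explicit_rows)
```

Both queries return the same rows here; the JOIN version simply keeps the linking condition apart from any later row filters.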

## 1. INNER JOIN

An INNER JOIN returns only the records that satisfy the join condition in both tables. In set terms, finding the INNER JOIN of two tables is the same as finding their intersection.

Let's consider some examples:

### 1.1 Finding common information between tables
Sometimes artists add a title track to their albums. This is a track that has the same title as the album. Let's write a query that returns albums that have a title track. In English:

    Return the AlbumId and Title columns from the albums table and the Name column from the tracks table, for rows where the albums table Title column matches the tracks table Name column.
    
In SQL:

In [4]:
%%sql

SELECT a.AlbumId, a.Title AS "Album Title", t.Name AS "Track Name" 
FROM albums a
INNER JOIN tracks AS t
ON a.Title = t.Name
LIMIT 10;  -- Remove this line to see the full query output

 * sqlite:///chinook.db
Done.


AlbumId,Album Title,Track Name
2,Balls to the Wall,Balls to the Wall
3,Restless and Wild,Restless and Wild
4,Let There Be Rock,Let There Be Rock
152,Master Of Puppets,Master Of Puppets
11,Out Of Exile,Out Of Exile
16,Black Sabbath,Black Sabbath
18,Body Count,Body Count
19,Chemical Wedding,Chemical Wedding
21,Prenda Minha,Prenda Minha
23,Minha Historia,Minha Historia


As expected, our query returned only the rows from both tables that have the same value on the joining columns. Put differently, the query excluded albums whose title matches no track name, and tracks whose name matches no album title.
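The exclusion behaviour described above can be checked on a tiny example. This is a hedged sketch on a made-up in-memory database (three tracks, two albums, invented names), not the Chinook data.

```python
import sqlite3

# Toy in-memory database; album and track names are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE albums (AlbumId INTEGER PRIMARY KEY, Title TEXT);
    CREATE TABLE tracks (TrackId INTEGER PRIMARY KEY, Name TEXT, AlbumId INTEGER);
    INSERT INTO albums VALUES (1, 'Album One'), (2, 'Album Two');
    INSERT INTO tracks VALUES
        (1, 'Album One', 1),      -- title track: Name equals an album Title
        (2, 'Opening Song', 1),   -- no album has this title
        (3, 'Closing Song', 2);   -- no album has this title
""")

# Same join condition as the query above: album Title = track Name.
title_tracks = conn.execute("""
    SELECT a.AlbumId, a.Title, t.Name
    FROM albums AS a
    INNER JOIN tracks AS t
    ON a.Title = t.Name
""").fetchall()

print(title_tracks)
```

Only the single matching pair survives: 'Album Two' and the two non-matching tracks are absent from the result.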

### 1.2. Joining multiple tables
Suppose that, in the previous example, we also wanted to know who the artists of the listed albums are. Let's write a query that can achieve this. In English:

    Return the AlbumId and Title columns from the albums table, the Name column from the tracks table, and the Name column from the artists table, for rows where the albums table Title matches the tracks table Name and the artists table ArtistId equals the albums table ArtistId.
    
In SQL:

In [5]:
%%sql

SELECT a.AlbumId, a.Title AS "Album Title", t.Name AS "Track Name", ar.Name AS "Artist Name"
FROM albums AS a
INNER JOIN tracks AS t
ON a.Title = t.Name
INNER JOIN artists AS ar
ON ar.ArtistId = a.ArtistId
LIMIT 10;  -- Remove this line to see the full query output

 * sqlite:///chinook.db
Done.


AlbumId,Album Title,Track Name,Artist Name
2,Balls to the Wall,Balls to the Wall,Accept
3,Restless and Wild,Restless and Wild,Accept
4,Let There Be Rock,Let There Be Rock,AC/DC
11,Out Of Exile,Out Of Exile,Audioslave
16,Black Sabbath,Black Sabbath,Black Sabbath
16,Black Sabbath,Black Sabbath,Black Sabbath
18,Body Count,Body Count,Body Count
19,Chemical Wedding,Chemical Wedding,Bruce Dickinson
21,Prenda Minha,Prenda Minha,Caetano Veloso
23,Minha Historia,Minha Historia,Chico Buarque


Other types of JOINs can be chained in the same way.
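As an illustration of mixing join types in one query, the sketch below chains an INNER JOIN with a LEFT JOIN. It runs against a toy in-memory database with invented rows, so the names and values are assumptions rather than Chinook contents.

```python
import sqlite3

# Toy in-memory database; all rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE artists (ArtistId INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE albums  (AlbumId INTEGER PRIMARY KEY, Title TEXT, ArtistId INTEGER);
    CREATE TABLE tracks  (TrackId INTEGER PRIMARY KEY, Name TEXT, AlbumId INTEGER);
    INSERT INTO artists VALUES (1, 'Artist A'), (2, 'Artist B');
    INSERT INTO albums  VALUES (1, 'Album One', 1), (2, 'Album Two', 2);
    INSERT INTO tracks  VALUES (1, 'Song X', 1);  -- 'Album Two' has no tracks
""")

# INNER JOIN attaches the artist; LEFT JOIN keeps albums without tracks.
chained_rows = conn.execute("""
    SELECT ar.Name, al.Title, t.Name
    FROM albums AS al
    INNER JOIN artists AS ar
    ON ar.ArtistId = al.ArtistId
    LEFT JOIN tracks AS t
    ON t.AlbumId = al.AlbumId
    ORDER BY al.AlbumId
""").fetchall()

print(chained_rows)
```

The album without tracks still appears, with `None` (NULL) in the track column, because the second join is a LEFT JOIN.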

## 2. LEFT JOIN

When joining two tables, a LEFT JOIN returns all records from the left table (table 1) together with the matched records from the right table (table 2).

If no match is found for a row of the left table, the columns coming from the right table are NULL on that row.

### 2.1. Checking for missing information
One use case for a LEFT JOIN is checking for missing information. In this case, let's try to find out which tracks have not been bought yet (i.e. do not appear as an item on any invoice).

*Note that we limit our query results to 10 rows to keep the output legible.*

In [6]:
%%sql

SELECT t.TrackId, ii.InvoiceId 
FROM tracks t
LEFT JOIN invoice_items ii
ON t.TrackId = ii.TrackId
LIMIT 10;  -- Remove this line to see the full query output

 * sqlite:///chinook.db
Done.


TrackId,InvoiceId
1,108.0
6,2.0
7,
8,2.0
8,214.0
9,108.0
9,319.0
10,2.0
11,
12,2.0


In the results, the tracks that have a value of `None` (i.e. `NULL`) for InvoiceId are the ones that have not been purchased yet. If we want to only focus on these 'unpopular' tracks, we can do that by adding a WHERE clause as follows:

In [7]:
%%sql

SELECT t.TrackId, ii.InvoiceId 
FROM tracks AS t
LEFT JOIN invoice_items AS ii
ON t.TrackId = ii.TrackId
WHERE ii.InvoiceId IS NULL
LIMIT 10;  -- Remove this line to see the full query output

 * sqlite:///chinook.db
Done.


TrackId,InvoiceId
7,
11,
17,
18,
22,
23,
27,
29,
33,
34,


Armed with this information, Chinook can either take down these 'unpopular' tracks or promote them by recommending them to their customers.

A RIGHT JOIN works exactly the same way as a LEFT JOIN, except that it keeps all rows of the right table instead of the left. We can also express any RIGHT JOIN by swapping the tables around in a LEFT JOIN statement, which is useful because SQLite only added support for `RIGHT JOIN` in version 3.39.0.
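The table-swapping idea can be sketched as follows. The example uses a toy in-memory database with made-up rows, and it only runs the `RIGHT JOIN` form when the SQLite library bundled with Python is version 3.39.0 or newer.

```python
import sqlite3

# Toy in-memory database; the rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tracks (TrackId INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE invoice_items (
        InvoiceLineId INTEGER PRIMARY KEY, InvoiceId INTEGER, TrackId INTEGER);
    INSERT INTO tracks VALUES (1, 'Song A'), (2, 'Song B'), (3, 'Song C');
    INSERT INTO invoice_items VALUES (1, 100, 1), (2, 101, 1), (3, 101, 3);
""")

# tracks is the left table, so every track is kept.
left_rows = conn.execute("""
    SELECT t.TrackId, ii.InvoiceId
    FROM tracks AS t
    LEFT JOIN invoice_items AS ii
    ON t.TrackId = ii.TrackId
    ORDER BY t.TrackId, ii.InvoiceId
""").fetchall()
print(left_rows)

# The same result written as a RIGHT JOIN with the tables swapped.
right_rows = None  # stays None on SQLite versions without RIGHT JOIN
if sqlite3.sqlite_version_info >= (3, 39, 0):
    right_rows = conn.execute("""
        SELECT t.TrackId, ii.InvoiceId
        FROM invoice_items AS ii
        RIGHT JOIN tracks AS t
        ON t.TrackId = ii.TrackId
        ORDER BY t.TrackId, ii.InvoiceId
    """).fetchall()
    print(right_rows == left_rows)
```

Track 2 has no invoice item, so it appears once with `None` (NULL) as its InvoiceId in both forms.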

## 3. CROSS JOIN

Taking the CROSS JOIN of two tables is equivalent to taking their **Cartesian product**. The result contains every possible pairing of a row from the first table with a row from the second, so two tables with m and n rows produce m × n rows.

Let's illustrate this with an example.

### 3.1. Finding combinations of table rows
Let's suppose that, as part of a new business strategy, Chinook wants to develop new product categories for their media items that are based on genre and media type. To do this, we write a query that will list all possible product categories (i.e. all possible genre and media type combinations).

In [9]:
%%sql 

SELECT g.Name AS "Genre", m.Name AS "Media Type"
FROM genres AS g
CROSS JOIN media_types AS m
LIMIT 10;  -- Remove this line to see the full query output

 * sqlite:///chinook.db
Done.


Genre,Media Type
Rock,MPEG audio file
Rock,Protected AAC audio file
Rock,Protected MPEG-4 video file
Rock,Purchased AAC audio file
Rock,AAC audio file
Jazz,MPEG audio file
Jazz,Protected AAC audio file
Jazz,Protected MPEG-4 video file
Jazz,Purchased AAC audio file
Jazz,AAC audio file


## 4. UNION
The UNION operator is useful when we need to combine tables or query results along the vertical axis, i.e., when we want to stack the rows of two tables. However, there are a few rules that we have to follow:

1. The number of columns in both tables needs to be equal.
2. The columns that we want to combine need to have compatible datatypes.
3. We can only apply the ORDER BY clause to the combined (i.e. UNIONised) table and not to the individual tables.
4. The GROUP BY clause can only be applied to the individual queries (i.e. before the UNION operation) and not directly to the combined result.
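Two of the rules above (equal column counts, and ORDER BY applying to the combined result) can be checked with a small sketch. It assumes a toy in-memory SQLite database with invented rows; the variable names are hypothetical.

```python
import sqlite3

# Toy in-memory database; the rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tracks (TrackId INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE albums (AlbumId INTEGER PRIMARY KEY, Title TEXT);
    INSERT INTO tracks VALUES (1, 'Snowballed'), (2, 'Evil Walks');
    INSERT INTO albums VALUES (1, 'Let There Be Rock');
""")

# Rule 3: a single ORDER BY at the end sorts the combined result.
stacked = conn.execute("""
    SELECT Name FROM tracks
    UNION
    SELECT Title FROM albums
    ORDER BY Name
""").fetchall()
print(stacked)

# Rule 1: two columns on the left and one on the right is rejected.
try:
    conn.execute("SELECT TrackId, Name FROM tracks UNION SELECT Title FROM albums")
    mismatch_error = None
except sqlite3.OperationalError as exc:
    mismatch_error = str(exc)
print(mismatch_error)
```

The combined column takes its name from the first query, which is why `ORDER BY Name` works even though the second query selects `Title`.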

Let's see what this looks like in practice.

### 4.1. Stacking the rows of similar tables
We can use the UNION operator to combine information from the tracks and albums tables into a single list as follows:

In [11]:
%%sql

SELECT t.AlbumId, t.Name, 'Track' AS "Category"
FROM tracks AS t

UNION

SELECT a.AlbumId, a.Title, 'Album' AS "Category"
FROM albums AS a

LIMIT 10;  -- Remove this line to see the full query output

 * sqlite:///chinook.db
Done.


AlbumId,Name,Category
1,Breaking The Rules,Track
1,C.O.D.,Track
1,Evil Walks,Track
1,For Those About To Rock (We Salute You),Track
1,For Those About To Rock We Salute You,Album
1,Inject The Venom,Track
1,Let's Get It Up,Track
1,Night Of The Long Knives,Track
1,Put The Finger On You,Track
1,Snowballed,Track


As you can see, the two result tables have been combined into one longer table containing rows from both of them. We also used a trick here to label where each row came from: we used `SELECT` on a constant string, which returns a column that has that string for every row in the table. For example:

In [12]:
%%sql 

SELECT Name, 'A string to broadcast' AS "Label"
FROM genres
LIMIT 10;  -- Remove this line to see the full query output

 * sqlite:///chinook.db
Done.


Name,Label
Rock,A string to broadcast
Jazz,A string to broadcast
Metal,A string to broadcast
Alternative & Punk,A string to broadcast
Rock And Roll,A string to broadcast
Blues,A string to broadcast
Latin,A string to broadcast
Reggae,A string to broadcast
Pop,A string to broadcast
Soundtrack,A string to broadcast


**Broadcasting** a string to all rows of each resulting table in this way means that we can 'label' each row when using the UNION operator.

The UNION operator removes duplicate rows from the combined result. However, we can keep these duplicate rows by adding the `ALL` keyword to the UNION clause, i.e., `UNION ALL`.
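The difference between `UNION` and `UNION ALL` is easiest to see when the two inputs share a value. Here is a minimal sketch on a toy in-memory database with invented rows, where 'Audioslave' appears in both tables.

```python
import sqlite3

# Toy in-memory database; 'Audioslave' appears in both tables on purpose.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE artists (Name TEXT);
    CREATE TABLE albums (Title TEXT);
    INSERT INTO artists VALUES ('Audioslave'), ('Accept');
    INSERT INTO albums VALUES ('Audioslave'), ('Out Of Exile');
""")

# UNION drops the repeated 'Audioslave' row.
union_rows = conn.execute(
    "SELECT Name FROM artists UNION SELECT Title FROM albums ORDER BY 1"
).fetchall()

# UNION ALL keeps every row from both queries.
union_all_rows = conn.execute(
    "SELECT Name FROM artists UNION ALL SELECT Title FROM albums ORDER BY 1"
).fetchall()

print(len(union_rows), len(union_all_rows))
```

`UNION ALL` also skips the de-duplication step, so it is a reasonable default when the inputs are known not to overlap.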

## 5. OUTER JOIN

We perform an OUTER JOIN of two tables when we want to keep all rows from at least one of the tables. There are three types of outer joins:

- LEFT OUTER JOIN - this is exactly the same as a LEFT JOIN. It retains all the rows in the left table and only returns values from the right table where the rows have a match on the joining column. Rows without such matches will have NULL values in the right-table columns.

- RIGHT OUTER JOIN - this is exactly the same as a RIGHT JOIN. It retains all the rows in the right table and only returns values from the left table where the rows have a match on the joining column. Rows without such matches will have NULL values in the left-table columns.

- FULL OUTER JOIN - this retains all the rows from both tables, whether or not they have a match on the joining column. Where a row has no match, the columns of the other table are NULL. In other words, a FULL OUTER JOIN is the same as taking the left and right joins of two tables simultaneously. This particular type of join statement is similar to the union set operator.

Let's do an example.

### 5.1.  FULL OUTER JOIN
In some cases, artists will create eponymous albums, i.e., albums with the same name as the artist. Let's write a query that highlights all eponymous albums alongside the non-eponymous ones.

SQLite versions before 3.39.0 do not support the `FULL OUTER JOIN` statement. As such, we will emulate it using LEFT JOINs and the UNION ALL operator.

In [13]:
%%sql

SELECT ar.Name, a.Title
FROM artists ar
LEFT JOIN albums AS a
ON ar.Name = a.Title

UNION ALL

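SELECT ar.Name, a.Title
FROM albums AS a 
LEFT JOIN artists AS ar
ON ar.Name = a.Title
WHERE ar.Name IS NULL  -- Keep only albums with no matching artist, so matched rows are not listed twice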

LIMIT 10;  -- Remove this line to see the full query output

 * sqlite:///chinook.db
Done.


Name,Title
AC/DC,
Accept,
Aerosmith,
Alanis Morissette,
Alice In Chains,
Antônio Carlos Jobim,
Apocalyptica,
Audioslave,Audioslave
BackBeat,
Billy Cobham,


As you can see, this query combines information from both tables, but it only has complete information on rows where the joining condition is satisfied.
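One common way to emulate a FULL OUTER JOIN without listing matched rows twice is to stack a LEFT JOIN on top of a second LEFT JOIN that keeps only the unmatched rows. The sketch below shows this on a toy in-memory database with invented rows; where the bundled SQLite is version 3.39.0 or newer, it also compares the result with the native `FULL OUTER JOIN`.

```python
import sqlite3

# Toy in-memory database; the rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE artists (Name TEXT);
    CREATE TABLE albums (Title TEXT);
    INSERT INTO artists VALUES ('Audioslave'), ('Accept');
    INSERT INTO albums VALUES ('Audioslave'), ('Out Of Exile');
""")

# Part 1: all artists, with their eponymous album if there is one.
# Part 2: only the albums that matched no artist (anti-join filter).
emulated_rows = conn.execute("""
    SELECT ar.Name, a.Title
    FROM artists AS ar
    LEFT JOIN albums AS a
    ON ar.Name = a.Title

    UNION ALL

    SELECT ar.Name, a.Title
    FROM albums AS a
    LEFT JOIN artists AS ar
    ON ar.Name = a.Title
    WHERE ar.Name IS NULL
""").fetchall()
print(emulated_rows)

# Optional cross-check against the native statement on newer SQLite.
native_rows = None  # stays None on SQLite versions without FULL OUTER JOIN
if sqlite3.sqlite_version_info >= (3, 39, 0):
    native_rows = conn.execute("""
        SELECT ar.Name, a.Title
        FROM artists AS ar
        FULL OUTER JOIN albums AS a
        ON ar.Name = a.Title
    """).fetchall()
    print(set(native_rows) == set(emulated_rows))
```

The matched pair appears exactly once, and each unmatched artist or album appears once with `None` (NULL) on the other side.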

## Conclusion

In this train, we discussed how different tables can be combined along the horizontal and vertical axes using JOIN statements and the UNION operator, particularly:
- LEFT JOIN
- INNER JOIN
- CROSS JOIN
- UNION
- FULL OUTER JOIN

JOIN statements can be a powerful tool in SQL when used correctly. It is important to note that key columns are a good starting point when thinking about which columns to join any two tables on.

## Additional Links 

- [SQL JOIN statement quick reference](https://db.apache.org/derby/docs/10.13/ref/rrefsqlj29840.html)
- [The UNION operator](https://www.sqlitetutorial.net/sqlite-union/)
- [Emulating the FULL OUTER JOIN in SQLite](https://www.sqlitetutorial.net/sqlite-full-outer-join/)