# Introduction to Set Theory 
© Explore Data Science Academy

## Objectives
In this train you will learn how to:
- Understand the use of set theory in SQL;
- Apply the UNION, INTERSECT, and EXCEPT operators to a database;
- Follow the rules that govern set operations in SQL.


## Outline
This train is structured as follows:
- An introduction to set theory 
- The UNION operator
- The INTERSECT operator
- The EXCEPT operator

## An introduction to set theory 

Although you can work with the rows of a single table, or of several tables joined together with JOIN statements, relational databases are fundamentally built around sets. In this train we will learn about the set operators `UNION`, `INTERSECT` and `EXCEPT` in SQL. These operators are derived from [set theory](https://www.youtube.com/watch?v=tyDKR4FG3Yw) and are used to combine the results of two queries into a single result. Before we jump into using set operators in SQL, let's look at the theory behind them.

### What is a set?

- A *set* is a collection of *objects*, each of which is an element of the set. For example:

    - The set of outcomes of a single dice roll is {1, 2, 3, 4, 5, 6}. 1 is an element of this set, but 7 is not.

    - The set of possible traffic light signals is {Red, Amber, Green}. Green is an element of this set, but Purple is not.
    
    
### Subsets

- A subset is a set which is wholly contained in another set.

    - Taking the set of traffic light signals, {Red, Amber, Green}, the set {Red, Green} is a subset of it, as are {Amber} and {Amber, Green}.
    
    - However, {Red, Purple} is not a subset, because Purple is not an element of the full set.
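To make these definitions concrete, the two example sets can be modelled with Python's built-in `set` type (Python being the language this notebook runs on). This is a minimal sketch; the variable names are illustrative only:

```python
# The example sets above, modelled as Python sets
dice_outcomes = {1, 2, 3, 4, 5, 6}
traffic_lights = {"Red", "Amber", "Green"}

# Membership: is an object an element of the set?
print(1 in dice_outcomes)          # True
print(7 in dice_outcomes)          # False
print("Purple" in traffic_lights)  # False

# Subsets: every element of the smaller set must be in the larger one
print({"Red", "Green"} <= traffic_lights)   # True
print({"Amber"} <= traffic_lights)          # True
print({"Red", "Purple"} <= traffic_lights)  # False, Purple is not an element
```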
    
    
From a set theory perspective, you can consider a database table as a set and each of its rows as an element. Seen this way, a query selects a desired **subset** of the rows in the database. The result set of a query can itself be treated as a new table with similar characteristics to a standard database table: it is now a set and can be used as the input to new queries.
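The idea that a query selects a subset of a table's rows can be checked directly. The sketch below uses Python's standard `sqlite3` module with a hypothetical in-memory `Customer` table (not the Northwind one) and compares the rows as Python sets. One caveat: a Python `set` collapses duplicate rows, whereas a SQL table may hold duplicates.

```python
import sqlite3

# Hypothetical toy table standing in for a real database table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customer (Name TEXT, Country TEXT)")
conn.executemany(
    "INSERT INTO Customer VALUES (?, ?)",
    [("Ana", "Mexico"), ("Bo", "UK"), ("Cy", "USA"), ("Di", "UK")],
)

# The whole table viewed as a set of rows
all_rows = set(conn.execute("SELECT Name, Country FROM Customer"))

# A filtered query selects a subset of those rows
uk_rows = set(conn.execute("SELECT Name, Country FROM Customer WHERE Country = 'UK'"))
print(uk_rows <= all_rows)  # True

# The result set can itself be queried, here as a derived table
names = conn.execute(
    "SELECT Name FROM (SELECT Name, Country FROM Customer WHERE Country = 'UK') AS uk "
    "ORDER BY Name"
).fetchall()
print(names)  # [('Bo',), ('Di',)]
```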


### Set operators

**Union**
* $A \cup B$ – union of sets $A$ and $B$;
* Combines Table $A$ and Table $B$, removing all duplicates.

**Intersect**
* $A \cap B$ – intersection of sets $A$ and $B$;
* Returns only the rows found in both Table $A$ and Table $B$.

**Except**
* $A - B$ – everything in set $A$ except what is also in set $B$;
* Returns Table $A$ minus any overlap with Table $B$.
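The three operators map directly onto Python's set operators `|`, `&` and `-`. A minimal sketch with two made-up sets of city names:

```python
# Two small illustrative sets, A and B
A = {"London", "Seattle", "Tacoma"}
B = {"London", "Berlin"}

union = A | B          # A ∪ B: elements in A or B; the shared 'London' appears once
intersection = A & B   # A ∩ B: elements in both A and B
difference = A - B     # A - B: elements of A that are not in B

print(sorted(union))         # ['Berlin', 'London', 'Seattle', 'Tacoma']
print(sorted(intersection))  # ['London']
print(sorted(difference))    # ['Seattle', 'Tacoma']
```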

We are almost there; we just need to go over a few rules before we can start using set operators.

### Set operation rules

The following rules must be followed when using any of the set operators above in a query:

1. The number of columns returned by each SELECT statement must be equal.
2. The corresponding columns (matched by position) must contain compatible datatypes.
3. We can only apply the ORDER BY clause to the combined (i.e. UNIONised) result and not to the individual queries.
4. The GROUP BY clause can only be applied to the individual queries (i.e. before the UNION operation) and not to the combined result.
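The first rule, and the fact that columns are matched by position rather than by name, can be seen in a small sketch using Python's `sqlite3` module and two hypothetical in-memory tables, `a` and `b`. The exact error text comes from SQLite and may differ between versions, so the sketch only checks that an error is raised:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (city TEXT, country TEXT)")
conn.execute("CREATE TABLE b (town TEXT, nation TEXT)")
conn.execute("INSERT INTO a VALUES ('Seattle', 'USA')")
conn.execute("INSERT INTO b VALUES ('London', 'UK')")

# Rule 1: both SELECTs must return the same number of columns
column_count_error = None
try:
    conn.execute("SELECT city, country FROM a UNION SELECT town FROM b")
except sqlite3.OperationalError as err:
    column_count_error = str(err)
    print("Error:", err)

# Columns are matched by position; the result takes its names from the first SELECT
cur = conn.execute(
    "SELECT city AS place, country FROM a UNION SELECT town, nation FROM b ORDER BY place"
)
column_names = [col[0] for col in cur.description]
rows = cur.fetchall()
print(column_names)  # ['place', 'country']
print(rows)          # [('London', 'UK'), ('Seattle', 'USA')]
```

Note that SQLite is dynamically typed and generally does not reject a UNION of columns with different types, so rule 2 matters most for portability to stricter databases.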

Now that we have all our bases covered, let's explore how each operator works using the Northwind database. 

## Loading the database
Load the SQL magic command to set up the SQL environment.

```python
%load_ext sql
```

Load the Northwind SQLite database:

```sql
%%sql
sqlite:///Northwind_small.sqlite
```

In this train we will be using the Northwind database, which contains the sales data for a fictitious company called “Northwind Traders”. The company's primary business is the global import and export of specialty foods.

For your convenience, below is an ER diagram of the Northwind database:

<img src= "https://github.com/Explore-AI/Pictures/blob/master/Northwind_ERD.png?raw=true" width=100%/>



_[Source](https://github.com/jpwhite3/northwind-SQLite3)_

## The UNION Operator
The UNION operator is used to combine table rows from two or more different queries into one result. 
  
Below is a Venn diagram that will help illustrate how the UNION operator works:

<img src= "https://github.com/Explore-AI/Pictures/blob/master/set_union.png?raw=true" width=70%/>


We can define the **union** of Tables $A$ and $B$ as $A \cup B$: the set of all elements that belong to $A$, to $B$, or to both.

When the UNION operator is used, it combines two or more result sets and removes any duplicate rows. If you want to keep all rows from both queries (i.e. including duplicates), you can add the `ALL` keyword. We will see how this works in a bit.

The general syntax of a `UNION` operator is as follows:

```sql
SELECT column_list FROM table1
UNION
SELECT column_list FROM table2;
```
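To see the duplicate removal in isolation, here is a sketch that runs this syntax against two hypothetical in-memory tables, `table1` and `table2`, using Python's `sqlite3` module. 'London' appears in both tables but only once in the result:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (city TEXT)")
conn.execute("CREATE TABLE table2 (city TEXT)")
conn.executemany("INSERT INTO table1 VALUES (?)", [("London",), ("Seattle",)])
conn.executemany("INSERT INTO table2 VALUES (?)", [("London",), ("Berlin",)])

# UNION keeps a single copy of 'London', even though it appears in both tables
rows = conn.execute(
    "SELECT city FROM table1 UNION SELECT city FROM table2 ORDER BY city"
).fetchall()
print(rows)  # [('Berlin',), ('London',), ('Seattle',)]
```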

## The UNION of Tables 
Now that we have learnt the basics of the UNION operator, let's see what it looks like in practice.

We can use the UNION operator to combine information from the customer and supplier tables into a single table. 


```sql
%%sql
SELECT ID, address, region, country, ContactTitle, ContactName FROM Customer
UNION
SELECT ID, address, region, country, ContactTitle, ContactName FROM Supplier
LIMIT 10; -- Remove this line to see the full query output
```



| Id | Address | Region | Country | ContactTitle | ContactName |
|---|---|---|---|---|---|
| 1 | 49 Gilbert St. | British Isles | UK | Purchasing Manager | Charlotte Cooper |
| 2 | P.O. Box 78934 | North America | USA | Order Administrator | Shelley Burke |
| 3 | 707 Oxford Rd. | North America | USA | Sales Representative | Regina Murphy |
| 4 | 9-8 Sekimai Musashino-shi | Eastern Asia | Japan | Marketing Manager | Yoshi Nagase |
| 5 | Calle del Rosal 4 | Southern Europe | Spain | Export Administrator | Antonio del Valle Saavedra |
| 6 | 92 Setsuko Chuo-ku | Eastern Asia | Japan | Marketing Representative | Mayumi Ohno |
| 7 | 74 Rose St. Moonie Ponds | Victoria | Australia | Marketing Manager | Ian Devling |
| 8 | 29 King's Way | British Isles | UK | Sales Representative | Peter Wilson |
| 9 | Kaloadagatan 13 | Northern Europe | Sweden | Sales Agent | Lars Peterson |
| 10 | Av. das Americanas 12.890 | South America | Brazil | Marketing Manager | Carlos Diaz |


### Sorting UNION query results
We can sort the result of a combined query by adding an `ORDER BY` clause after the last query. When specifying column names in the `ORDER BY` clause, choose them from the column names of the first query. Let's give this a try.

```sql
%%sql
SELECT City, Region FROM Customer
WHERE Region='North America'
UNION
SELECT City, Region FROM Supplier
WHERE Region='North America'
ORDER BY City
LIMIT 10; -- Remove this line to see the full query output
```



| City | Region |
|---|---|
| Albuquerque | North America |
| Anchorage | North America |
| Ann Arbor | North America |
| Bend | North America |
| Boise | North America |
| Boston | North America |
| Butte | North America |
| Elgin | North America |
| Eugene | North America |
| Kirkland | North America |
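Rule 3 from earlier can be checked the same way. This sketch builds miniature, hypothetical `Customer` and `Supplier` tables in memory (not the Northwind data) and shows that one ORDER BY at the end sorts the combined result, while an ORDER BY placed before UNION is rejected by SQLite:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customer (City TEXT, Region TEXT)")
conn.execute("CREATE TABLE Supplier (City TEXT, Region TEXT)")
conn.executemany("INSERT INTO Customer VALUES (?, ?)",
                 [("Eugene", "North America"), ("Boise", "North America")])
conn.executemany("INSERT INTO Supplier VALUES (?, ?)",
                 [("Boston", "North America"), ("Boise", "North America")])

# Valid: a single ORDER BY after the last SELECT sorts the combined result
sorted_rows = conn.execute(
    "SELECT City, Region FROM Customer UNION "
    "SELECT City, Region FROM Supplier ORDER BY City"
).fetchall()
print(sorted_rows)
# [('Boise', 'North America'), ('Boston', 'North America'), ('Eugene', 'North America')]

# Invalid: an ORDER BY attached to the first SELECT is rejected
misplaced_error = None
try:
    conn.execute(
        "SELECT City, Region FROM Customer ORDER BY City UNION "
        "SELECT City, Region FROM Supplier"
    )
except sqlite3.OperationalError as err:
    misplaced_error = str(err)
print(misplaced_error is not None)  # True
```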


As mentioned, the UNION operator can be extended with the `ALL` keyword, i.e. `UNION ALL`. Let's put this into practice.

```sql
%%sql
SELECT city, region, country FROM Employee
UNION ALL
SELECT city, region, country FROM Supplier
LIMIT 10; -- Remove this line to see the full query output
```



| City | Region | Country |
|---|---|---|
| Seattle | North America | USA |
| Tacoma | North America | USA |
| Kirkland | North America | USA |
| Redmond | North America | USA |
| London | British Isles | UK |
| London | British Isles | UK |
| London | British Isles | UK |
| Seattle | North America | USA |
| London | British Isles | UK |
| London | British Isles | UK |


This result combines rows from the Employee and Supplier tables, but unlike `UNION`, `UNION ALL` keeps every row, including duplicates, which is why cities such as London and Seattle appear more than once.
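A quick way to see the difference is to count rows. In this sketch, with hypothetical in-memory `Employee` and `Supplier` tables holding only a City column, 'London' appears twice in `Employee` and once in `Supplier`; UNION keeps it once, while UNION ALL keeps all three copies:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (City TEXT)")
conn.execute("CREATE TABLE Supplier (City TEXT)")
conn.executemany("INSERT INTO Employee VALUES (?)", [("London",), ("London",), ("Seattle",)])
conn.executemany("INSERT INTO Supplier VALUES (?)", [("London",), ("Tokyo",)])

union_rows = conn.execute(
    "SELECT City FROM Employee UNION SELECT City FROM Supplier"
).fetchall()
union_all_rows = conn.execute(
    "SELECT City FROM Employee UNION ALL SELECT City FROM Supplier"
).fetchall()

print(len(union_rows))      # 3 -> London, Seattle, Tokyo
print(len(union_all_rows))  # 5 -> every row from both tables
```

Note that UNION also removes duplicates that come from within a single table, not just those shared between the two.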

## The INTERSECT Operator

The INTERSECT operator returns the rows that two queries have in common: only the unique rows that appear in both Table A *and* Table B. This makes it useful whenever you want to find the results shared by two queries.

Below is a Venn diagram that will help illustrate how the INTERSECT operator works:

<img src= "https://github.com/Explore-AI/Pictures/blob/master/intersect.png?raw=true" width=70%/>

$A \cap B$ is the intersection of sets $A$ and $B$: the result is a new table containing only the rows shared by Table $A$ and Table $B$.

The general syntax of an INTERSECT operator is as follows:

```sql
SELECT column_list FROM table1
INTERSECT
SELECT column_list FROM table2;
```
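The following sketch runs an INTERSECT against two hypothetical in-memory tables with Python's `sqlite3` module. A row is kept only when every selected column matches a row in the other result:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (region TEXT, city TEXT)")
conn.execute("CREATE TABLE table2 (region TEXT, city TEXT)")
conn.executemany("INSERT INTO table1 VALUES (?, ?)", [
    ("British Isles", "London"), ("North America", "Seattle"), ("Western Europe", "Berlin"),
])
conn.executemany("INSERT INTO table2 VALUES (?, ?)", [
    ("British Isles", "London"), ("North America", "Seattle"), ("North America", "Tacoma"),
])

# Only (region, city) pairs present in both tables survive
rows = conn.execute(
    "SELECT region, city FROM table1 INTERSECT SELECT region, city FROM table2 ORDER BY city"
).fetchall()
print(rows)  # [('British Isles', 'London'), ('North America', 'Seattle')]
```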

## The INTERSECTION of Tables 

Customers and employees located in the same region and city may indicate where the supply of and demand for goods overlap. Let's use the INTERSECT operator to find the region and city pairs that appear in both the Customer and Employee tables:

```sql
%%sql
SELECT Region, City
FROM Customer
INTERSECT
SELECT Region, City
FROM Employee
ORDER BY City
```



| Region | City |
|---|---|
| North America | Kirkland |
| British Isles | London |
| North America | Seattle |


In a similar way, we can find the countries where the company has both customers and employees.

```sql
%%sql
SELECT Country FROM Customer
INTERSECT
SELECT Country FROM Employee
```



| Country |
|---|
| UK |
| USA |


As expected, the UK and USA appear in both tables. This may reflect the company employing more staff in the regions where customer demand for its goods is highest. Now that we have learned about the UNION and INTERSECT operators, let's discuss the EXCEPT operator next.

## The EXCEPT Operator 

Just like the UNION and INTERSECT operators, the EXCEPT operator has its own set of uses in SQL queries.

The EXCEPT operator returns the distinct rows from the first query that do not appear in the result of the second query, i.e. the rows that are unique to the first table. When an EXCEPT operator is executed, it includes all rows in Table A except those it has in common with Table B. The order of the two queries therefore matters: A EXCEPT B is generally not the same as B EXCEPT A.

Below is a Venn diagram that will help illustrate how the EXCEPT operator works:

<img src= "https://github.com/Explore-AI/Pictures/blob/master/Except.png?raw=true" width=70%/> 

The general syntax of the EXCEPT operator is as follows:

```sql
SELECT column_list FROM table1
EXCEPT
SELECT column_list FROM table2;
```
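Because EXCEPT is not symmetric, swapping the two queries changes the answer. A sketch with two hypothetical in-memory tables, again using Python's `sqlite3` module:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (city TEXT)")
conn.execute("CREATE TABLE table2 (city TEXT)")
conn.executemany("INSERT INTO table1 VALUES (?)", [("Aachen",), ("London",), ("Seattle",)])
conn.executemany("INSERT INTO table2 VALUES (?)", [("London",), ("Tacoma",)])

# Rows of table1 that are not in table2
a_except_b = conn.execute(
    "SELECT city FROM table1 EXCEPT SELECT city FROM table2 ORDER BY city"
).fetchall()
# Rows of table2 that are not in table1
b_except_a = conn.execute(
    "SELECT city FROM table2 EXCEPT SELECT city FROM table1 ORDER BY city"
).fetchall()

print(a_except_b)  # [('Aachen',), ('Seattle',)]
print(b_except_a)  # [('Tacoma',)]
```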

## The EXCEPTION between Tables

We can use the EXCEPT operator to find the postal code, region and city combinations that appear in the Customer table but not in the Employee table. The information generated from this query may highlight regional areas in a country where new company branches could be opened to meet customers' needs.

```sql
%%sql
SELECT Postalcode, region AS 'Regional area', city
FROM Customer
EXCEPT
SELECT Postalcode, region, city
FROM Employee
ORDER BY City
LIMIT 10; -- Remove this line to see the full query output
```



| PostalCode | Regional area | City |
|---|---|---|
| 52066 | Western Europe | Aachen |
| 87110 | North America | Albuquerque |
| 99508 | North America | Anchorage |
| 8022 | Southern Europe | Barcelona |
| 3508 | South America | Barquisimeto |
| 24100 | Southern Europe | Bergamo |
| 12209 | Western Europe | Berlin |
| 3012 | Western Europe | Bern |
| 83720 | North America | Boise |
| 14776 | Western Europe | Brandenburg |


Suppose the company wants to list all the companies that supply its goods, excluding any company it also uses specifically for shipping goods. Let's see how we can help by using the EXCEPT operator.

```sql
%%sql
SELECT CompanyName AS 'Company' FROM Supplier
EXCEPT
SELECT CompanyName FROM Shipper
LIMIT 10; -- Remove this line to see the full query output
```



| Company |
|---|
| Aux joyeux ecclésiastiques |
| Bigfoot Breweries |
| Cooperativa de Quesos 'Las Cabras' |
| Escargots Nouveaux |
| Exotic Liquids |
| Formaggi Fortini s.r.l. |
| Forêts d'érables |
| G'day, Mate |
| Gai pâturage |
| Grandma Kelly's Homestead |


## Conclusion 

In this train we learned about set theory in the context of SQL. We then learned about the set operators UNION, UNION ALL, INTERSECT and EXCEPT, which compare complete rows from two queries to produce one result. We covered the basic syntax of each operator and the rules that must be followed to write valid queries.

Set operations provide a concise way to combine and compare different result sets in SQL, because they compare all of the columns involved in the queries at once. We encourage you to test your knowledge by writing more SQL queries that use set operations.

## Appendix
Links to additional resources to help with the understanding of concepts presented in this train:

- [Set Operators](https://docs.oracle.com/cd/A87860_01/doc/server.817/a85397/operator.htm)
- [EXCEPT and INTERSECT operators in SQL](https://www.red-gate.com/simple-talk/sql/performance/the-except-and-intersect-operators-in-sql-server/)