# Spring 2019 | CS 6400

## Relational Algebra
___

#### Closed Algebra

Closed algebra is the reason we don't run around in loincloth anymore.<br>
Take the expression
$$
\left(\left(\left(7*\left(5+3\right)-21\right)*3\right)/\left(10+5\right)\right) * 3 = \:\:?
$$

for example.

Closed algebra tells us that each operation returns a rational number:
$$
\begin{align*}
\left(\left(\left(7*\left(8\right)-21\right)*3\right)/\left(10+5\right)\right) * 3 &= ?\\ 
\left(\left(\left(56-21\right)*3\right)/\left(10+5\right)\right) * 3 &= ?\\ 
\left(\left(35*3\right)/\left(10+5\right)\right) * 3 &= ?\\
\left(105/\left(10+5\right)\right) * 3 &= ?\\ 
\left(105/15\right) * 3 &= ?\\ 
7 * 3 &= 21\\
\end{align*}
$$

The power of closed algebra is that each operation returns a value of the same kind as its inputs, so every operation can build upon the result of the previous one. Building thoughts sequentially this way allows us to build higher-level models.
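
The step-by-step evaluation above can be sketched in Python. This is only an illustration, assuming the standard-library `fractions.Fraction` type as a stand-in for rational numbers; the variable names are made up for the example.

```python
from fractions import Fraction

# Each line applies one operation to the previous result.
a = Fraction(5) + 3          # 5 + 3   = 8
b = 7 * a                    # 7 * 8   = 56
c = b - 21                   # 56 - 21 = 35
d = c * 3                    # 35 * 3  = 105
e = d / (Fraction(10) + 5)   # 105 / 15 = 7
result = e * 3               # 7 * 3   = 21

# Closure: every intermediate value is still a rational number.
intermediates = [a, b, c, d, e, result]
print(all(isinstance(x, Fraction) for x in intermediates))  # True
print(result)  # 21
```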
___

#### Relational Algebra Operators

Set operators:

***Union***: $R \cup S$<br>
***Intersection***: $R \cap S$<br>
***Set Difference***: $R\: \backslash \: S$<br>
***Cartesian Product***: $R \times  S$<br>

Projection operator:

***Projection***: $\pi_{\text{A1}, \text{A2}, \ldots, \text{An}}\left(R\right)$

The projection operator eliminates columns from a result.

Selection operator:

***Selection***: $\sigma_{\text{Expression}}\left(R\right)$

The selection operator eliminates rows from a result.

Constructor or Join operations:

***Natural Join***: $R * S$ or $R \Join S$<br>
***Left Join***: $R =\Join S$<br>
***Right Join***: $R \Join= S$<br>
***Full Outer Join***: $R =\Join= S$<br>
***Theta Join***: $R \Join_{\Theta} S$<br>

Special operators:

***Divideby***: $R \div S$

Divideby allows universal quantification as in relational calculus.

***Rename***: $\rho_{\left[\text{A1 B1}, \text{A2 B2}, \ldots, \text{An Bn}\right]}\left(R\right)$

Rename lets us rename and alias columns. Each pair lists the new name followed by the existing name.

#### Selection $\sigma_{\text{Expression}}\left(R\right)$

Now let's take a deep dive into the selection operator.

*Given*:

**RegularUser**

| Email | BirthYear | Sex | CurrentCity | HomeTown
| --- | --- | --- | --- | --- |
| user2@gatech.edu | 1969 | M | Austin | Austin |
| user3@gatech.edu | 1982 | F | Portland | Austin |
| user4@gatech.edu | 1975 | M | Dallas | Tucson |
| user5@gatech.edu | 1975 | M | Dallas | Atlanta |
___
##### `SELECT *`
Query:<br>
*Find all RegularUsers.*<br>

$\sigma\left(RegularUser\right)$

***Result***:

| Email | BirthYear | Sex | CurrentCity | HomeTown
| --- | --- | --- | --- | --- |
| user2@gatech.edu | 1969 | M | Austin | Austin |
| user3@gatech.edu | 1982 | F | Portland | Austin |
| user4@gatech.edu | 1975 | M | Dallas | Tucson |
| user5@gatech.edu | 1975 | M | Dallas | Atlanta |
___
##### `SELECT * WHERE CONDITION`
Query:<br>
*Find all RegularUsers with `HomeTown` = Austin.*<br>

$\sigma_{\text{HomeTown = 'Austin'}}\left(RegularUser\right)$

***Result***:

| Email | BirthYear | Sex | CurrentCity | HomeTown
| --- | --- | --- | --- | --- |
| user2@gatech.edu | 1969 | M | Austin | Austin |
| user3@gatech.edu | 1982 | F | Portland | Austin |

Simple expressions are allowed within the selection operator:
* AttributeName = , < , $\leq$, > , $\geq$ , $\neq$ constant
* AttributeName_1 = , < , $\leq$, > , $\geq$ , $\neq$ AttributeName_2
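
As a rough illustration, selection can be sketched in Python by treating a relation as a set of named tuples and the expression as a predicate. The `Row` type and the `select` helper are hypothetical names introduced here, not part of any library.

```python
from collections import namedtuple

# Hypothetical row type mirroring the RegularUser schema above.
Row = namedtuple("Row", "Email BirthYear Sex CurrentCity HomeTown")

regular_user = {
    Row("user2@gatech.edu", 1969, "M", "Austin", "Austin"),
    Row("user3@gatech.edu", 1982, "F", "Portland", "Austin"),
    Row("user4@gatech.edu", 1975, "M", "Dallas", "Tucson"),
    Row("user5@gatech.edu", 1975, "M", "Dallas", "Atlanta"),
}

def select(relation, predicate):
    """Sigma: keep only the rows for which the predicate holds."""
    return {row for row in relation if predicate(row)}

# AttributeName = constant
from_austin = select(regular_user, lambda r: r.HomeTown == "Austin")
# AttributeName_1 = AttributeName_2
never_moved = select(regular_user, lambda r: r.CurrentCity == r.HomeTown)

print(sorted(r.Email for r in from_austin))  # ['user2@gatech.edu', 'user3@gatech.edu']
print(sorted(r.Email for r in never_moved))  # ['user2@gatech.edu']
```

The result of `select` is again a set of rows, so it can be passed straight into another operator.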
___
##### `SELECT * WHERE CONDITION_1 OR CONDITION_2`
Query:<br>
*Find all RegularUsers with `CurrentCity` = `HomeTown` or `HomeTown` = Atlanta.*<br>

$\sigma_{\text{CurrentCity=HomeTown OR HomeTown='Atlanta'}}\left(RegularUser\right)$

***Result***:

| Email | BirthYear | Sex | CurrentCity | HomeTown
| --- | --- | --- | --- | --- |
| user2@gatech.edu | 1969 | M | Austin | Austin |
| user5@gatech.edu | 1975 | M | Dallas | Atlanta |

The following composite expressions are allowed within the selection operator:
* Expression_1 AND $\left(\wedge\right)$ Expression_2 
* Expression_1 OR $\left(\vee\right)$ Expression_2
* (Expression)
* NOT(Expression)
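
A minimal sketch of composite expressions, assuming rows are plain Python tuples and that `or` and `not` stand in for $\vee$ and NOT. The column-position constants are made up for the example.

```python
# Rows are (Email, BirthYear, Sex, CurrentCity, HomeTown) tuples.
regular_user = {
    ("user2@gatech.edu", 1969, "M", "Austin", "Austin"),
    ("user3@gatech.edu", 1982, "F", "Portland", "Austin"),
    ("user4@gatech.edu", 1975, "M", "Dallas", "Tucson"),
    ("user5@gatech.edu", 1975, "M", "Dallas", "Atlanta"),
}
EMAIL, CURRENT_CITY, HOME_TOWN = 0, 3, 4  # column positions

def matches(row):
    """CurrentCity = HomeTown OR HomeTown = 'Atlanta'"""
    return row[CURRENT_CITY] == row[HOME_TOWN] or row[HOME_TOWN] == "Atlanta"

either = {row for row in regular_user if matches(row)}
# NOT(Expression) keeps exactly the rows the expression rejects.
neither = {row for row in regular_user if not matches(row)}

print(sorted(row[EMAIL] for row in either))   # ['user2@gatech.edu', 'user5@gatech.edu']
print(sorted(row[EMAIL] for row in neither))  # ['user3@gatech.edu', 'user4@gatech.edu']
```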
___
#### Projection $\pi_{\text{A1}, \text{A2}, \ldots, \text{An}}\left(R\right)$
##### `SELECT COL_1, COL_2, COL_n WHERE CONDITION`
Query:<br>
*Find `Email`, `BirthYear`, `Sex` for RegularUsers with `HomeTown` = Atlanta.*<br>

$\pi_{\text{Email}, \text{BirthYear}, \text{Sex}}\left(\sigma_{HomeTown='Atlanta'}\left(RegularUser\right)\right)$

```sql
SELECT Email, BirthYear, Sex
FROM RegularUser
WHERE HomeTown = 'Atlanta';
```

***Result***:

| Email | BirthYear | Sex
| --- | --- | --- |
| user5@gatech.edu | 1975 | M |
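
To try the query, one option is Python's standard-library `sqlite3` module with an in-memory database. This is a sketch under the assumption that the column types below are acceptable; the notes do not specify a schema or a database engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed column types; only the column names come from the table above.
conn.execute(
    "CREATE TABLE RegularUser "
    "(Email TEXT, BirthYear INTEGER, Sex TEXT, CurrentCity TEXT, HomeTown TEXT)"
)
conn.executemany(
    "INSERT INTO RegularUser VALUES (?, ?, ?, ?, ?)",
    [
        ("user2@gatech.edu", 1969, "M", "Austin", "Austin"),
        ("user3@gatech.edu", 1982, "F", "Portland", "Austin"),
        ("user4@gatech.edu", 1975, "M", "Dallas", "Tucson"),
        ("user5@gatech.edu", 1975, "M", "Dallas", "Atlanta"),
    ],
)

# Selection (WHERE) runs first, then projection (the column list).
rows = conn.execute(
    "SELECT Email, BirthYear, Sex FROM RegularUser WHERE HomeTown = 'Atlanta'"
).fetchall()
conn.close()

print(rows)  # [('user5@gatech.edu', 1975, 'M')]
```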
___
#### Relations are sets!

Relations are sets - consider what is returned by the following query.

Query:<br>
*Find `Sex` for RegularUsers with `HomeTown` = Austin.*<br>

$\pi_{\text{Sex}}\left(\sigma_{HomeTown='Austin'}\left(RegularUser\right)\right)$

***Result***:

| Sex |
| --- |
| M |
| F |

Notice that **RegularUser** starts out as a set of tuples (rows).<br>
When we query it, the result is again a set of tuples, so the same row never appears twice in a relational algebra result.<br>
This is why the query language is called closed: we start with a set and end with a set.
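
The set behavior is easiest to see when a projection collapses duplicates. The sketch below assumes rows are Python tuples held in a `set`, and projects `Sex` from the whole table rather than only the Austin rows.

```python
regular_user = {
    ("user2@gatech.edu", 1969, "M", "Austin", "Austin"),
    ("user3@gatech.edu", 1982, "F", "Portland", "Austin"),
    ("user4@gatech.edu", 1975, "M", "Dallas", "Tucson"),
    ("user5@gatech.edu", 1975, "M", "Dallas", "Atlanta"),
}
SEX = 2  # position of the Sex column

# pi_Sex(RegularUser): three rows carry "M", but a set keeps it only once.
sexes = {(row[SEX],) for row in regular_user}

print(len(regular_user), len(sexes))  # 4 2
```

Four input rows produce only two output rows, because the result is itself a set.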
___
#### Union $\cup$

Query:<br>
*Find all cities that are a `CurrentCity` or a `HomeTown` for a RegularUser.*<br>

"or" should tip us off to use a union operation.

$\pi_{\text{CurrentCity}}\left(RegularUser\right) \cup \pi_{\text{HomeTown}}\left(RegularUser\right)$

***Result***:

|  |
| --- |
| Austin |
| Portland |
| Dallas |
| Tucson |
| Atlanta |
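
A small sketch of the union, assuming the two projections have already been computed and are held as Python sets (the variable names are illustrative):

```python
current_city = {"Austin", "Portland", "Dallas"}  # pi_CurrentCity(RegularUser)
home_town = {"Austin", "Tucson", "Atlanta"}      # pi_HomeTown(RegularUser)

# Union: a city qualifies if it appears in either projection.
either = current_city | home_town

print(sorted(either))  # ['Atlanta', 'Austin', 'Dallas', 'Portland', 'Tucson']
```

Austin appears in both inputs but only once in the result.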
___
#### Intersection $\cap$

Query:<br>
*Find all cities that are a `CurrentCity` for someone and a `HomeTown` for some RegularUser.*<br>

"and" should tip us off to use an intersection operation.

$\pi_{\text{CurrentCity}}\left(RegularUser\right) \cap \pi_{\text{HomeTown}}\left(RegularUser\right)$

***Result***:

|  |
| --- |
| Austin |

Only Austin appears in both columns.
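
The same two projections, again assumed to be held as Python sets, illustrate the intersection:

```python
current_city = {"Austin", "Portland", "Dallas"}  # pi_CurrentCity(RegularUser)
home_town = {"Austin", "Tucson", "Atlanta"}      # pi_HomeTown(RegularUser)

# Intersection: a city qualifies only if it appears in both projections.
both = current_city & home_town

print(both)  # {'Austin'}
```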
___
#### Set Difference $\backslash$

Query:<br>
*Find all cities that are a `CurrentCity` for someone, but exclude those that are a `HomeTown` for some RegularUser.*<br>

"exclude" should tip us off to use a set difference operation.

$\pi_{\text{CurrentCity}}\left(RegularUser\right) \backslash \: \pi_{\text{HomeTown}}\left(RegularUser\right)$

***Result***:

|  |
| --- |
| Portland |
| Dallas |
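
Unlike union and intersection, set difference depends on operand order. A short sketch, assuming the projections are held as Python sets:

```python
current_city = {"Austin", "Portland", "Dallas"}  # pi_CurrentCity(RegularUser)
home_town = {"Austin", "Tucson", "Atlanta"}      # pi_HomeTown(RegularUser)

# Cities people live in now that are nobody's home town.
only_current = current_city - home_town
# Swapping the operands answers a different question.
only_home = home_town - current_city

print(sorted(only_current))  # ['Dallas', 'Portland']
print(sorted(only_home))     # ['Atlanta', 'Tucson']
```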
___
#### Natural or Inner Join $\Join$

Now let's look at constructor or join operations. We'll start with inner joins.

*Given*:

**RegularUser**

| Email | Year | Sex | CurrentCity | HomeTown
| --- | --- | --- | --- | --- |
| user2@gatech.edu | 1969 | M | Austin | Austin |
| user3@gatech.edu | 1982 | F | Portland | Austin |
| user4@gatech.edu | 1968 | M | Dallas | Tucson |
| user5@gatech.edu | 1975 | M | Dallas | Atlanta |

**Major60sEvents**

| Event | Year
| --- | --- |
| USA Lands on Moon | 1969|

Query:<br>
*Find `Email`, `Year`, `Sex` and the `Event` for a RegularUser born during the same year as a Major60sEvent.*<br>

$\pi_{\text{Email, Year, Sex, Event}}\left(RegularUser \Join Major60sEvents\right)$

***Result***:

| Email | Year | Sex | Event |
| --- | --- | --- | --- |
| user2@gatech.edu | 1969 | M | USA Lands on Moon |

A natural join matches rows on every column name the two tables have in common (`Year` in this case) and keeps a single copy of each shared column.
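
A natural join can be sketched in a few lines of Python. This assumes each row is a dict keyed by column name, and it leaves out `CurrentCity` and `HomeTown` for brevity; `natural_join` is a hypothetical helper written for this example.

```python
regular_user = [
    {"Email": "user2@gatech.edu", "Year": 1969, "Sex": "M"},
    {"Email": "user3@gatech.edu", "Year": 1982, "Sex": "F"},
    {"Email": "user4@gatech.edu", "Year": 1968, "Sex": "M"},
    {"Email": "user5@gatech.edu", "Year": 1975, "Sex": "M"},
]
major_60s_events = [{"Event": "USA Lands on Moon", "Year": 1969}]

def natural_join(left, right):
    """Pair rows that agree on every shared column; keep one copy of it."""
    if not left or not right:
        return []
    shared = set(left[0]) & set(right[0])  # column names in common
    return [
        {**l, **r}  # merging the dicts keeps a single "Year" entry
        for l in left
        for r in right
        if all(l[col] == r[col] for col in shared)
    ]

joined = natural_join(regular_user, major_60s_events)
print(joined)
```

Only the user born in 1969 finds a partner row, so the result has a single row.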
___
#### Theta Join $\Join_{\Theta}$

Now let's look at joins with conditions.

*Given*:

**RegularUser**

| Email | BirthYear | Sex | CurrentCity | HomeTown
| --- | --- | --- | --- | --- |
| user2@gatech.edu | 1969 | M | Austin | Austin |
| user3@gatech.edu | 1982 | F | Portland | Austin |
| user4@gatech.edu | 1968 | M | Dallas | Tucson |
| user5@gatech.edu | 1975 | M | Dallas | Atlanta |

**Major60sEvents**

| Event | EventYear
| --- | --- |
| USA Lands on Moon | 1969|

Query:<br>
*Find `Email`, `BirthYear`, `Sex`, `EventYear` and the `Event` for a RegularUser born before a Major60sEvent.*<br>

$\pi_{\text{Email, BirthYear, Sex, EventYear, Event}}\left(RegularUser \Join_{BirthYear \: < \: EventYear} Major60sEvents\right)$

***Result***:

| Email | BirthYear | Sex | EventYear | Event |
| --- | --- | --- | --- | --- |
| user4@gatech.edu | 1968 | M | 1969 | USA Lands on Moon |

$\Theta$ just means we are allowed to use comparison expressions in our join condition.
* The join itself preserves all attributes of both tables; the projection keeps only the ones the query asks for
* Also an "inner" join
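
A theta join differs from a natural join only in that the caller supplies the condition. The sketch below assumes dict rows and omits `CurrentCity` and `HomeTown` for brevity; `theta_join` is a hypothetical helper.

```python
regular_user = [
    {"Email": "user2@gatech.edu", "BirthYear": 1969, "Sex": "M"},
    {"Email": "user3@gatech.edu", "BirthYear": 1982, "Sex": "F"},
    {"Email": "user4@gatech.edu", "BirthYear": 1968, "Sex": "M"},
    {"Email": "user5@gatech.edu", "BirthYear": 1975, "Sex": "M"},
]
major_60s_events = [{"Event": "USA Lands on Moon", "EventYear": 1969}]

def theta_join(left, right, condition):
    """Keep every left/right pair that satisfies the condition."""
    return [{**l, **r} for l in left for r in right if condition(l, r)]

born_before = theta_join(
    regular_user,
    major_60s_events,
    lambda user, event: user["BirthYear"] < event["EventYear"],
)
print(born_before)
```

Because the column names are distinct, the merged row keeps both `BirthYear` and `EventYear`.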
___
#### Left Outer Join $=\Join$

Let's revisit our example from the inner join section. This time, we would like to preserve all RegularUser records, even those not born during the same year as a Major60sEvent:

Query:<br>
*Find `Email`, `Year`, `Sex` and the `Event` for a RegularUser born during the same year as a Major60sEvent. Preserve all RegularUser records even if they were not born during the same year as a Major60sEvent.*<br>

$\pi_{\text{Email, Year, Sex, Event}}\left(RegularUser =\Join Major60sEvents\right)$

***Result***:

| Email | Year | Sex | Event |
| --- | --- | --- | --- |
| user2@gatech.edu | 1969 | M | USA Lands on Moon |
| user3@gatech.edu | 1982 | F | NULL |
| user4@gatech.edu | 1968 | M | NULL |
| user5@gatech.edu | 1975 | M | NULL |

Notice how we end up with NULLs in our result set? This happens whenever a preserved row on the left has no matching row on the right.<br>
An outer join extends an inner join: it returns every matched pair plus the unmatched rows from the preserved table, padded with NULLs.<br>

The mechanics of a right outer join and a full outer join are the same. The only difference is which table's records are preserved in the result set: the right table's, or both tables'.
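
A left outer join can be sketched as an inner join plus a padding step for unmatched rows. The example assumes dict rows, uses Python's `None` where a database would hold a missing value, and omits `CurrentCity` and `HomeTown`; `left_outer_join` is a hypothetical helper.

```python
regular_user = [
    {"Email": "user2@gatech.edu", "Year": 1969, "Sex": "M"},
    {"Email": "user3@gatech.edu", "Year": 1982, "Sex": "F"},
    {"Email": "user4@gatech.edu", "Year": 1968, "Sex": "M"},
    {"Email": "user5@gatech.edu", "Year": 1975, "Sex": "M"},
]
major_60s_events = [{"Event": "USA Lands on Moon", "Year": 1969}]

def left_outer_join(left, right, key, right_columns):
    """Join on `key`, keeping every left row even without a partner."""
    result = []
    for l in left:
        partners = [r for r in right if r[key] == l[key]]
        if partners:
            result.extend({**l, **r} for r in partners)
        else:
            # No partner: pad the right-hand columns with None.
            result.append({**l, **{col: None for col in right_columns}})
    return result

preserved = left_outer_join(regular_user, major_60s_events, "Year", ["Event"])
for row in preserved:
    print(row["Email"], row["Event"])
```

Swapping the two table arguments would give the right outer join of the original pair.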
___
#### Cartesian Product $\times$

Now let's look at combining all `RegularUser` rows or tuples with all `UserInterests` rows or tuples.

Query:<br>
*Combine all `RegularUser` tuples with all `UserInterests` tuples.*<br>

*Given*:

**RegularUser**

| RUEmail | BirthYear | Sex |
| --- | --- | --- |
| user2@gatech.edu | 1969 | M |
| user3@gatech.edu | 1982 | F |
| user4@gatech.edu | 1968 | M |
| user5@gatech.edu | 1966 | M |
| user6@gatech.edu | 1984 | F |
| user7@gatech.edu | 1963 | M |

**UserInterests**

| UEmail | SinceAge | Interests|
| --- | --- | --- |
| user2@gatech.edu | 10 | Music |
| user3@gatech.edu | 5 | Reading |
| user4@gatech.edu | 14 | Tennis |
| user5@gatech.edu | 11 | Music |
| user6@gatech.edu | 6 | Reading |
| user7@gatech.edu | 18 | Swimming |

***Result***:

| RUEmail | BirthYear | Sex | UEmail | SinceAge | Interests|
| --- | --- | --- | --- | --- | --- |
| user2@gatech.edu | 1969 | M | user2@gatech.edu | 10 | Music |
| $\cdots$ | $\cdots$ | $\cdots$ | $\cdots$ | $\cdots$ | $\cdots$ |
| user2@gatech.edu | 1969 | M | user4@gatech.edu | 14 | Tennis |
| $\cdots$ | $\cdots$ | $\cdots$ | $\cdots$ | $\cdots$ | $\cdots$ |
| user6@gatech.edu | 1984 | F | user6@gatech.edu | 6 | Reading |
| $\cdots$ | $\cdots$ | $\cdots$ | $\cdots$ | $\cdots$ | $\cdots$ |
| user7@gatech.edu | 1963 | M | user7@gatech.edu | 18 | Swimming |

Every `RegularUser` row is paired with every `UserInterests` row, so the result has 6 × 6 = 36 rows and all six columns. The two tables must not share column names, which is why the email columns are named `RUEmail` and `UEmail` here.
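
The pairing can be reproduced with `itertools.product` from the Python standard library. This sketch assumes rows are tuples in the column order of the tables above.

```python
from itertools import product

regular_user = [  # (RUEmail, BirthYear, Sex)
    ("user2@gatech.edu", 1969, "M"),
    ("user3@gatech.edu", 1982, "F"),
    ("user4@gatech.edu", 1968, "M"),
    ("user5@gatech.edu", 1966, "M"),
    ("user6@gatech.edu", 1984, "F"),
    ("user7@gatech.edu", 1963, "M"),
]
user_interests = [  # (UEmail, SinceAge, Interests)
    ("user2@gatech.edu", 10, "Music"),
    ("user3@gatech.edu", 5, "Reading"),
    ("user4@gatech.edu", 14, "Tennis"),
    ("user5@gatech.edu", 11, "Music"),
    ("user6@gatech.edu", 6, "Reading"),
    ("user7@gatech.edu", 18, "Swimming"),
]

# Concatenate every left row with every right row.
cartesian = [ru + ui for ru, ui in product(regular_user, user_interests)]

print(len(cartesian))  # 36
print(cartesian[0])    # ('user2@gatech.edu', 1969, 'M', 'user2@gatech.edu', 10, 'Music')
```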

#### Usefulness of Cartesian Product $\times$

Below is a useful business example of the Cartesian product.

*Given*:

**RegularUser**

| Email | BirthYear | Sex |
| --- | --- | --- |
| user2@gatech.edu | 1969 | M |
| user3@gatech.edu | 1982 | F |
| user4@gatech.edu | 1968 | M |
| user5@gatech.edu | 1966 | M |
| user6@gatech.edu | 1984 | F |
| user7@gatech.edu | 1963 | M |

**UserInterests**

| Email | SinceAge | Interests|
| --- | --- | --- |
| user2@gatech.edu | 10 | Music |
| user3@gatech.edu | 5 | Reading |
| user4@gatech.edu | 14 | Tennis |
| user5@gatech.edu | 11 | Music |
| user6@gatech.edu | 6 | Reading |
| user7@gatech.edu | 18 | Swimming |


Query:<br>

*In preparation for an email blast, combine all RegularUsers with all UserInterests they are not currently related to.*<br>

$\left(\pi_{\text{Email}}\left(\text{RegularUser}\right)\times\pi_{\text{Interests}}\left(\text{UserInterests}\right)\right) \backslash \: \pi_{\text{Email, Interests}}\left(\text{UserInterests}\right)$
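
The expression can be traced in Python by building all possible (email, interest) pairs and removing the pairs that already exist. This is a sketch that assumes the projections are held as Python sets; the variable names are illustrative.

```python
from itertools import product

# pi_Email(RegularUser)
emails = {
    "user2@gatech.edu", "user3@gatech.edu", "user4@gatech.edu",
    "user5@gatech.edu", "user6@gatech.edu", "user7@gatech.edu",
}
# pi_{Email, Interests}(UserInterests)
existing = {
    ("user2@gatech.edu", "Music"),
    ("user3@gatech.edu", "Reading"),
    ("user4@gatech.edu", "Tennis"),
    ("user5@gatech.edu", "Music"),
    ("user6@gatech.edu", "Reading"),
    ("user7@gatech.edu", "Swimming"),
}
# pi_Interests(UserInterests): four distinct interests
interests = {interest for _, interest in existing}

# Every possible pair, minus the pairs that are already related.
blast = set(product(emails, interests)) - existing

print(len(blast))  # 18
```

Six users times four distinct interests gives 24 pairs, and removing the 6 existing ones leaves 18.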
___
#### Divideby $\div$

This is the same concept as universal quantification in relational calculus. For example, the following expression finds every `Email` that is paired with all of the interests of user1@gatech.edu:

$\pi_{\text{Email, Interests}}\left(\text{UserInterests}\right)\div\pi_{\text{Interests}}\left(\sigma_{\text{Email='user1@gatech.edu'}}\left(\text{UserInterests}\right)\right)$

In general:

$R\left(A,B\right) \div S\left(B\right) = \left \{ r.A \: | \: r \in R \: \text{and} \: \forall \left(s \in S\right) \exists \: \left(t \in R\right)\left(t.A=r.A \: \text{and} \: t.B=s.B\right) \right \}$
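
The general definition translates almost directly into Python. The data below is made up for illustration (the notes give no rows for user1@gatech.edu), and `divide` is a hypothetical helper.

```python
def divide(r, s):
    """R(A, B) / S(B): the A values paired in R with every B value in S."""
    return {a for (a, _) in r if all((a, b) in r for b in s)}

# Hypothetical (Email, Interest) pairs, invented for this example.
user_interests = {
    ("user1@gatech.edu", "Music"),
    ("user1@gatech.edu", "Reading"),
    ("user2@gatech.edu", "Music"),
    ("user2@gatech.edu", "Reading"),
    ("user2@gatech.edu", "Tennis"),
    ("user3@gatech.edu", "Music"),
}
# Interests of user1: the divisor S(B).
user1_interests = {i for (e, i) in user_interests if e == "user1@gatech.edu"}

shares_all = divide(user_interests, user1_interests)
print(sorted(shares_all))  # ['user1@gatech.edu', 'user2@gatech.edu']
```

user3 is excluded because Reading is missing, while user2 qualifies despite having an extra interest.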
___
#### Rename $\rho$

Useful for renaming or aliasing columns to control natural joins, theta joins, etc. For example:

$\rho \,_{\text{RUser}\left[\text{Year BirthYear, Gender Sex}\right]}\left(\text{RegularUser}\right)$

Read from right to left: take RegularUser, rename `BirthYear` to `Year` and `Sex` to `Gender`, and call the result RUser.

From

**RegularUser**

| Email | BirthYear | Sex | CurrentCity | HomeTown
| --- | --- | --- | --- | --- |
| user2@gatech.edu | 1969 | M | Austin | Austin |
| user3@gatech.edu | 1982 | F | Portland | Austin |
| user4@gatech.edu | 1968 | M | Dallas | Tucson |
| user5@gatech.edu | 1975 | M | Dallas | Atlanta |

To

**RUser**

| Email | Year | Gender | CurrentCity | HomeTown
| --- | --- | --- | --- | --- |
| user2@gatech.edu | 1969 | M | Austin | Austin |
| user3@gatech.edu | 1982 | F | Portland | Austin |
| user4@gatech.edu | 1968 | M | Dallas | Tucson |
| user5@gatech.edu | 1975 | M | Dallas | Atlanta |
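
A rename only touches column names, never the data. A minimal sketch, assuming dict rows and a hypothetical `rename` helper that takes an old-name to new-name mapping:

```python
regular_user = [
    {"Email": "user2@gatech.edu", "BirthYear": 1969, "Sex": "M",
     "CurrentCity": "Austin", "HomeTown": "Austin"},
    {"Email": "user3@gatech.edu", "BirthYear": 1982, "Sex": "F",
     "CurrentCity": "Portland", "HomeTown": "Austin"},
]

def rename(relation, mapping):
    """Rho: rename columns listed in mapping; others keep their names."""
    return [
        {mapping.get(col, col): value for col, value in row.items()}
        for row in relation
    ]

ruser = rename(regular_user, {"BirthYear": "Year", "Sex": "Gender"})
print(list(ruser[0]))  # ['Email', 'Year', 'Gender', 'CurrentCity', 'HomeTown']
```

After the rename, `Year` matches the column name in Major60sEvents, so a natural join would pair the rows on it.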