# Spring 2019 | CS 6400

## Relational Calculus
___

Relational algebra is procedural in nature and operator based: *i.e. we describe step-by-step what to do to get to the end result.*<br>
***Relational calculus*** is *declarative* in nature *i.e. we simply describe the end result we want.*<br>
Relational algebra and relational calculus can be shown to be equivalent in expressive power for data retrieval.<br>
SQL is mostly based on relational tuple calculus.
___

### Relational Calculus Expressions $\{t \, | \, P(t)\}$

The second formalism we can use to express queries on a relational database is called relational tuple calculus.
* "Tuple" calculus because queries have variables that range over sets of tuples
* $\{t \, | \, P(t)\}$ - *find the set of tuples t that satisfy the predicate P*

Predicates are built from atoms:

**range expression**: $t \, \epsilon \, R$ and $R(t)$ denote that $t$ is a tuple of relation $R$<br>
**attribute value**: $t.A$ denotes the value of $t$ on attribute $A$<br>
**constant**: $c$ denotes a constant<br>
**comparison operators $\Theta$**: $=, \neq, \leq, \geq, <, >$<br>
**atoms**: $t \, \epsilon \, R$, $r.A \; \Theta \; s.B$, or $r.A \; \Theta \; c$

**predicate**: an atom is a predicate.
* if P$_{1}$ and P$_{2}$ are predicates, so are
    * $\left(P_{1}\right)$
    * **not**$\left(P_{1}\right)$
    * P$_{1}$ **or** P$_{2}$
    * P$_{1}$ **and** P$_{2}$
    * P$_{1}\Rightarrow$ P$_{2}$
    
if $P(t)$ is a predicate, $t$ is a free variable in $P$, and $R$ is a relation then
$$
\exists \left(t \, \epsilon \, R\right)\left(P(t)\right) \text{ and } \forall \left(t \, \epsilon \, R\right)\left(P(t)\right) \text{ are predicates.}
$$

*In words: $\exists (t \, \epsilon \, R)(P(t))$ is true when at least one tuple t in R satisfies P.*<br>
*Likewise, $\forall (t \, \epsilon \, R)(P(t))$ is true when every tuple t in R satisfies P.*<br>
*If t is free in the predicate P, then applying the **existential or universal quantifier** binds t.*
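
As a rough illustration of the two quantifiers, the sketch below treats a relation as a Python set of tuples and maps $\exists$ to `any` and $\forall$ to `all`. The relation `R`, its rows, and the predicate `P` are hypothetical and exist only for this example.

```python
from collections import namedtuple

# Hypothetical relation and rows, invented for illustration only.
User = namedtuple("User", ["Email", "HomeTown"])
R = {
    User("a@example.com", "Austin"),
    User("b@example.com", "Atlanta"),
}


def P(t):
    """Example predicate: t's HomeTown is Atlanta."""
    return t.HomeTown == "Atlanta"


# Existential quantifier: at least one tuple t in R satisfies P.
exists_result = any(P(t) for t in R)

# Universal quantifier: every tuple t in R satisfies P.
forall_result = all(P(t) for t in R)


def implies(p1, p2):
    """P1 => P2 is equivalent to (not P1) or P2."""
    return (not p1) or p2
```

Inside `any(...)` and `all(...)` the variable `t` is bound by the generator expression, which plays the role of the quantifier binding a free variable.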
___
### Selection, Projection in Relational Tuple Calculus

#### Selection-CALC

Let's look at how to do Selection in Relational Tuple Calculus.

*Given*:

**RegularUser**

| Email | BirthYear | Sex | CurrentCity | HomeTown |
| --- | --- | --- | --- | --- |
| user2@gatech.edu | 1969 | M | Austin | Austin |
| user3@gatech.edu | 1982 | F | Portland | Austin |
| user4@gatech.edu | 1975 | M | Dallas | Tucson |
| user5@gatech.edu | 1975 | M | Dallas | Atlanta |
___
##### `SELECT *`
Query:<br>
*Find all RegularUsers.*<br>

Calculus:<br>
$\{r \: | \: r \,\epsilon \, \text{RegularUser}\}$

***Result***:

| Email | BirthYear | Sex | CurrentCity | HomeTown |
| --- | --- | --- | --- | --- |
| user2@gatech.edu | 1969 | M | Austin | Austin |
| user3@gatech.edu | 1982 | F | Portland | Austin |
| user4@gatech.edu | 1975 | M | Dallas | Tucson |
| user5@gatech.edu | 1975 | M | Dallas | Atlanta |
___
##### `SELECT * WHERE CONDITION_1 OR CONDITION_2`
Query:<br>
*Find all RegularUsers with `CurrentCity` = `HomeTown` or `HomeTown` = 'Atlanta'.*<br>

Calculus:<br>
$\{r \: | \: r \,\epsilon \, \text{RegularUser} \text{ and (r.CurrentCity=r.HomeTown or r.HomeTown='Atlanta')\}}$

***Result***:

| Email | BirthYear | Sex | CurrentCity | HomeTown |
| --- | --- | --- | --- | --- |
| user2@gatech.edu | 1969 | M | Austin | Austin |
| user5@gatech.edu | 1975 | M | Dallas | Atlanta |
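
The two selection queries above read almost directly as Python set comprehensions. This is only a sketch: the relation is modelled as a set of named tuples, and the variable names (`regular_user`, `selected`) are assumptions made for this example.

```python
from collections import namedtuple

RegularUser = namedtuple(
    "RegularUser", ["Email", "BirthYear", "Sex", "CurrentCity", "HomeTown"]
)

# Rows copied from the RegularUser table above.
regular_user = {
    RegularUser("user2@gatech.edu", 1969, "M", "Austin", "Austin"),
    RegularUser("user3@gatech.edu", 1982, "F", "Portland", "Austin"),
    RegularUser("user4@gatech.edu", 1975, "M", "Dallas", "Tucson"),
    RegularUser("user5@gatech.edu", 1975, "M", "Dallas", "Atlanta"),
}

# {r | r in RegularUser}
all_users = {r for r in regular_user}

# {r | r in RegularUser and (r.CurrentCity = r.HomeTown or r.HomeTown = 'Atlanta')}
selected = {
    r
    for r in regular_user
    if r.CurrentCity == r.HomeTown or r.HomeTown == "Atlanta"
}

selected_emails = sorted(r.Email for r in selected)
```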
___
#### Projection-CALC
Query:<br>
*Find `Email`, `BirthYear`, `Sex` for RegularUsers with `HomeTown` = Atlanta.*<br>

Calculus:<br>
$\{\text{r.Email, r.BirthYear, r.Sex} \: | \: r \,\epsilon \, \text{RegularUser} \text{ and (r.HomeTown='Atlanta')\}}$<br>

```sql
SELECT Email, BirthYear, Sex
FROM RegularUser
WHERE HomeTown = 'Atlanta';
```

***Result***:

| Email | BirthYear | Sex |
| --- | --- | --- |
| user5@gatech.edu | 1975 | M |
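
Projection can be sketched the same way: the part to the left of the bar becomes the expression a set comprehension emits, and the predicate becomes its `if` clause. The variable names here are assumptions for illustration.

```python
from collections import namedtuple

RegularUser = namedtuple(
    "RegularUser", ["Email", "BirthYear", "Sex", "CurrentCity", "HomeTown"]
)

# Rows copied from the RegularUser table above.
regular_user = {
    RegularUser("user2@gatech.edu", 1969, "M", "Austin", "Austin"),
    RegularUser("user3@gatech.edu", 1982, "F", "Portland", "Austin"),
    RegularUser("user4@gatech.edu", 1975, "M", "Dallas", "Tucson"),
    RegularUser("user5@gatech.edu", 1975, "M", "Dallas", "Atlanta"),
}

# {r.Email, r.BirthYear, r.Sex | r in RegularUser and r.HomeTown = 'Atlanta'}
projected = {
    (r.Email, r.BirthYear, r.Sex)
    for r in regular_user
    if r.HomeTown == "Atlanta"
}
```

Because the result is a set, duplicate projected tuples collapse automatically, which matches the set semantics of the calculus.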
___
#### Union-CALC

Query:<br>
*Find all cities that are a `CurrentCity` or a `HomeTown` for a RegularUser.*<br>

"or" should tip us off to use a union operation.

Calculus:<br>
$\{s.City \: | \: \exists \text{(r } \epsilon \text{ RegularUser)(s.City=r.CurrentCity) or }\exists \text{(t }\epsilon \text{ RegularUser)(s.City=t.HomeTown)} \}$

***Result***:

| City |
| --- |
| Austin |
| Portland |
| Dallas |
| Tucson |
| Atlanta |
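
A possible Python rendering of the union query keeps the two existential quantifiers explicit. It assumes the candidate values for `s.City` come from the cities that actually appear in the table (the `domain` set below is that assumption), and it keeps only the two city columns.

```python
from collections import namedtuple

# Only the two city columns of RegularUser are needed here.
User = namedtuple("User", ["CurrentCity", "HomeTown"])
regular_user = [
    User("Austin", "Austin"),
    User("Portland", "Austin"),
    User("Dallas", "Tucson"),
    User("Dallas", "Atlanta"),
]

# Assumption: candidate city values are those appearing in either column.
domain = {c for r in regular_user for c in (r.CurrentCity, r.HomeTown)}

# City is a CurrentCity of some r OR a HomeTown of some t.
union = {
    city
    for city in domain
    if any(city == r.CurrentCity for r in regular_user)
    or any(city == t.HomeTown for t in regular_user)
}
```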
___
#### Intersection-CALC

Query:<br>
*Find all cities that are a `CurrentCity` for someone and a `HomeTown` for some RegularUser.*<br>

"and" should tip us off to use an intersection operation.

Calculus:<br>
$\{s.City \: | \: \exists \text{(r } \epsilon \text{ RegularUser)(s.City=r.CurrentCity) and } \exists \text{(t }\epsilon \text{ RegularUser)(s.City=t.HomeTown)}\}$

***Result***:

| City |
| --- |
| Austin |

Only Austin appears in both columns.
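
In a Python sketch of this query, the only change from a union is that the two existential tests are joined with `and`. The `domain` set of candidate cities is an assumption made for this example.

```python
from collections import namedtuple

# Only the two city columns of RegularUser are needed here.
User = namedtuple("User", ["CurrentCity", "HomeTown"])
regular_user = [
    User("Austin", "Austin"),
    User("Portland", "Austin"),
    User("Dallas", "Tucson"),
    User("Dallas", "Atlanta"),
]

# Assumption: candidate city values are those appearing in either column.
domain = {c for r in regular_user for c in (r.CurrentCity, r.HomeTown)}

# City is a CurrentCity of some r AND a HomeTown of some t.
intersection = {
    city
    for city in domain
    if any(city == r.CurrentCity for r in regular_user)
    and any(city == t.HomeTown for t in regular_user)
}
```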
___
#### Set Difference-CALC

Query:<br>
*Find all cities that are a `CurrentCity` for someone, but exclude those that are a `HomeTown` for some RegularUser.*<br>

"exclude" should tip us off to use a set difference operation.

Calculus:<br>
$\{s.City \: | \: \exists \text{(r } \epsilon \text{ RegularUser)(s.City=r.CurrentCity) and not(} \exists \text{(t }\epsilon \text{ RegularUser)(s.City=t.HomeTown))}\}$

***Result***:

| City |
| --- |
| Portland |
| Dallas |
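
The negated existential can be sketched in Python as `not any(...)`. As before, the `domain` set of candidate cities is an assumption made for this example.

```python
from collections import namedtuple

# Only the two city columns of RegularUser are needed here.
User = namedtuple("User", ["CurrentCity", "HomeTown"])
regular_user = [
    User("Austin", "Austin"),
    User("Portland", "Austin"),
    User("Dallas", "Tucson"),
    User("Dallas", "Atlanta"),
]

# Assumption: candidate city values are those appearing in either column.
domain = {c for r in regular_user for c in (r.CurrentCity, r.HomeTown)}

# City is a CurrentCity of some r and NOT a HomeTown of any t.
difference = {
    city
    for city in domain
    if any(city == r.CurrentCity for r in regular_user)
    and not any(city == t.HomeTown for t in regular_user)
}
```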
___
#### Natural or Inner Join-CALC

Now let's look at constructor or join operations. We'll start with inner joins.

*Given*:

**RegularUser**

| Email | Year | Sex | CurrentCity | HomeTown |
| --- | --- | --- | --- | --- |
| user2@gatech.edu | 1969 | M | Austin | Austin |
| user3@gatech.edu | 1982 | F | Portland | Austin |
| user4@gatech.edu | 1968 | M | Dallas | Tucson |
| user5@gatech.edu | 1975 | M | Dallas | Atlanta |

**Major60sEvents**

| Event | Year |
| --- | --- |
| JFK Assassination | 1963|
| USA Lands on Moon | 1969|

Query:<br>
*Find `Email`, `Year`, `Sex` and the `Event` for a RegularUser born during the same year as a Major60sEvent.*<br>

Calculus:<br>
$\{\text{t.Email, t.Year, t.Sex, t.Event | }\exists \: \text{(r }\epsilon \text{ RegularUser) } \exists \text{(s } \epsilon \text{ Major60sEvents)(r.Year=s.Year and t.Email=r.Email and t.Year=r.Year and t.Sex=r.Sex and t.Event=s.Event)}\}$

In the calculus above, t ranges over the result tuples, r ranges over `RegularUser` tuples, and s ranges over `Major60sEvents` tuples.<br>

***Result***:

| Email | Year | Sex | Event |
| --- | --- | --- | --- |
| user2@gatech.edu | 1969 | M | USA Lands on Moon |

A natural join works by joining the two tables on the column names found to be in common between the two tables. (`Year` in this case)
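
A minimal Python sketch of this join uses two nested loops for the two existential quantifiers and builds each result tuple from the matching pair. The variable names are assumptions for illustration; the rows are copied from the tables above.

```python
from collections import namedtuple

User = namedtuple("User", ["Email", "Year", "Sex", "CurrentCity", "HomeTown"])
Event = namedtuple("Event", ["Event", "Year"])

regular_user = [
    User("user2@gatech.edu", 1969, "M", "Austin", "Austin"),
    User("user3@gatech.edu", 1982, "F", "Portland", "Austin"),
    User("user4@gatech.edu", 1968, "M", "Dallas", "Tucson"),
    User("user5@gatech.edu", 1975, "M", "Dallas", "Atlanta"),
]
major_60s_events = [
    Event("JFK Assassination", 1963),
    Event("USA Lands on Moon", 1969),
]

# A result tuple exists for each pair (r, s) with r.Year = s.Year.
joined = {
    (r.Email, r.Year, r.Sex, s.Event)
    for r in regular_user
    for s in major_60s_events
    if r.Year == s.Year
}
```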
___
#### Cartesian Product-CALC

Now let's look at combining all `RegularUser` rows or tuples with all `UserInterests` rows or tuples.

Query:<br>
*Combine all `RegularUser` tuples with all `UserInterests` tuples.*<br>

Calculus:<br>
$\{\text{r,s | r } \epsilon \text{ RegularUser and s }\epsilon \text{ UserInterests}\}$

*Given*:

**RegularUser**

| RUEmail | BirthYear | Sex |
| --- | --- | --- |
| user2@gatech.edu | 1969 | M |
| user3@gatech.edu | 1982 | F |
| user4@gatech.edu | 1968 | M |
| user5@gatech.edu | 1966 | M |
| user6@gatech.edu | 1984 | F |
| user7@gatech.edu | 1963 | M |

**UserInterests**

| UEmail | SinceAge | Interests|
| --- | --- | --- |
| user2@gatech.edu | 10 | Music |
| user3@gatech.edu | 5 | Reading |
| user4@gatech.edu | 14 | Tennis |
| user5@gatech.edu | 11 | Music |
| user6@gatech.edu | 6 | Reading |
| user7@gatech.edu | 18 | Swimming |

***Result***:

| RUEmail | BirthYear | Sex | UEmail | SinceAge | Interests|
| --- | --- | --- | --- | --- | --- |
| user2@gatech.edu | 1969 | M | user2@gatech.edu | 10 | Music |
| $\cdots$ | $\cdots$ | $\cdots$ | $\cdots$ | $\cdots$ | $\cdots$ |
| user4@gatech.edu | 1968 | M | user4@gatech.edu | 14 | Tennis |
| $\cdots$ | $\cdots$ | $\cdots$ | $\cdots$ | $\cdots$ | $\cdots$ |
| user6@gatech.edu | 1984 | F | user6@gatech.edu | 6 | Reading |
| $\cdots$ | $\cdots$ | $\cdots$ | $\cdots$ | $\cdots$ | $\cdots$ |
| user7@gatech.edu | 1963 | M | user7@gatech.edu | 18 | Swimming |

Every `RegularUser` tuple is paired with every `UserInterests` tuple, so the result has 6 × 6 = 36 tuples and all six columns. No matching condition is applied, so most pairings combine unrelated users.
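
One way to sketch the Cartesian product in Python is with `itertools.product`, which yields every pair of tuples. Relations are modelled as plain tuples here, and the variable names are assumptions for illustration.

```python
from itertools import product

# (RUEmail, BirthYear, Sex) rows from the RegularUser table above.
regular_user = [
    ("user2@gatech.edu", 1969, "M"),
    ("user3@gatech.edu", 1982, "F"),
    ("user4@gatech.edu", 1968, "M"),
    ("user5@gatech.edu", 1966, "M"),
    ("user6@gatech.edu", 1984, "F"),
    ("user7@gatech.edu", 1963, "M"),
]

# (UEmail, SinceAge, Interests) rows from the UserInterests table above.
user_interests = [
    ("user2@gatech.edu", 10, "Music"),
    ("user3@gatech.edu", 5, "Reading"),
    ("user4@gatech.edu", 14, "Tennis"),
    ("user5@gatech.edu", 11, "Music"),
    ("user6@gatech.edu", 6, "Reading"),
    ("user7@gatech.edu", 18, "Swimming"),
]

# {r, s | r in RegularUser and s in UserInterests}:
# concatenate each pair into one six-column tuple.
cartesian = [r + s for r, s in product(regular_user, user_interests)]
```

The length of `cartesian` is the product of the two input sizes, since no condition filters the pairs.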
___
#### Usefulness of Cartesian Product-CALC

Below is a useful business example of the cartesian product.

*Given*:

**RegularUser**

| Email |
| --- |
| user1@gatech.edu |
| user2@gatech.edu |

**UserInterests**

| Email | Interests|
| --- | --- |
| user1@gatech.edu | Music |
| user2@gatech.edu | Reading |
| user2@gatech.edu | Tennis |
| user3@gatech.edu | Music |


Query:<br>

*In preparation for an email blast, combine all RegularUsers with all UserInterests they are not currently related to.*<br>

Calculus:<br>
$\{\text{r.Email, s.Interests | r } \epsilon \text{ RegularUser and s } \epsilon \text{ UserInterests and not(}\exists\text{ (t } \epsilon \text{ UserInterests)(r.Email=t.Email and s.Interests=t.Interests))}\}$

***Result***:

| Email | Interests|
| --- | --- |
| user1@gatech.edu | Tennis |
| user1@gatech.edu | Reading |
| user2@gatech.edu | Music |
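
The email-blast query can be sketched in Python as a Cartesian product filtered by a negated existential. `RegularUser` is reduced to a set of email strings for brevity, and the variable names are assumptions made for this example.

```python
from collections import namedtuple

Interest = namedtuple("Interest", ["Email", "Interests"])

# Rows copied from the two tables above.
regular_user = {"user1@gatech.edu", "user2@gatech.edu"}
user_interests = {
    Interest("user1@gatech.edu", "Music"),
    Interest("user2@gatech.edu", "Reading"),
    Interest("user2@gatech.edu", "Tennis"),
    Interest("user3@gatech.edu", "Music"),
}

# Pair every user with every interest, then drop the pairs that
# already exist as a row t of UserInterests.
blast = {
    (email, s.Interests)
    for email in regular_user
    for s in user_interests
    if not any(
        t.Email == email and t.Interests == s.Interests for t in user_interests
    )
}
```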
___
#### Divideby-CALC

Let's look at a relational tuple calculus example of the $\div$ operator.

*Given*:

**UserInterests**

| Email | SinceAge | Interests|
| --- | --- | --- |
| user1@gatech.edu | 10 | Music |
| user1@gatech.edu | 5 | Reading |
| user1@gatech.edu | 14 | Tennis |
| user2@gatech.edu | 1 | Swimming |
| user2@gatech.edu | 12 | Tennis |
| user3@gatech.edu | 15 | Swimming |
| user3@gatech.edu | 9 | Tennis |
| user3@gatech.edu | 11 | Music |
| user3@gatech.edu | 6 | Reading |
| user4@gatech.edu | 18 | DIY |
| user4@gatech.edu | 18 | Music |
| user4@gatech.edu | 18 | Reading |

Query:<br>

*Find the `Email` of all users with at least all the `Interests` of 'user1@gatech.edu'.*<br>

Calculus:<br>
$\{\text{r.Email | r } \epsilon \text{ UserInterests and } \forall \text{ (s } \epsilon \text{ UserInterests)(s.Email } \neq \text{ 'user1@gatech.edu' or } \exists \text{ (t } \epsilon \text{ UserInterests)(r.Email=t.Email and t.Interests=s.Interests))}\}$

***Result***:

| Email |
| --- |
| user1@gatech.edu |
| user3@gatech.edu |
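
The division query nests a universal quantifier around an existential one, which can be sketched in Python as `all(...)` wrapping `any(...)`. The relation is modelled as a list of named tuples copied from the table above; the variable names are assumptions for illustration.

```python
from collections import namedtuple

Row = namedtuple("Row", ["Email", "SinceAge", "Interests"])

user_interests = [
    Row("user1@gatech.edu", 10, "Music"),
    Row("user1@gatech.edu", 5, "Reading"),
    Row("user1@gatech.edu", 14, "Tennis"),
    Row("user2@gatech.edu", 1, "Swimming"),
    Row("user2@gatech.edu", 12, "Tennis"),
    Row("user3@gatech.edu", 15, "Swimming"),
    Row("user3@gatech.edu", 9, "Tennis"),
    Row("user3@gatech.edu", 11, "Music"),
    Row("user3@gatech.edu", 6, "Reading"),
    Row("user4@gatech.edu", 18, "DIY"),
    Row("user4@gatech.edu", 18, "Music"),
    Row("user4@gatech.edu", 18, "Reading"),
]

# For every row s: either s does not belong to user1, or r's user
# has some row t with the same interest as s.
divided = {
    r.Email
    for r in user_interests
    if all(
        s.Email != "user1@gatech.edu"
        or any(
            t.Email == r.Email and t.Interests == s.Interests
            for t in user_interests
        )
        for s in user_interests
    )
}
```

The `s.Email != ... or ...` form is the implication "if s is one of user1's rows, then r's user shares that interest", written as (not P1) or P2.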