# Advanced databases

## LIKE and SIMILAR TO, GROUP BY, aggregate functions, set operations
### dr inż. Waldemar Bauer

## Like and ILike

1. It allows you to find simple patterns in strings.
2. LIKE matching is case-sensitive; ILIKE matching is case-insensitive.
3. In PostgreSQL you can also use the following operator forms:
    - `~~` is equivalent to LIKE
    - `~~*` is equivalent to ILIKE
    - `!~~` is equivalent to NOT LIKE
    - `!~~*` is equivalent to NOT ILIKE
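
The operator forms can be tried directly on string literals. A minimal sketch (PostgreSQL-specific operators; the literals are only illustrative):

```sql
SELECT
    'xyz' ~~   'x%',   -- true:  same as 'xyz' LIKE 'x%'
    'XYZ' ~~*  'x%',   -- true:  same as 'XYZ' ILIKE 'x%'
    'xyz' !~~  'x%',   -- false: same as 'xyz' NOT LIKE 'x%'
    'XYZ' !~~* 'a%';   -- true:  same as 'XYZ' NOT ILIKE 'a%'
```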

## Like pattern 

- Percent (`%`) matches any sequence of zero or more characters.
- Underscore (`_`) matches any single character.

```sql
SELECT
    'xyz' LIKE 'xyz',   -- true
    'xyz' LIKE 'x%',    -- true
    'xyz' LIKE '_y_',   -- true
    'xyz' LIKE 'x_',    -- false
    'XYZ' LIKE 'xyz',   -- false
    'XYZ' ILIKE 'xyz',  -- true
    'xyz' ILIKE 'XYZ';  -- true
```

## Example Like

```sql
Select first_name, last_name from actor where first_name ~~ 'B%' and last_name ~~* '%S'
```

| first_name 	| last_name 	|
|:----------:	|:---------:	|
|   "Burt"   	| "Dukakis" 	|
|    "Ben"   	|  "Willis" 	|
|    "Ben"   	|  "Harris" 	|

## Similar to

1. The SIMILAR TO operator succeeds only if its pattern matches the entire string; this is unlike common regular expression behavior, where the pattern can match any part of the string.
2. Like LIKE, it uses _ and % as wildcard characters denoting any single character and any string, respectively.
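
The whole-string rule is easiest to see next to the POSIX regular expression operator `~`, which accepts a match anywhere in the string. A minimal sketch on literals:

```sql
SELECT
    'xyz' SIMILAR TO 'y',     -- false: the pattern must cover the whole string
    'xyz' SIMILAR TO '%y%',   -- true:  % absorbs the characters around y
    'xyz' ~ 'y';              -- true:  a regex match succeeds on any substring
```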


## Similar to pattern
- | denotes alternation (either of two alternatives).
- \* denotes repetition of the previous item zero or more times.
- \+ denotes repetition of the previous item one or more times.
- ? denotes repetition of the previous item zero or one time.
- {m} denotes repetition of the previous item exactly m times.
- {m,} denotes repetition of the previous item m or more times.
- {m,n} denotes repetition of the previous item at least m and not more than n times.
- Parentheses () can be used to group items into a single logical item.
- A bracket expression [...] specifies a character class, just as in POSIX regular expressions.
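
A few of these constructs, sketched on illustrative literals (each pattern still has to match the whole string):

```sql
SELECT
    'color' SIMILAR TO 'colou?r',          -- true:  ? makes the u optional
    'abab'  SIMILAR TO '(ab)*',            -- true:  the group ab is repeated twice
    'aaa'   SIMILAR TO 'a{3}',             -- true:  exactly three a's
    'aaaa'  SIMILAR TO 'a{2,3}',           -- false: four a's exceed the upper bound
    'ab12'  SIMILAR TO '[a-z]+[0-9]{2}',   -- true:  letters followed by two digits
    'ab1'   SIMILAR TO '[a-z]+[0-9]{2}';   -- false: only one digit
```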

## Similar to example of use
```sql
SELECT
    'xyz' SIMILAR TO 'xyz',      -- true
    'xyz' SIMILAR TO 'x',        -- false
    'xyz' SIMILAR TO '%(y|a)%',  -- true
    'xyz' SIMILAR TO '(y|z)%';   -- false
```

## Similar to example

```sql
Select first_name, last_name from actor where first_name similar to 'B%' 
and 
last_name similar to '%s'
```
| first_name 	| last_name 	|
|:----------:	|:---------:	|
|   "Burt"   	| "Dukakis" 	|
|    "Ben"   	|  "Willis" 	|
|    "Ben"   	|  "Harris" 	|

## Explain Like and Similar to

Like:
```sql
"Seq Scan on actor  (cost=0.00..5.00 rows=1 width=13) (actual time=0.023..0.036 rows=3 loops=1)"
"  Filter: (((first_name)::text ~~ 'B%'::text) AND ((last_name)::text ~~ '%s'::text))"
"  Rows Removed by Filter: 197"
"Planning Time: 0.121 ms"
"Execution Time: 0.044 ms"
```

Similar to:
```sql
"Seq Scan on actor  (cost=0.00..5.00 rows=1 width=13) (actual time=0.304..0.859 rows=3 loops=1)"
"  Filter: (((first_name)::text ~ '^(?:B.*)$'::text) AND ((last_name)::text ~ '^(?:.*s)$'::text))"
"  Rows Removed by Filter: 197"
"Planning Time: 1.621 ms"
"Execution Time: 0.994 ms"
```
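
Plans that show both estimated costs and actual timings are what PostgreSQL prints for `EXPLAIN ANALYZE`. A sketch of the two statements, assuming the same `actor` table and the patterns visible in the filter lines above:

```sql
-- LIKE variant: the filter is evaluated with the ~~ operator
EXPLAIN ANALYZE
SELECT first_name, last_name
FROM actor
WHERE first_name LIKE 'B%' AND last_name LIKE '%s';

-- SIMILAR TO variant: the pattern is first rewritten into a POSIX regex (~)
EXPLAIN ANALYZE
SELECT first_name, last_name
FROM actor
WHERE first_name SIMILAR TO 'B%' AND last_name SIMILAR TO '%s';
```

In the plans shown, the SIMILAR TO version reports a higher execution time (0.994 ms against 0.044 ms). A single run on a 200-row table is an indication rather than a benchmark.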

## Group by

- Divides the rows returned by the SELECT statement into groups.
- An aggregate function can then be applied to each group.

```sql
SELECT 
   column_1, 
   column_2,
   aggregate_function(column_3)
FROM 
   table_name
GROUP BY 
   column_1,
   column_2;
```

##  Group by example

```sql
SELECT
   actor_id
FROM
   film_actor
GROUP BY
   actor_id;
```

| actor_id 	|
|:----:	|
|150|
|140|
|139|
|193|
|12|
|164|
|137|
|...|


##  Group by example

```sql
SELECT
   first_name, last_name, count(title) 
FROM
   actor a join film_actor fa on a.actor_id = fa.actor_id join film f 
   on f.film_id = fa.film_id 
GROUP BY
   first_name, last_name
order by count;
```

| first_name 	|  last_name  	| count 	|
|:----------:	|:-----------:	|:-----:	|
|   "Emily"  	|    "Dee"    	|   14  	|
|   "Julia"  	|  "Fawcett"  	|   15  	|
|   "Judy"   	|    "Dean"   	|   15  	|
|   "Julia"  	| "Zellweger" 	|   16  	|
|   "Adam"   	|   "Grant"   	|   18  	|
|   "Sissy"  	|  "Sobieski" 	|   18  	|
| "Penelope" 	|  "Guiness"  	|   19  	|
|     ...    	|     ...     	|   ..  	|

## Having

- Used in conjunction with the GROUP BY clause.
- Filters out the groups that do not satisfy the specified condition.

```sql
SELECT
	column_1,
	aggregate_function (column_2)
FROM
	tbl_name
GROUP BY
	column_1
HAVING
	condition;
```

## HAVING example

```sql

SELECT
   actor_id 
FROM
   film_actor
GROUP BY
   actor_id
HAVING
   actor_id < 10 and actor_id > 5;
```
| actor_id 	|
|:----:	|
|6|
|9|
|7|
|8|
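
The condition in this example refers only to the grouping column, not to an aggregate, so the same rows can be selected by filtering before the groups are formed. A sketch of that variant on the same `film_actor` table:

```sql
SELECT
   actor_id
FROM
   film_actor
WHERE
   actor_id > 5 AND actor_id < 10   -- rows are filtered before grouping
GROUP BY
   actor_id;
```

HAVING becomes necessary when the condition involves an aggregate such as `count(*)`, whose value does not exist until the groups have been built.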


## HAVING example

```sql
SELECT
   first_name, last_name, count(title) 
FROM
   actor a join film_actor fa on a.actor_id = fa.actor_id join film f on 
   f.film_id = fa.film_id 
GROUP BY
   first_name, last_name
HAVING 
    count(title) > 40
order by count;
```
 
 | first_name 	|  last_name  	| count 	|
|:----------:	|:-----------:	|:-----:	|
|   "Walter"  	|    "Torn"    	|   41  	|
|   "Gina"  	|  "Degeneres"  	|   42  	|
|   "Susan"   	|    "Davis"   	|   54  	|

 

## Equivalent example
 
```sql
SELECT
   first_name, last_name, count(title) 
FROM
   actor a join film_actor fa on a.actor_id = fa.actor_id join film f on f.film_id = fa.film_id 
GROUP BY
   first_name, last_name
HAVING 
    count(title) > 40
order by count;
```

is equivalent (apart from the final ORDER BY) to

```sql
SELECT tab.first_name, tab.last_name, tab.count  from (SELECT
   first_name, last_name, count(title) 
FROM
   actor a join film_actor fa on a.actor_id = fa.actor_id join film f on f.film_id = fa.film_id 
GROUP BY
   first_name, last_name) as tab
WHERE tab.count > 40
```

## Explain HAVING

```sql
"Sort  (cost=254.04..254.15 rows=43 width=21) (actual time=3.240..3.240 rows=3 loops=1)"
"  Sort Key: (count(f.title))"
"  Sort Method: quicksort  Memory: 25kB"
"  ->  HashAggregate  (cost=251.27..252.87 rows=43 width=21) (actual time=3.223..3.231 rows=3 loops=1)"
"        Group Key: a.first_name, a.last_name"
"        Filter: (count(f.title) > 40)"
"        Rows Removed by Filter: 196"
"        ->  Hash Join  (cost=83.00..196.65 rows=5462 width=28) (actual time=0.291..2.182 rows=5462 loops=1)"
"              Hash Cond: (fa.film_id = f.film_id)"
"              ->  Hash Join  (cost=6.50..105.76 rows=5462 width=15) (actual time=0.060..1.208 rows=5462 loops=1)"
"                    Hash Cond: (fa.actor_id = a.actor_id)"
"                    ->  Seq Scan on film_actor fa  (cost=0.00..84.62 rows=5462 width=4) (actual time=0.008..0.312 rows=5462 loops=1)"
"                    ->  Hash  (cost=4.00..4.00 rows=200 width=17) (actual time=0.046..0.047 rows=200 loops=1)"
"                          Buckets: 1024  Batches: 1  Memory Usage: 18kB"
"                          ->  Seq Scan on actor a  (cost=0.00..4.00 rows=200 width=17) (actual time=0.007..0.021 rows=200 loops=1)"
"              ->  Hash  (cost=64.00..64.00 rows=1000 width=19) (actual time=0.226..0.226 rows=1000 loops=1)"
"                    Buckets: 1024  Batches: 1  Memory Usage: 60kB"
"                    ->  Seq Scan on film f  (cost=0.00..64.00 rows=1000 width=19) (actual time=0.004..0.122 rows=1000 loops=1)"
"Planning Time: 0.311 ms"
"Execution Time: 6.744 ms"
```

## Explain select with subquery

```sql
"HashAggregate  (cost=251.27..252.87 rows=43 width=21) (actual time=11.709..11.737 rows=3 loops=1)"
"  Group Key: a.first_name, a.last_name"
"  Filter: (count(f.title) > 40)"
"  Rows Removed by Filter: 196"
"  ->  Hash Join  (cost=83.00..196.65 rows=5462 width=28) (actual time=1.168..7.836 rows=5462 loops=1)"
"        Hash Cond: (fa.film_id = f.film_id)"
"        ->  Hash Join  (cost=6.50..105.76 rows=5462 width=15) (actual time=0.198..4.220 rows=5462 loops=1)"
"              Hash Cond: (fa.actor_id = a.actor_id)"
"              ->  Seq Scan on film_actor fa  (cost=0.00..84.62 rows=5462 width=4) (actual time=0.026..1.040 rows=5462 loops=1)"
"              ->  Hash  (cost=4.00..4.00 rows=200 width=17) (actual time=0.155..0.156 rows=200 loops=1)"
"                    Buckets: 1024  Batches: 1  Memory Usage: 18kB"
"                    ->  Seq Scan on actor a  (cost=0.00..4.00 rows=200 width=17) (actual time=0.015..0.063 rows=200 loops=1)"
"        ->  Hash  (cost=64.00..64.00 rows=1000 width=19) (actual time=0.952..0.952 rows=1000 loops=1)"
"              Buckets: 1024  Batches: 1  Memory Usage: 60kB"
"              ->  Seq Scan on film f  (cost=0.00..64.00 rows=1000 width=19) (actual time=0.012..0.492 rows=1000 loops=1)"
"Planning Time: 0.989 ms"
"Execution Time: 11.897 ms"
```

## Aggregate Functions

| Aggregate function | Description |
|:------------------:|:------------|
| AVG | The AVG() aggregate function calculates the average of non-NULL values in a set. |
| CHECKSUM_AGG | The CHECKSUM_AGG() function calculates a checksum value based on a group of rows (SQL Server). |
| COUNT | COUNT(*) returns the number of rows in a group, including rows with NULL values; COUNT(expression) counts only the rows where the expression is not NULL. |
| COUNT_BIG | The COUNT_BIG() aggregate function is the SQL Server variant of COUNT that returns a BIGINT; in PostgreSQL, COUNT() already returns bigint. |
| MAX | The MAX() aggregate function returns the highest value (maximum) in a set of non-NULL values. |
| MIN | The MIN() aggregate function returns the lowest value (minimum) in a set of non-NULL values. |
| STDEV | The STDEV() function returns the standard deviation of the values, treating them as a sample of the population (SQL Server; PostgreSQL: STDDEV_SAMP()). |
| STDEVP | The STDEVP() function returns the standard deviation of the values, treating them as the entire population (SQL Server; PostgreSQL: STDDEV_POP()). |
| SUM | The SUM() aggregate function returns the sum of all non-NULL values in a set. |
| VAR | The VAR() function returns the variance of the values, treating them as a sample of the population (SQL Server; PostgreSQL: VAR_SAMP()). |
| VARP | The VARP() function returns the variance of the values, treating them as the entire population (SQL Server; PostgreSQL: VAR_POP()). |
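
Several names in this table (STDEV, STDEVP, VAR, VARP, COUNT_BIG, CHECKSUM_AGG) come from SQL Server. A minimal sketch of the PostgreSQL counterparts, run on an inline list of values so that it does not depend on any table (the names `t`, `x` and the column aliases are illustrative):

```sql
SELECT
    count(*)                 AS all_rows,        -- 4: the NULL row is counted
    count(x)                 AS non_null,        -- 3: the NULL value is skipped
    sum(x)                   AS total,           -- 6
    min(x)                   AS smallest,        -- 1
    max(x)                   AS largest,         -- 3
    round(avg(x), 2)         AS mean,            -- 2.00
    round(stddev_samp(x), 2) AS sd_sample,       -- 1.00
    round(stddev_pop(x), 2)  AS sd_population,   -- 0.82
    round(var_samp(x), 2)    AS var_sample,      -- 1.00
    round(var_pop(x), 2)     AS var_population   -- 0.67
FROM (VALUES (1), (2), (NULL), (3)) AS t(x);
```

The difference between `count(*)` and `count(x)` shows that only the row-counting form includes NULLs; every other aggregate here ignores them.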

## Example of using AVG, MIN, MAX and SUM

```sql
select first_name, last_name, round(avg(length),2), sum(length), min(length), max(length)
from actor a 
inner join  film_actor fa on a.actor_id = fa.actor_id 
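-- Note: film is joined on fa.actor_id here, which is why avg, min and max coincide
-- in the result; the usual join condition would be f.film_id = fa.film_id.
inner join  film f on f.film_id = fa.actor_id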
group by first_name, last_name
Having max(length) >= 180
order by last_name, first_name;
```

| first_name 	|   last_name   	|   avg  	|  sum 	| min 	| max 	|
|:---------:	|:-------------:	|:------:	|:----:	|:---:	|:---:	|
|  "Debbie" 	|    "Akroyd"   	| 185.00 	| 4440 	| 185 	| 185 	|
| "Michael" 	|    "Bening"   	| 180.00 	| 4320 	| 180 	| 180 	|
|   "Fred"  	|   "Costner"   	| 180.00 	| 4860 	| 180 	| 180 	|
|   "Cate"  	|    "Harris"   	| 185.00 	| 5180 	| 185 	| 185 	|
| "Natalie" 	|   "Hopkins"   	| 182.00 	| 5824 	| 182 	| 182 	|
|   "Mary"  	|    "Keitel"   	| 184.00 	| 7360 	| 184 	| 184 	|
|   "Cate"  	|   "Mcqueen"   	| 183.00 	| 5490 	| 183 	| 183 	|
|   "Jeff"  	| "Silverstone" 	| 184.00 	| 4600 	| 184 	| 184 	|
| "Cameron" 	|    "Streep"   	| 181.00 	| 4344 	| 181 	| 181 	|

## Any with subquery

- The subquery must return exactly one column.
- The ANY operator must be preceded by one of the following comparison operators: =, <>, <, <=, > or >=.
- The ANY operator returns true if any value of the subquery meets the condition, otherwise, it returns false.
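
A minimal sketch of these rules on inline values, so that it runs without any table (the names `t`, `s`, `x` and `v` are illustrative):

```sql
SELECT
    x,
    x > ANY (SELECT v FROM (VALUES (2), (5)) AS s(v)) AS gt_any,  -- true when x > 2
    x = ANY (SELECT v FROM (VALUES (2), (5)) AS s(v)) AS eq_any   -- same result as x IN (2, 5)
FROM (VALUES (1), (2), (6)) AS t(x)
ORDER BY x;
-- x | gt_any | eq_any
-- 1 | f      | f
-- 2 | f      | t
-- 6 | t      | f
```

With `>`, it is enough to exceed the smallest value returned by the subquery, which is why 6 passes and 2 does not.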



## Any with subquery example

```sql
SELECT title, length, rating
FROM film
WHERE length >= ANY(
    SELECT Count( length )
    FROM film
    INNER JOIN film_category USING(film_id)
    GROUP BY  category_id )
order by length;
```

Subquery result:

| count 	|
|:-----:	|
|   57  	|
|   61  	|
|   60  	|
|   61  	|
|   62  	|
|   63  	|
|   73  	|
|   64  	|
|   58  	|
|   ...  	|




##  Any with subquery example result

|         title         	|  length  	|  rating 	|
|:---------------------:	|:--------:	|:-------:	|
|     "Hall Cassidy"    	|    51    	| "NC-17" 	|
| "Champion Flatliners" 	|    51    	|   "PG"  	|
|     "Deep Crusade"    	|    51    	| "PG-13" 	|
|     "Simon North"     	|    51    	| "NC-17" 	|
|   "English Bulworth"  	|    51    	| "PG-13" 	|
|    "Excitement Eve"   	|    51    	|   "G"   	|
|    "Frisco Forrest"   	|    51    	|   "PG"  	|
|     "Harper Dying"    	|    52    	|   "G"   	|
|       ...       	| ... 	|  ... 	|

## All with subquery
- The ALL operator must be followed by a subquery.
- The ALL operator must be preceded by a comparison operator

The ALL operator works as follows:
1. column1 > ALL (subquery) - true if a value is greater than the biggest value returned by the subquery.
1. column1 >= ALL (subquery) - true if a value is greater than or equal to the biggest value returned by the subquery.
1. column1 < ALL (subquery) - true if a value is less than the smallest value returned by the subquery.
1. column1 <= ALL (subquery) - true if a value is less than or equal to the smallest value returned by the subquery.
1. column1 = ALL (subquery) - true if a value is equal to every value returned by the subquery.
1. column1 != ALL (subquery) - true if a value is not equal to any value returned by the subquery.
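
The same rules, sketched on inline values so that the query runs without any table (the names `t`, `s`, `x` and `v` are illustrative):

```sql
SELECT
    x,
    x >  ALL (SELECT v FROM (VALUES (2), (5)) AS s(v)) AS gt_all,  -- true when x > 5
    x <  ALL (SELECT v FROM (VALUES (2), (5)) AS s(v)) AS lt_all,  -- true when x < 2
    x <> ALL (SELECT v FROM (VALUES (2), (5)) AS s(v)) AS ne_all   -- same result as x NOT IN (2, 5)
FROM (VALUES (1), (2), (6)) AS t(x)
ORDER BY x;
-- x | gt_all | lt_all | ne_all
-- 1 | f      | t      | t
-- 2 | f      | f      | f
-- 6 | t      | f      | t
```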

## All example

```sql
SELECT
    title, length
FROM
    film
WHERE length > ALL (
    SELECT AVG(length)
    FROM film GROUP BY rating
    )
ORDER BY
    length;
```

|        title        	| length 	|
|:-------------------:	|:------:	|
|  "Dangerous Uptown" 	|   121  	|
|   "Boogie Amelie"   	|   121  	|
|    "Harry Idaho"    	|   121  	|
| "Brannigan Sunrise" 	|   121  	|
|    "Pure Runner"    	|   121  	|
|    "Arizona Bang"   	|   121  	|
|   "Paris Weekend"   	|   121  	|
|         ...         	|   ...  	|


## Exists

- If the subquery returns at least one row, the result is true.
- Otherwise, the result is false.
- EXISTS is often used with the correlated subquery.

```sql
SELECT 
    title 
FROM 
    film
WHERE EXISTS( SELECT category_id 
     FROM 
       category 
     WHERE 
       name = 'Comedy');
```

**Returns all titles. Why?** 
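
The subquery above never refers to the outer `film` row, so it yields the same answer for every title. A correlated variant, sketched under the assumption that `film_category` links films to categories as in the other queries of this section, ties the check to each film:

```sql
SELECT
    f.title
FROM
    film f
WHERE EXISTS (
    SELECT 1
    FROM film_category fc
    JOIN category c USING (category_id)
    WHERE fc.film_id = f.film_id   -- correlation with the outer row
      AND c.name = 'Comedy'
);
```

Here the subquery is evaluated against each film, so only titles that have a matching 'Comedy' row are returned.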

## SQL set operations
- Union
- Intersect
- Except

General rules:
- Both queries must return the same number of columns.
- The corresponding columns in the queries must have compatible data types.
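
A minimal sketch of the three operations on inline values (the names `a`, `b`, `x` and `y` are illustrative). Both sides return one integer column, which satisfies the rules above, and the result takes its column name from the first query:

```sql
-- UNION: rows from either query, duplicates removed
SELECT x FROM (VALUES (1), (2), (2), (3)) AS a(x)
UNION
SELECT y FROM (VALUES (3), (4)) AS b(y)
ORDER BY x;   -- 1, 2, 3, 4

-- INTERSECT: rows returned by both queries
SELECT x FROM (VALUES (1), (2), (2), (3)) AS a(x)
INTERSECT
SELECT y FROM (VALUES (3), (4)) AS b(y);   -- 3

-- EXCEPT: rows of the first query that the second one does not return
SELECT x FROM (VALUES (1), (2), (2), (3)) AS a(x)
EXCEPT
SELECT y FROM (VALUES (3), (4)) AS b(y)
ORDER BY x;   -- 1, 2
```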

## Union


```sql
(select  name, title 
 from film f 
 join film_category fa using(film_id) 
 join category c using (category_id) 
 where name = 'Comedy' order by title limit 5)
Union
(select  name, title from film f 
 join film_category fa using(film_id) 
 join category c using (category_id) 
 where name = 'Animation' order by title limit 5)
order by title
```

## Union example result

|   category  	|         title          	|
|:-----------:	|:----------------------:	|
|   "Comedy"  	|    "Airplane Sierra"   	|
| "Animation" 	|     "Alter Victory"    	|
| "Animation" 	| "Anaconda Confessions" 	|
|   "Comedy"  	|      "Anthem Luke"     	|
| "Animation" 	|    "Argonauts Town"    	|
| "Animation" 	|   "Bikini Borrowers"   	|
| "Animation" 	|   "Blackout Private"   	|
|   "Comedy"  	|  "Bringing Hysterical" 	|
|   "Comedy"  	|     "Caper Motions"    	|
|   "Comedy"  	|     "Cat Coneheads"    	|

## Intersect
```sql
(select  name, title 
 from film f 
 join film_category fa using(film_id) 
 join category c using (category_id) 
 where name = 'Comedy' order by title limit 5)
Intersect
(select  name, title from film f 
 join film_category fa using(film_id) 
 join category c using (category_id) 
 where name = 'Animation' order by title limit 5)
order by title
```
|   category  	|         title          	|
|:-----------:	|:----------------------:	|


## Except
```sql
(select  name, title 
 from film f 
 join film_category fa using(film_id) 
 join category c using (category_id) 
 where name = 'Comedy' order by title limit 5)
Except
(select  name, title from film f 
 join film_category fa using(film_id) 
 join category c using (category_id) 
 where name = 'Animation' order by title limit 5)
order by title
```

| category 	|         title         	|
|:--------:	|:---------------------:	|
| "Comedy" 	|   "Airplane Sierra"   	|
| "Comedy" 	|     "Anthem Luke"     	|
| "Comedy" 	| "Bringing Hysterical" 	|
| "Comedy" 	|    "Caper Motions"    	|
| "Comedy" 	|    "Cat Coneheads"    	|

## Grouping sets

- Defines multiple grouping sets in the same query.
- The query generates a single result set with the aggregates for all grouping sets.

```sql 
SELECT name, title, round(avg(length),2), SUM (rental_duration)
FROM film join film_category using (film_id) join category using (category_id)
GROUP BY
	GROUPING SETS (
		(name, title), -- group by name, title
		(name),        -- or group by name
		(title)        -- or group by title
	)
ORDER BY name, title;
```

## Example results

| category 	|         title         	|  round 	| sum 	|
|:--------:	|:---------------------:	|:------:	|:---:	|
| "Action" 	|     "Amadeus Holy"    	| 113.00 	|  6  	|
| "Action" 	|   "American Circus"   	| 129.00 	|  3  	|
| "Action" 	|  "Antitrust Tomatoes" 	| 168.00 	|  5  	|
| "Action" 	|    "Ark Ridgemont"    	|  68.00 	|  6  	|
| "Action" 	| "Barefoot Manchurian" 	| 129.00 	|  6  	|
| "Action" 	|     "Berets Agent"    	|  77.00 	|  5  	|
|    ...   	|          ...          	|   ...  	| ... 	|
| "Action" 	|          null         	|   111.6  	|  317 	|
|    ...   	|          ...          	|   ...  	| ... 	|
|   null   	|     "Zhivago Core"    	| 105.00 	|  6  	|
|    ...   	|          ...          	|   ...  	| ... 	|
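
The way the result is assembled is easier to follow on a tiny data set. A minimal sketch with three inline rows (the names `t`, `cat`, `item` and `qty` are illustrative); columns that are not part of the current grouping set come back as NULL:

```sql
SELECT cat, item, sum(qty) AS total
FROM (VALUES ('a', 'x', 1),
             ('a', 'y', 2),
             ('b', 'x', 3)) AS t(cat, item, qty)
GROUP BY GROUPING SETS ((cat), (item))
ORDER BY cat, item;
-- cat  | item | total
-- a    | NULL | 3      <- grouped by cat
-- b    | NULL | 3
-- NULL | x    | 4      <- grouped by item
-- NULL | y    | 2
```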

## Cube

- Generates multiple grouping sets.
- Generates all possible grouping sets for the listed columns.

```sql
SELECT
    c1,
    c2,
    c3,
    aggregate (c4)
FROM
    table_name
GROUP BY
    CUBE (c1, c2, c3);
```

Explain:
```sql
CUBE(c1,c2,c3) <=> GROUPING SETS (
    
    (c1,c2,c3), 
    (c1,c2),
    (c1,c3),
    (c2,c3),
    (c1),
    (c2),
    (c3), 
    ()
 ) 
```
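
For n columns CUBE produces 2^n grouping sets. A minimal sketch with two columns and three inline rows (the names `t`, `cat`, `item` and `qty` are illustrative) returns the four sets (cat, item), (cat), (item) and ():

```sql
SELECT cat, item, sum(qty) AS total
FROM (VALUES ('a', 'x', 1),
             ('a', 'y', 2),
             ('b', 'x', 3)) AS t(cat, item, qty)
GROUP BY CUBE (cat, item)
ORDER BY cat, item;
-- cat  | item | total
-- a    | x    | 1
-- a    | y    | 2
-- a    | NULL | 3      <- subtotal per cat
-- b    | x    | 3
-- b    | NULL | 3
-- NULL | x    | 4      <- subtotal per item
-- NULL | y    | 2
-- NULL | NULL | 6      <- grand total
```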

## Cube example

```sql 
SELECT name, title, round(avg(length),2), SUM (rental_duration)
FROM
	film join film_category using (film_id) join category using (category_id)
GROUP BY
	CUBE  (name, title)
ORDER BY
	name,
	title;
```

## Example results

| category 	|         title         	|  round 	| sum 	|
|:--------:	|:---------------------:	|:------:	|:---:	|
| "Action" 	|     "Amadeus Holy"    	| 113.00 	|  6  	|
| "Action" 	|   "American Circus"   	| 129.00 	|  3  	|
| "Action" 	|  "Antitrust Tomatoes" 	| 168.00 	|  5  	|
| "Action" 	|    "Ark Ridgemont"    	|  68.00 	|  6  	|
| "Action" 	| "Barefoot Manchurian" 	| 129.00 	|  6  	|
| "Action" 	|     "Berets Agent"    	|  77.00 	|  5  	|
|    ...   	|          ...          	|   ...  	| ... 	|
| "Action" 	|          null         	|   111.6  	|  317 	|
|    ...   	|          ...          	|   ...  	| ... 	|
|   null   	|     "Zhivago Core"    	| 105.00 	|  6  	|
|    ...   	|          ...          	|   ...  	| ... 	|
|   null    |          null             |  115.27   | 4985  |

## Rollup

```sql
SELECT
    c1,
    c2,
    c3,
    aggregate (c4)
FROM
    table_name
GROUP BY
    ROLLUP (c1, c2, c3);
```

Explain:
```sql
ROLLUP(c1,c2,c3) <=> GROUPING SETS (
    
    (c1, c2, c3),
    (c1, c2),
    (c1),
    ()
 ) 
```
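
ROLLUP treats the column list as a hierarchy: it produces one grouping set per prefix of the list plus the grand total, so n columns give n + 1 sets rather than the 2^n of CUBE. A minimal sketch with three inline rows (the names `t`, `cat`, `item` and `qty` are illustrative):

```sql
SELECT cat, item, sum(qty) AS total
FROM (VALUES ('a', 'x', 1),
             ('a', 'y', 2),
             ('b', 'x', 3)) AS t(cat, item, qty)
GROUP BY ROLLUP (cat, item)
ORDER BY cat, item;
-- cat  | item | total
-- a    | x    | 1
-- a    | y    | 2
-- a    | NULL | 3      <- subtotal per cat
-- b    | x    | 3
-- b    | NULL | 3
-- NULL | NULL | 6      <- grand total
```

Unlike CUBE, no rows are grouped by `item` alone, because (item) is not a prefix of (cat, item).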

## ROLLUP example

```sql 
SELECT name, title, round(avg(length),2),
	SUM (rental_duration)
FROM
	film join film_category using (film_id) join category using (category_id)
GROUP BY
	ROLLUP  (name, title)
ORDER BY
	name,
	title;
```

## Example results

| category 	|         title         	|  round 	| sum 	|
|:--------:	|:---------------------:	|:------:	|:---:	|
| "Action" 	|     "Amadeus Holy"    	| 113.00 	|  6  	|
| "Action" 	|   "American Circus"   	| 129.00 	|  3  	|
| "Action" 	|  "Antitrust Tomatoes" 	| 168.00 	|  5  	|
| "Action" 	|    "Ark Ridgemont"    	|  68.00 	|  6  	|
| "Action" 	| "Barefoot Manchurian" 	| 129.00 	|  6  	|
| "Action" 	|     "Berets Agent"    	|  77.00 	|  5  	|
|    ...   	|          ...          	|   ...  	| ... 	|
| "Action" 	|          null         	|   111.6  	|  317 	|
|    ...   	|          ...          	|   ...  	| ... 	|
|   null    |          null             |  115.27   | 4985  |