# Lesson 2.2: Operators & Functions in SQL

### Lesson Duration: 3 hours

> Purpose: Learn how to use the different operators and functions available in SQL to filter data and extract relevant information from the database.

---

### Setup

To start this lesson, students should have:

- Completed lesson 2.1
- Completed all previous setup steps

### Learning Objectives

After this lesson, students will be able to:

- Interpret arithmetic, comparison, and logical operators
- Use different pre-built functions (numeric and string) for data processing
- Use different operators and functions together to extract relevant information from the database

---

### Lesson 1 key concepts

> :clock10: 20 min

> [**Slides**](https://docs.google.com/presentation/d/1EqeLKQDowPYbtfyG9PnlLtcqKi_1TbukUZDimAKm7qo/edit?usp=sharing)

- Simple Queries with the `WHERE` clause
- _OLTP_ vs. _OLAP_
- Database vs. Data Warehouse vs. Data Mart vs. Data Lakes

<details>
  <summary> Click for Code Sample </summary>

:exclamation: Note for instructor: Keep working on the `bank` database.

```sql
select * from bank.order
where amount > 1000;

select * from bank.order
where k_symbol = 'SIPO';

select order_id, account_id, bank_to, amount from bank.order
where k_symbol = 'SIPO';

select order_id as 'OrderID', account_id as 'AccountID', bank_to as 'DestinationBank', amount as 'Amount'
from bank.order
where k_symbol = 'SIPO';

-- limiting results in the above query
select order_id as 'OrderID', account_id as 'AccountID', bank_to as 'DestinationBank', amount as 'Amount'
from bank.order
where k_symbol = 'SIPO'
limit 100;
```

</details>

# 2.02 Activity 1

Keep working on the `bank` database. (_In case you need to load data again, refer to `files_for_lab` to get the database._)

#### Simple queries

1. Select districts and salaries (from the `district` table) where the salary is greater than 10000. Use the extended case study (which you already have from the 2.01 activity) to work out which columns are required. Return the columns as `district_name` and `average_salary`.
2. Select the loans whose contract has finished and that were not paid.

   **Hint**: You should be looking at the `loan` table, and you will need the extended case study information to tell you which value of `status` is required.

3. Select cards of type `junior`. Return just the first 10 rows in your query.

### Lesson 2 key concepts

> :clock10: 20 min

- Arithmetic operators in SQL (add (`+`), subtract (`-`), multiply (`*`), divide (`/`), modulo (`%`))
- Comparison operators in SQL (equal to (`=`), greater than (`>`), less than (`<`), greater than or equal to (`>=`), less than or equal to (`<=`), not equal to (`<>`))
- Limiting results in SQL (similar to `head()` in data frames)

<details>
  <summary>Click for code: Arithmetic Operators</summary>

```sql
select *, amount-payments as balance
from bank.loan;

select loan_id, account_id, date, duration, status, amount-payments as balance
from bank.loan;

select loan_id, account_id, date, duration, status, (amount-payments)/1000 as 'balance in Thousands'
from bank.loan;

-- this is the modulus operator that gives the remainder. This is a dummy example:
select duration%2
from bank.loan;

select 10%3;
```
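
The queries above use `-`, `/`, and `%` on table columns. To check how each operator behaves before applying it to real data, the same operators can be run on literal values. A minimal sketch, assuming MySQL (the dialect used in this lesson); the numbers are arbitrary:

```sql
-- arithmetic on literals, no table needed (MySQL syntax assumed)
select 7 + 2 as addition,
       7 - 2 as subtraction,
       7 * 2 as multiplication,
       7 / 2 as division,    -- in MySQL, / returns a decimal rather than an integer
       7 % 2 as remainder;   -- same operator as in duration%2 above
```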

</details>

<details>
  <summary>Click for code: Comparison Operators</summary>

> These comparison operators are used with the `WHERE` clause, for filtering data:

```sql
select * from bank.loan
where status = 'B';
-- In this case status B is for those clients where the contract has finished but the loan is not paid yet

select * from bank.loan
where status in ('B','b');

select * from bank.loan
where status in ('B','b') and amount > 100000;
```
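
The examples above cover `=` and `>`. The remaining comparison operators from the list follow the same pattern. A short sketch on the same `bank.loan` table, with threshold values picked only for illustration:

```sql
-- not equal to: every loan whose status is anything other than 'B'
select * from bank.loan
where status <> 'B';

-- greater than or equal to
select * from bank.loan
where duration >= 24;

-- less than or equal to
select * from bank.loan
where amount <= 100000;

-- strictly less than
select * from bank.loan
where payments < 1000;
```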

</details>

<details>
  <summary>Click for code: Limiting Results</summary>

```sql
select * from bank.loan
limit 10;

-- there is no predefined function to get the bottom rows of a table,
-- but you can sort the results in descending order and then use the LIMIT clause
select * from bank.account
order by account_id desc
limit 10;
-- This returns the bottom rows of the table because the data is stored
-- in ascending order of account_id
```

</details>

# 2.02 Activity 2

#### Simple queries

1. Select the loans whose contract has finished and that were not paid back. Using the status you identified in the last activity, return the debt value together with `loan_id` and `account_id`.
2. Calculate the urban population for all districts.

   **Hint**: You are looking for the number of inhabitants and the % of urban inhabitants in each district. Return columns as **district_name** and **urban_population**.

3. Re-run the previous query, this time filtering out districts where the rural population is greater than 50%.

### Lesson 3 key concepts

> :clock10: 20 min

- Using multiple conditions with the `WHERE` clause
- Using logical operators

<details>
  <summary>Click for code: Logical Operators</summary>

```sql
-- two conditions applied on the table
select *
from bank.loan
where status = 'B' and amount > 100000;

-- we can have as many conditions as we need
select *
from bank.loan
where status = 'B' and amount > 100000 and duration <= 24;

-- OR returns the rows that satisfy at least one of the conditions
select *
from bank.loan
where status = 'B' or status = 'D';
-- Status B and D are the clients that were bad for business for the bank

select *
from bank.loan
where (status = 'B' or status ='D') and amount > 200000;

-- logical NOT operator - it negates the boolean expression that we are evaluating
select *
from bank.order
where not k_symbol = 'SIPO';

select *
from bank.order
where not k_symbol = 'SIPO' and not amount < 1000;
```
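
When `AND` and `OR` appear in the same `WHERE` clause, `AND` is evaluated first, which is why the parentheses in the `(status = 'B' or status ='D')` query matter. A sketch of the difference, reusing the same table and threshold as above:

```sql
-- without parentheses, AND binds tighter than OR, so this is read as:
-- status = 'B' or (status = 'D' and amount > 200000)
select *
from bank.loan
where status = 'B' or status = 'D' and amount > 200000;

-- with parentheses, the amount filter applies to both statuses
select *
from bank.loan
where (status = 'B' or status = 'D') and amount > 200000;
```

The first query returns every status `B` loan regardless of its amount, so the two result sets will generally differ.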

</details>

# 2.02 Activity 3

#### Simple queries

1. Get all `junior` cards issued last year.

   **Hint**: Use the numeric value (980000).

2. Get the first 10 transactions for withdrawals that are not in cash. You will need the extended case study information to tell you which values are required here, and you will need to refer to conditions on two columns.
3. Refine your query from the last activity on loans whose contract has finished and that were not paid back: keep only the loans that were left with a debt bigger than 1000. Return the debt value together with `loan_id` and `account_id`. Sort from the highest debt value to the lowest.

### Lesson 4 key concepts

> :clock10: 20 min

- Using numeric functions
- Using string functions

<details>
  <summary> Click for Code: Numeric Functions </summary>

```sql
select order_id, round(amount/1000,2)
from bank.order;

-- counting the rows in the table: both queries give the same result
-- as long as the counted column (order_id here) contains no nulls:
select count(*) from bank.order;

select count(order_id) from bank.order;

select max(amount) from bank.order;
select min(amount) from bank.order;

select floor(avg(amount)) from bank.order;
select ceiling(avg(amount)) from bank.order;
```
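
Besides rounding and aggregates, MySQL also provides scalar math functions such as `power()`, `sqrt()`, and `log10()`. A minimal sketch on literal values so it runs without any table; the inputs are arbitrary and MySQL syntax is assumed:

```sql
-- scalar math functions on literal values (MySQL syntax assumed)
select power(2, 10);   -- 2 raised to the 10th power
select sqrt(16);       -- square root
select log10(1000);    -- base-10 logarithm

-- the rounding functions from above, side by side on literals
select round(3.14159, 2), floor(3.7), ceiling(3.2);
```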

> There are other numeric functions including `acos()`, `asin()`, `atan()`, `log()`, `log10()`, `power()`, and `sqrt()`.

</details>

<details>
  <summary> Click for Code: String Functions</summary>

```sql
select length('himanshu');
select *, length(k_symbol) as 'Symbol_length' from bank.order;
select *, concat(order_id, account_id) as 'concat' from bank.order;

-- formats the number to a form with commas,
-- 2 is the number of decimal places, converts numeric to string as well
select *, format(amount, 2) from bank.loan;

select *, lower(A2), upper(A3) from bank.district;
-- Note that select lower(A2), upper(A3), * from bank.district; doesn't work in MySQL:
-- an unqualified * has to come first in the select list (or be written as district.*)

select A2, left(A2,5), A3, ltrim(A3) from bank.district;
-- Similar to ltrim() there is rtrim() and trim(). And similar to left() there is right()
```
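
The last comment mentions `rtrim()`, `trim()`, and `right()` without showing them. A short sketch on literal strings, assuming MySQL; the strings and the alias `short_name` are made up for illustration:

```sql
-- ltrim() strips leading spaces, rtrim() trailing spaces, trim() both
select ltrim('   bank'), rtrim('bank   '), trim('   bank   ');

-- left() and right() take a number of characters from either end
select left('district', 4), right('district', 5);

-- functions can be nested: upper-case the first three characters of A2
select A2, upper(left(A2, 3)) as short_name
from bank.district;
```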

- More string functions can be found [here](https://www.w3resource.com/slides/mysql-string-functions.php) or [here](https://www.w3schools.com/sql/sql_ref_sqlserver.asp).

</details>

<details>
  <summary> Click for Code: String Functions (splitting strings) </summary>

```sql
-- Splitting strings using substring_index

select substring_index(issued, ' ', 1) from bank.card;
```

:exclamation: Note: The idea behind the last query is to select the date part from the `issued` column in the `card` table. Although the data looks like it is in DateTime format, it is actually a string. We will use this later to convert the extracted _date_, which is in _string_ format, to the _date_ format.
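
To see what `substring_index` returns without touching the table, it can be tried on a literal first. A minimal sketch, assuming MySQL; the value `'931107 00:00:00'` is an assumed sample of how `issued` is stored, and the alias `issued_date` is hypothetical:

```sql
-- count = 1: everything before the first space (the date part)
select substring_index('931107 00:00:00', ' ', 1);

-- a negative count works from the right: everything after the last space
select substring_index('931107 00:00:00', ' ', -1);

-- the same call on the column, next to the original value for comparison
select issued, substring_index(issued, ' ', 1) as issued_date
from bank.card
limit 10;
```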

</details>

# 2.02 Activity 4

#### Simple queries

1. Get the biggest and the smallest transaction with non-zero values in the database (use the `trans` table in the `bank` database).
2. Get account information with an extra column `year` showing the opening year as 'YY'. E.g., 1995 will show as 95.

   **Hint**: Look at the first two characters of the string date in the `account` table.

# Lab | SQL Queries 2

In this lab, you will be using the [Sakila](https://dev.mysql.com/doc/sakila/en/) database of movie rentals. You can follow the steps listed here to get the data locally: [Sakila sample database - installation](https://dev.mysql.com/doc/sakila/en/sakila-installation.html).

The database is structured as follows:
![DB schema](https://education-team-2020.s3-eu-west-1.amazonaws.com/data-analytics/database-sakila-schema.png)

<br><br>

### Instructions

1. Select all the actors with the first name ‘Scarlett’.
2. Select all the actors with the last name ‘Johansson’.
3. How many films (movies) are available for rent?
4. How many films have been rented?
5. What is the shortest and longest rental period?
6. What are the shortest and longest movie durations? Name the values `max_duration` and `min_duration`.
7. What's the average movie duration?
8. What's the average movie duration expressed in format (hours, minutes)?
9. How many movies are longer than 3 hours?
10. Get the name and email formatted. Example: Mary SMITH - *mary.smith@sakilacustomer.org*.
11. What's the length of the longest film title?

### Additional Resources

- [Data Warehouse schemas, Facts vs. dimensions table](http://gkmc.utah.edu/ebis_class/2003s/Oracle/DMB26/A73318/schemas.htm)
- [Data Warehouses vs. Data Marts](https://www.talend.com/resources/what-is-data-mart/)