# Toy employees database (R/MySQL)

In [1]:
# Libraries
library(tidyverse)
library(odbc)
library(DBI)

"package 'ggplot2' was built under R version 4.5.2"
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   4.0.1     ✔ tibble    3.3.0
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.1.0     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
"package 'odbc' was built under R version 4.5.2"
"package 'DBI' was built under R version 4.5.2"


In this notebook, we query some toy schemas provided by IBM. The schemas have already been created, so all we need to do is connect to the MySQL database.

We

- connect to an existing MySQL database
- solve some sorting and grouping problems in IBM's employees schema
- solve some queries involving functions in IBM's pet rescue schema

### Cheat sheet

Order of SQL execution:

| order | key word | desc |
| ----- | -------- | ---- |
| 1. | `FROM` | get the table |
| 2. | `WHERE` | filter individual rows |
| 3. | `GROUP BY` | group rows |
| 4. | `HAVING` | filter groups |
| 5. | `SELECT` | choose columns |
| 6. | `ORDER BY` | sort results |
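
The logical order above can be mimicked step by step in base R. The sketch below uses a small made-up data frame (`staff`, with hypothetical columns `dept` and `salary`) rather than any table from the database, and each step is labelled with the clause it stands in for.

```r
# Hypothetical toy data; not taken from the IBM schemas
staff <- data.frame(
    dept   = c("A", "A", "B", "B", "B", "C"),
    salary = c(50, 70, 40, 60, 95, 30)
)

# 1. FROM: start from the table
rows <- staff
# 2. WHERE: filter individual rows
rows <- rows[rows$salary >= 40, ]
# 3. GROUP BY: aggregate within each group
groups <- aggregate(salary ~ dept, data = rows, FUN = mean)
names(groups)[2] <- "avg_salary"
# 4. HAVING: filter whole groups
groups <- groups[groups$avg_salary >= 60, ]
# 5. SELECT: choose the output columns
result <- groups[, c("dept", "avg_salary")]
# 6. ORDER BY: sort the result (highest average first)
result <- result[order(-result$avg_salary), ]
print(result)
```

The key point is that the row filter (step 2) runs before grouping and the group filter (step 4) runs after it, which is why an aggregate can appear in `HAVING` but not in `WHERE`.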

## Check R-MySQL data type conversion

In [2]:
# Data types
menagerie <- c(1, 1L, "1", TRUE, list(raw(1)))

for (animal in menagerie) {
    animal_type <- typeof(animal)

    tryCatch({
        print(
            paste(
                animal_type, ":",
                #dbDataType(RSQLite::SQLite(), animal)
                dbDataType(RMariaDB::MariaDB(), animal)
                #dbDataType(RMySQL::MySQL(), animal)
                #dbDataType(RPostgres::Postgres(), animal)
            )
        )
    }, error = function(e) {
        warning(paste(animal_type, ": unsupported type"))
    })
}

[1] "double : DOUBLE"
[1] "integer : INTEGER"
[1] "character : VARCHAR(1)"
[1] "logical : TINYINT"


"raw : unsupported type"

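The loop above relies on a detail of `c()`: because one of its arguments is a list, the result is a list rather than an atomic vector, so each element keeps its own type. A minimal check in base R, independent of any database driver:

```r
# Combining atomic values with a list yields a list, so no coercion happens
menagerie <- c(1, 1L, "1", TRUE, list(raw(1)))
print(is.list(menagerie))
element_types <- vapply(menagerie, typeof, character(1))
print(element_types)

# Without the list element, c() coerces everything to one common type
coerced <- c(1, 1L, "1", TRUE)
print(typeof(coerced))
```

Dropping the `list(raw(1))` element would therefore change what the loop reports, since every value would reach `dbDataType()` as a character string.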

## Connection

In [3]:
# List available drivers
odbcListDrivers() |> 
    tibble() |>
    filter(str_detect(name, "SQL"))

name,attribute,value
<chr>,<chr>,<chr>
SQL Server,APILevel,2
SQL Server,ConnectFunctions,YYY
SQL Server,CPTimeout,60
SQL Server,DriverODBCVer,03.50
SQL Server,FileUsage,0
SQL Server,SQLLevel,1
SQL Server,UsageCount,1
MySQL ODBC 9.6 ANSI Driver,UsageCount,1
MySQL ODBC 9.6 Unicode Driver,UsageCount,1
PostgreSQL ANSI(x64),UsageCount,1


In [4]:

# Establish connection
c <- dbConnect(
	drv = odbc(),
	driver = "MySQL ODBC 9.6 Unicode Driver", # MySQL, PostgreSQL, SQLite3
	database = "mysql", # MySQL mysql, PostgreSQL postgres, SQLite :memory:
	server = "localhost",
	uid = "r_user",
	pwd = "sql_r",
	port = 3306 # MySQL 3306, PostgreSQL 5432, SQL Server 1433 (SQLite is file-based and uses no port)
)
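
Hard-coding a password in a notebook is convenient for a toy database but easy to leak. One option, sketched below, is to read the credentials from environment variables and only then pass them to `dbConnect()`. The helper `mysql_conn_args()` and the variable names `R_MYSQL_UID` and `R_MYSQL_PWD` are hypothetical choices for this sketch; the other values mirror the cell above.

```r
# Hypothetical helper: collect connection arguments, preferring environment
# variables over credentials written into the notebook
mysql_conn_args <- function(database = "mysql") {
    list(
        driver   = "MySQL ODBC 9.6 Unicode Driver",
        database = database,
        server   = "localhost",
        uid      = Sys.getenv("R_MYSQL_UID", unset = "r_user"),
        pwd      = Sys.getenv("R_MYSQL_PWD", unset = ""),
        port     = 3306
    )
}

conn_args <- mysql_conn_args()
print(names(conn_args))

# With a running server, the list could be passed on like this (not run here):
# con <- do.call(DBI::dbConnect, c(list(drv = odbc::odbc()), conn_args))
```

The variables could be set in a user-level `.Renviron` file, which keeps them out of the notebook itself.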

In [5]:
# Display databases
#c |> dbGetQuery("SHOW DATABASES;")

## String patterns, sorting, grouping

In this set of problems, we're querying a synthetic schema of tables emulating a human resources database. After surveying the schema, we solve some basic filtering and implicit join queries.

In [36]:
# Select database
c |> dbGetQuery("USE ibm_employees_02;")

In [7]:
# Display available tables
q <- "
SHOW TABLES;
"
dbGetQuery(c, q)

Tables_in_ibm_employees_02
<chr>
departments
employees
job_history
jobs
locations


In [8]:
# List the first name, last name, and birth date of all employees who were born before January 1, 1980
q <- "
SELECT f_name, l_name, b_date FROM employees
WHERE YEAR(b_date) < 1980;
"
dbGetQuery(c, q)

f_name,l_name,b_date
<chr>,<chr>,<date>
John,Thomas,1976-09-01
Alice,James,1972-07-31
Nancy,Allen,1978-06-02
Mary,Thomas,1975-05-05


In [9]:
# Retrieve the first name, last name, and birth date of all employees who were born between January 1, 1980 and December 31, 1989 (inclusive)
q <- "
SELECT f_name, l_name, b_date FROM employees
WHERE b_date BETWEEN DATE('1980-01-01') AND DATE('1989-12-31');
"
dbGetQuery(c, q)

f_name,l_name,b_date
<chr>,<chr>,<date>
Steve,Wells,1980-10-08
Santosh,Kumar,1985-07-20
Ahmed,Hussain,1981-04-01
Bharath,Gupta,1985-06-05
Ann,Jacob,1982-03-30


In [10]:
# Find the first and last names of all employees whose first name ends with the letter y
q <- "
SELECT f_name, l_name FROM employees
WHERE f_name LIKE '%y';
"
dbGetQuery(c, q)

f_name,l_name
<chr>,<chr>
Nancy,Allen
Mary,Thomas
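
For readers more at home in R than in SQL, the `LIKE '%y'` pattern corresponds to a regular expression anchored at the end of the string. The sketch below applies it to the first names in the employees table, typed in by hand so that it runs without a connection. The `LIKE 'A%'` case is an extra illustration that does not appear in the exercises. Note that `grepl()` and `startsWith()` are case-sensitive, whereas `LIKE` in MySQL follows the column's collation.

```r
# First names as they appear in the employees table (E1001 to E1010)
f_name <- c("John", "Alice", "Steve", "Santosh", "Ahmed",
            "Nancy", "Mary", "Bharath", "Andrea", "Ann")

# LIKE '%y': any prefix, then a literal y at the end of the string
ends_with_y <- f_name[grepl("y$", f_name)]
print(ends_with_y)

# LIKE 'A%': a literal A at the start, then any suffix
starts_with_a <- f_name[startsWith(f_name, "A")]
print(starts_with_a)
```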


In [11]:
# Display the last name and department ID of all employees who work in department 2 or department 7
q <- "
SELECT l_name, dep_id FROM employees
WHERE dep_id IN (2, 7);
"
dbGetQuery(c, q)

l_name,dep_id
<chr>,<chr>
Thomas,2
Hussain,2
Allen,2
Thomas,7
Gupta,7
Jones,7


In [12]:
# Return all columns for all employees, ordered by birth date from oldest to youngest
q <- "
SELECT * FROM employees
ORDER BY b_date ASC;
"
dbGetQuery(c, q)

EMP_ID,F_NAME,L_NAME,SSN,B_DATE,SEX,ADDRESS,JOB_ID,SALARY,MANAGER_ID,DEP_ID
<chr>,<chr>,<chr>,<chr>,<date>,<chr>,<chr>,<chr>,<dbl>,<chr>,<chr>
E1002,Alice,James,123457,1972-07-31,F,"980 Berry ln, Elgin,IL",200,80000,30002,5
E1007,Mary,Thomas,123412,1975-05-05,F,"100 Rose Pl, Gary,IL",650,65000,30003,7
E1001,John,Thomas,123456,1976-09-01,M,"5631 Rice, OakPark,IL",100,100000,30001,2
E1006,Nancy,Allen,123411,1978-06-02,F,"111 Green Pl, Elgin,IL",600,90000,30001,2
E1003,Steve,Wells,123458,1980-10-08,M,"291 Springs, Gary,IL",300,50000,30002,5
E1005,Ahmed,Hussain,123410,1981-04-01,M,"216 Oak Tree, Geneva,IL",500,70000,30001,2
E1010,Ann,Jacob,123415,1982-03-30,F,"111 Britany Springs,Elgin,IL",220,70000,30002,5
E1008,Bharath,Gupta,123413,1985-06-05,M,"145 Berry Ln, Naperville,IL",660,65000,30003,7
E1004,Santosh,Kumar,123459,1985-07-20,M,"511 Aurora Av, Aurora,IL",400,60000,30002,5
E1009,Andrea,Jones,123414,1990-09-07,F,"120 Fall Creek, Gary,IL",234,70000,30003,7


In [13]:
# Return all employee records ordered by salary from highest to lowest
q <- "
SELECT * FROM employees
ORDER BY salary DESC;
"
dbGetQuery(c, q)

EMP_ID,F_NAME,L_NAME,SSN,B_DATE,SEX,ADDRESS,JOB_ID,SALARY,MANAGER_ID,DEP_ID
<chr>,<chr>,<chr>,<chr>,<date>,<chr>,<chr>,<chr>,<dbl>,<chr>,<chr>
E1001,John,Thomas,123456,1976-09-01,M,"5631 Rice, OakPark,IL",100,100000,30001,2
E1006,Nancy,Allen,123411,1978-06-02,F,"111 Green Pl, Elgin,IL",600,90000,30001,2
E1002,Alice,James,123457,1972-07-31,F,"980 Berry ln, Elgin,IL",200,80000,30002,5
E1005,Ahmed,Hussain,123410,1981-04-01,M,"216 Oak Tree, Geneva,IL",500,70000,30001,2
E1009,Andrea,Jones,123414,1990-09-07,F,"120 Fall Creek, Gary,IL",234,70000,30003,7
E1010,Ann,Jacob,123415,1982-03-30,F,"111 Britany Springs,Elgin,IL",220,70000,30002,5
E1007,Mary,Thomas,123412,1975-05-05,F,"100 Rose Pl, Gary,IL",650,65000,30003,7
E1008,Bharath,Gupta,123413,1985-06-05,M,"145 Berry Ln, Naperville,IL",660,65000,30003,7
E1004,Santosh,Kumar,123459,1985-07-20,M,"511 Aurora Av, Aurora,IL",400,60000,30002,5
E1003,Steve,Wells,123458,1980-10-08,M,"291 Springs, Gary,IL",300,50000,30002,5


In [14]:
# For each department and each sex category within that department, count how many employees belong to each group
q <- "
SELECT dep_id, sex, COUNT(*) AS count FROM employees
GROUP BY dep_id, sex
ORDER BY dep_id, sex;
"
dbGetQuery(c, q)

dep_id,sex,count
<chr>,<chr>,<int64>
2,F,1
2,M,2
5,F,2
5,M,2
7,F,2
7,M,1


In [15]:
# From the department-sex groupings, show only those combinations where fewer than 2 employees exist in that department for that sex
q <- "
SELECT dep_id, sex, COUNT(*) AS count FROM employees
GROUP BY dep_id, sex
HAVING count < 2
ORDER BY dep_id, sex;
"
dbGetQuery(c, q)

dep_id,sex,count
<chr>,<chr>,<int64>
2,F,1
7,M,1


In [16]:
# Retrieve all employee records where the address contains the word Elgin
q <- "
SELECT * FROM employees
WHERE address LIKE '%Elgin%';
"
dbGetQuery(c, q)

EMP_ID,F_NAME,L_NAME,SSN,B_DATE,SEX,ADDRESS,JOB_ID,SALARY,MANAGER_ID,DEP_ID
<chr>,<chr>,<chr>,<chr>,<date>,<chr>,<chr>,<chr>,<dbl>,<chr>,<chr>
E1002,Alice,James,123457,1972-07-31,F,"980 Berry ln, Elgin,IL",200,80000,30002,5
E1006,Nancy,Allen,123411,1978-06-02,F,"111 Green Pl, Elgin,IL",600,90000,30001,2
E1010,Ann,Jacob,123415,1982-03-30,F,"111 Britany Springs,Elgin,IL",220,70000,30002,5


In [17]:
# List all employees born between January 1, 1970 and December 31, 1979 (inclusive)
q <- "
SELECT f_name, l_name, b_date FROM employees
WHERE b_date BETWEEN DATE('1970-01-01') AND DATE('1979-12-31')
ORDER BY b_date;
"
dbGetQuery(c, q)

f_name,l_name,b_date
<chr>,<chr>,<date>
Alice,James,1972-07-31
Mary,Thomas,1975-05-05
John,Thomas,1976-09-01
Nancy,Allen,1978-06-02


In [18]:
# Retrieve all employees who work in department 5, and earn a salary between 60,000 and 69,999 (inclusive)
q <- "
SELECT * FROM employees
WHERE dep_id = 5 AND salary BETWEEN 60000 AND 69999;
"
dbGetQuery(c, q)

EMP_ID,F_NAME,L_NAME,SSN,B_DATE,SEX,ADDRESS,JOB_ID,SALARY,MANAGER_ID,DEP_ID
<chr>,<chr>,<chr>,<chr>,<date>,<chr>,<chr>,<chr>,<dbl>,<chr>,<chr>
E1004,Santosh,Kumar,123459,1985-07-20,M,"511 Aurora Av, Aurora,IL",400,60000,30002,5


In [19]:
# Display all employees ordered by department ID in ascending order
q <- "
SELECT * FROM employees
ORDER BY dep_id ASC;
"
dbGetQuery(c, q)

EMP_ID,F_NAME,L_NAME,SSN,B_DATE,SEX,ADDRESS,JOB_ID,SALARY,MANAGER_ID,DEP_ID
<chr>,<chr>,<chr>,<chr>,<date>,<chr>,<chr>,<chr>,<dbl>,<chr>,<chr>
E1001,John,Thomas,123456,1976-09-01,M,"5631 Rice, OakPark,IL",100,100000,30001,2
E1005,Ahmed,Hussain,123410,1981-04-01,M,"216 Oak Tree, Geneva,IL",500,70000,30001,2
E1006,Nancy,Allen,123411,1978-06-02,F,"111 Green Pl, Elgin,IL",600,90000,30001,2
E1002,Alice,James,123457,1972-07-31,F,"980 Berry ln, Elgin,IL",200,80000,30002,5
E1003,Steve,Wells,123458,1980-10-08,M,"291 Springs, Gary,IL",300,50000,30002,5
E1004,Santosh,Kumar,123459,1985-07-20,M,"511 Aurora Av, Aurora,IL",400,60000,30002,5
E1010,Ann,Jacob,123415,1982-03-30,F,"111 Britany Springs,Elgin,IL",220,70000,30002,5
E1007,Mary,Thomas,123412,1975-05-05,F,"100 Rose Pl, Gary,IL",650,65000,30003,7
E1008,Bharath,Gupta,123413,1985-06-05,M,"145 Berry Ln, Naperville,IL",660,65000,30003,7
E1009,Andrea,Jones,123414,1990-09-07,F,"120 Fall Creek, Gary,IL",234,70000,30003,7


In [20]:
# Display all employees ordered first by department ID in descending order then alphabetically by last name in ascending order within each department
q <- "
SELECT * FROM employees
ORDER BY dep_id DESC, l_name ASC;
"
dbGetQuery(c, q)

EMP_ID,F_NAME,L_NAME,SSN,B_DATE,SEX,ADDRESS,JOB_ID,SALARY,MANAGER_ID,DEP_ID
<chr>,<chr>,<chr>,<chr>,<date>,<chr>,<chr>,<chr>,<dbl>,<chr>,<chr>
E1008,Bharath,Gupta,123413,1985-06-05,M,"145 Berry Ln, Naperville,IL",660,65000,30003,7
E1009,Andrea,Jones,123414,1990-09-07,F,"120 Fall Creek, Gary,IL",234,70000,30003,7
E1007,Mary,Thomas,123412,1975-05-05,F,"100 Rose Pl, Gary,IL",650,65000,30003,7
E1010,Ann,Jacob,123415,1982-03-30,F,"111 Britany Springs,Elgin,IL",220,70000,30002,5
E1002,Alice,James,123457,1972-07-31,F,"980 Berry ln, Elgin,IL",200,80000,30002,5
E1004,Santosh,Kumar,123459,1985-07-20,M,"511 Aurora Av, Aurora,IL",400,60000,30002,5
E1003,Steve,Wells,123458,1980-10-08,M,"291 Springs, Gary,IL",300,50000,30002,5
E1006,Nancy,Allen,123411,1978-06-02,F,"111 Green Pl, Elgin,IL",600,90000,30001,2
E1005,Ahmed,Hussain,123410,1981-04-01,M,"216 Oak Tree, Geneva,IL",500,70000,30001,2
E1001,John,Thomas,123456,1976-09-01,M,"5631 Rice, OakPark,IL",100,100000,30001,2


In [21]:
# Display all information from the departments table
q <- "
SELECT * FROM departments;
"
dbGetQuery(c, q)

DEPT_ID_DEP,DEP_NAME,MANAGER_ID,LOC_ID
<chr>,<chr>,<chr>,<chr>
2,Architect Group,30001,L0001
5,Software Group,30002,L0002
7,Design Team,30003,L0003


In [22]:
# List all employee details along with their department names. Sort the results first alphabetically by department name (A–Z), then by last name in descending order within each department
q <- "
SELECT e.*, d.dep_name
FROM employees AS e, departments AS d
WHERE e.dep_id = d.dept_id_dep
ORDER BY dep_name ASC, l_name DESC;
"
dbGetQuery(c, q)

EMP_ID,F_NAME,L_NAME,SSN,B_DATE,SEX,ADDRESS,JOB_ID,SALARY,MANAGER_ID,DEP_ID,dep_name
<chr>,<chr>,<chr>,<chr>,<date>,<chr>,<chr>,<chr>,<dbl>,<chr>,<chr>,<chr>
E1001,John,Thomas,123456,1976-09-01,M,"5631 Rice, OakPark,IL",100,100000,30001,2,Architect Group
E1005,Ahmed,Hussain,123410,1981-04-01,M,"216 Oak Tree, Geneva,IL",500,70000,30001,2,Architect Group
E1006,Nancy,Allen,123411,1978-06-02,F,"111 Green Pl, Elgin,IL",600,90000,30001,2,Architect Group
E1007,Mary,Thomas,123412,1975-05-05,F,"100 Rose Pl, Gary,IL",650,65000,30003,7,Design Team
E1009,Andrea,Jones,123414,1990-09-07,F,"120 Fall Creek, Gary,IL",234,70000,30003,7,Design Team
E1008,Bharath,Gupta,123413,1985-06-05,M,"145 Berry Ln, Naperville,IL",660,65000,30003,7,Design Team
E1003,Steve,Wells,123458,1980-10-08,M,"291 Springs, Gary,IL",300,50000,30002,5,Software Group
E1004,Santosh,Kumar,123459,1985-07-20,M,"511 Aurora Av, Aurora,IL",400,60000,30002,5,Software Group
E1002,Alice,James,123457,1972-07-31,F,"980 Berry ln, Elgin,IL",200,80000,30002,5,Software Group
E1010,Ann,Jacob,123415,1982-03-30,F,"111 Britany Springs,Elgin,IL",220,70000,30002,5,Software Group
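
The comma-separated `FROM` list with a `WHERE` equality is an implicit inner join. As a rough analogue, base R's `merge()` performs the same match on two data frames. The sketch below copies a few columns from the outputs above into local data frames, so it runs without a database connection.

```r
# A subset of columns copied from the employees and departments outputs
employees <- data.frame(
    emp_id = sprintf("E%04d", 1001:1010),
    l_name = c("Thomas", "James", "Wells", "Kumar", "Hussain",
               "Allen", "Thomas", "Gupta", "Jones", "Jacob"),
    dep_id = c("2", "5", "5", "5", "2", "2", "7", "7", "7", "5")
)
departments <- data.frame(
    dept_id_dep = c("2", "5", "7"),
    dep_name    = c("Architect Group", "Software Group", "Design Team")
)

# Inner join on e.dep_id = d.dept_id_dep
joined <- merge(employees, departments,
                by.x = "dep_id", by.y = "dept_id_dep")

# ORDER BY dep_name ASC, l_name DESC (xtfrm() allows a descending character sort)
joined <- joined[order(joined$dep_name, -xtfrm(joined$l_name)), ]
print(joined[, c("emp_id", "l_name", "dep_name")])

# GROUP BY dep_name with COUNT(*)
emp_count <- table(joined$dep_name)
print(emp_count)
```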


In [23]:
# For each department, calculate the total number of employees assigned to it
q <- "
SELECT d.dep_name, COUNT(*) AS emp_count
FROM employees AS e, departments AS d
WHERE e.dep_id = d.dept_id_dep
GROUP BY dep_name;
"
dbGetQuery(c, q)

dep_name,emp_count
<chr>,<int64>
Architect Group,3
Software Group,4
Design Team,3


In [30]:
# For each department, display the department ID, the department name, and the number of employees in that department
q <- "
SELECT e.dep_id, d.dep_name, COUNT(*) AS emp_count
FROM employees AS e, departments AS d
WHERE e.dep_id = d.dept_id_dep
GROUP BY e.dep_id, d.dep_name;
"
dbGetQuery(c, q)

dep_id,dep_name,emp_count
<chr>,<chr>,<int64>
2,Architect Group,3
5,Software Group,4
7,Design Team,3


In [None]:
# For each department, show the department ID, the total number of employees, and the average salary in that department
q <- "
SELECT dep_id, COUNT(*) AS emp_count, AVG(salary) AS avg_salary FROM employees
GROUP BY dep_id;
"
dbGetQuery(c, q)

dep_id,emp_count,avg_salary
<chr>,<int64>,<dbl>
2,3,86666.67
5,4,65000.0
7,3,66666.67


In [33]:
# For each department that has fewer than 4 employees, display the department ID, the department name, the number of employees, and the average salary
q <- "
SELECT e.dep_id, d.dep_name, COUNT(*) AS emp_count, AVG(e.salary) AS avg_salary
FROM employees AS e, departments AS d
WHERE e.dep_id = d.dept_id_dep
GROUP BY e.dep_id, d.dep_name
HAVING emp_count < 4;
"
dbGetQuery(c, q)

dep_id,dep_name,emp_count,avg_salary
<chr>,<chr>,<int64>,<dbl>
2,Architect Group,3,86666.67
7,Design Team,3,66666.67


## Functions

In this section, we explore some of MySQL's built-in functions (aggregate, rounding, string, and date functions) using IBM's toy pet rescue schema. This schema contains a single table.

In [37]:
# Load up the pet rescue schema
q <- "
USE ibm_pets_01;
"
dbGetQuery(c, q)

In [None]:
# Display the table
q <- "
SELECT * FROM petrescue;
"
dbGetQuery(c, q)

ID,ANIMAL,QUANTITY,COST,RESCUEDATE
<int>,<chr>,<int>,<dbl>,<date>
1,Cat,9,450.09,2018-05-29
2,Dog,3,666.66,2018-06-01
3,Dog,1,100.0,2018-06-04
4,Parrot,2,50.0,2018-06-04
5,Dog,1,75.75,2018-06-10
6,Hamster,6,60.6,2018-06-11
7,Cat,1,44.44,2018-06-11
8,Goldfish,24,48.48,2018-06-14
9,Dog,2,222.22,2018-06-15


In [40]:
# Calculate the total cost incurred for all animal rescues combined
q <- "
SELECT SUM(cost) AS total_cost FROM petrescue;
"
dbGetQuery(c, q)

total_cost
<dbl>
1718.24


In [41]:
# Determine the maximum number of animals rescued in a single record
q <- "
SELECT MAX(quantity) AS highest_quantity FROM petrescue;
"
dbGetQuery(c, q)

highest_quantity
<int>
24


In [42]:
# Compute the average cost per rescue record across all animals
q <- "
SELECT AVG(cost) AS avg_cost FROM petrescue;
"
dbGetQuery(c, q)

avg_cost
<dbl>
190.9156


In [44]:
# Calculate the average rescue cost for dogs only, matching the animal name case-insensitively
q <- "
SELECT ROUND(AVG(cost), 2) AS avg_dog_rescue_cost FROM petrescue
WHERE LOWER(animal) = 'dog';
"
dbGetQuery(c, q)

avg_dog_rescue_cost
<dbl>
266.16


In [46]:
# For each rescue record, display the animal type and the rescue cost rounded to two decimal places
q <- "
SELECT animal, ROUND(cost, 2) AS cost FROM petrescue;
"
dbGetQuery(c, q)

animal,cost
<chr>,<dbl>
Cat,450.09
Dog,666.66
Dog,100.0
Parrot,50.0
Dog,75.75
Hamster,60.6
Cat,44.44
Goldfish,48.48
Dog,222.22


In [47]:
# Display the animal type in uppercase letters, the quantity rescued, the cost, and the rescue date
q <- "
SELECT UPPER(animal), quantity, cost, rescuedate FROM petrescue;
"
dbGetQuery(c, q)

UPPER(animal),quantity,cost,rescuedate
<chr>,<int>,<dbl>,<date>
CAT,9,450.09,2018-05-29
DOG,3,666.66,2018-06-01
DOG,1,100.0,2018-06-04
PARROT,2,50.0,2018-06-04
DOG,1,75.75,2018-06-10
HAMSTER,6,60.6,2018-06-11
CAT,1,44.44,2018-06-11
GOLDFISH,24,48.48,2018-06-14
DOG,2,222.22,2018-06-15


In [49]:
# List all distinct animal types in the dataset, formatted in uppercase letters
q <- "
SELECT DISTINCT UPPER(animal) AS unique_animal FROM petrescue;
"
dbGetQuery(c, q)

unique_animal
<chr>
CAT
DOG
PARROT
HAMSTER
GOLDFISH


In [53]:
# For all rescue records involving cats (case-insensitive), display the animal type and the day of the month on which the rescue occurred
q <- "
SELECT animal, DAYOFMONTH(rescuedate) AS day_of_month FROM petrescue
WHERE LOWER(animal) = 'cat';
"
dbGetQuery(c, q)

animal,day_of_month
<chr>,<int64>
Cat,29
Cat,11


In [54]:
# Count how many rescues occurred on the 5th day of any month
q <- "
SELECT COUNT(*) AS rescue_count FROM petrescue
WHERE DAYOFMONTH(rescuedate) = 5;
"
dbGetQuery(c, q)

rescue_count
<int64>
0


In [55]:
# Count how many rescues occurred on the 14th day of any month
q <- "
SELECT COUNT(*) AS rescue_count FROM petrescue
WHERE DAYOFMONTH(rescuedate) = 14;
"
dbGetQuery(c, q)

rescue_count
<int64>
1


In [57]:
# For each rescue record, calculate a new date representing the examination cutoff, defined as three days after the rescue date
q <- "
SELECT *, DATE_ADD(rescuedate, INTERVAL 3 DAY) AS exam_date FROM petrescue;
"
dbGetQuery(c, q)

ID,ANIMAL,QUANTITY,COST,RESCUEDATE,exam_date
<int>,<chr>,<int>,<dbl>,<date>,<date>
1,Cat,9,450.09,2018-05-29,2018-06-01
2,Dog,3,666.66,2018-06-01,2018-06-04
3,Dog,1,100.0,2018-06-04,2018-06-07
4,Parrot,2,50.0,2018-06-04,2018-06-07
5,Dog,1,75.75,2018-06-10,2018-06-13
6,Hamster,6,60.6,2018-06-11,2018-06-14
7,Cat,1,44.44,2018-06-11,2018-06-14
8,Goldfish,24,48.48,2018-06-14,2018-06-17
9,Dog,2,222.22,2018-06-15,2018-06-18


In [None]:
# For each rescue record, calculate how many days have passed between the rescue date and today's date
q <- "
SELECT *, DATEDIFF(CURDATE(), rescuedate) AS days_since_rescue FROM petrescue;
"
dbGetQuery(c, q)

ID,ANIMAL,QUANTITY,COST,RESCUEDATE,days_since_rescue
<int>,<chr>,<int>,<dbl>,<date>,<int64>
1,Cat,9,450.09,2018-05-29,2817
2,Dog,3,666.66,2018-06-01,2814
3,Dog,1,100.0,2018-06-04,2811
4,Parrot,2,50.0,2018-06-04,2811
5,Dog,1,75.75,2018-06-10,2805
6,Hamster,6,60.6,2018-06-11,2804
7,Cat,1,44.44,2018-06-11,2804
8,Goldfish,24,48.48,2018-06-14,2801
9,Dog,2,222.22,2018-06-15,2800
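
The same date arithmetic can be checked in R, where adding an integer to a `Date` shifts it by that many days and subtracting two dates gives a difference in days. The reference date `as_of` below is an arbitrary assumption chosen to make the result reproducible; it is not the date on which the notebook was run.

```r
# Three rescue dates copied from the petrescue output
rescuedate <- as.Date(c("2018-05-29", "2018-06-01", "2018-06-15"))

# DATE_ADD(rescuedate, INTERVAL 3 DAY)
exam_date <- rescuedate + 3
print(exam_date)

# DATEDIFF(as_of, rescuedate), with a fixed (assumed) reference date
as_of <- as.Date("2018-07-01")
days_since_rescue <- as.integer(as_of - rescuedate)
print(days_since_rescue)

# DAYOFMONTH(rescuedate)
day_of_month <- as.integer(format(rescuedate, "%d"))
print(day_of_month)
```

Replacing `as_of` with `Sys.Date()` would mirror `CURDATE()`, at the cost of a result that changes from day to day.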


## Subqueries and nested `SELECT`

In [None]:
# Return to the employees schema
q <- "
USE ibm_employees_02;
"
dbGetQuery(c, q)

In [60]:
# Calculate the average salary of all employees in the company
q <- "
SELECT AVG(salary) AS avg_salary FROM employees;
"
dbGetQuery(c, q)

avg_salary
<dbl>
72000


In [61]:
# Retrieve all employee records for individuals whose salary is strictly greater than the company-wide average salary
q <- "
SELECT * FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);
"
dbGetQuery(c, q)

EMP_ID,F_NAME,L_NAME,SSN,B_DATE,SEX,ADDRESS,JOB_ID,SALARY,MANAGER_ID,DEP_ID
<chr>,<chr>,<chr>,<chr>,<date>,<chr>,<chr>,<chr>,<dbl>,<chr>,<chr>
E1001,John,Thomas,123456,1976-09-01,M,"5631 Rice, OakPark,IL",100,100000.0,30001,2
E1002,Alice,James,123457,1972-07-31,F,"980 Berry ln, Elgin,IL",200,80000.0,30002,5
E1006,Nancy,Allen,123411,1978-06-02,F,"111 Green Pl, Elgin,IL",600,90000.0,30001,2


In [None]:
# Display all employee columns and the company-wide average salary as an additional column
q <- "
SELECT 
    *, 
    (SELECT AVG(salary) FROM employees) AS avg_salary
FROM employees
LIMIT 5;
"
dbGetQuery(c, q)

EMP_ID,F_NAME,L_NAME,SSN,B_DATE,SEX,ADDRESS,JOB_ID,SALARY,MANAGER_ID,DEP_ID,avg_salary
<chr>,<chr>,<chr>,<chr>,<date>,<chr>,<chr>,<chr>,<dbl>,<chr>,<chr>,<dbl>
E1001,John,Thomas,123456,1976-09-01,M,"5631 Rice, OakPark,IL",100,100000.0,30001,2,72000
E1002,Alice,James,123457,1972-07-31,F,"980 Berry ln, Elgin,IL",200,80000.0,30002,5,72000
E1003,Steve,Wells,123458,1980-10-08,M,"291 Springs, Gary,IL",300,50000.0,30002,5,72000
E1004,Santosh,Kumar,123459,1985-07-20,M,"511 Aurora Av, Aurora,IL",400,60000.0,30002,5,72000
E1005,Ahmed,Hussain,123410,1981-04-01,M,"216 Oak Tree, Geneva,IL",500,70000.0,30001,2,72000


In [67]:
# For every employee, display all original columns and the differences between individual salaries and the company-wide average
q <- "
SELECT
    *,
    salary - (SELECT AVG(salary) FROM employees) AS demeaned_salary
FROM employees
LIMIT 5;
"
dbGetQuery(c, q)

EMP_ID,F_NAME,L_NAME,SSN,B_DATE,SEX,ADDRESS,JOB_ID,SALARY,MANAGER_ID,DEP_ID,demeaned_salary
<chr>,<chr>,<chr>,<chr>,<date>,<chr>,<chr>,<chr>,<dbl>,<chr>,<chr>,<dbl>
E1001,John,Thomas,123456,1976-09-01,M,"5631 Rice, OakPark,IL",100,100000.0,30001,2,28000
E1002,Alice,James,123457,1972-07-31,F,"980 Berry ln, Elgin,IL",200,80000.0,30002,5,8000
E1003,Steve,Wells,123458,1980-10-08,M,"291 Springs, Gary,IL",300,50000.0,30002,5,-22000
E1004,Santosh,Kumar,123459,1985-07-20,M,"511 Aurora Av, Aurora,IL",400,60000.0,30002,5,-12000
E1005,Ahmed,Hussain,123410,1981-04-01,M,"216 Oak Tree, Geneva,IL",500,70000.0,30001,2,-2000


In [71]:
# Repeat the previous task, but compute the average salary once in a derived table and attach it to every employee record using a CROSS JOIN
q <- "
SELECT
    e.*,
    e.salary - a.avg_salary AS demeaned_salary
FROM employees AS e
CROSS JOIN (SELECT AVG(salary) AS avg_salary FROM employees) AS a
LIMIT 5;
"
dbGetQuery(c, q)

EMP_ID,F_NAME,L_NAME,SSN,B_DATE,SEX,ADDRESS,JOB_ID,SALARY,MANAGER_ID,DEP_ID,demeaned_salary
<chr>,<chr>,<chr>,<chr>,<date>,<chr>,<chr>,<chr>,<dbl>,<chr>,<chr>,<dbl>
E1001,John,Thomas,123456,1976-09-01,M,"5631 Rice, OakPark,IL",100,100000.0,30001,2,28000
E1002,Alice,James,123457,1972-07-31,F,"980 Berry ln, Elgin,IL",200,80000.0,30002,5,8000
E1003,Steve,Wells,123458,1980-10-08,M,"291 Springs, Gary,IL",300,50000.0,30002,5,-22000
E1004,Santosh,Kumar,123459,1985-07-20,M,"511 Aurora Av, Aurora,IL",400,60000.0,30002,5,-12000
E1005,Ahmed,Hussain,123410,1981-04-01,M,"216 Oak Tree, Geneva,IL",500,70000.0,30001,2,-2000
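
In R terms, the derived table holds a single row, so the cross join simply repeats that one value next to every employee. The sketch below uses the salaries from the outputs above and `merge(..., by = NULL)`, which base R documents as producing a Cartesian product.

```r
# Salaries copied from the employees output (E1001 to E1010)
employees <- data.frame(
    emp_id = sprintf("E%04d", 1001:1010),
    salary = c(100000, 80000, 50000, 60000, 70000,
               90000, 65000, 65000, 70000, 70000)
)

# Derived table: one row holding the company-wide average
a <- data.frame(avg_salary = mean(employees$salary))

# CROSS JOIN: every employee row paired with the single average row
joined <- merge(employees, a, by = NULL)
joined$demeaned_salary <- joined$salary - joined$avg_salary
print(head(joined, 5))
```

Because the average is computed once and then joined, the query states the intent of "one value, reused on every row" more directly than a scalar subquery written inside the `SELECT` list.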


In [73]:
# For each employee, display all original columns, the highest salary in the company as top_salary, and the gap between the employee's salary and that maximum as top_salary_diff
q <- "
SELECT
    *,
    (SELECT MAX(salary) FROM employees) AS top_salary,
    salary - (SELECT MAX(salary) FROM employees) AS top_salary_diff
FROM employees
LIMIT 5;
"
dbGetQuery(c, q)

EMP_ID,F_NAME,L_NAME,SSN,B_DATE,SEX,ADDRESS,JOB_ID,SALARY,MANAGER_ID,DEP_ID,top_salary,top_salary_diff
<chr>,<chr>,<chr>,<chr>,<date>,<chr>,<chr>,<chr>,<dbl>,<chr>,<chr>,<dbl>,<dbl>
E1001,John,Thomas,123456,1976-09-01,M,"5631 Rice, OakPark,IL",100,100000.0,30001,2,100000.0,0
E1002,Alice,James,123457,1972-07-31,F,"980 Berry ln, Elgin,IL",200,80000.0,30002,5,100000.0,-20000
E1003,Steve,Wells,123458,1980-10-08,M,"291 Springs, Gary,IL",300,50000.0,30002,5,100000.0,-50000
E1004,Santosh,Kumar,123459,1985-07-20,M,"511 Aurora Av, Aurora,IL",400,60000.0,30002,5,100000.0,-40000
E1005,Ahmed,Hussain,123410,1981-04-01,M,"216 Oak Tree, Geneva,IL",500,70000.0,30001,2,100000.0,-30000


In [76]:
# Repeat the previous task, but compute the maximum salary once in a derived table and attach it to all employee records via a CROSS JOIN
q <- "
SELECT
    e.*,
    a.top_salary,
    e.salary - a.top_salary AS top_salary_diff
FROM employees AS e
CROSS JOIN (SELECT MAX(salary) AS top_salary FROM employees) AS a
LIMIT 5;
"
dbGetQuery(c, q)

EMP_ID,F_NAME,L_NAME,SSN,B_DATE,SEX,ADDRESS,JOB_ID,SALARY,MANAGER_ID,DEP_ID,top_salary,top_salary_diff
<chr>,<chr>,<chr>,<chr>,<date>,<chr>,<chr>,<chr>,<dbl>,<chr>,<chr>,<dbl>,<dbl>
E1001,John,Thomas,123456,1976-09-01,M,"5631 Rice, OakPark,IL",100,100000.0,30001,2,100000.0,0
E1002,Alice,James,123457,1972-07-31,F,"980 Berry ln, Elgin,IL",200,80000.0,30002,5,100000.0,-20000
E1003,Steve,Wells,123458,1980-10-08,M,"291 Springs, Gary,IL",300,50000.0,30002,5,100000.0,-50000
E1004,Santosh,Kumar,123459,1985-07-20,M,"511 Aurora Av, Aurora,IL",400,60000.0,30002,5,100000.0,-40000
E1005,Ahmed,Hussain,123410,1981-04-01,M,"216 Oak Tree, Geneva,IL",500,70000.0,30001,2,100000.0,-30000


In [79]:
# Build a public-facing employee view (here a derived table, not a stored VIEW) that excludes sensitive or unnecessary information. Return only the following columns: employee ID, first name, last name, job ID, manager ID, department ID
q <- "
SELECT *
FROM (
    SELECT emp_id, f_name, l_name, job_id, manager_id, dep_id
    FROM employees
) AS public_view;
"
dbGetQuery(c, q)

emp_id,f_name,l_name,job_id,manager_id,dep_id
<chr>,<chr>,<chr>,<chr>,<chr>,<chr>
E1001,John,Thomas,100,30001,2
E1002,Alice,James,200,30002,5
E1003,Steve,Wells,300,30002,5
E1004,Santosh,Kumar,400,30002,5
E1005,Ahmed,Hussain,500,30001,2
E1006,Nancy,Allen,600,30001,2
E1007,Mary,Thomas,650,30003,7
E1008,Bharath,Gupta,660,30003,7
E1009,Andrea,Jones,234,30003,7
E1010,Ann,Jacob,220,30002,5


## Joins

In [None]:
# 
q <- "

"
dbGetQuery(c, q)

## Disconnect

In [None]:
# Disconnect
#dbDisconnect(c)