# Correlated Subqueries Lab

### Introduction

In the last lesson, we saw that correlated subqueries give us finer control than a window function for some queries, and that they let us match records across two different tables more precisely.

In this lab, we'll practice implementing those queries.

### Loading our data

```python
import pandas as pd
import sqlite3

conn = sqlite3.connect('users.db')
root_url = "https://raw.githubusercontent.com/tech-interviews-jigsaw/sql-advanced-joins/main/6-common-strategies"
table_names = ['products', 'employees', 'orders', 'customers', 'dog_foods']

dfs = [pd.read_csv(f"{root_url}/{table_name}.csv") for table_name in table_names]
```

```python
[df.to_sql(table_name, conn, index=True,
           index_label='id', if_exists='replace')
 for df, table_name in zip(dfs, table_names)]
```

```
[10, 5, 3, 5, 5]
```

### Correlated Subqueries

Let's start with our product data, which has a name, category, and price for each product.

```python
pd.read_sql("select * from products limit 6", conn)
```

|   | id | name | category | price |
|---|---|---|---|---|
| 0 | 0 | Smartphone | Electronics | 600 |
| 1 | 1 | Running Shoes | Footwear | 80 |
| 2 | 2 | T-shirt | Clothing | 20 |
| 3 | 3 | Coffee Maker | Appliances | 50 |
| 4 | 4 | Backpack | Accessories | 40 |
| 5 | 5 | Headphones | Electronics | 150 |


Use a correlated subquery to add a column, `min_price`, holding the minimum price in each product's category.  Order the results by category and then by price.

> Remember the two components: the outer query, and the subquery that references the outer query's table.  Get started by querying just the outer table, and then add the subquery.

```python
query = """

"""
pd.read_sql(query, conn)

#    id  name          category     price  min_price
# 0   8  Water Bottle  Accessories     15         15
# 1   9  Sunglasses    Accessories     30         15
# 2   4  Backpack      Accessories     40         15
# ...
```

|   | id | name | category | price | min_price |
|---|---|---|---|---|---|
| 0 | 8 | Water Bottle | Accessories | 15 | 15 |
| 1 | 9 | Sunglasses | Accessories | 30 | 15 |
| 2 | 4 | Backpack | Accessories | 40 | 15 |
| 3 | 3 | Coffee Maker | Appliances | 50 | 50 |
| 4 | 2 | T-shirt | Clothing | 20 | 20 |
| 5 | 6 | Jeans | Clothing | 45 | 20 |
| 6 | 5 | Headphones | Electronics | 150 | 150 |
| 7 | 0 | Smartphone | Electronics | 600 | 150 |
| 8 | 1 | Running Shoes | Footwear | 80 | 80 |
| 9 | 7 | Toothbrush | Personal Care | 5 | 5 |
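
One possible way to approach this exercise is sketched below. It is not the lab's official solution. To stay self-contained it uses only the standard-library `sqlite3` module and an in-memory copy of the ten product rows displayed in this lab, rather than `users.db`; the variable names `min_price_query` and `min_price_rows` are our own.

```python
import sqlite3

# Assumption: an in-memory stand-in for users.db, holding the product rows shown above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER, name TEXT, category TEXT, price INTEGER)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?, ?)",
    [
        (0, "Smartphone", "Electronics", 600),
        (1, "Running Shoes", "Footwear", 80),
        (2, "T-shirt", "Clothing", 20),
        (3, "Coffee Maker", "Appliances", 50),
        (4, "Backpack", "Accessories", 40),
        (5, "Headphones", "Electronics", 150),
        (6, "Jeans", "Clothing", 45),
        (7, "Toothbrush", "Personal Care", 5),
        (8, "Water Bottle", "Accessories", 15),
        (9, "Sunglasses", "Accessories", 30),
    ],
)

min_price_query = """
SELECT p1.id, p1.name, p1.category, p1.price,
       (SELECT MIN(p2.price)
          FROM products p2
         WHERE p2.category = p1.category) AS min_price  -- subquery references the outer row
  FROM products p1                                      -- outer query
 ORDER BY p1.category, p1.price
"""
min_price_rows = conn.execute(min_price_query).fetchall()
for row in min_price_rows:
    print(row)
```

In the lab itself, the same SQL string could be passed to `pd.read_sql(query, conn)` against the `users.db` connection.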


Next, write a correlated subquery that adds a column, `second_most`, holding the second-highest price in each product's category.  Order the results by category and price.

```python
query = """

"""
pd.read_sql(query, conn)

#    id  name          category     price  second_most
# 0   8  Water Bottle  Accessories     15         30.0
# 1   9  Sunglasses    Accessories     30         30.0
# 2   4  Backpack      Accessories     40         30.0
# 3   3  Coffee Maker  Appliances      50          NaN
# 4   2  T-shirt       Clothing        20         20.0
```

|   | id | name | category | price | second_most |
|---|---|---|---|---|---|
| 0 | 8 | Water Bottle | Accessories | 15 | 30.0 |
| 1 | 9 | Sunglasses | Accessories | 30 | 30.0 |
| 2 | 4 | Backpack | Accessories | 40 | 30.0 |
| 3 | 3 | Coffee Maker | Appliances | 50 | NaN |
| 4 | 2 | T-shirt | Clothing | 20 | 20.0 |
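
A sketch of one way to get the second-highest price per category follows. It assumes the same ten product rows shown in this lab, loaded into an in-memory SQLite database instead of `users.db`. The approach (sort the category's distinct prices in descending order and skip the first with `LIMIT 1 OFFSET 1`) is only one option; categories with a single price produce `NULL`, which `sqlite3` returns as `None` and pandas displays as `NaN`.

```python
import sqlite3

# Assumption: an in-memory stand-in for users.db, holding the product rows shown above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER, name TEXT, category TEXT, price INTEGER)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?, ?)",
    [
        (0, "Smartphone", "Electronics", 600),
        (1, "Running Shoes", "Footwear", 80),
        (2, "T-shirt", "Clothing", 20),
        (3, "Coffee Maker", "Appliances", 50),
        (4, "Backpack", "Accessories", 40),
        (5, "Headphones", "Electronics", 150),
        (6, "Jeans", "Clothing", 45),
        (7, "Toothbrush", "Personal Care", 5),
        (8, "Water Bottle", "Accessories", 15),
        (9, "Sunglasses", "Accessories", 30),
    ],
)

second_most_query = """
SELECT p1.id, p1.name, p1.category, p1.price,
       (SELECT DISTINCT p2.price
          FROM products p2
         WHERE p2.category = p1.category   -- correlate on the outer row's category
         ORDER BY p2.price DESC
         LIMIT 1 OFFSET 1) AS second_most  -- skip the highest price, keep the next one
  FROM products p1
 ORDER BY p1.category, p1.price
"""
second_most_rows = conn.execute(second_most_query).fetchall()
for row in second_most_rows:
    print(row)
```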


* Now return only the second priciest product from each category (you do not need to use a correlated subquery to do so).

```python
query = """

"""
pd.read_sql(query, conn)

#    name        category     price
# 0  Sunglasses  Accessories     30
# 1  T-shirt     Clothing        20
# 2  Headphones  Electronics    150
```

|   | name | category | price |
|---|---|---|---|
| 0 | Sunglasses | Accessories | 30 |
| 1 | T-shirt | Clothing | 20 |
| 2 | Headphones | Electronics | 150 |
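
Since a correlated subquery is not required here, the sketch below swaps in a window function instead: it ranks prices within each category using `DENSE_RANK()` and keeps rank 2. This assumes a SQLite build with window-function support (3.25 or later) and, again, an in-memory copy of the product rows shown in this lab. The alias `price_rank` is our own name.

```python
import sqlite3

# Assumption: an in-memory stand-in for users.db, holding the product rows shown above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER, name TEXT, category TEXT, price INTEGER)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?, ?)",
    [
        (0, "Smartphone", "Electronics", 600),
        (1, "Running Shoes", "Footwear", 80),
        (2, "T-shirt", "Clothing", 20),
        (3, "Coffee Maker", "Appliances", 50),
        (4, "Backpack", "Accessories", 40),
        (5, "Headphones", "Electronics", 150),
        (6, "Jeans", "Clothing", 45),
        (7, "Toothbrush", "Personal Care", 5),
        (8, "Water Bottle", "Accessories", 15),
        (9, "Sunglasses", "Accessories", 30),
    ],
)

second_priciest_query = """
SELECT name, category, price
  FROM (SELECT name, category, price,
               DENSE_RANK() OVER (PARTITION BY category
                                  ORDER BY price DESC) AS price_rank
          FROM products) AS ranked
 WHERE price_rank = 2       -- categories with a single product have no rank 2 and drop out
 ORDER BY category
"""
second_priciest_rows = conn.execute(second_priciest_query).fetchall()
for row in second_priciest_rows:
    print(row)
```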


Next, return only the cheapest product from each category.

```python
query = """

"""
pd.read_sql(query, conn)

#    name           category       price
# 0  Water Bottle   Accessories       15
# 1  Coffee Maker   Appliances        50
# 2  T-shirt        Clothing          20
# 3  Headphones     Electronics      150
# 4  Running Shoes  Footwear          80
# 5  Toothbrush     Personal Care      5
```

|   | name | category | price |
|---|---|---|---|
| 0 | Water Bottle | Accessories | 15 |
| 1 | Coffee Maker | Appliances | 50 |
| 2 | T-shirt | Clothing | 20 |
| 3 | Headphones | Electronics | 150 |
| 4 | Running Shoes | Footwear | 80 |
| 5 | Toothbrush | Personal Care | 5 |
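
Here the correlated subquery can act as a filter rather than as an extra column. The sketch below, which again assumes an in-memory copy of the product rows shown in this lab, keeps a row only when its price equals the minimum price of its own category. If two products tied for cheapest, both would be returned.

```python
import sqlite3

# Assumption: an in-memory stand-in for users.db, holding the product rows shown above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER, name TEXT, category TEXT, price INTEGER)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?, ?)",
    [
        (0, "Smartphone", "Electronics", 600),
        (1, "Running Shoes", "Footwear", 80),
        (2, "T-shirt", "Clothing", 20),
        (3, "Coffee Maker", "Appliances", 50),
        (4, "Backpack", "Accessories", 40),
        (5, "Headphones", "Electronics", 150),
        (6, "Jeans", "Clothing", 45),
        (7, "Toothbrush", "Personal Care", 5),
        (8, "Water Bottle", "Accessories", 15),
        (9, "Sunglasses", "Accessories", 30),
    ],
)

cheapest_query = """
SELECT p1.name, p1.category, p1.price
  FROM products p1
 WHERE p1.price = (SELECT MIN(p2.price)
                     FROM products p2
                    WHERE p2.category = p1.category)  -- evaluated against each outer row
 ORDER BY p1.category
"""
cheapest_rows = conn.execute(cheapest_query).fetchall()
for row in cheapest_rows:
    print(row)
```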


### Back to Dog Food

Ok, now let's take another look at our dog food data.

```python
pd.read_sql("select * from dog_foods limit 3", conn)
```

|   | id | brand | price |
|---|---|---|---|
| 0 | 0 | Acme Dog Food | 22 |
| 1 | 1 | Puppy Chow | 32 |
| 2 | 2 | Healthy Paws | 38 |


And our customers.

```python
pd.read_sql("select * from customers limit 3", conn)
```

|   | id | customer_id | name | budget |
|---|---|---|---|---|
| 0 | 0 | 1 | John Smith | 25 |
| 1 | 1 | 2 | Jane Doe | 30 |
| 2 | 2 | 3 | Michael Brown | 40 |


* Now, return all of the rows in `dog_foods`, with an added column displaying the number of customers who can afford each dog food.  Order by the number of customers, from most to fewest.

```python
query = """

"""
pd.read_sql(query, conn)

#    id  brand          price  num_customers
# 0   3  Bark Bites        19              5
# 1   0  Acme Dog Food     22              4
# 2   1  Puppy Chow        32              2
# 3   2  Healthy Paws      38              2
# 4   4  Superior K9       45              1
```

|   | id | brand | price | num_customers |
|---|---|---|---|---|
| 0 | 3 | Bark Bites | 19 | 5 |
| 1 | 0 | Acme Dog Food | 22 | 4 |
| 2 | 1 | Puppy Chow | 32 | 2 |
| 3 | 2 | Healthy Paws | 38 | 2 |
| 4 | 4 | Superior K9 | 45 | 1 |
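
This exercise correlates two different tables on an inequality rather than an equality. The sketch below takes "can afford" to mean `budget >= price`, which is our assumption. The five dog food rows come from the expected output above, but only three customers are displayed in this lab, so the last two customer rows are hypothetical placeholders chosen to be consistent with the expected counts. The `price` tiebreaker in `ORDER BY` is also our addition, to make the order deterministic.

```python
import sqlite3

# Assumption: an in-memory stand-in for users.db.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dog_foods (id INTEGER, brand TEXT, price INTEGER)")
conn.execute(
    "CREATE TABLE customers (id INTEGER, customer_id INTEGER, name TEXT, budget INTEGER)"
)
conn.executemany(
    "INSERT INTO dog_foods VALUES (?, ?, ?)",
    [
        (0, "Acme Dog Food", 22),
        (1, "Puppy Chow", 32),
        (2, "Healthy Paws", 38),
        (3, "Bark Bites", 19),
        (4, "Superior K9", 45),
    ],
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?)",
    [
        (0, 1, "John Smith", 25),
        (1, 2, "Jane Doe", 30),
        (2, 3, "Michael Brown", 40),
        (3, 4, "Customer Four", 20),  # hypothetical row, not shown in the lab
        (4, 5, "Customer Five", 50),  # hypothetical row, not shown in the lab
    ],
)

affordability_query = """
SELECT d.id, d.brand, d.price,
       (SELECT COUNT(*)
          FROM customers c
         WHERE c.budget >= d.price) AS num_customers  -- compares against the outer row's price
  FROM dog_foods d
 ORDER BY num_customers DESC, d.price
"""
affordability_rows = conn.execute(affordability_query).fetchall()
for row in affordability_rows:
    print(row)
```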


### Summary

In this lab, we practiced writing correlated subqueries.  A correlated subquery needs an outer query, and a subquery that references the outer query's table.

Conceptually, the correlated subquery is evaluated once for each row of the outer query, and its result is used either as a filter (in the `WHERE` clause) or as a calculated column (in the `SELECT` list).

```sql
SELECT employee_name, department, salary,
(SELECT MAX(salary) FROM employees e2 WHERE e2.department = e1.department) AS max_sal -- 2. correlated subquery
FROM employees e1 -- 1. outer query
```
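
To make the "once per outer row" idea concrete, here is a plain-Python emulation of the query above. It is only a mental model: the employee rows are hypothetical, `max_salary_in` is our own helper name, and a real database engine may optimize the query rather than literally re-running the subquery for every row.

```python
# Hypothetical employee rows: (employee_name, department, salary).
employees = [
    ("Ana", "Sales", 50000),
    ("Ben", "Sales", 65000),
    ("Cy", "Engineering", 90000),
    ("Dee", "Engineering", 80000),
]


def max_salary_in(department):
    """Plays the role of the correlated subquery: it depends on the outer row's department."""
    return max(salary for _, dept, salary in employees if dept == department)


# Calculation: the outer loop visits each employee and calls the "subquery" once per row.
with_max = [(name, dept, salary, max_salary_in(dept)) for name, dept, salary in employees]

# Filter: keep only employees whose salary equals the maximum in their own department.
top_earners = [name for name, dept, salary in employees if salary == max_salary_in(dept)]

print(with_max)
print(top_earners)
```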

### Resources

[Correlated subqueries performance](https://blog.jooq.org/correlated-subqueries-are-evil-and-slow-or-are-they/)