# Correlated Subqueries

### Loading our data

In [8]:
import pandas as pd
import sqlite3
conn = sqlite3.connect('users.db')

table_names = ['products', 'employees', 'orders', 'customers', 'dog_foods']

dfs = [pd.read_csv(f"./{table_name}.csv") for table_name in table_names]

In [9]:
[df.to_sql(table_name, conn, index = True, 
           index_label = 'id', if_exists = 'replace') 
 for df, table_name in zip(dfs, table_names)]

[10, 5, 3, 5, 5]

### Correlated Subqueries

Let's start with our product data.

In [17]:
pd.read_sql("select * from products limit 6", conn)

Unnamed: 0,id,name,category,price
0,0,Smartphone,Electronics,600
1,1,Running Shoes,Footwear,80
2,2,T-shirt,Clothing,20
3,3,Coffee Maker,Appliances,50
4,4,Backpack,Accessories,40
5,5,Headphones,Electronics,150


* Use a correlated subquery to add a column with the minimum price in each product's category. Order the overall table by category and price.
    * Remember the two components: the outer query, and the subquery that references the outer table. Begin with just the outer query.

In [19]:
query = """
select *, 
(select min(price) from products p2 where p1.category = p2.category)
as min_price from products p1 order by category, price
"""
pd.read_sql(query, conn)

Unnamed: 0,id,name,category,price,min_price
0,8,Water Bottle,Accessories,15,15
1,9,Sunglasses,Accessories,30,15
2,4,Backpack,Accessories,40,15
3,3,Coffee Maker,Appliances,50,50
4,2,T-shirt,Clothing,20,20
5,6,Jeans,Clothing,45,20
6,5,Headphones,Electronics,150,150
7,0,Smartphone,Electronics,600,150
8,1,Running Shoes,Footwear,80,80
9,7,Toothbrush,Personal Care,5,5


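* Now write a correlated subquery that adds, for each product, the second-highest price in its category.
    * Order the results by category and price.
    * Categories with only one product have no second-highest price, so the new column is empty for them.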

In [20]:
query = """
select *, 
(select price from products p2 where p1.category = p2.category order by price desc limit 1 offset 1)
as second_most from products p1 order by category, price
"""
pd.read_sql(query, conn)

Unnamed: 0,id,name,category,price,second_most
0,8,Water Bottle,Accessories,15,30.0
1,9,Sunglasses,Accessories,30,30.0
2,4,Backpack,Accessories,40,30.0
3,3,Coffee Maker,Appliances,50,
4,2,T-shirt,Clothing,20,20.0
5,6,Jeans,Clothing,45,20.0
6,5,Headphones,Electronics,150,150.0
7,0,Smartphone,Electronics,600,150.0
8,1,Running Shoes,Footwear,80,
9,7,Toothbrush,Personal Care,5,


### Back to Dog Food

Ok, now let's take another look at our dog food data.

In [22]:
pd.read_sql("select * from dog_foods limit 3", conn)

Unnamed: 0,id,brand,price
0,0,Acme Dog Food,22
1,1,Puppy Chow,32
2,2,Healthy Paws,38


And our customers.

In [24]:
pd.read_sql("select * from customers limit 3", conn)

Unnamed: 0,id,customer_id,name,budget
0,0,1,John Smith,25
1,1,2,Jane Doe,30
2,2,3,Michael Brown,40


* Now, find the number of customers who can afford each dog food, treating a food as affordable when its price is below the customer's budget. Order by the number of customers, from most to fewest.

In [29]:
query = """
select *, 
(select count(*) from customers c where d.price < c.budget) num_customers from dog_foods d
order by num_customers desc
"""
pd.read_sql(query, conn)

Unnamed: 0,id,brand,price,num_customers
0,3,Bark Bites,19,5
1,0,Acme Dog Food,22,4
2,1,Puppy Chow,32,2
3,2,Healthy Paws,38,2
4,4,Superior K9,45,1


### Summary

In this lesson, we saw how to write a correlated subquery. A correlated subquery needs two parts: an outer query, and a subquery that references a column from the outer query's table.

Conceptually, the correlated subquery is evaluated once for each row of the outer query, either to filter rows or to calculate a value.

```sql
SELECT employee_name, department, salary,
(SELECT MAX(salary) FROM employees e2 WHERE e2.department = e1.department) AS max_sal -- 2. correlated subquery
FROM employees e1 -- 1. outer query
```
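The query above uses the subquery as a calculation in the select list. For the filter case, the subquery moves into the where clause. A minimal sketch, assuming an in-memory database loaded with the product rows shown earlier in the lesson (the `demo` connection and the above-average-price question are illustrative, not part of the lesson's exercises):

```python
import sqlite3

# Throwaway in-memory database holding the product rows from the lesson output.
demo = sqlite3.connect(":memory:")
demo.execute("create table products (name text, category text, price integer)")
demo.executemany(
    "insert into products values (?, ?, ?)",
    [
        ("Smartphone", "Electronics", 600),
        ("Running Shoes", "Footwear", 80),
        ("T-shirt", "Clothing", 20),
        ("Coffee Maker", "Appliances", 50),
        ("Backpack", "Accessories", 40),
        ("Headphones", "Electronics", 150),
        ("Jeans", "Clothing", 45),
        ("Toothbrush", "Personal Care", 5),
        ("Water Bottle", "Accessories", 15),
        ("Sunglasses", "Accessories", 30),
    ],
)

# Keep only products priced above the average price of their own category.
filter_query = """
select name, category, price
from products p1
where price > (select avg(price) from products p2 where p2.category = p1.category)
order by category, price
"""
above_avg = demo.execute(filter_query).fetchall()
print(above_avg)
```

Categories with a single product drop out entirely, since a lone price is never above its own average.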

We also saw some use cases for our correlated subquery.  

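For example, we saw a query that returns the second-highest value within each group, which a plain aggregate such as `max` cannot do on its own. In the lesson this was price per product category; the same pattern applied to salary per department looks like this: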

In [None]:
query = """SELECT employee_name, department, salary,
(SELECT salary FROM employees e2 WHERE e2.department = e1.department
ORDER BY salary DESC LIMIT 1 OFFSET 1) AS second_highest_sal
FROM employees e1 ORDER BY department DESC, salary DESC"""
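To try this pattern without the lesson's CSV files, here is a small self-contained sketch. The `employees` rows below are made up for illustration, and the column names are assumed to match the query above.

```python
import sqlite3

# In-memory database with hypothetical employee rows (not the lesson's data).
demo = sqlite3.connect(":memory:")
demo.execute(
    "create table employees (employee_name text, department text, salary integer)"
)
demo.executemany(
    "insert into employees values (?, ?, ?)",
    [("Ann", "Sales", 50), ("Bob", "Sales", 70), ("Cy", "Sales", 60), ("Dee", "Ops", 40)],
)

second_query = """
SELECT employee_name, department, salary,
(SELECT salary FROM employees e2 WHERE e2.department = e1.department
 ORDER BY salary DESC LIMIT 1 OFFSET 1) AS second_highest_sal
FROM employees e1 ORDER BY department DESC, salary DESC
"""
result = demo.execute(second_query).fetchall()
for row in result:
    print(row)
```

Two caveats apply to the `LIMIT 1 OFFSET 1` approach: a department with a single employee gets `None` (null), and if two employees tie for the top salary, the "second" row returned is that same top salary.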

And finally, we saw how a correlated subquery can relate two tables, matching rows on the subquery's condition, as we did when counting the customers who can afford each dog food.

### Resources

[Correlated subqueries performance](https://blog.jooq.org/correlated-subqueries-are-evil-and-slow-or-are-they/)