# Cross Joins

### Introduction

### Loading the data

In [1]:
import sqlite3
conn = sqlite3.connect('users.db')

In [2]:
import pandas as pd
root_url = "https://raw.githubusercontent.com/jigsawlabs-student/curriculum-images/main/has-many-through-bar/data/"
names = ['bartenders', 'customers', 'drinks', 'orders', 'ingredients', 'ingredients_drinks']
loaded_dfs = [pd.read_csv(f'{root_url}{name}.csv') for name in names]

In [3]:
for name, df in zip(names, loaded_dfs):
    df.to_sql(name, conn, index=False, if_exists='replace')

### Performing a cross join

Chances are that you have already seen a cross join. In SQLite, which we are using here, a cross join occurs whenever we join two tables without specifying which columns to join on.

For example, let's show all of the possible combinations of customers paired with bartenders.

In [8]:
import pandas as pd

pd.read_sql("""
select bartenders.name as bartender_name,
customers.name as customer_name
from bartenders join customers;
""", conn)

| | bartender_name | customer_name |
|---|---|---|
| 0 | moe | bart simpson |
| 1 | moe | maggie simpson |
| 2 | moe | lisa simpson |
| 3 | selma | bart simpson |
| 4 | selma | maggie simpson |
| 5 | selma | lisa simpson |
| 6 | patty | bart simpson |
| 7 | patty | maggie simpson |
| 8 | patty | lisa simpson |


In other words, a cross join pairs every row of one table with every row of the other: three bartenders and three customers give nine rows.
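To see the row-count arithmetic in isolation, here is a minimal sketch using an in-memory SQLite database. The `staff` and `guests` tables and their rows are made up for illustration and are not part of the lesson's data. It compares a `join` with no join condition against an explicit `cross join`.

```python
import sqlite3

# Hypothetical in-memory tables; the names and rows are illustrative only.
conn_demo = sqlite3.connect(":memory:")
conn_demo.executescript("""
    create table staff (name text);
    create table guests (name text);
    insert into staff values ('a'), ('b');
    insert into guests values ('x'), ('y'), ('z');
""")

# A join with no condition, as in the query above.
implicit = conn_demo.execute(
    "select staff.name, guests.name from staff join guests"
).fetchall()

# The same pairing written with the explicit CROSS JOIN keyword.
explicit = conn_demo.execute(
    "select staff.name, guests.name from staff cross join guests"
).fetchall()

# Both forms yield every pairing: 2 staff rows x 3 guest rows = 6 rows.
print(len(implicit), len(explicit))          # 6 6
print(sorted(implicit) == sorted(explicit))  # True
```

Writing `cross join` explicitly makes the intent clearer to readers, and some other databases require a join condition on a plain `join`.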

### But why?

Now why would we want to do such a thing? It turns out that generating a grid of data can be pretty useful.

This [Stack Overflow post](https://stackoverflow.com/questions/219716/what-are-the-uses-for-cross-join) lays out some good examples of grids we may want:

* Size and color information for an article of clothing:
    
```sql
select size, color
from sizes CROSS JOIN colors
```

* A row for every minute in the day, which you can use to verify that a procedure has executed each minute. Here you would cross two tables, one of hours and one of minutes:

```sql
CREATE TABLE minute_tasks AS
    select hour, minute from hours CROSS JOIN minutes
```
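As a rough check of the minute grid described above, the following sketch builds `hours` and `minutes` tables in a throwaway in-memory database and crosses them. The table names follow the snippet above, while the column types and the way the tables are filled are assumptions.

```python
import sqlite3

# Throwaway in-memory database, separate from the lesson's users.db.
conn_grid = sqlite3.connect(":memory:")
conn_grid.execute("create table hours (hour integer)")
conn_grid.execute("create table minutes (minute integer)")

# Assumed contents: hours 0-23 and minutes 0-59.
conn_grid.executemany("insert into hours values (?)", [(h,) for h in range(24)])
conn_grid.executemany("insert into minutes values (?)", [(m,) for m in range(60)])

# Cross the two tables to get one row per minute of the day.
conn_grid.execute("""
    create table minute_tasks as
    select hour, minute from hours cross join minutes
""")

row_count = conn_grid.execute("select count(*) from minute_tasks").fetchone()[0]
print(row_count)  # 24 * 60 = 1440
```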

### Cross joining a table with itself

Sometimes we'll want to cross join a table with itself. For example, let's say that we want to find the smallest difference in ages between any two of our customers.

A good starting point is a cross join, where we pair every customer with every other customer.

Doing so looks something like the following:

```sql
select c1.name, c2.name from customers c1 join customers c2
```

Notice that we join `customers` with itself, and each time we reference the table we give it a separate alias (`c1` and `c2`).

Now let's see a fuller version of this query.

In [37]:
import pandas as pd

query = """
select c1.id as id_1, c1.name as name_1, c1.birthyear as birthyear_1,
c2.id as id_2, c2.name as name_2, c2.birthyear as birthyear_2
from customers c1 join customers c2;
"""

pd.read_sql(query, conn)

| | id_1 | name_1 | birthyear_1 | id_2 | name_2 | birthyear_2 |
|---|---|---|---|---|---|---|
| 0 | 1 | bart simpson | 2008 | 1 | bart simpson | 2008 |
| 1 | 1 | bart simpson | 2008 | 2 | maggie simpson | 2016 |
| 2 | 1 | bart simpson | 2008 | 3 | lisa simpson | 2006 |
| 3 | 2 | maggie simpson | 2016 | 1 | bart simpson | 2008 |
| 4 | 2 | maggie simpson | 2016 | 2 | maggie simpson | 2016 |
| 5 | 2 | maggie simpson | 2016 | 3 | lisa simpson | 2006 |
| 6 | 3 | lisa simpson | 2006 | 1 | bart simpson | 2008 |
| 7 | 3 | lisa simpson | 2006 | 2 | maggie simpson | 2016 |
| 8 | 3 | lisa simpson | 2006 | 3 | lisa simpson | 2006 |
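A self cross join like this one pairs each row with itself and lists every pair twice, once in each order. One common way to keep each distinct pair once is to filter on the ids. The sketch below shows this on a hypothetical `people` table in an in-memory database; the table and its rows are made up for illustration.

```python
import sqlite3

# Hypothetical table, used only to illustrate the filtering idea.
conn_pairs = sqlite3.connect(":memory:")
conn_pairs.executescript("""
    create table people (id integer, name text);
    insert into people values (1, 'ann'), (2, 'bob'), (3, 'cat');
""")

# The plain self cross join: 3 x 3 = 9 rows, including self-pairs.
all_pairs = conn_pairs.execute(
    "select p1.name, p2.name from people p1 join people p2"
).fetchall()

# Keeping only p1.id < p2.id drops self-pairs and mirrored duplicates.
distinct_pairs = conn_pairs.execute("""
    select p1.name, p2.name
    from people p1 join people p2
    where p1.id < p2.id
    order by p1.id, p2.id
""").fetchall()

print(len(all_pairs))  # 9
print(distinct_pairs)  # [('ann', 'bob'), ('ann', 'cat'), ('bob', 'cat')]
```

If you only need to exclude self-pairs and do not mind mirrored duplicates, `where p1.id != p2.id` would also work.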


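Ok, we'll let you take it from here. Update the query to find the smallest difference in ages between any two of our customers. Remember to exclude the rows that pair a customer with themselves, since that difference is always zero.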

In [38]:
import pandas as pd

query = """

"""

pd.read_sql(query, conn)

# minimum_diff
# 0	2

### Resources

[Stack Overflow - What are the uses for Cross Join?](https://stackoverflow.com/questions/219716/what-are-the-uses-for-cross-join)