# Cross Joins

### Introduction

In this lesson we'll learn about cross joins. Given two tables, a cross join pairs every row of the first with every row of the second, generating all combinations of their rows. This is also known as the Cartesian product.
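
To make the idea concrete before touching SQL, here is a minimal Python sketch of a Cartesian product using `itertools.product`. The two lists are illustrative stand-ins that borrow names from the bartenders and customers tables used later in this lesson.

```python
from itertools import product

# Illustrative lists, standing in for two tables with one column each
bartenders = ["moe", "selma", "patty"]
customers = ["bart simpson", "maggie simpson", "lisa simpson"]

# product pairs every element of the first list with every element of the second
pairs = list(product(bartenders, customers))

print(len(pairs))   # 9, because 3 bartenders x 3 customers
print(pairs[:3])    # [('moe', 'bart simpson'), ('moe', 'maggie simpson'), ('moe', 'lisa simpson')]
```

The number of combinations is always the product of the two list lengths, which is why cross joins on large tables can grow quickly.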

Ok, let's get started.

### Loading the data

For this lesson, we'll work with our Moe's bar dataset. Let's load the data.

In [1]:
import sqlite3
conn = sqlite3.connect('users.db')

In [2]:
import pandas as pd
root_url = "https://raw.githubusercontent.com/jigsawlabs-student/curriculum-images/main/has-many-through-bar/data/"
names = ['bartenders', 'customers', 'drinks', 'orders', 'ingredients', 'ingredients_drinks']
loaded_dfs = [pd.read_csv(f'{root_url}{name}.csv') for name in names]

In [3]:
for index, name in enumerate(names):
    loaded_dfs[index].to_sql(f'{name}', conn, index = False, if_exists = 'replace')

### Performing A Cross Join

Chances are that you have already seen a cross join. In SQLite, a cross join occurs whenever we join two tables without specifying which columns to join on.

For example, let's show all of the possible combinations of customers paired with bartenders. First, here are our bartenders.

In [5]:
pd.read_sql(""" select * from bartenders""", conn)

|   | id | name | hometown | birthyear |
|---|---|---|---|---|
| 0 | 1 | moe | springfield | 1965 |
| 1 | 2 | selma | milwaukee | 1970 |
| 2 | 3 | patty | philly | 1970 |


In [6]:
query = """
select bartenders.name as bartender_name,
       customers.name as customer_name
from bartenders join customers
-- no "on" clause, so every bartender is paired with every customer
"""

pd.read_sql(query, conn)

|   | bartender_name | customer_name |
|---|---|---|
| 0 | moe | bart simpson |
| 1 | moe | maggie simpson |
| 2 | moe | lisa simpson |
| 3 | selma | bart simpson |
| 4 | selma | maggie simpson |
| 5 | selma | lisa simpson |
| 6 | patty | bart simpson |
| 7 | patty | maggie simpson |
| 8 | patty | lisa simpson |
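
As a hedged sketch of the rule above, the snippet below checks that SQLite treats three spellings the same way: `join` with no `on` clause, an explicit `cross join`, and a comma between the table names. It uses a throwaway in-memory database (`demo_conn`, a name made up for this example) with two tiny hand-typed tables, so it does not touch the lesson's `users.db`. Other databases can be stricter about a `join` without a condition, so the explicit `cross join` spelling is likely the most portable.

```python
import sqlite3

# Throwaway in-memory database; table contents are typed in by hand for illustration
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("create table bartenders (id integer, name text)")
demo_conn.execute("create table customers (id integer, name text)")
demo_conn.executemany(
    "insert into bartenders values (?, ?)",
    [(1, "moe"), (2, "selma"), (3, "patty")],
)
demo_conn.executemany(
    "insert into customers values (?, ?)",
    [(1, "bart simpson"), (2, "maggie simpson"), (3, "lisa simpson")],
)

columns = "select bartenders.name, customers.name from "
queries = {
    "join without on": columns + "bartenders join customers",
    "cross join": columns + "bartenders cross join customers",
    "comma": columns + "bartenders, customers",
}

# Sort the rows so the comparison does not depend on the order SQLite returns them in
results = {
    label: sorted(demo_conn.execute(sql).fetchall())
    for label, sql in queries.items()
}

for label, rows in results.items():
    print(label, len(rows))
```

Each query returns nine rows, one for every bartender and customer pairing.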


### But why?

Now why would we want to do such a thing?  Cross joins generate *a grid* of data.  And doing so can be pretty useful.

Let's see some examples, taken from this [Stack Overflow post](https://stackoverflow.com/questions/219716/what-are-the-uses-for-cross-join).

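### Cross joining a table with itself

A table can also be cross joined with itself. Below, we pair each customer with every other customer, using `where c1.id <> c2.id` to drop the rows where a customer would be paired with themselves.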

In [14]:
import pandas as pd

query = """
select c1.id c1_id, c1.name, c2.name, c2.id c2_id from customers c1 join customers c2
where c1.id <> c2.id 
"""

pd.read_sql(query, conn)

|   | c1_id | name | name | c2_id |
|---|---|---|---|---|
| 0 | 1 | bart simpson | maggie simpson | 2 |
| 1 | 1 | bart simpson | lisa simpson | 3 |
| 2 | 2 | maggie simpson | bart simpson | 1 |
| 3 | 2 | maggie simpson | lisa simpson | 3 |
| 4 | 3 | lisa simpson | bart simpson | 1 |
| 5 | 3 | lisa simpson | maggie simpson | 2 |


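With the `<>` condition, each pair of customers appears twice, once in each order. To keep only one row per pair, we can change the condition to `c1.id > c2.id`.

In [13]: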
query = """
select c1.name, c2.name from customers c1 join customers c2
where c1.id > c2.id 
"""

pd.read_sql(query, conn)

|   | name | name |
|---|---|---|
| 0 | maggie simpson | bart simpson |
| 1 | lisa simpson | bart simpson |
| 2 | lisa simpson | maggie simpson |
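
The difference between the two self joins can be sketched in plain Python. This is only an analogy, not how SQLite evaluates the queries: `itertools.permutations` plays the role of the `<>` condition (ordered pairs), and `itertools.combinations` plays the role of the `>` condition (each pair once). The list is typed in by hand from the customer names shown above.

```python
from itertools import combinations, permutations

# Customer names copied from the query results above
customers = ["bart simpson", "maggie simpson", "lisa simpson"]

# Like "c1.id <> c2.id": every ordered pair of two different customers
ordered_pairs = list(permutations(customers, 2))

# Like "c1.id > c2.id": each pair of customers exactly once
unique_pairs = list(combinations(customers, 2))

print(len(ordered_pairs))  # 6, which is n * (n - 1) for n = 3
print(len(unique_pairs))   # 3, which is n * (n - 1) / 2 for n = 3
```

Keeping one row per pair halves the result, which matters when each row feeds a comparison such as a difference between two values.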


### Challenge problem

Ok, now it's your turn.  Use a cross join to find the smallest difference in ages between our customers. 

In [38]:
import pandas as pd

query = """

"""

pd.read_sql(query, conn)

# minimum_diff
# 0	10

### Resources

[Stack Overflow - Cross join](https://stackoverflow.com/questions/219716/what-are-the-uses-for-cross-join)