# Join Statements - Lab

## Introduction

In this lab, you'll practice writing `JOIN` statements, using different types of joins and different ways of specifying how the tables are linked.

## Objectives

You will be able to:
* Write SQL queries that make use of various types of joins
* Compare and contrast the various types of joins
* Discuss how primary and foreign keys are used in SQL
* Decide and perform whichever type of join is best for retrieving desired data

## CRM Schema

In most real-world cases, you will need data from multiple tables rather than a single one.
Combining them requires **joins** on columns that the two tables share.
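As a minimal sketch of what "joining on a shared column" means, the snippet below builds two tiny in-memory tables and joins them on `officeCode`. The tables and rows are hypothetical stand-ins created only for illustration; they are not the contents of the lab's database.

```python
import sqlite3

# Hypothetical toy tables in an in-memory database (not the lab's data.sqlite)
conn_demo = sqlite3.connect(":memory:")
conn_demo.executescript("""
    CREATE TABLE offices (officeCode INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE employees (
        employeeNumber INTEGER PRIMARY KEY,
        lastName TEXT,
        officeCode INTEGER REFERENCES offices(officeCode)  -- foreign key
    );
    INSERT INTO offices VALUES (1, 'Boston'), (2, 'Paris');
    INSERT INTO employees VALUES (10, 'Firrelli', 1), (11, 'Bondur', 2);
""")

# Each employee row is matched to the office row with the same officeCode
joined_rows = conn_demo.execute("""
    SELECT lastName, city
    FROM employees
    JOIN offices USING(officeCode)
    ORDER BY lastName;
""").fetchall()

print(joined_rows)
conn_demo.close()
```

Here `officeCode` is the primary key of `offices` and a foreign key in `employees`, which is what makes it a natural column to join on.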

In this lab, you'll use the same customer relationship management (CRM) database that you saw in the previous lesson.
<img src='images/Database-Schema.png' width="600">

## Connecting to the Database
Import the necessary packages and connect to the database **data.sqlite**.

In [1]:
import pandas as pd
import sqlite3

In [2]:
conn = sqlite3.connect('data.sqlite')
cur = conn.cursor()

## Display the names of all the employees in Boston.
Hint: join the employees and offices tables.

In [5]:
cur.execute('''SELECT firstName, lastName
               FROM employees e
               JOIN offices o
               ON e.officeCode = o.officeCode
               WHERE city = 'Boston';''')

df = pd.DataFrame(cur.fetchall())
df.columns = [i[0] for i in cur.description]
df.head()


# Alternatively, I could use:

# cur.execute("""SELECT firstName, lastName 
#                FROM employees 
#                JOIN offices 
#                USING(officeCode) 
#                WHERE city = 'Boston';""")

Unnamed: 0,firstName,lastName
0,Julie,Firrelli
1,Steve,Patterson
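The `ON` and `USING` forms return the same rows when the join column has the same name in both tables. The sketch below checks this on hypothetical toy data (made-up names, not the lab's database) and also shows one difference: with `SELECT *`, SQLite's `USING` keeps a single copy of the join column, while `ON` keeps both.

```python
import sqlite3

# Hypothetical toy data in an in-memory database
demo = sqlite3.connect(":memory:")
demo.executescript("""
    CREATE TABLE offices (officeCode INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE employees (
        employeeNumber INTEGER PRIMARY KEY,
        firstName TEXT,
        lastName TEXT,
        officeCode INTEGER
    );
    INSERT INTO offices VALUES (1, 'Boston'), (2, 'London');
    INSERT INTO employees VALUES
        (1, 'Ada', 'Example', 1),
        (2, 'Bo', 'Sample', 2),
        (3, 'Cy', 'Placeholder', 1);
""")

on_rows = demo.execute("""
    SELECT firstName, lastName
    FROM employees e
    JOIN offices o
    ON e.officeCode = o.officeCode
    WHERE city = 'Boston'
    ORDER BY employeeNumber;
""").fetchall()

using_rows = demo.execute("""
    SELECT firstName, lastName
    FROM employees
    JOIN offices
    USING(officeCode)
    WHERE city = 'Boston'
    ORDER BY employeeNumber;
""").fetchall()

# Compare the columns that SELECT * produces under each form
on_cols = [d[0] for d in demo.execute(
    "SELECT * FROM employees e JOIN offices o ON e.officeCode = o.officeCode"
).description]
using_cols = [d[0] for d in demo.execute(
    "SELECT * FROM employees JOIN offices USING(officeCode)"
).description]

print(on_rows == using_rows)
print(len(on_cols), len(using_cols))
demo.close()
```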


## Are there any offices that have zero employees?
Hint: Combine the employees and offices tables and use a group by.

In [15]:
cur.execute('''SELECT city, COUNT(employeeNumber)
               FROM offices
               LEFT JOIN employees
               USING(officeCode)
               GROUP BY city;
               ''')

df = pd.DataFrame(cur.fetchall())
df.columns = [i[0] for i in cur.description]
df.head()

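# None of the five cities shown has zero employees. Note that df.head() displays
# only the first five rows, and GROUP BY city merges offices that share a city,
# so viewing the full result grouped by officeCode is a stricter check.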

Unnamed: 0,city,COUNT(employeeNumber)
0,Boston,2
1,London,2
2,NYC,2
3,Paris,5
4,San Francisco,6
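One subtlety when looking for empty offices: if two offices are in the same city, grouping by `city` adds their employee counts together and can hide an office with no employees. The sketch below uses hypothetical toy data (not the lab's database) to contrast grouping by `city` with grouping by `officeCode` and filtering with `HAVING`.

```python
import sqlite3

# Hypothetical toy data: two offices share a city, and office 3 has no employees
demo = sqlite3.connect(":memory:")
demo.executescript("""
    CREATE TABLE offices (officeCode INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE employees (employeeNumber INTEGER PRIMARY KEY, officeCode INTEGER);
    INSERT INTO offices VALUES (1, 'Boston'), (2, 'Paris'), (3, 'Boston');
    INSERT INTO employees VALUES (100, 1), (101, 1), (102, 2);
""")

# Grouping by city merges offices 1 and 3 into a single 'Boston' row
by_city = demo.execute("""
    SELECT city, COUNT(employeeNumber)
    FROM offices
    LEFT JOIN employees
    USING(officeCode)
    GROUP BY city
    ORDER BY city;
""").fetchall()

# Grouping by the primary key keeps each office separate;
# HAVING keeps only the offices whose employee count is zero
empty_offices = demo.execute("""
    SELECT officeCode, city, COUNT(employeeNumber) AS n_employees
    FROM offices
    LEFT JOIN employees
    USING(officeCode)
    GROUP BY officeCode
    HAVING n_employees = 0;
""").fetchall()

print(by_city)
print(empty_offices)
demo.close()
```

The `LEFT JOIN` matters here: an inner join would drop the employee-less office before it could be counted.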


## Write 3 questions of your own and answer them

In [None]:
# Answers will vary
# Example: Display the htmlDescription and employee's first and last name for each product that each employee has sold

Where did the largest payment come from?

In [60]:
cur.execute('''SELECT city, state, country, MAX(amount)
               FROM customers
               JOIN payments
               USING(customerNumber);''')

df = pd.DataFrame(cur.fetchall())
df.columns = [i[0] for i in cur.description]
df


# Note: swapping MAX for MIN returns a larger amount here, which suggests the
# amount column is stored as text and compared character by character, so a
# value starting with '99' sorts last. If so, MAX(CAST(amount AS REAL)) would
# compare the amounts numerically instead.

Unnamed: 0,city,state,country,MAX(amount)
0,New Bedford,MA,USA,9977.85
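If a numeric-looking column is stored as text, SQLite compares its values as strings, so `MAX` and `MIN` follow character order rather than numeric order. The sketch below reproduces that behavior on a hypothetical `payments` table whose `amount` column is declared `TEXT`; apart from 9977.85, the values are made up for illustration, and whether the lab's database stores `amount` this way is an assumption that `typeof(amount)` can confirm.

```python
import sqlite3

# Hypothetical payments table with amount stored as TEXT
demo = sqlite3.connect(":memory:")
demo.executescript("""
    CREATE TABLE payments (customerNumber INTEGER, amount TEXT);
    INSERT INTO payments VALUES (1, '9977.85'), (2, '120166.58'), (3, '615.45');
""")

# Check how the values are actually stored
col_type = demo.execute("SELECT typeof(amount) FROM payments LIMIT 1;").fetchone()[0]

# String comparison: '9...' sorts after '6...' and '1...'
text_max = demo.execute("SELECT MAX(amount) FROM payments;").fetchone()[0]
text_min = demo.execute("SELECT MIN(amount) FROM payments;").fetchone()[0]

# Casting to REAL makes the comparison numeric
numeric_max = demo.execute(
    "SELECT MAX(CAST(amount AS REAL)) FROM payments;"
).fetchone()[0]

print(col_type, text_max, text_min, numeric_max)
demo.close()
```

In this toy table the text-based `MIN` is numerically larger than the text-based `MAX`, which matches the symptom of getting a bigger amount from `MIN` than from `MAX`.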


Display names of all customers with on-hold orders.

In [45]:
cur.execute('''SELECT status, contactFirstName, contactLastName
               FROM orders
               JOIN customers
               USING(customerNumber)
               WHERE status = 'On Hold';''')

df = pd.DataFrame(cur.fetchall())
df.columns = [i[0] for i in cur.description]
df

Unnamed: 0,status,contactFirstName,contactLastName
0,On Hold,Christina,Berglund
1,On Hold,William,Brown
2,On Hold,Sue,Frick
3,On Hold,Juri,Yoshido


Pull up the name and phone number of whoever placed the most recent order.

In [53]:
cur.execute('''SELECT contactFirstName, contactLastName, phone, orderDate
               FROM customers
               JOIN orders
               USING(customerNumber)
               ORDER BY orderDate DESC
               LIMIT 1;''')

df = pd.DataFrame(cur.fetchall())
df.columns = [i[0] for i in cur.description]
df

Unnamed: 0,contactFirstName,contactLastName,phone,orderDate
0,Janine,Labrune,40.67.8555,2005-05-31


## Level Up: Display the names of every individual product that each employee has sold

In [61]:
cur.execute("""SELECT firstName, lastName, productName
               FROM employees e
               JOIN customers c
               ON e.employeeNumber = c.salesRepEmployeeNumber
               JOIN orders o
               USING(customerNumber)
               JOIN orderdetails od
               USING(orderNumber)
               JOIN products p
               USING(productCode)""")
df = pd.DataFrame(cur.fetchall())
df.columns = [i[0] for i in cur.description]
print(len(df))
df.head()

2996


Unnamed: 0,firstName,lastName,productName
0,Leslie,Jennings,1958 Setra Bus
1,Leslie,Jennings,1940 Ford Pickup Truck
2,Leslie,Jennings,1939 Cadillac Limousine
3,Leslie,Jennings,1996 Peterbilt 379 Stake Bed with Outrigger
4,Leslie,Jennings,1968 Ford Mustang


## Level Up: Display the number of products each employee has sold

In [68]:
cur.execute("""SELECT firstName, lastName, COUNT(productName)
               FROM employees e
               JOIN customers c
               ON e.employeeNumber = c.salesRepEmployeeNumber
               JOIN orders o
               USING(customerNumber)
               JOIN orderdetails od
               USING(orderNumber)
               JOIN products p
               USING(productCode)
               GROUP BY employeeNumber
               ORDER BY firstName""")
df = pd.DataFrame(cur.fetchall())
df.columns = [i[0] for i in cur.description]
print(len(df))
df

15


Unnamed: 0,firstName,lastName,COUNT(productName)
0,Andy,Fixter,185
1,Barry,Jones,220
2,Foon Yue,Tseng,142
3,George,Vanauf,211
4,Gerard,Hernandez,396
5,Julie,Firrelli,124
6,Larry,Bott,236
7,Leslie,Jennings,331
8,Leslie,Thompson,114
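When counting per employee, the grouping column decides what counts as "one employee". Grouping by a name column merges people who share that name, whereas grouping by the primary key keeps them apart. The sketch below illustrates this on hypothetical toy tables; the `sales` table and all names in it are invented for the example and stand in for the lab's longer chain of joins.

```python
import sqlite3

# Hypothetical toy data: two employees share the last name 'Lee'
demo = sqlite3.connect(":memory:")
demo.executescript("""
    CREATE TABLE employees (
        employeeNumber INTEGER PRIMARY KEY,
        firstName TEXT,
        lastName TEXT
    );
    CREATE TABLE sales (employeeNumber INTEGER, productName TEXT);
    INSERT INTO employees VALUES (1, 'Ann', 'Lee'), (2, 'Bob', 'Lee'), (3, 'Cat', 'Diaz');
    INSERT INTO sales VALUES (1, 'Car'), (1, 'Bus'), (2, 'Truck'), (3, 'Plane');
""")

# Grouping by lastName combines Ann Lee and Bob Lee into one row
by_last_name = demo.execute("""
    SELECT lastName, COUNT(productName)
    FROM employees
    JOIN sales
    USING(employeeNumber)
    GROUP BY lastName
    ORDER BY lastName;
""").fetchall()

# Grouping by the primary key gives one row per employee
by_employee = demo.execute("""
    SELECT firstName, lastName, COUNT(productName)
    FROM employees
    JOIN sales
    USING(employeeNumber)
    GROUP BY employeeNumber
    ORDER BY employeeNumber;
""").fetchall()

print(by_last_name)
print(by_employee)
demo.close()
```

Note that `COUNT(productName)` counts joined rows, so a product sold on several orders is counted once per order line; `COUNT(DISTINCT productName)` would count unique products instead.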
9,Loui,Bondur,177


## Summary

Congrats! You practiced writing join statements and applied your knowledge of primary and foreign keys!