## Problem set 5: Databases with SQL and Python

## Auto dealership database

### Table: Employee

* **id** - Unique identifier
* **name** - First name
* **salary** - Salary in €
* **education** - High School, Bachelor or Master
* **department** - Admin, Sales or Service

### Table: Car

* **id** - Unique identifier
* **make** - Car manufacturer, e.g., Toyota or BMW
* **type** - Type of car, e.g., Sedan or SUV
* **year** - Production year of the car
* **color** - Color of the car
* **msrp** - Manufacturer's suggested retail price in €
* **employee_id** - Identifier from the Employee table, indicating which employee is responsible for the car

### Table: Customer

* **id** - Unique identifier 
* **first_name** - First name
* **last_name** - Last name
* **phone** - Telephone number
* **birth_year** - Birth year

### Table: Sales

* **id** - Unique identifier
* **date** - Date of the sale
* **list_price** - The list price at the time of the sale
* **price** - The actual sales price
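* **customer_id** - Identifier of the customer who bought the car (from the Customer table)
* **car_id** - Identifier of the car that was sold (from the Car table)
* **employee_id** - Identifier of the employee associated with the sale (from the Employee table)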
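The four tables are linked through their id columns. The following is a minimal sketch of that schema as SQLite DDL, assuming the plural table names that the queries below actually use (Employees, Cars, Customers, Sales). The column types and `REFERENCES` clauses are guesses for illustration, not read from `auto_dealership_database.db`.

```python
import sqlite3

# Hypothetical schema: column names follow the table descriptions above;
# SQL types and REFERENCES clauses are assumptions, not taken from the real file.
SCHEMA = """
CREATE TABLE Employees (
    id INTEGER PRIMARY KEY,
    name TEXT,
    salary REAL,
    education TEXT,   -- 'High School', 'Bachelor' or 'Master'
    department TEXT   -- 'Admin', 'Sales' or 'Service'
);
CREATE TABLE Cars (
    id INTEGER PRIMARY KEY,
    make TEXT,
    type TEXT,
    year INTEGER,
    color TEXT,
    msrp REAL,
    employee_id INTEGER REFERENCES Employees(id)
);
CREATE TABLE Customers (
    id INTEGER PRIMARY KEY,
    first_name TEXT,
    last_name TEXT,
    phone TEXT,
    birth_year INTEGER
);
CREATE TABLE Sales (
    id INTEGER PRIMARY KEY,
    date TEXT,        -- assumed to hold date strings such as '2020-01-31'
    list_price REAL,
    price REAL,
    customer_id INTEGER REFERENCES Customers(id),
    car_id INTEGER REFERENCES Cars(id),
    employee_id INTEGER REFERENCES Employees(id)
);
"""

con = sqlite3.connect(":memory:")  # empty in-memory database, not the real file
con.executescript(SCHEMA)

# sqlite_master is SQLite's built-in catalog of tables, indexes and views
tables = [row[0] for row in con.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)  # ['Cars', 'Customers', 'Employees', 'Sales']
```

Running the same `sqlite_master` query against `auto_dealership_database.db` lists the table names the file really contains.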

```python
%load_ext isqlite3
%sql_open auto_dealership_database.db
```

## Exercise 1: Queries on a single table

1. Select the **make**, the **year** and the **color** for all the cars in the Car table
2. Select the **phone** number and **birth year** of the customer *Lanay Holt*
3. Select all the unique car **makes**, and sort them alphabetically 
4. Select all the customers with a **birth year** between 1989 and 1991
5. Select all the employees in the sales **department** ordered by their **salary** from high to low
6. Select all the sales where the car is sold for less than the **list price**

```sql
%%sql -Solution 1

-- LIMIT 20 only previews the result; remove it to list every car
SELECT make, year, color
FROM Cars
LIMIT 20
```

```sql
%%sql -Solution 2

SELECT phone, birth_year
FROM Customers
WHERE Customers.first_name = 'Lanay'
AND Customers.last_name = 'Holt'
```

```sql
%%sql -Solution 3

SELECT DISTINCT make
FROM Cars
ORDER BY make
```

```sql
%%sql -Solution 4

-- BETWEEN includes both endpoints; LIMIT 10 only previews the result
SELECT *
FROM Customers
WHERE birth_year BETWEEN 1989 AND 1991
LIMIT 10
```
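`BETWEEN a AND b` is inclusive at both ends, so it is shorthand for `>= a AND <= b`. A small check on an in-memory table with made-up birth years (not data from the dealership database):

```python
import sqlite3

# Made-up birth years, just to show how BETWEEN treats its endpoints
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Customers (id INTEGER PRIMARY KEY, birth_year INTEGER)")
con.executemany("INSERT INTO Customers (birth_year) VALUES (?)",
                [(1988,), (1989,), (1990,), (1991,), (1992,)])

between = [r[0] for r in con.execute(
    "SELECT birth_year FROM Customers "
    "WHERE birth_year BETWEEN 1989 AND 1991 ORDER BY birth_year")]

# The same filter written with explicit comparisons
explicit = [r[0] for r in con.execute(
    "SELECT birth_year FROM Customers "
    "WHERE birth_year >= 1989 AND birth_year <= 1991 ORDER BY birth_year")]

print(between)  # [1989, 1990, 1991]
```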

```sql
%%sql -Solution 5

-- LIMIT 10 only previews the result; remove it to list every Sales employee
SELECT *
FROM Employees
WHERE department = 'Sales'
ORDER BY salary DESC
LIMIT 10
```

```sql
%%sql -Solution 6

-- LIMIT 10 only previews the result; remove it to list every discounted sale
SELECT *
FROM Sales
WHERE list_price > price
LIMIT 10
```

## Exercise 2: Queries involving several tables

1. Create a query that returns the sales **date**, sales **list price** and employee **name** for all cars that were sold for exactly 5000
2. Create a query that returns the distinct **departments** of the employees who appear in the Sales table
3. Create a query that returns all the information from the Customer table for all the customers who bought a car for more than 170000

```sql
%%sql -Solution 1

SELECT Sales.date, Sales.list_price, Employees.name
FROM Sales, Employees
WHERE Sales.employee_id = Employees.id
AND Sales.price = 5000
```

```sql
%%sql -Solution 2

SELECT DISTINCT Employees.department
FROM Sales, Employees
WHERE Sales.employee_id = Employees.id
```

```sql
%%sql -Solution 3

-- DISTINCT lists a customer once even if they made several qualifying purchases
SELECT DISTINCT Customers.*
FROM Customers, Sales
WHERE Sales.customer_id = Customers.id
AND Sales.price > 170000
```
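These solutions join tables by listing them in `FROM` and matching ids in `WHERE`. The same join can be written with `JOIN ... ON`, which keeps the join condition apart from the filter. A join also returns one row per matching sale, so a customer with two sales above 170000 shows up twice unless `DISTINCT` is used. A sketch on an in-memory database with made-up customers and sales:

```python
import sqlite3

# Made-up rows: customer 1 has two sales above 170000, customer 2 has none
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Customers (id INTEGER PRIMARY KEY, first_name TEXT);
CREATE TABLE Sales (id INTEGER PRIMARY KEY, price REAL, customer_id INTEGER);
INSERT INTO Customers VALUES (1, 'Ann'), (2, 'Bo');
INSERT INTO Sales VALUES (1, 180000, 1), (2, 175000, 1), (3, 90000, 2);
""")

# Implicit join: tables listed in FROM, join condition in WHERE
implicit = con.execute("""
    SELECT Customers.* FROM Customers, Sales
    WHERE Sales.customer_id = Customers.id AND Sales.price > 170000
""").fetchall()

# Explicit join: the same condition moved into JOIN ... ON
explicit = con.execute("""
    SELECT Customers.* FROM Customers
    JOIN Sales ON Sales.customer_id = Customers.id
    WHERE Sales.price > 170000
""").fetchall()

# DISTINCT collapses the repeated customer row
distinct = con.execute("""
    SELECT DISTINCT Customers.* FROM Customers
    JOIN Sales ON Sales.customer_id = Customers.id
    WHERE Sales.price > 170000
""").fetchall()

print(sorted(implicit))  # [(1, 'Ann'), (1, 'Ann')]
print(distinct)          # [(1, 'Ann')]
```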

## Exercise 3: Extracting information from the database

1. Who is the employee in the Admin **department** with the highest **salary**?
2. What **type** of car is the Audi with the highest sales price?
3. Identify the SUV with the lowest **msrp**
4. How many employees in the Sales **department** have a High School education?
5. How many cars is the employee Fredrik responsible for?

```sql
%%sql -Solution 1

-- Sort by salary so that LIMIT 1 keeps the highest-paid Admin employee
SELECT *
FROM Employees
WHERE department = 'Admin'
ORDER BY salary DESC
LIMIT 1
```
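Picking the top earner with `ORDER BY salary DESC LIMIT 1` always returns exactly one row, even when several Admin employees share the highest salary. Comparing against a `MAX(...)` subquery returns every tied employee instead. A sketch with made-up employees (the names are hypothetical):

```python
import sqlite3

# Made-up employees: two Admin employees share the highest Admin salary
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Employees (id INTEGER PRIMARY KEY, name TEXT, salary REAL, department TEXT);
INSERT INTO Employees VALUES
    (1, 'Ada', 50000, 'Admin'),
    (2, 'Ben', 62000, 'Admin'),
    (3, 'Cy', 62000, 'Admin'),
    (4, 'Dan', 90000, 'Sales');
""")

# ORDER BY ... LIMIT 1 keeps exactly one row; which tied row wins is unspecified
top_one = con.execute("""
    SELECT name FROM Employees
    WHERE department = 'Admin'
    ORDER BY salary DESC
    LIMIT 1
""").fetchall()

# Comparing against the department maximum keeps every tied employee
all_top = con.execute("""
    SELECT name FROM Employees
    WHERE department = 'Admin'
      AND salary = (SELECT MAX(salary) FROM Employees WHERE department = 'Admin')
    ORDER BY name
""").fetchall()

print(len(top_one))  # 1
print(all_top)       # [('Ben',), ('Cy',)]
```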

```sql
%%sql -Solution 2

SELECT Sales.price, Cars.type
FROM Sales, Cars
WHERE Sales.car_id = Cars.id
AND Cars.make = 'Audi'
ORDER BY Sales.price DESC
LIMIT 1
```

```sql
%%sql -Solution 3

SELECT *
FROM Cars
WHERE type = 'SUV'
ORDER BY msrp
LIMIT 1
```

```sql
%%sql -Solution 4

SELECT COUNT(*)
FROM Employees
WHERE department = 'Sales'
AND education = 'High School'
```

```sql
%%sql -Solution 5

SELECT COUNT(Cars.id)
FROM Cars, Employees
WHERE Cars.employee_id = Employees.id
AND Employees.name = 'Fredrik'
```

## Exercise 4: Python and SQL

1. Create a Pandas dataframe with all the information about the Hondas from the Cars table 
2. Create a Pandas dataframe with all the information about the employees in the service department
3. Plot the yearly mean sales **price** (based on the sales date) of all the sales registered in the database
4. Plot the yearly mean **msrp** (based on the production year of the car) of all the cars with the make **Mercedes**

```python
import pandas as pd
import sqlite3
```

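```python
# Solution 1

con = sqlite3.connect('auto_dealership_database.db')
# Bind 'Honda' as a parameter; in SQLite, double quotes denote identifiers, not strings
hondas = pd.read_sql('SELECT * FROM Cars WHERE make = ?', con, params=('Honda',))
con.close()
hondas.head()
```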
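In SQLite, double-quoted text such as `"Honda"` is an identifier and only falls back to a string literal when no column of that name exists. Binding the value to a `?` placeholder through the `params` argument of `pd.read_sql` avoids relying on that fallback and on manual quoting. A sketch with a hypothetical helper `cars_by_make`, demonstrated on an in-memory database with made-up rows:

```python
import sqlite3

import pandas as pd


def cars_by_make(con, make):
    """Hypothetical helper: return every column of the Cars rows for one make."""
    # sqlite3 binds `make` to the ? placeholder; no quoting inside the SQL string
    return pd.read_sql("SELECT * FROM Cars WHERE make = ?", con, params=(make,))


# Demo on an in-memory database with made-up cars
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Cars (id INTEGER PRIMARY KEY, make TEXT, type TEXT, msrp REAL);
INSERT INTO Cars VALUES (1, 'Honda', 'Sedan', 25000),
                        (2, 'BMW', 'SUV', 60000),
                        (3, 'Honda', 'SUV', 32000);
""")
hondas = cars_by_make(con, "Honda")
con.close()
print(hondas)
```

The same helper works with `sqlite3.connect('auto_dealership_database.db')`, and the pattern carries over to the department filter in Solution 2.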

```python
# Solution 2

con = sqlite3.connect('auto_dealership_database.db')
# Bind 'Service' as a parameter instead of quoting it inside the SQL string
service_emps = pd.read_sql('SELECT * FROM Employees WHERE department = ?', con, params=('Service',))
con.close()
service_emps.head()
```

```python
# Solution 3

con = sqlite3.connect('auto_dealership_database.db')
sales = pd.read_sql('SELECT * FROM Sales', con)
con.close()

# Average over every sale in each year (not over per-day averages)
sales['date'] = pd.to_datetime(sales['date'])
yearly_price = sales.groupby(sales['date'].dt.year)['price'].mean()
yearly_price.plot(lw=2, label='Average sales price', legend=True)
```
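Averaging per-day means first (grouping by date, then averaging those values by year) gives every day the same weight regardless of how many sales it had, so the result generally differs from the mean over all sales in the year. A small check with made-up prices:

```python
import pandas as pd

# Made-up sales: two cheap sales on one day, one expensive sale on another
sales = pd.DataFrame({
    "date": pd.to_datetime(["2020-01-01", "2020-01-01", "2020-06-01"]),
    "price": [100.0, 200.0, 600.0],
})

# Mean of daily means: each day weighted equally -> (150 + 600) / 2 = 375
daily = sales.groupby("date")["price"].mean()
mean_of_daily_means = daily.groupby(daily.index.year).mean()

# Mean over all sales in the year -> (100 + 200 + 600) / 3 = 300
yearly_mean = sales.groupby(sales["date"].dt.year)["price"].mean()

print(mean_of_daily_means.loc[2020])  # 375.0
print(yearly_mean.loc[2020])          # 300.0
```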

```python
# Solution 4

con = sqlite3.connect('auto_dealership_database.db')
mercedes = pd.read_sql('SELECT * FROM Cars WHERE make = ?', con, params=('Mercedes',))
con.close()

# 'year' holds production years; plot them directly, because pd.to_datetime
# would read plain integers as nanoseconds since 1970
mercedes_yearly = mercedes.groupby('year')['msrp'].mean()
mercedes_yearly.plot(lw=2, label='Average Mercedes msrp', legend=True)
```
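`pd.to_datetime` interprets plain integers, such as production years, as nanoseconds since the Unix epoch, so every year collapses to 1 January 1970. If real dates are wanted on the axis, converting the years to strings and passing `format="%Y"` gives the intended timestamps:

```python
import pandas as pd

years = pd.Index([2015, 2016])

# Plain integers are read as nanoseconds since 1970-01-01, not as years
as_nanoseconds = pd.to_datetime(years)

# Strings with an explicit '%Y' format become 1 January of each year
as_years = pd.to_datetime(years.astype(str), format="%Y")

print(as_nanoseconds.year.tolist())  # [1970, 1970]
print(as_years.year.tolist())        # [2015, 2016]
```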