# SQL Basics

## Course: Programming and Data Management (EDI 3400)

### *Vegard H. Larsen (Department of Data Science and Analytics)*

# 1. The Auto Dealership Database

## Let's look at the database in DB Browser for SQLite

# 2. Running SQL code in the Notebook

## An SQL extension

- Download the file `isqlite3.py` from Ed 
- The file `isqlite3.py` must be stored in the same folder as the Notebook
- Developed by my colleague Jan Kudlicka
- We can now run SQL queries directly in notebook cells 
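
For comparison, queries can also be run from plain Python with the built-in `sqlite3` module. The sketch below is only an illustration: it uses an in-memory database and a small hypothetical `Employees` table (three rows copied from the course data) so that it runs without the course database file.

```python
import sqlite3

# Hypothetical in-memory database, used so the sketch runs anywhere.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employees (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO Employees (id, name, salary) VALUES (?, ?, ?)",
    [(1, "Erik", 114100), (2, "Sue", 116200), (3, "Linda", 67200)],
)

# Run a query and fetch the result as a list of tuples.
rows = conn.execute(
    "SELECT id, name, salary FROM Employees ORDER BY id LIMIT 2"
).fetchall()
print(rows)  # [(1, 'Erik', 114100), (2, 'Sue', 116200)]

conn.close()
```

Passing a file path such as `files/auto_dealership_database.db` to `sqlite3.connect` instead of `":memory:"` should open an existing database file in the same way, assuming the file is at that location.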

In [12]:
%load_ext isqlite3

The isqlite3 extension is already loaded. To reload it, use:
  %reload_ext isqlite3


In [13]:
%sql_open files/auto_dealership_database.db

Done!


## Employees table

In [14]:
%%sql

SELECT * FROM Employees LIMIT 10

id,name,salary,education,department
1,Erik,114100,Bachelor,Sales
2,Sue,116200,Bachelor,Admin
3,Linda,67200,High School,Admin
4,Anne,75900,Master,Service
5,Mary,89100,Bachelor,Service
6,Tom,95900,Bachelor,Sales
7,John,148200,Bachelor,Sales
8,Joe,148100,Master,Sales
9,Sofia,117100,Bachelor,Sales
10,Marie,79000,Bachelor,Admin


## Cars table

In [15]:
%%sql

SELECT * 
FROM Cars 
LIMIT 10

id,make,type,year,color,msrp,employee_id
1,BMW,Hatchback,2007,Silver,49400.0,19
2,Audi,SUV,2012,Red,58000.0,17
3,Toyota,SUV,2007,Silver,41400.0,4
4,Honda,Coupe,2012,Blue,41100.0,4
5,Audi,Sedan,2009,Red,78100.0,4
6,Mercedes,Estate,2007,Silver,99300.0,22
7,Volkswagen,Hatchback,2013,Blue,28700.0,26
8,Toyota,Sedan,2019,Red,42300.0,22
9,BMW,Sedan,2010,Black,55700.0,4
10,Audi,Sedan,2016,Silver,63700.0,26


## Sales table

In [16]:
%%sql

SELECT * 
FROM Sales 
LIMIT 10

id,date,list_price,price,customer_id,car_id,employee_id
1,2017-11-09,12500,12260.0,1555,7464,25
2,2010-05-05,15800,14930.0,2270,4358,7
3,2020-05-07,11800,11070.0,3481,1633,16
4,2020-09-22,30200,30110.0,3465,6161,1
5,2017-07-03,25200,24360.0,834,3811,16
6,2021-04-01,37600,37370.0,1541,4850,8
7,2019-11-29,10300,10230.0,1215,6784,1
8,2021-05-11,19300,19450.0,1914,1110,24
9,2019-11-11,20400,19470.0,711,7882,21
10,2021-07-22,31500,30150.0,1609,1202,16


## Customers table

In [25]:
%%sql

SELECT * 
FROM Customers
LIMIT 10

id,first_name,last_name,phone,birth_year
1,Geffery,Eaton,46000950,2011
2,Nira,Perry,98553493,2003
3,Moneisha,Perry,99054734,1998
4,Sigourney,Noble,24082013,1966
5,Lavita,Carlson,31488331,1984
6,Aide,Henry,18885778,1994
7,Annie,Owen,34520084,1993
8,Darik,Montgomery,93811698,1976
9,Thelma,Poole,92675192,2006
10,Slyvia,Roberson,58184531,1969


# 3. SQL syntax

## The `SELECT` statement and `FROM` clause:

- SQL keywords are not case sensitive (unlike Python), so `select` and `SELECT` are treated the same

`SELECT column_name` 

`FROM table_name`



In [17]:
%%sql

SELECT msrp
FROM Cars
LIMIT 5

msrp
49400.0
58000.0
41400.0
41100.0
78100.0
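
The note above that `select` and `SELECT` are treated the same can be checked with a short Python sketch. It assumes a hypothetical two-row `Cars` table in an in-memory SQLite database, not the course database. It also shows that, with SQLite's default settings, text values inside the data are still compared case-sensitively.

```python
import sqlite3

# Hypothetical in-memory table with two rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Cars (id INTEGER PRIMARY KEY, make TEXT, msrp REAL)")
conn.executemany(
    "INSERT INTO Cars (id, make, msrp) VALUES (?, ?, ?)",
    [(1, "BMW", 49400.0), (2, "Audi", 58000.0)],
)

# Keywords (and, in SQLite, table and column names) ignore case.
upper = conn.execute("SELECT msrp FROM Cars ORDER BY id").fetchall()
lower = conn.execute("select msrp from cars order by id").fetchall()
print(upper == lower)  # True

# Text values are compared case-sensitively by default in SQLite.
exact = conn.execute("SELECT count(*) FROM Cars WHERE make = 'BMW'").fetchone()[0]
wrong_case = conn.execute("SELECT count(*) FROM Cars WHERE make = 'bmw'").fetchone()[0]
print(exact, wrong_case)  # 1 0

conn.close()
```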


## The `WHERE` clause:

`SELECT column_name` 

`FROM table_name` 

`WHERE condition`

In [18]:
%%sql

SELECT id, price 
FROM Sales
WHERE price > 10000
LIMIT 5

id,price
1,12260.0
2,14930.0
3,11070.0
4,30110.0
5,24360.0


In [24]:
%%sql

Select Cars.id, Cars.msrp, Sales.price
FROM Cars, Sales
Where Cars.id = Sales.car_id
LIMIT 5

id,msrp,price
7464,38900.0,12260.0
4358,28700.0,14930.0
1633,32200.0,11070.0
6161,47300.0,30110.0
3811,45900.0,24360.0


## The `ORDER BY` clause:

`SELECT column_name`

`FROM table_name`

`ORDER BY column_name [ASC or DESC]`

In [21]:
%%sql

SELECT id, name, salary
FROM Employees
ORDER BY salary DESC
LIMIT 10

id,name,salary
21,Eli,176500
7,John,148200
8,Joe,148100
16,Karl,136100
9,Sofia,117100
14,Linda,116300
2,Sue,116200
1,Erik,114100
26,Fredrik,105700
25,Mary,104600


## `INNER JOIN`
`INNER JOIN` combines rows from two or more tables based on a related column between them. This type of join returns only the rows that have matching values in both tables. If there is no match, the rows will not appear in the result set.

In SQL, if you write `JOIN` without specifying the type of join, it defaults to `INNER JOIN`. `JOIN` and `INNER JOIN` are therefore equivalent and produce the same result set.
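
The behaviour described above can be illustrated with a small Python sketch. It assumes hypothetical, simplified `Customers` and `Sales` tables in an in-memory SQLite database. The customer with `id` 2 has no sales and therefore does not appear in the result, and `JOIN`, `INNER JOIN`, and the older comma-plus-`WHERE` form all return the same rows.

```python
import sqlite3

# Hypothetical, simplified tables in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (id INTEGER PRIMARY KEY, first_name TEXT);
CREATE TABLE Sales (id INTEGER PRIMARY KEY, price REAL, customer_id INTEGER);
INSERT INTO Customers VALUES (1, 'Nira'), (2, 'Annie'), (3, 'Darik');
INSERT INTO Sales VALUES (1, 12260.0, 1), (2, 14930.0, 1), (3, 11070.0, 3);
""")

inner = conn.execute("""
    SELECT Customers.first_name, Sales.price
    FROM Customers
    INNER JOIN Sales ON Customers.id = Sales.customer_id
    ORDER BY Sales.id
""").fetchall()

plain = conn.execute("""
    SELECT Customers.first_name, Sales.price
    FROM Customers
    JOIN Sales ON Customers.id = Sales.customer_id
    ORDER BY Sales.id
""").fetchall()

# Implicit join: list both tables and put the matching condition in WHERE.
implicit = conn.execute("""
    SELECT Customers.first_name, Sales.price
    FROM Customers, Sales
    WHERE Customers.id = Sales.customer_id
    ORDER BY Sales.id
""").fetchall()

print(inner)  # [('Nira', 12260.0), ('Nira', 14930.0), ('Darik', 11070.0)]
print(inner == plain == implicit)  # True

conn.close()
```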


In [29]:
%%sql

SELECT Customers.first_name, Customers.last_name, Sales.date, Sales.price
FROM Customers
INNER JOIN Sales ON Customers.id = Sales.customer_id
WHERE Sales.price > 90000
LIMIT 10

first_name,last_name,date,price
Jorell,Weeks,2009-01-27,90300.0
Klarissa,Glover,2016-01-20,142570.0
Alica,Lin,2020-02-13,115770.0
Conley,Sharp,2006-02-23,90910.0
Kendi,Cortes,2006-12-07,109390.0
Hamzah,Bailey,2017-12-15,184440.0
Ashlely,Combs,2021-02-18,121570.0
Genesia,Stephenson,2012-06-18,95680.0
Candelaria,Ford,2021-09-09,121410.0
Taletha,Berg,2021-12-29,98020.0


In [33]:
%%sql

SELECT
    Customers.first_name,
    Customers.last_name,
    Cars.make,
    Cars.type,
    Cars.msrp,
    Sales.date,
    Sales.price
FROM Customers
JOIN Sales ON Customers.id = Sales.customer_id
JOIN Cars ON Sales.car_id = Cars.id
LIMIT 10


first_name,last_name,make,type,msrp,date,price
Mirian,Valentine,BMW,Sedan,38900.0,2017-11-09,12260.0
Dalyn,Ward,Ford,Hatchback,28700.0,2010-05-05,14930.0
Johua,Gibson,Toyota,Hatchback,32200.0,2020-05-07,11070.0
Daric,Cisneros,Audi,Estate,47300.0,2020-09-22,30110.0
Lucas,Stanley,Toyota,Sedan,45900.0,2017-07-03,24360.0
Jeanetta,Mcgee,Audi,Sedan,58700.0,2021-04-01,37370.0
Tosha,Bradshaw,Toyota,Coupe,47300.0,2019-11-29,10230.0
Camilla,Weiss,Toyota,SUV,48000.0,2021-05-11,19450.0
Melodi,Weeks,Chevrolet,Sedan,53800.0,2019-11-11,19470.0
Sheresa,Jimenez,Volkswagen,Sedan,38000.0,2021-07-22,30150.0


## Aggregate functions:
- `count` counts rows
- `sum` sums values
- `min` finds the minimum value
- `max` finds the maximum value
- `avg` calculates the mean
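
As a quick illustration, the five functions can be applied in a single query. The sketch below uses a hypothetical three-row `Sales` table in an in-memory SQLite database, with made-up prices chosen so the results are easy to check by hand.

```python
import sqlite3

# Hypothetical three-row table; the prices are made up for easy arithmetic.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (id INTEGER PRIMARY KEY, price REAL)")
conn.executemany(
    "INSERT INTO Sales (price) VALUES (?)",
    [(10000.0,), (20000.0,), (30000.0,)],
)

# Each aggregate function collapses the whole table into one value.
summary = conn.execute("""
    SELECT count(*), sum(price), min(price), max(price), avg(price)
    FROM Sales
""").fetchone()
print(summary)  # (3, 60000.0, 10000.0, 30000.0, 20000.0)

conn.close()
```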

## `GROUP BY`
The `GROUP BY` clause combines rows that have the same values in the specified columns into groups, so that aggregate functions like `count()`, `sum()`, `max()`, `min()`, and `avg()` return one summary row per group. Typical uses are "find the number of customers in each country" or "calculate the total revenue from each product category."

Without `GROUP BY`, an aggregate function like `count()` would return a single value for the entire table or dataset.
With `GROUP BY`, you get the count for each group separately.
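
The difference can be seen side by side in a small Python sketch. It assumes a hypothetical four-row `Sales` table in an in-memory SQLite database, where employee 1 has three sales and employee 7 has one.

```python
import sqlite3

# Hypothetical four-row table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Sales (id INTEGER PRIMARY KEY, price REAL, employee_id INTEGER)"
)
conn.executemany(
    "INSERT INTO Sales (price, employee_id) VALUES (?, ?)",
    [(12260.0, 1), (14930.0, 7), (30110.0, 1), (10230.0, 1)],
)

# Without GROUP BY: one count for the entire table.
total = conn.execute("SELECT count(*) FROM Sales").fetchall()
print(total)  # [(4,)]

# With GROUP BY: one count per employee_id.
per_employee = conn.execute("""
    SELECT employee_id, count(*) AS number_of_sales
    FROM Sales
    GROUP BY employee_id
    ORDER BY employee_id
""").fetchall()
print(per_employee)  # [(1, 3), (7, 1)]

conn.close()
```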

## Examples

### Count the number of sales each employee has made.

In [40]:
%%sql

SELECT employee_id, count(*) AS number_of_sales
FROM Sales
GROUP BY employee_id

employee_id,number_of_sales
1,566
6,593
7,575
8,626
9,611
11,567
12,659
13,614
14,678
16,591


### Find the minimum sale price of all sales.

In [38]:
%%sql

SELECT min(price) AS min_sale_price
FROM Sales

min_sale_price
730.0


### Find the maximum sale price of all sales.

In [39]:
%%sql

SELECT max(price) AS max_sale_price
FROM Sales

max_sale_price
222400.0


### Calculate the average salary of all employees.

In [41]:
%%sql 

SELECT AVG(salary) AS average_salary
FROM Employees

average_salary
91153.57142857143
