# Lecture 9-2

# Intro to SQL

## Week 9 Wednesday

## Miles Chen, PhD

# Do not try to set up your own SQL Server

There is a big difference between setting up a SQL server and learning how to write a few queries.

We will learn how to write a few queries.

We will not learn how to set up a server or design a database. If a job lists SQL as a requirement, they most likely need someone who can write queries. The person who sets up, designs, and maintains the server is usually a database administrator.

# Learning more

Recommended exercises: <https://www.w3resource.com/sql-exercises/>

Another place to just practice SQL queries: <http://sqlfiddle.com>

# sqlalchemy

SQLAlchemy lets us connect to and interact with databases from within Python.

Most of your SQL experience will involve connecting to a database that already exists. Most data analysts and data scientists are not database administrators, and this is not a database administration course.

You can download the Chinook SQLite database file from the Chinook database GitHub repository:

<https://github.com/lerocha/chinook-database/tree/master/ChinookDatabase/DataSources>

In [1]:
import pandas as pd

In [2]:
from sqlalchemy import create_engine

In [3]:
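# create_engine sets up an Engine that manages connections to the database;
# no connection is actually opened until we use it (e.g., engine.connect()).
# 'Chinook_Sqlite.sqlite' is downloaded into my working folder, and the
# sqlite:/// prefix plus a relative path points at that file.
# Caution: if the file is missing, SQLite silently creates a new, empty database.
engine = create_engine('sqlite:///Chinook_Sqlite.sqlite')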

In [4]:
from sqlalchemy import inspect
insp = inspect(engine) # creates an inspector

In [5]:
# Use the inspector to get table names
# Save the table names to a list: table_names
table_names = insp.get_table_names()

# Print the table names to the shell
print(table_names)

['Album', 'Artist', 'Customer', 'Employee', 'Genre', 'Invoice', 'InvoiceLine', 'MediaType', 'Playlist', 'PlaylistTrack', 'Track']


## Basics

SQL keywords and table and column names are generally not case sensitive. By convention, SQL keywords are written in ALL CAPS, while table and column names keep the case they have in the database. String values are different: in SQLite, `'Brazil' = 'brazil'` is false by default.

Semicolons are not required to terminate a single SQL query, but using them is recommended.
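A minimal sketch of the case rules above, using Python's built-in `sqlite3` module and a tiny in-memory table. The `Customer` rows are made up for illustration, not read from Chinook. Keywords and column names work in any case, but the string comparison in `WHERE` does not, at least under SQLite's default settings.

```python
import sqlite3

# Throwaway in-memory database with a tiny, made-up Customer table
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Customer (CustomerId INTEGER, Country TEXT)")
con.executemany("INSERT INTO Customer VALUES (?, ?)",
                [(1, "Brazil"), (2, "Germany"), (3, "Brazil")])

# Keywords and column/table names are case-insensitive, and the trailing
# semicolon is optional: these two queries are equivalent
upper = con.execute(
    "SELECT CustomerId FROM Customer WHERE Country = 'Brazil' ORDER BY CustomerId;"
).fetchall()
lower = con.execute(
    "select customerid from customer where country = 'Brazil' order by customerid"
).fetchall()

# String values, however, are compared case-sensitively by default
mismatch = con.execute(
    "SELECT CustomerId FROM Customer WHERE Country = 'brazil';"
).fetchall()

print(upper)     # [(1,), (3,)]
print(mismatch)  # []
con.close()
```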

### `SELECT`

`SELECT` chooses which columns to return from a table. To select all columns, use `*`.

### `FROM`

`FROM` specifies which table to select from.

Once we have created the database engine with `sqlalchemy`, we can execute SQL queries by opening a connection to the database. The query string is wrapped in `text()` so that SQLAlchemy treats it as a literal SQL statement.

In [6]:
from sqlalchemy import text

In [7]:
# Open engine connection
con = engine.connect()

# Perform query and store results in rs
# this will select all columns from the Album table
rs = con.execute(text('SELECT * FROM Album;'))

# Fetch all results of the query and save to DataFrame
df = pd.DataFrame(rs.fetchall())

# Close the connection to the engine
con.close()

# Print head of query results
print(df.head(10))
print(rs.keys())

   AlbumId                                  Title  ArtistId
0        1  For Those About To Rock We Salute You         1
1        2                      Balls to the Wall         2
2        3                      Restless and Wild         2
3        4                      Let There Be Rock         1
4        5                               Big Ones         3
5        6                     Jagged Little Pill         4
6        7                               Facelift         5
7        8                         Warner 25 Anos         6
8        9         Plays Metallica By Four Cellos         7
9       10                             Audioslave         8
RMKeyView(['AlbumId', 'Title', 'ArtistId'])


Instead of opening and closing the connection by hand, we can use Python's `with` statement: the connection is opened at the start of the block and closed automatically when the block ends, even if an error occurs.

In [8]:
# We can write our SQL command across multiple lines
# enclosed in triple quotes
command = '''
SELECT FirstName, LastName, Title 
FROM Employee;
'''

# SELECT chooses the desired columns
# FROM indicates the table to query

with engine.connect() as con:
    rs = con.execute(text(command))
    df = pd.DataFrame(rs.fetchall())
    df.columns = rs.keys()

print(df)

  FirstName  LastName                Title
0    Andrew     Adams      General Manager
1     Nancy   Edwards        Sales Manager
2      Jane   Peacock  Sales Support Agent
3  Margaret      Park  Sales Support Agent
4     Steve   Johnson  Sales Support Agent
5   Michael  Mitchell           IT Manager
6    Robert      King             IT Staff
7     Laura  Callahan             IT Staff
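The same open, use, close pattern with Python's built-in `sqlite3` module, as a sketch on a throwaway in-memory database (not the Chinook file). One difference worth knowing: `with sqlite3.connect(...) as con:` on its own only commits or rolls back the transaction and does not close the connection, so this sketch wraps the connection in `contextlib.closing`.

```python
import sqlite3
from contextlib import closing

# closing() guarantees con.close() runs when the block exits, even on error.
# (A bare "with sqlite3.connect(...) as con:" only commits or rolls back;
# it does not close the connection.)
with closing(sqlite3.connect(":memory:")) as con:
    con.execute("CREATE TABLE Employee (FirstName TEXT, LastName TEXT, Title TEXT)")
    con.execute("INSERT INTO Employee VALUES ('Andrew', 'Adams', 'General Manager')")
    rows = con.execute("SELECT FirstName, LastName, Title FROM Employee;").fetchall()

print(rows)  # [('Andrew', 'Adams', 'General Manager')]

# The connection is closed now, so using it raises an error
try:
    con.execute("SELECT 1")
    still_open = True
except sqlite3.ProgrammingError:
    still_open = False
print(still_open)  # False
```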


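pandas can also run a query and return the result as a DataFrame in one step with `pd.read_sql_query()`, using an existing engine or connection. The column names come along automatically.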

In [9]:
# we can use the same command as earlier
query = text(command)
# open a connection and keep it open: the cells below reuse `conn`
conn = engine.connect()
df = pd.read_sql_query(query, conn)
df

|   | FirstName | LastName | Title |
|---|---|---|---|
| 0 | Andrew | Adams | General Manager |
| 1 | Nancy | Edwards | Sales Manager |
| 2 | Jane | Peacock | Sales Support Agent |
| 3 | Margaret | Park | Sales Support Agent |
| 4 | Steve | Johnson | Sales Support Agent |
| 5 | Michael | Mitchell | IT Manager |
| 6 | Robert | King | IT Staff |
| 7 | Laura | Callahan | IT Staff |


## `ORDER BY`

`ORDER BY` is SQL's version of sorting (like `sort_values()` in pandas). Rows are sorted in ascending order unless you add `DESC`.

<https://www.w3schools.com/sql/sql_orderby.asp>

```
SELECT column1, column2, ...
FROM table_name
ORDER BY column1 [ASC|DESC], column2 [ASC|DESC], ...;
```
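Each column in `ORDER BY` gets its own direction, and later columns only break ties in earlier ones. A sketch with Python's built-in `sqlite3` module on four employee rows copied from the Employee output in this lecture (dates shortened to `YYYY-MM-DD`):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Employee (LastName TEXT, Title TEXT, BirthDate TEXT)")
con.executemany("INSERT INTO Employee VALUES (?, ?, ?)", [
    ("Peacock", "Sales Support Agent", "1973-08-29"),
    ("King", "IT Staff", "1970-05-29"),
    ("Park", "Sales Support Agent", "1947-09-19"),
    ("Callahan", "IT Staff", "1968-01-09"),
])

# Sort by Title A-Z first; within each Title, youngest (latest BirthDate) first
rows = con.execute("""
    SELECT LastName, Title, BirthDate
    FROM Employee
    ORDER BY Title ASC, BirthDate DESC;
""").fetchall()

for row in rows:
    print(row)
con.close()
```

The two IT Staff rows come first (King, then Callahan), followed by the Sales Support Agents (Peacock, then Park).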

In [10]:
command = '''
SELECT * 
FROM Employee 
ORDER BY BirthDate DESC;
'''
pd.read_sql_query(text(command), conn)

|   | EmployeeId | LastName | FirstName | Title | ReportsTo | BirthDate | HireDate | Address | City | State | Country | PostalCode | Phone | Fax | Email |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 | Peacock | Jane | Sales Support Agent | 2.0 | 1973-08-29 00:00:00 | 2002-04-01 00:00:00 | 1111 6 Ave SW | Calgary | AB | Canada | T2P 5M5 | +1 (403) 262-3443 | +1 (403) 262-6712 | jane@chinookcorp.com |
| 1 | 6 | Mitchell | Michael | IT Manager | 1.0 | 1973-07-01 00:00:00 | 2003-10-17 00:00:00 | 5827 Bowness Road NW | Calgary | AB | Canada | T3B 0C5 | +1 (403) 246-9887 | +1 (403) 246-9899 | michael@chinookcorp.com |
| 2 | 7 | King | Robert | IT Staff | 6.0 | 1970-05-29 00:00:00 | 2004-01-02 00:00:00 | 590 Columbia Boulevard West | Lethbridge | AB | Canada | T1K 5N8 | +1 (403) 456-9986 | +1 (403) 456-8485 | robert@chinookcorp.com |
| 3 | 8 | Callahan | Laura | IT Staff | 6.0 | 1968-01-09 00:00:00 | 2004-03-04 00:00:00 | 923 7 ST NW | Lethbridge | AB | Canada | T1H 1Y8 | +1 (403) 467-3351 | +1 (403) 467-8772 | laura@chinookcorp.com |
| 4 | 5 | Johnson | Steve | Sales Support Agent | 2.0 | 1965-03-03 00:00:00 | 2003-10-17 00:00:00 | 7727B 41 Ave | Calgary | AB | Canada | T3B 1Y7 | 1 (780) 836-9987 | 1 (780) 836-9543 | steve@chinookcorp.com |
| 5 | 1 | Adams | Andrew | General Manager |  | 1962-02-18 00:00:00 | 2002-08-14 00:00:00 | 11120 Jasper Ave NW | Edmonton | AB | Canada | T5K 2N1 | +1 (780) 428-9482 | +1 (780) 428-3457 | andrew@chinookcorp.com |
| 6 | 2 | Edwards | Nancy | Sales Manager | 1.0 | 1958-12-08 00:00:00 | 2002-05-01 00:00:00 | 825 8 Ave SW | Calgary | AB | Canada | T2P 2T3 | +1 (403) 262-3443 | +1 (403) 262-3322 | nancy@chinookcorp.com |
| 7 | 4 | Park | Margaret | Sales Support Agent | 2.0 | 1947-09-19 00:00:00 | 2003-05-03 00:00:00 | 683 10 Street SW | Calgary | AB | Canada | T2P 5G3 | +1 (403) 263-4423 | +1 (403) 263-4289 | margaret@chinookcorp.com |


## `WHERE`

Filter rows with `WHERE` (similar to filtering a DataFrame with a boolean mask).

SQL uses a single equals sign `=` for comparison, where Python uses `==`. Conditions can be combined with `AND`.
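To see the parallel with a boolean mask, this sketch runs a `WHERE` filter on five employee rows (copied from this lecture's output) and applies the same condition with a Python list comprehension. It uses Python's built-in `sqlite3` module rather than SQLAlchemy so it runs without the Chinook file.

```python
import sqlite3

employees = [
    (1, "Adams", "General Manager"),
    (2, "Edwards", "Sales Manager"),
    (6, "Mitchell", "IT Manager"),
    (7, "King", "IT Staff"),
    (8, "Callahan", "IT Staff"),
]
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Employee (EmployeeId INTEGER, LastName TEXT, Title TEXT)")
con.executemany("INSERT INTO Employee VALUES (?, ?, ?)", employees)

# SQL: a single = tests equality inside WHERE
sql_rows = con.execute("""
    SELECT EmployeeId, LastName, Title
    FROM Employee
    WHERE EmployeeId >= 6 AND Title = 'IT Staff'
    ORDER BY EmployeeId;
""").fetchall()

# Python: the same filter written with == and `and`
py_rows = [e for e in employees if e[0] >= 6 and e[2] == "IT Staff"]

print(sql_rows)             # [(7, 'King', 'IT Staff'), (8, 'Callahan', 'IT Staff')]
print(sql_rows == py_rows)  # True
con.close()
```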

In [11]:
command = '''
SELECT * 
FROM Employee 
WHERE EmployeeId >= 6 AND Title = 'IT Staff'
ORDER BY BirthDate;
'''
pd.read_sql_query(text(command), conn)

|   | EmployeeId | LastName | FirstName | Title | ReportsTo | BirthDate | HireDate | Address | City | State | Country | PostalCode | Phone | Fax | Email |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 8 | Callahan | Laura | IT Staff | 6 | 1968-01-09 00:00:00 | 2004-03-04 00:00:00 | 923 7 ST NW | Lethbridge | AB | Canada | T1H 1Y8 | +1 (403) 467-3351 | +1 (403) 467-8772 | laura@chinookcorp.com |
| 1 | 7 | King | Robert | IT Staff | 6 | 1970-05-29 00:00:00 | 2004-01-02 00:00:00 | 590 Columbia Boulevard West | Lethbridge | AB | Canada | T1K 5N8 | +1 (403) 456-9986 | +1 (403) 456-8485 | robert@chinookcorp.com |


## `JOIN` and `LIMIT`

We can combine data from multiple tables using a `JOIN`.

`LIMIT` acts like pandas' `head()`: it caps the number of rows returned.
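SQL does not promise any particular row order unless you ask for one, so `LIMIT` is usually paired with `ORDER BY`. Sorting in descending order and then limiting gives a "top n" query, similar to pandas' `nlargest()`. A sketch with Python's built-in `sqlite3` module, using the five invoice totals from this lecture's Invoice preview:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Invoice (InvoiceId INTEGER, Total REAL)")
con.executemany("INSERT INTO Invoice VALUES (?, ?)",
                [(1, 1.98), (2, 3.96), (3, 5.94), (4, 8.91), (5, 13.86)])

# LIMIT after an explicit sort: the first two invoices by id, like head(2)
first_two = con.execute(
    "SELECT InvoiceId FROM Invoice ORDER BY InvoiceId LIMIT 2;"
).fetchall()

# ORDER BY ... DESC plus LIMIT: the two largest totals, like nlargest(2)
top_two = con.execute(
    "SELECT InvoiceId, Total FROM Invoice ORDER BY Total DESC LIMIT 2;"
).fetchall()

print(first_two)  # [(1,), (2,)]
print(top_two)    # [(5, 13.86), (4, 8.91)]
con.close()
```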

In [12]:
command = '''
SELECT * 
FROM Album
LIMIT 10;
'''
pd.read_sql_query(text(command), conn)

|   | AlbumId | Title | ArtistId |
|---|---|---|---|
| 0 | 1 | For Those About To Rock We Salute You | 1 |
| 1 | 2 | Balls to the Wall | 2 |
| 2 | 3 | Restless and Wild | 2 |
| 3 | 4 | Let There Be Rock | 1 |
| 4 | 5 | Big Ones | 3 |
| 5 | 6 | Jagged Little Pill | 4 |
| 6 | 7 | Facelift | 5 |
| 7 | 8 | Warner 25 Anos | 6 |
| 8 | 9 | Plays Metallica By Four Cellos | 7 |
| 9 | 10 | Audioslave | 8 |


In [13]:
command = '''
SELECT * 
FROM Artist
LIMIT 10;
'''
pd.read_sql_query(text(command), conn)

|   | ArtistId | Name |
|---|---|---|
| 0 | 1 | AC/DC |
| 1 | 2 | Accept |
| 2 | 3 | Aerosmith |
| 3 | 4 | Alanis Morissette |
| 4 | 5 | Alice In Chains |
| 5 | 6 | Antônio Carlos Jobim |
| 6 | 7 | Apocalyptica |
| 7 | 8 | Audioslave |
| 8 | 9 | BackBeat |
| 9 | 10 | Billy Cobham |


`JOIN`

`INNER JOIN` is a specific type of join. It keeps only rows where the key exists in both tables. If one table is missing an entry that exists in the other table, the entry will not be returned. A plain `JOIN` with no qualifier is also an inner join.

When using a `JOIN`, specify the name of the table being joined and, after `ON`, the columns used to match the rows. Columns are referenced with dot notation: `TableName.ColumnName`.
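A sketch of how an inner join drops unmatched keys, using Python's built-in `sqlite3` module on a few Artist and Album rows copied from this lecture. In this toy subset Aerosmith (ArtistId 3) deliberately has no albums, so it disappears from the joined result.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Artist (ArtistId INTEGER, Name TEXT)")
con.execute("CREATE TABLE Album (AlbumId INTEGER, Title TEXT, ArtistId INTEGER)")
con.executemany("INSERT INTO Artist VALUES (?, ?)",
                [(1, "AC/DC"), (2, "Accept"), (3, "Aerosmith")])
# In this toy subset, Aerosmith (ArtistId 3) has no albums
con.executemany("INSERT INTO Album VALUES (?, ?, ?)", [
    (1, "For Those About To Rock We Salute You", 1),
    (2, "Balls to the Wall", 2),
    (3, "Restless and Wild", 2),
])

rows = con.execute("""
    SELECT Album.Title, Artist.Name
    FROM Album
    INNER JOIN Artist ON Album.ArtistId = Artist.ArtistId
    ORDER BY Album.AlbumId;
""").fetchall()

for row in rows:
    print(row)
names = {name for _, name in rows}
print("Aerosmith" in names)  # False: no matching Album row, so no output row
con.close()
```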

In [14]:
command = '''
SELECT * 
FROM Album
INNER JOIN Artist ON Album.ArtistId = Artist.ArtistId
LIMIT 10;
'''
pd.read_sql_query(text(command), conn)

|   | AlbumId | Title | ArtistId | ArtistId | Name |
|---|---|---|---|---|---|
| 0 | 1 | For Those About To Rock We Salute You | 1 | 1 | AC/DC |
| 1 | 2 | Balls to the Wall | 2 | 2 | Accept |
| 2 | 3 | Restless and Wild | 2 | 2 | Accept |
| 3 | 4 | Let There Be Rock | 1 | 1 | AC/DC |
| 4 | 5 | Big Ones | 3 | 3 | Aerosmith |
| 5 | 6 | Jagged Little Pill | 4 | 4 | Alanis Morissette |
| 6 | 7 | Facelift | 5 | 5 | Alice In Chains |
| 7 | 8 | Warner 25 Anos | 6 | 6 | Antônio Carlos Jobim |
| 8 | 9 | Plays Metallica By Four Cellos | 7 | 7 | Apocalyptica |
| 9 | 10 | Audioslave | 8 | 8 | Audioslave |


In [15]:
# you can rename columns using `AS`
command = '''
SELECT Title AS "Album Title", Name AS "Artist Name"
FROM Album
INNER JOIN Artist ON Album.ArtistId = Artist.ArtistId
LIMIT 10;
'''
pd.read_sql_query(text(command), conn)

|   | Album Title | Artist Name |
|---|---|---|
| 0 | For Those About To Rock We Salute You | AC/DC |
| 1 | Balls to the Wall | Accept |
| 2 | Restless and Wild | Accept |
| 3 | Let There Be Rock | AC/DC |
| 4 | Big Ones | Aerosmith |
| 5 | Jagged Little Pill | Alanis Morissette |
| 6 | Facelift | Alice In Chains |
| 7 | Warner 25 Anos | Antônio Carlos Jobim |
| 8 | Plays Metallica By Four Cellos | Apocalyptica |
| 9 | Audioslave | Audioslave |


`GROUP BY` splits the rows into groups so that a summary value can be calculated for each group (like `groupby()` in pandas).

`COUNT()` is one function that calculates a summary value. Other summary (aggregate) functions include `SUM()`, `AVG()`, `MIN()`, and `MAX()`.
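A sketch of several aggregates computed per group at once, using Python's built-in `sqlite3` module on the first five (AlbumId, ArtistId) pairs from this lecture's Album table. As in the next cell, aggregating `AlbumId` is only for illustration; the ids are not meaningful quantities.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Album (AlbumId INTEGER, ArtistId INTEGER)")
con.executemany("INSERT INTO Album VALUES (?, ?)",
                [(1, 1), (4, 1), (2, 2), (3, 2), (5, 3)])

# One output row per ArtistId; each aggregate runs over that group's rows
rows = con.execute("""
    SELECT ArtistId,
           COUNT(AlbumId) AS album_count,
           SUM(AlbumId)   AS id_sum,
           AVG(AlbumId)   AS id_avg,
           MIN(AlbumId)   AS id_min,
           MAX(AlbumId)   AS id_max
    FROM Album
    GROUP BY ArtistId
    ORDER BY ArtistId;
""").fetchall()

for row in rows:
    print(row)
# (1, 2, 5, 2.5, 1, 4)
# (2, 2, 5, 2.5, 2, 3)
# (3, 1, 5, 5.0, 5, 5)
con.close()
```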

In [16]:
command = '''
SELECT Artist.ArtistId, Name, COUNT(AlbumId) AS album_count,
  AVG(AlbumId) AS avg_id, SUM(AlbumId) AS sum
FROM Album
INNER JOIN Artist ON Album.ArtistId = Artist.ArtistId
GROUP BY Artist.ArtistId
LIMIT 10;
'''
pd.read_sql_query(text(command), conn)

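|   | ArtistId | Name | album_count | avg_id | sum |
|---|---|---|---|---|---|
| 0 | 1 | AC/DC | 2 | 2.5 | 5 |
| 1 | 2 | Accept | 2 | 2.5 | 5 |
| 2 | 3 | Aerosmith | 1 | 5.0 | 5 |
| 3 | 4 | Alanis Morissette | 1 | 6.0 | 6 |
| 4 | 5 | Alice In Chains | 1 | 7.0 | 7 |
| 5 | 6 | Antônio Carlos Jobim | 2 | 21.0 | 42 |
| 6 | 7 | Apocalyptica | 1 | 9.0 | 9 |
| 7 | 8 | Audioslave | 3 | 97.333333 | 292 |
| 8 | 9 | BackBeat | 1 | 12.0 | 12 |
| 9 | 10 | Billy Cobham | 1 | 13.0 | 13 |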


In [17]:
# for comparison with previous table
command = '''
SELECT * 
FROM Album
ORDER BY ArtistId
LIMIT 15;
'''

pd.read_sql_query(text(command), conn)

|   | AlbumId | Title | ArtistId |
|---|---|---|---|
| 0 | 1 | For Those About To Rock We Salute You | 1 |
| 1 | 4 | Let There Be Rock | 1 |
| 2 | 2 | Balls to the Wall | 2 |
| 3 | 3 | Restless and Wild | 2 |
| 4 | 5 | Big Ones | 3 |
| 5 | 6 | Jagged Little Pill | 4 |
| 6 | 7 | Facelift | 5 |
| 7 | 8 | Warner 25 Anos | 6 |
| 8 | 34 | Chill: Brazil (Disc 2) | 6 |
| 9 | 9 | Plays Metallica By Four Cellos | 7 |


In [18]:
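# Conditions on aggregated (grouped) values must go in HAVING;
# WHERE filters individual rows before they are grouped.
command = '''
SELECT Artist.ArtistId, Name, COUNT(AlbumId) AS album_count
FROM Album
INNER JOIN Artist ON Album.ArtistId = Artist.ArtistId
GROUP BY Artist.ArtistId
-- SQLite accepts the alias here; some databases require HAVING COUNT(AlbumId) > 8
HAVING album_count > 8;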
'''
pd.read_sql_query(text(command), conn)

|   | ArtistId | Name | album_count |
|---|---|---|---|
| 0 | 22 | Led Zeppelin | 14 |
| 1 | 50 | Metallica | 10 |
| 2 | 58 | Deep Purple | 11 |
| 3 | 90 | Iron Maiden | 21 |
| 4 | 150 | U2 | 10 |
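The order of operations matters: `WHERE` removes rows before grouping, and `HAVING` removes whole groups after the aggregates are computed. A sketch with Python's built-in `sqlite3` module on made-up (AlbumId, ArtistId) pairs:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Album (AlbumId INTEGER, ArtistId INTEGER)")
# Made-up data: artist 1 has 2 albums, artist 2 has 2, artist 3 has 3
con.executemany("INSERT INTO Album VALUES (?, ?)",
                [(1, 1), (4, 1), (2, 2), (3, 2), (5, 3), (6, 3), (7, 3)])

# HAVING only: group all rows, then keep artists with at least 2 albums
having_only = con.execute("""
    SELECT ArtistId, COUNT(*) AS album_count
    FROM Album
    GROUP BY ArtistId
    HAVING COUNT(*) >= 2
    ORDER BY ArtistId;
""").fetchall()

# WHERE + HAVING: drop AlbumId 1 *before* grouping, then filter the groups
where_then_having = con.execute("""
    SELECT ArtistId, COUNT(*) AS album_count
    FROM Album
    WHERE AlbumId > 1
    GROUP BY ArtistId
    HAVING COUNT(*) >= 2
    ORDER BY ArtistId;
""").fetchall()

print(having_only)        # [(1, 2), (2, 2), (3, 3)]
print(where_then_having)  # [(2, 2), (3, 3)]
con.close()
```

Artist 1 drops out of the second result because `WHERE` removed one of its two rows before `COUNT(*)` ran.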


In [19]:
# SELECT can also compute new columns from expressions
command = '''
SELECT ArtistId, ArtistId * 2 AS "magic number", Name
FROM Artist
LIMIT 10;
'''
pd.read_sql_query(text(command), conn)

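|   | ArtistId | magic number | Name |
|---|---|---|---|
| 0 | 1 | 2 | AC/DC |
| 1 | 2 | 4 | Accept |
| 2 | 3 | 6 | Aerosmith |
| 3 | 4 | 8 | Alanis Morissette |
| 4 | 5 | 10 | Alice In Chains |
| 5 | 6 | 12 | Antônio Carlos Jobim |
| 6 | 7 | 14 | Apocalyptica |
| 7 | 8 | 16 | Audioslave |
| 8 | 9 | 18 | BackBeat |
| 9 | 10 | 20 | Billy Cobham |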


## Table previews

In [20]:
command = '''
SELECT * 
FROM Album
LIMIT 5;
'''
pd.read_sql_query(text(command), conn)

|   | AlbumId | Title | ArtistId |
|---|---|---|---|
| 0 | 1 | For Those About To Rock We Salute You | 1 |
| 1 | 2 | Balls to the Wall | 2 |
| 2 | 3 | Restless and Wild | 2 |
| 3 | 4 | Let There Be Rock | 1 |
| 4 | 5 | Big Ones | 3 |


In [21]:
command = '''
SELECT * 
FROM Artist
LIMIT 5;
'''
pd.read_sql_query(text(command), conn)

|   | ArtistId | Name |
|---|---|---|
| 0 | 1 | AC/DC |
| 1 | 2 | Accept |
| 2 | 3 | Aerosmith |
| 3 | 4 | Alanis Morissette |
| 4 | 5 | Alice In Chains |


In [22]:
command = '''
SELECT * 
FROM Invoice
LIMIT 5;
'''
pd.read_sql_query(text(command), conn)

|   | InvoiceId | CustomerId | InvoiceDate | BillingAddress | BillingCity | BillingState | BillingCountry | BillingPostalCode | Total |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2 | 2009-01-01 00:00:00 | Theodor-Heuss-Straße 34 | Stuttgart |  | Germany | 70174 | 1.98 |
| 1 | 2 | 4 | 2009-01-02 00:00:00 | Ullevålsveien 14 | Oslo |  | Norway | 0171 | 3.96 |
| 2 | 3 | 8 | 2009-01-03 00:00:00 | Grétrystraat 63 | Brussels |  | Belgium | 1000 | 5.94 |
| 3 | 4 | 14 | 2009-01-06 00:00:00 | 8210 111 ST NW | Edmonton | AB | Canada | T6G 2C7 | 8.91 |
| 4 | 5 | 23 | 2009-01-11 00:00:00 | 69 Salem Street | Boston | MA | USA | 2113 | 13.86 |


In [23]:
command = '''
SELECT * 
FROM InvoiceLine
LIMIT 7;
'''
pd.read_sql_query(text(command), conn)

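|   | InvoiceLineId | InvoiceId | TrackId | UnitPrice | Quantity |
|---|---|---|---|---|---|
| 0 | 1 | 1 | 2 | 0.99 | 1 |
| 1 | 2 | 1 | 4 | 0.99 | 1 |
| 2 | 3 | 2 | 6 | 0.99 | 1 |
| 3 | 4 | 2 | 8 | 0.99 | 1 |
| 4 | 5 | 2 | 10 | 0.99 | 1 |
| 5 | 6 | 2 | 12 | 0.99 | 1 |
| 6 | 7 | 3 | 16 | 0.99 | 1 |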


In [24]:
command = '''
SELECT * 
FROM Track
LIMIT 6;
'''
pd.read_sql_query(text(command), conn)

|   | TrackId | Name | AlbumId | MediaTypeId | GenreId | Composer | Milliseconds | Bytes | UnitPrice |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | For Those About To Rock (We Salute You) | 1 | 1 | 1 | Angus Young, Malcolm Young, Brian Johnson | 343719 | 11170334 | 0.99 |
| 1 | 2 | Balls to the Wall | 2 | 2 | 1 |  | 342562 | 5510424 | 0.99 |
| 2 | 3 | Fast As a Shark | 3 | 2 | 1 | F. Baltes, S. Kaufman, U. Dirkscneider & W. Ho... | 230619 | 3990994 | 0.99 |
| 3 | 4 | Restless and Wild | 3 | 2 | 1 | F. Baltes, R.A. Smith-Diesel, S. Kaufman, U. D... | 252051 | 4331779 | 0.99 |
| 4 | 5 | Princess of the Dawn | 3 | 2 | 1 | Deaffy & R.A. Smith-Diesel | 375418 | 6290521 | 0.99 |
| 5 | 6 | Put The Finger On You | 1 | 1 | 1 | Angus Young, Malcolm Young, Brian Johnson | 205662 | 6713451 | 0.99 |


In [25]:
command = '''
SELECT * 
FROM Customer
LIMIT 5;
'''
pd.read_sql_query(text(command), conn)

|   | CustomerId | FirstName | LastName | Company | Address | City | State | Country | PostalCode | Phone | Fax | Email | SupportRepId |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Luís | Gonçalves | Embraer - Empresa Brasileira de Aeronáutica S.A. | Av. Brigadeiro Faria Lima, 2170 | São José dos Campos | SP | Brazil | 12227-000 | +55 (12) 3923-5555 | +55 (12) 3923-5566 | luisg@embraer.com.br | 3 |
| 1 | 2 | Leonie | Köhler |  | Theodor-Heuss-Straße 34 | Stuttgart |  | Germany | 70174 | +49 0711 2842222 |  | leonekohler@surfeu.de | 5 |
| 2 | 3 | François | Tremblay |  | 1498 rue Bélanger | Montréal | QC | Canada | H2G 1A7 | +1 (514) 721-4711 |  | ftremblay@gmail.com | 3 |
| 3 | 4 | Bjørn | Hansen |  | Ullevålsveien 14 | Oslo |  | Norway | 0171 | +47 22 44 22 22 |  | bjorn.hansen@yahoo.no | 4 |
| 4 | 5 | František | Wichterlová | JetBrains s.r.o. | Klanova 9/506 | Prague |  | Czech Republic | 14700 | +420 2 4172 5555 | +420 2 4172 5555 | frantisekw@jetbrains.com | 4 |


## Table aliases

Provide a query showing the Invoices of customers who are from Brazil. The resultant table should show the customer's full name, Invoice ID, Date of the invoice and billing country.

In [26]:
command = '''
SELECT c.FirstName, c.lastname, 
    i.invoiceid, i.invoicedate, i.billingcountry    -- selects the desired columns
FROM customer AS c                     -- give the table an alias so we don't have to type the full name
    JOIN invoice AS i
    ON c.customerid = i.customerid     -- this is how the tables are linked
WHERE c.country = 'Brazil'
LIMIT 20;                               -- limits how many rows we get back
'''
pd.read_sql_query(text(command), conn)

|   | FirstName | LastName | InvoiceId | InvoiceDate | BillingCountry |
|---|---|---|---|---|---|
| 0 | Luís | Gonçalves | 98 | 2010-03-11 00:00:00 | Brazil |
| 1 | Luís | Gonçalves | 121 | 2010-06-13 00:00:00 | Brazil |
| 2 | Luís | Gonçalves | 143 | 2010-09-15 00:00:00 | Brazil |
| 3 | Luís | Gonçalves | 195 | 2011-05-06 00:00:00 | Brazil |
| 4 | Luís | Gonçalves | 316 | 2012-10-27 00:00:00 | Brazil |
| 5 | Luís | Gonçalves | 327 | 2012-12-07 00:00:00 | Brazil |
| 6 | Luís | Gonçalves | 382 | 2013-08-07 00:00:00 | Brazil |
| 7 | Eduardo | Martins | 25 | 2009-04-09 00:00:00 | Brazil |
| 8 | Eduardo | Martins | 154 | 2010-11-14 00:00:00 | Brazil |
| 9 | Eduardo | Martins | 177 | 2011-02-16 00:00:00 | Brazil |


## `DISTINCT` 

Provide a query showing a unique list of billing countries from the Invoice table.

In [27]:
command = '''
SELECT DISTINCT billingcountry 
FROM invoice;
'''
pd.read_sql_query(text(command), conn)

|   | BillingCountry |
|---|---|
| 0 | Germany |
| 1 | Norway |
| 2 | Belgium |
| 3 | Canada |
| 4 | USA |
| 5 | France |
| 6 | Ireland |
| 7 | United Kingdom |
| 8 | Australia |
| 9 | Chile |
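`DISTINCT` removes duplicate rows from the result. Two related forms: `COUNT(DISTINCT column)` counts the unique values, and `GROUP BY` on the same column returns the same unique list. A sketch with Python's built-in `sqlite3` module on made-up billing countries:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Invoice (InvoiceId INTEGER, BillingCountry TEXT)")
con.executemany("INSERT INTO Invoice VALUES (?, ?)", [
    (1, "Germany"), (2, "Norway"), (3, "Germany"),
    (4, "Brazil"), (5, "Brazil"), (6, "Brazil"),
])

# Unique countries, via DISTINCT and via GROUP BY
distinct_rows = con.execute(
    "SELECT DISTINCT BillingCountry FROM Invoice ORDER BY BillingCountry;"
).fetchall()
grouped_rows = con.execute(
    "SELECT BillingCountry FROM Invoice GROUP BY BillingCountry ORDER BY BillingCountry;"
).fetchall()

# How many unique countries there are
(n_countries,) = con.execute(
    "SELECT COUNT(DISTINCT BillingCountry) FROM Invoice;"
).fetchone()

print(distinct_rows)  # [('Brazil',), ('Germany',), ('Norway',)]
print(n_countries)    # 3
con.close()
```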


## Joining three tables

The following query shows the invoices associated with each sales agent.

The Invoice table has no information about employees. However, each invoice belongs to a customer, and each customer has a support rep (an employee, stored in `Customer.SupportRepId`). So we connect the Invoice table to the Employee table by going through the Customer table.
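A sketch of that invoice to customer to employee chain, counting invoices per support rep. It uses Python's built-in `sqlite3` module and made-up rows that keep only the key columns (the employee ids and last names match Chinook, the customer and invoice rows do not).

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Employee (EmployeeId INTEGER, LastName TEXT);
    CREATE TABLE Customer (CustomerId INTEGER, SupportRepId INTEGER);
    CREATE TABLE Invoice  (InvoiceId INTEGER, CustomerId INTEGER);
    INSERT INTO Employee VALUES (3, 'Peacock'), (4, 'Park');
    INSERT INTO Customer VALUES (1, 3), (2, 4), (3, 3);
    INSERT INTO Invoice  VALUES (1, 1), (2, 1), (3, 2), (4, 3);
""")

rows = con.execute("""
    SELECT e.LastName, COUNT(i.InvoiceId) AS n_invoices
    FROM Invoice AS i
        JOIN Customer AS c ON c.CustomerId = i.CustomerId     -- invoice -> customer
        JOIN Employee AS e ON e.EmployeeId = c.SupportRepId   -- customer -> support rep
    GROUP BY e.EmployeeId, e.LastName
    ORDER BY e.EmployeeId;
""").fetchall()

print(rows)  # [('Peacock', 3), ('Park', 1)]
con.close()
```

Peacock supports customers 1 and 3, who together have invoices 1, 2, and 4; Park supports customer 2, who has invoice 3.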

In [28]:
command = '''
SELECT e.firstname, e.lastname,   -- employee first and last name
       i.*   -- all columns from invoice table 

FROM invoice AS i
    JOIN customer AS c
    ON c.customerid = i.customerid

    JOIN employee AS e
    ON e.employeeid = c.supportrepid
    
ORDER BY e.employeeid;
'''
pd.read_sql_query(text(command), conn)

|   | FirstName | LastName | InvoiceId | CustomerId | InvoiceDate | BillingAddress | BillingCity | BillingState | BillingCountry | BillingPostalCode | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Jane | Peacock | 6 | 37 | 2009-01-19 00:00:00 | Berger Straße 10 | Frankfurt |  | Germany | 60316 | 0.99 |
| 1 | Jane | Peacock | 7 | 38 | 2009-02-01 00:00:00 | Barbarossastraße 19 | Berlin |  | Germany | 10779 | 1.98 |
| 2 | Jane | Peacock | 9 | 42 | 2009-02-02 00:00:00 | 9, Place Louis Barthou | Bordeaux |  | France | 33000 | 3.96 |
| 3 | Jane | Peacock | 10 | 46 | 2009-02-03 00:00:00 | 3 Chatham Street | Dublin | Dublin | Ireland |  | 5.94 |
| 4 | Jane | Peacock | 11 | 52 | 2009-02-06 00:00:00 | 202 Hoxton Street | London |  | United Kingdom | N1 5LH | 8.91 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 407 | Steve | Johnson | 398 | 41 | 2013-10-21 00:00:00 | 11, Place Bellecour | Lyon |  | France | 69002 | 0.99 |
| 408 | Steve | Johnson | 402 | 50 | 2013-11-05 00:00:00 | C/ San Bernardo 85 | Madrid |  | Spain | 28015 | 5.94 |
| 409 | Steve | Johnson | 404 | 6 | 2013-11-13 00:00:00 | Rilská 3174/6 | Prague |  | Czech Republic | 14300 | 25.86 |
| 410 | Steve | Johnson | 406 | 21 | 2013-12-04 00:00:00 | 801 W 4th Street | Reno | NV | USA | 89503 | 1.98 |


The following query shows the invoice total, customer name, country, and sales agent name for every invoice.

In [29]:
command = '''
SELECT i.InvoiceId, i.total,
       e.firstname AS 'employee first', 
       e.lastname AS 'employee last', 
       c.firstname AS 'customer first', 
       c.lastname AS 'customer last', 
       c.country
FROM employee AS e
        JOIN customer AS c 
        ON e.employeeid = c.supportrepid
        JOIN invoice AS i 
        ON c.customerid = i.customerid;
'''
pd.read_sql_query(text(command), conn)

|   | InvoiceId | Total | employee first | employee last | customer first | customer last | Country |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 1.98 | Steve | Johnson | Leonie | Köhler | Germany |
| 1 | 2 | 3.96 | Margaret | Park | Bjørn | Hansen | Norway |
| 2 | 3 | 5.94 | Margaret | Park | Daan | Peeters | Belgium |
| 3 | 4 | 8.91 | Steve | Johnson | Mark | Philips | Canada |
| 4 | 5 | 13.86 | Margaret | Park | John | Gordon | USA |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 407 | 408 | 3.96 | Steve | Johnson | Victor | Stevens | USA |
| 408 | 409 | 5.94 | Jane | Peacock | Robert | Brown | Canada |
| 409 | 410 | 8.91 | Margaret | Park | Madalena | Sampaio | Portugal |
| 410 | 411 | 13.86 | Jane | Peacock | Terhi | Hämäläinen | Finland |


How many invoices were there in 2011? What were the total sales for that year?

In [30]:
command = '''
SELECT invoiceId, InvoiceDate, total
FROM invoice as i
WHERE i.invoicedate BETWEEN datetime('2011-01-01') AND datetime('2011-12-31');
'''
pd.read_sql_query(text(command), conn)  # result has 83 rows

|   | InvoiceId | InvoiceDate | Total |
|---|---|---|---|
| 0 | 167 | 2011-01-02 00:00:00 | 0.99 |
| 1 | 168 | 2011-01-15 00:00:00 | 1.98 |
| 2 | 169 | 2011-01-15 00:00:00 | 1.98 |
| 3 | 170 | 2011-01-16 00:00:00 | 3.96 |
| 4 | 171 | 2011-01-17 00:00:00 | 5.94 |
| ... | ... | ... | ... |
| 78 | 245 | 2011-12-22 00:00:00 | 1.98 |
| 79 | 246 | 2011-12-22 00:00:00 | 1.98 |
| 80 | 247 | 2011-12-23 00:00:00 | 3.96 |
| 81 | 248 | 2011-12-24 00:00:00 | 5.94 |


In [31]:
command = '''
SELECT count(i.invoiceId) as 'count',
    sum(i.total) as 'sum'
FROM invoice as i
WHERE i.invoicedate BETWEEN datetime('2011-01-01') AND datetime('2011-12-31');
'''
pd.read_sql_query(text(command), conn)

|   | count | sum |
|---|---|---|
| 0 | 83 | 469.58 |
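The `BETWEEN datetime('2011-01-01') AND datetime('2011-12-31')` filter in the two cells above works for Chinook because every `InvoiceDate` shown is at midnight. Note that `datetime('2011-12-31')` is `'2011-12-31 00:00:00'`, so an invoice later on December 31 would be silently dropped. A half-open range (`>=` the start, `<` the next start) or `strftime('%Y', ...)` avoids that. The sketch below uses Python's built-in `sqlite3` module and made-up invoices, including a hypothetical 3:30 pm invoice on 2011-12-31:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Invoice (InvoiceId INTEGER, InvoiceDate TEXT, Total REAL)")
con.executemany("INSERT INTO Invoice VALUES (?, ?, ?)", [
    (1, "2010-12-31 00:00:00", 1.0),
    (2, "2011-01-02 00:00:00", 2.0),
    (3, "2011-12-31 00:00:00", 3.0),
    (4, "2011-12-31 15:30:00", 4.0),  # hypothetical afternoon invoice
    (5, "2012-01-01 00:00:00", 5.0),
])

def ids(where_clause):
    """Return the InvoiceIds matching a WHERE clause, in id order.
    (Building SQL with an f-string is fine here only because the
    clauses are fixed strings, not user input.)"""
    sql = f"SELECT InvoiceId FROM Invoice WHERE {where_clause} ORDER BY InvoiceId"
    return [row[0] for row in con.execute(sql)]

between_ids = ids("InvoiceDate BETWEEN datetime('2011-01-01') AND datetime('2011-12-31')")
half_open_ids = ids("InvoiceDate >= '2011-01-01' AND InvoiceDate < '2012-01-01'")
year_ids = ids("strftime('%Y', InvoiceDate) = '2011'")

print(between_ids)    # [2, 3]  (misses invoice 4)
print(half_open_ids)  # [2, 3, 4]
print(year_ids)       # [2, 3, 4]
con.close()
```

The comparisons work because the dates are stored as `YYYY-MM-DD HH:MM:SS` text, which sorts in chronological order.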


Count how many invoices were issued on each day of 2011.

In [32]:
command = '''
SELECT i.InvoiceDate, count(i.invoiceId) as 'count'
FROM invoice as i
WHERE i.invoicedate BETWEEN datetime('2011-01-01') AND datetime('2011-12-31')
GROUP BY i.invoiceDate;
'''
pd.read_sql_query(text(command), conn)

|   | InvoiceDate | count |
|---|---|---|
| 0 | 2011-01-02 00:00:00 | 1 |
| 1 | 2011-01-15 00:00:00 | 2 |
| 2 | 2011-01-16 00:00:00 | 1 |
| 3 | 2011-01-17 00:00:00 | 1 |
| 4 | 2011-01-20 00:00:00 | 1 |
| ... | ... | ... |
| 66 | 2011-12-09 00:00:00 | 1 |
| 67 | 2011-12-22 00:00:00 | 2 |
| 68 | 2011-12-23 00:00:00 | 1 |
| 69 | 2011-12-24 00:00:00 | 1 |


Looking at the InvoiceLine table, provide a query that COUNTs the number of line items for each Invoice.

In [33]:
command = '''
SELECT *
FROM invoiceline
LIMIT 10;
'''
pd.read_sql_query(text(command), conn)

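|   | InvoiceLineId | InvoiceId | TrackId | UnitPrice | Quantity |
|---|---|---|---|---|---|
| 0 | 1 | 1 | 2 | 0.99 | 1 |
| 1 | 2 | 1 | 4 | 0.99 | 1 |
| 2 | 3 | 2 | 6 | 0.99 | 1 |
| 3 | 4 | 2 | 8 | 0.99 | 1 |
| 4 | 5 | 2 | 10 | 0.99 | 1 |
| 5 | 6 | 2 | 12 | 0.99 | 1 |
| 6 | 7 | 3 | 16 | 0.99 | 1 |
| 7 | 8 | 3 | 20 | 0.99 | 1 |
| 8 | 9 | 3 | 24 | 0.99 | 1 |
| 9 | 10 | 3 | 28 | 0.99 | 1 |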


In [34]:
command = '''
SELECT invoiceid, count(invoicelineid) AS 'Count'
FROM invoiceline
GROUP BY invoiceid
ORDER BY Count DESC;
'''
pd.read_sql_query(text(command), conn)

|   | InvoiceId | Count |
|---|---|---|
| 0 | 5 | 14 |
| 1 | 12 | 14 |
| 2 | 19 | 14 |
| 3 | 26 | 14 |
| 4 | 33 | 14 |
| ... | ... | ... |
| 407 | 384 | 1 |
| 408 | 391 | 1 |
| 409 | 398 | 1 |
| 410 | 405 | 1 |


Find the invoice with the maximum number of invoice lines, as elegantly as possible.

A CTE (common table expression), written with `WITH`, lets you define a named intermediate table and then query it later in the same statement.
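The two cells below use the CTE for summary statistics. To return the invoice itself, one option is to compare each count against the maximum taken from the same CTE; this also returns every invoice when several tie (in Chinook, many invoices have 14 lines). A sketch with Python's built-in `sqlite3` module on made-up invoice lines:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE InvoiceLine (InvoiceLineId INTEGER, InvoiceId INTEGER)")
# Made-up data: invoice 1 has 2 lines, invoices 2 and 3 have 4, invoice 4 has 1
con.executemany("INSERT INTO InvoiceLine VALUES (?, ?)", [
    (1, 1), (2, 1),
    (3, 2), (4, 2), (5, 2), (6, 2),
    (7, 3), (8, 3), (9, 3), (10, 3),
    (11, 4),
])

rows = con.execute("""
    WITH InvoiceCounts AS (          -- one row per invoice with its line count
        SELECT InvoiceId, COUNT(InvoiceLineId) AS n_lines
        FROM InvoiceLine
        GROUP BY InvoiceId
    )
    SELECT InvoiceId, n_lines
    FROM InvoiceCounts
    WHERE n_lines = (SELECT MAX(n_lines) FROM InvoiceCounts)   -- keep ties
    ORDER BY InvoiceId;
""").fetchall()

print(rows)  # [(2, 4), (3, 4)]
con.close()
```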

In [35]:
command = '''
WITH InvoiceCounts (id, count) 
AS 
(  -- an intermediate table that aggregates the invoicelineIDs 
   -- pretty much the exact same table we generated in previous step
    SELECT invoiceid, count(invoicelineid) AS 'Count'
    FROM invoiceline
    GROUP BY invoiceid
    ORDER BY Count DESC
)

SELECT MAX(count) as Max, MIN(count) as Min
FROM InvoiceCounts;
'''
pd.read_sql_query(text(command), conn)

|   | Max | Min |
|---|---|---|
| 0 | 14 | 1 |


In [36]:
command = '''
WITH InvoiceCounts (id, count) 
AS 
(  -- an intermediate table that aggregates the invoicelineIDs 
   -- pretty much the exact same table we generated in previous step
    SELECT invoiceid, count(invoicelineid) AS 'Count'
    FROM invoiceline
    GROUP BY invoiceid
    ORDER BY Count DESC
)

SELECT count, COUNT(id) as "HowMany"
FROM InvoiceCounts
GROUP BY count;
'''
pd.read_sql_query(text(command), conn)

|   | count | HowMany |
|---|---|---|
| 0 | 1 | 59 |
| 1 | 2 | 117 |
| 2 | 4 | 59 |
| 3 | 6 | 59 |
| 4 | 9 | 59 |
| 5 | 14 | 59 |


Provide a query that includes the purchased track name AND artist name with each invoice line item.


In [37]:
command = '''
SELECT i.*, 
    t.name AS 'track', 
    ar.name AS 'artist'
FROM invoiceline AS i
        JOIN track AS t 
            ON i.trackid = t.trackid     -- i links to t
        JOIN album AS al 
            ON t.albumid = al.albumid    -- t links to al
        JOIN artist AS ar 
            ON al.artistid = ar.artistid;  -- al links to ar
'''
pd.read_sql_query(text(command), conn)

|   | InvoiceLineId | InvoiceId | TrackId | UnitPrice | Quantity | track | artist |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 2 | 0.99 | 1 | Balls to the Wall | Accept |
| 1 | 2 | 1 | 4 | 0.99 | 1 | Restless and Wild | Accept |
| 2 | 3 | 2 | 6 | 0.99 | 1 | Put The Finger On You | AC/DC |
| 3 | 4 | 2 | 8 | 0.99 | 1 | Inject The Venom | AC/DC |
| 4 | 5 | 2 | 10 | 0.99 | 1 | Evil Walks | AC/DC |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 2235 | 2236 | 411 | 3136 | 0.99 | 1 | Looking For Love | Lenny Kravitz |
| 2236 | 2237 | 411 | 3145 | 0.99 | 1 | Sweet Lady Luck | Lenny Kravitz |
| 2237 | 2238 | 411 | 3154 | 0.99 | 1 | Feirinha da Pavuna/Luz do Repente/Bagaço da La... | Zeca Pagodinho |
| 2238 | 2239 | 411 | 3163 | 0.99 | 1 | Samba pras moças | Zeca Pagodinho |


In [38]:
# Look up the differences between LEFT and RIGHT joins:
# https://blog.codinghorror.com/a-visual-explanation-of-sql-joins/

# Suppose one table lists products and another lists sales, and
# productid links the two (hypothetical tables, not part of Chinook).
# How do we find all the products that do not appear in the sales table?
command = '''
SELECT p.*
FROM product_table AS p
    LEFT OUTER JOIN sales_table AS s
    ON p.productid = s.productid
WHERE s.productid IS NULL;
'''
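A runnable version of that products/sales question, as a sketch with Python's built-in `sqlite3` module. The tables `product_table` and `sales_table` and all their rows are hypothetical. `LEFT OUTER JOIN` keeps every product, fills the sales columns with `NULL` when there is no match, and `WHERE s.productid IS NULL` keeps only those unmatched rows. The same "anti-join" can also be written with `NOT EXISTS`. A `RIGHT JOIN` is the mirror image (`A RIGHT JOIN B` keeps every row of `B`); SQLite only added `RIGHT JOIN` in version 3.39.0, so on older installations swap the table order and use `LEFT JOIN`.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical tables matching the question above
con.execute("CREATE TABLE product_table (productid INTEGER, name TEXT)")
con.execute("CREATE TABLE sales_table (saleid INTEGER, productid INTEGER)")
con.executemany("INSERT INTO product_table VALUES (?, ?)",
                [(1, "widget"), (2, "gadget"), (3, "gizmo"), (4, "doohickey")])
con.executemany("INSERT INTO sales_table VALUES (?, ?)",
                [(101, 1), (102, 1), (103, 3)])

# LEFT OUTER JOIN keeps every product; unsold products get NULL sales columns
left_join_rows = con.execute("""
    SELECT p.productid, p.name
    FROM product_table AS p
        LEFT OUTER JOIN sales_table AS s
        ON p.productid = s.productid
    WHERE s.productid IS NULL
    ORDER BY p.productid;
""").fetchall()

# The same anti-join written with NOT EXISTS
not_exists_rows = con.execute("""
    SELECT p.productid, p.name
    FROM product_table AS p
    WHERE NOT EXISTS (
        SELECT 1 FROM sales_table AS s WHERE s.productid = p.productid
    )
    ORDER BY p.productid;
""").fetchall()

print(left_join_rows)                     # [(2, 'gadget'), (4, 'doohickey')]
print(left_join_rows == not_exists_rows)  # True
con.close()
```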