# SQL querying and selecting data

## Preparation

For this section you need the `chinook.db` database file and a working `%sql` magic.  
If you don't have them, go back to the [previous section](connect_to_database.ipynb) and follow the instructions.  
The following code should run without errors:

In [22]:
%load_ext sql
%sql sqlite:///chinook.db

The sql extension is already loaded. To reload it, use:
  %reload_ext sql


## `SELECT` - querying the database

### Selecting some/all columns and their order

Write a first query that retrieves some data for the first seven customers in this database:
- after `SELECT`, write the name(s) of the column(s) that you want to retrieve
- after `FROM`, write the name of the table
- the optional `LIMIT` clause specifies how many rows to return

In [26]:
%%sql
SELECT FirstName, LastName
  FROM customers 
  LIMIT 7

FirstName,LastName
Luís,Gonçalves
Leonie,Köhler
François,Tremblay
Bjørn,Hansen
František,Wichterlová
Helena,Holý
Astrid,Gruber


We can retrieve all columns from the table using `*` instead of the column names:

In [27]:
%%sql
SELECT * 
  FROM customers 
  LIMIT 5

CustomerId,FirstName,LastName,Company,Address,City,State,Country,PostalCode,Phone,Fax,Email,SupportRepId
1,Luís,Gonçalves,Embraer - Empresa Brasileira de Aeronáutica S.A.,"Av. Brigadeiro Faria Lima, 2170",São José dos Campos,SP,Brazil,12227-000,+55 (12) 3923-5555,+55 (12) 3923-5566,luisg@embraer.com.br,3
2,Leonie,Köhler,,Theodor-Heuss-Straße 34,Stuttgart,,Germany,70174,+49 0711 2842222,,leonekohler@surfeu.de,5
3,François,Tremblay,,1498 rue Bélanger,Montréal,QC,Canada,H2G 1A7,+1 (514) 721-4711,,ftremblay@gmail.com,3
4,Bjørn,Hansen,,Ullevålsveien 14,Oslo,,Norway,0171,+47 22 44 22 22,,bjorn.hansen@yahoo.no,4
5,František,Wichterlová,JetBrains s.r.o.,Klanova 9/506,Prague,,Czech Republic,14700,+420 2 4172 5555,+420 2 4172 5555,frantisekw@jetbrains.com,4


The order of the column names in the query defines the order of the columns in the result. You can also perform arithmetic operations directly on columns:

In [33]:
%%sql
SELECT Name, TrackId, UnitPrice + 10
  FROM tracks
  LIMIT 5

Name,TrackId,UnitPrice + 10
For Those About To Rock (We Salute You),1,10.99
Balls to the Wall,2,10.99
Fast As a Shark,3,10.99
Restless and Wild,4,10.99
Princess of the Dawn,5,10.99


### `LIMIT` - limiting the number of returned rows

A simple limit on the number of rows returned by a query:

In [34]:
%%sql
SELECT TrackId, Name
  FROM tracks
  LIMIT 3

TrackId,Name
1,For Those About To Rock (We Salute You)
2,Balls to the Wall
3,Fast As a Shark


*Note:* to skip rows before returning any, add the `OFFSET` keyword. The following query skips the first 10 rows of the result set and returns the next 3:

In [35]:
%%sql
SELECT TrackId, Name
  FROM tracks
  LIMIT 3 OFFSET 10

TrackId,Name
11,C.O.D.
12,Breaking The Rules
13,Night Of The Long Knives
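
A common use of `LIMIT` with `OFFSET` is paging through a result set. The following is a minimal sketch using Python's built-in `sqlite3` module and a small in-memory stand-in for `tracks`; the helper `fetch_page` and the 15 generated rows are illustrative assumptions, not part of `chinook.db`. An `ORDER BY` is added so that the page boundaries are deterministic.

```python
import sqlite3

# Hypothetical stand-in for the tracks table: 15 rows with ids 1..15.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tracks (TrackId INTEGER PRIMARY KEY, Name TEXT)")
conn.executemany(
    "INSERT INTO tracks (TrackId, Name) VALUES (?, ?)",
    [(i, f"Track {i}") for i in range(1, 16)],
)


def fetch_page(page, page_size):
    """Return the track ids on one page; pages are numbered from 0."""
    rows = conn.execute(
        "SELECT TrackId FROM tracks ORDER BY TrackId LIMIT ? OFFSET ?",
        (page_size, page * page_size),
    ).fetchall()
    return [track_id for (track_id,) in rows]


first_page = fetch_page(0, 3)
# Same shape as the query above: skip 10 rows, return the next 3.
offset_rows = [
    row[0]
    for row in conn.execute(
        "SELECT TrackId FROM tracks ORDER BY TrackId LIMIT 3 OFFSET 10"
    )
]

print(first_page)   # [1, 2, 3]
print(offset_rows)  # [11, 12, 13]
```

An offset beyond the last row simply returns an empty result rather than an error.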


### `AS` - renaming columns

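To give a column your own name, use the `AS` keyword followed by the new name. The new name is quoted here, which SQLite accepts; a simple name such as `NewUnitPrice` also works without quotes: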

In [36]:
%%sql
SELECT TrackId, Name, UnitPrice, UnitPrice + 10 AS 'NewUnitPrice'
  FROM tracks
  LIMIT 5

TrackId,Name,UnitPrice,NewUnitPrice
1,For Those About To Rock (We Salute You),0.99,10.99
2,Balls to the Wall,0.99,10.99
3,Fast As a Shark,0.99,10.99
4,Restless and Wild,0.99,10.99
5,Princess of the Dawn,0.99,10.99


### `ORDER BY` - sorting rows

With `ORDER BY` you define the sorting order. Additional keywords:
- The `ASC` keyword means ascending (the default when you specify neither keyword).
- The `DESC` keyword means descending.

In [37]:
%%sql
SELECT Name, Milliseconds, AlbumId
  FROM tracks
  ORDER BY AlbumId DESC
  LIMIT 10

Name,Milliseconds,AlbumId
Koyaanisqatsi,206005,347
"Quintet for Horn, Violin, 2 Violas, and Cello in E Flat Major, K. 407/386c: III. Allegro",221331,346
"L'orfeo, Act 3, Sinfonia (Orchestra)",66639,345
"String Quartet No. 12 in C Minor, D. 703 ""Quartettsatz"": II. Andante - Allegro assai",139200,344
Pini Di Roma (Pinien Von Rom) \ I Pini Della Via Appia,286741,343
"Concerto for Violin, Strings and Continuo in G Major, Op. 3, No. 9: I. Allegro",493573,342
"Erlkonig, D.328",261849,341
"Étude 1, In C Major - Preludio (Presto) - Liszt",51780,340
"24 Caprices, Op. 1, No. 24, for Solo Violin, in A Minor",265541,339
"Symphony No. 2, Op. 16 - ""The Four Temperaments"": II. Allegro Comodo e Flemmatico",286998,338


### `DISTINCT` - select unique rows (remove duplicated rows)

With `DISTINCT` you force duplicate rows to be removed from the query result. Compare the following two queries:

In [38]:
%%sql 
SELECT City
  FROM customers
  LIMIT 10

City
São José dos Campos
Stuttgart
Montréal
Oslo
Prague
Prague
Vienne
Brussels
Copenhagen
São Paulo


In [39]:
%%sql 
SELECT DISTINCT City
  FROM customers
  LIMIT 10

City
São José dos Campos
Stuttgart
Montréal
Oslo
Prague
Vienne
Brussels
Copenhagen
São Paulo
Rio de Janeiro
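
To see the effect of `DISTINCT` without the `LIMIT` getting in the way, here is a small sketch with Python's built-in `sqlite3` module. The four-row `customers` table is a hypothetical miniature, not the real one. It also illustrates that `DISTINCT` compares whole result rows, so adding a column to the `SELECT` list can bring "duplicates" back.

```python
import sqlite3

# Hypothetical miniature of the customers table (two customers share a city).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (FirstName TEXT, City TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [
        ("František", "Prague"),
        ("Helena", "Prague"),
        ("Astrid", "Vienne"),
        ("Leonie", "Stuttgart"),
    ],
)

all_cities = [r[0] for r in conn.execute(
    "SELECT City FROM customers ORDER BY City")]
unique_cities = [r[0] for r in conn.execute(
    "SELECT DISTINCT City FROM customers ORDER BY City")]
# DISTINCT looks at the combination of all selected columns.
unique_pairs = conn.execute(
    "SELECT DISTINCT FirstName, City FROM customers").fetchall()

print(all_cities)         # ['Prague', 'Prague', 'Stuttgart', 'Vienne']
print(unique_cities)      # ['Prague', 'Stuttgart', 'Vienne']
print(len(unique_pairs))  # 4
```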


### `WHERE` - selecting rows by a condition

#### Relational operators

Let's select all tracks for which `Milliseconds > 300000`:

In [40]:
%%sql
SELECT TrackId, Milliseconds
  FROM tracks
  WHERE Milliseconds > 300000
  LIMIT 5

TrackId,Milliseconds
1,343719
2,342562
5,375418
15,331180
17,366654


SQL uses the following relational operators: `>`, `>=`, `<`, `<=`, `=` (equality), `!=` or `<>` (both inequality).  
Let's find customers from Prague:

In [12]:
%%sql
SELECT FirstName, LastName, City 
  FROM customers
  WHERE City = 'Prague'

FirstName,LastName,City
František,Wichterlová,Prague
Helena,Holý,Prague


#### `OR`, `AND`, `NOT` - Logical operators

Study the following examples of `OR`, `AND`, and `NOT`:

In [41]:
%%sql
SELECT FirstName, Country
  FROM customers 
  WHERE Country = 'Netherlands' OR Country = 'Germany'
  LIMIT 5

FirstName,Country
Leonie,Germany
Hannah,Germany
Fynn,Germany
Niklas,Germany
Johannes,Netherlands


In [42]:
%%sql
SELECT FirstName, Country
  FROM customers 
  WHERE NOT (Country = 'Netherlands' OR Country = 'Germany')
  LIMIT 5

FirstName,Country
Luís,Brazil
François,Canada
Bjørn,Norway
František,Czech Republic
Helena,Czech Republic


In [43]:
%%sql
SELECT *
  FROM invoice_items
  WHERE InvoiceId = 26 AND TrackId > 850

InvoiceLineId,InvoiceId,TrackId,UnitPrice,Quantity
143,26,858,0.99,1
144,26,867,0.99,1
145,26,876,0.99,1
146,26,885,0.99,1
147,26,894,0.99,1
148,26,903,0.99,1
149,26,912,0.99,1
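
A negated `OR` can always be rewritten as an `AND` of negated conditions (De Morgan's law). The sketch below checks this on a hypothetical four-row `customers` table using Python's built-in `sqlite3` module; the helper `first_names` and the sample rows are assumptions made for illustration only.

```python
import sqlite3

# Hypothetical sample rows, not taken from chinook.db.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (FirstName TEXT, Country TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [
        ("Ann", "Germany"),
        ("Bob", "Netherlands"),
        ("Cid", "Brazil"),
        ("Dee", "Norway"),
    ],
)


def first_names(where_clause):
    """Run a SELECT with the given WHERE clause and return the first names."""
    sql = f"SELECT FirstName FROM customers WHERE {where_clause} ORDER BY FirstName"
    return [name for (name,) in conn.execute(sql)]


negated_or = first_names("NOT (Country = 'Netherlands' OR Country = 'Germany')")
both_unequal = first_names("Country <> 'Netherlands' AND Country <> 'Germany'")

print(negated_or)    # ['Cid', 'Dee']
print(both_unequal)  # ['Cid', 'Dee']
```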


#### `IS NULL` - Value is missing

A missing value is stored as `NULL`. To find the tracks whose composer is missing, test the column with `IS NULL`.  

To find the tracks whose composer is not missing, use `IS NOT NULL`.

In [44]:
%%sql
SELECT Name, Composer
  FROM tracks
  WHERE Composer IS NULL
  LIMIT 5

Name,Composer
Balls to the Wall,
Desafinado,
Garota De Ipanema,
Samba De Uma Nota Só (One Note Samba),
Por Causa De Você,
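
The reason for a dedicated `IS NULL` test is that an ordinary comparison with `NULL` is never true, so `Composer = NULL` matches nothing. The following sketch demonstrates this with Python's built-in `sqlite3` module; the three-row `tracks` table and its values are hypothetical.

```python
import sqlite3

# Hypothetical rows; None is stored as SQL NULL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tracks (Name TEXT, Composer TEXT)")
conn.executemany(
    "INSERT INTO tracks VALUES (?, ?)",
    [("Track A", None), ("Track B", "Composer X"), ("Track C", None)],
)


def track_names(where_clause):
    """Return the track names matching the given WHERE clause."""
    sql = f"SELECT Name FROM tracks WHERE {where_clause} ORDER BY Name"
    return [name for (name,) in conn.execute(sql)]


equals_null = track_names("Composer = NULL")    # comparison is never true
is_null = track_names("Composer IS NULL")
is_not_null = track_names("Composer IS NOT NULL")

print(equals_null)  # []
print(is_null)      # ['Track A', 'Track C']
print(is_not_null)  # ['Track B']
```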


#### `IN` - Set membership (for categorical variables)

Compare the following two notations to test whether a value belongs to a set.  
The `OR` notation works only with a fixed set of values and does not scale well:

In [45]:
%%sql
SELECT *
  FROM customers
  WHERE Country = 'Brazil' OR Country = 'Finland' OR Country = 'Poland' OR Country = 'Spain'

CustomerId,FirstName,LastName,Company,Address,City,State,Country,PostalCode,Phone,Fax,Email,SupportRepId
1,Luís,Gonçalves,Embraer - Empresa Brasileira de Aeronáutica S.A.,"Av. Brigadeiro Faria Lima, 2170",São José dos Campos,SP,Brazil,12227-000,+55 (12) 3923-5555,+55 (12) 3923-5566,luisg@embraer.com.br,3
10,Eduardo,Martins,Woodstock Discos,"Rua Dr. Falcão Filho, 155",São Paulo,SP,Brazil,01007-010,+55 (11) 3033-5446,+55 (11) 3033-4564,eduardo@woodstock.com.br,4
11,Alexandre,Rocha,Banco do Brasil S.A.,"Av. Paulista, 2022",São Paulo,SP,Brazil,01310-200,+55 (11) 3055-3278,+55 (11) 3055-8131,alero@uol.com.br,5
12,Roberto,Almeida,Riotur,"Praça Pio X, 119",Rio de Janeiro,RJ,Brazil,20040-020,+55 (21) 2271-7000,+55 (21) 2271-7070,roberto.almeida@riotur.gov.br,3
13,Fernanda,Ramos,,Qe 7 Bloco G,Brasília,DF,Brazil,71020-677,+55 (61) 3363-5547,+55 (61) 3363-7855,fernadaramos4@uol.com.br,4
44,Terhi,Hämäläinen,,Porthaninkatu 9,Helsinki,,Finland,00530,+358 09 870 2000,,terhi.hamalainen@apple.fi,3
49,Stanisław,Wójcik,,Ordynacka 10,Warsaw,,Poland,00-358,+48 22 828 37 39,,stanisław.wójcik@wp.pl,4
50,Enrique,Muñoz,,C/ San Bernardo 85,Madrid,,Spain,28015,+34 914 454 454,,enrique_munoz@yahoo.es,5


The `IN` notation accepts a list written directly in the query, but also the result of a subquery (not shown here).

In [46]:
%%sql
SELECT *
  FROM customers
  WHERE Country IN ('Brazil', 'Finland', 'Poland', 'Spain')

CustomerId,FirstName,LastName,Company,Address,City,State,Country,PostalCode,Phone,Fax,Email,SupportRepId
1,Luís,Gonçalves,Embraer - Empresa Brasileira de Aeronáutica S.A.,"Av. Brigadeiro Faria Lima, 2170",São José dos Campos,SP,Brazil,12227-000,+55 (12) 3923-5555,+55 (12) 3923-5566,luisg@embraer.com.br,3
10,Eduardo,Martins,Woodstock Discos,"Rua Dr. Falcão Filho, 155",São Paulo,SP,Brazil,01007-010,+55 (11) 3033-5446,+55 (11) 3033-4564,eduardo@woodstock.com.br,4
11,Alexandre,Rocha,Banco do Brasil S.A.,"Av. Paulista, 2022",São Paulo,SP,Brazil,01310-200,+55 (11) 3055-3278,+55 (11) 3055-8131,alero@uol.com.br,5
12,Roberto,Almeida,Riotur,"Praça Pio X, 119",Rio de Janeiro,RJ,Brazil,20040-020,+55 (21) 2271-7000,+55 (21) 2271-7070,roberto.almeida@riotur.gov.br,3
13,Fernanda,Ramos,,Qe 7 Bloco G,Brasília,DF,Brazil,71020-677,+55 (61) 3363-5547,+55 (61) 3363-7855,fernadaramos4@uol.com.br,4
44,Terhi,Hämäläinen,,Porthaninkatu 9,Helsinki,,Finland,00530,+358 09 870 2000,,terhi.hamalainen@apple.fi,3
49,Stanisław,Wójcik,,Ordynacka 10,Warsaw,,Poland,00-358,+48 22 828 37 39,,stanisław.wójcik@wp.pl,4
50,Enrique,Muñoz,,C/ San Bernardo 85,Madrid,,Spain,28015,+34 914 454 454,,enrique_munoz@yahoo.es,5
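
As a sketch of the two forms of `IN`, including the subquery form that the cell above leaves out, the following Python snippet uses the built-in `sqlite3` module. The miniature `customers` and `invoices` tables and all their values are hypothetical. The first query builds the list from a Python list with `?` placeholders; the second takes the set from a subquery.

```python
import sqlite3

# Hypothetical miniature tables, not the real chinook.db contents.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (CustomerId INTEGER PRIMARY KEY, Country TEXT)")
conn.execute(
    "CREATE TABLE invoices (InvoiceId INTEGER PRIMARY KEY, CustomerId INTEGER, Total REAL)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "Brazil"), (2, "Germany"), (3, "Poland"), (4, "Spain")],
)
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?, ?)",
    [(10, 1, 3.96), (11, 2, 13.86), (12, 4, 15.86)],
)

# Form 1: a list of values, passed as one placeholder per value.
wanted = ["Brazil", "Finland", "Poland", "Spain"]
placeholders = ", ".join("?" for _ in wanted)
in_list_ids = [r[0] for r in conn.execute(
    f"SELECT CustomerId FROM customers "
    f"WHERE Country IN ({placeholders}) ORDER BY CustomerId",
    wanted,
)]

# Form 2: the set of values comes from a subquery.
in_subquery_ids = [r[0] for r in conn.execute(
    "SELECT CustomerId FROM customers "
    "WHERE CustomerId IN (SELECT CustomerId FROM invoices WHERE Total > 10) "
    "ORDER BY CustomerId"
)]

print(in_list_ids)      # [1, 3, 4]
print(in_subquery_ids)  # [2, 4]
```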


#### `BETWEEN` - Value in range (for numerical variables)

Use `BETWEEN` (and `NOT BETWEEN`) to test whether a value lies inside (or outside) a certain range. Both bounds are inclusive.

How do you find the invoices dated between January 1, 2010 and January 31, 2010?

In [47]:
%%sql
SELECT InvoiceId, BillingAddress, InvoiceDate, Total
  FROM invoices
  WHERE InvoiceDate BETWEEN '2010-01-01' AND '2010-01-31'
  ORDER BY InvoiceDate

InvoiceId,BillingAddress,InvoiceDate,Total
84,"68, Rue Jouvence",2010-01-08 00:00:00,1.98
85,Erzsébet krt. 58.,2010-01-08 00:00:00,1.98
86,"Via Degli Scipioni, 43",2010-01-09 00:00:00,3.96
87,Celsiusg. 9,2010-01-10 00:00:00,6.94
88,"Calle Lira, 198",2010-01-13 00:00:00,17.91
89,"Rotenturmstraße 4, 1010 Innere Stadt",2010-01-18 00:00:00,18.86
90,801 W 4th Street,2010-01-26 00:00:00,0.99
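
One caveat, assuming the dates are stored as text in the `YYYY-MM-DD HH:MM:SS` form shown in the output: the comparison is then a string comparison, and `'2010-01-31 00:00:00'` sorts after `'2010-01-31'`, so an invoice dated January 31 would fall outside the `BETWEEN` range. The real result above contains no such invoice, so it is unaffected. The sketch below reproduces the effect on a hypothetical three-row table with Python's built-in `sqlite3` module and shows a half-open range as one possible alternative.

```python
import sqlite3

# Hypothetical invoices with dates stored as text, as in the output above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (InvoiceId INTEGER PRIMARY KEY, InvoiceDate TEXT)")
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?)",
    [
        (1, "2010-01-08 00:00:00"),
        (2, "2010-01-31 00:00:00"),
        (3, "2010-02-01 00:00:00"),
    ],
)


def invoice_ids(where_clause):
    """Return the invoice ids matching the given WHERE clause."""
    sql = f"SELECT InvoiceId FROM invoices WHERE {where_clause} ORDER BY InvoiceId"
    return [invoice_id for (invoice_id,) in conn.execute(sql)]


# The upper bound '2010-01-31' is smaller (as text) than '2010-01-31 00:00:00'.
between_ids = invoice_ids("InvoiceDate BETWEEN '2010-01-01' AND '2010-01-31'")
# Half-open range: from January 1 (inclusive) up to February 1 (exclusive).
half_open_ids = invoice_ids(
    "InvoiceDate >= '2010-01-01' AND InvoiceDate < '2010-02-01'"
)

print(between_ids)    # [1]
print(half_open_ids)  # [1, 2]
```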


#### `LIKE` - Value matches a pattern (for text variables)

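Sometimes you don't know the exact text you are searching for. For example, you may know that your favorite song contains the word `elevator`, but not its full name. The `LIKE` operator matches text against a pattern in which the percent sign `%` stands for any sequence of characters, including none.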

1) To find the tracks whose names start with `Wild`, put the `%` wildcard at the end of the pattern: `'Wild%'`.

2) To find the tracks whose names end with `Wild`, put the `%` wildcard at the beginning of the pattern: `'%Wild'`.

3) To find the tracks whose names contain `Wild` anywhere, put the `%` wildcard at both the beginning and the end of the pattern: `'%Wild%'`.

The following query uses the first form:

In [48]:
%%sql
SELECT TrackId, Name
  FROM tracks
  WHERE Name LIKE 'Wild%'

TrackId,Name
1245,Wildest Dreams
1973,Wild Side
2627,Wild Hearted Son
2633,Wild Flower
2944,Wild Honey


The underscore `_` wildcard matches exactly one character. To get the track names that are exactly four characters long and end with `y`:

In [49]:
%%sql
SELECT TrackId, Name
  FROM tracks
  WHERE Name LIKE '___y'

TrackId,Name
532,Baby
784,Lazy
843,Otay
948,Easy
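
The pattern forms described in this section can be compared side by side. The sketch below uses Python's built-in `sqlite3` module with a handful of hypothetical track names (not the real `tracks` contents); the helper `names_like` is likewise an illustrative assumption. The last query relies on SQLite's default behavior of matching ASCII letters case-insensitively in `LIKE`.

```python
import sqlite3

# Hypothetical track names chosen to exercise each pattern.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tracks (Name TEXT)")
conn.executemany(
    "INSERT INTO tracks VALUES (?)",
    [
        ("Wild Side",),
        ("Restless and Wild",),
        ("Into the Wild Night",),
        ("Baby",),
        ("Easy",),
        ("Lady Day",),
    ],
)


def names_like(pattern):
    """Return the names matching a LIKE pattern, sorted alphabetically."""
    rows = conn.execute(
        "SELECT Name FROM tracks WHERE Name LIKE ? ORDER BY Name", (pattern,)
    )
    return [name for (name,) in rows]


print(names_like("Wild%"))   # ['Wild Side']
print(names_like("%Wild"))   # ['Restless and Wild']
print(names_like("%Wild%"))  # ['Into the Wild Night', 'Restless and Wild', 'Wild Side']
print(names_like("___y"))    # ['Baby', 'Easy']
print(names_like("wild%"))   # ['Wild Side']
```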
