# SQL querying and selecting data (exercises)

## Preparation

For this section you need the `chinook.db` database file and a working `%sql` magic.  
If you don't have them, please go back to the [previous section](connect_to_database.ipynb) and follow the instructions.  
The following code should not produce any errors:

In [1]:
%load_ext sql
%sql sqlite:///chinook.db

## Exercise: biggest tracks

Print (select) the 10 biggest `tracks`, ranked by the size stored in the `Bytes` column.

In [2]:
%%sql
SELECT TrackId, Bytes 
    FROM tracks 
    ORDER BY Bytes DESC
    LIMIT 10 

TrackId,Bytes
3224,1059546140
2820,1054423946
3236,587051735
3242,577829804
2910,574325829
3235,570152232
3231,558872190
2902,555244214
3228,554509033
2832,552893447


## Exercise: simple filtering

Write a statement to get the `tracks` with `AlbumId` equal to `1` and a length (`Milliseconds`) greater than 200,000 milliseconds.

In [4]:
%%sql
SELECT * 
FROM tracks 
WHERE AlbumId = 1 AND Milliseconds > 200000
LIMIT 4

TrackId,Name,AlbumId,MediaTypeId,GenreId,Composer,Milliseconds,Bytes,UnitPrice
1,For Those About To Rock (We Salute You),1,1,1,"Angus Young, Malcolm Young, Brian Johnson",343719,11170334,0.99
6,Put The Finger On You,1,1,1,"Angus Young, Malcolm Young, Brian Johnson",205662,6713451,0.99
7,Let's Get It Up,1,1,1,"Angus Young, Malcolm Young, Brian Johnson",233926,7636561,0.99
8,Inject The Venom,1,1,1,"Angus Young, Malcolm Young, Brian Johnson",210834,6852860,0.99


## Exercise: filter with `IN`

Return the `customers` whose `State` is `FL` (Florida), `WA` (Washington), or `CA` (California).  
Use `IN` rather than a chain of `OR` conditions.

In [5]:
%%sql 
SELECT * 
    FROM customers 
    WHERE State IN ('FL', 'WA', 'CA')

CustomerId,FirstName,LastName,Company,Address,City,State,Country,PostalCode,Phone,Fax,Email,SupportRepId
16,Frank,Harris,Google Inc.,1600 Amphitheatre Parkway,Mountain View,CA,USA,94043-1351,+1 (650) 253-0000,+1 (650) 253-0000,fharris@google.com,4
17,Jack,Smith,Microsoft Corporation,1 Microsoft Way,Redmond,WA,USA,98052-8300,+1 (425) 882-8080,+1 (425) 882-8081,jacksmith@microsoft.com,5
19,Tim,Goyer,Apple Inc.,1 Infinite Loop,Cupertino,CA,USA,95014,+1 (408) 996-1010,+1 (408) 996-1011,tgoyer@apple.com,3
20,Dan,Miller,,541 Del Medio Avenue,Mountain View,CA,USA,94040-111,+1 (650) 644-3358,,dmiller@comcast.com,4
22,Heather,Leacock,,120 S Orange Ave,Orlando,FL,USA,32801,+1 (407) 999-7788,,hleacock@gmail.com,4


## Exercise: filter for numbers in range

Find `invoices` whose `Total` is between 14.96 and 18.86. Use `BETWEEN`.  
Sort the output by increasing `Total`. Show only these columns: `InvoiceId`, `BillingAddress`, `Total`.

In [10]:
%%sql
SELECT InvoiceId, BillingAddress, Total 
    FROM invoices 
    WHERE Total BETWEEN 14.96 AND 18.86 
    ORDER BY Total ASC 

InvoiceId,BillingAddress,Total
103,162 E Superior Street,15.86
208,Ullevålsveien 14,15.86
306,Klanova 9/506,16.86
313,"68, Rue Jouvence",16.86
88,"Calle Lira, 198",17.91
89,"Rotenturmstraße 4, 1010 Innere Stadt",18.86
201,319 N. Frances Street,18.86


## Exercise: filter partially matching words

Find the `tracks` whose `Name` contains a substring made of `Br` (two letters), then any single character, then `wn` (two letters).
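
One way to approach this is the `LIKE` operator, where `_` stands for exactly one character and `%` for any run of characters. The sketch below is not run against `chinook.db`: it builds a small in-memory SQLite table with made-up track names (an assumption for illustration only) so the pattern can be checked in isolation.

```python
import sqlite3

# In-memory stand-in for chinook.db; the track names below are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tracks (TrackId INTEGER PRIMARY KEY, Name TEXT)")
conn.executemany(
    "INSERT INTO tracks (Name) VALUES (?)",
    [("Brown Shoes",), ("Mr. Brawny",), ("Broken Wing",), ("Brwn",)],
)

# '%' matches any run of characters (including none), '_' matches exactly one character.
# Note: SQLite's LIKE is case-insensitive for ASCII letters by default.
rows = conn.execute(
    "SELECT TrackId, Name FROM tracks WHERE Name LIKE '%Br_wn%' ORDER BY TrackId"
).fetchall()
matching_names = [name for _, name in rows]
print(matching_names)
conn.close()
```

`Broken Wing` and `Brwn` are not matched, because neither has exactly one character between `Br` and `wn`. In the notebook, the same `WHERE Name LIKE '%Br_wn%'` clause should work inside a `%%sql` cell against the real `tracks` table.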

## Exercise: filtering missing values

Find the `customers` who do not have phone numbers. In the result show only the name and the (missing) phone number.
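
Missing values are stored as `NULL`, which has to be tested with `IS NULL` rather than `=`. The following sketch illustrates the difference on a small in-memory table; the customer rows are hypothetical and only mimic a few columns of the real `customers` table.

```python
import sqlite3

# In-memory stand-in for chinook.db; the customers below are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers ("
    "CustomerId INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT, Phone TEXT)"
)
conn.executemany(
    "INSERT INTO customers (FirstName, LastName, Phone) VALUES (?, ?, ?)",
    [
        ("Ada", "Example", "+1 (555) 010-0000"),
        ("Bob", "Sample", None),       # None is stored as NULL
        ("Cy", "Placeholder", ""),     # an empty string is not NULL
    ],
)

# IS NULL selects rows where the value is missing.
missing = conn.execute(
    "SELECT FirstName, LastName, Phone FROM customers WHERE Phone IS NULL"
).fetchall()

# A comparison with NULL is never true, so this returns no rows at all.
equals_null = conn.execute(
    "SELECT FirstName FROM customers WHERE Phone = NULL"
).fetchall()

print(missing)
print(equals_null)
conn.close()
```

Against the real database, a `%%sql` cell selecting `FirstName`, `LastName`, `Phone` from `customers` with `WHERE Phone IS NULL` would follow the same idea.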

## Exercise: from the database to a Python list

Create a Python variable `bs`: a list containing all `tracks` sizes from the `Bytes` column.  
Print the `type` of the `bs` variable. Print the first 10 elements of `bs`.

In [25]:
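b = %sql SELECT Bytes FROM tracks
bs = [row[0] for row in b]  # take the single column out of each result row
print(type(bs))
print(bs[:10])  # the first 10 elements

<class 'list'>
[11170334, 5510424, 3990994, 4331779, 6290521, 6713451, 7636561, 6852860, 6599424, 8611245]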


## Exercise: from the database to a Pandas data frame

Create a Python variable `df` to be a Pandas `DataFrame` with two columns corresponding to `Milliseconds` and `Bytes` columns of the `tracks` table. Print `df`.  
You will likely need to:
- Import the `pandas` package.
- Use the `read_sql` function from `pandas`.
- Create a separate connection `engine` with `create_engine`.
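
The hints above can be sketched roughly as follows. To keep the example self-contained, it uses a plain `sqlite3` connection to an in-memory database with made-up rows instead of `chinook.db`; the name `df_demo` and the sample values are assumptions for illustration. With the real file, an engine from `sqlalchemy.create_engine("sqlite:///chinook.db")` could presumably be passed to `read_sql` in place of `conn`.

```python
import sqlite3

import pandas as pd

# In-memory stand-in for chinook.db; the numbers below are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tracks ("
    "TrackId INTEGER PRIMARY KEY, Milliseconds INTEGER, Bytes INTEGER)"
)
conn.executemany(
    "INSERT INTO tracks (Milliseconds, Bytes) VALUES (?, ?)",
    [(1000, 16000), (2000, 33000), (3000, 50000)],
)

# read_sql runs the query and returns the result as a DataFrame,
# using the selected column names as DataFrame columns.
df_demo = pd.read_sql("SELECT Milliseconds, Bytes FROM tracks", conn)
print(df_demo)
conn.close()
```

Compared with converting a `%sql` result by hand, `read_sql` keeps the query and the `DataFrame` construction in a single call.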

In [35]:
import pandas as pd 
result = %sql SELECT Milliseconds, Bytes FROM tracks 
df = pd.DataFrame(result)
df


,Milliseconds,Bytes
0,343719,11170334
1,342562,5510424
2,230619,3990994
3,252051,4331779
4,375418,6290521
...,...,...
3498,286741,4718950
3499,139200,2283131
3500,66639,1189062
3501,221331,3665114
