# Exercise: Analyzing Chinook Database

Preparation I've done:
 - Retrieve the dataset and load it
 - Load the %sql extension and point it at the database
 - Display the tables and an example query

Additional steps you might take:
 - Add libraries for visualization (matplotlib, seaborn, plotly)
 - Add libraries for statistics (numpy)
 - Explore the dataset using SQL and/or pandas

----

1. Retrieve a list of all the tracks in the database, displaying only the track name and the name of the album it belongs to. Limit the result to the first 5 rows.
   > Operations: `SELECT`
2. Find the total number of customers from each country. Display the country name and the corresponding count. Order the results by the count in descending order.
   > Operations: `SELECT`, `COUNT`, `GROUP BY`, `ORDER BY`
3. Identify the top 5 genres with the highest number of tracks. Display the genre name along with the total number of tracks for each genre.
   > Operations: `SELECT`, `COUNT`, `GROUP BY`, `ORDER BY`
4. Determine the average invoice total for each customer, considering both the album and individual track purchases. Display the customer's first and last name along with the average invoice total. Order the results by the average invoice total in descending order.
   > Operations: `SELECT`, `AVG`, `JOIN`, `GROUP BY`, `ORDER BY`
5. Identify the customer who spent the most on music purchases. Display the customer's first and last name, along with the total amount spent.
   > Operations: `SELECT`, `SUM`, `JOIN`, `GROUP BY`, `ORDER BY`, `LIMIT`

In [1]:
# Uninstall `jupysql` and reinstall `ipython-sql`: both provide the `%sql` magic and conflict with each other
%pip uninstall jupysql ipython-sql -yq
%pip install -q pandas ipython-sql

# Clear the interactive namespace; this does NOT reload packages, so restart the kernel afterwards
%reset -f

Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.


<div class="alert alert-block alert-info">
⚠️ Warning: Restart the kernel here!
</div>

In [14]:
# Load chinook dataset and query it using SQL magic into pandas dataframes
import pandas as pd
import sqlite3

%load_ext sql
%config SqlMagic.autopandas = True
%config SqlMagic.displaycon = False

# Open a direct sqlite3 connection to the database file (separate from the %sql connection below)
conn = sqlite3.connect("chinook.sqlite")

# Tell %sql about the database
%sql sqlite:///chinook.sqlite

# List tables in database
tables = %sql SELECT name FROM sqlite_master WHERE type='table'

# Remove SQLite internal tables from the list (tables with 'sqlite' in the name)
tables = tables[~tables.name.str.contains("sqlite")]

# Display the remaining table names
display(tables)


# Query to get the first 5 rows of the `albums` table
result = %sql SELECT * FROM albums LIMIT 5;

# Display the query result; with autopandas enabled, %sql returns a pandas DataFrame
display(result)


The sql extension is already loaded. To reload it, use:
  %reload_ext sql
Done.


|    | name |
|----|------|
| 0  | albums |
| 2  | artists |
| 3  | customers |
| 4  | employees |
| 5  | genres |
| 6  | invoices |
| 7  | invoice_items |
| 8  | media_types |
| 9  | playlists |
| 10 | playlist_track |


Done.


|   | AlbumId | Title | ArtistId |
|---|---------|-------|----------|
| 0 | 1 | For Those About To Rock We Salute You | 1 |
| 1 | 2 | Balls to the Wall | 2 |
| 2 | 3 | Restless and Wild | 2 |
| 3 | 4 | Let There Be Rock | 1 |
| 4 | 5 | Big Ones | 3 |
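The `conn` opened above is never used. The same table listing can go through it with `pd.read_sql_query`, with SQLite's internal tables filtered in SQL instead of with `str.contains`. A minimal sketch, assuming an in-memory database with two hypothetical tables stands in for `chinook.sqlite`. The `AUTOINCREMENT` column makes SQLite create its internal `sqlite_sequence` table, which the filter then drops:

```python
import sqlite3

import pandas as pd

# Hypothetical stand-in for chinook.sqlite; use sqlite3.connect("chinook.sqlite") for the real file
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE albums (AlbumId INTEGER PRIMARY KEY AUTOINCREMENT, Title TEXT, ArtistId INTEGER);
    CREATE TABLE artists (ArtistId INTEGER PRIMARY KEY, Name TEXT);
    """
)

# Internal tables such as sqlite_sequence are named "sqlite_..."; exclude them in SQL
# ('_' is also a LIKE wildcard, which is harmless for this filter)
tables = pd.read_sql_query(
    "SELECT name FROM sqlite_master "
    "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
    "ORDER BY name",
    conn,
)
print(tables)
```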


In [22]:
# 1. Retrieve a list of all the tracks in the database, displaying only the track name and the ID of the album it belongs to (the album name would require a join with `albums`). Limit the result to the first 5 rows.

# Query
result = %sql SELECT name AS "Track Name" \
                   , albumid AS "Album ID" \
                FROM tracks \
               LIMIT 5;

# Display query result
display(result)


Done.


|   | Track Name | Album ID |
|---|------------|----------|
| 0 | For Those About To Rock (We Salute You) | 1 |
| 1 | Balls to the Wall | 2 |
| 2 | Fast As a Shark | 3 |
| 3 | Restless and Wild | 3 |
| 4 | Princess of the Dawn | 3 |
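The exercise asks for the album *name*, which lives in `albums.Title`, so each track has to be joined to its album on `AlbumId`. A minimal sketch of that join, assuming a small in-memory excerpt built from the rows shown in this notebook (the `TrackId` values are assumed). The same `SELECT` can be passed to `%sql` against `chinook.sqlite`:

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")

# Hypothetical excerpt of Chinook built from rows shown in this notebook (TrackId values assumed)
pd.DataFrame(
    {
        "AlbumId": [1, 2, 3],
        "Title": ["For Those About To Rock We Salute You", "Balls to the Wall", "Restless and Wild"],
        "ArtistId": [1, 2, 2],
    }
).to_sql("albums", conn, index=False)
pd.DataFrame(
    {
        "TrackId": [1, 2, 3, 4, 5],
        "Name": [
            "For Those About To Rock (We Salute You)",
            "Balls to the Wall",
            "Fast As a Shark",
            "Restless and Wild",
            "Princess of the Dawn",
        ],
        "AlbumId": [1, 2, 3, 3, 3],
    }
).to_sql("tracks", conn, index=False)

# Join each track to its album so the title replaces the numeric ID
result = pd.read_sql_query(
    """
    SELECT t.Name  AS "Track Name",
           a.Title AS "Album Title"
      FROM tracks t
      JOIN albums a ON t.AlbumId = a.AlbumId
     ORDER BY t.TrackId
     LIMIT 5
    """,
    conn,
)
print(result)
```

The explicit `ORDER BY t.TrackId` keeps "the first 5 rows" well defined once a join is involved.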


In [23]:
# 2. Find the total number of customers from each country. Display the country name and the corresponding count. Order the results by the count in descending order

# Query
result = %sql SELECT country AS "Country" \
                   , COUNT(*) AS "Number of Customers" \
                FROM customers \
            GROUP BY country \
            ORDER BY COUNT(*) DESC

# Display query result
display(result.head())

Done.


|   | Country | Number of Customers |
|---|---------|---------------------|
| 0 | USA | 13 |
| 1 | Canada | 8 |
| 2 | France | 5 |
| 3 | Brazil | 5 |
| 4 | Germany | 4 |
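France and Brazil tie at 5 above, so `ORDER BY COUNT(*) DESC` alone leaves their relative order up to the engine. A sketch that adds `Country` as a secondary sort key and cross-checks the SQL against a pandas `groupby`, assuming a hypothetical handful of customers in an in-memory database:

```python
import sqlite3

import pandas as pd

# Hypothetical customers (one row per customer)
customers = pd.DataFrame(
    {"CustomerId": range(1, 8), "Country": ["USA", "USA", "USA", "Canada", "Canada", "France", "Brazil"]}
)
conn = sqlite3.connect(":memory:")
customers.to_sql("customers", conn, index=False)

# SQL: break count ties alphabetically so the order is deterministic
by_sql = pd.read_sql_query(
    """
    SELECT Country, COUNT(*) AS "Number of Customers"
      FROM customers
     GROUP BY Country
     ORDER BY COUNT(*) DESC, Country ASC
    """,
    conn,
)

# pandas: the same aggregation and ordering
by_pandas = (
    customers.groupby("Country")
    .size()
    .reset_index(name="Number of Customers")
    .sort_values(["Number of Customers", "Country"], ascending=[False, True], ignore_index=True)
)
print(by_sql)
```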


In [21]:
# 3. Identify the top 5 genres with the highest number of tracks. Display the genre name along with the total number of tracks for each genre

# Query
result = %sql SELECT g.name AS "Genre" \
                   , COUNT(t.trackid) AS "Number of Tracks" \
                FROM genres g \
                JOIN tracks t ON g.genreid = t.genreid \
            GROUP BY g.name \
            ORDER BY COUNT(t.trackid) DESC \
               LIMIT 5

# Display query result
display(result)

Done.


|   | Genre | Number of Tracks |
|---|-------|------------------|
| 0 | Rock | 1297 |
| 1 | Latin | 579 |
| 2 | Metal | 374 |
| 3 | Alternative & Punk | 332 |
| 4 | Jazz | 130 |
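The same top-5 count can be written in pandas with `merge` and `groupby`. Both `tracks` and `genres` have a `Name` column, so `suffixes` keeps them apart. A sketch using hypothetical in-memory frames (three genres, six tracks):

```python
import pandas as pd

# Hypothetical Chinook-like frames; both tables have a "Name" column
genres = pd.DataFrame({"GenreId": [1, 2, 3], "Name": ["Rock", "Jazz", "Metal"]})
tracks = pd.DataFrame(
    {
        "TrackId": [1, 2, 3, 4, 5, 6],
        "Name": ["t1", "t2", "t3", "t4", "t5", "t6"],
        "GenreId": [1, 1, 1, 3, 3, 2],
    }
)

top_genres = (
    tracks.merge(genres, on="GenreId", suffixes=("_track", "_genre"))  # inner join, like JOIN ... ON
    .groupby("Name_genre")
    .size()  # number of tracks per genre
    .sort_values(ascending=False)
    .head(5)  # LIMIT 5
    .rename_axis("Genre")
    .rename("Number of Tracks")
)
print(top_genres)
```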


In [24]:
# 4. Determine the average invoice total for each customer, considering both the album and individual track purchases. Display the customer's first and last name along with the average invoice total. Order the results by the average invoice total in descending order.

# Query
result = %sql SELECT c.firstname AS "First Name" \
                   , c.lastname AS "Last Name" \
                   , AVG(i.total) AS "Average Invoice Total" \
                FROM customers c \
                JOIN invoices i ON c.customerid = i.customerid \
            GROUP BY c.customerid \
            ORDER BY AVG(i.total) DESC

# Display query result
display(result.head())

Done.


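|   | First Name | Last Name | Average Invoice Total |
|---|------------|-----------|-----------------------|
| 0 | Helena | Holý | 7.088571 |
| 1 | Richard | Cunningham | 6.802857 |
| 2 | Luis | Rojas | 6.66 |
| 3 | Ladislav | Kovács | 6.517143 |
| 4 | Hugh | O'Reilly | 6.517143 |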
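In pandas, the per-customer average is a `groupby(...).mean()` on the invoices, followed by a merge that brings in the customer names. A sketch assuming two hypothetical customers and four invoices. Grouping by `CustomerId` rather than by name mirrors the SQL and keeps customers who share a name separate:

```python
import pandas as pd

# Hypothetical customers and invoices
customers = pd.DataFrame({"CustomerId": [1, 2], "FirstName": ["Ana", "Ben"], "LastName": ["Silva", "Ng"]})
invoices = pd.DataFrame(
    {"InvoiceId": [1, 2, 3, 4], "CustomerId": [1, 1, 2, 2], "Total": [1.98, 3.96, 13.86, 0.99]}
)

avg_totals = (
    invoices.groupby("CustomerId", as_index=False)["Total"]
    .mean()  # AVG(i.total) per customer
    .merge(customers, on="CustomerId")  # JOIN customers to get the names
    .rename(columns={"FirstName": "First Name", "LastName": "Last Name", "Total": "Average Invoice Total"})
    .sort_values("Average Invoice Total", ascending=False, ignore_index=True)
)[["First Name", "Last Name", "Average Invoice Total"]]
print(avg_totals)
```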


In [25]:
# 5. Identify the customer who spent the most on music purchases. Display the customer's first and last name, along with the total amount spent.

# Query
result = %sql SELECT c.firstname AS "First Name" \
                   , c.lastname AS "Last Name" \
                   , SUM(i.total) AS "Total Amount Spent" \
                FROM customers c \
                JOIN invoices i ON c.customerid = i.customerid \
            GROUP BY c.customerid \
            ORDER BY SUM(i.total) DESC \
               LIMIT 1

# Display query result
display(result)


Done.


|   | First Name | Last Name | Total Amount Spent |
|---|------------|-----------|--------------------|
| 0 | Helena | Holý | 49.62 |
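`LIMIT 1` returns a single row even when several customers share the highest total. A tie-aware variant computes each customer's spend in a CTE and keeps every row equal to the maximum. Rounding to cents makes the equality test safe for floating-point totals. A sketch on hypothetical data where two customers tie:

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")
# Hypothetical data: Ana (1.98 + 3.96) and Ben (5.94) tie on total spend
pd.DataFrame(
    {"CustomerId": [1, 2, 3], "FirstName": ["Ana", "Ben", "Cai"], "LastName": ["Silva", "Ng", "Li"]}
).to_sql("customers", conn, index=False)
pd.DataFrame(
    {"InvoiceId": [1, 2, 3, 4], "CustomerId": [1, 1, 2, 3], "Total": [1.98, 3.96, 5.94, 0.99]}
).to_sql("invoices", conn, index=False)

top_spenders = pd.read_sql_query(
    """
    WITH spend AS (
        SELECT c.FirstName, c.LastName, ROUND(SUM(i.Total), 2) AS total_spent
          FROM customers c
          JOIN invoices i ON c.CustomerId = i.CustomerId
         GROUP BY c.CustomerId
    )
    SELECT FirstName   AS "First Name",
           LastName    AS "Last Name",
           total_spent AS "Total Amount Spent"
      FROM spend
     WHERE total_spent = (SELECT MAX(total_spent) FROM spend)  -- keep every tied customer
     ORDER BY LastName
    """,
    conn,
)
print(top_spenders)
```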
