# Answering Business Questions using SQL

In this guided project, we use a database named `chinook.db`, which contains tables describing a store named Chinook that sells digital music, much like a mini iTunes store. The database schema below shows the tables and how they relate to one another:

![Image](https://s3.amazonaws.com/dq-content/191/chinook-schema.svg)

# Connecting the Database to Jupyter Notebook

```python
%%capture
%load_ext sql
%sql sqlite:///chinook.db
```

## Familiarizing ourselves with the database

```sql
%%sql
SELECT *
FROM sqlite_master
WHERE type IN ('table', 'view');
```

```text
type,name,tbl_name,rootpage,sql
table,album,album,2,"CREATE TABLE [album] (  [album_id] INTEGER PRIMARY KEY NOT NULL,  [title] NVARCHAR(160) NOT NULL,  [artist_id] INTEGER NOT NULL,  FOREIGN KEY ([artist_id]) REFERENCES [artist] ([artist_id]) ON DELETE NO ACTION ON UPDATE NO ACTION )"
table,artist,artist,3,"CREATE TABLE [artist] (  [artist_id] INTEGER PRIMARY KEY NOT NULL,  [name] NVARCHAR(120) )"
table,customer,customer,4,"CREATE TABLE [customer] (  [customer_id] INTEGER PRIMARY KEY NOT NULL,  [first_name] NVARCHAR(40) NOT NULL,  [last_name] NVARCHAR(20) NOT NULL,  [company] NVARCHAR(80),  [address] NVARCHAR(70),  [city] NVARCHAR(40),  [state] NVARCHAR(40),  [country] NVARCHAR(40),  [postal_code] NVARCHAR(10),  [phone] NVARCHAR(24),  [fax] NVARCHAR(24),  [email] NVARCHAR(60) NOT NULL,  [support_rep_id] INTEGER,  FOREIGN KEY ([support_rep_id]) REFERENCES [employee] ([employee_id]) ON DELETE NO ACTION ON UPDATE NO ACTION )"
table,employee,employee,5,"CREATE TABLE [employee] (  [employee_id] INTEGER PRIMARY KEY NOT NULL,  [last_name] NVARCHAR(20) NOT NULL,  [first_name] NVARCHAR(20) NOT NULL,  [title] NVARCHAR(30),  [reports_to] INTEGER,  [birthdate] DATETIME,  [hire_date] DATETIME,  [address] NVARCHAR(70),  [city] NVARCHAR(40),  [state] NVARCHAR(40),  [country] NVARCHAR(40),  [postal_code] NVARCHAR(10),  [phone] NVARCHAR(24),  [fax] NVARCHAR(24),  [email] NVARCHAR(60),  FOREIGN KEY ([reports_to]) REFERENCES [employee] ([employee_id]) ON DELETE NO ACTION ON UPDATE NO ACTION )"
table,genre,genre,6,"CREATE TABLE [genre] (  [genre_id] INTEGER PRIMARY KEY NOT NULL,  [name] NVARCHAR(120) )"
table,invoice,invoice,7,"CREATE TABLE [invoice] (  [invoice_id] INTEGER PRIMARY KEY NOT NULL,  [customer_id] INTEGER NOT NULL,  [invoice_date] DATETIME NOT NULL,  [billing_address] NVARCHAR(70),  [billing_city] NVARCHAR(40),  [billing_state] NVARCHAR(40),  [billing_country] NVARCHAR(40),  [billing_postal_code] NVARCHAR(10),  [total] NUMERIC(10,2) NOT NULL,  FOREIGN KEY ([customer_id]) REFERENCES [customer] ([customer_id]) ON DELETE NO ACTION ON UPDATE NO ACTION )"
table,invoice_line,invoice_line,8,"CREATE TABLE [invoice_line] (  [invoice_line_id] INTEGER PRIMARY KEY NOT NULL,  [invoice_id] INTEGER NOT NULL,  [track_id] INTEGER NOT NULL,  [unit_price] NUMERIC(10,2) NOT NULL,  [quantity] INTEGER NOT NULL,  FOREIGN KEY ([invoice_id]) REFERENCES [invoice] ([invoice_id]) ON DELETE NO ACTION ON UPDATE NO ACTION,  FOREIGN KEY ([track_id]) REFERENCES [track] ([track_id]) ON DELETE NO ACTION ON UPDATE NO ACTION )"
table,media_type,media_type,9,"CREATE TABLE [media_type] (  [media_type_id] INTEGER PRIMARY KEY NOT NULL,  [name] NVARCHAR(120) )"
table,playlist,playlist,10,"CREATE TABLE [playlist] (  [playlist_id] INTEGER PRIMARY KEY NOT NULL,  [name] NVARCHAR(120) )"
table,playlist_track,playlist_track,11,"CREATE TABLE [playlist_track] (  [playlist_id] INTEGER NOT NULL,  [track_id] INTEGER NOT NULL,  CONSTRAINT [pk_playlist_track] PRIMARY KEY ([playlist_id], [track_id]),  FOREIGN KEY ([playlist_id]) REFERENCES [playlist] ([playlist_id]) ON DELETE NO ACTION ON UPDATE NO ACTION,  FOREIGN KEY ([track_id]) REFERENCES [track] ([track_id]) ON DELETE NO ACTION ON UPDATE NO ACTION )"
```
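The `%%sql` magic only works inside Jupyter. As a minimal sketch, the same catalogue query can be run from plain Python with the standard-library `sqlite3` module. The tables and view below are hypothetical toy stand-ins created in memory, not the real `chinook.db` schema.

```python
import sqlite3

# Throwaway in-memory database with two toy tables and one view
# (hypothetical stand-ins, not the real chinook.db schema).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE artist (artist_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE album (
        album_id INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        artist_id INTEGER NOT NULL REFERENCES artist (artist_id)
    );
    CREATE VIEW album_artist AS
        SELECT al.title, ar.name
        FROM album al
        JOIN artist ar ON ar.artist_id = al.artist_id;
    """
)

# Same filter as the notebook cell: list every table and view.
rows = conn.execute(
    "SELECT type, name FROM sqlite_master "
    "WHERE type IN ('table', 'view') ORDER BY name"
).fetchall()

for obj_type, name in rows:
    print(f"{obj_type:5} {name}")
```

To inspect the actual database, replace `":memory:"` with the path to `chinook.db` and drop the `executescript` call.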


# Selecting Albums to Purchase

The Chinook record store has just signed a deal with a new record label, which has offered four albums; the store intends to add three of them. None of the four artists currently has any tracks in the store. Their names and genres are:

Artist Name|Genre
-----|-----
Regal|Hip-Hop
Red Tone|Punk
Meteor and the Girls|Pop
Slim Jim Bites|Blues

The record label specializes in artists from the USA and has given Chinook some money to advertise the new albums there, so we want to find out which genres sell best in the USA.

```sql
%%sql

WITH
    usa AS
        (
        SELECT
            il.track_id,
            il.quantity
        FROM invoice_line il
        LEFT JOIN invoice i ON i.invoice_id = il.invoice_id
        WHERE i.billing_country = 'USA'
        ),
    usa_genre AS
        (
        SELECT
            u.track_id,
            u.quantity,
            g.name
        FROM usa u
        LEFT JOIN track t ON t.track_id = u.track_id
        LEFT JOIN genre g ON t.genre_id = g.genre_id
        ),
    total_track_sold AS
        (
        SELECT CAST(SUM(quantity) AS FLOAT) FROM usa
        )
        
SELECT
    name Genre,
    SUM(quantity) no_tracks_sold,
    ROUND(SUM(quantity)/(
                   SELECT * FROM total_track_sold
                  )*100, 2) pct_tracks_sold
FROM usa_genre
GROUP BY 1
ORDER BY 2 DESC, 3 DESC;
```

Genre|no_tracks_sold|pct_tracks_sold
-----|-----|-----
Rock|561|53.38
Alternative & Punk|130|12.37
Metal|124|11.8
R&B/Soul|53|5.04
Blues|36|3.43
Alternative|35|3.33
Latin|22|2.09
Pop|22|2.09
Hip Hop/Rap|20|1.9
Jazz|14|1.33

From the table above, the three genres that match the new albums and rank highest (both by the number of tracks sold and by their share of all tracks sold in the USA) are:
1. Punk (listed as Alternative & Punk)
2. Blues
3. Pop

Hip-Hop (listed as Hip Hop/Rap) ranks lowest of the four, just below Pop.

I would therefore recommend that Chinook purchase the albums by the following three artists, based on how well tracks from their genres sell in the USA:

Artist Name|Genre|Rank from Table
-----|-----|-----
Red Tone|Punk|2
Slim Jim Bites|Blues|5
Meteor and the Girls|Pop|8
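A minimal sketch of the same share-of-total calculation, run through Python's built-in `sqlite3` module on a hypothetical toy dataset (the table layout mimics `chinook.db`, but the rows are invented for illustration). It also shows why the notebook casts the total to a floating-point type before dividing: in SQLite, dividing one integer by another truncates.

```python
import sqlite3

# Hypothetical toy data shaped like the chinook.db tables used above.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE genre (genre_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE track (track_id INTEGER PRIMARY KEY, genre_id INTEGER);
    CREATE TABLE invoice (invoice_id INTEGER PRIMARY KEY, billing_country TEXT);
    CREATE TABLE invoice_line (
        invoice_line_id INTEGER PRIMARY KEY,
        invoice_id INTEGER, track_id INTEGER, quantity INTEGER
    );
    INSERT INTO genre VALUES (1, 'Rock'), (2, 'Blues');
    INSERT INTO track VALUES (10, 1), (11, 1), (20, 2);
    INSERT INTO invoice VALUES (100, 'USA'), (101, 'USA'), (102, 'Canada');
    -- Four USA lines (3 Rock, 1 Blues) plus one Canadian line to be filtered out.
    INSERT INTO invoice_line VALUES
        (1, 100, 10, 1), (2, 100, 11, 1), (3, 101, 10, 1),
        (4, 101, 20, 1), (5, 102, 20, 1);
    """
)

# INTEGER / INTEGER truncates in SQLite; casting one operand gives real division.
truncated = conn.execute("SELECT 1 / 4, CAST(1 AS REAL) / 4").fetchone()
print(truncated)  # (0, 0.25)

query = """
WITH usa_genre AS (
    SELECT g.name, il.quantity
    FROM invoice_line il
    JOIN invoice i ON i.invoice_id = il.invoice_id
    JOIN track t ON t.track_id = il.track_id
    JOIN genre g ON g.genre_id = t.genre_id
    WHERE i.billing_country = 'USA'
)
SELECT
    name,
    SUM(quantity) AS tracks_sold,
    -- Share of all USA tracks sold, as a percentage rounded to 2 places.
    ROUND(CAST(SUM(quantity) AS REAL) * 100
          / (SELECT SUM(quantity) FROM usa_genre), 2) AS pct_sold
FROM usa_genre
GROUP BY name
ORDER BY tracks_sold DESC;
"""
rows = conn.execute(query).fetchall()
for name, sold, pct in rows:
    print(f"{name:6} {sold:3} {pct:6.2f}")
```

The CTE is referenced twice, once as the grouped source and once inside the scalar subquery for the grand total, which mirrors the role of `total_track_sold` in the notebook query.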



# Analyzing Employee Sales Performance

Each customer is assigned to a sales support agent at Chinook upon their first purchase. To determine whether any sales support agent is performing better or worse than the others, we analyze the purchases made by each agent's customers.

```sql
%%sql
WITH
    saa AS
        (
        SELECT
            employee_id,
            first_name || ' ' || last_name employee_name,
            birthdate,
            hire_date,
            country
        FROM employee
        WHERE title = 'Sales Support Agent'
        ),
    sales_total AS
        (
        SELECT
            c.customer_id,
            c.support_rep_id,
            SUM(i.total) total_dollar_sales
        FROM invoice i
        LEFT JOIN customer c ON c.customer_id = i.customer_id
        GROUP BY i.customer_id
        )

SELECT
    saa.*,
    SUM(sales_total.total_dollar_sales) total_dollar_sales
FROM saa
LEFT JOIN sales_total ON sales_total.support_rep_id = saa.employee_id
GROUP BY 1
ORDER BY total_dollar_sales DESC;
```

employee_id|employee_name|birthdate|hire_date|country|total_dollar_sales
-----|-----|-----|-----|-----|-----
3|Jane Peacock|1973-08-29 00:00:00|2017-04-01 00:00:00|Canada|1731.51
4|Margaret Park|1947-09-19 00:00:00|2017-05-03 00:00:00|Canada|1584.0
5|Steve Johnson|1965-03-03 00:00:00|2017-10-17 00:00:00|Canada|1393.92


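From the results above, Jane Peacock has the highest total sales of the three sales support agents and Steve Johnson the lowest. The ranking matches the order in which they were hired: Jane in April 2017, Margaret Park in May 2017 and Steve in October 2017. A possible interpretation is that more recent hires have simply had less time to build up a customer base. Age shows no consistent pattern, since the oldest agent, Margaret Park, ranks second. With only three agents, any such conclusion is tentative.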
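Sorting by column position is fragile: after `saa.*` expands, position 5 is `country` and position 6 is `total_dollar_sales`, so an off-by-one ordinal silently sorts by the wrong column. The sketch below uses Python's `sqlite3` module and hypothetical toy agents placed in different countries (unlike the real data, where all three are in Canada) to show how an ordinal that points at `country` and the sales alias give different orders. The per-agent total uses `LEFT JOIN`s, as the notebook query does.

```python
import sqlite3

# Hypothetical toy data: three agents in different countries plus a manager.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE employee (
        employee_id INTEGER PRIMARY KEY, name TEXT, title TEXT, country TEXT
    );
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, support_rep_id INTEGER);
    CREATE TABLE invoice (
        invoice_id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL
    );
    INSERT INTO employee VALUES
        (1, 'Agent A', 'Sales Support Agent', 'Canada'),
        (2, 'Agent B', 'Sales Support Agent', 'USA'),
        (3, 'Agent C', 'Sales Support Agent', 'Brazil'),
        (4, 'Manager', 'Sales Manager', 'Canada');
    INSERT INTO customer VALUES (1, 1), (2, 2), (3, 3), (4, 2);
    INSERT INTO invoice VALUES (1, 1, 10.0), (2, 2, 5.0), (3, 3, 20.0), (4, 4, 25.0);
    """
)

# Total invoice value per sales support agent (the manager is filtered out).
base = """
    SELECT e.name, e.country, SUM(i.total) AS total_dollar_sales
    FROM employee e
    LEFT JOIN customer c ON c.support_rep_id = e.employee_id
    LEFT JOIN invoice i ON i.customer_id = c.customer_id
    WHERE e.title = 'Sales Support Agent'
    GROUP BY e.employee_id
"""

# Ordinal 2 is `country` here, so this sorts by country name, not by sales.
by_position = [row[0] for row in conn.execute(base + "ORDER BY 2 DESC")]
# Naming the aggregate's alias sorts by the sales total as intended.
by_alias = [row[0] for row in conn.execute(base + "ORDER BY total_dollar_sales DESC")]

print(by_position)  # ['Agent B', 'Agent A', 'Agent C']
print(by_alias)     # ['Agent B', 'Agent C', 'Agent A']
```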

# Analyzing Sales by Country

To analyze sales by country, we use the `country` column of the `customer` table rather than the `billing_country` column of the `invoice` table. Customers from countries that have only one customer are combined into a single group, "Other", to simplify the analysis.
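A minimal sketch of the two-step "Other" grouping, using Python's `sqlite3` module on hypothetical toy data. It first aggregates by the real country and relabels single-customer countries as 'Other', then aggregates again so that all relabelled rows collapse into one group, and finally sorts 'Other' last. Unlike the notebook query, it aggregates invoices directly, so it counts customers with `COUNT(DISTINCT ...)`; it also names the relabelled column `country_group` so it cannot be confused with the underlying `country` column.

```python
import sqlite3

# Hypothetical toy data: two US customers and two single-customer countries.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, country TEXT);
    CREATE TABLE invoice (invoice_id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customer VALUES (1, 'USA'), (2, 'USA'), (3, 'Chile'), (4, 'India');
    INSERT INTO invoice VALUES
        (1, 1, 10.0), (2, 1, 6.0), (3, 2, 8.0),  -- USA: 2 customers, 3 orders
        (4, 3, 30.0), (5, 4, 2.0);               -- Chile, India: 1 customer each
    """
)

query = """
WITH per_country AS (
    -- Step 1: aggregate by the real country; singleton countries become 'Other'.
    SELECT
        CASE WHEN COUNT(DISTINCT c.customer_id) = 1 THEN 'Other'
             ELSE c.country END AS country_group,
        COUNT(DISTINCT c.customer_id) AS num_customers,
        SUM(i.total) AS total_sales,
        COUNT(i.invoice_id) AS num_orders
    FROM customer c
    JOIN invoice i ON i.customer_id = c.customer_id
    GROUP BY c.country
),
grouped AS (
    -- Step 2: merge every 'Other' row into a single group.
    SELECT country_group,
           SUM(num_customers) AS num_customers,
           SUM(total_sales) AS total_sales,
           SUM(num_orders) AS num_orders
    FROM per_country
    GROUP BY country_group
)
SELECT
    country_group,
    num_customers,
    total_sales,
    ROUND(total_sales / num_customers, 2) AS avg_sales_per_customer,
    ROUND(total_sales / num_orders, 2) AS avg_order_value
FROM grouped
-- The comparison is 0 or 1 in SQLite, so 'Other' sorts last whatever its total.
ORDER BY country_group = 'Other', total_sales DESC;
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Even though the toy 'Other' group has the larger total (32.0 against 24.0), the boolean sort key keeps it at the bottom, which is the role of the `sort` column in the notebook query.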

```sql
%%sql
WITH
    sales AS
        (
        SELECT
            i.customer_id,
            SUM(i.total) total_sales,
            COUNT(i.invoice_id) no_order,
            c.country
        FROM invoice i
        LEFT JOIN customer c ON c.customer_id = i.customer_id
        GROUP BY 1
        ),
    countries AS
        (
        SELECT
            CASE
                WHEN COUNT(customer_id) = 1 THEN 'Other'
                ELSE country
                END country,
            COUNT(customer_id) num_customers,
            SUM(total_sales) total_sales,
            SUM(no_order) no_orders
        FROM sales
        GROUP BY country
        ),
    grouped AS
        (
        SELECT
            country,
            SUM(num_customers) num_customers,
            SUM(total_sales) total_sales,
            SUM(no_orders) no_orders
        FROM countries
        GROUP BY country
        )

SELECT
    country,
    num_customers total_num_customers,
    ROUND(total_sales, 2) total_sales,
    ROUND(total_sales/num_customers,2) avg_sales_per_customer,
    ROUND(total_sales/no_orders,2) avg_order_value
FROM
    (
    SELECT
        *,
        CASE
            WHEN country = 'Other' THEN 1
            ELSE 0
            END sort
    FROM grouped
    )
ORDER BY sort, total_sales DESC;
```

country|total_num_customers|total_sales|avg_sales_per_customer|avg_order_value
-----|-----|-----|-----|-----
USA|13|1040.49|80.04|7.94
Canada|8|535.59|66.95|7.05
Brazil|5|427.68|85.54|7.01
France|5|389.07|77.81|7.78
Germany|4|334.62|83.66|8.16
Czech Republic|2|273.24|136.62|9.11
United Kingdom|3|245.52|81.84|8.77
Portugal|2|185.13|92.56|6.38
India|2|183.15|91.57|8.72
Other|15|1094.94|73.0|7.45


From the results, the USA and Canada are the two largest individual markets by total sales, together accounting for roughly a third of all sales (1,576.08 of 4,709.43). Looking at the average sales per customer and the average order value, customers from the Czech Republic appear to spend the most, although this rests on only two customers. Collectively, the Other group, which combines all countries with only one customer, has the highest total sales and the largest number of customers.
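As a quick check on the relative size of each market, the rounded `total_sales` values from the table above can be summed in Python. Because the inputs are already rounded to cents, the shares are approximate.

```python
# Rounded total_sales values copied from the country table above.
total_sales = {
    "USA": 1040.49, "Canada": 535.59, "Brazil": 427.68, "France": 389.07,
    "Germany": 334.62, "Czech Republic": 273.24, "United Kingdom": 245.52,
    "Portugal": 185.13, "India": 183.15, "Other": 1094.94,
}

grand_total = sum(total_sales.values())
north_america = total_sales["USA"] + total_sales["Canada"]
share = north_america / grand_total

print(f"{north_america:.2f} of {grand_total:.2f} = {share:.1%}")
```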

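# Analyzing Album vs Individual Track Purchases

An invoice is treated as an album purchase if it contains every track of a single album and no other tracks. The query below takes the album of each invoice's lowest `track_id` and uses `EXCEPT` in both directions to compare the invoice's tracks with that album's tracks; if both differences are empty, the invoice counts as an album purchase.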

```sql
%%sql

WITH invoice_first_track AS
    (
     SELECT
         il.invoice_id invoice_id,
         MIN(il.track_id) first_track_id
     FROM invoice_line il
     GROUP BY 1
    )

SELECT
    album_purchase,
    COUNT(invoice_id) number_of_invoices,
    CAST(COUNT(invoice_id) AS FLOAT) / (
                                         SELECT COUNT(*) FROM invoice
                                      ) percent
FROM
    (
    SELECT
        ifs.*,
        CASE
            WHEN
                 (
                  SELECT t.track_id FROM track t
                  WHERE t.album_id = (
                                      SELECT t2.album_id FROM track t2
                                      WHERE t2.track_id = ifs.first_track_id
                                     ) 

                  EXCEPT 

                  SELECT il2.track_id FROM invoice_line il2
                  WHERE il2.invoice_id = ifs.invoice_id
                 ) IS NULL
             AND
                 (
                  SELECT il2.track_id FROM invoice_line il2
                  WHERE il2.invoice_id = ifs.invoice_id

                  EXCEPT 

                  SELECT t.track_id FROM track t
                  WHERE t.album_id = (
                                      SELECT t2.album_id FROM track t2
                                      WHERE t2.track_id = ifs.first_track_id
                                     ) 
                 ) IS NULL
             THEN 'yes'
             ELSE 'no'
         END AS album_purchase
     FROM invoice_first_track ifs
    )
GROUP BY album_purchase;
```

album_purchase|number_of_invoices|percent
-----|-----|-----
no|500|0.8143322475570033
yes|114|0.1856677524429967


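Album purchases account for 18.6% of all invoices (114 of 614). Based on this, I would recommend against buying only selected tracks from albums from record companies: since nearly one in five purchases is a whole album, Chinook risks losing a substantial share of its sales. Note that this figure is a share of invoices, not of revenue.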
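The album-purchase test boils down to a set comparison. A minimal sketch in plain Python, using a hypothetical toy catalogue and hypothetical invoices: the album is taken from the invoice's lowest `track_id`, and the invoice qualifies only if both set differences are empty, mirroring the two `EXCEPT ... IS NULL` checks in the query.

```python
from collections import defaultdict

# Hypothetical toy catalogue: track_id -> album_id.
track_album = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B"}

# Hypothetical invoices: invoice_id -> purchased track_ids.
invoices = {
    100: [1, 2, 3],  # every track of album A and nothing else
    101: [1, 2],     # only part of album A
    102: [4, 5, 1],  # all of album B plus a track from album A
    103: [4, 5],     # every track of album B
}

# album_id -> set of its track_ids.
album_tracks = defaultdict(set)
for track_id, album_id in track_album.items():
    album_tracks[album_id].add(track_id)


def is_album_purchase(track_ids):
    """True when the invoice holds exactly the tracks of one album."""
    bought = set(track_ids)
    # The candidate album comes from the invoice's lowest track_id.
    album = album_tracks[track_album[min(bought)]]
    # Both differences must be empty, like the two EXCEPT checks.
    return not (album - bought) and not (bought - album)


flags = {inv: is_album_purchase(tracks) for inv, tracks in invoices.items()}
share = sum(flags.values()) / len(flags)
print(flags, share)
```

Invoice 102 is rejected because its lowest `track_id` belongs to album A, so the extra track makes the two sets differ, exactly as the SQL version would treat it.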