# Business Questions & SQL

The goal of this project is to answer typical business questions by exploring a relational database (the SQLite file `chinook.db`) with SQL.

## Connect to database

In [1]:
%%capture
%load_ext sql
%sql sqlite:///chinook.db

'Connected: None@chinook.db'

## Database Overview

In [3]:
%%sql
SELECT
    name,
    type
FROM sqlite_master
WHERE type IN ('table', 'view');

Done.


name,type
album,table
artist,table
customer,table
employee,table
genre,table
invoice,table
invoice_line,table
media_type,table
playlist,table
playlist_track,table


In [10]:
%%sql
PRAGMA table_info(customer);

Done.


cid,name,type,notnull,dflt_value,pk
0,customer_id,INTEGER,1,,1
1,first_name,NVARCHAR(40),1,,0
2,last_name,NVARCHAR(20),1,,0
3,company,NVARCHAR(80),0,,0
4,address,NVARCHAR(70),0,,0
5,city,NVARCHAR(40),0,,0
6,state,NVARCHAR(40),0,,0
7,country,NVARCHAR(40),0,,0
8,postal_code,NVARCHAR(10),0,,0
9,phone,NVARCHAR(24),0,,0
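
The same schema inspection can be done without the `%%sql` magic, using Python's built-in `sqlite3` module. The sketch below is only illustrative: it runs against an in-memory database with a reduced, hypothetical `customer` table rather than the real `chinook.db`, so the column list is an assumption.

```python
import sqlite3

# In-memory stand-in for chinook.db; this reduced table is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY NOT NULL,
        first_name  NVARCHAR(40) NOT NULL,
        country     NVARCHAR(40)
    )
    """
)

# List tables and views, as the sqlite_master query above does.
objects = conn.execute(
    "SELECT name, type FROM sqlite_master WHERE type IN ('table', 'view')"
).fetchall()

# Describe the columns: each row is (cid, name, type, notnull, dflt_value, pk).
columns = conn.execute("PRAGMA table_info(customer)").fetchall()

for row in columns:
    print(row)

conn.close()
```

To run it against the real data, `sqlite3.connect("chinook.db")` would replace the in-memory connection and the `CREATE TABLE` statement would be dropped.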


## Data Exploration

### Analyzing sales and customer tastes

#### Which music genres sell best in the USA?

In [17]:
%%sql
WITH usa_track_sells AS 
    (
    SELECT il.*
      FROM invoice_line AS il
        INNER JOIN invoice AS i ON il.invoice_id = i.invoice_id
        INNER JOIN customer AS c ON i.customer_id = c.customer_id
      WHERE c.country = 'USA'
    )
    
SELECT g.name AS genre,
       COUNT(uts.invoice_line_id) AS tracks_sold,
       CAST(COUNT(uts.invoice_line_id) AS FLOAT) / (
           SELECT COUNT(*) FROM usa_track_sells) AS percentage_sold
FROM usa_track_sells AS uts
INNER JOIN track AS t ON t.track_id = uts.track_id
INNER JOIN genre AS g ON g.genre_id = t.genre_id
GROUP BY 1
ORDER BY 2 DESC
LIMIT 10;

Done.


genre,tracks_sold,percentage_sold
Rock,561,0.5337773549000951
Alternative & Punk,130,0.1236917221693625
Metal,124,0.1179828734538534
R&B/Soul,53,0.0504281636536631
Blues,36,0.0342530922930542
Alternative,35,0.033301617507136
Latin,22,0.0209324452901998
Pop,22,0.0209324452901998
Hip Hop/Rap,20,0.0190294957183634
Jazz,14,0.0133206470028544


The four best-selling genres in the USA are `Rock`, `Alternative & Punk`, `Metal` and `R&B/Soul`, which together account for about *83%* of tracks sold (the top three alone account for about 78%). Based on these results, a record label representing artists in the USA should be on the lookout for those genres.
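
To see how concentrated sales are, the cumulative share of the leading genres can be recomputed from the query output. This is a small sketch using only the figures printed above; the total number of USA tracks sold is not shown directly, so it is recovered here from Rock's count and its reported share.

```python
# (genre, tracks_sold) rows copied from the query output above.
rows = [
    ("Rock", 561),
    ("Alternative & Punk", 130),
    ("Metal", 124),
    ("R&B/Soul", 53),
    ("Blues", 36),
]

# Total USA tracks sold, recovered from Rock's count and its reported share.
total = round(561 / 0.5337773549000951)

running = 0
cumulative = []
for genre, sold in rows:
    running += sold
    # Cumulative share of all USA tracks sold, in percent.
    cumulative.append((genre, round(100 * running / total, 1)))

for genre, pct in cumulative:
    print(f"{genre:<20}{pct:>6}%")
```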

#### Sales by country

In [23]:
%%sql

WITH country_or_other AS
    (
     SELECT
       CASE
           WHEN (
                 SELECT count(*)
                 FROM customer
                 WHERE country = c.country
                ) = 1 THEN 'Other'
           ELSE c.country
       END AS country,
       c.customer_id,
       il.*
     FROM invoice_line AS il
     INNER JOIN invoice AS i ON i.invoice_id = il.invoice_id
     INNER JOIN customer AS c ON c.customer_id = i.customer_id
    )

SELECT
    country,
    customers,
    total_sales,
    average_order,
    customer_lifetime_value
FROM
    (
    SELECT
        country,
        COUNT(distinct customer_id) AS customers,
        ROUND(SUM(unit_price),2) AS total_sales,
        ROUND(SUM(unit_price) / COUNT(DISTINCT customer_id),2) AS customer_lifetime_value,
        ROUND(SUM(unit_price) / COUNT(DISTINCT invoice_id),2) AS average_order,
        CASE
            WHEN country = 'Other' THEN 1
            ELSE 0
        END AS sort
    FROM country_or_other
    GROUP BY country
    )
ORDER BY sort ASC, total_sales DESC;

Done.


country,customers,total_sales,average_order,customer_lifetime_value
USA,13,1040.49,7.94,80.04
Canada,8,535.59,7.05,66.95
Brazil,5,427.68,7.01,85.54
France,5,389.07,7.78,77.81
Germany,4,334.62,8.16,83.66
Czech Republic,2,273.24,9.11,136.62
United Kingdom,3,245.52,8.77,81.84
Portugal,2,185.13,6.38,92.57
India,2,183.15,8.72,91.58
Other,15,1094.94,7.45,73.0


The results suggest that `Czech Republic`, `United Kingdom` and `India` may hold business opportunities. Although their `total_sales` are far from the top, their `average_order` values are the highest (9.11, 8.77 and 8.72), so with appropriate strategies these markets might bring in higher revenue.
However, each of these countries has only two or three customers, so more data is needed to confirm that these differences are statistically meaningful.
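
The bucketing of single-customer countries into `Other` and the per-group metrics can be tried on a toy dataset. The sketch below is not the notebook's query: it uses an in-memory database with hypothetical rows, sums `invoice.total` instead of `invoice_line.unit_price` to stay short, and counts customers per country in a separate CTE (`country_size`, a name introduced here) rather than in a correlated subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, country TEXT);
    CREATE TABLE invoice (invoice_id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);

    -- Hypothetical data: two customers in USA, one each in Chile and Norway.
    INSERT INTO customer VALUES (1, 'USA'), (2, 'USA'), (3, 'Chile'), (4, 'Norway');
    INSERT INTO invoice VALUES
        (1, 1, 10.0), (2, 1, 6.0), (3, 2, 8.0),
        (4, 3, 5.0), (5, 4, 7.0), (6, 4, 9.0);
    """
)

query = """
WITH country_size AS (
    SELECT country, COUNT(*) AS n_customers
    FROM customer
    GROUP BY country
)
SELECT *
FROM (
    SELECT
        CASE WHEN cs.n_customers = 1 THEN 'Other' ELSE c.country END AS country_group,
        COUNT(DISTINCT c.customer_id) AS customers,
        ROUND(SUM(i.total), 2) AS total_sales,
        ROUND(SUM(i.total) / COUNT(DISTINCT i.invoice_id), 2) AS average_order,
        ROUND(SUM(i.total) / COUNT(DISTINCT c.customer_id), 2) AS customer_lifetime_value
    FROM invoice AS i
    INNER JOIN customer AS c ON c.customer_id = i.customer_id
    INNER JOIN country_size AS cs ON cs.country = c.country
    GROUP BY 1
)
-- Keep the 'Other' bucket last, then sort by sales.
ORDER BY (country_group = 'Other') ASC, total_sales DESC
"""

results = conn.execute(query).fetchall()
for row in results:
    print(row)

conn.close()
```

With so few rows per group, a single large invoice shifts `average_order` noticeably, which is the same small-sample caveat that applies to the countries with two or three customers.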

#### Which sells more: individual tracks or full albums?

In [25]:
%%sql

WITH invoice_first_track AS
    (
     SELECT
         il.invoice_id AS invoice_id,
         MIN(il.track_id) AS first_track_id
     FROM invoice_line AS il
     GROUP BY 1
    )

SELECT
    album_purchase,
    COUNT(invoice_id) AS number_of_invoices,
    ROUND(CAST(COUNT(invoice_id) AS FLOAT) / (
                                         SELECT COUNT(*) FROM invoice
                                      ),2) AS percent
FROM
    (
    SELECT
        ifs.*,
        CASE
            WHEN
                 (
                  SELECT t.track_id FROM track AS t
                  WHERE t.album_id = (
                                      SELECT t2.album_id FROM track AS t2
                                      WHERE t2.track_id = ifs.first_track_id
                                     ) 

                  EXCEPT 

                  SELECT il2.track_id FROM invoice_line AS il2
                  WHERE il2.invoice_id = ifs.invoice_id
                 ) IS NULL
             AND
                 (
                  SELECT il2.track_id FROM invoice_line AS il2
                  WHERE il2.invoice_id = ifs.invoice_id

                  EXCEPT 

                  SELECT t.track_id FROM track AS t
                  WHERE t.album_id = (
                                      SELECT t2.album_id FROM track AS t2
                                      WHERE t2.track_id = ifs.first_track_id
                                     ) 
                 ) IS NULL
             THEN 'yes'
             ELSE 'no'
         END AS "album_purchase"
     FROM invoice_first_track AS ifs
    )
GROUP BY album_purchase;

Done.


album_purchase,number_of_invoices,percent
no,500,0.81
yes,114,0.19


Invoices for individual tracks outnumber full-album purchases by roughly four to one (81% versus 19% of invoices).
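
The query treats an invoice as an album purchase when its set of tracks is exactly the set of tracks on the album of its first track, which it checks with `EXCEPT` in both directions. The sketch below reproduces that idea on a toy in-memory database with hypothetical rows. As a variation, it tests each `EXCEPT` result with `NOT EXISTS` rather than comparing a scalar subquery to `NULL`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE track (track_id INTEGER PRIMARY KEY, album_id INTEGER);
    CREATE TABLE invoice_line (invoice_id INTEGER, track_id INTEGER);

    -- Hypothetical data: album 1 = tracks 1-3, album 2 = tracks 4-5.
    INSERT INTO track VALUES (1, 1), (2, 1), (3, 1), (4, 2), (5, 2);
    -- Invoice 10: all of album 1. Invoice 11: a mix. Invoice 12: part of album 2.
    INSERT INTO invoice_line VALUES (10, 1), (10, 2), (10, 3), (11, 1), (11, 4), (12, 4);
    """
)

query = """
WITH invoice_first_track AS (
    SELECT invoice_id, MIN(track_id) AS first_track_id
    FROM invoice_line
    GROUP BY invoice_id
)
SELECT
    ift.invoice_id,
    CASE
        WHEN NOT EXISTS (
                 -- Album tracks that are missing from the invoice.
                 SELECT t.track_id FROM track AS t
                 WHERE t.album_id = (SELECT t2.album_id FROM track AS t2
                                     WHERE t2.track_id = ift.first_track_id)
                 EXCEPT
                 SELECT il.track_id FROM invoice_line AS il
                 WHERE il.invoice_id = ift.invoice_id
             )
         AND NOT EXISTS (
                 -- Invoice tracks that are not on the album.
                 SELECT il.track_id FROM invoice_line AS il
                 WHERE il.invoice_id = ift.invoice_id
                 EXCEPT
                 SELECT t.track_id FROM track AS t
                 WHERE t.album_id = (SELECT t2.album_id FROM track AS t2
                                     WHERE t2.track_id = ift.first_track_id)
             )
        THEN 'yes'
        ELSE 'no'
    END AS album_purchase
FROM invoice_first_track AS ift
ORDER BY ift.invoice_id
"""

results = conn.execute(query).fetchall()
for row in results:
    print(row)

conn.close()
```

Both directions are needed: the first check rejects partial albums (invoice 12), and the second rejects invoices that contain tracks from other albums (invoice 11).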

### Sales Department - Staff Performance

Let's see who's selling the most.

In [19]:
%%sql
WITH customer_support_rep_sales AS
    (
     SELECT
         i.customer_id,
         c.support_rep_id,
         SUM(i.total) AS total
     FROM invoice AS i
     INNER JOIN customer AS c ON i.customer_id = c.customer_id
     GROUP BY 1,2
    )

SELECT
    e.first_name || ' ' || e.last_name AS employee,
    e.hire_date,
    ROUND(SUM(csrs.total),2) AS total_sales
FROM customer_support_rep_sales AS csrs
INNER JOIN employee AS e ON e.employee_id = csrs.support_rep_id
GROUP BY 1, 2
ORDER BY total_sales DESC;

Done.


employee,hire_date,total_sales
Jane Peacock,2017-04-01 00:00:00,1731.51
Margaret Park,2017-05-03 00:00:00,1584.0
Steve Johnson,2017-10-17 00:00:00,1393.92


`Jane Peacock`'s `total_sales` are only about 9% higher than `Margaret Park`'s, and about 20% separates the top and bottom of the table, but the hire dates differ, which management must take into account when measuring sales performance.
`Steve Johnson` in particular seems to be doing well: he was hired more than five months after his colleagues, yet his `total_sales` are only about 12% lower than those of second-placed `Margaret Park`.
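
As a rough check on this comparison, the relative gaps in `total_sales` and the gaps between hire dates can be recomputed from the result table. The sketch below uses only the values printed above; `pct_gap` is a helper introduced here for illustration.

```python
from datetime import date

# (employee, hire_date, total_sales) copied from the result table above.
staff = [
    ("Jane Peacock", date(2017, 4, 1), 1731.51),
    ("Margaret Park", date(2017, 5, 3), 1584.00),
    ("Steve Johnson", date(2017, 10, 17), 1393.92),
]


def pct_gap(value, reference):
    """Signed difference of value relative to reference, in percent."""
    return round(100 * (value - reference) / reference, 1)


jane, margaret, steve = staff

jane_vs_margaret = pct_gap(jane[2], margaret[2])
steve_vs_margaret = pct_gap(steve[2], margaret[2])
steve_vs_jane = pct_gap(steve[2], jane[2])

# How much later Steve Johnson was hired than each colleague, in days.
days_after_margaret = (steve[1] - margaret[1]).days
days_after_jane = (steve[1] - jane[1]).days

print(jane_vs_margaret, steve_vs_margaret, steve_vs_jane)
print(days_after_margaret, days_after_jane)
```

A fairer ranking would divide each total by the time the employee has been active, but that requires a reference date (for example the date of the latest invoice), which the table above does not show.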