# Daily Queries 5

In [1]:
import pandas as pd
import psycopg2
from sqlalchemy import create_engine, inspect
from IPython.display import display
from pprint import pprint

<a class="anchor" id="0_toc"></a>
# Table of Contents
***

1. [SQL Engine](#1-engine)
2. [Tables](#2-tables)
3. [Queries](#3-queries)
    1. [What region had the most units sold for pencil?](#q1)

    2. [For each sale above the minimum price the rep gets 10% commission of the total price as a bonus.  
    How many transactions weren’t calculated for commissions?](#q2)

    3. [For West region what was the share of each sales rep in the data?](#q3)

    4. [For the item Binder what was the share of each sales rep in the data?](#q4)

    5. [For Jardine what was the share of each item sold in the data?](#q5)

    6. [Which item is the most profitable to sell?  
    Get the percentage of the diff from unit price to minimum and rank it for the entire data](#q6)

    7. [Which sales rep had the highest price difference from the minimum price on these items combined: pen, pen set and pencil?](#q7)

<a class="anchor" id="1-engine"></a>
## SQL Engine
***
[back to Table of Contents](#0_toc)

In [2]:
db_name = 'data-analyst-sales-data-db'

In [3]:
db_config = {'user': 'practicum_student',         # username
             'pwd': 's65BlTKV3faNIGhmvJVzOqhs', # password
             'host': 'rc1b-wcoijxj3yxfsf3fs.mdb.yandexcloud.net',
             'port': 6432,              # connection port
             'db': db_name}          # the name of the database

connection_string = 'postgresql://{}:{}@{}:{}/{}'.format(
    db_config['user'],
    db_config['pwd'],
    db_config['host'],
    db_config['port'],
    db_config['db'],
)

engine = create_engine(connection_string, connect_args={'sslmode':'require'})
inspector = inspect(engine)
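
The connection string above embeds the password directly in the notebook. A minimal sketch of one alternative follows, assuming the password is supplied through an environment variable. The variable name `SALES_DB_PASSWORD`, the helper `build_connection_string`, and the placeholder host are hypothetical and not part of the original notebook. The credentials are URL-escaped so that special characters do not break the URL.

```python
import os
from urllib.parse import quote_plus


def build_connection_string(user, password, host, port, db):
    """Build a PostgreSQL URL, escaping characters that are special in URLs."""
    return 'postgresql://{}:{}@{}:{}/{}'.format(
        quote_plus(user), quote_plus(password), host, port, db)


# SALES_DB_PASSWORD is a hypothetical variable name; the fallback value is a
# placeholder used only for illustration.
password = os.environ.get('SALES_DB_PASSWORD', 'example-p@ss')

example_url = build_connection_string(
    'practicum_student', password, 'example-host', 6432,
    'data-analyst-sales-data-db')
print(example_url)
```

The resulting string could then be passed to `create_engine` in the same way as `connection_string`.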

In [4]:
def read_schema(table_name):
    return pd.DataFrame(inspector.get_columns(table_name)).rename_axis(table_name, axis=1)

In [5]:
def execute_query(q):
    return pd.io.sql.read_sql(q, con=engine)

<a class="anchor" id="2-tables"></a>
## Inspect tables
***
[back to Table of Contents](#0_toc)

In [6]:
tables = inspector.get_table_names()
tables

['item', 'rep_sales']

In [7]:
for table in tables:
    display(read_schema(table))

item,name,type,nullable,default,autoincrement,comment
0,Item,TEXT,True,,False,
1,MinPrice,"NUMERIC(5, 2)",True,,False,


rep_sales,name,type,nullable,default,autoincrement,comment
0,OrderDate,TIMESTAMP,True,,False,
1,Region,TEXT,True,,False,
2,Rep,TEXT,True,,False,
3,Item,TEXT,True,,False,
4,Units,INTEGER,True,,False,
5,Unit_Cost,NUMERIC,True,,False,
6,Total,NUMERIC,True,,False,


In [8]:
for table in tables:
    display(execute_query(f'SELECT * FROM {table} LIMIT 1'))

Unnamed: 0,Item,MinPrice
0,Binder,1.99


Unnamed: 0,OrderDate,Region,Rep,Item,Units,Unit_Cost,Total
0,2019-01-23,Central,Kivell,Binder,50,19.99,999.5


<a class="anchor" id="3-queries"></a>
## Queries
***
[back to Table of Contents](#0_toc)

1. [What region had the most units sold for pencil?](#q1)

2. [For each sale above the minimum price the rep gets 10% commission of the total price as a bonus.  
How many transactions weren’t calculated for commissions?](#q2)

3. [For West region what was the share of each sales rep in the data?](#q3)

4. [For the item Binder what was the share of each sales rep in the data?](#q4)

5. [For Jardine what was the share of each item sold in the data?](#q5)

6. [Which item is the most profitable to sell?  
Get the percentage of the diff from unit price to minimum and rank it for the entire data](#q6)

7. [Which sales rep had the highest price difference from the minimum price on these items combined: pen, pen set and pencil?](#q7)

<a class="anchor" id="q1"></a>
### 1. What region had the most units sold for pencil?
[up](#3-queries)

In [9]:
execute_query("""
SELECT
    "Region",
    SUM("Units") AS Units
FROM
    rep_sales
WHERE
    "Item" iLike 'pencil'
GROUP BY
    "Region"
ORDER BY
    Units DESC
""")

Unnamed: 0,Region,units
0,Central,498
1,East,258
2,West,88


The Central region sold the most pencils (498 units).

<a class="anchor" id="q2"></a>
### 2. For each sale above the minimum price the rep gets 10% commission of the total price as a bonus. <br/>How many transactions weren’t calculated for commissions?  
[up](#3-queries)

In [6]:
execute_query("""
SELECT
    *
FROM
    rep_sales
        LEFT JOIN (SELECT DISTINCT * FROM item) item
            ON item."Item" = rep_sales."Item"
WHERE
    "Unit_Cost" <= "MinPrice"
""")

Unnamed: 0,OrderDate,Region,Rep,Item,Units,Unit_Cost,Total,Item.1,MinPrice
0,2019-07-12,East,Howard,Binder,29,1.99,57.71,Binder,1.99
1,2019-09-01,Central,Smith,Desk,2,125.0,250.0,Desk,125.0
2,2020-06-17,Central,Kivell,Desk,5,125.0,625.0,Desk,125.0
3,2019-11-25,Central,Kivell,Pen Set,96,4.99,479.04,Pen Set,4.99
4,2020-03-24,Central,Jardine,Pen Set,50,4.99,249.5,Pen Set,4.99
5,2020-07-04,East,Jones,Pen Set,62,4.99,309.38,Pen Set,4.99
6,2019-12-12,Central,Smith,Pencil,67,1.29,86.43,Pencil,1.29
7,2020-05-14,Central,Gill,Pencil,53,1.29,68.37,Pencil,1.29
8,2020-09-10,Central,Gill,Pencil,7,1.29,9.03,Pencil,1.29
9,2020-10-31,Central,Andrews,Pencil,14,1.29,18.06,Pencil,1.29


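10 transactions weren't calculated for commissions: the query returns 10 rows in which the unit price does not exceed the minimum price.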
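
The count can also be returned by the query itself rather than read off the row listing. The sketch below is illustrative only: it uses an in-memory SQLite database instead of the PostgreSQL server, and a three-row subset copied from the outputs above, so the count it prints applies to that subset and not to the full data. The duplicated `item` row is an assumption, suggested by the `SELECT DISTINCT` subquery in the original query.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript("""
CREATE TABLE item ("Item" TEXT, "MinPrice" NUMERIC);
CREATE TABLE rep_sales ("Rep" TEXT, "Item" TEXT, "Units" INTEGER,
                        "Unit_Cost" NUMERIC, "Total" NUMERIC);
-- Binder appears twice on purpose (assumed duplicate row).
INSERT INTO item VALUES ('Binder', 1.99), ('Binder', 1.99), ('Desk', 125.0);
INSERT INTO rep_sales VALUES
    ('Kivell', 'Binder', 50, 19.99, 999.5),
    ('Howard', 'Binder', 29, 1.99, 57.71),
    ('Smith',  'Desk',    2, 125.0, 250.0);
""")

# A sale earns commission only when the unit price is above the minimum,
# so the sales without commission are those with "Unit_Cost" <= "MinPrice".
no_commission = conn.execute("""
SELECT COUNT(*)
FROM rep_sales
    LEFT JOIN (SELECT DISTINCT * FROM item) item
        ON item."Item" = rep_sales."Item"
WHERE "Unit_Cost" <= "MinPrice"
""").fetchone()[0]

print(no_commission)
```

For this subset the query counts 2 rows (Howard's Binder sale and Smith's Desk sale). Without the `DISTINCT`, the duplicated Binder row would make each Binder sale match twice.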

<a class="anchor" id="q3"></a>
### 3. For West region what was the share of each sales rep in the data?
[up](#3-queries)

In [11]:
execute_query("""
SELECT
    *,
    ROUND(sales / SUM(sales) OVER(), 2) AS share
FROM
    (SELECT
        "Rep",
        SUM("Total") AS sales
    FROM
        rep_sales
    WHERE
        "Region" = 'West'
    GROUP BY
        "Rep"
    ) AS subq
ORDER BY
    share DESC
""")

Unnamed: 0,Rep,sales,share
0,Thompson,3060.23,0.61
1,Sorvino,1922.65,0.39
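
As a cross-check of the `share` column, the arithmetic can be reproduced from the two `sales` figures in the output above. This is a plain-Python sketch and not part of the original notebook; `Decimal` is used here to mirror the `NUMERIC` column type.

```python
from decimal import Decimal, ROUND_HALF_UP

# Sales totals per rep for the West region, copied from the output above.
sales = {'Thompson': Decimal('3060.23'), 'Sorvino': Decimal('1922.65')}

region_total = sum(sales.values())  # plays the role of SUM(sales) OVER ()

shares = {
    rep: (amount / region_total).quantize(Decimal('0.01'), rounding=ROUND_HALF_UP)
    for rep, amount in sales.items()
}
print(shares)
```

The region total is 4982.88, which gives 0.61 for Thompson and 0.39 for Sorvino, matching the query result.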


<a class="anchor" id="q4"></a>
### 4. For the item Binder what was the share of each sales rep in the data?
[up](#3-queries)

In [12]:
execute_query("""
SELECT
    *,
    ROUND(sales / SUM(sales) OVER(), 2) AS share
FROM
    (SELECT
        "Rep",
        SUM("Total") AS sales
    FROM
        rep_sales
    WHERE
       "Item" = 'Binder'
    GROUP BY
        "Rep"
    ) AS subq
ORDER BY
    share DESC
""")

Unnamed: 0,Rep,sales,share
0,Jones,1386.52,0.17
1,Gill,1132.74,0.14
2,Jardine,1054.09,0.13
3,Smith,952.0,0.12
4,Kivell,999.5,0.12
5,Parent,935.48,0.11
6,Thompson,832.0,0.1
7,Joe,347.71,0.04
8,Morgan,251.72,0.03
9,Andrews,139.72,0.02


<a class="anchor" id="q5"></a>
### 5. For Jardine what was the share of each item sold in the data?
[up](#3-queries)

In [13]:
execute_query("""
SELECT
    *,
    ROUND(total / SUM(total) OVER(), 2) AS share
FROM
    (SELECT
        "Item",
        SUM("Total") AS total
    FROM
        rep_sales
    WHERE
       "Rep" = 'Jardine'
    GROUP BY
        "Item"
    ) AS subq
ORDER BY
    share DESC
""")

Unnamed: 0,Item,total,share
0,Binder,1054.09,0.35
1,Marker Set,983.18,0.33
2,Pencil,628.74,0.21
3,Pen Set,249.5,0.08
4,Pen,91.29,0.03
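
Queries 3, 4 and 5 share one pattern: filter on one column, total `"Total"` by another, then divide by the grand total. A hedged sketch of how that pattern could be wrapped in one function follows. The function name `share_by`, the column whitelist, and the use of an in-memory SQLite database with three Pen Set rows copied from the query 2 output are all assumptions made for illustration; the original notebook runs against PostgreSQL through `execute_query`.

```python
import sqlite3

# Identifiers cannot be bound as query parameters, so they are whitelisted.
ALLOWED_COLUMNS = {'Region', 'Rep', 'Item'}


def share_by(conn, group_col, filter_col, filter_value):
    """Return [(group, total, share)] sorted by share, largest first."""
    if group_col not in ALLOWED_COLUMNS or filter_col not in ALLOWED_COLUMNS:
        raise ValueError('unexpected column name')
    rows = conn.execute(
        f'SELECT "{group_col}", SUM("Total") FROM rep_sales '
        f'WHERE "{filter_col}" = ? GROUP BY "{group_col}"',
        (filter_value,),  # the filter value is bound, not interpolated
    ).fetchall()
    grand_total = sum(total for _, total in rows)
    result = [(name, total, round(total / grand_total, 2)) for name, total in rows]
    return sorted(result, key=lambda row: row[2], reverse=True)


conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE rep_sales ("Region" TEXT, "Rep" TEXT, "Item" TEXT, "Total" REAL)')
conn.executemany(
    'INSERT INTO rep_sales VALUES (?, ?, ?, ?)',
    [('Central', 'Kivell', 'Pen Set', 479.04),
     ('Central', 'Jardine', 'Pen Set', 249.5),
     ('East', 'Jones', 'Pen Set', 309.38)],
)

for row in share_by(conn, 'Rep', 'Item', 'Pen Set'):
    print(row)
```

Calling `share_by(conn, 'Item', 'Rep', 'Jardine')` would follow the shape of query 5, and `share_by(conn, 'Rep', 'Region', 'West')` the shape of query 3.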


<a class="anchor" id="q6"></a>
### 6. Which item is the most profitable to sell? <br />Get the percentage of the diff from unit price to minimum and rank it for the entire data
[up](#3-queries)

In [14]:
execute_query("""
SELECT
    *,
    RANK() OVER (ORDER BY diff_pct DESC)
FROM
    (SELECT
        "Item",
        ROUND(total * 100 / SUM(total) OVER (), 2)  AS diff_pct
    FROM
        (SELECT
            item."Item",
            SUM("Unit_Cost" - "MinPrice") AS total
        FROM
            rep_sales
                LEFT JOIN (SELECT DISTINCT * FROM item) item
                    ON item."Item" = rep_sales."Item"
        GROUP BY
            item."Item"
        ) AS subq
    ORDER BY
        diff_pct DESC) AS subq_2
""")

Unnamed: 0,Item,diff_pct,rank
0,Desk,57.09,1
1,Binder,21.65,2
2,Pen,6.42,3
3,Pen Set,6.11,4
4,Marker Set,6.0,5
5,Pencil,2.72,6


"Desk" ranks first: it accounts for 57.09% of the summed difference between unit price and minimum price.

<div class="alert alert-info"> <b>Student comment:</b> <br />
    Please comment on the query below: is this a viable method for determining the profitability of an item?
</div>

In [15]:
execute_query("""
SELECT
    *,
    ROUND(profit_per_unit * 100 / SUM(profit_per_unit) OVER (), 2)  AS diff_pct
FROM
    (SELECT
        item."Item",
        SUM("Unit_Cost" - "MinPrice") AS profit,
        SUM("Units") AS units,
        SUM("Unit_Cost" - "MinPrice") / SUM("Units") AS profit_per_unit
    FROM
        rep_sales
            LEFT JOIN (SELECT DISTINCT * FROM item) item
                ON item."Item" = rep_sales."Item"
    GROUP BY
        item."Item"
    ) AS subq
ORDER BY
    profit_per_unit DESC
""")

Unnamed: 0,Item,profit,units,profit_per_unit,diff_pct
0,Desk,418.0,23,18.173913,97.52
1,Binder,158.52,708,0.223898,1.2
2,Pen Set,44.77,510,0.087784,0.47
3,Pen,47.01,723,0.065021,0.35
4,Marker Set,43.96,717,0.061311,0.33
5,Pencil,19.9,844,0.023578,0.13
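
One point that may be worth checking when judging the query above: `SUM("Unit_Cost" - "MinPrice")` adds one price difference per transaction regardless of how many units were sold, while the divisor `SUM("Units")` counts units. The sketch below shows a possible alternative, not the notebook's method: it weights each difference by `Units` before dividing. It uses two Binder rows copied from the outputs above, so the figures apply to that subset only.

```python
from decimal import Decimal

min_price = Decimal('1.99')  # Binder minimum price from the item table

# (units, unit_cost) for two Binder transactions shown earlier.
binder_sales = [(50, Decimal('19.99')), (29, Decimal('1.99'))]

# One difference per transaction, as in SUM("Unit_Cost" - "MinPrice").
unweighted_diff = sum(cost - min_price for _, cost in binder_sales)

# Each difference multiplied by the units sold in that transaction.
weighted_diff = sum((cost - min_price) * units for units, cost in binder_sales)

total_units = sum(units for units, _ in binder_sales)

print(unweighted_diff / total_units)  # what the query's profit_per_unit computes
print(weighted_diff / total_units)    # unit-weighted margin per unit
```

On these two rows the unweighted difference is 18.00 and the unit-weighted difference is 900.00, both divided by 79 units, so the two approaches can diverge widely when transaction sizes differ.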


<a class="anchor" id="q7"></a>
### 7. Which sales rep had the highest price difference from the minimum price on these items combined: <br />pen, pen set and pencil?
[up](#3-queries)

In [16]:
execute_query("""
SELECT
    "Rep",
    SUM("Unit_Cost" - "MinPrice") AS price_diff
FROM
    rep_sales
        LEFT JOIN (SELECT DISTINCT * FROM item) item
            ON item."Item" = rep_sales."Item"
WHERE
    LOWER(item."Item") IN ('pen', 'pen set', 'pencil')
GROUP BY
    "Rep"
ORDER BY
    price_diff DESC
""")

Unnamed: 0,Rep,price_diff
0,Parent,26.24
1,Jones,22.64
2,Gill,17.98
3,Morgan,13.46
4,Kivell,12.01
5,Jardine,7.44
6,Thompson,4.54
7,Howard,3.24
8,Sorvino,1.94
9,Andrews,1.15


"Parent" had the highest combined price difference (26.24) on the selected items.