# Daily Queries 3

In [1]:
import pandas as pd
import psycopg2
from sqlalchemy import create_engine, inspect
from IPython.display import display
from pprint import pprint

<div class="alert alert-info"> <b>Student comment v. 2:</b> <br />
    <a href="#q9">updated #9</a>
</div>

<a class="anchor" id="0_toc"></a>
# Table of Contents
***

1. [SQL Engine](#1-engine)
2. [Tables](#2-tables)
3. [Queries](#3-queries)
    1. [What is the total income in the data?](#q1)
    2. [Which sales rep brought the most income?](#q2)
    3. [Which item brought the most income?](#q3)
    4. [Which region sold the most desks?](#q4)
    5. [Which sales rep sold the highest average price a piece for the item pen?](#q5)
    6. [Which sales rep sold the most units between April – September of 2020?](#q6)
    7. [Which sales rep had the highest price difference from the minimum price on the item pen set?](#q7)
    8. [Was there a sales rep that sold an item below the minimum price?  
    If so, which item was it and what was the percent difference from the minimum price?](#q8)
    9. [Which region had the highest share of price difference?  
    (where percent price difference is the amount above min price divided by the total amount from that region)](#q9)
    10. [For each sale above the minimum price the rep gets 10% commission of the total price as a bonus.  
    Which 3 sales reps have earned the most commission (show their commission as well)?](#q10)

<a class="anchor" id="1-engine"></a>
## SQL Engine
***
[back to Table of Contents](#0_toc)

In [2]:
db_name = 'data-analyst-sales-data-db'

In [3]:
db_config = {'user': 'practicum_student',         # username
             'pwd': 's65BlTKV3faNIGhmvJVzOqhs', # password
             'host': 'rc1b-wcoijxj3yxfsf3fs.mdb.yandexcloud.net',
             'port': 6432,              # connection port
             'db': db_name}          # the name of the database

connection_string = 'postgresql://{}:{}@{}:{}/{}'.format(db_config['user'],
                                                         db_config['pwd'],
                                                         db_config['host'],
                                                         db_config['port'],
                                                         db_config['db'])

engine = create_engine(connection_string, connect_args={'sslmode':'require'})
inspector = inspect(engine)
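
The connection string can also be assembled from environment variables so that credentials stay out of the notebook. This is only a minimal sketch: the variable names `DB_USER`, `DB_PASSWORD`, `DB_HOST`, `DB_PORT` and `DB_NAME` and the helper `build_connection_string` are hypothetical and not part of the original setup.

```python
import os
from urllib.parse import quote_plus


def build_connection_string(env=os.environ):
    """Assemble a PostgreSQL URL from environment variables (hypothetical names)."""
    user = env["DB_USER"]
    # Percent-encode the password so characters such as '@' or '/' do not break the URL.
    pwd = quote_plus(env["DB_PASSWORD"])
    host = env["DB_HOST"]
    port = env.get("DB_PORT", "6432")  # default port assumed from the config above
    db = env["DB_NAME"]
    return f"postgresql://{user}:{pwd}@{host}:{port}/{db}"


# Illustrative values only; a real run would rely on os.environ.
example_env = {
    "DB_USER": "student",
    "DB_PASSWORD": "p@ss",
    "DB_HOST": "localhost",
    "DB_NAME": "sales",
}
print(build_connection_string(example_env))
```

The resulting string could then be passed to `create_engine` in the same way as `connection_string`.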

In [4]:
def read_schema(table_name):
    """Return the column metadata of `table_name` as a DataFrame."""
    return pd.DataFrame(inspector.get_columns(table_name)).rename_axis(table_name, axis=1)

In [5]:
def execute_query(q):
    """Run SQL query `q` against the engine and return the result as a DataFrame."""
    return pd.read_sql(q, con=engine)

<a class="anchor" id="2-tables"></a>
## Inspect tables
***
[back to Table of Contents](#0_toc)

In [6]:
tables = inspector.get_table_names()
tables

['item', 'rep_sales']

In [7]:
for table in tables:
    display(read_schema(table))

item,name,type,nullable,default,autoincrement,comment
0,Item,TEXT,True,,False,
1,MinPrice,"NUMERIC(5, 2)",True,,False,


rep_sales,name,type,nullable,default,autoincrement,comment
0,OrderDate,TIMESTAMP,True,,False,
1,Region,TEXT,True,,False,
2,Rep,TEXT,True,,False,
3,Item,TEXT,True,,False,
4,Units,INTEGER,True,,False,
5,Unit_Cost,NUMERIC,True,,False,
6,Total,NUMERIC,True,,False,


In [8]:
for table in tables:
    display(execute_query(f'SELECT * FROM {table} LIMIT 1'))

Unnamed: 0,Item,MinPrice
0,Binder,1.99


Unnamed: 0,OrderDate,Region,Rep,Item,Units,Unit_Cost,Total
0,2019-01-23,Central,Kivell,Binder,50,19.99,999.5


### Check data

In [9]:
execute_query("""
SELECT
    COUNT(1)
FROM
    item
""")

Unnamed: 0,count
0,12


In [10]:
items_df = execute_query("""
SELECT
    *
FROM
    item
""")
items_df

Unnamed: 0,Item,MinPrice
0,Binder,1.99
1,Desk,125.0
2,Pen,1.75
3,Pen Set,4.99
4,Pencil,1.29
5,Marker Set,5.99
6,Binder,1.99
7,Desk,125.0
8,Pen,1.75
9,Pen Set,4.99


In [11]:
items_df.duplicated().sum()

6

There are 6 duplicated rows in `item`, so I will use DISTINCT in future joins.
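
To illustrate why the duplicates matter, here is a small sketch of the join fan-out using an in-memory SQLite database instead of the course PostgreSQL instance. The rows are made up for illustration and the tables are reduced to the columns needed; only the table and column names follow the schema above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item (Item TEXT, MinPrice REAL);
CREATE TABLE rep_sales (Rep TEXT, Item TEXT, Total REAL);
-- 'Pen' appears twice in the lookup table, 'Desk' once (toy data)
INSERT INTO item VALUES ('Pen', 1.75), ('Pen', 1.75), ('Desk', 125.0);
INSERT INTO rep_sales VALUES ('A', 'Pen', 10.0), ('B', 'Desk', 250.0);
""")

# Joining against the raw lookup table repeats every sale of a duplicated item.
naive_rows = conn.execute("""
    SELECT COUNT(*)
    FROM rep_sales
    JOIN item ON item.Item = rep_sales.Item
""").fetchone()[0]

# Deduplicating the lookup table first keeps one row per sale.
dedup_rows = conn.execute("""
    SELECT COUNT(*)
    FROM rep_sales
    JOIN (SELECT DISTINCT Item, MinPrice FROM item) AS i
        ON i.Item = rep_sales.Item
""").fetchone()[0]

print(naive_rows, dedup_rows)  # 3 2
```

Any `SUM` computed over the naive join would count the duplicated sale twice, which is why the later queries deduplicate.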

In [12]:
execute_query("""
SELECT
    COUNT(1)
FROM
    rep_sales
""")

Unnamed: 0,count
0,69


In [13]:
execute_query("""
SELECT
    *
FROM
    rep_sales
ORDER BY
    "OrderDate"
""").duplicated().sum()

0

<a class="anchor" id="3-queries"></a>
## Queries
***
[back to Table of Contents](#0_toc)

1. [What is the total income in the data?](#q1)
2. [Which sales rep brought the most income?](#q2)
3. [Which item brought the most income?](#q3)
4. [Which region sold the most desks?](#q4)
5. [Which sales rep sold the highest average price a piece for the item pen?](#q5)
6. [Which sales rep sold the most units between April – September of 2020?](#q6)
7. [Which sales rep had the highest price difference from the minimum price on the item pen set?](#q7)
8. [Was there a sales rep that sold an item below the minimum price?  
If so, which item was it and what was the percent difference from the minimum price?](#q8)
9. [Which region had the highest share of price difference?  
(where percent price difference is the amount above min price divided by the total amount from that region)](#q9)
10. [For each sale above the minimum price the rep gets 10% commission of the total price as a bonus.  
Which 3 sales reps have earned the most commission (show their commission as well)?](#q10)

<a class="anchor" id="q1"></a>
### 1. What is the total income in the data?
[up](#3-queries)

In [14]:
execute_query("""
SELECT
    SUM("Total")
FROM
    rep_sales
""")

Unnamed: 0,sum
0,28867.97


<div class="alert alert-success" role="alert">
  Great!
</div>

<a class="anchor" id="q2"></a>
### 2. Which sales rep brought the most income?
[up](#3-queries)

In [15]:
top = 5

execute_query(f"""
SELECT
    "Rep",
    SUM("Total")
FROM
    rep_sales
GROUP BY
    "Rep"
ORDER BY
    sum DESC
LIMIT {top}
""")

Unnamed: 0,Rep,sum
0,Kivell,3554.23
1,Thompson,3060.23
2,Jardine,3006.8
3,Jones,2969.49
4,Morgan,2669.98


"Kivell" has the highest total sales.

<div class="alert alert-success" role="alert">
  Great!
</div>

<a class="anchor" id="q3"></a>
### 3. Which item brought the most income?
[up](#3-queries)

In [16]:
top = 3

execute_query(f"""
SELECT
    "Item",
    SUM("Total")
FROM
    rep_sales
GROUP BY
    "Item"
ORDER BY
    sum DESC
LIMIT {top}
""")

Unnamed: 0,Item,sum
0,Binder,8229.12
1,Marker Set,6618.71
2,Pen Set,4683.66


"Binder" has the highest total income.

<div class="alert alert-success" role="alert">
  Great!
</div>

<a class="anchor" id="q4"></a>
### 4. Which region sold the most desks?
[up](#3-queries)

In [17]:
execute_query("""
SELECT
    "Region",
    SUM("Units") AS units
FROM
    rep_sales
WHERE
    "Item" iLIKE '%%desk%%'
GROUP BY
    "Region"
ORDER BY
    units DESC
""")

Unnamed: 0,Region,units
0,West,9
1,East,7
2,Central,7


The "West" region sold the most desks (9).

<div class="alert alert-danger" role="alert">
 <del>You actually counted the transactions that included desks, not the number of desks actually sold. You are close, though.</del>
</div>

<div class="alert alert-success" role="alert">
  Great!
</div>
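
The distinction raised in the review (number of transactions versus number of units) can be sketched on toy data. The example below uses an in-memory SQLite table with made-up rows; the helper `top_region` is hypothetical and only swaps the aggregate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rep_sales (Region TEXT, Item TEXT, Units INTEGER)")
conn.executemany(
    "INSERT INTO rep_sales VALUES (?, ?, ?)",
    [
        ("East", "Desk", 1),  # three small orders
        ("East", "Desk", 1),
        ("East", "Desk", 1),
        ("West", "Desk", 5),  # one large order
    ],
)


def top_region(aggregate):
    """Return the (Region, value) pair ranked first under the given aggregate."""
    query = f"""
        SELECT Region, {aggregate} AS value
        FROM rep_sales
        WHERE Item = 'Desk'
        GROUP BY Region
        ORDER BY value DESC
        LIMIT 1
    """
    return conn.execute(query).fetchone()


by_transactions = top_region("COUNT(*)")  # counts orders
by_units = top_region("SUM(Units)")       # counts desks sold
print(by_transactions, by_units)  # ('East', 3) ('West', 5)
```

On this toy data the two aggregates pick different regions, so `SUM("Units")` is the one that answers "most desks".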

<a class="anchor" id="q5"></a>
### 5. Which sales rep sold the highest average price a piece for the item pen?
[up](#3-queries)

In [18]:
execute_query("""
SELECT
    "Rep" AS rep,
    SUM("Units") AS units,
    SUM("Total") AS total,
    AVG("Total" / "Units") AS average_item_price
FROM
    rep_sales
WHERE
    "Item" = 'Pen'
GROUP BY
    "Rep"
ORDER BY
    average_item_price DESC
""")

Unnamed: 0,rep,units,total,average_item_price
0,Parent,15,299.85,19.99
1,Gill,112,666.38,10.74
2,Jones,64,575.36,8.99
3,Howard,96,479.04,4.99
4,Joe,34,74.46,2.19
5,Sorvino,76,151.24,1.99
6,Jardine,51,91.29,1.79
7,Thompson,210,333.9,1.59
8,Andrews,65,97.5,1.5


<div class="alert alert-success" role="alert">
  Great!
</div>

"Parent" sold pens at the highest average price per item (19.99).

<a class="anchor" id="q6"></a>
### 6. Which sales rep sold the most units between April – September of 2020?
[up](#3-queries)

In [19]:
execute_query("""
SELECT
    "Rep",
    SUM("Units") AS units,
    SUM("Total") AS total
FROM
    rep_sales
WHERE
    DATE_TRUNC('month', "OrderDate") BETWEEN '2020-04-01' AND '2020-09-01'
GROUP BY
    "Rep"
ORDER BY
    units DESC
""")

Unnamed: 0,Rep,units,total
0,Andrews,309,1562.06
1,Gill,256,1170.94
2,Thompson,245,648.55
3,Howard,99,1079.04
4,Kivell,94,2075.69
5,Jones,86,837.14
6,Sorvino,79,976.24
7,Morgan,55,686.95
8,Joe,38,966.46


"Andrews" sold the most (309) units between April and September of 2020.

<div class="alert alert-success" role="alert">
  Great!
</div>
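
The `DATE_TRUNC(...) BETWEEN` filter above keeps every timestamp whose month start lies between April 1 and September 1, 2020. The sketch below checks in plain Python that this matches a half-open range on the raw timestamp (from April 1 inclusive to October 1 exclusive). The function names and sample timestamps are made up for illustration.

```python
from datetime import datetime

START = datetime(2020, 4, 1)
LAST_MONTH_START = datetime(2020, 9, 1)
END_EXCLUSIVE = datetime(2020, 10, 1)


def in_range_truncated(ts):
    """Mimic DATE_TRUNC('month', ts) BETWEEN '2020-04-01' AND '2020-09-01'."""
    month_start = ts.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    return START <= month_start <= LAST_MONTH_START


def in_range_half_open(ts):
    """Equivalent half-open comparison on the untouched timestamp."""
    return START <= ts < END_EXCLUSIVE


samples = [
    datetime(2020, 3, 31, 23, 59),     # just before the range
    datetime(2020, 4, 1),              # first instant of the range
    datetime(2020, 9, 30, 23, 59, 59), # last second of September
    datetime(2020, 10, 1),             # first instant after the range
]
results = [(in_range_truncated(ts), in_range_half_open(ts)) for ts in samples]
print(results)
```

In SQL the half-open form would read `"OrderDate" >= '2020-04-01' AND "OrderDate" < '2020-10-01'`; since it compares the column directly, it may let the database use an index on `"OrderDate"` if one exists.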

<a class="anchor" id="q7"></a>
### 7. Which sales rep had the highest price difference from the minimum price on the item pen set?
[up](#3-queries)

In [20]:
execute_query("""
SELECT
    *,
    max - min AS diff
FROM
    (SELECT
        "Rep",
        MIN("Unit_Cost"),
        MAX("Unit_Cost")
    FROM
        rep_sales
    WHERE
        "Item" iLIKE 'pen set'
    GROUP BY
        "Rep"
    ) AS subq
ORDER BY
    diff DESC
""")

Unnamed: 0,Rep,min,max,diff
0,Kivell,4.99,17.0,12.01
1,Jones,4.99,15.99,11.0
2,Morgan,7.25,12.49,5.24
3,Parent,12.99,12.99,0.0
4,Thompson,8.99,8.99,0.0
5,Jardine,4.99,4.99,0.0


"Kivell" had the highest (12.01) unit price difference on Pen Set.

<div class="alert alert-success" role="alert">
  Great!
</div>

<a class="anchor" id="q8"></a>
### 8. Was there a sales rep that sold an item below the minimum price? <br />&nbsp;&nbsp;&nbsp;&nbsp;If so, which item was it and what was the percent difference from the minimum price?
[up](#3-queries)

In [21]:
execute_query("""
SELECT DISTINCT
    *,
    ROUND(("MinPrice" - "Unit_Cost") * 100.0 / "MinPrice", 2) AS percent_diff
FROM
    rep_sales
        LEFT JOIN item ON item."Item" = rep_sales."Item"
WHERE
    "Unit_Cost" < "MinPrice"
ORDER BY
    percent_diff DESC
""")

Unnamed: 0,OrderDate,Region,Rep,Item,Units,Unit_Cost,Total,Item.1,MinPrice,percent_diff
0,2020-06-18,Central,Gill,Pen,85,1.49,126.65,Pen,1.75,14.86
1,2020-09-27,Central,Andrews,Pen,65,1.5,97.5,Pen,1.75,14.29
2,2020-08-12,West,Thompson,Pen,210,1.59,333.9,Pen,1.75,9.14


<div class="alert alert-success" role="alert">
  Great! Good idea to sort them by that difference :)
</div>
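
As a quick check of the `percent_diff` expression, the same formula can be evaluated in plain Python on the three unit costs and the Pen minimum price from the output above.

```python
min_price = 1.75                # MinPrice of 'Pen' from the item table
unit_costs = [1.49, 1.50, 1.59] # Unit_Cost values from the three rows above

# Same formula as in the query: (MinPrice - Unit_Cost) * 100 / MinPrice
percent_diffs = [
    round((min_price - cost) * 100.0 / min_price, 2) for cost in unit_costs
]
print(percent_diffs)  # [14.86, 14.29, 9.14]
```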

<a class="anchor" id="q9"></a>
### 9. Which region had the highest share of price difference? <br />&nbsp;&nbsp;&nbsp;&nbsp;(where percent price difference is the amount above min price divided by the total amount from that region)
[up](#3-queries)

In [22]:
execute_query("""
SELECT DISTINCT
    "Region",
    ROUND(SUM("Unit_Cost" - "MinPrice") OVER (PARTITION BY "Region") / 
        SUM("Unit_Cost" - "MinPrice") OVER (), 2) AS price_diff_ratio
FROM
    rep_sales
        LEFT JOIN item ON item."Item" = rep_sales."Item"
WHERE
    "Unit_Cost" > "MinPrice"
ORDER BY
    price_diff_ratio DESC
""")

Unnamed: 0,Region,price_diff_ratio
0,West,0.41
1,East,0.39
2,Central,0.2


The "West" region had the highest share (0.41) of the total price difference.

<div class="alert alert-danger" role="alert">
  <del>Have you noticed that your percentages don't add up to 100? It looks like a problem in the subquery.</del>
</div>

<div class="alert alert-warning" role="alert">
  <del>Almost :)
  <br>Because we want sales above the min price, and we know from the last question that some are below it, we should exclude the negative ones.</del>
</div>

<div class="alert alert-success" role="alert">
  Great!
</div>
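
The per-region share can also be written without window functions, using `GROUP BY` and a scalar subquery for the overall sum. The sketch below runs this variant on an in-memory SQLite database with made-up rows and an already deduplicated `item` table; it is an alternative formulation under those assumptions, not the query used above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item (Item TEXT, MinPrice REAL);
CREATE TABLE rep_sales (Region TEXT, Item TEXT, Unit_Cost REAL);
INSERT INTO item VALUES ('Pen', 1.75);
INSERT INTO rep_sales VALUES
    ('West',    'Pen', 2.75),  -- 1.00 above the minimum
    ('East',    'Pen', 3.75),  -- 2.00 above the minimum
    ('Central', 'Pen', 2.75),  -- 1.00 above the minimum
    ('West',    'Pen', 1.50);  -- below the minimum, filtered out
""")

shares = conn.execute("""
    SELECT
        Region,
        ROUND(SUM(Unit_Cost - MinPrice) / (
            SELECT SUM(s.Unit_Cost - i.MinPrice)
            FROM rep_sales AS s
            JOIN item AS i ON i.Item = s.Item
            WHERE s.Unit_Cost > i.MinPrice
        ), 2) AS price_diff_ratio
    FROM rep_sales
    JOIN item ON item.Item = rep_sales.Item
    WHERE Unit_Cost > MinPrice
    GROUP BY Region
    ORDER BY price_diff_ratio DESC, Region
""").fetchall()

print(shares)  # [('East', 0.5), ('Central', 0.25), ('West', 0.25)]
```

Because the below-minimum sale is excluded in both the numerator and the denominator, the shares add up to 1.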

<a class="anchor" id="q10"></a>
### 10. For each sale above the minimum price the rep gets 10% commission of the total price as a bonus. <br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Which 3 sales reps have earned the most commission (show their commission as well)?
[up](#3-queries)

In [23]:
top = 3

execute_query(f"""
SELECT
    "Rep",
    SUM(commission) AS commission
FROM
    (SELECT DISTINCT
        "Rep",
        "Unit_Cost",
        "MinPrice",
        "Total" * 0.1 AS commission
    FROM
        rep_sales
            RIGHT JOIN item ON item."Item" = rep_sales."Item"
    WHERE
        "Unit_Cost" > "MinPrice") AS subq
GROUP BY
    "Rep"
ORDER BY
    commission DESC
LIMIT {top}
""")

Unnamed: 0,Rep,commission
0,Jardine,275.73
1,Thompson,272.633
2,Morgan,266.998


<div class="alert alert-danger" role="alert">
<del>  The answer itself is good, but it seems like you multiplied it by 2 for some reason. You should get:
0	Jardine	275.730
1	Thompson	272.633
2	Morgan	266.998</del>
</div>

<div class="alert alert-success" role="alert">
  Great!
</div>