### Used Libraries<a class="anchor" id="chapter1"></a>

In [1]:
import pandas as pd
from sqlalchemy import create_engine
import psycopg2

### Access to the DB <a class="anchor" id="chapter2"></a>

In [2]:
db = "data-analyst-sales-data-db"

db_config = {'user': 'practicum_student',                          # username
             'pwd': 's65BlTKV3faNIGhmvJVzOqhs',                    # password
             'host': 'rc1b-wcoijxj3yxfsf3fs.mdb.yandexcloud.net',  # host
             'port': 6432,                                         # connection port
             'db': db}                                             # the name of the database

connection_string = 'postgresql://{}:{}@{}:{}/{}'.format(db_config['user'],
                                                         db_config['pwd'],
                                                         db_config['host'],
                                                         db_config['port'],
                                                         db_config['db'])

engine = create_engine(connection_string, connect_args={'sslmode':'require'})
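
The connection string above embeds the password directly in the notebook. As a minimal sketch of an alternative, the credentials could be read from environment variables instead. The variable names (`DB_USER`, `DB_PASSWORD`, `DB_HOST`, `DB_PORT`, `DB_NAME`) and the helper `build_connection_string` are hypothetical and not part of the original project.

```python
import os
from urllib.parse import quote_plus


def build_connection_string(env=os.environ):
    """Assemble a PostgreSQL URL from environment variables (hypothetical names)."""
    user = env.get("DB_USER", "practicum_student")
    password = env.get("DB_PASSWORD", "")
    host = env.get("DB_HOST", "localhost")
    port = env.get("DB_PORT", "6432")
    name = env.get("DB_NAME", "data-analyst-sales-data-db")
    # quote_plus escapes characters such as '@' or '/' that would otherwise break the URL
    return "postgresql://{}:{}@{}:{}/{}".format(
        quote_plus(user), quote_plus(password), host, port, name
    )


# Demonstration with a plain dict standing in for the real environment
example = build_connection_string(
    {"DB_USER": "demo", "DB_PASSWORD": "p@ss/word", "DB_HOST": "db.example.com"}
)
print(example)
```

The resulting string can be passed to `create_engine` in the same way as `connection_string`.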


### Table Queries <a class="anchor" id="chapter3"></a>

A general-purpose helper function that takes a query and returns the result as a DataFrame:

In [3]:
def queryResult(q):
    """Run the SQL query `q` against the database and return the result as a DataFrame."""
    return pd.read_sql(q, con=engine)

In [4]:
from sqlalchemy import inspect
inspector = inspect(engine)

inspector.get_table_names()

['item', 'rep_sales']

# The DB
## rep_sales table:

**OrderDate:** date when the order was placed

**Region:** geographical area in which the sale was made.

**Rep:** sales representative's name

**Item:** name of the item sold

**Units:** number of units sold

**Unit_Cost:** cost of one unit

**Total:** total cost of the order (Units × Unit_Cost)

## item table:

**Item:** name of the item

**MinPrice:** minimum price at which the item can be sold.

In [5]:
q="select * from item"
queryResult(q)

Unnamed: 0,Item,MinPrice
0,Binder,1.99
1,Desk,125.0
2,Pen,1.75
3,Pen Set,4.99
4,Pencil,1.29
5,Marker Set,5.99
6,Binder,1.99
7,Desk,125.0
8,Pen,1.75
9,Pen Set,4.99


The table contains some duplicate rows, but the duplicates carry identical values.
Throughout the project we will therefore read this table with "select distinct * from item".
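
To illustrate why the duplicates in `item` matter for joins, here is a minimal sketch using an in-memory SQLite database as a stand-in for the PostgreSQL one (an assumption made so the example runs anywhere). The `item` rows repeat values from the output above, and the two `rep_sales` rows are taken from the sample output shown later.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
create table item ("Item" text, "MinPrice" real);
insert into item values ('Desk', 125.0), ('Pen', 1.75), ('Desk', 125.0), ('Pen', 1.75);
create table rep_sales ("Rep" text, "Item" text, "Units" integer, "Unit_Cost" real, "Total" real);
insert into rep_sales values ('Joe', 'Desk', 4, 223.00, 892.00), ('Jardine', 'Pen', 51, 1.79, 91.29);
""")

# Joining against the raw table matches every sale once per duplicate item row
raw_join = con.execute(
    'select count(*) from rep_sales inner join item on rep_sales."Item" = item."Item"'
).fetchone()[0]

# Joining against the de-duplicated table keeps one row per sale
dedup_join = con.execute(
    'select count(*) from rep_sales '
    'inner join (select distinct * from item) as x on rep_sales."Item" = x."Item"'
).fetchone()[0]

print(raw_join, dedup_join)
```

With the raw table, each sale appears twice after the join, so any `sum` over the joined rows would be inflated. The `select distinct` subquery avoids that.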

In [6]:
q="select distinct * from rep_sales"
queryResult(q)

Unnamed: 0,OrderDate,Region,Rep,Item,Units,Unit_Cost,Total
0,2019-12-12,Central,Smith,Marker Set,82,7.99,655.18
1,2020-05-24,East,Joe,Desk,4,223.00,892.00
2,2020-07-06,West,Thompson,Pen Set,35,8.99,314.65
3,2019-07-17,West,Thompson,Desk,3,205.00,615.00
4,2019-10-27,East,Parent,Marker Set,26,8.49,220.74
...,...,...,...,...,...,...,...
64,2020-11-17,Central,Jardine,Binder,11,4.99,54.89
65,2020-07-31,Central,Gill,Marker Set,31,7.99,247.69
66,2019-09-19,Central,Morgan,Marker Set,79,9.99,789.21
67,2019-01-05,Central,Jardine,Pen,51,1.79,91.29


In [7]:
q="select count(*) from rep_sales"
queryResult(q)

Unnamed: 0,count
0,69


There are 69 rows with or without `distinct`, so there are no duplicates!
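
The duplicate check can also be expressed as a single query that compares both counts directly. The sketch below uses an in-memory SQLite table with a reduced set of columns, and the third row is a deliberately duplicated, hypothetical row, so that the difference is visible.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
create table rep_sales ("OrderDate" text, "Rep" text, "Item" text, "Units" integer);
insert into rep_sales values
    ('2020-05-24', 'Joe', 'Desk', 4),
    ('2019-01-05', 'Jardine', 'Pen', 51),
    ('2019-01-05', 'Jardine', 'Pen', 51);  -- hypothetical duplicate
""")

# Count all rows and all distinct rows in one round trip
total_rows, distinct_rows = con.execute("""
select (select count(*) from rep_sales),
       (select count(*) from (select distinct * from rep_sales) as t)
""").fetchone()

duplicates = total_rows - distinct_rows
print(total_rows, distinct_rows, duplicates)
```

If both counts are equal, the table has no fully duplicated rows.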

1. What is the total income in the data?

In [8]:
q='select sum("Total") as total_income from rep_sales'
queryResult(q)

Unnamed: 0,total_income
0,28867.97


<div class="alert alert-success" role="alert">
  Great!
</div>

2. Which sales rep brought the most income?

In [9]:
q='''
select "Rep",sum("Total") from rep_sales group by "Rep" order by sum("Total") desc
'''
queryResult(q)

Unnamed: 0,Rep,sum
0,Kivell,3554.23
1,Thompson,3060.23
2,Jardine,3006.8
3,Jones,2969.49
4,Morgan,2669.98
5,Parent,2365.37
6,Gill,2124.21
7,Smith,1943.61
8,Howard,1941.51
9,Sorvino,1922.65


<div class="alert alert-success" role="alert">
  Great!
</div>

3. Which item brought the most income?

In [10]:
q='''
select "Item",sum("Total") from rep_sales group by "Item" order by sum("Total") desc limit 1
'''
queryResult(q)

Unnamed: 0,Item,sum
0,Binder,8229.12


<div class="alert alert-success" role="alert">
  Great!
</div>

4. Which region sold the most desks?

In [11]:
q='''
select "Region",sum("Units") as desks from rep_sales where "Item"='Desk' group by "Region" order by desks desc
'''
queryResult(q)

Unnamed: 0,Region,desks
0,West,9
1,East,7
2,Central,7


<div class="alert alert-success" role="alert">
  Great!
</div>
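
East and Central tie in the result above. A tie in first place would be hidden by the `limit 1` used in several other queries in this project. As a hedged sketch, one way to keep all tied leaders is to compare each group against the maximum. The rows below are hypothetical and chosen so that two regions tie for first place. SQLite stands in for PostgreSQL here.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
create table rep_sales ("Region" text, "Item" text, "Units" integer);
-- hypothetical rows: West and East both sell 9 desks
insert into rep_sales values
    ('West', 'Desk', 5), ('West', 'Desk', 4),
    ('East', 'Desk', 9),
    ('Central', 'Desk', 7),
    ('Central', 'Pen', 50);
""")

leaders = con.execute("""
with totals as (
    select "Region", sum("Units") as desks
    from rep_sales
    where "Item" = 'Desk'
    group by "Region"
)
select "Region", desks
from totals
where desks = (select max(desks) from totals)
order by "Region"
""").fetchall()

print(leaders)
```

Unlike `order by ... limit 1`, this returns every region that reaches the maximum.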

5. Which sales rep had the highest average price per unit for the item Pen?

In [12]:
q='''
select "Rep",avg("Unit_Cost") as avg_unit_price 
from rep_sales 
where "Item"='Pen' 
group by "Rep" 
order by avg_unit_price desc
limit 1
'''
queryResult(q)

Unnamed: 0,Rep,avg_unit_price
0,Parent,19.99


<div class="alert alert-success" role="alert">
  Great!
</div>

6. Which sales rep sold the most units between April and September 2020?

In [13]:
q='''
select "Rep",sum("Units") as units_num 
from rep_sales 
where "OrderDate" between '2020-04-01' and '2020-09-30'
group by "Rep"
order by units_num desc
limit 1
'''
queryResult(q)

Unnamed: 0,Rep,units_num
0,Andrews,309


<div class="alert alert-success" role="alert">
  Great!
</div>
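
`between` includes both endpoints, which works as intended when `OrderDate` holds plain dates. The column type is not shown here, so as a precaution against timestamp values, a half-open range (`>=` start, `<` first day after the period) can be used instead. The sketch below uses SQLite with hypothetical rows placed just inside and just outside the period.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
create table rep_sales ("OrderDate" text, "Rep" text, "Units" integer);
-- hypothetical rows around the boundaries of April to September 2020
insert into rep_sales values
    ('2020-03-31', 'A', 100),
    ('2020-04-01', 'A', 10),
    ('2020-09-30', 'B', 20),
    ('2020-10-01', 'B', 200);
""")

rows = con.execute("""
select "Rep", sum("Units") as units_num
from rep_sales
where "OrderDate" >= '2020-04-01' and "OrderDate" < '2020-10-01'
group by "Rep"
order by units_num desc
""").fetchall()

print(rows)
```

Only the sales dated 1 April and 30 September are counted. The rows from 31 March and 1 October fall outside the range.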

7. Which sales rep had the highest price difference from the minimum price on the item pen set?

In [14]:
q='''
select "Rep",max("Unit_Cost"-"MinPrice") as max_difference 
from rep_sales inner join (select distinct * from item) as x on rep_sales."Item"=x."Item"
where rep_sales."Item" = 'Pen Set'
group by "Rep"
order by max_difference desc 
limit 1
'''
queryResult(q)

Unnamed: 0,Rep,max_difference
0,Kivell,12.01


<div class="alert alert-success" role="alert">
  Great!
</div>

8. Was there a sales rep that sold an item below the minimum price? If so, which item was it and what was the percent difference from the minimum price?

In [15]:
# "Percent difference" is interpreted here as
# 100 - (Unit_Cost as a percentage of MinPrice)
q='''
select rep_sales."Item","Rep","Unit_Cost","MinPrice", 
100 - round("Unit_Cost"/"MinPrice"*100,2) as percent_diff
from rep_sales inner join (select distinct * from item) as x on rep_sales."Item"=x."Item"
where "Unit_Cost"<"MinPrice"
order by percent_diff desc

'''
queryResult(q)

Unnamed: 0,Item,Rep,Unit_Cost,MinPrice,percent_diff
0,Pen,Gill,1.49,1.75,14.86
1,Pen,Andrews,1.5,1.75,14.29
2,Pen,Thompson,1.59,1.75,9.14


<div class="alert alert-success" role="alert">
  Great!
</div>
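
The percent difference can be checked by hand. The sketch below recomputes it in plain Python as `(MinPrice - Unit_Cost) / MinPrice * 100`, which is algebraically the same as `100 - Unit_Cost / MinPrice * 100`, using the three rows from the output above. The helper name `percent_below_min` is made up for this illustration.

```python
def percent_below_min(unit_cost, min_price):
    """Percentage by which unit_cost falls below min_price, rounded to 2 decimals."""
    return round((min_price - unit_cost) / min_price * 100, 2)


# (rep, unit cost) pairs from the query output; MinPrice for Pen is 1.75
sales_below_min = [("Gill", 1.49), ("Andrews", 1.50), ("Thompson", 1.59)]
results = {rep: percent_below_min(cost, 1.75) for rep, cost in sales_below_min}
print(results)
```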

9. Which region had the highest share of the price difference?

(where the percent price difference is the amount above the minimum price divided by the total amount from that region)

In [16]:

q=''' 
select distinct "Region",
round(sum("Unit_Cost" -"MinPrice") over(partition by "Region")*100/ sum("Unit_Cost" -"MinPrice") over(),1) 
    as share_of_price_diff_percentage
from rep_sales inner join (select distinct * from item) as x on rep_sales."Item"=x."Item"
order by share_of_price_diff_percentage desc

'''
queryResult(q)

Unnamed: 0,Region,share_of_price_diff_percentage
0,West,40.8
1,East,39.4
2,Central,19.9


<div class="alert alert-success" role="alert">
  Great! Note that the shares add up to slightly more than 100% here (100.1%), so it is better not to round here.
</div>
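
The effect of rounding each share separately can be reproduced with a small sketch. The raw per-region price differences below are hypothetical, chosen only so that the rounded shares match the output above. The actual unrounded values are not shown in the notebook.

```python
# Hypothetical raw price differences per region (not taken from the database)
raw_diff = {"West": 40.76, "East": 39.36, "Central": 19.88}

total = sum(raw_diff.values())
shares = {region: diff * 100 / total for region, diff in raw_diff.items()}

# Rounding each share on its own can push the sum away from 100
rounded = {region: round(share, 1) for region, share in shares.items()}

print(round(sum(shares.values()), 6))   # sum of the unrounded shares
print(round(sum(rounded.values()), 1))  # sum of the individually rounded shares
```

Keeping the unrounded shares in the query and rounding only when displaying them avoids the mismatch.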

10. For each sale above the minimum price the rep gets 10% commission of the total price as a bonus. Which 3 sales reps have earned the most commission (show their commission as well)?

In [17]:

q=''' 
select "Rep",sum("Total"),sum("Total")*0.1 as commission
from rep_sales inner join (select distinct * from item) as x on rep_sales."Item"=x."Item"
where "Unit_Cost">"MinPrice" 
group by "Rep"
order by commission desc
limit 3

'''
queryResult(q)

Unnamed: 0,Rep,sum,commission
0,Jardine,2757.3,275.73
1,Thompson,2726.33,272.633
2,Morgan,2669.98,266.998


<div class="alert alert-success" role="alert">
  Great!
</div>
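
Since the commission is a monetary amount, it may be preferable to round it to two decimals in the query. The sketch below shows this with SQLite standing in for PostgreSQL. The first three `rep_sales` rows come from the outputs above, while the Gill row (units and total) is hypothetical and only demonstrates that a sale below the minimum price earns no commission.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
create table item ("Item" text, "MinPrice" real);
insert into item values ('Desk', 125.0), ('Pen', 1.75);
create table rep_sales ("Rep" text, "Item" text, "Units" integer, "Unit_Cost" real, "Total" real);
insert into rep_sales values
    ('Joe', 'Desk', 4, 223.00, 892.00),
    ('Thompson', 'Desk', 3, 205.00, 615.00),
    ('Jardine', 'Pen', 51, 1.79, 91.29),
    ('Gill', 'Pen', 10, 1.49, 14.90);  -- hypothetical sale below MinPrice
""")

rows = con.execute("""
select "Rep", round(sum("Total") * 0.1, 2) as commission
from rep_sales inner join item on rep_sales."Item" = item."Item"
where "Unit_Cost" > "MinPrice"
group by "Rep"
order by commission desc
limit 3
""").fetchall()

print(rows)
```

Gill does not appear in the result because the only sale in that row set is priced below `MinPrice`.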