### Used Libraries<a class="anchor" id="chapter1"></a>

In [1]:
import pandas as pd
from sqlalchemy import create_engine
import psycopg2

### Access to the DB <a class="anchor" id="chapter2"></a>

In [2]:
db_config = {'user': 'practicum_student',                         # username
             'pwd': 's65BlTKV3faNIGhmvJVzOqhs',                   # password
             'host': 'rc1b-wcoijxj3yxfsf3fs.mdb.yandexcloud.net',
             'port': 6432,                                        # connection port
             'db': 'data-analyst-sales-data-db'}          # the name of the database

connection_string = 'postgresql://{}:{}@{}:{}/{}'.format(db_config['user'],
                                                         db_config['pwd'],
                                                         db_config['host'],
                                                         db_config['port'],
                                                         db_config['db'])

engine = create_engine(connection_string, connect_args={'sslmode':'require'})
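
A password containing characters such as `@`, `/`, or `:` would break a URL assembled with plain string formatting. The following is a minimal sketch of one way to guard against that with the standard library's `urllib.parse.quote_plus`. The helper name `build_connection_string` and the sample values are hypothetical and are not part of the notebook.

```python
from urllib.parse import quote_plus


def build_connection_string(cfg):
    """Assemble a PostgreSQL URL, percent-encoding the user and password."""
    return 'postgresql://{}:{}@{}:{}/{}'.format(
        quote_plus(cfg['user']),
        quote_plus(cfg['pwd']),
        cfg['host'],
        cfg['port'],
        cfg['db'],
    )


# Hypothetical config, for illustration only
example_config = {
    'user': 'some_user',
    'pwd': 'p@ss/word',
    'host': 'localhost',
    'port': 5432,
    'db': 'example_db',
}

example_url = build_connection_string(example_config)
print(example_url)
# postgresql://some_user:p%40ss%2Fword@localhost:5432/example_db
```

The resulting string can be passed to `create_engine` in the same way as `connection_string`.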


## The Database

### rep_sales table:
**OrderDate:** date when the order was placed

**Region:** geographical area in which the sale was made.

**Rep:** sales representative's name

**Item:** name of the item sold

**Units:** number of units sold

**Unit_Cost:** cost of one unit

**Total:** total cost of the order (Units × Unit_Cost)


### Item table:
**Item:** name of the item

**MinPrice:** minimum price at which the item can be sold
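
To experiment with the queries without access to the remote database, the two tables can be approximated locally. This is only a sketch: it uses an in-memory SQLite database instead of PostgreSQL, the column types are assumptions, and the inserted rows are made up. The column names follow the ones the queries reference.

```python
import sqlite3

# In-memory stand-in for the PostgreSQL database; column types are assumptions
conn = sqlite3.connect(':memory:')
conn.executescript('''
    CREATE TABLE rep_sales (
        "OrderDate" TEXT,
        "Region"    TEXT,
        "Rep"       TEXT,
        "Item"      TEXT,
        "Units"     INTEGER,
        "Unit_Cost" REAL,
        "Total"     REAL
    );
    CREATE TABLE item (
        "Item"     TEXT,
        "MinPrice" REAL
    );
''')

# Made-up rows, for illustration only
conn.execute(
    'INSERT INTO rep_sales VALUES (?, ?, ?, ?, ?, ?, ?)',
    ('2020-04-18', 'Central', 'Andrews', 'Pencil', 10, 2.0, 20.0),
)
conn.execute('INSERT INTO item VALUES (?, ?)', ('Pencil', 1.5))

# Total should equal Units x Unit_Cost; MinPrice comes from the item table
schema_check = conn.execute(
    'SELECT r."Rep", r."Units" * r."Unit_Cost" AS computed_total, i."MinPrice" '
    'FROM rep_sales r LEFT JOIN item i ON i."Item" = r."Item"'
).fetchall()
print(schema_check)
```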

# Table Queries <a class="anchor" id="chapter3"></a>

A helper function that takes a query and returns a DataFrame, for general use.

In [3]:
def queryResult(q):
    """Run the SQL query q against the database and return a DataFrame."""
    return pd.read_sql(q, con=engine)

### 1. What is the total income in the data?

In [15]:
query = '''
        SELECT
            SUM(rep_sales."Total") AS total_income
            
        FROM
            rep_sales
              
        '''
print()
print('The total income of the data:')
queryResult(query)


The total income of the data:


|   | total_income |
|---|---|
| 0 | 28867.97 |


**28867.97** is the total income.

----

### 2. Which sales rep brought the most income?

In [16]:
query = '''
        SELECT
            rep_sales."Rep",
            SUM(rep_sales."Total") AS rep_total
            
        FROM
            rep_sales
            
        GROUP BY
            rep_sales."Rep"
            
        ORDER BY
            rep_total DESC
            
        LIMIT 3;
        '''
print()
print('The total income per rep:')
queryResult(query)


The total income per rep:


|   | Rep | rep_total |
|---|---|---|
| 0 | Kivell | 3554.23 |
| 1 | Thompson | 3060.23 |
| 2 | Jardine | 3006.8 |


**Kivell** brought the most income from the data.

----

### 3. Which item brought the most income?

In [19]:
query = '''
        SELECT
            rep_sales."Item",
            SUM(rep_sales."Total") AS item_total
            
        FROM
            rep_sales
            
        GROUP BY
            rep_sales."Item"
            
        ORDER BY
            item_total DESC
            
        LIMIT 3;
        '''
print()
print('The total income per item:')
queryResult(query)


The total income per item:


|   | Item | item_total |
|---|---|---|
| 0 | Binder | 8229.12 |
| 1 | Marker Set | 6618.71 |
| 2 | Pen Set | 4683.66 |


**Binder** is the item that brought the highest income.

----

### 4. Which region sold the most desks?

In [22]:
query = '''
        SELECT
            rep_sales."Region",
            rep_sales."Item",
            SUM(rep_sales."Units") AS total_desks
            
        FROM
            rep_sales
        
        WHERE
            rep_sales."Item" = 'Desk'
            
        GROUP BY
            rep_sales."Region",
            rep_sales."Item"

        ORDER BY
            total_desks DESC;
        '''
print()
print('The total desks sold per region:')
queryResult(query)


The total desks sold per region:


|   | Region | Item | total_desks |
|---|---|---|---|
| 0 | West | Desk | 9 |
| 1 | Central | Desk | 7 |
| 2 | East | Desk | 7 |


**West Region** sold the most desks.

----

### 5. Which sales rep sold pens at the highest average price per piece?

In [8]:
query = '''
            SELECT
                rep_sales."Rep" AS rep,
                rep_sales."Item" AS item,
                AVG(rep_sales."Unit_Cost") AS avg_unit_cost
                
            FROM
                rep_sales
            
            WHERE
                rep_sales."Item" = 'Pen'
                
            GROUP BY
                rep_sales."Rep",
                rep_sales."Item"
            
            ORDER BY
               avg_unit_cost DESC; 
        '''
print()
print('The average price per pen unit each rep sells:')
queryResult(query)


The average price per pen unit each rep sells:


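|   | rep | item | avg_unit_cost |
|---|---|---|---|
| 0 | Parent | Pen | 19.99 |
| 1 | Gill | Pen | 10.74 |
| 2 | Jones | Pen | 8.99 |
| 3 | Howard | Pen | 4.99 |
| 4 | Joe | Pen | 2.19 |
| 5 | Sorvino | Pen | 1.99 |
| 6 | Jardine | Pen | 1.79 |
| 7 | Thompson | Pen | 1.59 |
| 8 | Andrews | Pen | 1.5 |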


**Parent** has the highest average price per unit.

----

### 6. Which sales rep sold the most units between April and September 2020?

In [25]:
query = '''
        SELECT
            rep_sales."Rep" AS rep,
            SUM(rep_sales."Units") AS total_units
        FROM
            rep_sales
        
        WHERE
            CAST(rep_sales."OrderDate" AS date) >= '2020-04-01'
            AND CAST(rep_sales."OrderDate" AS date) < '2020-10-01'
        
        GROUP BY
            rep_sales."Rep"
            
        ORDER BY
            total_units DESC;
        '''
print()
print('The total number of units each rep sold between April and September:')
queryResult(query)


The total number of units each rep sold between April and September:


|   | rep | total_units |
|---|---|---|
| 0 | Andrews | 309 |
| 1 | Gill | 256 |
| 2 | Thompson | 245 |
| 3 | Howard | 99 |
| 4 | Kivell | 94 |
| 5 | Jones | 86 |
| 6 | Sorvino | 79 |
| 7 | Morgan | 55 |
| 8 | Joe | 38 |


**Andrews** sold the most units between April and September.
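
When filtering a window such as April through September, it matters whether the upper bound is inclusive. SQL `BETWEEN` includes both endpoints, so an upper bound of October 1 also matches orders placed on that day, while a half-open range (`>=` start, `<` end) does not. The sketch below illustrates the difference on an in-memory SQLite table; the `orders` table and its rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE orders (order_date TEXT, units INTEGER)')
conn.executemany(
    'INSERT INTO orders VALUES (?, ?)',
    [('2020-03-31', 1), ('2020-04-01', 2), ('2020-09-30', 4), ('2020-10-01', 8)],
)

# BETWEEN is inclusive on both ends, so the October 1 row is counted
inclusive = conn.execute(
    "SELECT SUM(units) FROM orders "
    "WHERE order_date BETWEEN '2020-04-01' AND '2020-10-01'"
).fetchone()[0]

# Half-open range: April 1 is included, October 1 is excluded
half_open = conn.execute(
    "SELECT SUM(units) FROM orders "
    "WHERE order_date >= '2020-04-01' AND order_date < '2020-10-01'"
).fetchone()[0]

print(inclusive, half_open)  # 14 6
```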

----

### 7. Which sales rep had the largest price difference from the minimum price for the item Pen Set?

In [26]:
query = '''
        SELECT
            rep_sales."Rep" AS rep,
            rep_sales."Item" AS item,
            rep_sales."Unit_Cost" AS unit_cost,
            item."MinPrice" AS min_price,
            (rep_sales."Unit_Cost" - item."MinPrice") AS diff_cost
            
        FROM
            rep_sales
            LEFT JOIN item ON item."Item" = rep_sales."Item"
            
        WHERE
            rep_sales."Item" = 'Pen Set'
            
        ORDER BY
            diff_cost DESC;
        '''
print()
print('The difference in price per unit compared to min price per rep:')
queryResult(query)


The difference in price per unit compared to min price per rep:


|   | rep | item | unit_cost | min_price | diff_cost |
|---|---|---|---|---|---|
| 0 | Kivell | Pen Set | 17.0 | 4.99 | 12.01 |
| 1 | Kivell | Pen Set | 17.0 | 4.99 | 12.01 |
| 2 | Jones | Pen Set | 15.99 | 4.99 | 11.0 |
| 3 | Jones | Pen Set | 15.99 | 4.99 | 11.0 |
| 4 | Parent | Pen Set | 12.99 | 4.99 | 8.0 |
| 5 | Parent | Pen Set | 12.99 | 4.99 | 8.0 |
| 6 | Morgan | Pen Set | 12.49 | 4.99 | 7.5 |
| 7 | Morgan | Pen Set | 12.49 | 4.99 | 7.5 |
| 8 | Thompson | Pen Set | 8.99 | 4.99 | 4.0 |
| 9 | Thompson | Pen Set | 8.99 | 4.99 | 4.0 |


**Kivell** has the largest difference from the minimum price (12.01).
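
Each rep appears twice in the result above. One possible explanation, which is an assumption and not something the notebook confirms, is that the `item` table holds a duplicate row for `Pen Set`; query 10 joins against `SELECT DISTINCT * FROM item`, which hints at this. The sketch below reproduces the effect on made-up data in an in-memory SQLite database and shows how deduplicating the lookup table before the join removes the repeated rows.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript('''
    CREATE TABLE rep_sales ("Rep" TEXT, "Item" TEXT, "Unit_Cost" REAL);
    CREATE TABLE item ("Item" TEXT, "MinPrice" REAL);
    INSERT INTO rep_sales VALUES ('Kivell', 'Pen Set', 17.0);
    INSERT INTO item VALUES ('Pen Set', 4.99);
    INSERT INTO item VALUES ('Pen Set', 4.99);  -- assumed duplicate row
''')

# Joining against the raw lookup table repeats the sale once per duplicate
plain = conn.execute('''
    SELECT r."Rep", r."Unit_Cost" - i."MinPrice"
    FROM rep_sales r
    LEFT JOIN item i ON i."Item" = r."Item"
''').fetchall()

# Deduplicating the lookup table first yields one row per sale
deduped = conn.execute('''
    SELECT r."Rep", r."Unit_Cost" - i."MinPrice"
    FROM rep_sales r
    LEFT JOIN (SELECT DISTINCT * FROM item) AS i ON i."Item" = r."Item"
''').fetchall()

print(len(plain), len(deduped))  # 2 1
```

If the repeated rows instead reflect two genuine sales per rep at the same price, deduplicating `item` would leave them in place.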

----

### 8. Was there a sales rep that sold an item below the minimum price? If so, which item was it and what was the percent difference from the minimum price?

In [29]:
query = '''
        SELECT
            rep_sales."Rep" AS rep,
            rep_sales."Item" AS item,
            rep_sales."Unit_Cost" AS unit_cost,
            item."MinPrice" AS min_price,
            (rep_sales."Unit_Cost" - item."MinPrice") AS price_diff,
            
            ("MinPrice" - "Unit_Cost")*100 / "MinPrice" AS prcnt_price_diff
            
        FROM
            rep_sales
            LEFT JOIN item ON item."Item" = rep_sales."Item"
        
        WHERE
            (rep_sales."Unit_Cost" - item."MinPrice") < 0
        
        GROUP BY
            rep_sales."Rep",
            rep_sales."Item",
            rep_sales."Unit_Cost",
            item."MinPrice"
            
        ORDER BY
            price_diff;
        '''
print()
print('The sales made below the minimum price:')
queryResult(query)


The sales made below the minimum price:


|   | rep | item | unit_cost | min_price | price_diff | prcnt_price_diff |
|---|---|---|---|---|---|---|
| 0 | Gill | Pen | 1.49 | 1.75 | -0.26 | 14.857143 |
| 1 | Andrews | Pen | 1.5 | 1.75 | -0.25 | 14.285714 |
| 2 | Thompson | Pen | 1.59 | 1.75 | -0.16 | 9.142857 |


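Yes: Gill, Andrews, and Thompson each sold pens below the minimum price of 1.75. **Gill** has the biggest negative difference, about 14.86% below the minimum price.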
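
The percentage column follows from `(MinPrice - Unit_Cost) * 100 / MinPrice`. As a quick check of the arithmetic, the sketch below recomputes it in plain Python for the three rows in the table; the helper name `percent_below_min` is introduced here for illustration only.

```python
def percent_below_min(unit_cost, min_price):
    """Percent by which unit_cost falls short of min_price."""
    return (min_price - unit_cost) * 100 / min_price


# (rep, unit_cost, min_price) taken from the result table
rows = [('Gill', 1.49, 1.75), ('Andrews', 1.5, 1.75), ('Thompson', 1.59, 1.75)]

percentages = {rep: percent_below_min(cost, min_price) for rep, cost, min_price in rows}

for rep, pct in percentages.items():
    print(rep, round(pct, 6))
```

Rounded to six decimals, the values agree with the `prcnt_price_diff` column.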

----

### 9. Which region had the highest share of price difference?

In [12]:
query = '''
        SELECT
            DISTINCT "Region",
            
            SUM("Unit_Cost" - "MinPrice") OVER
            (PARTITION BY "Region") * 100 / SUM("Unit_Cost" - "MinPrice") OVER()
            AS share_of_price_diff_prcnt
            
        FROM 
            rep_sales
            LEFT JOIN item ON item."Item" = rep_sales."Item"
            
        ORDER BY 
            share_of_price_diff_prcnt DESC;

        '''
print()
print('The share of price difference per region:')
queryResult(query)


The share of price difference per region:


|   | Region | share_of_price_diff_prcnt |
|---|---|---|
| 0 | West | 40.762948 |
| 1 | East | 39.357517 |
| 2 | Central | 19.879535 |


**West** has the biggest share of the price difference (about 40.76%).
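
The share computed in the query is each region's summed price difference divided by the grand total. A minimal sketch of the same ratio, written with `GROUP BY` and a scalar subquery instead of window functions, is shown below. It runs against an in-memory SQLite database; the `diffs` table is a hypothetical stand-in for the joined `rep_sales` and `item` data, and its values are made up.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript('''
    CREATE TABLE diffs (region TEXT, price_diff REAL);
    INSERT INTO diffs VALUES ('West', 3.0);
    INSERT INTO diffs VALUES ('West', 2.0);
    INSERT INTO diffs VALUES ('East', 3.0);
    INSERT INTO diffs VALUES ('Central', 2.0);
''')

# Per-region sum divided by the overall sum, expressed as a percentage
shares = conn.execute('''
    SELECT region,
           SUM(price_diff) * 100.0 / (SELECT SUM(price_diff) FROM diffs) AS share_pct
    FROM diffs
    GROUP BY region
    ORDER BY share_pct DESC
''').fetchall()

print(shares)  # [('West', 50.0), ('East', 30.0), ('Central', 20.0)]
```

Grouping returns one row per region directly, so no `DISTINCT` is needed in this form.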

----

### 10. For each sale above the minimum price the rep gets 10% commission of the total price as a bonus. Which 3 sales reps have earned the most commission (show their commission as well)?

In [30]:
query = '''
        SELECT
            rep_sales."Rep" AS rep,
            SUM((rep_sales."Total") * 0.1) AS commission
            
        FROM
            rep_sales
            LEFT JOIN (SELECT DISTINCT * FROM item) AS item ON item."Item" = rep_sales."Item"
            
        WHERE
            rep_sales."Unit_Cost" > item."MinPrice"
            
        GROUP BY
            rep_sales."Rep"
        ORDER BY
            commission DESC
            
        LIMIT 3;
        '''
print()
print('The top 3 reps with the highest commission:')
queryResult(query)


The top 3 reps with the highest commission:


|   | rep | commission |
|---|---|---|
| 0 | Jardine | 275.73 |
| 1 | Thompson | 272.633 |
| 2 | Morgan | 266.998 |
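
The commission rule can be traced by hand on a few rows: a sale counts only when its unit cost is strictly above the minimum price, and the rep then earns 10% of that sale's total. The sketch below applies the rule in plain Python; the reps `A` and `B` and all values are made up for illustration.

```python
from collections import defaultdict

# Made-up sales rows: (rep, unit_cost, min_price, total)
sales = [
    ('A', 5.0, 4.0, 100.0),  # above minimum: earns commission
    ('A', 3.0, 4.0, 60.0),   # below minimum: no commission
    ('B', 4.0, 4.0, 80.0),   # equal to minimum: no commission (strict >)
    ('B', 6.0, 4.0, 50.0),   # above minimum: earns commission
]

COMMISSION_RATE = 0.1

commission = defaultdict(float)
for rep, unit_cost, min_price, total in sales:
    if unit_cost > min_price:
        commission[rep] += total * COMMISSION_RATE

# Top 3 reps by commission, highest first
top = sorted(commission.items(), key=lambda kv: kv[1], reverse=True)[:3]
print(top)
```

Here `A` earns commission only on the 100.0 sale and `B` only on the 50.0 sale, so `A` ranks first.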
