# Power Users

### Introduction

In this lesson, we'll base our work on the 80-20 principle.  The 80-20 rule, also known as the Pareto principle, is the idea that roughly 80% of the consequences typically come from 20% of the causes.

For example, in the age of video rentals, video stores reported that 80% of revenue came from 20% of video tapes.  The Pareto principle is valuable because it means we can focus on the 20% that drives 80% of the impact.
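To make the idea concrete, here is a minimal sketch that measures how much of a total comes from the top 20% of items. The `top_share` helper and the revenue figures are hypothetical and chosen only for illustration; they are not drawn from the lesson's data.

```python
def top_share(values, top_fraction=0.2):
    """Return the fraction of the total contributed by the largest items."""
    ordered = sorted(values, reverse=True)
    # Number of items in the top slice (at least one item)
    top_n = max(1, round(len(ordered) * top_fraction))
    return sum(ordered[:top_n]) / sum(ordered)

# Hypothetical revenue per video tape, for illustration only
revenues = [500, 300, 50, 40, 30, 25, 20, 15, 10, 10]

share = top_share(revenues)
print(f"Top 20% of items account for {share:.0%} of revenue")
```

In this toy list, the two largest values out of ten sum to 800 of a total of 1000, so the top 20% of items carry 80% of the revenue.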

In this lesson, we will rely on the 80-20 rule to find the top products that are driving our business and the top customers that are most responsible for our revenue.  From there, we can move forward by trying to find more customers like them, or by producing more of the products that drive the business.

### Identifying our Top Users

Let's start by connecting to our data.

In [1]:
from sqlalchemy import create_engine
# replace jeffreykatz with your own Postgres username
conn_string = 'postgresql://jeffreykatz@localhost/ecommerce'
engine = create_engine(conn_string)

In [2]:
import pandas as pd

transactions_df = pd.read_sql('select * from transactions', engine)
transactions_df[:2]

Unnamed: 0,index,transaction_id,customer_id,product,gender,device_type,country,state,city,category,...,delivery_type,quantity,transaction_start,transaction_result,amount,individual_price,month,week,dow,hour
0,0,40170,1348959766,Hair Band,Female,Web,United States,New York,New York City,Accessories,...,one-day deliver,12,1,0,6910.0,576.0,11.0,46.0,4.0,22.0
1,1,33374,2213674919,Hair Band,Female,Web,United States,California,Los Angles,Accessories,...,one-day deliver,17,1,1,1699.0,100.0,5.0,19.0,6.0,6.0


We now have our transactions. What we want is a SQL query that returns each user along with the cumulative percentage of total spend, ordered from the highest-spending user down.

To make the target concrete, we load a saved copy of that query's result below.

In [3]:
percent_spend_df = pd.read_csv('./user_percentage_spend.csv', index_col = 0)
percent_spend_df[:3]

Unnamed: 0,customer_id,total_amount,percentage_spend
0,1929979702,894869.0,0.002978
1,1430453333,840000.0,0.005774
2,1884522075,767733.0,0.008329


Starting at row 0, we see that the top user accounts for about `0.3%` of total revenue, the top two users together account for about `0.6%` of spend, the top three for about `0.8%`, and so on.  Note that `percentage_spend` is stored as a fraction, so `0.002978` means roughly 0.3%.
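The cumulative column can be illustrated in pandas as a running sum divided by the grand total. The sketch below uses a small hypothetical table of per-customer totals (the ids and amounts are made up), not the lesson's data.

```python
import pandas as pd

# Hypothetical per-customer totals, for illustration only
totals = pd.DataFrame({
    "customer_id": [101, 102, 103, 104],
    "total_amount": [400.0, 300.0, 200.0, 100.0],
})

# Order customers from highest to lowest spend
totals = totals.sort_values("total_amount", ascending=False).reset_index(drop=True)

# Running total of spend divided by overall spend gives the cumulative share
totals["percentage_spend"] = totals["total_amount"].cumsum() / totals["total_amount"].sum()

print(totals)
```

Each row's `percentage_spend` is the share of total spend covered by that customer and every customer above them, so the last row always reaches 1.0.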

Write the SQL that transforms the `transactions` table into the data above.  In the next lesson, we'll see how to work with that data.
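As a hint, one possible approach is to aggregate spend per customer and then apply a window function for the running total. The sketch below is an assumption about how the query could be shaped, not the lesson's official solution. It runs against a tiny in-memory SQLite table with made-up rows so it is self-contained (window functions require SQLite 3.25 or later); the same SQL pattern should carry over to Postgres, but verify it against your own `transactions` table.

```python
import sqlite3

# In-memory toy table with hypothetical rows, standing in for transactions
conn = sqlite3.connect(":memory:")
conn.execute("create table transactions (customer_id integer, amount real)")
conn.executemany(
    "insert into transactions values (?, ?)",
    [(1, 50.0), (1, 50.0), (2, 60.0), (3, 40.0)],
)

query = """
with customer_totals as (
    select customer_id, sum(amount) as total_amount
    from transactions
    group by customer_id
)
select customer_id,
       total_amount,
       -- running total of spend divided by overall spend
       sum(total_amount) over (order by total_amount desc, customer_id) * 1.0
           / sum(total_amount) over () as percentage_spend
from customer_totals
order by total_amount desc, customer_id
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Adding `customer_id` to the window's ordering breaks ties between customers with equal totals, so each row gets its own running total instead of sharing one with its peers.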

In [53]:
import pandas as pd

query = """select * from transactions"""

user_percentage_spend = pd.read_sql(query, engine)


user_percentage_spend[:2]

Unnamed: 0,index,transaction_id,customer_id,product,gender,device_type,country,state,city,category,...,delivery_type,quantity,transaction_start,transaction_result,amount,individual_price,month,week,dow,hour
0,0,40170,1348959766,Hair Band,Female,Web,United States,New York,New York City,Accessories,...,one-day deliver,12,1,0,6910.0,576.0,11.0,46.0,4.0,22.0
1,1,33374,2213674919,Hair Band,Female,Web,United States,California,Los Angles,Accessories,...,one-day deliver,17,1,1,1699.0,100.0,5.0,19.0,6.0,6.0


> <img src="./customer_percentile_spend.png" width="60%">

So we can see above that we have a percentile spend  