# Practice 7 - First order and average order value
This notebook uses three tables from the Olist database: olist_orders_dataset (renamed to orders), olist_order_items_dataset (renamed to order_items), and olist_closed_deals_dataset (renamed to deals).<br>
The two questions are:

1. **How long does it take for a new seller to have the first order?**
2. **Which seller has the highest average order value, and in which business segment are they?**

## Connect and load in the database

In [1]:
%load_ext sql
%sql mysql+mysqlconnector://root:***@localhost/olist

'Connected: root@olist'

### Relational Schema <br>
<img src="files/photos/P7.png">

## SQL queries

### How long does it take for a new seller to have the first order?
Among the 842 new sellers, how many have already had their first order?

In [2]:
%%sql
SELECT COUNT(seller_id) AS num_sellers
FROM deals 
WHERE EXISTS (SELECT seller_id
             FROM order_items
             WHERE deals.seller_id = order_items.seller_id)

 * mysql+mysqlconnector://root:***@localhost/olist
1 rows affected.


| num_sellers |
|---|
| 380 |
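
The `EXISTS` filter in the query above acts as a semi-join: each row of `deals` is counted at most once, however many rows the seller has in `order_items`. The sketch below illustrates the difference from a plain `JOIN` on made-up data. It runs against an in-memory SQLite database rather than the MySQL `olist` database, and the seller and order IDs are hypothetical.

```python
import sqlite3

# Hypothetical toy data: three sellers with closed deals, one of whom has no orders yet.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE deals (seller_id TEXT);
    CREATE TABLE order_items (order_id TEXT, seller_id TEXT);
    INSERT INTO deals VALUES ('s1'), ('s2'), ('s3');
    -- s1 sold three items across two orders, s2 sold one item, s3 sold nothing.
    INSERT INTO order_items VALUES
        ('o1', 's1'), ('o1', 's1'), ('o2', 's1'), ('o3', 's2');
""")

# Semi-join: a deal is counted once if at least one matching order item exists.
exists_count = conn.execute("""
    SELECT COUNT(seller_id)
    FROM deals
    WHERE EXISTS (SELECT 1
                  FROM order_items
                  WHERE deals.seller_id = order_items.seller_id)
""").fetchone()[0]

# Plain join: one output row per matching order item, so sellers are over-counted.
join_count = conn.execute("""
    SELECT COUNT(deals.seller_id)
    FROM deals
    JOIN order_items ON deals.seller_id = order_items.seller_id
""").fetchone()[0]

print("EXISTS:", exists_count, "JOIN:", join_count)
```

On this toy data the `EXISTS` version counts two sellers, while the join counts four rows, one per order item.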


Only 380 of the 842 sellers, fewer than half, have an order on record. Recall that Olist closed these deals between December 2017 and November 2018, while the 2018 orders in the dataset run only from January to October, so some of these sellers' 2018 orders may not be recorded.
Next, I would like to know the median number of days it takes a new seller to get their first order in 2018, by business segment.

In [3]:
%%sql
SELECT business_segment,
       ROUND(AVG(days_diff),0) AS median_days_diff
FROM (SELECT ROW_NUMBER()
             OVER (PARTITION BY t4.business_segment
             ORDER BY t4.days_diff) AS count_of_group,
             t4.business_segment, t4.days_diff, t5.total_of_group
      FROM (SELECT business_segment,
                   ROUND(TIMESTAMPDIFF(day,won_date,first_order_date),0)
                   AS days_diff
            FROM (SELECT seller_id,
                         MIN(order_purchase_timestamp)
                         AS first_order_date
                  FROM (SELECT order_id,
                               order_purchase_timestamp
                        FROM orders
                        WHERE YEAR(order_purchase_timestamp) = 2018) t1
                  JOIN (SELECT DISTINCT order_id, seller_id
                        FROM order_items) t2
                  ON t1.order_id = t2.order_id
                  GROUP BY seller_id) t3
            JOIN deals d ON d.seller_id = t3.seller_id) t4
       JOIN (SELECT business_segment,
                    COUNT(days_diff) AS total_of_group
             FROM (SELECT business_segment,
                          ROUND(TIMESTAMPDIFF(day,won_date,first_order_date),0)
                          AS days_diff
                   FROM (SELECT seller_id,
                                MIN(order_purchase_timestamp)
                                AS first_order_date
                         FROM (SELECT order_id,
                                      order_purchase_timestamp
                               FROM orders
                               WHERE YEAR(order_purchase_timestamp) = 2018) t1
                         JOIN (SELECT DISTINCT order_id, seller_id
                               FROM order_items) t2
                         ON t1.order_id = t2.order_id
                         GROUP BY seller_id) t3
                   JOIN deals d ON d.seller_id = t3.seller_id) t4
             GROUP BY business_segment) t5
        ON t4.business_segment = t5.business_segment) t6
WHERE count_of_group BETWEEN total_of_group/2 AND total_of_group/2 + 1
GROUP BY business_segment
ORDER BY median_days_diff DESC;

 * mysql+mysqlconnector://root:***@localhost/olist
29 rows affected.


| business_segment | median_days_diff |
|---|---|
| handcrafted | 103 |
| air_conditioning | 87 |
| food_drink | 84 |
| party | 78 |
| fashion_accessories | 77 |
| car_accessories | 66 |
| games_consoles | 51 |
| watches | 50 |
| phone_mobile | 49 |
| home_decor | 47 |
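
The query above computes the median by hand: it ranks the rows in each group with `ROW_NUMBER()`, keeps the rows whose rank falls between `total_of_group/2` and `total_of_group/2 + 1`, and averages them. The following is a minimal sketch of that rule in plain Python, checked against `statistics.median`. The function name and sample values are made up for illustration, and Python's `/` stands in for MySQL's `/`, which also returns a fractional result.

```python
from statistics import median

def median_by_row_number(values):
    """Average the rows whose 1-based rank lies in [n/2, n/2 + 1], as the SQL filter does."""
    ordered = sorted(values)
    n = len(ordered)
    picked = [v for rank, v in enumerate(ordered, start=1)
              if n / 2 <= rank <= n / 2 + 1]
    return sum(picked) / len(picked)

# Hypothetical day counts for one group.
odd_group = [10, 50, 20, 40, 30]   # n = 5: only rank 3 lies in [2.5, 3.5]
even_group = [10, 40, 20, 30]      # n = 4: ranks 2 and 3 lie in [2, 3]

print(median_by_row_number(odd_group), median(odd_group))
print(median_by_row_number(even_group), median(even_group))
```

The rule relies on fractional division. In a dialect where `/` between integers truncates, an odd-sized group of 5 would select ranks 2 and 3 instead of rank 3 alone, so the filter would need something like `/ 2.0` there.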


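This is quite informative: among the segments shown, the median ranges from 103 days (handcrafted) down to 47 days (home_decor). Let's check the same metric by lead type (type of platform).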

In [4]:
%%sql
SELECT lead_type,
       ROUND(AVG(days_diff),0) AS median_days_diff
FROM (SELECT ROW_NUMBER()
             OVER (PARTITION BY t4.lead_type
             ORDER BY t4.days_diff) AS count_of_group,
             t4.lead_type, t4.days_diff, t5.total_of_group
      FROM (SELECT lead_type,
                   ROUND(TIMESTAMPDIFF(day,won_date,first_order_date),0)
                   AS days_diff
            FROM (SELECT seller_id,
                         MIN(order_purchase_timestamp)
                         AS first_order_date
                  FROM (SELECT order_id,
                               order_purchase_timestamp
                        FROM orders
                        WHERE YEAR(order_purchase_timestamp) = 2018) t1
                  JOIN (SELECT DISTINCT order_id, seller_id
                        FROM order_items) t2
                  ON t1.order_id = t2.order_id
                  GROUP BY seller_id) t3
            JOIN deals d ON d.seller_id = t3.seller_id) t4
       JOIN (SELECT lead_type,
                    COUNT(days_diff) AS total_of_group
             FROM (SELECT lead_type,
                          ROUND(TIMESTAMPDIFF(day,won_date,first_order_date),0)
                          AS days_diff
                   FROM (SELECT seller_id,
                                MIN(order_purchase_timestamp)
                                AS first_order_date
                         FROM (SELECT order_id,
                                      order_purchase_timestamp
                               FROM orders
                               WHERE YEAR(order_purchase_timestamp) = 2018) t1
                         JOIN (SELECT DISTINCT order_id, seller_id
                               FROM order_items) t2
                         ON t1.order_id = t2.order_id
                         GROUP BY seller_id) t3
                   JOIN deals d ON d.seller_id = t3.seller_id) t4
             GROUP BY lead_type) t5
        ON t4.lead_type = t5.lead_type) t6
WHERE count_of_group BETWEEN total_of_group/2 AND total_of_group/2 + 1
GROUP BY lead_type
ORDER BY median_days_diff DESC;

 * mysql+mysqlconnector://root:***@localhost/olist
8 rows affected.


| lead_type | median_days_diff |
|---|---|
| offline | 60 |
| online_medium | 48 |
| industry | 46 |
| online_small | 45 |
|  | 43 |
| online_beginner | 34 |
| online_big | 28 |
| online_top | 28 |


This is more reasonable. Sellers with an established online presence (online_big and online_top) get their first order fastest, at a median of 28 days, compared with 60 days for offline sellers.
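
The business-segment and lead-type queries differ only in the grouping column, and each repeats the same subqueries twice. One possible way to cut the duplication is sketched below with common table expressions and a windowed `COUNT`. This is not the notebook's query: it runs on SQLite with made-up rows, uses `julianday` and `strftime` where the MySQL version uses `TIMESTAMPDIFF` and `YEAR`, divides by `2.0` because SQLite truncates integer division, and the helper `median_days_to_first_order` is a hypothetical name.

```python
import sqlite3

# Hypothetical template: {col} is the only part that changes between the two analyses.
MEDIAN_SQL = """
WITH first_orders AS (          -- first 2018 order per seller
    SELECT oi.seller_id, MIN(o.order_purchase_timestamp) AS first_order_date
    FROM orders o
    JOIN order_items oi ON o.order_id = oi.order_id
    WHERE strftime('%Y', o.order_purchase_timestamp) = '2018'
    GROUP BY oi.seller_id
),
diffs AS (                      -- days from closed deal to first order
    SELECT d.{col} AS grp,
           CAST(julianday(f.first_order_date) - julianday(d.won_date) AS INTEGER)
               AS days_diff
    FROM first_orders f
    JOIN deals d ON d.seller_id = f.seller_id
),
ranked AS (                     -- rank and group size in a single pass
    SELECT grp, days_diff,
           ROW_NUMBER() OVER (PARTITION BY grp ORDER BY days_diff) AS rn,
           COUNT(*) OVER (PARTITION BY grp) AS n
    FROM diffs
)
SELECT grp, AVG(days_diff) AS median_days_diff
FROM ranked
WHERE rn BETWEEN n / 2.0 AND n / 2.0 + 1
GROUP BY grp
ORDER BY median_days_diff DESC
"""

ALLOWED_COLUMNS = {"business_segment", "lead_type"}

def median_days_to_first_order(conn, col):
    """Run the median query grouped by `col`; only whitelisted column names are accepted."""
    if col not in ALLOWED_COLUMNS:
        raise ValueError(f"unsupported grouping column: {col}")
    return conn.execute(MEDIAN_SQL.format(col=col)).fetchall()

# Hypothetical toy data: four sellers, all with deals won on 2018-01-01.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE deals (seller_id TEXT, business_segment TEXT, lead_type TEXT, won_date TEXT);
    CREATE TABLE orders (order_id TEXT, order_purchase_timestamp TEXT);
    CREATE TABLE order_items (order_id TEXT, seller_id TEXT);
    INSERT INTO deals VALUES
        ('s1', 'pet', 'online_big', '2018-01-01'),
        ('s2', 'pet', 'offline', '2018-01-01'),
        ('s3', 'pet', 'online_big', '2018-01-01'),
        ('s4', 'watches', 'offline', '2018-01-01');
    INSERT INTO orders VALUES
        ('o1', '2018-01-11'), ('o2', '2018-01-21'), ('o3', '2018-01-31'),
        ('o4', '2018-02-10'), ('o5', '2018-03-01');
    INSERT INTO order_items VALUES
        ('o1', 's1'), ('o2', 's2'), ('o3', 's3'), ('o4', 's4'), ('o5', 's1');
""")

by_segment = median_days_to_first_order(conn, "business_segment")
by_lead_type = median_days_to_first_order(conn, "lead_type")
print(by_segment)
print(by_lead_type)
```

The column name is checked against a whitelist before being formatted into the SQL text, since identifiers cannot be passed as bound parameters. Window functions require SQLite 3.25 or later.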

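### Average order value by seller
Which seller has the highest average order value, and in which business segment are they? Average order value is computed here as the seller's total price plus freight, divided by the number of distinct orders. Note that the query multiplies this ratio by 100, so the values reported below are scaled by that factor.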

In [5]:
%%sql
SELECT t1.seller_id, business_segment,
       ROUND(total_order_value/num_orders*100,2)
       AS avg_order_value
FROM (SELECT seller_id,
             COUNT(DISTINCT order_id)
             AS num_orders,
             SUM(price) + SUM(freight_value)
             AS total_order_value
      FROM order_items
      GROUP BY seller_id) t1
JOIN deals d
ON t1.seller_id = d.seller_id
ORDER BY avg_order_value DESC
LIMIT 10;

 * mysql+mysqlconnector://root:***@localhost/olist
10 rows affected.


| seller_id | business_segment | avg_order_value |
|---|---|---|
| 1444c08e64d55fb3c25f0f09c07ffcf2 | car_accessories | 281874.0 |
| c004e5ea15737026cecaee0447e00b75 | construction_tools_house_garden | 243716.0 |
| 8de8fe3af4449ed695d2434c933ed73e | air_conditioning | 215535.0 |
| d7827b2af99326a03b0ed9c7a24db0d3 | construction_tools_house_garden | 155670.0 |
| 04843805947f0fc584fc1969b6e50fe7 | home_decor | 147476.0 |
| 9b1585752613ec342d03bbab9997ec48 | car_accessories | 144968.0 |
| 33dd941c27854f7625b968cc6195a552 | household_utilities | 143368.2 |
| 0873d9f8f36123f8d910f4760e788cfb | audio_video_electronics | 123777.5 |
| ba90964cff9b9e0e6f32b23b82465f7b | small_appliances | 119798.68 |
| 28872dc528e978a639754bc8c2ce5a4c | household_utilities | 103890.0 |
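
Because `order_items` holds one row per item, the query divides by `COUNT(DISTINCT order_id)` rather than by the row count. The sketch below shows on made-up rows how the two denominators diverge when an order contains several items. It uses an in-memory SQLite table with hypothetical IDs and amounts, and it leaves out the `*100` factor from the query above.

```python
import sqlite3

# Hypothetical toy data: seller s1 has two orders, one of which contains two items.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE order_items (order_id TEXT, seller_id TEXT, price REAL, freight_value REAL);
    INSERT INTO order_items VALUES
        ('o1', 's1', 100.0, 10.0),
        ('o1', 's1',  50.0,  5.0),
        ('o2', 's1', 200.0, 20.0),
        ('o3', 's2',  30.0,  3.0);
""")

rows = conn.execute("""
    SELECT seller_id,
           COUNT(DISTINCT order_id) AS num_orders,
           COUNT(*) AS num_items,
           -- per-order average: what the notebook's query computes (before the *100)
           ROUND((SUM(price) + SUM(freight_value)) / COUNT(DISTINCT order_id), 2)
               AS avg_order_value,
           -- per-item average: what dividing by the row count would give instead
           ROUND((SUM(price) + SUM(freight_value)) / COUNT(*), 2)
               AS avg_item_value
    FROM order_items
    GROUP BY seller_id
    ORDER BY avg_order_value DESC
""").fetchall()

for row in rows:
    print(row)
```

For seller `s1` the total of 385.0 is spread over two orders but three items, so the per-order average is 192.5 while the per-item average is 128.33.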
