# Practice 2 - Shipping and on-time delivery rate
This notebook uses two tables from the Olist database: olist_orders_dataset (renamed to orders) and olist_order_items_dataset (renamed to order_items).
It addresses two main questions:
1. **What is the on-time delivery rate?**
2. **How quickly do customers receive an order?**

## Understand how merchants process an order on the Olist platform
<img src="files/photos/order_process.png"><br>

The diagram gives an overview of how an order is processed on Olist. Four of the timestamp columns in the orders table are generated during this process, one each time the customer or merchant completes a step. A few points are worth noting:
1. The estimated delivery date is actually the deadline Olist assigns the merchant for completing an order; it does not reflect any responsibility of the shipping carrier.
2. Merchants have to create one sales invoice per order on the Olist platform and send it to the customer.
3. Merchants must use Olist's logistics partners; they are not allowed to ship orders with carriers outside Olist's network.
4. Olist lets merchants create shipping labels for an order whenever they need them. However, Olist is not responsible for shipping orders or for any logistics-related procedure. Merchants prepare the order, print the shipping label, and drop the package at the designated carrier location. The order_status 'shipped' therefore means that the package has been handed to the carrier, not that the carrier has shipped it out.
5. Customers may cancel an order within 7 calendar days of receiving it.

More information can be found at [Operation Help](https://olist.com/faq/) and [Olist common questions](https://get.olist.help/pt-BR/articles/413334-como-proceder-em-casos-de-cancelamento-de-pedidos).

## Connect and load in the database

In [1]:
%load_ext sql
%sql mysql+mysqlconnector://root:***@localhost/olist

'Connected: root@olist'

### Relational Schema
<img src="files/photos/P2.png">

## SQL queries
### Check order_status categories

In [2]:
%%sql
-- Count orders per status and express each count as a share of all orders.
SELECT order_status,
       COUNT(order_id) AS num_orders,
       ROUND(COUNT(order_id)/(SELECT COUNT(order_id) FROM orders)*100,2)
       AS percentage
FROM orders
GROUP BY order_status
ORDER BY order_status;

 * mysql+mysqlconnector://root:***@localhost/olist
8 rows affected.


order_status,num_orders,percentage
approved,2,0.0
canceled,625,0.63
created,5,0.01
delivered,96478,97.02
invoiced,314,0.32
processing,301,0.3
shipped,1107,1.11
unavailable,609,0.61


About 97% of orders are delivered, so this notebook focuses on delivered orders only. Before moving on, let's check whether any of the timestamp columns contain null values, which are stored in this database as the zero timestamp '0000-00-00 00:00:00'.

In [3]:
%%sql
SELECT SUM(IF(order_purchase_timestamp = '0000-00-00 00:00:00',1,0))
       AS order_purchase_timestamp_null,
       SUM(IF(order_approved_at = '0000-00-00 00:00:00',1,0))
       AS order_approved_at_null,
       SUM(IF(order_delivered_carrier_date = '0000-00-00 00:00:00',1,0))
       AS order_delivered_carrier_date_null,
       SUM(IF(order_delivered_customer_date = '0000-00-00 00:00:00',1,0))
       AS order_delivered_customer_date_null,
       SUM(IF(order_estimated_delivery_date = '0000-00-00 00:00:00',1,0))
       AS order_estimated_delivery_date_null
FROM orders
WHERE order_status = 'delivered';

 * mysql+mysqlconnector://root:***@localhost/olist
1 rows affected.


order_purchase_timestamp_null,order_approved_at_null,order_delivered_carrier_date_null,order_delivered_customer_date_null,order_estimated_delivery_date_null
0,14,2,8,0
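
The check above treats the zero timestamp '0000-00-00 00:00:00' as the marker for a missing value. A minimal Python sketch of the same counting logic follows; the sample rows and the helper name `count_missing` are made up for illustration and are not taken from the Olist data.

```python
# Sentinel that the queries in this notebook compare against.
ZERO_TS = "0000-00-00 00:00:00"

def count_missing(rows, columns):
    """Count, per column, how many rows hold the zero-timestamp sentinel."""
    return {col: sum(1 for row in rows if row[col] == ZERO_TS) for col in columns}

# Hypothetical sample rows, for illustration only.
sample_rows = [
    {"order_approved_at": "2018-08-01 10:00:00",
     "order_delivered_customer_date": "2018-08-09 15:30:00"},
    {"order_approved_at": ZERO_TS,
     "order_delivered_customer_date": "2018-08-11 09:00:00"},
    {"order_approved_at": "2018-08-02 08:15:00",
     "order_delivered_customer_date": ZERO_TS},
]

missing = count_missing(
    sample_rows, ["order_approved_at", "order_delivered_customer_date"]
)
print(missing)
```

Delivered orders with a missing customer delivery date cannot be classified as on time or late, which is why the later queries filter them out.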


### On-time delivery rate
Olist requires merchants to keep their on-time delivery rate at or above 96% over the last 30 days to guarantee their performance.<br>
I want to know how many sellers have recently had an on-time delivery rate of 96% or higher. First, let's count, for each month of 2018, the sellers who delivered at least one order on time.

In [4]:
%%sql
SELECT MONTH(order_purchase_timestamp) AS month,
       COUNT(DISTINCT seller_id) AS num_sellers
FROM (SELECT oi.seller_id, o.order_purchase_timestamp
            FROM orders o JOIN order_items oi
            ON o.order_id = oi.order_id
            WHERE order_status = 'delivered'
            AND YEAR(order_purchase_timestamp) = 2018
            AND order_delivered_customer_date != '0000-00-00 00:00:00'
            AND TIMESTAMPDIFF(day,o.order_delivered_customer_date,
                              o.order_estimated_delivery_date) >= 0) a
GROUP BY month;

 * mysql+mysqlconnector://root:***@localhost/olist
8 rows affected.


month,num_sellers
1,933
2,878
3,923
4,1084
5,1085
6,1158
7,1225
8,1234


The most recent month in the data is August 2018. The query below calculates the number of sellers with an on-time delivery rate of 96% or higher in August 2018.

In [5]:
%%sql
SELECT c.num_sellers, COUNT(b.seller_id) AS num_qualified_sellers,
       ROUND(COUNT(b.seller_id)/c.num_sellers*100,2) AS percentage
FROM (SELECT a.seller_id,
             ROUND(num_on_time_orders/COUNT(DISTINCT order_id)*100,2)
             AS percent_on_time
      FROM order_items 
      JOIN (SELECT oi.seller_id,
                   COUNT(DISTINCT o.order_id) AS num_on_time_orders
            FROM orders o JOIN order_items oi
            ON o.order_id = oi.order_id
            WHERE order_status = 'delivered'
            AND MONTH(order_purchase_timestamp) = 8
            AND YEAR(order_purchase_timestamp) = 2018
            AND order_delivered_customer_date != '0000-00-00 00:00:00'
            AND TIMESTAMPDIFF(day,o.order_delivered_customer_date,
                              o.order_estimated_delivery_date) >= 0
            GROUP BY oi.seller_id) a
      ON order_items.seller_id = a.seller_id
      GROUP BY a.seller_id
      HAVING percent_on_time >= 96) b
CROSS JOIN (SELECT COUNT(DISTINCT seller_id) num_sellers
            FROM orders o JOIN order_items oi
            ON o.order_id = oi.order_id
            WHERE order_status = 'delivered'
            AND MONTH(order_purchase_timestamp) = 8
            AND YEAR(order_purchase_timestamp) = 2018
            AND order_delivered_customer_date != '0000-00-00 00:00:00') c;

 * mysql+mysqlconnector://root:***@localhost/olist
1 rows affected.


num_sellers,num_qualified_sellers,percentage
1261,135,10.71


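In August 2018, among the 1261 sellers who delivered orders to customers, only 135 (10.71%) reach the 96% threshold under this query. Note that the denominator, COUNT(DISTINCT order_id) over the unfiltered order_items table, counts every order a seller has in the dataset, not only those delivered in August 2018. The percentage therefore measures August on-time orders as a share of each seller's all-time orders and probably understates sellers' actual August on-time rates.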
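
As a cross-check on how the rate is defined, here is a minimal Python sketch of a per-seller on-time rate in which the numerator and the denominator come from the same set of orders (for example, orders delivered in a single month). The sample rows, seller names, and the function `on_time_rates` are hypothetical. The sketch also compares exact timestamps, whereas the SQL above compares whole-day differences.

```python
from collections import defaultdict
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

def on_time_rates(orders):
    """Return {seller_id: percent of orders delivered on or before the estimate}."""
    totals = defaultdict(int)
    on_time = defaultdict(int)
    for seller_id, delivered, estimated in orders:
        totals[seller_id] += 1
        if datetime.strptime(delivered, FMT) <= datetime.strptime(estimated, FMT):
            on_time[seller_id] += 1
    return {s: round(on_time[s] / totals[s] * 100, 2) for s in totals}

# Hypothetical (seller_id, delivered_at, estimated_at) rows for one month.
sample = [
    ("seller_a", "2018-08-10 14:00:00", "2018-08-15 00:00:00"),  # on time
    ("seller_a", "2018-08-20 09:00:00", "2018-08-18 00:00:00"),  # late
    ("seller_b", "2018-08-05 11:00:00", "2018-08-12 00:00:00"),  # on time
]

rates = on_time_rates(sample)
# Sellers at or above the 96% threshold mentioned in the text.
qualified = sorted(s for s, r in rates.items() if r >= 96)
print(rates, qualified)
```

Because both counts are taken from the same rows, a seller's rate depends only on the period being examined and not on how many orders the seller had in other months.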

Below are the top 10 sellers, ranked by number of orders, among those that reach the 96% on-time threshold for August 2018.

In [6]:
%%sql
SELECT a.seller_id,
       COUNT(DISTINCT order_id) AS num_delivered_orders,
       num_on_time_orders,
       ROUND(num_on_time_orders/COUNT(DISTINCT order_id)*100,2)
       AS percent_on_time
FROM order_items 
JOIN (SELECT oi.seller_id,
             COUNT(DISTINCT o.order_id) AS num_on_time_orders
      FROM orders o JOIN order_items oi
      ON o.order_id = oi.order_id
      WHERE order_status = 'delivered'
      AND MONTH(order_purchase_timestamp) = 8
      AND YEAR(order_purchase_timestamp) = 2018
      AND order_delivered_customer_date != '0000-00-00 00:00:00'
      AND TIMESTAMPDIFF(day,o.order_delivered_customer_date,
                        o.order_estimated_delivery_date) >= 0
      GROUP BY oi.seller_id) AS a
ON order_items.seller_id = a.seller_id
GROUP BY a.seller_id
HAVING percent_on_time >= 96
ORDER BY num_delivered_orders DESC
LIMIT 10;

 * mysql+mysqlconnector://root:***@localhost/olist
10 rows affected.


seller_id,num_delivered_orders,num_on_time_orders,percent_on_time
81f89e42267213cb94da7ddc301651da,46,46,100.0
aac51c486b672a9850d59f3e84b1cf88,10,10,100.0
402916f742e5c740cc751493d9cf5053,9,9,100.0
55dedd83e501d8248880557d9073cbfd,8,8,100.0
ff1e15b778c700abdd4d239b81ac466d,7,7,100.0
7f40d06aa0b5f1aa4f41af8c0480e2ef,6,6,100.0
c3aad7dc65449ae90a5e9c3c6c1e78e0,6,6,100.0
c5e60e39c0f42b8e827daa13cff74afa,6,6,100.0
09bad886111255c5b5030314fc7f1a4a,5,5,100.0
5def4c3732941a971cba8fdee992ede1,5,5,100.0
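
The on-time test in the queries above is `TIMESTAMPDIFF(day, delivered, estimated) >= 0`. My understanding is that MySQL's TIMESTAMPDIFF returns only whole elapsed units, so a delivery that is less than 24 hours late would still yield 0 and be counted as on time. The Python sketch below emulates that assumed behavior with truncation toward zero; the function name `whole_days_between` and the timestamps are illustrative only.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"

def whole_days_between(start, end):
    """Whole days from start to end, truncated toward zero.

    Assumption: this mirrors TIMESTAMPDIFF(day, start, end) in MySQL.
    """
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return int(delta / timedelta(days=1))

estimated = "2018-08-15 00:00:00"

early = whole_days_between("2018-08-13 18:00:00", estimated)          # 30 hours early
slightly_late = whole_days_between("2018-08-15 20:00:00", estimated)  # 20 hours late
late = whole_days_between("2018-08-17 06:00:00", estimated)           # 54 hours late

print(early, slightly_late, late)
```

If that assumption holds, comparing the two timestamps directly (delivered <= estimated) would be a stricter on-time test than the whole-day difference.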


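### How quickly do customers receive an order?
This query calculates the percentage of delivered orders that arrive within 2 days, in 3 to 5 days, in 6 to 14 days, or more than 14 days after they are placed. The column names in_one_week and in_two_weeks refer to the 3-to-5-day and 6-to-14-day buckets respectively.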

In [7]:
%%sql
SELECT ROUND(SUM(IF(TIMESTAMPDIFF
              (day,order_purchase_timestamp,
               order_delivered_customer_date) <= 2,1,0))/
               COUNT(order_id)*100,2)
       AS under_two_days,
       ROUND(SUM(IF(TIMESTAMPDIFF
              (day,order_purchase_timestamp,
               order_delivered_customer_date) BETWEEN 3 AND 5,1,0))/
               COUNT(order_id)*100,2)
       AS in_one_week,
       ROUND(SUM(IF(TIMESTAMPDIFF
              (day,order_purchase_timestamp,
               order_delivered_customer_date) BETWEEN 6 AND 14,1,0))/
               COUNT(order_id)*100,2)
       AS in_two_weeks,
       ROUND(SUM(IF(TIMESTAMPDIFF
              (day,order_purchase_timestamp,
               order_delivered_customer_date) > 14,1,0))/
               COUNT(order_id)*100,2)
       AS more_than_two_weeks
FROM orders
WHERE order_status = 'delivered'
AND order_delivered_customer_date != '0000-00-00 00:00:00'
AND TIMESTAMPDIFF(day,order_purchase_timestamp,
                  order_delivered_customer_date) >= 0;

 * mysql+mysqlconnector://root:***@localhost/olist
1 rows affected.


under_two_days,in_one_week,in_two_weeks,more_than_two_weeks
4.93,15.02,52.71,27.34


The largest share of orders (52.71%) arrived 6 to 14 days after purchase, and another 27.34% took more than two weeks.
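
The bucketing used in the query can be sketched in Python as follows. The bucket boundaries match the SQL (up to 2 days, 3 to 5, 6 to 14, more than 14), while the day counts in `sample_days` and the function names are made up for illustration.

```python
from collections import Counter

def bucket(days):
    """Map whole days between purchase and delivery to the query's bucket names."""
    if days <= 2:
        return "under_two_days"
    if days <= 5:
        return "in_one_week"
    if days <= 14:
        return "in_two_weeks"
    return "more_than_two_weeks"

def bucket_shares(day_counts):
    """Percentage of orders per bucket, skipping negative durations as the query does."""
    counts = Counter(bucket(d) for d in day_counts if d >= 0)
    total = sum(counts.values())
    return {name: round(n / total * 100, 2) for name, n in counts.items()}

# Hypothetical delivery times in whole days.
sample_days = [1, 4, 7, 9, 12, 20, 30, 3]
shares = bucket_shares(sample_days)
print(shares)
```

Since every non-negative duration falls into exactly one bucket, the four percentages sum to 100 up to rounding.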

### Shipping limit date
When an order is approved, the Olist platform sets a deadline (shipping_limit_date) by which the merchant must hand the order over to an Olist logistics partner. In the most recent month of data, how many merchants met this requirement? First, let's check whether shipping_limit_date has any null values for orders that were handed to the carrier.

In [8]:
%%sql
SELECT COUNT(DISTINCT oi.order_id) AS shipping_limit_date_null
FROM orders o JOIN order_items oi
ON o.order_id = oi.order_id
WHERE order_delivered_carrier_date != '0000-00-00 00:00:00'
AND shipping_limit_date = '0000-00-00 00:00:00'
AND MONTH(order_purchase_timestamp) = 8
AND YEAR(order_purchase_timestamp) = 2018;

 * mysql+mysqlconnector://root:***@localhost/olist
1 rows affected.


shipping_limit_date_null
0


In [9]:
%%sql
SELECT c.num_sellers, COUNT(b.seller_id) AS num_meet_deadline,
       ROUND(COUNT(b.seller_id)/c.num_sellers*100,2) AS percentage
FROM (SELECT a.seller_id,
             ROUND(meet_deadline/COUNT(DISTINCT order_id)*100,2)
             AS percentage
      FROM order_items 
      JOIN (SELECT oi.seller_id,
                   COUNT(DISTINCT o.order_id) AS meet_deadline
            FROM orders o JOIN order_items oi
            ON o.order_id = oi.order_id
            WHERE order_delivered_carrier_date != '0000-00-00 00:00:00'
            AND MONTH(order_purchase_timestamp) = 8
            AND YEAR(order_purchase_timestamp) = 2018
            AND TIMESTAMPDIFF(day,order_delivered_carrier_date,
                              shipping_limit_date) >= 0
            GROUP BY oi.seller_id) a
      ON order_items.seller_id = a.seller_id
      GROUP BY a.seller_id) b
CROSS JOIN (SELECT COUNT(DISTINCT seller_id) num_sellers
            FROM orders o JOIN order_items oi
            ON o.order_id = oi.order_id
            WHERE order_delivered_carrier_date != '0000-00-00 00:00:00'
            AND MONTH(order_purchase_timestamp) = 8
            AND YEAR(order_purchase_timestamp) = 2018) c;

 * mysql+mysqlconnector://root:***@localhost/olist
1 rows affected.


num_sellers,num_meet_deadline,percentage
1266,1236,97.63


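In August 2018, 97.63% of merchants handed at least one order to the carrier by its shipping limit date, which is very high. The query counts a merchant as soon as one of its orders meets the deadline; it does not require every order to be on time. Let's see how this rate changes when only delivered orders are considered.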

In [10]:
%%sql
SELECT c.num_sellers, COUNT(b.seller_id) AS num_meet_deadline,
       ROUND(COUNT(b.seller_id)/c.num_sellers*100,2) AS percentage
FROM (SELECT a.seller_id,
             ROUND(meet_deadline/COUNT(DISTINCT order_id)*100,2)
             AS percent_meet_deadline
      FROM order_items
      JOIN (SELECT oi.seller_id,
                   COUNT(DISTINCT o.order_id) AS meet_deadline
            FROM orders o JOIN order_items oi
            ON o.order_id = oi.order_id
            WHERE order_delivered_carrier_date != '0000-00-00 00:00:00'
            AND MONTH(order_purchase_timestamp) = 8
            AND YEAR(order_purchase_timestamp) = 2018
            AND TIMESTAMPDIFF(day,order_delivered_carrier_date,
                              shipping_limit_date) >= 0
            AND order_status = 'delivered'
            GROUP BY oi.seller_id) a
      ON order_items.seller_id = a.seller_id
      GROUP BY a.seller_id) b
CROSS JOIN (SELECT COUNT(DISTINCT seller_id) num_sellers
            FROM orders o JOIN order_items oi
            ON o.order_id = oi.order_id
            WHERE order_delivered_carrier_date != '0000-00-00 00:00:00'
            AND MONTH(order_purchase_timestamp) = 8
            AND YEAR(order_purchase_timestamp) = 2018
            AND order_status = 'delivered') c;

 * mysql+mysqlconnector://root:***@localhost/olist
1 rows affected.


num_sellers,num_meet_deadline,percentage
1261,1232,97.7


1261 merchants successfully delivered orders to their customers in August 2018. 97.70% of them met the shipping deadline on at least one order, but only 10.71% have an on-time delivery rate of 96% or more.
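
The two shipping-deadline queries count a seller as soon as one of its orders was handed to the carrier by the shipping limit date, because the inner subquery has no HAVING filter on the per-seller percentage. The Python sketch below contrasts that "at least one order" reading with a stricter "every order" reading. The sample data and the function `deadline_summary` are hypothetical.

```python
from collections import defaultdict

def deadline_summary(items):
    """Summarize deadline compliance per seller.

    items: iterable of (seller_id, met_deadline) pairs, one per order.
    """
    met = defaultdict(int)
    total = defaultdict(int)
    for seller_id, ok in items:
        total[seller_id] += 1
        met[seller_id] += int(ok)
    n = len(total)
    at_least_one = sum(1 for s in total if met[s] > 0)
    every_order = sum(1 for s in total if met[s] == total[s])
    return {
        "num_sellers": n,
        "pct_at_least_one": round(at_least_one / n * 100, 2),
        "pct_every_order": round(every_order / n * 100, 2),
    }

# Hypothetical orders: (seller_id, handed to carrier by the deadline?)
sample_items = [
    ("s1", True), ("s1", False),
    ("s2", True),
    ("s3", False),
    ("s4", True), ("s4", True),
]

summary = deadline_summary(sample_items)
print(summary)
```

On real data the gap between the two percentages would show how much of the high compliance figure comes from sellers that meet the deadline only some of the time.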