<a href="https://colab.research.google.com/github/v-gaurav/Amway/blob/main/map_latest_completed_orders_th.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
# @title Setup
from google.colab import auth
from google.cloud import bigquery
from google.colab import data_table

project = 'vgaurav' # Project ID inserted based on the query results selected to explore
location = 'US' # Location inserted based on the query results selected to explore
client = bigquery.Client(project=project, location=location)
data_table.enable_dataframe_formatter()
auth.authenticate_user()

## Reference SQL syntax from the original job
Use the ```jobs.query```
[method](https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/query) to
return the SQL syntax from the job. This can be copied from the output cell
below to edit the query now or in the future. Alternatively, you can use
[this link](https://console.cloud.google.com/bigquery?j=vgaurav:US:bquxjob_3e42a366_186144f4a7d)
back to BigQuery to edit the query within the BigQuery user interface.

In [21]:
# Running this code will display the query used to generate your previous job

# job = client.get_job('bquxjob_3e42a366_186144f4a7d') # Job ID inserted based on the query results selected to explore
# print(job.query)

# Result set loaded from BigQuery job as a DataFrame
Query results are referenced from the Job ID ran from BigQuery and the query
does not need to be re-run to explore results. The ```to_dataframe```
[method](https://googleapis.dev/python/bigquery/latest/generated/google.cloud.bigquery.job.QueryJob.html#google.cloud.bigquery.job.QueryJob.to_dataframe)
downloads the results to a Pandas DataFrame by using the BigQuery Storage API.

To edit query syntax, you can do so from the BigQuery SQL editor or in the
```Optional:``` sections below.

In [22]:
DATASET_NAME = "`amw-dna-ingestion-prd.streaming_thailand_raw.order`"
query = f"""
WITH
  order_details_raw AS (
  SELECT
    data.affiliateOrderNumber AS order_number,
    data.account.accountType AS buyer_type,
    data.account.entryDate AS buyer_entry_date,
    data.account.pinLevel AS buyer_pin_level,
    EXTRACT(datetime from data.orderDate at time zone 'Asia/Bangkok') AS order_datetime,
    EXTRACT(date from data.orderDate at time zone 'Asia/Bangkok') AS order_date,
    EXTRACT(datetime from data.orderPeriod at time zone 'Asia/Bangkok') AS order_period,
    EXTRACT(datetime from data.bonusPeriod at time zone 'Asia/Bangkok') AS bonus_period,
    data.channel.code AS channel_code,
    data.deliveryAddress.postalCode AS postal_code,
    data.deliveryMode.code AS delivery_mode,
    data.deliveryType AS delivery_type,
    data.orderType.code AS order_type,
    data.status.code AS order_status,
    data.totalPrice.value AS order_value,
    generationTime AS collection_time,
    countryCode AS market_entity

  FROM
    {DATASET_NAME}),

  -- deduping logic goes here
  -- get unique rows
  unique_rows AS (
  SELECT
    order_number,
    market_entity,
    MAX(collection_time) AS latest_time
  FROM
    order_details_raw
  GROUP BY
    order_number, market_entity ),

  -- get all rows
  all_rows AS (
  SELECT
    *
  FROM
    order_details_raw ),
  
  -- merge the two
  joint AS (
  SELECT
    all_rows.*
  FROM
    all_rows
  INNER JOIN
    unique_rows
  ON
    all_rows.order_number=unique_rows.order_number
    AND all_rows.collection_time=unique_rows.latest_time
    AND all_rows.market_entity=unique_rows.market_entity ),
  
  -- get it deduped
  order_details_deduped AS (
  SELECT
    * EXCEPT(rn)
  FROM (
    SELECT
      *,
      ROW_NUMBER() OVER (PARTITION BY order_number) AS rn
    FROM
      joint )
  WHERE
    rn=1 ),

  -- filter rows based on our question criteria. We only care about comepleted orders created after 2021 bonus period
  order_details_filtered AS (
  SELECT
    * EXCEPT(order_status)
  FROM
    order_details_deduped
  WHERE
    order_status = 'COMPLETED'
    AND bonus_period >= '2021-09-01' )

    SELECT * FROM order_details_filtered

"""

In [26]:
job = client.query(query)
results = job.to_dataframe()

In [27]:
results.set_index(['order_number', 'market_entity'], inplace=True)
results.sample(10)

Unnamed: 0_level_0,Unnamed: 1_level_0,buyer_type,buyer_entry_date,buyer_pin_level,order_datetime,order_date,order_period,bonus_period,channel_code,postal_code,delivery_mode,delivery_type,order_type,order_value,collection_time
order_number,market_entity,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1
966301932,TH,AMWAY BUSINESS OWNER,2022-02-10 04:46:50+00:00,NORMAL_ABO,2022-05-21 19:39:33,2022-05-21,2022-05-01,2022-05-01,Web,10270.0,standard-shipping,Home-Delivery,NORMAL_ORDER,11535.49,2022-05-23 03:16:50+00:00
962375391,TH,AMWAY BUSINESS OWNER,2021-01-30 13:44:17+00:00,NORMAL_ABO,2022-03-03 20:34:39,2022-03-03,2022-03-01,2022-03-01,Web,,pickup,Pickup,RENEWAL_ORDER,420.56,2022-03-03 13:36:11+00:00
959861178,TH,AMWAY BUSINESS OWNER,2020-12-17 06:42:13+00:00,NORMAL_ABO,2022-01-21 15:56:06,2022-01-21,2022-01-01,2022-01-01,pos,,walkin,Pickup,NORMAL_ORDER,11461.67,2022-01-21 08:58:36+00:00
963882236,TH,AMWAY BUSINESS OWNER,2019-11-27 10:25:45+00:00,NORMAL_ABO,2022-03-31 20:59:35,2022-03-31,2022-03-01,2022-03-01,pos,,walkin,Pickup,NORMAL_ORDER,3349.52,2022-03-31 14:01:56+00:00
954064820,TH,AMWAY BUSINESS OWNER,2004-04-09 17:00:00+00:00,SILVER_PRODUCER_AND_ABOVE,2021-10-01 23:12:59,2021-10-01,2021-10-01,2021-10-01,Web,10500.0,standard-shipping,Home-Delivery,NORMAL_ORDER,10184.1,2021-10-04 11:44:55+00:00
976242763,TH,MEMBER,2022-10-03 17:01:52+00:00,NORMAL_ABO,2022-10-18 14:23:22,2022-10-18,2022-10-01,2022-10-01,pos,,walkin,Pickup,NORMAL_ORDER,1495.33,2022-10-18 07:25:45+00:00
956938491,TH,MEMBER,2019-06-01 17:00:00+00:00,NORMAL_ABO,2021-11-26 13:24:02,2021-11-26,2021-11-01,2021-11-01,pos,,walkin,Pickup,NORMAL_ORDER,2258.88,2021-11-26 06:26:29+00:00
956604446,TH,MEMBER,2020-10-14 11:42:07+00:00,NORMAL_ABO,2021-11-19 15:14:25,2021-11-19,2021-11-01,2021-11-01,Web,76000.0,standard-shipping,Home-Delivery,RENEWAL_ORDER,93.46,2021-11-21 01:04:18+00:00
968075878,TH,AMWAY BUSINESS OWNER,2019-10-18 05:24:08+00:00,NORMAL_ABO,2022-06-25 11:32:42,2022-06-25,2022-06-01,2022-06-01,pos,,walkin,Pickup,NORMAL_ORDER,8536.46,2022-07-07 08:22:47+00:00
952559960,TH,AMWAY BUSINESS OWNER,2011-07-23 17:00:00+00:00,NORMAL_ABO,2021-09-03 17:23:28,2021-09-03,2021-09-01,2021-09-01,pos,,walkin,Pickup,NORMAL_ORDER,1325.23,2021-09-03 10:25:45+00:00


In [32]:
results.index.get_level_values(1).unique()

Index(['TH'], dtype='object', name='market_entity')

In [7]:
results['buyer_type'].unique()

array(['MEMBER', 'AMWAY BUSINESS OWNER', 'FRIEND OF AMWAY', 'EMPLOYEE'],
      dtype=object)

In [8]:
results['buyer_type'].replace({'AMWAY BUSINESS OWNER': 'Distributor', 'MEMBER': 'Member', 'FRIEND OF AMWAY': 'Customer', 'EMPLOYEE': 'Employee'}, inplace=True)
results['buyer_type'].unique()

array(['Member', 'Distributor', 'Customer', 'Employee'], dtype=object)

In [9]:
results['buyer_pin_level'].unique()

array(['NORMAL_ABO', 'PLATINUM_AND_ABOVE', 'SILVER_PRODUCER_AND_ABOVE',
       'DIAMOND_AND_ABOVE'], dtype=object)

In [10]:
results['buyer_pin_level'].replace({'NORMAL_ABO': 'Beginner', 'PLATINUM_AND_ABOVE': 'Platinum+', 'SILVER_PRODUCER_AND_ABOVE': 'Silver+',
       'DIAMOND_AND_ABOVE': 'Diamond+'}, inplace=True)
results['buyer_pin_level'].unique()

array(['Beginner', 'Platinum+', 'Silver+', 'Diamond+'], dtype=object)

In [11]:
results['channel_code'].unique()

array(['Web', 'Line', 'BODYKEY', 'pos', 'WebMobile', 'DMS', 'TMS', 'ASM',
       'KIOSK', 'SOP'], dtype=object)

In [12]:
results['delivery_mode'].unique()

array(['standard-gross', 'pickup', 'combined_delivery_standard_gross',
       'walkin', 'shop-delivery', 'standard-shipping',
       'free-standard-shipping'], dtype=object)

In [13]:
results['delivery_type'].unique()

array(['Home-Delivery', 'Pickup'], dtype=object)

In [14]:
results['delivery_type'].replace({'Home-Delivery' : 'Shipping', 'Pickup': 'Pickup'}, inplace=True)

results['delivery_type'].unique()

array(['Shipping', 'Pickiup'], dtype=object)

In [15]:
results['order_type'].unique()

array(['NORMAL_ORDER', 'RENEWAL_ORDER', 'BACK_ORDER', 'REGISTRATION',
       'AMC_ORDER', 'SERVICE_ORDER', 'REPLACEMENT'], dtype=object)

In [17]:
results['order_type'].replace(
    {'NORMAL_ORDER': 'Normal', 'RENEWAL_ORDER': 'Renewal', 'BACK_ORDER': 'Backorder', 'REGISTRATION': 'Registration', 'AMC_ORDER': 'AMC', 'SERVICE_ORDER': 'Service', 'REPLACEMENT': 'Replacement'}
      , inplace=True
     )
results['order_type'].unique()

array(['Normal', 'Renewal', 'Backorder', 'Registration', 'AMC', 'Service',
       'Replacement'], dtype=object)

In [18]:
cols = {'sum':'daily_sales','mean':'avg_order_value','count':'daily_orders'}
daily_sales_summary = results.groupby(
    ['order_date','bonus_period', 'buyer_type', 'buyer_pin_level', 'channel_code', 'delivery_type', 'order_type' ]
    ).agg(
        {'order_value': ['sum', 'mean', 'count']}
        ).round().rename(
            columns = cols
        )

daily_sales_summary.columns = daily_sales_summary.columns.droplevel(0)    

In [19]:
daily_sales_summary.sample(10)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,Unnamed: 4_level_0,Unnamed: 5_level_0,Unnamed: 6_level_0,daily_sales,avg_order_value,daily_orders
order_date,bonus_period,buyer_type,buyer_pin_level,channel_code,delivery_type,order_type,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1
2022-11-22,2022-11-01,Distributor,Platinum+,pos,Pickiup,Service,12149.0,868.0,14
2023-01-24,2023-01-01,Member,Beginner,Line,Shipping,Normal,2564.0,1282.0,2
2022-05-26,2022-05-01,Distributor,Beginner,WebMobile,Shipping,Renewal,1168.0,584.0,2
2021-09-02,2021-09-01,Employee,Beginner,pos,Pickiup,AMC,2907.0,162.0,18
2022-07-07,2022-07-01,Member,Beginner,pos,Shipping,Normal,142393.0,3651.0,39
2022-10-02,2022-10-01,Distributor,Platinum+,DMS,Pickiup,Renewal,7570.0,421.0,18
2022-03-19,2022-03-01,Member,Beginner,WebMobile,Shipping,Renewal,1215.0,93.0,13
2021-12-15,2021-12-01,Member,Beginner,pos,Pickiup,Renewal,20982.0,95.0,221
2021-10-31,2021-12-01,Distributor,Beginner,Web,Shipping,Backorder,34735.0,2316.0,15
2022-06-23,2022-06-01,Member,Beginner,pos,Pickiup,Renewal,20281.0,93.0,217


In [108]:
from datetime import datetime
import pytz

bangkok = pytz.timezone('Asia/Bangkok')
today = datetime.now(bangkok).date()
daily_sales_summary.loc[]

datetime.date(2023, 2, 3)

In [102]:
cols = {'sum':'monthly_sales','mean':'avg_order_value','count':'monthly_orders'}

monthly_sales_summary = daily_sales_summary.groupby(
    ['bonus_period', 'buyer_type', 'buyer_pin_level', 'channel_code', 'delivery_type', 'order_type' ]
    ).agg(
        {'daily_sales': ['sum', 'mean', 'count']}
        ).round().rename(
         columns = cols
        )

monthly_sales_summary.columns = monthly_sales_summary.columns.droplevel(0)    
monthly_sales_summary.sample(10)       

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,Unnamed: 4_level_0,Unnamed: 5_level_0,monthly_sales,avg_order_value,monthly_orders
bonus_period,buyer_type,buyer_pin_level,channel_code,delivery_type,order_type,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
2022-09-01,Distributor,Beginner,ASM,Shipping,Normal,58224.0,4852.0,12
2022-03-01,Member,Beginner,pos,Pickiup,Normal,144319909.0,4655481.0,31
2021-10-01,Distributor,Platinum+,pos,Pickiup,Service,317478.0,10583.0,30
2022-08-01,Distributor,Beginner,WebMobile,Pickiup,Normal,5199105.0,173304.0,30
2022-08-01,Member,Beginner,BODYKEY,Shipping,Backorder,0.0,0.0,2
2022-12-01,Employee,Beginner,pos,Pickiup,Service,4747.0,2374.0,2
2022-07-01,Member,Beginner,Line,Shipping,Normal,48063.0,3697.0,13
2022-06-01,Member,Beginner,ASM,Shipping,Renewal,559.0,112.0,5
2022-12-01,Member,Beginner,Web,Shipping,Backorder,1277252.0,27766.0,46
2022-05-01,Member,Beginner,WebMobile,Shipping,Renewal,54765.0,1767.0,31


In [113]:
monthly_sales_summary['monthly_sales_sply'] = monthly_sales_summary['monthly_sales'].shift(12)
monthly_sales_summary['change_in_monthly_sales_over_sply'] = (monthly_sales_summary['monthly_sales'] - monthly_sales_summary['monthly_sales_sply']) 

monthly_sales_summary['pct_change_in_monthly_sales_over_sply'] = monthly_sales_summary['change_in_monthly_sales_over_sply']  / monthly_sales_summary['monthly_sales_sply']


In [115]:
monthly_sales_summary.loc['2023-01-01']

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,Unnamed: 4_level_0,Unnamed: 5_level_0,monthly_sales,avg_order_value,monthly_orders,monthly_sales_sply,change_in_monthly_sales_over_sply,pct_change_in_monthly_sales_over_sply
bonus_period,buyer_type,buyer_pin_level,channel_code,delivery_type,order_type,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1
2023-01-01,Customer,Beginner,Line,Shipping,Normal,8767.0,2922.0,3,78972.0,-70205.0,-0.888986
2023-01-01,Customer,Beginner,Web,Pickiup,Normal,271.0,271.0,1,236925.0,-236654.0,-0.998856
2023-01-01,Customer,Beginner,Web,Shipping,Backorder,4515.0,1505.0,3,3907.0,608.0,0.155618
2023-01-01,Customer,Beginner,Web,Shipping,Normal,221285.0,7138.0,31,133851763.0,-133630478.0,-0.998347
2023-01-01,Customer,Beginner,WebMobile,Shipping,Normal,45386.0,1891.0,24,591136.0,-545750.0,-0.923222
2023-01-01,...,...,...,...,...,...,...,...,...,...,...
2023-01-01,Member,Beginner,pos,Pickiup,Service,389881.0,12996.0,30,1248811.0,-858930.0,-0.687798
2023-01-01,Member,Beginner,pos,Shipping,Backorder,13026.0,1303.0,10,608331.0,-595305.0,-0.978587
2023-01-01,Member,Beginner,pos,Shipping,Normal,3727944.0,124265.0,30,1770694.0,1957250.0,1.105358
2023-01-01,Member,Beginner,pos,Shipping,Registration,55609.0,1794.0,31,127799.0,-72190.0,-0.564871


In [56]:
monthly_sales_cube.loc[:,('total_price', 'mean'):('total_price', 'count')].groupby(level = 0).agg({('total_price','mean'): 'mean', ('total_price', 'count'): 'sum'})

Unnamed: 0_level_0,total_price,total_price
Unnamed: 0_level_1,mean,count
bonus_period_datetime,Unnamed: 1_level_2,Unnamed: 2_level_2
2021-09-01,1828.905172,606157
2021-10-01,2024.89313,603151
2021-11-01,1866.728,592261
2021-12-01,1670.037879,607908
2022-01-01,2026.909774,579471
2022-02-01,1882.725926,627131
2022-03-01,1546.280576,634189
2022-04-01,1524.580882,558751
2022-05-01,1448.913669,603361
2022-06-01,1835.323529,557582


In [32]:
monthly_sales_cube.index.get_level_values(1) 

Index(['Customer', 'Distributor', 'Distributor', 'Distributor', 'Distributor',
       'Distributor', 'Distributor', 'Distributor', 'Distributor',
       'Distributor',
       ...
       'Member', 'Member', 'Member', 'Member', 'Member', 'Member', 'Member',
       'Member', 'Member', 'Member'],
      dtype='object', name='buyer_type', length=2459)