# Order Brushing - Data Analysis
Shopee Code League, Round 1

### Given Assumptions and Definitions
- Brushing shop = a shop whose concentration rate is greater than or equal to 3 at any instance
- Concentration rate = number of orders within 1 hour / number of unique buyers within 1 hour
- Brushing buyer = the buyer that contributed the ```highest proportion of orders to a shop```
- The highest proportion of orders should include the orders that occurred in instances of brushing
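To make the concentration-rate definition concrete, here is a small worked example in plain Python (standard library only). The function name `toy_concentration_rate` and the orders are made up for illustration. The window is treated as inclusive at both ends, which matches the `<=` filters used later in the notebook.

```python
from datetime import datetime, timedelta

def toy_concentration_rate(orders, start, window=timedelta(hours=1)):
    """Concentration rate of one shop in the window [start, start + window].

    orders is a list of (userid, event_time) pairs; both window ends are inclusive.
    """
    buyers_in_window = [userid for userid, event_time in orders
                        if start <= event_time <= start + window]
    if not buyers_in_window:
        return 0.0  # no orders in the window
    # number of orders / number of unique buyers
    return len(buyers_in_window) / len(set(buyers_in_window))

# Made-up orders: buyer 101 places 6 orders and buyer 202 places 1 within one hour
t0 = datetime(2019, 12, 27, 0, 0, 0)
orders = [(101, t0 + timedelta(minutes=m)) for m in (0, 10, 20, 40, 50, 59)]
orders.append((202, t0 + timedelta(minutes=30)))

rate = toy_concentration_rate(orders, t0)
print(rate, rate >= 3)  # 3.5 True: 7 orders / 2 unique buyers
```

With 7 orders from 2 unique buyers the rate is 3.5, so this shop would count as brushing in that hour; buyer 101, with 6 of the 7 orders, contributes the highest proportion.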

### Basic Concepts
- Each ```orderid``` represents a distinct transaction on Shopee.
- Each unique ```shopid``` is a distinct seller on Shopee.
- Each unique ```userid``` is a distinct buyer on Shopee.
- Event time refers to the exact time that an order was placed on Shopee.

In [222]:
import pandas as pd

In [223]:
df = pd.read_csv('order_brush_order.csv')

In [224]:
df.columns

Index(['orderid', 'shopid', 'userid', 'event_time'], dtype='object')

In [225]:
df.head(5)

|   | orderid | shopid | userid | event_time |
|---|---|---|---|---|
| 0 | 31076582227611 | 93950878 | 30530270 | 2019-12-27 00:23:03 |
| 1 | 31118059853484 | 156423439 | 46057927 | 2019-12-27 11:54:20 |
| 2 | 31123355095755 | 173699291 | 67341739 | 2019-12-27 13:22:35 |
| 3 | 31122059872723 | 63674025 | 149380322 | 2019-12-27 13:01:00 |
| 4 | 31117075665123 | 127249066 | 149493217 | 2019-12-27 11:37:55 |


In [226]:
df['event_time'] = pd.to_datetime(df['event_time'])

In [227]:
df.shape

(222750, 4)

In [228]:
df.isna().sum()

orderid       0
shopid        0
userid        0
event_time    0
dtype: int64

In [229]:
def get_sorted_unique_by_column_name(column_name):
    t = df[column_name].unique()
    t.sort()
    return t

unique_shop_ids = get_sorted_unique_by_column_name('shopid')

In [230]:
groupby_shop_id_df = df.groupby('shopid')

In [237]:
def get_1hour_window(t):
    return t + pd.Timedelta(hours = 1)

In [242]:
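def concentration_rate(lower_time, upper_time, same_shop_id_df):
    # Orders of this shop placed within [lower_time, upper_time] (both ends inclusive)
    windowed_df = same_shop_id_df[(lower_time <= same_shop_id_df['event_time']) &
                                  (same_shop_id_df['event_time'] <= upper_time)]

    number_of_orders = len(windowed_df)
    number_of_unique_users = windowed_df['userid'].nunique()
    if number_of_unique_users == 0:
        return False  # empty window: no brushing

    # True when orders / unique buyers >= 3, i.e. the shop is brushing in this window
    return number_of_orders / number_of_unique_users >= 3.0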

In [243]:
def get_time_range_to_brushing_user_ids_to_max_number_of_orders_pair(same_shop_id_df, time_range):
    # TBD: placeholder that reports no brushing buyers for every window
    
    return (time_range, [], 0)
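The window helper above is still a placeholder. One possible way to fill it in is sketched below; it is not the author's implementation. It assumes that a window is brushing when its concentration rate is at least 3, and that the brushing buyers of a window are all buyers tied for the most orders inside it. The toy DataFrame is made up. The stated assumption that the proportion "should include the orders that occurred in instances of brushing" could also be read as summing buyer counts over all brushing windows of a shop rather than ranking per window; this sketch does not decide that question.

```python
import pandas as pd

def get_time_range_to_brushing_user_ids_to_max_number_of_orders_pair(same_shop_id_df, time_range):
    """Sketch: return (time_range, brushing_user_ids, max_number_of_orders) for one window.

    Assumes same_shop_id_df has 'userid' and datetime 'event_time' columns and that
    time_range is an inclusive (lower_time, upper_time) pair.
    """
    lower_time, upper_time = time_range
    window = same_shop_id_df[same_shop_id_df['event_time'].between(lower_time, upper_time)]

    # Not brushing: same shape as the notebook's placeholder return value
    if window.empty or len(window) / window['userid'].nunique() < 3:
        return (time_range, [], 0)

    # Orders per buyer inside the window; keep every buyer tied for the top count
    counts = window['userid'].value_counts()
    max_number_of_orders = int(counts.max())
    brushing_user_ids = sorted(counts[counts == max_number_of_orders].index.tolist())
    return (time_range, brushing_user_ids, max_number_of_orders)

# Made-up shop: buyer 1 places 5 orders and buyer 2 places 2 within half an hour
t0 = pd.Timestamp('2019-12-27 00:00:00')
toy = pd.DataFrame({
    'userid': [1, 1, 1, 1, 1, 2, 2],
    'event_time': [t0 + pd.Timedelta(minutes=m) for m in (0, 5, 10, 15, 20, 25, 30)],
})
print(get_time_range_to_brushing_user_ids_to_max_number_of_orders_pair(
    toy, (t0, t0 + pd.Timedelta(hours=1))))
```

Returning every buyer tied for the maximum (rather than one arbitrary buyer) keeps the output compatible with the list of user ids that the downstream code de-duplicates and sorts.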

In [244]:
def flatten(list_of_list):
    return [val for sublist in list_of_list for val in sublist]

In [231]:
import numpy as np

def get_brushing_user_id(shop_id):
    # All orders placed with this shop
    same_shop_id_df = groupby_shop_id_df.get_group(shop_id)

    # One candidate window [t, t + 1 hour] starting at every order's event time
    time_ranges = [(event_time, get_1hour_window(event_time))
                   for event_time in same_shop_id_df['event_time']]

    # Each entry is (time_range, brushing_user_ids, max_number_of_orders)
    time_ranges_to_brushing_user_ids_to_max_number = [
        get_time_range_to_brushing_user_ids_to_max_number_of_orders_pair(same_shop_id_df, time_range)
        for time_range in time_ranges
    ]

    ultimate_max_number_of_orders = max(
        max_number_of_orders
        for _, _, max_number_of_orders in time_ranges_to_brushing_user_ids_to_max_number)

    # Keep only the windows whose top buyer(s) reached the overall maximum
    ps = [(time_range, user_ids, max_number_of_orders)
          for time_range, user_ids, max_number_of_orders in time_ranges_to_brushing_user_ids_to_max_number
          if max_number_of_orders == ultimate_max_number_of_orders]

    # np.unique both de-duplicates and sorts the user ids
    brushing_user_ids = np.unique(flatten([user_ids for _, user_ids, _ in ps]))

    # 0 marks a shop with no brushing buyer
    return brushing_user_ids if len(brushing_user_ids) > 0 else 0

In [232]:
result = [(shop_id, get_brushing_user_id(shop_id)) for shop_id in unique_shop_ids]

In [234]:
result_df = pd.DataFrame(result, columns = ['shopid', 'userid'])

In [235]:
result_df

|   | shopid | userid |
|---|---|---|
| 0 | 10009 | 0 |
| 1 | 10051 | 0 |
| 2 | 10061 | 0 |
| 3 | 10084 | 0 |
| 4 | 10100 | 0 |
| ... | ... | ... |
| 18765 | 214662358 | 0 |
| 18766 | 214949521 | 0 |
| 18767 | 214964814 | 0 |
| 18768 | 215175775 | 0 |
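Because the notebook filters a shop's DataFrame once per candidate window, the work per shop grows roughly quadratically with its number of orders. A two-pointer sweep over the sorted event times can find the same windows while adding and removing each order only once. The sketch below is plain Python (standard library only); the name `brushing_windows` and the sample orders are hypothetical, and it only yields the brushing windows with their per-buyer counts, leaving the choice of brushing buyer open.

```python
from collections import Counter
from datetime import datetime, timedelta

def brushing_windows(orders, window=timedelta(hours=1), threshold=3.0):
    """Yield (start_time, buyer -> order count) for every brushing window of ONE shop.

    orders is a list of (event_time, userid) pairs. A window starts at each distinct
    event_time and covers [start_time, start_time + window], both ends inclusive.
    """
    orders = sorted(orders)
    counts = Counter()  # buyer -> orders currently inside the window
    end = 0             # index of the first order not yet added to the window
    for start, (start_time, start_user) in enumerate(orders):
        # Move the right edge forward to cover every order up to start_time + window
        while end < len(orders) and orders[end][0] <= start_time + window:
            counts[orders[end][1]] += 1
            end += 1
        # Evaluate only the first order of each timestamp, so that orders sharing
        # start_time are never left out of the window
        is_new_start = start == 0 or orders[start - 1][0] != start_time
        if is_new_start and (end - start) / len(counts) >= threshold:
            yield start_time, Counter(counts)
        # Drop the left-most order before the window start moves on
        counts[start_user] -= 1
        if counts[start_user] == 0:
            del counts[start_user]

# Made-up orders for one shop
t0 = datetime(2019, 12, 27, 10, 0, 0)
shop_orders = [(t0 + timedelta(minutes=m), userid)
               for m, userid in [(0, 7), (10, 7), (20, 7), (30, 8),
                                 (40, 7), (50, 7), (59, 7), (200, 9)]]
for start_time, buyer_counts in brushing_windows(shop_orders):
    print(start_time, dict(buyer_counts))
```

Sorting costs O(n log n) and the sweep itself is linear, since the right edge only ever moves forward. The duplicate-timestamp check matters: without it, a window starting at the second of two simultaneous orders would silently exclude the first one.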


## Trial

In [159]:
temp_df = df[df['shopid'] == 147941492].sort_values(by = 'event_time')
temp_df = temp_df.set_index('event_time')

In [160]:
count_df = temp_df.groupby([temp_df.index.date, temp_df.index.hour])['orderid'].count().reset_index()

In [161]:
count_df

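|   | level_0 | event_time | orderid |
|---|---|---|---|
| 0 | 2019-12-27 | 0 | 268 |
| 1 | 2019-12-27 | 1 | 104 |
| 2 | 2019-12-27 | 2 | 52 |
| 3 | 2019-12-27 | 3 | 34 |
| 4 | 2019-12-27 | 4 | 9 |
| ... | ... | ... | ... |
| 115 | 2019-12-31 | 19 | 89 |
| 116 | 2019-12-31 | 20 | 73 |
| 117 | 2019-12-31 | 21 | 59 |
| 118 | 2019-12-31 | 22 | 72 |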


In [106]:
(count_df['orderid'] >= 3).any()

True
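The check `(count_df['orderid'] >= 3).any()` only shows that some calendar hour had at least 3 orders. It does not divide by the number of unique buyers, and it buckets by fixed clock hours instead of a one-hour window starting at each order, so it is not the concentration rate defined above. A direct (if slow) pandas check under the notebook's own definition might look like this; `has_brushing` is a hypothetical helper, and it expects `event_time` as a column rather than as the index used in `temp_df`.

```python
import pandas as pd

def has_brushing(shop_df, threshold=3.0):
    """True if any window [t, t + 1 hour] starting at an order's event time has
    orders / unique buyers >= threshold.

    shop_df holds one shop's orders with 'userid' and datetime 'event_time' columns.
    """
    times = shop_df['event_time']
    for start in times.drop_duplicates():
        # Every window contains at least its starting order, so nunique() >= 1
        window = shop_df[times.between(start, start + pd.Timedelta(hours=1))]
        if len(window) / window['userid'].nunique() >= threshold:
            return True
    return False
```

For the trial shop this would be called as `has_brushing(df[df['shopid'] == 147941492])`, using `df` after `event_time` has been converted with `pd.to_datetime`.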

In [107]:
print(temp_df['shopid'].unique())

[147941492]
