<left>
    <a href=img><img src = "https://github.com/tiagottmoraes/CS-s-Data-Science-Test/blob/main/Misc/1c86bfb3-cab0-4255-99bf-24c5488d888a-1632765661247.png?raw=true" width="400"  />
</left>

# Data Science Test - Intro

----

The following notebooks present the complete dataset analysis, exploration, and pre-processing steps required to implement a tailored machine learning algorithm designed to **predict the total minutes** ('total_minutes' label) it takes a shopper to complete a given order. Rows without a total_minutes value were set aside for the prediction phase.

For that, **3 different notebooks** were prepared, one for each task, as follows:

- **[1. ETL process notebook:](#download-the-data)** Complete ETL (data wrangling) process of the provided datasets, including feature engineering, feature transformations, and data standardization, needed to carry out the statistical tests in the next phase;

- **[2. EDA notebook:](#download-the-data)** Statistical exploration and understanding of the pre-processed dataset, needed to define the best machine learning strategy for the target label;

- **[3. Machine Learning Model development notebook:](#download-the-data)** Build, train, and test an ML model on the processed dataset, then apply it to unseen data to generate predictions ("total_minutes" label).
<br />
<br />

For this test, 4 different datasets were provided, each of them with different types of features and volume of data. 

**order_products.csv (198500 rows)**

*order_id*: ID of the order\
*product_id*: ID of the product\
*quantity*: The quantity ordered of this product\
*buy_unit*: The unit of the product (KG/UN)

**orders.csv (10000 rows)**

*order_id*: ID of the order\
*lat*: The latitude of the delivery location\
*lng*: The longitude of the delivery location\
*promised_time*: The delivery time promised to the user\
*on_demand*: If true, the order was promised to be delivered in less than X minutes\
*shopper_id*: ID of the shopper who completed the order\
*store_branch_id*: ID of the store branch\
***total_minutes***: The total minutes it took to complete the order (label)

**shoppers.csv (2864 rows)**

*shopper_id*: ID of the shopper\
*seniority*: The experience level of the shopper\
*found_rate*: Historical percentage of products found by the shopper\
*picking_speed*: Historical picking speed, in products per minute\
*accepted_rate*: Percentage of orders historically accepted by the shopper\
*rating*: Client rating of the shopper

**storebranch.csv (476 rows)**

*store_branch_id*: ID of the store branch\
*store_id*: ID of the store\
*lat*: Latitude of the branch location\
*lng*: Longitude of the branch location
<br />
<br />


# 1. ETL process - Data wrangling, feature engineering and data standardization


----

## 1. Importing necessary modules

In [53]:
import pandas as pd
import numpy as np
import pandasql as sqldf
from geopy import Point
from geopy.distance import distance
import datetime

#### 1.1 Understanding primary keys between tables

Before loading the datasets, let's look at how the tables are connected to each other to get a better understanding of the case at hand.



<center>
    <a href=img><img src = "https://github.com/tiagottmoraes/CS-s-Data-Science-Test/blob/main/entity_mapping.PNG?raw=true" width="800"  />
</center>

As the entity map above shows, the keys of "storebranch.csv", "order_products.csv", and "shoppers.csv" all appear in "orders.csv", which is also the table holding the label ("total_minutes"). The best approach here is therefore to merge all tables into "orders.csv" using those keys.

## 2. Importing datasets

### 2.1 Loading dataset "order_products.csv"

After loading order_products.csv into a dataframe called **df_ordprod**, let's look at the first five rows using the `head()` function:

In [6]:
## Order_products
filename_ordprod = "https://raw.githubusercontent.com/tiagottmoraes/CS-s-Data-Science-Test/main/data/order_products.csv"
df_ordprod = pd.read_csv(filename_ordprod)
df_ordprod.head()


Unnamed: 0,order_id,product_id,quantity,buy_unit
0,47099653730fb1b76537fc10ad876255,c1244453d731c77416cb4766e3bd76cb,1.0,UN
1,689d8866915acf87e851c2591a23a82f,43cc2b100bec640fe563cd16f2db669f,1.0,KG
2,f26d16bf6f38c9e31d0be877f4013a9e,b8f880759d014134e272d881d49989a2,1.0,UN
3,161ccc896835ab41761b0e726becb6b1,dbc062b9bef805d27a6f4bea7edfe1f1,1.0,UN
4,4713deca10bb5db98fae150b52d61fc0,93a060f269bb569398921100f84c519a,2.0,UN


Checking missing data in df_ordprod:

In [7]:
missing_data_order = df_ordprod.isnull()
for column in missing_data_order.columns.values.tolist():
    print(column)
    print(missing_data_order[column].value_counts())
    print("")

order_id
False    198500
Name: order_id, dtype: int64

product_id
False    198500
Name: product_id, dtype: int64

quantity
False    198500
Name: quantity, dtype: int64

buy_unit
False    198500
Name: buy_unit, dtype: int64



NOTE: No missing values were found in this dataframe.
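The per-column loop above prints one `value_counts()` table per column. When only the number of missing entries matters, the same check can be condensed. The sketch below uses a small made-up frame (the name `toy` and its values are illustrative, not taken from the dataset):

```python
import numpy as np
import pandas as pd

# Made-up frame standing in for any of the loaded tables.
toy = pd.DataFrame({
    "order_id": ["a", "b", "c"],
    "quantity": [1.0, np.nan, 2.0],
    "buy_unit": ["UN", "KG", None],
})

# isnull() flags missing cells; sum() counts the True values per column.
missing_per_column = toy.isnull().sum()
print(missing_per_column)

# Keep only the columns that actually contain missing values.
print(missing_per_column[missing_per_column > 0])
```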

#### 2.1.1 Feature engineering

In [8]:
print('Unique products listed:', df_ordprod['product_id'].nunique(),'\n'
      'Unique orders listed:', df_ordprod['order_id'].nunique())

Unique products listed: 15422 
Unique orders listed: 9978


The number of products is much larger than the number of orders, which is expected, since a single order usually contains many products. With that in mind, the assumption from here on is that **what defines the complexity of a given order is the number of unique products, not the quantity of each product**.\
\
A simple example: an order with 12 different products (one item each) demands more attention and movement (and most likely *time*) from a shopper than an order with only 3 products but several items of each.\
\
For that reason, "df_ordprod" was _grouped by the order_id feature_ and counted. Assuming each product is listed once per order, the row count per order gives the number of unique products in it, which is the indicator most likely linked to order complexity (and hence to longer completion times).

In [42]:
df_ordprod = df_ordprod.groupby(["order_id"], as_index=False).count()
df_ordprod.rename(columns={'product_id':'order_size'}, inplace=True)
df_ordprod.head()

Unnamed: 0,order_id,order_size,quantity,buy_unit
0,0004a3841c1eeb6c6e77585a941c21e0,4,4,4
1,0005a6ecbbde1e8d273f5577bcff2c9c,1,1,1
2,0007baeb6700fc203be2d1f1e11222d7,22,22,22
3,0012195a6a8ca9ec308a3010eeea8ebc,11,11,11
4,0013011fa72b498b9feb84f4e7104980,44,44,44


NOTE: "product_id" was renamed to "order_size", since it now holds the number of products in each order, to avoid confusion from now on.
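`count()` tallies rows per order, so `order_size` equals the number of distinct products only when no product is listed twice in the same order. A minimal sketch, on made-up data, of how `nunique()` could be used to check that assumption (the frame `toy` and its values are hypothetical):

```python
import pandas as pd

# Made-up order lines; product "p1" is listed twice in order "o1".
toy = pd.DataFrame({
    "order_id":   ["o1", "o1", "o1", "o2"],
    "product_id": ["p1", "p1", "p2", "p3"],
    "quantity":   [1.0, 2.0, 1.0, 4.0],
})

per_order = toy.groupby("order_id")["product_id"].agg(
    rows="count",            # number of order lines
    order_size="nunique",    # number of distinct products
).reset_index()
print(per_order)

# Orders where the two measures disagree contain repeated products.
print(per_order[per_order["rows"] != per_order["order_size"]])
```

If the second print comes back empty on the real data, the two measures coincide and the `count()` used in this notebook is safe.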

### 2.2 Loading dataset "shoppers.csv"

After loading shoppers.csv into a dataframe called **df_shop**, let's look at the first five rows using the `head()` function:

In [39]:
## Shopper
filename_shop = "https://raw.githubusercontent.com/tiagottmoraes/CS-s-Data-Science-Test/main/data/shoppers.csv"
df_shop = pd.read_csv(filename_shop)
df_shop.head()

Unnamed: 0,shopper_id,seniority,found_rate,picking_speed,accepted_rate,rating
0,1fc20b0bdf697ac13dd6a15cbd2fe60a,41dc7c9e385c4d2b6c1f7836973951bf,0.8606,1.94,1.0,4.87
1,e1c679ac73a69c01981fdd3c5ab8beda,6c90661e6d2c7579f5ce337c3391dbb9,0.8446,1.23,0.92,4.92
2,09d369c66ca86ebeffacb133410c5ee1,6c90661e6d2c7579f5ce337c3391dbb9,0.8559,1.56,1.0,4.88
3,db39866e62b95bb04ebb1e470f2d1347,50e13ee63f086c2fe84229348bc91b5b,,2.41,,
4,8efbc238660053b19f00ca431144fdae,6c90661e6d2c7579f5ce337c3391dbb9,0.877,1.31,0.92,4.88


Checking missing data in df_shop:

In [40]:
missing_data_order = df_shop.isnull()
for column in missing_data_order.columns.values.tolist():
    print(column)
    print(missing_data_order[column].value_counts())
    print("")

shopper_id
False    2864
Name: shopper_id, dtype: int64

seniority
False    2864
Name: seniority, dtype: int64

found_rate
False    2763
True      101
Name: found_rate, dtype: int64

picking_speed
False    2864
Name: picking_speed, dtype: int64

accepted_rate
False    2837
True       27
Name: accepted_rate, dtype: int64

rating
False    2780
True       84
Name: rating, dtype: int64



NOTE: The following missing values were found:
- 101 NaN elements in the 'found_rate' feature
- 27 NaN elements in the 'accepted_rate' feature
- 84 NaN elements in the 'rating' feature

In [41]:
print('Unique shoppers listed:', df_shop['shopper_id'].nunique(),'\n' 
      'Unique seniority listed:', df_shop['seniority'].nunique())

Unique shoppers listed: 2864 
Unique seniority listed: 4


Despite being anonymized, the "seniority" feature has only 4 distinct classes. To make this feature and its relationship with the other columns easier to read, the original hash codes were replaced by arbitrary letters (T, I, M, W).

In [42]:
df_shop['seniority'].unique()

array(['41dc7c9e385c4d2b6c1f7836973951bf',
       '6c90661e6d2c7579f5ce337c3391dbb9',
       '50e13ee63f086c2fe84229348bc91b5b',
       'bb29b8d0d196b5db5a5350e5e3ae2b1f'], dtype=object)

In [43]:
# Map each anonymized seniority hash to an arbitrary, readable letter
seniority_labels = {
    '41dc7c9e385c4d2b6c1f7836973951bf': 'T',
    '6c90661e6d2c7579f5ce337c3391dbb9': 'I',
    '50e13ee63f086c2fe84229348bc91b5b': 'M',
    'bb29b8d0d196b5db5a5350e5e3ae2b1f': 'W',
}
df_shop['seniority'] = df_shop['seniority'].replace(seniority_labels)

In [46]:
# Counting the number of shoppers in each seniority class
df_shop['seniority'].value_counts()

I    1643
M     719
T     440
W      62
Name: seniority, dtype: int64

In [47]:
df_shop.sample(2)

Unnamed: 0,shopper_id,seniority,found_rate,picking_speed,accepted_rate,rating
1971,1a9e69d910b4f1ce90528b745b803615,M,0.8819,4.84,0.84,4.36
1229,4fd9450479e368c09cf4a9c8cef76f43,I,0.875,1.36,1.0,4.96


### 2.3 Loading dataset "storebranch.csv"

After loading the storebranch.csv into a dataframe called **df_store**, let's look at the first five rows using the `head()` function:

In [58]:
## Storebranch
filename_store = "https://raw.githubusercontent.com/tiagottmoraes/CS-s-Data-Science-Test/main/data/storebranch.csv"
df_store = pd.read_csv(filename_store)
df_store.head()

Unnamed: 0,store_branch_id,store_id,lat,lng
0,aff1621254f7c1be92f64550478c56e6,92cc227532d17e56e07902b254dfad10,-33.422497,-70.609231
1,56352739f59643540a3a6e16985f62c7,0336dcbab05b9d5ad24f4333c7658a0e,-33.385484,-70.555579
2,7d04bbbe5494ae9d2f5a76aa1c00fa2f,9bf31c7ff062936a96d3c8bd1f8f2ff3,-33.416579,-70.565224
3,2b24d495052a8ce66358eb576b8912c8,c4ca4238a0b923820dcc509a6f75849b,-33.512578,-70.655952
4,5487315b1286f907165907aa8fc96619,d82c8d1619ad8176d665453cfb2e55f0,-33.347645,-70.542229


Checking missing data in df_store:

In [59]:
missing_data_order = df_store.isnull()
for column in missing_data_order.columns.values.tolist():
    print(column)
    print(missing_data_order[column].value_counts())
    print("")

store_branch_id
False    476
Name: store_branch_id, dtype: int64

store_id
False    476
Name: store_id, dtype: int64

lat
False    476
Name: lat, dtype: int64

lng
False    476
Name: lng, dtype: int64



NOTE: No missing values were found in this dataframe.

In [60]:
print('Unique store_branch listed:', df_store['store_branch_id'].nunique(),'\n' 'Unique store listed:', df_store['store_id'].nunique())

Unique store_branch listed: 476 
Unique store listed: 221


Since both "storebranch.csv" and "orders.csv" hold latitude and longitude coordinates under the same column names, let's rename these columns to tell them apart.

In [61]:
# renaming the columns
df_store = df_store.rename(columns={'lat':'store_lat', 'lng':'store_long'})

Let's combine the latitude and longitude values into a single geographic point using Geopy:

In [62]:
df_store['store_coord'] = df_store.apply(lambda row: Point(latitude=row['store_lat'], longitude=row['store_long']), axis=1)
df_store = df_store.drop(['store_lat', 'store_long'], axis=1)
df_store.head()

Unnamed: 0,store_branch_id,store_id,store_coord
0,aff1621254f7c1be92f64550478c56e6,92cc227532d17e56e07902b254dfad10,"33 25m 20.9892s S, 70 36m 33.2316s W"
1,56352739f59643540a3a6e16985f62c7,0336dcbab05b9d5ad24f4333c7658a0e,"33 23m 7.7424s S, 70 33m 20.0844s W"
2,7d04bbbe5494ae9d2f5a76aa1c00fa2f,9bf31c7ff062936a96d3c8bd1f8f2ff3,"33 24m 59.6844s S, 70 33m 54.8064s W"
3,2b24d495052a8ce66358eb576b8912c8,c4ca4238a0b923820dcc509a6f75849b,"33 30m 45.2794s S, 70 39m 21.4254s W"
4,5487315b1286f907165907aa8fc96619,d82c8d1619ad8176d665453cfb2e55f0,"33 20m 51.522s S, 70 32m 32.0244s W"


### 2.4 Loading dataset "orders.csv"

After loading orders.csv into a dataframe called **df_ORDER**, let's look at the first five rows using the `head()` function:

In [67]:
## ORDERS - LABEL
filename_ORDER = "https://raw.githubusercontent.com/tiagottmoraes/CS-s-Data-Science-Test/main/data/orders.csv"
df_ORDER = pd.read_csv(filename_ORDER)
df_ORDER.head()

Unnamed: 0,order_id,lat,lng,promised_time,on_demand,shopper_id,store_branch_id,total_minutes
0,e750294655c2c7c34d83cc3181c09de4,-33.501675,-70.579369,2019-10-18 20:48:00+00:00,True,e63bc83a1a952fa2b3cc9d558fb943cf,65ded5353c5ee48d0b7d48c591b8f430,67.684264
1,6581174846221cb6c467348e87f57641,-33.440584,-70.556283,2019-10-19 01:00:00+00:00,False,195f9e9d84a4ba9033c4b6a756334d8b,45fbc6d3e05ebd93369ce542e8f2322d,57.060632
2,3a226ea48debc0a7ae9950d5540f2f34,-32.987022,-71.544842,2019-10-19 14:54:00+00:00,True,a5b9ddc0d82e61582fca19ad43dbaacb,07563a3fe3bbe7e3ba84431ad9d055af,
3,7d2ed03fe4966083e74b12694b1669d8,-33.328075,-70.512659,2019-10-18 21:47:00+00:00,True,d0b3f6bf7e249e5ebb8d3129341773a2,f1748d6b0fd9d439f71450117eba2725,52.067742
4,b4b2682d77118155fe4716300ccf7f39,-33.403239,-70.56402,2019-10-19 20:00:00+00:00,False,5c5199ce02f7b77caa9c2590a39ad27d,1f0e3dad99908345f7439f8ffabdffc4,140.724822


Checking missing data in df_ORDER:

In [64]:
missing_data_order = df_ORDER.isnull()
for column in missing_data_order.columns.values.tolist():
    print(column)
    print(missing_data_order[column].value_counts())
    print("")

order_id
False    10000
Name: order_id, dtype: int64

lat
False    10000
Name: lat, dtype: int64

lng
False    10000
Name: lng, dtype: int64

promised_time
False    10000
Name: promised_time, dtype: int64

on_demand
False    10000
Name: on_demand, dtype: int64

shopper_id
False    10000
Name: shopper_id, dtype: int64

store_branch_id
False    10000
Name: store_branch_id, dtype: int64

total_minutes
False    8000
True     2000
Name: total_minutes, dtype: int64



NOTE: No null values, EXCEPT for the label "total_minutes". The rows with a missing label will later be set apart and used in the prediction phase.

As with "storebranch.csv", let's rename the latitude and longitude columns of "orders.csv" so that the two sets of coordinates can be told apart.

In [69]:
df_ORDER = df_ORDER.rename(columns={'lat':'delivery_lat', 'lng':'delivery_long'})

Again, let's combine the latitude and longitude values into a single geographic point using Geopy:

In [70]:
df_ORDER['delivery_coord'] = df_ORDER.apply(lambda row: Point(latitude=row['delivery_lat'], longitude=row['delivery_long']), axis=1)
df_ORDER = df_ORDER.drop(['delivery_lat', 'delivery_long'], axis=1)
df_ORDER.head(3)

Unnamed: 0,order_id,promised_time,on_demand,shopper_id,store_branch_id,total_minutes,delivery_coord
0,e750294655c2c7c34d83cc3181c09de4,2019-10-18 20:48:00+00:00,True,e63bc83a1a952fa2b3cc9d558fb943cf,65ded5353c5ee48d0b7d48c591b8f430,67.684264,"33 30m 6.0284s S, 70 34m 45.727s W"
1,6581174846221cb6c467348e87f57641,2019-10-19 01:00:00+00:00,False,195f9e9d84a4ba9033c4b6a756334d8b,45fbc6d3e05ebd93369ce542e8f2322d,57.060632,"33 26m 26.1024s S, 70 33m 22.6182s W"
2,3a226ea48debc0a7ae9950d5540f2f34,2019-10-19 14:54:00+00:00,True,a5b9ddc0d82e61582fca19ad43dbaacb,07563a3fe3bbe7e3ba84431ad9d055af,,"32 59m 13.2807s S, 71 32m 41.4316s W"


### 2.5 Merging dataframes

Merging the dataframes on their shared keys:

1. First, merge df_ordprod into df_ORDER, using "order_id" as the join key:

In [57]:
df_merge = pd.merge(df_ORDER, df_ordprod, on='order_id')

2. Then, merge df_shop into df_merge (the result of the previous merge), using "shopper_id" as the join key:

In [58]:
df_merge = pd.merge(df_merge, df_shop, on='shopper_id')

3. Finally, merge df_store into df_merge (the result of the previous merge), using "store_branch_id" as the join key:

In [59]:
df_merge = pd.merge(df_merge, df_store, on='store_branch_id')

As a result, we get the following dataframe:

In [60]:
pd.set_option('display.max_columns', None)
df_merge.sample(2)

Unnamed: 0,order_id,delivery_lat,delivery_long,promised_time,on_demand,shopper_id,store_branch_id,total_minutes,order_size,quantity,buy_unit,seniority,found_rate,picking_speed,accepted_rate,rating,store_id,store_lat,store_long
9793,7b0968b6c2f4005c610189daed4cb539,-33.37634,-70.55103,2019-10-19 19:00:00+00:00,False,deab3cbcc3ec4063b7eac6033d76b8bf,c215b446bcdf956d848a8419c1b5a920,37.636804,1,1,1,I,0.8663,1.0,1.0,4.92,41ae36ecb9b3eee609d05b90c14222fb,-33.424688,-70.581248
7047,1085db45141dea3d8a4113d3ef3311c0,-33.456306,-70.658099,2019-10-19 01:00:00+00:00,False,fe3c9f683121b6f5375e4563374001c3,3871bd64012152bfb53fdf04b401193f,148.336725,41,41,41,T,0.8199,2.51,0.941176,4.83,c4ca4238a0b923820dcc509a6f75849b,-33.451695,-70.69216


## 3. Dealing with missing data

### 3.1 Removing/Replacing missing values

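As shown before, some features present missing values that must be addressed before we head to the exploratory analysis. First, the columns that are not used in the following steps ('promised_time', 'store_id', 'quantity', 'buy_unit') are dropped and the remaining ones are reordered.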


In [63]:
df_merge.drop(['promised_time', 'store_id','quantity', 'buy_unit'], axis=1, inplace=True)

In [64]:
df_merge = df_merge.reindex(['order_id','delivery_lat','delivery_long','store_branch_id','store_lat','store_long','shopper_id','on_demand','seniority', 'order_size', 'found_rate','picking_speed','accepted_rate','rating', 'total_minutes'], axis=1)

In [65]:
df_merge.sample(2)

Unnamed: 0,order_id,delivery_lat,delivery_long,store_branch_id,store_lat,store_long,shopper_id,on_demand,seniority,order_size,found_rate,picking_speed,accepted_rate,rating,total_minutes
9268,e47788ba1fc7376688a5f9837abd4624,-33.434524,-70.572432,4ea6a546c19499318091a9df40a13181,-33.412419,-70.57907,c1fb26dbd4f09a5fb001d2fa521ec64d,False,T,3,0.8667,2.43,0.96,4.84,48.174585
2424,0d0678b430fb06d091ff695d53c7138d,-33.322136,-70.567584,1f0e3dad99908345f7439f8ffabdffc4,-33.386547,-70.568075,a950bb7f44cf4253538140e02ae33fb9,False,T,25,0.877,2.17,1.0,4.96,99.063378


In [66]:
print(" \nTotal number of NaN items present in the columns of the DataFrame 'df_merge': \n\n", df_merge.isnull().sum())

 
Total number of NaN items present in the columns of the DataFrame 'df_merge': 

 order_id              0
delivery_lat          0
delivery_long         0
store_branch_id       0
store_lat             0
store_long            0
shopper_id            0
on_demand             0
seniority             0
order_size            0
found_rate          199
picking_speed         0
accepted_rate        46
rating              162
total_minutes      1995
dtype: int64


In [67]:
df_merge.dtypes

order_id            object
delivery_lat       float64
delivery_long      float64
store_branch_id     object
store_lat          float64
store_long         float64
shopper_id          object
on_demand             bool
seniority           object
order_size           int64
found_rate         float64
picking_speed      float64
accepted_rate      float64
rating             float64
total_minutes      float64
dtype: object

In [68]:
# calculating the average
avg_found = df_merge['found_rate'].mean(axis=0)
avg_accept = df_merge['accepted_rate'].mean(axis=0)
avg_rate = df_merge['rating'].mean(axis=0)
print("Average of found_rate:", avg_found )
print("Average of accepted_rate:", avg_accept )
print("Average of rating:", avg_rate )


Average of found_rate: 0.8633177727784027
Average of accepted_rate: 0.9170133784336325
Average of rating: 4.849341890790547


In [69]:
# replacing the NaN with the average values
# (assigning the result back avoids in-place edits on a column selection)
df_merge['found_rate'] = df_merge['found_rate'].fillna(avg_found)
df_merge['accepted_rate'] = df_merge['accepted_rate'].fillna(avg_accept)
df_merge['rating'] = df_merge['rating'].fillna(avg_rate)

In [70]:
print(" \nTotal number of NaN items present in the columns of the DataFrame 'df_merge': \n\n", df_merge.isnull().sum())

 
Total number of NaN items present in the columns of the DataFrame 'df_merge': 

 order_id              0
delivery_lat          0
delivery_long         0
store_branch_id       0
store_lat             0
store_long            0
shopper_id            0
on_demand             0
seniority             0
order_size            0
found_rate            0
picking_speed         0
accepted_rate         0
rating                0
total_minutes      1995
dtype: int64
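The three columns can also be imputed in a single call, since `fillna` accepts a Series of per-column fill values. A minimal sketch on made-up numbers (the frame `toy` is hypothetical):

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "found_rate":    [0.80, np.nan, 0.90],
    "accepted_rate": [1.00, 0.90, np.nan],
    "rating":        [np.nan, 4.80, 5.00],
})

rate_cols = ["found_rate", "accepted_rate", "rating"]

# mean() skips NaN by default, giving one average per column.
col_means = toy[rate_cols].mean()

# fillna with a Series fills each column with its own mean.
toy[rate_cols] = toy[rate_cols].fillna(col_means)
print(toy)
```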


In [72]:
df_merge['distance_km'] = df_merge.apply(lambda row: distance(row['delivery_coord'], row['store_coord']).km , axis=1)
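The cell above relies on geopy's `distance` to measure how far each delivery address is from its store branch. As a dependency-free cross-check, a haversine (great-circle) distance can be computed directly from raw latitude/longitude pairs. This is a different and slightly less accurate technique than geopy's default geodesic calculation. The function name `haversine_km` and the mean Earth radius used here are assumptions made for illustration:

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius (assumed constant)

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

# Delivery and store coordinates taken from one sample row of df_merge above.
d = haversine_km(-33.37634, -70.55103, -33.424688, -70.581248)
print(round(d, 2), "km")
```

At city scale the two methods should agree closely, so a large disagreement would usually point to swapped or mislabeled coordinates.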


In [73]:
df_merge.drop(['delivery_lat','delivery_long','store_lat','store_long','delivery_coord', 'store_coord'], axis=1, inplace=True)
df_merge = df_merge.reindex(['order_id','store_branch_id','distance_km','shopper_id','on_demand','seniority', 'order_size', 'found_rate','picking_speed','accepted_rate','rating', 'total_minutes'], axis=1)
df_merge.sample(2)

Unnamed: 0,order_id,store_branch_id,distance_km,shopper_id,on_demand,seniority,order_size,found_rate,picking_speed,accepted_rate,rating,total_minutes
265,6da913f1948ae412b1e3843ec09fe8e0,45fbc6d3e05ebd93369ce542e8f2322d,3.901431,c62fa64bbba97382f77b8c4a3c507fc4,True,T,6,0.8725,2.8,0.88,4.88,45.419323
1464,aafc765551d9ffe7dadde9f79cc10663,1679091c5a880faf6fb5e6087eb1b2dc,4.569004,dbab5f4147fb719554010645dcfcd44c,False,I,104,0.8366,1.81,1.0,4.52,229.031673


In [74]:
df_TEST = df_merge.loc[(pd.isna(df_merge['total_minutes']))]
df_TEST.to_csv('df_TEST.csv')

In [75]:
df_TRAIN = df_merge.loc[(pd.notna(df_merge['total_minutes']))]
df_TRAIN.to_csv('df_TRAIN.csv')

In [76]:
df_TRAIN.sample(2)

Unnamed: 0,order_id,store_branch_id,distance_km,shopper_id,on_demand,seniority,order_size,found_rate,picking_speed,accepted_rate,rating,total_minutes
3240,7fc782a82e8c75f83d077478359e0ee1,1f0e3dad99908345f7439f8ffabdffc4,1.367594,45c0e362f8d52c3969805f6edef08aad,True,I,10,0.8769,1.32,0.92,4.92,30.99782
4619,f0941968ba1a154b87fd11482bbfd134,c4ca4238a0b923820dcc509a6f75849b,1.637458,8e9c3d057240e3d1308f47d58fd71e69,True,I,22,0.8644,1.43,1.0,4.96,75.10325
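The split above keys on whether `total_minutes` is present. A small sketch on made-up rows showing the same idea with a single boolean mask (the frame `toy` and its values are hypothetical). It also passes `index=False` to `to_csv`, which keeps the row index out of the file so that reading it back does not add an extra "Unnamed: 0" column:

```python
import io

import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "order_id": ["o1", "o2", "o3"],
    "order_size": [4, 22, 11],
    "total_minutes": [67.7, np.nan, 52.1],
})

has_label = toy["total_minutes"].notna()
train = toy[has_label]    # rows with a known label
test = toy[~has_label]    # rows to predict later

# Write to an in-memory buffer instead of a file for this illustration.
buffer = io.StringIO()
train.to_csv(buffer, index=False)
buffer.seek(0)
print(pd.read_csv(buffer).columns.tolist())
```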
