# **BOL Product Dimension Validator**

The purpose of this notebook is to validate the shipment costs, and the tariff sizes behind them, that multi-platform e-commerce businesses are charged.

**WORK CASE:**
Suppose you sell products on two well-known e-commerce platforms, Amazon and Bol. Each platform applies a different shipment tariff depending on the product and package size. You want to validate that your business is being charged the correct tariff size and that the platforms have recorded your products with the correct dimensions. This notebook is a basic automation of that check: given the invoices and the product dimensions as inputs, the finance department can repeat the validation every month with a couple of clicks instead of working through huge Excel files by hand. A workload that used to take days each month is reduced to a couple of clicks.

**Requirements**


In [3]:
!pip install -r requirements.txt


**Packages** 

In [4]:
import os
import pandas as pd 
import numpy as np 

**Data**

Suppose your business has a report listing its products with their dimensions (Length, Width, Height) and a unique identifier code.

In [23]:
dim_df = pd.read_excel('product_dimensions.xlsx')
dim_df.head()

Unnamed: 0,Productidentificatie,Length,Width,Height
0,8712345678901,19.6,15.6,11.6
1,8712345678902,61.7,27.8,20.6
2,8712345678903,31.8,10.5,10.1
3,8712345678904,47.0,14.2,8.8
4,8712345678905,12.7,12.7,6.5


In addition, you have the invoice reports provided by the platform, which list the tariff group, cost, shipping country and unique identifier of each product.

In [24]:
df = pd.read_excel('example_data.xlsx')
df.head()

Unnamed: 0,Type,Type productidentificatie,Productidentificatie,Artikelomschrijving,Datum,Bestelnummer,Aantal,Tarief-\ngroep,Tarief,Bedrag,BTW %,Btw-bedrag,Bedrag\n(incl. BTW),Land van verzending,Reden,Opmerking
0,Pick&pack kosten,EAN,8712345678901,product 1,2024-08-01,1,1,M,2.19,2.19,21,0.4599,2.6499,NL,,
1,Pick&pack kosten,EAN,8712345678902,product 2,2024-08-01,2,1,L,2.07,2.07,21,0.4347,2.5047,NL,,
2,Pick&pack kosten,EAN,8712345678903,product 3,2024-08-01,3,1,S,1.65,1.65,21,0.3465,1.9965,NL,,
3,Pick&pack kosten,EAN,8712345678904,product 4,2024-08-01,4,1,M,2.07,2.07,21,0.4347,2.5047,NL,,
4,Pick&pack kosten,EAN,8712345678905,product 5,2024-08-01,5,1,M,2.19,2.19,21,0.4599,2.6499,NL,,
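The validator relies on specific column names in both reports, including an invoice header that contains a line break (`Tarief-\ngroep`). A minimal sketch of a pre-check is shown here; the helper `missing_columns` and the two small sample frames are hypothetical and only stand in for the Excel files.

```python
import pandas as pd

# Columns the validator reads from each report (taken from the tables shown above).
DIMENSION_COLUMNS = ["Productidentificatie", "Length", "Width", "Height"]
INVOICE_COLUMNS = ["Productidentificatie", "Tarief-\ngroep"]


def missing_columns(frame, required):
    """Return the required column names that are absent from `frame`."""
    return [col for col in required if col not in frame.columns]


# Tiny in-memory stand-ins for the two Excel reports (hypothetical data).
sample_dims = pd.DataFrame(
    {"Productidentificatie": [8712345678901], "Length": [19.6], "Width": [15.6], "Height": [11.6]}
)
sample_invoice = pd.DataFrame(
    # This header lacks the line break, so the check should report it.
    {"Productidentificatie": [8712345678901], "Tarief-groep": ["M"]}
)

print(missing_columns(sample_dims, DIMENSION_COLUMNS))    # nothing missing
print(missing_columns(sample_invoice, INVOICE_COLUMNS))   # tariff-group header missing
```

Running such a check right after `pd.read_excel` makes a renamed or reformatted header show up as a clear message rather than a `KeyError` inside the validator.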


**Bol.com Dimension Tariff Validator Function**

The function below works on two dataframes: the invoice report from the platform (`df`) and the table of actual packaging sizes you provide for your products (`df2`). It first maps each product's dimensions to a tariff size according to the Bol.com size limits and stores the label in a `tarief_size` column. It then goes through the invoice report and, for each product identifier, checks whether the invoiced tariff group matches that label. Invoice entries with a wrong tariff size are collected into one table, and the correct tariff size for every product is returned as another. With these outputs a business can find the mistakes that occurred in the system, see which products were labeled wrong, and correct the price gaps in its invoices.

In [25]:
def bol_dim_validate(df, df2):
    """Compare invoiced tariff groups with sizes computed from product dimensions.

    df is the invoice report, df2 the product dimension table.
    Returns (wrong invoice rows, dimensions with tarief_size, comparison table).
    """
    tariff_col = 'Tarief-\ngroep'  # the invoice header contains a line break

    # Exclusive upper bounds for Length, Width, Height, checked from the smallest size up
    size_limits = [
        ('3XS', 23.5, 16.5, 3),
        ('XXS', 37.5, 26, 3),
        ('XS', 37.5, 26, 5),
        ('S', 45, 30, 8),
        ('M', 55, 35, 20),
        ('L', 72, 50, 41),
    ]

    def classify(length, width, height):
        for label, max_l, max_w, max_h in size_limits:
            if length < max_l and width < max_w and height < max_h:
                return label
        return 'XL'

    # Work on a copy so the caller's dimension table is left unchanged
    size_catalogue = df2.copy()
    size_catalogue['tarief_size'] = [
        classify(length, width, height)
        for length, width, height in zip(df2['Length'], df2['Width'], df2['Height'])
    ]

    wrong_dimensions = []
    for product_id in df['Productidentificatie'].unique():
        id_df = df[df['Productidentificatie'] == product_id]
        sizes = size_catalogue.loc[
            size_catalogue['Productidentificatie'] == product_id, 'tarief_size'
        ].unique()
        if len(sizes) == 0:
            continue  # no dimensions on file for this product, nothing to compare

        # Keep the invoice rows whose tariff group differs from the computed size
        mismatched = id_df[id_df[tariff_col] != sizes[0]]
        if not mismatched.empty:
            wrong_dimensions.append(mismatched)

    # pd.concat raises on an empty list, so fall back to an empty invoice-shaped frame
    if wrong_dimensions:
        wrong_df = pd.concat(wrong_dimensions, ignore_index=True)
    else:
        wrong_df = df.iloc[0:0].copy()

    # One row per wrongly invoiced product, keeping the first invoiced tariff group
    unique_products_tarief = (
        wrong_df[['Productidentificatie', tariff_col]]
        .drop_duplicates(subset='Productidentificatie')
        .reset_index(drop=True)
    )

    comparison_merged = pd.merge(size_catalogue, unique_products_tarief, on='Productidentificatie', how='inner')
    comparison_merged = comparison_merged.rename(columns={tariff_col: 'bol.com_tarief'})

    return wrong_df, size_catalogue, comparison_merged
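
The size rules in `bol_dim_validate` can also be applied to whole columns at once. The sketch below is an alternative and not the notebook's own implementation: it uses `numpy.select` with the same limits, and the names `SIZE_LIMITS`, `tariff_sizes` and `sample` are hypothetical. The sample rows are taken from the dimension table in this notebook.

```python
import numpy as np
import pandas as pd

# Exclusive upper bounds (Length, Width, Height) copied from bol_dim_validate, smallest size first.
SIZE_LIMITS = [
    ("3XS", 23.5, 16.5, 3),
    ("XXS", 37.5, 26, 3),
    ("XS", 37.5, 26, 5),
    ("S", 45, 30, 8),
    ("M", 55, 35, 20),
    ("L", 72, 50, 41),
]


def tariff_sizes(dims):
    """Return a Series of tariff sizes for a frame with Length, Width and Height columns."""
    conditions = [
        (dims["Length"] < max_l) & (dims["Width"] < max_w) & (dims["Height"] < max_h)
        for _, max_l, max_w, max_h in SIZE_LIMITS
    ]
    labels = [label for label, *_ in SIZE_LIMITS]
    # np.select takes the label of the first condition that holds, so the
    # smallest fitting size wins; rows that fit no size fall back to "XL".
    return pd.Series(np.select(conditions, labels, default="XL"), index=dims.index)


# Four rows from the product dimension table shown in this notebook.
sample = pd.DataFrame(
    {
        "Productidentificatie": [8712345678901, 8712345678902, 8712345678905, 8712345678962],
        "Length": [19.6, 61.7, 12.7, 59.0],
        "Width": [15.6, 27.8, 12.7, 59.0],
        "Height": [11.6, 20.6, 6.5, 5.5],
    }
)
sample["tarief_size"] = tariff_sizes(sample)
print(sample)
```

For these four products the computed sizes are M, L, S and XL, which agrees with the `correct_sizes` table shown further down.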


An example case: 

In [26]:
# Load the product dimensions
dim_df = pd.read_excel('product_dimensions.xlsx')
dim_df.head()

Unnamed: 0,Productidentificatie,Length,Width,Height
0,8712345678901,19.6,15.6,11.6
1,8712345678902,61.7,27.8,20.6
2,8712345678903,31.8,10.5,10.1
3,8712345678904,47.0,14.2,8.8
4,8712345678905,12.7,12.7,6.5


In [27]:
# Running the function defined 
wrong_dimensions, correct_sizes, tarif_sizes_comparison = bol_dim_validate(df,dim_df)

The function returns three outputs: the wrong invoice entries, in the same shape as the invoice report; the dimension table with the correct tariff size for each of the business's own products; and a comparison showing, for each conflicting product identifier, the computed size next to the size charged by bol.com.

In [30]:
wrong_dimensions

Unnamed: 0,Type,Type productidentificatie,Productidentificatie,Artikelomschrijving,Datum,Bestelnummer,Aantal,Tarief-\ngroep,Tarief,Bedrag,BTW %,Btw-bedrag,Bedrag\n(incl. BTW),Land van verzending,Reden,Opmerking
0,Pick&pack kosten,EAN,8712345678903,product 3,2024-08-01,3,1,S,1.65,1.65,21,0.3465,1.9965,NL,,
1,Pick&pack kosten,EAN,8712345678905,product 5,2024-08-01,5,1,M,2.19,2.19,21,0.4599,2.6499,NL,,
2,Pick&pack kosten,EAN,8712345678913,product 13,2024-08-01,13,1,XS,2.93,2.93,21,0.6153,3.5453,NL,,
3,Pick&pack kosten,EAN,8712345678920,product 20,2024-08-01,20,1,L,2.93,2.93,21,0.6153,3.5453,NL,,
4,Pick&pack kosten,EAN,8712345678925,product 25,2024-08-01,25,1,S,2.93,2.93,21,0.6153,3.5453,NL,,
5,Pick&pack kosten,EAN,8712345678933,product 33,2024-08-01,33,1,XS,1.65,1.65,21,0.3465,1.9965,NL,,
6,Pick&pack kosten,EAN,8712345678939,product 39,2024-08-01,39,1,L,2.07,2.07,21,0.4347,2.5047,NL,,
7,Pick&pack kosten,EAN,8712345678952,product 52,2024-08-01,52,1,S,1.5,1.5,21,0.315,1.815,NL,,
8,Pick&pack kosten,EAN,8712345678957,product 57,2024-08-02,57,1,XL,2.07,2.07,21,0.4347,2.5047,NL,,
9,Pick&pack kosten,EAN,8712345678963,product 63,2024-08-02,63,1,S,2.93,2.93,21,0.6153,3.5453,NL,,


In [21]:
correct_sizes

Unnamed: 0,Productidentificatie,Length,Width,Height,tarief_size
0,8712345678901,19.6,15.6,11.6,M
1,8712345678902,61.7,27.8,20.6,L
2,8712345678903,31.8,10.5,10.1,M
3,8712345678904,47.0,14.2,8.8,M
4,8712345678905,12.7,12.7,6.5,S
...,...,...,...,...,...
60,8712345678961,55.0,60.0,55.0,XL
61,8712345678962,59.0,59.0,5.5,XL
62,8712345678963,37.3,31.8,3.3,M
63,8712345678964,38.0,40.0,47.0,XL


In [31]:
tarif_sizes_comparison

Unnamed: 0,Productidentificatie,Length,Width,Height,tarief_size,bol.com_tarief
0,8712345678903,31.8,10.5,10.1,M,S
1,8712345678905,12.7,12.7,6.5,S,M
2,8712345678913,24.0,19.0,7.0,S,XS
3,8712345678920,40.0,30.0,5.0,M,L
4,8712345678925,49.4,22.4,5.0,M,S
5,8712345678933,25.0,21.0,7.5,S,XS
6,8712345678939,21.3,18.7,14.9,M,L
7,8712345678952,32.2,24.4,13.3,M,S
8,8712345678957,43.0,33.0,5.0,M,XL
9,8712345678963,37.3,31.8,3.3,M,S
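
When the comparison table grows, it can help to see which kinds of mislabeling occur most often. The following is a minimal sketch, assuming a frame with the same `tarief_size` and `bol.com_tarief` columns as `tarif_sizes_comparison`; the variable names `comparison` and `mismatch_counts` are hypothetical, and the three rows are copied from the table above.

```python
import pandas as pd

# Three rows copied from the comparison table above.
comparison = pd.DataFrame(
    {
        "Productidentificatie": [8712345678903, 8712345678905, 8712345678925],
        "tarief_size": ["M", "S", "M"],
        "bol.com_tarief": ["S", "M", "S"],
    }
)

# Count how often each (computed size, invoiced size) pair occurs.
mismatch_counts = (
    comparison.groupby(["tarief_size", "bol.com_tarief"])
    .size()
    .reset_index(name="count")
)
print(mismatch_counts)
```

In this sample, two products were computed as M but invoiced as S, and one was computed as S but invoiced as M.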


In [None]:
# tarif_sizes_comparison.to_excel('comparison_sizes.xlsx',index = False)
# wrong_dimensions.to_excel('wrong_dimensions.xlsx',index = False)
# correct_sizes.to_excel('correct_sizes.xlsx', index=False)