# **Data Cleaning and Preparation**

The source dataset is the Amazon Sales Report file from https://www.kaggle.com/datasets/thedevastator/unlock-profits-with-e-commerce-sales-data/data.

It records the fulfilment and delivery status of individual transactions. However, it covers mainly textile products, whereas our project focuses on electronics. We therefore generate synthetic data that follows the values and constraints of the Amazon Sales Report columns we are interested in.

In [151]:
# Importing packages
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import random
from datetime import timedelta
import warnings
warnings.filterwarnings('ignore')

In [152]:
# Load datasets

df = pd.read_csv('../source/Amazon Sale Report.csv')
online_sales_df = pd.read_csv('../online_sales_edited.csv')

In [153]:
# Check the first few rows of df

df.head()

Unnamed: 0,index,Order ID,Date,Status,Fulfilment,Sales Channel,ship-service-level,Style,SKU,Category,...,currency,Amount,ship-city,ship-state,ship-postal-code,ship-country,promotion-ids,B2B,fulfilled-by,Unnamed: 22
0,0,405-8078784-5731545,04-30-22,Cancelled,Merchant,Amazon.in,Standard,SET389,SET389-KR-NP-S,Set,...,INR,647.62,MUMBAI,MAHARASHTRA,400081.0,IN,,False,Easy Ship,
1,1,171-9198151-1101146,04-30-22,Shipped - Delivered to Buyer,Merchant,Amazon.in,Standard,JNE3781,JNE3781-KR-XXXL,kurta,...,INR,406.0,BENGALURU,KARNATAKA,560085.0,IN,Amazon PLCC Free-Financing Universal Merchant ...,False,Easy Ship,
2,2,404-0687676-7273146,04-30-22,Shipped,Amazon,Amazon.in,Expedited,JNE3371,JNE3371-KR-XL,kurta,...,INR,329.0,NAVI MUMBAI,MAHARASHTRA,410210.0,IN,IN Core Free Shipping 2015/04/08 23-48-5-108,True,,
3,3,403-9615377-8133951,04-30-22,Cancelled,Merchant,Amazon.in,Standard,J0341,J0341-DR-L,Western Dress,...,INR,753.33,PUDUCHERRY,PUDUCHERRY,605008.0,IN,,False,Easy Ship,
4,4,407-1069790-7240320,04-30-22,Shipped,Amazon,Amazon.in,Expedited,JNE3671,JNE3671-TU-XXXL,Top,...,INR,574.0,CHENNAI,TAMIL NADU,600073.0,IN,,False,,


In [154]:
# Check column names of df

df.columns

Index(['index', 'Order ID', 'Date', 'Status', 'Fulfilment', 'Sales Channel ',
       'ship-service-level', 'Style', 'SKU', 'Category', 'Size', 'ASIN',
       'Courier Status', 'Qty', 'currency', 'Amount', 'ship-city',
       'ship-state', 'ship-postal-code', 'ship-country', 'promotion-ids',
       'B2B', 'fulfilled-by', 'Unnamed: 22'],
      dtype='object')

In [155]:
# Check column names of online_sales_df

online_sales_df.columns

Index(['user_id', 'transaction_id', 'date', 'product_id', 'Quantity',
       'Delivery_Charges', 'Coupon_Status', 'Coupon_Code', 'Discount_pct'],
      dtype='object')

## Shipping Status Table

We are interested in the Status, Fulfilment, ship-service-level and fulfilled-by columns of the Amazon Sales Report. We therefore create matching columns in `online_sales_df` and drop the columns of `online_sales_df` that we do not need. We also create shipping_id and estimated_delivery_date columns.

In [156]:
# Drop columns we do not need and create empty columns to be filled with synthetic data

online_sales_df.drop(columns = ['Quantity',
                                'Delivery_Charges',
                                'Coupon_Status',
                                'Coupon_Code', 
                                'Discount_pct'], inplace = True)

column_names = ['shipping_id', 'status', 'fulfilment', 'ship_service_level', 'estimated_delivery_date', 'fulfilled_by']

for i in column_names:
    online_sales_df[i] = np.nan

online_sales_df.head()

Unnamed: 0,user_id,transaction_id,date,product_id,shipping_id,status,fulfilment,ship_service_level,estimated_delivery_date,fulfilled_by
0,17850,16679,2019-01-01,B09DL9978Y,,,,,,
1,17850,16680,2019-01-01,B09DL9978Y,,,,,,
2,17850,16681,2019-01-01,B07GXHC691,,,,,,
3,17850,16682,2019-01-01,B08NCKT9FG,,,,,,
4,17850,16682,2019-01-01,B08H21B6V7,,,,,,


In [157]:
# Generate synthetic shipping_id
online_sales_df['shipping_id'] = range(10001, 10001 + len(online_sales_df))

online_sales_df.head()

Unnamed: 0,user_id,transaction_id,date,product_id,shipping_id,status,fulfilment,ship_service_level,estimated_delivery_date,fulfilled_by
0,17850,16679,2019-01-01,B09DL9978Y,10001,,,,,
1,17850,16680,2019-01-01,B09DL9978Y,10002,,,,,
2,17850,16681,2019-01-01,B07GXHC691,10003,,,,,
3,17850,16682,2019-01-01,B08NCKT9FG,10004,,,,,
4,17850,16682,2019-01-01,B08H21B6V7,10005,,,,,


Next, we populate these columns with synthetic data, using the values and relative frequencies observed in the original Amazon Sales Report dataset, starting with the status column.

In [158]:
# Check the distinct values in the status column

print(df['Status'].unique())

['Cancelled' 'Shipped - Delivered to Buyer' 'Shipped'
 'Shipped - Returned to Seller' 'Shipped - Rejected by Buyer'
 'Shipped - Lost in Transit' 'Shipped - Out for Delivery'
 'Shipped - Returning to Seller' 'Shipped - Picked Up' 'Pending'
 'Pending - Waiting for Pick Up' 'Shipped - Damaged' 'Shipping']


Since many of the distinct values are similar, we combine them into fewer categories.
The categories we keep are:
- Cancelled
- Delivered to buyer ('Shipped - Delivered to Buyer', 'Shipped', 'Shipped - Damaged')
- Lost ('Shipped - Lost in Transit')
- Shipping ('Shipping', 'Shipped - Out for Delivery')
- At logistics facility ('Shipped - Picked Up')
- Packed ('Pending - Waiting for Pick Up')
- Pending
- Returned ('Shipped - Returned to Seller', 'Shipped - Rejected by Buyer', 'Shipped - Returning to Seller')

In [159]:
# Combine the similar distinct values

df['Status'] = df['Status'].replace({
    'Shipped - Delivered to Buyer': 'Delivered to buyer',
    'Shipped': 'Delivered to buyer',
    'Shipped - Damaged': 'Delivered to buyer',
    'Shipped - Lost in Transit': 'Lost',
    'Shipped - Out for Delivery': 'Shipping',
    'Shipped - Picked Up': 'At logistics facility',
    'Pending - Waiting for Pick Up': 'Packed',
    'Shipped - Returned to Seller': 'Returned',
    'Shipped - Rejected by Buyer': 'Returned',
    'Shipped - Returning to Seller': 'Returned'
})

df['Status'].unique()

array(['Cancelled', 'Delivered to buyer', 'Returned', 'Lost', 'Shipping',
       'At logistics facility', 'Pending', 'Packed'], dtype=object)
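
`Series.replace`, used above, leaves any value that is not in the dictionary unchanged, so a status missing from the mapping would silently survive as an extra category. A stricter alternative, sketched below, uses `Series.map`, which returns NaN for unmapped values, and raises if any appear. `STATUS_MAP` and `combine_statuses` are illustrative names; the mapping itself is the one listed above, with the values that are already target categories mapped to themselves.

```python
import pandas as pd

# Raw status -> combined category, as listed above
STATUS_MAP = {
    'Cancelled': 'Cancelled',
    'Pending': 'Pending',
    'Shipping': 'Shipping',
    'Shipped - Delivered to Buyer': 'Delivered to buyer',
    'Shipped': 'Delivered to buyer',
    'Shipped - Damaged': 'Delivered to buyer',
    'Shipped - Lost in Transit': 'Lost',
    'Shipped - Out for Delivery': 'Shipping',
    'Shipped - Picked Up': 'At logistics facility',
    'Pending - Waiting for Pick Up': 'Packed',
    'Shipped - Returned to Seller': 'Returned',
    'Shipped - Rejected by Buyer': 'Returned',
    'Shipped - Returning to Seller': 'Returned',
}

def combine_statuses(raw: pd.Series) -> pd.Series:
    """Map raw statuses to combined categories, failing loudly on unknown values."""
    combined = raw.map(STATUS_MAP)  # unlike replace(), map() turns unmapped values into NaN
    unknown = raw[combined.isna()].unique()
    if len(unknown) > 0:
        raise ValueError(f"Unmapped status values: {list(unknown)}")
    return combined

print(combine_statuses(pd.Series(['Shipped', 'Shipping', 'Shipped - Picked Up'])).tolist())
# ['Delivered to buyer', 'Shipping', 'At logistics facility']
```

On the full dataset this would be called as `combine_statuses(df['Status'])`.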

In [160]:
# Record each column's distinct values and their relative frequencies

values = []
prob = []

data_generation_names = ["Status", "Fulfilment", "ship-service-level"]

for i in data_generation_names:
    values.append(df[i].value_counts(normalize=True).index.tolist())
    prob.append(df[i].value_counts(normalize=True).values)
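
The cell above stores, for each column, its distinct values and their relative frequencies, which are then passed to `np.random.choice`. The same idea as a small reusable helper is sketched below. `sample_like` is a hypothetical name, and the fixed seed passed to NumPy's `default_rng` is an assumption: the notebook itself does not seed its random draws, so its synthetic values change on every run.

```python
import numpy as np
import pandas as pd

def sample_like(series: pd.Series, n: int, rng: np.random.Generator) -> np.ndarray:
    """Draw n values whose category frequencies follow those of `series`.

    Missing values are ignored, because value_counts() drops NaN by default.
    """
    freqs = series.value_counts(normalize=True)  # category -> relative frequency
    return rng.choice(freqs.index.to_numpy(), size=n, p=freqs.to_numpy())

# Toy stand-in for df['Fulfilment']
source = pd.Series(['Amazon'] * 7 + ['Merchant'] * 3)
rng = np.random.default_rng(42)  # fixed seed so the synthetic data is reproducible
synthetic = sample_like(source, 1000, rng)
print(pd.Series(synthetic).value_counts(normalize=True))
```

With 1000 draws the sampled shares land close to, but not exactly on, the 70/30 split of the toy source.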

Next, we check whether the fulfilment and ship_service_level columns are related. In the original data, every order with fulfilment == 'Merchant' has ship_service_level == 'Standard', so we enforce the same constraint in the synthetic data.

In [161]:
# Check relationship between fulfilment and ship_service_level

df.loc[df["Fulfilment"] == "Merchant", "ship-service-level"].unique()

array(['Standard'], dtype=object)

In [162]:
# Generate synthetic data for the status, fulfilment and ship_service_level columns from the tabulated frequencies

synthetic_data_names = ["status", "fulfilment", "ship_service_level"]

for name, vals, p in zip(synthetic_data_names, values, prob):
    online_sales_df[name] = np.random.choice(vals, size = len(online_sales_df), p = p)

# Enforce the observed constraint: Merchant-fulfilled orders are always Standard
online_sales_df.loc[online_sales_df['fulfilment'] == 'Merchant', 'ship_service_level'] = 'Standard'

online_sales_df.head()

Unnamed: 0,user_id,transaction_id,date,product_id,shipping_id,status,fulfilment,ship_service_level,estimated_delivery_date,fulfilled_by
0,17850,16679,2019-01-01,B09DL9978Y,10001,Delivered to buyer,Amazon,Expedited,,
1,17850,16680,2019-01-01,B09DL9978Y,10002,Delivered to buyer,Merchant,Standard,,
2,17850,16681,2019-01-01,B07GXHC691,10003,Cancelled,Amazon,Standard,,
3,17850,16682,2019-01-01,B08NCKT9FG,10004,Delivered to buyer,Amazon,Expedited,,
4,17850,16682,2019-01-01,B08H21B6V7,10005,Delivered to buyer,Amazon,Expedited,,
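
The cell above samples fulfilment and ship_service_level independently and then overwrites ship_service_level with 'Standard' for Merchant rows. This enforces the constraint, but it pushes the overall share of 'Standard' above the share in the source data, and Amazon rows receive 'Standard' more often than Amazon orders do in the source. One alternative, assuming the joint distribution matters for later analysis, is to sample ship_service_level conditionally on fulfilment, using `pd.crosstab(..., normalize='index')`. `sample_conditional` is a hypothetical helper, and the toy `source` frame stands in for `df`.

```python
import numpy as np
import pandas as pd

def sample_conditional(source: pd.DataFrame, parent: str, child: str,
                       parent_values: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """For each sampled parent value, draw a child value from P(child | parent)."""
    # Rows: parent categories; columns: child categories; each row sums to 1
    cond = pd.crosstab(source[parent], source[child], normalize='index')
    out = np.empty(len(parent_values), dtype=object)  # parents absent from source stay None
    for value, probs in cond.iterrows():
        mask = parent_values == value
        out[mask] = rng.choice(cond.columns.to_numpy(), size=mask.sum(), p=probs.to_numpy())
    return out

# Toy stand-in for df[['Fulfilment', 'ship-service-level']]
source = pd.DataFrame({
    'Fulfilment': ['Amazon'] * 6 + ['Merchant'] * 4,
    'ship-service-level': ['Expedited'] * 5 + ['Standard'] * 5,
})
rng = np.random.default_rng(0)
fulfilment = rng.choice(['Amazon', 'Merchant'], size=1000, p=[0.6, 0.4])
level = sample_conditional(source, 'Fulfilment', 'ship-service-level', fulfilment, rng)
print(pd.crosstab(fulfilment, level))
```

Because every Merchant row in the source is Standard, the conditional probabilities already encode the constraint, so no override step is needed afterwards.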


The fulfilled_by value depends on the fulfilment value, so we generate it conditionally: Merchant-fulfilled orders are assigned a third-party fulfilment company, and Amazon-fulfilled orders are left blank. The top third-party e-commerce fulfilment companies in India include Quickshift Fulfillment, Shiprocket Fulfillment, Prozo and DHL. We also include Easy Ship, the company that appears in the original dataset.

In [163]:
# Generate synthetic data for the fulfilled_by column:
# Merchant-fulfilled orders get a uniformly chosen company, Amazon-fulfilled orders stay blank

fulfilled_by = ["Easy Ship", "Quickshift Fulfillment", "Shiprocket Fulfillment", "Prozo", "DHL"]

online_sales_df["fulfilled_by"] = np.where(
    online_sales_df["fulfilment"] == "Merchant",
    np.random.choice(fulfilled_by, size = len(online_sales_df)),
    ""
)
online_sales_df[online_sales_df["fulfilment"] == "Merchant"].head()

Unnamed: 0,user_id,transaction_id,date,product_id,shipping_id,status,fulfilment,ship_service_level,estimated_delivery_date,fulfilled_by
1,17850,16680,2019-01-01,B09DL9978Y,10002,Delivered to buyer,Merchant,Standard,,Prozo
6,17850,16682,2019-01-01,B09Y5FZK9N,10007,Delivered to buyer,Merchant,Standard,,DHL
7,17850,16682,2019-01-01,B09Y5FZK9N,10008,Delivered to buyer,Merchant,Standard,,Prozo
12,13047,16682,2019-01-01,B07Z1X6VFC,10013,Delivered to buyer,Merchant,Standard,,Quickshift Fulfillment
18,13047,16687,2019-01-01,B08ZN4B121,10019,Delivered to buyer,Merchant,Standard,,Easy Ship
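
The cell above marks Amazon-fulfilled rows with an empty string, whereas the source rows shown earlier leave fulfilled-by empty (NaN) for Amazon-fulfilled orders. Both write out as blank cells in a CSV, but in memory an empty string is not treated as missing by `isna()` or `fillna()`. If real missing values are preferred, one option is sketched below; `assign_fulfilled_by` is a hypothetical helper that draws companies only for the Merchant rows, keeping the uniform choice used above.

```python
import numpy as np
import pandas as pd

FULFILLERS = ["Easy Ship", "Quickshift Fulfillment", "Shiprocket Fulfillment", "Prozo", "DHL"]

def assign_fulfilled_by(fulfilment: pd.Series, rng: np.random.Generator) -> pd.Series:
    """Pick a third-party fulfiller (uniformly) for Merchant rows; leave Amazon rows as NaN."""
    result = pd.Series(np.nan, index=fulfilment.index, dtype=object)
    is_merchant = fulfilment == "Merchant"
    # Draw only as many values as there are Merchant rows
    result.loc[is_merchant] = rng.choice(FULFILLERS, size=int(is_merchant.sum()))
    return result

fulfilment = pd.Series(["Amazon", "Merchant", "Merchant", "Amazon"])
print(assign_fulfilled_by(fulfilment, np.random.default_rng(1)))
```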


Next, we synthesise the estimated_delivery_date column. As parameters, we use the delivery-time estimates that Amazon gives customers, which we gathered from online sources: 2 to 4 days for Amazon Expedited orders and 5 to 8 days for Standard orders.

In [164]:
# Generate data for estimated_delivery_date
online_sales_df['date'] = pd.to_datetime(online_sales_df['date'])

def generate_estimated_delivery_date(row):
    # Cancelled orders are never delivered, so they get no estimate
    if row['status'] == "Cancelled":
        return pd.NaT
    if row['fulfilment'] == 'Amazon':
        if row['ship_service_level'] == 'Expedited':
            return row['date'] + timedelta(days=np.random.randint(2, 5))  # 2 to 4 days
        elif row['ship_service_level'] == 'Standard':
            return row['date'] + timedelta(days=np.random.randint(5, 9))  # 5 to 8 days
    if row['fulfilment'] == 'Merchant':
        # Merchant orders are always Standard
        return row['date'] + timedelta(days=np.random.randint(5, 9))  # 5 to 8 days
    # Any other combination has no known estimate
    return pd.NaT

online_sales_df['estimated_delivery_date'] = online_sales_df.apply(generate_estimated_delivery_date, axis=1)

online_sales_df.head()

Unnamed: 0,user_id,transaction_id,date,product_id,shipping_id,status,fulfilment,ship_service_level,estimated_delivery_date,fulfilled_by
0,17850,16679,2019-01-01,B09DL9978Y,10001,Delivered to buyer,Amazon,Expedited,2019-01-04,
1,17850,16680,2019-01-01,B09DL9978Y,10002,Delivered to buyer,Merchant,Standard,2019-01-08,Prozo
2,17850,16681,2019-01-01,B07GXHC691,10003,Cancelled,Amazon,Standard,NaT,
3,17850,16682,2019-01-01,B08NCKT9FG,10004,Delivered to buyer,Amazon,Expedited,2019-01-03,
4,17850,16682,2019-01-01,B08H21B6V7,10005,Delivered to buyer,Amazon,Expedited,2019-01-04,
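
The rules in the cell above can also be kept in a lookup table keyed by (fulfilment, ship_service_level), which puts all day ranges in one place and replaces the row-wise `apply` with array operations. The sketch below assumes the same inclusive ranges as `generate_estimated_delivery_date`; `DELIVERY_DAYS` and `estimate_delivery` are hypothetical names, and it uses NumPy's `Generator.integers`, whose upper bound is exclusive. Unlike the function above, a Merchant + Expedited combination (which the earlier override rules out) gets NaT rather than a 5 to 8 day estimate.

```python
import numpy as np
import pandas as pd

# (fulfilment, ship_service_level) -> inclusive range of days to delivery,
# mirroring the ranges in generate_estimated_delivery_date
DELIVERY_DAYS = {
    ("Amazon", "Expedited"): (2, 4),
    ("Amazon", "Standard"): (5, 8),
    ("Merchant", "Standard"): (5, 8),
}

def estimate_delivery(orders: pd.DataFrame, rng: np.random.Generator) -> pd.Series:
    keys = list(zip(orders["fulfilment"], orders["ship_service_level"]))
    has_rule = np.array([k in DELIVERY_DAYS for k in keys])
    low = np.array([DELIVERY_DAYS.get(k, (0, 0))[0] for k in keys])
    high = np.array([DELIVERY_DAYS.get(k, (0, 0))[1] for k in keys])
    # Generator.integers excludes the upper bound, hence high + 1
    days = pd.Series(rng.integers(low, high + 1), index=orders.index)
    estimate = pd.to_datetime(orders["date"]) + pd.to_timedelta(days, unit="D")
    # No estimate for cancelled orders or for combinations without a rule
    keep = has_rule & (orders["status"] != "Cancelled").to_numpy()
    return estimate.where(keep)

orders = pd.DataFrame({
    "date": ["2019-01-01"] * 3,
    "status": ["Delivered to buyer", "Cancelled", "Delivered to buyer"],
    "fulfilment": ["Amazon", "Amazon", "Merchant"],
    "ship_service_level": ["Expedited", "Standard", "Standard"],
})
print(estimate_delivery(orders, np.random.default_rng(7)))
```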


In [165]:
# Save the resulting df 

online_sales_df.to_csv('../shipping_status.csv', index = False)

## Shipping History Table

In [166]:
# Start the history table from the shipping status data, keeping only the columns it needs
shipping_history_df = online_sales_df.drop(columns = 
                                           ['user_id',
                                            'transaction_id',
                                            'product_id',
                                            'fulfilment',
                                            ])

shipping_history_df.head()

Unnamed: 0,date,shipping_id,status,ship_service_level,estimated_delivery_date,fulfilled_by
0,2019-01-01,10001,Delivered to buyer,Expedited,2019-01-04,
1,2019-01-01,10002,Delivered to buyer,Standard,2019-01-08,Prozo
2,2019-01-01,10003,Cancelled,Standard,NaT,
3,2019-01-01,10004,Delivered to buyer,Expedited,2019-01-03,
4,2019-01-01,10005,Delivered to buyer,Expedited,2019-01-04,


In [167]:
# Create update_date, the date on which the status was updated

shipping_history_df["update_date"] = np.nan

shipping_history_df["update_date"] = pd.to_datetime(shipping_history_df["update_date"])

shipping_history_df.head()

Unnamed: 0,date,shipping_id,status,ship_service_level,estimated_delivery_date,fulfilled_by,update_date
0,2019-01-01,10001,Delivered to buyer,Expedited,2019-01-04,,NaT
1,2019-01-01,10002,Delivered to buyer,Standard,2019-01-08,Prozo,NaT
2,2019-01-01,10003,Cancelled,Standard,NaT,,NaT
3,2019-01-01,10004,Delivered to buyer,Expedited,2019-01-03,,NaT
4,2019-01-01,10005,Delivered to buyer,Expedited,2019-01-04,,NaT


For each shipping_id, we want to generate one row per status change, recording when that change happened.

The status flow for each shipping_id is:
Pending -> Packed -> At logistics facility -> Shipping -> Delivered to buyer

At Pending, it is possible that the item becomes Cancelled.

At Shipping, it is possible that the item becomes Lost.

At Delivered to buyer, it is possible that the item will be Returned.

In [168]:
# Define status flows and generate one row per status change
main_flow = ['Pending', 'Packed', 'At logistics facility', 'Shipping', 'Delivered to buyer']

flows = {
    'Cancelled': ['Pending', 'Cancelled'],  # cancelled while pending
    'Lost': main_flow[:4] + ['Lost'],       # lost while shipping
    'Returned': main_flow + ['Returned'],   # returned after delivery
}

def generate_status_flow(row):
    # Determine the flow based on the current status
    current_status = row['status']
    if current_status in flows:
        flow = flows[current_status]
    else:
        # Shipments still in progress (e.g. 'Packed') stop at their current status;
        # 'Delivered to buyer' gets the full main flow
        flow = main_flow[:main_flow.index(current_status) + 1]

    # Create one row per status for this shipping_id
    new_rows = []
    for status in flow:
        new_row = row.copy()        # copy the original row
        new_row['status'] = status  # overwrite the status
        new_rows.append(new_row)

    return new_rows

# Apply the function to create new rows for each shipping_id
expanded_rows = []
for _, row in shipping_history_df.iterrows():
    expanded_rows.extend(generate_status_flow(row))

expanded_df = pd.DataFrame(expanded_rows)

# Reset index to clean up
expanded_df.reset_index(drop=True, inplace=True)

expanded_df.head()

Unnamed: 0,date,shipping_id,status,ship_service_level,estimated_delivery_date,fulfilled_by,update_date
0,2019-01-01,10001,Pending,Expedited,2019-01-04,,NaT
1,2019-01-01,10001,Packed,Expedited,2019-01-04,,NaT
2,2019-01-01,10001,At logistics facility,Expedited,2019-01-04,,NaT
3,2019-01-01,10001,Shipping,Expedited,2019-01-04,,NaT
4,2019-01-01,10001,Delivered to buyer,Expedited,2019-01-04,,NaT
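
The status flow described before the cell above can also be written as a table of allowed transitions and used to validate every generated history. The sketch below assumes that table follows the text exactly (Cancelled only from Pending, Lost only from Shipping, Returned only from Delivered to buyer); `ALLOWED_NEXT` and `is_valid_history` are hypothetical names, and the toy `history` frame stands in for `expanded_df`.

```python
import pandas as pd

# Allowed next statuses for each status, following the flow described above
ALLOWED_NEXT = {
    'Pending': {'Packed', 'Cancelled'},
    'Packed': {'At logistics facility'},
    'At logistics facility': {'Shipping'},
    'Shipping': {'Delivered to buyer', 'Lost'},
    'Delivered to buyer': {'Returned'},
    'Cancelled': set(),
    'Lost': set(),
    'Returned': set(),
}

def is_valid_history(statuses: list) -> bool:
    """True if the history starts at Pending and every step is an allowed transition."""
    if not statuses or statuses[0] != 'Pending':
        return False
    return all(nxt in ALLOWED_NEXT.get(cur, set())
               for cur, nxt in zip(statuses, statuses[1:]))

history = pd.DataFrame({
    'shipping_id': [1, 1, 2, 2, 2],
    'status': ['Pending', 'Cancelled', 'Pending', 'Shipping', 'Lost'],
})
valid = history.groupby('shipping_id')['status'].apply(list).map(is_valid_history)
print(valid.to_dict())  # {1: True, 2: False}
```

Shipment 2 fails because it jumps from Pending straight to Shipping.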


In [169]:
# Generate update_date

def generate_update_date(row, previous_update_date):
    date = pd.to_datetime(row['date'])

    if previous_update_date is None:
        previous_update_date = date
    
    if row['status'] == 'Pending':
        if row['ship_service_level'] == 'Expedited':
            return date + timedelta(days=np.random.randint(0, 2))
        else:
            return date + timedelta(days=np.random.randint(0, 3))
    
    elif row['status'] == 'Cancelled':
        return date + timedelta(days=np.random.randint(2, 4))
    
    elif row['status'] == 'Packed':
        return previous_update_date + timedelta(days=np.random.randint(0, 6) if row['ship_service_level'] != 'Expedited' else np.random.randint(0, 2))
    
    elif row['status'] == 'At logistics facility':
        return previous_update_date + timedelta(days=np.random.randint(1, 4) if row['ship_service_level'] != 'Expedited' else np.random.randint(0, 3))
    
    elif row['status'] == 'Shipping':
        if row['fulfilled_by'] == 'Quickshift Fulfillment':
            return previous_update_date + timedelta(days=np.random.randint(1, 6))
        else:
            return previous_update_date + timedelta(days=np.random.randint(1, 4))
    
    elif row['status'] == 'Lost':
        return previous_update_date + timedelta(days=np.random.randint(1, 3))
    
    elif row['status'] == 'Delivered to buyer':
        if row['fulfilled_by'] == 'Quickshift Fulfillment':
            return previous_update_date + timedelta(days=np.random.randint(1, 6))
        else:
            return previous_update_date + timedelta(days=np.random.randint(1, 4))
    
    elif row['status'] == 'Returned':
        return previous_update_date + timedelta(days=np.random.randint(3, 6))

# Apply the function to generate update_date for each row.
# Rows for each shipping_id are contiguous, so one pass suffices: reset the
# previous date whenever a new shipping_id starts.
update_dates = []
previous_shipping_id = None
previous_update_date = None
for _, row in expanded_df.iterrows():
    if row['shipping_id'] != previous_shipping_id:
        previous_shipping_id = row['shipping_id']
        previous_update_date = None
    previous_update_date = generate_update_date(row, previous_update_date)
    update_dates.append(previous_update_date)

expanded_df['update_date'] = update_dates

In [170]:
# Convert update_date to plain dates and check the result
expanded_df['update_date'] = pd.to_datetime(expanded_df['update_date'], errors='coerce')
expanded_df['update_date'] = expanded_df['update_date'].dt.date

expanded_df.head()

Unnamed: 0,date,shipping_id,status,ship_service_level,estimated_delivery_date,fulfilled_by,update_date
0,2019-01-01,10001,Pending,Expedited,2019-01-04,,2019-01-01
1,2019-01-01,10001,Packed,Expedited,2019-01-04,,2019-01-01
2,2019-01-01,10001,At logistics facility,Expedited,2019-01-04,,2019-01-02
3,2019-01-01,10001,Shipping,Expedited,2019-01-04,,2019-01-03
4,2019-01-01,10001,Delivered to buyer,Expedited,2019-01-04,,2019-01-05
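
The offsets in `generate_update_date` are chosen so that, within a shipment, update_date should never move backwards. A quick sanity check is sketched below, assuming the shipping_id and update_date columns produced above; `dates_non_decreasing` is a hypothetical helper, and `is_monotonic_increasing` allows equal consecutive dates, which the generator can produce.

```python
import pandas as pd

def dates_non_decreasing(history: pd.DataFrame) -> pd.Series:
    """Per shipping_id, True if update_date never goes backwards in row order."""
    dates = pd.to_datetime(history['update_date'])
    return dates.groupby(history['shipping_id']).apply(lambda s: s.is_monotonic_increasing)

# Toy stand-in for expanded_df; shipment 10002 is deliberately out of order
history = pd.DataFrame({
    'shipping_id': [10001, 10001, 10001, 10002, 10002],
    'update_date': ['2019-01-01', '2019-01-01', '2019-01-02', '2019-01-03', '2019-01-02'],
})
print(dates_non_decreasing(history).to_dict())  # {10001: True, 10002: False}
```

On the real table, `dates_non_decreasing(expanded_df).all()` should be True.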


In [171]:
# Drop columns that duplicate information already stored in the shipping status table

expanded_df.drop(columns = ['fulfilled_by',
                            'estimated_delivery_date'], inplace = True)

expanded_df.head()

Unnamed: 0,date,shipping_id,status,ship_service_level,update_date
0,2019-01-01,10001,Pending,Expedited,2019-01-01
1,2019-01-01,10001,Packed,Expedited,2019-01-01
2,2019-01-01,10001,At logistics facility,Expedited,2019-01-02
3,2019-01-01,10001,Shipping,Expedited,2019-01-03
4,2019-01-01,10001,Delivered to buyer,Expedited,2019-01-05


In [172]:
# Save the expanded history (one row per status change)

expanded_df.to_csv('../shipping_history.csv', index = False)

## Adding Origin Area to the Products Table

To analyse supplier performance, we add a new column, origin_area, to `products.csv`.

We generate synthetic data for this column using another dataset, the Amazon Supplier List, which lists the sites that supply products sold on Amazon. Its distribution of locations determines the origin_area values, each of which combines a city and a country of origin.

In [173]:
# Load datasets
products_df = pd.read_csv('../products.csv')
suppliers_df = pd.read_csv('../source/Amazon Supplier List.csv')

suppliers_df.head()

Unnamed: 0,SITE,ADDRESS,CITY,STATE/REGION,COUNTRY
0,3Q Vina,"8 An Duong Vuong Street, Ward 16, District 8",Hồ Chí Minh,Hồ Chí Minh City,Vietnam
1,A Mount Inc.,"No. 65, Dongshing Street, Shulin District",New Taipei City,Taipei,Taiwan
2,A. R. Industries,"Mauza Rampur Jattan, Dhakwala Moginand, Tehsil...",Kala Amb,Himachal Pradesh,India
3,AAC Technologies Holdings Inc.\r\n(Shenzhen),"No.1, Chengxin Road, Baolong Ind. Park, Longga...",Shenzhen,Guangdong,China
4,"Ability Opto-Electronics Technology Co., Ltd.","4F. No.31, Keya Rd., Daya Dist.",Taichung City,Taichung,Taiwan


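As a rough proxy for electronics suppliers, we keep only sites whose names contain 'elec' or 'tech' (case-insensitive), so that origin_area is drawn from cities and countries that host electronics suppliers.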

In [174]:
# Filter companies to show only tech sites

keywords = ["elec", "tech"]
pattern = '|'.join(keywords)

# .copy() avoids a SettingWithCopyWarning when the AREA column is added below
filtered_supplier_df = suppliers_df[suppliers_df['SITE'].str.contains(pattern, case = False, na = False)].copy()

filtered_supplier_df.head()

Unnamed: 0,SITE,ADDRESS,CITY,STATE/REGION,COUNTRY
3,AAC Technologies Holdings Inc.\r\n(Shenzhen),"No.1, Chengxin Road, Baolong Ind. Park, Longga...",Shenzhen,Guangdong,China
4,"Ability Opto-Electronics Technology Co., Ltd.","4F. No.31, Keya Rd., Daya Dist.",Taichung City,Taichung,Taiwan
6,"AcBel Electronic (Dong Guan) Co., Ltd.","No.17-28, (Hong Yeh Rd.), Hong Yeh Industrial ...",Dongguan,Guangdong,China
10,"Acrox Technologies Co., Ltd","No. 2 Xinmin Road, Xinmin Village, Changan Town",Dongguan,Guangdong,China
30,Amperex Technology Limited,"1 West Industrial Road, North Zone of SongShan...",Dongguan,Guangdong,China
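
Because `str.contains` matches substrings, the 'elec'/'tech' filter is only a proxy for electronics suppliers: it would also keep, for instance, a site whose name contains 'biotech'. If that matters, one option is to anchor the keywords to the start of a word with `\b`, as in the hypothetical `filter_sites` helper sketched below. Whether this removes any real false positives from the Amazon Supplier List has not been checked here, and 'Example Biotech Ltd' is a made-up name.

```python
import re
import pandas as pd

def filter_sites(sites: pd.Series, keywords: list, word_start: bool = False) -> pd.Series:
    """Boolean mask of site names containing any keyword (case-insensitive).

    With word_start=True, keywords only match at the start of a word, e.g. 'Tech'
    in 'Amperex Technology' but not 'tech' in 'Biotech'.
    """
    alternatives = "|".join(re.escape(k) for k in keywords)
    pattern = rf"\b(?:{alternatives})" if word_start else alternatives
    return sites.str.contains(pattern, case=False, na=False, regex=True)

sites = pd.Series(["Amperex Technology Limited", "Example Biotech Ltd", "3Q Vina", None])
print(sites[filter_sites(sites, ["elec", "tech"])].tolist())
# ['Amperex Technology Limited', 'Example Biotech Ltd']
print(sites[filter_sites(sites, ["elec", "tech"], word_start=True)].tolist())
# ['Amperex Technology Limited']
```

The trade-off is that word-start matching would also drop genuine suppliers whose names fuse the keyword into a longer word, so the stricter pattern is not automatically better.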


In [175]:
# Check the length of the filtered df

len(filtered_supplier_df)

188

In [176]:
# Join the city and country columns together to form area

filtered_supplier_df['AREA'] = filtered_supplier_df['CITY'].astype(str) + ', ' + filtered_supplier_df['COUNTRY'].astype(str)

filtered_supplier_df.head()

Unnamed: 0,SITE,ADDRESS,CITY,STATE/REGION,COUNTRY,AREA
3,AAC Technologies Holdings Inc.\r\n(Shenzhen),"No.1, Chengxin Road, Baolong Ind. Park, Longga...",Shenzhen,Guangdong,China,"Shenzhen, China"
4,"Ability Opto-Electronics Technology Co., Ltd.","4F. No.31, Keya Rd., Daya Dist.",Taichung City,Taichung,Taiwan,"Taichung City, Taiwan"
6,"AcBel Electronic (Dong Guan) Co., Ltd.","No.17-28, (Hong Yeh Rd.), Hong Yeh Industrial ...",Dongguan,Guangdong,China,"Dongguan, China"
10,"Acrox Technologies Co., Ltd","No. 2 Xinmin Road, Xinmin Village, Changan Town",Dongguan,Guangdong,China,"Dongguan, China"
30,Amperex Technology Limited,"1 West Industrial Road, North Zone of SongShan...",Dongguan,Guangdong,China,"Dongguan, China"


In [177]:
# Generate synthetic origin_area values in products_df from the area frequencies in the filtered supplier list

area_freqs = filtered_supplier_df['AREA'].value_counts(normalize=True)
area_values = area_freqs.index.tolist()
area_probs = area_freqs.values

products_df["origin_area"] = np.random.choice(area_values, size = len(products_df), p =area_probs)

products_df.head()

Unnamed: 0,product_id,product_name,about_product,category,actual_price,discounted_price,discount_percentage,origin_area
0,B07JW9H4J1,Wayona Nylon Braided USB to Lightning Fast Cha...,High Compatibility : Compatible With iPhone 12...,Computers&Accessories|Accessories&Peripherals|...,13.19,4.79,0.64,"Shenzhen, China"
1,B098NS6PVG,Ambrane Unbreakable 60W / 3A Fast Charging 1.5...,"Compatible with all Type C enabled devices, be...",Computers&Accessories|Accessories&Peripherals|...,4.19,2.39,0.43,"Dongguan, China"
2,B096MSW6CT,Sounce Fast Phone Charging Cable & Data Sync U...,【 Fast Charger& Data Sync】-With built-in safet...,Computers&Accessories|Accessories&Peripherals|...,22.79,2.39,0.9,"Kunshan, China"
3,B08HDJ86NZ,boAt Deuce USB 300 2 in 1 Type-C & Micro USB S...,The boAt Deuce USB 300 2 in 1 cable is compati...,Computers&Accessories|Accessories&Peripherals|...,8.39,3.95,0.53,"Shandong, China"
4,B08CF3B7N1,Portronics Konnect L 1.2M Fast Charging 3A 8 P...,[CHARGE & SYNC FUNCTION]- This cable comes wit...,Computers&Accessories|Accessories&Peripherals|...,4.79,1.85,0.61,"Shenzhen, China"


In [178]:
# Save the resulting df

products_df.to_csv('../products.csv', index = False)