# **Cleaning and Preparation of Data**

This dataset consists of **more than 1,000 real products, with their product IDs, listed on the Amazon marketplace**, specifically for the India region. Prices in the dataset are in Indian rupees and will be converted to US dollars, which are more widely understood.

Before analysis, the data needs to be cleaned and prepared. The steps used are listed below:

1. Changing Column Data Types from Object to Float
2. Filling in Missing Information
3. Checking for Duplicate Rows
4. Creating New Tables and Columns

In [1]:
# Importing Packages
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [3]:
# Importing Files
df = pd.read_csv('../source/amazon.csv')

In [4]:
df.head()

,product_id,product_name,category,discounted_price,actual_price,discount_percentage,rating,rating_count,about_product,user_id,user_name,review_id,review_title,review_content,img_link,product_link
0,B07JW9H4J1,Wayona Nylon Braided USB to Lightning Fast Cha...,Computers&Accessories|Accessories&Peripherals|...,₹399,"₹1,099",64%,4.2,24269,High Compatibility : Compatible With iPhone 12...,"AG3D6O4STAQKAY2UVGEUV46KN35Q,AHMY5CWJMMK5BJRBB...","Manav,Adarsh gupta,Sundeep,S.Sayeed Ahmed,jasp...","R3HXWT0LRP0NMF,R2AJM3LFTLZHFO,R6AQJGUP6P86,R1K...","Satisfied,Charging is really fast,Value for mo...",Looks durable Charging is fine tooNo complains...,https://m.media-amazon.com/images/W/WEBP_40237...,https://www.amazon.in/Wayona-Braided-WN3LG1-Sy...
1,B098NS6PVG,Ambrane Unbreakable 60W / 3A Fast Charging 1.5...,Computers&Accessories|Accessories&Peripherals|...,₹199,₹349,43%,4.0,43994,"Compatible with all Type C enabled devices, be...","AECPFYFQVRUWC3KGNLJIOREFP5LQ,AGYYVPDD7YG7FYNBX...","ArdKn,Nirbhay kumar,Sagar Viswanathan,Asp,Plac...","RGIQEG07R9HS2,R1SMWZQ86XIN8U,R2J3Y1WL29GWDE,RY...","A Good Braided Cable for Your Type C Device,Go...",I ordered this cable to connect my phone to An...,https://m.media-amazon.com/images/W/WEBP_40237...,https://www.amazon.in/Ambrane-Unbreakable-Char...
2,B096MSW6CT,Sounce Fast Phone Charging Cable & Data Sync U...,Computers&Accessories|Accessories&Peripherals|...,₹199,"₹1,899",90%,3.9,7928,【 Fast Charger& Data Sync】-With built-in safet...,"AGU3BBQ2V2DDAMOAKGFAWDDQ6QHA,AESFLDV2PT363T2AQ...","Kunal,Himanshu,viswanath,sai niharka,saqib mal...","R3J3EQQ9TZI5ZJ,R3E7WBGK7ID0KV,RWU79XKQ6I1QF,R2...","Good speed for earlier versions,Good Product,W...","Not quite durable and sturdy,https://m.media-a...",https://m.media-amazon.com/images/W/WEBP_40237...,https://www.amazon.in/Sounce-iPhone-Charging-C...
3,B08HDJ86NZ,boAt Deuce USB 300 2 in 1 Type-C & Micro USB S...,Computers&Accessories|Accessories&Peripherals|...,₹329,₹699,53%,4.2,94363,The boAt Deuce USB 300 2 in 1 cable is compati...,"AEWAZDZZJLQUYVOVGBEUKSLXHQ5A,AG5HTSFRRE6NL3M5S...","Omkar dhale,JD,HEMALATHA,Ajwadh a.,amar singh ...","R3EEUZKKK9J36I,R3HJVYCLYOY554,REDECAZ7AMPQC,R1...","Good product,Good one,Nice,Really nice product...","Good product,long wire,Charges good,Nice,I bou...",https://m.media-amazon.com/images/I/41V5FtEWPk...,https://www.amazon.in/Deuce-300-Resistant-Tang...
4,B08CF3B7N1,Portronics Konnect L 1.2M Fast Charging 3A 8 P...,Computers&Accessories|Accessories&Peripherals|...,₹154,₹399,61%,4.2,16905,[CHARGE & SYNC FUNCTION]- This cable comes wit...,"AE3Q6KSUK5P75D5HFYHCRAOLODSA,AFUGIFH5ZAFXRDSZH...","rahuls6099,Swasat Borah,Ajay Wadke,Pranali,RVK...","R1BP4L2HH9TFUP,R16PVJEXKV6QZS,R2UPDB81N66T4P,R...","As good as original,Decent,Good one for second...","Bought this instead of original apple, does th...",https://m.media-amazon.com/images/W/WEBP_40237...,https://www.amazon.in/Portronics-Konnect-POR-1...


In [5]:
#Checking Column Names

df.columns

Index(['product_id', 'product_name', 'category', 'discounted_price',
       'actual_price', 'discount_percentage', 'rating', 'rating_count',
       'about_product', 'user_id', 'user_name', 'review_id', 'review_title',
       'review_content', 'img_link', 'product_link'],
      dtype='object')

In [6]:
#Checking Number of Rows and Columns

df.shape

(1465, 16)

In [7]:
#Checking Data Types for each Column

df.dtypes

product_id             object
product_name           object
category               object
discounted_price       object
actual_price           object
discount_percentage    object
rating                 object
rating_count           object
about_product          object
user_id                object
user_name              object
review_id              object
review_title           object
review_content         object
img_link               object
product_link           object
dtype: object

Note that the currency used is the **Indian rupee**.

In [8]:
#Changing the data type of discounted price and actual price

df['discounted_price'] = df['discounted_price'].str.replace("₹",'')
df['discounted_price'] = df['discounted_price'].str.replace(",",'')
df['discounted_price'] = df['discounted_price'].astype('float64')

df['actual_price'] = df['actual_price'].str.replace("₹",'')
df['actual_price'] = df['actual_price'].str.replace(",",'')
df['actual_price'] = df['actual_price'].astype('float64')
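
The two three-line sequences above can be collapsed into one helper. This is a minimal sketch assuming prices are always formatted like "₹1,099"; `parse_rupee_price` is a hypothetical name, and `pd.to_numeric(errors="coerce")` is a swapped-in choice that turns any leftover unparseable text into NaN instead of raising the way `astype` does.

```python
import pandas as pd

def parse_rupee_price(prices: pd.Series) -> pd.Series:
    """Strip the rupee sign and thousands separators, then convert to float64."""
    # The character class [₹,] removes both symbols in a single regex pass.
    cleaned = prices.str.replace(r"[₹,]", "", regex=True)
    # errors="coerce" maps anything still non-numeric to NaN instead of raising.
    return pd.to_numeric(cleaned, errors="coerce").astype("float64")

prices = pd.Series(["₹399", "₹1,099", "₹199"])
print(parse_rupee_price(prices).tolist())  # [399.0, 1099.0, 199.0]
```

Applied to the notebook, this would read `df['discounted_price'] = parse_rupee_price(df['discounted_price'])`, and the same for `actual_price`.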


In [9]:
# Converting prices from Indian rupees to US dollars at a fixed rate
conversion_rate = 0.012
df['actual_price'] = df['actual_price'] * conversion_rate
df['discounted_price'] = df['discounted_price'] * conversion_rate
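
The rate 0.012 is a fixed approximation; the real INR to USD rate moves over time, so the converted prices should be read as estimates. As a worked check, ₹1,099 × 0.012 = $13.188 and ₹399 × 0.012 = $4.788. A minimal sketch that keeps the rate as a named constant; `INR_TO_USD` and `convert_inr_to_usd` are hypothetical names.

```python
import pandas as pd

# Assumption: one fixed INR -> USD rate, the same value used in the cell above.
INR_TO_USD = 0.012

def convert_inr_to_usd(prices_inr: pd.Series, rate: float = INR_TO_USD) -> pd.Series:
    """Multiply rupee prices by a fixed conversion rate (no rounding)."""
    return prices_inr * rate

usd = convert_inr_to_usd(pd.Series([399.0, 1099.0]))
print(usd.round(3).tolist())  # [4.788, 13.188]
```

Because the notebook cell overwrites `actual_price` and `discounted_price` in place, running that cell a second time would apply the rate twice; a pure function like this one makes that easier to avoid.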

In [10]:
#Changing Datatype and values in Discount Percentage

df['discount_percentage'] = df['discount_percentage'].str.replace('%','').astype('float64')

df['discount_percentage'] = df['discount_percentage'] / 100

df['discount_percentage']

0       0.64
1       0.43
2       0.90
3       0.53
4       0.61
        ... 
1460    0.59
1461    0.25
1462    0.28
1463    0.26
1464    0.22
Name: discount_percentage, Length: 1465, dtype: float64
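
Because the discount is a ratio of the two prices, it can be cross-checked: `1 - discounted_price / actual_price` should round to the listed discount. The conversion rate cancels in the ratio, so the check works in rupees or dollars. For the first row, 1 − 399/1099 ≈ 0.637, which rounds to the listed 64%. A minimal sketch on the first three rows from `df.head()` (in rupees); `implied_discount` is a hypothetical helper, and the half-point tolerance assumes listed discounts are rounded to whole percentages.

```python
import pandas as pd

def implied_discount(discounted: pd.Series, actual: pd.Series) -> pd.Series:
    """Discount implied by the two prices, as a fraction of the actual price."""
    return 1 - discounted / actual

# First three rows of the dataset (rupees) with their listed discounts.
sample = pd.DataFrame({
    "discounted_price": [399.0, 199.0, 199.0],
    "actual_price": [1099.0, 349.0, 1899.0],
    "discount_percentage": [0.64, 0.43, 0.90],
})

gap = (implied_discount(sample["discounted_price"], sample["actual_price"])
       - sample["discount_percentage"]).abs()
# Listed discounts are whole percentages, so allow up to half a percentage point.
print((gap <= 0.005).all())  # True
```

On the full frame, `df[gap > 0.005]` would list any rows whose prices and discount disagree.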

In [11]:
#Finding unusual string in the rating column

df['rating'].value_counts()

rating
4.1    244
4.3    230
4.2    228
4.0    129
3.9    123
4.4    123
3.8     86
4.5     75
4       52
3.7     42
3.6     35
3.5     26
4.6     17
3.3     16
3.4     10
4.7      6
3.1      4
5.0      3
3.0      3
4.8      3
3.2      2
2.8      2
2.3      1
|        1
2        1
3        1
2.6      1
2.9      1
Name: count, dtype: int64
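
Scanning `value_counts()` by eye works here because the column has few distinct values. A programmatic alternative is to coerce the column with `pd.to_numeric(errors="coerce")` and keep the entries that became NaN. A minimal sketch on made-up values; note that coercion also merges spellings such as "4" and "4.0", which `value_counts()` lists separately above.

```python
import pandas as pd

ratings = pd.Series(["4.1", "4", "|", "3.9"])

# Unparseable strings become NaN, so NaN positions (that were not already
# missing) are exactly the non-numeric entries.
as_numbers = pd.to_numeric(ratings, errors="coerce")
bad_values = ratings[as_numbers.isna() & ratings.notna()]
print(bad_values.tolist())  # ['|']
```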

In [12]:
#Inspecting the exception row

df.query('rating == "|"')

,product_id,product_name,category,discounted_price,actual_price,discount_percentage,rating,rating_count,about_product,user_id,user_name,review_id,review_title,review_content,img_link,product_link
1279,B08L12N5H1,Eureka Forbes car Vac 100 Watts Powerful Sucti...,"Home&Kitchen|Kitchen&HomeAppliances|Vacuum,Cle...",25.188,29.988,0.16,|,992,No Installation is provided for this product|1...,"AGTDSNT2FKVYEPDPXAA673AIS44A,AER2XFSWNN4LAUCJ5...","Divya,Dr Nefario,Deekshith,Preeti,Prasanth R,P...","R2KKTKM4M9RDVJ,R1O692MZOBTE79,R2WRSEWL56SOS4,R...","Decent product,doesn't pick up sand,Ok ok,Must...","Does the job well,doesn't work on sand. though...",https://m.media-amazon.com/images/W/WEBP_40237...,https://www.amazon.in/Eureka-Forbes-Vacuum-Cle...


I checked the product's Amazon page and found that product ID **B08L12N5H1 has a rating of 4**, so I am setting this item's rating to 4.0 as well.

Source: https://www.amazon.in/Eureka-Forbes-Vacuum-Cleaner-Washable/dp/B08L12N5H1

In [13]:
#Changing Rating Column Data Type
#regex=False treats '|' as a literal character rather than a regex pattern

df['rating'] = df['rating'].str.replace('|', '4.0', regex=False).astype('float64')
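
Replacing "|" through `str.replace` edits any value that contains that character. A narrower option is to overwrite only the row whose rating was looked up, selected by `product_id`. A minimal sketch on two made-up rows:

```python
import pandas as pd

toy = pd.DataFrame({
    "product_id": ["B07JW9H4J1", "B08L12N5H1"],
    "rating": ["4.2", "|"],
})

# Overwrite only the row whose rating was checked on Amazon,
# then convert the whole column to float.
toy.loc[toy["product_id"] == "B08L12N5H1", "rating"] = "4.0"
toy["rating"] = toy["rating"].astype("float64")
print(toy["rating"].tolist())  # [4.2, 4.0]
```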

In [14]:
#Changing Rating Count Column Data Type

df['rating_count'] = df['rating_count'].str.replace(',', '').astype('float64')

In [15]:
#Checking for Duplicates

duplicates = df.duplicated()
df[duplicates]

,product_id,product_name,category,discounted_price,actual_price,discount_percentage,rating,rating_count,about_product,user_id,user_name,review_id,review_title,review_content,img_link,product_link
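
`df.duplicated()` with no arguments flags a row only when every column matches, so rows for the same product that differ in, for example, `img_link` (as rows 379 and 623 of B096MSW6CT do further down) are not flagged. Checking `product_id` alone counts repeated products instead; given the shapes reported in this notebook (1,465 rows, and 1,351 after dropping duplicate product IDs), 114 rows repeat an earlier `product_id`. A minimal sketch on made-up rows (the URLs are placeholders):

```python
import pandas as pd

# Made-up rows: the same product listed twice with different image links.
toy = pd.DataFrame({
    "product_id": ["B096MSW6CT", "B096MSW6CT", "B07JW9H4J1"],
    "img_link": ["https://example.com/a.jpg", "https://example.com/b.jpg",
                 "https://example.com/c.jpg"],
})

full_row_dupes = int(toy.duplicated().sum())                 # every column must match
id_dupes = int(toy.duplicated(subset=["product_id"]).sum())  # only product_id must match
print(full_row_dupes, id_dupes)  # 0 1
```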


In [16]:
#Checking Missing Values

df.isna().sum()

product_id             0
product_name           0
category               0
discounted_price       0
actual_price           0
discount_percentage    0
rating                 0
rating_count           2
about_product          0
user_id                0
user_name              0
review_id              0
review_title           0
review_content         0
img_link               0
product_link           0
dtype: int64
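
The two missing `rating_count` values are not filled in this notebook. When the ratings table is built later, `groupby(...).sum()` skips missing values, so a product whose counts are all missing ends up with a rating count of 0. A minimal sketch on made-up rows for inspecting those rows and seeing that behavior:

```python
import numpy as np
import pandas as pd

# Made-up rows: product B has no rating_count on any of its rows.
toy = pd.DataFrame({
    "product_id": ["A", "B", "B", "C"],
    "rating_count": [120.0, np.nan, np.nan, 35.0],
})

# Inspect the rows with a missing count before deciding how to handle them.
missing_rows = toy[toy["rating_count"].isna()]
print(missing_rows)

# groupby().sum() skips NaN, so a product whose counts are all missing sums to 0.
totals = toy.groupby("product_id")["rating_count"].sum()
print(totals.to_dict())  # {'A': 120.0, 'B': 0.0, 'C': 35.0}
```

If 0 is the intended value, `df['rating_count'] = df['rating_count'].fillna(0)` makes that choice explicit instead of leaving it to the aggregation; whether 0 is appropriate is a judgment call.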

## Products Table

In [27]:
#Creating a new DataFrame with selected columns for the products table
df1 = df[['product_id', 'product_name', 'about_product','category', 'actual_price', 'discounted_price','discount_percentage','img_link']].copy()

In [28]:
df1['actual_price'] = df1['actual_price'].round(2)
df1['discounted_price'] = df1['discounted_price'].round(2)

In [29]:
df1[df1.duplicated()]

,product_id,product_name,about_product,category,actual_price,discounted_price,discount_percentage,img_link
622,B098NS6PVG,Ambrane Unbreakable 60W / 3A Fast Charging 1.5...,"Compatible with all Type C enabled devices, be...",Computers&Accessories|Accessories&Peripherals|...,4.19,2.39,0.43,https://m.media-amazon.com/images/W/WEBP_40237...
642,B08Y1TFSP6,pTron Solero TB301 3A Type-C Data and Fast Cha...,Fast Charging & Data Sync: Solero TB301 Type-C...,Computers&Accessories|Accessories&Peripherals|...,12.0,1.79,0.85,https://m.media-amazon.com/images/I/31wOPjcSxl...
658,B08WRWPM22,"boAt Micro USB 55 Tangle-free, Sturdy Micro US...",It Ensures High Speed Transmission And Chargin...,Computers&Accessories|Accessories&Peripherals|...,5.99,2.12,0.65,https://m.media-amazon.com/images/W/WEBP_40237...
668,B08DDRGWTJ,MI Usb Type-C Cable Smartphone (Black),1m long Type-C USB Cable|Sturdy and Durable. W...,Computers&Accessories|Accessories&Peripherals|...,3.59,2.75,0.23,https://m.media-amazon.com/images/I/31XO-wfGGG...
684,B07KSMBL2H,AmazonBasics Flexible Premium HDMI Cable (Blac...,"Flexible, lightweight HDMI cable for connectin...","Electronics|HomeTheater,TV&Video|Accessories|C...",8.4,2.63,0.69,https://m.media-amazon.com/images/I/41nPYaWA+M...
689,B085DTN6R2,Portronics Konnect CL 20W POR-1067 Type-C to 8...,[20W PD FAST CHARGING]-It’s supports 20W PD qu...,Computers&Accessories|Accessories&Peripherals|...,10.79,4.2,0.61,https://m.media-amazon.com/images/I/31J6qGhAL9...
692,B09KLVMZ3B,Portronics Konnect L 1.2M POR-1401 Fast Chargi...,[CHARGE & SYNC FUNCTION]- This cable comes wit...,Computers&Accessories|Accessories&Peripherals|...,4.79,1.91,0.6,https://m.media-amazon.com/images/W/WEBP_40237...
714,B08DPLCM6T,LG 80 cm (32 inches) HD Ready Smart LED TV 32L...,Resolution: HD Ready (1366x768) | Refresh Rate...,"Electronics|HomeTheater,TV&Video|Televisions|S...",263.88,161.88,0.39,https://m.media-amazon.com/images/W/WEBP_40237...
727,B09NHVCHS9,Flix Micro Usb Cable For Smartphone (Black),"Micro usb cable is 1 meter in length, optimize...",Computers&Accessories|Accessories&Peripherals|...,2.39,0.71,0.7,https://m.media-amazon.com/images/I/31qGpf8uzu...
731,B01M4GGIVU,Tizum High Speed HDMI Cable with Ethernet | Su...,Latest Standard HDMI A Male to A Male Cable: S...,"Electronics|HomeTheater,TV&Video|Accessories|C...",8.39,2.39,0.72,https://m.media-amazon.com/images/I/41da4tk7N+...


In [32]:
df[df['product_id'] == 'B096MSW6CT']

,product_id,product_name,category,discounted_price,actual_price,discount_percentage,rating,rating_count,about_product,user_id,user_name,review_id,review_title,review_content,img_link,product_link
2,B096MSW6CT,Sounce Fast Phone Charging Cable & Data Sync U...,Computers&Accessories|Accessories&Peripherals|...,2.388,22.788,0.9,3.9,7928.0,【 Fast Charger& Data Sync】-With built-in safet...,"AGU3BBQ2V2DDAMOAKGFAWDDQ6QHA,AESFLDV2PT363T2AQ...","Kunal,Himanshu,viswanath,sai niharka,saqib mal...","R3J3EQQ9TZI5ZJ,R3E7WBGK7ID0KV,RWU79XKQ6I1QF,R2...","Good speed for earlier versions,Good Product,W...","Not quite durable and sturdy,https://m.media-a...",https://m.media-amazon.com/images/W/WEBP_40237...,https://www.amazon.in/Sounce-iPhone-Charging-C...
379,B096MSW6CT,Sounce Fast Phone Charging Cable & Data Sync U...,Computers&Accessories|Accessories&Peripherals|...,2.388,11.988,0.8,3.9,7928.0,【 Fast Charger& Data Sync】-With built-in safet...,"AGU3BBQ2V2DDAMOAKGFAWDDQ6QHA,AESFLDV2PT363T2AQ...","Kunal,Himanshu,viswanath,sai niharka,saqib mal...","R3J3EQQ9TZI5ZJ,R3E7WBGK7ID0KV,RWU79XKQ6I1QF,R2...","Good speed for earlier versions,Good Product,W...","Not quite durable and sturdy,https://m.media-a...",https://m.media-amazon.com/images/I/31IvNJZnmd...,https://www.amazon.in/Sounce-iPhone-Charging-C...
623,B096MSW6CT,Sounce Fast Phone Charging Cable & Data Sync U...,Computers&Accessories|Accessories&Peripherals|...,2.388,11.988,0.8,3.9,7928.0,【 Fast Charger& Data Sync】-With built-in safet...,"AGU3BBQ2V2DDAMOAKGFAWDDQ6QHA,AESFLDV2PT363T2AQ...","Kunal,Himanshu,viswanath,sai niharka,saqib mal...","R3J3EQQ9TZI5ZJ,R3E7WBGK7ID0KV,RWU79XKQ6I1QF,R2...","Good speed for earlier versions,Good Product,W...","Not quite durable and sturdy,https://m.media-a...",https://m.media-amazon.com/images/W/WEBP_40237...,https://www.amazon.in/Sounce-iPhone-Charging-C...


In [33]:
df.loc[623].compare(df.loc[379])

,self,other
about_product,【 Fast Charger& Data Sync】-With built-in safet...,【 Fast Charger& Data Sync】-With built-in safet...
review_content,"Not quite durable and sturdy,https://m.media-a...","Not quite durable and sturdy,https://m.media-a..."
img_link,https://m.media-amazon.com/images/W/WEBP_40237...,https://m.media-amazon.com/images/I/31IvNJZnmd...
product_link,https://www.amazon.in/Sounce-iPhone-Charging-C...,https://www.amazon.in/Sounce-iPhone-Charging-C...


In [34]:
# Finding out the difference between the two entries
df.loc[623]==df.loc[379]

product_id              True
product_name            True
category                True
discounted_price        True
actual_price            True
discount_percentage     True
rating                  True
rating_count            True
about_product          False
user_id                 True
user_name               True
review_id               True
review_title            True
review_content         False
img_link               False
product_link           False
dtype: bool
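
The display truncates long strings with "...", so `about_product` and `review_content` look identical above even though `==` reports False. A small helper can locate the first position where two strings diverge; `first_difference` is a hypothetical name. Calling it on `df.loc[623, 'about_product']` and `df.loc[379, 'about_product']` and slicing the text around the returned index would show the actual difference.

```python
def first_difference(a: str, b: str) -> int:
    """Index of the first differing character, or -1 if the strings are equal."""
    for i, (char_a, char_b) in enumerate(zip(a, b)):
        if char_a != char_b:
            return i
    # No mismatch in the shared part: one string is a prefix of the other,
    # or both are identical.
    return -1 if len(a) == len(b) else min(len(a), len(b))

print(first_difference("Not quite durable", "Not quite durable and sturdy"))  # 17
print(first_difference("WEBP_40237", "I/31IvNJZn"))  # 0
```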

To ensure every product_id is unique in the products table, we drop rows with a duplicate product_id and keep the first occurrence.

In [35]:
df1.drop_duplicates(subset=['product_id'],keep='first',inplace=True)

In [36]:
df1.shape

(1351, 8)
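
`keep='first'` retains whichever duplicate appears first in the frame's row order. For B096MSW6CT that is row 2, whose `actual_price` (22.788) and discount (0.9) differ from rows 379 and 623 (11.988 and 0.8), so the products table carries the row-2 price. A minimal sketch on made-up rows with the rounded prices; `kept_price` is a hypothetical helper:

```python
import pandas as pd

# Made-up rows: B096MSW6CT appears twice with different rounded prices,
# mirroring rows 2 and 379 of the dataset.
toy = pd.DataFrame(
    {"product_id": ["B07JW9H4J1", "B096MSW6CT", "B096MSW6CT"],
     "actual_price": [13.19, 22.79, 11.99]},
    index=[0, 2, 379],
)

def kept_price(frame: pd.DataFrame, keep: str) -> float:
    """Price of B096MSW6CT that survives drop_duplicates with the given keep."""
    deduped = frame.drop_duplicates(subset=["product_id"], keep=keep)
    return deduped.loc[deduped["product_id"] == "B096MSW6CT", "actual_price"].item()

print(kept_price(toy, "first"))  # 22.79
print(kept_price(toy, "last"))   # 11.99
```

If a particular row should win, for example the lowest listed price, sorting the frame before dropping duplicates makes that rule explicit.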

In [45]:
df1.to_csv("data/products.csv",index=False)

## Ratings Table 

In [74]:
#Creating a new DataFrame with selected columns for the ratings table
df2 = df[['product_id','rating','rating_count','review_title','review_content']].copy()

In [65]:
count_series = df2.groupby('product_id').count()['rating']
count_series[count_series > 1].index

Index(['B002PD61Y4', 'B008FWZGSG', 'B00NH11KIK', 'B00NH11PEY', 'B0141EZMAI',
       'B01DEWVZ2C', 'B01FSYQ2A4', 'B01GGKYKQM', 'B01GGKZ0V6', 'B01M4GGIVU',
       'B07232M876', 'B077Z65HSD', 'B0789LZTCJ', 'B078G6ZF5Z', 'B07DJLFMPS',
       'B07GVGTSLN', 'B07JW9H4J1', 'B07KRCW6LZ', 'B07KSMBL2H', 'B07P681N66',
       'B07RD611Z8', 'B07WG8PDCW', 'B07XJYYH7L', 'B07XLCFSSN', 'B082LSVT4B',
       'B082LZGK39', 'B082T6V3DT', 'B083342NKJ', 'B085194JFL', 'B085DTN6R2',
       'B085HY1DGR', 'B08BCKN299', 'B08BQ947H3', 'B08CDKQ8T6', 'B08CF3B7N1',
       'B08CF3D7QR', 'B08DDRGWTJ', 'B08DPLCM6T', 'B08HDJ86NZ', 'B08K4PSZ3V',
       'B08MTCKDYN', 'B08QSC1XY8', 'B08R69VDHT', 'B08WRBG3XW', 'B08WRWPM22',
       'B08Y1SJVV5', 'B08Y1TFSP6', 'B0949SBKMP', 'B094JNXNPV', 'B094YFFSMY',
       'B096MSW6CT', 'B096VF5YYF', 'B097R25DP7', 'B098NS6PVG', 'B0994GFWBH',
       'B09C6HWG18', 'B09C6HXFC1', 'B09CMM3VGK', 'B09CMP1SC8', 'B09F6S8BT6',
       'B09F9YQQ7B', 'B09KGV7WSV', 'B09KLVMZ3B', 'B09MQSCJQ1', 'B09MT84WV5',

Some product IDs appear in more than one row, so for each product I take the average rating, concatenate the review titles and review contents, and sum the rating counts.

In [66]:
avg_ratings = df2.groupby('product_id')['rating'].mean().reset_index()
avg_ratings.rename(columns={'rating': 'average_rating'}, inplace=True)

In [68]:
# Concatenate review_title and review_content for each product_id
concatenated_titles = df2.groupby('product_id')['review_title'].apply(lambda x: ' | '.join(x)).reset_index()
concatenated_contents = df2.groupby('product_id')['review_content'].apply(lambda x: ' | '.join(x)).reset_index()

# Combine the results into one DataFrame
result = avg_ratings.merge(concatenated_titles, on='product_id')
result = result.merge(concatenated_contents, on='product_id')

In [75]:
total_rating_counts = df2.groupby('product_id')['rating_count'].sum().reset_index()
result = result.merge(total_rating_counts, on='product_id')
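
The cells above run separate `groupby` calls and merge the results. A single `groupby(...).agg()` with named aggregation can build the same kind of table in one pass. This is a sketch on made-up rows, not the author's method, and it makes two assumption-based changes: `rating_count` uses `max` instead of `sum`, because repeated rows appear to carry the same product-level count (all three B096MSW6CT rows show 7928) and summing would count it once per duplicate row; and the hypothetical `join_unique` helper drops repeated review text that duplicate rows would otherwise repeat.

```python
import pandas as pd

# Made-up rows mimicking the ratings columns; B096MSW6CT is repeated.
toy = pd.DataFrame({
    "product_id": ["B096MSW6CT", "B096MSW6CT", "B07JW9H4J1"],
    "rating": [3.9, 3.9, 4.2],
    "rating_count": [7928.0, 7928.0, 24269.0],
    "review_title": ["Good speed", "Good speed", "Satisfied"],
    "review_content": ["Not quite durable", "Not quite durable", "Looks durable"],
})

def join_unique(values: pd.Series) -> str:
    """Join texts with ' | ', keeping only the first copy of each repeat."""
    return " | ".join(dict.fromkeys(values))

ratings = (
    toy.groupby("product_id")
    .agg(
        average_rating=("rating", "mean"),
        review_title=("review_title", join_unique),
        review_content=("review_content", join_unique),
        # Assumption: duplicate rows repeat the same product-level count.
        rating_count=("rating_count", "max"),
    )
    .reset_index()
)
ratings["rating_count"] = ratings["rating_count"].astype(int)
print(ratings)
```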

In [80]:
result['rating_count'] = result['rating_count'].astype(int)

In [81]:
result.to_csv('data/ratings.csv',index=False)

In [82]:
result.head()

,product_id,average_rating,review_title,review_content,rating_count
0,B002PD61Y4,4.1,"good tool to use for,Brand is always good,Over...",good quality tool from d linkWiFi signal is go...,16262
1,B002SZEOLG,4.2,Works on linux for me. Get the model with ante...,I use this to connect an old PC to internet. I...,179692
2,B003B00484,4.3,"Works Good,Perfect replacement cell for trimme...","Works good,Bought it to replace my Phillips QT...",27201
3,B003L62T7W,4.3,"Handy Mouse,Good quality mouse,Good one.,Good,...","Liked this Product,https://m.media-amazon.com/...",31534
4,B004IO5BMQ,4.5,"Good silent mouse,Too small to hold!,Reviewing...",It's little small for big hands. But best avai...,54405


## Daily Transaction History Table
This version is outdated and no longer in use; refer to online_sales_cleaning.ipynb instead.

In [92]:
# https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/YGLYDY
sales = pd.read_csv("../source/amazon-purchases.csv")
sales.shape


,Order Date,Purchase Price Per Unit,Quantity,Shipping Address State,Title,ASIN/ISBN (Product Code),Category,Survey ResponseID
1601,2018-03-21,7.99,1.0,SD,"Amazon Basics High-Speed HDMI Cable (18 Gbps, ...",B014I8SX4Y,ELECTRONIC_CABLE,R_06RZP9pS7kONINr
2872,2022-11-27,7.68,1.0,KS,"Amazon Basics High-Speed HDMI Cable (18 Gbps, ...",B014I8SSD0,CABLE_OR_ADAPTER,R_0Arj0ePpTnReV1v
4832,2018-04-12,3.99,1.0,UT,Amazon Basics HDMI Female to Female Coupler Ad...,B06XR9PR5X,ELECTRONIC_ADAPTER,R_0DoXqOQl0hxEeFH
5496,2020-12-24,13.49,1.0,UT,Amazon Basics Uni-Directional DisplayPort to H...,B015OW3M1W,ELECTRONIC_CABLE,R_0DoXqOQl0hxEeFH
7115,2019-08-24,9.29,1.0,MI,"Amazon Basics 16-Gauge Speaker Wire Cable, 50 ...",B006LW0WDQ,ELECTRONIC_CABLE,R_0IBgnbXoP4Uvvhv
...,...,...,...,...,...,...,...,...
1847559,2022-01-09,6.99,1.0,MD,Logitech B100 Corded Mouse – Wired USB Mouse f...,B003L62T7W,INPUT_MOUSE,R_zTi3j2QuqAzr7NL
1849170,2021-02-02,8.61,1.0,IL,"Amazon Basics High-Speed HDMI Cable (18 Gbps, ...",B014I8SSD0,ELECTRONIC_CABLE,R_zd4E1BgAdaM2761
1849692,2022-03-17,9.21,1.0,WI,Amazon Basics Uni-Directional DisplayPort to H...,B015OW3M1W,ELECTRONIC_CABLE,R_zdLOP8JD2pe1brj
1850677,2018-10-11,8.99,1.0,MA,Amazon Basics DisplayPort to DisplayPort 1.2 C...,B01J8S6X2I,ELECTRONIC_CABLE,R_zfqnsBzlOAKibzb


In [94]:
sales_new = sales[sales["ASIN/ISBN (Product Code)"].isin(df['product_id'])].copy()
sales_new.groupby('ASIN/ISBN (Product Code)')['Quantity'].count()

ASIN/ISBN (Product Code)
B002SZEOLG      1
B003L62T7W     63
B005FYNT3G      1
B005LJQMZC      1
B006LW0WDQ     10
B0088TKTY2      1
B008IFXQFU     57
B00MUTWLW4      7
B00NH11KIK     48
B00NH11PEY     30
B00NH12R1O      4
B00NH13Q8W      7
B00SH18114      8
B00ZYLMQH0     10
B0148NPH9I     14
B014I8SSD0    178
B014I8SX4Y     31
B015OW3M1W     46
B01D5H8LDM      6
B01D5H8ZI8     10
B01D5H90L4      6
B01EZ0X3L8      4
B01GGKYKQM     26
B01GGKZ0V6      5
B01GGKZ4NU      4
B01J8S6X2I     31
B06XR9PR5X      4
B0711PVX6Z     26
B07232M876     40
B073BRXPZX      2
B075ZTJ9XR     12
B07CWDX49D      2
B07DC4RZPY     53
B07G3YNLJB      4
B07KSB1MLX      1
B07KSMBL2H      2
B07TMCXRFV      4
B07VTFN6HM      6
B07XLCFSSN      1
B07ZR4S1G4      2
B082T6GVG9      8
B082T6GVLJ      4
B082T6GXS5      5
B082T6V3DT      9
B086JTMRYL      1
B08C4Z69LN     11
B08GTYFC37     11
B08GYG6T12     16
B08JD36C6H      1
B095RTJH1M     22
B09BVCVTBC      1
B09G5TSGXV     11
B09X7DY7Q4      1
B09YLFHFDW      1
B0B

In [95]:
df[df['product_id']=='B002SZEOLG']


,product_id,product_name,category,discounted_price,actual_price,discount_percentage,rating,rating_count,about_product,user_id,user_name,review_id,review_title,review_content,img_link,product_link
143,B002SZEOLG,TP-Link Nano USB WiFi Dongle 150Mbps High Gain...,Computers&Accessories|NetworkingDevices|Networ...,8.988,16.068,0.44,4.2,179692.0,150 Mbps Wi-Fi —— Exceptional wireless speed u...,"AGV3IEFANZCKECFGUM42MRH5FNOA,AEBO7NWCNXKT4AESA...","Azhar JuMan,Anirudh Sood,Hari Krishnan PS,Akas...","R1LW6NWSVTVZ2H,R3VR5WFKUS15C5,R2F6GC79OYWUKQ,R...",Works on linux for me. Get the model with ante...,I use this to connect an old PC to internet. I...,https://m.media-amazon.com/images/I/31Wb+A3VVd...,https://www.amazon.in/TP-Link-TL-WN722N-150Mbp...


In [96]:
sales_new[sales_new['ASIN/ISBN (Product Code)']=='B002SZEOLG']

,Order Date,Purchase Price Per Unit,Quantity,Shipping Address State,Title,ASIN/ISBN (Product Code),Category,Survey ResponseID
1249384,2019-12-11,14.99,1.0,ID,TP-Link Nano USB Wifi Dongle 150Mbps High Gain...,B002SZEOLG,NETWORK_INTERFACE_CONTROLLER_ADAPTER,R_3iJPosktkZFzmM4

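
The `groupby(...)['Quantity'].count()` cell returns the number of non-missing `Quantity` values per product, which is the number of purchase rows, not the number of units bought. If units sold is the goal, `sum()` is the aggregation to use. A minimal sketch on made-up purchases:

```python
import pandas as pd

# Made-up purchases: one product bought twice, once with quantity 3.
toy_sales = pd.DataFrame({
    "ASIN/ISBN (Product Code)": ["B003L62T7W", "B003L62T7W", "B002SZEOLG"],
    "Quantity": [1.0, 3.0, 1.0],
})

by_product = toy_sales.groupby("ASIN/ISBN (Product Code)")["Quantity"]
print(by_product.count().to_dict())  # {'B002SZEOLG': 1, 'B003L62T7W': 2}
print(by_product.sum().to_dict())    # {'B002SZEOLG': 1.0, 'B003L62T7W': 4.0}
```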

In [97]:
sales_new.head()

,Order Date,Purchase Price Per Unit,Quantity,Shipping Address State,Title,ASIN/ISBN (Product Code),Category,Survey ResponseID
1601,2018-03-21,7.99,1.0,SD,"Amazon Basics High-Speed HDMI Cable (18 Gbps, ...",B014I8SX4Y,ELECTRONIC_CABLE,R_06RZP9pS7kONINr
2872,2022-11-27,7.68,1.0,KS,"Amazon Basics High-Speed HDMI Cable (18 Gbps, ...",B014I8SSD0,CABLE_OR_ADAPTER,R_0Arj0ePpTnReV1v
4832,2018-04-12,3.99,1.0,UT,Amazon Basics HDMI Female to Female Coupler Ad...,B06XR9PR5X,ELECTRONIC_ADAPTER,R_0DoXqOQl0hxEeFH
5496,2020-12-24,13.49,1.0,UT,Amazon Basics Uni-Directional DisplayPort to H...,B015OW3M1W,ELECTRONIC_CABLE,R_0DoXqOQl0hxEeFH
7115,2019-08-24,9.29,1.0,MI,"Amazon Basics 16-Gauge Speaker Wire Cable, 50 ...",B006LW0WDQ,ELECTRONIC_CABLE,R_0IBgnbXoP4Uvvhv


In [100]:
sales_new.index

Index([   1601,    2872,    4832,    5496,    7115,    9966,   10787,   10902,
         12653,   13115,
       ...
       1836353, 1838628, 1846239, 1846623, 1846868, 1847559, 1849170, 1849692,
       1850677, 1850678],
      dtype='int64', length=887)

In [None]:
sales_new['transaction_id'] = sales_new.index

In [None]:
sales_new['Quantity'] = sales_new['Quantity'].astype(int)
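
Assigning new columns to a filtered selection like `sales_new` can raise pandas' `SettingWithCopyWarning` unless the selection is an explicit copy. A minimal sketch on made-up rows (the code B000000000 and index 1602 are placeholders) that takes a `.copy()` and adds both columns with `assign`. Note that `astype(int)` raises if `Quantity` contains missing values.

```python
import pandas as pd

# Made-up purchase rows; the index plays the role of the original row label.
sales = pd.DataFrame(
    {"ASIN/ISBN (Product Code)": ["B014I8SX4Y", "B000000000", "B003L62T7W"],
     "Quantity": [1.0, 2.0, 1.0]},
    index=[1601, 1602, 1847559],
)
product_ids = pd.Series(["B014I8SX4Y", "B003L62T7W"])

# .copy() makes the filtered frame independent of `sales`, so the
# assignments below do not trigger SettingWithCopyWarning.
sales_new = sales[sales["ASIN/ISBN (Product Code)"].isin(product_ids)].copy()
sales_new = sales_new.assign(
    transaction_id=sales_new.index,               # reuse the row label as an ID
    Quantity=sales_new["Quantity"].astype(int),   # fails if any value is NaN
)
print(sales_new["transaction_id"].tolist())  # [1601, 1847559]
```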

In [110]:
final_sales = sales_new[['transaction_id', 'Order Date', 'ASIN/ISBN (Product Code)', 'Quantity']]

In [111]:
final_sales.to_csv("data/daily_sales.csv",index=False)