# E-commerce Website Scraping Project
There are two methods we can use to scrape the website in this project: the CSS selector method and the website's API. We use Python for this project, with libraries such as BeautifulSoup, Selenium, and requests.

## CSS Selector Method

### Preparation
The first step is to install the BeautifulSoup and Selenium libraries.
To install BeautifulSoup on Windows, run this command in a terminal or Command Prompt:

**pip install beautifulsoup4**, or, for Python 3, **pip3 install beautifulsoup4**

To install Selenium on Windows, run this command in a terminal or Command Prompt:

**pip install selenium**, or, for Python 3, **pip3 install selenium**

The code also uses pandas to store the results (**pip install pandas**).

After installing Selenium, install a browser driver. This project uses ChromeDriver, and its version must match your installed Google Chrome version. Selenium 4.6 and later can also download a matching driver automatically through Selenium Manager.

After that, import the libraries:
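If the import fails with `ModuleNotFoundError`, one of the packages is not installed in the environment running the notebook. Here is a minimal standard-library sketch that checks for each package before you import it. The `REQUIRED` mapping from import names to pip package names is our own assumption about what this project needs.

```python
import importlib.util

# Assumed mapping: name used in `import` -> name used with `pip install`
REQUIRED = {
    "bs4": "beautifulsoup4",
    "selenium": "selenium",
    "pandas": "pandas",
    "requests": "requests",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose modules cannot be found."""
    return [
        pip_name
        for module_name, pip_name in required.items()
        if importlib.util.find_spec(module_name) is None
    ]


missing = missing_packages()
if missing:
    print("Install with: pip install " + " ".join(missing))
else:
    print("All required packages are installed.")
```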

```python
import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
```

### Setting Up Selenium

```python
# Link to the Ternakmart store page on Shopee
main_link = 'https://shopee.co.id/ternakmart#product_list'

# Path to the ChromeDriver executable; the r prefix keeps the backslashes literal
s = Service(r'D:\Ternaknesia\Web Scrapping\chromedriver.exe')

# Customize the Chrome window
chrome_options = Options()
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-notifications')
chrome_options.add_argument('--disable-infobars')

# Start Chrome and open the store page
driver = webdriver.Chrome(service=s, options=chrome_options)
driver.get(main_link)
```
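The `r` prefix on the driver path matters because Python treats a backslash in a normal string as the start of an escape sequence. The path in this project contains no valid escape sequences, so it works without the prefix (newer Python versions emit a warning). However, a folder name starting with a letter such as `n` or `t` would silently change the path. The hypothetical path below shows the difference. `pathlib.PureWindowsPath` lets you inspect a Windows path on any operating system.

```python
from pathlib import PureWindowsPath

# Hypothetical path, used only to illustrate escape sequences
plain = "D:\new\test.exe"   # "\n" becomes a newline and "\t" a tab
raw = r"D:\new\test.exe"    # backslashes are kept as written

print(len(plain), len(raw))        # 13 15
print(PureWindowsPath(raw).name)   # test.exe
```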

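### Set Target

Get the HTML rendered by the browser and parse it with BeautifulSoup:

```python
# Read the page's HTML after Chrome has rendered it
html = driver.execute_script("return document.getElementsByTagName('html')[0].innerHTML")
soup = BeautifulSoup(html, "html.parser")
```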
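To try the selectors without launching a browser, run the same BeautifulSoup calls on a small HTML string. The markup below is hypothetical and simplified; it only reuses this project's class names. It also illustrates a detail of `find_all(class_=...)`. A string containing spaces is matched against the exact value of the `class` attribute, so the class order must match. The CSS selector passed to `soup.select()` matches an element that has all the classes, in any order.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the second name lists the same classes in another order
SAMPLE_HTML = """
<div class="_1sRyv_ _2j2K92 _3j20V6">Daging Sapi Slice Karubi 500gr | Ternakmart</div>
<div class="_3j20V6 _1sRyv_ _2j2K92">Daging Sapi Slice Bulgogi 250gr | Ternakmart</div>
<div class="_3JdP1I _1qxg6T _1NAEoM">Rp89.250</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Exact match on the whole class string: only the first name is found
exact = [d.get_text() for d in soup.find_all("div", class_="_1sRyv_ _2j2K92 _3j20V6")]

# CSS selector: an element matches if it has all three classes, in any order
css = [d.get_text() for d in soup.select("div._1sRyv_._2j2K92._3j20V6")]

print(exact)
print(css)
```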

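### Test Scraping

Before scraping everything, check that each selector returns what we expect. The class names are likely generated by Shopee's front end and can change. If a selector returns an empty list, inspect the page again and update it.

```python
# Get the first product name
product_name = soup.find_all('div', class_='_1sRyv_ _2j2K92 _3j20V6')
product_name[0].get_text()
# Output: 'Daging Sapi Slice Karubi 500gr | Ternakmart'
```

```python
# Get the first product price
product_price = soup.find_all('div', class_='_3JdP1I _1qxg6T _1NAEoM')
product_price[0].get_text()
# Output: 'Rp89.250'
```

```python
# Get a product's sold count ("Terjual" means "sold"); index 1 is used because
# the first product's sold count is empty (see the DataFrame below)
product_sold = soup.find_all('div', class_='_2Tc7Qg')
product_sold[1].get_text()
# Output: '1 Terjual'
```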
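The scraped values are strings such as `'Rp89.250'` and `'1 Terjual'`, but for analysis you will probably want numbers. The following is a minimal sketch with hypothetical helper names. It assumes Indonesian formatting, where `.` separates thousands, and a sold text of the form `<number> Terjual`. Values in any other format (for example a price range) return `None` so they can be inspected by hand.

```python
import re


def parse_price(text):
    """Convert a price like 'Rp89.250' to 89250; return None for other formats."""
    match = re.fullmatch(r"Rp\s*(\d{1,3}(?:\.\d{3})*)", text.strip())
    return int(match.group(1).replace(".", "")) if match else None


def parse_sold(text):
    """Convert a sold count like '1 Terjual' to 1; return None for other formats."""
    match = re.fullmatch(r"(\d+)\s+Terjual", text.strip(), flags=re.IGNORECASE)
    return int(match.group(1)) if match else None


print(parse_price("Rp89.250"))   # 89250
print(parse_sold("1 Terjual"))   # 1
print(parse_sold(""))            # None
```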

### Scrape All Product Names, Prices, and Sold Counts on One Page

```python
# Empty lists to store the data for every product on one page
product_namelist1, product_pricelist1, product_soldlist1 = [], [], []

# Collect every product name
for i in soup.find_all('div', class_='_1sRyv_ _2j2K92 _3j20V6'):
    product_namelist1.append(i.text)

# Collect every product price
for i in soup.find_all('div', class_='_3JdP1I _1qxg6T _1NAEoM'):
    product_pricelist1.append(i.text)

# Collect every product's sold count
for i in soup.find_all('div', class_='_2Tc7Qg'):
    product_soldlist1.append(i.text)
```

```python
# Display the results
print(product_namelist1)
print(product_pricelist1)
print(product_soldlist1)
```

Output (truncated):

```
['Daging Sapi Slice Karubi 500gr | Ternakmart', 'Daging Sapi Lidah Slice / Gyutan 500gr | Ternakmart', 'Daging Sapi Slice Bulgogi 250gr | Ternakmart', 'Daging Sapi Slice Gyuniku 250gr | Ternakmart', 'Daging Sapi Slice Tenderloin 500gr | Ternakmart', 'Daging Sapi Slice Premium / Yoshinoya 500gr | Ternakmart', 'Fillet Paha Ayam/ Boneless Paha Ayam/ Paha Ayam Tanpa Tulang 500gr | Ternakmart', 'Telur Ayam Negeri 1 Pack (Isi 15 Butir)  | Ternakmart', 'Daging Ayam Kulit Ayam 1 kg | Ternakmart', 'Salmon Fillet Premium / ONS  | Ternakmart', 'Daging Sapi Sirloin / Has Luar 250gr | Ternakmart', 'Daging Ayam Sayap/Chicken Wings 500gr | Ternakmart', 'Daging Sapi Giling Premium / Minced Beef Meat 1 kg | Ternakmart', 'Ceker Ayam / Kaki Ayam 500gr | Ternakmart', 'Daging Ayam Paha Bawah / Drumstick 1kg | Ternakmart', 'Tenderloin / Has Dalam Daging Sapi / Steak Daging Sapi 250gr | Ternakmart', 'Dori Fillet 500Gr  | Ternakmart', 'Kornet Daging Sapi / Cornet Corned Beef Tornado | Ternakmart', 'Inofu Tahu
```
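Collecting names, prices, and sold counts into three separate lists assumes every product card contains all three elements. If one card lacks an element, the lists drift out of alignment and values end up next to the wrong product. An alternative is to loop over product cards and look up each field inside its own card. The sketch below assumes a hypothetical container class `product-card`; find the real one by inspecting the page. It uses simplified markup so it runs offline.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: "product-card" is an assumed container class
SAMPLE_HTML = """
<div class="product-card">
  <div class="_1sRyv_ _2j2K92 _3j20V6">Daging Sapi Slice Karubi 500gr | Ternakmart</div>
  <div class="_3JdP1I _1qxg6T _1NAEoM">Rp89.250</div>
</div>
<div class="product-card">
  <div class="_1sRyv_ _2j2K92 _3j20V6">Daging Sapi Lidah Slice / Gyutan 500gr | Ternakmart</div>
  <div class="_3JdP1I _1qxg6T _1NAEoM">Rp64.890</div>
  <div class="_2Tc7Qg">1 Terjual</div>
</div>
"""


def text_or_none(card, selector):
    """Return the stripped text of the first match inside card, or None."""
    element = card.select_one(selector)
    return element.get_text(strip=True) if element is not None else None


def scrape_cards(soup):
    """Build one record per product card so missing fields stay aligned."""
    return [
        {
            "product name": text_or_none(card, "div._1sRyv_._2j2K92._3j20V6"),
            "product price": text_or_none(card, "div._3JdP1I._1qxg6T._1NAEoM"),
            "product sold": text_or_none(card, "div._2Tc7Qg"),
        }
        for card in soup.select("div.product-card")
    ]


records = scrape_cards(BeautifulSoup(SAMPLE_HTML, "html.parser"))
for record in records:
    print(record)
```

`pd.DataFrame(records)` accepts a list of dictionaries like this directly. A missing field becomes `None` instead of shifting the other columns.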

### Scrape All Product Names, Prices, Sold Counts, and Store Locations Across Multiple Pages

Each page is opened with Selenium, and its HTML is parsed again before the product data is collected.

```python
import time

# Empty lists to store the data from every page
product_namelist, product_pricelist, product_soldlist, store_location = [], [], [], []

for page in range(0, 10):
    main_link = 'https://shopee.co.id/ternakmart?page={}&sortBy=pop'.format(page)
    driver.get(main_link)
    # Give the page time to render its products (assumed delay; adjust as needed)
    time.sleep(5)

    # Re-parse the HTML of the current page; without this step every iteration
    # would scrape the soup of the first page again
    html = driver.execute_script("return document.getElementsByTagName('html')[0].innerHTML")
    soup = BeautifulSoup(html, "html.parser")

    # Collect every product name
    for i in soup.find_all('div', class_='_1sRyv_ _2j2K92 _3j20V6'):
        product_namelist.append(i.text)

    # Collect every product price
    for i in soup.find_all('div', class_='_3JdP1I _1qxg6T _1NAEoM'):
        product_pricelist.append(i.text)

    # Collect every product's sold count
    for i in soup.find_all('div', class_='_2Tc7Qg'):
        product_soldlist.append(i.text)

    # Collect every store location
    for i in soup.find_all('div', class_='_1IbMik'):
        store_location.append(i.text)

print("All products scraped!")
# Output: All products scraped!
```
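The loop builds one URL per page by filling the `page` query parameter, starting at 0. The same URLs can be generated with `urllib.parse.urlencode`, which also handles escaping if you add parameters containing special characters. The helper name `store_page_urls` is hypothetical; the base URL and parameters are the ones used in this project.

```python
from urllib.parse import urlencode


def store_page_urls(store_url, pages, sort_by="pop"):
    """Yield one listing URL per page number, starting at page 0."""
    for page in range(pages):
        yield "{}?{}".format(store_url, urlencode({"page": page, "sortBy": sort_by}))


urls = list(store_page_urls("https://shopee.co.id/ternakmart", 10))
print(urls[0])    # https://shopee.co.id/ternakmart?page=0&sortBy=pop
print(len(urls))  # 10
```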


### Save Data

```python
# Map each column name to the list holding its values
listCols = ['product name', 'product price', 'product sold', 'store location']
dict_data = dict(zip(
    listCols,
    (product_namelist,
     product_pricelist,
     product_soldlist,
     store_location)))
```
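`pd.DataFrame` raises a `ValueError` when the lists in `dict_data` have different lengths. This can happen if some products have no sold count or store location element. A small check before building the DataFrame makes the problem visible. The sketch below uses hypothetical helper names and sample data. It reports each column's length and, if you accept the risk, pads the shorter lists with `None`. Padding only hides the mismatch; it does not tell you which product is missing a value.

```python
def column_lengths(columns):
    """Map each column name to the number of values collected for it."""
    return {name: len(values) for name, values in columns.items()}


def pad_columns(columns, fill=None):
    """Pad shorter lists with `fill` so every column has the same length."""
    longest = max((len(values) for values in columns.values()), default=0)
    return {
        name: list(values) + [fill] * (longest - len(values))
        for name, values in columns.items()
    }


# Hypothetical example with one missing store location
sample = {
    "product name": ["Daging Sapi Slice Karubi 500gr | Ternakmart", "Dori Fillet 500Gr  | Ternakmart"],
    "store location": ["KOTA SURABAYA"],
}
lengths = column_lengths(sample)
if len(set(lengths.values())) > 1:
    print("Column lengths differ:", lengths)
    sample = pad_columns(sample)
print(sample["store location"])  # ['KOTA SURABAYA', None]
```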

### Create the DataFrame

```python
df = pd.DataFrame(data=dict_data)
df.head()
```

Output:

| | product name | product price | product sold | store location |
|---|---|---|---|---|
| 0 | Daging Sapi Slice Karubi 500gr \| Ternakmart | Rp89.250 | | KOTA SURABAYA |
| 1 | Daging Sapi Lidah Slice / Gyutan 500gr \| Terna... | Rp64.890 | 1 Terjual | KOTA SURABAYA |
| 2 | Daging Sapi Slice Bulgogi 250gr \| Ternakmart | Rp44.625 | | KOTA SURABAYA |
| 3 | Daging Sapi Slice Gyuniku 250gr \| Ternakmart | Rp44.625 | | KOTA SURABAYA |
| 4 | Daging Sapi Slice Sirloin 500gr \| Ternakmart | Rp75.000 | 2 Terjual | KOTA SURABAYA |


### Save to CSV

```python
# index=False leaves the DataFrame's row index out of the file
df.to_csv('Ternakmart_Data_Shopee_new.csv', index=False)
```

## API Method

### Preparation
First, install the requests library if you don't already have it:

**pip install requests**, or, for Python 3, **pip3 install requests**

After that, import the library:

```python
import requests
```

### Setting Up the Link

```python
shopee_url = 'https://shopee.co.id/'
keyword = 'daging sapi slice karubi 500gr'

# Request headers that identify the client as a browser (Google Chrome in this project)
# and point the Referer at the matching Shopee search page
header = {
    'User-Agent' : 'Chrome',
    'Referer' : '{}search?keyword={}'.format(shopee_url, keyword)
}

# Search API endpoint; the {} placeholder is filled with the search keyword
url = 'https://shopee.co.id/api/v4/search/search_items?by=relevancy&keyword={}&limit=60&newest=0&order=desc&page_type=search&scenario=PAGE_GLOBAL_SEARCH&version=2'
```
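The `url` string still contains a `{}` placeholder for the keyword. Because the keyword contains spaces, it should be percent-encoded before it goes into the URL and the `Referer` header. Below is a minimal sketch using `urllib.parse.quote`; the helper name `build_search_request` is hypothetical. Shopee may reject requests that lack the headers or cookies a real browser sends, so a successful response is not guaranteed.

```python
from urllib.parse import quote

SHOPEE_URL = "https://shopee.co.id/"
SEARCH_API = (
    "https://shopee.co.id/api/v4/search/search_items?by=relevancy&keyword={}"
    "&limit=60&newest=0&order=desc&page_type=search"
    "&scenario=PAGE_GLOBAL_SEARCH&version=2"
)


def build_search_request(keyword):
    """Return (url, headers) for a keyword search; the keyword is percent-encoded."""
    encoded = quote(keyword)
    headers = {
        "User-Agent": "Chrome",
        "Referer": "{}search?keyword={}".format(SHOPEE_URL, encoded),
    }
    return SEARCH_API.format(encoded), headers


url, headers = build_search_request("daging sapi slice karubi 500gr")
print(url)
# With requests installed, the call would be: requests.get(url, headers=headers)
```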