## Requirements:
1. Your function should take a URL and return a pandas DataFrame.
2. The final DataFrame should contain the following information:
    * Product ID
    * Seller ID
    * Product title
    * Price
    * URL of the product image
    * URL of that product page

Bonus information:
* Is it TikiNow (delivery within 2 hours)? <img src="https://salt.tikicdn.com/ts/upload/9f/32/dd/8a8d39d4453399569dfb3e80fe01de75.png">
* Does it have free delivery?
* How many reviews does it have?
* How many stars, or what percentage of stars, does it have?
* Does it have the "badge under price" (Rẻ hơn hoàn tiền, "cheaper or your money back")? <img src="https://salt.tikicdn.com/ts/upload/51/ac/cc/528e80fe3f464f910174e2fdf8887b6f.png">
* What is the discount percentage?
* Does it have the "shocking price" badge? <img src="https://salt.tikicdn.com/ts/upload/75/34/d2/4a9a0958a782da8930cdad8f08afff37.png">
* Can it be paid for in installments? <img src="https://salt.tikicdn.com/ts/upload/ba/4e/6e/26e9f2487e9f49b7dcf4043960e687dd.png">
* Does it come with free gifts? <img src="https://salt.tikicdn.com/ts/upload/47/35/8c/446f61d046eba9a305d3f39dc0834c4a.png">
    

In [1]:
import requests
import pandas as pd
from bs4 import BeautifulSoup

In [23]:
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36'}

r = requests.get('https://tiki.vn/laptop-may-vi-tinh-linh-kien/c1846?page=400', headers=headers)
# r.text holds the page's HTML, so parse it with html.parser
soup = BeautifulSoup(r.text, 'html.parser')

# Pretty-print the first 10,000 characters of the parsed HTML
print(soup.prettify()[:10000])

<!DOCTYPE html>
<html class="no-js" lang="">
 <head>
  <style>
   html { background: #f4f4f4; } .async-hide body { opacity: 0 !important}
  </style>
  <script>
   (function(a,s,y,n,c,h,i,d,e){s.className+=' '+y;h.start=1*new Date;
    h.end=i=function(){s.className=s.className.replace(RegExp(' ?'+y),'')};
    (a[n]=a[n]||[]).hide=h;setTimeout(function(){i();h.end=null},c);h.timeout=c;
    })(window,document.documentElement,'async-hide','dataLayer',1500,
    {'GTM-53B3KKW':true});
  </script>
  <script>
   !function(){if('PerformanceLongTaskTiming' in window){var g=window.__tti={e:[]};
g.o=new PerformanceObserver(function(l){g.e=g.e.concat(l.getEntries())});
g.o.observe({entryTypes:['longtask']})}}();
  </script>
  <script>
   (function() {    
            function getCookie(name) {
              var value = "; " + document.cookie;
              var parts = value.split("; " + name + "=");
              if (parts.length == 2) return parts.pop().split(";").shift();
            }
         

In [9]:
import re

In [24]:
# All occurrences of the products on that page
print("\nAll occurrences of the product div sections:")
products = soup.find_all('a', {'class':'product-item'})

print("Type:", type(products))
print("Number of products:", len(products))


All occurrences of the product div sections:
Type: <class 'bs4.element.ResultSet'>
Number of products: 0
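
The request above asks for `page=400`, and the selector matches nothing. One possible reason is that the page number lies past the end of the category listing; another is that the markup served to the script differs from what the selector expects. A minimal sketch of a paging loop that stops at the first empty page, assuming a hypothetical `fetch_products(page)` callable that returns the list of products found on that page:

```python
from typing import Callable, List


def collect_pages(fetch_products: Callable[[int], List], max_pages: int = 500) -> List:
    """Call fetch_products(1), fetch_products(2), ... until a page comes back empty."""
    collected = []
    for page in range(1, max_pages + 1):
        found = fetch_products(page)
        if not found:
            # an empty page is treated as the end of the listing
            break
        collected.extend(found)
    return collected


# Stand-in for a real fetcher: pages 1-3 hold two items each, later pages are empty.
def fake_fetch(page: int) -> List[str]:
    return [f"item-{page}-a", f"item-{page}-b"] if page <= 3 else []


all_items = collect_pages(fake_fetch)
print(len(all_items))
```

In the notebook, `fetch_products` could wrap the `requests.get` and `soup.find_all` calls for a given page number.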


In [13]:
product_id_ls = []
product_title_ls = []
price_ls = []
discount_ls = []
image_url_ls = []
product_url_ls = []
tikinow_ls = []
free_delivery_ls = []
num_reviews_ls = []
percentage_ratings_ls = []
badge_under_price_ls = []
discount_percent_ls = []
shocking_price_ls = []
paid_installment_ls = []
free_gift_ls = []
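
Keeping one list per column works as long as every list receives exactly one value per product. If a `continue` or an exception ever falls between two `append` calls, the lists end up with different lengths, and `pd.DataFrame` rejects columns of unequal length. An alternative sketch collects one dictionary per product instead. The field names and the `new_row` helper are illustrative, not part of the original notebook:

```python
FIELDS = [
    "product_id", "product_title", "price", "discount", "image_url",
    "product_url", "tikinow", "free_delivery", "num_reviews",
    "rating_pct", "badge_under_price", "shocking_price",
    "paid_installment", "free_gift",
]


def new_row(**known):
    """Return a row with every field present, defaulting to "NA"."""
    unknown = set(known) - set(FIELDS)
    if unknown:
        raise KeyError(f"unexpected fields: {sorted(unknown)}")
    row = {field: "NA" for field in FIELDS}
    row.update(known)
    return row


rows = []
rows.append(new_row(product_id="p405243", price="87.000 ₫", discount="-76%"))
# discount is left out on purpose here to show the "NA" default
rows.append(new_row(product_id="p356188", price="69.000 ₫"))

# Every row has the same keys in the same order, so the columns cannot drift apart.
print(all(list(r) == FIELDS for r in rows))
```

`pd.DataFrame(rows)` accepts a list of dictionaries like this one directly.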

In [14]:
tiki_now_img_url = 'https://salt.tikicdn.com/ts/upload/9f/32/dd/8a8d39d4453399569dfb3e80fe01de75.png'
under_price_url = 'https://salt.tikicdn.com/ts/upload/51/ac/cc/528e80fe3f464f910174e2fdf8887b6f.png'
badge_benefit_url = 'https://salt.tikicdn.com/ts/upload/ba/4e/6e/26e9f2487e9f49b7dcf4043960e687dd.png'
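
The URLs above are compared against `img['src']` one badge at a time. A lookup table is one way to keep those comparisons in one place. The sketch below also includes the "shocking price" and "free gifts" image URLs from the requirements list; whether the listing page serves exactly these image URLs is an assumption, and the flag names and the `badge_flag` helper are made up for illustration:

```python
# Badge image URL -> flag name (flag names are illustrative)
BADGE_FLAGS = {
    "https://salt.tikicdn.com/ts/upload/9f/32/dd/8a8d39d4453399569dfb3e80fe01de75.png": "tikinow",
    "https://salt.tikicdn.com/ts/upload/51/ac/cc/528e80fe3f464f910174e2fdf8887b6f.png": "under_price",
    "https://salt.tikicdn.com/ts/upload/75/34/d2/4a9a0958a782da8930cdad8f08afff37.png": "shocking_price",
    "https://salt.tikicdn.com/ts/upload/ba/4e/6e/26e9f2487e9f49b7dcf4043960e687dd.png": "installment",
    "https://salt.tikicdn.com/ts/upload/47/35/8c/446f61d046eba9a305d3f39dc0834c4a.png": "free_gift",
}


def badge_flag(img_src):
    """Return the flag name for a badge image URL, or None if it is not a known badge."""
    return BADGE_FLAGS.get(img_src)


print(badge_flag("https://salt.tikicdn.com/ts/upload/9f/32/dd/8a8d39d4453399569dfb3e80fe01de75.png"))
print(badge_flag("https://example.com/unknown.png"))
```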

In [15]:
# Matches the "-p<digits>." fragment of a product link; the ID is sliced out of it below
regex = re.compile(r'-p\d+\.')


for product in products:
    # print(product.prettify())
    try:
        product_link = product['href']
        product_url = 'http://tiki.vn' + product_link
        product_id = regex.findall(product_link)[0][1:-1]
        
        product_url_ls.append(product_url)
    except (KeyError, IndexError):
        # missing href, or no "-p<digits>." fragment in the link
        print('Could not read the product link; moving on to the next product')
        continue
        
    # grab image url
    try:
        image_url = product.img['src']
        
    except:
        image_url = "NA"
        
    image_url_ls.append(image_url)
    
    product_id_ls.append(product_id)
    
    # find name
    try:
        product_name = product.find('div', {'class':'name'}).span.text
    except:
        product_name = "NA"
    
    product_title_ls.append(product_name)
    
    # find price
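    try:
        product_price = product.find('div', {'class': 'price-discount__price'}).text
    except AttributeError:
        product_price = "NA"
    # look up the discount separately so an undiscounted product keeps its price
    try:
        discount_pct = product.find('div', {'class': 'price-discount__discount'}).text
    except AttributeError:
        discount_pct = "NA"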
        
    discount_ls.append(discount_pct)
    price_ls.append(product_price)
    
    # Shocking price - FreeShip
    shock_price = None
    freeship = None
    
    try:
        addon = product.find('div', {'class': 'item top'})
        if addon.text == 'Freeship':
            print('Got free ship!')
            freeship = 1
            shock_price = 0
        else:
            print('Got shocking price!')
            freeship = 0
            shock_price = 1
    except:
        # print('cant find div item top')
        shock_price = "NA"
        freeship = "NA"
    
    shocking_price_ls.append(shock_price)
    free_delivery_ls.append(freeship)
    
    # Extract review information
    num_review = None
    rating_pct = None
    try:
        review_rating = product.find('div', {'class': 'rating-review'})
        rating_pct = review_rating.find('div', {'class': 'rating__average'})['style'][6:]
        #print(rating_pct)
        num_review = product.find('div', {'class': 'review'}).text[1:-1]
        #print(num_review)
    except:
        num_review = "NA"
        rating_pct = "NA"
    
    num_reviews_ls.append(num_review)
    percentage_ratings_ls.append(rating_pct)
    
    
    # check TikiNow
    tikinow = 0
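    try:
        badge_service = product.find('div', {'class': 'badge-service'})
        if badge_service.img['src'] == tiki_now_img_url:
            # report only once the badge image has actually been matched
            print('Got tikinow')
            tikinow = 1
    except (AttributeError, TypeError, KeyError):
        # no badge-service div, or no <img src> inside it
        tikinow = "NA"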
    
    tikinow_ls.append(tikinow)
    
    # check under price
    under_price = 0
    try:
        under_price_badge = product.find('div', {'class': 'badge-under-price'})
        if under_price_badge.img['src'] == under_price_url:
            #print('Got underprice!')
            under_price = 1
    except:
        under_price = "NA"
        
    badge_under_price_ls.append(under_price)
    
    # check paid by installments:
    installment = 0
    try:
        badge_benefit = product.find('div', {'class': 'badge-benefits'})
        if badge_benefit.img['src'] == badge_benefit_url:
            #print('Installment!')
            installment = 1
    except:
        installment = 0
    
    paid_installment_ls.append(installment)
    
    # free gifts
    try:
        free_gift = product.find('div', {'class': 'freegift-list'}).span.text
        
    except:
        free_gift = "NA"
        
    free_gift_ls.append(free_gift)

Got tikinow
Got tikinow
Got tikinow
Got tikinow
Got tikinow
Got tikinow
Got tikinow
Got tikinow
Got tikinow
Got tikinow
Got tikinow
Got tikinow
Got shocking price!
Got tikinow
Got tikinow
Got tikinow
Got tikinow
Got tikinow
Got tikinow
Got tikinow
Got tikinow
Got shocking price!
Got tikinow
Got tikinow
Got tikinow
Got tikinow
Got tikinow
Got tikinow
Got tikinow
Got tikinow
Got tikinow
Got tikinow
Got tikinow
Got tikinow
Got tikinow
Got tikinow
Got tikinow
Got tikinow
Got tikinow
Got tikinow
Got tikinow
Got tikinow
Got tikinow
Got tikinow
Got tikinow
Got free ship!
Got tikinow
Got tikinow
Got tikinow
Got tikinow
Got shocking price!
Got tikinow
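
The loop derives the product ID by matching `-p<digits>.` in the link and slicing off the leading `-` and the trailing `.`. A capture group does the same without the slice, and `urllib.parse.urljoin` builds the absolute URL. The link in the example is hypothetical, and `extract_product_id` and `absolute_url` are illustrative names:

```python
import re
from urllib.parse import urljoin

PRODUCT_ID_RE = re.compile(r'-(p\d+)\.')


def extract_product_id(link):
    """Return the 'p<digits>' ID from a product link, or None if there is none."""
    match = PRODUCT_ID_RE.search(link)
    return match.group(1) if match else None


def absolute_url(link, base='http://tiki.vn'):
    """Join a site-relative link onto the base URL."""
    return urljoin(base, link)


link = '/usb-example-32gb-p405243.html?src=category-page'  # hypothetical link
print(extract_product_id(link))   # p405243
print(absolute_url(link))         # http://tiki.vn/usb-example-32gb-p405243.html?src=category-page
```

Returning `None` for a link without an ID lets the caller decide whether to skip the product, which is what the `continue` in the loop does.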


In [16]:
data = pd.DataFrame({
    'Product id': product_id_ls,
    'Product title': product_title_ls,
    'Product URL': product_url_ls,
    'Image URL': image_url_ls,
    'Price': price_ls,
    'Discount': discount_ls,
    'Tiki Now': tikinow_ls,
    'Free Delivery': free_delivery_ls,
    'Total reviews': num_reviews_ls,
    'Rating %': percentage_ratings_ls,
    'Under price badge': badge_under_price_ls,
    'Shocking price': shocking_price_ls,
    'Paid installments': paid_installment_ls,
    'Free Gifts': free_gift_ls
})
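
Requirement 1 asks for a single function that takes a URL and returns a DataFrame. The cells above can be folded into one along these lines. This is a reduced sketch covering only four of the columns. `text_or_na`, `parse_products` and `scrape_tiki` are illustrative names, the shortened `User-Agent` is a placeholder, and the CSS classes are the ones used in the loop above, which may not match the live site:

```python
import re

import pandas as pd
import requests
from bs4 import BeautifulSoup

HEADERS = {'User-Agent': 'Mozilla/5.0'}  # placeholder; the notebook uses a full browser string
PRODUCT_ID_RE = re.compile(r'-(p\d+)\.')
COLUMNS = ['Product id', 'Product URL', 'Image URL', 'Price']


def text_or_na(tag):
    """Return the stripped text of a tag, or "NA" when the tag is missing."""
    return tag.get_text(strip=True) if tag is not None else "NA"


def parse_products(html):
    """Build a DataFrame from the HTML of one listing page."""
    soup = BeautifulSoup(html, 'html.parser')
    rows = []
    for product in soup.find_all('a', {'class': 'product-item'}):
        link = product.get('href', '')
        match = PRODUCT_ID_RE.search(link)
        if match is None:
            continue  # no product ID in the link: skip, as the loop above does
        img = product.find('img')
        rows.append({
            'Product id': match.group(1),
            'Product URL': 'http://tiki.vn' + link,
            'Image URL': img.get('src', 'NA') if img is not None else 'NA',
            'Price': text_or_na(product.find('div', {'class': 'price-discount__price'})),
        })
    return pd.DataFrame(rows, columns=COLUMNS)


def scrape_tiki(url):
    """Fetch one listing page and return its products as a DataFrame."""
    response = requests.get(url, headers=HEADERS, timeout=30)
    response.raise_for_status()
    return parse_products(response.text)


# Offline check against a hand-written snippet (not real Tiki markup)
sample_html = '''
<a class="product-item" href="/sample-mouse-p123.html">
  <img src="https://example.com/mouse.png">
  <div class="price-discount__price">69.000 ₫</div>
</a>
'''
print(parse_products(sample_html))
```

Splitting parsing from fetching keeps `parse_products` testable without network access; the remaining columns would be added to the row dictionary in the same way.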

In [17]:
data.to_csv('tiki_products.csv')

In [18]:
data

Unnamed: 0,Product id,Product title,Product URL,Image URL,Price,Discount,Tiki Now,Free Delivery,Total reviews,Rating %,Under price badge,Shocking price,Paid installments,Free Gifts
0,p3054369,Phần Mềm Diệt Virus BKAV Profressional 1 PC 12...,http://tiki.vn/phan-mem-diet-virus-bkav-profre...,https://salt.tikicdn.com/cache/280x280/ts/prod...,195.000 ₫,-35%,1,,893,100%,1.0,,0,
1,p405243,USB Kingston DT100G3 32GB USB 3.0 - Hàng Chính...,http://tiki.vn/usb-kingston-dt100g3-32gb-usb-3...,https://salt.tikicdn.com/cache/280x280/ts/prod...,87.000 ₫,-76%,1,,1543,90%,,,0,
2,p547563,Bộ Kích Sóng Wifi Repeater 300Mbps Totolink EX...,http://tiki.vn/bo-kich-song-wifi-repeater-300m...,https://salt.tikicdn.com/cache/280x280/ts/prod...,195.000 ₫,-29%,1,,1491,80%,,,0,
3,p356188,Chuột Có Dây Logitech B100 - Hàng Chính Hãng,http://tiki.vn/chuot-co-day-logitech-b100-hang...,https://salt.tikicdn.com/cache/280x280/ts/prod...,69.000 ₫,-23%,1,,1213,90%,,,0,
4,p405240,USB Kingston DT100G3 16GB USB 3.0 - Hàng Chính...,http://tiki.vn/usb-kingston-dt100g3-16gb-usb-3...,https://salt.tikicdn.com/cache/280x280/media/c...,115.000 ₫,-50%,1,,1478,90%,,,0,
5,p10683512,USB Kingston DataTraveler SWIVL 32GB Chính hãng,http://tiki.vn/usb-kingston-datatraveler-swivl...,https://salt.tikicdn.com/cache/280x280/ts/prod...,94.000 ₫,-53%,1,,140,90%,,,0,
6,p618526,Ổ Cứng SSD Kingston A400 (240GB) - Hàng Chính ...,http://tiki.vn/o-cung-ssd-kingston-a400-240gb-...,https://salt.tikicdn.com/cache/280x280/ts/prod...,666.000 ₫,-45%,1,,704,90%,1.0,,0,
7,p646020,Bàn Phím Có Dây Dell KB216 - Đen - Hàng Chính ...,http://tiki.vn/ban-phim-co-day-dell-kb216-den-...,https://salt.tikicdn.com/cache/280x280/media/c...,153.000 ₫,-39%,1,,905,90%,1.0,,0,
8,p56318256,Apple Macbook Air 2020 - 13 Inchs (i3-10th/ 8G...,http://tiki.vn/apple-macbook-air-2020-13-inchs...,https://salt.tikicdn.com/cache/280x280/ts/prod...,24.699.000 ₫,-15%,1,,28,90%,1.0,,1,
9,p405252,USB Kingston DT100G3 - 64GB - USB 3.0 - Hàng C...,http://tiki.vn/usb-kingston-dt100g3-64gb-usb-3...,https://salt.tikicdn.com/cache/280x280/media/c...,170.000 ₫,-77%,1,,450,90%,,,0,
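
The `Price`, `Discount` and `Rating %` columns hold display strings such as `24.699.000 ₫`, `-15%` and `90%`. For sorting or arithmetic they need to become numbers. A minimal sketch, assuming the dot in a price is always a thousands separator and that a value without digits (such as `"NA"`) means missing; `parse_price` and `parse_percent` are illustrative helpers:

```python
import re


def parse_price(text):
    """'24.699.000 ₫' -> 24699000; returns None when no digits are present."""
    digits = re.sub(r'\D', '', text)
    return int(digits) if digits else None


def parse_percent(text):
    """'-35%' -> 35 and '90%' -> 90; returns None when no digits are present."""
    match = re.search(r'\d+', text)
    return int(match.group()) if match else None


print(parse_price('24.699.000 ₫'))  # 24699000
print(parse_percent('-15%'))        # 15
print(parse_percent('90%'))         # 90
print(parse_price('NA'))            # None
```

With pandas, `data['Price'].map(parse_price)` would apply the conversion to a whole column of strings.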
