### Problem statement:
Extracting Product Information from Flipkart Website using Web Scraping


### Objective:
Design a web scraping solution to extract essential product information from the Flipkart website. The goal is to retrieve the brand, current price, original price, description, and URL of the products listed in Flipkart search results.

### Requirements:

Website: Utilize the Flipkart website (www.flipkart.com) as the data source for product information extraction.
Data Fields: The following fields need to be extracted for each product:
- Brand: The brand name of the product.
- Price: The current selling price of the product.
- Original Price: The price before any discount (only present when the product is discounted).
- Description: A brief description of the product.
- URL: The URL of the product's page on Flipkart.
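Flipkart shows both price fields as strings such as `₹24,990`. The following is a minimal sketch of turning them into numbers and deriving the discount. It assumes the `₹` symbol and comma separators seen in the scraped results, and `parse_price` and `discount_percent` are hypothetical helper names, not part of any library. The sample values are taken from the ASUS Chromebook row of the scraped results.

```python
def parse_price(text):
    """Convert a price string such as '₹24,990' to an int (hypothetical helper)."""
    return int(text.replace('₹', '').replace(',', '').strip())


def discount_percent(price, original_price):
    """Percent saved versus the original price, or None when there is no discount."""
    if original_price is None or original_price <= price:
        return None
    return round(100 * (original_price - price) / original_price, 1)


selling = parse_price('₹13,990')
listed = parse_price('₹24,990')
print(selling, listed, discount_percent(selling, listed))  # 13990 24990 44.0
```

Returning `None` rather than `0` for undiscounted products keeps "no original price shown" distinguishable from a real zero-percent discount.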


### Importing Necessary Libraries

In [1]:
import pandas as pd                # DataFrame that holds the scraped records
from bs4 import BeautifulSoup      # parses the HTML of each results page
from selenium import webdriver     # loads the pages in a real Chrome browser

### Scraping the Search Results

In [6]:
# Create a WebDriver instance (make sure a matching ChromeDriver is available)
driver = webdriver.Chrome()

brands = []
price = []
original_price = []
description = []
link = []
original_url = input('Enter what you want: ')   # search query, e.g. "Laptops"


def search_url(query, page):
    """Build the Flipkart search URL for a query and a result page number."""
    template = ('https://www.flipkart.com/search?q={}&otracker=search&otracker1=search'
                '&marketplace=FLIPKART&as-show=on&as=off&page={}')
    return template.format(query.replace(' ', '+'), page)


def to_int(text):
    """Turn a price string such as '₹24,990' into 24990."""
    return int(text.replace('₹', '').replace(',', ''))


# Flipkart's CSS class names are auto-generated and change over time;
# update them from the live page source if the lists come back empty.
for page in range(1, 5):                       # result pages 1 to 4
    driver.get(search_url(original_url, page))
    soup = BeautifulSoup(driver.page_source, 'html.parser')

    # Brands: product title up to the first '('
    page_brands = [d.text.split('(')[0] for d in soup.find_all('div', class_='_4rR01T')]

    # Price and original (pre-discount) price
    page_price = [to_int(d.text) for d in soup.find_all('div', class_='_30jeq3 _1_WHN1')]
    page_orig = [to_int(d.text) for d in soup.find_all('div', class_='_3I9_wc _27UcVY')]

    # Description: the list of specification bullets for each product
    page_desc = [[li.text for li in ul.find_all('li', class_='rgWa7D')]
                 for ul in soup.find_all('ul', class_='_1xgFaf')]

    # Website link
    page_link = ['https://www.flipkart.com' + a.get('href')
                 for a in soup.find_all('a', class_='_1fQZEK')]

    # find_all() returns an empty list instead of raising, so no try/except is needed.
    # Trim or pad every field with None to this page's product count so all columns
    # stay the same length (a value missing mid-page still shifts later values up).
    n = len(page_brands)
    for values, column in [(page_brands, brands), (page_price, price),
                           (page_orig, original_price), (page_desc, description),
                           (page_link, link)]:
        column.extend(values[:n] + [None] * (n - len(values)))

# Close the WebDriver instance
driver.quit()

# Data Frame
df = pd.DataFrame({'brands': brands, 'price': price, 'original_price': original_price,
                   'description': description, 'link': link})
df.head()

Enter what you want: Laptops


|   | brands | price | original_price | description | link |
|---|---|---|---|---|---|
| 0 | ASUS Chromebook Celeron Dual Core N4020 - | 13990 | 24990 | [Intel Celeron Dual Core Processor, 4 GB LPDDR... | https://www.flipkart.com/asus-chromebook-celer... |
| 1 | HP 360 Intel Celeron Quad Core N4020 - | 22990 | 31156 | [Intel Celeron Quad Core Processor, 4 GB LPDDR... | https://www.flipkart.com/hp-360-intel-celeron-... |
| 2 | HP Laptop Core i3 11th Gen 1115G4 - | 38990 | 49025 | [Intel Core i3 Processor (11th Gen), 8 GB DDR4... | https://www.flipkart.com/hp-laptop-core-i3-11t... |
| 3 | Infinix INBook Y1 Plus Intel Core i3 10th Gen ... | 26990 | 49990 | [Intel Core i3 Processor (10th Gen), 8 GB LPDD... | https://www.flipkart.com/infinix-inbook-y1-plu... |
| 4 | Lenovo V15 Ryzen 5 Hexa Core 5500U - | 34640 | 69525 | [AMD Ryzen 5 Hexa Core Processor, 8 GB DDR4 RA... | https://www.flipkart.com/lenovo-v15-ryzen-5-he... |
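The scraping cell builds the search URL with `str.replace(' ', '+')`, which leaves characters such as `&` unescaped, so a query like `tv & audio` would be split into a separate URL parameter. Below is a sketch of the same URL built with the standard library's `urllib.parse.urlencode`, which percent-encodes every value. The query parameters are copied from the notebook's template; `build_search_url` is a hypothetical name.

```python
from urllib.parse import urlencode


def build_search_url(query, page):
    """Flipkart search URL with every parameter value percent-encoded."""
    params = {
        'q': query,
        'otracker': 'search',
        'otracker1': 'search',
        'marketplace': 'FLIPKART',
        'as-show': 'on',
        'as': 'off',
        'page': page,
    }
    # urlencode uses quote_plus by default: spaces become '+', '&' becomes '%26'
    return 'https://www.flipkart.com/search?' + urlencode(params)


print(build_search_url('gaming laptops', 2))
print(build_search_url('tv & audio', 1))
```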


In [3]:
df.head()

|   | brands | price | original_price | description | link |
|---|---|---|---|---|---|
| 0 | CHUWI Core i3 10th Gen - | 23990 | 49990.0 | [Intel Core i3 Processor (10th Gen), 8 GB DDR4... | https://www.flipkart.com/chuwi-core-i3-10th-ge... |
| 1 | CHUWI Celeron Dual Core 10th Gen - | 16990 | 34990.0 | [Intel Celeron Dual Core Processor (10th Gen),... | https://www.flipkart.com/chuwi-celeron-dual-co... |
| 2 | HP Laptop Core i3 11th Gen 1115G4 - | 38990 | 49025.0 | [Intel Core i3 Processor (11th Gen), 8 GB DDR4... | https://www.flipkart.com/hp-laptop-core-i3-11t... |
| 3 | Infinix INBook Y1 Plus Intel Core i3 10th Gen ... | 26990 | 49990.0 | [Intel Core i3 Processor (10th Gen), 8 GB LPDD... | https://www.flipkart.com/infinix-inbook-y1-plu... |
| 4 | Lenovo V15 Ryzen 5 Hexa Core 5500U - | 34640 | 69525.0 | [AMD Ryzen 5 Hexa Core Processor, 8 GB DDR4 RA... | https://www.flipkart.com/lenovo-v15-ryzen-5-he... |


In [4]:
df

|   | brands | price | original_price | description | link |
|---|---|---|---|---|---|
| 0 | CHUWI Core i3 10th Gen - | 23990 | 49990.0 | [Intel Core i3 Processor (10th Gen), 8 GB DDR4... | https://www.flipkart.com/chuwi-core-i3-10th-ge... |
| 1 | CHUWI Celeron Dual Core 10th Gen - | 16990 | 34990.0 | [Intel Celeron Dual Core Processor (10th Gen),... | https://www.flipkart.com/chuwi-celeron-dual-co... |
| 2 | HP Laptop Core i3 11th Gen 1115G4 - | 38990 | 49025.0 | [Intel Core i3 Processor (11th Gen), 8 GB DDR4... | https://www.flipkart.com/hp-laptop-core-i3-11t... |
| 3 | Infinix INBook Y1 Plus Intel Core i3 10th Gen ... | 26990 | 49990.0 | [Intel Core i3 Processor (10th Gen), 8 GB LPDD... | https://www.flipkart.com/infinix-inbook-y1-plu... |
| 4 | Lenovo V15 Ryzen 5 Hexa Core 5500U - | 34640 | 69525.0 | [AMD Ryzen 5 Hexa Core Processor, 8 GB DDR4 RA... | https://www.flipkart.com/lenovo-v15-ryzen-5-he... |
| ... | ... | ... | ... | ... | ... |
| 91 | HP 15s | 47490 | 54552.0 | [Intel Core i3 Processor (13th Gen), 8 GB DDR4... | https://www.flipkart.com/hp-15s-2023-intel-cor... |
| 92 | Infinix X3 Slim Intel Core i3 12th Gen 1215U - | 34990 | 54990.0 | [Intel Core i3 Processor (12th Gen), 8 GB LPDD... | https://www.flipkart.com/infinix-x3-slim-intel... |
| 93 | Acer Extensa | 34990 | 51999.0 | [Stylish & Portable Thin and Light Laptop, LPD... | https://www.flipkart.com/acer-extensa-2023-ryz... |
| 94 | HP Intel Core i5 12th Gen 1235U - | 62039 | 72331.0 | [Intel Core i5 Processor (12th Gen), 16 GB DDR... | https://www.flipkart.com/hp-intel-core-i5-12th... |
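Collecting each field with its own `find_all()` call and padding at the end keeps the columns the same length, but it cannot tell *which* product is missing a value. If a product in the middle of a page has no strike-through original price, every later original price shifts up one row. The sketch reproduces that effect with hypothetical in-memory records (products `A`, `B`, `C`). On the live page, the row-wise fix would mean locating each product's card element and calling `find()` inside it. The card's class name is not shown in this notebook, so it would have to be read from the page source.

```python
# Hypothetical product cards; B has no original (strike-through) price
cards = [
    {'brand': 'A', 'original_price': 150},
    {'brand': 'B', 'original_price': None},
    {'brand': 'C', 'original_price': 350},
]

# Column-wise: one list per field, as separate find_all() calls produce, padded at the end
demo_brands = [c['brand'] for c in cards]
found = [c['original_price'] for c in cards if c['original_price'] is not None]
padded = found + [None] * (len(demo_brands) - len(found))
column_wise = list(zip(demo_brands, padded))
print(column_wise)   # [('A', 150), ('B', 350), ('C', None)]  <- B and C are wrong

# Row-wise: read every field from the same card, so a gap stays in its own row
row_wise = [(c['brand'], c['original_price']) for c in cards]
print(row_wise)      # [('A', 150), ('B', None), ('C', 350)]
```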


In [5]:
# Save the results; index=False leaves out the row-number column
df.to_csv('flipkart_products.csv', index=False)
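`to_csv` writes each `description` list in its printed form, for example `['8 GB RAM', '512 GB SSD']`, so reading the file back yields strings rather than lists. Here is a minimal standard-library sketch of that round trip using `csv` and `ast.literal_eval`, with one hypothetical row.

```python
import ast
import csv
import io

# One hypothetical row shaped like the scraped DataFrame
row = {'brands': 'Example Laptop', 'price': 34990,
       'description': ['Intel Core i3 Processor (12th Gen)', '8 GB RAM']}

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(row))
writer.writeheader()
writer.writerow(row)            # non-string values are written via str()

buffer.seek(0)
loaded = next(csv.DictReader(buffer))
print(type(loaded['description']).__name__)        # str
restored = ast.literal_eval(loaded['description'])  # parse the text back into a list
print(restored[1])                                  # 8 GB RAM
```

With pandas, the equivalent step after `pd.read_csv` is `df['description'].apply(lambda s: ast.literal_eval(s) if isinstance(s, str) else s)`. The `isinstance` check skips the empty cells left by padded `None` values, which come back as `NaN`.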

In [84]:
# Storage bullets mentioning SSD (soup holds only the last page scraped by the loop)
a = [li.text for li in soup.find_all('li', class_='rgWa7D') if 'SSD' in li.text]

In [86]:
len(a)

24

In [82]:
a

['256 GB SSD',
 '512 GB SSD',
 '512 GB SSD',
 '512 GB SSD',
 '512 GB SSD',
 '256 GB SSD',
 '512 GB SSD',
 '512 GB SSD',
 '512 GB SSD',
 '512 GB SSD',
 '512 GB SSD',
 '512 GB SSD',
 '1 TB HDD|256 GB SSD',
 '512 GB SSD',
 '256 GB SSD',
 '512 GB SSD',
 '512 GB SSD',
 '512 GB SSD',
 '1 TB HDD|256 GB SSD',
 '256 GB SSD',
 '512 GB SSD',
 '512 GB SSD',
 '128 GB SSD',
 '512 GB SSD']
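The `SSD` filter returns raw strings, and some entries combine two drives (`1 TB HDD|256 GB SSD`). The sketch below splits on `|` and converts each part to gigabytes. It assumes the `<size> <GB|TB> <type>` pattern seen in the list above and decimal units (1 TB = 1000 GB, as drive vendors quote capacity); `storage_gb` is a hypothetical helper.

```python
import re


def storage_gb(spec):
    """Map drive type to capacity in GB, e.g. '1 TB HDD|256 GB SSD' -> {'HDD': 1000.0, 'SSD': 256.0}."""
    totals = {}
    for part in spec.split('|'):
        m = re.fullmatch(r'\s*(\d+(?:\.\d+)?)\s*(GB|TB)\s+(\w+)\s*', part)
        if m is None:
            continue                       # skip parts that are not '<size> <unit> <type>'
        size = float(m.group(1))
        gb = size * 1000 if m.group(2) == 'TB' else size   # decimal units: 1 TB = 1000 GB
        kind = m.group(3)
        totals[kind] = totals.get(kind, 0) + gb
    return totals


print(storage_gb('512 GB SSD'))            # {'SSD': 512.0}
print(storage_gb('1 TB HDD|256 GB SSD'))   # {'HDD': 1000.0, 'SSD': 256.0}
```

Applied to the scraped list, `[storage_gb(s) for s in a]` gives one dictionary per bullet.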

In [65]:
# Operating-system bullets
b = [li.text for li in soup.find_all('li', class_='rgWa7D') if 'Operating' in li.text]

In [87]:
len(b)

24
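The keyword cells scan every `rgWa7D` bullet on `soup`, which after the loop holds only the last scraped page. As a result, the 24 matches cannot be joined back to rows of `df`. The sketch below applies the same keyword test to each product's own `description` list instead, returning one value (or `None`) per product. `first_bullet` and the sample lists are hypothetical.

```python
def first_bullet(bullets, keyword):
    """Return the first spec bullet containing keyword, or None if absent."""
    if not bullets:                 # description may be None for padded rows
        return None
    return next((b for b in bullets if keyword in b), None)


# Hypothetical description lists shaped like df['description']
descriptions = [
    ['Intel Core i3 Processor (11th Gen)', '8 GB DDR4 RAM',
     '64 bit Windows 11 Operating System', '512 GB SSD'],
    ['AMD Ryzen 5 Hexa Core Processor', '8 GB DDR4 RAM', '1 TB HDD|256 GB SSD'],
    None,
]

os_column = [first_bullet(d, 'Operating') for d in descriptions]
ssd_column = [first_bullet(d, 'SSD') for d in descriptions]
print(os_column)    # ['64 bit Windows 11 Operating System', None, None]
print(ssd_column)   # ['512 GB SSD', '1 TB HDD|256 GB SSD', None]
```

Applied to the DataFrame, this would look like `df['os'] = df['description'].apply(lambda d: first_bullet(d, 'Operating'))`, where the `os` column name is only illustrative.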

In [67]:
# Display-size bullets (any bullet containing 'cm')
c = [li.text for li in soup.find_all('li', class_='rgWa7D') if 'cm' in li.text]

In [88]:
c

['35.56 cm (14 inch) Display',
 '40.89 cm (16.1 Inch) Display',
 '35.56 cm (14 Inch) Display',
 '39.62 cm (15.6 Inch) Display',
 '35.56 cm (14 Inch) Display',
 '35.56 cm (14 Inch) Display',
 '40.64 cm (16 inch) Display',
 '39.62 cm (15.6 inch) Display',
 '39.62 cm (15.6 Inch) Display',
 '35.56 cm (14 inch) Display',
 '35.56 cm (14 Inch) Display',
 '35.56 cm (14 Inch) Display',
 '39.62 cm (15.6 Inch) Display',
 '39.62 cm (15.6 inch) Display',
 '33.78 cm (13.3 inch) Display',
 '39.62 cm (15.6 inch) Display',
 '39.62 cm (15.6 Inch) Display',
 '39.62 cm (15.6 Inch) Display',
 '39.62 cm (15.6 Inch) Display',
 '35.56 cm (14 Inch) Display',
 '39.62 cm (15.6 Inch) Display',
 '35.56 cm (14 Inch) Display',
 '39.62 cm (15.6 Inch) Display',
 '39.62 cm (15.6 Inch) Display']
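The display strings above mix `inch` and `Inch`, so a case-insensitive pattern is needed to pull out the numeric size. The sketch below assumes the `<cm> cm (<inches> inch) Display` layout seen in the list. It returns `None` for any other bullet that the broad `'cm'` keyword happens to match. `display_inches` is a hypothetical helper.

```python
import re


def display_inches(spec):
    """Return the diagonal in inches from '39.62 cm (15.6 Inch) Display', else None."""
    m = re.search(r'\((\d+(?:\.\d+)?)\s*inch\)', spec, flags=re.IGNORECASE)
    return float(m.group(1)) if m else None


sizes = ['35.56 cm (14 inch) Display', '39.62 cm (15.6 Inch) Display',
         '40.89 cm (16.1 Inch) Display', '33.78 cm (13.3 inch) Display']
print([display_inches(s) for s in sizes])   # [14.0, 15.6, 16.1, 13.3]
```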

In [71]:
# Memory bullets
R = [li.text for li in soup.find_all('li', class_='rgWa7D') if 'RAM' in li.text]

In [77]:
len(R)

24

In [78]:
# Processor bullets
P = [li.text for li in soup.find_all('li', class_='rgWa7D') if 'Processor' in li.text]

In [81]:
len(P)

24
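The `RAM` bullets can be split into size and memory type in the same way. This sketch assumes the bullets follow the `8 GB DDR4 RAM` pattern suggested by the truncated descriptions in the DataFrame output; the full strings in `R` are not printed here. `parse_ram` is hypothetical and returns `(None, None)` for anything that does not match.

```python
import re


def parse_ram(spec):
    """Split a bullet like '8 GB DDR4 RAM' into (size_gb, memory_type)."""
    m = re.match(r'\s*(\d+)\s*GB\s+(\S+)\s+RAM\b', spec)
    if m is None:
        return None, None
    return int(m.group(1)), m.group(2)


for spec in ['8 GB DDR4 RAM', '16 GB LPDDR5 RAM', 'Expandable Memory']:
    print(parse_ram(spec))   # (8, 'DDR4'), then (16, 'LPDDR5'), then (None, None)
```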