# WEB SCRAPING (Selenium)

Problem Statement:

Q1: Write a Python program to scrape data for the “Data Analyst” job position in the
“Bangalore” location. Scrape the job-title, job-location, company_name, and
experience_required for the first 10 jobs.
The task is done in the following steps:
1. First get the webpage https://www.naukri.com/
2. Enter “Data Analyst” in the “Skill,Designations,Companies” field and enter “Bangalore”
in the “enter the location” field.
3. Then click the search button.
4. Then scrape the data for the first 10 job results you get.
5. Finally create a dataframe of the scraped data.
Note: All of the above steps have to be done in code. No step is to be done manually.


In [1]:
! pip install selenium



In [2]:
# Import all the required libraries
import selenium
import pandas as pd
from selenium import webdriver

In [3]:
# Connect to the web driver (chromedriver.exe must be in the working directory)
driver = webdriver.Chrome('chromedriver.exe')

In [4]:
url = 'https://www.naukri.com/'
driver.get(url)

In [5]:
#finding elements for job search bar
search_job = driver.find_element_by_id('qsb-keyword-sugg')
search_job

<selenium.webdriver.remote.webelement.WebElement (session="7a49f3c9989739e59e5ebd10948b511d", element="b0ff1ecd-5f24-40e3-bf3c-24f32d551487")>

In [6]:
#write on search bar
search_job.send_keys('Data Scientist')

In [7]:
# finding elements for job location bar
search_loc = driver.find_element_by_id('qsb-location-sugg')
search_loc.send_keys('Bangalore')

In [8]:
#do click using xpath function
search_btn = driver.find_element_by_xpath("//button[@class='btn']")
search_btn.click()
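
The `find_element_by_id` and `find_element_by_xpath` helpers used in the cells above were removed in Selenium 4.3, so on a current install they raise `AttributeError`. Below is a minimal sketch of the same search step using the generic `find_element(by, value)` call. The function name `search_jobs` is hypothetical. The locator strategies are written as plain strings, which have the same values as `By.ID` and `By.XPATH` from `selenium.webdriver.common.by`. The element ids and the XPath are copied from the cells above and may no longer match the live site.

```python
# Locator strategies as plain strings; they equal selenium's By.ID and By.XPATH.
BY_ID = "id"
BY_XPATH = "xpath"


def search_jobs(driver, keyword, location=None):
    """Type the search terms and click the search button.

    `driver` is an already opened WebDriver sitting on the home page.
    The ids and XPath are taken from this notebook (assumed still valid).
    """
    driver.find_element(BY_ID, "qsb-keyword-sugg").send_keys(keyword)
    if location:
        driver.find_element(BY_ID, "qsb-location-sugg").send_keys(location)
    driver.find_element(BY_XPATH, "//button[@class='btn']").click()
```

With a real driver this would be called as `search_jobs(driver, 'Data Analyst', 'Bangalore')`, or without the location argument for the filter-based question.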

Now let's scrape each field into its own list. The data will be stored in these lists while scraping.

In [9]:
# Extract all the tags holding the job titles
titles_tags = driver.find_elements_by_xpath("//a[@class='title fw500 ellipsis']")

In [10]:
titles_tags

[<selenium.webdriver.remote.webelement.WebElement (session="7a49f3c9989739e59e5ebd10948b511d", element="b6344a00-8b54-45ab-a76a-379636f1b71a")>,
 <selenium.webdriver.remote.webelement.WebElement (session="7a49f3c9989739e59e5ebd10948b511d", element="60122ab0-13b4-4d69-97aa-bc3afaeb0706")>,
 <selenium.webdriver.remote.webelement.WebElement (session="7a49f3c9989739e59e5ebd10948b511d", element="48bd5def-01e0-4021-a0ee-a8a91b365d39")>,
 <selenium.webdriver.remote.webelement.WebElement (session="7a49f3c9989739e59e5ebd10948b511d", element="ee673f9f-cc43-49a3-bda7-2182066469fc")>,
 <selenium.webdriver.remote.webelement.WebElement (session="7a49f3c9989739e59e5ebd10948b511d", element="6ab2c165-d922-47e0-a9fd-ae86212804f8")>,
 <selenium.webdriver.remote.webelement.WebElement (session="7a49f3c9989739e59e5ebd10948b511d", element="f3dab171-2022-4d93-ac57-7d9dd374d324")>,
 <selenium.webdriver.remote.webelement.WebElement (session="7a49f3c9989739e59e5ebd10948b511d", element="c073a1f3-ec1a-41e4-bdf4-5b

In [11]:
#Loop to iterate over the tags extracted above and extract the text inside them.
job_titles=[]
for i in titles_tags:
    job_titles.append(i.text)
job_titles

['Data Scientist / Data Analyst -Business Analyst',
 'Senior Data Scientist, Modeling',
 'Big Data - Data Scientist',
 'Specialist I - Data Scientist',
 'Data Scientist',
 'Lead Data Scientist',
 'Data Scientist',
 'SDE Lead Data Scientist-L3',
 'Computational Design Lead Data Scientist-L3',
 'Hiring For DATA Scientist - ON Contract Basis (3-6 Months)',
 'Senior Data Scientist',
 'Senior Data Scientist - Chatbot & NLP',
 'Senior Data Scientist',
 'Sr. Data Scientist',
 'Senior Data Scientist',
 'Senior Associate/Team Lead - Data Scientist Consulting',
 'Senior Data Scientist',
 'Data Scientist',
 'Hiring For Lead data Scientist For Bangalore location',
 'Principal Data Scientist']
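
The result page returns 20 titles, while the task asks for only the first 10. A small helper can take the text of the first few elements in one step. This is a sketch only: `first_texts` is a hypothetical name, and the `SimpleNamespace` objects stand in for WebElements so the example runs without a browser.

```python
from types import SimpleNamespace


def first_texts(elements, limit=10):
    """Return the stripped .text of the first `limit` elements."""
    return [el.text.strip() for el in elements[:limit]]


# Stand-ins for WebElements: anything with a .text attribute works here.
fake_tags = [SimpleNamespace(text=f" Job {n} ") for n in range(20)]
print(first_texts(fake_tags))
```

In the notebook the call would be `job_titles = first_texts(titles_tags)`.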

In [12]:
# Extract all the tags holding the company names
companies_tags = driver.find_elements_by_xpath("//a[@class='subTitle ellipsis fleft']")
companies_tags

[<selenium.webdriver.remote.webelement.WebElement (session="7a49f3c9989739e59e5ebd10948b511d", element="5e6d0c9e-7118-49a3-8bc7-046805d6de3d")>,
 <selenium.webdriver.remote.webelement.WebElement (session="7a49f3c9989739e59e5ebd10948b511d", element="90d34232-b962-403c-a90a-3d99797cd4d6")>,
 <selenium.webdriver.remote.webelement.WebElement (session="7a49f3c9989739e59e5ebd10948b511d", element="88f54e7a-1a8b-4f5b-b7d3-97547b2e2ca3")>,
 <selenium.webdriver.remote.webelement.WebElement (session="7a49f3c9989739e59e5ebd10948b511d", element="31b1594a-49fb-4219-b87e-250304d1ea64")>,
 <selenium.webdriver.remote.webelement.WebElement (session="7a49f3c9989739e59e5ebd10948b511d", element="20e03a47-c252-4beb-bf5c-30ea33031997")>,
 <selenium.webdriver.remote.webelement.WebElement (session="7a49f3c9989739e59e5ebd10948b511d", element="eb1abf99-af1a-4a8e-acf3-763888514558")>,
 <selenium.webdriver.remote.webelement.WebElement (session="7a49f3c9989739e59e5ebd10948b511d", element="d0b99afd-4198-4d8e-b8b4-3e

In [13]:
#Loop to iterate over the tags extracted above and extract the text inside them.
company_name=[]
for i in companies_tags:
    company_name.append(i.text)
company_name

['Inflexion Analytix Private Limited',
 'Nielsen',
 'Xoriant Solutions Pvt Ltd',
 'Philips India Limited',
 'IBM India Pvt. Limited',
 'Intel Technology India Pvt Ltd',
 'Oracle India Pvt. Ltd.',
 'Huawei Technologies India Pvt Ltd',
 'Huawei Technologies India Pvt Ltd',
 'GlobalEdx Learning and Technology Solution Pvt Ltd',
 'GO-JEK India',
 'Gojek Tech',
 'nanobi data and analytics private limited',
 'VALIANCE ANALYTICS PRIVATE LIMITED',
 'BankBazaar.com (A&A DUKAAN FINANCIAL SERVICES PVT. LTD)',
 'Analytics India Magazine',
 'Gojek Tech',
 'Applied Materials',
 'Societe Generale',
 'NetApp']

In [14]:
# Extract all the tags holding the experience
experience_tags = driver.find_elements_by_xpath("//li[@class='fleft grey-text br2 placeHolderLi experience']//span")
experience_tags

[<selenium.webdriver.remote.webelement.WebElement (session="7a49f3c9989739e59e5ebd10948b511d", element="f04d03a1-3445-49ff-8e09-5d57083f91c9")>,
 <selenium.webdriver.remote.webelement.WebElement (session="7a49f3c9989739e59e5ebd10948b511d", element="7057dc33-487a-48de-a21e-a201d1923fb7")>,
 <selenium.webdriver.remote.webelement.WebElement (session="7a49f3c9989739e59e5ebd10948b511d", element="9b2cce51-9014-43b6-993a-572701efff4e")>,
 <selenium.webdriver.remote.webelement.WebElement (session="7a49f3c9989739e59e5ebd10948b511d", element="a44fa8c3-4127-41e1-9f4c-67ab1dfa0b25")>,
 <selenium.webdriver.remote.webelement.WebElement (session="7a49f3c9989739e59e5ebd10948b511d", element="2d0754d8-b77c-4026-88fe-c11570b09a88")>,
 <selenium.webdriver.remote.webelement.WebElement (session="7a49f3c9989739e59e5ebd10948b511d", element="08b94bc7-178b-4847-a79c-9f31aaa83abd")>,
 <selenium.webdriver.remote.webelement.WebElement (session="7a49f3c9989739e59e5ebd10948b511d", element="6d50e987-4430-41e1-8f13-f0

In [15]:
#Loop to iterate over the tags extracted above and extract the text inside them.
experience=[]
for i in experience_tags:
    experience.append(i.text)
experience

['0-3 Yrs',
 '3-7 Yrs',
 '1-3 Yrs',
 '4-7 Yrs',
 '6-8 Yrs',
 '6-10 Yrs',
 '6-10 Yrs',
 '5-8 Yrs',
 '5-8 Yrs',
 '3-8 Yrs',
 '4-11 Yrs',
 '3-7 Yrs',
 '7-9 Yrs',
 '6-8 Yrs',
 '5-7 Yrs',
 '3-5 Yrs',
 '3-7 Yrs',
 '2-4 Yrs',
 '5-9 Yrs',
 '10-15 Yrs']

In [16]:
# Extract all the tags holding the salary
salary_tags = driver.find_elements_by_xpath("//li[@class='fleft grey-text br2 placeHolderLi salary']//span")
salary_tags

[<selenium.webdriver.remote.webelement.WebElement (session="7a49f3c9989739e59e5ebd10948b511d", element="6f22c568-6409-4c76-9cb6-adf1df6da23f")>,
 <selenium.webdriver.remote.webelement.WebElement (session="7a49f3c9989739e59e5ebd10948b511d", element="0156bcc1-48f0-4ddc-8416-6ef3177b29e6")>,
 <selenium.webdriver.remote.webelement.WebElement (session="7a49f3c9989739e59e5ebd10948b511d", element="333327d5-ba86-4f98-bd5e-afb44ce982df")>,
 <selenium.webdriver.remote.webelement.WebElement (session="7a49f3c9989739e59e5ebd10948b511d", element="eb9a4982-b490-468c-bdfb-bdbbaa3687e9")>,
 <selenium.webdriver.remote.webelement.WebElement (session="7a49f3c9989739e59e5ebd10948b511d", element="3b3d3cbd-880d-41b6-983e-c97074c18382")>,
 <selenium.webdriver.remote.webelement.WebElement (session="7a49f3c9989739e59e5ebd10948b511d", element="c9d49fac-8a48-4cae-92fc-78edf59a0ea5")>,
 <selenium.webdriver.remote.webelement.WebElement (session="7a49f3c9989739e59e5ebd10948b511d", element="b59133eb-ea6c-45c8-b7ee-0c

In [17]:
#Loop to iterate over the tags extracted above and extract the text inside them.
salary=[]
for i in salary_tags:
    salary.append(i.text)
salary

['3,50,000 - 4,50,000 PA.',
 '7,00,000 - 9,50,000 PA.',
 'Not disclosed',
 'Not disclosed',
 'Not disclosed',
 'Not disclosed',
 'Not disclosed',
 'Not disclosed',
 'Not disclosed',
 '50,000 - 70,000 PA.',
 'Not disclosed',
 'Not disclosed',
 'Not disclosed',
 '20,00,000 - 35,00,000 PA.',
 'Not disclosed',
 'Not disclosed',
 'Not disclosed',
 'Not disclosed',
 'Not disclosed',
 'Not disclosed']
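
The salary column comes back as free text such as '3,50,000 - 4,50,000 PA.' or 'Not disclosed'. If the values need to be compared or filtered later, they can be turned into numbers. The sketch below assumes every disclosed salary has the "low - high" shape seen in the output above; `parse_salary` is a hypothetical helper name.

```python
import re


def parse_salary(text):
    """Return (low, high) in rupees per annum, or None when no range is given."""
    # Grab digit groups that may contain Indian-style comma separators.
    numbers = re.findall(r"\d[\d,]*", text)
    if len(numbers) < 2:
        return None
    low, high = (int(n.replace(",", "")) for n in numbers[:2])
    return low, high


for raw in ["3,50,000 - 4,50,000 PA.", "Not disclosed"]:
    print(raw, "->", parse_salary(raw))
```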

In [18]:
# Extract all the tags holding the location
location_tags = driver.find_elements_by_xpath("//li[@class='fleft grey-text br2 placeHolderLi location']//span")
location_tags

[<selenium.webdriver.remote.webelement.WebElement (session="7a49f3c9989739e59e5ebd10948b511d", element="676a2007-528c-4f8b-a395-0ddafb2053ba")>,
 <selenium.webdriver.remote.webelement.WebElement (session="7a49f3c9989739e59e5ebd10948b511d", element="3d98a062-b1c5-4297-8474-a05b980d7387")>,
 <selenium.webdriver.remote.webelement.WebElement (session="7a49f3c9989739e59e5ebd10948b511d", element="88ec3a64-97a6-44d6-b0c7-37557a04a602")>,
 <selenium.webdriver.remote.webelement.WebElement (session="7a49f3c9989739e59e5ebd10948b511d", element="705fde45-7f6c-4e45-99c7-2f38dcc42320")>,
 <selenium.webdriver.remote.webelement.WebElement (session="7a49f3c9989739e59e5ebd10948b511d", element="0aae64dd-e765-4417-9120-c7bbdaead197")>,
 <selenium.webdriver.remote.webelement.WebElement (session="7a49f3c9989739e59e5ebd10948b511d", element="4ec1ba12-e3a5-4c21-a988-a26b4bc59f7e")>,
 <selenium.webdriver.remote.webelement.WebElement (session="7a49f3c9989739e59e5ebd10948b511d", element="0d8ab8e6-497e-41fb-a4d3-da

In [19]:
#Loop to iterate over the tags extracted above and extract the text inside them.
location=[]
for i in location_tags:
    location.append(i.text)
location

['Mumbai, Hyderabad/Secunderabad, Pune, Gurgaon/Gurugram, Chennai, Bangalore/Bengaluru',
 'Kolkata, Gurgaon/Gurugram, Bangalore/Bengaluru, Vadodara, Mumbai (All Areas)',
 'Kochi/Cochin, Indore, Hyderabad/Secunderabad, Pune, Ahmedabad, Bangalore/Bengaluru, Mumbai (All Areas)',
 'Bangalore/Bengaluru',
 'Bangalore/Bengaluru',
 'Bangalore/Bengaluru',
 'Bangalore/Bengaluru',
 'Bangalore/Bengaluru',
 'Bangalore/Bengaluru',
 'Hyderabad/Secunderabad, Bangalore/Bengaluru, Mumbai (All Areas)',
 'Bangalore/Bengaluru',
 'Bangalore/Bengaluru',
 'Bangalore/Bengaluru',
 'Bangalore/Bengaluru, Delhi / NCR',
 'Bangalore/Bengaluru',
 'Bangalore/Bengaluru',
 'Bangalore/Bengaluru',
 'Bangalore/Bengaluru',
 'Bangalore/Bengaluru',
 'Bangalore/Bengaluru']

In [22]:
print(len(job_titles),len(company_name),len(salary),len(location),len(experience))

20 20 20 20 20
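
Printing the lengths is a manual check. If one list were shorter (for example because a job card has no salary element), assigning it as a DataFrame column would fail with a length mismatch. A sketch of the same check as a function that stops early with a readable message; `check_equal_lengths` is a hypothetical name.

```python
def check_equal_lengths(**columns):
    """Raise ValueError if the named lists differ in length; else return the lengths."""
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"column lengths differ: {lengths}")
    return lengths


print(check_equal_lengths(title=["a", "b"], company=["x", "y"]))
```

Equal lengths do not prove that row i of every list belongs to the same job, so this catches only the most obvious kind of mismatch.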


In [23]:
import pandas as pd

In [25]:
jobs=pd.DataFrame({})
jobs['title'] =job_titles
jobs['company'] =company_name
jobs['experience'] = experience
jobs['salary'] = salary
jobs['location'] = location

# Show only the first 10 jobs, as the task asks
jobs.head(10)

Unnamed: 0,title,company,experience,salary,location
0,Data Scientist / Data Analyst -Business Analyst,Inflexion Analytix Private Limited,0-3 Yrs,"3,50,000 - 4,50,000 PA.","Mumbai, Hyderabad/Secunderabad, Pune, Gurgaon/..."
1,"Senior Data Scientist, Modeling",Nielsen,3-7 Yrs,"7,00,000 - 9,50,000 PA.","Kolkata, Gurgaon/Gurugram, Bangalore/Bengaluru..."
2,Big Data - Data Scientist,Xoriant Solutions Pvt Ltd,1-3 Yrs,Not disclosed,"Kochi/Cochin, Indore, Hyderabad/Secunderabad, ..."
3,Specialist I - Data Scientist,Philips India Limited,4-7 Yrs,Not disclosed,Bangalore/Bengaluru
4,Data Scientist,IBM India Pvt. Limited,6-8 Yrs,Not disclosed,Bangalore/Bengaluru
5,Lead Data Scientist,Intel Technology India Pvt Ltd,6-10 Yrs,Not disclosed,Bangalore/Bengaluru
6,Data Scientist,Oracle India Pvt. Ltd.,6-10 Yrs,Not disclosed,Bangalore/Bengaluru
7,SDE Lead Data Scientist-L3,Huawei Technologies India Pvt Ltd,5-8 Yrs,Not disclosed,Bangalore/Bengaluru
8,Computational Design Lead Data Scientist-L3,Huawei Technologies India Pvt Ltd,5-8 Yrs,Not disclosed,Bangalore/Bengaluru
9,Hiring For DATA Scientist - ON Contract Basis ...,GlobalEdx Learning and Technology Solution Pvt...,3-8 Yrs,"50,000 - 70,000 PA.","Hyderabad/Secunderabad, Bangalore/Bengaluru, M..."
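
Scraping each field into a separate page-wide list relies on every job card containing every field. An alternative is to walk the job cards one by one and look up each field inside its own card, so a missing field becomes `None` instead of shifting the column. The sketch below is an assumption-heavy illustration: `scrape_cards` is a hypothetical helper, the card XPath must be supplied by the caller (this notebook does not show the card container), and the field XPaths must be relative (start with `.//`).

```python
def scrape_cards(driver, card_xpath, field_xpaths, limit=10):
    """Return one dict per job card, with None for fields a card lacks.

    card_xpath   -- XPath of the element wrapping one job (caller-supplied)
    field_xpaths -- {column name: XPath relative to the card}
    """
    rows = []
    for card in driver.find_elements("xpath", card_xpath)[:limit]:
        row = {}
        for name, xpath in field_xpaths.items():
            # find_elements returns [] instead of raising when nothing matches.
            found = card.find_elements("xpath", xpath)
            row[name] = found[0].text if found else None
        rows.append(row)
    return rows
```

The returned list of dicts can be passed straight to `pd.DataFrame(rows)`.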


Problem Statement:
Q2: Write a Python program to scrape data for the “Data Scientist” job position in the
“Bangalore” location. Scrape the job-title, job-location,
company_name, and full job-description for the first 10 jobs.
The task is done in the following steps:
1. First get the webpage https://www.naukri.com/
2. Enter “Data Scientist” in the “Skill,Designations,Companies” field and enter
“Bangalore” in the “enter the location” field.
3. Then click the search button.
4. Then scrape the data for the first 10 job results you get.
5. Finally create a dataframe of the scraped data.
Note: All of the above steps have to be done in code. No step is to be done
manually.

In [26]:
# Import all the required libraries
import selenium
import pandas as pd
from selenium import webdriver

In [27]:
# Connect to the web driver (chromedriver.exe must be in the working directory)
driver = webdriver.Chrome('chromedriver.exe')

In [28]:
url = 'https://www.naukri.com/'
driver.get(url)

In [29]:
#finding elements for job search bar
search_job = driver.find_element_by_id('qsb-keyword-sugg')
search_job

<selenium.webdriver.remote.webelement.WebElement (session="e0e996cece0d216d34a136097b0f1ba1", element="aeaf030d-5779-4782-984f-f22eb194908a")>

In [30]:
#write on search bar
search_job.send_keys('Data Analyst')

In [31]:
# finding elements for job location bar
search_loc = driver.find_element_by_id('qsb-location-sugg')
search_loc.send_keys('Bangalore')

In [32]:
#do click using xpath function
search_btn = driver.find_element_by_xpath("//button[@class='btn']")
search_btn.click()

Now let's scrape each field into its own list. The data will be stored in these lists while scraping.



In [35]:
# Extract all the tags holding the job titles
titles_tags = driver.find_elements_by_xpath("//a[@class='title fw500 ellipsis']")
titles_tags

[<selenium.webdriver.remote.webelement.WebElement (session="e0e996cece0d216d34a136097b0f1ba1", element="c0c8b8d5-e64f-4f4b-b0aa-69de83ce1f39")>,
 <selenium.webdriver.remote.webelement.WebElement (session="e0e996cece0d216d34a136097b0f1ba1", element="1122a58a-b20f-42e2-ad27-71bcd4f4f30f")>,
 <selenium.webdriver.remote.webelement.WebElement (session="e0e996cece0d216d34a136097b0f1ba1", element="74e9c73b-72ef-4e58-ab64-98076fcc6553")>,
 <selenium.webdriver.remote.webelement.WebElement (session="e0e996cece0d216d34a136097b0f1ba1", element="8fa6f7b8-2061-44ae-a379-ee808f0f0cdc")>,
 <selenium.webdriver.remote.webelement.WebElement (session="e0e996cece0d216d34a136097b0f1ba1", element="8badffea-98d2-481b-b2a2-499b2869a2b2")>,
 <selenium.webdriver.remote.webelement.WebElement (session="e0e996cece0d216d34a136097b0f1ba1", element="7b059514-1565-4040-a27f-fb54f1678de3")>,
 <selenium.webdriver.remote.webelement.WebElement (session="e0e996cece0d216d34a136097b0f1ba1", element="fba7395f-0283-4c6b-809c-8e

In [38]:
#Loop to iterate over the tags extracted above and extract the text inside them.
job_title=[]
for i in titles_tags:
    job_title.append(i.text)
job_title

['Data Scientist / Data Analyst -Business Analyst',
 'Data Analyst - Informatica MDM',
 'Assistant Vice President - MIS & Reporting ( Business Data Analyst)',
 'Data Analyst',
 'Data Analyst',
 'Data Analyst',
 'Data Analyst',
 'Data Analyst',
 'Data analyst - Google Analytics',
 'Senior/Regular Business Analyst / Data Analyst',
 'Data Analyst',
 'Hiring For Data Analyst @ Flipkart on Contract',
 'Senior Data Analyst',
 'Data Analyst',
 'Cybersecurity Data Analyst',
 'Business Data Analyst - MIS & Reporting',
 'Data Analyst / Business Analyst -',
 'Business Data Analyst',
 'Data Analyst',
 'Data Analyst']

In [39]:
# Extract all the tags holding the company names
company_tags = driver.find_elements_by_xpath("//a[@class='subTitle ellipsis fleft']")
company_tags

[<selenium.webdriver.remote.webelement.WebElement (session="e0e996cece0d216d34a136097b0f1ba1", element="95adb985-0855-4c31-a83f-0ec3832a4cde")>,
 <selenium.webdriver.remote.webelement.WebElement (session="e0e996cece0d216d34a136097b0f1ba1", element="33196802-5065-4857-baeb-060373fda3c9")>,
 <selenium.webdriver.remote.webelement.WebElement (session="e0e996cece0d216d34a136097b0f1ba1", element="4de61520-b219-495e-96a0-755df25370e3")>,
 <selenium.webdriver.remote.webelement.WebElement (session="e0e996cece0d216d34a136097b0f1ba1", element="097e0419-fd23-481c-bc4c-d9a6a43bebd4")>,
 <selenium.webdriver.remote.webelement.WebElement (session="e0e996cece0d216d34a136097b0f1ba1", element="47aa7bdf-d30d-4f54-a502-27f69bcddb61")>,
 <selenium.webdriver.remote.webelement.WebElement (session="e0e996cece0d216d34a136097b0f1ba1", element="672164ca-ff93-4711-ad03-869c55d881df")>,
 <selenium.webdriver.remote.webelement.WebElement (session="e0e996cece0d216d34a136097b0f1ba1", element="5d7fccb9-33e7-4e52-90dc-f8

In [40]:
#Loop to iterate over the tags extracted above and extract the text inside them.
company_names=[]
for i in company_tags:
    company_names.append(i.text)
company_names

['Inflexion Analytix Private Limited',
 'Shell India Markets Private Limited',
 'INTERTRUSTVITEOS CORPORATE AND FUND SERVICES PVT. LTD.',
 'Myntra Designs Pvt. Ltd.',
 'Myntra Designs Pvt. Ltd.',
 'Myntra Designs Pvt. Ltd.',
 'Myntra Designs Pvt. Ltd.',
 'SA Tech Software (I) Pvt. Ltd.',
 'H and M Hennes and Mauritz (P) Ltd.',
 'Luxoft',
 'Flipkart Internet Private Limited',
 'Flipkart Internet Private Limited',
 'Flipkart Internet Private Limited',
 'IBM India Pvt. Limited',
 'Huawei Technologies India Pvt Ltd',
 'INTERTRUST GROUP',
 'LatentView Analytics Private Limited',
 'ALSTOM India Limited',
 'Udaan',
 'Luxoft']

In [43]:
# Extract all the tags holding the experience
experience_tags = driver.find_elements_by_xpath("//li[@class='fleft grey-text br2 placeHolderLi experience']//span")
experience_tags

[<selenium.webdriver.remote.webelement.WebElement (session="e0e996cece0d216d34a136097b0f1ba1", element="20b189c0-b43d-48b4-8012-986d7133fa9d")>,
 <selenium.webdriver.remote.webelement.WebElement (session="e0e996cece0d216d34a136097b0f1ba1", element="b7c0fe7f-266f-4c59-bdf1-00951e38858f")>,
 <selenium.webdriver.remote.webelement.WebElement (session="e0e996cece0d216d34a136097b0f1ba1", element="e98bcfd5-38b1-4941-a515-46117d3b5e8e")>,
 <selenium.webdriver.remote.webelement.WebElement (session="e0e996cece0d216d34a136097b0f1ba1", element="4781e245-cad8-4ebf-af5a-597373f411db")>,
 <selenium.webdriver.remote.webelement.WebElement (session="e0e996cece0d216d34a136097b0f1ba1", element="a5154be6-9caa-4ff7-9099-77b4c6f1ab41")>,
 <selenium.webdriver.remote.webelement.WebElement (session="e0e996cece0d216d34a136097b0f1ba1", element="c6e8fbd2-52d1-411f-b2c2-ec784c5d439a")>,
 <selenium.webdriver.remote.webelement.WebElement (session="e0e996cece0d216d34a136097b0f1ba1", element="c3eb0256-a1df-4888-afe3-f4

In [44]:
#Loop to iterate over the tags extracted above and extract the text inside them.
experiences=[]
for i in experience_tags:
    experiences.append(i.text)
experiences

['0-3 Yrs',
 '6-9 Yrs',
 '12-18 Yrs',
 '3-6 Yrs',
 '3-6 Yrs',
 '4-9 Yrs',
 '4-8 Yrs',
 '1-3 Yrs',
 '4-7 Yrs',
 '3-6 Yrs',
 '1-3 Yrs',
 '2-6 Yrs',
 '2-5 Yrs',
 '5-10 Yrs',
 '5-8 Yrs',
 '3-8 Yrs',
 '1-5 Yrs',
 '5-10 Yrs',
 '1-6 Yrs',
 '2-6 Yrs']

In [45]:
# Extract all the tags holding the salary
salary_tags = driver.find_elements_by_xpath("//li[@class='fleft grey-text br2 placeHolderLi salary']//span")
salary_tags

[<selenium.webdriver.remote.webelement.WebElement (session="e0e996cece0d216d34a136097b0f1ba1", element="aa9932af-706f-466c-8708-8701a19ea170")>,
 <selenium.webdriver.remote.webelement.WebElement (session="e0e996cece0d216d34a136097b0f1ba1", element="4b923bf9-ae73-444f-bbc6-4a07553cab68")>,
 <selenium.webdriver.remote.webelement.WebElement (session="e0e996cece0d216d34a136097b0f1ba1", element="3bf3c032-f350-4a2b-af76-abb780f1a1e1")>,
 <selenium.webdriver.remote.webelement.WebElement (session="e0e996cece0d216d34a136097b0f1ba1", element="62d58bf3-9468-4449-a714-a35b5c244282")>,
 <selenium.webdriver.remote.webelement.WebElement (session="e0e996cece0d216d34a136097b0f1ba1", element="0dbd5218-579d-49b9-84a6-b9947b872c23")>,
 <selenium.webdriver.remote.webelement.WebElement (session="e0e996cece0d216d34a136097b0f1ba1", element="c269172b-cdbb-4022-b7d9-499a09bf8eec")>,
 <selenium.webdriver.remote.webelement.WebElement (session="e0e996cece0d216d34a136097b0f1ba1", element="f4feb55f-bcfb-40e9-a88a-be

In [46]:
#Loop to iterate over the tags extracted above and extract the text inside them.
salaries=[]
for i in salary_tags:
    salaries.append(i.text)
salaries

['3,50,000 - 4,50,000 PA.',
 'Not disclosed',
 'Not disclosed',
 'Not disclosed',
 'Not disclosed',
 'Not disclosed',
 'Not disclosed',
 'Not disclosed',
 'Not disclosed',
 'Not disclosed',
 'Not disclosed',
 'Not disclosed',
 'Not disclosed',
 'Not disclosed',
 'Not disclosed',
 'Not disclosed',
 'Not disclosed',
 'Not disclosed',
 'Not disclosed',
 'Not disclosed']

In [47]:
# Extract all the tags holding the locations
location_tags = driver.find_elements_by_xpath("//li[@class='fleft grey-text br2 placeHolderLi location']//span")
location_tags

[<selenium.webdriver.remote.webelement.WebElement (session="e0e996cece0d216d34a136097b0f1ba1", element="9e27d9ce-75d6-4814-b256-835c462a60e0")>,
 <selenium.webdriver.remote.webelement.WebElement (session="e0e996cece0d216d34a136097b0f1ba1", element="82edd939-ed3c-4c0e-b6bd-9cf46bee4ca5")>,
 <selenium.webdriver.remote.webelement.WebElement (session="e0e996cece0d216d34a136097b0f1ba1", element="b57bb9df-f8d0-4cf9-a1aa-ef6df88c4f8c")>,
 <selenium.webdriver.remote.webelement.WebElement (session="e0e996cece0d216d34a136097b0f1ba1", element="b819cfef-0384-4dd5-80fe-59d1ab0a25a7")>,
 <selenium.webdriver.remote.webelement.WebElement (session="e0e996cece0d216d34a136097b0f1ba1", element="c603c6a9-a0d7-4a35-8137-585861966d08")>,
 <selenium.webdriver.remote.webelement.WebElement (session="e0e996cece0d216d34a136097b0f1ba1", element="dfda4f28-41fc-428c-b605-9340bc756bb9")>,
 <selenium.webdriver.remote.webelement.WebElement (session="e0e996cece0d216d34a136097b0f1ba1", element="3da0a90b-082f-4668-81ee-2d

In [48]:
#Loop to iterate over the tags extracted above and extract the text inside them.
locations=[]
for i in location_tags:
    locations.append(i.text)
locations

['Mumbai, Hyderabad/Secunderabad, Pune, Gurgaon/Gurugram, Chennai, Bangalore/Bengaluru',
 'Bangalore/Bengaluru',
 'Mumbai, Bangalore/Bengaluru',
 'Bangalore/Bengaluru',
 'Bangalore/Bengaluru',
 'Bangalore/Bengaluru',
 'Bangalore/Bengaluru',
 'Kolkata, Pune, Chennai, Bangalore/Bengaluru, Delhi / NCR, Mumbai (All Areas)',
 'Bangalore/Bengaluru',
 'Bangalore/Bengaluru',
 'Bangalore/Bengaluru',
 'Bangalore/Bengaluru',
 'Bangalore/Bengaluru',
 'Bangalore/Bengaluru',
 'Bangalore/Bengaluru',
 'Mumbai, Bangalore/Bengaluru',
 'Chennai, Bangalore/Bengaluru',
 'Bangalore/Bengaluru',
 'Bangalore/Bengaluru',
 'Bangalore/Bengaluru']

In [49]:
print(len(job_title),len(company_names),len(salaries),len(locations),len(experiences))

20 20 20 20 20


In [50]:
jobs=pd.DataFrame({})
jobs['title'] =job_title
jobs['company'] =company_names
jobs['experience'] = experiences
jobs['salary'] = salaries
jobs['location'] = locations

# Show only the first 10 jobs, as the task asks
jobs.head(10)

Unnamed: 0,title,company,experience,salary,location
0,Data Scientist / Data Analyst -Business Analyst,Inflexion Analytix Private Limited,0-3 Yrs,"3,50,000 - 4,50,000 PA.","Mumbai, Hyderabad/Secunderabad, Pune, Gurgaon/..."
1,Data Analyst - Informatica MDM,Shell India Markets Private Limited,6-9 Yrs,Not disclosed,Bangalore/Bengaluru
2,Assistant Vice President - MIS & Reporting ( B...,INTERTRUSTVITEOS CORPORATE AND FUND SERVICES P...,12-18 Yrs,Not disclosed,"Mumbai, Bangalore/Bengaluru"
3,Data Analyst,Myntra Designs Pvt. Ltd.,3-6 Yrs,Not disclosed,Bangalore/Bengaluru
4,Data Analyst,Myntra Designs Pvt. Ltd.,3-6 Yrs,Not disclosed,Bangalore/Bengaluru
5,Data Analyst,Myntra Designs Pvt. Ltd.,4-9 Yrs,Not disclosed,Bangalore/Bengaluru
6,Data Analyst,Myntra Designs Pvt. Ltd.,4-8 Yrs,Not disclosed,Bangalore/Bengaluru
7,Data Analyst,SA Tech Software (I) Pvt. Ltd.,1-3 Yrs,Not disclosed,"Kolkata, Pune, Chennai, Bangalore/Bengaluru, D..."
8,Data analyst - Google Analytics,H and M Hennes and Mauritz (P) Ltd.,4-7 Yrs,Not disclosed,Bangalore/Bengaluru
9,Senior/Regular Business Analyst / Data Analyst,Luxoft,3-6 Yrs,Not disclosed,Bangalore/Bengaluru
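
The full job description is not part of the result list, so it has to be read from each job's own page. One possible approach, sketched under assumptions: read the `href` of each title link first, then visit the pages one by one. `scrape_descriptions` is a hypothetical helper, and `description_xpath` must be supplied by the caller because the detail-page markup is not shown in this notebook.

```python
def scrape_descriptions(driver, title_links, description_xpath, limit=10):
    """Visit the first `limit` job pages and return their description text.

    title_links       -- title elements from the result page
    description_xpath -- XPath of the description block (caller-supplied)
    """
    # Collect the URLs before navigating: leaving the result page
    # invalidates the elements that were found on it.
    urls = [link.get_attribute("href") for link in title_links[:limit]]
    descriptions = []
    for url in urls:
        driver.get(url)
        found = driver.find_elements("xpath", description_xpath)
        descriptions.append(found[0].text if found else None)
    return descriptions
```

In the notebook this could be called with `titles_tags` as `title_links`, and the result added as a `description` column.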


Q3: In this question you have to scrape data using the filters available on the
webpage.
Use the location and salary filters.
Scrape data for the “Data Scientist” designation for the first 10 job results.
Scrape the job-title, job-location, company_name, and
experience_required.
The location filter to be used is “Delhi/NCR”.
The salary filter to be used is “3-6” lakhs.
The task is done in the following steps:
1. First get the webpage https://www.naukri.com/
2. Enter “Data Scientist” in the “Skill,Designations,Companies” field.
3. Then click the search button.
4. Then apply the location filter and salary filter by checking the respective boxes.
5. Then scrape the data for the first 10 job results you get.
6. Finally create a dataframe of the scraped data.
Note: All of the above steps have to be done in code. No step is to be done
manually.

In [122]:
# Import all the required libraries
import selenium
import pandas as pd
from selenium import webdriver

In [123]:
# Connect to the web driver (chromedriver.exe must be in the working directory)
driver = webdriver.Chrome('chromedriver.exe')

In [124]:
url = 'https://www.naukri.com/'
driver.get(url)

In [125]:

#finding elements for job search bar
search_job = driver.find_element_by_id('qsb-keyword-sugg')
search_job

<selenium.webdriver.remote.webelement.WebElement (session="11f7cdf24b721a18c02fe2684af14997", element="23143b8f-d1af-45be-9424-b830ff3c962f")>

In [126]:
#write on search bar
search_job.send_keys('Data Scientist')

In [127]:
#do click using xpath function
search_btn = driver.find_element_by_xpath("//button[@class='btn']")
search_btn.click()

In [128]:
# find and click the '3-6 Lakhs' salary filter
search_sal =  driver.find_element_by_xpath("//span[@title='3-6 Lakhs']")
search_sal.click()

In [129]:
# find and click the 'Delhi / NCR' location filter
search_loc = driver.find_element_by_xpath('//span[@title="Delhi / NCR"]')
search_loc.click()
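
Each filter click makes the page reload its results. In a notebook the pause between running cells usually hides this, but when all cells run in one go the next lookup can happen before the new results are in place. Selenium ships `WebDriverWait` (in `selenium.webdriver.support.ui`) for this purpose. The stand-alone helper below only illustrates the polling idea behind it and is not Selenium's implementation; `wait_until` is a hypothetical name.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call `condition` repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)
```

In this notebook it could wrap a lookup, for example `wait_until(lambda: driver.find_elements_by_xpath("//a[@class='title fw500 ellipsis']"))`, which returns the title tags once at least one is present.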

In [130]:
# Extract all the tags holding the job titles
titles_tags = driver.find_elements_by_xpath("//a[@class='title fw500 ellipsis']")
titles_tags

[<selenium.webdriver.remote.webelement.WebElement (session="11f7cdf24b721a18c02fe2684af14997", element="9e52c644-b142-4fb0-a416-d0cc9ed859df")>,
 <selenium.webdriver.remote.webelement.WebElement (session="11f7cdf24b721a18c02fe2684af14997", element="e0dde576-7c28-45d9-a32d-850d366b43bc")>,
 <selenium.webdriver.remote.webelement.WebElement (session="11f7cdf24b721a18c02fe2684af14997", element="910f4df3-2344-453f-8c02-37d399f065e7")>,
 <selenium.webdriver.remote.webelement.WebElement (session="11f7cdf24b721a18c02fe2684af14997", element="806b0401-39a2-423d-9e35-470fc088a633")>,
 <selenium.webdriver.remote.webelement.WebElement (session="11f7cdf24b721a18c02fe2684af14997", element="d9fb8cbe-b717-4079-91bf-002b23da7343")>,
 <selenium.webdriver.remote.webelement.WebElement (session="11f7cdf24b721a18c02fe2684af14997", element="ed4eaeed-aad0-4360-a053-414e18fb72d0")>,
 <selenium.webdriver.remote.webelement.WebElement (session="11f7cdf24b721a18c02fe2684af14997", element="591babfb-0bf5-44bc-af29-61

In [131]:
#Loop to iterate over the tags extracted above and extract the text inside them.
job_title=[]
for i in titles_tags:
    job_title.append(i.text)
job_title

['Data Scientist / Data Analyst -Business Analyst',
 'Business Analyst- Data Scientist',
 'Data Scientist - High growth VC backed Influencer Marketplace',
 'DATA Scientist – Gurgaon (Exp 3-6 years)',
 'DATA Scientist – Gurgaon (Exp 3-6 years)',
 'Data Scientist - Noida',
 'Data Scientist',
 'Data Scientist',
 'Data Scientist',
 'Senior Data Scientist II 5+ yrs II Gurgaon',
 'Senior Data Scientist',
 'Data Scientist',
 'Data Scientist Machine Learning',
 'Data Scientist',
 'Data Scientist',
 'Data Scientist',
 'Data Scientist',
 'Data Scientist',
 'Associate Data Scientist',
 'Associate Data Scientist']

In [132]:
# Extract all the tags holding the company names
company_tags = driver.find_elements_by_xpath("//a[@class='subTitle ellipsis fleft']")
company_tags

[<selenium.webdriver.remote.webelement.WebElement (session="11f7cdf24b721a18c02fe2684af14997", element="f9158caa-d91f-44a5-80a4-e0f04ae012ac")>,
 <selenium.webdriver.remote.webelement.WebElement (session="11f7cdf24b721a18c02fe2684af14997", element="8df3c938-55e1-4770-9a5f-5424c163eded")>,
 <selenium.webdriver.remote.webelement.WebElement (session="11f7cdf24b721a18c02fe2684af14997", element="54b2819c-caba-4d0b-b68c-8c9abd9d1b55")>,
 <selenium.webdriver.remote.webelement.WebElement (session="11f7cdf24b721a18c02fe2684af14997", element="0cf01c5d-637b-4aa1-8271-e4f71f8b9a34")>,
 <selenium.webdriver.remote.webelement.WebElement (session="11f7cdf24b721a18c02fe2684af14997", element="f96ee5eb-ebc4-415d-bce9-fe53c50c7e78")>,
 <selenium.webdriver.remote.webelement.WebElement (session="11f7cdf24b721a18c02fe2684af14997", element="95def420-4521-4aa0-aeeb-a7da984f629e")>,
 <selenium.webdriver.remote.webelement.WebElement (session="11f7cdf24b721a18c02fe2684af14997", element="05199e02-51cb-4e16-abc8-a3

In [133]:
#Loop to iterate over the tags extracted above and extract the text inside them.
company_name=[]
for i in company_tags:
    company_name.append(i.text)
company_name

['Inflexion Analytix Private Limited',
 'Wipro',
 'Ravgins International Pvt. Ltd.',
 'CRESCENDO GLOBAL LEADERSHIP HIRING INDIA PRIVATE L IMITED',
 'CRESCENDO GLOBAL LEADERSHIP HIRING INDIA PRIVATE L IMITED',
 'Optum Global Solutions (India) Private Limited',
 'IBM India Pvt. Limited',
 'Blitz Jobs',
 'Country Veggie',
 'Zenatix Solutions Private Limited',
 'iNICU',
 'BlackBuck',
 'Delhivery',
 'Sentieo',
 'Mahajan Imaging',
 'Mahajan Imaging',
 'Mobikwik',
 'itForte Staffing Services Private Ltd.',
 'Blow Trumpet Solutions',
 'Right Step Consulting']

In [134]:
# Extract all the tags holding the experience
experience_tags = driver.find_elements_by_xpath("//li[@class='fleft grey-text br2 placeHolderLi experience']//span")
experience_tags

[<selenium.webdriver.remote.webelement.WebElement (session="11f7cdf24b721a18c02fe2684af14997", element="3ffe256a-41fb-4cd3-b114-e74d67268676")>,
 <selenium.webdriver.remote.webelement.WebElement (session="11f7cdf24b721a18c02fe2684af14997", element="6da1ef59-d7f5-49fc-8e66-d541a01434fd")>,
 <selenium.webdriver.remote.webelement.WebElement (session="11f7cdf24b721a18c02fe2684af14997", element="e5d93e94-0d67-4559-b56e-722e62885e60")>,
 <selenium.webdriver.remote.webelement.WebElement (session="11f7cdf24b721a18c02fe2684af14997", element="fd1feaff-defc-4a2f-b715-46866e431625")>,
 <selenium.webdriver.remote.webelement.WebElement (session="11f7cdf24b721a18c02fe2684af14997", element="43b48313-90b7-4dc8-9953-3453566b37ac")>,
 <selenium.webdriver.remote.webelement.WebElement (session="11f7cdf24b721a18c02fe2684af14997", element="ca07c5d6-6bec-4148-a476-f5c5f284af2d")>,
 <selenium.webdriver.remote.webelement.WebElement (session="11f7cdf24b721a18c02fe2684af14997", element="658d5d6c-7cf9-44dd-b30f-03

In [135]:

#Loop to iterate over the tags extracted above and extract the text inside them.
experience=[]
for i in experience_tags:
    experience.append(i.text)
experience

['0-3 Yrs',
 '2-5 Yrs',
 '3-5 Yrs',
 '3-6 Yrs',
 '3-6 Yrs',
 '3-5 Yrs',
 '4-9 Yrs',
 '3-5 Yrs',
 '1-3 Yrs',
 '5-10 Yrs',
 '1-5 Yrs',
 '3-7 Yrs',
 '1-3 Yrs',
 '2-7 Yrs',
 '2-6 Yrs',
 '2-6 Yrs',
 '3-5 Yrs',
 '3-8 Yrs',
 '1-5 Yrs',
 '3-6 Yrs']
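
The experience values above are plain strings, so they cannot be sorted or filtered numerically as they stand. A minimal sketch of one way to split them into numbers, assuming every value follows the 'min-max Yrs' pattern seen in this output (the helper name parse_experience is made up for illustration):

```python
import re

def parse_experience(text):
    """Turn a string such as '0-3 Yrs' into a (min_years, max_years) tuple.

    Returns None when the text does not follow the 'min-max Yrs' pattern.
    """
    match = re.fullmatch(r"\s*(\d+)\s*-\s*(\d+)\s*Yrs?\s*", text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))

# A few values copied from the scraped list above
sample = ['0-3 Yrs', '2-5 Yrs', '5-10 Yrs']
print([parse_experience(s) for s in sample])
```

Returning None for unexpected text keeps the loop from crashing if the site shows something other than a range.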

In [136]:
#So lets extract all the tags having the salary
salary_tags = driver.find_elements_by_xpath("//li[@class='fleft grey-text br2 placeHolderLi salary']//span")
salary_tags

[<selenium.webdriver.remote.webelement.WebElement (session="11f7cdf24b721a18c02fe2684af14997", element="abd59f26-b5d8-4344-9598-480157d8fa9d")>,
 <selenium.webdriver.remote.webelement.WebElement (session="11f7cdf24b721a18c02fe2684af14997", element="01c147b6-6958-4cbe-a18f-4da88c96319f")>,
 <selenium.webdriver.remote.webelement.WebElement (session="11f7cdf24b721a18c02fe2684af14997", element="5487f60c-f7a4-4cb4-8c8b-ac2e4a92bd16")>,
 <selenium.webdriver.remote.webelement.WebElement (session="11f7cdf24b721a18c02fe2684af14997", element="a9ce635f-d701-4c37-a1f1-e09f87e3a341")>,
 <selenium.webdriver.remote.webelement.WebElement (session="11f7cdf24b721a18c02fe2684af14997", element="8914a785-be2b-4f68-bd70-6237669eebce")>,
 <selenium.webdriver.remote.webelement.WebElement (session="11f7cdf24b721a18c02fe2684af14997", element="930679d3-830a-4d67-b596-96317df29450")>,
 <selenium.webdriver.remote.webelement.WebElement (session="11f7cdf24b721a18c02fe2684af14997", element="dab96c8c-fbb8-471d-a308-97

In [137]:

#Loop to iterate over the tags extracted above and extract the text inside them.
salary=[]
for i in salary_tags:
    salary.append(i.text)
salary

['3,50,000 - 4,50,000 PA.',
 '3,50,000 - 6,50,000 PA.',
 '5,00,000 - 6,00,000 PA.',
 '5,00,000 - 9,00,000 PA.',
 '5,00,000 - 8,00,000 PA.',
 'Not disclosed',
 'Not disclosed',
 'Not disclosed',
 'Not disclosed',
 '5,00,000 - 14,00,000 PA.',
 'Not disclosed',
 'Not disclosed',
 'Not disclosed',
 'Not disclosed',
 'Not disclosed',
 'Not disclosed',
 'Not disclosed',
 'Not disclosed',
 'Not disclosed',
 'Not disclosed']
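
Most salaries come back as 'Not disclosed' and the rest as a range with Indian digit grouping. The following sketch shows one possible way to turn them into numbers, assuming the two formats seen above are the only ones that occur (parse_salary_range is a hypothetical helper, not part of the notebook):

```python
def parse_salary_range(text):
    """Convert '3,50,000 - 4,50,000 PA.' into (350000, 450000).

    Anything that is not a numeric range, for example 'Not disclosed',
    gives (None, None).
    """
    cleaned = text.replace("PA.", "").replace(",", "").strip()
    parts = [part.strip() for part in cleaned.split("-")]
    if len(parts) != 2 or not all(part.isdigit() for part in parts):
        return (None, None)
    return (int(parts[0]), int(parts[1]))

# Values copied from the scraped list above
print(parse_salary_range('3,50,000 - 4,50,000 PA.'))
print(parse_salary_range('Not disclosed'))
```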

In [138]:

#So let's extract all the tags having the location
location_tags = driver.find_elements_by_xpath("//li[@class='fleft grey-text br2 placeHolderLi location']//span")
location_tags

[<selenium.webdriver.remote.webelement.WebElement (session="11f7cdf24b721a18c02fe2684af14997", element="ba841129-cc1a-486b-9095-71cb94262203")>,
 <selenium.webdriver.remote.webelement.WebElement (session="11f7cdf24b721a18c02fe2684af14997", element="20a9e4cb-2c24-4b93-bad4-dba6c3abacb7")>,
 <selenium.webdriver.remote.webelement.WebElement (session="11f7cdf24b721a18c02fe2684af14997", element="10c1f325-6c76-4f05-8d68-ee2fea2d5b3d")>,
 <selenium.webdriver.remote.webelement.WebElement (session="11f7cdf24b721a18c02fe2684af14997", element="45811174-a3d5-44c3-a9ee-32a08995958e")>,
 <selenium.webdriver.remote.webelement.WebElement (session="11f7cdf24b721a18c02fe2684af14997", element="dd12e00d-c5dc-4ebf-bc76-2a9a62ecfc10")>,
 <selenium.webdriver.remote.webelement.WebElement (session="11f7cdf24b721a18c02fe2684af14997", element="021f9fbe-849a-4ee3-8320-cd4621e9b300")>,
 <selenium.webdriver.remote.webelement.WebElement (session="11f7cdf24b721a18c02fe2684af14997", element="50576b6c-f128-4281-b7d5-06

In [139]:

#Loop to iterate over the tags extracted above and extract the text inside them.
location=[]
for i in location_tags:
    location.append(i.text)
location

['Mumbai, Hyderabad/Secunderabad, Pune, Gurgaon/Gurugram, Chennai, Bangalore/Bengaluru',
 'Noida, Gurgaon/Gurugram',
 'Bangalore/Bengaluru, Delhi / NCR, Mumbai (All Areas)',
 'Gurgaon/Gurugram, Delhi / NCR',
 'Gurgaon/Gurugram, Delhi / NCR',
 'Noida',
 'Noida, Hyderabad/Secunderabad, Bangalore/Bengaluru',
 'Noida',
 'Bharuch, Jaipur, Bhopal, Mumbai, Jhansi, Nagpur, Ghaziabad, Jaunpur, Kanpur, New Delhi, Lucknow, Agra, Gurgaon/Gurugram, Rajkot, Bangalore/Bengaluru',
 'Gurgaon/Gurugram',
 'Delhi',
 'Gurgaon/Gurugram, Bangalore/Bengaluru',
 'Gurgaon/Gurugram',
 'Delhi',
 'New Delhi',
 'Delhi',
 'New Delhi, Gurgaon/Gurugram, Delhi / NCR',
 'Gurgaon',
 'New Delhi',
 'Noida']
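
Each location value can hold several cities separated by commas. A small sketch of how such a value might be split and checked for one city, assuming a comma is always the separator (both helper names are made up for illustration):

```python
def split_locations(text):
    """Split 'Noida, Gurgaon/Gurugram' into ['Noida', 'Gurgaon/Gurugram']."""
    return [city.strip() for city in text.split(",") if city.strip()]

def mentions_city(text, city):
    """True when any of the comma-separated parts contains the city name."""
    return any(city.lower() in part.lower() for part in split_locations(text))

# Values copied from the scraped list above
print(split_locations('Noida, Gurgaon/Gurugram'))
print(mentions_city('Gurgaon/Gurugram, Bangalore/Bengaluru', 'Bangalore'))
```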

In [141]:
#Build the dataframe from the scraped lists and keep only the first 10 jobs
jobs = pd.DataFrame({'title': job_title,
                     'company': company_name,
                     'experience': experience,
                     'salary': salary,
                     'location': location})
jobs = jobs.head(10)

jobs

Unnamed: 0,title,company,experience,salary,location
0,Data Scientist / Data Analyst -Business Analyst,Inflexion Analytix Private Limited,0-3 Yrs,"3,50,000 - 4,50,000 PA.","Mumbai, Hyderabad/Secunderabad, Pune, Gurgaon/..."
1,Business Analyst- Data Scientist,Wipro,2-5 Yrs,"3,50,000 - 6,50,000 PA.","Noida, Gurgaon/Gurugram"
2,Data Scientist - High growth VC backed Influen...,Ravgins International Pvt. Ltd.,3-5 Yrs,"5,00,000 - 6,00,000 PA.","Bangalore/Bengaluru, Delhi / NCR, Mumbai (All ..."
3,DATA Scientist – Gurgaon (Exp 3-6 years),CRESCENDO GLOBAL LEADERSHIP HIRING INDIA PRIVA...,3-6 Yrs,"5,00,000 - 9,00,000 PA.","Gurgaon/Gurugram, Delhi / NCR"
4,DATA Scientist – Gurgaon (Exp 3-6 years),CRESCENDO GLOBAL LEADERSHIP HIRING INDIA PRIVA...,3-6 Yrs,"5,00,000 - 8,00,000 PA.","Gurgaon/Gurugram, Delhi / NCR"
5,Data Scientist - Noida,Optum Global Solutions (India) Private Limited,3-5 Yrs,Not disclosed,Noida
6,Data Scientist,IBM India Pvt. Limited,4-9 Yrs,Not disclosed,"Noida, Hyderabad/Secunderabad, Bangalore/Benga..."
7,Data Scientist,Blitz Jobs,3-5 Yrs,Not disclosed,Noida
8,Data Scientist,Country Veggie,1-3 Yrs,Not disclosed,"Bharuch, Jaipur, Bhopal, Mumbai, Jhansi, Nagpu..."
9,Senior Data Scientist II 5+ yrs II Gurgaon,Zenatix Solutions Private Limited,5-10 Yrs,"5,00,000 - 14,00,000 PA.",Gurgaon/Gurugram


Q4: Write a python program to scrape data for the first 10 job results for the Data Scientist 
designation in Noida location. You have to scrape the company_name, the number of days 
ago the job was posted, and the rating of the company.
This task will be done in the following steps:
1. First get the webpage https://www.glassdoor.co.in/index.htm
2. Enter “Data Scientist” in the “Job Title,Keyword,Company” field and enter “Noida” 
in the “location” field.
3. Then click the search button. You will land on the search results page.
4. Then scrape the data for the first 10 job results you get on that page.
5. Finally create a dataframe of the scraped data.
Note- All of the above steps have to be done in code. No step is to be done 
manually.

In [25]:
#Lets now import all the required libraries
import selenium
import pandas as pd
from selenium import webdriver

In [26]:
#Lets first connect to the web driver
driver = webdriver.Chrome('chromedriver.exe')

In [27]:
url = 'https://www.glassdoor.co.in/'
driver.get(url)

In [44]:
#finding elements for job search bar
search_job = driver.find_element_by_id("sc.keyword")
search_job

<selenium.webdriver.remote.webelement.WebElement (session="d65939e301407b5aaf481f8568269d83", element="ea19aa8a-9d1d-409b-99a4-2f0c674350c0")>

In [45]:
#write on search bar
search_job.send_keys('Data Scientist')

In [46]:
# finding elements for job location bar
search_loc = driver.find_element_by_id('sc.location')
search_loc.send_keys('Noida')

In [47]:
#do click using xpath function
search_btn = driver.find_element_by_xpath("//button[@class='gd-ui-button ml-std col-auto SearchStyles__newSearchButton css-iixdfr']//span")
search_btn.click()

So now let's create empty lists. The scraped data will be stored in these lists.


In [56]:

#So lets extract all the tags having the company name
company_tags = driver.find_elements_by_xpath("//a[@class=' css-l2wjgv e1n63ojh0 jobLink']")
company_tags

[<selenium.webdriver.remote.webelement.WebElement (session="d65939e301407b5aaf481f8568269d83", element="df16cc28-94c7-4354-823f-4c84f3beb968")>,
 <selenium.webdriver.remote.webelement.WebElement (session="d65939e301407b5aaf481f8568269d83", element="6887aab5-53c9-40c3-a478-299616779fc5")>,
 <selenium.webdriver.remote.webelement.WebElement (session="d65939e301407b5aaf481f8568269d83", element="000728e0-3637-47a9-b90a-5c0b7eaee737")>,
 <selenium.webdriver.remote.webelement.WebElement (session="d65939e301407b5aaf481f8568269d83", element="69f53a33-2e84-4df1-b04d-848b2112c111")>,
 <selenium.webdriver.remote.webelement.WebElement (session="d65939e301407b5aaf481f8568269d83", element="c4829080-7f35-4d9c-821f-cbadd0a39157")>,
 <selenium.webdriver.remote.webelement.WebElement (session="d65939e301407b5aaf481f8568269d83", element="054442e9-b70b-4f5c-833a-b81e1f5f9920")>,
 <selenium.webdriver.remote.webelement.WebElement (session="d65939e301407b5aaf481f8568269d83", element="8cfc1687-c6ec-490d-9551-f7

In [57]:

#Loop to iterate over the tags extracted above and extract the text inside them.
company_name=[]
for i in company_tags:
    company_name.append(i.text)
company_name

['Ericsson',
 'Mansha Solutions',
 'Lantern Digital Services',
 'Boston Consulting Group',
 'Sparkbpl',
 'Priority Vendor',
 'Biz2Credit Inc',
 'MasterCard',
 'Gauge Data Solutions',
 'Techlive',
 'Skyjobs hr services',
 'SearchUrCollege',
 'xtLytics',
 'Siemens Technology and Services Private Limited',
 'WishFin',
 'Webhelp',
 'Khan Academy',
 'Team Computers',
 'Ank Aha',
 'Fitfyles',
 'Mahajan Imaging',
 'Maruti Suzuki India Ltd',
 'Profisor Services',
 'Sentieo',
 'Analytics Vidhya',
 'Axslogic',
 'ESRI, Inc.',
 'Nihilent',
 'Saffron Consultancy Services',
 'Futures and Careers']

In [59]:

#So let's extract all the tags showing how many days ago each job was posted
job_tags = driver.find_elements_by_xpath("//div[@class='d-flex align-items-end pl-std css-mi55ob']")
job_tags

[<selenium.webdriver.remote.webelement.WebElement (session="d65939e301407b5aaf481f8568269d83", element="ff8478a8-fd9b-4f0d-9fb8-7d24c443cc2f")>,
 <selenium.webdriver.remote.webelement.WebElement (session="d65939e301407b5aaf481f8568269d83", element="88bc75d6-d8fd-4cfd-90ac-c5242cfb4f13")>,
 <selenium.webdriver.remote.webelement.WebElement (session="d65939e301407b5aaf481f8568269d83", element="12c36678-9613-4048-8b45-483eedd98708")>,
 <selenium.webdriver.remote.webelement.WebElement (session="d65939e301407b5aaf481f8568269d83", element="0c14b8a9-71ed-4266-aa40-ce7942c9773a")>,
 <selenium.webdriver.remote.webelement.WebElement (session="d65939e301407b5aaf481f8568269d83", element="feeeb23c-60ff-4d60-b2eb-15438194313e")>,
 <selenium.webdriver.remote.webelement.WebElement (session="d65939e301407b5aaf481f8568269d83", element="3c85ccef-ea8a-4ae1-9f73-841b9375b5c0")>,
 <selenium.webdriver.remote.webelement.WebElement (session="d65939e301407b5aaf481f8568269d83", element="77cbcb12-befd-4e5d-a59f-a3

In [61]:

#Loop to iterate over the tags extracted above and extract the text inside them.
job_posted=[]
for i in job_tags:
    job_posted.append(i.text)
job_posted

['16d',
 '11d',
 '9d',
 '2d',
 '16d',
 '',
 '24d',
 '24d',
 '15d',
 '22d',
 '8d',
 '15d',
 '2d',
 '3d',
 '26d',
 '9d',
 '1d',
 '3d',
 '15d',
 '15d',
 '15d',
 '24h',
 '15d',
 '17d',
 '15d',
 '13d',
 '24d',
 '24d',
 '15d',
 '27d']
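
The posting ages above mix days ('16d'), hours ('24h') and an empty string. A hedged sketch of how they could be normalised to a number of days, assuming 'd' stands for days and 'h' for hours (an interpretation of the site's labels, not something the page confirms; posted_days_ago is a made-up name):

```python
import re

def posted_days_ago(text):
    """Convert '16d' to 16 and '24h' to 1.0; return None for anything else."""
    match = re.fullmatch(r"(\d+)\s*([dh])", text.strip())
    if match is None:
        return None
    value, unit = int(match.group(1)), match.group(2)
    if unit == "d":
        return value
    # Assumed: 'h' means hours, so express it as a fraction of a day
    return value / 24

# Values copied from the scraped list above
print([posted_days_ago(s) for s in ['16d', '24h', '']])
```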

In [62]:

#So lets extract all the ratings
rating_tags = driver.find_elements_by_xpath("//span[@class='css-19pjha7 e1cjmv6j1']")
rating_tags

[<selenium.webdriver.remote.webelement.WebElement (session="d65939e301407b5aaf481f8568269d83", element="4b13fac1-8e8d-4e8a-8055-929412242cf0")>,
 <selenium.webdriver.remote.webelement.WebElement (session="d65939e301407b5aaf481f8568269d83", element="b78006c0-8ffb-4a38-9452-7dd032b4cd3b")>,
 <selenium.webdriver.remote.webelement.WebElement (session="d65939e301407b5aaf481f8568269d83", element="fc404d00-f989-45c8-863b-964d73505aa4")>,
 <selenium.webdriver.remote.webelement.WebElement (session="d65939e301407b5aaf481f8568269d83", element="82bbb7d0-c876-4304-ae0a-dc0e7fd273a9")>,
 <selenium.webdriver.remote.webelement.WebElement (session="d65939e301407b5aaf481f8568269d83", element="137558e5-c6be-44dd-bea4-b2583b492a09")>,
 <selenium.webdriver.remote.webelement.WebElement (session="d65939e301407b5aaf481f8568269d83", element="201c9f72-bc80-443f-bce2-2c6e60861c88")>,
 <selenium.webdriver.remote.webelement.WebElement (session="d65939e301407b5aaf481f8568269d83", element="5a40e043-1ea2-4b6a-923c-8f

In [63]:
#Loop to iterate over the tags extracted above and extract the text inside them.
ratings=[]
for i in rating_tags:
    ratings.append(i.text)
ratings

['4.1',
 '4.1',
 '3.7',
 '3.8',
 '3.1',
 '5.0',
 '4.1',
 '3.8',
 '4.1',
 '4.0',
 '4.0',
 '4.5',
 '3.7',
 '4.3',
 '3.6']

In [65]:
print(len(company_name),len(ratings),len(job_posted))


30 15 30
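
The lengths printed above (30, 15, 30) show that only some job cards carry a rating, so the three lists cannot be placed side by side in a dataframe without misaligning rows. One possible way around this is to loop over the job cards and look up each field inside its own card, storing None when a field is missing. The sketch below uses stand-in objects and made-up company names so that it runs without a browser; FakeElement, first_text, scrape_cards and the finder functions are all hypothetical names.

```python
def first_text(card, find_all):
    """Return the text of the first element find_all(card) yields, else None."""
    found = find_all(card)
    return found[0].text if found else None

def scrape_cards(cards, finders, limit=10):
    """Build one dict per card; a field missing on a card becomes None."""
    rows = []
    for card in cards[:limit]:
        rows.append({name: first_text(card, find_all)
                     for name, find_all in finders.items()})
    return rows

class FakeElement:
    """Stand-in for a Selenium WebElement (illustration only)."""
    def __init__(self, text, children=None):
        self.text = text
        self.children = children or {}

    def find(self, key):
        return self.children.get(key, [])

# Made-up data: the second card has no rating
cards = [
    FakeElement("", {"company": [FakeElement("Company A")],
                     "rating": [FakeElement("4.1")]}),
    FakeElement("", {"company": [FakeElement("Company B")]}),
]
finders = {
    "company": lambda card: card.find("company"),
    "rating": lambda card: card.find("rating"),
}
rows = scrape_cards(cards, finders)
print(rows)
```

With the real driver, each finder would presumably wrap a call such as card.find_elements_by_xpath(...) with an XPath that starts with a dot so that it searches inside the card only, and pd.DataFrame(rows) would then give the dataframe asked for in step 5.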


Problem Statement:
Q5: Write a python program to scrape the salary data for Data Scientist designation 
in Noida location.
You have to scrape Company name, Number of salaries, Average salary, Min
salary, Max Salary.
The above task will be done in the following steps:
1. First get the webpage https://www.glassdoor.co.in/Salaries/index.htm
2. Enter “Data Scientist” in Job title field and “Noida” in location field.
3. Click the search button.
4. After that you will land on the salary results page.
You have to scrape the data from this webpage.
5. Scrape data for the first 10 companies. Scrape the min salary, max salary, company 
name, average salary and rating of the company.
6. Store the data in a dataframe.
Note that all of the above steps have to be done by coding only and not manually.

In [66]:
#Lets now import all the required libraries
import selenium
import pandas as pd
from selenium import webdriver

In [67]:
#Lets first connect to the web driver
driver = webdriver.Chrome('chromedriver.exe')

In [68]:
url = 'https://www.glassdoor.co.in/Salaries/'
driver.get(url)

In [69]:
#finding elements for job search bar
search_job = driver.find_element_by_id('KeywordSearch')
search_job

<selenium.webdriver.remote.webelement.WebElement (session="11d343cd2e632f17da1c555568cc50b6", element="d9fef89b-19f8-4986-832a-81fc95dd35fe")>

In [70]:

#write on search bar
search_job.send_keys('Data Scientist')


In [72]:
# finding elements for job location bar
search_loc = driver.find_element_by_id('LocationSearch')
search_loc.send_keys('Noida')

In [73]:
#do click using xpath function
search_btn = driver.find_element_by_xpath("//button[@class='gd-btn-mkt']")
search_btn.click()

So now let's create some empty lists. The scraped data will be stored in these lists.

In [79]:
#So let's extract the result blocks that contain the company name and the number of salaries
company_tags = driver.find_elements_by_xpath("//div[@class='d-flex']")
company_tags

[<selenium.webdriver.remote.webelement.WebElement (session="11d343cd2e632f17da1c555568cc50b6", element="d4e3e9a7-1c5c-4a7d-bbcf-2609094d17bb")>,
 <selenium.webdriver.remote.webelement.WebElement (session="11d343cd2e632f17da1c555568cc50b6", element="ec881cd7-8647-4d72-b67c-9942a3a9c148")>,
 <selenium.webdriver.remote.webelement.WebElement (session="11d343cd2e632f17da1c555568cc50b6", element="8483d7ea-6848-4c1e-a3e4-691f4dbd0397")>,
 <selenium.webdriver.remote.webelement.WebElement (session="11d343cd2e632f17da1c555568cc50b6", element="333c7c14-63c3-4091-b90f-bdc375668539")>,
 <selenium.webdriver.remote.webelement.WebElement (session="11d343cd2e632f17da1c555568cc50b6", element="560a1f56-188e-4115-b34c-ca1db268d675")>,
 <selenium.webdriver.remote.webelement.WebElement (session="11d343cd2e632f17da1c555568cc50b6", element="fde7d79d-bf1f-49f7-bc7d-d2b5456d70fc")>,
 <selenium.webdriver.remote.webelement.WebElement (session="11d343cd2e632f17da1c555568cc50b6", element="0a0655c0-e12c-4b36-839e-22

In [80]:

#Loop to iterate over the tags extracted above and extract the text inside them.
company_name=[]
for i in company_tags:
    company_name.append(i.text)
company_name

['Data Scientist\nTata Consultancy Services\n17 salaries\nSee 130 salaries from all locations',
 'Data Scientist\nAccenture\n14 salaries\nSee 61 salaries from all locations',
 'Data Scientist\nIBM\n14 salaries\nSee 116 salaries from all locations',
 'Data Scientist\nEricsson-Worldwide\n14 salaries\nSee 30 salaries from all locations',
 'Data Scientist\nDelhivery\n14 salaries\nSee 18 salaries from all locations',
 'Data Scientist\nUnitedHealth Group\n11 salaries\nSee 18 salaries from all locations',
 'Data Scientist\nValiance Solutions\n9 salaries\nSee 11 salaries from all locations',
 'Data Scientist\nZS Associates\n8 salaries\nSee 21 salaries from all locations',
 'Data Scientist\nEXL Service\n8 salaries\nSee 10 salaries from all locations',
 'Optum Global Solutions\nData Scientist\nOptum Global Solutions\n8 salaries\nSee 12 salaries from all locations',
 'Data Scientist\nInnovaccer\n8 salaries\nSee 12 salaries from all locations',
 'Data Scientist\nCognizant Technology Solutions\n6 s
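
Each entry above is a multi-line block rather than a bare company name, and the task also asks for the number of salaries. A minimal sketch of how both could be pulled out of one entry, assuming the company name always sits on the line just before the 'N salaries' line, which holds for the entries shown here, including the Optum one with its extra first line (parse_company_cell is a made-up helper):

```python
import re

def parse_company_cell(text):
    """Return (company_name, number_of_salaries) from one scraped block.

    Looks for the line that reads exactly 'N salaries' and takes the line
    before it as the company name. Returns (None, None) if nothing matches.
    """
    lines = [line.strip() for line in text.split("\n")]
    for index, line in enumerate(lines):
        match = re.fullmatch(r"(\d+) salar(?:y|ies)", line)
        if match and index > 0:
            return lines[index - 1], int(match.group(1))
    return None, None

# Entries copied from the scraped list above
cell = 'Data Scientist\nTata Consultancy Services\n17 salaries\nSee 130 salaries from all locations'
print(parse_company_cell(cell))
```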

In [81]:
#So let's extract all the tags having the average salary
avgsal_tags = driver.find_elements_by_xpath("//div[@class='col-2 d-none d-md-flex flex-row justify-content-end']//strong")
avgsal_tags

[<selenium.webdriver.remote.webelement.WebElement (session="11d343cd2e632f17da1c555568cc50b6", element="1eccb1c8-6c2e-4252-8d9c-7a3f4a142668")>,
 <selenium.webdriver.remote.webelement.WebElement (session="11d343cd2e632f17da1c555568cc50b6", element="3fb1ce7f-5917-479d-a009-bcba1da70ea7")>,
 <selenium.webdriver.remote.webelement.WebElement (session="11d343cd2e632f17da1c555568cc50b6", element="658dd5c7-4ded-4e87-b6da-a88581ad1473")>,
 <selenium.webdriver.remote.webelement.WebElement (session="11d343cd2e632f17da1c555568cc50b6", element="c9576c23-bf4a-4880-b956-3966909e6362")>,
 <selenium.webdriver.remote.webelement.WebElement (session="11d343cd2e632f17da1c555568cc50b6", element="9e51b717-d5bf-4185-aab7-30ae33377440")>,
 <selenium.webdriver.remote.webelement.WebElement (session="11d343cd2e632f17da1c555568cc50b6", element="3b60093f-d934-41e2-b90e-207ca1a46059")>,
 <selenium.webdriver.remote.webelement.WebElement (session="11d343cd2e632f17da1c555568cc50b6", element="405904ea-ef4e-4c15-bf0f-e0

In [82]:

#Loop to iterate over the tags extracted above and extract the text inside them.
avg_salary=[]
for i in avgsal_tags:
    avg_salary.append(i.text)
avg_salary

['₹ 6,14,306',
 '₹ 11,46,533',
 '₹ 8,97,795',
 '₹ 7,38,057',
 '₹ 12,39,781',
 '₹ 13,36,142',
 '₹ 8,15,192',
 '₹ 11,35,221',
 '₹ 11,44,243',
 '₹ 14,13,288',
 '₹ 12,07,110',
 '₹ 10,07,410',
 '₹ 13,18,851',
 '₹ 10,52,718',
 '₹ 14,13,078',
 '₹ 9,61,653',
 '₹ 11,00,000',
 '₹ 15,887',
 '₹ 6,75,000',
 '₹ 15,00,000']
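
The average salaries are strings with a rupee sign and Indian digit grouping. A short sketch of one way to turn them into integers, assuming the digits in the string always make up a single amount (rupees_to_int is a hypothetical name):

```python
def rupees_to_int(text):
    """Convert '₹ 6,14,306' into 614306; return None if there are no digits."""
    digits = "".join(ch for ch in text if ch in "0123456789")
    return int(digits) if digits else None

# Values copied from the scraped list above
print([rupees_to_int(s) for s in ['₹ 6,14,306', '₹ 15,887']])
```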

In [83]:
#So let's extract all the tags having the min and max salary
minmax_tags = driver.find_elements_by_xpath("//div[@class='common__RangeBarStyle__values d-flex justify-content-between ']//span")
minmax_tags

[<selenium.webdriver.remote.webelement.WebElement (session="11d343cd2e632f17da1c555568cc50b6", element="34f9f19f-a6ff-4dbc-a27d-7d7f94a26134")>,
 <selenium.webdriver.remote.webelement.WebElement (session="11d343cd2e632f17da1c555568cc50b6", element="74cef0b8-f5d7-4ea5-897e-754d7f5d56e8")>,
 <selenium.webdriver.remote.webelement.WebElement (session="11d343cd2e632f17da1c555568cc50b6", element="450dd6b9-b528-41b7-a961-22892c40a495")>,
 <selenium.webdriver.remote.webelement.WebElement (session="11d343cd2e632f17da1c555568cc50b6", element="199d6dbe-ae6a-4702-9949-5f7a250c7105")>,
 <selenium.webdriver.remote.webelement.WebElement (session="11d343cd2e632f17da1c555568cc50b6", element="1c2c6812-ef25-4f5d-a000-d4497c517d5a")>,
 <selenium.webdriver.remote.webelement.WebElement (session="11d343cd2e632f17da1c555568cc50b6", element="89ed82c9-8fae-491f-97d0-4a6a97c35070")>,
 <selenium.webdriver.remote.webelement.WebElement (session="11d343cd2e632f17da1c555568cc50b6", element="e2605490-a558-4b2a-945b-4f

In [92]:
#Loop to iterate over the tags extracted above and extract the text inside them.
minmax_sal=[]
for i in minmax_tags:
    minmax_sal.append(i.text)
minmax_sal

['₹343K',
 '₹1,250K',
 '₹577K',
 '₹2,213K',
 '₹586K',
 '₹2,730K',
 '₹355K',
 '₹1,613K',
 '₹450K',
 '₹11,622K',
 '₹1,069K',
 '₹1,520K',
 '₹502K',
 '₹1,465K',
 '₹202K',
 '₹1,809K',
 '₹575K',
 '₹1,520K',
 '₹1,014K',
 '₹2,149K',
 '₹620K',
 '₹1,695K',
 '₹792K',
 '₹1,263K',
 '₹807K',
 '₹1,986K',
 '₹405K',
 '₹1,703K',
 '₹971K',
 '₹1,883K',
 '₹400K',
 '₹1,124K',
 '₹817K',
 '₹2,006K',
 '₹12K',
 '₹63K',
 '₹86K',
 '₹1,358K',
 '₹772K',
 '₹2,091K']
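
The min and max values use a 'K' suffix, unlike the average salaries. A sketch of how they might be converted to plain rupee amounts, assuming 'K' means thousand (parse_k_amount is a made-up helper):

```python
def parse_k_amount(text):
    """Convert '₹1,250K' into 1250000 (assumes 'K' stands for thousand)."""
    cleaned = text.replace("₹", "").replace(",", "").strip()
    if cleaned.upper().endswith("K"):
        return int(cleaned[:-1]) * 1000
    return int(cleaned)

# Values copied from the scraped list above
print([parse_k_amount(s) for s in ['₹343K', '₹1,250K', '₹11,622K']])
```

Converting both columns to the same unit makes it possible to compare the average against the min and max of the same row.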

In [93]:
#using slicing: the even positions of minmax_sal hold the min salaries
min_salary = minmax_sal[0::2]
min_salary


['₹343K',
 '₹577K',
 '₹586K',
 '₹355K',
 '₹450K',
 '₹1,069K',
 '₹502K',
 '₹202K',
 '₹575K',
 '₹1,014K',
 '₹620K',
 '₹792K',
 '₹807K',
 '₹405K',
 '₹971K',
 '₹400K',
 '₹817K',
 '₹12K',
 '₹86K',
 '₹772K']

In [97]:
#using slicing: the odd positions of minmax_sal hold the max salaries
max_salary = minmax_sal[1::2]
max_salary


['₹1,250K',
 '₹2,213K',
 '₹2,730K',
 '₹1,613K',
 '₹11,622K',
 '₹1,520K',
 '₹1,465K',
 '₹1,809K',
 '₹1,520K',
 '₹2,149K',
 '₹1,695K',
 '₹1,263K',
 '₹1,986K',
 '₹1,703K',
 '₹1,883K',
 '₹1,124K',
 '₹2,006K',
 '₹63K',
 '₹1,358K',
 '₹2,091K']
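
The two slices rely on the values arriving strictly as min, max, min, max. A hedged sketch of a variant that pairs the values in one step and fails loudly when the count is odd, which would point to a row with a missing value (pair_min_max is a hypothetical helper):

```python
def pair_min_max(values):
    """Pair up a flat [min, max, min, max, ...] list into (min, max) tuples."""
    if len(values) % 2 != 0:
        raise ValueError("expected an even number of values")
    return list(zip(values[0::2], values[1::2]))

# First four values copied from minmax_sal above
sample = ['₹343K', '₹1,250K', '₹577K', '₹2,213K']
pairs = pair_min_max(sample)
print(pairs)
```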

In [98]:
print(len(company_name),len(avg_salary),len(min_salary),len(max_salary))

20 20 20 20


In [100]:
#Build the dataframe from the scraped lists and keep only the first 10 companies
jobs = pd.DataFrame({'Company': company_name,
                     'Avg Salary': avg_salary,
                     'Max salary': max_salary,
                     'Min Salary': min_salary})
jobs = jobs.head(10)

jobs

Unnamed: 0,Company,Avg Salary,Max salary,Min Salary
0,Data Scientist\nTata Consultancy Services\n17 ...,"₹ 6,14,306","₹1,250K",₹343K
1,Data Scientist\nAccenture\n14 salaries\nSee 61...,"₹ 11,46,533","₹2,213K",₹577K
2,Data Scientist\nIBM\n14 salaries\nSee 116 sala...,"₹ 8,97,795","₹2,730K",₹586K
3,Data Scientist\nEricsson-Worldwide\n14 salarie...,"₹ 7,38,057","₹1,613K",₹355K
4,Data Scientist\nDelhivery\n14 salaries\nSee 18...,"₹ 12,39,781","₹11,622K",₹450K
5,Data Scientist\nUnitedHealth Group\n11 salarie...,"₹ 13,36,142","₹1,520K","₹1,069K"
6,Data Scientist\nValiance Solutions\n9 salaries...,"₹ 8,15,192","₹1,465K",₹502K
7,Data Scientist\nZS Associates\n8 salaries\nSee...,"₹ 11,35,221","₹1,809K",₹202K
8,Data Scientist\nEXL Service\n8 salaries\nSee 1...,"₹ 11,44,243","₹1,520K",₹575K
9,Optum Global Solutions\nData Scientist\nOptum ...,"₹ 14,13,288","₹2,149K","₹1,014K"


Q6 : Scrape data of first 100 sunglasses listings on flipkart.com. You have to 
scrape four attributes:
1. Brand
2. Product Description
3. Price
4. Discount %
The attributes which you have to scrape are tick-marked in the below image.
To scrape the data you have to go through the following steps:
1. Go to the flipkart webpage by url https://www.flipkart.com/
2. Enter “sunglasses” in the search field where “search for products, brands and 
more” is written and click the search icon.
3. After that you will reach a webpage having a lot of sunglasses. From this page 
you can scrape the required data as usual.
4. After scraping data from the first page, go to the “Next” button at the bottom of 
the page, then click on it.
5. Now scrape data from this page as usual.
6. Repeat this until you get data for 100 sunglasses.
Note that all of the above steps have to be done by coding only and not manually.

In [87]:
#Lets now import all the required libraries
import selenium
import pandas as pd
from selenium import webdriver

In [88]:
#Lets first connect to the web driver
driver = webdriver.Chrome('chromedriver.exe')

In [89]:
url = 'https://www.flipkart.com/'
driver.get(url)


In [90]:
#finding the element for the product search bar
search_bar = driver.find_element_by_xpath("//div[@class='_3OO5Xc']//input")
search_bar

<selenium.webdriver.remote.webelement.WebElement (session="fa9b16f0ef8ba96f54b36e9bf2e65201", element="cc3afedb-21ea-4007-a52c-a99e8287a49f")>

In [91]:
#write on search bar
search_bar.send_keys('Sunglasses')

In [92]:
#do click using xpath function
search_btn = driver.find_element_by_xpath("//button[@class='L0Z3Pu']")
search_btn.click()

So now let's create 4 empty lists. The scraped data will be stored in these lists.

In [93]:

#So let's extract all the tags having the brand names
brand_tags = driver.find_elements_by_xpath("//div[@class='_2WkVRV']")
brand_tags

[<selenium.webdriver.remote.webelement.WebElement (session="fa9b16f0ef8ba96f54b36e9bf2e65201", element="d18dc9a5-ee4c-428c-8c3e-c567dde42f5c")>,
 <selenium.webdriver.remote.webelement.WebElement (session="fa9b16f0ef8ba96f54b36e9bf2e65201", element="7c0f5aec-3c2a-4d6e-bae7-2338ac96d225")>,
 <selenium.webdriver.remote.webelement.WebElement (session="fa9b16f0ef8ba96f54b36e9bf2e65201", element="fd2e21a9-1f59-4a53-9263-81c8a1490e0a")>,
 <selenium.webdriver.remote.webelement.WebElement (session="fa9b16f0ef8ba96f54b36e9bf2e65201", element="949d0128-c2b7-47c7-b3e9-1dfa3aab818c")>,
 <selenium.webdriver.remote.webelement.WebElement (session="fa9b16f0ef8ba96f54b36e9bf2e65201", element="e616824c-4ca5-4dff-bc33-d95f745f956c")>,
 <selenium.webdriver.remote.webelement.WebElement (session="fa9b16f0ef8ba96f54b36e9bf2e65201", element="a9bdfd6a-3d6b-426f-9b30-2038d825db8d")>,
 <selenium.webdriver.remote.webelement.WebElement (session="fa9b16f0ef8ba96f54b36e9bf2e65201", element="1367be4d-c37a-4784-a0cb-19

In [94]:
#Loop to iterate over the tags extracted above and extract the text inside them.
brand_name=[]
for i in brand_tags:
    brand_name.append(i.text)
brand_name

['SUNBEE',
 'HAMIW COLLECTION',
 'Fastrack',
 'ROZZETTA CRAFT',
 'PIRASO',
 'NuVew',
 'Singco India',
 'DEIXELS',
 'HIPPON',
 'PHENOMENAL',
 'GANSTA',
 'Silver Kartz',
 'Villain',
 'HAMIW COLLECTION',
 'ROYAL SON',
 'hipe',
 'Flizz',
 'AweStuffs',
 'AISLIN',
 'HAMIW COLLECTION',
 'elegante',
 'Ray-Ban',
 'Cristiano Ronnie',
 'HAMIW COLLECTION',
 'Crackers',
 'FOSSIL',
 'Fravy',
 'Singco',
 'Wrogn',
 'ROYAL SON',
 'PETER JONES',
 'SRPM',
 'IDEE',
 'ROYAL SON',
 'Elligator',
 'Fastrack',
 'ROZZETTA CRAFT',
 'PIRASO',
 'GANSTA',
 'hipe']

In [95]:
#So let's extract all the tags having the product descriptions
prod_tags = driver.find_elements_by_xpath("//a[@class='IRpwTa']")
prod_tags

[<selenium.webdriver.remote.webelement.WebElement (session="fa9b16f0ef8ba96f54b36e9bf2e65201", element="e3cf5237-2f6d-4187-bf18-cee30c7ade91")>,
 <selenium.webdriver.remote.webelement.WebElement (session="fa9b16f0ef8ba96f54b36e9bf2e65201", element="c5ef766b-be3b-4950-b8a4-3f70d92793e2")>,
 <selenium.webdriver.remote.webelement.WebElement (session="fa9b16f0ef8ba96f54b36e9bf2e65201", element="92b5f0b0-a077-4862-9c94-10708b6ff5d8")>,
 <selenium.webdriver.remote.webelement.WebElement (session="fa9b16f0ef8ba96f54b36e9bf2e65201", element="3def239d-518d-43a0-b143-05e6d0d4e687")>,
 <selenium.webdriver.remote.webelement.WebElement (session="fa9b16f0ef8ba96f54b36e9bf2e65201", element="5e0d9f4d-0400-4b94-a6ef-a038ebc0f868")>,
 <selenium.webdriver.remote.webelement.WebElement (session="fa9b16f0ef8ba96f54b36e9bf2e65201", element="6dbec322-3e04-47ab-adff-5d4821c02215")>,
 <selenium.webdriver.remote.webelement.WebElement (session="fa9b16f0ef8ba96f54b36e9bf2e65201", element="b4ae7fbf-ca69-4898-b814-26

In [96]:
#Loop to iterate over the tags extracted above and extract the text inside them.
prod_desc=[]
for i in prod_tags:
    prod_desc.append(i.text)
prod_desc

['UV Protection, Polarized, Mirrored Round Sunglasses (Fr...',
 'UV Protection Round Sunglasses (53)',
 'UV Protection Rectangular Sunglasses (Free Size)',
 'UV Protection Retro Square Sunglasses (Free Size)',
 'UV Protection Aviator Sunglasses (Free Size)',
 'UV Protection Aviator Sunglasses (58)',
 'Mirrored, Riding Glasses, Others Sports Sunglasses (50)',
 'UV Protection Aviator, Wayfarer Sunglasses (Free Size)',
 'UV Protection Wayfarer Sunglasses (55)',
 'UV Protection Retro Square Sunglasses (Free Size)',
 'UV Protection, Mirrored Wayfarer Sunglasses (53)',
 'UV Protection Oval Sunglasses (56)',
 'Others Retro Square Sunglasses (Free Size)',
 'UV Protection Round Sunglasses (Free Size)',
 'UV Protection Rectangular Sunglasses (58)',
 'UV Protection, Mirrored Round Sunglasses (Free Size)',
 'UV Protection Retro Square Sunglasses (Free Size)',
 'UV Protection Retro Square Sunglasses (Free Size)',
 'UV Protection Oval Sunglasses (60)',
 'UV Protection Wayfarer, Sports, Shield, Recta

In [46]:
#So let's extract all the tags having the prices
price_tags = driver.find_elements_by_xpath("//div[@class='_30jeq3']")
price_tags

[<selenium.webdriver.remote.webelement.WebElement (session="592ff384e38d3b2b95e9d5117e87e3ea", element="f918a5f3-e288-4ef7-9ef6-48b78c661516")>,
 <selenium.webdriver.remote.webelement.WebElement (session="592ff384e38d3b2b95e9d5117e87e3ea", element="3d8543c5-9f31-4989-abd3-5df390e14b17")>,
 <selenium.webdriver.remote.webelement.WebElement (session="592ff384e38d3b2b95e9d5117e87e3ea", element="6c7878de-9ca5-4ee0-892b-6d44d10c2142")>,
 <selenium.webdriver.remote.webelement.WebElement (session="592ff384e38d3b2b95e9d5117e87e3ea", element="c4887f89-6d9f-4718-afcd-a59a0bc114ab")>,
 <selenium.webdriver.remote.webelement.WebElement (session="592ff384e38d3b2b95e9d5117e87e3ea", element="2f488713-46da-4637-a407-d513a7dafc0c")>,
 <selenium.webdriver.remote.webelement.WebElement (session="592ff384e38d3b2b95e9d5117e87e3ea", element="1533df73-bfab-4a42-b014-60dd8d2d686d")>,
 <selenium.webdriver.remote.webelement.WebElement (session="592ff384e38d3b2b95e9d5117e87e3ea", element="074aefba-f260-4684-aea0-a7

In [47]:
#Loop to iterate over the tags extracted above and extract the text inside them.
price=[]
for i in price_tags:
    price.append(i.text)
price

['₹210',
 '₹395',
 '₹570',
 '₹499',
 '₹349',
 '₹236',
 '₹215',
 '₹213',
 '₹251',
 '₹399',
 '₹199',
 '₹299',
 '₹599',
 '₹395',
 '₹219',
 '₹474',
 '₹403',
 '₹399',
 '₹449',
 '₹399',
 '₹764',
 '₹7,281',
 '₹584',
 '₹664',
 '₹1,214',
 '₹170',
 '₹2,123',
 '₹289',
 '₹327',
 '₹995',
 '₹369',
 '₹189',
 '₹1,181',
 '₹259',
 '₹279',
 '₹733',
 '₹404',
 '₹250',
 '₹281',
 '₹419']

In [48]:
#So let's extract all the tags having the discounts
discount_tags = driver.find_elements_by_xpath("//div[@class='_3Ay6Sb']//span")
discount_tags

[<selenium.webdriver.remote.webelement.WebElement (session="592ff384e38d3b2b95e9d5117e87e3ea", element="06a431ca-e7f7-40f9-a012-a1f865a3dd41")>,
 <selenium.webdriver.remote.webelement.WebElement (session="592ff384e38d3b2b95e9d5117e87e3ea", element="abc6d888-3cba-4617-8c4e-560c64c86b87")>,
 <selenium.webdriver.remote.webelement.WebElement (session="592ff384e38d3b2b95e9d5117e87e3ea", element="e957b725-47ab-4a29-a828-65160a6faf59")>,
 <selenium.webdriver.remote.webelement.WebElement (session="592ff384e38d3b2b95e9d5117e87e3ea", element="528d95d5-9be4-401b-a644-eb42cce8cb99")>,
 <selenium.webdriver.remote.webelement.WebElement (session="592ff384e38d3b2b95e9d5117e87e3ea", element="b3ac6de8-dfc4-4202-9eff-321c9022fb70")>,
 <selenium.webdriver.remote.webelement.WebElement (session="592ff384e38d3b2b95e9d5117e87e3ea", element="0c1326a5-8693-4739-afa3-18944aae367d")>,
 <selenium.webdriver.remote.webelement.WebElement (session="592ff384e38d3b2b95e9d5117e87e3ea", element="84293ab5-bced-4443-92e3-3b

In [49]:
#Loop to iterate over the tags extracted above and extract the text inside them.
discount=[]
for i in discount_tags:
    discount.append(i.text)
discount

['85% off',
 '77% off',
 '28% off',
 '77% off',
 '78% off',
 '84% off',
 '72% off',
 '82% off',
 '79% off',
 '80% off',
 '80% off',
 '75% off',
 '20% off',
 '74% off',
 '86% off',
 '68% off',
 '59% off',
 '81% off',
 '70% off',
 '73% off',
 '69% off',
 '10% off',
 '70% off',
 '66% off',
 '62% off',
 '82% off',
 '51% off',
 '80% off',
 '83% off',
 '73% off',
 '69% off',
 '62% off',
 '58% off',
 '67% off',
 '88% off',
 '18% off',
 '79% off',
 '84% off',
 '85% off',
 '71% off']
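
Prices and discounts are scraped as text ('₹7,281', '85% off'). A minimal sketch of how they could be converted to integers so that the columns can be sorted or filtered, assuming the formats seen in the two lists above (parse_price and parse_discount are made-up helper names):

```python
import re

def parse_price(text):
    """Convert '₹7,281' into 7281."""
    return int(text.replace("₹", "").replace(",", "").strip())

def parse_discount(text):
    """Convert '85% off' into 85; return None if no percentage is present."""
    match = re.search(r"(\d+)\s*%", text)
    return int(match.group(1)) if match else None

# Values copied from the scraped lists above
print(parse_price('₹7,281'), parse_discount('85% off'))
```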

In [54]:
#locate the link to page 2 of the search results using xpath
search_btn = driver.find_element_by_xpath("//a[@href='/search?q=Sunglasses&otracker=search&otracker1=search&marketplace=FLIPKART&as-show=on&as=off&page=2']")
search_btn

<selenium.webdriver.remote.webelement.WebElement (session="592ff384e38d3b2b95e9d5117e87e3ea", element="aaf0f672-7d79-43ea-a1cb-98c1a65ba500")>

In [55]:
search_btn.click()
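
Steps 4 to 6 of the task ask for the scrape-and-click cycle to be repeated until 100 listings are collected, while the cells here handle page 2 by hand with a hard-coded link. The loop itself can be sketched as below. The two callables are hypothetical placeholders: with the real driver, scrape_page would wrap the find_elements_by_xpath calls used in this notebook, and go_to_next_page would locate and click the “Next” link and return False when there is none. Fake versions are used here so the sketch runs without a browser.

```python
def collect_listings(scrape_page, go_to_next_page, target=100, max_pages=10):
    """Gather rows page by page until `target` rows have been collected.

    scrape_page()      returns the list of rows found on the current page
    go_to_next_page()  moves to the next page, returns False if it cannot
    """
    rows = []
    for _ in range(max_pages):
        rows.extend(scrape_page())
        if len(rows) >= target:
            break
        if not go_to_next_page():
            break
    return rows[:target]

# Fake pages with 40 items each, mirroring the 40 results per page seen above
state = {"page": 1}

def fake_scrape():
    start = (state["page"] - 1) * 40
    return list(range(start, start + 40))

def fake_next():
    state["page"] += 1
    return True

items = collect_listings(fake_scrape, fake_next)
print(len(items), state["page"])
```

The max_pages limit is a safety stop so that the loop ends even if the site keeps offering a next page with no new results.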

In [59]:
#So let's extract all the tags having the brand names on page 2
brand_tags1 = driver.find_elements_by_xpath("//div[@class='_2WkVRV']")
brand_tags1

[<selenium.webdriver.remote.webelement.WebElement (session="592ff384e38d3b2b95e9d5117e87e3ea", element="c71e06ee-b9e0-424f-bae8-2fa17393acb7")>,
 <selenium.webdriver.remote.webelement.WebElement (session="592ff384e38d3b2b95e9d5117e87e3ea", element="85b47154-a1dd-4b38-9c71-a0585e5f453d")>,
 <selenium.webdriver.remote.webelement.WebElement (session="592ff384e38d3b2b95e9d5117e87e3ea", element="d35cc4c9-83d8-4907-aecb-8a59a7709a5d")>,
 <selenium.webdriver.remote.webelement.WebElement (session="592ff384e38d3b2b95e9d5117e87e3ea", element="32a0b2f0-e6d7-4a0c-b8b4-d9663a286cd9")>,
 <selenium.webdriver.remote.webelement.WebElement (session="592ff384e38d3b2b95e9d5117e87e3ea", element="1a8867ea-e501-4347-b2da-119137e9e859")>,
 <selenium.webdriver.remote.webelement.WebElement (session="592ff384e38d3b2b95e9d5117e87e3ea", element="3c4f5978-5a98-48c6-b1b1-2c2bd0a357a8")>,
 <selenium.webdriver.remote.webelement.WebElement (session="592ff384e38d3b2b95e9d5117e87e3ea", element="360a5002-39ca-4ad8-a0b2-08

In [61]:
#Loop to iterate over the tags extracted above and extract the text inside them.
brand_name1=[]
for i in brand_tags1:
    brand_name1.append(i.text)
brand_name1

['Silver Kartz',
 'GANSTA',
 'PHENOMENAL',
 'AISLIN',
 'DEIXELS',
 'ROYAL SON',
 'hipe',
 'elegante',
 'Wrogn',
 'AISLIN',
 'AISLIN',
 'Ray-Ban',
 'Fravy',
 'AISLIN',
 'NuVew',
 'Fastrack',
 'ROZZETTA CRAFT',
 'Singco India',
 'GANSTA',
 'AISLIN',
 'ROYAL SON',
 'PHENOMENAL',
 'DEIXELS',
 'AISLIN',
 'Silver Kartz',
 'hipe',
 'PIRASO',
 'Wrogn',
 'Ray-Ban',
 'AISLIN',
 'Fastrack',
 'ROZZETTA CRAFT',
 'GANSTA',
 'NuVew',
 'ROYAL SON',
 'PHENOMENAL',
 'Singco India',
 'DEIXELS',
 'hipe',
 'AISLIN']

In [63]:
#So let's extract all the tags having the product descriptions
prod_tags1 = driver.find_elements_by_xpath("//a[@class='IRpwTa']")
prod_tags1

[<selenium.webdriver.remote.webelement.WebElement (session="592ff384e38d3b2b95e9d5117e87e3ea", element="fbffd8a6-6ed2-469b-b7da-14a5596bf570")>,
 <selenium.webdriver.remote.webelement.WebElement (session="592ff384e38d3b2b95e9d5117e87e3ea", element="525f81a7-ab41-4417-bc20-e8598d6a0fc1")>,
 <selenium.webdriver.remote.webelement.WebElement (session="592ff384e38d3b2b95e9d5117e87e3ea", element="3a60d0e0-c97d-4e6f-b133-79f729982cc6")>,
 <selenium.webdriver.remote.webelement.WebElement (session="592ff384e38d3b2b95e9d5117e87e3ea", element="09ab747b-9c13-4426-8ade-9e1c4324a750")>,
 <selenium.webdriver.remote.webelement.WebElement (session="592ff384e38d3b2b95e9d5117e87e3ea", element="e14ef264-36d8-44cc-a48c-2d16962c0dd9")>,
 <selenium.webdriver.remote.webelement.WebElement (session="592ff384e38d3b2b95e9d5117e87e3ea", element="9b8037b4-5144-48a6-83ab-a3367a8b366d")>,
 <selenium.webdriver.remote.webelement.WebElement (session="592ff384e38d3b2b95e9d5117e87e3ea", element="ff87050d-bfb1-46a9-9e77-91

In [65]:
#Loop to iterate over the tags extracted above and extract the text inside them.
prod_desc1=[]
for i in prod_tags1:
    prod_desc1.append(i.text)
prod_desc1

['UV Protection Wayfarer Sunglasses (Free Size)',
 'UV Protection Aviator Sunglasses (57)',
 'UV Protection, Mirrored Retro Square Sunglasses (53)',
 'UV Protection Aviator, Wayfarer Sunglasses (60)',
 'UV Protection Round Sunglasses (Free Size)',
 'Mirrored Aviator Sunglasses (58)',
 'UV Protection, Mirrored Aviator Sunglasses (Free Size)',
 'UV Protection Wrap-around Sunglasses (Free Size)',
 'UV Protection Wayfarer Sunglasses (55)',
 'UV Protection Aviator, Wrap-around Sunglasses (60)',
 'UV Protection Rectangular, Cat-eye Sunglasses (58)',
 'Mirrored Aviator Sunglasses (63)',
 'UV Protection, Gradient, Night Vision Retro Square Sung...',
 'UV Protection, Gradient Wayfarer Sunglasses (57)',
 'UV Protection Cat-eye Sunglasses (60)',
 'UV Protection Aviator Sunglasses (Free Size)',
 'UV Protection Retro Square Sunglasses (Free Size)',
 'UV Protection Aviator Sunglasses (Free Size)',
 'UV Protection Aviator Sunglasses (57)',
 'Polarized, UV Protection Wayfarer, Rectangular Sunglass...'

In [66]:
#So let's extract all the tags having the prices
price_tags1 = driver.find_elements_by_xpath("//div[@class='_30jeq3']")
price_tags1

[<selenium.webdriver.remote.webelement.WebElement (session="592ff384e38d3b2b95e9d5117e87e3ea", element="b99f0197-6f66-4fb2-9c55-fcffd553233c")>,
 <selenium.webdriver.remote.webelement.WebElement (session="592ff384e38d3b2b95e9d5117e87e3ea", element="60559848-7e77-4cf8-9969-6ff0c348d069")>,
 <selenium.webdriver.remote.webelement.WebElement (session="592ff384e38d3b2b95e9d5117e87e3ea", element="f1ce2e3c-e9e8-4ea1-bb83-73f3d72a35f9")>,
 <selenium.webdriver.remote.webelement.WebElement (session="592ff384e38d3b2b95e9d5117e87e3ea", element="c8890b7a-d560-48b8-a86c-9805bf0f7f68")>,
 <selenium.webdriver.remote.webelement.WebElement (session="592ff384e38d3b2b95e9d5117e87e3ea", element="e99d038d-d029-4b3d-9ffb-979b75e90f1d")>,
 <selenium.webdriver.remote.webelement.WebElement (session="592ff384e38d3b2b95e9d5117e87e3ea", element="ead44f99-60ed-4f85-83a7-051f869bedb7")>,
 <selenium.webdriver.remote.webelement.WebElement (session="592ff384e38d3b2b95e9d5117e87e3ea", element="c0035468-22a4-45aa-bf36-1f

In [67]:
#Loop to iterate over the tags extracted above and extract the text inside them.
price1=[]
for i in price_tags1:
    price1.append(i.text)
price1

['₹259',
 '₹281',
 '₹399',
 '₹1,206',
 '₹209',
 '₹399',
 '₹199',
 '₹499',
 '₹875',
 '₹1,206',
 '₹679',
 '₹3,843',
 '₹329',
 '₹525',
 '₹419',
 '₹674',
 '₹499',
 '₹339',
 '₹337',
 '₹1,132',
 '₹217',
 '₹399',
 '₹236',
 '₹1,206',
 '₹269',
 '₹210',
 '₹250',
 '₹1,129',
 '₹10,341',
 '₹1,206',
 '₹621',
 '₹449',
 '₹352',
 '₹395',
 '₹265',
 '₹399',
 '₹314',
 '₹200',
 '₹210',
 '₹785']
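
The prices come back as text with a rupee sign and thousands separators ('₹1,206'). A possible way to turn them into integers is sketched below; `parse_rupees` is a hypothetical helper, and it assumes every value has the form shown in the output above.

```python
def parse_rupees(text):
    """Convert a price string such as '₹1,206' to an int number of rupees."""
    digits = text.replace("₹", "").replace(",", "").strip()
    return int(digits)

# Values taken from the output above
sample = ['₹259', '₹1,206', '₹10,341']
print([parse_rupees(s) for s in sample])
```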

In [68]:
#So let's extract all the tags having the discounts
discount_tags1 = driver.find_elements_by_xpath("//div[@class='_3Ay6Sb']")
discount_tags1

[<selenium.webdriver.remote.webelement.WebElement (session="592ff384e38d3b2b95e9d5117e87e3ea", element="67c8704a-56b0-49b3-97d4-3f35fc6334b0")>,
 <selenium.webdriver.remote.webelement.WebElement (session="592ff384e38d3b2b95e9d5117e87e3ea", element="991ebd2f-5773-46e4-8a62-54486e444ca6")>,
 <selenium.webdriver.remote.webelement.WebElement (session="592ff384e38d3b2b95e9d5117e87e3ea", element="f7076eca-abef-445c-8a69-21837d6c6d99")>,
 <selenium.webdriver.remote.webelement.WebElement (session="592ff384e38d3b2b95e9d5117e87e3ea", element="3f4e4ffc-b953-4d80-b612-75ee8e423a40")>,
 <selenium.webdriver.remote.webelement.WebElement (session="592ff384e38d3b2b95e9d5117e87e3ea", element="9d252458-f1b2-428c-a8e8-13f62ac11e17")>,
 <selenium.webdriver.remote.webelement.WebElement (session="592ff384e38d3b2b95e9d5117e87e3ea", element="1a95f5c7-5645-49b1-90fb-135e274b8700")>,
 <selenium.webdriver.remote.webelement.WebElement (session="592ff384e38d3b2b95e9d5117e87e3ea", element="f198d8e5-30c8-4236-8102-ec

In [69]:
#Loop to iterate over the tags extracted above and extract the text inside them.
discount1=[]
for i in discount_tags1:
    discount1.append(i.text)
discount1

['82% off',
 '85% off',
 '80% off',
 '71% off',
 '73% off',
 '73% off',
 '80% off',
 '66% off',
 '70% off',
 '71% off',
 '72% off',
 '30% off',
 '83% off',
 '80% off',
 '71% off',
 '25% off',
 '77% off',
 '71% off',
 '83% off',
 '71% off',
 '78% off',
 '80% off',
 '60% off',
 '71% off',
 '77% off',
 '83% off',
 '84% off',
 '65% off',
 '10% off',
 '71% off',
 '22% off',
 '79% off',
 '82% off',
 '75% off',
 '79% off',
 '80% off',
 '84% off',
 '59% off',
 '78% off',
 '71% off']

In [72]:
#do click using xpath function for 3rd page
search_btn = driver.find_element_by_xpath("//a[@href='/search?q=Sunglasses&otracker=search&otracker1=search&marketplace=FLIPKART&as-show=on&as=off&page=3']")
search_btn.click()

In [73]:
#So lets extract all the tags having the brand name
brand_tags2 = driver.find_elements_by_xpath("//div[@class='_2WkVRV']")
brand_tags2

[<selenium.webdriver.remote.webelement.WebElement (session="592ff384e38d3b2b95e9d5117e87e3ea", element="98833726-8988-4952-af4f-a59d531d3017")>,
 <selenium.webdriver.remote.webelement.WebElement (session="592ff384e38d3b2b95e9d5117e87e3ea", element="5b6cec33-1456-44bc-abef-4b986d3c1809")>,
 <selenium.webdriver.remote.webelement.WebElement (session="592ff384e38d3b2b95e9d5117e87e3ea", element="12fde2ec-df3c-4f65-b471-39e0f7e39f44")>,
 <selenium.webdriver.remote.webelement.WebElement (session="592ff384e38d3b2b95e9d5117e87e3ea", element="8987adc4-3dd0-4e75-b7ae-fe9f6bd88692")>,
 <selenium.webdriver.remote.webelement.WebElement (session="592ff384e38d3b2b95e9d5117e87e3ea", element="61015bf4-2240-41d5-9689-5b94228b6a77")>,
 <selenium.webdriver.remote.webelement.WebElement (session="592ff384e38d3b2b95e9d5117e87e3ea", element="c0c89f80-e110-4a34-91b0-1f4dcb8a2b59")>,
 <selenium.webdriver.remote.webelement.WebElement (session="592ff384e38d3b2b95e9d5117e87e3ea", element="3c0d8399-5be7-4ce4-a476-16

In [74]:
#Loop to iterate over the tags extracted above and extract the text inside them.
brand_name2=[]
for i in brand_tags2:
    brand_name2.append(i.text)
brand_name2

['Silver Kartz',
 'ROZZETTA CRAFT',
 'GANSTA',
 'SUNBEE',
 'ROYAL SON',
 'PHENOMENAL',
 'Singco India',
 'DEIXELS',
 'hipe',
 'HAMIW COLLECTION',
 'Silver Kartz',
 'Fastrack',
 'ROZZETTA CRAFT',
 'HAMIW COLLECTION',
 'ROYAL SON',
 'GANSTA',
 'PHENOMENAL',
 'Singco India',
 'Fastrack',
 'HAMIW COLLECTION',
 'ROYAL SON',
 'ROZZETTA CRAFT',
 'PHENOMENAL',
 'HAMIW COLLECTION',
 'GANSTA',
 'Fastrack',
 'ROYAL SON',
 'ROZZETTA CRAFT',
 'GANSTA',
 'ROYAL SON',
 'Fastrack',
 'ROYAL SON',
 'ROZZETTA CRAFT',
 'AISLIN',
 'GANSTA',
 'Fastrack',
 'ROYAL SON',
 'Fastrack',
 'ROZZETTA CRAFT',
 'NuVew']

In [76]:
#So lets extract all the tags having the product desc
prod_tags2 = driver.find_elements_by_xpath("//a[@class='IRpwTa']")
prod_tags2

[<selenium.webdriver.remote.webelement.WebElement (session="592ff384e38d3b2b95e9d5117e87e3ea", element="2a9a56be-a1c8-4e97-bc19-27a4e557e7c4")>,
 <selenium.webdriver.remote.webelement.WebElement (session="592ff384e38d3b2b95e9d5117e87e3ea", element="dfe857bd-acf2-4f49-a177-1b84c72a9011")>,
 <selenium.webdriver.remote.webelement.WebElement (session="592ff384e38d3b2b95e9d5117e87e3ea", element="3d460530-996d-4f18-8df9-27220f9cb043")>,
 <selenium.webdriver.remote.webelement.WebElement (session="592ff384e38d3b2b95e9d5117e87e3ea", element="70a72908-0517-4598-95d9-5c10387c14db")>,
 <selenium.webdriver.remote.webelement.WebElement (session="592ff384e38d3b2b95e9d5117e87e3ea", element="c0473dd0-0ffb-4515-a3fc-5265e390fcc7")>,
 <selenium.webdriver.remote.webelement.WebElement (session="592ff384e38d3b2b95e9d5117e87e3ea", element="f30d6875-ce47-423f-9b4d-03f9db5bdac3")>,
 <selenium.webdriver.remote.webelement.WebElement (session="592ff384e38d3b2b95e9d5117e87e3ea", element="de31b8d8-7a0a-415b-87b6-92

In [77]:
#Loop to iterate over the tags extracted above and extract the text inside them.
prod_desc2=[]
for i in prod_tags2:
    prod_desc2.append(i.text)
prod_desc2

['UV Protection Wayfarer, Aviator Sunglasses (88)',
 'UV Protection, Gradient Retro Square Sunglasses (Free S...',
 'UV Protection, Night Vision, Riding Glasses Aviator Sun...',
 'UV Protection, Polarized, Mirrored Round Sunglasses (Fr...',
 'Mirrored Aviator Sunglasses (55)',
 'UV Protection Clubmaster Sunglasses (Free Size)',
 'UV Protection Round Sunglasses (Free Size)',
 'UV Protection Rectangular Sunglasses (Free Size)',
 'Mirrored, UV Protection, Gradient Round Sunglasses (55)',
 'UV Protection Round Sunglasses (53)',
 'UV Protection Oval Sunglasses (56)',
 'UV Protection Aviator Sunglasses (58)',
 'Gradient, UV Protection Round Sunglasses (Free Size)',
 'UV Protection Round Sunglasses (Free Size)',
 'Polarized, UV Protection Aviator Sunglasses (58)',
 'UV Protection Retro Square Sunglasses (53)',
 'UV Protection, Mirrored, Gradient Retro Square Sunglass...',
 'Riding Glasses, UV Protection, Others Aviator Sunglasse...',
 'UV Protection Wayfarer Sunglasses (57)',
 'UV Protection 

In [78]:
#So lets extract all the tags having the discount
discount_tags2 = driver.find_elements_by_xpath("//div[@class='_3Ay6Sb']")
discount_tags2

[<selenium.webdriver.remote.webelement.WebElement (session="592ff384e38d3b2b95e9d5117e87e3ea", element="5427ddd5-f2a4-48e1-99aa-3328928945a9")>,
 <selenium.webdriver.remote.webelement.WebElement (session="592ff384e38d3b2b95e9d5117e87e3ea", element="f4b5e843-72b4-4431-87db-90612aa1beed")>,
 <selenium.webdriver.remote.webelement.WebElement (session="592ff384e38d3b2b95e9d5117e87e3ea", element="f842e839-6d2d-4b34-8a34-a0fc0dad122f")>,
 <selenium.webdriver.remote.webelement.WebElement (session="592ff384e38d3b2b95e9d5117e87e3ea", element="950260a1-874a-48b4-9e2a-9f5fb4bef886")>,
 <selenium.webdriver.remote.webelement.WebElement (session="592ff384e38d3b2b95e9d5117e87e3ea", element="149e0f97-bcc3-4f8e-af79-945b132261f3")>,
 <selenium.webdriver.remote.webelement.WebElement (session="592ff384e38d3b2b95e9d5117e87e3ea", element="4e30e4ee-62dc-43b9-985f-9693e329f885")>,
 <selenium.webdriver.remote.webelement.WebElement (session="592ff384e38d3b2b95e9d5117e87e3ea", element="8efc0c9c-e0de-42db-90b2-c5

In [79]:
#Loop to iterate over the tags extracted above and extract the text inside them.
discount2=[]
for i in discount_tags2:
    discount2.append(i.text)
discount2

['76% off',
 '77% off',
 '85% off',
 '80% off',
 '74% off',
 '84% off',
 '79% off',
 '86% off',
 '85% off',
 '86% off',
 '75% off',
 '13% off',
 '77% off',
 '88% off',
 '64% off',
 '84% off',
 '87% off',
 '84% off',
 '30% off',
 '88% off',
 '50% off',
 '77% off',
 '84% off',
 '86% off',
 '80% off',
 '15% off',
 '70% off',
 '75% off',
 '86% off',
 '57% off',
 '27% off',
 '66% off',
 '85% off',
 '77% off',
 '86% off',
 '15% off',
 '71% off',
 '15% off',
 '83% off',
 '74% off']

In [80]:
discount_ = discount + discount1 + discount2 

In [81]:
discount_

['85% off',
 '77% off',
 '28% off',
 '77% off',
 '78% off',
 '84% off',
 '72% off',
 '82% off',
 '79% off',
 '80% off',
 '80% off',
 '75% off',
 '20% off',
 '74% off',
 '86% off',
 '68% off',
 '59% off',
 '81% off',
 '70% off',
 '73% off',
 '69% off',
 '10% off',
 '70% off',
 '66% off',
 '62% off',
 '82% off',
 '51% off',
 '80% off',
 '83% off',
 '73% off',
 '69% off',
 '62% off',
 '58% off',
 '67% off',
 '88% off',
 '18% off',
 '79% off',
 '84% off',
 '85% off',
 '71% off',
 '82% off',
 '85% off',
 '80% off',
 '71% off',
 '73% off',
 '73% off',
 '80% off',
 '66% off',
 '70% off',
 '71% off',
 '72% off',
 '30% off',
 '83% off',
 '80% off',
 '71% off',
 '25% off',
 '77% off',
 '71% off',
 '83% off',
 '71% off',
 '78% off',
 '80% off',
 '60% off',
 '71% off',
 '77% off',
 '83% off',
 '84% off',
 '65% off',
 '10% off',
 '71% off',
 '22% off',
 '79% off',
 '82% off',
 '75% off',
 '79% off',
 '80% off',
 '84% off',
 '59% off',
 '78% off',
 '71% off',
 '76% off',
 '77% off',
 '85% off',
 '80

In [82]:
#So lets extract all the tags having the price
price_tags2 = driver.find_elements_by_xpath("//div[@class='_30jeq3']")
price_tags2

[<selenium.webdriver.remote.webelement.WebElement (session="592ff384e38d3b2b95e9d5117e87e3ea", element="034dc8ee-297a-40eb-bf98-9ae05a443c4d")>,
 <selenium.webdriver.remote.webelement.WebElement (session="592ff384e38d3b2b95e9d5117e87e3ea", element="855ef46b-9b68-4685-9910-6fcfc3580247")>,
 <selenium.webdriver.remote.webelement.WebElement (session="592ff384e38d3b2b95e9d5117e87e3ea", element="df3f8382-ce67-4b6a-8168-8ce227b769e8")>,
 <selenium.webdriver.remote.webelement.WebElement (session="592ff384e38d3b2b95e9d5117e87e3ea", element="8a7aa1bc-fd1c-45df-b0a3-113d1bf68d54")>,
 <selenium.webdriver.remote.webelement.WebElement (session="592ff384e38d3b2b95e9d5117e87e3ea", element="f5706761-84c0-49cc-9854-a4a01bd6f5e5")>,
 <selenium.webdriver.remote.webelement.WebElement (session="592ff384e38d3b2b95e9d5117e87e3ea", element="9855d7bc-fa69-484b-87d7-e9d15e4aff53")>,
 <selenium.webdriver.remote.webelement.WebElement (session="592ff384e38d3b2b95e9d5117e87e3ea", element="2281ae99-064e-4fef-89de-76

In [83]:
#Loop to iterate over the tags extracted above and extract the text inside them.
price2=[]
for i in price_tags2:
    price2.append(i.text)
price2

['₹279',
 '₹449',
 '₹292',
 '₹329',
 '₹379',
 '₹319',
 '₹205',
 '₹200',
 '₹210',
 '₹219',
 '₹299',
 '₹1,123',
 '₹449',
 '₹170',
 '₹711',
 '₹199',
 '₹259',
 '₹227',
 '₹559',
 '₹199',
 '₹1,234',
 '₹499',
 '₹319',
 '₹209',
 '₹207',
 '₹759',
 '₹599',
 '₹499',
 '₹260',
 '₹854',
 '₹576',
 '₹499',
 '₹399',
 '₹395',
 '₹263',
 '₹759',
 '₹569',
 '₹758',
 '₹449',
 '₹395']

Now let's merge all the data

In [99]:
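#manually add a description at index 27 of the first page's list (presumably to bring it to the same length as the other lists)
prod_desc.insert(27, 'UV Protection Glasses')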

In [100]:
brand = brand_name + brand_name1 + brand_name2
product = prod_desc + prod_desc1 + prod_desc2
prices = price + price1 + price2

In [101]:


print(len(discount_),len(prices),len(product),len(brand))

120 120 120 120
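
Printing the lengths is a useful guard, because pandas raises a ValueError when a column of a different length is assigned to an existing DataFrame. The same check can be wrapped in a function that fails early. This is a sketch only; `check_same_length` is a hypothetical helper, and equal lengths do not prove that the rows are correctly aligned.

```python
def check_same_length(**columns):
    """Raise ValueError if the named lists do not all have the same length."""
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"column lengths differ: {lengths}")
    return lengths

# Toy data standing in for the scraped lists
print(check_same_length(brand=["a", "b"], price=["₹1", "₹2"]))
```

In the notebook it could be called as `check_same_length(brand=brand, product=product, prices=prices, discount=discount_)` before building the DataFrame.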



In [105]:
flipkart=pd.DataFrame({})
flipkart['Brand Name'] = brand
flipkart['Product Desc'] = product
flipkart['Price'] = prices
flipkart['Discounts'] = discount_

flipkart

Unnamed: 0,Brand Name,Product Desc,Price,Discounts
0,SUNBEE,"UV Protection, Polarized, Mirrored Round Sungl...",₹210,85% off
1,HAMIW COLLECTION,UV Protection Round Sunglasses (53),₹395,77% off
2,Fastrack,UV Protection Rectangular Sunglasses (Free Size),₹570,28% off
3,ROZZETTA CRAFT,UV Protection Retro Square Sunglasses (Free Size),₹499,77% off
4,PIRASO,UV Protection Aviator Sunglasses (Free Size),₹349,78% off
...,...,...,...,...
115,Fastrack,UV Protection Wayfarer Sunglasses (56),₹759,15% off
116,ROYAL SON,UV Protection Retro Square Sunglasses (88),₹569,71% off
117,Fastrack,UV Protection Wayfarer Sunglasses (Free Size),₹758,15% off
118,ROZZETTA CRAFT,"UV Protection, Gradient Round Sunglasses (Free...",₹449,83% off


PROBLEM STATEMENT:
Q7: Scrape 100 reviews data from flipkart.com for iphone11 phone. You have to 
go to the link: https://www.flipkart.com/apple-iphone-11-black-64-gb-includes-earpods-power-adapter/p/itm0f37c2240b217?pid=MOBFKCTSVZAXUHGR&lid=LSTMOBFKCTSVZAXUHGREPBFGI&marketplace.
When you open the above link, you will reach the webpage shown below.
As shown on that page, you have to scrape the tick-marked attributes.
These are:
1. Rating 
2. Review_summary 
3. Full review
You have to scrape this data for first 100 reviews.

In [112]:
#Lets now import all the required libraries
import selenium
import pandas as pd
from selenium import webdriver
import time

In [109]:
#Lets first connect to the web driver
driver = webdriver.Chrome('chromedriver.exe')

In [110]:
url = 'https://www.flipkart.com/apple-iphone-11-black-64-gb-includes-earpods-power-adapter/product-reviews/itm0f37c2240b217?pid=MOBFKCTSVZAXUHGR&lid=LSTMOBFKCTSVZAXUHGREPBFGI&marketplace=FLIPKART'
driver.get(url)

In [119]:
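#Collect the short review summaries from 10 pages
review = []
for i in range(0,10):
    #grab the summary text of every review on the current page
    for j in driver.find_elements_by_xpath('//p[@class="_2-N8zT"]'):
        review.append(j.text)
    #click the pagination bar, then wait for the page to load
    driver.find_element_by_xpath('//nav[@class="yFHi8N"]').click()
    time.sleep(4)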
    


In [121]:
review

['Super!',
 'Perfect product!',
 'Must buy!',
 'Great product',
 'Awesome',
 'Mind-blowing purchase',
 'Perfect product!',
 'Fabulous!',
 'Perfect product!',
 'Excellent',
 'Super!',
 'Perfect product!',
 'Must buy!',
 'Great product',
 'Awesome',
 'Mind-blowing purchase',
 'Perfect product!',
 'Fabulous!',
 'Perfect product!',
 'Excellent',
 'Super!',
 'Perfect product!',
 'Must buy!',
 'Great product',
 'Awesome',
 'Mind-blowing purchase',
 'Perfect product!',
 'Fabulous!',
 'Perfect product!',
 'Excellent',
 'Super!',
 'Perfect product!',
 'Must buy!',
 'Great product',
 'Awesome',
 'Mind-blowing purchase',
 'Perfect product!',
 'Fabulous!',
 'Perfect product!',
 'Excellent',
 'Super!',
 'Perfect product!',
 'Must buy!',
 'Great product',
 'Awesome',
 'Mind-blowing purchase',
 'Perfect product!',
 'Fabulous!',
 'Perfect product!',
 'Excellent',
 'Super!',
 'Perfect product!',
 'Must buy!',
 'Great product',
 'Awesome',
 'Mind-blowing purchase',
 'Perfect product!',
 'Fabulous!',
 'P
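
The output above repeats the same ten summaries, which suggests that clicking the `<nav>` container did not move to a new page. One possible alternative is sketched below: load each review page by URL and collect summary, rating and full review in a single pass, so the three columns come from the same pages. The assumptions are labelled in the code: that the reviews URL accepts a `page` query parameter (modelled on the `page=2` links in the sunglasses search earlier), and that the driver offers the Selenium 4 style `find_elements(by, value)` call, where the string "xpath" is the value of `By.XPATH`. `scrape_reviews` is a hypothetical function name.

```python
import time

SUMMARY_XPATH = '//p[@class="_2-N8zT"]'
RATING_XPATH = '//div[@class="_3LWZlK _1BLPMq"]'
FULL_XPATH = '//div[@class="t-ZTKy"]'

def scrape_reviews(driver, base_url, pages=10, pause=4):
    """Visit `pages` review pages and return a list of row dicts."""
    rows = []
    for page in range(1, pages + 1):
        # Assumption: the site accepts a `page` query parameter on this URL
        driver.get(f"{base_url}&page={page}")
        time.sleep(pause)  # crude wait for the page to render
        summaries = [e.text for e in driver.find_elements("xpath", SUMMARY_XPATH)]
        ratings = [e.text for e in driver.find_elements("xpath", RATING_XPATH)]
        full = [e.text for e in driver.find_elements("xpath", FULL_XPATH)]
        # zip stops at the shortest list, so the three columns always
        # end up with the same number of rows
        for s, r, f in zip(summaries, ratings, full):
            rows.append({"Review": s, "Rating": r, "Full Review": f})
    return rows
```

With a live driver, `pd.DataFrame(scrape_reviews(driver, url))` would build the table directly. Rows on a page still line up only if every review on that page has all three fields.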

In [122]:
full_review = []
for i in range(0,10):
    for j in driver.find_elements_by_xpath('//div[@class="t-ZTKy"]'):
        full_review.append(j.text)
    driver.find_element_by_xpath('//nav[@class="yFHi8N"]').click()
    time.sleep(4)

In [125]:
full_review

['Did an upgrade from 6s plus to iphone 11.\nAo far the experience is well and good. Felt smoother than 6s plus. The camera quality is superb. Battery backup is descent. Not a heavy user, and gets power more than a day. Go for it if you need an alrounder iphone in a competitve price.',
 'Value for money\n5 star rating\nExcellent camera\nBattery backup full day in single charge.\n\nTougher and water resistant design, glossy back.\nThe screen has excellent brightness and contrast.\nApple A13 Bionic is the fastest smartphone chip on the planet.\nExcellent battery life, fast charging support.\nStereo speakers with great quality.',
 'Damn this phone is a blast . Upgraded from android to ios and is a duperb experience. Battery backup is top notch and display also pretty good',
 "Again back to apple iphone after a gap of 2-3 years. It's pleasure to use iOS and the quality product by Apple. Iphone 11 still works like a beast in 2021 also. It really capable of doing day to day usage as well as 

In [126]:
ratings = []
for i in range(0,10):
    for j in driver.find_elements_by_xpath('//div[@class="_3LWZlK _1BLPMq"]'):
        ratings.append(j.text)
    driver.find_element_by_xpath('//nav[@class="yFHi8N"]').click()
    time.sleep(4)

In [129]:
ratings

['5',
 '5',
 '5',
 '5',
 '5',
 '5',
 '5',
 '5',
 '5',
 '5',
 '5',
 '5',
 '5',
 '5',
 '5',
 '5',
 '5',
 '5',
 '5',
 '5',
 '5',
 '5',
 '5',
 '5',
 '5',
 '5',
 '5',
 '5',
 '5',
 '5',
 '5',
 '5',
 '5',
 '5',
 '5',
 '5',
 '5',
 '5',
 '5',
 '5',
 '5',
 '5',
 '5',
 '5',
 '5',
 '5',
 '5',
 '5',
 '5',
 '5',
 '5',
 '5',
 '5',
 '5',
 '5',
 '5',
 '5',
 '5',
 '5',
 '5',
 '5',
 '5',
 '5',
 '5',
 '5',
 '5',
 '5',
 '5',
 '5',
 '5',
 '5',
 '5',
 '5',
 '5',
 '5',
 '5',
 '5',
 '5',
 '5',
 '5',
 '5',
 '5',
 '5',
 '5',
 '5',
 '5',
 '5',
 '5',
 '5',
 '5',
 '5',
 '5',
 '5',
 '5',
 '5',
 '5',
 '5',
 '5',
 '5',
 '5']

In [130]:
Reviews=pd.DataFrame({})
Reviews['Review'] = review
Reviews['Rating'] = ratings
Reviews['Full Review'] = full_review


Reviews

Unnamed: 0,Review,Rating,Full Review
0,Super!,5,Did an upgrade from 6s plus to iphone 11.\nAo ...
1,Perfect product!,5,Value for money\n5 star rating\nExcellent came...
2,Must buy!,5,Damn this phone is a blast . Upgraded from and...
3,Great product,5,Again back to apple iphone after a gap of 2-3 ...
4,Awesome,5,"Always love the apple products, upgraded from ..."
...,...,...,...
95,Mind-blowing purchase,5,awesome Phone Smooth Touch Too good Sexyy look...
96,Perfect product!,5,Best and amazing product.....phone looks so pr...
97,Fabulous!,5,I purchased the iPhone 11 a month back. I must...
98,Perfect product!,5,It is just awesome mobile for this price from ...


Q8: Scrape data for first 100 sneakers you find when you visit flipkart.com and 
search for “sneakers” in the search field.
You have to scrape 4 attributes of each sneaker:
1. Brand
3. Price
4. discount %
As shown in the below image, you have to scrape the tick marked attributes.
Also note that all the steps required during scraping should be done through code 
only and not manually.

In [131]:
#Let's now import all the required libraries
import selenium
import pandas as pd
from selenium import webdriver
import time  #needed for the time.sleep() calls in the scraping loops below

In [132]:
#Lets first connect to the web driver
driver = webdriver.Chrome('chromedriver.exe')

In [133]:
url = 'https://www.flipkart.com/'
driver.get(url)

In [134]:
#finding the element for the search bar
search_bar = driver.find_element_by_xpath("//div[@class='_3OO5Xc']//input")
search_bar

<selenium.webdriver.remote.webelement.WebElement (session="ba7efe76b7031ec0efb2fec173a7ed29", element="f0b40ba7-2d1f-43f3-b91e-bda1078de74a")>

In [135]:
#write on search bar
search_bar.send_keys('Sneakers')

In [136]:
#do click using xpath function
search_btn = driver.find_element_by_xpath("//button[@class='L0Z3Pu']")
search_btn.click()

In [137]:
brand = []
for i in range(0,3):
    for j in driver.find_elements_by_xpath('//div[@class="_2WkVRV"]'):
        brand.append(j.text)
    driver.find_element_by_xpath('//nav[@class="yFHi8N"]').click()
    time.sleep(4)

In [141]:
brand

['World Wear Footwear',
 'luxury fashion',
 'Robbie jones',
 'HOTSTYLE',
 'ORICUM',
 'Robbie jones',
 'Shoes Bank',
 'VIPSJAZZY',
 'Numenzo',
 'Robbie jones',
 'Stefano Rads',
 'bluemaker',
 'BRUTON',
 'India hub',
 '"trend"',
 'aadi',
 'Echor',
 'VORII',
 'Kraasa',
 'DUCATI',
 'Birde',
 'Zsyto',
 '3SIX5',
 '"trend"',
 'Echor',
 'TR',
 'Skechers',
 'Baogi',
 'SCATCHITE',
 'Fzzirok',
 'SPARX',
 'Alfiyo',
 'T-ROCK',
 'Fzzirok',
 'Svpanther',
 'SPARX',
 'SPARX',
 'Magnolia',
 'restinfoot',
 'HARMEET',
 'PUMA',
 'Qtsy',
 'HOC',
 'DUCATI',
 'PUMA',
 'SPORTER',
 'PUMA',
 'Jack Diamond',
 'PUMA',
 'India hub',
 'SORT',
 'Bata',
 'Rising Wolf',
 'Robbie jones',
 'PUMA',
 'SPARX',
 'ADIDAS',
 'bacca bucci',
 'BRUTON',
 'Birde',
 'Magnolia',
 'SPARX',
 'Numenzo',
 'Rising Wolf',
 'ASTEROID',
 'D-SNEAKERZ',
 'SPARX',
 'Baogi',
 'Englewood',
 'Kavon',
 'PUMA',
 'ESSENCE',
 'SPARX',
 'K K',
 'ESSENCE',
 'ORICUM',
 'PUMA',
 'SPARX',
 'bacca bucci',
 'Robbie jones',
 'PUMA',
 'Qtsy',
 'HOC',
 'DUCATI

In [146]:
price = []
for i in range(0,3):
    for j in driver.find_elements_by_xpath('//div[@class="_30jeq3"]'):
        price.append(j.text)
    driver.find_element_by_xpath('//nav[@class="yFHi8N"]').click()
    time.sleep(4)

In [148]:
price

['₹399',
 '₹549',
 '₹379',
 '₹283',
 '₹377',
 '₹474',
 '₹349',
 '₹419',
 '₹399',
 '₹428',
 '₹229',
 '₹474',
 '₹499',
 '₹1,987',
 '₹426',
 '₹298',
 '₹569',
 '₹379',
 '₹416',
 '₹1,461',
 '₹757',
 '₹328',
 '₹214',
 '₹283',
 '₹569',
 '₹599',
 '₹2,364',
 '₹399',
 '₹398',
 '₹474',
 '₹769',
 '₹459',
 '₹378',
 '₹370',
 '₹515',
 '₹632',
 '₹699',
 '₹311',
 '₹379',
 '₹485',
 '₹1,644',
 '₹396',
 '₹398',
 '₹568',
 '₹2,009',
 '₹498',
 '₹1,329',
 '₹379',
 '₹1,491',
 '₹389',
 '₹599',
 '₹899',
 '₹485',
 '₹419',
 '₹2,285',
 '₹1,049',
 '₹1,799',
 '₹1,234',
 '₹341',
 '₹474',
 '₹424',
 '₹629',
 '₹599',
 '₹399',
 '₹474',
 '₹346',
 '₹649',
 '₹399',
 '₹499',
 '₹498',
 '₹3,185',
 '₹424',
 '₹1,000',
 '₹474',
 '₹424',
 '₹664',
 '₹3,919',
 '₹669',
 '₹949',
 '₹379',
 '₹1,644',
 '₹396',
 '₹398',
 '₹568',
 '₹2,009',
 '₹498',
 '₹1,329',
 '₹379',
 '₹1,491',
 '₹389',
 '₹599',
 '₹899',
 '₹485',
 '₹419',
 '₹2,285',
 '₹1,049',
 '₹1,799',
 '₹1,234',
 '₹341',
 '₹474',
 '₹424',
 '₹629',
 '₹599',
 '₹399',
 '₹474',
 '₹346',
 '

In [149]:
discount = []
for i in range(0,3):
    for j in driver.find_elements_by_xpath('//div[@class="_3Ay6Sb"]'):
        discount.append(j.text)
    driver.find_element_by_xpath('//nav[@class="yFHi8N"]').click()
    time.sleep(4)

In [151]:
discount

['53% off',
 '50% off',
 '69% off',
 '71% off',
 '42% off',
 '50% off',
 '62% off',
 '62% off',
 '57% off',
 '61% off',
 '75% off',
 '10% off',
 '75% off',
 '58% off',
 '67% off',
 '25% off',
 '55% off',
 '58% off',
 '85% off',
 '52% off',
 '57% off',
 '16% off',
 '40% off',
 '60% off',
 '52% off',
 '65% off',
 '7% off',
 '60% off',
 '73% off',
 '50% off',
 '42% off',
 '57% off',
 '23% off',
 '52% off',
 '57% off',
 '55% off',
 '44% off',
 '10% off',
 '62% off',
 '62% off',
 '53% off',
 '50% off',
 '69% off',
 '71% off',
 '42% off',
 '50% off',
 '62% off',
 '62% off',
 '57% off',
 '61% off',
 '75% off',
 '10% off',
 '75% off',
 '58% off',
 '67% off',
 '25% off',
 '55% off',
 '58% off',
 '85% off',
 '52% off',
 '57% off',
 '16% off',
 '40% off',
 '60% off',
 '52% off',
 '65% off',
 '7% off',
 '60% off',
 '73% off',
 '50% off',
 '42% off',
 '57% off',
 '23% off',
 '52% off',
 '57% off',
 '55% off',
 '44% off',
 '10% off',
 '62% off',
 '62% off',
 '53% off',
 '50% off',
 '69% off',
 '71% 

In [155]:
sneakers=pd.DataFrame({})
sneakers['Brand'] = brand
sneakers['Price'] = price
sneakers['Discount'] = discount

sneakers

Unnamed: 0,Brand,Price,Discount
0,World Wear Footwear,₹399,53% off
1,luxury fashion,₹549,50% off
2,Robbie jones,₹379,69% off
3,HOTSTYLE,₹283,71% off
4,ORICUM,₹377,42% off
...,...,...,...
115,ORICUM,₹664,55% off
116,PUMA,"₹3,919",44% off
117,SPARX,₹669,10% off
118,bacca bucci,₹949,62% off
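
The task asks for the first 100 sneakers, while the table above has more than 100 rows. A minimal sketch of trimming the result is shown below, using toy lists in place of the scraped ones; `first_n_rows` is a hypothetical helper.

```python
def first_n_rows(n, **columns):
    """Zip the named lists into row dicts and keep only the first n rows."""
    names = list(columns)
    rows = [dict(zip(names, values)) for values in zip(*columns.values())]
    return rows[:n]

# Toy lists standing in for the scraped brand / price / discount lists
rows = first_n_rows(
    2,
    Brand=["World Wear Footwear", "luxury fashion", "Robbie jones"],
    Price=["₹399", "₹549", "₹379"],
    Discount=["53% off", "50% off", "69% off"],
)
print(rows)
```

If the DataFrame has already been built, `sneakers.head(100)` gives the same first 100 rows.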


Q9: Go to the link - https://www.myntra.com/shoes
Set Price filter to “Rs. 6649 to Rs. 13099” , Color filter to “Black”, as shown in 
the below image.
And then scrape First 100 shoes data you get. The data should include “Brand” of 
the shoes , Short Shoe description, price of the shoe as shown in the below image.
Please note that applying the filter and scraping the data , everything should be 
done through code only and there should not be any manual step.

In [20]:
#Lets now import all the required libraries
import selenium
import pandas as pd
from selenium import webdriver
import time

In [2]:
#Lets first connect to the web driver
driver = webdriver.Chrome('chromedriver.exe')

In [4]:
url = 'https://www.myntra.com/shoes'
driver.get(url)

In [None]:
#do click using xpath function for price
chk_box1 = driver.find_element_by_xpath("//div[@class='price-input']")
chk_box1.click()

In [18]:
#do click using xpath function for colour
chk_box = driver.find_element_by_xpath("//span[@class='colour-label colour-colorDisplay']")
chk_box.click()

So now let's create 3 empty lists. The scraped data will be stored in these lists.

In [21]:
brand_name = []
for i in range(0,2):
    for j in driver.find_elements_by_xpath('//h3[@class="product-brand"]'):
        brand_name.append(j.text)
    driver.find_element_by_xpath('//li[@class="pagination-active"]').click()
    time.sleep(4)

In [24]:
brand_name

['Nike',
 'Nike',
 'ALDO',
 'Nike',
 'Nike',
 'ALDO',
 'Nike',
 'Nike',
 'Nike',
 'Nike',
 'UNDER ARMOUR',
 'UNDER ARMOUR',
 'UNDER ARMOUR',
 'Puma',
 'UNDER ARMOUR',
 'ASICS',
 'Puma',
 'Hush Puppies',
 'UNDER ARMOUR',
 'Puma',
 'Saint G',
 'UNDER ARMOUR',
 'Puma',
 'UNDER ARMOUR',
 'UNDER ARMOUR',
 'Onitsuka Tiger',
 'FILA',
 'UNDER ARMOUR',
 'Puma',
 'Puma',
 'Nike',
 'ASICS',
 'Puma',
 'Puma',
 'Puma',
 'UNDER ARMOUR',
 'UNDER ARMOUR',
 'Skechers',
 'Cole Haan',
 'ALDO',
 'Hush Puppies',
 'Ruosh',
 'Heel & Buckle London',
 'UNDER ARMOUR',
 'UNDER ARMOUR',
 'UNDER ARMOUR',
 'Bugatti',
 'Ruosh',
 'Saint G',
 'Reebok',
 'Nike',
 'Nike',
 'ALDO',
 'Nike',
 'Nike',
 'ALDO',
 'Nike',
 'Nike',
 'Nike',
 'Nike',
 'UNDER ARMOUR',
 'UNDER ARMOUR',
 'UNDER ARMOUR',
 'Puma',
 'UNDER ARMOUR',
 'ASICS',
 'Puma',
 'Hush Puppies',
 'UNDER ARMOUR',
 'Puma',
 'Saint G',
 'UNDER ARMOUR',
 'Puma',
 'UNDER ARMOUR',
 'UNDER ARMOUR',
 'Onitsuka Tiger',
 'FILA',
 'UNDER ARMOUR',
 'Puma',
 'Puma',
 'Nike',
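
The second half of the list above repeats the first half, which is consistent with the click landing on the page that is already active. A possible alternative is to click a "next" control and stop when there is none. In the sketch below the `pagination-next` class is an assumption about the site's markup and should be checked in the browser. The string "xpath" is the value of Selenium's `By.XPATH`, and both function names are hypothetical.

```python
import time

NEXT_XPATH = '//li[@class="pagination-next"]'  # assumed selector, verify in the browser

def go_to_next_page(driver, next_xpath=NEXT_XPATH, pause=4):
    """Click the first element matching next_xpath; return False if there is none."""
    matches = driver.find_elements("xpath", next_xpath)
    if not matches:
        return False
    matches[0].click()
    time.sleep(pause)  # give the next page time to load
    return True

def scrape_texts(driver, item_xpath, pages, pause=4):
    """Collect element texts from up to `pages` pages, stopping early if no next page exists."""
    texts = []
    for page in range(pages):
        texts.extend(e.text for e in driver.find_elements("xpath", item_xpath))
        # do not click past the last page that was asked for
        if page == pages - 1 or not go_to_next_page(driver, pause=pause):
            break
    return texts
```

For example, `scrape_texts(driver, '//h3[@class="product-brand"]', pages=2)` would collect the brand names from two pages.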

In [25]:
prod_desc = []
for i in range(0,2):
    for j in driver.find_elements_by_xpath('//h4[@class="product-product"]'):
        prod_desc.append(j.text)
    driver.find_element_by_xpath('//li[@class="pagination-active"]').click()
    time.sleep(4)

In [27]:
prod_desc

['AIR ZOOM PEGASUS Running Shoes',
 'Men KD13 EP Basketball Shoes',
 'Men Sneakers',
 'Men JORDAN DELTA Basketball',
 'Men REACT MILER Running Shoes',
 'Men Textured Sneakers',
 'Women PEGASUS 37 Running Shoes',
 'Men AIR ZOOM Running Shoes',
 'Men JOYRIDE Running Shoes',
 'Women REACT Running Shoes',
 'Men HOVR Strt Walking Shoes',
 'Men Liquify Running Shoes',
 'HOVR Sonic 3 Running Shoes',
 'Men SPEED 500 2 Running Shoes',
 'Charged Rogue 2 Wide 2E Shoes',
 'Men Black Sports Shoes',
 'Women Eternity Nitro Running',
 'Men Solid Leather Formal Slip-Ons',
 'Charged Impulse Running Shoes',
 'Men UltraRide Running Shoes',
 'Men Textured Leather Formal Loafers',
 'HOVR Infinite 2 Running Shoes',
 'Men Velocity Nitro Running',
 'Men HOVR Guardian Shoes',
 'Women Liquify Rebel Running',
 'Unisex Mexico 66 Paraty Sneakers',
 'Men Running Shoes',
 'Women Charged Breathe TR 2',
 'Women Provoke XT Training',
 'Women Liberate Nitro Running',
 'Men JORDAN DELTA Sneakers',
 'Men Running Shoes',
 '

In [28]:
price = []
for i in range(0,2):
    for j in driver.find_elements_by_xpath('//div[@class="product-price"]'):
        price.append(j.text)
    driver.find_element_by_xpath('//li[@class="pagination-active"]').click()
    time.sleep(4)
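
The description loop and the price loop above have the same shape: read one field from the page, click the pagination item, sleep, repeat. Because each loop starts from whatever page the browser is currently on, the fields can end up coming from different pages. A minimal sketch of a single pass that reads every field before moving on is given below. The names `collect_pages`, `read_page` and `go_to_next` are hypothetical helpers, and the browser is replaced by a stand-in so the idea can be run without Selenium.

```python
from typing import Callable, Dict, List


def collect_pages(read_page: Callable[[], Dict[str, List[str]]],
                  go_to_next: Callable[[], None],
                  n_pages: int) -> Dict[str, List[str]]:
    """Read all fields from the current page, then advance, n_pages times."""
    collected: Dict[str, List[str]] = {}
    for page in range(n_pages):
        for field, values in read_page().items():
            collected.setdefault(field, []).extend(values)
        if page < n_pages - 1:  # no need to advance after the last page
            go_to_next()
    return collected


# Stand-in for the browser: two fake result pages (illustrative values only).
pages = [
    {"Product Desc": ["Men Sneakers"], "Price": ["Rs. 9999"]},
    {"Product Desc": ["Men Running Shoes"], "Price": ["Rs. 8499"]},
]
state = {"page": 0}


def read_page() -> Dict[str, List[str]]:
    return pages[state["page"]]


def go_to_next() -> None:
    state["page"] += 1


result = collect_pages(read_page, go_to_next, n_pages=2)
print(result)
```

With a real driver, `read_page` would wrap the `find_elements_by_xpath` calls for each field and `go_to_next` would wrap the pagination click and the `time.sleep`.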

In [30]:
price

['Rs. 11495',
 'Rs. 12995',
 'Rs. 9999',
 'Rs. 12495',
 'Rs. 8796Rs. 10995(20% OFF)',
 'Rs. 7199Rs. 8999(20% OFF)',
 'Rs. 7496Rs. 9995(25% OFF)',
 'Rs. 7721Rs. 10295(25% OFF)',
 'Rs. 11246Rs. 14995(25% OFF)',
 'Rs. 8396Rs. 11995(30% OFF)',
 'Rs. 9999',
 'Rs. 10999',
 'Rs. 10999',
 'Rs. 6999Rs. 9999(30% OFF)',
 'Rs. 7999',
 'Rs. 6999Rs. 9999(30% OFF)',
 'Rs. 12999',
 'Rs. 8999',
 'Rs. 7999',
 'Rs. 7649Rs. 8999(15% OFF)',
 'Rs. 9975Rs. 10500(5% OFF)',
 'Rs. 11999',
 'Rs. 10999',
 'Rs. 10199Rs. 11999(15% OFF)',
 'Rs. 8999',
 'Rs. 6999',
 'Rs. 8499',
 'Rs. 7999',
 'Rs. 7999',
 'Rs. 7999Rs. 9999(20% OFF)',
 'Rs. 10995',
 'Rs. 6999Rs. 9999(30% OFF)',
 'Rs. 6999',
 'Rs. 7199Rs. 7999(10% OFF)',
 'Rs. 8499Rs. 9999(15% OFF)',
 'Rs. 7224Rs. 8499(15% OFF)',
 'Rs. 8999',
 'Rs. 6999',
 'Rs. 6749Rs. 14999(55% OFF)',
 'Rs. 8999',
 'Rs. 9999',
 'Rs. 8990',
 'Rs. 7693Rs. 10990(30% OFF)',
 'Rs. 9999',
 'Rs. 11999',
 'Rs. 10999',
 'Rs. 7499',
 'Rs. 6990',
 'Rs. 11305Rs. 11900(5% OFF)',
 'Rs. 9999',
 'Rs. 
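
Several of the scraped price strings run the selling price, the original price and the discount together, for example 'Rs. 8796Rs. 10995(20% OFF)'. A small sketch of how such strings could be split into numbers is shown below. `Price` and `parse_price` are hypothetical names, and the pattern only covers the two shapes that appear in the output above.

```python
import re
from typing import NamedTuple, Optional


class Price(NamedTuple):
    selling: int                 # price actually charged
    mrp: int                     # original price (same as selling if no discount)
    discount_pct: Optional[int]  # None when the item is not discounted


# Matches 'Rs. 11495' and 'Rs. 8796Rs. 10995(20% OFF)'.
_PRICE_RE = re.compile(r"Rs\.\s*(\d+)(?:Rs\.\s*(\d+)\((\d+)% OFF\))?")


def parse_price(text: str) -> Price:
    match = _PRICE_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unrecognised price string: {text!r}")
    selling = int(match.group(1))
    mrp = int(match.group(2)) if match.group(2) else selling
    discount = int(match.group(3)) if match.group(3) else None
    return Price(selling, mrp, discount)


plain = parse_price("Rs. 11495")
discounted = parse_price("Rs. 8796Rs. 10995(20% OFF)")
print(plain)
print(discounted)
```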

In [31]:
myntra=pd.DataFrame({})
myntra['Brand'] = brand_name
myntra['Product Desc'] = prod_desc
myntra['Price'] = price

myntra

Unnamed: 0,Brand,Product Desc,Price
0,Nike,AIR ZOOM PEGASUS Running Shoes,Rs. 11495
1,Nike,Men KD13 EP Basketball Shoes,Rs. 12995
2,ALDO,Men Sneakers,Rs. 9999
3,Nike,Men JORDAN DELTA Basketball,Rs. 12495
4,Nike,Men REACT MILER Running Shoes,Rs. 8796Rs. 10995(20% OFF)
...,...,...,...
95,UNDER ARMOUR,HOVR Sonic 3 Running Shoes,Rs. 10999
96,Bugatti,Women Mules,Rs. 7499
97,Ruosh,Men Formal Brogues,Rs. 6990
98,Saint G,Men Leather Chelsea Boots,Rs. 11305Rs. 11900(5% OFF)


Q10: Go to the webpage https://www.amazon.in/
Enter “Laptop” in the search field and then click the search icon.
Then set the CPU Type filter to “Intel Core i7” and “Intel Core i9”, as shown in the
image below:
After setting the filters, scrape the data for the first 10 laptops. You have to scrape
3 attributes for each laptop:
1. Title
2. Ratings
3. Price
These are the tick-marked attributes shown in the image below.


In [89]:
#Let's now import all the required libraries
import selenium
import pandas as pd
from selenium import webdriver

In [90]:
#Let's first connect to the web driver
driver = webdriver.Chrome('chromedriver.exe')

In [91]:
url = 'https://www.amazon.in/'
driver.get(url)

In [92]:

#finding the element for the Amazon search bar
search_bar = driver.find_element_by_id('twotabsearchtextbox')
search_bar

<selenium.webdriver.remote.webelement.WebElement (session="088ef5344dc7cd5f7f3c00a07c9e459f", element="0aec5820-7d78-41f1-9c0e-d1ad28a9bfba")>

In [93]:
#write on search bar
search_bar.send_keys('Laptop')

In [94]:
#click the search button, located by its id
search_btn = driver.find_element_by_id('nav-search-submit-button')
search_btn.click()

In [95]:
#locating the i7 filter
filter_button = driver.find_elements_by_xpath('//a[@class="a-link-normal s-navigation-item"]/span')
for i in filter_button:
    if i.text == 'Intel Core i7':
        i.click()
        break
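
The filter cell loops over the sidebar links and clicks the one whose text matches. Since the same loop is needed again for the i9 filter, it could be wrapped in a small helper. The sketch below uses a hypothetical function name, `click_first_matching`, and a stand-in link class in place of real Selenium elements; it relies only on the `.text` attribute and `.click()` method that the cell above already uses.

```python
from typing import Iterable


def click_first_matching(elements: Iterable, label: str) -> bool:
    """Click the first element whose visible text equals label.

    Returns True if something was clicked, False if no element matched,
    so the caller can notice a filter that was not found.
    """
    for element in elements:
        if element.text.strip() == label:
            element.click()
            return True
    return False


class FakeLink:
    """Stand-in for a Selenium WebElement (illustration only)."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.clicked = False

    def click(self) -> None:
        self.clicked = True


links = [FakeLink("Intel Core i5"), FakeLink("Intel Core i7")]
found_i7 = click_first_matching(links, "Intel Core i7")
found_i9 = click_first_matching(links, "Intel Core i9")
print(found_i7, found_i9)
```

After a click reloads the results, the previously located elements are generally no longer usable, so the filter links would need to be located again before the next call, as the notebook does for the i9 filter.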

In [96]:
#So let's extract all the tags having the laptop titles
titles_tags = driver.find_elements_by_xpath("//span[@class='a-size-medium a-color-base a-text-normal']")
titles_tags

[<selenium.webdriver.remote.webelement.WebElement (session="088ef5344dc7cd5f7f3c00a07c9e459f", element="ffc54edd-112d-4785-8489-794e4044335d")>,
 <selenium.webdriver.remote.webelement.WebElement (session="088ef5344dc7cd5f7f3c00a07c9e459f", element="0962bf9b-85be-466b-b171-6ab6773234be")>,
 <selenium.webdriver.remote.webelement.WebElement (session="088ef5344dc7cd5f7f3c00a07c9e459f", element="200d8e9b-73be-4dc5-b0d8-ff246471aa5f")>,
 <selenium.webdriver.remote.webelement.WebElement (session="088ef5344dc7cd5f7f3c00a07c9e459f", element="3a0c02ba-6df2-4b74-8bf9-9eeac2e9388e")>,
 <selenium.webdriver.remote.webelement.WebElement (session="088ef5344dc7cd5f7f3c00a07c9e459f", element="9153ef69-a82d-429b-873f-ce13fd272eec")>,
 <selenium.webdriver.remote.webelement.WebElement (session="088ef5344dc7cd5f7f3c00a07c9e459f", element="cf56242e-1768-4892-a514-6e70d1f3e049")>,
 <selenium.webdriver.remote.webelement.WebElement (session="088ef5344dc7cd5f7f3c00a07c9e459f", element="db313c58-8d22-4b6c-9daa-9d

In [97]:
#Loop to iterate over the tags extracted above and extract the text inside them.
title=[]
for i in titles_tags:
    title.append(i.text)
title

['(Renewed) Dell Latitude E6420 14 Inch Laptop (Core I7 2460M/4GB/320GB/Nvidia Dedicated Graphics/Windows Professional/MS Office), Dark Grey',
 'Mi Notebook Horizon Edition 14 Intel Core i5-10210U 10th Gen 14-inch (35.56 cms) Thin and Light Laptop(8GB/512GB SSD/Windows 10/Nvidia MX350 2GB Graphics/Grey/1.35Kg), XMA1904-AR+Webcam',
 "(Renewed) HP EliteBook 840 G3 Laptop (Core i7 6th Gen/8GB/500GB/WEBCAM/14'' Touch/DOS)",
 'Lenovo Legion 5Pi 10th Gen Intel Core i7 15.6" FHD Gaming Laptop (16GB/1TB SSD/Windows 10/MS Office 2019/144 Hz/NVIDIA RTX 2060 6GB GDDR6/with M300 RGB Gaming Mouse/Iron Grey/2.3Kg), 82AW005SIN',
 'Asus ROG Zephyrus S Ultra Slim Gaming Laptop, 15.6" 144Hz IPS Type FHD, GeForce RTX 2070, Intel Core i7-8750H, 16GB DDR4, 512GB PCIe NVMe SSD, Aura Sync RGB, Windows 10, GX531GW-AS76',
 'HP Pavilion (2021) Thin & Light 11th Gen Core i7 Laptop, 16 GB RAM, 1TB SSD, Iris Xe Graphics, 14" (35.56cms) FHD Screen, Windows 10, MS Office, Backlit Keyboard (14-dv0058TU)',
 'Lenovo Yo

In [98]:
#So let's extract all the tags having the ratings
rating_tags = driver.find_elements_by_xpath("//span[@class='a-icon-alt']")
rating_tags

[<selenium.webdriver.remote.webelement.WebElement (session="088ef5344dc7cd5f7f3c00a07c9e459f", element="99dd331a-8e45-4cd8-8658-961da8426617")>,
 <selenium.webdriver.remote.webelement.WebElement (session="088ef5344dc7cd5f7f3c00a07c9e459f", element="cd1dac52-3255-454a-8ca4-ed92e9d15865")>,
 <selenium.webdriver.remote.webelement.WebElement (session="088ef5344dc7cd5f7f3c00a07c9e459f", element="932b4558-fc4e-4aa7-8c8a-61f43eb17c92")>,
 <selenium.webdriver.remote.webelement.WebElement (session="088ef5344dc7cd5f7f3c00a07c9e459f", element="92cd2918-6956-4008-98e9-3e9cd0aaf31c")>,
 <selenium.webdriver.remote.webelement.WebElement (session="088ef5344dc7cd5f7f3c00a07c9e459f", element="d83a9b44-ae08-4391-aa10-62128bb281a6")>,
 <selenium.webdriver.remote.webelement.WebElement (session="088ef5344dc7cd5f7f3c00a07c9e459f", element="7bf700e8-b66b-41ca-aef9-e0ba77718b07")>,
 <selenium.webdriver.remote.webelement.WebElement (session="088ef5344dc7cd5f7f3c00a07c9e459f", element="ff984827-e912-4232-a54b-8c

In [99]:
#Loop to iterate over the tags extracted above and extract the text inside them.
rating=[]
for i in rating_tags:
    rating.append(i.text)
rating

['',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '']
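
Every rating came back as an empty string. A likely reason is that Selenium's `.text` returns only text that is visibly rendered, while the `a-icon-alt` spans appear to hold alternative text for the star icons that is hidden on screen. Reading the `textContent` attribute through `get_attribute` usually returns such hidden text. The sketch below shows the fallback with a stand-in element; `element_text`, `parse_rating` and the "4.2 out of 5 stars" format are assumptions made for illustration, not values taken from this run.

```python
import re
from typing import Optional


def element_text(element) -> str:
    """Return the visible text, falling back to the DOM textContent."""
    visible = element.text.strip()
    if visible:
        return visible
    hidden = element.get_attribute("textContent")
    return (hidden or "").strip()


def parse_rating(text: str) -> Optional[float]:
    """Extract the number from a string such as '4.2 out of 5 stars'."""
    match = re.match(r"(\d+(?:\.\d+)?) out of 5", text)
    return float(match.group(1)) if match else None


class FakeSpan:
    """Stand-in for a WebElement whose text is hidden by CSS."""

    text = ""

    def get_attribute(self, name: str) -> Optional[str]:
        return "4.2 out of 5 stars" if name == "textContent" else None


rating_text = element_text(FakeSpan())
rating_value = parse_rating(rating_text)
print(rating_text, rating_value)
```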

In [100]:
#So let's extract all the tags having the price
price_tags = driver.find_elements_by_xpath("//span[@class='a-price-whole']")
price_tags

[<selenium.webdriver.remote.webelement.WebElement (session="088ef5344dc7cd5f7f3c00a07c9e459f", element="aacd8016-629f-4619-bd8b-ef10f8aa7079")>,
 <selenium.webdriver.remote.webelement.WebElement (session="088ef5344dc7cd5f7f3c00a07c9e459f", element="cba525f3-7016-4b18-b679-8cdda6569bfc")>,
 <selenium.webdriver.remote.webelement.WebElement (session="088ef5344dc7cd5f7f3c00a07c9e459f", element="e7db0316-25a9-459f-b2b4-3918a295a5fc")>,
 <selenium.webdriver.remote.webelement.WebElement (session="088ef5344dc7cd5f7f3c00a07c9e459f", element="9fb2680a-1d20-4376-a29e-f5400e2ebbc2")>,
 <selenium.webdriver.remote.webelement.WebElement (session="088ef5344dc7cd5f7f3c00a07c9e459f", element="77eef36a-6d68-45d2-8bf7-953b5f50a0eb")>,
 <selenium.webdriver.remote.webelement.WebElement (session="088ef5344dc7cd5f7f3c00a07c9e459f", element="03b27e08-38ef-416e-897d-f60cc453a0a1")>,
 <selenium.webdriver.remote.webelement.WebElement (session="088ef5344dc7cd5f7f3c00a07c9e459f", element="ba02c64a-53ea-401a-ab83-69

In [101]:
#Loop to iterate over the tags extracted above and extract the text inside them.
price=[]
for i in price_tags:
    price.append(i.text)
price

['35,000',
 '49,999',
 '44,999',
 '3,43,099',
 '79,990',
 '97,990',
 '84,990',
 '46,290',
 '83,990',
 '69,990',
 '2,01,524',
 '80,990',
 '1,44,643',
 '1,64,990',
 '39,999',
 '2,16,327',
 '47,050',
 '34,999',
 '78,493',
 '43,000',
 '29,999']
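
The prices are scraped as strings with Indian-style digit grouping, such as '3,43,099'. To sort or compare them they need to be numbers. A minimal sketch of the conversion, with a hypothetical helper name `price_to_int`:

```python
def price_to_int(text: str) -> int:
    """Convert a price string like '3,43,099' to the integer 343099."""
    digits = text.replace(",", "").strip()
    if not digits.isdigit():
        raise ValueError(f"not a whole-number price: {text!r}")
    return int(digits)


sample = ['35,000', '3,43,099', '2,01,524']  # taken from the output above
values = [price_to_int(p) for p in sample]
print(values)
```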

In [102]:
#locating the i9 filter
filter_button = driver.find_elements_by_xpath('//a[@class="a-link-normal s-navigation-item"]/span')
for i in filter_button:
    if i.text == 'Intel Core i9':
        i.click()
        break

In [103]:
#So let's extract all the tags having the laptop titles
titles_tags1 = driver.find_elements_by_xpath("//span[@class='a-size-medium a-color-base a-text-normal']")
titles_tags1

[<selenium.webdriver.remote.webelement.WebElement (session="088ef5344dc7cd5f7f3c00a07c9e459f", element="b0ea429b-fe00-42f1-b4c4-4d6cee0bfe23")>,
 <selenium.webdriver.remote.webelement.WebElement (session="088ef5344dc7cd5f7f3c00a07c9e459f", element="b1dcf8d7-90b9-40f2-8477-36c4dbf83d4d")>,
 <selenium.webdriver.remote.webelement.WebElement (session="088ef5344dc7cd5f7f3c00a07c9e459f", element="6070c119-9d60-4247-973a-eb14192ea69f")>,
 <selenium.webdriver.remote.webelement.WebElement (session="088ef5344dc7cd5f7f3c00a07c9e459f", element="0cae2860-81ae-4c1d-a53a-edf5c3376a9e")>,
 <selenium.webdriver.remote.webelement.WebElement (session="088ef5344dc7cd5f7f3c00a07c9e459f", element="80a91b74-18ec-49ba-8405-a52089f77346")>,
 <selenium.webdriver.remote.webelement.WebElement (session="088ef5344dc7cd5f7f3c00a07c9e459f", element="1c30c28a-9d4d-4736-b6d2-eedb6b3a0283")>,
 <selenium.webdriver.remote.webelement.WebElement (session="088ef5344dc7cd5f7f3c00a07c9e459f", element="fc3da9ce-4117-47f9-b13b-15

In [104]:
#Loop to iterate over the tags extracted above and extract the text inside them.
title1=[]
for i in titles_tags1:
    title1.append(i.text)
title1

['Dell XPS 9570 15.6" (39.62cms) UHD Laptop (8th Gen i9-8950HK/32GB/1TB SSD/Win 10 + MS Office/Integrated Graphics), Silver',
 'ASUS ZenBook Pro Duo Intel Core i9-10980HK 10th Gen 15.6" 4K UHD OLED Touchscreen Laptop (32GB RAM/1TB NVMe SSD/Windows 10/6GB NVIDIA GeForce RTX 2060 Graphics/Celestial Blue/2.5 Kg), UX581LV-H2035T',
 '(Renewed) Dell G Series G7  7588 15.6-inch FHD Laptop (8th gen Core i9-8950HK/16GB/1TB + 128GB SSD/Windows 10/MS Office/6 GB Nvidia GeForce GTX 1060 Graphics)',
 'Lenovo Legion 7 10th Gen Intel Core i9 15.6 inch Full HD Gaming Laptop (16GB/1TB SSD/Windows 10/MS Office 2019/144 Hz/NVIDIA RTX 2080 8GB GDDR6 Graphics/Slate Grey/2.25Kg), 81YU006HIN',
 'ASUS ROG Zephyrus Duo 15, 15.6" FHD 300Hz/3ms, Intel Core i9-10980HK 10th Gen, RTX 2080 SUPER Max-Q 8GB Graphic, Gaming Laptop (32GB/2TB RAID 0 SSD/Office 2019/Windows 10/Gray/2.4 Kg) GX550LXS-HF168TS',
 'Dell Alienware m15(R3) 15.6" (39.62cms) UHD Gaming Laptop (10th Gen Core i9-10980HK/32GB/1TB SSD/Windows 10 Home 

In [105]:
#So let's extract all the tags having the ratings
rating_tags1 = driver.find_elements_by_xpath("//span[@class='a-icon-alt']")
rating_tags1

[<selenium.webdriver.remote.webelement.WebElement (session="088ef5344dc7cd5f7f3c00a07c9e459f", element="cdda51e9-edeb-4a38-bd1a-ac5b463d8d99")>,
 <selenium.webdriver.remote.webelement.WebElement (session="088ef5344dc7cd5f7f3c00a07c9e459f", element="4f46a5fa-bc7a-4e31-90a5-376393dc19ae")>,
 <selenium.webdriver.remote.webelement.WebElement (session="088ef5344dc7cd5f7f3c00a07c9e459f", element="4bee0121-0894-4ab2-a60d-50c8e02d794e")>,
 <selenium.webdriver.remote.webelement.WebElement (session="088ef5344dc7cd5f7f3c00a07c9e459f", element="ddf0c54b-8f83-48fb-9b9e-c09c9649dd43")>,
 <selenium.webdriver.remote.webelement.WebElement (session="088ef5344dc7cd5f7f3c00a07c9e459f", element="ec9795af-7302-4494-8ff4-30f461121581")>,
 <selenium.webdriver.remote.webelement.WebElement (session="088ef5344dc7cd5f7f3c00a07c9e459f", element="4f0ef427-886d-4f51-aa32-dfe689dd3c6c")>,
 <selenium.webdriver.remote.webelement.WebElement (session="088ef5344dc7cd5f7f3c00a07c9e459f", element="d42af50b-b12b-4cd8-b46f-4c

In [106]:
#Loop to iterate over the tags extracted above and extract the text inside them.
rating1=[]
for i in rating_tags1:
    rating1.append(i.text)
rating1

['', '', '', '', '', '', '', '', '', '', '', '']

In [107]:
#So let's extract all the tags having the price
price_tags1 = driver.find_elements_by_xpath("//span[@class='a-price-whole']")
price_tags1

[<selenium.webdriver.remote.webelement.WebElement (session="088ef5344dc7cd5f7f3c00a07c9e459f", element="402c11a1-1c12-46da-bd0e-14f2d4fee510")>,
 <selenium.webdriver.remote.webelement.WebElement (session="088ef5344dc7cd5f7f3c00a07c9e459f", element="689ba773-8a4f-470e-ba48-c9a35e9049e7")>,
 <selenium.webdriver.remote.webelement.WebElement (session="088ef5344dc7cd5f7f3c00a07c9e459f", element="30ea4722-1ee6-43ed-8eba-2957c2cd8f22")>,
 <selenium.webdriver.remote.webelement.WebElement (session="088ef5344dc7cd5f7f3c00a07c9e459f", element="f188ea4f-8243-45b6-a4fc-d6e892e41884")>,
 <selenium.webdriver.remote.webelement.WebElement (session="088ef5344dc7cd5f7f3c00a07c9e459f", element="0ac31773-08c1-4cff-8521-4b13de65bd55")>,
 <selenium.webdriver.remote.webelement.WebElement (session="088ef5344dc7cd5f7f3c00a07c9e459f", element="651290f3-0c04-41b4-9825-e56986805091")>,
 <selenium.webdriver.remote.webelement.WebElement (session="088ef5344dc7cd5f7f3c00a07c9e459f", element="e53cbbfd-53b5-49cc-ad16-e2

In [108]:
#Loop to iterate over the tags extracted above and extract the text inside them.
price1=[]
for i in price_tags1:
    price1.append(i.text)
price1

['2,48,790',
 '2,99,999',
 '1,22,000',
 '2,62,990',
 '2,66,990',
 '3,19,990',
 '2,77,490',
 '1,89,900',
 '2,00,690',
 '2,15,990',
 '2,69,900']

In [109]:
prices = price + price1
titles = title + title1

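In [114]:
#Drop the last four titles so that titles has the same length as prices (32).
#This only equalises the lengths; it does not guarantee that each title is
#paired with its own price.
del titles[-4:]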

In [119]:
print(len(titles))

32
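
Titles and prices were collected as two independent page-wide lists, so a single listing without a price shifts every later row. One way to keep the fields together is to locate each result card first and then look up the title and price inside that card with relative XPaths (starting with `.//`). The sketch below assumes the Selenium 3 method `find_elements_by_xpath` used throughout this notebook (in Selenium 4 the equivalent is `find_elements(By.XPATH, ...)`). `first_text`, `rows_from_cards` and the fake classes are hypothetical, and how the cards themselves are located on the Amazon page is left open.

```python
from typing import Dict, List, Optional


def first_text(card, xpath: str) -> Optional[str]:
    """Text of the first match inside this card, or None if there is none."""
    matches = card.find_elements_by_xpath(xpath)
    return matches[0].text if matches else None


def rows_from_cards(cards: list, xpaths: Dict[str, str],
                    limit: int = 10) -> List[Dict[str, Optional[str]]]:
    """Build one row per card, so fields of the same product stay together."""
    return [{name: first_text(card, xp) for name, xp in xpaths.items()}
            for card in cards[:limit]]


class FakeElement:
    def __init__(self, text: str) -> None:
        self.text = text


class FakeCard:
    """Stand-in for a result-card WebElement (illustration only)."""

    def __init__(self, fields: Dict[str, str]) -> None:
        self._fields = fields

    def find_elements_by_xpath(self, xpath: str) -> List[FakeElement]:
        value = self._fields.get(xpath)
        return [FakeElement(value)] if value is not None else []


XPATHS = {
    "title": ".//span[@class='a-size-medium a-color-base a-text-normal']",
    "price": ".//span[@class='a-price-whole']",
}
cards = [
    FakeCard({XPATHS["title"]: "Laptop A", XPATHS["price"]: "35,000"}),
    FakeCard({XPATHS["title"]: "Laptop B"}),  # listing without a price
    FakeCard({XPATHS["title"]: "Laptop C", XPATHS["price"]: "44,999"}),
]
rows = rows_from_cards(cards, XPATHS)
print(rows)
```

The resulting list of dictionaries can be passed directly to `pd.DataFrame(rows)`, and the `limit` argument keeps only the first 10 results that the question asks for.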


In [121]:

amazon=pd.DataFrame({})
amazon['title'] = titles
amazon['price'] = prices

amazon

Unnamed: 0,title,price
0,(Renewed) Dell Latitude E6420 14 Inch Laptop (...,35000
1,Mi Notebook Horizon Edition 14 Intel Core i5-1...,49999
2,(Renewed) HP EliteBook 840 G3 Laptop (Core i7 ...,44999
3,"Lenovo Legion 5Pi 10th Gen Intel Core i7 15.6""...",343099
4,"Asus ROG Zephyrus S Ultra Slim Gaming Laptop, ...",79990
5,HP Pavilion (2021) Thin & Light 11th Gen Core ...,97990
6,"Lenovo Yoga Slim 7i 11th Gen Intel Core i7 14""...",84990
7,HP Pavilion Gaming 10th Gen Intel Core i7 Proc...,46290
8,(Renewed) Lenovo Intel 4th Gen Core i7-4980HQ ...,83990
9,Lenovo IdeaPad Flex 5 11th Gen Intel Core i7 1...,69990
