## Resources used
1. https://stackoverflow.com/questions/51046454/how-can-we-use-selenium-webdriver-in-colab-research-google-com
2. https://www.browserstack.com/guide/python-selenium-to-run-web-automation-test
3. https://dev.to/razgandeanu/selenium-cheat-sheet-9lc
4. https://medium.com/analytics-vidhya/web-scraping-google-search-results-with-selenium-and-beautifulsoup-4c534817ad88

## Installing drivers to use Selenium on Colab

If you are running on a local system, the relevant steps are given as comments in the code block of each cell.
Only the setup differs; the rest is the same.


In [1]:
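# On a local system: pip install selenium==3.141.0
# The version is pinned because the find_element_by_* methods used below were removed in Selenium 4.3.
!pip install selenium==3.141.0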

Collecting selenium
  Downloading https://files.pythonhosted.org/packages/80/d6/4294f0b4bce4de0abf13e17190289f9d0613b0a44e5dd6a7f5ca98459853/selenium-3.141.0-py2.py3-none-any.whl (904kB)
Installing collected packages: selenium
Successfully installed selenium-3.141.0


In [7]:
# On a local system, download ChromeDriver manually from https://sites.google.com/a/chromium.org/chromedriver/downloads instead of running these commands.
!apt-get update # refresh the package index so that apt install can find the package
!apt install chromium-chromedriver
!cp /usr/lib/chromium-browser/chromedriver /usr/bin

Hit:1 https://cloud.r-project.org/bin/linux/ubuntu bionic-cran40/ InRelease
Ign:2 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64  InRelease
Ign:3 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64  InRelease
Hit:4 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64  Release
Hit:5 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64  Release
Hit:6 http://ppa.launchpad.net/c2d4u.team/c2d4u4.0+/ubuntu bionic InRelease
Hit:7 http://security.ubuntu.co

In [8]:
# Same on Colab and on a local system.
# Selenium drives a real browser (headless here) and executes the commands our script sends to it.
from selenium import webdriver
from selenium.webdriver.common.keys import Keys

In [9]:
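# creating a headless Chrome instance
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')               # run without a display
chrome_options.add_argument('--no-sandbox')             # Chrome's sandbox does not work when running as root in a container
chrome_options.add_argument('--disable-dev-shm-usage')  # avoid the small /dev/shm found in containers
# `options` replaces the deprecated `chrome_options` keyword in Selenium 3.141.0
driver = webdriver.Chrome('chromedriver', options=chrome_options) # replace the first argument with the path to your driver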
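The notebook never closes the browser it starts here. A minimal sketch of one way to guarantee cleanup, assuming the driver is created through a factory function: `managed_driver` and `make_driver` are hypothetical names introduced for illustration, while `quit()` is the standard WebDriver method that ends the session.

```python
from contextlib import contextmanager


@contextmanager
def managed_driver(make_driver):
    """Create a driver with make_driver() and always call quit() on it afterwards."""
    driver = make_driver()
    try:
        yield driver
    finally:
        # quit() ends the WebDriver session even if the body raised an exception
        driver.quit()


# Hypothetical usage inside this notebook (requires selenium and chromedriver):
# with managed_driver(lambda: webdriver.Chrome('chromedriver', options=chrome_options)) as driver:
#     driver.get("https://www.google.co.in")
```

In an interactive notebook a long-lived `driver` is convenient, so this pattern is mostly useful once the code moves into a script.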

  


### Searching and extracting relevant information

In [10]:
# accessing the webpage
driver.get("https://www.google.co.in")

In [None]:
# checking the current url
print(driver.current_url)

https://www.google.co.in/


### Finding the element
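Selenium can locate elements on a webpage in several ways. Each method below returns a list of all matches; the singular `find_element_by_*` form returns only the first match.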
* find_elements_by_name
* find_elements_by_xpath
* find_elements_by_link_text
* find_elements_by_partial_link_text
* find_elements_by_tag_name
* find_elements_by_class_name
* find_elements_by_css_selector
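In Selenium 3, the plural `find_elements_by_*` methods return an empty list when nothing matches, whereas the singular `find_element_by_*` methods raise `NoSuchElementException`. A small sketch that uses this to look up an optional element; `first_or_none` is a hypothetical helper, not part of Selenium.

```python
def first_or_none(driver, name):
    """Return the first element whose `name` attribute matches, or None if there is none."""
    # Selenium 3 API: returns a list of matching elements, possibly empty
    matches = driver.find_elements_by_name(name)
    return matches[0] if matches else None


# Hypothetical usage with the driver created earlier:
# search_bar = first_or_none(driver, "q")
# if search_bar is None:
#     print("search bar not found")
```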

In [None]:
print(driver.title)

Google


In [None]:
# accessing the search bar
search_bar = driver.find_element_by_name("q")

In [None]:
# searching something through the search bar
search_bar.clear()
search_bar.send_keys("deep learning")
search_bar.send_keys(Keys.RETURN)

In [None]:
# the_link_text = "News"
# news = driver.find_element_by_link_text(the_link_text)
# news.click()  # to redirect to this page

In [None]:
# searching links for news and videos 
tags_list = ["News", "Videos"]
tags_link = []


for i in tags_list:
  tags_link.append(driver.find_element_by_link_text(i).get_attribute('href'))


# printing the link
for i in tags_link:
  print(i)

https://www.google.co.in/search?q=deep+learning&source=lnms&tbm=nws&sa=X&ved=2ahUKEwiClLrjxvvvAhXlGDQIHexmBlQQ_AUoAXoECAEQAw
https://www.google.co.in/search?q=deep+learning&source=lnms&tbm=vid&sa=X&ved=2ahUKEwiClLrjxvvvAhXlGDQIHexmBlQQ_AUoA3oECAEQBQ


### Extracting links from videos search

In [None]:
# extracting all the links from the videos page using BeautifulSoup
from bs4 import BeautifulSoup
n_pages = 2
results = []

# collecting the links from the Videos tab in a flat list
# note: range(1, n_pages) visits pages 1 to n_pages - 1, so n_pages = 2 fetches one page
for page in range(1, n_pages):
    # pass the result offset as a separate "start" parameter instead of appending it to the "ved" value
    url = "https://www.google.co.in/search?q=deep+learning&source=lnms&tbm=vid&sa=X&ved=2ahUKEwiClLrjxvvvAhXlGDQIHexmBlQQ_AUoA3oECAEQBQ" + "&start=" + str((page - 1) * 10)

    driver.get(url)
    soup = BeautifulSoup(driver.page_source, 'html.parser')
    search = soup.find_all('div', class_="yuRUbf")
    for h in search:
        results.append(h.a.get('href'))  # results is a flat list, so append to it directly
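The loops in this notebook build each page URL by string concatenation. A minimal sketch of an alternative that sets the offset through the query string, assuming Google reads the result offset from a `start` parameter (as the commented-out URL in the news cell suggests). `page_url` and `per_page` are hypothetical names.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def page_url(base_url, page, per_page=10):
    """Return base_url with its `start` query parameter set for a 1-based page number."""
    parts = urlsplit(base_url)
    # keep every existing parameter except a previous `start`, then add the new offset
    params = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != "start"]
    params.append(("start", str((page - 1) * per_page)))
    return urlunsplit(parts._replace(query=urlencode(params)))


print(page_url("https://www.google.co.in/search?q=deep+learning&tbm=nws", 2))
```

Because the query string is parsed and re-encoded, the offset cannot end up glued onto the value of the last parameter, and calling the function on a URL that already carries `start` replaces the old value.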

In [15]:
print(results)
print(len(results))

['https://techxplore.com/news/2021-04-deeponet-deep-neural-network-based-approximate.html', 'https://www.vfxvoice.com/ai-machine-and-deep-learning-filling-todays-need-for-speed-and-iteration/', 'https://www.efinancialcareers.com/news/2021/04/morgan-stanley-machine-learning', 'https://betakit.com/deeplite-raises-7-5-million-cad-seed-round-to-optimize-deep-neural-networks/', 'https://www.marktechpost.com/2021/04/10/computer-scientists-from-rice-university-display-cpu-algorithm-that-trains-deep-neural-networks-15-times-faster-than-gpu/', 'https://techxplore.com/news/2021-04-rice-intel-optimize-ai-commodity.html', 'https://www.slashgear.com/3d-printed-all-optical-diffractive-deep-neural-network-created-at-ucla-06667284/', 'https://bdtechtalks.com/2021/03/15/machine-learning-causality/', 'https://spectrum.ieee.org/tech-talk/artificial-intelligence/machine-learning/deep-learning-isnt-deep-enough-unless-it-copies-from-the-brain', 'https://techxplore.com/news/2021-04-deep-networks-human-voicej

### Extracting link from news search


In [12]:
# extracting all the links from the news page using BeautifulSoup
from bs4 import BeautifulSoup
n_pages = 2
results = []

# collecting the links from the News tab in a flat list
for page in range(1, n_pages):
    # url = "http://www.google.com/search?q=" + query + "&start=" +   str((page - 1) * 10)
    # pass the result offset as a separate "start" parameter instead of appending it to the "ved" value
    url = "https://www.google.co.in/search?q=deep+learning&source=lnms&tbm=nws&sa=X&ved=2ahUKEwiClLrjxvvvAhXlGDQIHexmBlQQ_AUoAXoECAEQAw" + "&start=" + str((page - 1) * 10)
    driver.get(url)

    soup = BeautifulSoup(driver.page_source, 'html.parser')
    search = soup.find_all('div', class_="dbsr" )
    for h in search:
        results.append(h.a.get('href'))
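With BeautifulSoup, `h.a` is `None` when a result block contains no `<a>` tag, so `h.a.get('href')` would raise `AttributeError` on such a block. A hedged sketch of a more defensive collection step; `hrefs_from_blocks` is a hypothetical helper that accepts the list returned by `soup.find_all(...)`.

```python
def hrefs_from_blocks(blocks):
    """Collect the href of the first <a> in each block, skipping blocks without one."""
    links = []
    for block in blocks:
        anchor = block.a  # None when the block has no <a> descendant
        if anchor is None:
            continue
        href = anchor.get("href")  # None when the <a> has no href attribute
        if href:
            links.append(href)
    return links


# Hypothetical usage in the loop above:
# results.extend(hrefs_from_blocks(soup.find_all('div', class_="dbsr")))
```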

In [14]:
print(results)
print(len(results))

10


## Combining the above code into functions

In [18]:
# imports
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup

In [16]:
# creating a chrome instance. 
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
driver = webdriver.Chrome('chromedriver', options=chrome_options) # replace the first argument with the path to your driver

  


In [22]:
# function for getting links from the specified category
def link_from_category(category_link, category, n_pages):
  class_from_categ = {"News":"dbsr", "Videos":"yuRUbf"} # CSS class of the result blocks in each category
  class_tag = class_from_categ[category]

  results = [] # list for storing all the links

  # range(1, n_pages) visits pages 1 to n_pages - 1
  for page in range(1, n_pages):
    # pass the result offset as a separate "start" parameter
    url = category_link + "&start=" + str((page - 1) * 10)
    driver.get(url)

    soup = BeautifulSoup(driver.page_source, 'html.parser')
    search = soup.find_all('div', class_=class_tag)
    for h in search:
      results.append(h.a.get('href'))

  return results

In [32]:
# sanity check

temp_link = "https://www.google.co.in/search?q=deep+learning&source=lnms&tbm=nws&sa=X&ved=2ahUKEwiClLrjxvvvAhXlGDQIHexmBlQQ_AUoAXoECAEQAw" # link for news articles
# temp_link = "https://www.google.co.in/search?q=deep+learning&source=lnms&tbm=vid&sa=X&ved=2ahUKEwiClLrjxvvvAhXlGDQIHexmBlQQ_AUoA3oECAEQBQ" # link for videos

temp_result = link_from_category(temp_link, "News",2)
print(temp_result)
print(len(temp_result))

del temp_link, temp_result

['https://techxplore.com/news/2021-04-deeponet-deep-neural-network-based-approximate.html', 'https://www.vfxvoice.com/ai-machine-and-deep-learning-filling-todays-need-for-speed-and-iteration/', 'https://www.efinancialcareers.com/news/2021/04/morgan-stanley-machine-learning', 'https://betakit.com/deeplite-raises-7-5-million-cad-seed-round-to-optimize-deep-neural-networks/', 'https://www.marktechpost.com/2021/04/10/computer-scientists-from-rice-university-display-cpu-algorithm-that-trains-deep-neural-networks-15-times-faster-than-gpu/', 'https://techxplore.com/news/2021-04-rice-intel-optimize-ai-commodity.html', 'https://www.slashgear.com/3d-printed-all-optical-diffractive-deep-neural-network-created-at-ucla-06667284/', 'https://bdtechtalks.com/2021/03/15/machine-learning-causality/', 'https://spectrum.ieee.org/tech-talk/artificial-intelligence/machine-learning/deep-learning-isnt-deep-enough-unless-it-copies-from-the-brain', 'https://techxplore.com/news/2021-04-deep-networks-human-voicej

In [33]:
# function for retrieving all the news and videos links for a specified search.
# It returns a list of two lists: index 0 holds the links for news articles and index 1 holds the links for videos.
def links_for_search(query, n_pages=10):

  # redirecting to google.co.in
  driver.get("https://www.google.co.in")

  # accessing the search bar and searching the specified query
  search_bar = driver.find_element_by_name("q")
  search_bar.clear()
  search_bar.send_keys(query)
  search_bar.send_keys(Keys.RETURN)

  # fetching the news and videos links for the specified query
  category_list = ["News", "Videos"]
  category_link = []
  for i in category_list:
    category_link.append(driver.find_element_by_link_text(i).get_attribute('href'))


  # list for storing all the links
  result_links = []

  # fetching all the links for news articles
  result_links.append(link_from_category(category_link[0], "News",n_pages))

  # fetching all the links for videos
  result_links.append(link_from_category(category_link[1], "Videos",n_pages))

  return result_links
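The function above returns its results by position (index 0 for news, index 1 for videos). As a sketch of one alternative, the results could be keyed by category name so that callers do not need to remember the order. `links_by_category` is a hypothetical helper, and `fetch_links` is assumed to have the same signature as `link_from_category`.

```python
def links_by_category(category_urls, fetch_links, n_pages=2):
    """Map each category name to the list of links fetched for it.

    category_urls: dict of category name -> category page URL
    fetch_links:   callable taking (url, category, n_pages) and returning a list of links
    """
    return {
        category: fetch_links(url, category, n_pages)
        for category, url in category_urls.items()
    }


# Hypothetical usage inside links_for_search:
# urls = {"News": category_link[0], "Videos": category_link[1]}
# result = links_by_category(urls, link_from_category, n_pages)
# print(len(result["News"]), len(result["Videos"]))
```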

In [34]:
links = links_for_search("deep learning",10) # links_for_search(your_query, no_of_pages)

In [35]:
print(links[0])
print(len(links[0]))
print(links[1])
print(len(links[1]))
print(len(links))

['https://techxplore.com/news/2021-04-deeponet-deep-neural-network-based-approximate.html', 'https://www.vfxvoice.com/ai-machine-and-deep-learning-filling-todays-need-for-speed-and-iteration/', 'https://www.efinancialcareers.com/news/2021/04/morgan-stanley-machine-learning', 'https://betakit.com/deeplite-raises-7-5-million-cad-seed-round-to-optimize-deep-neural-networks/', 'https://www.marktechpost.com/2021/04/10/computer-scientists-from-rice-university-display-cpu-algorithm-that-trains-deep-neural-networks-15-times-faster-than-gpu/', 'https://techxplore.com/news/2021-04-rice-intel-optimize-ai-commodity.html', 'https://www.slashgear.com/3d-printed-all-optical-diffractive-deep-neural-network-created-at-ucla-06667284/', 'https://bdtechtalks.com/2021/03/15/machine-learning-causality/', 'https://spectrum.ieee.org/tech-talk/artificial-intelligence/machine-learning/deep-learning-isnt-deep-enough-unless-it-copies-from-the-brain', 'https://techxplore.com/news/2021-04-deep-networks-human-voicej