# Working with Selenium
---
**Author**: Marko Bajec

**Last update**: 5.3.2019

**Description**: the library <code>Selenium</code> 

This notebook shows a few examples of using <code>Selenium</code> to start <code>Chrome</code> or <code>Firefox</code> in *headless mode* and then retrieve the source of web pages as they would be rendered in a browser.

**Documentation:** https://selenium-python.readthedocs.io (the community-maintained "Selenium with Python" guide)

---
### Using Chrome to retrieve and navigate through a web page
In this example, Chrome is started in headless mode. The **UL FRI** web page (http://fri.uni-lj.si/) is then fetched and checked for the latest news. If there are any news items, their titles are printed out.

In [None]:
import os
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

options = Options()
options.add_argument("--headless")
#path to the Chrome binary on macOS; adjust it for other systems
options.binary_location = '/Applications/Google Chrome.app/Contents/MacOS/Google Chrome'

#chromedriver is expected in the current working directory
driver = webdriver.Chrome(executable_path=os.path.abspath('chromedriver'), options=options)
driver.get("http://fri.uni-lj.si")

#print the titles of all news items that have non-empty text
for n in driver.find_elements(By.CLASS_NAME, 'news-container-title'):
    if len(n.text)>0:
        print(n.text)

#quit() closes the browser and ends the driver session
driver.quit()
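
The cell above assumes that the `chromedriver` binary sits in the current working directory, because `os.path.abspath('chromedriver')` only turns a relative name into an absolute one and does not check that the file exists. A minimal sketch of a more forgiving lookup follows; the helper name `locate_driver` and its parameters are hypothetical and not part of Selenium.

```python
import os
import shutil


def locate_driver(name="chromedriver", search_dir="."):
    """Return an absolute path to the driver binary, or None if it is not found.

    Looks in search_dir first (like the cell above does with the current
    directory), then falls back to the directories listed in PATH.
    """
    candidate = os.path.abspath(os.path.join(search_dir, name))
    if os.path.isfile(candidate):
        return candidate
    found = shutil.which(name)  # searches PATH for an executable called name
    return os.path.abspath(found) if found else None


path = locate_driver()
if path is None:
    print("chromedriver not found in the current directory or on PATH")
else:
    print("using driver at", path)
```

The returned path could then be passed to `webdriver.Chrome` in place of `os.path.abspath('chromedriver')`, and a `None` result gives a clearer error than a failed browser start.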

### Example with Firefox
#### Check Python webpage
Here we start Firefox with its GUI, go to www.python.org and check that the page has opened (with the assertion <code>"Python" in driver.title</code>). If all is well, we find the element named <code>q</code>, which is the **search field** in the page source. We enter "pycon", submit the search and check that the resulting page is not empty (with the assertion <code>"No results found." not in driver.page_source</code>).

In [None]:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

driver = webdriver.Firefox()
driver.get("http://www.python.org")
assert "Python" in driver.title
#the search field on python.org has the name "q"
elem = driver.find_element(By.NAME, "q")
elem.clear()
elem.send_keys("pycon")
elem.send_keys(Keys.RETURN)
assert "No results found." not in driver.page_source
#print(driver.page_source)
driver.quit()
print('all ok')
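
If either assertion in the cell above fails, the line that shuts the browser down is never reached and the Firefox window stays open. One way to guarantee cleanup is a small context manager, sketched below. The names `quitting` and `FakeDriver` are made up for illustration; `FakeDriver` only stands in for a real WebDriver so that the sketch runs without a browser.

```python
from contextlib import contextmanager


@contextmanager
def quitting(driver):
    """Yield driver, then call driver.quit() even if the body raised."""
    try:
        yield driver
    finally:
        driver.quit()


class FakeDriver:
    """Stand-in for a real WebDriver (assumption: only title and quit() are needed)."""

    def __init__(self):
        self.title = "Welcome to Python.org"
        self.quit_calls = 0

    def quit(self):
        self.quit_calls += 1


fake = FakeDriver()
try:
    with quitting(fake) as driver:
        assert "Selenium" in driver.title  # fails on purpose
except AssertionError:
    pass

print(fake.quit_calls)  # prints 1: quit() ran although the assertion failed
```

With a real browser the same pattern would read `with quitting(webdriver.Firefox()) as driver:` followed by the body of the cell above.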

#### Extract new movies from YiFi
In this example we use the <code>Selenium</code> library to access the [YiFi Movies](https://yts.am/) web page. We check for new movies and print out the titles of those with a rating of 7.0 or higher.

**Disclaimer**: note that the legality of the YiFi web page is not our concern.

In [None]:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.options import Options

options = Options()
options.headless = True
driver = webdriver.Firefox(options=options)

try:
    driver.get("https://yts.am/")

    #get all parts of the HTML that contain information on titles, genres, years...
    elems = driver.find_elements(By.XPATH, '//div[contains(@class, "browse-movie-wrap")]')

    for elem in elems:
        #some entries don't have a rating - continue only if there is exactly one
        ratings = elem.find_elements(By.CLASS_NAME, 'rating')
        if len(ratings)==1:
            #the rating comes in the form "X / 10", where X is the rating;
            #if it is 7 or more, print the movie title, year and rating
            rating = float(ratings[0].get_attribute('innerText').split('/')[0])
            if rating >= 7.0:
                print(elem.find_element(By.CLASS_NAME, 'browse-movie-title').text)
                print(elem.find_element(By.CLASS_NAME, 'browse-movie-year').text)
                print(rating)
finally:
    #close the browser whether or not an error occurred
    driver.quit()
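
The cell above converts the rating text, which has the form "X / 10", with a direct call to `float`, so any unexpected text would stop the whole loop with a `ValueError`. A small helper can isolate that step. This is a sketch only: `parse_rating` is a hypothetical name, and the sample strings are invented for illustration rather than taken from the site.

```python
def parse_rating(text):
    """Return X from a string like 'X / 10' as a float, or None if it cannot be parsed."""
    head = text.split('/')[0].strip()
    try:
        return float(head)
    except ValueError:
        return None


# invented sample strings, standing in for the innerText of the rating elements
samples = ["7.5 / 10", " 6 / 10 ", "", "N/A"]

for s in samples:
    rating = parse_rating(s)
    if rating is not None and rating >= 7.0:
        print(repr(s), "passes the 7.0 threshold")
```

Inside the loop of the cell above, entries for which the helper returns `None` could simply be skipped.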

#### Check if movie available
In this example we use the <code>Selenium</code> library to check whether a certain movie is available in the **YiFi database**. If it exists, its synopsis is printed out; otherwise the message "MOVIE NOT YET AVAILABLE!" is printed.

In [None]:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.firefox.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException

options = Options()
options.headless = True
driver = webdriver.Firefox(options=options)

try:
    driver.get("https://yts.am/")
    
    #get search input element and enter "Green Book"
    searchinput = driver.find_element(By.ID, 'quick-search-input')
    searchinput.clear()
    searchinput.send_keys("Green Book")    
    
    #when text is entered in the search field, the movie title appears as a hover over the search field
    #if the movie is found in the database. We wait up to 2 seconds for this to happen. If it doesn't,
    #WebDriverWait raises TimeoutException, which means the movie is not available
    element = WebDriverWait(driver, 2).until(
        EC.presence_of_element_located((By.XPATH, './/li[contains(@class, "ac-item-hover")]')))
    
    #when the hover appears, we click on it, which opens a new page with details on the Green Book movie;
    #we then print out the movie synopsis
    element.find_element(By.TAG_NAME, 'a').click()
    print('MOVIE FOUND!')
    print(driver.find_element(By.ID, 'synopsis').text)

except TimeoutException:
    #the hover never appeared within the timeout, so the movie is not listed
    print('MOVIE NOT YET AVAILABLE!')
finally:
    #close the browser whether or not the lookup succeeded
    driver.quit()
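
The call `WebDriverWait(driver, 2).until(...)` in the cell above is an explicit wait: the condition is evaluated repeatedly until it returns something truthy or the timeout expires, in which case `TimeoutException` is raised. The idea can be sketched in plain Python as follows. This is an illustration of the polling pattern under simplified assumptions, not Selenium's implementation; `wait_until` and `suggestion_ready` are hypothetical names, and the built-in `TimeoutError` is used in place of Selenium's `TimeoutException`.

```python
import time


def wait_until(condition, timeout=2.0, poll_interval=0.1):
    """Call condition() until it returns a truthy value; raise TimeoutError otherwise."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.2f s" % timeout)
        time.sleep(poll_interval)


# stand-in condition: becomes truthy on the third call
calls = {"n": 0}

def suggestion_ready():
    calls["n"] += 1
    return "Green Book" if calls["n"] >= 3 else None


print(wait_until(suggestion_ready, timeout=1.0, poll_interval=0.01))  # prints Green Book

try:
    wait_until(lambda: False, timeout=0.05, poll_interval=0.01)
except TimeoutError:
    print("MOVIE NOT YET AVAILABLE!")
```

The 2-second timeout in the cell above is therefore an upper bound on how long a missing movie takes to be reported; a found movie returns as soon as the hover is present.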