## Using Python Selenium to Automate Tasks

### First day: Selenium by Example

Selenium is a great tool for writing functional/acceptance tests and automation scripts that need to interact with a webpage.

In this lesson we get Selenium running and look at two use cases. Then you will code one or two scripts using Selenium yourself.

To follow along you need to have Selenium and a webdriver installed:

1. `pip install selenium` (if you installed the requirements.txt in my setup video in the appendix you should already have it)
2. I used _PhantomJS_ before, but now I get this message: `Selenium support for PhantomJS has been deprecated, please use headless versions of Chrome or Firefox instead`. So I downloaded [ChromeDriver](https://sites.google.com/a/chromium.org/chromedriver/home) and put it in a directory on my `PATH` (`$HOME/bin`).
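
Since the deprecation message recommends headless Chrome, here is a minimal sketch of how a headless driver could be created. It assumes ChromeDriver is on your `PATH` as described above; `make_headless_chrome` is a hypothetical helper name, and the rest of this lesson keeps using a visible browser window.

```python
def make_headless_chrome():
    """Return a Chrome driver that runs without opening a window."""
    # Imported inside the function so defining it does not require Selenium
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument('--headless')  # run Chrome without a visible UI
    return webdriver.Chrome(options=options)


# Usage (starts a real browser, so it needs Chrome and ChromeDriver):
# driver = make_headless_chrome()
# driver.get('http://www.python.org')
# print(driver.title)
# driver.quit()
```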

Here is the _Hello Selenium world_ example from [the docs](http://selenium-python.readthedocs.io/getting-started.html). Notice how easy it is to interact with forms:

In [1]:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys

driver = webdriver.Chrome()  # replaced Firefox by Chrome
driver.get("http://www.python.org")
assert "Python" in driver.title
elem = driver.find_element_by_name("q")
elem.clear()
elem.send_keys("pycon")
elem.send_keys(Keys.RETURN)
assert "No results found." not in driver.page_source
driver.close()
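
A version note: the `find_element_by_*` helpers used in this lesson were removed in Selenium 4.3, so with a recent `pip install selenium` they raise `AttributeError`. The sketch below shows the same search with the `By` locator style. `search_python_org` is a hypothetical wrapper so the steps can be reused, and the same substitution applies to the later cells (for example `find_element(By.ID, ...)` instead of `find_element_by_id(...)`).

```python
def search_python_org(driver, term):
    """Search python.org for term and return the resulting page source."""
    # Imported inside the function so defining it does not require Selenium
    from selenium.webdriver.common.by import By
    from selenium.webdriver.common.keys import Keys

    driver.get('http://www.python.org')
    elem = driver.find_element(By.NAME, 'q')  # was: find_element_by_name('q')
    elem.clear()
    elem.send_keys(term)
    elem.send_keys(Keys.RETURN)
    return driver.page_source


# Usage with a running driver:
# assert "No results found." not in search_python_org(driver, 'pycon')
```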

#### Example 1. Packt ebook manager

Packt gives away [a free ebook each day](https://www.packtpub.com/packt/offers/free-learning). I've been grabbing quite a few of them (back in our 100 Days we [wrote a notification script](https://github.com/pybites/100DaysOfCode/blob/master/076/packt_notification.py)). 

In this section I will write a simple Selenium script that searches my collection for a title and gives me the download link. Of course my books are behind a login, so I use Selenium to log in first. This script/idea also came out of our own 100 days of code, see [day 66](https://github.com/pybites/100DaysOfCode/blob/master/066/packt.py).

How would this work? Quite simply:

1. Go to the Packt login URL and log in:
    ![logged out](images/packt1.png)
2. You get to your Account page:
    ![logged in](images/packt2.png)
3. Go to the _My eBooks_ link and parse the HTML:
    ![parse ebook html](images/packt3.png)

Let's get coding!

First of all, as per [The twelve-factor app](https://12factor.net/config), I store config details in the environment, __never__ in the script.

`os.environ.get` retrieves an environment variable and returns `None` if it is not set. I consider an explicit check for `None` the more Pythonic pattern, and to increase readability I raise my own exception, `NoLogin`, when a credential is missing (the check itself is shown in Example 2).

In [2]:
import os

user = os.environ.get('PACKT_USER')
pw = os.environ.get('PACKT_PW')
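
The cell above only reads the variables. Here is a minimal sketch of the `None` check described earlier, using the same variable names. `get_credentials` is a hypothetical helper, and it takes the environment mapping as an argument so it can be tried out without touching your real settings.

```python
import os


class NoLogin(Exception):
    """Raised when a required credential is missing from the environment."""


def get_credentials(user_var, pw_var, env=os.environ):
    """Return (user, password) read from env, or raise NoLogin."""
    user = env.get(user_var)
    pw = env.get(pw_var)
    if user is None or pw is None:
        raise NoLogin(f'Set {user_var} and {pw_var} in your env')
    return user, pw


# Example with an explicit mapping instead of the real environment
sample_env = {'PACKT_USER': 'me@example.com', 'PACKT_PW': 's3cret'}
print(get_credentials('PACKT_USER', 'PACKT_PW', env=sample_env))
```

In the notebook you would simply call `get_credentials('PACKT_USER', 'PACKT_PW')` and let it fail early if either variable is unset.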

As in the _Hello Selenium world_ example, we create a `driver` object and go to the `login` URL. We find the `edit-name` and `edit-pass` form elements and send them the user name and password (stored in the `user` and `pw` variables respectively). Appending `Keys.RETURN` submits the form: an HTML form can be submitted by hitting Enter in an input field, whereas in a textarea Enter just inserts a newline.

In [3]:
login = 'https://www.packtpub.com/login'

driver = webdriver.Chrome()
driver.get(login)

driver.find_element_by_id('edit-name').send_keys(user)
driver.find_element_by_id('edit-pass').send_keys(pw + Keys.RETURN)

Note that at this point a Chrome browser window opened in the background. It will close when we close the driver later:

![selenium web browser](images/packt4.png)

Also note that there is a natural delay between steps because we are using a notebook. In a script, though, the steps run one after the other at lightning speed.

Also in this case there is no pagination, so my 100+ books take some time to load. So if you use this as a script you might want to add: `driver.implicitly_wait(3)`
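
`implicitly_wait` applies to every element lookup. An alternative is an explicit wait that blocks until one specific element shows up. The sketch below assumes the `My eBooks` link text used in the next step; `wait_for_link` and the 10 second timeout are illustrative choices, not part of the original script.

```python
def wait_for_link(driver, link_text, timeout=10):
    """Block until a link with this text is present, then return it."""
    # Imported inside the function so defining it does not require Selenium
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    locator = (By.LINK_TEXT, link_text)
    # Raises selenium.common.exceptions.TimeoutException if the link never appears
    return WebDriverWait(driver, timeout).until(
        EC.presence_of_element_located(locator)
    )


# Usage with a logged-in driver:
# wait_for_link(driver, 'My eBooks').click()
```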

Now let's get to the actual content by clicking the _My eBooks_ link:

In [4]:
driver.find_element_by_link_text('My eBooks').click()

Next I collect the elements with class `product-line` in `elements` and use a _dictionary comprehension_ to store the book id (`nid`) as keys and the book titles as values. You could sync this to a local file or SQLite DB if you want to cache these results.

In [5]:
elements = driver.find_elements_by_class_name("product-line")
books = {e.get_attribute('nid'): e.get_attribute('title') for e in elements}
books

{'10068': 'Learning Ext JS 4 [eBook]',
 '10264': 'Implementing Splunk: Big Data Reporting and Development for Operational Intelligence [eBook]',
 '10744': 'Nagios Core Administration Cookbook [eBook]',
 '10763': 'Continuous Delivery and DevOps: A Quickstart guide [eBook]',
 '11441': 'Learning SciPy for Numerical and Scientific Computing [eBook]',
 '11703': 'Building Machine Learning Systems with Python [eBook]',
 '11723': 'Learning jQuery - Fourth Edition [eBook]',
 '11913': 'Mastering Web Application Development with AngularJS [eBook]',
 '12001': 'OpenCV Computer Vision with Python [eBook]',
 '12050': "Magento : Beginner's Guide - Second Edition [eBook]",
 '12318': 'Python Geospatial Development - Second Edition [eBook]',
 '12364': 'Object-Oriented JavaScript - Second Edition [eBook]',
 '12730': 'Learning Vaadin 7: Second Edition [eBook]',
 '12883': '3D Printing Blueprints [eBook]',
 '13253': 'Boost C++ Application Development Cookbook [eBook]',
 '13532': 'Python Data Visualization Co
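
One way to cache the result, as suggested above, is to dump the dictionary to a JSON file. This is only a sketch: the file name `packt_books.json` and the helper names `save_books` and `load_books` are made up for illustration.

```python
import json
from pathlib import Path

CACHE = Path('packt_books.json')  # hypothetical cache location


def save_books(books, path=CACHE):
    """Write the nid -> title mapping to disk as JSON."""
    path.write_text(json.dumps(books, indent=2), encoding='utf-8')


def load_books(path=CACHE):
    """Return the cached mapping, or an empty dict if there is no cache yet."""
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding='utf-8'))
```

After scraping you would call `save_books(books)`, and on a later run `books = load_books()` lets you search the collection without logging in again.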

The first time around I made the mistake of closing `driver` right after collecting `elements`, but the `get_attribute` calls used to build `books` still need the session. So only now should you close it. This also closes the Chrome app:

In [6]:
driver.close()
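
In a script it is easy to forget to shut the browser down, or to do it too early. One possible safeguard is a small context manager that always ends the session when the block finishes. `browser` is a hypothetical helper name; it relies only on the driver's `quit()` method, which ends the whole session, whereas `close()` closes just the current window.

```python
from contextlib import contextmanager


@contextmanager
def browser(driver):
    """Yield the driver and always call driver.quit() afterwards."""
    try:
        yield driver
    finally:
        # Runs even if the body raises, so no browser is left behind
        driver.quit()


# Usage:
# with browser(webdriver.Chrome()) as driver:
#     driver.get(login)
#     ...  # collect everything you need while the session is alive
```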

Now let's write a function that takes a regex search term and a book format, and prints the matching books in my collection:

In [7]:
import re

DOWNLOAD_URL = 'https://www.packtpub.com/ebook_download/{nid}/{ebook_format}'
BOOK_FORMATS = 'pdf epub mobi'

def get_books(grep, ebook_format):
    """Receives a grep regex and book format (epub, pdf, mobi)
       and prints the titles + urls of matching ebooks"""
    ebook_format = ebook_format.lower()
    if ebook_format not in BOOK_FORMATS.split():
        raise ValueError(f'Not a valid book format (valid are: {BOOK_FORMATS})')

    for nid, title in books.items():
        # re.IGNORECASE keeps the pattern intact; lowercasing the pattern
        # itself would turn classes such as \D into \d
        if re.search(grep, title, re.IGNORECASE):
            url = DOWNLOAD_URL.format(nid=nid, ebook_format=ebook_format)
            print(title, url)

In [8]:
get_books('python.*data', 'mobi')

Python Data Science Essentials - Second Edition [eBook] https://www.packtpub.com/ebook_download/27146/mobi
Python Machine Learning Blueprints: Intuitive data projects you can relate to [eBook] https://www.packtpub.com/ebook_download/24221/mobi
Python Data Visualization Cookbook [eBook] https://www.packtpub.com/ebook_download/13532/mobi


In [9]:
get_books('Machine.*Learning', 'PDF')

Machine Learning for the Web [eBook] https://www.packtpub.com/ebook_download/24826/pdf
Python Machine Learning Blueprints: Intuitive data projects you can relate to [eBook] https://www.packtpub.com/ebook_download/24221/pdf
What You Need to Know about Machine Learning [eBook] https://www.packtpub.com/ebook_download/27683/pdf
Mastering Machine Learning with scikit-learn [eBook] https://www.packtpub.com/ebook_download/17805/pdf
Practical Machine Learning [eBook] https://www.packtpub.com/ebook_download/20987/pdf
Machine Learning with R - Second Edition [eBook] https://www.packtpub.com/ebook_download/21989/pdf
Machine Learning with Spark [eBook] https://www.packtpub.com/ebook_download/17399/pdf
Python Machine Learning [eBook] https://www.packtpub.com/ebook_download/17954/pdf
Building Machine Learning Systems with Python [eBook] https://www.packtpub.com/ebook_download/11703/pdf


#### Example 2. Auto-create a PyBites banner

Some time ago [I made a banner generator with Pillow and Flask](https://pybit.es/pillow-banner-flask.html). It is hosted [here](http://pybites-banners.herokuapp.com). 

Although this is nice, what if I want to make banners automatically? Let's try to do so using Selenium.

Let's break the task down into various steps:

1. Although the site can be used without logging in, authenticated users have their banners stored, so go straight to the login URL:
    ![go to site](images/banner1.png)

2. And log in:
    ![login](images/banner2.png)

3. We need to locate the form elements and provide the proper data, then click the submit button:
    ![provide data](images/banner3.png)

4. We need to download the output image it generates:
    ![get banner](images/banner4.png)

After the previous example this should be quite straightforward:

In [10]:
user = os.environ.get('PB_BANNER_USER')
pw = os.environ.get('PB_BANNER_PW')

class NoLogin(Exception):
    pass

if user is None or pw is None:
    raise NoLogin('Set PB_BANNER_USER and PB_BANNER_PW in your env')

In [11]:
login = 'https://pybites-banners.herokuapp.com/login'

driver = webdriver.Chrome()
driver.get(login)

driver.find_element_by_id('username').send_keys(user)
driver.find_element_by_id('password').send_keys(pw + Keys.RETURN)

In [12]:
from datetime import datetime

def get_title():
    """Creates a title to store banner as, e.g. newsYYYYWW
       (YYYY = year, WW = week number)"""
    now = datetime.now()
    year = now.year
    week = str(now.isocalendar()[1]).zfill(2)
    return f'news{year}{week}'

title = get_title()
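
One detail worth knowing: `isocalendar()` returns the ISO year as well as the ISO week, and around New Year the ISO year can differ from `now.year`. The variant below is a sketch (the name `get_iso_title` and the optional `when` parameter are additions for illustration) that takes both values from `isocalendar()`, so a date such as 30 December 2019, which falls in ISO week 1 of 2020, is not labelled as week 1 of 2019.

```python
from datetime import date


def get_iso_title(when=None):
    """Return newsYYYYWW using the ISO year and ISO week number."""
    when = when or date.today()
    iso_year, iso_week, _ = when.isocalendar()
    return f'news{iso_year}{iso_week:02d}'


print(get_iso_title(date(2018, 1, 1)))    # news201801
print(get_iso_title(date(2019, 12, 30)))  # news202001
```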

In [14]:
now = datetime.now()
year = now.year
week = str(now.isocalendar()[1]).zfill(2)
news_option = 'pybites-news'

bg_image = 'http://www.allwhitebackground.com/images/2/2210.jpg'
banner_text = f'from pybites import News -> Twitter Digest {year} Week {week}'

driver.find_element_by_id('name').send_keys(title)
driver.find_element_by_xpath(f'//select[@name="image_url1"]/option[text()="{news_option}"]').click()
driver.find_element_by_id('text').send_keys(banner_text)
driver.find_element_by_id('image_url2').send_keys(bg_image + Keys.RETURN)

And the result:

![resulting banner](images/banner5.png)
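
The XPath used above clicks the matching `<option>` directly. Selenium also ships a `Select` helper for dropdowns, which some find easier to read. A sketch, assuming the same `image_url1` select element; `choose_option` is a hypothetical helper name.

```python
def choose_option(driver, select_name, visible_text):
    """Pick the option with the given visible text in a <select> element."""
    # Imported inside the function so defining it does not require Selenium
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import Select

    dropdown = Select(driver.find_element(By.NAME, select_name))
    dropdown.select_by_visible_text(visible_text)


# Usage with the driver from the cells above:
# choose_option(driver, 'image_url1', 'pybites-news')
```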

In [15]:
driver.close()

### Second + third day: practice time!

Now it's your turn. The goal is to get your hands dirty with Python Selenium.

#### Testing with Selenium

I deliberately left testing out, because we have a nice code challenge for you to practice with:

- First check out the docs: [Using Selenium to write tests](http://selenium-python.readthedocs.io/getting-started.html#using-selenium-to-write-tests)
- Then head over to [Code Challenge 32 - Test a Simple Django App With Selenium](https://codechalleng.es/challenges/32/) and try to automate testing of [PyBites' first ever Django app](http://pyplanet.herokuapp.com).

#### Scratch your own itch

Although the testing option is our favorite, you are free to choose your own project.

In this notebook I gave you two examples of automated tasks. Maybe you want to try them yourself and build them out?

Or what about using Selenium on your favorite website or service? For example, log in to Facebook, Twitter or Reddit and read or post content.

There are many possibilities. Again, having you code up a script using what you just learned will make all the difference in what you get out of this lesson. 

Have fun and remember: _Keep calm and code in Python!_

### Time to share what you've accomplished!

Be sure to share your last couple of days' work on Twitter or Facebook. Use the hashtag **#100DaysOfCode**.

Here are [some examples](https://twitter.com/search?q=%23100DaysOfCode) to inspire you. Consider including [@talkpython](https://twitter.com/talkpython) and [@pybites](https://twitter.com/pybites) in your tweets.

*See a mistake in these instructions? Please [submit a new issue](https://github.com/talkpython/100daysofcode-with-python-course/issues) or fix it and [submit a PR](https://github.com/talkpython/100daysofcode-with-python-course/pulls).*