# Selenium

**Selenium** is a powerful tool for automating web browsers. It is widely used for testing web applications, but it can also be used for web scraping, automating repetitive tasks, and interacting with web pages in a way that mimics human behavior. Selenium supports multiple programming languages, including Python, Java, C#, and Ruby.

With Selenium you can:

- Automate tasks such as clicking buttons, filling out forms, navigating pages, and more.
- Drive multiple browsers, including Chrome, Firefox, Safari, and Edge.
- Scrape web pages (it is especially good for dynamic, JavaScript-rendered content).
- Test your application (with pytest, unittest, ...).
- Run a browser in "headless" mode (without a graphical user interface).

## Installation

[Install Selenium](https://selenium-python.readthedocs.io/installation.html)

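## WebDriver manager (optional)

Many people use a second library, `webdriver_manager`, which downloads and configures the browser driver (for example chromedriver) that Selenium needs.
Since version 4.6, Selenium ships with its own Selenium Manager, which resolves drivers automatically, so this extra library is optional.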

[Install webdriver_manager](https://pypi.org/project/webdriver-manager/)

## XPATH
XPath (XML Path Language) is a query language for selecting nodes from an XML document by navigating through its elements and attributes. In the context of web automation with Selenium, XPath is used to locate elements within the HTML structure of a web page.

### Basic Syntax

- **Absolute XPath**: Starts with a single slash (`/`) and represents the full path from the root element.
  ```xml
  /html/body/div/input
  ```

- **Relative XPath**: Starts with a double slash (`//`) and represents a search for the element anywhere in the document.
  ```xml
  //input[@name='username']
  ```

### Common XPath Expressions

1. **Selecting Elements by Tag Name**:
   ```xml
   //input
   ```
   Selects all `<input>` elements in the document.

2. **Selecting Elements by Attribute**:
   ```xml
   //input[@name='username']
   ```
   Selects every `<input>` element whose `name` attribute equals `username`.

3. **Selecting Elements by Text Content**:
   ```xml
   //button[text()='Submit']
   ```
   Selects every `<button>` element whose text is exactly `Submit`.

4. **Selecting Elements by Partial Attribute Value**:
   ```xml
   //input[contains(@name, 'user')]
   ```
   Selects every `<input>` element whose `name` attribute contains the substring `user`.

5. **Selecting Elements by Position**:
   ```xml
   //div[position()=1]
   ```
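   Selects every `<div>` that is the first `<div>` child of its parent (XPath positions start at 1). To select only the first `<div>` in the whole document, use `(//div)[1]`.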
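
To experiment with these expressions without opening a browser, one option is Python's built-in `xml.etree.ElementTree`. This is only a sketch: ElementTree is not Selenium and supports a limited XPath subset (paths start with `.//`, and `text()` and `contains()` are not available, so `[.='Submit']` stands in for `[text()='Submit']`). The `PAGE` snippet and its ids are made up for illustration.

```python
import xml.etree.ElementTree as ET

# A tiny, well-formed page invented for this example.
PAGE = """
<html>
  <body>
    <div id="login">
      <input name="username"/>
      <input name="password"/>
      <button>Submit</button>
    </div>
    <div id="footer">
      <div id="links"/>
    </div>
  </body>
</html>
""".strip()

root = ET.fromstring(PAGE)

# By tag name: every <input> below the root.
all_inputs = root.findall(".//input")

# By attribute: the <input> whose name is "username".
username = root.find(".//input[@name='username']")

# By text content (ElementTree spelling of text()='Submit').
submit = root.find(".//button[.='Submit']")

# By position: every <div> that is the first <div> child of its parent.
first_divs = [div.get("id") for div in root.findall(".//div[1]")]

print(len(all_inputs), username.get("name"), submit.tag, first_divs)
```

Note that the position query returns two elements here, `login` and `links`, because each of them is the first `<div>` inside its own parent.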
   
### Getting the XPATH of a tag with a browser

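In the browser's developer tools (right-click the page and choose "Inspect"), right-click a tag in the elements panel, then choose "Copy" and "Copy XPath". The copied expression often depends on the exact page layout, so it may break when the page changes.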

## First example: searching Bing News

### The ```.find_element()``` and ```.find_elements()``` methods

The `find_element` and `find_elements` methods in Selenium are used to locate elements on a web page. `find_element` returns the first element that matches the locator and raises a `NoSuchElementException` if nothing matches. `find_elements` returns a list of all matching elements, which is empty if nothing matches.
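
Because `find_elements` returns an empty list rather than raising, it can be used to check whether an element exists. The helper below is a minimal sketch of that idea; `first_or_none` is a hypothetical name, not part of Selenium, and it only assumes that `driver` has a `find_elements(by, value)` method.

```python
def first_or_none(driver, by, value):
    """Return the first element matching the locator, or None if there is none."""
    matches = driver.find_elements(by, value)  # empty list when nothing matches
    return matches[0] if matches else None


# Hypothetical usage with a real browser:
#   from selenium import webdriver
#   from selenium.webdriver.common.by import By
#
#   browser = webdriver.Chrome()
#   browser.get("https://www.bing.com/news")
#   field = first_or_none(browser, By.NAME, "q")
#   if field is None:
#       print("search field not found")
```

This avoids wrapping `find_element` in a `try`/`except` block when a missing element is an expected situation.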

### The ```By``` class

The `By` class in Selenium is a utility class that provides a set of predefined locator strategies for finding elements on a web page. It is part of the `selenium.webdriver.common.by` module and is used to specify the type of locator (e.g., ID, name, class name, XPath, etc.) when searching for elements.

The most common locators are:

1. **By.ID**: Locates an element by its `id` attribute.
   ```python
   element = driver.find_element(By.ID, "element_id")
   ```

2. **By.NAME**: Locates an element by its `name` attribute.
   ```python
   element = driver.find_element(By.NAME, "element_name")
   ```

3. **By.CLASS_NAME**: Locates an element by one of its class names (a single class, not a space-separated list).
   ```python
   element = driver.find_element(By.CLASS_NAME, "element_class")
   ```

4. **By.TAG_NAME**: Locates an element by its tag name.
   ```python
   element = driver.find_element(By.TAG_NAME, "tag_name")
   ```

5. **By.XPATH**: Locates an element using an XPath expression.
   ```python
   element = driver.find_element(By.XPATH, "//xpath/expression")
   ```

6. **By.CSS_SELECTOR**: Locates an element using a CSS selector.
   ```python
   element = driver.find_element(By.CSS_SELECTOR, "css_selector")
   ```
   
### Basic example

Let's use Selenium to open Bing News and run a search for us.

In [None]:
from selenium import webdriver
from selenium.webdriver.common.by import By

URL = "https://www.bing.com/news"

browser = webdriver.Chrome()
browser.get(URL)

query_field = browser.find_element(By.NAME, "q") # You can find the element using the attribute
#query_field = browser.find_element(By.XPATH, "//*[@id='sb_form_q']") # or using the XPATH

query_field.send_keys("Kamala Harris") # Let's type something

search_button = browser.find_element(By.XPATH, "//input[@id='sb_form_go']") # Let's find the button
search_button.click()

### Headless and screenshot example

In [None]:
from selenium import webdriver

from PIL import Image # To display the image
from io import BytesIO # To deal with binary data

URL = "https://www.bing.com/news"

# Headless options
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument("--headless")

# To reset the options without restarting the kernel
# from selenium.webdriver.chrome.options import Options
# chrome_options = Options()

browser = webdriver.Chrome(options=chrome_options)

browser.get(URL)

query_field = browser.find_element("name", "q") # The plain string "name" is the value of By.NAME
#query_field = browser.find_element("xpath", "//*[@id='sb_form_q']") # or XPATH

query_field.send_keys("Kamala Harris") # Let's type something

search_button = browser.find_element("xpath", "//input[@id='sb_form_go']") # Let's find the button
search_button.click()

img = Image.open(BytesIO(browser.get_screenshot_as_png())) # take a screenshot

browser.quit() # Always quit a headless browser explicitly: it has no window to close.

img
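
A headless browser that is never quit keeps running invisibly in the background, for instance when the code fails before reaching `browser.quit()`. One way to guard against this is a small context manager, sketched below. `managed_browser` and `make_browser` are hypothetical names introduced for this example, not part of Selenium.

```python
from contextlib import contextmanager


@contextmanager
def managed_browser(make_browser):
    """Create a browser with make_browser() and always quit it afterwards."""
    browser = make_browser()
    try:
        yield browser
    finally:
        browser.quit()  # runs even if the body of the `with` block raised


# Hypothetical usage, reusing chrome_options from the cell above:
#   with managed_browser(lambda: webdriver.Chrome(options=chrome_options)) as browser:
#       browser.get("https://www.bing.com/news")
#       png = browser.get_screenshot_as_png()
```

The browser is created inside the helper so that every successfully started browser is guaranteed a matching `quit()`.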

## Exercise

❓ **>>>** Using Selenium (and other libraries) and a headless browser: 
1. Go to the website https://reporterre.net/.
1. Find the main article.
1. Click on it to read the article.
1. Display the author name.
1. Take a screenshot.

In [None]:
# Code here!


## Exercise

❓ **>>>** Using Selenium (and other libraries) and a headless browser: 
1. Go to the website https://www.saucedemo.com which has been specially designed to be navigated through libraries such as Selenium.
1. Find a way to enter the website.
1. Click every "Add to cart" button on the page.
1. Go to the cart page and take a screenshot to check that all the products were added.

In [None]:
# Code here


## Sleep, WebDriverWait, EC, Keys

When dealing with real websites, we often need more tools.

### Sleep

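The `sleep` function from the `time` library pauses the script for a fixed number of seconds, which gives the page time to load. It is simple but crude: it always waits the full duration, and it does not guarantee that the page is actually ready.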

```python
from time import sleep
sleep(5)
```

### WebDriverWait()

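This creates an instance of the `WebDriverWait` class, which polls the browser until a specific condition is met, for at most the given timeout (here 5 seconds).

```python
from selenium.webdriver.support.ui import WebDriverWait
wait = WebDriverWait(browser, 5)  # wait at most 5 seconds
```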

### .until()

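The `until` method of a `WebDriverWait` object repeatedly evaluates a condition until it returns a truthy value, then returns that value (here, the element). If the timeout expires first, it raises a `TimeoutException`.

```python
element = WebDriverWait(browser, 5).until(EC.visibility_of_element_located((By.XPATH, "//input[@name='username']")))
```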

### EC (expected condition)

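```EC.visibility_of_element_located``` is an expected condition from the `selenium.webdriver.support.expected_conditions` module, conventionally imported as `EC`. It takes a locator tuple such as `(By.XPATH, "...")` and is satisfied once the element is present in the DOM and visible on the page.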
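
To see why an explicit wait is usually preferable to a fixed `sleep`, here is a simplified sketch of the polling idea in plain Python. It is not Selenium's actual implementation, and `wait_until`, `timeout`, and `poll_interval` are hypothetical names chosen for this example.

```python
import time


def wait_until(condition, timeout=5.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds have passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result  # stop as soon as the condition holds
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll_interval)  # short pause before checking again


# Example: a condition that only becomes true on the third check.
attempts = []

def page_ready():
    attempts.append(1)
    return "ready" if len(attempts) >= 3 else None

status = wait_until(page_ready, timeout=2.0, poll_interval=0.01)
print(status, len(attempts))
```

Unlike `sleep(5)`, the loop returns as soon as the condition is satisfied and fails loudly when it never is.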

### Keys

The `Keys` class (from `selenium.webdriver.common.keys`) allows us to simulate special keyboard keys. For instance, `query_field.send_keys(Keys.ENTER)` tells the browser to press Enter.

## Cookies consent

All the previous methods and tips should allow you to deal with cookie consent pop-ups, at least to a certain extent.

In [None]:
from selenium import webdriver
from selenium.webdriver.common.by import By

from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

URL = "https://www.bing.com/news"

browser = webdriver.Chrome()
browser.get(URL)

query_field = browser.find_element(By.NAME, "q")
query_field.send_keys("Kamala Harris") # Let's type something

WebDriverWait(browser, 5).until(EC.visibility_of_element_located((By.XPATH, "//input[@id='sb_form_go']"))).click()

## Exercise

❓ **>>>** Using Selenium (and other libraries) and a headless browser: 
1. Go to the website https://www.google.com
1. Find a way to refuse the cookies.
1. Search for "Guido Van Rossum".
1. Take a screenshot.

In [None]:
# Code here!
