---
title: "Advanced Usage"
pagetitle: "Selenium Wire"
description-meta: "Introduction, case studies, and exercises for automating browsers."
description-title: "Introduction, case studies, and exercises for automating browsers."
author: "Piotr Sapiezynski and Leon Yin"
author-meta: "Piotr Sapiezynski and Leon Yin"
date: "06-11-2023"
date-modified: "06-17-2023"
execute: 
  enabled: false
keywords: data collection, web scraping, browser automation, algorithm audits, personalization
twitter-card:
  title: Browser Automation
  description: Introduction, case studies, and exercises for automating browsers.
  image: assets/inspect-element-logo.jpg
open-graph:
  title: Browser Automation
  description: Introduction, case studies, and exercises for automating browsers.
  locale: en_US
  site-name: Inspect Element
  image: assets/inspect-element-logo.jpg
href: selenium_wire
---

This section will walk through advanced use cases you might run into when using browser automation.

1. Intercepting network requests (API calls) while browsing
2. Decoding compressed responses
3. Replaying and modifying intercepted requests

### Requirements.txt

Here are the Python packages we'll use to intercept traffic in Selenium.

- `selenium-wire` offers the same functionality as `selenium`, with the added bonus of being able to intercept network traffic (such as API requests).
- `brotlipy` decodes Brotli-compressed responses from servers, i.e., responses whose body looks like random characters.
- `chromedriver-binary-auto` installs a ChromeDriver that matches your version of Chrome, so Selenium can find the web driver.

We'll also upgrade `requests`, which comes preinstalled in many Python environments, because older versions of the package may not work properly.

```python
# !pip install selenium-wire requests chromedriver-binary-auto brotlipy
```

```python
# !pip install requests --upgrade
```

## Intercepting Network Requests in Selenium

Selenium Wire can be used anywhere you would use Selenium. All we need to do is change the import from `selenium` to `seleniumwire`.
Notice that we continue to use `chromedriver_binary` to make our lives easier.

```python
from seleniumwire import webdriver  # drop-in replacement for selenium's webdriver
import chromedriver_binary  # adds the matching ChromeDriver to PATH

driver = webdriver.Chrome()
```

Ideally, a blank Chrome window should appear without any error messages. Some M1-series MacBooks run into issues downloading Selenium; here's [a potential fix](https://stackoverflow.com/a/74651536/18264897) for that issue.

### Visiting a website and triggering requests

To demonstrate how to intercept network requests in Selenium, we'll trigger DuckDuckGo's autocomplete in the browser and capture the network request (an undocumented API) that the page sends to DuckDuckGo's servers in the background.

```python
# open the DuckDuckGo website in our automated browser
driver.get('https://duckduckgo.com')
```

Now, manually type "why are" in the search box to trigger the autocomplete function.

Bonus: do this programmatically.
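One possible approach to the bonus, as a sketch: it assumes the search box can be located by `name="q"` (check with Inspect Element, since DuckDuckGo's markup may change), and the helper name `type_like_a_person` is our own. Typing one character at a time with a short pause mimics a person and gives the page a chance to send an autocomplete request after each keystroke.

```python
import time


def type_like_a_person(driver, text, delay=0.2):
    """Type `text` into DuckDuckGo's search box one character at a time.

    Assumption: the search box is an <input> with name="q"; adjust the
    locator if DuckDuckGo changes its markup.
    """
    # "name" is the value of selenium's By.NAME locator strategy
    search_box = driver.find_element("name", "q")
    for char in text:
        search_box.send_keys(char)  # each keystroke can trigger an autocomplete request
        time.sleep(delay)  # give the page time to send that request
    return search_box
```

With the browser open on DuckDuckGo, call `type_like_a_person(driver, "why are")`.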

You'll notice this is nearly identical to our [finding undocumented APIs](/apis.html) tutorial.

Rather than finding the network requests in the `DevTools`, we can view them programmatically here, thanks to the `requests` attribute that Selenium Wire adds to the `driver`.
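The raw `Request` reprs are long, because they include every header. A minimal sketch of a hypothetical helper that prints one line per request instead, relying only on the `.method`, `.url`, and `.response` attributes of selenium-wire requests (`.response` stays `None` until a response arrives):

```python
def summarize_requests(reqs):
    """Return one short line per intercepted request: method, status, URL."""
    lines = []
    for r in reqs:
        # "-" marks requests that have not received a response (yet)
        status = r.response.status_code if r.response is not None else "-"
        lines.append(f"{r.method} {status} {r.url}")
    return lines
```

For example: `for line in summarize_requests(driver.requests[-3:]): print(line)`.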

```python
# the three most recent requests the browser made
driver.requests[-3:]
```

```
[Request(method='GET', url='https://duckduckgo.com/ac/?q=why+a&kl=wt-wt', headers=[('sec-ch-ua', '"Google Chrome";v="117", "Not;A=Brand";v="8", "Chromium";v="117"'), ('sec-ch-ua-mobile', '?0'), ('user-agent', 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36'), ('sec-ch-ua-platform', '"macOS"'), ('accept', '*/*'), ('sec-fetch-site', 'same-origin'), ('sec-fetch-mode', 'cors'), ('sec-fetch-dest', 'empty'), ('referer', 'https://duckduckgo.com/'), ('accept-encoding', 'gzip, deflate, br'), ('accept-language', 'en-US,en;q=0.9')], body=b''),
 Request(method='GET', url='https://duckduckgo.com/ac/?q=why+are&kl=wt-wt', headers=[('sec-ch-ua', '"Google Chrome";v="117", "Not;A=Brand";v="8", "Chromium";v="117"'), ('sec-ch-ua-mobile', '?0'), ('user-agent', 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36'), ('sec-ch-ua-platform', '"macOS"'), ('accept', '*/*'), ('s
```

Above we list the three most recent network requests, and find that the URL `https://duckduckgo.com/ac/?q=why+are&kl=wt-wt` looks like the undocumented API for autocomplete.

As you'll soon notice if you repeat this step: requests are being made all the time!

```python
# You can save a list of them like so
# (a snapshot of everything captured so far):
saved_requests = driver.requests
```
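To get a sense of how much traffic a single page generates, one option is to count the saved requests by host. This is a sketch with a hypothetical helper name, using only the `.url` attribute of each request plus the standard library's `urlsplit` and `Counter`:

```python
from collections import Counter
from urllib.parse import urlsplit


def requests_per_host(reqs):
    """Count intercepted requests by host name, most frequent first."""
    return Counter(urlsplit(r.url).hostname for r in reqs).most_common()
```

Calling `requests_per_host(saved_requests)` returns a list of `(host, count)` pairs.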

You can filter the requests using a [list comprehension](https://www.w3schools.com/python/python_lists_comprehension.asp) (or any other way of sifting through a list).
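A variation worth considering: since the browser may still be waiting on the newest request, its `.response` can be `None` when you grab it. A minimal sketch (the function name is our own) that filters by URL substring and, by default, keeps only requests that already have a response. If you instead need to block until a matching request appears, selenium-wire's `driver.wait_for_request()` is worth a look.

```python
def find_requests(reqs, url_part, with_response=True):
    """Return intercepted requests whose URL contains `url_part`.

    With with_response=True, requests whose `.response` is still None
    are skipped, so reading `.response` on the results is safe.
    """
    return [
        r for r in reqs
        if url_part in r.url and (r.response is not None or not with_response)
    ]
```

For example: `find_requests(saved_requests, 'duckduckgo.com/ac/')[-1].response`.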

```python
look_for = 'duckduckgo.com/ac/'
```

```python
found_requests = [r for r in saved_requests if look_for in r.url]
found_requests
```

```
[Request(method='GET', url='https://duckduckgo.com/ac/?q=why&kl=wt-wt', headers=[('sec-ch-ua', '"Google Chrome";v="117", "Not;A=Brand";v="8", "Chromium";v="117"'), ('sec-ch-ua-mobile', '?0'), ('user-agent', 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36'), ('sec-ch-ua-platform', '"macOS"'), ('accept', '*/*'), ('sec-fetch-site', 'same-origin'), ('sec-fetch-mode', 'cors'), ('sec-fetch-dest', 'empty'), ('referer', 'https://duckduckgo.com/'), ('accept-encoding', 'gzip, deflate, br'), ('accept-language', 'en-US,en;q=0.9')], body=b''),
 Request(method='GET', url='https://duckduckgo.com/ac/?q=why+a&kl=wt-wt', headers=[('sec-ch-ua', '"Google Chrome";v="117", "Not;A=Brand";v="8", "Chromium";v="117"'), ('sec-ch-ua-mobile', '?0'), ('user-agent', 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36'), ('sec-ch-ua-platform', '"macOS"'), ('accept', '*/*'), ('sec-f
```

```python
last_request = found_requests[-1]
response = last_request.response
```

### Decoding compressed responses
Each intercepted request stores its response in a `response` attribute (which is `None` if no response has arrived). Let's look at the response to the most recent request:

```python
response
```

```
Response(status_code=200, reason='', headers=[('server', 'nginx'), ('date', 'Sun, 08 Oct 2023 19:00:13 GMT'), ('content-type', 'application/javascript; charset=UTF-8'), ('vary', 'Accept-Encoding'), ('strict-transport-security', 'max-age=31536000'), ('permissions-policy', 'interest-cohort=()'), ('content-security-policy', "default-src 'none' ; connect-src  https://duckduckgo.com https://*.duckduckgo.com https://duckduckgogg42xjoc72x3sjasowoarfbgcmvfimaftt6twagswzczad.onion/ ; manifest-src  https://duckduckgo.com https://*.duckduckgo.com https://duckduckgogg42xjoc72x3sjasowoarfbgcmvfimaftt6twagswzczad.onion/ ; media-src  https://duckduckgo.com https://*.duckduckgo.com https://duckduckgogg42xjoc72x3sjasowoarfbgcmvfimaftt6twagswzczad.onion/ ; script-src blob:  https://duckduckgo.com https://*.duckduckgo.com https://duckduckgogg42xjoc72x3sjasowoarfbgcmvfimaftt6twagswzczad.onion/ 'unsafe-inline' 'unsafe-eval' ; font-src data:  https://duckduckgo.com https://*.duckduckgo.com https://duckduckg
```

We were expecting a list of suggestions but got this instead:

```python
response.body
```

```
b'\x15B\x01\x00\xc4\xca\xb9\x94\xdd\x89\xaeP\xaf,_\x93\x0bPa\xf3@np\xc0\x1e\xf8\xf2\xb8\x8d\xe3\x9c\xe8d;\x0e8\xd8\xa0\x1b{h\xb6!zn\xf6\xb5\xe3\xd7\xacq\x17\xaa9\x98g|Cv\xc5.o\xcc\xd2\t.\xd0\xd7\xaa\x9d\xe9\xa1\xe6?\xe1m\r\x01\x85%A\xc8\xc0\xfa\x1b\xd0\x84\x03cq\xf2\xae3\xac\x0c\x82VvuX<\xec\xdd\xde\xdb\xac;\xe5A\x08;\x84x\xbc\xc3\x8d\x91[\xeaa!n\x86\xe0\xb4\xbd\x11\xf3\x11\xf0\xb9\xd4\xaa\xbd\x99\xa5]\x83]\x88]\x9d\xf9\x9f\x01'
```

The response body looks like this because it's compressed. To read it, we first have to decompress it.

We'll use `brotli` to decompress the body, and then `.decode('utf-8')` to turn the resulting bytes into a string.
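Brotli is only one of the encodings a server may pick; the `content-encoding` response header says which one was used. As a sketch (the helper name `decode_body` is our own, and stacked encodings such as `gzip, br` are deliberately not handled), one could dispatch on that header. selenium-wire also ships a similar `decode` helper in `seleniumwire.utils`; check the documentation for your installed version.

```python
import gzip
import zlib


def decode_body(body, content_encoding="identity"):
    """Decompress a raw HTTP response body according to its Content-Encoding.

    Handles br, gzip, deflate, and uncompressed bodies only.
    """
    encoding = (content_encoding or "identity").strip().lower()
    if encoding == "br":
        import brotli  # provided by brotlipy (or the Brotli package)
        return brotli.decompress(body)
    if encoding == "gzip":
        return gzip.decompress(body)
    if encoding == "deflate":
        # "deflate" in HTTP normally means zlib-wrapped data
        return zlib.decompress(body)
    if encoding == "identity":
        return body
    raise ValueError(f"unsupported content-encoding: {content_encoding}")
```

With a selenium-wire response: `decode_body(response.body, response.headers.get('content-encoding', 'identity')).decode('utf-8')`.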

```python
import brotli  # installed by the brotlipy package

decompressed = brotli.decompress(response.body).decode('utf-8')
decompressed
```

```
'[{"phrase":"why are ray bans so expensive"},{"phrase":"why are eggs so expensive"},{"phrase":"why are flags at half mast today"},{"phrase":"why are cats scared of cucumbers"},{"phrase":"why are gas prices rising"},{"phrase":"why are flamingos pink"},{"phrase":"why are my feet swollen"},{"phrase":"why are firetrucks red"}]'
```

That's much more like it! We can turn it into a Python object using the `json` package:

```python
import json
resp_json = json.loads(decompressed)
resp_json
```

```
[{'phrase': 'why are ray bans so expensive'},
 {'phrase': 'why are eggs so expensive'},
 {'phrase': 'why are flags at half mast today'},
 {'phrase': 'why are cats scared of cucumbers'},
 {'phrase': 'why are gas prices rising'},
 {'phrase': 'why are flamingos pink'},
 {'phrase': 'why are my feet swollen'},
 {'phrase': 'why are firetrucks red'}]
```
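Each suggestion is wrapped in a `{"phrase": ...}` dictionary. If all you need is the suggested text, a small sketch (hypothetical helper name, assuming the body keeps the list-of-`phrase` shape seen above) flattens it into a list of strings:

```python
import json


def suggestions_from_body(text):
    """Turn DuckDuckGo's autocomplete JSON into a plain list of strings.

    Assumes a JSON list of {"phrase": ...} objects, as in the response above.
    """
    return [item["phrase"] for item in json.loads(text)]
```

For example, `suggestions_from_body(decompressed)` gives `['why are ray bans so expensive', 'why are eggs so expensive', ...]`.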

### Replaying requests

We can also replay any of the intercepted requests using the `requests` package. Let's stick to the most recent autocomplete request:

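```python
import requests

# Replay the most recent autocomplete request we intercepted.
# (driver.last_request also works, but only if the browser
# hasn't made any other request since.)
request = found_requests[-1]
# requests fetches the URL and decompresses the response for us
response = requests.get(request.url)
response.json()
```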

```
[{'phrase': 'why are flags at half mast today'},
 {'phrase': 'why are cats afraid of cucumber'},
 {'phrase': 'why are eggs so expensive'},
 {'phrase': 'why are gas prices rising'},
 {'phrase': 'why are flamingos pink'},
 {'phrase': 'why are you interested in this position'},
 {'phrase': 'why are my balls so itchy'},
 {'phrase': 'why are you running'}]
```
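Note that `requests.get(request.url)` sends the URL with the `requests` library's own default headers, not the ones the browser used, and some servers may respond differently without them. A sketch of copying the browser's headers, under the assumption that the short skip list below is enough for simple GET replays (the names `SKIP_HEADERS` and `browser_headers` are hypothetical). It drops `accept-encoding` on purpose: the browser advertises `br`, and if the Brotli decoder isn't available to `requests`, the body would stay compressed.

```python
# Headers we let `requests` set itself instead of copying from the browser.
# Assumption: this short list is enough for simple GET replays.
SKIP_HEADERS = {"accept-encoding", "content-length", "host", "connection"}


def browser_headers(header_items):
    """Build a headers dict for `requests` from captured (name, value) pairs."""
    headers = {}
    for name, value in header_items:
        if name.lower() not in SKIP_HEADERS:
            headers[name] = value
    return headers
```

Usage: `requests.get(request.url, headers=browser_headers(request.headers.items()))`.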

We can also modify any of the parameters of the Request before sending it off. Let's change the query from "why are" to "why is". The query is expressed as a parameter `q` in the URL:

[https://duckduckgo.com/ac/?q=why+are&kl=wt-wt](https://duckduckgo.com/ac/?q=why+are&kl=wt-wt)
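Under the hood, `q` and `kl` are ordinary query-string parameters, with spaces encoded as `+`. As a sketch using only Python's standard library (`urllib.parse`), here is a hypothetical helper, `with_query`, that rewrites them; it can be handy if you no longer have the selenium-wire `Request` object, for example after closing the browser.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_query(url, **changes):
    """Return `url` with some query parameters replaced or added.

    parse_qsl decodes "why+are" to "why are"; urlencode turns spaces
    back into "+". Repeated keys collapse to their last value.
    """
    parts = urlsplit(url)
    params = dict(parse_qsl(parts.query, keep_blank_values=True))
    params.update(changes)
    return urlunsplit(parts._replace(query=urlencode(params)))
```

For example, `with_query('https://duckduckgo.com/ac/?q=why+are&kl=wt-wt', q='why is')` returns `'https://duckduckgo.com/ac/?q=why+is&kl=wt-wt'`.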

For convenience, we can modify the URL parameters by changing the `request.params` dictionary:

```python
params = request.params
params
```

```
{'q': 'why are', 'kl': 'wt-wt'}
```

```python
params['q'] = 'why is'
# assigning the dict back rewrites the query string of request.url
request.params = params
request.url
```

```
'https://duckduckgo.com/ac/?q=why+is&kl=wt-wt'
```

```python
# Alternatively, edit the URL string directly:
# request.url = request.url.replace('why+are', 'why+is')
response = requests.get(request.url)
response.json()
```

```
[{'phrase': 'why is the sky blue'},
 {'phrase': 'why is my poop green'},
 {'phrase': 'why is the sun red'},
 {'phrase': 'why is the air quality bad today'},
 {'phrase': 'why is ronaldo benched'},
 {'phrase': 'why is it important'},
 {'phrase': 'why is roblox down'},
 {'phrase': 'why is gail off cbs'}]
```
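Since the endpoint only needs a query string, the same replay can be repeated for many prefixes, which is a natural starting point for auditing autocomplete suggestions. A sketch under assumptions: `autocomplete_url` and `collect_suggestions` are hypothetical helpers, the `kl=wt-wt` value is copied from the captured request, and `fetch` is passed in (for example `lambda url: requests.get(url).json()`) so the loop can be tried without network access. The default one-second pause between calls is an arbitrary politeness choice, not a documented rate limit.

```python
import time
from urllib.parse import urlencode

AUTOCOMPLETE_URL = "https://duckduckgo.com/ac/"  # the endpoint found above


def autocomplete_url(query, region="wt-wt"):
    """Build the autocomplete URL for `query` (kl=wt-wt as in the captured request)."""
    return AUTOCOMPLETE_URL + "?" + urlencode({"q": query, "kl": region})


def collect_suggestions(queries, fetch, delay=1.0):
    """Map each query to its list of suggested phrases.

    `fetch` takes a URL and returns the parsed JSON list of {"phrase": ...}.
    """
    results = {}
    for query in queries:
        data = fetch(autocomplete_url(query))
        results[query] = [item["phrase"] for item in data]
        time.sleep(delay)  # pause between calls to avoid hammering the server
    return results
```

Usage: `collect_suggestions(["why is", "why are"], fetch=lambda url: requests.get(url).json())`.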