# Browser Automation Lecture Notes
Date: 2025-07-09

We'll use this notebook to work through some examples and showcase some essential functions in Playwright.

In [1]:
!pip install ipykernel==6.28.0



In [1]:
import os
import random
import time

from playwright.async_api import async_playwright, expect

In [3]:
!pip install playwright



In [4]:
!which playwright

/Users/lyin72/miniconda3/bin/playwright


In [5]:
!playwright install

In [2]:
os.makedirs('data/', exist_ok=True)

In [3]:
# Start the browser
playwright = await async_playwright().start()

In [4]:
browser = await playwright.chromium.launch(headless=False)
page = await browser.new_page()
await browser.close()

In [7]:
async def open_browser(headless=False):
    """
    Starts the automated browser and opens a new window
    """
    # Start playwright
    playwright = await async_playwright().start()

    # Open chromium (chrome) browser, can use firefox or others
    browser = await playwright.chromium.launch(headless=headless)
  
    # Create a new browser window
    page = await browser.new_page()

    return browser, page

In [8]:
driver, page = await open_browser()

In [12]:
# visit a URL
url = 'https://amazon.com'
await page.goto(url)

<Response url='https://www.amazon.com/' request=<Request url='https://www.amazon.com/' method='GET'>>

## XPATH
You learned about BeautifulSoup for working with HTML...

XPath is another way to navigate the hierarchical structure of HTML and SVG documents.

It's fast and versatile, and it works in the browser's developer console as well as in most programming languages.

## Using XPATH
You can test XPaths in the developer tools under the `console` tab using the function `$x()`. Read about that function [here](https://developer.chrome.com/docs/devtools/console/utilities/#xpath-function).


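You can use Playwright's `locator` [function](https://playwright.dev/python/docs/locators#locate-by-css-or-xpath) to find the search box on the Amazon site using an XPath or CSS selector. A locator can match more than one element: use `.first` to pick the first match, or `.all()` to get every match as a list. Actions such as `fill` or `click` raise an error if the locator matches more than one element.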

Playwright offers other [locators](https://playwright.dev/python/docs/locators#quick-guide), which its developers recommend (but I don't).

Example of an xpath!

//input[@aria-label="Search Amazon"]
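To see what an XPath like this selects without opening a browser, here's a minimal sketch using Python's standard-library `xml.etree.ElementTree`, which supports only a small subset of XPath. The HTML snippet is made up for illustration and is not Amazon's real markup.

```python
import xml.etree.ElementTree as ET

# A made-up, well-formed snippet standing in for a real page (assumption).
snippet = """
<html>
  <body>
    <form>
      <input id="search" aria-label="Search Amazon" type="text" />
      <input id="go" aria-label="Go" type="submit" />
    </form>
  </body>
</html>
"""

root = ET.fromstring(snippet)

# ElementTree wants a leading "." to search anywhere below the root.
# The predicate [@attr="value"] filters on an attribute, as in the XPath above.
matches = root.findall('.//input[@aria-label="Search Amazon"]')

for node in matches:
    print(node.tag, node.attrib["id"])
```

Only the first `input` matches, because the predicate compares the whole `aria-label` value.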

### Finding elements and performing actions
Browser automation is largely about locating elements on the page and interacting with them in some way.
This can involve filling in forms, simulating key presses, and clicking things.

In [13]:
# here's how you can do that with xpath
search_bar = page.locator(
    '//input[@aria-label="Search Amazon"]'
)

You can also use Playwright's built-in `get_by_placeholder` locator, which matches on the placeholder text:
```
page.get_by_placeholder("Search Amazon")
```

In [14]:
search_bar

<Locator frame=<Frame name= url='https://www.amazon.com/'> selector='//input[@aria-label="Search Amazon"]'>

Let's fill in the search bar.

In [15]:
await search_bar.fill('TEST')

In [16]:
search_term = 'womens shirt'
await search_bar.fill(search_term)

You can submit the search either by pressing the "Enter" key or by finding the search button and clicking it.

Here's how you can press a [keyboard](https://playwright.dev/docs/api/class-keyboard) button.

In [17]:
await page.keyboard.press("Enter")

Alternatively, you can locate the button and perform an [action](https://playwright.dev/python/docs/input#mouse-click) such as a mouse `click`.

In [18]:
search_button = page.locator(
    '//input[@id="nav-search-submit-button"]'
)
await search_button.click()
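The fill-then-submit steps above can be wrapped in one helper. This is a sketch, not the only way to do it: `run_search` and its `box_xpath` parameter are names I made up, and it presses Enter on the search box locator instead of going through `page.keyboard`.

```python
async def run_search(page, term, box_xpath='//input[@aria-label="Search Amazon"]'):
    """Fill the search box with `term` and submit it with the Enter key."""
    search_box = page.locator(box_xpath)
    await search_box.fill(term)
    # Pressing Enter on the locator sends the key to that element,
    # rather than to whatever currently has keyboard focus.
    await search_box.press("Enter")
```

In the notebook you would call it as `await run_search(page, 'womens shirt')`.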

### Parse the products

For each product, let's print the brand name:

In [19]:
# contains supports substring matches
xpath_product = '//div[contains(@cel_widget_id, "MAIN-SEARCH_RESULTS-")]'
product_tiles = await page.locator(xpath_product).all()
len(product_tiles)

0

Notice that we're adding `.all()` to the command. This returns a list of every matching element rather than a single locator.

If you run all the cells at once, the above will return zero results. This is because `.all()` does not wait: it returns whatever matches at that moment, and the results have not rendered yet.

You can either pause for a fixed time or wait until the element is visible using the `expect` [function](https://playwright.dev/python/docs/test-assertions).

In [20]:
# Let's wait for the first product to load
await expect(page.locator(xpath_product).first).to_be_visible()

In [21]:
# run it again, after waiting for the elements to be rendered
product_tiles = await page.locator(xpath_product).all()
len(product_tiles)

33

This is the sleeping method: pause for a fixed time, then query again.

In [22]:
# import asyncio

# await asyncio.sleep(2)
# product_tiles = await page.locator(xpath_product).all()
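As an alternative to `expect` or a fixed sleep, the wait and the query can live in one helper. This is a sketch: `collect_when_ready` and the 10-second default are my own choices, and it relies on the locator's `wait_for` method.

```python
async def collect_when_ready(page, xpath, timeout_ms=10_000):
    """Wait for the first match of `xpath` to be visible, then return all matches."""
    matches = page.locator(xpath)
    # wait_for raises a timeout error if nothing becomes visible in time.
    await matches.first.wait_for(state="visible", timeout=timeout_ms)
    # .all() does not wait on its own, which is why the wait comes first.
    return await matches.all()
```

In the notebook: `product_tiles = await collect_when_ready(page, xpath_product)`.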

Let's parse one product

In [23]:
prod = product_tiles[0]

In [24]:
# see the text of the element
await prod.text_content()

'\n\n\n\n    \n\n\n\n    +12 other colors/patternsFeatured from Amazon brandsFeatured from Amazon brands Amazon EssentialsWomen\'s Regular-Fit 3/4 Sleeve V-Neck T-Shirt (Available in Plus Size), Multipacks 4.2 out of 5 stars 10,509  100+ bought in past monthPrice, product page$8.90$8.90 Typical: $13.30Typical: $13.30$13.30FREE delivery Fri, Jul 11 on $35 of items shipped by AmazonOr fastest delivery Tue, Jul 8  1 sustainability feature<img alt="" src="https://m.media-amazon.com/images/I/11++B3A2NEL.png" height="24px" width="24px"/>  Sustainability featuresThis product has sustainability features recognized by trusted certifications. Safer chemicalsMade with chemicals safer for human health and the environment.As certified by<img alt="" src="https://m.media-amazon.com/images/I/51YvKwF01yL._SS200_.jpg" height="24px" width="24px"/>  OEKO-TEX STANDARD 100Learn more about OEKO-TEX STANDARD 100<img alt="" src="https://m.media-amazon.com/images/I/51YvKwF01yL._SS200_.jpg" height="36px" width="

Let's iterate through each product and print the brand name, which is saved as a header (h2) with a unique class in the element:

In [25]:
xpath_brand = '//h2[@class="a-size-mini s-line-clamp-1"]'
for product in product_tiles:
    brand = await product.locator(xpath_brand).text_content()
    print(brand)

Amazon Essentials
AUTOMET
ANRABESS
Trendy Queen
AUTOMET
Amazon Essentials
AUTOMET
WIHOLL
OFEEFAN
AUTOMET
Zeagoo
AUTOMET
Real Essentials
EyMuse
ANRABESS
Oriental Pearl
Amazon Essentials
BAOKUAN
Huukeay
JomeDesign
Memorose
Trendy Queen
CE' CERDR
Astylish
Falechay
G Gradual
MaQiYa
YOGINGO
AUTOMET
Blooming Jelly
ANRABESS
AUTOMET
EVALESS
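The loop above assumes every tile has exactly one brand header. If a tile has none, `text_content()` waits for a match and eventually times out. Here's a more defensive sketch; `brand_or_none` is a hypothetical helper name, and the default XPath is the one used above.

```python
async def brand_or_none(product, xpath='//h2[@class="a-size-mini s-line-clamp-1"]'):
    """Return the tile's brand text, or None if the tile has no brand header."""
    header = product.locator(xpath)
    # count() does not wait, so a tile without a match does not stall the loop.
    if await header.count() == 0:
        return None
    # .first avoids an error when a tile contains more than one match.
    text = await header.first.text_content()
    return text.strip() if text else None
```

In the notebook: `brand = await brand_or_none(product)` inside the `for` loop.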


Although we did this all using Playwright, it's better to save the page source and then parse the saved results in BeautifulSoup, lxml, or whatever parsing software you prefer.

### Annotate the elements we find
Let's find all the ads, and highlight them red on the page.

In [26]:
# xpath_ads = '//div[@data-asin and .//a[@aria-label="View Sponsored information or leave ad feedback"]]'
xpath_ads = '//div[@data-asin and .//span[@aria-label="View Sponsored information or leave ad feedback"]]'
ads = await page.locator(xpath_ads).all()

In [27]:
len(ads)

10

You can "inject" attributes into elements, including style attributes.

In [28]:
elem = ads[0]

In [29]:
style = "background-color: red !important; transition: all 0.5s linear;"

In [30]:
await elem.evaluate(f"el => el.setAttribute('style','{style}')")

In [31]:
async def stain(elem, color='red'):
    """
    Injects a style attribute to stain `elem` the given `color` (red by default).
    """
    style = f"background-color: {color} !important; "\
             "transition: all 0.5s linear;"
    await elem.evaluate(f"el => el.setAttribute('style','{style}')")

In [32]:
for elem in ads:
    await stain(elem)
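Building the JavaScript with an f-string breaks if the style text ever contains a single quote. Here's a sketch of a variant that hands the style to the page as an argument of `evaluate` instead. `stain_with_arg` is a name I made up; the behavior is otherwise the same as `stain`.

```python
async def stain_with_arg(elem, color="red"):
    """Stain `elem` by passing the style string to the page as an argument."""
    style = f"background-color: {color} !important; transition: all 0.5s linear;"
    # The second parameter of the JS function receives `style` as a value,
    # so the style text is never pasted into the JavaScript source.
    await elem.evaluate("(el, style) => el.setAttribute('style', style)", style)
```

In the notebook: `await stain_with_arg(ads[0], 'orange')`.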

### Get height of document

In [33]:
import pandas as pd

In [34]:
height = await page.evaluate("document.body.scrollHeight")

In [35]:
height

13625

Get the coordinates and size of each element using the `bounding_box` method.

In [36]:
await elem.bounding_box()

{'x': 767.203125,
 'y': 6555.4296875,
 'width': 250.3984375,
 'height': 660.4921875}

In [37]:
ad_metadata = []
for elem in ads:
    if await elem.is_visible(): # use this function to only analyze visible elements
        rect = await elem.bounding_box()
        ad_metadata.append(rect)

In [38]:
df = pd.DataFrame(ad_metadata)

In [39]:
df['y']

0    3965.460938
1    3965.460938
2    3965.460938
3    5254.445312
4    5254.445312
5    5254.445312
6    5254.445312
7    6555.429688
8    6555.429688
9    6555.429688
Name: y, dtype: float64

In [40]:
df['how_far_down'] = df['y'] / height

In [41]:
df.how_far_down.value_counts()

how_far_down
0.385647    4
0.291043    3
0.481132    3
Name: count, dtype: int64
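Here's the same arithmetic without pandas, as a minimal sketch. The `y` values and the page height are copied from the outputs above. Rounding to six decimals is my assumption, chosen so the ads that share a row group together.

```python
from collections import Counter

height = 13625  # document.body.scrollHeight reported above

# y coordinates of the ten ads, copied from the df['y'] output above
ys = [
    3965.460938, 3965.460938, 3965.460938,
    5254.445312, 5254.445312, 5254.445312, 5254.445312,
    6555.429688, 6555.429688, 6555.429688,
]

# Fraction of the page height at which each ad starts, counted per row
how_far_down = Counter(round(y / height, 6) for y in ys)

for fraction, count in how_far_down.most_common():
    print(f"{fraction:.6f}  {count}")
```

Each distinct fraction corresponds to one row of sponsored tiles on the page.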

### Save receipts

In [42]:
# how to save what the emulator sees
source = await page.content()
with open('data/amazon_selenium_test.html', 'w') as f:
    f.write(source)

In [43]:
# just what's visible
screenshot = await page.screenshot(path='data/amazon_selenium_test.png')
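If you collect receipts more than once, fixed filenames get overwritten. Here's a sketch that timestamps both files and captures the whole scrollable page with `full_page=True`. `save_receipts`, `out_dir`, and `prefix` are hypothetical names, not part of Playwright.

```python
import os
from datetime import datetime


async def save_receipts(page, out_dir="data", prefix="amazon"):
    """Save the page source and a full-page screenshot with a shared timestamp."""
    os.makedirs(out_dir, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    html_path = os.path.join(out_dir, f"{prefix}_{stamp}.html")
    png_path = os.path.join(out_dir, f"{prefix}_{stamp}.png")

    # The HTML as the browser currently sees it
    with open(html_path, "w", encoding="utf-8") as f:
        f.write(await page.content())

    # full_page=True captures the whole scrollable page, not just the viewport
    await page.screenshot(path=png_path, full_page=True)
    return html_path, png_path
```

In the notebook: `html_path, png_path = await save_receipts(page)`.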

### Parsing the results however you like
For me that means using lxml, but you can do the same thing in BeautifulSoup, and I encourage you to do so...

In [44]:
from lxml import etree

In [45]:
# read the saved page source and close the file when done
with open('data/amazon_selenium_test.html') as f:
    dom = etree.HTML(f.read())

In [46]:
product_metadata = []
for result in dom.xpath('.//div[contains(@cel_widget_id, "MAIN-SEARCH_RESULTS")]'):
    # this is where you can parse as many fields as you like.
    brand, product_name = result.xpath('.//h2//text()')[:2]
    product_metadata.append({
        'brand': brand,
        'product_name': product_name
    })

In [47]:
pd.DataFrame(product_metadata)

|   | brand | product_name |
|---|---|---|
| 0 | Amazon Essentials | Women's Regular-Fit 3/4 Sleeve V-Neck T-Shirt ... |
| 1 | AUTOMET | Women Shirts Summer Sweaters Regular Fit Short... |
| 2 | ANRABESS | Womens Short Sleeve Henley Tops V Neck Dressy ... |
| 3 | Trendy Queen | Womens Basic T Shirts Summer Tops 2025 Crop Sh... |
| 4 | AUTOMET | Summer Tops Womens Spring Short Sleeve Shirts ... |
| 5 | Amazon Essentials | Women's Regular-Fit Short-Sleeve V-Neck T-Shir... |
| 6 | AUTOMET | Women's Short Sleeve Shirts Dressy Lace Summer... |
| 7 | WIHOLL | Tops for Women Summer Casual Ruffle Trim Sleev... |
| 8 | OFEEFAN | Womens T Shirts Short Sleeve Pleated Dressy Ca... |
| 9 | AUTOMET | Women's Summer Short Sleeve Shirts Trendy T-Sh... |
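The two-name unpacking in the parsing loop raises a `ValueError` when a tile has fewer than two `h2` text nodes. Here's a sketch of a more forgiving version. `split_brand_and_name` is a hypothetical helper that takes the list returned by `result.xpath('.//h2//text()')`. Note that it still assumes the brand comes first, so a tile with no brand header would have its product name labeled as the brand.

```python
def split_brand_and_name(texts):
    """Return (brand, product_name) from a list of h2 text nodes, padding with None."""
    # Drop empty and whitespace-only nodes, and trim the rest
    cleaned = [t.strip() for t in texts if t and t.strip()]
    brand = cleaned[0] if len(cleaned) > 0 else None
    product_name = cleaned[1] if len(cleaned) > 1 else None
    return brand, product_name
```

In the loop: `brand, product_name = split_brand_and_name(result.xpath('.//h2//text()'))`.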


### Automate rotating "AUTOMET"
1. Find all products.
2. Filter to those with brand == AUTOMET.
3. Find the image.
    - Inject JavaScript to make it spin.

In [48]:
async def spin(elem):
    """
    Injects a style attribute to rotate `elem` 180 degrees.
    """
    style = "transform: rotate(180deg) !important; "
    await elem.evaluate(
        f"elm => elm.setAttribute('style','{style}')"
    )

Here's how to do this for ads (which we found previously)

In [49]:
for elem in ads:
    await spin(elem)
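One side effect worth knowing: `setAttribute('style', ...)` replaces the whole inline style, so spinning an element that was stained earlier removes the stain. Here's a sketch that sets a single CSS property with the DOM's `style.setProperty` and leaves the others alone. `set_style_property` is a name I made up.

```python
async def set_style_property(elem, name, value):
    """Set one inline CSS property on `elem` without touching the others."""
    await elem.evaluate(
        # The [name, value] list arrives as a JS array and is destructured here.
        # The third argument of setProperty marks the rule as !important.
        "(el, [name, value]) => el.style.setProperty(name, value, 'important')",
        [name, value],
    )
```

In the notebook: `await set_style_property(elem, 'transform', 'rotate(180deg)')`. The value is passed without `!important`, since the priority is a separate argument.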

## Here's the bounty
Ultimately, I want every product image from specific brands (of your choosing) to rotate continuously.
Submit a full-page screenshot; a video is fine, too.

Also, make sure to save the results before you parse them.

In [50]:
for product in product_tiles:
    # get the brand name...
    brand_name = await product.locator(TK).text_content()
    
    # check if brand is in list
    if brand_name in brands_to_spin:
        print(await product.text_content())
        # find the image TK
        await spin(product)

NameError: name 'TK' is not defined
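A plain `in` check on the brand is sensitive to stray whitespace and letter case in the scraped text. Here's a small sketch of a more tolerant membership test. `should_spin` is a hypothetical helper, and it deliberately leaves the `TK` pieces of the exercise alone.

```python
def should_spin(brand_text, brands_to_spin):
    """Return True if `brand_text` matches any wanted brand, ignoring case and padding."""
    if not brand_text:
        return False
    wanted = {b.strip().casefold() for b in brands_to_spin}
    return brand_text.strip().casefold() in wanted
```

In the loop, `if should_spin(brand_name, brands_to_spin):` would replace the `in` check.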

In [None]:
await driver.close()
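Closing the browser leaves the Playwright instance itself running. Here's a sketch of a fuller shutdown. `shut_down` is a made-up name, and it assumes you kept a handle to the object returned by `async_playwright().start()`. The `open_browser` helper above does not return that handle, so it would need to be changed to return it as well.

```python
async def shut_down(browser, playwright=None):
    """Close the browser, then stop Playwright if a handle was provided."""
    try:
        await browser.close()
    finally:
        # Stop Playwright even if closing the browser raised an error
        if playwright is not None:
            await playwright.stop()
```

In the notebook: `await shut_down(driver, playwright)`.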