This is the first part of our automated browser tutorial. We will cover the following common tasks as we prepare to study personalization on TikTok:

* Setting up the automated browser for use with Python
* Hiding typical tell-tales of an automated browser to circumvent anti-bot protections
* Finding particular elements on the screen, reading their content, and interacting with them
* Scrolling
* Taking screenshots
* Saving data for future analysis

## Step 1: Setting up the browser
Our setup will consist of a real browser and an interface that will allow us to control that browser from Python. We chose Google Chrome - not because it's our favorite but because using the most popular browser will help us blend in with real users.

### 1.1 Installing Google Chrome
We will do this tutorial using Google Chrome. Please download the most recent version from [here](https://www.google.com/chrome/).

If you already have Google Chrome installed, please make sure it's at its newest version by pasting this address in the Chrome address bar: [chrome://settings/help](chrome://settings/help) and verifying that there are no pending updates.

![](assets/browser1_01_version.png "Google Chrome window showing the current version")

### 1.2 Installing the webdriver
Webdriver is our interface between Python and the browser. It is specific to the browser (there are different webdrivers for Firefox, Safari, etc.) and even to the particular version of the browser. The easiest way to get the correct version is to install a webdriver package that automatically detects the installed version of Chrome. Run the code in the cell below to do that.

> Adding an exclamation mark before a command in a Jupyter notebook cell runs it as if you had typed it in the terminal.

In [None]:
!pip install chromedriver-binary-auto

Let's see if the installation worked correctly! Run the cell below to import the correct webdriver and open a new Chrome window.

In [1]:
from selenium import webdriver
import chromedriver_binary # adds the chromedriver binary to the path

driver = webdriver.Chrome()

The `chromedriver-binary-auto` package should have installed a driver that's suitable for your current Chrome version, and the last line above should have opened a new Chrome window.
Instead, you might get a version mismatch error like this:

```
SessionNotCreatedException: Message: session not created: This version of ChromeDriver only supports Chrome version 112
Current browser version is 113 with binary path /Applications/Google Chrome.app/Contents/MacOS/Google Chrome
```
This usually means that Chrome was updated after the driver was installed. To fix it, reinstall the Python package:

In [None]:
!pip install --upgrade --force-reinstall chromedriver-binary-auto
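If you want to confirm that an error really is a version mismatch before reinstalling, the two version numbers can be pulled out of the message text. This is only a sketch: `parse_version_mismatch` is a hypothetical helper (not part of Selenium), and it assumes the message keeps the wording shown above.

```python
import re

def parse_version_mismatch(message):
    """
    Hypothetical helper: returns (driver_supports, browser_version) as ints
    if the message looks like the mismatch error above, otherwise None.
    Assumes the wording of the error shown in this tutorial.
    """
    supported = re.search(r"only supports Chrome version (\d+)", message)
    current = re.search(r"Current browser version is (\d+)", message)
    if supported is None or current is None:
        return None
    return int(supported.group(1)), int(current.group(1))

# the example error message from above
example = (
    "session not created: This version of ChromeDriver only supports Chrome version 112\n"
    "Current browser version is 113 with binary path "
    "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"
)
print(parse_version_mismatch(example))  # (112, 113)
```

If the two numbers differ, reinstalling the package as shown above is the likely fix.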

If everything works fine and you have the window open, our setup is complete and you can now close the Chrome window:

In [None]:
driver.quit()  # closes the window and ends the WebDriver session

## Step 2: Hiding typical tell-tales of an automated browser
When you opened Chrome, you may have noticed a warning that the browser is being controlled by automated software.
Even though the warning is only displayed to you, website operators can detect the automation flags behind it and refuse to serve content.

Let's remove those!

In [19]:
def new_window():
    options = webdriver.ChromeOptions()
    options.add_argument("start-maximized")

    # remove common signs of this being an automated browser
    options.add_argument('--disable-blink-features=AutomationControlled')
    options.add_experimental_option("excludeSwitches", ["enable-automation"])
    options.add_experimental_option('useAutomationExtension', False)

    # open the browser with the new options
    driver = webdriver.Chrome(options=options)
    return driver

In [20]:
driver = new_window()
driver.get('https://tiktok.com')

This should open a new window without those warnings and navigate to tiktok.com:

![](assets/browser1_02_tiktok.png "tiktok main page")
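One of the signals a website can read is the `navigator.webdriver` JavaScript property. As a rough sanity check, you can ask the page for that property yourself. The sketch below assumes only that `driver` has Selenium's `execute_script` method; `reports_automation` is a name chosen for this example, and it checks this single signal only, so a `False` result does not prove the browser is undetectable.

```python
def reports_automation(driver):
    """
    Asks the current page whether the browser advertises itself as automated.
    `driver` is any object with Selenium's `execute_script` method.
    """
    # navigator.webdriver is the standard flag an automated browser exposes;
    # it may come back as True, False, or None depending on the setup
    flag = driver.execute_script("return navigator.webdriver")
    return bool(flag)
```

Calling `reports_automation(driver)` after `driver.get('https://tiktok.com')` should return `False` if the options in `new_window()` took effect.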




## Step 3: Finding elements on page and interacting with them

We will perform our first attempt at the experiment without logging in, but we will also learn how to create multiple accounts and how to log in later.

Instead of logging in, our first interaction will be dismissing this login window. Doing this programmatically has two steps:

1. We need to identify that \[X\] button in the page source 
2. And then click it

Let's inspect the button:
![](assets/browser1_03_dismiss.png "Inspecting the Dismiss button")

In my case, the particular element that the Developer Tools navigated to is just the graphic on the button, not the button itself, but you can still find the actual button by hovering your mouse over different elements in the source and seeing what elements on page are highlighted:

![](assets/browser1_04_inspect.png "Inspecting the Dismiss button")

Our close button is a `<div>` element whose `data-e2e` attribute is `"modal-close-inner-button"`. One way to find it is with a CSS selector, like so:

In [21]:
from selenium.webdriver.common.by import By

close_button = driver.find_element(By.CSS_SELECTOR, '[data-e2e="modal-close-inner-button"]')
close_button

<selenium.webdriver.remote.webelement.WebElement (session="0973a35169d61570b23b45906e5a5494", element="17ecfc45-186c-4f7b-91ae-d30a109d4de8")>

We seem to have found it. Let's click it!

In [22]:
close_button.click()
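If the login window has not rendered yet, or is not shown at all, `find_element` raises a `NoSuchElementException`. A more forgiving variant can use `find_elements`, which returns an empty list when nothing matches. This is a minimal sketch: `dismiss_login_modal` is a hypothetical helper name, and the string `"css selector"` is the value behind `By.CSS_SELECTOR`, written out so that the snippet needs no imports.

```python
def dismiss_login_modal(driver, selector='[data-e2e="modal-close-inner-button"]'):
    """
    Clicks the close button of the login window if it is present.
    Returns True if a button was clicked, False if none was found.
    """
    # "css selector" is the string value of selenium's By.CSS_SELECTOR
    buttons = driver.find_elements("css selector", selector)
    if not buttons:
        return False
    buttons[0].click()
    return True
```

Because it returns a boolean instead of raising, the helper can be called again after a short wait if the first call returned `False`.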

## Step 4: Scrolling

We now have a browser instance open and displaying the For You Page. Let's scroll through the videos.
A human user would press the arrow down key on their keyboard. We will do that programmatically instead:

In [8]:
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.keys import Keys

def arrow_down(driver):
    """
    Sends the ARROW_DOWN key to a webdriver instance.
    """
    actions = ActionChains(driver)
    actions.send_keys(Keys.ARROW_DOWN)
    actions.perform()

In [10]:
# run this cell a few times to scroll down further
arrow_down(driver)
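Instead of rerunning the cell by hand, the key press can be repeated in a loop. The sketch below is one possible way to do that. `repeat_with_pauses` is a hypothetical helper, and the randomized pause lengths are an assumption of this example rather than values tuned against TikTok.

```python
import random
import time

def repeat_with_pauses(action, times, min_pause=1.0, max_pause=3.0):
    """
    Calls `action()` `times` times, sleeping a random interval after each call.
    Returns the list of pauses (in seconds) that were used.
    """
    pauses = []
    for _ in range(times):
        action()
        # vary the delay so the key presses are not perfectly regular
        pause = random.uniform(min_pause, max_pause)
        time.sleep(pause)
        pauses.append(pause)
    return pauses
```

With the function from the cell above, `repeat_with_pauses(lambda: arrow_down(driver), 5)` would scroll down five times.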

## Step 5: Finding tiktoks on the page

Now that the site has loaded and you can browse it, let's find all the tiktoks that are displayed and extract the meta information from each of them.

1. Right click on the white space around a tiktok and choose "Inspect".

    ![Inspect Element](assets/browser1_05_inspect_tiktok_a.png)

1. Hover your mouse over the surrounding `<div>` elements and observe the highlighted elements on the page to see which ones correspond to each tiktok.

    ![Inspect Element](assets/browser1_05_inspect_tiktok_b.png)

1. You will see that each tiktok is in a separate `<div>` container, but all of these containers share the same classes. In my case these are `tiktok-1nncbiz-DivItemContainer etvrc4k0`. Note that class names are separated by spaces, so the containers can be found by either `tiktok-1nncbiz-DivItemContainer` or `etvrc4k0`.
1. Let's start by writing a function that returns the list of all displayed tiktoks:

In [13]:
def get_videos(driver, class_name='etvrc4k0'):
    return driver.find_elements(By.CLASS_NAME, class_name)

In [14]:
get_videos(driver)

[<selenium.webdriver.remote.webelement.WebElement (session="a8f726353606bbff812815718da10cb1", element="f4790407-5eb5-4a1f-96fd-2a2a8a467b62")>,
 <selenium.webdriver.remote.webelement.WebElement (session="a8f726353606bbff812815718da10cb1", element="7cf72f4c-5bd8-41a0-a406-f8b444512fa2")>,
 <selenium.webdriver.remote.webelement.WebElement (session="a8f726353606bbff812815718da10cb1", element="7f51093f-53be-4db4-8736-eb69e362e3d5")>,
 <selenium.webdriver.remote.webelement.WebElement (session="a8f726353606bbff812815718da10cb1", element="2f69b160-755f-4487-a7dc-277c8ca7970e")>,
 <selenium.webdriver.remote.webelement.WebElement (session="a8f726353606bbff812815718da10cb1", element="5b9b0fba-8236-47d7-b3fd-0efefa65e05e")>,
 <selenium.webdriver.remote.webelement.WebElement (session="a8f726353606bbff812815718da10cb1", element="8d231c15-143d-4f4f-9ab9-043d3343363e")>,
 <selenium.webdriver.remote.webelement.WebElement (session="a8f726353606bbff812815718da10cb1", element="01a4d1e3-00af-4516-aa0f-4d
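Class names such as `etvrc4k0` look machine-generated, so they may differ in your browser or change over time. A possible alternative, assuming the readable part `DivItemContainer` stays in the class name, is a CSS attribute selector that matches on a substring. `get_videos_by_substring` is a hypothetical variant of `get_videos`, and `"css selector"` is the string value behind `By.CSS_SELECTOR`.

```python
def get_videos_by_substring(driver, class_substring="DivItemContainer"):
    """
    Returns all elements whose class attribute contains `class_substring`.
    """
    # [class*="..."] is standard CSS: the attribute value contains the substring
    selector = f'[class*="{class_substring}"]'
    # "css selector" is the string value of selenium's By.CSS_SELECTOR
    return driver.find_elements("css selector", selector)
```

A substring match can also pick up unrelated elements whose class happens to contain the same text, so it is worth comparing the number of results with what you see on the page.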

## Step 6: Parsing a tiktok
Now that we have found all the tiktoks on the page, let's extract the description from each. This is how we will decide whether to watch a tiktok or skip it.

1. Pick any description, right click, Inspect. 
1. Let's locate the `<div>` that contains the whole description (including any hashtags) and make a note of its class
1. Now let's write a function that, given one tiktok, extracts its description (note that you can get the text content of any element by reading `element.text`).

In [15]:
def get_description(item, class_name='ejg0rhn0'):
    """
    Given a div with a single video, extract its text description
    """
    return item.find_element(By.CLASS_NAME, class_name).text

In [16]:
tiktoks = get_videos(driver)
for tt in tiktoks: 
    print(get_description(tt))
    print()



You NEED to hear Adele sing the backing vocals of Rolling in the Deep 🤯
#adele #rollinginthedeep #carpoolkaraoke



#greenscreen Based on true events. #fyp #work #working #corporate #corporatelife #corporatetiktok #corporateamerica #corporatehumor #office #officelife #manager #managersbelike #career #quietquit #actyourwage #skit #funny #sketch #quietquitting #veronica

ASMR + pretty in pink = a sweetly satisfying SHEIN unboxing 👂💗 @joselyn.luna

#SHEIN #saveinstyle #SHEINbeauty #fyp

Cow Ain’t playing games 💀 @barstooloutdoors (via:@tampa.jitjay )

How many trends did u know?

BESTIE!!!!💘
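`get_description` relies on `find_element`, which raises a `NoSuchElementException` if a container has no element with the expected class. A minimal sketch of a variant that returns an empty string instead is shown below. `get_description_or_empty` is a hypothetical name, and `"class name"` is the string value behind `By.CLASS_NAME`.

```python
def get_description_or_empty(item, class_name="ejg0rhn0"):
    """
    Like get_description, but returns "" when the container has no
    description element instead of raising an exception.
    """
    # "class name" is the string value of selenium's By.CLASS_NAME
    matches = item.find_elements("class name", class_name)
    return matches[0].text if matches else ""
```

This keeps a loop over many tiktoks running even if one container is laid out differently from the rest.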





## Step 7: Putting it together
We can now read the descriptions of tiktoks and move between them.
That's most of the setup we need to try a very simple measurement - let's watch all tiktoks that mention comedy in their description and skip all those that don't. After a few hundred, we will see whether more comedy videos appear over time.

In [17]:
# if the description has these words, we will watch the video
keywords = ['comedy', 'standup', 'comic', 'joke', 'stand-up', 'crowdwork', 'funny', 'improv', 'humor']
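The watch-or-skip decision can be isolated in a small function, which makes it easy to try on a few descriptions before running the browser. This is a sketch under one assumption that goes beyond the text: matching ignores case, so that `#Funny` counts as a match. `should_watch` is a name chosen for this example.

```python
def should_watch(description, keywords):
    """
    Returns True if any keyword occurs in the description, ignoring case.
    """
    text = description.lower()
    return any(keyword.lower() in text for keyword in keywords)

keywords = ['comedy', 'standup', 'comic', 'joke', 'stand-up', 'crowdwork', 'funny', 'improv', 'humor']

print(should_watch("#Funny Based on true events. #corporatehumor", keywords))  # True
print(should_watch("How many trends did u know?", keywords))                   # False
```

Note that this is plain substring matching, so a keyword like `improv` also matches `improve`.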



In [23]:
import time

# open a new window and go to tiktok
driver = new_window()
driver.get('https://tiktok.com')

In [24]:
# close the login modal
driver.find_element(By.CSS_SELECTOR, '[data-e2e="modal-close-inner-button"]').click()

In [None]:
for tiktok_index in range(0, 100):

    # get all tiktoks currently loaded on the page
    tiktoks = get_videos(driver)

    # stop if the page has not loaded enough tiktoks yet
    if tiktok_index >= len(tiktoks):
        print('No more tiktoks loaded, stopping at', tiktok_index)
        break

    # the current tiktok
    current_tt = tiktoks[tiktok_index]

    # read its description
    description = get_description(current_tt)

    # watch only if any of the keywords is in the description (ignoring case)
    decision = any(keyword in description.lower() for keyword in keywords)

    print(tiktok_index, decision, description)

    if decision:
        # we have a video of interest, let's watch it for 30 seconds
        time.sleep(30)

    # move to the next video and allow some time to make sure the scrolling happened
    arrow_down(driver)
    time.sleep(2)
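The loop only prints its decisions, so they are lost once the notebook is closed. One simple way to keep them for later analysis is to write them to a CSV file. The sketch below uses only Python's standard `csv` module; `save_log` and the column names are assumptions made for this example.

```python
import csv

def save_log(rows, path):
    """
    Writes (index, watched, description) rows to a CSV file with a header.
    """
    # newline="" is what the csv module expects when opening files for writing
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["index", "watched", "description"])
        writer.writerows(rows)
```

To use it, append `(tiktok_index, decision, description)` to a list inside the loop and call `save_log(rows, "tiktok_log.csv")` once the loop finishes (the file name is just an example).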
    
    
    

## Summary
And with this simple proof-of-concept TikTok bot we conclude our very first lesson on browser automation!
There is plenty left to do before we can reliably use this for an audit, but it is a great start. 