# New Tutorial on TikTok Scraping

This tutorial does the following:

1. Connects to an existing (open) Chrome instance. **[Part 1](#sec1)**
2. Gets videos from a TikTok account page. **[Part 2](#sec2)**
3. Scrolls down the page to load a given number of videos. **[Part 3](#sec3)**

<a id="sec1"></a>
## Part 1: Create Chrome Instance

**Important:** For this to work, you should already have a Chrome instance running on your computer with remote debugging enabled. To do that, open a console and run the command for your operating system (see below).


**On Mac:**
```
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --remote-debugging-port=9222 --user-data-dir="/tmp/chrome_dev_test"
```

**On Windows:**

```
"C:\Program Files (x86)\Google\Chrome\Application\chrome.exe" --remote-debugging-port=9222 --user-data-dir="C:\selenium\ChromeTestProfile"
```
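Before connecting with Selenium, it can help to confirm that Chrome is actually listening on the debugging port. The sketch below queries Chrome's `/json/version` HTTP endpoint using only the standard library. The function name `debugger_info` and the 2-second timeout are assumptions made for illustration, not part of the original tutorial.

```python
import json
import urllib.error
import urllib.request


def debugger_info(host="127.0.0.1", port=9222, timeout=2.0):
    """Return Chrome's /json/version payload as a dict, or None if unreachable."""
    url = f"http://{host}:{port}/json/version"
    # Bypass any system proxy: the debugging endpoint is local.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a response that is not JSON
        return None


# Hypothetical usage:
# info = debugger_info()
# print("Chrome not reachable" if info is None else info.get("Browser"))
```

If the function returns `None`, the Chrome instance was probably not started with `--remote-debugging-port=9222`, or it is using a different port.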

**New installation**

If you don't have the following package, install it once.

In [1]:
pip install webdriver_manager

Collecting webdriver_manager
  Downloading webdriver_manager-4.0.1-py2.py3-none-any.whl.metadata (12 kB)
Collecting python-dotenv (from webdriver_manager)
  Downloading python_dotenv-1.0.1-py3-none-any.whl.metadata (23 kB)
Downloading webdriver_manager-4.0.1-py2.py3-none-any.whl (27 kB)
Downloading python_dotenv-1.0.1-py3-none-any.whl (19 kB)
Installing collected packages: python-dotenv, webdriver_manager
Successfully installed python-dotenv-1.0.1 webdriver_manager-4.0.1
Note: you may need to restart the kernel to use updated packages.


Now you are ready to run the code below:

In [2]:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.by import By
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.keys import Keys
import time

# Set up Chrome options
options = Options()
options.add_experimental_option("debuggerAddress", "127.0.0.1:9222")

# Download (or reuse a cached) ChromeDriver and wrap its path in a Service
service = Service(ChromeDriverManager().install())

# Connect to the existing Chrome browser session
driver = webdriver.Chrome(service=service, options=options)

# Interact with the existing browser session
driver.get('http://www.tiktok.com/@nytimes')

In [3]:
driver.title

''

<a id="sec2"></a>
## Part 2: Getting Videos from a TikTok Page


I will use class names to locate the HTML elements needed for the scraping. These are:

```
CONTAINER_CLASS = "eegew6e2"
VIDEO_CLASS = "e19c29qe8"
DESC_CLASS = "eih2qak4"
```

I store them in variables so that, if these class names change, we only need to plug in the new names here.

In [4]:
CONTAINER_CLASS = "eegew6e2" 
VIDEO_CLASS = "e19c29qe8"
DESC_CLASS = "eih2qak4"

Here is a function that will get the posts (both URLs and descriptions of each video):

In [5]:
def getVideosAndDescriptions(driver):
    """Given an open driver instance on a TikTok account page,
    return a list of (url, description) tuples for the videos
    currently loaded on the page.
    """
    time.sleep(2) # in case the page hasn't loaded yet

    # Get the container of the videos
    try:
        container = driver.find_element(By.CLASS_NAME, CONTAINER_CLASS)
    except Exception as e:
        print(f"Container: An unexpected error occurred: {e}")
        return []

    # Get the video elements
    try:
        posts = container.find_elements(By.CLASS_NAME, VIDEO_CLASS)
    except Exception as e:
        print(f"Post: An unexpected error occurred: {e}")
        return []

    # Get the URLs of the videos
    try:
        urls = [post.find_element(By.TAG_NAME, "a").get_attribute('href') for post in posts]
    except Exception as e:
        print(f"URL: An unexpected error occurred: {e}")
        return []

    # Get the description of each post. Since some of them don't have one, we'll add an empty string
    descriptions = []
    for post in posts:
        try:
            desc = post.find_element(By.CLASS_NAME, DESC_CLASS).text
            descriptions.append(desc)
        except Exception:  # no description element found for this post
            descriptions.append('')

    # Combine together urls and descriptions
    return list(zip(urls, descriptions))
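The fixed `time.sleep(2)` at the top of the function is a guess at how long the page takes to load. A minimal sketch of an alternative is to poll until a condition holds or a timeout expires. Selenium ships its own `WebDriverWait` for this purpose; the `wait_until` helper below is a hypothetical, dependency-free stand-in, and its default timeout and polling interval are arbitrary choices.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call `condition` repeatedly until it returns a truthy value.

    Returns that value, or None if `timeout` seconds pass first.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll)


# Hypothetical usage with the tutorial's driver and class name:
# containers = wait_until(
#     lambda: driver.find_elements(By.CLASS_NAME, CONTAINER_CLASS)
# )
# if containers is None:
#     print("Container never appeared (page not loaded, or class name changed)")
```

Using `find_elements` (plural) in the condition avoids exceptions: it returns an empty list, which is falsy, until the element exists.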

Now let's try this out with the videos from a famous account:

In [6]:
url = "https://www.tiktok.com/@taylorswift"
driver.get(url)
posts = getVideosAndDescriptions(driver)
len(posts)

Container: An unexpected error occurred: Message: no such element: Unable to locate element: {"method":"css selector","selector":".eegew6e2"}
  (Session info: chrome=123.0.6312.124); For documentation on this error, please visit: https://www.selenium.dev/documentation/webdriver/troubleshooting/errors#no-such-element-exception
Stacktrace:
0   chromedriver                        0x0000000102c104a4 chromedriver + 4326564
1   chromedriver                        0x0000000102c0896c chromedriver + 4295020
2   chromedriver                        0x0000000102834088 chromedriver + 278664
3   chromedriver                        0x0000000102876a80 chromedriver + 551552
4   chromedriver                        0x00000001028af4f8 chromedriver + 783608
5   chromedriver                        0x000000010286b4e4 chromedriver + 505060
6   chromedriver                        0x000000010286bf5c chromedriver + 507740
7   chromedriver                        0x0000000102bd3a40 chromedriver + 4078144
8   chrom

0

Show a few posts:

In [7]:
posts[:3]

[]

**Note:** By default, when visiting the page of a TikTok account, we only get the first 35 posts. If we want more, we need to scroll down.

<a id="sec3"></a>
## Part 3: Scrolling down the page

Here is one way to scroll: send the down-arrow key to the page repeatedly.

In [8]:
# We press the arrow_down key every 1/10 of a second

actions = ActionChains(driver)

for i in range(50):
    actions.send_keys(Keys.ARROW_DOWN)
    actions.perform()
    time.sleep(0.1)

We can call the function to get the posts:

In [9]:
posts = getVideosAndDescriptions(driver)
len(posts)

35

In my run, sending the down-arrow key in a loop of 50 iterations took the page from 35 posts to 104 posts.

My tests have shown that posts don't disappear from the DOM while scrolling: once they have been loaded, they remain there. Thus, we can scroll for a while, then stop and save all the posts at once.
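Because loaded posts stay in the DOM, scrolling can stop as soon as enough posts are present instead of running for a fixed number of key presses. The sketch below shows that idea with two callables supplied by the caller, so it does not depend on Selenium itself. The name `scroll_until` and the parameters `max_rounds`, `patience`, and `pause` are assumptions made for illustration.

```python
import time


def scroll_until(count_posts, scroll_once, target,
                 max_rounds=100, patience=5, pause=0.5):
    """Scroll until enough posts are loaded or the page stops growing.

    count_posts: callable returning the number of posts currently loaded
    scroll_once: callable that performs one burst of scrolling
    target:      stop once this many posts are loaded
    patience:    consecutive rounds without growth tolerated before giving up
    Returns the last observed post count.
    """
    seen = count_posts()
    stalled = 0
    for _ in range(max_rounds):
        if seen >= target or stalled >= patience:
            break
        scroll_once()
        time.sleep(pause)  # give the page time to load new posts
        current = count_posts()
        stalled = stalled + 1 if current <= seen else 0
        seen = max(seen, current)
    return seen


# Hypothetical usage with the tutorial's driver and class name:
# def scroll_once():
#     for _ in range(10):
#         ActionChains(driver).send_keys(Keys.ARROW_DOWN).perform()
#         time.sleep(0.1)
#
# count_posts = lambda: len(driver.find_elements(By.CLASS_NAME, VIDEO_CLASS))
# scroll_until(count_posts, scroll_once, target=100)
```

The `patience` counter handles accounts that have fewer posts than `target`: once several rounds pass with no new posts, the loop gives up rather than scrolling forever.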

In [12]:
import json

# Full list of accounts to scrape
accounts_complete = ['kirstengillibrand', 'teampattymurray', 'repkatieporter', 'repwilson', 'nikemawilliams', 'reppressley', 'rashidatlaib',
           'ilhanmn', 'repstansbury', 'aoc', 'repshontelbrown', 'repchrissyhoulahan', 'repsummerlee', 'sheilaforhouston', 'marieforcongress',
           'congresswomanjayapal']

# Accounts to scrape in this run
accounts = ['reppressley']

for acc in accounts:
    url = f"https://www.tiktok.com/@{acc}"
    driver.get(url)

    # Scroll down for a while to load posts
    for i in range(200):
        actions.send_keys(Keys.ARROW_DOWN)
        actions.perform()
        time.sleep(0.1)

    posts = getVideosAndDescriptions(driver)
    print(acc, len(posts))

    # Save each account's posts in its own file
    with open(f"{acc}.json", 'w') as fout:
        json.dump(posts, fout)

reppressley 28
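To work with the saved files later, they can be read back with the `json` module. The sketch below assumes the file layout produced by the loop above (a list of URL and description pairs); the function name `load_posts` and the dictionary keys are hypothetical choices.

```python
import json


def load_posts(path):
    """Read a file written by json.dump(posts, ...) and return a list of dicts."""
    with open(path, encoding="utf-8") as fin:
        rows = json.load(fin)
    # JSON has no tuple type, so each (url, description) pair
    # comes back as a two-item list.
    return [{"url": url, "description": description} for url, description in rows]


# Hypothetical usage:
# posts = load_posts("reppressley.json")
# print(len(posts))
```

Converting each pair to a dictionary makes later steps, such as building a table or filtering by description, easier to read than positional indexing.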
