
This project uses RSelenium webscraping, Google Colab, and OpenAI’s Whisper Transcription to analyze sentiment in US political ads on Google platforms. The goal is to explore the difference between Trump and Biden ads using GPT-3.5 Turbo as the sentiment analyzer.


tobiasuruali/WebScraper_PolAds


Political Ad Webscraping Sentiment Analysis

This project analyzes a dataset containing all political ads that have run on Google platforms in the United States since May 2018. The goal is to explore the sentiment difference between Trump and Biden advertisements using GPT-3.5 Turbo as our sentiment analyzer.

Instructions

To run this project, follow these steps:

  1. Set up Docker by following the instructions here.
  2. Start the Selenium Docker container that RSelenium connects to with this command:
docker run -d -p 4445:4444 -p 5901:5900 --shm-size="2g" selenium/standalone-firefox:4.8.3-20230403

This command starts the selenium/standalone-firefox:4.8.3-20230403 image in detached mode (-d). The first -p flag maps port 4444 in the container (the Selenium port) to port 4445 on the host machine, so RSelenium should connect to port 4445. The second -p flag maps container port 5900 to host port 5901, which allows remote access to the browser session with a VNC viewer. The --shm-size flag sets the container's shared memory to 2 GB.
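Before running the scraping scripts, it can help to confirm that the container is reachable on the mapped host port. The following is a minimal Python sketch, not part of this repository: the helper names (`parse_port_mapping`, `status_url`, `is_ready`, `selenium_ready`) are hypothetical, and it assumes the Selenium 4 server answers on its standard `/status` endpoint with a JSON body containing `value.ready`.

```python
import json
from urllib.request import urlopen


def parse_port_mapping(mapping: str) -> tuple[int, int]:
    """Split a Docker '-p HOST:CONTAINER' value into (host_port, container_port)."""
    host_port, container_port = mapping.split(":")
    return int(host_port), int(container_port)


def status_url(host_port: int, host: str = "localhost") -> str:
    """Build the status URL for a Selenium server exposed on host_port."""
    return f"http://{host}:{host_port}/status"


def is_ready(payload: dict) -> bool:
    """Read the 'ready' flag from a WebDriver status response."""
    return bool(payload.get("value", {}).get("ready", False))


def selenium_ready(host_port: int = 4445, timeout: float = 5.0) -> bool:
    """Return True if the server on host_port reports that it is ready.

    Assumption: the container was started with '-p 4445:4444' as in the
    command above. Connection errors and malformed replies count as not ready.
    """
    try:
        with urlopen(status_url(host_port), timeout=timeout) as response:
            return is_ready(json.load(response))
    except (OSError, ValueError):
        return False
```

Calling `selenium_ready()` with the default port should return `True` once the container has finished starting, and `False` if Docker is not running or the port mapping differs.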

  3. Run the scripts in the following order:

    • 01_prepare_dataset_4_url_scrape.R: This script prepares the dataset for URL scraping.
    • 02_scrape_automation_video_links.R: This script uses RSelenium to scrape the YouTube URLs from a dynamic website.
    • 03_trump_biden_subset.R: This script selects ads from advertisers that have either "Trump" or "Biden" in their name and randomly selects 25 ads from each group.
    • 04_download_and_transcribe.ipynb: This notebook runs in Google Colab. It uses the yt-dlp package to download only the audio of each YouTube video and transcribes it with OpenAI's Whisper transcription model.
    • 05_gpt_sentiment_analysis.ipynb: This notebook uses GPT-3.5 Turbo to perform sentiment analysis on the transcribed text.
  4. For the notebooks 04_download_and_transcribe.ipynb and 05_gpt_sentiment_analysis.ipynb, you will need to create a Google Drive folder called Google_Pol_Ads. Inside that folder, replicate the folder structure of this project, with data/data_processed and data/data_raw folders.
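The Drive layout and the audio-only download described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the notebooks' actual code: the helper names are hypothetical, `/content/drive/MyDrive` is assumed to be where Colab mounts Google Drive, and writing mp3 audio into data/data_raw is a guess at where the notebook stores its downloads. The yt-dlp flags used (`--extract-audio`, `--audio-format`, `-o`) are standard command-line options.

```python
from pathlib import Path

# Assumption: Google Drive is mounted at Colab's usual location.
DRIVE_ROOT = Path("/content/drive/MyDrive")


def project_dirs(drive_root=DRIVE_ROOT):
    """Return the data folders expected inside the Google_Pol_Ads Drive folder."""
    base = Path(drive_root) / "Google_Pol_Ads"
    return {
        "raw": base / "data" / "data_raw",
        "processed": base / "data" / "data_processed",
    }


def ensure_dirs(dirs):
    """Create each folder (and its parents) if it does not exist yet."""
    for path in dirs.values():
        path.mkdir(parents=True, exist_ok=True)


def ytdlp_audio_args(url, out_dir, audio_format="mp3"):
    """Build a yt-dlp command that keeps only the audio track of one video.

    Assumption: the output format and the '<video id>.<ext>' file naming
    are illustrative choices, not taken from the notebook.
    """
    output_template = str(Path(out_dir) / "%(id)s.%(ext)s")
    return [
        "yt-dlp",
        "--extract-audio",               # discard the video stream
        "--audio-format", audio_format,  # convert the audio after download
        "-o", output_template,
        url,
    ]
```

In a notebook, the returned list could be passed to `subprocess.run(args, check=True)` once `ensure_dirs(project_dirs())` has created the folders. Building the command as a list avoids shell quoting problems with URLs.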
