ucdavis-noyce/YouTube-Sock-Puppet

YouTube-Sock-Puppet

This is the sock puppet implementation for our work on auditing YouTube's recommendations. Read more about it here.

We used Docker to scale and parallelize the collection of sock puppet data. Each sock puppet was initialized and executed in its own independent Docker container. To interact with YouTube, we developed our own Selenium-based driver, which can be found here.

Getting Started

To get started, run python docker-api.py --build --run --max-containers 5. This performs the following steps:

  1. Build the docker image.
  2. Load training and testing videos from the data directory.
  3. Create five sock puppets for the five ideology categories and assign them training and testing videos. Once created, these sock puppet parameters can be found in the arguments directory.
  4. Start docker containers for the five sock puppets. These sock puppets will run in parallel immediately and write their output to the output/puppets directory once finished.

There are two main files in this repository.

  • docker-api.py: Provides an API for running Docker commands and creating/running sock puppet containers. The script generates arguments for the sock puppets as well.
  • sockpuppet.py: The actual sock puppet implementation that runs inside each container. This is the code that runs for each individual sock puppet.

docker-api.py

Inside docker-api.py, you'll find some parameters and arguments. Here's an overview of what they do.

Parameters

  • IMAGE_NAME: The tag for the docker image to build. Use this to identify running containers later on.
  • OUTPUT_DIR: The output directory for the sock puppets.
  • ARGS_DIR: The directory where this script will generate arguments for the sock puppets.
  • NUM_TRAINING_VIDEOS: The number of training videos for each sock puppet.
  • WATCH_DURATION: The time spent watching each video by a sock puppet.
  • USERNAME: Username for the docker container to run under. In most cases, leave this as-is.

Arguments

The following command-line arguments can be specified for the script.

usage: docker-api.py [-h] [--build] [--run] [--simulate] [--max-containers MAX_CONTAINERS] [--sleep-duration SLEEP_DURATION] [--training-videos TRAINING_VIDEOS] [--testing-videos TESTING_VIDEOS]

optional arguments:
  -h, --help            show this help message and exit
  --build               Build docker image
  --run                 Run all docker containers
  --simulate            Only generate arguments but do not start containers
  --max-containers MAX_CONTAINERS
                        Maximum number of concurrent containers
  --sleep-duration SLEEP_DURATION
                        Time to sleep (in seconds) when max containers are reached and before spawning additional containers
  --training-videos TRAINING_VIDEOS
                        CSV file to read training videos from
  --testing-videos TESTING_VIDEOS
                        CSV file to read testing videos from
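
The interplay between --max-containers and --sleep-duration can be pictured as a simple polling loop. The following is a minimal sketch of that idea, not the code in docker-api.py: the function launch_with_limit and its callback parameters (start_container, count_running, sleep) are hypothetical names introduced here for illustration, and a real version would back the callbacks with Docker commands.

```python
import time
from typing import Callable, Iterable, List


def launch_with_limit(
    puppet_ids: Iterable[str],
    start_container: Callable[[str], None],  # hypothetical: starts one container
    count_running: Callable[[], int],        # hypothetical: number of live containers
    max_containers: int,
    sleep_duration: float,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Start one container per puppet without exceeding max_containers at once."""
    started = []
    for puppet_id in puppet_ids:
        # When the limit is reached, sleep and re-check until a slot frees up.
        while count_running() >= max_containers:
            sleep(sleep_duration)
        start_container(puppet_id)
        started.append(puppet_id)
    return started
```

Passing the sleep function as a parameter is only a convenience that makes the loop easy to exercise without waiting; with the default, the loop sleeps for sleep_duration seconds between checks.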

sockpuppet.py

This script performs the actual data collection and interaction with YouTube. It takes as its argument the path to an arguments file generated by docker-api.py, which contains the parameters for one sock puppet. After performing the defined steps, it writes its output to the output directory. The output is a JSON file that contains the input arguments, timestamps, and action-param pairs for the sock puppet, covering the training and testing videos, the homepage recommendations, and the up-next recommendations. Here, an action is a step performed by the sock puppet, and its param is the result of that step. The actions are stored in order in the actions array of the JSON file and generally consist of the following:

  • get_homepage: The sock puppet goes to the YouTube homepage.
    • Params: List of video IDs of the homepage recommendations.
  • training_start: Marks the start of training.
  • watch: The sock puppet watches a video.
    • Params: The video ID of the watched video.
  • training_end: Marks the end of training.
  • testing_start: Marks the start of testing.
  • get_recommendations: The sock puppet collects up-next recommendations.
    • Params: List of video IDs of the up-next recommendations.
  • testing_end: Marks the end of testing.

The actions array is ordered by when each action was performed. You can parse the output by iterating over the actions for each sock puppet. The script is modular enough to define a custom set of actions as well.
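
As a rough illustration of iterating over the actions, the sketch below groups the params by what they represent. It assumes each entry in the actions array is an object with an "action" key and an optional "params" key; these key names are an assumption made for this example, so check them against a real output file before relying on them. The function name summarize_actions is likewise hypothetical.

```python
from typing import Any, Dict, List


def summarize_actions(actions: List[Dict[str, Any]]) -> Dict[str, list]:
    """Group action params into homepage, training, and up-next results.

    Assumes each entry looks like {"action": <name>, "params": <result>};
    these key names are an assumption, not confirmed by the repository.
    """
    summary = {"homepage": [], "trained_on": [], "up_next": []}
    phase = None  # tracks whether we are inside training or testing
    for entry in actions:
        name = entry["action"]
        params = entry.get("params")
        if name == "training_start":
            phase = "training"
        elif name == "testing_start":
            phase = "testing"
        elif name in ("training_end", "testing_end"):
            phase = None
        elif name == "get_homepage":
            summary["homepage"].append(params)
        elif name == "watch" and phase == "training":
            summary["trained_on"].append(params)
        elif name == "get_recommendations":
            summary["up_next"].append(params)
    return summary
```

To use it on a real run, load one puppet's JSON file with json.load and pass its actions array to the function.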

Help with Docker

  • To verify that the containers are running successfully, run docker container ls to see running containers.
  • To view the logs for a particular container, run docker container logs -f <container-id> where <container-id> can be obtained from the docker container ls command.

Acknowledgements

This tool was developed as part of an effort by researchers at UC Davis to audit the recommendations on YouTube. Read more about it here.

The primary maintainer is Muhammad Haroon.
