
Workshop: Twitter Text Analytics for R


July 27, 2016


  • Ryan Wesslen

Original materials are by Pablo Barberá, sponsored by the Quantitative Methods Working Group, European University Institute. Please see the forked original workshop materials. Thanks to Pablo for allowing us to use his materials!

Additional content (via Pablo Barberá) was based on materials prepared by Dan Cervone, Alex Hanna, Ken Benoit, Paul Nulty, Kevin Munger, and Justin Grimmer.

For Project Mosaic's workshop, I've created new challenges for each module using Charlotte Twitter datasets.

The material below has been modified from the original workshop.


The popularity of text as data is increasing rapidly within the social sciences. “Scholars have long recognized this, but the massive costs of analyzing even moderately sized collections of texts have hindered their use in political science research” (Grimmer and Stewart 2013) and elsewhere in the social sciences. This situation has changed with increasing computing power and more capable computing tools. In the coming years, the relevance of text data will further increase as more and more human communication is recorded online.

This workshop provides an introduction to text analysis using R, focusing on Twitter datasets. We will cover methods to conduct quantitative analysis of textual and web data applied to the study of social science questions. The workshop is made up of three "modules", each consisting of an introduction to a topic followed by examples and applications using R. The first module will cover how to format and input source texts, how to prepare the data for analysis, and how to extract descriptive statistics. The second module will discuss automated classification of text sources into categories using dictionary methods and supervised learning. Finally, the third module will discuss unsupervised classification of text into categories using topic modeling.
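To give a flavor of the first two modules, here is a minimal sketch in base R. It is not taken from the workshop materials: the example tweets, the word lists, and the helper name `score_tweet` are all made up for illustration. It extracts hashtags with a regular expression, counts them, and then applies a toy dictionary method that scores each tweet by positive minus negative word counts.

```r
# Hypothetical example tweets (invented for illustration; not workshop data)
tweets <- c(
  "Great day in #Charlotte! Love the weather",
  "Traffic on I-77 is terrible again #CLT #traffic",
  "Love the new light rail. Great job #CLT"
)

# Descriptive analysis: pull out hashtags with a regular expression,
# lowercase them, and tabulate how often each one appears
hashtags <- unlist(regmatches(tweets, gregexpr("#[A-Za-z0-9_]+", tweets)))
hashtag_counts <- table(tolower(hashtags))

# Dictionary method: toy word lists (assumptions, not a real sentiment lexicon)
positive <- c("great", "love")
negative <- c("terrible")

# Score = number of positive words minus number of negative words
score_tweet <- function(text) {
  words <- strsplit(tolower(text), "[^a-z]+")[[1]]
  sum(words %in% positive) - sum(words %in% negative)
}
scores <- vapply(tweets, score_tweet, numeric(1), USE.NAMES = FALSE)

print(hashtag_counts)
print(scores)
```

A real analysis would typically use a dedicated text analysis package and a validated dictionary; the base R version is only meant to show the shape of the workflow.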

Setup and Preparation

You will need R and RStudio installed. Follow the instructions here to install both.

Instructions for using course materials on GitHub

You have two options for downloading the course material found on this page:

  1. Most simply, you can choose the button on the right marked "Download zip" which will download the entire repository as a zip file.

  2. You can "clone" the repository using the button labelled "Clone in Desktop", found on the right side of your browser window as you view this repository. If you do not have a git client installed on your system, you will need to get one here and also make sure that git is installed. Cloning is the preferred option, since you can refresh your clone as new content is pushed to the course repository. (New material will be actively pushed to the course repository while this course takes place.)

You can also subscribe to the repository if you have a GitHub account, which will send you updates each time new changes are pushed to the repository.

Schedule for July 27 Project Mosaic Workshop

Time           Topic
9:00-10:00     Optional: Introduction to R
10:00-10:30    Introduction to Twitter Analytics
10:30-11:15    Descriptive analysis, regular expressions
11:15-11:30    Dictionary methods
11:30-12:00    Challenge I
12:00-1:00     Lunch Break
1:00-2:00      Supervised methods
2:00-2:30      Challenge II
2:30-2:50      Coffee Break
2:50-3:30      Unsupervised methods
3:30-4:00      Challenge III
4:00-5:00      Working Session: Data (e.g. Public API) and Research Ideas





