
Semi-supervised sentiment analysis of the R-help mailing list

Author: Trey Causey with assistance from Erin Gengo.

This project measures whether the R-help mailing list, widely perceived as (in)famously rough in its treatment of novices, has become "meaner" over time. Data were scraped from the R-help archives using Scrapy and analyzed in R using the ReadMe package. I have not made the data publicly available (a) because the archives are already public, (b) to conserve space, and (c) because you can run the scraper and collect the data yourself.

You can find the writeup of my findings at All findings should be interpreted with humor.

About the files

rhelp.R: The main R code for cleaning the data and calculating the sentiment distribution.
rhelp_analysis.R: Additional code for analysis and plots.
rhelp/: The code for the web scraper. You must have Python 2 (not Python 3) and the Scrapy module.

The coding scheme for each email is as follows:

-2: Negative and not helpful
-1: Negative but helpful
 0: No obvious valence/a request for additional information
 1: Positive/helpful
 2: Not a response/is a question/other
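
As an illustration of how the coding scheme above might be represented and tallied, here is a minimal Python sketch. It is not part of the repository: the names `CODING_SCHEME` and `label_proportions` are hypothetical, and the sketch assumes the hand-coded labels are available as a plain list of integers. The actual sentiment distribution in this project is calculated in R (see rhelp.R).

```python
from collections import Counter

# The five codes from the coding scheme, keyed by their integer value.
CODING_SCHEME = {
    -2: "Negative and not helpful",
    -1: "Negative but helpful",
    0: "No obvious valence/a request for additional information",
    1: "Positive/helpful",
    2: "Not a response/is a question/other",
}


def label_proportions(codes):
    """Return the share of each code in a list of hand-coded emails.

    `codes` is assumed to be a list of integers drawn from CODING_SCHEME.
    """
    unknown = set(codes) - set(CODING_SCHEME)
    if unknown:
        raise ValueError("Codes not in the scheme: %s" % sorted(unknown))
    counts = Counter(codes)
    total = float(len(codes))
    # Report every code, including those that never occur.
    return {
        code: (counts[code] / total if total else 0.0)
        for code in CODING_SCHEME
    }
```

For example, `label_proportions([-2, -1, -1, 0, 1, 1, 1, 2])` gives the fraction of the eight emails that falls under each code, and a value outside the scheme raises a `ValueError` rather than being silently counted.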