Semi-supervised sentiment analysis of the R-help mailing list
Author: Trey Causey with assistance from Erin Gengo.
This is a project to measure whether the R-help mailing list, (in)famous for its rough treatment of novices, has become "meaner" over time. Data were scraped from the R-help archives using Scrapy and analyzed in R using the ReadMe package. I have not made the data publicly available a) because the archives are already public, b) to conserve space, and c) because you can run the scraper and collect it yourself.
You can find the writeup of my findings at badhessian.org. All findings should be interpreted with humor.
About the files
rhelp.R: The main R code for cleaning the data and calculating sentiment distribution.
rhelp_analysis.R: Additional code for analysis and plots.
rhelp/: The code for the webscraper. You must have Python (not 3) and the
The coding scheme for each email is as follows:
-2: Negative and not helpful
-1: Negative but helpful
0: No obvious valence/a request for additional information
1: Positive/helpful
2: Not a response/is a question/other
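To make the coding scheme concrete, here is a minimal Python sketch that maps each code to its label and tabulates the share of emails in each category. It is not the method used in rhelp.R (the ReadMe package estimates category proportions for the uncoded emails); it only counts codes that have already been assigned by hand. The names CODE_LABELS, code_distribution, and sample_codes are hypothetical, and the sample values are made up for illustration.

```python
from collections import Counter

# Labels copied from the coding scheme above.
CODE_LABELS = {
    -2: "Negative and not helpful",
    -1: "Negative but helpful",
    0: "No obvious valence/a request for additional information",
    1: "Positive/helpful",
    2: "Not a response/is a question/other",
}


def code_distribution(codes):
    """Return the share of emails assigned to each code in the scheme."""
    unknown = set(codes) - set(CODE_LABELS)
    if unknown:
        raise ValueError(f"unknown codes: {sorted(unknown)}")
    if not codes:
        # No coded emails yet: report a zero share for every category.
        return {code: 0.0 for code in CODE_LABELS}
    counts = Counter(codes)
    return {code: counts[code] / len(codes) for code in CODE_LABELS}


# Hypothetical hand-coded sample, made up for illustration only.
sample_codes = [2, 1, 0, -1, 1, 2, -2, 1]

for code, share in code_distribution(sample_codes).items():
    print(f"{code:>2} {CODE_LABELS[code]}: {share:.3f}")
```

Rejecting unknown codes up front is a deliberate choice here, since a mistyped code would otherwise silently distort the shares of the other categories.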