Twitter vaccine sentiment project
- Project lead: Benjamin Brooks, UW Institute for Health Metrics and Evaluation
- Advisor: Abie Flaxman, UW Institute for Health Metrics and Evaluation
- eScience Liaison: Andrew Whitaker, UW eScience Institute
Considerable attention has been given to the potential for search engine and social media data to provide real-time information about public health threats; the idea is best known in the context of influenza. Public opinion on vaccination has been of particular interest since the 1998 publication of a now-discredited study linking the measles, mumps, and rubella (MMR) vaccine to autism; in its wake, parental fear of vaccination has risen, vaccination rates have decreased, and outbreaks of vaccine-preventable diseases have increased. Relative to other applications of social media data in public health, the study of anti-vaccination sentiment is particularly appropriate because individuals often hold strong opinions on the topic and might be expected to share them publicly.
We are interested in using Twitter data as a means of monitoring general anti-vaccination sentiment. In particular, we hypothesize that opinions shared on Twitter regarding vaccination provide insight into where geographic clusters of anti-vaccination sentiment exist and, consequently, where children are not immunized and outbreaks might be expected. A study published in 2011 used a series of keywords to identify and collect Twitter data related to vaccination over a six-month period after the H1N1 (“swine flu”) vaccine became available to the public. The researchers built a training dataset by having students tag about 10% of the tweets as expressing positive, negative, or neutral sentiment toward the vaccine; a classifier trained on this set was then used to assign each of the remaining tweets to one of the three categories.
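The tag-then-classify workflow described above can be sketched with a small bag-of-words naive Bayes model. This is only an illustration of the general approach, not the classifier used in the 2011 study or in this project; the function names and the six training tweets are hypothetical and were invented for the example.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a tweet and split it on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


def train_naive_bayes(labeled_tweets):
    """Count label frequencies and per-label word frequencies from (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in labeled_tweets:
        label_counts[label] += 1
        for token in tokenize(text):
            word_counts[label][token] += 1
            vocabulary.add(token)
    return label_counts, word_counts, vocabulary


def classify(text, model):
    """Return the label with the highest log-posterior, using add-one smoothing."""
    label_counts, word_counts, vocabulary = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total)  # log prior
        denom = sum(word_counts[label].values()) + len(vocabulary)
        for token in tokenize(text):
            if token in vocabulary:  # ignore words never seen in training
                score += math.log((word_counts[label][token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical hand-tagged tweets, invented for illustration only.
TRAINING = [
    ("got my flu shot today feeling protected", "positive"),
    ("vaccines save lives get your shot", "positive"),
    ("not getting the shot it is dangerous", "negative"),
    ("the vaccine is dangerous and untested", "negative"),
    ("clinic offers the vaccine on monday", "neutral"),
    ("health department announces vaccine schedule", "neutral"),
]

model = train_naive_bayes(TRAINING)
print(classify("the shot is dangerous", model))
```

In practice the tagged subset would be far larger, and the classifier would be evaluated on held-out tagged tweets before being applied to the untagged remainder.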
While this study showed that users with anti-vaccination opinions tended to cluster within the social network, it used only a crude measure to validate whether those opinions manifested themselves in measurable outcomes of public health concern. The authors used geographic information associated with individual Twitter accounts to compare the average “sentiment ratings” of different regions of the US to H1N1 vaccination rates and found a reasonably strong positive correlation (i.e., more positive sentiment, higher vaccination coverage). Our goal is to extend this work by examining whether these clusters can be linked to particular geographic areas at the state or, preferably, sub-state level, and whether those areas have experienced outbreaks of vaccine-preventable disease since the original link between autism and the MMR vaccine was published.
We tested this hypothesis by combining vaccination-related Twitter data with data published through the National Notifiable Disease Surveillance System, which provides weekly counts of newly diagnosed cases of key infectious diseases (including those that are preventable through vaccination) for each state. In working towards this goal, we tested several sentiment classification methods, collected a new body of vaccination-related Twitter data from 2014, and examined whether the average sentiment expressed on Twitter in 2009 during the H1N1 pandemic was similar to the average sentiment in the same geographic areas in 2014.
We compared the average sentiment expressed by users in each state in 2010 to the H1N1 vaccination rate. In this case, we expect a positive correlation; that is, as the average sentiment score increases, the H1N1 vaccination rate should likewise increase. Indeed, we observed a slight positive correlation, as indicated in the plot below.
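As a rough sketch of this comparison, the snippet below averages per-tweet sentiment scores within each state and computes a Pearson correlation against state vaccination rates. The score encoding (+1 positive, 0 neutral, -1 negative), the placeholder state labels, and all numbers are assumptions made for illustration; they are not the project's data or necessarily its scoring scheme.

```python
import math
from collections import defaultdict


def mean_sentiment_by_state(scored_tweets):
    """Average per-tweet sentiment scores within each state.

    scored_tweets is an iterable of (state, score) pairs.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for state, score in scored_tweets:
        totals[state] += score
        counts[state] += 1
    return {state: totals[state] / counts[state] for state in totals}


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical inputs: placeholder states "A", "B", "C" and made-up values.
scored_tweets = [("A", 1), ("A", 1), ("B", 1), ("B", -1), ("C", -1), ("C", 0)]
vaccination_rate = {"A": 30.0, "B": 24.0, "C": 20.0}  # percent, invented

sentiment = mean_sentiment_by_state(scored_tweets)
states = sorted(sentiment)  # fix a common ordering for both variables
r = pearson_r([sentiment[s] for s in states], [vaccination_rate[s] for s in states])
print(f"Pearson r = {r:.2f}")
```

A positive `r` corresponds to the expected pattern of higher average sentiment accompanying higher vaccination coverage. With few states or a weak relationship, the coefficient alone says little about significance.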
We extended our analysis to include more recent tweets and a broader set of diseases. We compared the average sentiment expressed by users in each state in 2014 to the mumps incidence rate per 100,000 for all cases observed over the period 2009-2013. In this case, we would expect a negative correlation; that is, as the average sentiment score increases, the mumps incidence rate should decrease. However, that relationship was not observed in our data; in fact, some of the states with the highest average sentiment scores also had the highest mumps incidence rates.
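The incidence rate used here can be sketched as total cases over the period divided by a state population, scaled to 100,000. The snippet assumes weekly case records of the form (state, cases) and a single reference population per state; both the record layout and the choice of denominator are assumptions, and the numbers are invented.

```python
from collections import defaultdict


def incidence_per_100k(case_count, population):
    """Cases per 100,000 people for a given case count and population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return case_count * 100_000 / population


def cumulative_incidence_by_state(weekly_cases, populations):
    """Sum weekly (state, cases) records per state, then convert to a rate."""
    totals = defaultdict(int)
    for state, cases in weekly_cases:
        totals[state] += cases
    return {
        state: incidence_per_100k(total, populations[state])
        for state, total in totals.items()
    }


# Hypothetical weekly counts and populations for placeholder states "A" and "B".
weekly_cases = [("A", 3), ("A", 7), ("B", 1), ("A", 40), ("B", 4)]
populations = {"A": 1_000_000, "B": 500_000}

rates = cumulative_incidence_by_state(weekly_cases, populations)
print(rates)
```

Because the rate pools five years of cases, a single large outbreak in one state can dominate its value, which is worth keeping in mind when reading the comparison with sentiment scores.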