Political Twitter Images Project Summary and Goals
How do outsider political groups use social media to mobilize supporters online? What types of social media techniques, messages, and images are most likely to capture attention and motivate action? Prior research demonstrates that people are more responsive to visual cues than to text. We test whether images affect message sharing and followership and, if so, which types of images are most effective at mobilizing supporters.
To begin to address these questions, we are tracking the Twitter posts of roughly 1,300 public affairs organizations (obtained from the Encyclopedia of Associations), national and state politicians (including every member of the 115th Congress), and news organizations. For each tracked account, we are streaming all tweets, collecting any accompanying images or videos, and periodically collecting account data (e.g., the number of account followers). We are also streaming tweets for every hashtag that one of these organizations uses more than once, with some standard exclusions. For example, if any organization uses the hashtag #LasVegasShooting more than once, we automatically start collecting the entire stream of #LasVegasShooting tweets by all organizations and individuals.
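The hashtag rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not our production collector: the function name `hashtags_to_stream`, the `(account, hashtags)` input format, and the contents of the exclusion list are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical exclusion list; the real "standard exclusions" are not shown here.
EXCLUDED = {"ff", "tbt"}

def hashtags_to_stream(tweets, excluded=EXCLUDED, min_uses=2):
    """Return the set of hashtags that should trigger a full hashtag stream.

    tweets: iterable of (account, list_of_hashtags) pairs (assumed format).
    A hashtag qualifies when a single tracked account has used it at least
    `min_uses` times (i.e. "more than once") and it is not excluded.
    """
    per_account = defaultdict(Counter)
    for account, tags in tweets:
        for tag in tags:
            # Normalize so "#LasVegasShooting" and "#lasvegasshooting" match.
            per_account[account][tag.lstrip("#").lower()] += 1

    tracked = set()
    for counts in per_account.values():
        for tag, uses in counts.items():
            if uses >= min_uses and tag not in excluded:
                tracked.add(tag)
    return tracked
```

Note that the count is kept per account: a hashtag used once each by two different organizations does not qualify under this reading of the rule.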
One purpose of the methodology is to capture social mobilization efforts in their early stages, something we could not do if we were to focus on known successful cases. The data could also be used to address many other questions. The challenge, from a data management perspective, is that these overlapping collection processes are producing a large quantity of data. The Twitter data collection is ongoing, and we will soon embark on a secondary stage of data collection, hiring annotators on Mechanical Turk to provide labels for each collected image (for example, we will ask how much sadness a respondent feels after looking at a given image). Currently, all of the data is stored in AWS S3 buckets.
At the incubator, our primary goal is to build an infrastructure for our own research that is ultimately accessible to a broader audience of scholars, in particular scholars studying the social media activities of political organizations and political mobilization via social media. With a massive amount of data stored in AWS, how can we best integrate disparate data types for our own analyses?
We would like, for example, a way to quickly access data in response to sample queries such as:
- Tweets from organizations classified by the Encyclopedia of Associations as "environmental" groups.
- Tweets and their accompanying images from a specified date range.
- Tweets and images from groups with more than 5,000 followers.
- Tweets retweeted fewer than 5 times.
- Images included on tweets that use a specific hashtag.
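To make these sample queries concrete, here is a minimal sketch of how they might look against a relational back-end, using Python's built-in `sqlite3` module. Everything about the schema is an assumption for illustration: the table names (`accounts`, `tweets`, `images`, `hashtags`), their columns, and the demo rows are hypothetical, and a single `followers` value per account is a simplification, since follower counts are collected periodically.

```python
import sqlite3

# Hypothetical schema: every table and column name here is an assumption.
SCHEMA = """
CREATE TABLE accounts (account_id INTEGER PRIMARY KEY, name TEXT,
                       category TEXT, followers INTEGER);
CREATE TABLE tweets   (tweet_id INTEGER PRIMARY KEY, account_id INTEGER,
                       created_at TEXT, retweets INTEGER);
CREATE TABLE images   (image_id INTEGER PRIMARY KEY, tweet_id INTEGER,
                       s3_key TEXT);
CREATE TABLE hashtags (tweet_id INTEGER, tag TEXT);
"""

def build_demo_db():
    """Create an in-memory database holding a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO accounts VALUES (?, ?, ?, ?)", [
        (1, "green_org", "environmental", 12000),
        (2, "small_org", "labor", 800),
    ])
    conn.executemany("INSERT INTO tweets VALUES (?, ?, ?, ?)", [
        (10, 1, "2017-10-02", 3),
        (11, 1, "2017-11-15", 40),
        (12, 2, "2017-10-05", 0),
    ])
    conn.executemany("INSERT INTO images VALUES (?, ?, ?)", [
        (100, 10, "images/10_a.jpg"),
        (101, 12, "images/12_a.jpg"),
    ])
    conn.executemany("INSERT INTO hashtags VALUES (?, ?)", [
        (10, "lasvegasshooting"), (11, "climate"), (12, "lasvegasshooting"),
    ])
    return conn

def tweets_by_category(conn, category):
    """Tweet IDs from accounts with the given association category."""
    sql = """SELECT t.tweet_id FROM tweets t
             JOIN accounts a ON a.account_id = t.account_id
             WHERE a.category = ? ORDER BY t.tweet_id"""
    return [row[0] for row in conn.execute(sql, (category,))]

def tweets_with_images_in_range(conn, start, end):
    """(tweet_id, s3_key) pairs for an inclusive ISO date range.

    ISO dates compare correctly as strings; the LEFT JOIN keeps tweets
    that have no image (their s3_key is None).
    """
    sql = """SELECT t.tweet_id, i.s3_key FROM tweets t
             LEFT JOIN images i ON i.tweet_id = t.tweet_id
             WHERE t.created_at BETWEEN ? AND ? ORDER BY t.tweet_id"""
    return conn.execute(sql, (start, end)).fetchall()

def tweets_with_images_by_followers(conn, min_followers):
    """(tweet_id, s3_key) pairs from accounts with more than min_followers."""
    sql = """SELECT t.tweet_id, i.s3_key FROM tweets t
             JOIN accounts a ON a.account_id = t.account_id
             LEFT JOIN images i ON i.tweet_id = t.tweet_id
             WHERE a.followers > ? ORDER BY t.tweet_id"""
    return conn.execute(sql, (min_followers,)).fetchall()

def tweets_retweeted_fewer_than(conn, n):
    """Tweet IDs with fewer than n retweets."""
    sql = "SELECT tweet_id FROM tweets WHERE retweets < ? ORDER BY tweet_id"
    return [row[0] for row in conn.execute(sql, (n,))]

def images_for_hashtag(conn, tag):
    """S3 keys of images attached to tweets that use the given hashtag."""
    sql = """SELECT i.s3_key FROM images i
             JOIN hashtags h ON h.tweet_id = i.tweet_id
             WHERE h.tag = ? ORDER BY i.image_id"""
    return [row[0] for row in conn.execute(sql, (tag,))]
```

In this sketch the database stores only the S3 key for each image rather than the image itself, so the large binary files could stay in the existing buckets while the metadata becomes queryable. Whether that split suits the full dataset is one of the design questions for the incubator.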
We also have secondary objectives that we could potentially begin work on during the incubator. The first is to efficiently collect and incorporate the image annotations from Mechanical Turk. The second is to begin applying existing computer vision methods to analyze the collected images (for example, automatically counting the number of people in an image, reading any text on the image, or reading facial expressions).
The primary measure of success is the creation of a working, easily queryable infrastructure. Currently, we think of this as a two-part process requiring both back-end and front-end development. For the back-end, we will work to format the existing data and all future data into an appropriate database format. For the front-end, we will build a simple website.
We currently have a series of due dates planned for our initial deliverables:
- On Tuesday, January 23 we expect to have a front-end (simple website) built. [Week 4]
- On Tuesday, February 6 we expect to have the back-end completed, at least for a subset of the data. [Week 6]
- On Tuesday, February 13 we expect to have the back-end and front-end connected. [Week 7]
Assuming we meet these deadlines, we will be able to use the remaining weeks to experiment with Mechanical Turk and run some initial analyses.