<!DOCTYPE html>
<html>
<head>
<meta charset='utf-8'>
<meta http-equiv="X-UA-Compatible" content="chrome=1">
<meta name="description" content="Super-bowl-50-twitter-statistics : Some statistics taken using Twitter StreamingAPI to see Twitter user reactions to Super Bowl events">
<!-- begin CSS -->
<link rel="stylesheet" type="text/css" media="screen" href="stylesheets/stylesheet.css">
<link href="stylesheets/style.css" type="text/css" rel="stylesheet">
<link rel="stylesheet" href="jquery/sss/sss.css" type="text/css" media="all">
<!-- end CSS -->
<title>Super-bowl-50-twitter-statistics</title>
</head>
<body>
<!-- HEADER -->
<div id="header_wrap" class="outer">
<header class="inner">
<a id="forkme_banner" href="https://github.com/wyattdunn46/Super-Bowl-50-Twitter-Statistics">View on GitHub</a>
<h1 id="project_title">Super-bowl-50-twitter-statistics</h1>
<h2 id="project_tagline">Some statistics taken using Twitter StreamingAPI to see Twitter user reactions to Super Bowl events</h2>
<section id="downloads">
<a class="zip_download_link" href="https://github.com/wyattdunn46/Super-Bowl-50-Twitter-Statistics/zipball/master">Download this project as a .zip file</a>
<a class="tar_download_link" href="https://github.com/wyattdunn46/Super-Bowl-50-Twitter-Statistics/tarball/master">Download this project as a tar.gz file</a>
</section>
</header>
</div>
<!-- begin container -->
<div id="container" style="width: 600px; margin: 280px auto 0;">
<!-- begin navigation -->
<nav id="navigation">
<ul id="navbar">
<li><a href="index.html">Home</a></li>
<li><a href="data collection.html">Data Collection</a></li>
<li><a href="data wrangling.html">Data Wrangling</a></li>
<li><a href="creating figures.html">Creating Figures in R</a></li>
</ul>
</nav>
<!-- end navigation -->
</div>
<!-- end container -->
<!-- MAIN CONTENT -->
<div id="main_content_wrap" class="outer">
<section id="main_content" class="inner">
<h3 id="names">Paul Middendorf<br>
Wyatt Dunn</h3>
<h3 id="date">04/13/16</h3>
<h2>Project Overview</h2>
<p>We are studying Twitter, a social network that allows users to send “tweets”. Tweets are visible to the public and are limited to 140 characters. We are specifically interested in the content of the tweets, where they were posted from, and when they were posted. Twitter has 289 million active users[1], who send about 500 million tweets a day[2]. This massive number of tweets is like a “stream of consciousness” for the general public. So if one were to take the general opinion or reaction of Twitter as a whole in response to a certain event, one could treat it as broadly representative of the opinion of the public at large.</p>
<br>
<figure>
<blockquote class="twitter-tweet" data-lang="en"><p lang="en" dir="ltr">Which <a href="https://twitter.com/hashtag/commercial?src=hash">#commercial</a> will cause a controversy tonight? Guess correctly and you'll win a prize <a href="https://twitter.com/hashtag/SuperBowl?src=hash">#SuperBowl</a> <a href="https://twitter.com/hashtag/SB50?src=hash">#SB50</a></p>— Tuna Palace (@TheTunaPalace) <a href="https://twitter.com/TheTunaPalace/status/696475321399382018">February 7, 2016</a></blockquote>
<script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script>
<p>This is an example tweet</p>
</figure>
<br>
<p>Because this large dataset is easy to access and reflects general opinion so well, it led us to wonder whether we could draw some conclusions about how people react to popular events. In the past this would have been almost impossible to do, but the information available through Twitter makes it possible to get an understanding of how people feel about current events. The datasets are so large that we can even gain an understanding of the mechanics of how people talk about events, how opinions spread, and how long it takes for people to form reactions.</p>
<p>With 500 million tweets a day, there is far too much data for the scope of this project. It isn’t reasonable for us to try to analyze everything that every user is tweeting about, so we decided to limit our data to tweets that have some relation to Super Bowl 50. Super Bowl 50 was a massive cultural event, with over 100 million viewers. Even though this limited the number of tweets we collected, there were still enough people talking about it to give us enough data to analyze. Super Bowl 50 can then be thought of as a model that we can relate to other popular current events such as political debates, large concerts, sports, and eSports.</p>
<p>Twitter publishes a Streaming API that allows anyone with a Twitter account to collect tweets in real time. We used Tweepy, a Python library, to access the Twitter API. We collected tweets based on lists of keywords. The categories we used to separate the tweets are commercials, referees, big plays, and the halftime show.</p>
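<p>As a rough illustration of sorting tweets into these four categories by keyword, here is a minimal Python sketch. It is not the script we ran during collection: the keyword lists are hypothetical (loosely based on the topics plotted further down this page), and the <code>categorize</code> helper and its whole-word matching rule are assumptions made for the example.</p>

```python
import re

# Hypothetical keyword lists, one per category named above.
# The lists actually used for collection are not reproduced here.
CATEGORY_KEYWORDS = {
    "commercials": ["commercial", "audi", "budweiser", "doritos"],
    "referees": ["referee", "good call", "bad call", "interference"],
    "big plays": ["fumble", "interception", "sack"],
    "halftime show": ["halftime", "coldplay", "beyonce", "bruno mars"],
}


def categorize(text):
    """Return the sorted categories whose keywords appear in the tweet text.

    Keywords are matched case-insensitively as whole words, so "audi"
    does not match "audience".
    """
    lowered = text.lower()
    matches = []
    for category, keywords in CATEGORY_KEYWORDS.items():
        for keyword in keywords:
            if re.search(r"\b" + re.escape(keyword) + r"\b", lowered):
                matches.append(category)
                break  # one matching keyword is enough for this category
    return sorted(matches)


if __name__ == "__main__":
    print(categorize("That Doritos commercial beat the halftime show"))
```

<p>A tweet can land in more than one category, which is why the helper returns a list rather than a single label.</p>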
<h3>Example of a tweet in JSON format</h3>
<figure id="json_figure">
<pre>
<code>
{
"created_at":"Mon Feb 08 00:34:42 +0000 2016",
"id":696492311996313600,
"id_str":"696492311996313600",
"text":"RT @mgafni: Lots of people lined up outside Levi's Stadium for halftime show when they can go on field; no Left Shark #SB50 https:\/\/t.co\/te\u2026",
"source":"\u003ca href=\"https:\/\/about.twitter.com\/products\/tweetdeck\" rel=\"nofollow\"\u003eTweetDeck\u003c\/a\u003e",
"truncated":false,
"in_reply_to_status_id":null,
"in_reply_to_status_id_str":null,
"in_reply_to_user_id":null,
"in_reply_to_user_id_str":null,
"in_reply_to_screen_name":null,
"user":{
"id":68433924,
"id_str":"68433924",
"name":"Robert Salonga",
"screen_name":"robertsalonga",
"location":"San Jose, Oakland",
"url":"http:\/\/mercurynews.com",
"description":"Crime & Public Safety, San Jose Mercury News (@mercnews). Bruin\/Terp alum. Middling triathlete. Just the tips: rsalonga@mercurynews.com",
"protected":false,
"verified":true,
"followers_count":2924,
"friends_count":460,
"listed_count":180,
"favourites_count":127,
"statuses_count":18075,
"created_at":"Mon Aug 24 15:38:19 +0000 2009",
"utc_offset":-28800,
"time_zone":"Pacific Time (US & Canada)",
"geo_enabled":true,
"lang":"en",
"contributors_enabled":false,
"is_translator":false,
"profile_background_color":"C0DEED",
"profile_background_image_url":"http:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png",
"profile_background_image_url_https":"https:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png",
"profile_background_tile":false,
"profile_link_color":"0084B4",
"profile_sidebar_border_color":"C0DEED",
"profile_sidebar_fill_color":"DDEEF6",
"profile_text_color":"333333",
"profile_use_background_image":true,
"profile_image_url":"http:\/\/pbs.twimg.com\/profile_images\/509219502232330240\/MXxXjb2S_normal.jpeg",
"profile_image_url_https":"https:\/\/pbs.twimg.com\/profile_images\/509219502232330240\/MXxXjb2S_normal.jpeg",
"profile_banner_url":"https:\/\/pbs.twimg.com\/profile_banners\/68433924\/1431536609",
"default_profile":true,
"default_profile_image":false,
"following":null,
"follow_request_sent":null,
"notifications":null
},
"geo":null,
"coordinates":null,
"place":null,
"contributors":null,
"favorited":false,
"retweeted":false,
"possibly_sensitive":false,
"filter_level":"low",
"lang":"en",
"timestamp_ms":"1454891682038"
}
</code>
</pre>
<p>This is what a tweet looks like as it comes from the Streaming API</p>
</figure>
<br>For more information on our data collection please see our <a href="data collection.html">data collection page.</a>
<p>As you can see, every tweet has a wealth of data associated with it. We are mainly interested in the “text” and “created_at” fields and in the “location” field, which is nested inside the “user” object. Because of the way the data was collected, the tweets are spread across a large number of separate JSON files. To solve this, we have a Python script that reads every JSON file in the directory and combines them into a single file containing the entire dataset. A second script then writes the tweets to a .csv file containing only the fields we are interested in. This format makes the data easier to work with in R.</p>
Please <a href="data wrangling.html">click here</a> to see more detail on how we wrangled this data.
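<p>The two wrangling steps can be sketched in Python roughly as follows. This is a simplified illustration rather than our actual scripts: it assumes each file ends in <code>.json</code> and holds one tweet object per line, it merges the tweets in memory instead of writing an intermediate combined file, and the function names and the <code>tweets.csv</code> output name are made up for the example.</p>

```python
import csv
import glob
import json
import os


def load_tweets(directory):
    """Read every .json file in the directory (assumed: one tweet per line)."""
    tweets = []
    for path in sorted(glob.glob(os.path.join(directory, "*.json"))):
        with open(path, encoding="utf-8") as handle:
            for line in handle:
                line = line.strip()
                if not line:
                    continue  # skip blank lines between tweets
                tweets.append(json.loads(line))
    return tweets


def write_csv(tweets, csv_path):
    """Write only the text, created_at, and user location of each tweet."""
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["text", "created_at", "location"])
        for tweet in tweets:
            user = tweet.get("user") or {}
            writer.writerow([
                tweet.get("text", ""),
                tweet.get("created_at", ""),
                user.get("location") or "",  # location is nested under "user"
            ])


if __name__ == "__main__":
    # Hypothetical usage: combine the files in the current directory.
    write_csv(load_tweets("."), "tweets.csv")
```

<p>Writing the CSV with the <code>csv</code> module keeps commas, quotes, and line breaks inside the tweet text from corrupting the rows.</p>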
<br><br><br>
<h2>Data Analysis</h2>
<p>So, now we have data. But what are we going to do with it? We have a few hypotheses that we can explore using this data. First, it is intuitive to think that Twitter trends will show a reaction to real-life events. Here, a trend is the frequency of tweets about a topic over time. This means that we expect to see a spike in the number of tweets about topics related to an event immediately after it happens. We also have a working theory about the shape of these trends: a rapid, sharp spike immediately following the event, then a slow decay in the popularity of related topics afterward. We expect the trends for most events to follow this fairly consistent pattern.</p>
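<p>To make “frequency of tweets over time” concrete, the sketch below counts tweets in one-minute bins from their “created_at” values. The figures on this page were made in R, so this Python version is only illustrative; the one-minute bin width and the <code>tweets_per_minute</code> name are assumptions, and parsing the day and month names relies on an English locale.</p>

```python
from collections import Counter
from datetime import datetime

# Layout of the "created_at" field, e.g. "Mon Feb 08 00:34:42 +0000 2016".
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def tweets_per_minute(created_at_values):
    """Count tweets in each one-minute bin, keyed by the start of the minute."""
    counts = Counter()
    for value in created_at_values:
        moment = datetime.strptime(value, CREATED_AT_FORMAT)
        # Truncate to the minute so tweets in the same minute share a key.
        counts[moment.replace(second=0, microsecond=0)] += 1
    # Return the bins in chronological order.
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    sample = [
        "Mon Feb 08 00:34:42 +0000 2016",
        "Mon Feb 08 00:34:05 +0000 2016",
        "Mon Feb 08 00:35:10 +0000 2016",
    ]
    for minute, count in tweets_per_minute(sample).items():
        print(minute.isoformat(), count)
```

<p>Under the hypothesis above, plotting these counts for one topic should show a sharp rise in the bins right after an event, followed by a gradual decline.</p>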
<h3>Halftime Show Data</h3>
<div class="slider">
<img src="images/halftime frequency.png" />
<img src="images/coldplay frequency.png" />
<img src="images/beyonce frequency.png" />
<img src="images/bruno mars frequency.png" />
</div>
<p>Data relating to tweets referencing the halftime show</p>
<br><br>
<h3>Commercial Data</h3>
<div class="slider">
<img src="images/CommercialsPlot.png" />
<img src="images/AudiPlot.png" />
<img src="images/BudweiserPlot.png" />
<img src="images/DoritosPlot.png" />
</div>
<p>Data relating to tweets referencing commercials</p>
<br><br>
<h3>Plays Data</h3>
<div class="slider">
<img src="images/plays frequency.png" />
<img src="images/fumble frequency.png" />
<img src="images/interception frequency.png" />
<img src="images/sack frequency.png" />
</div>
<p>Data relating to tweets referencing football plays</p>
<br><br>
<h3>Referee Data</h3>
<div class="slider">
<img src="images/RefereesPlot.png" />
<img src="images/GoodCallPlot.png" />
<img src="images/BadCallPlot.png" />
<img src="images/InterferencePlot.png" />
</div>
<p>Data relating to tweets referencing the referees</p>
<br><br>
<p>There is also the content of people’s reactions and opinions, i.e., what people are actually saying, not just when and how often they say it. Any hypothesis we could come up with about this would depend on the exact event and the context in which it happened. Therefore, our general hypothesis is that the reaction to an event is context sensitive. This makes logical sense, since nothing really happens in a vacuum. Even how often an event occurs, or whether it has happened before at all, could affect the nature of Twitter users’ reactions and opinions. Take, for instance, a turnover in a football game. If the score is ten to ten and a turnover occurs, it is extremely significant to the game and its viewers. We would expect to see a lot of tweets about it, and we would expect the content of those tweets to be very emotional and excited. However, if the score is thirty-five to nothing, we would expect the content of the tweets to be more subdued.</p>
<p>All of the code we used for this analysis is available on our GitHub. Please <a href="creating figures.html">click here</a> for an explanation of how we made these figures.</p>
</section>
</div>
<!-- FOOTER -->
<div id="footer_wrap" class="outer">
<footer class="inner">
<p class="copyright">Super-bowl-50-twitter-statistics maintained by <a href="https://github.com/wyattdunn46">wyattdunn46</a></p>
<p>Published with <a href="https://pages.github.com">GitHub Pages</a></p>
</footer>
</div>
<script type="text/javascript" src="jquery/jquery-1.12.3.js"></script>
<script src="jquery/sss/sss.min.js"></script>
<script>
jQuery(function($) {
$('.slider').sss({
slideShow: false
});
});
</script>
</body>
</html>