In [27]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
try:
    import cPickle as pickle  # Python 2
except ImportError:
    import pickle  # Python 3
import os
import nltk
import tweepy
import time

from collections import Counter

%matplotlib inline

pd.options.display.max_colwidth = 300

To obtain the tweets for this analysis, I will use <a href="http://www.tweepy.org/">Tweepy</a>, a Python package that can be used to connect to Twitter's API. Below, I set up the OAuth and access tokens so that I can use Tweepy to connect to the API.

In [None]:
auth = tweepy.OAuthHandler(os.environ['CONSUMER_KEY'], os.environ['CONSUMER_SECRET'])
auth.set_access_token(os.environ['ACCESS_TOKEN'], os.environ['ACCESS_TOKEN_SECRET'])

api = tweepy.API(auth)
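
Because the credentials are read from environment variables, a missing variable only surfaces as a KeyError at the first lookup. A minimal sketch of a pre-flight check, assuming the same four variable names used in the cell above; `missing_credentials` is a hypothetical helper and not part of Tweepy:

```python
import os

# The four environment variable names used in the cell above.
REQUIRED_KEYS = ("CONSUMER_KEY", "CONSUMER_SECRET",
                 "ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")


def missing_credentials(environ=None):
    """Return the required variable names that are unset or empty."""
    if environ is None:
        environ = os.environ
    return [key for key in REQUIRED_KEYS if not environ.get(key)]
```

Calling `missing_credentials()` before building the OAuth handler gives a readable list of what still needs to be exported, rather than failing on the first absent key.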

Now, I want to search Twitter for tweets that contain the hashtag "#NoBillNoBreak". I want the maximum number of tweets returned per page, so I set count to 100. I also limit the language to English, to simplify the analysis. Passing 900 to the pages method stops the data collection once 900 pages of tweets have been collected, which caps the sample at roughly 90,000 tweets. Lastly, I want to avoid Twitter's rate limit errors, so I add time.sleep(5) to wait 5 seconds before requesting the next page (see Twitter's <a href="https://dev.twitter.com/rest/public/rate-limits">Docs</a> for more information).

In [None]:
tweets_list = []

for page in tweepy.Cursor(api.search, q="#NoBillNoBreak", count=100,
                          lang="en").pages(900):
    for tweet in page:
        tweets_list.append(tweet._json)
    time.sleep(5)
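
The 5-second pause and the 900-page cap can be sanity-checked with a little arithmetic. The sketch below assumes a search limit of 180 requests per 15-minute window; that figure is an assumption about the limit in force when the data was collected and should be checked against the linked docs. Both helper names are hypothetical.

```python
def min_delay_seconds(requests_per_window, window_seconds=15 * 60):
    """Smallest pause between requests that stays within the window limit."""
    return float(window_seconds) / requests_per_window


def max_tweets(pages, per_page):
    """Upper bound on tweets collected; the final pages may be shorter."""
    return pages * per_page


delay = min_delay_seconds(180)   # assumed limit: 180 requests per window
ceiling = max_tweets(900, 100)   # 900 pages of up to 100 tweets each
```

Under that assumption the minimum safe delay works out to 5 seconds, which matches the `time.sleep(5)` call. Tweepy's `API` constructor also accepts `wait_on_rate_limit=True`, which is an alternative to sleeping by hand.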

Next, I extract the fields I need from the tweets in tweets_list and collect them into a pandas DataFrame.

In [4]:
df = pd.DataFrame()
df['id'] = [tweet['id'] for tweet in tweets_list]
df['user_name'] = [tweet['user']['name'] for tweet in tweets_list]
df['screen_name'] = [tweet['user']['screen_name'] for tweet in tweets_list]
df['text'] = [tweet['text'] for tweet in tweets_list]
df['retweet_count'] = [tweet['retweet_count'] for tweet in tweets_list]
df['created_at'] = [tweet['created_at'] for tweet in tweets_list]
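
The column-by-column extraction above walks tweets_list once per field. A minimal sketch of an alternative that flattens each tweet into one row in a single pass; `tweet_to_row` is a hypothetical helper, and the sample tweet is made up for illustration:

```python
def tweet_to_row(tweet):
    """Flatten the nested tweet JSON into the six fields used in this analysis."""
    user = tweet.get("user", {})
    return {
        "id": tweet.get("id"),
        "user_name": user.get("name"),
        "screen_name": user.get("screen_name"),
        "text": tweet.get("text"),
        "retweet_count": tweet.get("retweet_count"),
        "created_at": tweet.get("created_at"),
    }


# Made-up tweet with the same shape as the entries in tweets_list.
sample_tweet = {
    "id": 1,
    "text": "RT @someone: example #NoBillNoBreak",
    "retweet_count": 3,
    "created_at": "Fri Jun 24 00:06:44 +0000 2016",
    "user": {"name": "Example User", "screen_name": "example_user"},
}
row = tweet_to_row(sample_tweet)
```

A list of such rows, `[tweet_to_row(t) for t in tweets_list]`, can be passed directly to `pd.DataFrame`. Using `.get` means a tweet with a missing field yields `None` instead of raising a KeyError.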

Below is a sample of the tweet data.

In [6]:
df.head()

Unnamed: 0,id,user_name,screen_name,text,retweet_count,created_at
0,746132412019654656,♛ Gaga Bieber Fan ♛,GagaBieberMania,RT @ladygaga: Thank you so much to the women &amp; men on the house floor who are protesting to save American lives ☕️ #NoBillNoBreak https://t…,5489,Fri Jun 24 00:06:44 +0000 2016
1,746132407439396864,cass,CassieMichaud,"RT @WhiteHouse: ""If we’re going to raise our kids in a safer, more loving world, we need to speak up for it."" —@POTUS #NoBillNoBreak https:…",13582,Fri Jun 24 00:06:43 +0000 2016
2,746132407204515840,(((Jenn Jacques))),JennJacques,"RT @bob_owens: Oh, Yee of little faith.\n\n#NoBillNoBreak https://t.co/PqFImAqsG1",7,Fri Jun 24 00:06:43 +0000 2016
3,746132399952633856,jokeronparade,jokeronparade,RT @KamVTV: Any comment @tomforemancnn? You just got BUSTED! \n@CNN @DanScavino #Trump2016 #NoBillNoBreak #Hillary2016 \n\n https://t.co/yN1xV…,107,Fri Jun 24 00:06:41 +0000 2016
4,746132392826576896,Patricia Berry,Pebbles2770,RT @ladygaga: Thank you so much to the women &amp; men on the house floor who are protesting to save American lives ☕️ #NoBillNoBreak https://t…,5489,Fri Jun 24 00:06:39 +0000 2016


Now, I will tokenize the text of the tweets. To prevent errors due to special characters (such as emoji), I encode and then decode the text before I split it into tokens. It is also important to lowercase each word and strip surrounding punctuation. Otherwise, a word that is capitalized in some tweets and lowercase in others (or followed by a comma in some tweets) would be counted as several different tokens, which we would like to avoid.

Also, we aren't interested in stopwords (e.g. 'the', 'if', 'and'), because they are very common and don't give us a good idea of the meaning or overall theme of the text. NLTK has a list of English stopwords, so I convert the list to a set called 'stopwords.' Converting the list to a set reduces lookup time significantly when checking whether each token is a stopword.

Lastly, I use a counter to get a total count of how often each token appears in the collection of tweets.

In [8]:
tokens = []
stopwords = set(nltk.corpus.stopwords.words('english'))
for text in df['text'].values:
    text = text.encode('utf-8')
    text = text.decode('utf-8')
    for t in text.split():
        token = t.lower().strip(":,.?!*/\n=()'")
        if token not in stopwords:
            tokens.append(token)
    
tokens_counter = Counter(tokens)
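
The tokenization steps can be tried on a couple of made-up tweets without NLTK. This is a sketch under stated assumptions: the stopword set below is a tiny hypothetical stand-in for NLTK's English list, and, unlike the cell above, it also drops tokens that become empty once punctuation is stripped.

```python
from collections import Counter

# Hypothetical miniature stopword set; the notebook uses NLTK's English list.
STOPWORDS = {"the", "a", "to", "of", "and", "is"}
PUNCTUATION = ":,.?!*/\n=()'"


def tokenize(text, stopwords=STOPWORDS):
    """Lowercase, strip surrounding punctuation, drop stopwords and empties."""
    tokens = []
    for raw in text.split():
        token = raw.lower().strip(PUNCTUATION)
        if token and token not in stopwords:
            tokens.append(token)
    return tokens


sample_tweets = [
    "RT @user: The Gun vote is a vote!",
    "Gun violence: vote.",
]
counts = Counter(t for tweet in sample_tweets for t in tokenize(tweet))
```

Here 'Gun' and 'gun' collapse into one token, and 'vote', 'vote!' and 'vote.' are all counted as 'vote'.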

Now, I will take a look at the most common tokens in the tweets collection.

In [9]:
tokens_counter.most_common(50)

[(u'rt', 75348),
 (u'#nobillnobreak', 73603),
 (u'gun', 16331),
 (u'&amp;', 12545),
 (u'house', 10548),
 (u'@speakerryan', 9695),
 (u'@repjohnlewis', 9451),
 (u'sit-in', 8355),
 (u'trouble', 8273),
 (u'violence', 7382),
 (u'vote', 6885),
 (u'floor', 6846),
 (u'got', 6377),
 (u'people', 5603),
 (u'democrats', 5304),
 (u'want', 5026),
 (u'https\u2026', 4784),
 (u'congress', 4675),
 (u'good', 4598),
 (u'americans', 4492),
 (u'thank', 4473),
 (u'#democraticsitin', 4256),
 (u'support', 4070),
 (u'#disarmhate', 4057),
 (u'going', 4054),
 (u'standing', 4027),
 (u'hours', 3878),
 (u'way', 3844),
 (u'us', 3683),
 (u'#holdthefloor', 3517),
 (u'https://t\u2026', 3456),
 (u'need', 3425),
 (u'really', 3246),
 (u'dems', 3189),
 (u'@housedemocrats', 3189),
 (u"it's", 3123),
 (u'like', 3080),
 (u'control', 3077),
 (u'#nobill\u2026', 2956),
 (u'guns', 2903),
 (u'victims', 2901),
 (u'american', 2889),
 (u'sitting-in', 2855),
 (u'still', 2852),
 (u'necessary', 2777),
 (u'without', 2755),
 (u'#noflynobuy'

Some interesting things to note:
<ol>
    <li>Certain usernames are very common in the tweets.
        <ol>
            <li>
                <b>@speakerryan</b> - Republican Speaker of the House, Paul Ryan
            </li>
            <li>
                <b>@repjohnlewis</b> - Democratic Representative from Georgia, John Lewis
            </li>
        </ol>
        This makes sense, as these two were at the very center of the events leading to the #NoBillNoBreak hashtag. John Lewis was one of the main organizers of the Democrats' demonstration. Paul Ryan, as the Speaker of the House, joined other Republicans in not recognizing Rep. James Clyburn when he tried to introduce two gun control bills (<a href='http://www.politico.com/story/2016/06/house-democrats-gun-control-224627'>Politico</a>). Ryan was also accused of turning the House cameras off so that the nation could not see the Democrats' demonstration on the House floor.
    </li>
    
    <li>Other hashtags that often co-occur with #NoBillNoBreak also show up.
        <ol>
            <li>
                <b>#DemocraticSitIn</b>
            </li>
            <li>
                <b>#DisarmHate</b> - Hashtag in support of the gun control bills.
            </li>
            <li>
                <b>#HoldTheFloor</b> - In reference to the Democrats 'holding the floor' until the legislation they were attempting to introduce was considered for a vote.
            </li>
        </ol>
    </li>
    
    <li>Some of the top words that show up in the tweets:
        <ol>
            <li>
                <b>gun</b> - Gun control legislation was the main reason for the demonstration.
            </li>
            <li>
                <b>sit-in</b> - A word that is typically associated with civil rights demonstrations; this is interesting since Rep. John Lewis, who organized the demonstration, is considered an icon of the civil rights movement.
            </li>
            <li>
                <b>trouble</b> - I will explore the context of this word below to see how trouble was used in tweets about #NoBillNoBreak.
            </li>
            <li>
                <b>thank</b> - It seems that there could be a lot of support for the Democrats that took part in the demonstration. I will explore the context of this further below, as well.
            </li>
        </ol>
    </li>
    
    <li>A significant number of the tweets ($\frac{75348}{89884}\approx84\%$) appear to be retweets, because they contain 'RT'.
    </li>
</ol>
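
The 84% figure in the last item divides the count of the 'rt' token by the number of tweets, so it is an approximation: a tweet that mentions 'rt' more than once is counted more than once. A minimal sketch of the arithmetic, together with a per-tweet alternative that assumes retweets can be recognized by text starting with "RT @" (as in the sample rows shown earlier); `retweet_share` is a hypothetical helper:

```python
def retweet_share(texts):
    """Fraction of tweets whose text begins with the retweet prefix 'RT @'."""
    if not texts:
        return 0.0
    retweets = sum(1 for text in texts if text.startswith("RT @"))
    return float(retweets) / len(texts)


# Token-based estimate from the counts reported above.
token_based_share = 75348.0 / 89884

# Per-tweet estimate on a few made-up texts.
example_share = retweet_share(["RT @a: hi", "hello", "RT @b: x", "rt"])
```

On the real data, `retweet_share(df['text'])` would give the per-tweet version of the same estimate.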

I will use NLTK's collocations method to get an idea of what phrases occur most often.

In [11]:
nltk_text = nltk.Text(tokens) # Convert tweet tokens into an nltk object
nltk_text.collocations()

gun violence; sitting-in really; families reckless; standing #nobill…;
leave without; really standing; without acting; acting victims;
trouble necessary; necessary trouble; way good; trouble sitting-in;
house floor; got way; gun control; @speakerryan leave; good trouble;
reckless gun; &amp; families; victims &amp;


It looks like 'families' and 'gun' often occur next to 'reckless'. Also, above, we noticed that 'trouble' was a common word. Now, we can see that 'trouble' occurs along with 'necessary' and 'good'; I will explore that association further below. '@speakerryan leave' is also an interesting phrase, so I want to look into that as well.
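
To get a feel for where phrases like these come from, adjacent token pairs can be counted directly. This is a simplified stand-in and not what NLTK does internally: it uses raw pair counts, whereas NLTK ranks candidate pairs with an association score. The token list is made up for illustration and `bigram_counts` is a hypothetical helper.

```python
from collections import Counter


def bigram_counts(tokens):
    """Count each pair of adjacent tokens."""
    return Counter(zip(tokens, tokens[1:]))


sample_tokens = ("gun violence good trouble gun violence "
                 "necessary trouble gun control").split()
pairs = bigram_counts(sample_tokens)
```

Because stopwords were removed before this step, pairs such as ('good', 'trouble') can be adjacent here even when other words separated them in the original tweet.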

In [12]:
nltk_text.concordance('reckless')

Displaying 25 of 2415 matches:
hout acting victims &amp; families reckless gun violence #nobillnobreak https…
hout acting victims &amp; families reckless gun violence #nobillnobreak https…
hout acting victims &amp; families reckless gun violence #nobillnobreak https…
hout acting victims &amp; families reckless gun violence #nobillnobreak https…
hout acting victims &amp; families reckless gun violence #nobillnobreak https…
hout acting victims &amp; families reckless gun violence #nobillnobreak https…
hout acting victims &amp; families reckless gun violence #nobillnobreak https…
hout acting victims &amp; families reckless gun violence #nobillnobreak https…
hout acting victims &amp; families reckless gun violence #nobillnobreak https…
hout acting victims &amp; families reckless gun violence #nobillnobreak https…
hout acting victims &amp; families reckless gun violence #nobillnobreak https…
hout acting victims &amp; families reckless gun violence #nobillnobreak https…
hout acting victims &

This gives some context for why 'families', 'reckless', and 'gun' occur together so often. It appears there is a popular tweet (the same text repeats across matches, which suggests it was retweeted many times) that discusses the victims and families of reckless gun violence.
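
A concordance simply shows each occurrence of a word together with its neighbors. A minimal token-based sketch of the idea; NLTK's version works on character widths and prints its output, while this hypothetical `concordance_lines` helper returns a window of tokens around each match:

```python
def concordance_lines(tokens, word, width=3):
    """Return each match of `word` with up to `width` tokens on either side."""
    target = word.lower()
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == target:
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            lines.append(" ".join(left + [token] + right))
    return lines


sample_tokens = ("rt @repjohnlewis got trouble got way good trouble "
                 "necessary trouble").split()
matches = concordance_lines(sample_tokens, "trouble", width=2)
```

Since the windows are built from the filtered token list, they read like the stopword-free fragments in the output above, not like full sentences.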

In [13]:
nltk_text.concordance('trouble')

Displaying 25 of 8273 matches:
.co/58oqns0a9r rt @repjohnlewis got trouble got way good trouble necessary trou
pjohnlewis got trouble got way good trouble necessary trouble sitting-in really
uble got way good trouble necessary trouble sitting-in really standing #nobill…
ive yt cnn hillary 843 lol smh dems trouble #nobillnobreak https://t.… rt @mish
#holdthenation rt @repjohnlewis got trouble got way good trouble necessary trou
pjohnlewis got trouble got way good trouble necessary trouble sitting-in really
uble got way good trouble necessary trouble sitting-in really standing #nobill…
.co/b9bdzyamtc rt @repjohnlewis got trouble got way good trouble necessary trou
pjohnlewis got trouble got way good trouble necessary trouble sitting-in really
uble got way good trouble necessary trouble sitting-in really standing #nobill…
efive #nobill… rt @repjohnlewis got trouble got way good trouble necessary trou
pjohnlewis got trouble got way good trouble necessary trouble sitting-in really
uble got 

Based on the concordance above, it appears that there was a popular tweet from Rep. John Lewis talking about 'necessary trouble' and 'good trouble.' The text above doesn't seem to be a coherent sentence, but that is because we removed stopwords earlier, so only the "important" words are left.

'Good trouble' is actually a phrase commonly seen on Rep. John Lewis's Twitter profile. 
An example:
<blockquote class="twitter-tweet" data-cards="hidden" data-lang="en"><p lang="en" dir="ltr">55 years ago today, I was one of 13 original Freedom Riders who set out to integrate America&#39;s buses. <a href="https://twitter.com/hashtag/goodtrouble?src=hash">#goodtrouble</a> <a href="https://t.co/5SDpHMgcMl">pic.twitter.com/5SDpHMgcMl</a></p>&mdash; John Lewis (@repjohnlewis) <a href="https://twitter.com/repjohnlewis/status/727908088133652485">May 4, 2016</a></blockquote>
<script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script>
He usually uses the hashtag '#goodtrouble' when talking about his involvement in past civil rights demonstrations. It is a reference to the trouble he got in (i.e. involvement in sit-ins and being arrested multiple times) during the civil rights movement. (<a href="https://www.washingtonpost.com/news/morning-mix/wp/2016/06/23/good-trouble-how-john-lewis-fuses-new-and-old-tactics-to-teach-about-civil-disobedience/">Washington Post</a>)

Lewis appears to have been the first to use the phrase in reference to the Democrats' demonstration on the House floor, which caused it to become so popular in the tweets sample. He used this phrase because the Democrats believed that even though they were breaking the rules and getting in trouble (they were occupying the well of the House during a session, as well as using electronic devices on the House floor to stream the demonstration), it was 'good trouble' since their motivation was protecting American lives.

In [14]:
nltk_text.concordance('thank')

Displaying 25 of 4473 matches:
                                     thank much women &amp; men house floor pro
016 https://t.co/yn1xv… rt @ladygaga thank much women &amp; men house floor pro
 standing #nobill… rt @jenniferbeals thank house democrats stay strong #nobilln
ps://t.co/zhj5x210ea rt @rweingarten thank @repjohnlewis &amp; @housedemocrats-
nobillnobreak #letsdoit rt @ladygaga thank much women &amp; men house floor pro
e #nobillnobreak https… rt @ladygaga thank much women &amp; men house floor pro
@dianaltra @tammyforil @repjohnlewis thank representing american people demand 
 last night good reaso… rt @ladygaga thank much women &amp; men house floor pro
fighting reduce gun violence america thank #disarmhate https… rt @sweetatertot 
fighting reduce gun violence america thank #disarmhate https… #nobillnobreak bu
elindaescott @ericryan… rt @ladygaga thank much women &amp; men house floor pro
re #stopthestunt https… rt @bobbytbd thank 2 amazing @housedemocrats 4 taking s
https://t

It appears that 'thank' was a common word due to a tweet from Lady Gaga. She was thanking the Democrats for standing up for gun control, and this tweet was retweeted many times. Also, it appears there are other tweets thanking specific members of Congress and the Democrats in general.

In [25]:
# concordance() prints its matches itself and returns None, so it is not wrapped in print
nltk_text.concordance('Ryan')
print('\n')
nltk_text.concordance('@speakerryan')

Displaying 25 of 2411 matches:
and ready when… rt @nprpolitics paul ryan #nobillnobreak don't think proud mom
n democrats shout house speaker paul ryan chanting #nobillnobreak continuing s
 holiday… rt @cherokeelair hope paul ryan knows watching political career die 
 https://t.co/mx… rt @hokieinsa paul ryan calling #nobillnobreak publicity stu
llnobreak https:… rt @hokieinsa paul ryan calling #nobillnobreak publicity stu
tes repeal obamacare weren't speaker ryan #nobillnobreak https://t.co/hcwpppg4
dhajl4mocl rt @dloindustries speaker ryan says he's defending 2nd shuts camera
/t.co/pqfimaqsg1 rt @normsmusic paul ryan tough love life children worth 1 ter
tes repeal obamacare weren't speaker ryan #nobillnobreak https://t.co/hcwpppg4
t @donnabrazile face dem sit-in paul ryan found way shut house down—which gop 
nitalowey responding protest speaker ryan adjourned house 2 weeks rather put m
ight good reaso… rt @pari_passu paul ryan scheduled work 110 days 2016 mean ti
//t.co/pqfimaqsg1 rt 

I was curious to see what people were saying about Speaker of the House, Paul Ryan. Although it can be hard to tell exactly what people are saying using concordance, based on the sample above, it does not appear the tweets are saying nice things about him.

<p>Paul Ryan called the Democrats' demonstration "a publicity stunt" (<a href="http://www.cnn.com/2016/06/22/politics/paul-ryan-sit-in-guns-publicity-stunt/">CNN</a>), so that is a common theme in the tweets.</p>

<p>Related to that, another interesting theme of the tweets about Paul Ryan is a reference to him washing dishes at a soup kitchen. The tweets are mocking Paul Ryan for calling the Democrats' sit-in "a publicity stunt," by calling his appearance at a soup kitchen a "publicity stunt." There are claims that he did not actually work at the soup kitchen, and was only there to take photos (<a href="https://www.washingtonpost.com/news/post-politics/wp/2012/10/15/charity-president-unhappy-about-paul-ryan-soup-kitchen-photo-op/">Washington Post</a>).</p>
<p>'Shame' also appears in multiple tweets in reference to Paul Ryan. Other tweets are calling Paul Ryan an NRA henchman, in reference to the NRA appearing to fund political campaigns in return for votes against gun control (<a href="http://www.nydailynews.com/news/politics/nra-big-spending-stop-congress-enacting-gun-safety-laws-article-1.2643408">NY Daily News</a>).</p>

Based on what we have discovered so far, it would appear that there might be a favorable attitude toward the Democrats who staged the sit-in, and an unfavorable opinion toward Paul Ryan. I want to look at the tweets with the highest retweet count to get an idea of what was most popular in the sample of tweets I obtained. While a retweet does not necessarily mean the retweeter agreed with the content of the tweet (as politicians and public figures often remind us in their Twitter bios), it can be a useful proxy to understand, overall, how Twitter users felt about the content.


I want to look at only the retweet count and text content of the tweets to do this. I am not interested in who exactly the tweet came from, as the original tweet's writer will be in the text of the tweet itself. I also want to drop duplicate tweet texts, as I want to know the number of times each unique tweet was retweeted.
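
The logic of sorting first and then dropping duplicates can be sketched in plain Python. The same retweet may be captured several times with slightly different retweet counts, so sorting in descending order and keeping the first row per text retains the highest count observed. `top_unique_tweets` is a hypothetical helper and the rows are made up:

```python
def top_unique_tweets(rows, n=30):
    """Sort by retweet_count (descending) and keep the first row per text."""
    ranked = sorted(rows, key=lambda r: r["retweet_count"], reverse=True)
    seen = set()
    unique = []
    for row in ranked:
        if row["text"] not in seen:
            seen.add(row["text"])
            unique.append(row)
    return unique[:n]


sample_rows = [
    {"text": "RT @a: first", "retweet_count": 5},
    {"text": "RT @b: second", "retweet_count": 9},
    {"text": "RT @a: first", "retweet_count": 7},
]
top = top_unique_tweets(sample_rows)
```

In this example the tweet from @a appears twice in the input but once in the result, with its larger count of 7.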

In [21]:
(df.sort_values('retweet_count', ascending=False)[['retweet_count', 'text']]
   .drop_duplicates(subset='text', keep='first')
   .head(30))

Unnamed: 0,retweet_count,text
89317,13630,"RT @WhiteHouse: ""If we’re going to raise our kids in a safer, more loving world, we need to speak up for it."" —@POTUS #NoBillNoBreak https:…"
84441,8648,RT @SenWarren: Nowhere I'd rather spend my bday than the House floor w/ @repjohnlewis for gun control. #NoBillNoBreak #goodtrouble https://…
89279,8547,"RT @repjohnlewis: .@SpeakerRyan, we will not leave without acting for the victims &amp; families of reckless gun violence. #NoBillNoBreak https…"
89814,7017,RT @BarackObama: We need more than moments of silence. We need action. And that's what's going on in the House now: https://t.co/L9m0dSfipA…
89619,6423,"RT @KimKardashian: After Orlando, Congress hasn't done anything and now they're going on vacation. I say #NoBillNoBreak https://t.co/WNdeZI…"
82254,6405,RT @SenWarren: Hero @repjohnlewis is leading a sit-in on gun violence &amp; @SpeakerRyan shut off the camera so you can’t watch. Shameful. #NoB…
85631,6158,RT @tparsi: So I'm meeting with @keithellison. His scheduler walks in and hands him this note. Meeting ends :) #NoBillNoBreak https://t.co/…
89819,5582,RT @ladygaga: Thank you so much to the women &amp; men on the house floor who are protesting to save American lives ☕️ #NoBillNoBreak https://t…
88907,5413,RT @Phil_Lewis_: Straight Outta Congress\n\n#NoBillNoBreak https://t.co/cyZz23Rd4e
89348,5335,RT @repjohnlewis: We cannot give up or give in or give out. We must keep our eyes on the prize. #goodtrouble #NoBillNoBreak https://t.co/mx…


Almost all of the most retweeted tweets came from Democrats who were involved in the sit-in.
<p>The most retweeted tweet was from the White House, which was in support of the #NoBillNoBreak demonstration.</p>
<p>Interestingly, tweets from two pop culture figures, Kim Kardashian and Lady Gaga, also were retweeted many times. These tweets were in support of the Democrats' sit-in. This is likely due to the sheer number of followers these two have on Twitter, but it does show that a number of their followers likely support the Democrats on this issue.</p>
<p>User @Phil\_Lewis\_ wrote a more humorous tweet that was also popular. It shows a picture of Democrats on the House floor, with the caption "Straight Outta Congress," a reference to the movie "Straight Outta Compton."</p>
<p>One other popular tweet came from @periscopetv. When the House cameras were turned off, Democrats on the House floor began streaming the sit-in on Periscope, so that the nation could hear their speeches and see the demonstration. Periscope picked up on it, and let followers know how they could see the live streams.</p>

<h2>Conclusion</h2>

After this simple analysis of the tweets about #NoBillNoBreak, it appears there was much support for House Democrats, and some contempt for Speaker Paul Ryan. I will not make a formal conclusion, since the goal of this analysis was to get a basic idea of what was being said about the event, and a much deeper analysis would be required. What is clear is that the debate over gun control is likely to continue for quite some time, and seems to be getting much more heated as time goes on and the political gridlock strengthens.