# Tweepy Notebook

This notebook shows how to use the Tweepy package to read tweets from the streaming API, do some simple processing, and load the results into pandas.

The first thing you need to do is log into Twitter and create an application.

[Twitter Apps](https://apps.twitter.com)

Select the **Create New App** button and fill out the application information.

You will ultimately need the following pieces of information:

- access_token
- access_token_secret
- consumer_key
- consumer_secret
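
The import `from config import Config as cfg` in the first code cell expects a local `config.py` that this notebook does not show. A minimal sketch of what that module might look like, assuming the credentials are supplied through environment variables (the variable names here are hypothetical, not something Twitter or Tweepy defines):

```python
# config.py (hypothetical layout; the notebook only needs these four attributes)
import os


class Config:
    # Read each credential from an environment variable so that secrets
    # stay out of the notebook; an unset variable yields an empty string.
    access_token = os.environ.get('TWITTER_ACCESS_TOKEN', '')
    access_token_secret = os.environ.get('TWITTER_ACCESS_TOKEN_SECRET', '')
    consumer_key = os.environ.get('TWITTER_CONSUMER_KEY', '')
    consumer_secret = os.environ.get('TWITTER_CONSUMER_SECRET', '')
```

Keeping the keys in a separate module (or in the environment) means the notebook can be shared without exposing them.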




In [26]:
import tweepy
import json
import re
import pandas as pd
from config import Config as cfg

In [27]:
access_token = cfg.access_token
access_token_secret = cfg.access_token_secret
consumer_key = cfg.consumer_key
consumer_secret = cfg.consumer_secret

tweet_file_name = 'tweets.txt'
file_mode = 'a'

Create a tweepy OAuthHandler, to be used when we stream the tweets.

In [28]:
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)


In [30]:
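# Written for Tweepy 3.x; Tweepy 4.0 merged StreamListener into tweepy.Stream.
class MyStreamListener(tweepy.StreamListener):
    """Append each incoming tweet to a file as one JSON object per line."""

    def __init__(self, api=None, file_name='tweets.txt', mode='a', max_tweets=300):
        super(MyStreamListener, self).__init__(api)
        self.num_tweets = 0
        self.file = open(file_name, mode)
        self.max_tweets = max_tweets

    def on_status(self, status):
        tweet = status._json
        self.file.write(json.dumps(tweet) + '\n')
        self.num_tweets += 1
        if self.num_tweets < self.max_tweets:
            return True
        # Limit reached: close the file first, then return False to stop the stream.
        self.file.close()
        return False

    def on_error(self, status):
        # status is the HTTP status code of the failed request.
        print(status)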
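
The listener's core logic (append one JSON object per line, stop once `max_tweets` is reached) can be tried without a Twitter connection. The sketch below uses a hypothetical `JsonLinesSink` class that is not part of Tweepy; it only mirrors the counting and file handling, and it closes the file before signalling that the stream should stop:

```python
import json
import os
import tempfile


class JsonLinesSink:
    """Hypothetical stand-in for the listener: no Tweepy, no network."""

    def __init__(self, file_name, mode='a', max_items=3):
        self.count = 0
        self.max_items = max_items
        self.file = open(file_name, mode)

    def on_item(self, item):
        # Append one JSON object per line, as the listener does for tweets.
        self.file.write(json.dumps(item) + '\n')
        self.count += 1
        if self.count < self.max_items:
            return True
        # Limit reached: close the file before signalling "stop".
        self.file.close()
        return False


path = os.path.join(tempfile.mkdtemp(), 'items.txt')
sink = JsonLinesSink(path, max_items=3)
results = [sink.on_item({'id': i, 'text': 'sample %d' % i}) for i in range(3)]

with open(path) as f:
    lines = f.read().splitlines()

print(results)     # [True, True, False]
print(len(lines))  # 3
```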


Select a filter word or words. Be careful to pick filter words that match tweets frequently; otherwise the stream keeps waiting until it has collected the requested number of tweets before it exits.

In [31]:
filter_words=['trump']

In [32]:
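# The file settings belong to the listener; tweepy.Stream does not use them.
listener = MyStreamListener(file_name=tweet_file_name, mode=file_mode, max_tweets=300)
stream = tweepy.Stream(auth, listener)

# Filter the Twitter stream to capture tweets that match the keywords
stream.filter(track=filter_words)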


### Post-process the collected tweets

Read the tweets file and flag each tweet that mentions one of the words of interest.

In [42]:
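def word_in_text(word, tweet):
    """Return True if word occurs anywhere in tweet, ignoring case."""
    # re.escape makes the word match literally rather than as a regex pattern.
    match = re.search(re.escape(word.lower()), tweet.lower())
    return match is not None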
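
Because `word_in_text` relies on `re.search`, it matches substrings too: 'russia' is also found inside 'prussian'. If whole-word matching is preferred, one possible variant (the name `whole_word_in_text` is hypothetical) escapes the word and wraps it in `\b` word boundaries:

```python
import re


def whole_word_in_text(word, text):
    # re.escape treats the word literally; \b anchors it at word boundaries.
    pattern = r'\b' + re.escape(word) + r'\b'
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


print(whole_word_in_text('russia', 'News from Russia today'))  # True
print(whole_word_in_text('russia', 'A Prussian general'))      # False
```

Hashtags and mentions such as `#Mueller` still match, since `#` and `@` are not word characters.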


In [43]:
tweets_data = []
tweets_with_words = ['russia', 'SeanHannity', 'Mueller', 'clinton']
with open(tweet_file_name, 'r') as tweets_file:
    for line in tweets_file:
        tweet = json.loads(line)
        user = tweet['user']
        tweet['user_id'] = user['id']
        tweet['screen_name'] = user['screen_name']
        tweet['len'] = len(tweet['text'])
        for word in tweets_with_words:
            tweet[word] = 1 if word_in_text(word, tweet['text']) else 0
        tweets_data.append(tweet)
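
If the stream is interrupted mid-write, the last line of the file may be incomplete, and `json.loads` would raise an error on it. A defensive reader might skip blank or malformed lines. The sketch below uses a hypothetical `load_json_lines` helper and an in-memory sample rather than `tweets.txt`:

```python
import io
import json


def load_json_lines(lines):
    """Parse one JSON object per line; return (records, malformed line count)."""
    records, skipped = [], 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # blank lines are ignored and not counted
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError:
            skipped += 1
    return records, skipped


sample = io.StringIO(
    '{"text": "first", "user": {"id": 1, "screen_name": "a"}}\n'
    '\n'
    '{"text": "trunca'  # simulated partial write
)
records, skipped = load_json_lines(sample)
print(len(records), skipped)  # 1 1
```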


In [45]:
columns=['text', 'screen_name', 'len']
columns.extend(tweets_with_words)
df = pd.DataFrame(tweets_data, columns=columns)

print(df.head(20))
print(df.shape)

                                                 text     screen_name  len  \
0             @AlamoOnTheRise https://t.co/ukp07vXGqf        ne1for23   39   
1                     So true https://t.co/Vq9ayIAcAn        jend0315   31   
2   RT @sumariumcom: Cabello sobre ausencia de Tru...   REACCI0NARIAA  139   
3   RT @TrumpTrainMRA4: @facebook \n@DiamondandSil...     Crowntiptoe  140   
4   RT @page88: Thread. Don’t think this overstate...    JadeJensen29   80   
5   @SpeakerRyan @SenateMajLdr #ProtectMueller #Pr...      Terryg1979   85   
6   RT @funder: Trump sending troops to the border...   IamBrooklyn31  140   
7   RT @AnnCoulter: Mueller is DYING to get fired....          SPeek1  139   
8   Yup.\nTrain wreck coming.\n#Resist\n#Resign\n#...       titeman50   73   
9   RT @drscott_atlanta: #FBI raids @realDonaldTru...       Jdflygirl  140   
10  RT @MSNBC: The New York Times reports special ...     hani_azer11  140   
11  RT @CoreyLMJones: Mark Zuckerberg said that Fa...  Mikeypoos