# Extract Twitter Followers and Follower Tweets with Tweepy

- This notebook was made to answer [How to extract data from a Tweepy object into a pandas dataframe?][1] on [Stack Overflow][2].

  [1]: https://stackoverflow.com/questions/58666135/how-to-extract-data-from-a-tweepy-object-into-a-pandas-dataframe
  [2]: https://stackoverflow.com/

In [None]:
import tweepy
import json
import pandas as pd
from pandas import json_normalize  # pandas >= 1.0; the pandas.io.json import path is deprecated
from datetime import datetime

In [None]:
pd.set_option('display.max_columns', 700)
pd.set_option('display.max_rows', 100)
pd.set_option('display.min_rows', 10)
pd.set_option('display.expand_frame_repr', True)

## Convert the tweepy object to JSON:

- Attribution to [Tweepy for beginners][1]
- `tweepy.Cursor(api.followers).items()` returns an iterator that yields `User(...)` objects of type `tweepy.models.User`.
  - Wrap it in `list()` to consume the iterator, or iterate over it directly without unpacking it.
  - Here it is unpacked into a `list` named `followers` so the contents can be inspected.
- Extract the `_json` attribute of each user with the function `jsonify_tweepy`.
- Call the function on each follower to build a list of JSON-compatible dicts.
- Load the list into a dataframe with `json_normalize`.
- You'll need a [Twitter Developer][2] account.
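
The flattening step can be tried without Twitter credentials. The following is a minimal sketch, assuming pandas 1.0 or later (for `pd.json_normalize`); the `fake_followers` objects and their field values are invented stand-ins for `tweepy.models.User`, of which only the `_json` attribute is imitated.

```python
from types import SimpleNamespace

import pandas as pd

# Stand-ins for tweepy.models.User objects (hypothetical data):
# only the `_json` attribute is needed for this step.
fake_followers = [
    SimpleNamespace(_json={'id': 1, 'screen_name': 'alice',
                           'status': {'id': 10, 'text': 'hello'}}),
    SimpleNamespace(_json={'id': 2, 'screen_name': 'bob',
                           'status': {'id': 20, 'text': 'world'}}),
]

# `_json` is a plain dict, so it can be collected directly.
records = [follower._json for follower in fake_followers]

# Nested dicts are flattened into dot-separated column names,
# for example 'status.id' and 'status.text'.
flat = pd.json_normalize(records)
print(flat.columns.tolist())
```

Each follower becomes one row, and each nested key becomes its own column, which is why the followers dataframe ends up with several hundred columns.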

### To get followers:

  [1]: https://towardsdatascience.com/tweepy-for-beginners-24baf21f2c25
  [2]: https://developer.twitter.com/en.html

In [None]:
# Insert your Twitter keys here
consumer_key = ''
consumer_secret = ''
access_token = ''
access_token_secret = ''

auth = tweepy.auth.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)

followers = list(tweepy.Cursor(api.followers).items())

# Return a JSON-compatible copy of the object's _json dict
def jsonify_tweepy(tweepy_object):
    json_str = json.dumps(tweepy_object._json)
    return json.loads(json_str)

# Call the function and collect each _json dict in followers_list
followers_list = [jsonify_tweepy(follower) for follower in followers]

# Convert followers_list to a pandas dataframe
df = json_normalize(followers_list)

# save to csv
df.to_csv('followers.csv', index=False)

### To get follower tweets:

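- Use `class TweetMiner`, from the Tweepy for beginners article linked above.
- As already noted, I did not write this class, but I have used it and it extracts tweets as specified.
- That said, bare `except` clauses are bad practice because they also swallow unrelated errors; catch a specific exception such as `AttributeError` instead.
- `twitter_keys` is built from the outer-scope variables `consumer_key`, `consumer_secret`, `access_token` and `access_token_secret` defined in the earlier cell.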
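
To illustrate the point about bare `except`, here is a small sketch that runs offline. The `retweet` and `plain` objects are hypothetical stand-ins for tweepy `Status` objects, and both helper functions are invented for this example.

```python
from types import SimpleNamespace

# Hypothetical stand-ins for tweepy Status objects.
retweet = SimpleNamespace(
    full_text='RT @bob: hi',
    retweeted_status=SimpleNamespace(full_text='hi'),
)
plain = SimpleNamespace(full_text='just a tweet')


def retweet_text_bare(item):
    """Bare except: the typo below (NameError) is silently hidden."""
    try:
        return itm.retweeted_status.full_text  # typo: `itm` is undefined
    except:
        return 'None'


def retweet_text(item):
    """Narrow except: only a missing attribute is treated as 'no retweet'."""
    try:
        return item.retweeted_status.full_text
    except AttributeError:
        return 'None'


print(retweet_text_bare(retweet))  # 'None', although this is a retweet
print(retweet_text(retweet))       # 'hi'
print(retweet_text(plain))         # 'None'
```

With the narrow clause, the same typo would raise a `NameError` immediately instead of quietly producing wrong data.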

In [None]:
class TweetMiner(object):

    result_limit = 20    
    data = list()
    api = False

    twitter_keys = {'consumer_key': consumer_key,
                    'consumer_secret': consumer_secret,
                    'access_token_key': access_token,
                    'access_token_secret': access_token_secret}
    
    def __init__(self, keys_dict=twitter_keys, api=api, result_limit=20):
        
        self.twitter_keys = keys_dict
        
        auth = tweepy.OAuthHandler(keys_dict['consumer_key'],
                                   keys_dict['consumer_secret'])
        auth.set_access_token(keys_dict['access_token_key'],
                              keys_dict['access_token_secret'])
        
        self.api = tweepy.API(auth, wait_on_rate_limit=True,
                              wait_on_rate_limit_notify=True)
        self.result_limit = result_limit
        

    def mine_user_tweets(self, user, mine_retweets=False, max_pages=5):

        data = list()
        last_tweet_id = False
        page = 1
        
        while page <= max_pages:
            if last_tweet_id:
                statuses = self.api.user_timeline(screen_name=user,
                                                  count=self.result_limit,
                                                  max_id=last_tweet_id - 1,
                                                  tweet_mode='extended',
                                                  include_rts=True)
            else:
                statuses = self.api.user_timeline(screen_name=user,
                                                  count=self.result_limit,
                                                  tweet_mode='extended',
                                                  include_rts=True)
                
            for item in statuses:

                mined = {'tweet_id': item.id,
                         'name': item.user.name,
                         'screen_name': item.user.screen_name,
                         'retweet_count': item.retweet_count,
                         'text': item.full_text,
                         'mined_at': datetime.now(),
                         'created_at': item.created_at,
                         'favourite_count': item.favorite_count,
                         'hashtags': item.entities['hashtags'],
                         'status_count': item.user.statuses_count,
                         'location': item.place,
                         'source_device': item.source}
                
                # The attribute is missing (AttributeError) when the tweet
                # is not a retweet or not a quote tweet
                try:
                    mined['retweet_text'] = item.retweeted_status.full_text
                except AttributeError:
                    mined['retweet_text'] = 'None'
                try:
                    mined['quote_text'] = item.quoted_status.full_text
                    mined['quote_screen_name'] = item.quoted_status.user.screen_name
                except AttributeError:
                    mined['quote_text'] = 'None'
                    mined['quote_screen_name'] = 'None'
                
                last_tweet_id = item.id
                data.append(mined)
                
            page += 1
            
        return data
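
The `max_id` paging used in `mine_user_tweets` can be exercised offline. The sketch below is not part of the original class: `fake_user_timeline` and `mine_ids` are hypothetical helpers, and the tweet IDs are invented. It also stops as soon as a page comes back empty, whereas the class above keeps requesting until `max_pages` is reached.

```python
from types import SimpleNamespace

# Fake timeline, newest first, standing in for api.user_timeline.
ALL_IDS = list(range(110, 100, -1))  # 110, 109, ..., 101


def fake_user_timeline(count, max_id=None):
    """Return up to `count` statuses with id <= max_id, newest first."""
    ids = [i for i in ALL_IDS if max_id is None or i <= max_id]
    return [SimpleNamespace(id=i) for i in ids[:count]]


def mine_ids(count=4, max_pages=5):
    """Collect tweet IDs page by page, walking backwards through the timeline."""
    collected = []
    last_id = None
    for _ in range(max_pages):
        kwargs = {'count': count}
        if last_id is not None:
            # max_id is inclusive, so subtract 1 to skip the last tweet seen.
            kwargs['max_id'] = last_id - 1
        statuses = fake_user_timeline(**kwargs)
        if not statuses:
            break  # timeline exhausted; further requests would be wasted
        collected.extend(status.id for status in statuses)
        last_id = statuses[-1].id
    return collected


ids = mine_ids()
print(len(ids), len(set(ids)))  # every tweet is collected exactly once
```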

#### Call the class

- The `followers` data retrieved above does not include each follower's timeline of tweets.
- Using `df` from above, get all the followers and use `class TweetMiner` to download the tweets for each user.
- The following code creates a dict of dataframes, `mined_tweets_dict`, in which each key is a follower's screen name.

In [None]:
miner=TweetMiner(result_limit=200)
mined_tweets_dict = dict()
for name in df['screen_name'].unique():
    try:
        mined_tweets = miner.mine_user_tweets(user=name, max_pages=17)
        mined_tweets_dict[name] = pd.DataFrame(mined_tweets)
    except tweepy.TweepError as e:
        print(f'{name} could not be processed because {e}')

### Save with `.to_csv`:

In [None]:
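# newline='' avoids blank lines between rows on Windows.
# mode='a' appends, so remove the file before rerunning to avoid a repeated header.
with open('follower_tweets.csv', mode='a', encoding='utf-8', newline='') as f:
    # A separate loop variable keeps the followers dataframe `df` intact
    for i, tweets_df in enumerate(mined_tweets_dict.values()):
        # Write the header only for the first dataframe
        tweets_df.to_csv(f, header=(i == 0), index=False)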
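
An alternative, sketched here under the assumption that all per-user dataframes share the same columns, is to stack them with `pd.concat` and write once. The `sample_tweets` dict is invented data standing in for `mined_tweets_dict`, and an in-memory buffer replaces the file on disk so the example has no side effects.

```python
import io

import pandas as pd

# Invented sample data standing in for mined_tweets_dict.
sample_tweets = {
    'alice': pd.DataFrame({'tweet_id': [1, 2], 'text': ['a1', 'a2']}),
    'bob': pd.DataFrame({'tweet_id': [3], 'text': ['b1']}),
}

# Stack the per-user frames; ignore_index gives one continuous row index.
combined = pd.concat(list(sample_tweets.values()), ignore_index=True)

# Write once, so the header appears exactly one time.
buffer = io.StringIO()  # in-memory stand-in for a file on disk
combined.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

With real data, `combined.to_csv('follower_tweets.csv', index=False)` would write the same content to disk.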