<a href="https://colab.research.google.com/github/theclassofai/TweetSearch_FullArchive/blob/main/SearchTweet_FullArchive.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Fetch Tweets 

This notebook shows how to use Tweepy to run a full-archive search with version 2 of the Twitter API.

## Prep work
To use this code, you need a Twitter developer account with access to the Academic Research product track. Information about who is eligible and how to apply is available on the Twitter developer platform.

Once you have an account, create a new app at https://developer.twitter.com/en/portal/dashboard and generate a "bearer token" for it. Copy the bearer token and paste it into a new file called `twitter_authentication.py`, saved in the same directory as this notebook. The entire file should look like this:

`bearer_token = "YOUR BEARER TOKEN HERE"`

Never share this token with anyone. If, for example, you save your work in a Git repository, add `twitter_authentication.py` to your `.gitignore` so the token is never committed.

If anyone else obtains this token, they can call the Twitter API as your app and use up its rate limits. In that case, revoke the token from the same interface where you created it and generate a new one.
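If you prefer to create the token file from code, the sketch below writes `twitter_authentication.py` in the format shown above and adds it to `.gitignore` only if it is not already listed. The helper `write_token_file` is hypothetical and is not part of Tweepy. It also assumes the token contains no double quotes or backslashes, so a plain double-quoted literal is enough.

```python
from pathlib import Path

TOKEN_FILE_NAME = "twitter_authentication.py"


def write_token_file(token: str, directory: str = ".") -> Path:
    """Hypothetical helper: write the token module and keep it out of Git."""
    if '"' in token or "\\" in token:
        # Assumption: bearer tokens do not contain these characters, so the
        # simple double-quoted literal written below stays valid Python.
        raise ValueError("token contains characters that would break the literal")
    folder = Path(directory)
    token_path = folder / TOKEN_FILE_NAME
    # The whole file is a single assignment, matching the format described above.
    token_path.write_text(f'bearer_token = "{token}"\n', encoding="utf-8")

    gitignore = folder / ".gitignore"
    lines = gitignore.read_text(encoding="utf-8").splitlines() if gitignore.exists() else []
    # Add the entry only if it is missing, so calling the helper twice is harmless.
    if TOKEN_FILE_NAME not in lines:
        lines.append(TOKEN_FILE_NAME)
        gitignore.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return token_path
```

Calling `write_token_file("YOUR BEARER TOKEN HERE")` from the notebook's directory creates both files. The token then also appears in that notebook cell, so clear the cell before sharing the notebook.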

In [1]:
!pip install tweepy==4.4.0
import tweepy

Collecting tweepy==4.4.0
  Downloading tweepy-4.4.0-py2.py3-none-any.whl (65 kB)
Installing collected packages: tweepy
  Attempting uninstall: tweepy
    Found existing installation: tweepy 3.10.0
    Uninstalling tweepy-3.10.0:
      Successfully uninstalled tweepy-3.10.0
Successfully installed tweepy-4.4.0


In [2]:
import tweepy
from twitter_authentication import bearer_token
import time
import pandas as pd

In [3]:
client = tweepy.Client(bearer_token, wait_on_rate_limit=True)
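`wait_on_rate_limit=True` tells Tweepy to pause instead of raising an error when the API answers with HTTP 429 (Too Many Requests). The sketch below illustrates the general idea under stated assumptions. It reads the `x-rate-limit-reset` response header (a Unix timestamp for when the rate-limit window reopens), sleeps until that time, and retries. `wait_and_retry` and its `request` callable are hypothetical stand-ins. Tweepy's own logic lives inside `tweepy.Client` and may differ in detail.

```python
import time
from typing import Any, Callable, Mapping, Tuple

# Hypothetical request shape for this sketch: returns (status_code, headers, body).
Request = Callable[[], Tuple[int, Mapping[str, str], Any]]


def wait_and_retry(request: Request,
                   sleep: Callable[[float], None] = time.sleep,
                   clock: Callable[[], float] = time.time,
                   max_attempts: int = 5) -> Any:
    """Retry a request, sleeping until the rate-limit window resets on HTTP 429."""
    for _ in range(max_attempts):
        status, headers, body = request()
        if status != 429:
            return body
        # x-rate-limit-reset holds the Unix time (in seconds) when the window reopens.
        reset_at = int(headers["x-rate-limit-reset"])
        # Add one second of slack so the retry lands after the reset, not on it.
        delay = reset_at - int(clock()) + 1
        if delay > 0:
            sleep(delay)
    raise RuntimeError(f"still rate limited after {max_attempts} attempts")
```

Passing `sleep` and `clock` as parameters keeps the function testable without actually waiting. In normal use the defaults (`time.sleep`, `time.time`) apply.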


In [29]:
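Omicorn_tweets = []
# Paginator calls search_all_tweets repeatedly, requesting the next page until none remain
for response in tweepy.Paginator(client.search_all_tweets,
                                 query='Omicorn',
                                 user_fields=['username', 'public_metrics', 'description', 'location'],
                                 tweet_fields=['created_at', 'geo', 'public_metrics', 'text'],
                                 # Expanding author_id returns the authors in response.includes['users']
                                 expansions='author_id',
                                 start_time='2021-12-20T00:00:00Z',
                                 # end_time is exclusive, so the last tweets are from 2021-12-30
                                 end_time='2021-12-31T00:00:00Z',
                                 max_results=500):
    # Full-archive search allows at most one request per second
    time.sleep(1)
    Omicorn_tweets.append(response)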
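`tweepy.Paginator` hides the paging loop. When more results exist, each search response carries a `next_token` in its `meta` field, and the following request passes that token back until no token is returned. The stand-alone sketch below imitates that loop over plain dictionaries so it runs without network access. `paginate`, `fake_search`, and `PAGES` are hypothetical; the real Paginator also handles other endpoints and their parameters.

```python
from typing import Any, Callable, Dict, Iterator, Optional


def paginate(fetch: Callable[..., Dict[str, Any]], **params: Any) -> Iterator[Dict[str, Any]]:
    """Yield pages from fetch, following meta.next_token until it disappears."""
    next_token: Optional[str] = None
    while True:
        # Only send next_token after the first page has been fetched.
        call_params = dict(params, next_token=next_token) if next_token else params
        page = fetch(**call_params)
        yield page
        next_token = page.get("meta", {}).get("next_token")
        if not next_token:
            break


# Hypothetical three-page result set, keyed by the token that requests each page.
PAGES = {
    None: {"data": [1, 2], "meta": {"next_token": "b"}},
    "b": {"data": [3], "meta": {"next_token": "c"}},
    "c": {"data": [4], "meta": {}},
}


def fake_search(query: str, next_token: Optional[str] = None) -> Dict[str, Any]:
    return PAGES[next_token]


items = [tweet for page in paginate(fake_search, query="Omicorn") for tweet in page["data"]]
print(items)  # [1, 2, 3, 4]
```

Because `paginate` is a generator, pages are fetched lazily, one per loop iteration. This is also why the `time.sleep(1)` inside the loop above throttles every request.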

In [30]:
Omicorn_tweets[0].data[0]

<Tweet id=1476704549578575880 text=RT @RDA_UP: Teri mitti me mil jawan.... 

Thank you @FordaIndia for leading this historic movement for the #Doctors

#neetpg2021counselling…>

In [31]:
result = []
user_dict = {}
# Loop through each response object (one per page of up to 500 tweets)
for response in Omicorn_tweets:
    # Take all of the users, and put them into a dictionary of dictionaries with the info we want to keep.
    # A page with no matching tweets has no 'users' in includes and no data, so default to empty lists.
    for user in response.includes.get('users', []):
        user_dict[user.id] = {'username': user.username, 
                              'followers': user.public_metrics['followers_count'],
                              'tweets': user.public_metrics['tweet_count'],
                              'description': user.description,
                              'location': user.location
                             }
    for tweet in response.data or []:
        # For each tweet, find the author's information
        author_info = user_dict[tweet.author_id]
        # Put all of the information we want to keep in a single dictionary for each tweet
        result.append({'author_id': tweet.author_id, 
                       'username': author_info['username'],
                       'author_followers': author_info['followers'],
                       'author_tweets': author_info['tweets'],
                       'author_description': author_info['description'],
                       'author_location': author_info['location'],
                       'text': tweet.text,
                       'created_at': tweet.created_at,
                       'retweets': tweet.public_metrics['retweet_count'],
                       'replies': tweet.public_metrics['reply_count'],
                       'likes': tweet.public_metrics['like_count'],
                       'quote_count': tweet.public_metrics['quote_count']
                      })

# Change this list of dictionaries into a dataframe
df = pd.DataFrame(result)
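Tweepy converts each page into `Tweet` and `User` objects, but the underlying v2 JSON has the same shape. Tweets sit under `data`, and the users requested through `expansions='author_id'` sit under `includes.users`, linked by `author_id` (in the raw JSON, IDs are strings). The sketch below performs the same lookup on a raw page stored as a dictionary. `join_authors` and the sample page are made up for illustration, and only a few of the columns above are kept.

```python
from typing import Any, Dict, List


def join_authors(page: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Attach author fields from includes.users to each tweet in one raw v2 page."""
    # Index users by id; a page without matches may have no includes at all.
    users = {user["id"]: user for user in page.get("includes", {}).get("users", [])}
    rows = []
    for tweet in page.get("data", []):
        author = users.get(tweet["author_id"], {})
        metrics = author.get("public_metrics", {})
        rows.append({
            "author_id": tweet["author_id"],
            "username": author.get("username"),
            "author_followers": metrics.get("followers_count"),
            "text": tweet["text"],
        })
    return rows


# Made-up page in the v2 response shape.
sample_page = {
    "data": [{"id": "10", "author_id": "1", "text": "hello"}],
    "includes": {"users": [{"id": "1", "username": "example_user",
                            "public_metrics": {"followers_count": 42}}]},
}
print(join_authors(sample_page))
```

Building the user index once per page keeps each tweet's author lookup to a single dictionary access, the same design the notebook cell uses with `user_dict`.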

In [32]:
df

|  | author_id | username | author_followers | author_tweets | author_description | author_location | text | created_at | retweets | replies | likes | quote_count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1148865328358629376 | Whytcoatsoldier | 271 | 86 | •Central Zonal Head, IMA MSN HQ\n•State Joint ... | Bhubaneswar | RT @RDA_UP: Teri mitti me mil jawan.... \n\nTh... | 2021-12-30 23:59:35+00:00 | 94 | 0 | 0 | 0 |
| 1 | 1330313115918012418 | errquivocal | 121 | 1201 | still air briefs existence |  | @justindclemens @maariaris Omicorn - the wheat... | 2021-12-30 23:57:50+00:00 | 0 | 0 | 3 | 0 |
| 2 | 1338611875 | its_lisa25 | 453 | 24326 | I'm probably at a concert or Disney I root for... | Knoxville, TN | You put your other dog down because you it had... | 2021-12-30 23:56:02+00:00 | 0 | 1 | 0 | 0 |
| 3 | 1303645774576746499 | Fx1Teach | 103 | 2741 | Dad of three, Rugby, FX, crypto educator, Scie... |  | RT @latimeralder: The 'tsunami' of Omicorn dea... | 2021-12-30 23:55:53+00:00 | 53 | 0 | 0 | 0 |
| 4 | 438629878 | RetiredDent | 5624 | 44069 | I use to be a big-time dentist. https://t.co/w... | Calgary, AB | I have a bad feeling they’re going to cancel c... | 2021-12-30 23:48:40+00:00 | 0 | 3 | 11 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 5002 | 1175299172003483650 | privatelydee | 61 | 3812 | Anese, Aneeq & Lil A❤️ |  | RT @PuanHannah: Prayers for those affected wit... | 2021-12-20 00:23:50+00:00 | 51 | 0 | 0 | 0 |
| 5003 | 1131536829319041025 | HelenSm76464997 | 2940 | 33535 | Fully Vaxxed. Maintaining the rage. #LNPCrimeF... | unceded land | RT @KateEmerson88: Dom going on about 95% firs... | 2021-12-20 00:22:52+00:00 | 92 | 0 | 0 | 0 |
| 5004 | 188143601 | blanketcrap | 20328 | 697239 | Unpaid #Disabled Carer #heritage for @qlduseum... | Mt Morgan | RT @KateEmerson88: Dom going on about 95% firs... | 2021-12-20 00:18:12+00:00 | 92 | 0 | 0 | 0 |
| 5005 | 1307453966796689408 | hibakirishima91 | 619 | 14124 | The only time you should ever look back. Is to... |  | RT @Jagannath2050: Are players not bringing Om... | 2021-12-20 00:14:55+00:00 | 6 | 0 | 0 | 0 |


In [33]:
df.to_csv('Omicorn.csv', index=False)
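When `Omicorn.csv` is read back, every column comes in as text, so the counts and timestamps need converting again. Below is a minimal standard-library sketch, assuming the file has exactly the columns shown in the table above. With pandas you would normally use `pd.read_csv` with `parse_dates` instead. `load_tweets_csv` is a hypothetical helper.

```python
import csv
from datetime import datetime
from typing import Any, Dict, List

# Columns written as integers by the DataFrame above (assumed to be unchanged).
INT_COLUMNS = ("author_id", "author_followers", "author_tweets",
               "retweets", "replies", "likes", "quote_count")


def load_tweets_csv(path: str) -> List[Dict[str, Any]]:
    """Read the exported CSV and restore integer and datetime columns."""
    rows = []
    with open(path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            for column in INT_COLUMNS:
                row[column] = int(row[column])
            # pandas writes timezone-aware timestamps like "2021-12-30 23:59:35+00:00",
            # which datetime.fromisoformat parses back into an aware datetime.
            row["created_at"] = datetime.fromisoformat(row["created_at"])
            rows.append(row)
    return rows
```

Empty cells, such as a missing `author_location`, come back as empty strings, not as `None`.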