# Collect Tweets into MongoDB with Twitter API v2

## Install Python libraries

We need [pymongo](https://pypi.org/project/pymongo/) to manage the MongoDB database and [tweepy](https://www.tweepy.org/) to call the Twitter APIs.

In [None]:
!pip install pymongo

In [None]:
!pip install tweepy

## Import Python libraries

In [None]:
import pymongo
from pymongo import MongoClient
import json
from pprint import pprint
import tweepy
import configparser

## Load the authorization info

Save the database connection string and the API bearer token in a config.ini file, and use `configparser` to load them.

The config.ini file should look like:
``` 
[mytwitter]
bearer_token = <your bearer token from twitter>

[mymongo]
connection = <your mongodb connection>
```


In [None]:
config = configparser.ConfigParser(interpolation=None)
config.read('config.ini')

BEARER_TOKEN   = config['mytwitter']['bearer_token']

mongod_connect = config['mymongo']['connection']
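
Passing `interpolation=None` matters because connection strings often contain `%` characters (for example a URL-encoded password), which the default interpolation tries to expand. The following is a minimal sketch of that behavior. The connection string is made up, `load_connection` is a hypothetical helper, and `read_string` stands in for reading a real config.ini file.

```python
import configparser

# Made-up config contents; the password contains a URL-encoded '@' (%40).
SAMPLE = """
[mymongo]
connection = mongodb+srv://user:p%40ss@cluster.example.net/demo
"""

def load_connection(text, interpolation):
    """Hypothetical helper: parse the text and return the connection value."""
    parser = configparser.ConfigParser(interpolation=interpolation)
    parser.read_string(text)
    return parser["mymongo"]["connection"]

# With interpolation disabled, the value comes back unchanged.
raw = load_connection(SAMPLE, None)

# With basic interpolation (the default), reading the value fails on '%40'.
try:
    load_connection(SAMPLE, configparser.BasicInterpolation())
    interpolation_failed = False
except configparser.InterpolationSyntaxError:
    interpolation_failed = True
```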

## Connect to the MongoDB cluster

We will create a database named 'demo' and a collection named 'tweet_collection' in your MongoDB cluster. A unique index on the tweet id keeps the same tweet from being stored twice.

In [None]:
client = MongoClient(mongod_connect)
db = client.demo # use or create a database named demo
tweet_collection = db.tweet_collection #use or create a collection named tweet_collection
tweet_collection.create_index([("tweet.id", pymongo.ASCENDING)],unique = True) # make sure the collected tweets are unique

## Use the API to collect tweets

### Define the query

For more about Twitter API v2 query operators, see [Search Tweets](https://developer.twitter.com/en/docs/twitter-api/tweets/search/integrate/build-a-query).

In [None]:
query = 'covid'  #query tweets about covid
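
A query can also combine keywords with operators in one string: terms separated by spaces must all match, and a leading `-` negates an operator. The sketch below assembles such a string. `build_query` is a hypothetical helper, not part of tweepy, and which operators are available may depend on your API access level, so treat the linked page as the authority.

```python
def build_query(keywords, lang=None, exclude_retweets=False):
    """Join keywords and optional operators into one search query string."""
    parts = list(keywords)
    if lang:
        parts.append(f"lang:{lang}")      # restrict to one language
    if exclude_retweets:
        parts.append("-is:retweet")       # '-' negates the operator
    return " ".join(parts)

example_query = build_query(["covid", "vaccine"], lang="en", exclude_retweets=True)
print(example_query)
```

The resulting string can be passed as the `query` argument in the same way as the plain keyword.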

### Insert the data into MongoDB

You can set a different `max_results`, but a single request returns at most 100 tweets.

In [None]:

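# Note: this rebinds `client` to the Twitter client; `db` and `tweet_collection` still point at MongoDB
client = tweepy.Client(BEARER_TOKEN)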

tweets = client.search_recent_tweets(query=query, max_results=100,
                                    expansions=['author_id'], 
                                    tweet_fields = ['created_at','entities','lang','public_metrics','geo'],
                                    user_fields = ['id', 'location','name', 'public_metrics','username'])

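# The last page has no next_token, so use .get() to avoid a KeyError
next_token = tweets.meta.get('next_token')
# includes['users'] lists each author once, so look users up by id instead of zipping
users = {user.id: user for user in tweets.includes.get('users', [])}
for tweet in tweets.data or []:
    tweet_json = {}
    tweet_json['tweet'] = tweet.data
    tweet_json['user'] = users[tweet.author_id].data
    try:
        tweet_collection.insert_one(tweet_json)
        print(tweet_json['tweet']['created_at'])
    except pymongo.errors.DuplicateKeyError:
        pass  # tweet already stored; the unique index rejected it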
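
Pairing tweets with users by position only works if the two lists line up one to one. As far as we understand the API, the expanded user list contains each author once, so it can be shorter than the tweet list when one account posts several matching tweets. The sketch below uses made-up plain dicts in place of a real response to show a lookup by `author_id` next to a positional `zip`.

```python
# Made-up payload shaped like a search response: three tweets, two distinct authors.
tweets_data = [
    {"id": "1", "author_id": "a", "text": "first"},
    {"id": "2", "author_id": "b", "text": "second"},
    {"id": "3", "author_id": "a", "text": "third"},
]
included_users = [
    {"id": "a", "username": "alice"},
    {"id": "b", "username": "bob"},
]

# Index users by id, then attach the matching user to every tweet.
users_by_id = {u["id"]: u for u in included_users}
documents = [
    {"tweet": t, "user": users_by_id.get(t["author_id"])}
    for t in tweets_data
]

# zip stops at the shorter list, so the third tweet is silently dropped.
zipped = list(zip(included_users, tweets_data))
```

Here `documents` keeps all three tweets with the right author, while `zipped` holds only two pairs.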



Continue fetching earlier tweets with the same query. <span style="color:red">YOU WILL REACH YOUR RATE LIMIT VERY FAST</span>

In [None]:
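for i in range(0):  # range(0) fetches nothing; raise the count to request more pages
    if next_token is None:  # no further pages to fetch
        break
    tweets = client.search_recent_tweets(query=query, max_results=10,
                                        expansions=['author_id'],
                                        tweet_fields = ['created_at','entities','lang','public_metrics','geo'],
                                        user_fields = ['id', 'location','name', 'public_metrics','username'],
                                        next_token=next_token)
    next_token = tweets.meta.get('next_token')
    # Look users up by id; includes['users'] lists each author only once
    users = {user.id: user for user in tweets.includes.get('users', [])}
    for tweet in tweets.data or []:
        tweet_json = {}
        tweet_json['tweet'] = tweet.data
        tweet_json['user'] = users[tweet.author_id].data
        try:
            tweet_collection.insert_one(tweet_json)
            print(tweet_json['tweet']['created_at'])
        except pymongo.errors.DuplicateKeyError:
            pass  # tweet already stored; the unique index rejected it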
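
The token-based paging pattern can be sketched without calling the API at all. In the example below, `collect_pages` and `fake_fetch` are hypothetical names, and the stub pages are made up. The loop stops either when the page budget is used up or when a page comes back without a `next_token`.

```python
def collect_pages(fetch_page, max_pages):
    """Call fetch_page(token) until max_pages is reached or no token remains."""
    items, token = [], None
    for _ in range(max_pages):
        data, meta = fetch_page(token)
        items.extend(data)
        token = meta.get("next_token")   # absent on the last page
        if token is None:
            break
    return items

# Stub standing in for the API: maps a token to (data, meta) for that page.
FAKE_PAGES = {
    None: (["t1", "t2"], {"next_token": "p2"}),
    "p2": (["t3"], {"next_token": "p3"}),
    "p3": (["t4"], {}),
}

def fake_fetch(token):
    return FAKE_PAGES[token]

all_items = collect_pages(fake_fetch, max_pages=10)       # stops when the token runs out
first_two_pages = collect_pages(fake_fetch, max_pages=2)  # stops at the page budget
```

Keeping an explicit page budget is one way to stay within the rate limit while still following the tokens.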

## View the collected tweets

In [None]:
print('Number of collected tweets:',tweet_collection.estimated_document_count())# number of tweets collected

Create a text index and print the Tweets containing specific keywords.

In [None]:
tweet_collection.create_index([("tweet.text", pymongo.TEXT)], name='text_index', default_language='english') # create a text index

Create a cursor that queries the tweets through the text index.

In [None]:
tweet_cursor = tweet_collection.find({"$text": {"$search": "covid"}}) # return tweets that contain covid

In [None]:
for tweet in tweet_cursor:
    print('---')
    print (tweet['tweet']['text'])
    print (tweet['user']['name'])