# Easily Get TWEETS about a TOPIC on Twitter    
In this video we will see how to search Twitter for tweets related to COVID-19 and collect them in a pandas DataFrame.

```python
# Useful modules
import tweepy as tw
import pandas as pd
import json
```

### 1- Load Credentials For Authentication      

To access the Twitter API, you need Twitter developer credentials. I made a separate video explaining how to get them (link: https://www.youtube.com/watch?v=PqqXjwoDQiY).   
To keep those credentials private, I stored them in a JSON file rather than writing them directly in the notebook. The following function loads them.

```python
def load_credentials(path_to_file):
    """Load the Twitter API credentials stored in a JSON file."""
    with open(path_to_file) as json_file:
        credentials = json.load(json_file)
    return credentials
```
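The notebook does not show the credentials file itself. Judging from the four keys that `get_API` reads, it presumably contains a JSON object like the placeholder one in this sketch. The validating loader `load_and_check_credentials` and the `REQUIRED_KEYS` tuple are hypothetical additions, not part of the original notebook; they simply fail early with a clear message when a key is missing.

```python
import json
import os
import tempfile

# Keys read by get_API(); assumed here to be the full content of credentials.json
REQUIRED_KEYS = ("consumer_key", "consumer_secret", "access_token", "access_token_secret")


def load_and_check_credentials(path_to_file):
    """Hypothetical variant of load_credentials that also checks for missing keys."""
    with open(path_to_file) as json_file:
        credentials = json.load(json_file)
    missing = [key for key in REQUIRED_KEYS if key not in credentials]
    if missing:
        raise KeyError(f"Missing credential keys: {', '.join(missing)}")
    return credentials


# Demo with placeholder values in a temporary file (never commit real keys)
placeholder = {key: "YOUR_" + key.upper() for key in REQUIRED_KEYS}
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tmp:
    json.dump(placeholder, tmp)

creds = load_and_check_credentials(tmp.name)
print(sorted(creds))
os.remove(tmp.name)
```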

### Access the API   
Once the credentials are loaded, we can authenticate and create the API object.

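```python
def get_API(credentials):
    # OAuth 1.0a user authentication with the four keys from the JSON file
    auth = tw.OAuthHandler(credentials['consumer_key'], credentials['consumer_secret'])
    auth.set_access_token(credentials['access_token'], credentials['access_token_secret'])

    # wait_on_rate_limit=True: sleep until the rate-limit window resets instead of failing
    return tw.API(auth, wait_on_rate_limit=True)
```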
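With `wait_on_rate_limit=True`, Tweepy pauses until the current rate-limit window reopens. Twitter reports the moment of reset as a Unix timestamp in the `x-rate-limit-reset` response header, and the arithmetic of such a wait can be sketched as below. `seconds_until_reset` and its one-second safety margin are illustrative assumptions, not Tweepy's internal code.

```python
import time


def seconds_until_reset(reset_epoch, now=None, margin=1):
    """Hypothetical helper: seconds to sleep before a rate-limit window reopens.

    reset_epoch: Unix timestamp at which the window resets
                 (as reported in the x-rate-limit-reset header).
    margin: extra seconds of safety buffer; the value 1 is an arbitrary choice.
    """
    if now is None:
        now = time.time()
    remaining = reset_epoch - now
    # Reset time already passed: no need to wait at all
    if remaining <= 0:
        return 0
    return remaining + margin


# A 15-minute window that resets 900 seconds from "now"
print(seconds_until_reset(1_000_900, now=1_000_000))  # 901
```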

```python
# The loader defined above is load_credentials (not get_credentials)
credentials = load_credentials('./data/credentials.json')
api = get_API(credentials)
```

### 2- Search Tweets     
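With the API object ready, we can search for tweets about a topic. The function below takes the search term, a start date (`date_since`) and the maximum number of tweets to retrieve (`limit`). Keep in mind that Twitter's standard search API only covers roughly the last seven days of tweets, so an older `date_since` does not give access to older tweets.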

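```python
def search_tweets(search_words, date_since, limit=20):
    # Uses the global `api` object created above.
    # Note: Tweepy 4.x renamed api.search to api.search_tweets.
    tweets_cursor = tw.Cursor(api.search,
                              q=search_words,
                              lang="en",
                              since=date_since).items(limit)

    return tweets_cursor
```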

Let's say we want tweets containing the term #covid19 since October 1st, 2020.

```python
# Define the search term and the date_since date as variables
search_words = "#covid19"
date_since = "2020-10-01"  # October 1st, 2020
```
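Because `date_since` travels as a separate argument, a malformed date can go unnoticed. One alternative, sketched here under the assumption that you prefer to keep everything in the query string, is to rely on Twitter's standard search operators: `since:YYYY-MM-DD` sets the start date and the widely used `-filter:retweets` excludes retweets, which are what produce the repeated texts in the output further down. `build_query` is a hypothetical helper; it only builds and validates the string and does not call the API.

```python
from datetime import datetime


def build_query(search_words, date_since=None, exclude_retweets=True):
    """Hypothetical helper: combine a search term with standard search operators."""
    parts = [search_words]
    if date_since is not None:
        # Raises ValueError on a malformed date instead of silently misbehaving
        datetime.strptime(date_since, "%Y-%m-%d")
        parts.append(f"since:{date_since}")
    if exclude_retweets:
        parts.append("-filter:retweets")
    return " ".join(parts)


print(build_query("#covid19", "2020-10-01"))
# #covid19 since:2020-10-01 -filter:retweets
```

The resulting string could then be passed as `q`, dropping the separate `since` argument.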

In the function above, we used `.Cursor()` to search Twitter for tweets containing the term #covid19. The number of tweets returned is capped by the number passed to the `.items()` method: `search_tweets` passes `limit` (20 by default), and `.items(15)`, for instance, would return at most 15 tweets.
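Behind `.items(limit)`, the cursor requests results page by page and stops once `limit` items have been yielded. The sketch below imitates that idea with a fake page-fetching function. `iterate_items`, `fake_fetch_page` and the `max_id`-style paging (ask for IDs up to one less than the oldest ID already seen) are simplifying assumptions for illustration, not Tweepy's actual implementation.

```python
from itertools import islice

# Fake stream of tweet IDs, newest first (an assumption for the demo)
ALL_TWEET_IDS = list(range(150, 100, -1))  # 150, 149, ..., 101


def fake_fetch_page(max_id=None, count=15):
    """Pretend API call: return up to `count` IDs that are <= max_id, newest first."""
    eligible = [t for t in ALL_TWEET_IDS if max_id is None or t <= max_id]
    return eligible[:count]


def iterate_items(fetch_page):
    """Yield items page by page, moving max_id below the oldest item already seen."""
    max_id = None
    while True:
        page = fetch_page(max_id=max_id)
        if not page:
            return  # no more results
        yield from page
        max_id = page[-1] - 1


# Similar in spirit to .items(20): stop after 20 items, fetching only the pages needed
first_20 = list(islice(iterate_items(fake_fetch_page), 20))
print(len(first_20), first_20[0], first_20[-1])  # 20 150 131
```

Because the generator is lazy, only two fake pages are requested to produce the first 20 items.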

```python
covid_cursor = search_tweets(search_words, date_since)
```

The cursor returned by `.Cursor(...).items()` is an iterator that you can loop over to access the collected data. Each item is a tweet object with various attributes that give information about the tweet, including:

- the text of the tweet  
- the person who posted the tweet  
- the date the tweet was posted   
- and more.    
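To experiment with these attributes without calling the API, a fake tweet built with `types.SimpleNamespace` can stand in for a real one. The attribute names below (`text`, `created_at`, `user.name`, `user.screen_name`, `favorite_count`, `retweet_count`) are the ones this notebook reads from real tweets; the values are made up, and real Tweepy status objects carry many more fields.

```python
from datetime import datetime
from types import SimpleNamespace

# A made-up tweet mimicking the attributes used in this notebook
fake_tweet = SimpleNamespace(
    text="Example tweet about #covid19",
    created_at=datetime(2020, 10, 19, 8, 57, 16),
    user=SimpleNamespace(name="Example Author", screen_name="example_handle"),
    favorite_count=3,
    retweet_count=1,
)

# Nested attributes are reached with dot notation, as with a real status object
print(fake_tweet.text)               # the text of the tweet
print(fake_tweet.user.screen_name)   # the person who posted it
print(fake_tweet.created_at.date())  # the date it was posted
```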

```python
# Print each tweet's text, numbered from 1
for i, tweet in enumerate(covid_cursor, start=1):
    print(f"Tweet N° {i}: {tweet.text}\n")
```

```
Tweet N° 1: RT @newschambers: Government expected to introduce "Level 4+" restrictions today.

Three steps to the day:

▶️ Cabinet Sub-Committee meetin…

Tweet N° 2: RT @drsfaizanahmad: Meet my #COVID duty colleague Dr Arup Senapati an ENT surgeon at Silchar medical college Assam .
Dancing infront of COV…

Tweet N° 3: RT @drsfaizanahmad: Meet my #COVID duty colleague Dr Arup Senapati an ENT surgeon at Silchar medical college Assam .
Dancing infront of COV…

Tweet N° 4: RT @vijay27anand: From Whatsapp forward. Wake up people #COVID19 is not over yet, many countries and districts in India are in starting sec…

Tweet N° 5: RT @ShashiTharoor: Brilliantly appropriate #covid19-themed Durga Puja creativity from Kolkata, with the goddess slaying the virus! Salutati…

Tweet N° 6: RT @ShashiTharoor: Brilliantly appropriate #covid19-themed Durga Puja creativity from Kolkata, with the goddess slaying the virus! Salutati…

Tweet N° 7: #BREAKING: Global #COVID19 cases surpass 40 mln: Johns Hopkins U
```
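Python iterators, including the cursor returned by `.items()`, can only be consumed once: after a full pass, looping again yields nothing. A plain generator shows the behaviour; `fake_cursor` is a hypothetical stand-in for the Tweepy cursor.

```python
def fake_cursor(texts):
    """Hypothetical stand-in for a Tweepy cursor: yields each item once."""
    for text in texts:
        yield text


cursor = fake_cursor(["tweet A", "tweet B", "tweet C"])

first_pass = [t for t in cursor]   # consumes the iterator
second_pass = [t for t in cursor]  # nothing left to yield

print(first_pass)   # ['tweet A', 'tweet B', 'tweet C']
print(second_pass)  # []
```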

### Transform the Tweets into a Pandas DataFrame

```python
def create_df_from_cursor(tweet_cursor):
    """Create a pandas DataFrame from a Tweepy cursor."""
    all_tweets = []

    for tweet in tweet_cursor:
        # Keep only the fields we need from each tweet
        parsed_tweet = {}
        parsed_tweet['date'] = tweet.created_at
        parsed_tweet['author'] = tweet.user.name
        parsed_tweet['twitter_name'] = tweet.user.screen_name
        parsed_tweet['text'] = tweet.text
        parsed_tweet['number_of_likes'] = tweet.favorite_count
        parsed_tweet['number_of_retweets'] = tweet.retweet_count

        all_tweets.append(parsed_tweet)

    tweets_df = pd.DataFrame(all_tweets)

    # Remove duplicates: several users retweeting the same tweet produce identical text
    tweets_df = tweets_df.drop_duplicates(subset="text", keep='first')

    return tweets_df
```
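The duplicate removal keeps the first row for each distinct `text` and drops later repeats. The same keep-first logic can be written in plain Python, shown here as a hypothetical `dedupe_by_text` helper working on the parsed dictionaries before they reach pandas. It returns the surviving rows together with their original positions, mirroring the index that `drop_duplicates` leaves untouched.

```python
def dedupe_by_text(rows):
    """Keep the first row for each distinct 'text' value (like keep='first').

    Returns (original_index, row) pairs so the gaps left by dropped rows stay
    visible, as they do in the pandas index after drop_duplicates.
    """
    seen = set()
    kept = []
    for index, row in enumerate(rows):
        if row["text"] in seen:
            continue  # a later duplicate: drop it
        seen.add(row["text"])
        kept.append((index, row))
    return kept


rows = [
    {"twitter_name": "user_a", "text": "RT @x: same message"},
    {"twitter_name": "user_b", "text": "RT @x: same message"},
    {"twitter_name": "user_c", "text": "an original tweet"},
]

for index, row in dedupe_by_text(rows):
    print(index, row["twitter_name"])
# 0 user_a
# 2 user_c
```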

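### 3- Final Step: Putting It All Together

Printing the tweets above consumed `covid_cursor`, so we call `search_tweets` again to get a fresh cursor before building the DataFrame.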

```python
covid_cursor = search_tweets(search_words, date_since)
```

```python
df = create_df_from_cursor(covid_cursor)
```

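```python
df.shape
```

```
(10, 6)
```

The DataFrame has 6 columns and 10 rows, the tweets that remain once duplicate texts are removed.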

```python
df.head()
```

| | date | author | twitter_name | text | number_of_likes | number_of_retweets |
|---|---|---|---|---|---|---|
| 0 | 2020-10-19 08:57:16 | Mooses Me Grapely | culturalfatwa | RT @newschambers: Government expected to intro... | 0 | 25 |
| 1 | 2020-10-19 08:57:16 | Chandan Pradhan | CPradhan07 | RT @drsfaizanahmad: Meet my #COVID duty collea... | 0 | 6215 |
| 3 | 2020-10-19 08:57:15 | premin shijo | premin_shijo | RT @vijay27anand: From Whatsapp forward. Wake ... | 0 | 25 |
| 4 | 2020-10-19 08:57:15 | Mayank | simply_mayank | RT @ShashiTharoor: Brilliantly appropriate #co... | 0 | 447 |
| 6 | 2020-10-19 08:57:14 | Global.TV | GlobalTelevsion | #BREAKING: Global #COVID19 cases surpass 40 ml... | 0 | 0 |

The index jumps from 1 to 3 and from 4 to 6 because rows 2 and 5, whose text duplicated rows 1 and 4, were removed by `drop_duplicates`.
