# Search for tweets

#### — _Using Twitter API_ —

### ⚠️ This notebook could not retrieve historic tweets.

Trudeau's announcement was made on March 27th, so I wanted to collect tweets from March 1st to April 30th, to compare messages from before and after the announcement and see how much public sentiment changed over this period.

However, the standard Twitter developer account only gives access to a search index with a 7-day limit, which means that I could not retrieve tweets older than one week.
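To make the limitation concrete, here is a minimal sketch that checks whether a date falls inside the 7-day search window. The helper name `is_searchable` and the exact boundary handling are assumptions made for illustration; they are not part of the Twitter API.

```python
from datetime import date, timedelta

SEARCH_WINDOW_DAYS = 7  # depth of the standard search index described above


def is_searchable(target, today):
    """Return True if `target` lies within the last 7 days, counted from `today`.

    Hypothetical helper: the inclusive boundaries are an assumption.
    """
    oldest = today - timedelta(days=SEARCH_WINDOW_DAYS)
    return oldest <= target <= today


run_date = date(2020, 8, 28)  # the run date that appears in this notebook's outputs
print(is_searchable(date(2020, 3, 27), run_date))  # False: the announcement date is out of reach
print(is_searchable(date(2020, 8, 25), run_date))  # True
```

Any date in the March 1st to April 30th range fails this check when the notebook is run in late August, which is why the standard search endpoint could not serve the project.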

For this reason, this notebook, which uses the standard Twitter developer account, does not serve the objective of our project. It can still be used as a reference; to run it you will need a Twitter developer account.

Instead, we invite you to look at the notebook **Canada_Twitter.ipynb**, located in the [same _src_ folder](https://github.com/vcuspinera/Canada_response_covid/blob/master/src), where we successfully downloaded the historic tweets used for this project.

_Author: [vcuspinera](https://github.com/vcuspinera)._

### Import libraries

In [1]:
import numpy as np
import pandas as pd
import twitter
import time
import collections
from datetime import datetime, timedelta
from pytz import timezone

### Load your Twitter keys

In [2]:
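# Run the config script with IPython's shell capture (`!`);
# its printed output is stored in `keys`, one key per line
keys = ! ../keys/twitter_config.py

# Create the API client with the four keys, in the order the script prints them
api = twitter.Api(consumer_key = keys[0],
                  consumer_secret = keys[1],
                  access_token_key = keys[2],
                  access_token_secret = keys[3],
                  sleep_on_rate_limit=True  # wait instead of failing when a rate limit is hit
                 )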

### Parameters

In [3]:
# twitter accounts
accounts = ("CanadianPM", "Canada", "OpenGovCan", "GovCanHealth", "JustinTrudeau")

# dates
# Format YYYY-MM-DD. The search index has a 7-day limit, so no tweets
# will be found for a date older than one week.
today = datetime.now()  # local time; pass timezone('US/Pacific') to fix the zone
dates = list()
for d in range(7, -2, -1):
    aux = today - timedelta(days=d)
    dates.append(aux.strftime("%Y-%m-%d"))

today = today.strftime("%Y-%m-%d")

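# tweets and rate limit
num_tweets = 100  # tweets requested per search (`count`)
max_tweets = 180  # search requests allowed per window (user auth)
time_epoc = 900   # length of the rate-limit window: 15 minutes = 900 seconds
time_waiting = time_epoc / max_tweets  # seconds to wait between requests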
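The date list above runs from seven days ago through tomorrow, because each day is later paired with the following date as its `until` bound. A minimal sketch of that logic with a fixed run date, so the result is reproducible; the function name `date_window` is hypothetical:

```python
from datetime import date, timedelta


def date_window(today, days_back=7):
    """Dates from `days_back` days ago through tomorrow, as YYYY-MM-DD strings."""
    return [(today - timedelta(days=d)).strftime("%Y-%m-%d")
            for d in range(days_back, -2, -1)]


window = date_window(date(2020, 8, 28))

# Pair each day with the next one, which serves as its `until` bound.
pairs = list(zip(window[:-1], window[1:]))
print(pairs[0])   # ('2020-08-21', '2020-08-22')
print(pairs[-1])  # ('2020-08-28', '2020-08-29')
```

Passing the date in as an argument, rather than calling `datetime.now()` inside the function, makes the window easy to test.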

### Restrictions
The standard API rate limits for GET (read) endpoints, described in the [Twitter developer documentation](https://developer.twitter.com/en/docs/tweets/search/api-reference/get-search-tweets), are shown in the table below.  

Note that endpoints not listed in the chart default to 15 requests per allotted user. All request windows are *15 minutes in length*.  These rate limits apply to the standard API endpoints only.

| Endpoint | Resource family | Requests / window (user auth) | Requests / window (app auth) |
|---|:--:|:---:|:---:|
|GET search/tweets | search | 180 | 450 |

Also, the search index has a 7-day limit, which means that no tweets will be found for a date older than one week.
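The pacing implied by the table can be worked out directly. The sketch below spreads the allowed requests evenly over the 15-minute window; the function names are hypothetical, and the five-accounts-by-eight-days figure is only an example.

```python
WINDOW_SECONDS = 15 * 60  # every rate-limit window lasts 15 minutes


def seconds_between_requests(requests_per_window):
    """Delay that spreads the allowed requests evenly over one window."""
    return WINDOW_SECONDS / requests_per_window


def estimated_run_seconds(n_accounts, n_days, requests_per_window):
    """Approximate time spent sleeping when one request is sent per account and day."""
    return n_accounts * n_days * seconds_between_requests(requests_per_window)


print(seconds_between_requests(180))     # 5.0 seconds with user auth
print(seconds_between_requests(450))     # 2.0 seconds with app auth
print(estimated_run_seconds(5, 8, 180))  # 200.0 seconds for 40 requests
```

Sleeping a fixed amount after each request is conservative: it never exceeds the limit, though it also waits when the quota is far from exhausted.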

### Retrieve tweets

In [4]:
# dictionary of dictionaries
results = collections.defaultdict(dict)
print("--- Retrieving tweets ---")
print("\nDATE       : ACCOUNT")
print("---------- : -------")
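for ac in accounts:
    # Each date is paired with the following one, used as the `until` bound
    for d in range(len(dates) - 1):
        print(dates[d] + " : " + ac)
        # Note: `until` returns tweets created before that date, so the
        # results are not restricted to tweets posted on dates[d] itself
        results[ac][dates[d]] = api.GetSearch(
            raw_query = "q=" + ac + 
                        "%20&until=" + dates[d+1] +
                        "&count=" + str(num_tweets)
        )
        time.sleep(time_waiting)  # stay under the rate limit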
print("\n--- Completed ---")

--- Retrieving tweets ---

DATE       : ACCOUNT
---------- : -------
2020-08-21 : CanadianPM
2020-08-22 : CanadianPM
2020-08-23 : CanadianPM
2020-08-24 : CanadianPM
2020-08-25 : CanadianPM
2020-08-26 : CanadianPM
2020-08-27 : CanadianPM
2020-08-28 : CanadianPM
2020-08-21 : Canada
2020-08-22 : Canada
2020-08-23 : Canada
2020-08-24 : Canada
2020-08-25 : Canada
2020-08-26 : Canada
2020-08-27 : Canada
2020-08-28 : Canada
2020-08-21 : OpenGovCan
2020-08-22 : OpenGovCan
2020-08-23 : OpenGovCan
2020-08-24 : OpenGovCan
2020-08-25 : OpenGovCan
2020-08-26 : OpenGovCan
2020-08-27 : OpenGovCan
2020-08-28 : OpenGovCan
2020-08-21 : GovCanHealth
2020-08-22 : GovCanHealth
2020-08-23 : GovCanHealth
2020-08-24 : GovCanHealth
2020-08-25 : GovCanHealth
2020-08-26 : GovCanHealth
2020-08-27 : GovCanHealth
2020-08-28 : GovCanHealth
2020-08-21 : JustinTrudeau
2020-08-22 : JustinTrudeau
2020-08-23 : JustinTrudeau
2020-08-24 : JustinTrudeau
2020-08-25 : JustinTrudeau
2020-08-26 : JustinTrudeau
2020-08-27 : Just

In [5]:
print("accounts:", accounts)
print("dates:", dates[:-1])

accounts: ('CanadianPM', 'Canada', 'OpenGovCan', 'GovCanHealth', 'JustinTrudeau')
dates: ['2020-08-21', '2020-08-22', '2020-08-23', '2020-08-24', '2020-08-25', '2020-08-26', '2020-08-27', '2020-08-28']
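The retrieval cell assembles its `raw_query` by string concatenation. As an alternative sketch, the same parameters can be encoded with the standard library's `urllib.parse.urlencode`, which also escapes special characters. The helper name `build_search_query` is hypothetical, and unlike the original string it does not append a trailing `%20` to the search term.

```python
from urllib.parse import urlencode


def build_search_query(account, until, count=100):
    """Build a query string with the q, until and count parameters."""
    params = {"q": account, "until": until, "count": count}
    return urlencode(params)


query = build_search_query("CanadianPM", "2020-08-22")
print(query)  # q=CanadianPM&until=2020-08-22&count=100
```

The resulting string could then be passed as `raw_query` to `api.GetSearch`, as in the retrieval cell.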


In [6]:
## Uncomment and run this cell to see the retrieved results
# results

#### Example:

In [7]:
# First tweet retrieved for the 'CanadianPM' account on the last day of the period.
print('Account: CanadianPM')
print('Date:', today)
results['CanadianPM'][today][0]

Account: CanadianPM
Date: 2020-08-28


Status(ID=1299319363930148865, ScreenName=durike, Created=Fri Aug 28 12:14:11 +0000 2020, Text='RT @ChukwudubemIgb1: @MaziNnamdiKanu https://t.co/GPVzTPEzuP @netanyahu @PMOIndia @JapanGov @CanadianPM @BrazilGovNews @BorisJohnson_MP @MF…')
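For analysis, the nested `results` dictionary (account, then date, then a list of statuses) is easier to handle as flat rows. The sketch below shows one way to do that. It uses a stand-in `FakeStatus` tuple so that it runs without API access, and it assumes the real status objects expose `id`, `created_at` and `text` attributes; the sample text is a placeholder, not a retrieved tweet.

```python
from collections import namedtuple

# Stand-in for a status object, so the sketch runs without API access.
FakeStatus = namedtuple("FakeStatus", ["id", "created_at", "text"])


def flatten_results(results):
    """Turn {account: {date: [status, ...]}} into a list of row dictionaries."""
    rows = []
    for account, by_date in results.items():
        for day, statuses in by_date.items():
            for status in statuses:
                rows.append({
                    "account": account,
                    "query_date": day,
                    "tweet_id": status.id,
                    "created_at": status.created_at,
                    "text": status.text,
                })
    return rows


sample = {
    "CanadianPM": {
        "2020-08-28": [
            FakeStatus(1299319363930148865,
                       "Fri Aug 28 12:14:11 +0000 2020",
                       "placeholder text"),
        ]
    }
}
rows = flatten_results(sample)
print(rows[0]["account"], rows[0]["tweet_id"])
```

A list of dictionaries like `rows` can be passed straight to `pd.DataFrame(rows)` if a table is preferred.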

### Load dictionaries (if they exist) and save tweets

In [8]:
# Load
#  - retrieve the saved dictionaries if they exist,
#  - if they don't exist, start from empty ones,
#  - finally add the tweets from the last 7 days.
results_all = collections.defaultdict(dict)
for ac in accounts:
    try:
        results_all[ac] = np.load('../tweets/non-historic/tweets_' + ac + '.npy',
                                  allow_pickle=True).item()
    except FileNotFoundError:
        pass  # no saved file yet for this account
    for d in range(len(dates) - 1):
        results_all[ac][dates[d]] = results[ac][dates[d]]

In [9]:
## Uncomment and run this cell to see all the retrieved results over time
# results_all

In [10]:
# Save
for ac in accounts:
    np.save('../tweets/non-historic/tweets_' + ac + '.npy', results_all[ac])
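Saving with `np.save` pickles the status objects, so the files can only be read back with `allow_pickle=True`. As an alternative sketch of the same load, merge and save pattern, the archive could be kept as JSON. This is not what the notebook does. The helper `merge_and_save` and the file name are hypothetical, and the sketch assumes the tweets have first been converted to plain dictionaries.

```python
import json
import os
import tempfile


def merge_and_save(path, new_results):
    """Merge {date: [tweet dicts]} into the JSON archive at `path` and save it."""
    try:
        with open(path, encoding="utf-8") as f:
            archive = json.load(f)
    except FileNotFoundError:
        archive = {}  # first run: nothing saved yet
    archive.update(new_results)  # newer data replaces entries for the same date
    with open(path, "w", encoding="utf-8") as f:
        json.dump(archive, f, ensure_ascii=False, indent=2)
    return archive


# Usage in a temporary directory, with placeholder tweets
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "tweets_Canada.json")
    merge_and_save(path, {"2020-08-27": [{"id": 1, "text": "first"}]})
    archive = merge_and_save(path, {"2020-08-28": [{"id": 2, "text": "second"}]})
    print(sorted(archive))  # ['2020-08-27', '2020-08-28']
```

JSON files stay readable without NumPy or the Twitter library, at the cost of an explicit conversion step for each status.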

In [11]:
for ac in accounts:
    print(results_all[ac].keys())

dict_keys(['2020-07-22', '2020-07-23', '2020-07-24', '2020-07-25', '2020-07-26', '2020-07-27', '2020-07-28', '2020-07-29', '2020-07-30', '2020-07-31', '2020-08-01', '2020-08-02', '2020-08-03', '2020-08-04', '2020-08-05', '2020-08-06', '2020-08-07', '2020-08-08', '2020-08-09', '2020-08-10', '2020-08-11', '2020-08-12', '2020-08-13', '2020-08-15', '2020-08-16', '2020-08-17', '2020-08-18', '2020-08-19', '2020-08-20', '2020-08-21', '2020-08-22', '2020-08-14', '2020-08-23', '2020-08-24', '2020-08-25', '2020-08-26', '2020-08-27', '2020-08-28'])
dict_keys(['2020-07-22', '2020-07-23', '2020-07-24', '2020-07-25', '2020-07-26', '2020-07-27', '2020-07-28', '2020-07-29', '2020-07-30', '2020-07-31', '2020-08-01', '2020-08-02', '2020-08-03', '2020-08-04', '2020-08-05', '2020-08-06', '2020-08-07', '2020-08-08', '2020-08-09', '2020-08-10', '2020-08-11', '2020-08-12', '2020-08-13', '2020-08-15', '2020-08-16', '2020-08-17', '2020-08-18', '2020-08-19', '2020-08-20', '2020-08-21', '2020-08-22', '2020-08-14