# REST API

In [10]:
import json
import tweepy
from tweepy import TweepError
import logging
from pathlib import Path
from pprint import pprint

## Accessing Twitter through Code

Twitter provides an API for downloading tweet data in large batches.  The `tweepy` package makes it fairly easy to use.

There are instructions on using `tweepy` [here](http://tweepy.readthedocs.io/en/v3.5.0/getting_started.html), but we will give you example code.

Twitter requires you to have authentication keys to access their API.  To get your keys, you'll have to sign up as a Twitter developer.  The next section will walk you through this process.

## Setting Up an Account

Follow the instructions below to get your Twitter API keys.  **Read the instructions completely before starting.**

1. [Create a Twitter account](https://twitter.com).  You can use an existing account if you have one; if you prefer to not do this assignment under your regular account, feel free to create a throw-away account.
2. Under account settings, add your phone number to the account.
3. [Create a Twitter developer account](https://dev.twitter.com/resources/signup) by clicking the 'Apply' button on the top right of the page. Attach it to your Twitter account. You'll have to fill out a form describing what you want to do with the developer account. Explain that you are doing this for a class at NYU.
4. Once you're logged into your developer account, [create an app for this demo](https://apps.twitter.com/app/new).  You can call it whatever you want, and you can write any URL when it asks for a web site.  You don't need to provide a callback URL.
5. On the page for that application, find your Consumer Key and Consumer Secret.
6. On the same page, create an Access Token.  Record the resulting Access Token and Access Token Secret.
7. Edit the file [keys.json](keys.json) and replace the placeholders with your keys.  


## Caution


### Protect your Twitter Keys
<span style="color:red">
If someone has your authentication keys, they can access your Twitter account and post as you!  So don't give them to anyone, and **don't write them down in this notebook**. 
</span>
The usual way to store sensitive information like this is to put it in a separate file and read it programmatically.  That way, you can share the rest of your code without sharing your keys.  That's why we're asking you to put your keys in `keys.json` for this assignment.
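
As a minimal sketch of that pattern, the helper below reads the keys from a JSON file and fails early if a field is missing or empty. It assumes `keys.json` is a flat JSON object holding the four fields this notebook reads later (`consumer_key`, `consumer_secret`, `access_token`, `access_token_secret`); `load_keys` itself is a hypothetical name and not part of `tweepy`.

```python
import json

# Assumption: these are the four fields stored in keys.json.
REQUIRED_KEYS = ("consumer_key", "consumer_secret",
                 "access_token", "access_token_secret")


def load_keys(path):
    """Read API keys from a JSON file and check that none are missing or empty.

    Hypothetical helper for illustration; not part of tweepy.
    """
    with open(path) as f:
        keys = json.load(f)
    missing = [name for name in REQUIRED_KEYS if not keys.get(name)]
    if missing:
        # Report only the field names, never the secret values themselves.
        raise ValueError(f"{path} is missing values for: {', '.join(missing)}")
    return keys
```

Calling `load_keys("keys.json")` before authenticating turns a forgotten placeholder into a clear error message rather than a confusing authentication failure.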


### Avoid making too many API calls.

<span style="color:red">
Twitter limits developers to a certain rate of requests for data.  If you make too many requests in a short period of time, you'll have to wait a while (around 15 minutes) before you can make more.  </span> 
So carefully follow the code examples you see and don't rerun cells without thinking.  Instead, always save the data you've collected to a file.  We've provided templates to help you do that.
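
The "save first, reuse later" habit can be sketched as a small caching helper. This is only an illustration under stated assumptions: `load_or_fetch` and its `fetch` argument are hypothetical names, and the data is assumed to be JSON-serializable.

```python
import json
from pathlib import Path


def load_or_fetch(save_path, fetch):
    """Return data cached at save_path, calling fetch() only if the file is absent.

    `fetch` is a hypothetical zero-argument function that performs the
    (rate-limited) download and returns JSON-serializable data.
    """
    path = Path(save_path)
    if not path.is_file():
        data = fetch()
        with open(path, "w") as f:
            json.dump(data, f)
    # Always read back from disk so reruns and first runs see identical data.
    with open(path) as f:
        return json.load(f)
```

Rerunning a cell that goes through a helper like this costs no API calls once the file exists; deleting the file is the explicit way to force a fresh download.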


### Be careful about which functions you call!

<span style="color:red">
This API can retweet tweets, follow and unfollow people, and modify your Twitter settings.  Be careful which functions you invoke!
</span>


In [14]:
key_file = 'keys.json'
# Loading your keys from keys.json 
with open(key_file) as f:
    keys = json.load(f)
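
If you want to confirm that the file loaded correctly without exposing the secrets, one option is to display a masked copy. The helpers below are a hypothetical sketch (neither `mask_secret` nor `masked_keys` comes from `tweepy`), and they assume every value in `keys` can be converted to a string.

```python
def mask_secret(value, visible=4):
    """Hide all but the last `visible` characters of a secret string."""
    if len(value) <= visible:
        # Very short values are hidden completely.
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


def masked_keys(keys):
    """Return a copy of the keys dict that is safe to display in a notebook."""
    return {name: mask_secret(str(value)) for name, value in keys.items()}
```

For example, `pprint(masked_keys(keys))` shows which fields are present and how long each value is, while revealing only the last four characters of each.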

<span style="color:red">
If you print or view the contents of keys, be sure to delete the cell!

Remember to add `keys.json` to `.gitignore` so that your keys are never committed.
</span>

This cell tests the Twitter authentication. It should run without errors or warnings and display your Twitter username.

In [16]:
try:
    auth = tweepy.OAuthHandler(keys["consumer_key"], keys["consumer_secret"])
    auth.set_access_token(keys["access_token"], keys["access_token_secret"])
    api = tweepy.API(auth, wait_on_rate_limit=True)
    print("Your username is:", api.auth.get_username())
except TweepError as e:
    logging.warning("There was a Tweepy error. Double check your API keys and try again.")
    logging.warning(e)

Your username is: CdsAccount


## Downloading Tweets

The example below downloads recent tweets by @NYUDataScience and saves them to a file, so that later runs reload the file instead of querying Twitter again.  Run it and read the code.

In [17]:
num_tweets_to_download = 1000
max_queries_to_make = 100

ds_tweets_save_path = "NYU_CDS_recent_tweets.json"
# Guarding against attempts to download the data multiple
# times:
if not Path(ds_tweets_save_path).is_file():
    # Getting as many recent tweets by @NYUDataScience as Twitter will let us have.
    # We use tweet_mode='extended' so that Twitter gives us full 280-character tweets.
    
    # Note that tweepy.Cursor() does not guarantee how many items it returns,
    # so we need to keep iterating until we have as many tweets as we want.
    
    example_tweets = []
    query_count = 0
    min_id_reached = None
    
    while len(example_tweets) < num_tweets_to_download and query_count < max_queries_to_make:
        # The tweepy Cursor API actually returns "sophisticated" Status objects but we 
        # will use the basic Python dictionaries stored in the _json field. 
        batch = [t._json for t in tweepy.Cursor(api.user_timeline, id="NYUDataScience", 
                                                count=num_tweets_to_download,
                                                max_id=min_id_reached,
                                                tweet_mode='extended').items()]
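        if len(batch) == 0:
            # No more tweets are available; stop here instead of looping forever.
            break
        
        # max_id is inclusive, so subtract 1 to avoid re-downloading the oldest tweet.
        min_id_reached = min(t['id'] for t in batch) - 1
        example_tweets.extend(batch)
        query_count += 1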
    
    # Saving the tweets to a json file on disk for future analysis
    with open(ds_tweets_save_path, "w") as f:        
        json.dump(example_tweets, f)

# Re-loading the json file:
with open(ds_tweets_save_path, "r") as f:
    example_tweets = json.load(f)
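
The paging logic in the cell above can be tried without touching the network. The sketch below repeats the same idea (ask for items no newer than `max_id`, then move `max_id` just below the oldest item received) against a fake timeline. Everything here is hypothetical scaffolding: `fetch_all`, `make_fake_timeline`, and `fetch_page` are illustrative names, and the fake assumes the inclusive `max_id` behaviour of Twitter's timeline endpoints.

```python
def fetch_all(fetch_page, max_items, max_queries):
    """Collect items newest-first using max_id pagination.

    fetch_page(max_id) is a hypothetical callable returning a list of dicts
    with an integer 'id'; max_id=None means "start from the newest item".
    """
    collected = []
    max_id = None
    for _ in range(max_queries):
        if len(collected) >= max_items:
            break
        batch = fetch_page(max_id)
        if not batch:
            # Nothing older is available, so stop querying.
            break
        collected.extend(batch)
        # max_id is inclusive: step just below the oldest id seen so far.
        max_id = min(item["id"] for item in batch) - 1
    return collected[:max_items]


def make_fake_timeline(ids, page_size):
    """Build a stand-in for an API call that serves ids in pages, newest first."""
    ordered = sorted(ids, reverse=True)

    def fetch_page(max_id):
        eligible = [i for i in ordered if max_id is None or i <= max_id]
        return [{"id": i} for i in eligible[:page_size]]

    return fetch_page


fake_page = make_fake_timeline(range(1, 11), page_size=3)
all_items = fetch_all(fake_page, max_items=100, max_queries=10)
```

Because the fake timeline holds only ten items, the loop ends on the first empty page rather than using all ten queries, and no item appears twice.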

Assuming everything ran correctly, you should be able to look at the first tweet by running the cell below.

<span style="color:red">
**Warning** Do not attempt to view all the tweets in a notebook.  It will likely freeze your browser.  The following would be a **bad idea**:
```python
    pprint(example_tweets)
```

</span> 
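
A safer alternative to printing everything is a short, truncated preview. The helper below is a hypothetical sketch (`preview_tweets` is not part of `tweepy`); it assumes each tweet is a dict with the `id_str` and `full_text` fields that `tweet_mode='extended'` provides, and it uses `html.unescape` because Twitter returns characters such as `&` in escaped form.

```python
import html


def preview_tweets(tweets, n=3, width=80):
    """Return one short line per tweet for the first n tweets."""
    lines = []
    for tweet in tweets[:n]:
        # Fall back to 'text' in case the tweet was not fetched in extended mode.
        text = html.unescape(tweet.get("full_text", tweet.get("text", "")))
        text = " ".join(text.split())  # collapse newlines and repeated spaces
        if len(text) > width:
            text = text[:width - 3] + "..."
        lines.append(f"{tweet.get('id_str', '?')}: {text}")
    return lines
```

For example, `print("\n".join(preview_tweets(example_tweets)))` shows at most three lines no matter how many tweets were downloaded.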

In [18]:
# Looking at one tweet. After the JSON round trip it is a plain dict, not a Status object.
# pprint gives a more easily readable view.
pprint(example_tweets[0])

{'contributors': None,
 'coordinates': None,
 'created_at': 'Thu Dec 12 17:00:00 +0000 2019',
 'display_text_range': [0, 280],
 'entities': {'hashtags': [], 'symbols': [], 'urls': [], 'user_mentions': []},
 'favorite_count': 2,
 'favorited': False,
 'full_text': 'Have you listened to CDS’s Admissions Podcast yet? In Episode '
              '2, MS alumna &amp; 2nd year PhD student Katrina Evtimova '
              'discusses mastering the PhD application, life in New York, '
              '&amp; getting set up for research at CDS. Listen here and find '
              'us on your streaming service of choice:',
 'geo': None,
 'id': 1205170447995699200,
 'id_str': '1205170447995699200',
 'in_reply_to_screen_name': None,
 'in_reply_to_status_id': None,
 'in_reply_to_status_id_str': None,
 'in_reply_to_user_id': None,
 'in_reply_to_user_id_str': None,
 'is_quote_status': False,
 'lang': 'en',
 'place': None,
 'retweet_count': 1,
 'retweeted': False,
 'source': '<a href="https://ads-api.twit