# Introduction to Social Media: Twitter

-----

When looking for data to use for text data processing, one of the more popular data sources is [Twitter][tw]. In this notebook, we introduce the Twitter API and demonstrate how to use it from within a Python program to acquire and process tweets, or Twitter messages. First, we review the mechanisms by which an application authenticates with Twitter. Next, we discuss different techniques for interacting with the Twitter data stream by using the Twitter API. Finally, we construct a tweet sentiment analysis pipeline before applying this pipeline to new tweets from a specific user.

-----
[tw]: https://www.twitter.com

## Table of Contents

[Python and Twitter](#Python-and-Twitter)

[Reading Twitter Data](#Reading-Twitter-Data)

[Obtaining Tweets](#Obtaining-Tweets)

[Searching for Tweets](#Searching-for-Tweets)

[Trending Topics](#Trending-Topics)

[Twitter Text Analysis](#Twitter-Text-Analysis)

- [Blind Testing](#Blind-Testing)
- [Classifying New Tweets](#Classifying-New-Tweets)
-----

Before proceeding with the rest of this notebook, we first include the notebook setup code.

-----

In [1]:
# Standard imports
import numpy as np
import nltk
import pprint

-----

[[Back to TOC]](#Table-of-Contents)

## Python and Twitter

To work with the Twitter API from within a Python program, we need a Python library that wraps the official [Twitter API][twapi]. A number of different Python libraries provide this capability; we will use the [tweepy][tpy] library, which is popular and provides a fairly complete interface.

The full Twitter API is large and robust (and continues to evolve). For this course we will restrict our attention to several basic concepts, namely authenticating to Twitter, searching for Tweets, and digesting the messages.

----
[twapi]: https://dev.twitter.com
[tpy]: http://www.tweepy.org

In [2]:
# Import python twitter library
import tweepy as tw

-----

[[Back to TOC]](#Table-of-Contents)

## Reading Twitter Data

To read Twitter data, you first need to be a registered Twitter user, and you need to create a new _Twitter Application_ in order to obtain credentials for connecting to Twitter and querying the Twitter data. You create (and later manage) Twitter applications by visiting the [Twitter Application Management](https://apps.twitter.com) website.

![Twitter App Sign-in](images/twitter-app-signin.png)

At this point you need to authenticate with Twitter. If you are already logged in to Twitter on your computer (for instance, by using the Twitter website), you should already be authenticated. If you are not authenticated, click the _sign in_ link to be directed to the Twitter sign-in page, where you can enter your credentials (if you do not have Twitter credentials, you will need to obtain a Twitter account to proceed).

![Twitter Sign-in](images/twitter-signin.png)

After you have been authenticated, you will be redirected to the Twitter apps page. If you have never created a Twitter application, you will have nothing listed. To create a new application, press the _Create New App_ button, as shown in the following screenshot.

![Twitter Create App](images/twitter-create.png)

This will open up the Twitter _Create an application_ webpage, where you need to supply some basic information for your Twitter application such as an application name, description, and website.

![Twitter Application details](images/twitter-appdetails.png)

Scroll to the bottom of this webpage, where the **Developer Agreement** is located. Following this agreement is a check box that you should click to signify that you agree to be bound by the agreement (of course, you should read it first to be sure you do _agree_ with it). Next, press the _Create your Twitter application_ button as shown in the following screenshot.

![Twitter Agree](images/twitter-agree.png)

This will create your new application, and provide you with your application webpage, which will be similar to the following screenshot. 

![Twitter App-page](images/twitter-apppage.png)

While you can control a number of application features from this webpage, the most important tasks to complete include:

1. Change your application to _read-only_ in case it is set to read-write.

2. Obtain the application **Consumer Key** and **Consumer Secret**.

3. Obtain your personal **Access Token** and **Access Token Secret**.

You should change your application to read-only to ensure you don't accidentally send data out to Twitter. You change this by selecting the _Permissions_ tab and selecting _Read only_, as shown in the following screenshot. To save this setting, scroll down this webpage and click the _Update Settings_ button at the bottom of the page.

![Twitter Read Only Setting](images/twitter-ro.png)

These credentials can be found by selecting the _Keys and Access Tokens_ tab, and scrolling down appropriately as shown in the following two screenshots.

![Twitter Consumer Application Credentials](images/twitter-consume.png)

![Twitter User Credentials](images/twitter-access.png)

<font color='red'> Warning: Never share these credentials with others or they will be able to fully impersonate you on Twitter! </font>

You can directly copy these credentials into your notebook or, alternatively, save them into a file (for example, by opening a terminal window and using `vim` to create a text file). In the rest of this notebook, I demonstrate this functionality by using my credentials, which I have saved into a file called `twitter.cred`. Your GitHub repository contains an empty copy of this file; the four credentials are stored in it in the following order:

1. Access Token
2. Access Token Secret
3. Consumer Key
4. Consumer Secret

If you are using git for source code management and control, you can tell `git` to ignore changes in the `twitter.cred` file by using the following command in the current module directory:

```bash
git update-index --assume-unchanged notebooks/twitter.cred 
```
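The four-line file layout described above can be parsed with a small helper. The following is a minimal sketch, not part of the original notebook: the function name `read_credentials` is hypothetical, and it assumes one credential per line, with blank lines and lines starting with `#` ignored.

```python
from pathlib import Path


def read_credentials(path):
    """Return the four credentials stored one per line in a text file.

    Hypothetical helper: blank lines and lines starting with '#' are skipped.
    """
    tokens = []
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        # Skip blank lines and comment lines
        if not line or line.startswith('#'):
            continue
        tokens.append(line)

    # The file is expected to hold exactly four values, in the order:
    # Access Token, Access Token Secret, Consumer Key, Consumer Secret
    if len(tokens) != 4:
        raise ValueError(f'expected 4 credentials, found {len(tokens)}')
    return tokens
```

Checking the number of values up front gives a clearer error than an `IndexError` raised later when the tokens are passed to the authentication calls.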

The following Code cell demonstrates how these credentials are read from the file and used to properly authenticate our application with Twitter.

-----

In [3]:
tokens = []

# Order: Access Token, Access Token Secret, Consumer Key, Consumer Secret

with open("twitter.cred", 'r') as fin:
    for line in fin:
        if line[0] != '#': # Not a comment line
            tokens.append(line.rstrip('\n'))

auth = tw.OAuthHandler(tokens[2], tokens[3])
auth.set_access_token(tokens[0], tokens[1])

api = tw.API(auth)

user = api.me()

print("Twitter Screen Name: ", user.screen_name)
print("Twitter Follower Count: ", user.followers_count)

Twitter Screen Name:  ProfBrunner
Twitter Follower Count:  135


-----

If the previous Code cell runs without an error, you have successfully connected to Twitter. If you are new to Twitter and are not following anyone, you can instead display the user information for a different Twitter user. For example, the following code would display my Twitter information.

```python
user = api.get_user('ProfBrunner')
```

Replacing `ProfBrunner` with any valid Twitter user id will display their information. You can find examples by looking at those Twitter users you (or `ProfBrunner`) follow. Alternatively, you could choose a specific Twitter account; for example, to analyze the _NY Times_ Twitter account you would use the following statement:

```python
user = api.get_user('NYTimes')
```

This is demonstrated in the following Code cell.

-----

In [4]:
user = api.get_user('nytimes')

print("Twitter Screen Name: ", user.screen_name)
print("Twitter Follower Count: ", user.followers_count)

print("\nThis user follows:\n--------------")
for friend in user.friends():
    print(friend.screen_name)

Twitter Screen Name:  nytimes
Twitter Follower Count:  41141220

This user follows:
--------------
TinaJordanNYT
ByMattStevens
NellieBowles
Jonesieman
jessicabennett
Pfraboni
RBlumenstein
TheSteinLine
malika_andrews
taffyakner
jialynnyang
johannabarr
kevinmdraper
itsjina
Jan_Ransom
mega2e
bencasselman
jimtankersley
Tmgneff
deborah_solomon


-----

At any point, you can return to your Twitter application management webpage to view your new application. You can now view and manage your existing application, or create a new application as shown in the following screenshot.

![Twitter new app management](images/twitter-manage.png)

-----

-----

<font color='red' size = '5'> Student Exercise </font>


To run the Twitter application in the preceding cells, you will need to register your own Twitter Application. To do so, complete the following steps.

1. Create a New Twitter application.

2. Save your Twitter credentials and Application credentials into the provided `twitter.cred` file.

3. Run the _tweepy_ sample code to connect to Twitter and display your Twitter user information.

Finally, try running the preceding code, but for someone famous (if you do not know the Twitter handle for someone famous, Google will be your friend). 

-----

[[Back to TOC]](#Table-of-Contents)

## Obtaining Tweets

Once you have authenticated with Twitter, you can begin to [search the Twitter stream][stw] for tweets of interest. The easiest way to get started is to begin with your own (or another specific Twitter user's) Twitter feed. To access your own Twitter feed, you can simply use your `home_timeline` to retrieve your own Tweets or Tweets from those whom you follow. This is demonstrated in the following Code cell, where we display the `text` values from the ten most recent Tweets from our timeline.

-----
[stw]: https://dev.twitter.com/rest/public/search

In [5]:
for tweet in tw.Cursor(api.home_timeline).items(10):
    # Process a single status
    print(tweet.text) 

RT @openminedorg: We're very excited to announce our next HACKATHON is this Friday, Saturday, and Sunday the 16th-18th of February. We'll b…
We 🚨EMERGENCY PODCASTED🚨 on today's Russia indictments: https://t.co/0gn1Cw6ZiK
Dilbert Classics (February 16): https://t.co/T4GIWyQyl2
RT @quantopian: The Machine Learning on Quantopian algo has been updated to fit the constraints of our new daily contest. Learn more about…
Legendre published the method of least squares in 1805.
This Russian trolling operation basically sounds like the FiveThirtyEight newsroom. https://t.co/sYzVAkJZv8
RT @PyData: Ever thought that pandas documentation could be better? 
March 10th is your opportunity to help improve it! 

Join a sprint in…
https://t.co/INfAObhlQr
RT @rickyyean: Can you identify the fraudulent Tweet? @coinbase https://t.co/EZhBqNQdxV
Case in point... one of my most upvoted StackOverflow answers, which really is absurd when you think about it https://t.co/avCyIp4F0E


-----

[[Back to TOC]](#Table-of-Contents)

## Searching for Tweets

Twitter also provides the capability to search for specific tweets by using the Tweepy [`search` method][twse]. You supply a query string (and optional arguments), and the method returns a list of Tweets. The query string should follow the [Twitter Search API][tsa]. In brief, you can search for specific text by using the text of interest, for a person by using the `@` character followed by their Twitter username, and for a hashtag by using the `#` character followed by the tag text.

-----
[twse]: http://docs.tweepy.org/en/stable/api.html#API.search
[tsa]: https://dev.twitter.com/rest/public/search
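The query conventions described above (plain text, `@` for a user, `#` for a hashtag) can be sketched as a small string-building helper. This is an illustrative sketch only: `build_query` is a hypothetical name, not part of Tweepy, and it assumes that space-separated terms are all required to match.

```python
def build_query(text=None, user=None, hashtag=None):
    """Combine optional search pieces into one query string (hypothetical helper)."""
    parts = []
    if text:
        parts.append(text)
    if user:
        # Users are written as '@' followed by the username
        parts.append('@' + user.lstrip('@'))
    if hashtag:
        # Hashtags are written as '#' followed by the tag text
        parts.append('#' + hashtag.lstrip('#'))
    return ' '.join(parts)


print(build_query(text='data science'))
print(build_query(user='nytimes'))
print(build_query(text='data science', hashtag='python'))
```

The resulting string can be assigned to a variable such as `term` and passed as the `q` argument of the search call.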

In [6]:
# Hashtag search: term = '#python'
# User search: term = '@nytimes'
# Keyword search: term = 'data science'
# Keyword and Sentiment: term ='data science :)' # Positive attitude

term ='data science :)'
num_tweets = 5

for tweet in tw.Cursor(api.search, q=term).items(num_tweets):
    # Process a single status
    print("Tweet ID:", tweet.id)
    print('Tweeted by ', tweet.user.screen_name)
    print("Created at ",tweet.created_at)
    # Note: `source` is the client application used to post, not a geographic location
    print("Location: ",tweet.source)
    print('Tweet Text: ', tweet.text)
    print('-------------------------')

Tweet ID: 964604931247353856
Tweeted by  markmcan
Created at  2018-02-16 20:58:31
Location:  Twitter for Android
Tweet Text:  @daveweeden @stephen_wigmore @stephenpollard @DamCou @Oxfam @TomChivers The title and byline say it. Regardless, ho… https://t.co/Rj6XfK1Oy3
-------------------------
Tweet ID: 964600611298832384
Tweeted by  nicoleradziwill
Created at  2018-02-16 20:41:21
Location:  Twitter Web Client
Tweet Text:  @AutismSite @aspergersgirls My achilles heel is being overly enthusiastic. BUT I LOVE DATA SCIENCE FOR QUALITY IMPR… https://t.co/Q0cPBzQ8i0
-------------------------
Tweet ID: 964579887146291206
Tweeted by  nschaetti
Created at  2018-02-16 19:19:00
Location:  pyTweetInfoBot
Tweet Text:  RT @MahSayedSalem: @el_ostaa Thought the same in the beginning dear. Check the courses &amp; the content of specializations. You will be amazed…
-------------------------
Tweet ID: 964563409713774593
Tweeted by  lighthouse_labs
Created at  2018-02-16 18:13:32
Location:  Twitter for iPh

-----

We can view the available attributes by using Python's built-in `dir` function to perform introspection. In the following Code cells, we filter out names that contain a double underscore (Python's special methods) to shorten the list and focus on the items of interest. After this, we display the Tweet in its raw JSON format by accessing the `_json` attribute.

-----

In [7]:
# Print out single tweet

pp = pprint.PrettyPrinter(indent=2, depth=2, width=80, compact=True)

tweets = api.search(q='Deloitte', rpp=1)

pp.pprint([att for att in dir(tweets) if '__' not in att])

[ '_max_id', '_since_id', 'append', 'clear', 'completed_in', 'copy', 'count',
  'extend', 'ids', 'index', 'insert', 'max_id', 'next_results', 'parse', 'pop',
  'query', 'refresh_url', 'remove', 'reverse', 'since_id', 'sort']


In [8]:
# Pick a single tweet to analyze
tweet = tweets[1]
pp.pprint([att for att in dir(tweet) if '__' not in att])

[ '_api', '_json', 'author', 'contributors', 'coordinates', 'created_at',
  'destroy', 'entities', 'favorite', 'favorite_count', 'favorited', 'geo', 'id',
  'id_str', 'in_reply_to_screen_name', 'in_reply_to_status_id',
  'in_reply_to_status_id_str', 'in_reply_to_user_id', 'in_reply_to_user_id_str',
  'is_quote_status', 'lang', 'metadata', 'parse', 'parse_list', 'place',
  'possibly_sensitive', 'retweet', 'retweet_count', 'retweeted', 'retweets',
  'source', 'source_url', 'text', 'truncated', 'user']


In [9]:
# We can display the message data in JSON format
pp.pprint(tweet._json)

{ 'contributors': None,
  'coordinates': {'coordinates': [...], 'type': 'Point'},
  'created_at': 'Fri Feb 16 22:09:15 +0000 2018',
  'entities': { 'hashtags': [...],
                'symbols': [],
                'urls': [...],
                'user_mentions': []},
  'favorite_count': 0,
  'favorited': False,
  'geo': {'coordinates': [...], 'type': 'Point'},
  'id': 964622732364210176,
  'id_str': '964622732364210176',
  'in_reply_to_screen_name': None,
  'in_reply_to_status_id': None,
  'in_reply_to_status_id_str': None,
  'in_reply_to_user_id': None,
  'in_reply_to_user_id_str': None,
  'is_quote_status': False,
  'lang': 'en',
  'metadata': {'iso_language_code': 'en', 'result_type': 'recent'},
  'place': { 'attributes': {},
             'bounding_box': {...},
             'contained_within': [],
             'country': 'United States',
             'country_code': 'US',
             'full_name': 'Dallas, TX',
             'id': '18810aa5b43e76c7',
             'name': 'Dallas',
   

-----
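Because the raw status is a nested dictionary, individual fields can be pulled out with ordinary dictionary access. The sketch below is illustrative: `summarize_status` is a hypothetical helper, and the sample dictionary is hand-built from a few of the values displayed in the output above rather than fetched from Twitter. It assumes that `place` may be missing or `None` for tweets without location information.

```python
def summarize_status(status):
    """Extract a few fields from a status dictionary, tolerating missing keys."""
    # 'place' may be absent or None, so fall back to an empty dict
    place = status.get('place') or {}
    return {
        'id': status.get('id'),
        'lang': status.get('lang'),
        'place': place.get('full_name'),
        'country_code': place.get('country_code'),
    }


# Hand-built sample using values from the output shown above
sample = {
    'id': 964622732364210176,
    'lang': 'en',
    'place': {'full_name': 'Dallas, TX', 'country_code': 'US'},
}

print(summarize_status(sample))
print(summarize_status({'id': 1, 'lang': 'en', 'place': None}))
```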

[[Back to TOC]](#Table-of-Contents)

## Trending Topics

Twitter tracks tweet data to identify topics or people that are being frequently mentioned, which is known as [_trending_][twtr]. The Twitter API enables an application to obtain a list of currently trending topics. These topics include metadata that can be used to learn more about trending topics. One component of the metadata is the physical location of the trending topic. This location is encoded as a [**WOEID**][woeid], a Yahoo-developed standard whose name is short for _Where On Earth IDentifier_. In the first Code cell below, we obtain the locations for which trending topics are available and display the first several. In the second Code cell, we display the complete metadata for one location, which can be used to obtain a list of trending topics for a particular location on Earth via the WOEID. In the third Code cell, we use a WOEID to display the trending topics for one location. Note that since trending topics change, this notebook will provide different results when run at different times.

----
[twtr]: https://dev.twitter.com/rest/reference/get/trends/available
[woeid]: https://developer.yahoo.com/geo/geoplanet/guide/concepts.html

In [10]:
# Returns a list of (a large number of) locations for which Twitter
# has trending topic information.

top_display = 20
trending = api.trends_available()

# We skip the first value, which is the entry for the World.
for trend in trending[1:top_display]:
    print('WOEID Code ({2:d}): {0}, {1}'.format(
        trend['name'], trend['country'], trend['woeid']))

WOEID Code (2972): Winnipeg, Canada
WOEID Code (3369): Ottawa, Canada
WOEID Code (3444): Quebec, Canada
WOEID Code (3534): Montreal, Canada
WOEID Code (4118): Toronto, Canada
WOEID Code (8676): Edmonton, Canada
WOEID Code (8775): Calgary, Canada
WOEID Code (9807): Vancouver, Canada
WOEID Code (12723): Birmingham, United Kingdom
WOEID Code (12903): Blackpool, United Kingdom
WOEID Code (13383): Bournemouth, United Kingdom
WOEID Code (13911): Brighton, United Kingdom
WOEID Code (13963): Bristol, United Kingdom
WOEID Code (15127): Cardiff, United Kingdom
WOEID Code (17044): Coventry, United Kingdom
WOEID Code (18114): Derby, United Kingdom
WOEID Code (19344): Edinburgh, United Kingdom
WOEID Code (21125): Glasgow, United Kingdom
WOEID Code (25211): Hull, United Kingdom


In [11]:
pp.pprint(trending[10])

{ 'country': 'United Kingdom',
  'countryCode': 'GB',
  'name': 'Blackpool',
  'parentid': 23424975,
  'placeType': {'code': 7, 'name': 'Town'},
  'url': 'http://where.yahooapis.com/v1/place/12903',
  'woeid': 12903}


In [12]:
# We can use a WOEID to find location specific trends.
# Here we use the WOEID for the UK (from previous example)

top_display = 10

print("UK Trends")
print(10*'-')

for trends in api.trends_place(id = 23424975):
    for trend in trends["trends"][:top_display]:
        print("  {0:s}".format(trend["name"]))

UK Trends
----------
  #FirstDates
  #CHEHUL
  #TOTP
  #cruisingwithjanemcdonald
  #SLWidWar
  Gareth Barry
  Kyle Scott
  Jamie Vardy
  Drake
  Roger Federer


-----
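The location records shown above are plain dictionaries, so finding the WOEID for a place of interest is a simple filtering step. The following sketch is illustrative: `woeids_for_country` is a hypothetical helper, and the sample list is hand-built from a few of the entries displayed earlier rather than retrieved from Twitter.

```python
def woeids_for_country(locations, country):
    """Map place name to WOEID for the locations in the given country."""
    return {loc['name']: loc['woeid']
            for loc in locations
            if loc.get('country') == country}


# Hand-built sample using entries from the output shown above
sample_locations = [
    {'name': 'Toronto', 'country': 'Canada', 'woeid': 4118},
    {'name': 'Blackpool', 'country': 'United Kingdom', 'woeid': 12903},
    {'name': 'Glasgow', 'country': 'United Kingdom', 'woeid': 21125},
]

uk = woeids_for_country(sample_locations, 'United Kingdom')
print(uk)
```

A WOEID selected this way can then be supplied as the `id` argument when requesting the trends for a place.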

<font color='red' size = '5'> Student Exercise </font>


In the preceding cells, we used the Twitter API to obtain tweets and to identify trending topics. Now that you have run the notebook, try making the following changes.

1. Pick a particular Twitter user and search their Twitter stream.
2. Pick a different location from the trending topics location list, and identify trending topics from a different WOEID.

-----

[[Back to TOC]](#Table-of-Contents)


## Twitter Text Analysis

We can now develop a text analysis project that uses Twitter data. To simplify the application, we will use the NLTK Twitter corpus. Otherwise, we would need a separate notebook to obtain the necessary tweets (because of the Twitter rate limitation). The [NLTK twitter corpus][ntw] includes thirty thousand tweets retrieved from the Twitter streaming API; the data have been cached on our course JupyterHub server. The tweets were explicitly selected from a recent election in the United Kingdom and one-third of them have been classified into positive or negative (with equal numbers of each). With these tweets, we can build a classification pipeline to perform sentiment analysis on Twitter data.

In the following Code cells, we first obtain the tweets and build the NumPy arrays for our classification pipeline, before constructing and testing this simple sentiment analysis application.

-----
[ntw]: http://www.nltk.org/howto/twitter.html#Using-a-Tweet-Corpus

In [13]:
tws = nltk.corpus.twitter_samples

pos_tweets = np.array(tws.strings('positive_tweets.json'))
neg_tweets = np.array(tws.strings('negative_tweets.json'))

pos_labels = np.ones(pos_tweets.shape[0])
neg_labels = np.zeros(neg_tweets.shape[0])

targets = np.concatenate((pos_labels, neg_labels), axis=0)
data = np.concatenate((pos_tweets, neg_tweets), axis = 0)

print('{0} Positive Tweets'.format(pos_tweets.shape[0]))
print('{0} Negative Tweets'.format(neg_tweets.shape[0]))

5000 Positive Tweets
5000 Negative Tweets


-----

We will employ 75% of the data for training, with 25% held out for validation. The classification pipeline will use a simple tokenizer to build a document-term matrix before applying a Naive Bayes classifier. The tokenizer will remove _English_ stop words, convert the text to all lowercase, and include both unigrams and bigrams. Overall, this simple classification pipeline gives reasonable results.
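To make the tokenization step concrete, the following pure-Python sketch mimics what such a tokenizer produces for one sentence: lowercasing, dropping stop words, and emitting unigrams and bigrams. It is a simplified illustration, not scikit-learn's implementation; the function name `ngram_tokens` and the tiny stop-word set are assumptions made for the example (the real English stop-word list is much longer).

```python
import re

# Tiny illustrative stop-word set (an assumption for this example)
STOP_WORDS = {'the', 'is', 'a', 'of', 'and', 'to', 'this'}


def ngram_tokens(text, ngram_range=(1, 2)):
    """Lowercase, tokenize, drop stop words, then emit n-grams."""
    # Keep word tokens of two or more characters
    words = re.findall(r'\b\w\w+\b', text.lower())
    words = [w for w in words if w not in STOP_WORDS]

    low, high = ngram_range
    grams = []
    for n in range(low, high + 1):
        # Slide a window of n words across the remaining tokens
        for i in range(len(words) - n + 1):
            grams.append(' '.join(words[i:i + n]))
    return grams


print(ngram_tokens('This is a GREAT data science course'))
```

Each distinct n-gram becomes one column of the document-term matrix, which is why adding bigrams increases the number of features considerably.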

-----

In [14]:
from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(
    data, targets, test_size=0.25, random_state=23)

In [15]:
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn import metrics

tools = [('cv', CountVectorizer()), ('nb', MultinomialNB())]
pclf = Pipeline(tools)


# Lowercase, English Stop Words, and unigrams and bigrams.
pclf.set_params(cv__stop_words = 'english', \
                cv__ngram_range=(1,2), \
                cv__lowercase=True)

pclf.fit(x_train, y_train)
y_pred = pclf.predict(x_test)
# Labels sort as 0 (negative) then 1 (positive), so target_names follows that order
print(metrics.classification_report(y_test, y_pred, target_names = ['Negative', 'Positive']))

             precision    recall  f1-score   support

   Negative       0.73      0.79      0.76      1240
   Positive       0.78      0.71      0.74      1260

avg / total       0.75      0.75      0.75      2500



-----
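The precision, recall, and f1-score columns in the report above can be computed directly from the true and predicted labels. The sketch below shows the arithmetic for a single class on a small made-up example; `precision_recall_f1` is a hypothetical helper written for illustration and is not the scikit-learn implementation.

```python
def precision_recall_f1(y_true, y_pred, label):
    """Compute precision, recall, and F1 for one class label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)  # true positives
    fp = sum(1 for t, p in pairs if t != label and p == label)  # false positives
    fn = sum(1 for t, p in pairs if t == label and p != label)  # false negatives

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Made-up labels: 1 = positive, 0 = negative
y_true_demo = [1, 1, 1, 0, 0, 0]
y_pred_demo = [1, 1, 0, 1, 0, 0]

print(precision_recall_f1(y_true_demo, y_pred_demo, label=1))
```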

[[Back to TOC]](#Table-of-Contents)

### Blind Testing

We can use the remaining twenty thousand tweets in the NLTK corpus for blind testing. We first obtain the tweets as a NumPy array, before applying our sentiment analysis pipeline. Finally, we display the relative numbers of positive and negative classifications and we display examples of both positive and negative classified tweets.

-----

In [16]:
unknown_tweets = np.array(tws.strings('tweets.20150430-223406.json'))
unknown_pred = pclf.predict(unknown_tweets)

unknown_pos = unknown_tweets[unknown_pred == 1]
unknown_neg = unknown_tweets[unknown_pred == 0]

In [17]:
tweet_idx = 101

print(f'{unknown_tweets.shape[0]} tweets to classify.')
print(f'{unknown_pos.shape[0]} tweets classified as positive.')
print(f'{unknown_neg.shape[0]} tweets classified as negative.')

print(75*'-')
print('Sample Positive Tweet:')
print(75*'-')
print(unknown_pos[tweet_idx])

print(75*'-')
print('Sample Negative Tweet:')
print(75*'-')
print(unknown_neg[tweet_idx])

20000 tweets to classify.
8508 tweets classified as positive.
11492 tweets classified as negative.
---------------------------------------------------------------------------
Sample Positive Tweet:
---------------------------------------------------------------------------
"David Cameron: smooth, smiley but unconvincing" #bbcqt http://t.co/mJ2ZkX1TjB
---------------------------------------------------------------------------
Sample Negative Tweet:
---------------------------------------------------------------------------
RT @DouglasDaniel: Miliband's new line 'if you don't vote Labour in Scotland I will punish you by letting the Tories in'.


-----

[[Back to TOC]](#Table-of-Contents)

### Classifying New Tweets

We can now combine everything covered in this notebook in order to apply our trained sentiment analysis pipeline to new Twitter data. For this, we pick a _random_ user, in this case the CNN Politics Twitter feed (note that this choice wasn't completely random, since the training data were obtained from similar feeds). We first obtain tweets from this user before creating the NumPy arrays to use with our scikit-learn classifier. Finally, we display the results of this sentiment analysis classifier. Note that, since the tweets will change over time, the results presented in this notebook will also change.

-----

In [18]:
newtweets = api.user_timeline(screen_name='@CNNPolitics', include_rts=False, count=100)
                           
print(f'We obtained a sample of {len(newtweets)} tweets')

We obtained a sample of 99 tweets


In [19]:
messages = []

for tweet in newtweets:
    messages.append(tweet.text)
    
new_tweets = np.array(messages)
new_pred = pclf.predict(new_tweets)

new_pos = new_tweets[new_pred == 1]
new_neg = new_tweets[new_pred == 0]

In [20]:
# Pick a tweet index
tweet_idx = 13

print(f'{new_tweets.shape[0]} tweets to classify.')
print(f'{new_pos.shape[0]} tweets classified as positive.')
print(f'{new_neg.shape[0]} tweets classified as negative.')

print(75*'-')
print('Sample Positive Tweet:')
print(75*'-')
print(new_pos[tweet_idx])

print(75*'-')
print('Sample Negative Tweet:')
print(75*'-')
print(new_neg[tweet_idx])

99 tweets to classify.
53 tweets classified as positive.
46 tweets classified as negative.
---------------------------------------------------------------------------
Sample Positive Tweet:
---------------------------------------------------------------------------
Deputy Attorney General Rod Rosenstein will announce the indictments against 13 Russian nationals over interference… https://t.co/CTzsXgVy1E
---------------------------------------------------------------------------
Sample Negative Tweet:
---------------------------------------------------------------------------
Amid renewed debate over gun control after the school shooting in Florida, House Speaker Paul Ryan says that now is… https://t.co/lc4Xwuqcyg


-----

<font color='red' size = '5'> Student Exercise </font>


In the preceding cells, we built a sentiment analysis classification pipeline by using the NLTK corpus before applying it to new tweets. Now that you have run the notebook, try making the following changes.

1. Modify the pipeline parameters; for example, apply stemming, change `max_features`, or change the n-gram range. Can you improve the classification results on the validation data?

2. Change the type of classification algorithm (e.g., random forest or SVC with regularization). Can you improve the classification results on the validation data?

3. Using your new classification pipeline, examine the performance on the current Twitter user. Look at other tweets; does the performance improve?

4. Try classifying tweets from a different user, either an _election_ type feed or a popular figure. By looking at select tweets and their classification, comment on how your classifier performs.

Finally, why do you think the classification pipeline performs in the manner that it does (i.e., why are some tweets classified negative/positive)? Feel free to use the class forums.

-----

## Ancillary Information

The following links are to additional documentation that you might find helpful in learning this material. Reading these web-accessible documents is completely optional.

1. Wikipedia article on [Twitter][wt]  
1. Wikipedia article on [Social Media][wsm]  
1. Twitter, [official documentation][tod]  
1. Map of a [Twitter status object][mtso]  
1. [Tweepy][twd]: Python Twitter client documentation (Getting Started and Authentication Tutorial)  
1. [Using NLTK][unt] with Twitter (note: uses a different Twitter client library)  
1. Blog demonstrating [Twitter access via URLs][tu]  
1. Blog article on collecting tweets by using the [streaming API][tsa]  
1. **Chapter 1: Mining Twitter** from _Mining the Social Web_, [Jupyter notebook][msw1]  
1. **Chapter 9: Twitter Cookbook** from _Mining the Social Web_, [Jupyter notebook][msw9]  

-----

[wt]: https://en.wikipedia.org/wiki/Twitter
[wsm]: https://en.wikipedia.org/wiki/Social_media

[twd]: http://tweepy.readthedocs.org
[tod]: https://dev.twitter.com/overview/documentation

[unt]: http://www.nltk.org/howto/twitter.html

[tsa]: http://badhessian.org/2012/10/collecting-real-time-twitter-data-with-the-streaming-api/
[tu]: http://nealcaren.web.unc.edu/pizza-twitter-and-apis/

[msw1]: https://github.com/ptwobrussell/Mining-the-Social-Web-2nd-Edition/blob/master/ipynb/Chapter%201%20-%20Mining%20Twitter.ipynb
[msw9]: https://github.com/ptwobrussell/Mining-the-Social-Web-2nd-Edition/blob/master/ipynb/Chapter%209%20-%20Twitter%20Cookbook.ipynb
[mtso]: http://online.wsj.com/public/resources/documents/TweetMetadata.pdf

**&copy; 2017: Robert J. Brunner at the University of Illinois.**

This notebook is released under the [Creative Commons license CC BY-NC-SA 4.0][ll]. Any reproduction, adaptation, distribution, dissemination or making available of this notebook for commercial use is not allowed unless authorized in writing by the copyright holder.

[ll]: https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode