# INST728E - Module 4. Collecting Social Media Data

This notebook contains examples for using web-based APIs (Application Programming Interfaces) to download data from social media platforms.
Our examples will include:

- Reddit
- Facebook
- Twitter

For most services, we need to register with the platform in order to use their API.
Instructions for the registration processes are outlined in each specific section below.

We will use APIs for three reasons: they *can* be much faster than manually copying and pasting data from the web site; they provide uniform methods for accessing resources (searching for keywords, places, or dates); and using them should conform to the platform's terms of service (important for partnering and publications).
Note, however, that each of these platforms places strict limits on access: e.g., requests per hour, search history depth, and the maximum number of items returned per request.
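
Because of these limits, a request that fails once will often succeed after a pause. The sketch below shows one common way to handle that: retrying with exponentially growing waits. The helper name `call_with_backoff` and its parameters are hypothetical and not part of any of the libraries used in this notebook. In real code you would catch the library's specific rate-limit exception rather than every `Exception`.

```python
import time


def call_with_backoff(func, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Call func(); on failure wait base_delay * 2**attempt seconds and retry.

    Hypothetical helper for illustration only. The sleep argument exists
    so the waiting can be replaced in tests.
    """
    for attempt in range(max_retries + 1):
        try:
            return func()
        except Exception:
            # Out of retries: let the last error propagate to the caller
            if attempt == max_retries:
                raise
            sleep(base_delay * (2 ** attempt))
```

Any API call in this notebook could be wrapped this way by passing it as a zero-argument function, for example with a `lambda`.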

In [90]:
%matplotlib inline

import json

<hr>
<img src="http://www.cs.umd.edu/~cbuntain/inst728e/RedditLogo.jpg" width="20%">

## Topic 2.1: Reddit API

Reddit's API used to be the easiest to use since it did not require credentials to access data on its subreddit pages.
Unfortunately, this has changed, and developers now need to create a Reddit application on Reddit's app page (https://www.reddit.com/prefs/apps/).

In [91]:
# For our first piece of code, we need to import the package
# that connects to Reddit. PRAW is a thin wrapper around Reddit's
# web API and works well

import praw

### Creating a Reddit Application
Go to https://www.reddit.com/prefs/apps/.
Scroll down to "create application", select "web app", and provide a name, description, and URL (which can be anything).

After you press "create app", you will be redirected to a new page with information about your application. Copy the unique identifiers below "web app" and beside "secret". These are your client_id and client_secret values, which you need below.

<img src="http://www.cs.umd.edu/~cbuntain/inst728e/reddit_screens/0-001.png" scale="10%"/>
<img src="http://www.cs.umd.edu/~cbuntain/inst728e/reddit_screens/1-002.png" scale="20%"/>
<img src="http://www.cs.umd.edu/~cbuntain/inst728e/reddit_screens/1-003a.png" scale="10%"/>

In [92]:
# Now we connect using the client ID and secret from the app page,
# plus a "unique" user agent for our code. The user agent identifies
# our script to Reddit, and user agents of bad actors may be blocked
redditApi = praw.Reddit(client_id='OdpBKZ1utVJw8Q',
                        client_secret='KH5zzauulUBG45W-XYeAS5a2EdA',
                        user_agent='crisis_informatics_v01')
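
Pasting credentials directly into a notebook makes them easy to leak when the notebook is shared. A minimal sketch of one alternative is to read them from environment variables. The helper `load_credentials` and the variable names `REDDIT_CLIENT_ID` and `REDDIT_CLIENT_SECRET` are assumptions chosen for illustration; use whatever names you set in your own environment.

```python
import os


def load_credentials(names, env=None):
    """Return {name: value} for each variable in names, read from the environment.

    Hypothetical helper; env defaults to os.environ and can be swapped for a dict.
    """
    env = os.environ if env is None else env
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in names}


# Example usage (assumes the two variables were set before starting Jupyter):
# creds = load_credentials(["REDDIT_CLIENT_ID", "REDDIT_CLIENT_SECRET"])
# redditApi = praw.Reddit(client_id=creds["REDDIT_CLIENT_ID"],
#                         client_secret=creds["REDDIT_CLIENT_SECRET"],
#                         user_agent='crisis_informatics_v01')
```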

### Capturing Reddit Posts

Now for a given subreddit, we can get the newest posts to that sub. 
Post titles are generally short, so you could treat them as something similar to a tweet.

In [93]:
subreddit = "worldnews"

targetSub = redditApi.subreddit(subreddit)

submissions = targetSub.new(limit=10)
for post in submissions:
    print(post.title)

Jeff Bezos is the richest person in history
Collapse of Huawei-AT&T deal ‘will threaten China-US trade ties’
Julian Assange's stay in London embassy untenable, says Ecuador
Israeli PM Netanyahu’s son, seeking cash for stripper, brags of US$20billion deal for friend’s father
World Health Organization to declare gaming addiction a mental disorder.
Fusion GPS: Trump-Russia firm chief's transcript released
‘Swatting’ suspect Tyler Barriss now has warrants issued in Calgary, Canada
Facebook has settled a landmark legal action case in which a 14-year-old girl sued Facebook, after a man allegedly posted a naked photo of her on the website
Germany's Foreign Minister: 'We Are Seeing What Happens When the U.S. Pulls Back'


### Leveraging Reddit's Voting

Getting the new posts gives us the most up-to-date information. 
You can also get the "hot" posts, "top" posts, etc. that should be of higher quality. 
In theory.
__Caveat emptor__

In [94]:
subreddit = "worldnews"

targetSub = redditApi.subreddit(subreddit)

submissions = targetSub.hot(limit=5)
for post in submissions:
    print(post.title)

North Korea to join Olympics in South Korea
Trump-Russia: Senator Dianne Feinstein releases testimony of dossier firm boss
Plastic microbeads can no longer be used in cosmetics and personal care products in the UK, after a long-promised ban came into effect on Tuesday. The ban initially bars the manufacture of such products and a ban on sales will follow in July.
Heartbroken scientists lament the likely loss of ‘most of the world’s coral reefs’: 'Scientists surveyed 100 reefs around the world and found that extreme bleaching events that once occurred every 25 or 30 years now happen about every five or six years.'
Russian historian who exposed Stalin's crimes faces enforced psychiatric testing


### Following Multiple Subreddits

Reddit has a mechanism called "multireddits" that essentially allows you to view multiple subreddits together as though they were one.
To do this, you need to concatenate your subreddits of interest using the "+" sign.

In [95]:
subreddit = "worldnews+news"

targetSub = redditApi.subreddit(subreddit)
submissions = targetSub.new(limit=10)
for post in submissions:
    print(post.title, post.author)

Jeff Bezos is the richest person in history Petroleum-Engineer
Democratic Senators clear key hurdle to voting against the FCC's repeal of Net Neutrality itsBonez
Collapse of Huawei-AT&T deal ‘will threaten China-US trade ties’ dcismia
A friend of Kurt Cobain made Demotapes from Nirvana public that Kurt gave him. DJKaito
Julian Assange's stay in London embassy untenable, says Ecuador TexanDemocrat
Israeli PM Netanyahu’s son, seeking cash for stripper, brags of US$20billion deal for friend’s father Waldongrado
World Health Organization to declare gaming addiction a mental disorder. AlKarakhboy
Dad turns in teenage son after finding child pornography on cell phone MIngmire
Fusion GPS: Trump-Russia firm chief's transcript released The_man_who_sold
13,000 tourists stranded at ski resort amid avalanche fears brotogeris1


### Accessing Reddit Comments

While you're never supposed to read the comments, for certain live streams or new and rising posts, the comments may provide useful insight into events on the ground or people's sentiment.
New posts may not have comments yet though.

Comments are attached to the post itself, so for a given submission, you can pull its comments directly.

Note that Reddit returns pages of comments to prevent server overload, so you will not get all comments at once and will have to write code to fetch more than the first batch returned.
This pagination is exposed through placeholder objects such as MoreComments, which stand in for comments that have not been loaded yet.
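
As a rough sketch of how those placeholders can be handled, PRAW's comment forest offers `replace_more()`, which either drops the placeholders or swaps them for the comments they stand in for. The wrapper name `collect_comments` is hypothetical; it expects a PRAW submission object such as the `post` values used elsewhere in this notebook. Each placeholder that gets replaced costs an extra network request, so large limits can be slow.

```python
def collect_comments(submission, more_limit=0):
    """Return (name, body) pairs for every loaded comment on a PRAW submission.

    more_limit is passed to replace_more(): 0 drops the MoreComments
    placeholders without fetching anything, None fetches all of them.
    """
    submission.comments.replace_more(limit=more_limit)
    # list() flattens the comment tree, so replies are included as well
    return [(c.name, c.body) for c in submission.comments.list()]
```

After `replace_more()` has run, the flattened list contains only real comments, so no `isinstance` check against `MoreComments` is needed.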

In [96]:
subreddit = "worldnews"

breadthCommentCount = 5

targetSub = redditApi.subreddit(subreddit)

submissions = targetSub.hot(limit=1)

for post in submissions:
    print (post.title)
    
    post.comment_limit = breadthCommentCount
    
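    # Get the top-level comments only. Iterating over post.comments
    # (rather than the flattened post.comments.list()) avoids printing
    # replies a second time as if they were top-level comments
    for comment in post.comments:
        if isinstance(comment, praw.models.MoreComments):
            continue
        
        print ("---", comment.name, "---")
        print ("\t", comment.body)
        
        # Print every loaded reply nested under this comment
        for reply in comment.replies.list():
            if isinstance(reply, praw.models.MoreComments):
                continue
            
            print ("\t", "---", reply.name, "---")
            print ("\t\t", reply.body)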


North Korea to join Olympics in South Korea
--- t1_dsfjo4x ---
	 Really hoping that Kim Jong Un doesn't participate though... I'd like to see the US win at least one gold medal.
--- t1_dsf05wr ---
	 As I understand it, that was the goal of the recent peace talks, with a soft objective of calming the North down. As someone living in South Korea, this is fantastic news.
	 --- t1_dsf5sms ---
		 Hello from Seoul. Yup, feeling pretty good about it.
--- t1_dsezuqt ---
	 Stuff like this is a step in the right direction.
--- t1_dsf0yvt ---
	 This is the best tl;dr I could make, [original](https://www.ctvnews.ca/mobile/world/north-korea-to-join-olympics-in-south-korea-1.3751239) reduced by 87%. (I'm a bot)
*****
> SEOUL, Korea, Republic Of - The rival Koreas took steps toward reducing their bitter animosity during rare talks Tuesday, as North Korea agreed to send a delegation to next month&#039;s Winter Olympics in South Korea and reopen a military hotline.

> North Korea is weak in winter spor

<hr>
<img src="http://www.cs.umd.edu/~cbuntain/inst728e/FacebookLogo.jpg" width="20%">

## Topic 2.2: Facebook API

Getting access to Facebook's API is slightly easier than Twitter's in that you can go to the Graph API explorer, grab an access token, and immediately start playing around with the API.
The access token isn't good forever though, so if you plan on doing long-term analysis or data capture, you'll need to go the full OAuth route and generate tokens using the approved paths.

In [97]:
# As before, the first thing we do is import the Facebook
# wrapper

import facebook

### Connecting to the Facebook Graph

Facebook has a "Graph API" that lets you explore its social graph. 
Because of privacy concerns, however, Facebook's Graph API is extremely limited in the kinds of data it can view.
For instance, Graph API applications can now only view profiles of people who already have installed that particular application.
These restrictions make it quite difficult to see a lot of Facebook's data.

That being said, Facebook does have many popular public pages (e.g., BBC World News), and articles or messages posted by these public pages are accessible.
In addition, many posts and comments made in reply to these public posts are also publicly available for us to explore.

To connect to Facebook's API, though, we need an access token, much as Reddit requires a client ID and secret.
Fortunately, for research and testing purposes, getting an access token is very easy.

#### Acquiring a Facebook Access Token

1. Log in to your Facebook account
1. Go to Facebook's Graph Explorer (https://developers.facebook.com/tools/explorer/)
1. Copy the *long* string out of the "Access Token" box and paste it in the code cell below

<img src="http://www.cs.umd.edu/~cbuntain/inst728e/FacebookInstructions_f1.png"/>

In [98]:
fbAccessToken = "EAACEdEose0cBAK2kyW5pcrgzUUMqmr4uR1ppwlz1lC5aIhJyVLm9Bfo1jOXBQwILsVzlt28dSmqwPdX9DQQDLz5zMEZC3ZB6HYTj5LyZA5hKoa3YneQpRyg3cCxwmb0Ea6uazjxaJX2QLNkL7i6BTVhy0bZCZBfvVb29AFZARFXhjcmsFO8QhY2EEhFyZBXIucZD"

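Now we can use the Facebook Graph API with this temporary access token. The token expires after a short time; once it does, requests fail with a `GraphAPIError`, and you need to copy a fresh token from the Graph Explorer.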

In [99]:
# Connect to the graph API, note we use version 2.7
graph = facebook.GraphAPI(access_token=fbAccessToken, version='2.7')

### Parsing Posts from a Public Page

To get a public page's posts, all you need is the name of the page. 
Then we can pull the page's feed, and for each post on the page, we can pull its comments and the name of the comment's author.
While it's unlikely that we can get more user information than that, author name and sentiment or text analytics can give insight into bursting topics and demographics.

In [100]:
# What page to look at?
targetPage = "nytimes"

# Other options for pages:
# nytimes, bbc, bbcamerica, bbcafrica, redcross, disaster

maxPosts = 10 # How many posts should we pull?
maxComments = 5 # How many comments for each post?

post = graph.get_object(id=targetPage + '/feed')

# For each post, print its message content and its ID
for v in post["data"][:maxPosts]:
    print ("---")
    # Not every post has a message, so fall back to an empty string
    print (v.get("message", ""), v["id"])
        
    # For each comment on this post, print its number, 
    # the name of the author, and the message content
    print ("Comments:")
    comments = graph.get_object(id='%s/comments' % v["id"])
    for (i, comment) in enumerate(comments["data"][:maxComments]):
        print ("\t", i, comment["from"]["name"], comment["message"])


GraphAPIError: Error validating access token: Session has expired on Wednesday, 27-Dec-17 17:00:00 PST. The current time is Tuesday, 09-Jan-18 14:02:11 PST.
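
Apart from token expiry, a second thing to watch for when parsing a feed is missing fields: indexing a key that is absent raises a `KeyError` (a photo-only post may have no `message`, for instance). The sketch below reads the dictionaries defensively. The helper names and the sample data in the usage note are made up for illustration and only mimic the `data` layout used in the cell above.

```python
def summarize_post(post, max_len=80):
    """Return (post id, shortened message) for one entry of a feed's "data" list."""
    message = post.get("message", "")  # some posts carry no message text
    if len(message) > max_len:
        message = message[:max_len] + "..."
    return post["id"], message


def comment_authors(comments):
    """Return author names from a comments response, skipping entries without "from"."""
    return [c["from"]["name"] for c in comments.get("data", []) if "from" in c]
```

For example, `summarize_post({"id": "1_3"})` returns the id paired with an empty string instead of raising an error.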

<hr>
<img src="http://www.cs.umd.edu/~cbuntain/inst728e/TwitterLogo.png" width="20%">

## Topic 2.3: Twitter API

Twitter's API is probably the most useful and flexible but takes several steps to configure. 
To get access to the API, you first need to have a Twitter account and have a mobile phone number (or any number that can receive text messages) attached to that account.
Then, we'll use Twitter's developer portal to create an "app" that will give us the tokens and keys (essentially IDs and passwords) we need to connect to the API.

So, in summary, the general steps are:

0. Have a Twitter account,
1. Configure your Twitter account with your mobile number,
2. Create an app on Twitter's developer site, and
3. Generate consumer and access keys and secrets.

We will then plug these four strings into the code below.

In [101]:
# For our first piece of code, we need to import the package 
# that connects to Twitter. Tweepy is a popular and fully featured
# implementation.

import tweepy

### Creating Twitter Credentials

The walkthrough below gives more in-depth instructions for creating a Twitter account and configuring it for use with the following code.

We assume you already have a Twitter account.
If you do not, either create one now or simply follow along.
See the attached figures.

- __Step 1. Create a Twitter account__ If you haven't already done this, do this now at Twitter.com.

- __Step 2. Setting your mobile number__ Log into Twitter and go to "Settings." From there, click "Mobile" and fill in an SMS-enabled phone number. You will be asked to confirm this number once it's set, and you'll need to do so before you can create any apps for the next step.

<img src="http://www.cs.umd.edu/~cbuntain/inst728e/TwitterInstructions_f1.png" scale="10%"/>
<img src="http://www.cs.umd.edu/~cbuntain/inst728e/TwitterInstructions_f2.png" scale="10%"/>

- __Step 3. Create an app in Twitter's Dev site__ Go to (apps.twitter.com), and click the "Create New App" button. Fill in the "Name," "Description," and "Website" fields, leaving the callback one blank (we're not going to use it). Note that the website __must__ be a fully qualified URL, so it should look like: http://test.url.com. Then scroll down and read the developer agreement, check the box to agree, and finally click "Create your Twitter application."

<img src="http://www.cs.umd.edu/~cbuntain/inst728e/TwitterInstructions_f3.png" scale="10%"/>
<img src="http://www.cs.umd.edu/~cbuntain/inst728e/TwitterInstructions_f4.png"/>

- __Step 4. Generate keys and tokens with this app__ After your application has been created, you will see a summary page like the one below. Click "Keys and Access Tokens" to view and manage keys. Scroll down and click "Create my access token." After a moment, your page should refresh, and it should show you four long strings of characters and numbers: a consumer key, a consumer secret, an access token, and an access secret (note these are __case-sensitive__!). Copy and paste these four strings into the quotes in the code cell below.

<img src="http://www.cs.umd.edu/~cbuntain/inst728e/TwitterInstructions_f5.png" scale="10%"/>
<img src="http://www.cs.umd.edu/~cbuntain/inst728e/TwitterInstructions_f6.png"/>

In [102]:
# Use the strings from your Twitter app webpage to populate these four 
# variables. Be sure to put the strings BETWEEN the quotation marks
# to make each one a valid Python string.

consumer_key = "1jFG5MF4PNf8zhg8Nmkk3kWVb"
consumer_secret = "MOfU9zxDvsk7nKHLnYvpTUeWW5C7PsXrS9TuwnvYcx3ANzc5LG"
access_token = "2343077714-N9yB6UKYegygTgTPl7xgm7PfUbLhO6TzqitlFP0"
access_secret = "d44DHDHV3CYmeuDnWbITumnPcnrVJwS0mJhgITQYWyXdx"

### Connecting to Twitter

Once we have the authentication details set, we can connect to Twitter using the Tweepy OAuth handler, as below.

In [103]:
# Now we use the configured authentication information to connect
# to Twitter's API
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_secret)

api = tweepy.API(auth)

print("Connected to Twitter!")

Connected to Twitter!


### Testing our Connection

Now that we are connected to Twitter, let's do a brief check that we can read tweets by pulling the first few tweets from our own timeline (or the account associated with your Twitter app) and printing them.

In [104]:
# Get tweets from our timeline
public_tweets = api.home_timeline()

# print the first five authors and tweet texts
for tweet in public_tweets[:5]:
    print (tweet.author.screen_name, tweet.author.name, "said:", tweet.text)

pewresearch Pew Research Center said: Most Americans evaluate #STEM education in the U.S. as middling compared with other developed nations… https://t.co/9fTwHepPnF
business Bloomberg said: Get caught up on all things tech! #BTECH is streaming LIVE right here on @Twitter https://t.co/Z2eKcAuXAJ
WorldBank World Bank said: Transport is taking a serious toll on the wellbeing of people and the planet. Can technology help save the day? Joi… https://t.co/gcrDFlaYd4
McKinsey McKinsey & Company said: Conventional wisdom has taught leaders that the first 100 days are imperative in a new role, but the data simply do… https://t.co/9BOw2TmFrh
TechCrunch TechCrunch said: How AI and copyright would work https://t.co/fJmtkdI5R6


### Searching Twitter for Keywords

Now that we're connected, we can search Twitter for specific keywords with relative ease, just as if we were using Twitter's search box.
While this search only goes back 7 days or 1,500 tweets (whichever is less), it can be powerful if an event you want to track just started.

Note that you might have to deal with paging if you get lots of data. Twitter will only return you one page of up to 100 tweets at a time.

In [105]:
# Our search string
queryString = "earthquake"

# Perform the search
matchingTweets = api.search(queryString)

print ("Searched for:", queryString)
print ("Number found:", len(matchingTweets))

# For each tweet that matches our query, print the author and text
print ("\nTweets:")
for tweet in matchingTweets:
    print (tweet.author.screen_name, tweet.author.name, tweet.text)

Searched for: earthquake
Number found: 14

Tweets:
SoCalEq SoCal Earthquakes USGS reports a M1.34 #earthquake 7km SSE of Redlands, California on 1/9/18 @ 21:59:11 UTC https://t.co/tmMamupjNe #quake
everyEarthquake Every Earthquake USGS reports a M1.34 #earthquake 7km SSE of Redlands, California on 1/9/18 @ 21:59:11 UTC https://t.co/l1Ei3H2Egi #quake
AvaBH64 Ava’s Gone Feral RT @SenKamalaHarris: These are people who rebuilt their lives in the U.S. after fleeing an earthquake over a decade ago. They have kids, jo…
tannerlikeskara Tanner Griffith A final breath. An earthquake. Dead raise. A curtain tears in two. A soldier falls to his knees. A mother weeps. Si… https://t.co/XPn2Zo7jbX
GePeirson Gail Peirson RT @SenKamalaHarris: These are people who rebuilt their lives in the U.S. after fleeing an earthquake over a decade ago. They have kids, jo…
CornubiaGeol Gordon Neighbour RT @icelandgeology: Currently there is storm in part of Iceland. Tomorrow there is going to be more storm in Icelan

### More Complex Queries

Twitter's Search API exposes many capabilities, like filtering for media, links, mentions, geolocations, dates, etc.
We can access these capabilities directly with the search function.

For a list of operators Twitter supports, go here: https://dev.twitter.com/rest/public/search
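
Since operators are just text inside the query string, a query can also be assembled programmatically. The following is a small sketch under that assumption; `build_query` is a hypothetical helper, not a Tweepy function, and it only covers keywords, `filter:` operators joined with `OR`, and `-term` exclusions.

```python
def build_query(keywords, filters=(), exclude=()):
    """Combine keywords with optional filter: operators into one search string."""
    parts = [" ".join(keywords)]
    if filters:
        # Any one of the listed filters may match
        parts.append("(" + " OR ".join("filter:" + f for f in filters) + ")")
    # A leading minus sign excludes tweets containing that term
    parts.extend("-" + term for term in exclude)
    return " ".join(parts)


print(build_query(["earthquake"], filters=["media", "links"]))
# prints: earthquake (filter:media OR filter:links)
```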

In [106]:
# Let's find only media or links about earthquakes
queryString = "earthquake (filter:media OR filter:links)"

# Perform the search
matchingTweets = api.search(queryString)

print ("Searched for:", queryString)
print ("Number found:", len(matchingTweets))

# For each tweet that matches our query, print the author and text
print ("\nTweets:")
for tweet in matchingTweets:
    print (tweet.author.screen_name, tweet.text)

Searched for: earthquake (filter:media OR filter:links)
Number found: 12

Tweets:
LiveFrom_Mars RT @VSUShare: SHARE isn’t allowed to directly sponsor a girl in Tanzania this semester due to their earthquake rebuilding fund. However, we…
SoCalEq USGS reports a M1.34 #earthquake 7km SSE of Redlands, California on 1/9/18 @ 21:59:11 UTC https://t.co/tmMamupjNe #quake
everyEarthquake USGS reports a M1.34 #earthquake 7km SSE of Redlands, California on 1/9/18 @ 21:59:11 UTC https://t.co/l1Ei3H2Egi #quake
AvaBH64 RT @SenKamalaHarris: These are people who rebuilt their lives in the U.S. after fleeing an earthquake over a decade ago. They have kids, jo…
GePeirson RT @SenKamalaHarris: These are people who rebuilt their lives in the U.S. after fleeing an earthquake over a decade ago. They have kids, jo…
Quake_Tracker4 Mag: 4 - Depth: 84 km - UTC 21:50 - Guadeloupe Region, Leeward Isl. - EMSC Info: https://t.co/IHOtVbRkTC
clango4 Whose structure can survive a 20 sec. earthquake? @BarleySheafFRSD ht

### Dealing with Pages

As mentioned, Twitter serves results in pages. 
To get all results, we can use Tweepy's Cursor implementation, which handles this iteration through pages for us in the background.

In [108]:
# Let's find only media or links about earthquakes
queryString = "earthquake (filter:media OR filter:links)"

# How many tweets should we fetch? Upper limit is 1,500
maxToReturn = 100

# Perform the search, and for each tweet that matches our query, 
# print the author and text
print ("\nTweets:")
for status in tweepy.Cursor(api.search, q=queryString).items(maxToReturn):
    print (status.author.screen_name, status.text)


Tweets:
myearthquakeapp 3.02 earthquake occurred near Ashhurst, Manawatu-Wanganui, New Zealand at 01:08 UTC! #earthquake #Ashhurst https://t.co/oeVIhrLQNH
SP16_EARTHQUAKE https://t.co/CPrIKlKfeI
everyEarthquake USGS reports a M0.34 #earthquake 11km W of Toms Place, CA on 1/10/18 @ 0:30:28 UTC https://t.co/7GyIhSFNtq #quake
myearthquakeapp 2.48 earthquake occurred near Mohaka, Hawke's Bay, New Zealand at 01:07 UTC! #earthquake #Mohaka https://t.co/jRs1aYrG7i
earthquake_all7 【微小地震速報 三重県20/74】
2018/01/10 7:21:35 JST, 
日本 三重県 志摩市役所の東南東57km, 
M2.5, TNT84.8kg, 深さ0.0km, 
MAP https://t.co/ZHQTagrb35
earthquake_all 【微小地震速報 和歌山県3/131】
2018/01/10 9:34:51 JST, 
日本 和歌山県 和歌山市役所の南西6km, 
M1.4, TNT1.9kg, 深さ7.1km, 
MAP https://t.co/ZZ3cR8ycVF 1464
myearthquakeapp 2.6 earthquake occurred near Otaki Beach, New Zealand at 01:06 UTC! #earthquake #OtakiBeach https://t.co/8lR0utskud
EARTH3R For the 200,000 Salvadoran earthquake refugees in the U.S., being sent back could be a death sentence… https://t.co/GmJ
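
To make the idea of paging less opaque, here is a sketch of the general pattern: request a page, yield its items, then ask for items older than the last one seen. This illustrates the concept only and is not Tweepy's actual implementation. The names `iterate_pages` and `fetch_page` are hypothetical, and `fetch_page(max_id)` is assumed to return a list of dicts with an integer `"id"`, newest first.

```python
def iterate_pages(fetch_page, max_items):
    """Yield up to max_items results, requesting one page at a time.

    fetch_page(max_id) is a hypothetical callable that returns items with
    id <= max_id, newest first (or the newest page when max_id is None).
    """
    yielded = 0
    max_id = None
    while yielded < max_items:
        page = fetch_page(max_id)
        if not page:
            return  # no more results
        for item in page:
            yield item
            yielded += 1
            if yielded >= max_items:
                return
        # Next, ask for items strictly older than the last one seen
        max_id = page[-1]["id"] - 1
```

Because this is a generator, no request is made until the caller starts iterating, and iteration stops as soon as `max_items` is reached.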

### Other Search Functionality

The Tweepy wrapper and the Twitter API are pretty extensive.
You can do things like pull the last 3,200 tweets from other users' timelines, find all retweets of your account, get follower lists, search for users matching a query, etc.

More information on Tweepy's capabilities is available at its documentation page: (http://tweepy.readthedocs.io/)

Other information on the Twitter API is available here: (https://developer.twitter.com/en/docs/tweets/search/overview).

### Twitter Streaming

Up to this point, all of our work has been retrospective. 
An event has occurred, and we want to see how Twitter responded over some period of time. 

To follow an event in real time, Twitter and Tweepy support Twitter streaming.
Streaming is a bit complicated, but it essentially lets us track a set of keywords, places, or users.

To keep things simple, I will provide a simple class and show methods for printing the first few tweets.
Larger solutions exist specifically for handling Twitter streaming.

You could take this code though and easily extend it by writing data to a file rather than the console.
I've marked where that code could be inserted.
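
One simple file layout for this is one JSON document per line, which can be appended to as tweets arrive and read back line by line. The sketch below assumes each record is a plain dictionary, such as the `status._json` attribute Tweepy keeps for each tweet. The helper names and the sample records are made up for illustration.

```python
import io
import json


def write_json_lines(records, file_obj):
    """Write each record (a dict) as one JSON document per line; return the count."""
    count = 0
    for record in records:
        file_obj.write(json.dumps(record) + "\n")
        count += 1
    return count


def read_json_lines(file_obj):
    """Parse a file written by write_json_lines back into a list of dicts."""
    return [json.loads(line) for line in file_obj if line.strip()]


# Round trip through an in-memory buffer; a real file opened with open() works the same way
buffer = io.StringIO()
write_json_lines([{"id": 1, "text": "first"}, {"id": 2, "text": "second"}], buffer)
buffer.seek(0)
print(read_json_lines(buffer))
```

Using `json.dumps` rather than Python's string formatting matters here: formatting a dictionary with `%s` produces Python syntax, which `json.loads` cannot parse.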

In [63]:
# First, we need to create our own listener for the stream
# that will stop after a few tweets
class LocalStreamListener(tweepy.StreamListener):
    """A simple stream listener that breaks out after X tweets"""
    
    # Max number of tweets
    maxTweetCount = 10
    
    # Set current counter
    def __init__(self):
        tweepy.StreamListener.__init__(self)
        self.currentTweetCount = 0
        
        # For writing out to a file
        self.filePtr = None
        
    # Create a log file
    def set_log_file(self, newFile):
        if ( self.filePtr ):
            self.filePtr.close()
            
        self.filePtr = newFile
        
    # Close log file
    def close_log_file(self):
        if ( self.filePtr ):
            self.filePtr.close()
    
    # Pass data up to parent then check if we should stop
    def on_data(self, data):

        print (self.currentTweetCount)
        
        tweepy.StreamListener.on_data(self, data)
            
        if ( self.currentTweetCount >= self.maxTweetCount ):
            return False

    # Increment the number of statuses we've seen
    def on_status(self, status):
        self.currentTweetCount += 1
        
        # Could write this status to a file instead of to the console
        print (status.text)
        
        # If we have specified a file, write the tweet to it as one
        # JSON document per line (json is imported at the top)
        if ( self.filePtr ):
            self.filePtr.write("%s\n" % json.dumps(status._json))
        
    # Error handling below here
    def on_exception(self, exc):
        print (exc)

    def on_limit(self, track):
        """Called when a limitation notice arrives"""
        print ("Limit", track)
        return

    def on_error(self, status_code):
        """Called when a non-200 status code is returned"""
        print ("Error:", status_code)
        return False

    def on_timeout(self):
        """Called when stream connection times out"""
        print ("Timeout")
        return

    def on_disconnect(self, notice):
        """Called when twitter sends a disconnect notice
        """
        print ("Disconnect:", notice)
        return

    def on_warning(self, notice):
        """Called when a disconnection warning message arrives"""
        print ("Warning:", notice)



Now we set up the stream using the listener above.

In [64]:
listener = LocalStreamListener()
localStream = tweepy.Stream(api.auth, listener)

In [65]:
# Stream based on keywords
localStream.filter(track=['earthquake', 'disaster'])

0
RT @WendySiegelman: @maddow This oil tanker disaster is happening now in the East China Sea - the ship may explode and release 1 million ba…
1
"People in war-torn Yemen are facing a situation that "'looks like the Apocalypse'", says UN's humanitarian chief.… https://t.co/jhjg6kQPAp
2
RT @ARTVReviews: I got a @MoviePass subscription ($10 a month to be able to see 1 movie a day in theaters) and I saw some flicks this past…
3
RT @jennygardiner: Share your dating disaster story at @JUSTConRom for a chance to win an ebook copy of FALLING FOR MR. MAYBE. https://t.co…
4
Now all I have to left to finish of the original NES Mega Man is the horrifically difficult beautiful disaster of e… https://t.co/AI6Yk6agSL
5
RT @sosadtoday: one time i tried to be positive and it was a disaster
6
@amjoyshow @ChetPowell @teresatomlinson : it would be a goat rope disaster. NEVER.
7
RT @178kakapo: "Our children are suffering."
Yemen is on the brink of the worst humanitarian disaster in 50 years. 
fr. AJEnglis

In [66]:
listener = LocalStreamListener()
localStream = tweepy.Stream(api.auth, listener)

# List of screen names to track
screenNames = ['bbcbreaking', 'CNews', 'bbc', 'nytimes']

# Twitter stream uses user IDs instead of names
# so we must convert
userIds = []
for sn in screenNames:
    user = api.get_user(sn)
    userIds.append(user.id_str)

# Stream based on users
localStream.filter(follow=userIds)

0
@nytimes What a POS. You couldn’t pay me enough money to watch these man hating pigs.
1
RT @nytimes: Trump's lawyers are assessing the risks of allowing him to be interviewed by the special counsel, who told them a request was…
2
RT @nytimes: Best: Oprah Winfrey's speech, Natalie Portman's one-liner 
Worst: Mostly silent men, some of those red-carpet interviews https…
3
@nytimes What a joke! Aren’t you the journalists that suppressed the Weinstein story!
4
@nytimes …. unless we disagree with the truth. Then we just make it up
5
RT @nytimes: Times investigations expose the truth that holds power to account. #TruthHasAVoice
6
RT @nytimes: Times journalists pursue the truth wherever it leads.
7
@nytimes Donald that dossier was on me Hillary and the Democrats paid to have your name put on it with the help of… https://t.co/sFCV0pSgWO
8
@nytimes Your truth could be a crappy opinion that spreads as truth tho.
9
RT @nytimes: The truth has power. #TruthHasAVoice

 https://t.co/GjlBsfUAa7


In [67]:
listener = LocalStreamListener()
localStream = tweepy.Stream(api.auth, listener)

# Specify coordinates for a bounding box around area of interest
# In this case, we use San Francisco
swCornerLat = 36.8
swCornerLon = -122.75
neCornerLat = 37.8
neCornerLon = -121.75

boxArray = [swCornerLon, swCornerLat, neCornerLon, neCornerLat]

# Say we want to write these tweets to a file
listener.set_log_file(open("tweet_log.json", "w"))

# Stream based on location
localStream.filter(locations=boxArray)

# Close the log file
listener.close_log_file()

0
SYRE IS SO GOOD THANK YOU. @officialjaden
1
@BreitbartNews My 2 favorite NJ. Generals. Walker &amp; Trump! Fluties pretty good 2.
2
A Chorus of Color: Amazing Birds on Public Lands https://t.co/dUw4OV6zqO
3
@brooke_castro It was great. I had sweets today.  There was 6 cupcakes on the table. I had 4 of them as a snack and… https://t.co/wOYmdkntQa
4
Agree wholeheartedly with Kerr on this. Media's moves are a reflection of society, not the other way around. https://t.co/QfysXwoujI
5
@Twitch_Pink You have to try puerto nuevo style lobster in Chipotle sauce! Soo tasty!
6
@TyIslit Oya let me send you my head too. Mschewww
7
Wowww ): https://t.co/s5xWdsi3AG
8
TEST_PLACE: 6d8ae305-ebf6-447f-a930-4a89d69cb34e
9
Me too 🙋🏻‍♂️🙋🏻‍♂️ https://t.co/mwGB6rmOZz
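
The location filter above passes the box as longitude/latitude pairs, southwest corner first, which is easy to get backwards. A minimal sketch of a helper that builds the list in that order and rejects obviously wrong input is shown below. `make_bounding_box` is a hypothetical name, and the checks are an assumption about what is worth validating. Boxes that cross the 180° meridian are not handled.

```python
def make_bounding_box(sw_lat, sw_lon, ne_lat, ne_lon):
    """Return [sw_lon, sw_lat, ne_lon, ne_lat] after basic sanity checks."""
    if not (-90 <= sw_lat <= 90 and -90 <= ne_lat <= 90):
        raise ValueError("latitudes must be within [-90, 90]")
    if not (-180 <= sw_lon <= 180 and -180 <= ne_lon <= 180):
        raise ValueError("longitudes must be within [-180, 180]")
    if sw_lat >= ne_lat or sw_lon >= ne_lon:
        raise ValueError("southwest corner must lie south and west of the northeast corner")
    return [sw_lon, sw_lat, ne_lon, ne_lat]


# Same box as in the cell above
boxArray = make_bounding_box(36.8, -122.75, 37.8, -121.75)
```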
