# Twitter Streaming API in Python

This notebook uses the [Tweepy](https://www.tweepy.org/) library for Python to access Twitter's streaming API and collect tweets in real time. It demonstrates how to select tweets by user ID, by search term or by geolocation.

To use the Twitter Streaming API you will need to [apply for a Twitter Developer Account](https://developer.twitter.com/en/application/use-case). You will then need to set up a Twitter app and obtain the authentication credentials that enable your application to connect to the API. See [Twitter developer apps: Overview](https://developer.twitter.com/en/docs/basics/apps/overview) for further details.

This example is based on the tutorial series [How to use the Twitter API v1.1 with Python to stream tweets](https://youtu.be/pUUxmvvl2FE) by sentdex on YouTube.

#### Import dependencies

In [None]:
# Import Tweepy library to connect to the Twitter Streaming API and process the response
import tweepy
# Import time to enable us to set a time limit for data collection
import time
# Import json library to enable access to raw data
import json

#### Authentication Credentials

In [None]:
# Enter the details from the Twitter App you created here
consumer_key = ''
consumer_secret = ''
access_token = ''
access_token_secret = ''
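
Credentials typed directly into a notebook are easy to leak when the notebook is shared. As one possible alternative, the sketch below reads the four values from environment variables. The variable names and the `load_credentials` helper are assumptions made for illustration; neither Twitter nor Tweepy requires them.

```python
import os

# Hypothetical environment variable names; any names can be used
CREDENTIAL_VARS = {
    'consumer_key': 'TWITTER_CONSUMER_KEY',
    'consumer_secret': 'TWITTER_CONSUMER_SECRET',
    'access_token': 'TWITTER_ACCESS_TOKEN',
    'access_token_secret': 'TWITTER_ACCESS_TOKEN_SECRET',
}

def load_credentials(env=None):
    """Return a dict of the four credentials, raising if any is missing or empty."""
    env = os.environ if env is None else env
    creds = {name: env.get(var, '') for name, var in CREDENTIAL_VARS.items()}
    missing = [CREDENTIAL_VARS[name] for name, value in creds.items() if not value]
    if missing:
        raise KeyError('Missing environment variables: ' + ', '.join(missing))
    return creds
```

With the variables set, `creds = load_credentials()` returns a dictionary whose keys match the four variable names used in the cell above.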

#### Data Collection Timer

Once connected to the streaming API the stream will not terminate unless the connection is closed. We'll use a set duration for data collection in seconds, which will be used in a timer function.

1 min = 60 / 5 mins = 300 / 10 mins = 600 / 30 mins = 1800 / 1 hour = 3600 / 12 hours = 43200 / 24 hours = 86400

In [None]:
# Variable to control the duration of data collection in seconds
data_coll_time = 60
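
The conversion table above and the elapsed-time check used later can be written as two small helpers. This is a minimal sketch; the names `seconds` and `time_is_up` are hypothetical and are not used elsewhere in this notebook.

```python
import time

def seconds(minutes=0, hours=0):
    """Convert a duration given in minutes and/or hours to whole seconds."""
    return int(minutes * 60 + hours * 3600)

def time_is_up(start_time, limit, now=None):
    """Return True once `limit` seconds have passed since `start_time`."""
    now = time.time() if now is None else now
    return (now - start_time) >= limit
```

For example, `data_coll_time = seconds(minutes=5)` would set a five-minute collection window.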

#### Filter Criteria

When calling the real-time streaming API you will use filters to specify criteria for the type of tweets you wish the streaming endpoint to return. For example, you can filter tweets by specifying one of the following parameters and providing appropriate values: `follow` (list of users); `track` (list of search terms); `locations` (list of lon/lat pairs in WGS84 describing the southwest and northeast corners of one or more geographic bounding boxes). See [Filter realtime Tweets](https://developer.twitter.com/en/docs/tweets/filter-realtime/guides/basic-stream-parameters) for further details.

Other filters include `languages` and `filter_level`; the latter can be used to moderate tweets for display purposes.

The coordinates for a suitable bounding box can be found using [BoundingBox](https://boundingbox.klokantech.com/) to draw a bounding box and then selecting `CSV` as the 'Copy & Paste' method in the lower left of the screen.

**NOTE: You can combine filters such as `track` and `locations`, but the stream will then return tweets that match either filter (OR logic) rather than only tweets that match both (AND logic).**
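
Because combined filters use OR logic, one way to approximate AND behaviour is to check each received tweet on the client side. The sketch below is a simplification under stated assumptions: the helper names are hypothetical, the term check is a plain case-insensitive substring test (simpler than Twitter's own matching), and only tweets that carry an exact point in their `coordinates` field are considered.

```python
def contains_term(text, terms):
    """Case-insensitive check that at least one term appears in the text."""
    lowered = text.lower()
    return any(term.lower() in lowered for term in terms)

def in_bounding_box(lon, lat, box):
    """box is [sw_lon, sw_lat, ne_lon, ne_lat], the same order as `locations`."""
    sw_lon, sw_lat, ne_lon, ne_lat = box
    return sw_lon <= lon <= ne_lon and sw_lat <= lat <= ne_lat

def matches_term_and_location(payload, terms, box):
    """Return True only if the tweet matches a term AND has a point inside the box."""
    point = payload.get('coordinates')  # GeoJSON point as [lon, lat], or None
    if not point:
        return False
    lon, lat = point['coordinates']
    return contains_term(payload.get('text', ''), terms) and in_bounding_box(lon, lat, box)
```

A function like this could be called on the parsed JSON payload inside a listener's `on_data` method, discarding tweets for which it returns `False`.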

In [None]:
# Follow: A list of user IDs, each given as a string
users = ["2247929742"] #Test this by using your own user ID and Tweeting from that account while running the code

# Track: A list of phrases that are matched against Tweet text, hashtags, screen names and URLs
terms = ['virtual reality', 'augmented reality']

# Locations: Two lon/lat pairs per location (south-west corner first, then north-east corner)
london = [-0.603549,51.239469,0.359128,51.726932]
dublin = [-6.391232,53.297358,-6.160176,53.395736]
# To search a number of locations simultaneously
dub_lon = [-6.391232,53.297358,-6.160176,53.395736,-0.603549,51.239469,0.359128,51.726932]

# Assign preferred location to a variable
chosen_location = london
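
Since several bounding boxes are passed as one flat list, a misplaced value is easy to miss. The sketch below splits a flat list into boxes of four values and checks the corner order. The function name `split_bounding_boxes` is hypothetical, and the check assumes that no box crosses the 180° meridian.

```python
def split_bounding_boxes(coords):
    """Split a flat list of coordinates into [sw_lon, sw_lat, ne_lon, ne_lat] boxes."""
    if len(coords) % 4 != 0:
        raise ValueError('Expected a multiple of 4 values, got %d' % len(coords))
    boxes = [coords[i:i + 4] for i in range(0, len(coords), 4)]
    for sw_lon, sw_lat, ne_lon, ne_lat in boxes:
        # Longitude must lie in [-180, 180] and latitude in [-90, 90],
        # with the south-west corner listed before the north-east corner
        if not (-180 <= sw_lon < ne_lon <= 180 and -90 <= sw_lat < ne_lat <= 90):
            raise ValueError('Each box must list its south-west corner first, as lon, lat')
    return boxes
```

Calling `split_bounding_boxes(dub_lon)` on the combined list above would return two boxes, the first equal to `dublin` and the second equal to `london`.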

#### Authentication

In [None]:
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)

## Simple StreamListener

The Twitter streaming API allows data to be downloaded in real time by pushing messages to a persistent streaming session. We can listen to those messages by creating a `StreamListener` (a class available in Tweepy 3.x; it was removed in Tweepy 4.0). See [Streaming With Tweepy](https://tweepy.readthedocs.io/en/latest/streaming_how_to.html) for further details.

Streams do not terminate unless the connection is closed. Setting the `is_async` parameter of `filter` to `True` prevents blocking by running the stream on a new thread. We'll also use a timer, checked in the listener's `on_status` or `on_data` method, to close the connection after a set period by returning `False`. The design of the timer derives from the answer by user yprez to the question [Unable to stop Streaming in tweepy after one minute](https://stackoverflow.com/questions/33498975/unable-to-stop-streaming-in-tweepy-after-one-minute) on Stack Overflow.

#### Create a StreamListener

In [None]:
class MyStreamListener(tweepy.StreamListener):
    
    # Initialise timer for this MyStreamListener instance
    def __init__(self, time_limit = data_coll_time):
        self.start_time = time.time()
        self.limit = time_limit
        super(MyStreamListener, self).__init__()

    # Confirm connection established
    def on_connect(self):
        print('Streaming API Connection Established!')
    
    # Display the text of Twitter status updates as they are received
    def on_status(self, status):
        # Check timer
        if (time.time() - self.start_time) < self.limit:
            print(status.text)
            return True
        else:
            print('Closing Connection!')
            return False

    # Handle errors
    def on_error(self, status_code):
        print(status_code)
        if status_code == 420:
            #returning False in on_error disconnects the stream
            return False

#### Create a Stream

In [None]:
myStreamListener = MyStreamListener()
myStream = tweepy.Stream(auth, myStreamListener)

# Select one of the following filters

#myStream.filter(follow=users, is_async=True)
#myStream.filter(track=terms, is_async=True)
myStream.filter(locations=chosen_location, is_async=True)
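
The timer pattern can be tried offline, without credentials or a network connection. The sketch below is an illustration only: `TimedListener` and `run_fake_stream` are hypothetical stand-ins and are not part of Tweepy. The driver loop mimics the idea that the stream stops as soon as the listener returns `False`.

```python
import time

class TimedListener:
    """Stand-in for a stream listener: accepts items until the time limit passes."""

    def __init__(self, time_limit):
        self.start_time = time.time()
        self.limit = time_limit
        self.received = []

    def on_status(self, text):
        # Same check as MyStreamListener.on_status: False asks the caller to stop
        if (time.time() - self.start_time) < self.limit:
            self.received.append(text)
            return True
        return False

def run_fake_stream(listener, items, delay=0.0):
    """Hypothetical driver: feed items to the listener until it returns False."""
    delivered = 0
    for item in items:
        if listener.on_status(item) is False:
            break
        delivered += 1
        time.sleep(delay)
    return delivered
```

For instance, `run_fake_stream(TimedListener(time_limit=0.2), (str(i) for i in range(1000)), delay=0.05)` stops after a handful of items rather than consuming all 1000.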

## StreamLogger

In addition to streaming status updates and printing them to the console we can save the raw data to a file or database. In this case we'll save the raw output in its native JSON format to a file.

**NOTE: Saving raw tweets will create large files, so consider setting a low value for `data_coll_time` and/or adapting the code to write only the parsed JSON fields you are interested in.**

#### Create a StreamLogger

In [None]:
class MyStreamLogger(tweepy.StreamListener):
    
    # Initialise timer for this MyStreamLogger instance
    def __init__(self, time_limit = data_coll_time):
        self.start_time = time.time()
        self.limit = time_limit
        super(MyStreamLogger, self).__init__()
    
    # Confirm connection established
    def on_connect(self):
        print('Streaming API Connection Established!')
    
    # Process received data
    def on_data(self, data):
        # Check timer
        if (time.time() - self.start_time) < self.limit:
            try:
                # Load data response as JSON
                payload = json.loads(data)
                # Parse JSON response, preferring the untruncated text when present
                try:
                    tweetStr = payload['extended_tweet']['full_text']
                except KeyError:
                    tweetStr = payload['text']
                user = payload['user']['screen_name']
                # Print parsed data
                print('@' + user + ' tweeted ' + tweetStr)
                # Get current date to use as prefix for filename
                timestr = time.strftime("%Y_%m_%d_")
                # Reopen or create the file in append mode; it is closed when the with block ends
                with open(timestr + 'Tweets.json', 'a') as saveFile:
                    # Save raw data to the open file
                    saveFile.write(data)
                return True
            except Exception as e:
                # Print any exception
                print('Failed on_data, ', str(e))
                # Allow a pause of n seconds before continuing in case of unexpected rate limiting
                time.sleep(5)
                return True
        else:
            print('Closing Connection!')
            return False
    
    # Handle errors
    def on_error(self, status_code):
        print(status_code)
        if status_code == 420:
            #returning False in on_error disconnects the stream
            return False
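
To write only selected fields instead of the full raw tweet, the raw string could be reduced to a compact record before saving. This is a minimal sketch: `slim_record` is a hypothetical helper, and the default field names are assumed to be present in the tweet JSON; a missing field is stored as `None`.

```python
import json

def slim_record(raw, fields=('id_str', 'created_at', 'lang')):
    """Return one compact JSON line holding only the selected fields of a raw tweet."""
    payload = json.loads(raw)
    record = {field: payload.get(field) for field in fields}
    # Prefer the untruncated text when the tweet carries an extended_tweet object
    record['text'] = payload.get('extended_tweet', {}).get('full_text', payload.get('text'))
    record['screen_name'] = payload.get('user', {}).get('screen_name')
    return json.dumps(record) + '\n'
```

In `on_data`, writing `slim_record(data)` in place of `data` would store one small JSON object per line.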

#### Create a Stream

In [None]:
myStreamLogger = MyStreamLogger()
myStream = tweepy.Stream(auth, myStreamLogger)

# Select one of the following filters

#myStream.filter(follow=users, is_async=True)
#myStream.filter(track=terms, is_async=True)
myStream.filter(locations=chosen_location, is_async=True) #Specify location
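
Once collection has finished, the saved file can be read back for analysis. The sketch below assumes that each tweet in the file sits on its own line, which is worth verifying on your own output. The `read_tweets` helper is hypothetical.

```python
import json

def read_tweets(path):
    """Yield one dict per non-empty line of a file holding one JSON object per line."""
    with open(path, encoding='utf-8') as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)
```

For example, `tweets = list(read_tweets('2020_01_01_Tweets.json'))` would load a day's file into memory, where the filename shown is only an example of the date-prefixed pattern used by the logger.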