### Twitter Producer

**Original Author:** Walker Rowe.<br/>
**Modified by:** Astrid Krickl.<br/>
**Additional Info:** [Working with Streaming Twitter Data Using Kafka](https://www.bmc.com/blogs/working-streaming-twitter-data-using-kafka/)<br/>

In [None]:
# Install the required pip packages into the current Jupyter kernel
import sys
# StreamListener (used below) only exists in tweepy releases before 4.0
!{sys.executable} -m pip install "tweepy<4.0"
!{sys.executable} -m pip install kafka-python
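The listener below imports `StreamListener` from `tweepy.streaming`, which exists only in tweepy 3.x; tweepy 4.0 merged `StreamListener` into `Stream`. A small check like the following can confirm the installed version before the rest of the notebook runs. It is a sketch that assumes Python 3.8+ (for `importlib.metadata`); the helper names are our own.

```python
from importlib.metadata import PackageNotFoundError, version


def is_legacy_tweepy(version_string):
    """Return True if the version is below 4.0, where StreamListener still exists."""
    major = int(version_string.split(".")[0])
    return major < 4


def installed_tweepy_version():
    """Return the installed tweepy version string, or None if tweepy is missing."""
    try:
        return version("tweepy")
    except PackageNotFoundError:
        return None


installed = installed_tweepy_version()
if installed is None:
    print("tweepy is not installed")
elif not is_legacy_tweepy(installed):
    print(f"tweepy {installed} found: StreamListener was removed in 4.0, install 'tweepy<4.0'")
else:
    print(f"tweepy {installed} is compatible with this notebook")
```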

In [None]:
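# Import the tweepy, kafka, json and sys modules
from tweepy.streaming import StreamListener  # available in tweepy < 4.0 only
from tweepy import OAuthHandler
from tweepy import Stream
from kafka import KafkaProducer
import json
import sys  # used by the error handler when starting the stream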

# Set up the Twitter API credentials (replace the placeholders with your own keys)
consumer_key = '*consumer_key*'
consumer_secret = '*consumer_secret*'
access_token = '*access_token*'
access_secret = '*access_secret*'
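Hard-coding keys in a notebook makes them easy to leak when it is shared. As a hedged alternative, the credentials can be read from environment variables. The variable names below (`TWITTER_CONSUMER_KEY` and so on) and the helper `load_twitter_credentials` are hypothetical choices, not part of tweepy.

```python
import os

# Hypothetical environment variable names; adjust them to your own setup.
CREDENTIAL_VARS = {
    "consumer_key": "TWITTER_CONSUMER_KEY",
    "consumer_secret": "TWITTER_CONSUMER_SECRET",
    "access_token": "TWITTER_ACCESS_TOKEN",
    "access_secret": "TWITTER_ACCESS_SECRET",
}


def load_twitter_credentials(environ=os.environ):
    """Return the four Twitter credentials read from environment variables.

    Raises KeyError naming every variable that is missing or empty.
    """
    missing = [var for var in CREDENTIAL_VARS.values() if not environ.get(var)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[var] for name, var in CREDENTIAL_VARS.items()}
```

In the notebook, `creds = load_twitter_credentials()` followed by `consumer_key = creds["consumer_key"]` (and likewise for the other three) would replace the placeholder assignments.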

#### Creating a stream listener class
* `on_data`: called when raw data is received from the connection; publishes the data as a message to the Kafka topic `weather`
* `on_error`: called when a non-200 HTTP status code is returned; prints the status code

In [None]:
class StdOutListener(StreamListener):
    
    def on_data(self, data):
        producer.send("weather", data.encode('utf-8'))
        return True
    
    def on_error(self, status):
        print('Error:', status)
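The listener's logic can be exercised without a Twitter connection or a Kafka broker if the send function is injected. The sketch below is a plain class with the same `on_data`/`on_error` methods; in the notebook it would subclass `StreamListener`, and `send` could be `producer.send`. Two behaviours differ from the original and are assumptions: messages without a `text` field (such as `limit` notices) are not forwarded, and `on_error` returns `False` for HTTP 420 (rate limiting), which tweepy 3.x treats as a request to disconnect rather than reconnect.

```python
import json


class KafkaForwardingListener:
    """Forwards raw tweet JSON to a Kafka topic through an injected send function.

    Sketch only: in the notebook this would subclass tweepy's StreamListener
    (tweepy < 4.0); here it is a plain class so it runs without tweepy or Kafka.
    """

    def __init__(self, send, topic="weather"):
        # `send` is any callable taking (topic, value_bytes),
        # for example a KafkaProducer instance's `send` method.
        self.send = send
        self.topic = topic

    def on_data(self, data):
        # Defensively ignore blank keep-alive input.
        if not data.strip():
            return True
        message = json.loads(data)
        # Forward only tweets; control messages such as 'limit' have no 'text'.
        if "text" in message:
            self.send(self.topic, data.encode("utf-8"))
        return True  # keep the stream open

    def on_error(self, status):
        print("Error:", status)
        # 420 means the app is rate limited; returning False disconnects.
        if status == 420:
            return False
        return True
```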

#### Starting the Stream
* Create the Kafka producer, the `StdOutListener`, the `OAuthHandler` (with the access tokens set) and the `Stream` object
* Set the location bounding boxes (four values per box):
    * The first two values are the longitude and latitude of the south-west corner
    * The third and fourth values are the longitude and latitude of the north-east corner
    * Chosen cities, in order: London, Dublin, Belfast, Manchester, Liverpool, Miami, Los Angeles (LA), Dallas
* Start the stream, filtered by location
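The flat list of 32 numbers is easy to mistype. One way to keep it readable, sketched here with hypothetical helper names, is to store one box per city and flatten it, checking that each south-west corner really lies south-west of its north-east corner. `city_for_point` shows how a coordinate could be mapped back to one of the boxes, for example when processing the collected tweets.

```python
# (south-west lon, south-west lat, north-east lon, north-east lat) per city,
# copied from the `location` list used when starting the stream.
CITY_BOXES = {
    "London": (-0.72, 51.31, 0.55, 51.71),
    "Dublin": (-6.49, 53.25, -6.1, 53.41),
    "Belfast": (-6.11, 54.5, -5.8, 54.7),
    "Manchester": (-2.49, 53.3, -2.01, 53.66),
    "Liverpool": (-3.12, 53.31, -2.85, 53.5),
    "Miami": (-80.68, 25.5, -79.93, 26.9),
    "LA": (-118.5, 33.51, -116.79, 34.4),
    "Dallas": (-97.57, 32.33, -96.42, 33.35),
}


def flatten_boxes(boxes):
    """Validate each box and flatten all boxes into one list for `locations=`."""
    flat = []
    for city, (sw_lon, sw_lat, ne_lon, ne_lat) in boxes.items():
        if not (sw_lon < ne_lon and sw_lat < ne_lat):
            raise ValueError(f"{city}: south-west corner must be below and left of north-east corner")
        flat.extend([sw_lon, sw_lat, ne_lon, ne_lat])
    return flat


def city_for_point(lon, lat, boxes=CITY_BOXES):
    """Return the first city whose box contains (lon, lat), or None."""
    for city, (sw_lon, sw_lat, ne_lon, ne_lat) in boxes.items():
        if sw_lon <= lon <= ne_lon and sw_lat <= lat <= ne_lat:
            return city
    return None
```

With this in place, `stream.filter(locations=flatten_boxes(CITY_BOXES))` would be equivalent to passing the hand-written list.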

Because the stream crashed regularly with errors like the one below, we wrapped it in an infinite `while` loop that restarts the stream whenever an exception is raised:

`Unexpected error: (<class 'urllib3.exceptions.ProtocolError'>, ProtocolError('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read)), <traceback object at 0x7fac792e4b80>)`
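Restarting immediately after every failure can hammer the API when the problem persists. A sketch of the same restart idea with exponential backoff follows; `run_with_restarts` and its parameters are our own, and `sleep` is injectable only so the logic can be tested. It catches `Exception` rather than using a bare `except`, so `KeyboardInterrupt` (interrupting the kernel) still stops it.

```python
import time


def run_with_restarts(start_stream, max_restarts=None, base_delay=1.0,
                      max_delay=60.0, sleep=time.sleep):
    """Call start_stream() again after each exception, with exponential backoff.

    Returns the number of restarts once start_stream() returns normally.
    Re-raises the last exception after max_restarts restarts (None = unlimited).
    """
    restarts = 0
    while True:
        try:
            start_stream()
            return restarts
        except Exception as exc:
            if max_restarts is not None and restarts >= max_restarts:
                raise
            # Delay doubles on each consecutive failure, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** restarts)
            print(f"Stream failed ({exc!r}); restarting in {delay:.0f}s")
            sleep(delay)
            restarts += 1
```

In the notebook it could wrap the filter call as `run_with_restarts(lambda: stream.filter(locations=location))`. This sketch never resets the delay, so a long-running notebook may want to reset `restarts` after a connection has stayed up for a while.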

In [None]:
# Wrap the setup in a try/except block so unexpected errors are printed
try:
    # Create a Kafka producer connected to the local broker
    producer = KafkaProducer(bootstrap_servers='localhost:9092')

    # Create a StdOutListener object
    l = StdOutListener()

    # Create an OAuth authentication handler object
    auth = OAuthHandler(consumer_key, consumer_secret)

    # Set the tweepy access tokens
    auth.set_access_token(access_token, access_secret)

    # Create a Stream object
    stream = Stream(auth, l)
    
    # Bounding boxes: London, Dublin, Belfast, Manchester, Liverpool, Miami, LA, Dallas
    location = [-0.72, 51.31, 0.55, 51.71,
                -6.49, 53.25, -6.1, 53.41,
                -6.11, 54.5, -5.8, 54.7,
                -2.49, 53.3, -2.01, 53.66,
                -3.12, 53.31, -2.85, 53.5,
                -80.68, 25.5, -79.93, 26.9,
                -118.5, 33.51, -116.79, 34.4,
                -97.57, 32.33, -96.42, 33.35]
    while True:
        try:
            # Filter the stream for all tweets containing the search locations. 
            stream.filter(locations=location)
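        except Exception as e:
            # Catch only regular exceptions so that interrupting the kernel
            # (KeyboardInterrupt) still stops the loop
            print("Probably a broken connection, restarting:", e)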
    
except Exception:
    # Print the error
    print("Unexpected error:", sys.exc_info())