# Ingest Twitter Feed Data and Sentiments into iguazio Stream & Time-Series DB 

## Initialization 
Install packages and set environment variables.<br>
You need to fill in the following environment variables with real credentials.<br>
### Create a file called `tweet_env.txt` in the same directory as this notebook and write the credentials in the following form:

```
# Twitter credentials
app_key = <..>
app_secret = <..>
oauth_token = <..>
oauth_token_secret = <..>
```

## Initialize nuclio emulation, environment variables and configuration
Use `# nuclio: ignore` to mark cells that should not be copied into the function code.
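To make the marker's effect concrete, here is a conceptual sketch of filtering notebook cells before export. It is not nuclio-jupyter's actual implementation: it assumes the marker is the first non-blank line of a cell, as in the cells below, and `cells_to_export` is a hypothetical name.

```python
IGNORE_MARKER = '# nuclio: ignore'

def cells_to_export(cells):
    """Return the code cells that would be copied into the function (sketch)."""
    kept = []
    for source in cells:
        lines = [line.strip() for line in source.splitlines() if line.strip()]
        # Skip empty cells and cells whose first non-blank line is the marker
        if not lines or lines[0] == IGNORE_MARKER:
            continue
        kept.append(source)
    return kept

cells = ['# nuclio: ignore\nimport nuclio', 'import json\nprint(json.dumps({}))']
print(cells_to_export(cells))
```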

In [1]:
# nuclio: ignore
# if the nuclio-jupyter package is not installed run !pip install nuclio-jupyter
import nuclio 

In [2]:
%nuclio env -c V3IO_ACCESS_KEY=${V3IO_ACCESS_KEY}
%nuclio env -c V3IO_USERNAME=${V3IO_USERNAME}
%nuclio env -c V3IO_API=${V3IO_API}

In [3]:
%nuclio env_file tweet_env.txt

%nuclio: setting 'app_key' environment variable
%nuclio: setting 'app_secret' environment variable
%nuclio: setting 'oauth_token' environment variable
%nuclio: setting 'oauth_token_secret' environment variable
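`%nuclio env_file` only works inside Jupyter. If you want to load the same `tweet_env.txt` when running the code elsewhere, a minimal stand-in might look like the following. It assumes the `key = value` format shown above, with `#` comments and blank lines ignored; `parse_env_lines` and `load_env_file` are hypothetical helpers, not part of nuclio.

```python
import os

def parse_env_lines(lines):
    """Parse 'key = value' lines into a dict, skipping blanks and # comments."""
    env = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith('#'):
            continue
        key, sep, value = line.partition('=')
        if sep:  # ignore malformed lines that have no '='
            env[key.strip()] = value.strip()
    return env

def load_env_file(path='tweet_env.txt'):
    """Read the file and export its values as environment variables."""
    with open(path) as f:
        env = parse_env_lines(f)
    os.environ.update(env)
    return env
```

Calling `load_env_file()` before the stream handler code runs would let its `os.getenv('app_key')` and related lookups find the credentials.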


### Install required packages
`%nuclio cmd` lets you run image build instructions and install packages.<br>
Note: with the `-c` option, commands run only in nuclio, not locally.

In [None]:
%%nuclio cmd
pip install textblob
pip install twython
pip install v3io_frames
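Because the cell above omits `-c`, the packages are installed locally as well as in the function image. A quick way to confirm they are importable in the notebook kernel is sketched below; `missing_modules` is a hypothetical helper, and it assumes the import names match the pip package names listed above.

```python
import importlib.util

# Import names of the packages installed above (assumed to match the pip names)
REQUIRED_MODULES = ('textblob', 'twython', 'v3io_frames')

def missing_modules(names=REQUIRED_MODULES):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

print(missing_modules())  # an empty list means everything is installed
```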

In [5]:
%nuclio config spec.build.baseImage = "python:3.6-jessie"

%nuclio: setting spec.build.baseImage to 'python:3.6-jessie'


### Twitter stream handling class

In [6]:
from twython import TwythonStreamer
import json
import re
import os
from textblob import TextBlob
import v3io_frames as v3f
import pandas as pd

oauth = {
    'app_key' : os.getenv('app_key'),
    'app_secret' : os.getenv('app_secret'),  
    'oauth_token' : os.getenv('oauth_token'), 
    'oauth_token_secret' : os.getenv('oauth_token_secret'),
}

# initialize iguazio v3io APIs
client = v3f.Client('framesd:8081')

# Twitter stream handler  
class MyStreamer(TwythonStreamer):
    def __init__(self, context, name, **kw):
        self.name = name
        self.context = context
        TwythonStreamer.__init__(self, **kw)
        
    def start(self, cb, limit=10, **kw):
        self.cb = cb
        self.limit = limit
        self.statuses.filter(**kw)
        
    def on_success(self, data):
        if 'text' in data:
            record = {'text': data['text'], 
                      'user': '@'+data['user']['screen_name'],
                      'id': data['id'],
                      'created_at':data['created_at'],
                     }
            self.context.last_message = record
            if self.cb:
                self.cb(self.context, self.name, record)
                
        self.limit -= 1 
        if self.limit <= 0 :
            self.disconnect()

    def on_error(self, status_code, data, headers=None):
        # newer Twython versions also pass the response headers; accept them optionally
        self.context.logger.error_with('Error in stream', status_code=status_code)

def process_event(context, name, record):
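    # strip @mentions, non-alphanumeric characters and URLs (raw string avoids invalid-escape warnings)
    clean = ' '.join(re.sub(r"(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)", " ", record['text']).split())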
        
    # enrich the record with natural language metadata
    blob = TextBlob(clean)
    record['polarity'] = blob.sentiment.polarity
    record['subjectivity'] = blob.sentiment.subjectivity

    # Write the record into an iguazio stream
    context.logger.info_with('writing data to Stream', record=record)
    client.execute('stream', 'stock_stream', 'put', args={'data': json.dumps(record)})
        
    # Write data to iguazio Time-Series DB
    # one row indexed by (time, symbol); the stream name ('GOOG') is used as the symbol
    df = pd.DataFrame(index=[[pd.to_datetime(record['created_at'])], [name]], columns=['sentiment'])
    df['sentiment'] = [float(blob.sentiment.polarity)]
    df.index.names=['time','symbol']
    client.write(backend='tsdb', table='stock_metrics',dfs=df)
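The text-cleaning step and the `created_at` timestamp can be checked in isolation without Twitter credentials. The sketch below reuses the regular expression from `process_event` (as a raw string) and parses Twitter's `created_at` format with `datetime.strptime` instead of `pd.to_datetime`; `clean_text` and `parse_created_at` are hypothetical helper names. `%a` and `%b` are locale-dependent, so this assumes an English/C locale.

```python
import re
from datetime import datetime

# Same pattern as process_event: group 1 drops @mentions,
# group 2 drops non-alphanumeric characters, group 3 drops URLs
TWEET_NOISE = re.compile(r"(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)")

def clean_text(text):
    """Replace mentions, punctuation and URLs with spaces, then collapse whitespace."""
    return ' '.join(TWEET_NOISE.sub(' ', text).split())

def parse_created_at(value):
    """Parse Twitter's created_at format, e.g. 'Wed Mar 20 16:25:59 +0000 2019'."""
    return datetime.strptime(value, '%a %b %d %H:%M:%S %z %Y')

print(clean_text('RT @Google: Check this out! https://t.co/abc123'))
# → RT Check this out
print(parse_created_at('Wed Mar 20 16:25:59 +0000 2019').isoformat())
# → 2019-03-20T16:25:59+00:00
```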

## Nuclio service initialization (init_context) and event handler implementation
The Twitter function acts as an always-on (listening) service, so we start the listener thread in the nuclio `init_context` hook.

In [7]:
import threading

def start_listener(context):
    stream = MyStreamer(context, 'GOOG', **oauth)
    stream.start(process_event, 200, track='@Google', lang='en')

def handler(context, event):
    return json.dumps(context.last_message)

def init_context(context):
    context.last_message = {}
    t = threading.Thread(target=start_listener, args=(context,))
    t.start()
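The always-on pattern above (a background thread that keeps updating `context.last_message`, and a handler that simply reports it) can be exercised without nuclio or Twitter. In this sketch a `SimpleNamespace` stands in for the nuclio context and a fixed list of records stands in for the Twitter stream; `fake_listener`, `demo_init_context` and `demo_handler` are hypothetical names chosen so they do not shadow the real functions.

```python
import json
import threading
from types import SimpleNamespace

def fake_listener(context, records):
    # Stand-in for MyStreamer: each incoming record becomes the "last message"
    for record in records:
        context.last_message = record

def demo_init_context(context, records):
    context.last_message = {}
    # daemon=True so a listener that never returns won't block interpreter exit
    t = threading.Thread(target=fake_listener, args=(context, records), daemon=True)
    t.start()
    return t

def demo_handler(context, event):
    # Report whatever the listener thread stored most recently
    return json.dumps(context.last_message)

ctx = SimpleNamespace()
listener = demo_init_context(ctx, [{'id': 1, 'text': 'first'}, {'id': 2, 'text': 'second'}])
listener.join()  # demo only; the real listener runs until its message limit is reached
print(demo_handler(ctx, None))
# → {"id": 2, "text": "second"}
```

Marking the thread as a daemon is a design choice in this sketch; the original `init_context` starts a regular (non-daemon) thread.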

## Function testing
The following cell simulates the nuclio function initialization; the listener thread then logs each record it writes.

In [8]:
# nuclio: ignore
init_context(context)
event = nuclio.Event(body='')

Python> 2019-03-20 16:26:04,365 [info] writing data to Stream: {'record': {'text': 'RT @twintvofficial: Laut Axel Voss gibt‘s bei Google die klickbare Kategorie „Memes“. Folgende Kategorien sollen ganz fix folgen:\n\n- Leberw…', 'user': '@Mara53825761', 'id': 1108404326060433413, 'created_at': 'Wed Mar 20 16:25:59 +0000 2019', 'polarity': 0.0, 'subjectivity': 0.0}}
Python> 2019-03-20 16:26:06,842 [info] writing data to Stream: {'record': {'text': 'مسخره نیست\nدودل نوروز در اسرائیل نشون داده میشه ولی توی خود ایران نشون داده نمیشه\n@Google', 'user': '@iHMahmoodi', 'id': 1108404336521105408, 'created_at': 'Wed Mar 20 16:26:01 +0000 2019', 'polarity': 0.0, 'subjectivity': 0.0}}
Python> 2019-03-20 16:26:07,156 [info] writing data to Stream: {'record': {'text': '@Ampemusicos @loxatus @brusselauri @OHCHR_Europe @davidakaye @edri @Google @_burakozgen @crispinhunt @HelgaTruepel… https://t.co/8I2JkVwoUO', 'user': '@Goreminister', 'id': 1108404337863213057, 'created_at': 'Wed Mar 20 16:26:01 +000

## Deploy a function onto a cluster
The `%nuclio deploy` command deploys functions into a cluster.<br>Check the help (`%nuclio help deploy`) for more information.

In [9]:
%nuclio deploy -p stocks -c

%nuclio: ['deploy', '-p', 'stocks', '-c', '/User/demos/stocks/read-tweets.ipynb']
%nuclio: [nuclio.deploy] 2019-03-20 16:28:06,028 (info) Building processor image
%nuclio: [nuclio.deploy] 2019-03-20 16:28:08,047 (info) Pushing image
%nuclio: [nuclio.deploy] 2019-03-20 16:28:08,048 (info) Build complete
%nuclio: [nuclio.deploy] 2019-03-20 16:28:11,077 done updating read-tweet, function address: 18.194.137.243:32080
%nuclio: function deployed
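After deployment, the function can be queried for the most recent tweet record. Assuming the function is reachable over its default HTTP trigger at the address printed above, a minimal client could look like this; `function_url` and `get_last_message` are hypothetical helpers. Since `handler` returns `json.dumps(context.last_message)`, the response body is decoded as JSON.

```python
import json
import urllib.request

def function_url(address):
    """Prefix a bare 'host:port' address with http:// (hypothetical helper)."""
    if address.startswith(('http://', 'https://')):
        return address
    return 'http://' + address

def get_last_message(address, timeout=5.0):
    """GET the deployed function and decode the JSON returned by its handler."""
    with urllib.request.urlopen(function_url(address), timeout=timeout) as resp:
        return json.loads(resp.read().decode('utf-8'))

# Example (needs network access to the cluster):
# print(get_last_message('18.194.137.243:32080'))
```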
