# Stream Enrichment

This example demonstrates how to enrich stream data, in real time, with additional features that are stored in the NoSQL data store of the Iguazio Data Science Platform (**"the platform"**).<br>
In this notebook you'll learn how to create and deploy a Nuclio function that is triggered by event messages arriving on a V3IO stream.<br>
The function enriches each event message with data from a V3IO NoSQL table and writes the enriched message to an output V3IO stream.<br>
The notebook creates two streams, Stream 1 for input and Stream 2 for output, and a NoSQL table that holds the enrichment data.<br>
The demo sends an event containing a client name, a car ID, and an email address to the input stream. The function joins the event with the matching record in the cars table, keyed by CarID, adds the car's color, manufacture year, vendor, and state, and writes the result to the output stream (stream2).

The streams and the table are stored in the **&lt;running user&gt;/examples/stream-enrich** directory in the "users" data container.
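
Before touching the platform, the enrichment step itself can be pictured as a keyed lookup. The following is only a minimal in-memory sketch: the `CARS` dictionary and the `enrich` helper are illustrative stand-ins (not platform APIs) for the NoSQL table and the Nuclio function created later in this notebook.

```python
# In-memory stand-in for the "cars" NoSQL table, keyed by CarID (illustrative only).
CARS = {
    "0": {"Color": "Gray", "Vendor": "Mitsubishi", "Mfg_Year": 2017, "State": "MI"},
    "1": {"Color": "Red", "Vendor": "Ford", "Mfg_Year": 2019, "State": "NY"},
}


def enrich(event, table, key="CarID"):
    """Return a copy of the event with the matching table row under 'enrichment'."""
    enriched = dict(event)  # leave the incoming event untouched
    # An unknown key yields an empty enrichment, mirroring the function deployed later
    enriched["enrichment"] = table.get(str(event[key]), {})
    return enriched


event = {"ClientName": "John Smith", "Email": "john.smith@myemailprovider.com", "CarID": "0"}
print(enrich(event, CARS))
```

The deployed function follows the same shape, except that the lookup is a GetItem request against the table and the result is written to the output stream.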

In [1]:
import nuclio
import os
import requests

### Setting Parameters

In [2]:

NUCLIO_PROJ_NAME = 'examples'
NUCLIO_FUNC_NAME = 'enrich-stream'

CONTAINER_NAME = 'users'
TABLE_NAME = 'cars'
OUTPUT_STREAM_NAME = 'stream2'
V3IO_API = os.getenv('V3IO_API')
V3IO_ACCESS_KEY = os.environ['V3IO_ACCESS_KEY']
V3IO_USERNAME = os.getenv('V3IO_USERNAME')

##################################################################################################################
## Fill in the V3IO password of the logged-in user. It is used when creating the stream trigger.
V3IO_PASSWORD = '123456'
#####################################################################################################################

INPUT_STREAM_NAME = 'stream1'
INPUT_STREAM_SEARCH_KEY = 'CarID'
INPUT_STREAM_URL = f'http://{V3IO_API}/{CONTAINER_NAME}/{V3IO_USERNAME}/examples/stream-enrich/{INPUT_STREAM_NAME}/'
INPUT_STREAM_PARTITIONS = [0, 1, 2]
INPUT_STREAM_SEEK_TO = 'earliest'
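
As an alternative to typing the password into the cell above, it could be read from the environment with an interactive prompt as a fallback. This is a sketch only: the `resolve_password` helper is hypothetical, and it assumes (the notebook does not confirm this) that an environment variable named `V3IO_PASSWORD` may be available.

```python
import getpass
import os


def resolve_password(env=os.environ, prompt=getpass.getpass):
    """Return the password from the environment, or prompt for it if unset."""
    password = env.get("V3IO_PASSWORD")  # assumed variable name, not set by the platform
    if not password:
        password = prompt("V3IO password: ")
    return password
```

With this helper in place, the assignment would become `V3IO_PASSWORD = resolve_password()`.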


### Create input and output streams

In [3]:
payload = f'{{"ShardCount": {len(INPUT_STREAM_PARTITIONS)}, "RetentionPeriodHours": 1 }}'
headers = {
    'Content-Type': "application/json",
    'X-v3io-function': "CreateStream",
    'x-v3io-session-key': V3IO_ACCESS_KEY,
    'cache-control': "no-cache",
}

for stream in [INPUT_STREAM_NAME, OUTPUT_STREAM_NAME]:
    url = f'http://{V3IO_API}/{CONTAINER_NAME}/{V3IO_USERNAME}/examples/stream-enrich/{stream}/'

    response = requests.request("PUT", url, data=payload, headers=headers)

    print(response)

<Response [204]>
<Response [204]>
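
The stream URL and the CreateStream body are assembled by hand in the cell above. A small sketch of how they might be factored into helpers follows; `stream_url` and `create_stream_payload` are hypothetical names, and the host and user in the example call are placeholders rather than real values.

```python
import json


def stream_url(api, container, username, stream):
    """Build the web-API URL of a stream under <username>/examples/stream-enrich."""
    return f"http://{api}/{container}/{username}/examples/stream-enrich/{stream}/"


def create_stream_payload(shard_count, retention_hours=1):
    """Serialize the CreateStream request body with json.dumps instead of an f-string."""
    return json.dumps({"ShardCount": shard_count, "RetentionPeriodHours": retention_hours})


# Placeholder host and user, for illustration only
print(stream_url("v3io-webapi:8081", "users", "admin", "stream1"))
print(create_stream_payload(3))
```

Using `json.dumps` avoids the doubled braces that an f-string needs and keeps the body valid JSON if more fields are added.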


### Create a KV table with enrichment data

Create a table called "cars". <br>
We'll insert two sample rows with the following attributes: CarID (key), color, vendor, manufacture year, and state. <br>
These attributes are used to enrich the stream in real time.

In [4]:
url = f'http://{V3IO_API}/{CONTAINER_NAME}/{V3IO_USERNAME}/examples/stream-enrich/{TABLE_NAME}/'

payloads = [{
    "Key" : {
        "CarID" : {"N" : "0"}
    },
    "Item" : {
        "Color":  {"S": "Gray"},
        "Vendor": {"S" : "Mitsubishi"},
        "Mfg_Year": {"N" : "2017"},
        "State": {"S" : "MI"}
    }
},
{
    "Key" : {
        "CarID" : {"N" : "1"}
    },
    "Item" : {
        "Color":  {"S": "Red"},
        "Vendor": {"S" : "Ford"},
        "Mfg_Year": {"N" : "2019"},
        "State": {"S" : "NY"}
    }
}]

headers = {
    'Content-Type': "application/json",
    'X-v3io-function': "PutItem",
    'x-v3io-session-key': V3IO_ACCESS_KEY,
    'cache-control': "no-cache",
}
for payload in payloads:
    response = requests.request("POST", url, json=payload, headers=headers)
    print(response)

<Response [200]>
<Response [200]>
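
The PutItem payloads above wrap each value in a type tag (`"S"` for strings, `"N"` for numbers, with the number passed as a string). The sketch below shows one way such payloads might be generated from plain Python values. `to_typed` and `put_item_payload` are hypothetical helpers, and only the two type tags that appear in this notebook are handled.

```python
def to_typed(value):
    """Wrap a Python value in the typed form used by the payloads above."""
    if isinstance(value, bool):
        # bool is a subclass of int; reject it rather than guess an encoding
        raise TypeError("booleans are not covered by this sketch")
    if isinstance(value, (int, float)):
        return {"N": str(value)}
    if isinstance(value, str):
        return {"S": value}
    raise TypeError(f"unsupported attribute type: {type(value).__name__}")


def put_item_payload(key_name, key_value, attributes):
    """Build a PutItem body shaped like the hand-written payloads."""
    return {
        "Key": {key_name: to_typed(key_value)},
        "Item": {name: to_typed(value) for name, value in attributes.items()},
    }


print(put_item_payload("CarID", 0, {"Color": "Gray", "Mfg_Year": 2017}))
```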


### Define a list of environment variables to be set for our Nuclio-function

In [5]:
NUCLIO_ENV = {
   'V3IO_API':V3IO_API,
   'V3IO_USERNAME':V3IO_USERNAME,    
   'CONTAINER_NAME':CONTAINER_NAME,
   'TABLE_NAME':TABLE_NAME,
   'INPUT_STREAM_SEARCH_KEY':INPUT_STREAM_SEARCH_KEY,
   'OUTPUT_STREAM_NAME':OUTPUT_STREAM_NAME,
   'V3IO_ACCESS_KEY':V3IO_ACCESS_KEY
}

### Define trigger configuration for our Nuclio-function

In [6]:
NUCLIO_TRIGGER_CONF = {
    'spec.triggers':{
        INPUT_STREAM_NAME: {
            'kind': 'v3ioStream',
            'url': INPUT_STREAM_URL,
            'username': V3IO_USERNAME,
            'password': V3IO_PASSWORD,
            'attributes': {
                'partitions': INPUT_STREAM_PARTITIONS,
                'seekTo': INPUT_STREAM_SEEK_TO,
                }
            }
    }
}

### Define build commands for our Nuclio-function

In [7]:
NUCLIO_CMD = ['pip install requests']

### Define the Nuclio-function code

In [8]:
NUCLIO_CODE = '''
import requests
import json
import base64
import os


def init_context(context):
    # env -> config
    setattr(context.user_data, 'config', {
        'v3io_api': os.environ['V3IO_API'],
        'v3io_username': os.environ['V3IO_USERNAME'],
        'container_name': os.environ['CONTAINER_NAME'],
        'table_name': os.environ['TABLE_NAME'],
        'input_stream_search_key': os.environ['INPUT_STREAM_SEARCH_KEY'],
        'output_stream_name': os.environ['OUTPUT_STREAM_NAME'],
        'v3io_access_key': os.environ['V3IO_ACCESS_KEY'],
    })


def handler(context, event):
    config = context.user_data.config
    msg = json.loads(event.body)
    context.logger.info(f'Incoming message: {msg}')
    enrichment_data = _search_kv(msg, config)
    context.logger.info(f'Enrichment data: {enrichment_data}')
    msg['enrichment'] = enrichment_data
    _put_records([msg], config)
    context.logger.debug(f'Output message: {msg}')


def _get_url(v3io_api, container_name, collection_path):
    return f'http://{v3io_api}/{container_name}/{collection_path}'


def _get_headers(v3io_function, v3io_access_key):
    return {
        'Content-Type': "application/json",
        'X-v3io-function': v3io_function,
        'cache-control': "no-cache",
        'x-v3io-session-key': v3io_access_key
    }


def _search_kv(msg, config):
    v3io_api = config['v3io_api']
    v3io_username = config['v3io_username']
    container_name = config['container_name']
    search_value = msg[config['input_stream_search_key']]
    table_path_and_key = f"{v3io_username}/examples/stream-enrich/{config['table_name']}/{search_value}"
    v3io_access_key = config['v3io_access_key']

    url = _get_url(v3io_api, container_name, table_path_and_key)
    headers = _get_headers("GetItem", v3io_access_key)
    resp = requests.request("POST", url, json={}, headers=headers)

    json_response = json.loads(resp.text)

    response = {}
    if 'Item' in json_response:
        response = json_response['Item']

    return response


def _put_records(items, config):
    v3io_api = config['v3io_api']
    v3io_username = config['v3io_username']
    container_name = config['container_name']
    output_stream_path = f"{v3io_username}/examples/stream-enrich/{config['output_stream_name']}/"
    v3io_access_key = config['v3io_access_key']

    records = _items_to_records(items)
    url = _get_url(v3io_api, container_name, output_stream_path)
    headers = _get_headers("PutRecords", v3io_access_key)

    return requests.request("PUT", url, json=records, headers=headers)


def _item_to_b64(item):
    item_string = json.dumps(item)
    return base64.b64encode(item_string.encode('utf-8')).decode('utf-8')


def _items_to_records(items):
    return {'Records': [{'Data': _item_to_b64(item)} for item in items]}
'''
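
The function code encodes each outgoing message as Base64-encoded JSON inside a `Records` list. The round trip can be checked locally without the platform. In the sketch below, `items_to_records` restates the encoding used in the function code, and `records_to_items` is a hypothetical inverse added for illustration.

```python
import base64
import json


def items_to_records(items):
    """Encode each item as Base64 JSON, as the function code does for PutRecords."""
    return {
        "Records": [
            {"Data": base64.b64encode(json.dumps(item).encode("utf-8")).decode("utf-8")}
            for item in items
        ]
    }


def records_to_items(payload):
    """Decode a 'Records' payload back into Python objects (illustrative inverse)."""
    return [
        json.loads(base64.b64decode(record["Data"]).decode("utf-8"))
        for record in payload.get("Records", [])
    ]


items = [{"CarID": "0", "ClientName": "John Smith"}]
print(records_to_items(items_to_records(items)) == items)
```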

### Deploy the code

In [None]:
NUCLIO_SPEC = nuclio.ConfigSpec(env=NUCLIO_ENV, config=NUCLIO_TRIGGER_CONF, cmd=NUCLIO_CMD)
addr = nuclio.deploy_code(code=NUCLIO_CODE,name=NUCLIO_FUNC_NAME,project=NUCLIO_PROJ_NAME,verbose=True, spec=NUCLIO_SPEC)
#print(addr)

### Invoke the function by sending an event message to the input stream

In the example below we send an event with the client name, email, and car ID. <br>
The function enriches the event with the matching record from the cars table and writes the enriched message to Stream 2.

In [10]:
import base64

url = f'http://{V3IO_API}/{CONTAINER_NAME}/{V3IO_USERNAME}/examples/stream-enrich/{INPUT_STREAM_NAME}/'

msg = '{"ClientName": "John Smith", "Email": "john.smith@myemailprovider.com", "CarID": "0"}'
msg_b64 = base64.b64encode(msg.encode('utf-8')).decode('utf-8')

payload = f'{{"Records": [{{"Data": "{msg_b64}"}}]}}'

headers = {
    'Content-Type': "application/json",
    'X-v3io-function': "PutRecords",
    'x-v3io-session-key': V3IO_ACCESS_KEY,
    'cache-control': "no-cache",
}

response = requests.request("PUT", url, data=payload, headers=headers)

print(response)

<Response [200]>


### Check the enriched data in the output stream

Read from Stream 2. <br>
The expected result is the original event together with the enrichment data from the cars table.

In [11]:
import json

payload = '{"Location": "AQAAAAAAAAAAAAAAAAAAAA==", "Limit": 10}'
headers = {
    'Content-Type': "application/json",
    'X-v3io-function': "GetRecords",
    'x-v3io-session-key': V3IO_ACCESS_KEY,
    'cache-control': "no-cache",
}

# The output stream was created with the same shard count as the input stream
for shard_id in INPUT_STREAM_PARTITIONS:
    url = f'http://{V3IO_API}/{CONTAINER_NAME}/{V3IO_USERNAME}/examples/stream-enrich/{OUTPUT_STREAM_NAME}/{shard_id}'
    response = requests.request("PUT", url, data=payload, headers=headers)

    if response.status_code == 200:
        # A shard may hold no records, so iterate instead of indexing [0]
        for record in json.loads(response.text).get("Records", []):
            print(base64.b64decode(record["Data"]).decode())

{"ClientName": "John Smith", "Email": "john.smith@myemailprovider.com", "CarID": "0", "enrichment": {"CarID": {"N": "0"}, "Color": {"S": "Gray"}, "Vendor": {"S": "Mitsubishi"}, "Mfg_Year": {"N": "2017"}, "State": {"S": "MI"}}}
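
The `enrichment` field in the output above keeps the table's typed attribute form (`{"S": ...}`, `{"N": ...}`). If plain values are preferred downstream, the attributes could be unwrapped. This is a sketch with hypothetical helpers (`from_typed`, `flatten`) that covers only the two type tags seen in this notebook.

```python
def from_typed(attr):
    """Convert {"S": "Gray"} or {"N": "2017"} to a plain Python value."""
    if "S" in attr:
        return attr["S"]
    if "N" in attr:
        text = attr["N"]
        try:
            return int(text)
        except ValueError:
            return float(text)  # numbers that are not integers
    raise ValueError(f"unsupported attribute: {attr!r}")


def flatten(item):
    """Unwrap every typed attribute in an item."""
    return {name: from_typed(attr) for name, attr in item.items()}


enrichment = {"CarID": {"N": "0"}, "Color": {"S": "Gray"}, "Mfg_Year": {"N": "2017"}}
print(flatten(enrichment))
```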


### Delete the created data

In [12]:
!rm -r /v3io/$V3IO_HOME/examples/stream-enrich