# 3b. Enrich Stream

Receive a stream of events from `incoming-events-stream` and enrich the relevant events (those whose `event_type` is `registration`) with socioeconomic data by looking up the enrichment table. All other events are forwarded unchanged. The resulting data is streamed out to `enriched-events-stream`.

![Model deployment with streaming Real-time operational Pipeline](../../assets/images/model-deployment-with-streaming.png)

You can change the existing enrichments or add new ones by modifying the `enrich_event` function.

## Initialize

Load the project

In [1]:
from mlrun import load_project
from os import path

project_path = path.abspath('conf')
project = load_project(project_path)

Get the incoming events stream as the input

In [2]:
input_stream = project.params.get('STREAM_CONFIGS').get('incoming-events-stream')
input_stream_path =  input_stream.get('path')

Nuclio leverages consumer groups. When one or more Nuclio replicas join a consumer group, each replica receives an equal share of the shards, based on the number of replicas defined for the function.
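
To make the idea of an equal share concrete, here is a minimal sketch that divides shard IDs among replicas round-robin. It is an illustration only and does not reproduce Nuclio's actual assignment logic; `assign_shards` and the shard and replica counts are hypothetical.

```python
def assign_shards(num_shards, num_replicas):
    """Distribute shard IDs across replicas round-robin (illustrative only)."""
    if num_replicas < 1:
        raise ValueError("num_replicas must be at least 1")
    assignment = {replica: [] for replica in range(num_replicas)}
    for shard_id in range(num_shards):
        # Shard i goes to replica i modulo the number of replicas.
        assignment[shard_id % num_replicas].append(shard_id)
    return assignment


# Hypothetical example: 8 shards shared by 3 replicas.
shares = assign_shards(num_shards=8, num_replicas=3)
for replica, shards in shares.items():
    print(f"replica {replica}: shards {shards}")
```

When the shard count is not a multiple of the replica count, the shares in this sketch differ in size by at most one shard.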

The cell below sets up the input stream URL. A consumer-group URL has the form `http://v3io-webapi:8081/<container name>/<stream path>@<consumer group name>`. Here, the `WEB_API_USERS` project parameter supplies the URL prefix `http://v3io-webapi:8081/<container name>`, and the consumer group is named **`enrichstream`**.

For more information, refer to the [Nuclio v3iostream trigger reference documentation](https://nuclio.io/docs/latest/reference/triggers/v3iostream/).
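
The URL format described above can be sketched with plain string handling. The helper names `build_consumer_group_url` and `split_consumer_group_url` are hypothetical, and the example values are chosen to match the URL that this notebook prints.

```python
def build_consumer_group_url(url_prefix, stream_path, consumer_group):
    """Join the parts into <url_prefix>/<stream_path>@<consumer_group>."""
    # Strip slashes at the seam so exactly one '/' separates prefix and path.
    return f"{url_prefix.rstrip('/')}/{stream_path.strip('/')}@{consumer_group}"


def split_consumer_group_url(url):
    """Split a consumer-group URL into (stream URL, consumer group name)."""
    if "@" not in url:
        raise ValueError(f"no consumer group in URL: {url!r}")
    stream_url, _, consumer_group = url.rpartition("@")
    return stream_url, consumer_group


url = build_consumer_group_url(
    "http://v3io-webapi:8081/users",
    "iguazio/examples/model-deployment-with-streaming/data/incoming-events-stream",
    "enrichstream",
)
print(url)
print(split_consumer_group_url(url))
```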

In [3]:
WEB_API_USERS = project.params.get('WEB_API_USERS')
# Build the URL with string formatting; os.path.join is meant for file-system paths.
input_stream_url = f"{WEB_API_USERS.rstrip('/')}/{input_stream_path.lstrip('/')}@enrichstream"
print(f'Input stream URL: {input_stream_url}')

Input stream URL: http://v3io-webapi:8081/users/iguazio/examples/model-deployment-with-streaming/data/incoming-events-stream@enrichstream


Get the enriched stream path. This is where the function writes its output.

In [4]:
output_stream = project.params.get('STREAM_CONFIGS').get('enriched-events-stream')
output_stream_path =  output_stream.get('path')
print(f'Output stream path: {output_stream_path}')

Output stream path: iguazio/examples/model-deployment-with-streaming/data/enriched-events-stream
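
The chained `.get()` calls used for both streams fail with an unhelpful `AttributeError` when a stream name is missing. The following is a minimal sketch of a more explicit lookup. The `get_stream_path` helper is hypothetical, and the shape of `STREAM_CONFIGS` is assumed from how this notebook reads it.

```python
def get_stream_path(params, stream_name):
    """Return the 'path' of a named stream from project-style parameters."""
    stream_configs = params.get("STREAM_CONFIGS") or {}
    stream = stream_configs.get(stream_name)
    if stream is None:
        raise KeyError(f"stream {stream_name!r} is not defined in STREAM_CONFIGS")
    if "path" not in stream:
        raise KeyError(f"stream {stream_name!r} has no 'path' entry")
    return stream["path"]


# Assumed parameter shape, mirroring the path printed in this notebook.
example_params = {
    "STREAM_CONFIGS": {
        "enriched-events-stream": {
            "path": "iguazio/examples/model-deployment-with-streaming/data/enriched-events-stream"
        }
    }
}
print(get_stream_path(example_params, "enriched-events-stream"))
```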


## Create and Test a Local Function

Import the Nuclio SDK and magics:

In [5]:
import nuclio

#### Function imports

In [6]:
# nuclio: start-code

In [7]:
import os
import json
import v3io.dataplane

**Specify function dependencies and configuration**

In [8]:
%nuclio cmd -c pip install v3io

In [9]:
%%nuclio config
spec.build.baseImage = "mlrun/ml-models"

%nuclio: setting spec.build.baseImage to 'mlrun/ml-models'


## Function code

In [10]:
def init_context(context):
    # Read the configuration from environment variables (see `envs` below).
    V3IO_ACCESS_KEY = os.getenv('V3IO_ACCESS_KEY')
    container = os.getenv('CONTAINER')
    output_stream_path = os.getenv('OUTPUT_STREAM_PATH')
    partition_attr = os.getenv('PARTITION_ATTR')
    enrichment_table_path = os.getenv('ENRICHMENT_TABLE_PATH')
    enrichment_key = os.getenv('ENRICHMENT_KEY')
    v3io_client = v3io.dataplane.Client(endpoint='http://v3io-webapi:8081', access_key=V3IO_ACCESS_KEY)

    # Store the settings and the client on the context so that every
    # handler invocation can reuse them.
    setattr(context, 'container', container)
    setattr(context, 'output_stream_path', output_stream_path)
    setattr(context, 'partition_attr', partition_attr)
    setattr(context, 'enrichment_table_path', enrichment_table_path)
    setattr(context, 'enrichment_key', enrichment_key)
    setattr(context, 'v3io_client', v3io_client)


def handler(context, event):
    # The body may arrive already parsed (dict) or as raw JSON (bytes or str).
    if isinstance(event.body, dict):
        event_dict = event.body
    else:
        event_dict = json.loads(event.body)
        
    context.logger.info_with('Got invoked',
                             trigger_kind=event.trigger.kind,
                             event_body=event_dict)
        
    partition_key = event_dict.get(context.partition_attr)
    
    record = {}
    if event_dict['event_type'] == 'registration':
        enriched_event = enrich_event(context, event_dict)
        record = event_to_record(enriched_event, partition_key)
    else:
        record = event_to_record(event_dict, partition_key)
    resp = context.v3io_client.put_records(container=context.container, 
                                   path=context.output_stream_path, 
                                   records=[record], 
                                   raise_for_status=v3io.dataplane.RaiseForStatus.never)
    
    context.logger.info_with('Sent event to stream', 
                             record=record,
                             response_status=resp.status_code, 
                             response_body=resp.body.decode('utf-8'))
    
    return resp.status_code


def enrich_event(context, event_dict):
    if context.enrichment_key in event_dict:
        enrichment_key_value = event_dict[context.enrichment_key]
        resp = context.v3io_client.get_item(container=context.container, 
                                            path=os.path.join(context.enrichment_table_path, str(enrichment_key_value)),
                                           raise_for_status=v3io.dataplane.RaiseForStatus.never)
        if 200 <= resp.status_code <= 299:
            enriched_event = dict(event_dict, **resp.output.item)
            context.logger.info_with('Event was enriched', enriched_event=enriched_event)
            return enriched_event
        else:
            context.logger.debug_with("Couldn't enrich event", 
                                      enrichment_key_value=enrichment_key_value,
                                      response_status=resp.status_code, 
                                      response_body=resp.body.decode('utf-8'))
            return event_dict
    else:
        return event_dict

    
def event_to_record(event_dict, partition_key):
    event_str = json.dumps(event_dict)
    return {'data': event_str, 'partition_key': str(partition_key)}

The following end-code annotation tells `nuclio` to stop parsing the notebook from this cell. _**Please do not remove this cell**_:

In [11]:
# nuclio: end-code
# marks the end of a code section
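
The merge and record-building steps of the function code can be tried without a cluster. The sketch below uses plain dictionaries in place of the v3io lookup. `merge_enrichment` is a hypothetical name for the `dict(event_dict, **item)` expression in `enrich_event`, and the partition attribute is assumed to be `user_id`, as the logged output of this notebook suggests.

```python
import json


def merge_enrichment(event_dict, table_item):
    """Merge a looked-up table item into the event, as `enrich_event` does."""
    # dict(a, **b) copies `a` and then applies `b`, so table attributes win
    # when both sides define the same key. The original event is not mutated.
    # The ** unpacking requires the table item's keys to be strings.
    return dict(event_dict, **table_item)


def event_to_record(event_dict, partition_key):
    """Build a stream record with the same shape as the notebook's helper."""
    return {"data": json.dumps(event_dict), "partition_key": str(partition_key)}


event = {"user_id": 111111, "event_type": "registration", "postcode": 11014}
item = {"socioeconomic_idx": 1}  # attributes of the matching enrichment-table row

enriched = merge_enrichment(event, item)
record = event_to_record(enriched, enriched.get("user_id"))
print(record)
```

Note that `str(partition_key)` turns a missing partition attribute (`None`) into the string `'None'`, so events that lack the attribute all share that one partition key.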

In [12]:
envs = {'V3IO_ACCESS_KEY': os.getenv('V3IO_ACCESS_KEY'),
        'CONTAINER': project.params.get('CONTAINER'),
        'OUTPUT_STREAM_PATH': output_stream_path,
        'PARTITION_ATTR': project.params.get('PARTITION_ATTR'),
        'ENRICHMENT_TABLE_PATH': project.params.get('ENRICHMENT_TABLE_PATH'),
        'ENRICHMENT_KEY':"postcode"}
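
A project parameter that is not defined comes back from `project.params.get` as `None`, and the local test converts every value with `str(value)`, which would silently store the string `'None'`. A minimal sketch of a check that could run before the test follows. `missing_env_values` and the example values are hypothetical.

```python
REQUIRED_ENV_VARS = (
    "V3IO_ACCESS_KEY", "CONTAINER", "OUTPUT_STREAM_PATH",
    "PARTITION_ATTR", "ENRICHMENT_TABLE_PATH", "ENRICHMENT_KEY",
)


def missing_env_values(envs, required=REQUIRED_ENV_VARS):
    """Return the required names whose value is absent, None, or empty."""
    return [name for name in required if envs.get(name) in (None, "")]


# Placeholder values for illustration; CONTAINER is deliberately left unset.
example_envs = {
    "V3IO_ACCESS_KEY": "<access key>",
    "CONTAINER": None,
    "OUTPUT_STREAM_PATH": "<output stream path>",
    "PARTITION_ATTR": "user_id",
    "ENRICHMENT_TABLE_PATH": "<enrichment table path>",
    "ENRICHMENT_KEY": "postcode",
}
missing = missing_env_values(example_envs)
print(missing)
```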

## Test locally

In [13]:
for key, value in envs.items():
    os.environ[key] = str(value)
event = nuclio.Event(body=b'{"user_id" : 111111 , "event_type": "registration", "postcode": 11014}')
init_context(context)
handler(context, event)

Python> 2020-08-19 18:50:16,907 [info] Got invoked: {'trigger_kind': '', 'event_body': {'user_id': 111111, 'event_type': 'registration', 'postcode': 11014}}
Python> 2020-08-19 18:50:16,909 [info] Event was enriched: {'enriched_event': {'user_id': 111111, 'event_type': 'registration', 'postcode': 11014, 'socioeconomic_idx': 1}}
Python> 2020-08-19 18:50:16,910 [info] Sent event to stream: {'record': {'data': '{"user_id": 111111, "event_type": "registration", "postcode": 11014, "socioeconomic_idx": 1}', 'partition_key': '111111'}, 'response_status': 200, 'response_body': '{ "FailedRecordCount":0,"Records": [{ "SequenceNumber":1,"ShardId":5 } ] }'}


200
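
The handler returns only the HTTP status code, but the response body logged above also carries a `FailedRecordCount` field. Assuming this count can be nonzero even when the status is 200 (as in similar stream APIs), a test could inspect it too. The `failed_record_count` helper below is a hypothetical sketch.

```python
import json


def failed_record_count(response_body):
    """Extract FailedRecordCount from a put-records response body."""
    if isinstance(response_body, bytes):
        response_body = response_body.decode("utf-8")
    # Treat a body without the field as reporting no failures.
    return json.loads(response_body).get("FailedRecordCount", 0)


# Response body copied from the log output of the local test.
body = b'{ "FailedRecordCount":0,"Records": [{ "SequenceNumber":1,"ShardId":5 } ] }'
print(failed_record_count(body))
```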

## Deploy with MLRun

In [14]:
from mlrun import code_to_function

gen_func = code_to_function(name='enrich', kind='nuclio')
project.set_function(gen_func)
enrich = project.func('enrich')
enrich.set_envs(envs)
# Consume the input stream as a member of the `enrichstream` consumer group.
enrich.add_trigger('incoming',
                   nuclio.triggers.V3IOStreamTrigger(url=input_stream_url,
                                                     access_key=os.getenv('V3IO_ACCESS_KEY'),
                                                     maxWorkers=10))

<mlrun.runtimes.function.RemoteRuntime at 0x7f64e4790790>

In [15]:
project.save()

In [16]:
enrich.deploy()

> 2020-08-19 18:50:18,191 [info] deploy started
[nuclio] 2020-08-19 18:50:19,264 (info) Build complete
[nuclio] 2020-08-19 18:50:23,302 (info) Function deploy complete
[nuclio] 2020-08-19 18:50:23,308 done creating model-deployment-with-streaming-iguazio-enrich, function address: 3.131.87.251:31896


'http://3.131.87.251:31896'
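
The deploy step returns an HTTP address, so the deployed function can presumably be exercised over HTTP as well as through the stream trigger. The sketch below only builds such a request with the standard library. `build_test_request` is a hypothetical helper, the address is the one printed by this notebook and will differ in other environments, and sending the request would write a record to the output stream.

```python
import json
import urllib.request


def build_test_request(function_url, event_dict):
    """Build (but do not send) a JSON POST request for the function."""
    return urllib.request.Request(
        function_url,
        data=json.dumps(event_dict).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_test_request(
    "http://3.131.87.251:31896",  # address printed by the deploy cell
    {"user_id": 111111, "event_type": "registration", "postcode": 11014},
)
print(request.get_method(), request.full_url)

# To actually send it (requires network access to the cluster):
# with urllib.request.urlopen(request, timeout=10) as response:
#     print(response.status, response.read())
```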