# 4. Stream to Features

Receive a stream of events from `enriched-events-stream` and update a set of aggregations on the data. The aggregated data is stored in an aggregation table called `feature-table`, and a new event that includes the calculated features is written to `serving-stream`.

![Model deployment with streaming Real-time operational Pipeline](../../assets/images/model-deployment-with-streaming.png)

You can change the incoming events and the generated features by customizing the methods below.

## Initialize

Load the project

In [1]:
from mlrun import load_project
from os import path

project_path = path.abspath('conf')
project = load_project(project_path)

Get the enriched events stream as the input

In [2]:
input_stream = project.params.get('STREAM_CONFIGS').get('enriched-events-stream')
input_stream_path =  input_stream.get('path')

Nuclio leverages consumer groups. When one or more Nuclio replicas join a consumer group, each replica receives an equal share of the shards, based on the number of replicas defined for the function.
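
To make "an equal share of the shards" concrete, here is a minimal sketch of an even split. It is an illustration only: the `assign_shards` helper and the round-robin rule are assumptions made for this example, and the actual assignment is decided by Nuclio and may differ.

```python
def assign_shards(num_shards: int, num_replicas: int) -> dict:
    """Spread shard IDs across replicas as evenly as possible (round-robin).

    Hypothetical helper for illustration; not part of Nuclio.
    """
    assignment = {replica: [] for replica in range(num_replicas)}
    for shard_id in range(num_shards):
        assignment[shard_id % num_replicas].append(shard_id)
    return assignment


# With 8 shards and 4 replicas, every replica handles 2 shards.
shards_by_replica = assign_shards(num_shards=8, num_replicas=4)
for replica, shards in shards_by_replica.items():
    print(f'replica {replica}: shards {shards}')
```

When the shard count is not a multiple of the replica count, some replicas necessarily handle one more shard than others.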

The input stream URL is set up below. A consumer-group URL has the form `http://v3io-webapi:8081/<container name>/<stream path>@<consumer group name>`. In this case, `WEB_API_USERS` provides the URL prefix `http://v3io-webapi:8081/<container name>`, and the consumer group is named **`stream2features`**.

For more information, refer to the [Nuclio v3iostream trigger reference documentation](https://nuclio.io/docs/latest/reference/triggers/v3iostream/).

In [3]:
WEB_API_USERS = project.params.get('WEB_API_USERS')
input_stream_url = path.join(WEB_API_USERS, input_stream_path) + "@stream2features"
print(f'Input stream URL: {input_stream_url}')

Input stream URL: http://v3io-webapi:8081/users/iguazio/examples/model-deployment-with-streaming/data/enriched-events-stream@stream2features
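
The cell above joins the URL parts with `os.path.join`, which works on POSIX systems but would insert backslashes on Windows. As a sketch of a platform-independent alternative, the hypothetical helper below builds the same consumer-group URL with plain string operations. The prefix and stream path values are assumptions inferred from the printed output above, not values read from the project.

```python
def build_consumer_group_url(url_prefix: str, stream_path: str, consumer_group: str) -> str:
    """Build '<prefix>/<stream path>@<consumer group>' without os.path.

    Hypothetical helper for illustration; not part of MLRun or Nuclio.
    """
    base = url_prefix.rstrip('/')      # avoid a doubled slash at the join point
    stream = stream_path.strip('/')
    return f'{base}/{stream}@{consumer_group}'


# Assumed values, inferred from the printed URL above.
example_url = build_consumer_group_url(
    'http://v3io-webapi:8081/users',
    'iguazio/examples/model-deployment-with-streaming/data/enriched-events-stream',
    'stream2features',
)
print(f'Input stream URL: {example_url}')
```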


Get the serving stream path. This is where the function writes its output data.

In [4]:
output_stream = project.params.get('STREAM_CONFIGS').get('serving-stream')
output_stream_path =  output_stream.get('path')
print(f'Output stream path: {output_stream_path}')

Output stream path: iguazio/examples/model-deployment-with-streaming/data/serving-stream


## Create and Test a Local Function

Import the Nuclio SDK and magics

In [5]:
import nuclio

#### Function imports

In [6]:
# nuclio: start-code

In [7]:
import os
import json
import numpy as np
from v3io import dataplane, common
from datetime import datetime

<b>Specify function dependencies and configuration</b>

In [8]:
%nuclio cmd -c pip install v3io numpy

In [9]:
%%nuclio config
spec.build.baseImage = "mlrun/ml-models"

%nuclio: setting spec.build.baseImage to 'mlrun/ml-models'


## Function code

In [10]:
def init_context(context):
    V3IO_ACCESS_KEY = os.getenv('V3IO_ACCESS_KEY')
    container = os.getenv('CONTAINER')
    feature_table_path = os.getenv('FEATURE_TABLE_PATH')
    # Remove spaces before splitting: str(list) separates items with "', '"
    feature_list = os.getenv('FEATURE_LIST').replace(' ','').strip("'][").split("','")
    serving_events = os.getenv('SERVING_EVENTS').replace(' ','').strip("'][").split("','")
    output_stream_path = os.getenv('OUTPUT_STREAM_PATH')    
    partition_attr = os.getenv('PARTITION_ATTR')
    
    v3io_client = dataplane.Client(endpoint='http://v3io-webapi:8081', access_key=V3IO_ACCESS_KEY)
    
    event_handlers = {'registration': process_registration,
                      'purchase': process_purchase,
                      'bet': process_bet,
                      'win': process_win}
    
    setattr(context, 'v3io_client', v3io_client)
    setattr(context, 'container', container)
    setattr(context, 'feature_table_path', feature_table_path)
    setattr(context, 'feature_list', feature_list)
    setattr(context, 'serving_events', serving_events)
    setattr(context, 'output_stream_path', output_stream_path)
    setattr(context, 'partition_attr', partition_attr)
    setattr(context, 'event_handlers', event_handlers)

def handler(context, event):
    if isinstance(event.body, dict):
        event_dict = event.body
    else:
        event_dict = json.loads(event.body)
        
    if is_relevant_event(context, event_dict):
        event_type = get_event_type(event_dict)
        context.logger.info(f'Incoming event type: {event_type}')
        
        # python switch-case
        process_func = context.event_handlers.get(event_type)
        context.logger.info(f'Processing event {event_dict}')
        response = process_func(context, event_dict)
        context.logger.info(f'Finished processing with status: {response.status_code} - and response body: {response.body} , event: {event_dict}')
        if event_type in context.serving_events and (200 <= response.status_code < 300):
            context.logger.info('sending event for serving')
            write_to_output_stream(context, event_dict)
    else:
        context.logger.info(f'Not relevant event')    

        
def get_event_type(event):
    return event['event_type']


def is_relevant_event(context, event):
    return get_event_type(event) in context.event_handlers
        
def get_features(context, event):
    user_id = event['user_id']
    features_list = context.feature_list
    resp = context.v3io_client.get_item(container=context.container, 
                                        path=common.helpers.url_join(context.feature_table_path, str(user_id)),
                                        raise_for_status=dataplane.RaiseForStatus.never)
    
    feat_list = [resp.output.item.get(feat) for feat in features_list]
    return json.dumps({'instances': np.array(feat_list).reshape(1,-1).tolist()})
    
def write_to_output_stream(context, event):
    partition_key = event.get(context.partition_attr)    
    data = get_features(context, event)
    
    record = {'partition_key': str(partition_key), 'data': data }
    resp = context.v3io_client.put_records(container=context.container, 
                                           path=context.output_stream_path, 
                                           records=[record], 
                                           raise_for_status=dataplane.RaiseForStatus.never)
    context.logger.info(f'Sent features for user: {event["user_id"]} to serving stream')


def event_time_to_ts(event_time):
    dt = datetime.strptime(event_time,'%Y-%m-%d %H:%M:%S.%f')
    return datetime.timestamp(dt)


def get_sum_count_mean_var_expr(feature: str, current_value):
    sum_str = f"SET {feature}_sum= if_not_exists({feature}_sum, 0) + {current_value};"
    count_str = f"SET {feature}_count= if_not_exists({feature}_count, 0) + 1;"
    delta_str = f"SET {feature}_delta= {current_value} - if_not_exists({feature}_mean, 0);"
    mean_str = f"SET {feature}_mean= if_not_exists({feature}_mean, 0) + ({feature}_delta / {feature}_count);"
    m2_str = f"SET {feature}_m2= if_not_exists({feature}_m2, 0) + ({feature}_delta * ({current_value} - {feature}_mean));"
    var_str = f"SET {feature}_var= {feature}_m2 / (max(2, {feature}_count)-1) ;"
    expression = sum_str + count_str + delta_str + mean_str + m2_str + var_str
    return expression


def update_features(context, user_id, expression, condition):
    return context.v3io_client.update_item(container=context.container,
                                          path=common.helpers.url_join(context.feature_table_path, str(user_id)),
                                          condition=condition,
                                          expression=expression,
                                          raise_for_status=dataplane.RaiseForStatus.never)


def process_registration(context, event):
    user_id = event['user_id']
    
    features = {'user_id': event['user_id'],
               'registration_date': event['event_time'],
               'date_of_birth': event['date_of_birth'],
               'socioeconomic_idx':  event['socioeconomic_idx'],
               'affiliate_url': event['affiliate_url'],
               'label': event['label']}
    
    response = context.v3io_client.put_item(container=context.container,
                                       path=common.helpers.url_join(context.feature_table_path, str(user_id)),
                                       attributes=features,
                                       raise_for_status=dataplane.RaiseForStatus.never)
    return response


def process_purchase(context, event):
    user_id = event['user_id']
    event_time = event['event_time']
    event_ts = event_time_to_ts(event_time)
    
    purchase_amount = event['amount']

    first_purchase_ts_str = f"SET first_purchase_ts=if_not_exists(first_purchase_ts, {event_ts});"
    sum_count_mean_var_expr = get_sum_count_mean_var_expr('purchase', purchase_amount)
    
    expression = first_purchase_ts_str + sum_count_mean_var_expr
    condition = f"exists(registration_date) AND (NOT exists(first_purchase_ts) OR first_purchase_ts >= ({event_ts} - 86400 ))"
    
    return update_features(context, user_id, expression, condition)


def process_bet(context, event):
    user_id = event['user_id']
    event_time = event['event_time']
    event_ts = event_time_to_ts(event_time)
    
    bet_amount = event['bet_amount']

    sum_count_mean_var_expr = get_sum_count_mean_var_expr('bet', bet_amount)
    
    expression = sum_count_mean_var_expr
    condition = f"first_purchase_ts >= ({event_ts} - 86400 )"
    
    return update_features(context, user_id, expression, condition)


def process_win(context, event):
    user_id = event['user_id']
    event_time = event['event_time']
    event_ts = event_time_to_ts(event_time)
    
    win_amount = event['win_amount']

    sum_count_mean_var_expr = get_sum_count_mean_var_expr('win', win_amount)
    
    expression = sum_count_mean_var_expr
    condition = f"first_purchase_ts >= ({event_ts} - 86400 )"
    
    return update_features(context, user_id, expression, condition)
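
The update expression built by `get_sum_count_mean_var_expr` appears to follow Welford's online algorithm for a running mean and sample variance. The sketch below mirrors the same steps in plain Python on a dictionary that stands in for one row of the feature table. It assumes the `SET` statements are evaluated in the order they are written; the `update_running_stats` helper and the sample amounts are hypothetical.

```python
import statistics


def update_running_stats(item: dict, feature: str, value: float) -> dict:
    """Apply one Welford-style update, step by step like the SET expression."""
    total = item.get(f'{feature}_sum', 0) + value
    count = item.get(f'{feature}_count', 0) + 1
    delta = value - item.get(f'{feature}_mean', 0)           # distance from the old mean
    mean = item.get(f'{feature}_mean', 0) + delta / count    # new running mean
    m2 = item.get(f'{feature}_m2', 0) + delta * (value - mean)
    var = m2 / (max(2, count) - 1)   # max(2, count) avoids dividing by zero on the first event
    item.update({f'{feature}_sum': total, f'{feature}_count': count,
                 f'{feature}_delta': delta, f'{feature}_mean': mean,
                 f'{feature}_m2': m2, f'{feature}_var': var})
    return item


amounts = [300, 500, 100, 700]   # hypothetical bet amounts
item = {}
for amount in amounts:
    update_running_stats(item, 'bet', amount)

# The running values should agree with a batch computation over the same data.
print(item['bet_mean'], statistics.mean(amounts))
print(item['bet_var'], statistics.variance(amounts))
```

Because each update needs only the stored `_count`, `_mean`, and `_m2` attributes, the function never has to re-read a user's earlier events.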


The following end-code annotation tells `nuclio` to stop parsing the notebook from this cell. _**Please do not remove this cell**_:

In [11]:
# nuclio: end-code
# marks the end of a code section

In [12]:
envs = {'V3IO_ACCESS_KEY': os.getenv('V3IO_ACCESS_KEY'),
        'FEATURE_TABLE_PATH': project.params.get('FEATURE_TABLE_PATH'),
        'SERVING_EVENTS':['bet','win'],
        'FEATURE_LIST':['socioeconomic_idx','purchase_sum','purchase_mean','purchase_count','purchase_var','bet_sum','bet_mean','bet_count','bet_var','win_sum','win_mean','win_count','win_var'],
        'CONTAINER': project.params.get('CONTAINER'),
        'OUTPUT_STREAM_PATH': output_stream_path,
        'PARTITION_ATTR': project.params.get('PARTITION_ATTR')}
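
Environment variables hold only strings, so the list values in `envs` reach the function as text (the local test below applies `str(value)`), and `init_context` recovers them by stripping brackets and splitting on `','`. Note that `str()` on a Python list puts a space after each comma, so that parsing relies on spaces being removed first. One possible alternative, shown here as a sketch rather than as the notebook's method, is to encode lists as JSON. The variable name `FEATURE_LIST_JSON` is hypothetical.

```python
import json
import os

feature_list = ['socioeconomic_idx', 'purchase_sum', 'purchase_mean']

# str() inserts a space after each comma, which a plain split("','") would miss.
print(str(feature_list))

# Hypothetical variable name: encode the list as JSON on the way in ...
os.environ['FEATURE_LIST_JSON'] = json.dumps(feature_list)

# ... and decode it inside the function, with no manual stripping or splitting.
decoded = json.loads(os.environ['FEATURE_LIST_JSON'])
print(decoded)
```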

## Test locally

In [13]:
for key, value in envs.items():
    os.environ[key] = str(value)
reg_event = nuclio.Event(body=b'{"user_id" : 111111 ,"affiliate_url":"aa.biz", "event_type": "registration", "postcode": 11014, "event_time": "2020-07-20 11:00:00","date_of_birth": "1970-03-03", "socioeconomic_idx": 3, "label":0}')
pur_event = nuclio.Event(body=b'{"user_id" : 111111 ,"amount": 3000, "event_type": "purchase", "event_time": "2020-07-20 11:00:00.009"}') 
bet_event = nuclio.Event(body=b'{"user_id" : 111111 ,"bet_amount": 300, "event_type": "bet", "event_time": "2020-07-20 11:00:00.889"}') 
init_context(context)
handler(context, reg_event)
handler(context, pur_event)
handler(context, bet_event)

Python> 2020-08-19 18:50:29,541 [info] Incoming event type: registration
Python> 2020-08-19 18:50:29,542 [info] Processing event {'user_id': 111111, 'affiliate_url': 'aa.biz', 'event_type': 'registration', 'postcode': 11014, 'event_time': '2020-07-20 11:00:00', 'date_of_birth': '1970-03-03', 'socioeconomic_idx': 3, 'label': 0}
Python> 2020-08-19 18:50:29,543 [info] Finished processing with status: 200 - and response body: b'' , event: {'user_id': 111111, 'affiliate_url': 'aa.biz', 'event_type': 'registration', 'postcode': 11014, 'event_time': '2020-07-20 11:00:00', 'date_of_birth': '1970-03-03', 'socioeconomic_idx': 3, 'label': 0}
Python> 2020-08-19 18:50:29,544 [info] Incoming event type: purchase
Python> 2020-08-19 18:50:29,544 [info] Processing event {'user_id': 111111, 'amount': 3000, 'event_type': 'purchase', 'event_time': '2020-07-20 11:00:00.009'}
Python> 2020-08-19 18:50:29,546 [info] Finished processing with status: 200 - and response body: b'' , event: {'user_id': 111111, 'am
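
The `condition` strings in `process_purchase`, `process_bet`, and `process_win` restrict aggregation to events that occur no later than 24 hours (86400 seconds) after the user's first purchase. The sketch below reproduces that comparison in plain Python with the timestamps of the test events above, so the bet event would be expected to satisfy the condition. The helper names and the later bet timestamp are hypothetical.

```python
from datetime import datetime

WINDOW_SECONDS = 86400  # 24 hours, the constant used in the condition strings


def to_ts(event_time: str) -> float:
    """Convert an event time string to a POSIX timestamp, as event_time_to_ts does."""
    return datetime.strptime(event_time, '%Y-%m-%d %H:%M:%S.%f').timestamp()


def within_first_purchase_window(first_purchase_ts: float, event_ts: float) -> bool:
    """Mirror of the condition: first_purchase_ts >= (event_ts - 86400)."""
    return first_purchase_ts >= event_ts - WINDOW_SECONDS


first_purchase_ts = to_ts('2020-07-20 11:00:00.009')   # purchase event from the test
bet_ts = to_ts('2020-07-20 11:00:00.889')              # bet event from the test
late_bet_ts = to_ts('2020-07-22 11:00:00.889')         # hypothetical bet two days later

print(within_first_purchase_window(first_purchase_ts, bet_ts))
print(within_first_purchase_window(first_purchase_ts, late_bet_ts))
```

An event that fails the condition leaves the feature table unchanged, so the aggregates describe only the first 24 hours after the first purchase.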

## Deploy with MLRun

In [14]:
from mlrun import code_to_function

gen_func = code_to_function(name='features', kind='nuclio')
project.set_function(gen_func)
features = project.func('features')
features.set_envs(envs)
features.add_trigger('incoming', nuclio.triggers.V3IOStreamTrigger(url=input_stream_url, access_key=os.getenv('V3IO_ACCESS_KEY'), maxWorkers=10))

<mlrun.runtimes.function.RemoteRuntime at 0x7f4f7644cd90>

In [15]:
project.save()

In [16]:
features.deploy()

> 2020-08-19 18:50:31,481 [info] deploy started
[nuclio] 2020-08-19 18:50:32,554 (info) Build complete
[nuclio] 2020-08-19 18:50:36,611 (info) Function deploy complete
[nuclio] 2020-08-19 18:50:36,616 done creating model-deployment-with-streaming-iguazio-features, function address: 3.131.87.251:30387


'http://3.131.87.251:30387'