# OpenSearch User Behavior Insights (UBI)

### This notebook covers the basics of setting up UBI, ingesting data using the UBI plugin, and setting up a basic UBI dashboard in OpenSearch Dashboards.
### Note: Since OpenSearch 3.0.0, the UBI plugin no longer needs to be installed separately, but it must be installed for any OpenSearch 2.x version.

**Information regarding UBI:**

https://opensearch.org/docs/latest/search-plugins/ubi/index

https://github.com/opensearch-project/user-behavior-insights

In [1]:
from aips import get_engine, set_engine
from engines.opensearch.config import OPENSEARCH_URL
from aips.spark.dataframe import from_sql
from aips.spark import create_view_from_collection
import tqdm
import aips.indexer
import requests, json
engine = get_engine("opensearch")


In [2]:
aips.indexer.build_collection(engine, "products")
aips.indexer.build_collection(engine, "signals")

Wiping "products" collection
Creating "products" collection
Loading Products
Schema: 
root
 |-- upc: string (nullable = true)
 |-- name: string (nullable = true)
 |-- manufacturer: string (nullable = true)
 |-- short_description: string (nullable = true)
 |-- long_description: string (nullable = true)

<Response [200]>
{'_shards': {'total': 2, 'successful': 1, 'failed': 0}}
Successfully written 48194 documents


<engines.opensearch.OpenSearchCollection.OpenSearchCollection at 0x7f5b6b7ca170>

### **Step 1**: Install the UBI plugin (not required for OpenSearch 3.0.0 and later)

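To install UBI on an OpenSearch 2.x cluster, run the following command on every node (or while building the node image), then restart the node. The plugin version must match the cluster's OpenSearch version; the command below is for OpenSearch 2.18.0. This command has already been run on the AIPS OpenSearch node.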

**bin/opensearch-plugin install https://github.com/opensearch-project/user-behavior-insights/releases/download/2.18.0.0/opensearch-ubi-2.18.0.0.zip --batch**
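Because the plugin version has to match the running OpenSearch version, it can help to derive the download URL from the cluster's version rather than hardcoding it. The sketch below assumes that UBI release assets follow the same naming scheme as the 2.18.0.0 URL above (the OpenSearch version with an extra `.0` appended); `ubi_plugin_url` is a hypothetical helper, and you should confirm that a release actually exists for your version.

```python
# Sketch: derive the UBI plugin download URL for a given OpenSearch 2.x version.
# ASSUMPTION: release assets follow the same naming scheme as the 2.18.0.0 URL,
# i.e. the plugin version is the OpenSearch version with ".0" appended.
def ubi_plugin_url(opensearch_version):
    plugin_version = f"{opensearch_version}.0"
    return ("https://github.com/opensearch-project/user-behavior-insights/releases/"
            f"download/{plugin_version}/opensearch-ubi-{plugin_version}.zip")

# The running version can be read from the cluster root endpoint (not run here):
# import requests
# version = requests.get(OPENSEARCH_URL).json()["version"]["number"]
# print(f"bin/opensearch-plugin install {ubi_plugin_url(version)} --batch")
```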

### **Step 2**: Initialize the UBI collections

In [3]:
requests.post(f"{OPENSEARCH_URL}/_plugins/ubi/initialize").json()

{'message': 'UBI indexes created.'}
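To confirm that initialization worked, you can list the UBI indexes with the `_cat/indices` API. The sketch below assumes the JSON form of that API (`?format=json`), which returns one object per index with an `index` key; `missing_ubi_indexes` is a hypothetical helper, and the live call is left commented out.

```python
# Sketch: check that both UBI indexes exist after initialization.
# Assumes the list returned by GET /_cat/indices/ubi_*?format=json,
# where each entry is a dict with an "index" key.
EXPECTED_UBI_INDEXES = {"ubi_queries", "ubi_events"}

def missing_ubi_indexes(cat_indices):
    """Return the sorted names of expected UBI indexes absent from the listing."""
    present = {entry["index"] for entry in cat_indices}
    return sorted(EXPECTED_UBI_INDEXES - present)

# Usage against a live cluster (not run here):
# import requests
# listing = requests.get(f"{OPENSEARCH_URL}/_cat/indices/ubi_*?format=json").json()
# print(missing_ubi_indexes(listing))  # an empty list means both indexes exist
```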

### **Step 3**: Bulk ingest historic signals

Historic user events and queries should be bulk ingested into the appropriate UBI collections.

Here we bulk write all AIPS queries into the `ubi_queries` collection.

In [None]:
def get_queries_dataframe():
    signals_collection = engine.get_collection("signals")
    create_view_from_collection(signals_collection, "signals")
    queries = from_sql("SELECT * FROM signals WHERE type = 'query'")
    queries_transformed = queries.rdd.map(lambda r: 
        (r["signal_time"], r["query_id"], r["user"], r["target"]))
    ubi_queries_dataframe = queries_transformed.toDF(
        ["timestamp", "query_id", "client_id", "user_query"])
    return ubi_queries_dataframe

In [None]:
def batch_ingest_queries():
    # This function is not called; it is an example of batch loading.
    # The queries are ingested later by replaying them through the UBI search extension.
    queries_collection = engine.get_collection("ubi_queries")
    ubi_queries_dataframe = get_queries_dataframe()
    queries_collection.write(ubi_queries_dataframe)
    return queries_collection

#queries_collection = batch_ingest_queries()

<Response [200]>
{'_shards': {'total': 2, 'successful': 1, 'failed': 0}}
Successfully written 725459 documents
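If you are loading signals without Spark, the same batch load can be sketched directly against the OpenSearch `_bulk` API. This is an alternative to the AIPS `collection.write` helper used above, not how that helper works internally: `build_bulk_body` is a hypothetical helper, and the example document simply reuses the `ubi_queries` field names produced by `get_queries_dataframe`. For large signal sets you would send the documents in chunks rather than in a single request.

```python
import json

# Sketch: build an NDJSON request body for the OpenSearch _bulk API.
# Each document is preceded by an action line naming the target index,
# and the whole body must end with a newline.
def build_bulk_body(index_name, docs):
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"

example_docs = [{"query_id": "qid_000001", "client_id": "cid_000001",
                 "user_query": "cable"}]
body = build_bulk_body("ubi_queries", example_docs)

# Sending it to a live cluster (not run here):
# import requests
# requests.post(f"{OPENSEARCH_URL}/_bulk", data=body,
#               headers={"Content-Type": "application/x-ndjson"})
```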


Next, we index events into the `ubi_events` collection, which is intended to hold all non-query signals.

In [9]:
def get_events_dataframe():
    signals_collection = engine.get_collection("signals")
    products_collection = engine.get_collection("products")
    create_view_from_collection(signals_collection, "signals")
    create_view_from_collection(products_collection, "products")
    query = """SELECT REPLACE(type, '-', '_') AS action_name, query_id, user AS client_id,
                      signal_time AS timestamp, type AS message_type,
                      target AS target, p.name AS message
               FROM signals s
               LEFT JOIN products p ON s.target == p.upc
               WHERE type != 'query'"""
    events = from_sql(query)
    return events
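The same row-level mapping that the SQL above performs can be sketched in plain Python for a single signal, which may make the field renaming easier to follow. `signal_to_ubi_event` is a hypothetical helper, and the `product_names` dictionary stands in for the `LEFT JOIN` against the products collection.

```python
# Sketch of the row-level mapping performed by the SQL in get_events_dataframe.
# signal: dict with the AIPS signal fields (type, query_id, user, signal_time, target)
# product_names: dict mapping UPC -> product name (stand-in for the LEFT JOIN)
def signal_to_ubi_event(signal, product_names):
    if signal["type"] == "query":
        return None  # query signals belong in ubi_queries, not ubi_events
    return {
        "action_name": signal["type"].replace("-", "_"),  # e.g. add-to-cart -> add_to_cart
        "query_id": signal["query_id"],
        "client_id": signal["user"],
        "timestamp": signal["signal_time"],
        "message_type": signal["type"],  # original, unmodified signal type
        "target": signal["target"],
        # LEFT JOIN semantics: no matching product yields None (SQL NULL)
        "message": product_names.get(signal["target"]),
    }
```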

In [10]:
def batch_ingest_signals():
    events_collection = engine.get_collection("ubi_events")
    ubi_events_dataframe = get_events_dataframe()
    events_collection.write(ubi_events_dataframe)
    return events_collection

events_collection = batch_ingest_signals()

<Response [200]>
{'_shards': {'total': 2, 'successful': 1, 'failed': 0}}
Successfully written 1447146 documents


### **Step 4**: Live logging of queries and events

Queries and events must be ingested into UBI correctly and with complete data for best results. UBI stores queries separately from other events, each in its own collection: `ubi_queries` and `ubi_events`. Live signal collection should be hooked into the appropriate places in your stack: queries where search requests are issued, and events such as clicks, add-to-cart actions, and purchases wherever your application handles them.

Logging event data is as simple as writing an event document directly to the `ubi_events` collection.

In [11]:
from datetime import datetime

def add_example_event_to_ubi():
    event_doc = {"action_name": "purchase", # The name of the type of event/action that occurred
                 "client_id": "uid_000001", # The id of the user/session taking the action
                 "message_type": "one_click_buy", # An additional action type, used for further grouping of actions
                 "message": "Succeeded", # An optional message string for the event
                 "query_id": "qid_000001", # The id of the query that led to this action
                 "target": "pid_000001", # Any string representing the target of the action, normally a document/item id
                 "timestamp": int(datetime.now().timestamp() * 1000)} # The time of the event in epoch_millis; if not passed, becomes the current time

    response = requests.post(f"{OPENSEARCH_URL}/ubi_events/_doc",
                             json=event_doc)
    display(response.json())

add_example_event_to_ubi()

{'_index': 'ubi_events',
 '_id': 'SmJwPpkBhwftUxCERzq-',
 '_version': 1,
 'result': 'created',
 '_shards': {'total': 2, 'successful': 1, 'failed': 0},
 '_seq_no': 4375593,
 '_primary_term': 5}
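Since the `timestamp` field is expected in epoch milliseconds, it can be convenient to build event documents through a small helper that performs that conversion. The sketch below is a hypothetical helper (`build_ubi_event`), not part of the UBI plugin; its field names mirror the example event above, and optional fields are omitted when not supplied.

```python
from datetime import datetime, timezone

# Sketch: build a ubi_events document with an epoch-milliseconds timestamp.
# build_ubi_event is a hypothetical helper; field names mirror the example above.
def build_ubi_event(action_name, client_id, query_id, target,
                    message_type=None, message=None, when=None):
    when = when or datetime.now(timezone.utc)
    event = {
        "action_name": action_name,
        "client_id": client_id,
        "query_id": query_id,
        "target": target,
        "timestamp": int(when.timestamp() * 1000),  # epoch_millis
    }
    # Only include the optional fields when they were provided
    if message_type is not None:
        event["message_type"] = message_type
    if message is not None:
        event["message"] = message
    return event
```

The resulting dictionary can be posted to `ubi_events/_doc` exactly like `event_doc` in the cell above.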

Queries should be collected at query time using the UBI search request extension. The following example logs query data by adding a `ubi` object to the `ext` section of a search request:

In [12]:
def execute_example_query_with_ubi():        
    collection = "products"
    query = "cable"
    ubi_extension_data = {"ubi": {"query_id": "qid_000001",
                                  "client_id": "cid_000001",
                                  "user_query": query}}
    search_request = {
        "query": {"query_string": {"query": query,
                                   "fields": ["name", "manufacturer",
                                              "long_description", "short_description"]}},
        "size": 11, 
        "fields": ["*"],
        "ext": ubi_extension_data
    }

    response = requests.post(f"{OPENSEARCH_URL}/{collection}/_search",
                             json=search_request)
    display(response.json())

execute_example_query_with_ubi()

{'took': 27,
 'timed_out': False,
 '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0},
 'hits': {'total': {'value': 1165, 'relation': 'eq'},
  'max_score': 7.091094,
  'hits': [{'_index': 'products',
    '_id': '50644382727',
    '_score': 7.091094,
    '_source': {'upc': '50644382727',
     'name': "Monster Cable - 50' Mini-Spool Speaker Cable",
     'manufacturer': 'Monster Cable',
     'short_description': "Navajo white speaker cable; 50' length; special LPE insulation reduces signal loss",
     'long_description': 'The Magnetic Flux Tube construction and special cable windings provide natural music reproduction with impressive clarity, bass response and dynamic range in a compact design. Special LPE insulation reduces signal loss and distortion. Paintable Navajo white jacket matches all interiors.'},
    'fields': {'short_description': ["Navajo white speaker cable; 50' length; special LPE insulation reduces signal loss"],
     'name': ["Monster Cable - 50' Mini-Spo

Notice that UBI information is returned in the search response, including at least the query id that links the response to the logged query.
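To link later events (clicks, purchases) back to a search, the client needs the query id from the response. The sketch below assumes the plugin returns it under `response["ext"]["ubi"]["query_id"]` (the plugin may generate an id when the request does not supply one); verify this path against your plugin version. `get_ubi_query_id` is a hypothetical helper that returns `None` when the field is absent.

```python
# Sketch: extract the UBI query id from a parsed search response.
# ASSUMPTION: the id is returned under response["ext"]["ubi"]["query_id"].
def get_ubi_query_id(search_response):
    return search_response.get("ext", {}).get("ubi", {}).get("query_id")

# Usage with a live response (not run here):
# query_id = get_ubi_query_id(response.json())
# ...then pass query_id as the "query_id" of any follow-up ubi_events document
```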

The following code loads all query signals into UBI by replaying them as simulated user searches. This serves as a batch import of data for example's sake; batch importing should normally be done by indexing query signals directly into `ubi_queries`, as shown earlier.


In [None]:
def execute_search(collection, signal, log=False):
    # The timestamp of a query is the time of the search and cannot be passed in
    ubi_data = {k: v for k, v in signal.items() if k != "timestamp"}
    request = {"query": signal["user_query"],
               "query_fields": ["name", "manufacturer",
                                "long_description", "short_description"],
               "return_fields": ["*"],
               "limit": 10,
               "ubi": ubi_data | {"store_name": "aips_store"}}
    try:
        return collection.search(**request)
    except Exception as e:
        # Skip queries that fail so the replay of all signals keeps going
        if log:
            print(f"Search failed for query_id {signal.get('query_id')}: {e}")
        return None

def search_and_log_all_query_signals():
    products_collection = engine.get_collection("products")
    ubi_queries_dataframe = get_queries_dataframe()
    query_rows = ubi_queries_dataframe.collect()  # collect once; tqdm takes the total from len()
    for q in tqdm.tqdm(query_rows):
        execute_search(products_collection, q.asDict())

search_and_log_all_query_signals()

### Loading UBI queries and events into AIPS

If you wish to load UBI queries and events from your OpenSearch cluster to work with the book, you can do so with the following code:

In [None]:
def load_ubi_events_as_aips_dataframe():
    ubi_events_collection = engine.get_collection("ubi_events")
    create_view_from_collection(ubi_events_collection, "ubi_events")
    events = from_sql("SELECT * FROM ubi_events")
    events_transformed = events.rdd.map(lambda r: 
        (r["timestamp"], r["query_id"], r["client_id"],
         r["message"], r["message_type"]))
    return events_transformed.toDF(["signal_time", "query_id", "user", "target", "type"])

def load_ubi_queries_as_aips_dataframe():
    ubi_queries_collection = engine.get_collection("ubi_queries")
    create_view_from_collection(ubi_queries_collection, "ubi_queries")
    queries = from_sql("SELECT timestamp, query_id, client_id, user_query, query FROM ubi_queries")
    queries_transformed = queries.rdd.map(lambda r: 
        (r["timestamp"], r["query_id"], r["client_id"],
         r["user_query"], "query"))
    return queries_transformed.toDF(["signal_time", "query_id", "user", "target", "type"])

def create_signals_collection_with_ubi_data():
    signals_collection = engine.create_collection("signals")
    events = load_ubi_events_as_aips_dataframe()
    queries = load_ubi_queries_as_aips_dataframe()
    signals_collection.write(queries)
    signals_collection.write(events, overwrite=False)
    return signals_collection

signals_collection = create_signals_collection_with_ubi_data()

Wiping "signals" collection
Creating "signals" collection
<Response [200]>
{'_shards': {'total': 2, 'successful': 1, 'failed': 0}}
Successfully written 725460 documents
<Response [200]>
{'_shards': {'total': 2, 'successful': 1, 'failed': 0}}
Successfully written 1447149 documents
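The query conversion in `load_ubi_queries_as_aips_dataframe` is a simple field renaming. Written in plain Python for a single `ubi_queries` document, it might look like the sketch below; `ubi_query_to_signal` is a hypothetical helper, and the column order follows the `toDF` call above.

```python
# Sketch of the ubi_queries -> AIPS signal mapping used above, in plain Python.
# Column order follows the toDF(...) call in load_ubi_queries_as_aips_dataframe.
AIPS_SIGNAL_COLUMNS = ["signal_time", "query_id", "user", "target", "type"]

def ubi_query_to_signal(ubi_query):
    return {
        "signal_time": ubi_query["timestamp"],
        "query_id": ubi_query["query_id"],
        "user": ubi_query["client_id"],
        "target": ubi_query["user_query"],  # the raw query text becomes the target
        "type": "query",                     # every ubi_queries document is a query signal
    }
```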


### Creating and viewing the UBI Dashboard

The following code imports the default UBI dashboard objects into OpenSearch Dashboards. The dashboard can then be viewed at:

http://opensearch-aips:5601/app/dashboards


In [None]:
def import_ubi_dashboard():
    with open("./engines/opensearch/build/ubi-dashboard-objects.ndjson", "rb") as f: 
        dashboard_ndjson = f.read()
    response = requests.post(f"http://opensearch-dashboards:5601/api/saved_objects/_import?createNewCopies=true",
                            files={"file": ("request.ndjson", dashboard_ndjson)},
                            headers={"kbn-xsrf": "true",
                                     "osd-version": "2.18.0",
                                     "osd-xsrf": "osd-fetch"})
    display(response.json())

import_ubi_dashboard()

{'successCount': 6,
 'success': True,
 'successResults': [{'type': 'index-pattern',
   'id': '7d14f3e4-c873-4ff0-ba62-c5b741d2ac6b',
   'meta': {'title': 'ubi_*', 'icon': 'indexPatternApp'},
   'destinationId': '8920b2d7-957b-4a15-b179-3e82fa5a3fca'},
  {'type': 'visualization',
   'id': '1391fd2c-18f3-4b9f-85e7-799da34bcf1d',
   'meta': {'title': 'all ubi messages', 'icon': 'visualizeApp'},
   'destinationId': '4690f32b-797c-4e36-af08-ae1f5a146120'},
  {'type': 'visualization',
   'id': '789b6480-d667-11ef-96b9-a3e177a902a3',
   'meta': {'title': 'Searches', 'icon': 'visualizeApp'},
   'destinationId': '5b0ae97d-a9d0-4700-bd6b-16837f35bc00'},
  {'type': 'index-pattern',
   'id': 'b8544e15-0471-497e-a4c8-7696a83fcd84',
   'meta': {'title': 'ubi_events', 'icon': 'indexPatternApp'},
   'destinationId': 'f60c6c43-4ecb-4971-bf4d-715fa3673b7c'},
  {'type': 'visualization',
   'id': 'f2e2cc60-d667-11ef-96b9-a3e177a902a3',
   'meta': {'title': 'Event types', 'icon': 'visualizeApp'},
   'desti