# Opensearch User Behavior Insights (UBI)

### This notebook covers the basics around setting up UBI, ingesting data using the UBI plugin, and setting up a basic UBI opensearch dashboard

**Information regarding UBI:**

https://opensearch.org/docs/latest/search-plugins/ubi/index

https://github.com/opensearch-project/user-behavior-insights

In [1]:
from aips import get_engine, set_engine
from aips.spark.dataframe import from_sql
from aips.spark import create_view_from_collection
import tqdm
import aips.indexer
import requests, json
engine = get_engine("opensearch")

In [2]:
aips.indexer.build_collection(engine, "products")
aips.indexer.build_collection(engine, "signals")

Wiping "products" collection
Creating "products" collection
Loading Products
Schema: 
root
 |-- upc: string (nullable = true)
 |-- name: string (nullable = true)
 |-- manufacturer: string (nullable = true)
 |-- short_description: string (nullable = true)
 |-- long_description: string (nullable = true)

<Response [200]>
{'_shards': {'total': 2, 'successful': 1, 'failed': 0}}
Successfully written 48194 documents
Wiping "signals" collection
Creating "signals" collection
Loading data/retrotech/signals.csv
Schema: 
root
 |-- query_id: string (nullable = true)
 |-- user: string (nullable = true)
 |-- type: string (nullable = true)
 |-- target: string (nullable = true)
 |-- signal_time: timestamp (nullable = true)

<Response [200]>
{'_shards': {'total': 2, 'successful': 1, 'failed': 0}}
Successfully written 2172605 documents


<engines.opensearch.OpenSearchCollection.OpenSearchCollection at 0x7f6a94331090>

### **Step 1**: Install and configure the OpenSearch UBI plugin

To install UBI on an opensearch cluster, execute the following command on a node or during the building of an image. This command has already been run on the AIPS opensearch node.

**bin/opensearch-plugin install https://github.com/o19s/opensearch-ubi/releases/download/release-v0.0.12.1-os2.14.0/opensearch-ubi-plugin-v0.0.12.1-os2.14.0.zip --batch**

### **Step 2**: - Bulk ingesting historic signals

Historic user events and queries should be bulk ingested into the appropriate UBI collections.

Here we bulk write all AIPS queries into the `ubi_queries` collection.

In [3]:
def get_queries_dataframe():
    signals_collection = engine.get_collection("signals")
    create_view_from_collection(signals_collection, "signals")
    queries = from_sql("SELECT * FROM signals WHERE type = 'query'")
    queries_transformed = queries.rdd.map(lambda r: 
        (r["signal_time"], r["query_id"], r["user"], r["target"]))
    ubi_queries_dataframe = queries_transformed.toDF(
        ["timestamp", "query_id", "client_id", "user_query"])
    return ubi_queries_dataframe

In [4]:
def batch_ingest_queries():
    queries_collection = engine.create_collection("ubi_queries")
    ubi_queries_dataframe = get_queries_dataframe()
    queries_collection.write(ubi_queries_dataframe)
    return queries_collection

#This line commented as batch query ingestion is done in a 
#different manner later with the extension for examples sake.
#queries_collection = batch_ingest_queries()

Next we can index events into the `ubi_events` collection which is intended to hold all non-query signals

In [5]:
def get_events_dataframe():
    signals_collection = engine.get_collection("signals")
    products_collection = engine.get_collection("products")
    create_view_from_collection(signals_collection, "signals")
    create_view_from_collection(products_collection, "products")
    query = """SELECT REPLACE(type, '-', '_') AS action_name, query_id, user AS client_id,
                      signal_time AS timestamp, type AS message_type,
                      target AS target, p.name AS message
               FROM signals s
               LEFT JOIN products p ON s.target == p.upc
               WHERE type != 'query'"""
    events = from_sql(query)
    return events

In [6]:
def batch_ingest_signals():
    events_collection = engine.create_collection("ubi_events")
    ubi_events_dataframe = get_events_dataframe()
    events_collection.write(ubi_events_dataframe)
    return events_collection

events_collection = batch_ingest_signals()

Wiping "ubi_events" collection
Creating "ubi_events" collection
<Response [200]>
{'_shards': {'total': 2, 'successful': 1, 'failed': 0}}
Successfully written 1447146 documents


### **Step 3**: Live logging of queries and events

Queries and events must be ingested correctly and with complete data into UBI for best results. UBI stores queries seperate from other events, each in their respective collection `ubi_queries` and `ubi_events`. Live signal data collection should be hooked into the appropriate places in your stack.

Logging event data is as simple as writing an event document directly to the ubi_events collection. 

In [7]:
def add_example_event_to_ubi():
    collection = "products"
    event_doc = {"action_name": "purchase", #This is a name of the type of event/action that occurred
                 "client_id": "uid_000001", #This is id of the user/session taking the action
                 "message_type": "one_click_buy", #An additional action type, used for further action grouping
                 "message": "Succeeded", #An optional message string for the event
                 "query_id": "qid_000001", #The id of the query that led to this action
                 "target": "pid_000001"} #Any string representing the target of the action. Normally a doc/item id?

    response = requests.post(f"http://opensearch-node1:9200/ubi_events/_doc?",
                             json=event_doc)
    display(response.json())

add_example_event_to_ubi()

{'_index': 'ubi_events',
 '_id': 'fFAxMJkB4ljHNAOlcSCq',
 '_version': 1,
 'result': 'created',
 '_shards': {'total': 2, 'successful': 1, 'failed': 0},
 '_seq_no': 1447146,
 '_primary_term': 1}

Queries should be collected at query time utilizing the UBI extension request handler. Here is an example of ingesting query data by adding an `ubi` property to the `ext` object during a search request:

In [8]:
def execute_example_query_with_ubi():        
    collection = "products"
    query = "cable"
    ubi_extension_data = {"ubi": {"query_id": "qid_000001",
                                  "client_id": "cid_000001",
                                  "user_query": query}}
    search_request = {
        "query": {"query_string": {"query": query,
                                   "fields": ["name", "manufacturer",
                                              "long_description", "short_description"]}},
        "size": 11, 
        "fields": ["*"],
        "ext": ubi_extension_data
    }

    response = requests.post(f"http://opensearch-node1:9200/{collection}/_search?",
                             json=search_request)
    display(response.json())

execute_example_query_with_ubi()

{'took': 5,
 'timed_out': False,
 '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0},
 'hits': {'total': {'value': 1165, 'relation': 'eq'},
  'max_score': 7.091094,
  'hits': [{'_index': 'products',
    '_id': '50644382727',
    '_score': 7.091094,
    '_source': {'upc': '50644382727',
     'name': "Monster Cable - 50' Mini-Spool Speaker Cable",
     'manufacturer': 'Monster Cable',
     'short_description': "Navajo white speaker cable; 50' length; special LPE insulation reduces signal loss",
     'long_description': 'The Magnetic Flux Tube construction and special cable windings provide natural music reproduction with impressive clarity, bass response and dynamic range in a compact design. Special LPE insulation reduces signal loss and distortion. Paintable Navajo white jacket matches all interiors.'},
    'fields': {'short_description': ["Navajo white speaker cable; 50' length; special LPE insulation reduces signal loss"],
     'name': ["Monster Cable - 50' Mini-Spoo

Notice UBI information is returned on the search response object with at least the ubi signal id linking to the ingested query. 

The following code will load all query signals into UBI by simulating user searches. This serves as a batch import of data for examples sake. Batch importing should normally just be done by batch indexing query signals directly into `ubi_queries` directly as shown earlier.


In [9]:
def execute_search(collection, signal, log=False):
    signal.pop("timestamp", None) #The timestamp of a query is the time of search and cannot be passed in
    request = {"query": signal["user_query"],
               "query_fields": ["name", "manufacturer",
                                "long_description", "short_description"],
               "return_fields": ["*"],
               "limit": 10,
               "ubi": signal | {"store_name": "aips_store"}}
    try:
        return collection.search(**request)
    except:
        pass

def search_and_log_all_query_signals():
    products_collection = engine.get_collection("products")
    ubi_queries_dataframe = get_queries_dataframe()
    for q in tqdm.tqdm(ubi_queries_dataframe.collect(), total=ubi_queries_dataframe.count()):
        execute_search(products_collection, q.asDict())

#search_and_log_all_query_signals()

### Loading UBI queries and events into AIPS

If you wish to load in UBI queries/events from your Opensearch cluster to work with the book, you can do so with the following code

In [10]:
def load_ubi_events_as_aips_dataframe():
    ubi_events_collection = engine.get_collection("ubi_events")
    create_view_from_collection(ubi_events_collection, "ubi_events")
    events = from_sql("SELECT * FROM ubi_events")
    events_transformed = events.rdd.map(lambda r: 
        (r["timestamp"], r["query_id"], r["client_id"],
         r["message"], r["message_type"]))
    return events_transformed.toDF(["signal_time", "query_id", "user", "target", "type"])

def load_ubi_queries_as_aips_dataframe():
    ubi_queries_collection = engine.get_collection("ubi_queries")
    create_view_from_collection(ubi_queries_collection, "ubi_queries")
    queries = from_sql("SELECT * FROM ubi_queries")
    queries_transformed = queries.rdd.map(lambda r: 
        (r["timestamp"], r["query_id"], r["client_id"],
         r["user_query"], "query"))
    return queries_transformed.toDF(["signal_time", "query_id", "user", "target", "type"])

def create_signals_collection_with_ubi_data():
    signals_collection = engine.create_collection("signals")
    events = load_ubi_events_as_aips_dataframe()
    queries = load_ubi_queries_as_aips_dataframe()
    signals_collection.write(queries)
    signals_collection.write(events, overwrite=False)
    return signals_collection

signals_collection = create_signals_collection_with_ubi_data()

Wiping "signals" collection
Creating "signals" collection


Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.runJob.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 28.0 failed 1 times, most recent failure: Lost task 0.0 in stage 28.0 (TID 159) (ad2e8ea97b1e executor driver): java.lang.RuntimeException: Error while encoding: java.lang.RuntimeException: scala.collection.convert.Wrappers$JListWrapper is not a valid external type for schema of string
if (assertnotnull(input[0, org.apache.spark.sql.Row, true]).isNullAt) null else staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, validateexternaltype(getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true]), 0, client_id), StringType, true), true, false, true) AS client_id#381
if (assertnotnull(input[0, org.apache.spark.sql.Row, true]).isNullAt) null else staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, validateexternaltype(getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true]), 1, query), StringType, true), true, false, true) AS query#382
if (assertnotnull(input[0, org.apache.spark.sql.Row, true]).isNullAt) null else staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, validateexternaltype(getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true]), 2, query_id), StringType, true), true, false, true) AS query_id#383
if (assertnotnull(input[0, org.apache.spark.sql.Row, true]).isNullAt) null else staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, validateexternaltype(getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true]), 3, query_response_hit_ids), StringType, true), true, false, true) AS query_response_hit_ids#384
if (assertnotnull(input[0, org.apache.spark.sql.Row, true]).isNullAt) null else staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, validateexternaltype(getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true]), 4, query_response_id), StringType, true), true, false, true) AS query_response_id#385
if (assertnotnull(input[0, org.apache.spark.sql.Row, true]).isNullAt) null else staticinvoke(class org.apache.spark.sql.catalyst.util.DateTimeUtils$, TimestampType, anyToMicros, validateexternaltype(getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true]), 5, timestamp), TimestampType, true), true, false, true) AS timestamp#386
if (assertnotnull(input[0, org.apache.spark.sql.Row, true]).isNullAt) null else staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, validateexternaltype(getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true]), 6, user_query), StringType, true), true, false, true) AS user_query#387
if (assertnotnull(input[0, org.apache.spark.sql.Row, true]).isNullAt) null else newInstance(class org.apache.spark.sql.catalyst.util.ArrayBasedMapData) AS _metadata#388
	at org.apache.spark.sql.errors.QueryExecutionErrors$.expressionEncodingError(QueryExecutionErrors.scala:1237)
	at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$Serializer.apply(ExpressionEncoder.scala:210)
	at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$Serializer.apply(ExpressionEncoder.scala:193)
	at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:760)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
	at org.apache.spark.ContextAwareIterator.hasNext(ContextAwareIterator.scala:39)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
	at scala.collection.Iterator$GroupedIterator.takeDestructively(Iterator.scala:1160)
	at scala.collection.Iterator$GroupedIterator.go(Iterator.scala:1176)
	at scala.collection.Iterator$GroupedIterator.fill(Iterator.scala:1213)
	at scala.collection.Iterator$GroupedIterator.hasNext(Iterator.scala:1217)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
	at scala.collection.Iterator.foreach(Iterator.scala:943)
	at scala.collection.Iterator.foreach$(Iterator.scala:943)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
	at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:307)
	at org.apache.spark.sql.execution.python.PythonUDFRunner$$anon$1.writeIteratorToStream(PythonUDFRunner.scala:53)
	at org.apache.spark.api.python.BasePythonRunner$WriterThread.$anonfun$run$1(PythonRunner.scala:438)
	at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:2066)
	at org.apache.spark.api.python.BasePythonRunner$WriterThread.run(PythonRunner.scala:272)
Caused by: java.lang.RuntimeException: scala.collection.convert.Wrappers$JListWrapper is not a valid external type for schema of string
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.StaticInvoke_3$(Unknown Source)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.writeFields_0_1$(Unknown Source)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
	at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$Serializer.apply(ExpressionEncoder.scala:207)
	... 22 more

Driver stacktrace:
	at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2672)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2608)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2607)
	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2607)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1182)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1182)
	at scala.Option.foreach(Option.scala:407)
	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1182)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2860)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2802)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2791)
	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
	at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:952)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2228)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2249)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2268)
	at org.apache.spark.api.python.PythonRDD$.runJob(PythonRDD.scala:166)
	at org.apache.spark.api.python.PythonRDD.runJob(PythonRDD.scala)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
	at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
	at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: java.lang.RuntimeException: Error while encoding: java.lang.RuntimeException: scala.collection.convert.Wrappers$JListWrapper is not a valid external type for schema of string
if (assertnotnull(input[0, org.apache.spark.sql.Row, true]).isNullAt) null else staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, validateexternaltype(getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true]), 0, client_id), StringType, true), true, false, true) AS client_id#381
if (assertnotnull(input[0, org.apache.spark.sql.Row, true]).isNullAt) null else staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, validateexternaltype(getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true]), 1, query), StringType, true), true, false, true) AS query#382
if (assertnotnull(input[0, org.apache.spark.sql.Row, true]).isNullAt) null else staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, validateexternaltype(getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true]), 2, query_id), StringType, true), true, false, true) AS query_id#383
if (assertnotnull(input[0, org.apache.spark.sql.Row, true]).isNullAt) null else staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, validateexternaltype(getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true]), 3, query_response_hit_ids), StringType, true), true, false, true) AS query_response_hit_ids#384
if (assertnotnull(input[0, org.apache.spark.sql.Row, true]).isNullAt) null else staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, validateexternaltype(getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true]), 4, query_response_id), StringType, true), true, false, true) AS query_response_id#385
if (assertnotnull(input[0, org.apache.spark.sql.Row, true]).isNullAt) null else staticinvoke(class org.apache.spark.sql.catalyst.util.DateTimeUtils$, TimestampType, anyToMicros, validateexternaltype(getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true]), 5, timestamp), TimestampType, true), true, false, true) AS timestamp#386
if (assertnotnull(input[0, org.apache.spark.sql.Row, true]).isNullAt) null else staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, validateexternaltype(getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true]), 6, user_query), StringType, true), true, false, true) AS user_query#387
if (assertnotnull(input[0, org.apache.spark.sql.Row, true]).isNullAt) null else newInstance(class org.apache.spark.sql.catalyst.util.ArrayBasedMapData) AS _metadata#388
	at org.apache.spark.sql.errors.QueryExecutionErrors$.expressionEncodingError(QueryExecutionErrors.scala:1237)
	at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$Serializer.apply(ExpressionEncoder.scala:210)
	at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$Serializer.apply(ExpressionEncoder.scala:193)
	at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:760)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
	at org.apache.spark.ContextAwareIterator.hasNext(ContextAwareIterator.scala:39)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
	at scala.collection.Iterator$GroupedIterator.takeDestructively(Iterator.scala:1160)
	at scala.collection.Iterator$GroupedIterator.go(Iterator.scala:1176)
	at scala.collection.Iterator$GroupedIterator.fill(Iterator.scala:1213)
	at scala.collection.Iterator$GroupedIterator.hasNext(Iterator.scala:1217)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
	at scala.collection.Iterator.foreach(Iterator.scala:943)
	at scala.collection.Iterator.foreach$(Iterator.scala:943)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
	at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:307)
	at org.apache.spark.sql.execution.python.PythonUDFRunner$$anon$1.writeIteratorToStream(PythonUDFRunner.scala:53)
	at org.apache.spark.api.python.BasePythonRunner$WriterThread.$anonfun$run$1(PythonRunner.scala:438)
	at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:2066)
	at org.apache.spark.api.python.BasePythonRunner$WriterThread.run(PythonRunner.scala:272)
Caused by: java.lang.RuntimeException: scala.collection.convert.Wrappers$JListWrapper is not a valid external type for schema of string
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.StaticInvoke_3$(Unknown Source)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.writeFields_0_1$(Unknown Source)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
	at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$Serializer.apply(ExpressionEncoder.scala:207)
	... 22 more


### Creating and viewing the UBI Dashboard

The following code will import the default Dashboard objects. The dashboard can be viewed here

http://opensearch-aips:5601/app/dashboards


In [None]:
def import_ubi_dashboard():
    with open("./engines/opensearch/build/ubi-dashboard-objects.ndjson", "rb") as f: 
        dashboard_ndjson = f.read()
    response = requests.post(f"http://opensearch-dashboards:5601/api/saved_objects/_import?createNewCopies=true",
                            files={"file": ("request.ndjson", dashboard_ndjson)},
                            headers={"kbn-xsrf": "true",
                                     "osd-version": "2.14.0",
                                     "osd-xsrf": "osd-fetch"})
    display(response.json())

import_ubi_dashboard()

{'successCount': 6,
 'success': True,
 'successResults': [{'type': 'index-pattern',
   'id': '7d14f3e4-c873-4ff0-ba62-c5b741d2ac6b',
   'meta': {'title': 'ubi_*', 'icon': 'indexPatternApp'},
   'destinationId': '8920b2d7-957b-4a15-b179-3e82fa5a3fca'},
  {'type': 'visualization',
   'id': '1391fd2c-18f3-4b9f-85e7-799da34bcf1d',
   'meta': {'title': 'all ubi messages', 'icon': 'visualizeApp'},
   'destinationId': '4690f32b-797c-4e36-af08-ae1f5a146120'},
  {'type': 'visualization',
   'id': '789b6480-d667-11ef-96b9-a3e177a902a3',
   'meta': {'title': 'Searches', 'icon': 'visualizeApp'},
   'destinationId': '5b0ae97d-a9d0-4700-bd6b-16837f35bc00'},
  {'type': 'index-pattern',
   'id': 'b8544e15-0471-497e-a4c8-7696a83fcd84',
   'meta': {'title': 'ubi_events', 'icon': 'indexPatternApp'},
   'destinationId': 'f60c6c43-4ecb-4971-bf4d-715fa3673b7c'},
  {'type': 'visualization',
   'id': 'f2e2cc60-d667-11ef-96b9-a3e177a902a3',
   'meta': {'title': 'Event types', 'icon': 'visualizeApp'},
   'desti