# Signals Boosting

NOTE: This notebook depends on the Retrotech dataset. If you have any issues, please rerun the [Setting up the Retrotech Dataset](1.setting-up-the-retrotech-dataset.ipynb) notebook.

In [2]:
from pyspark.ml.feature import IndexToString, StringIndexer

from aips import display_product_search, get_engine, set_engine
from aips.spark import create_view_from_collection, get_spark_session

engine = get_engine("weaviate")
spark = get_spark_session()
products_collection = engine.get_collection("products")

## Keyword Search with No Signals Boosting

### Listing 4.5

In [4]:
# %load -s product_search_request aips/search_requests
def product_search_request(query, param_overrides={}):
    request = {"query": query,
               "query_fields": ["name", "manufacturer", "long_description"],
               "return_fields": ["upc", "name", "manufacturer",
                                 "short_description", "score"],
               "limit": 5,
               "order_by": [("score", "desc"), ("upc", "asc")]}
    return request | param_overrides
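
The `request | param_overrides` expression relies on the dict union operator (Python 3.9+), where keys from the right-hand dict win. A minimal sketch of that behavior, using a trimmed-down request and a hypothetical `merge_request` helper that is not part of `aips`:

```python
# Dict union (Python 3.9+): keys in the right-hand dict override the left,
# and neither input dict is modified.
def merge_request(defaults, overrides=None):
    # Hypothetical helper for illustration only.
    return defaults | (overrides or {})

defaults = {"query": "ipad", "limit": 5}
merged = merge_request(defaults, {"limit": 10, "return_fields": ["upc"]})
print(merged)
```

Because the union builds a new dict, the defaults can be reused safely across calls.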

In [5]:
query = "ipad"
request = product_search_request(query)
response = products_collection.search(**request)
display_product_search(query, response["docs"])

['"name"', '"manufacturer"', '"long_description"']


## Create Signals Boosts (Signals Aggregation)

### Listing 4.6

In [6]:
collection = engine.get_collection("products")
fields = collection.get_collection_field_names()
fields.append("id")
request = {"return_fields": fields,
           "limit": 1000}
all_documents = []
# Page through the whole collection, using the last returned id as the cursor.
while True:
    docs = collection.search(**request)["docs"]
    all_documents.extend(docs)
    if len(docs) != request["limit"]:
        break  # a short page means there is nothing left to fetch
    request["after"] = docs[-1]["id"]
print(len(all_documents))
dataframe = spark.createDataFrame(data=all_documents)
dataframe.createOrReplaceTempView("Signals")
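
The loop above is a cursor-style pagination pattern: request a page, keep the last `id` as the cursor, and stop once a page comes back short. The sketch below reproduces that pattern against an in-memory list. `fake_search` is a hypothetical stand-in for `collection.search`, and it assumes results are ordered by `id`, which may not match how the real engine orders pages.

```python
def fake_search(documents, limit, after=None):
    """Hypothetical stand-in for collection.search: returns up to `limit`
    documents whose id sorts after the `after` cursor."""
    ordered = sorted(documents, key=lambda d: d["id"])
    if after is not None:
        ordered = [d for d in ordered if d["id"] > after]
    return {"docs": ordered[:limit]}

def fetch_all(documents, limit):
    request = {"limit": limit}
    results = []
    while True:
        docs = fake_search(documents, **request)["docs"]
        results.extend(docs)
        if len(docs) < limit:  # short or empty page: nothing left to fetch
            break
        request["after"] = docs[-1]["id"]  # last id becomes the next cursor
    return results

corpus = [{"id": f"doc{i:03d}"} for i in range(25)]
fetched = fetch_all(corpus, limit=10)
print(len(fetched))
```

When the collection size is an exact multiple of the page size, the loop issues one extra request that returns an empty page and then stops.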

In [7]:
signals_collection = engine.get_collection("signals")
print("Aggregating Signals to Create Signals Boosts...")
create_view_from_collection(signals_collection, "signals")


#dataframe = spark.sql(signals_aggregation_query)
#display(dataframe.count())

Aggregating Signals to Create Signals Boosts...
2172605


In [8]:
signals_aggregation_query = """
SELECT q.target AS query, c.target AS doc, COUNT(c.target) AS boost
FROM signals c LEFT JOIN signals q ON c.query_id = q.query_id
WHERE c.type = 'click' AND q.type = 'query'
GROUP BY q.target, c.target
ORDER BY boost DESC"""

dataframe = spark.sql(signals_aggregation_query)
display(dataframe.count())

197168
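
To make the aggregation concrete, here is a plain-Python sketch of what the SQL computes: each click is paired with the query that shares its `query_id`, and the (query, doc) pairs are counted. The signals and the doc names `A`, `B`, and `C` are toy data, not Retrotech records, and the sketch assumes one query signal per `query_id`.

```python
from collections import Counter

# Toy signals; field names mirror the columns of the signals view.
signals = [
    {"query_id": "q1", "type": "query", "target": "ipad"},
    {"query_id": "q1", "type": "click", "target": "A"},
    {"query_id": "q2", "type": "query", "target": "ipad"},
    {"query_id": "q2", "type": "click", "target": "A"},
    {"query_id": "q2", "type": "click", "target": "B"},
    {"query_id": "q3", "type": "query", "target": "kindle"},
    {"query_id": "q3", "type": "click", "target": "C"},
]

# Look up the keywords for each query_id (assumes one query per query_id).
query_text = {s["query_id"]: s["target"]
              for s in signals if s["type"] == "query"}

# Count clicks per (query, doc) pair, like GROUP BY q.target, c.target.
boosts = Counter(
    (query_text[s["query_id"]], s["target"])
    for s in signals
    if s["type"] == "click" and s["query_id"] in query_text
)

# most_common() sorts by count, highest first, like ORDER BY boost DESC.
for (query, doc), boost in boosts.most_common():
    print(query, doc, boost)
```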

In [9]:
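# Inspect the signals that are neither clicks nor queries
# (for example, purchase and add-to-cart).
# Separate variable names keep the aggregated `dataframe` and
# `signals_aggregation_query` from the previous cell intact.
other_signals_query = "SELECT * FROM signals WHERE type != 'click' AND type != 'query'"
other_signals = spark.sql(other_signals_query)
display(other_signals.count())
other_signals.show()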

799462

+--------------------+-----------+--------------------+------------+-----------+-------+
|                  id|   query_id|         signal_time|      target|       type|   user|
+--------------------+-----------+--------------------+------------+-----------+-------+
|00000ff8-e249-49d...|u397190_0_1|2019-07-13T00:00:00Z|602527767062|   purchase|u397190|
|000024e4-d5ff-48a...| u25636_0_1|2019-12-16T00:00:00Z|619659000059|   purchase| u25636|
|00002dc9-0f47-450...| u49705_0_1|2019-08-27T00:00:00Z|885909398577|add-to-cart| u49705|
|00003e81-9b9b-413...|u402900_0_1|2019-09-07T00:00:00Z|751492421933|add-to-cart|u402900|
|00004dd8-1baf-4b1...|u677704_0_1|2020-03-08T00:00:00Z|885631694015|add-to-cart|u677704|
|00005d55-8256-4a7...|u251801_0_1|2019-12-18T00:00:00Z|031742351952|   purchase|u251801|
|00009fb0-d52e-4cd...|u428275_0_1|2020-02-05T00:00:00Z|883974828098|add-to-cart|u428275|
|0000c14c-a45b-4c7...|u514385_0_1|2020-05-07T00:00:00Z|023942953104|   purchase|u514385|
|0000de13-15da-459...

In [10]:
signals_boosting_collection = \
    engine.create_collection("signals_boosting", log=True)
signals_boosting_collection.write(dataframe)
print("Signals Aggregation Completed!")

Wiping "signals_boosting" collection
Creating "signals_boosting" collection
Schema: {
  "class": "signals_boosting",
  "properties": [
    {
      "name": "query",
      "dataType": [
        "text"
      ]
    },
    {
      "name": "doc",
      "dataType": [
        "text"
      ]
    },
    {
      "name": "boost",
      "dataType": [
        "int"
      ]
    }
  ]
}
Status: <Response [200]>
Response: <Response [200]>


## Search with Signals Boosts Applied

### Listing 4.7

In [None]:
def search_for_boosts(query, collection, query_field="query"):
    boosts_request = {"query": query,
                      "query_fields": [query_field],
                      "return_fields": ["query", "doc", "boost"],
                      "limit": 10,
                      "order_by": [("boost", "desc")]}
    response = collection.search(**boosts_request)
    return response["docs"]

def create_boosts_query(boost_documents):
    print(f"Boost Documents: \n{boost_documents}")
    boosts = " ".join([f'"{b["doc"]}"^{b["boost"]}' 
                       for b in boost_documents])
    print(f"\nBoost Query: \n{boosts}\n")
    display(boost_documents)
    return boosts
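
As an illustration of the string that `create_boosts_query` builds, the sketch below applies the same `"doc"^boost` formatting to two of the boost documents shown in this notebook's output for the query `ipad`. `build_boosts_query` is a hypothetical variant without the debug printing, not a function from `aips`.

```python
def build_boosts_query(boost_documents):
    # Same '"doc"^boost' format as create_boosts_query, minus the printing.
    return " ".join(f'"{b["doc"]}"^{b["boost"]}' for b in boost_documents)

sample_boosts = [
    {"doc": "885909457588", "boost": 966},
    {"doc": "885909457595", "boost": 205},
]
boosts_query = build_boosts_query(sample_boosts)
print(boosts_query)
# → "885909457588"^966 "885909457595"^205
```

Each quoted term is a product identifier and the number after `^` is its weight, so documents clicked more often for this query are boosted more strongly.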

In [None]:
query = "ipad"
boost_docs = search_for_boosts(query, signals_boosting_collection)
boosts_query = create_boosts_query(boost_docs)
request = product_search_request(query)
request["query_boosts"] = boosts_query

response = products_collection.search(**request)
display_product_search(query, response["docs"])

Boost Documents: 
[{'query': 'ipad', 'doc': '885909457588', 'boost': 966, 'id': '_GToPJMBEsnUwT6Zy8mT', 'score': None}, {'query': 'ipad', 'doc': '885909457595', 'boost': 205, 'id': 'OWToPJMBEsnUwT6Zy8uU', 'score': None}, {'query': 'ipad', 'doc': '885909471812', 'boost': 202, 'id': 'aWToPJMBEsnUwT6Zy8uU', 'score': None}, {'query': 'ipad', 'doc': '886111287055', 'boost': 109, 'id': '-GToPJMBEsnUwT6Zy8yV', 'score': None}, {'query': 'ipad', 'doc': '843404073153', 'boost': 73, 'id': 'CWToPJMBEsnUwT6Zy87l', 'score': None}, {'query': 'ipad', 'doc': '635753493559', 'boost': 62, 'id': 'zGToPJMBEsnUwT6Zy87l', 'score': None}, {'query': 'ipad', 'doc': '885909457601', 'boost': 62, 'id': '1GToPJMBEsnUwT6Zy87l', 'score': None}, {'query': 'ipad', 'doc': '885909472376', 'boost': 61, 'id': '-GToPJMBEsnUwT6Zy87l', 'score': None}, {'query': 'ipad', 'doc': '610839379408', 'boost': 29, 'id': 'nWToPJMBEsnUwT6ZzN5v', 'score': None}, {'query': 'ipad', 'doc': '884962753071', 'boost': 28, 'id': 'XGToPJMBEsnUwT6Z

[{'query': 'ipad',
  'doc': '885909457588',
  'boost': 966,
  'id': '_GToPJMBEsnUwT6Zy8mT',
  'score': None},
 {'query': 'ipad',
  'doc': '885909457595',
  'boost': 205,
  'id': 'OWToPJMBEsnUwT6Zy8uU',
  'score': None},
 {'query': 'ipad',
  'doc': '885909471812',
  'boost': 202,
  'id': 'aWToPJMBEsnUwT6Zy8uU',
  'score': None},
 {'query': 'ipad',
  'doc': '886111287055',
  'boost': 109,
  'id': '-GToPJMBEsnUwT6Zy8yV',
  'score': None},
 {'query': 'ipad',
  'doc': '843404073153',
  'boost': 73,
  'id': 'CWToPJMBEsnUwT6Zy87l',
  'score': None},
 {'query': 'ipad',
  'doc': '635753493559',
  'boost': 62,
  'id': 'zGToPJMBEsnUwT6Zy87l',
  'score': None},
 {'query': 'ipad',
  'doc': '885909457601',
  'boost': 62,
  'id': '1GToPJMBEsnUwT6Zy87l',
  'score': None},
 {'query': 'ipad',
  'doc': '885909472376',
  'boost': 61,
  'id': '-GToPJMBEsnUwT6Zy87l',
  'score': None},
 {'query': 'ipad',
  'doc': '610839379408',
  'boost': 29,
  'id': 'nWToPJMBEsnUwT6ZzN5v',
  'score': None},
 {'query': 'ipa

## Success!

You have now implemented your first AI-powered search algorithm: signals boosting. This is an overly simplistic implementation (chapter 8 covers signals boosting improvements in much more depth), but it shows how effective reflected intelligence can be. Later chapters cover other reflected intelligence techniques, such as collaborative filtering (chapter 9, Personalized Search) and machine-learned ranking (chapter 10, Learning to Rank).

Up next: Chapter 5 - [Knowledge Graph Learning](../ch05/1.open-information-extraction.ipynb)

