# Signals Boosting

NOTE: This notebook depends on the Retrotech dataset. If you have any issues, please rerun the [Setting up the Retrotech Dataset](1.setting-up-the-retrotech-dataset.ipynb) notebook.

In [1]:
import sys
sys.path.append('../..')
from aips import display_product_search, get_engine
from pyspark.sql import SparkSession
from aips.spark import create_view_from_collection
engine = get_engine()
spark = SparkSession.builder.appName("AIPS").getOrCreate()
products_collection = engine.get_collection("products")

## Keyword Search with No Signals Boosting

### Listing 4.5

In [2]:
# %load -s product_search_request aips/search_requests
def product_search_request(query, param_overrides={}):
    request = {"query": query,
               "query_fields": ["name", "manufacturer", "long_description"],
               "return_fields": ["upc", "name", "manufacturer",
                                 "short_description", "score"],
               "limit": 5,
               "order_by": [("score", "desc"), ("upc", "asc")]}
    return request | param_overrides
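
The `request | param_overrides` expression relies on the dictionary merge operator, which requires Python 3.9 or later. As a minimal sketch of how the overrides behave (the `defaults` and `overrides` dictionaries below are hypothetical and only mirror a few keys of the real request), keys on the right-hand side replace keys on the left, and a new dictionary is returned:

```python
# Hypothetical defaults, mirroring a few keys of the request built above.
defaults = {"query": "ipad",
            "limit": 5,
            "return_fields": ["upc", "name", "score"]}

# Hypothetical overrides a caller might pass as param_overrides.
overrides = {"limit": 10, "return_fields": ["upc", "name"]}

# dict | dict (Python 3.9+) builds a new dict; right-hand keys win.
merged = defaults | overrides

print(merged["limit"])
print(merged["return_fields"])
print(defaults["limit"])  # the defaults dict itself is left unchanged
```

On older Python versions, `{**defaults, **overrides}` produces the same merged result.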

In [3]:
query = "ipad"
request = product_search_request(query)
response = products_collection.search(**request)
display_product_search(query, response["docs"])

## Create Signals Boosts (Signals Aggregation)

### Listing 4.6

In [4]:
signals_collection = engine.get_collection("signals")
print("Aggregating Signals to Create Signals Boosts...")
create_view_from_collection(signals_collection, "signals")

signals_aggregation_query = """
SELECT q.target AS query, c.target AS doc, COUNT(c.target) AS boost
FROM signals c LEFT JOIN signals q ON c.query_id = q.query_id
WHERE c.type = 'click' AND q.type = 'query'
GROUP BY query, doc
ORDER BY boost DESC"""

dataframe = spark.sql(signals_aggregation_query)
signals_boosting_collection = \
    engine.create_collection("signals_boosting")
signals_boosting_collection.write(dataframe)
print("Signals Aggregation Completed!")

Aggregating Signals to Create Signals Boosts...
Wiping "signals_boosting" collection
Creating "signals_boosting" collection
Status: Success
Successfully written 197168 documents
Signals Aggregation Completed!
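
To make the aggregation query easier to follow, here is a minimal pure-Python sketch of the same idea: pair each click with the query that shares its `query_id`, then count clicks per (query, doc) pair. Because the `WHERE` clause requires `q.type = 'query'`, clicks without a matching query row are dropped, so the `LEFT JOIN` behaves like an inner join here. The toy signals and the `aggregate_signals` helper are made up for illustration and are not part of the `aips` library; the sketch also assumes at most one query signal per `query_id`.

```python
from collections import Counter

# Toy signals, made up for illustration (not from the Retrotech dataset).
signals = [
    {"query_id": "q1", "type": "query", "target": "ipad"},
    {"query_id": "q1", "type": "click", "target": "doc_a"},
    {"query_id": "q2", "type": "query", "target": "ipad"},
    {"query_id": "q2", "type": "click", "target": "doc_a"},
    {"query_id": "q2", "type": "click", "target": "doc_b"},
    {"query_id": "q3", "type": "query", "target": "kindle"},
    {"query_id": "q3", "type": "click", "target": "doc_c"},
]

def aggregate_signals(signals):
    """Count clicks per (query text, clicked doc) pair (hypothetical helper)."""
    # The "q" side of the join: query_id -> query text.
    # Assumes at most one query signal per query_id.
    query_text = {s["query_id"]: s["target"]
                  for s in signals if s["type"] == "query"}
    counts = Counter()
    for s in signals:
        # The "c" side of the join: clicks that have a matching query.
        if s["type"] == "click" and s["query_id"] in query_text:
            counts[(query_text[s["query_id"]], s["target"])] += 1
    # Highest boost first, like ORDER BY boost DESC.
    return [{"query": q, "doc": d, "boost": n}
            for (q, d), n in counts.most_common()]

boosts = aggregate_signals(signals)
for row in boosts:
    print(row)
```

In this toy data, `doc_a` was clicked in two separate `ipad` searches, so it receives a boost of 2 for that query, while the other pairs receive a boost of 1.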


## Search with Signals Boosts Applied

### Listing 4.7

In [11]:
def search_for_boosts(query, collection, query_field="query"):
    boosts_request = {"query": query,
                      "query_fields": [query_field],
                      "return_fields": ["query", "doc", "boost"],
                      "limit": 10,
                      "order_by": [("boost", "desc")]}
    response = collection.search(**boosts_request)
    return response["docs"]

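def create_boosts_query(boost_documents):
    print(f"Boost Documents: \n{boost_documents}")
    # Each boost document becomes a quoted doc id weighted by its
    # aggregated click count, e.g. "885909457588"^966
    boosts = " ".join([f'"{b["doc"]}"^{b["boost"]}'
                       for b in boost_documents])
    print(f"\nBoost Query: \n{boosts}\n")
    display(boost_documents)  # display() is provided by IPython/Jupyter
    return boosts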

In [12]:
query = "ipad"
boost_docs = search_for_boosts(query, signals_boosting_collection)
boosts_query = create_boosts_query(boost_docs)
request = product_search_request(query)
request["query_boosts"] = boosts_query

response = products_collection.search(**request)
display_product_search(query, response["docs"])

Boost Documents: 
[{'query': 'ipad', 'doc': '885909457588', 'boost': 966}, {'query': 'ipad', 'doc': '885909457595', 'boost': 205}, {'query': 'ipad', 'doc': '885909471812', 'boost': 202}, {'query': 'ipad', 'doc': '886111287055', 'boost': 109}, {'query': 'ipad', 'doc': '843404073153', 'boost': 73}, {'query': 'ipad', 'doc': '635753493559', 'boost': 62}, {'query': 'ipad', 'doc': '885909457601', 'boost': 62}, {'query': 'ipad', 'doc': '885909472376', 'boost': 61}, {'query': 'ipad', 'doc': '610839379408', 'boost': 29}, {'query': 'ipad', 'doc': '884962753071', 'boost': 28}]

Boost Query: 
"885909457588"^966 "885909457595"^205 "885909471812"^202 "886111287055"^109 "843404073153"^73 "635753493559"^62 "885909457601"^62 "885909472376"^61 "610839379408"^29 "884962753071"^28



[{'query': 'ipad', 'doc': '885909457588', 'boost': 966},
 {'query': 'ipad', 'doc': '885909457595', 'boost': 205},
 {'query': 'ipad', 'doc': '885909471812', 'boost': 202},
 {'query': 'ipad', 'doc': '886111287055', 'boost': 109},
 {'query': 'ipad', 'doc': '843404073153', 'boost': 73},
 {'query': 'ipad', 'doc': '635753493559', 'boost': 62},
 {'query': 'ipad', 'doc': '885909457601', 'boost': 62},
 {'query': 'ipad', 'doc': '885909472376', 'boost': 61},
 {'query': 'ipad', 'doc': '610839379408', 'boost': 29},
 {'query': 'ipad', 'doc': '884962753071', 'boost': 28}]
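
The boost query printed above is a space-separated list of quoted document ids, each weighted by its click count using the `"<doc>"^<boost>` form. The sketch below rebuilds the first part of that string from the first three boost documents in the output, without the printing and `display` side effects. The `format_boosts` name is hypothetical and not part of the `aips` library.

```python
# First three boost documents, copied from the output above.
boost_documents = [
    {"query": "ipad", "doc": "885909457588", "boost": 966},
    {"query": "ipad", "doc": "885909457595", "boost": 205},
    {"query": "ipad", "doc": "885909471812", "boost": 202},
]

def format_boosts(boost_documents):
    """Join boost documents into a "<doc>"^<boost> string (hypothetical helper)."""
    return " ".join(f'"{b["doc"]}"^{b["boost"]}' for b in boost_documents)

boosts_query = format_boosts(boost_documents)
print(boosts_query)
```

How the search engine combines these weights with the keyword relevance score depends on the engine's handling of `query_boosts`, which this sketch does not cover.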

## Success!

You have now implemented your first AI-powered search algorithm: Signals Boosting. This implementation is deliberately simple (chapter 8 covers signals boosting improvements in much more depth), but it demonstrates the power of leveraging Reflected Intelligence quite well. Future chapters cover other Reflected Intelligence techniques, such as Collaborative Filtering (chapter 9 - Personalized Search) and Machine-learned Ranking (chapter 10 - Learning to Rank).

Up next: Chapter 5 - [Knowledge Graph Learning](../ch05/1.open-information-extraction.ipynb)

