# [ Chapter 12 - Overcoming Bias in Learned Relevance Models ]

## Chapter 12 setup

In chapter 12, we continue our work on a Learning to Rank (LTR) solution, evolving from a purely offline use of click-based training data to actively exploring potentially relevant items that users may find valuable.

To set up, we:

1. Fetch the retrotech data
2. Enable LTR
3. Define a few fields (different ways of analyzing the underlying retrotech text)
4. Define a list of 'promoted' products that our store wants to make prominent
5. Insert the retrotech product data via Spark

In [1]:
from aips import get_ltr_engine, get_engine, display_product_search
import aips.indexer

engine = get_engine()

In [2]:
products_collection = aips.indexer.build_collection(engine, "products_with_promotions")
get_ltr_engine(products_collection).enable_ltr()

Wiping "products_with_promotions" collection
Creating "products_with_promotions" collection
Status: Success
Loading Products
Schema: 
root
 |-- upc: string (nullable = true)
 |-- name: string (nullable = true)
 |-- manufacturer: string (nullable = true)
 |-- short_description: string (nullable = true)
 |-- long_description: string (nullable = true)
 |-- has_promotion: boolean (nullable = true)

Adding LTR QParser for products_with_promotions collection
Adding LTR Doc Transformer for products_with_promotions collection
Successfully written 48194 documents
Adding LTR QParser for products_with_promotions collection
Adding LTR Doc Transformer for products_with_promotions collection


In [3]:
! cd data/retrotech/ && head products.csv

"upc","name","manufacturer","short_description","long_description"
"096009010836","Fists of Bruce Lee - Dolby - DVD", , , 
"043396061965","The Professional - Widescreen Uncut - DVD", , , 
"085391862024","Pokemon the Movie: 2000 - DVD", , , 
"067003016025","Summerbreeze - CD","Nettwerk", , 
"731454813822","Back for the First Time [PA] - CD","Def Jam South", , 
"024543008200","Big Momma's House - Widescreen - DVD", , , 
"031398751823","Kids - DVD", , , 
"037628413929","20 Grandes Exitos - CD","Sony Discos Inc.", , 
"060768972223","Power Of Trinity (Box) - CD","Sanctuary Records", , 


In [4]:
query = "Transformers"
request = {
    "query": query,
    "query_fields": ["name", "manufacturer", "long_description"],
    "return_fields": ["upc", "name", "manufacturer", "score"],
    "filters": [("has_promotion", True)],
    "limit": 5,
    "order_by": [("score", "desc"), ("upc", "asc")],
    "log": True
}

response = products_collection.search(**request)
print(response["docs"])
display_product_search(query, response["docs"])

{
  "query": "Transformers",
  "limit": 5,
  "params": {
    "qf": [
      "name",
      "manufacturer",
      "long_description"
    ],
    "log": true
  },
  "fields": [
    "upc",
    "name",
    "manufacturer",
    "score"
  ],
  "filter": [
    "has_promotion:True"
  ],
  "sort": "score desc, upc asc"
}
{
  "responseHeader": {
    "zkConnected": true,
    "status": 0,
    "QTime": 6
  },
  "response": {
    "numFound": 5,
    "start": 0,
    "maxScore": 3.3835273,
    "numFoundExact": true,
    "docs": [
      {
        "upc": "97360722345",
        "name": "Transformers/Transformers: Revenge of the Fallen: Two-Movie Mega Collection [2 Discs] - Widescreen - DVD",
        "manufacturer": " ",
        "score": 3.3835273
      },
      {
        "upc": "97360724240",
        "name": "Transformers: Revenge of the Fallen - Widescreen - DVD",
        "manufacturer": " ",
        "score": 3.1457326
      },
      {
        "upc": "400192926087",
        "name": "Transformers: Dark of the

## Download query sessions

Download the simulated raw clickstream data.

In [5]:
aips.indexer.download_data_files("signals")
! cd data/retrotech && head signals.csv

"query_id","user","type","target","signal_time"
"u2_0_1","u2","query","nook","2019-07-31 08:49:07.3116"
"u2_1_2","u2","query","rca","2020-05-04 08:28:21.1848"
"u3_0_1","u3","query","macbook","2019-12-22 00:07:07.0152"
"u4_0_1","u4","query","Tv antenna","2019-08-22 23:45:54.1030"
"u5_0_1","u5","query","AC power cord","2019-10-20 08:27:00.1600"
"u6_0_1","u6","query","Watch The Throne","2019-09-18 11:59:53.7470"
"u7_0_1","u7","query","Camcorder","2020-02-25 13:02:29.3089"
"u9_0_1","u9","query","wireless headphones","2020-04-26 04:26:09.7198"
"u10_0_1","u10","query","Xbox","2019-09-13 16:26:12.0132"


In [6]:
!ls data/repositories/retrotech/sessions/

'blue ray_sessions.gz'	 'lcd tv_sessions.gz'
 bluray_sessions.gz	  macbook_sessions.gz
 dryer_sessions.gz	  nook_sessions.gz
 headphones_sessions.gz  'star trek_sessions.gz'
 ipad_sessions.gz	 'star wars_sessions.gz'
 iphone_sessions.gz	 'transformers dark of the moon_sessions.gz'
 kindle_sessions.gz
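Each session file in the listing above appears to be named after the query it covers, with a `_sessions.gz` suffix. As a sketch under that assumption, the snippet below recovers the query string from each filename. The helper `query_from_session_file` is hypothetical, and the contents of the `.gz` files are not shown in this notebook, so nothing here reads them.

```python
from pathlib import Path

SUFFIX = "_sessions.gz"

def query_from_session_file(filename):
    """Hypothetical helper: strip the directory and suffix to get the query."""
    name = Path(filename).name
    if not name.endswith(SUFFIX):
        raise ValueError(f"not a session file: {filename!r}")
    return name[: -len(SUFFIX)]

# Filenames as listed in the directory output above.
SESSION_FILES = [
    "blue ray_sessions.gz", "lcd tv_sessions.gz",
    "bluray_sessions.gz", "macbook_sessions.gz",
    "dryer_sessions.gz", "nook_sessions.gz",
    "headphones_sessions.gz", "star trek_sessions.gz",
    "ipad_sessions.gz", "star wars_sessions.gz",
    "iphone_sessions.gz", "transformers dark of the moon_sessions.gz",
    "kindle_sessions.gz",
]

queries = sorted(query_from_session_file(f) for f in SESSION_FILES)
print(queries)
```

Note that `blue ray` and `bluray` are separate files, so the two spellings are treated as distinct queries.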


Up next: [A/B Testing Simulation to Active Learning](1.ab-testing-to-active-learning.ipynb)