# A/B Testing Simulation to Active Learning

In this notebook, users have a hidden preference for a single query. We use this to explore A/B testing to see whether a given LTR model actually gives the users what they want.

Then we ask, much like in real life, how can we learn what the user _actually_ wants? We employ active learning to try to escape the 'echo chamber' of presentation bias we learned about at the end of chapter 11. After all, users can't click on results that never show up in their search results!

## 🚨 We're putting it all together in this chapter

Because this chapter builds on everything from chapters 10 and 11, much of the setup code below wraps those chapters into single functions, so we can run through each step as a one-liner.

### Getting training data (Ch 11)

Chapter 11 is all about turning raw clickstream data into search training data (aka judgments). This involves overcoming biases in how users perceive search results. Here we put all of that into one function call, `generate_training_data`, which wraps `calculate_sdbn`.

### Train a model (Ch 10)

Chapter 10 is about training an LTR model, including interacting with Solr to extract features, how a ranking model works, how to train a model, and how to perform a good test/train split for search. But here we similarly wrap that up into a handful of function calls, `split_training_data`, and `evaluate_model`.

*Long story short: if you see a reference to chapter 10 or 11, it's probably omitted from chapter 12.* Don't expect it to be covered extensively there.


## Setup - gather some sessions (omitted)

To get started, we first load a set of simulated search sessions for all queries. 

Much of this setup is omitted from the chapter. This first part is just loading and synthesizing a bunch of clickstream sessions, like we used in chapter 11.

In [1]:
import sys

sys.path.append('../..')
import glob
import time

import numpy
import pandas
from aips import *
import random; random.seed(0)

engine = get_engine()
products_collection = engine.get_collection("products")
ltr = get_ltr_engine(products_collection)

In [2]:
signals_upcs_to_omit = [600603132872, 600603125065, 600603141003, 600603139758,
                        600603133237, 600603123061, 600603140631, 600603124570,
                        600603132827, 600603135101]

def all_sessions():
    sessions = pandas.concat([pandas.read_csv(f, compression='gzip')
                          for f in glob.glob('retrotech/sessions/*_sessions.gz')])
    sessions = sessions.sort_values(['query', 'sess_id', 'rank'])
    sessions = sessions.rename(columns={'clicked_doc_id': 'doc_id'})
    return sessions[~sessions["doc_id"].isin(signals_upcs_to_omit)]
    
sessions = all_sessions()
sessions

Unnamed: 0,sess_id,query,rank,doc_id,clicked
1,50002,blue ray,1.0,827396513927,False
2,50002,blue ray,2.0,24543672067,False
3,50002,blue ray,3.0,719192580374,False
4,50002,blue ray,4.0,885170033412,True
5,50002,blue ray,5.0,58231300826,False
...,...,...,...,...,...
74995,5001,transformers dark of the moon,10.0,47875841369,False
74996,5001,transformers dark of the moon,11.0,97363560449,False
74997,5001,transformers dark of the moon,12.0,93624956037,False
74998,5001,transformers dark of the moon,13.0,97363532149,False


In [3]:
sessions["query"].unique()

array(['blue ray', 'bluray', 'dryer', 'headphones', 'ipad', 'iphone',
       'kindle', 'lcd tv', 'macbook', 'nook', 'star trek', 'star wars',
       'transformers dark of the moon'], dtype=object)

## Setup Part 2 - Add some more query sessions (omitted)

Here we duplicate some of the simulated queries from above under new query strings, turning a small fraction (about 4%) of the clicks into non-clicks. This just fills out our data a bit, so we have more to work with.

In [4]:
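random.seed(0)
numpy.random.seed(0)  # the draws below come from numpy.random, which random.seed() does not seed

def copy_query_sessions(sessions, src_query, dest_query, flip=False):
    """Copy the sessions for src_query under dest_query, dropping ~4% of the clicks."""
    # NOTE: `flip` is currently unused; it is kept so existing calls keep working.
    new_sessions = sessions[sessions["query"] == src_query].copy()
    # One uniform draw per row (1-D, so it assigns cleanly to a single column)
    new_sessions["draw"] = numpy.random.rand(len(new_sessions))
    # Turn a click into a non-click when the draw falls below 0.04
    new_sessions.loc[new_sessions["clicked"] & (new_sessions["draw"] < 0.04), "clicked"] = False
    new_sessions["query"] = dest_query
    return pandas.concat([sessions, new_sessions.drop("draw", axis=1)])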


sessions = copy_query_sessions(sessions, "transformers dark of the moon", "transformers dark of moon")
sessions = copy_query_sessions(sessions, "transformers dark of the moon", "dark of moon")
sessions = copy_query_sessions(sessions, "transformers dark of the moon", "dark of the moon")
sessions = copy_query_sessions(sessions, "headphones", "head phones")
sessions = copy_query_sessions(sessions, "lcd tv", "lcd television")
sessions = copy_query_sessions(sessions, "lcd tv", "television, lcd")
sessions = copy_query_sessions(sessions, "macbook", "apple laptop")
sessions = copy_query_sessions(sessions, "iphone", "apple iphone")
sessions = copy_query_sessions(sessions, "kindle", "amazon kindle")
sessions = copy_query_sessions(sessions, "kindle", "amazon ereader")
sessions = copy_query_sessions(sessions, "blue ray", "blueray")

sessions

Unnamed: 0,sess_id,query,rank,doc_id,clicked
1,50002,blue ray,1.0,827396513927,False
2,50002,blue ray,2.0,24543672067,False
3,50002,blue ray,3.0,719192580374,False
4,50002,blue ray,4.0,885170033412,True
5,50002,blue ray,5.0,58231300826,False
...,...,...,...,...,...
149994,55001,blueray,24.0,36725617605,False
149995,55001,blueray,25.0,22265004517,False
149996,55001,blueray,26.0,885170038875,False
149997,55001,blueray,27.0,786936817232,False


In [5]:
sessions["query"].unique()

array(['blue ray', 'bluray', 'dryer', 'headphones', 'ipad', 'iphone',
       'kindle', 'lcd tv', 'macbook', 'nook', 'star trek', 'star wars',
       'transformers dark of the moon', 'transformers dark of moon',
       'dark of moon', 'dark of the moon', 'head phones',
       'lcd television', 'television, lcd', 'apple laptop',
       'apple iphone', 'amazon kindle', 'amazon ereader', 'blueray'],
      dtype=object)

## Setup Part 3 - Our test query, `transformers dvd`, with hidden, 'true' preferences

We add a new query, `transformers dvd`, to our set of queries. We record the users' hidden preferences in the variable `desired_transformers_movies`, what they consider mediocre in `meh_transformers_movies`, and what they consider not at all relevant in `irrelevant_transformers_products`. Each holds the UPCs of the associated products.

This simulates biased sessions in the data: the users never actually see (and hence never click) the items they desire. Only the mediocre and irrelevant products are displayed. A mediocre product is clicked with probability 0.13, an irrelevant one with probability 0.005.

In [6]:
next_sess_id = sessions["sess_id"].max()

# For some reason, the sessions only capture examines on the 'dubbed' transformers movies
# ie the Japanese shows brought to an English-speaking market. But we'll see this is not what the 
# user wants (ie presentation bias). These are 'meh' mildly interesting. There are also many many
# completely irrelevant movies.

# What the user wants, but never visible! Never gets clicked!

# These are the widescreen transformers dvds of the hollywood movies
desired_transformers_movies = ["97360724240", "97360722345", "826663114164"]
# Other transformer movies
meh_transformers_movies = ["97363455349", "97361312743", "97361372389",
                           "97361312804", "97363532149", "97363560449"]
# Bunch of random merchandise
irrelevant_transformers_products = ["708056579739", "93624995012", "47875819733", "47875839090", "708056579746",
                                    "47875332911", "47875842328", "879862003524", "879862003517", "93624974918"] 


displayed_transformer_products = meh_transformers_movies + irrelevant_transformers_products

new_sessions = []
for i in range(0, 5000):
    random.shuffle(displayed_transformer_products)

    # shuffle each session
    for rank, upc in enumerate(displayed_transformer_products):
        draw = random.random()        
        clicked = ((upc in meh_transformers_movies and draw < 0.13) or
                   (upc in irrelevant_transformers_products and draw < 0.005))

        new_sessions.append({"sess_id": next_sess_id + i, 
                             "query": "transformers dvd", 
                             "rank": rank,
                             "clicked": clicked,
                             "doc_id": upc})


sessions = pandas.concat([sessions, pandas.DataFrame(new_sessions)])
sessions

Unnamed: 0,sess_id,query,rank,doc_id,clicked
1,50002,blue ray,1.0,827396513927,False
2,50002,blue ray,2.0,24543672067,False
3,50002,blue ray,3.0,719192580374,False
4,50002,blue ray,4.0,885170033412,True
5,50002,blue ray,5.0,58231300826,False
...,...,...,...,...,...
79995,65000,transformers dvd,11.0,47875842328,False
79996,65000,transformers dvd,12.0,879862003517,False
79997,65000,transformers dvd,13.0,97361372389,False
79998,65000,transformers dvd,14.0,93624995012,False


## Setup Part 4 - Chapter 11 in one function (omitted)

Wrapping up Chapter 11 in a single function `generate_training_data`. 

This function computes a relevance grade out of raw clickstream data. Recall that the SDBN (Simplified Dynamic Bayesian Network) click model we learned about in chapter 11 helps overcome position bias. We also use a beta prior so that a single click doesn't count as much as an observation with hundreds.
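
As a rough illustration of the SDBN idea, here is a minimal standalone sketch: a document only counts as examined if it sits at or above the last click in its session. The `sdbn_counts` helper and the toy sessions are made up for this example and are not part of the notebook's code.

```python
def sdbn_counts(sessions):
    """Count clicks and examines per doc across sessions.

    Each session is a list of (doc_id, clicked) tuples in rank order.
    A doc counts as examined only if it sits at or above the last click.
    """
    clicks, examines = {}, {}
    for session in sessions:
        clicked_positions = [i for i, (_, clicked) in enumerate(session) if clicked]
        if not clicked_positions:
            continue  # no click in this session: nothing counts as examined
        last_click = clicked_positions[-1]
        for doc_id, clicked in session[:last_click + 1]:
            examines[doc_id] = examines.get(doc_id, 0) + 1
            clicks[doc_id] = clicks.get(doc_id, 0) + int(clicked)
    return clicks, examines


# Hypothetical toy sessions: three docs shown in different orders
toy_sessions = [
    [("a", False), ("b", True), ("c", False)],   # "c" is below the last click
    [("b", False), ("a", False), ("c", True)],
    [("c", False), ("a", False), ("b", False)],  # no clicks at all
]
clicks, examines = sdbn_counts(toy_sessions)
grades = {doc: clicks[doc] / examines[doc] for doc in examines}
print(grades)
```

Note how "c" is not penalized for the first session: it was ranked below the last click, so we assume the user never looked at it.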

In [7]:
#%load -s calculate_ctr,calculate_average_rank,caclulate_examine_probability,calculate_clicked_examined,calculate_grade,calculate_prior,calculate_sdbn ../ltr/sdbn_functions.py
def calculate_ctr(sessions):
    click_counts = sessions.groupby("doc_id")["clicked"].sum()
    sess_counts = sessions.groupby("doc_id")["sess_id"].nunique()
    ctrs = click_counts / sess_counts
    return ctrs.sort_values(ascending=False)

def calculate_average_rank(sessions):
    avg_rank = sessions.groupby("doc_id")["rank"].mean()
    return avg_rank.sort_values(ascending=True)

def caclulate_examine_probability(sessions):
    last_click_per_session = sessions.groupby(["clicked", "sess_id"])["rank"].max()[True]
    sessions["last_click_rank"] = last_click_per_session
    sessions["examined"] = sessions["rank"] <= sessions["last_click_rank"]
    return sessions

def calculate_clicked_examined(sessions):
    sessions = caclulate_examine_probability(sessions)
    return sessions[sessions["examined"]] \
        .groupby("doc_id")[["clicked", "examined"]].sum()

def calculate_grade(sessions):
    sessions = calculate_clicked_examined(sessions)
    sessions["grade"] = sessions["clicked"] / sessions["examined"]
    return sessions.sort_values("grade", ascending=False)

def calculate_prior(sessions, prior_grade, prior_weight):
    sessions = calculate_grade(sessions)
    sessions["prior_a"] = prior_grade * prior_weight
    sessions["prior_b"] = (1 - prior_grade) * prior_weight
    return sessions

def calculate_sdbn(sessions, prior_grade, prior_weight):
    sessions = calculate_prior(sessions, prior_grade, prior_weight)
    sessions["posterior_a"] = (sessions["prior_a"] + 
                               sessions["clicked"])
    sessions["posterior_b"] = (sessions["prior_b"] + 
                               sessions["examined"] - sessions["clicked"])
    sessions["beta_grade"] = (sessions["posterior_a"] /
      (sessions["posterior_a"] + sessions["posterior_b"]))
    return sessions.sort_values("beta_grade", ascending=False)

def generate_training_data(sessions, prior_grade=0.2, prior_weight=10):
    all_sdbn = pandas.DataFrame()
    for query in sessions["query"].unique():        
        query_sessions = sessions[sessions["query"] == query].copy().set_index("sess_id")
        query_sessions = calculate_sdbn(query_sessions, prior_grade, prior_weight)
        query_sessions["query"] = query
        all_sdbn = pandas.concat([all_sdbn, query_sessions])
    return all_sdbn[["query", "clicked", "examined", "grade", "beta_grade"]].reset_index().set_index(["query", "doc_id"])

## Listing 12.1 Generating the sdbn training data

We kick off with the data we left off with in chapter 11.

In this listing we use our "chapter 11 in one function", `generate_training_data`, to rebuild the training data.

In [8]:
training_data = generate_training_data(sessions,
                                       prior_grade=0.2,
                                       prior_weight=10)

training_data

query,doc_id,clicked,examined,grade,beta_grade
blue ray,27242815414,42,42,1.000000,0.846154
blue ray,827396513927,1304,3359,0.388211,0.387652
blue ray,883929140855,140,506,0.276680,0.275194
blue ray,885170033412,568,2147,0.264555,0.264256
blue ray,24543672067,665,2763,0.240680,0.240534
...,...,...,...,...,...
transformers dvd,47875819733,24,1679,0.014294,0.015394
transformers dvd,708056579739,23,1659,0.013864,0.014979
transformers dvd,879862003524,23,1685,0.013650,0.014749
transformers dvd,93624974918,19,1653,0.011494,0.012628
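
The `beta_grade` column can be checked by hand. The sketch below restates the arithmetic of `calculate_prior` and `calculate_sdbn` for a single document, using the default `prior_grade=0.2` and `prior_weight=10`. The standalone `beta_grade` function is a hypothetical helper written only for this illustration.

```python
def beta_grade(clicked, examined, prior_grade=0.2, prior_weight=10):
    """Posterior mean of a beta distribution updated with click/examine counts."""
    prior_a = prior_grade * prior_weight          # pseudo-clicks
    prior_b = (1 - prior_grade) * prior_weight    # pseudo-skips
    posterior_a = prior_a + clicked
    posterior_b = prior_b + (examined - clicked)
    return posterior_a / (posterior_a + posterior_b)


# Counts taken from the first two `blue ray` rows of the training data
few_observations = beta_grade(clicked=42, examined=42)
many_observations = beta_grade(clicked=1304, examined=3359)
print(round(few_observations, 6), round(many_observations, 6))
```

With only 42 examines, the prior pulls a raw grade of 1.0 down to about 0.85. With 3359 examines, the prior barely moves the grade at all.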


## Chapter 10 Functions (omitted from book)

Now with the chapter 11 setup out of the way, we'll need to give Chapter 10's code a similar treatment, wrapping that LTR system into a black box.

All of the following are support functions for the chapter:

1. Convert the sdbn dataframe into individual `Judgment` objects needed for training the model from chapter 10
2. Pairwise transformation of the data
3. Normalization of the data
4. Training the model
5. Uploading the model to Solr

All of these steps are covered in Chapter 10.
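
To make the pairwise transformation step concrete, here is a small sketch on made-up data: for every pair of documents in a query with different grades, we emit the difference of their feature vectors and a `+1` / `-1` label saying which one should rank higher. The `pairwise_deltas` helper and the toy grades and features are assumptions for illustration only.

```python
def pairwise_deltas(judgments):
    """judgments: list of (grade, features) tuples for ONE query."""
    feature_deltas, labels = [], []
    for grade1, features1 in judgments:
        for grade2, features2 in judgments:
            if grade1 == grade2:
                continue  # ties (including a doc paired with itself) carry no preference
            delta = [f1 - f2 for f1, f2 in zip(features1, features2)]
            feature_deltas.append(delta)
            labels.append(1 if grade1 > grade2 else -1)
    return feature_deltas, labels


# Hypothetical query with two docs: (grade, [feature_1, feature_2])
toy_judgments = [(0.35, [1.0, 0.0]), (0.02, [0.0, 1.0])]
deltas, labels = pairwise_deltas(toy_judgments)
print(deltas, labels)
```

A linear classifier trained on these deltas learns weights that separate "better minus worse" from "worse minus better", which is what turns ranking into a classification problem.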

In [9]:

import copy
import json
from itertools import groupby

import numpy
from sklearn import svm

from ltr.judgments import Judgment, judgments_to_nparray, judgments_writer
from ltr.log import FeatureLogger

def as_judgments(training_data):
    """Turn pandas dataframe into ltr judgments objects."""        
    qid_map = {}
    judgments = []
    next_qid = 0
    for datum in training_data.reset_index().to_dict(orient="records"):       
        if datum["query"] not in qid_map:
            qid_map[datum["query"]] = next_qid
            next_qid += 1
        qid = qid_map[datum["query"]]

        judgments.append(Judgment(doc_id=datum["doc_id"],
                        keywords=datum["query"],
                        qid=qid,
                        grade=datum["beta_grade"]))
        
    return judgments

def normalize_features(logged_judgments):
    num_features = len(logged_judgments[0].features)
    means = [numpy.mean([j.features[i] for j in logged_judgments])
             for i in range(0, num_features)]    
    
    std_devs = [numpy.std([j.features[i] for j in logged_judgments])
                for i in range(0, num_features)]
    
    normed_judgments = copy.deepcopy(logged_judgments)
    for j in normed_judgments:
        for i, score in enumerate(j.features):
            j.features[i] = (score - means[i]) / std_devs[i]

    return means, std_devs, normed_judgments

def pairwise_transform(normed_judgments):        
    predictor_deltas = []
    feature_deltas = []
    for qid, grouped_judgments in groupby(normed_judgments, key=lambda j: j.qid):
        query_judgments = list(grouped_judgments)
        for judgment1 in query_judgments:
            for judgment2 in query_judgments:
                j1_features = numpy.array(judgment1.features)
                j2_features = numpy.array(judgment2.features)
                
                if judgment1.grade > judgment2.grade:
                    predictor_deltas.append(1)
                    feature_deltas.append(j1_features - j2_features)
                elif judgment1.grade < judgment2.grade:
                    predictor_deltas.append(-1)
                    feature_deltas.append(j1_features - j2_features)

    return numpy.array(feature_deltas), numpy.array(predictor_deltas)

def write_judgments(judgments, dest="retrotech_judgments.txt"):
    with judgments_writer(open(dest, "wt")) as writer:
        for judgment in judgments:
            writer.write(judgment)

as_judgments(training_data)


[Judgment(grade=0.8461538461538461,qid=0,keywords=blue ray,doc_id=27242815414,features=[],weight=1),
 Judgment(grade=0.38765212229148116,qid=0,keywords=blue ray,doc_id=827396513927,features=[],weight=1),
 Judgment(grade=0.2751937984496124,qid=0,keywords=blue ray,doc_id=883929140855,features=[],weight=1),
 Judgment(grade=0.26425591098748263,qid=0,keywords=blue ray,doc_id=885170033412,features=[],weight=1),
 Judgment(grade=0.24053371799495132,qid=0,keywords=blue ray,doc_id=24543672067,features=[],weight=1),
 Judgment(grade=0.2396825396825397,qid=0,keywords=blue ray,doc_id=813774010904,features=[],weight=1),
 Judgment(grade=0.23370786516853934,qid=0,keywords=blue ray,doc_id=786936817232,features=[],weight=1),
 Judgment(grade=0.2291296625222025,qid=0,keywords=blue ray,doc_id=36725617605,features=[],weight=1),
 Judgment(grade=0.21705426356589147,qid=0,keywords=blue ray,doc_id=36725608443,features=[],weight=1),
 Judgment(grade=0.21266968325791855,qid=0,keywords=blue ray,doc_id=22265052211,fe

## Also Chapter 10 - Perform a test / train split on the SDBN data (omitted)

This function is broken out from the model training. It lets us train a model on one set of data (reusing the chapter 10 training code), reserving test queries for evaluation.
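
The key point is that the split happens per query, not per row, so every judgment for a test query stays out of training. A minimal sketch of that idea, using a hypothetical `split_queries` helper and a made-up query list:

```python
import random
from math import floor


def split_queries(queries, train_proportion=0.8, seed=0):
    """Shuffle the queries and cut the list at the train proportion."""
    rng = random.Random(seed)  # local generator, leaves the global seed alone
    shuffled = list(queries)
    rng.shuffle(shuffled)
    split_point = floor(len(shuffled) * train_proportion)
    return shuffled[:split_point], shuffled[split_point:]


toy_queries = ["dryer", "ipad", "kindle", "nook", "star wars"]
train_queries, test_queries = split_queries(toy_queries)
print(train_queries, test_queries)
```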

In [10]:
from math import floor

def split_training_data(training_data, train_proportion=0.8):
    """Split queries in training_data into train / test sets, with `train_proportion` of the queries going to the training set."""
    queries = training_data.index.get_level_values('query').unique().copy().tolist()
    random.shuffle(queries)
    num_queries = len(queries)
    split_point = floor(num_queries * train_proportion)
    
    train_queries = queries[:split_point]
    test_queries = queries[split_point:]
    return training_data.loc[train_queries, :], training_data.loc[test_queries]

## Chapter 10 - Evaluate the model on the test set (omitted)

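The first cell below trains and uploads the model. The second, `evaluate_model`, computes the model's performance on a set of test queries. The `test_data` is the control set not used to train the model. For each query we compute a graded precision: the sum of `beta_grade` over the top `limit` results, divided by `limit`.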
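
The metric at the end of `evaluate_model` is simple enough to sketch on its own: add up the grades of the returned documents and divide by `limit`, counting ungraded documents as 0. The `graded_precision` helper, the document ids, and the grades below are invented for illustration.

```python
def graded_precision(result_ids, grades, limit=10):
    """Sum of known grades in the top `limit` results, divided by `limit`."""
    top_results = result_ids[:limit]
    return sum(grades.get(doc_id, 0.0) for doc_id in top_results) / limit


# Hypothetical grades for one query; "x" was never judged, so it scores 0
toy_grades = {"a": 0.35, "b": 0.02}
precision = graded_precision(["a", "x", "b"], toy_grades, limit=10)
print(precision)
```

Because the divisor is always `limit`, a query that returns few results, or mostly unjudged ones, scores low even if the results it does return are good.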

In [11]:
def train_svm_model(model_name, features, logged_judgments):
    means, std_devs, normed_judgments = normalize_features(logged_judgments)
    feature_deltas, predictor_deltas = pairwise_transform(normed_judgments)

    model = svm.LinearSVC(max_iter=10000, verbose=1)
    model.fit(feature_deltas, predictor_deltas) 

    feature_names = [ftr["name"] for ftr in features]
    linear_model = ltr.generate_model(model_name, feature_names,
                                      means, std_devs, model.coef_[0])

    return linear_model

def train_and_upload_model(training_data, model_name, features, log=False):
    """Train a RankSVM model via Solr, store in Solr."""
    judgments = as_judgments(training_data)
    ltr.delete_feature_store(model_name, log=log)
    ltr.delete_model(model_name)
    ltr.upload_features(features, model_name, log=log)
    ftr_logger = FeatureLogger(engine, products_collection, feature_set=model_name,
                               id_field="upc")
            
    for qid, query_judgments in groupby(judgments, key=lambda j: j.qid):
        ftr_logger.log_for_qid(judgments=query_judgments, 
                               qid=qid, log=False)

    linear_model = train_svm_model(model_name, features, ftr_logger.logged)
    ltr.upload_model(linear_model, log=log)
    return linear_model

In [12]:
def evaluate_model(test_data, model_name, training_data, limit=10, log=False):
    queries = test_data.index.get_level_values("query").unique()
    query_results = {}
    
    for query in queries:
        response = ltr.search_with_model(model_name, query=query,
                                         limit=limit, rerank=60000, log=log)
    
        results = pandas.DataFrame(response["docs"]).reset_index()
        judgments = training_data.loc[query, :].copy().reset_index()
        judgments["doc_id"] = judgments["doc_id"].astype(str)
        if len(results) == 0:
            print(f"No Results for {query}")
            query_results[query] = 0
        else:
            graded_results = results.merge(judgments, left_on="upc",
                                           right_on="doc_id", how="left")
            graded_results[["clicked", "examined", "grade", "beta_grade"]] = graded_results[["clicked", "examined", "grade", "beta_grade"]].fillna(0)
            graded_results = graded_results.drop("doc_id", axis=1)
            if log:
                print(graded_results.drop(["index", "rank", "manufacturer", "short_description",
                                           "long_description", "examined", "grade", "clicked"], axis=1))

            query_results[query] = (graded_results["beta_grade"].sum() / limit)
    return query_results

## Listing 12.2 - model training

We wrap all the important decisions from chapter 10 in a few lines.

In [13]:
def train_and_evaluate_model(sessions, model_name, features, log=False):
    training_data = generate_training_data(sessions)
    train, test = split_training_data(training_data, 0.8)
    train_and_upload_model(train, model_name, features=features, log=log)
    evaluation = evaluate_model(test, model_name, training_data, log=log)
    return evaluation

In [14]:
random.seed(1234)
feature_set = [
    ltr.generate_query_feature(feature_name="long_description_bm25",
                               field_name="long_description"),
    ltr.generate_query_feature(feature_name="short_description_constant",
                               field_name="short_description",
                               constant_score=True)]

evaluation = train_and_evaluate_model(sessions, "ltr_model_variant_1", feature_set)
print(json.dumps(feature_set, indent=2))
print(json.dumps(evaluation, indent=2))

[LibLinear][
  {
    "name": "long_description_bm25",
    "class": "org.apache.solr.ltr.feature.SolrFeature",
    "params": {
      "q": "long_description:(${keywords})"
    },
    "store": "ltr_model_variant_1"
  },
  {
    "name": "short_description_constant",
    "class": "org.apache.solr.ltr.feature.SolrFeature",
    "params": {
      "q": "short_description:(${keywords})^=1"
    },
    "store": "ltr_model_variant_1"
  }
]
{
  "dryer": 0.03753076750950996,
  "blue ray": 0.0,
  "headphones": 0.019262295081967213,
  "dark of moon": 0.0,
  "transformers dvd": 0.0
}


## Listing 12.3

Train a model that hypothetically performs better offline called `ltr_model_variant_2`

In [15]:
random.seed(1234)

feature_set = [
    ltr.generate_fuzzy_query_feature(feature_name="name_fuzzy", 
                                     field_name="name"),
    ltr.generate_bigram_query_feature(feature_name="name_bigram",
                                      field_name="name"),
    ltr.generate_bigram_query_feature(feature_name="short_description_bigram",
                                      field_name="short_description")
]

evaluation = train_and_evaluate_model(sessions, "ltr_model_variant_2", feature_set)
print(json.dumps(evaluation, indent=2))

[LibLinear]{
  "dryer": 0.07068309073137659,
  "blue ray": 0.0,
  "headphones": 0.06540945492120899,
  "dark of moon": 0.25630929090406646,
  "transformers dvd": 0.10077083021678328
}


## Simulate a user querying, clicking, purchasing (omitted)

This function simulates a user performing a query and possibly taking an action as they scan down the results.

In [16]:
def simulate_live_user_session(query, model_name,
                               desired_probability=0.15,
                               indifferent_probability=0.03,
                               uninterested_probability=0.01,
                               quit_per_result_probability=0.2):
    """Simulates a user 'query' where purchase probability depends on whether
       the product's UPC is in one of three sets.
       
       Users purchase a single product per session.    
       
       Users quit with `quit_per_result_probability` after scanning each rank
       
       """   
    desired_products = ["97360724240", "97360722345", "826663114164"]
    indifferent_products = ["97363455349", "97361312743", "97361372389",
                            "97361312804", "97363532149", "97363560449"]
    
    response = ltr.search_with_model(model_name, query=query, rerank=60000, limit=10)

    results = pandas.DataFrame(response["docs"]).reset_index()
    for doc in results.to_dict(orient="records"): 
        draw = random.random()
        
        if doc["upc"] in desired_products:
            if draw < desired_probability:
                return True
        elif doc["upc"] in indifferent_products:
            if draw < indifferent_probability:
                return True
        elif draw < uninterested_probability:
            return True
        if random.random() < quit_per_result_probability:
            return False
        
    return False
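
Since the simulated user follows a simple scan-down-the-list process, the expected conversion rate of a ranking can also be worked out directly. The sketch below does that under the same default probabilities. The `purchase_probability` helper and the category labels are hypothetical and only mirror the logic of the function above.

```python
PURCHASE_PROBABILITY = {"desired": 0.15, "indifferent": 0.03, "uninterested": 0.01}


def purchase_probability(ranking, quit_per_result_probability=0.2):
    """Chance that a user scanning `ranking` top-down ends up purchasing."""
    still_scanning = 1.0  # probability the user reaches the current rank
    total = 0.0
    for category in ranking:
        buy = PURCHASE_PROBABILITY[category]
        total += still_scanning * buy
        # To reach the next rank the user must neither buy nor quit
        still_scanning *= (1 - buy) * (1 - quit_per_result_probability)
    return total


hidden_ranking = ["indifferent"] * 6 + ["uninterested"] * 4  # desired items never shown
better_ranking = ["desired"] * 3 + ["indifferent"] * 6 + ["uninterested"]
print(purchase_probability(hidden_ranking), purchase_probability(better_ranking))
```

The gap between the two rankings is the conversion lift a model could only earn by surfacing items that the click data never saw.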

## Listing 12.4 - Simulated A/B test on just `transformers dvd` query

Here we simulate 1000 users, each randomly served one of two rankings for `transformers dvd`. Based on the hidden preferences in `simulate_live_user_session` (`desired_products` and `indifferent_products`), we see which model converts better.

In [17]:
def a_b_test(query, model_a, model_b):
    """Randomly assign this user to a or b"""
    draw = random.random()
    model_name = model_a if draw < 0.5 else model_b
    
    purchase_made = simulate_live_user_session(query, model_name)
    return (model_name, purchase_made)

def simulate_user_a_b_test(query, model_a, model_b, number_of_users=1000):
    purchases = {model_a: 0, model_b: 0}
    for _ in range(number_of_users): 
        model_name, purchase_made = a_b_test(query, model_a, model_b)
        if purchase_made:
            purchases[model_name] += 1
    return purchases

In [18]:
random.seed(1234)

simulate_user_a_b_test("transformers dvd",
                       "ltr_model_variant_1",
                       "ltr_model_variant_2")

{'ltr_model_variant_1': 21, 'ltr_model_variant_2': 18}
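
A difference of 21 versus 18 purchases is small, so it is worth asking whether it could be noise. The sketch below runs a two-proportion z-test. Note that `simulate_user_a_b_test` does not record how many users each model received, so the 500 / 500 split used here is an assumption, and `two_proportion_z_test` is a hypothetical helper.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(conversions_a, users_a, conversions_b, users_b):
    """Two-sided z-test for a difference between two conversion rates."""
    rate_a = conversions_a / users_a
    rate_b = conversions_b / users_b
    pooled = (conversions_a + conversions_b) / (users_a + users_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / users_a + 1 / users_b))
    if std_err == 0:
        return 0.0, 1.0  # no conversions (or all conversions): nothing to compare
    z = (rate_a - rate_b) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# ASSUMPTION: roughly 500 users per variant (the random assignment is 50/50)
z, p_value = two_proportion_z_test(21, 500, 18, 500)
print(round(z, 2), round(p_value, 2))
```

Under that assumption the p-value is far above the usual 0.05 threshold, so this test alone gives no evidence that either variant converts better.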

## New helper: show the features for each SDBN entry (omitted)

This function shows us the logged features of each training row for the given sdbn data for debugging.

So not just

| query   | doc      | grade
|---------|----------|---------
|transformers dvd | 1234 | 1.0

But also a recording of the matches that occurred

| query           | doc      | grade    | short_desc_match  | long_desc_match |...
|-----------------|----------|----------|-------------------|-----------------|---
|transformers dvd | 1234     | 1.0      | 0.0               | 1.0             |...

In [19]:
def generate_logged_judgments(training_data, features, model_name):
    """Log features alongside training_data into a dataframe"""
    judgments = as_judgments(training_data)
    ltr.delete_feature_store(model_name)
    ltr.upload_features(features, model_name)

    ftr_logger = FeatureLogger(engine, index=products_collection,
                               feature_set=model_name, id_field="upc")

    for qid, query_judgments in groupby(judgments, key=lambda j: j.qid):
        ftr_logger.log_for_qid(judgments=query_judgments,
                               qid=qid, log=False)
        
    logged_judgments = ftr_logger.logged
    feature_data, predictors, doc_ids = judgments_to_nparray(logged_judgments)
    logged_judgments_dataframe = pandas.concat([pandas.DataFrame(predictors),
                                                pandas.DataFrame(feature_data),
                                                pandas.DataFrame(doc_ids)], 
                                                axis=1,
                                                ignore_index=True)
    
    qid_map = {j.qid: j.keywords for j in logged_judgments}
    qid_map = pandas.DataFrame(qid_map.values()).reset_index() \
                         .rename(columns={"index": "qid", 0: "query"})
    
    feature_names = [f["name"] for f in features]
    columns = {i: name for i, name in enumerate(["grade", "qid"] + feature_names + ["doc_id"])}

    logged_judgments_dataframe = logged_judgments_dataframe.rename(columns=columns)
    logged_judgments_dataframe = logged_judgments_dataframe.merge(qid_map, how="left", on="qid")
    ordered_columns = ["doc_id", "query", "grade"] + feature_names
    #logged_judgments_dataframe['grade'] = logged_judgments_dataframe['grade'] / 10.0 
    
    return logged_judgments_dataframe[ordered_columns].set_index("doc_id").sort_values("grade", ascending=False)

## Listing 12.5 - Output matches for one feature set

Another way of tackling presentation bias is to look at the kinds of documents not being shown to users, so we can strategically show those to users. Below we show the value of each feature in `explore_features` for each `transformers dvd` document in the SDBN judgments.

In [20]:
def get_latest_explore_features():
    return [
        ltr.generate_query_feature(feature_name="long_description_match",
                                   field_name="long_description",
                                   constant_score=True),
        ltr.generate_query_feature(feature_name="short_description_match",
                                   field_name="short_description",
                                   constant_score=True),
        ltr.generate_query_feature(feature_name="name_match",
                                   field_name="name",
                                   constant_score=True),
        ltr.generate_query_feature(feature_name="has_promotion",
                                   field_name="has_promotion",
                                   value="true",
                                   constant_score=True)]

def get_logged_transformers_judgments(sessions, features):
    training_data = generate_training_data(sessions)
    logged_judgments = generate_logged_judgments(training_data, features, "explore")
    logged_judgments = logged_judgments[logged_judgments["query"] == "transformers dvd"]
    return logged_judgments

In [21]:
explore_features = get_latest_explore_features()
logged_transformers_judgments = get_logged_transformers_judgments(sessions, explore_features)
logged_transformers_judgments

doc_id,query,grade,long_description_match,short_description_match,name_match,has_promotion
97363560449,transformers dvd,0.347137,0.0,0.0,1.0,0.0
97361312804,transformers dvd,0.344041,0.0,0.0,1.0,0.0
97361312743,transformers dvd,0.34216,0.0,0.0,1.0,0.0
97363455349,transformers dvd,0.342065,0.0,0.0,1.0,0.0
97361372389,transformers dvd,0.323484,0.0,0.0,1.0,0.0
97363532149,transformers dvd,0.322664,0.0,0.0,1.0,0.0
879862003517,transformers dvd,0.022834,0.0,1.0,1.0,0.0
93624995012,transformers dvd,0.020202,0.0,0.0,1.0,0.0
47875842328,transformers dvd,0.01853,1.0,0.0,1.0,1.0
708056579746,transformers dvd,0.016726,1.0,0.0,1.0,0.0


## Listing 12.6 - Train Gaussian Process Regressor

We train the regressor on just the `transformers dvd` judgments in `logged_transformers_judgments`.

NOTE: we could also train on the full SDBN training data and see globally what's missing. However, it's often convenient to zero in on specific queries to round out their training data.

In [22]:
from sklearn.gaussian_process import GaussianProcessRegressor

def train_gpr(logged_judgments, feature_names):
    feature_data = logged_judgments[feature_names]
    grades = logged_judgments["grade"]
    gpr = GaussianProcessRegressor()
    gpr.fit(feature_data, grades)
    return gpr

In [23]:
feature_names = [f["name"] for f in explore_features]
train_gpr(logged_transformers_judgments, feature_names)

## Listing 12.7: Predict on every value

Here `gpr` predicts on every possible feature value. This lets us analyze which set of feature values to use when exploring with users.
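
To build intuition for why the predicted standard deviation matters, here is a deliberately simplified sketch: a noise-free Gaussian process with a unit RBF kernel and a single observed feature vector. This is not what `GaussianProcessRegressor` computes with many training rows. The helper names and feature vectors are assumptions for illustration.

```python
import math


def rbf_kernel(x, y, length_scale=1.0):
    """Similarity that decays with squared Euclidean distance."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2 * length_scale ** 2))


def posterior_std_single_observation(observed, candidate):
    """Posterior std at `candidate` after seeing exactly one point, `observed`.

    With k(x, x) = 1: variance = k(c, c) - k(c, o)^2 / k(o, o) = 1 - k(c, o)^2
    """
    similarity = rbf_kernel(observed, candidate)
    return math.sqrt(max(1.0 - similarity ** 2, 0.0))


observed = [0, 0, 1, 0]  # a feature combination users have been shown
std_same = posterior_std_single_observation(observed, [0, 0, 1, 0])
std_near = posterior_std_single_observation(observed, [0, 0, 0, 0])  # one feature flipped
std_far = posterior_std_single_observation(observed, [1, 1, 0, 1])   # all four flipped
print(std_same, std_near, std_far)
```

Uncertainty is zero where we already have data and grows as a candidate's feature values move away from anything users have seen. Those high-uncertainty combinations are the blind spots worth exploring.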

In [24]:
def calculate_prediction_std_dev(logged_judgments, feature_names):
    # Every combination of 0/1 values, one level per feature
    index = pandas.MultiIndex.from_product([[0, 1]] * len(feature_names), names=feature_names)
    with_prediction = pandas.DataFrame(index=index).reset_index()

    gpr = train_gpr(logged_judgments, feature_names)
    predictions_with_std = gpr.predict(
        with_prediction[feature_names],
        return_std=True)
    with_prediction["predicted_grade"] = predictions_with_std[0]
    with_prediction["predicted_stddev"] = predictions_with_std[1]
   
    return  with_prediction.sort_values("predicted_stddev", ascending=True)

In [25]:
calculate_prediction_std_dev(logged_transformers_judgments, feature_names)

Unnamed: 0,long_description_match,short_description_match,name_match,has_promotion,predicted_grade,predicted_stddev
2,0,0,1,0,0.256798,4e-06
10,1,0,1,0,0.014674,5e-06
14,1,1,1,0,0.014864,7e-06
6,0,1,1,0,0.022834,1e-05
11,1,0,1,1,0.01853,1e-05
3,0,0,1,1,0.161596,0.632121
15,1,1,1,1,0.014856,0.632121
7,0,1,1,1,0.017392,0.739305
0,0,0,0,0,0.155756,0.79506
8,1,0,0,0,0.0089,0.79506
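
The pattern in this table (near-zero standard deviation for feature combinations that appear in the logged judgments, large standard deviation for the rest) can be reproduced on a toy example. The sketch below is only an illustration: the `seen_features` rows and `seen_grades` values are made up, not the notebook's data. It shows how `GaussianProcessRegressor.predict(..., return_std=True)` reports uncertainty for seen versus unseen inputs.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

# Hypothetical logged judgments: three feature combinations, all with name_match=1.
# Columns: long_description_match, short_description_match, name_match, has_promotion
seen_features = np.array([[0, 0, 1, 0],
                          [1, 0, 1, 0],
                          [0, 1, 1, 0]])
seen_grades = np.array([0.34, 0.02, 0.02])  # made-up grades

gpr = GaussianProcessRegressor()
gpr.fit(seen_features, seen_grades)

candidates = np.array([[0, 0, 1, 0],   # already observed
                       [0, 0, 0, 1]])  # never observed: promotion without a name match
predicted_grade, predicted_stddev = gpr.predict(candidates, return_std=True)

for row, grade, stddev in zip(candidates, predicted_grade, predicted_stddev):
    print(row, round(float(grade), 3), round(float(stddev), 3))
```

The observed combination should come back with a standard deviation close to zero, while the unobserved one should be much less certain. That gap is what the exploration step exploits.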


## Listing 12.8 - Calculate Expected Improvement


We use [Expected Improvement](https://distill.pub/2020/bayesian-optimization/) scoring to select candidates for exploration within the `transformers dvd` query.

In [26]:
from scipy.stats import norm

def calculate_expected_improvement(logged_judgments, feature_names, theta=0.6):
    data = calculate_prediction_std_dev(logged_judgments, feature_names)
    data["opportunity"] = (data["predicted_grade"] - logged_judgments["grade"].mean() - theta)

    data["prob_of_improvement"] = (
        norm.cdf(data["opportunity"] /
                 data["predicted_stddev"]))

    data["expected_improvement"] = (
        data["opportunity"] * data["prob_of_improvement"] + 
        data["predicted_stddev"] *
        norm.pdf(data["opportunity"] /
                 data["predicted_stddev"]))
    
    return data.sort_values("expected_improvement", ascending=False)

In [27]:
calculate_expected_improvement(logged_transformers_judgments, feature_names)

Unnamed: 0,long_description_match,short_description_match,name_match,has_promotion,predicted_grade,predicted_stddev,opportunity,prob_of_improvement,expected_improvement
1,0,0,0,1,0.098013,0.882676,-0.638497,0.234728,0.121201
5,0,1,0,1,0.010549,0.912794,-0.725962,0.213214,0.110633
0,0,0,0,0,0.155756,0.79506,-0.580755,0.232556,0.107853
13,1,1,0,1,0.009011,0.882676,-0.7275,0.204914,0.101653
4,0,1,0,0,0.013849,0.79506,-0.722661,0.181691,0.078549
9,1,0,0,1,0.011239,0.79506,-0.725272,0.180826,0.078076
12,1,1,0,0,0.009016,0.79506,-0.727495,0.180091,0.077675
8,1,0,0,0,0.0089,0.79506,-0.72761,0.180053,0.077654
7,0,1,1,1,0.017392,0.739305,-0.719118,0.165353,0.064866
3,0,0,1,1,0.161596,0.632121,-0.574914,0.181543,0.062387
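
Every `opportunity` in this table is negative, yet the top rows still have a positive expected improvement. To see why, it can help to recompute one row by hand. The sketch below plugs the first row's `opportunity` and `predicted_stddev` into the same formula used in `calculate_expected_improvement`; the two input values are copied from the table, nothing else is assumed.

```python
from scipy.stats import norm

# Values copied from the first row of the table above.
predicted_stddev = 0.882676
opportunity = -0.638497  # predicted_grade - mean logged grade - theta

z = opportunity / predicted_stddev
prob_of_improvement = norm.cdf(z)

# First term: (negative) opportunity weighted by the chance of improving.
# Second term: reward for uncertainty, which grows with the std dev.
expected_improvement = (opportunity * prob_of_improvement
                        + predicted_stddev * norm.pdf(z))

print(round(prob_of_improvement, 4), round(expected_improvement, 4))
```

The result should land close to the table's `prob_of_improvement` and `expected_improvement` for that row. The uncertainty term outweighs the negative opportunity term, so the least-understood feature combinations rise to the top.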


## Create a query to fetch `explore` docs (omitted)

Based on the selected features from the GaussianProcessRegressor, we create a query to fetch a doc that contains those features.

In [28]:
def search_for_explore_candidate(explore_vector, query=""):
    feature_config = {
        "long_description_match": {"field": "long_description", "value": query},
        "short_description_match": {"field": "short_description", "value": query},
        "name_match": {"field": "name", "value": query},
        "long_description_bm25": {"field": "long_description", "value": query},
        "manufacturer_match": {"field": "manufacturer", "value": query},
        "has_promotion": {"field": "has_promotion", "value": "true"}
    }
    explore_candidates = ltr.get_explore_candidate(query, explore_vector, feature_config)
    if explore_candidates:
        return explore_candidates[0]
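
`ltr.get_explore_candidate` is not shown in this notebook. As a rough idea of what such a lookup might involve, the sketch below turns an explore vector into required and prohibited clauses in Lucene query syntax. `build_explore_clauses` is a hypothetical helper, not part of the `aips` library, and the real implementation may well differ.

```python
def build_explore_clauses(explore_vector, feature_config):
    """Hypothetical helper: map a 0/1 explore vector to Lucene-style clauses.

    A value of 1 requires a match on the feature's field (+),
    a value of 0 prohibits a match (-).
    """
    clauses = []
    for feature_name, wanted in explore_vector.items():
        config = feature_config[feature_name]
        occur = "+" if wanted == 1 else "-"
        clauses.append(f'{occur}{config["field"]}:({config["value"]})')
    return " ".join(clauses)

query = "transformers dvd"
feature_config = {
    "long_description_match": {"field": "long_description", "value": query},
    "short_description_match": {"field": "short_description", "value": query},
    "name_match": {"field": "name", "value": query},
    "has_promotion": {"field": "has_promotion", "value": "true"},
}
# A plain dict stands in for the pandas row used in the notebook;
# both support .items().
explore_vector = {"long_description_match": 0, "short_description_match": 0,
                  "name_match": 0, "has_promotion": 1}

explore_clauses = build_explore_clauses(explore_vector, feature_config)
print(explore_clauses)
```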

## Listing 12.9 - Find document to explore from Solr

Here we fetch a document matching a feature combination that is missing from our training set, so we can display it to the user.

In [29]:
def explore(query, logged_judgments, features):
    """Pick the feature combination with the highest expected improvement
       and select a doc from that group."""
    feature_names = [f["name"] for f in features]
    prediction_data = calculate_expected_improvement(logged_judgments, feature_names)
    # Rows are sorted by expected improvement, so the first row is the best candidate
    explore_vector = prediction_data.iloc[0][feature_names]
    return search_for_explore_candidate(explore_vector, query)

In [30]:
random.seed(0)

explore_features = get_latest_explore_features()
logged_judgments = get_logged_transformers_judgments(sessions, explore_features)
explore("transformers dvd", logged_judgments, explore_features)

{'upc': '826663114164',
 'name': 'Transformers: The Complete Series [25th Anniversary Matrix of Leadership Edition] [16 Discs] - DVD',
 'manufacturer': ' ',
 'short_description': ' ',
 'long_description': ' ',
 'has_promotion': True}

## New heavily clicked doc is promoted!

```
{"upc": "826663114164",
 "name": "Transformers: The Complete Series [25th Anniversary Matrix of Leadership Edition] [16 Discs] - DVD",
 "manufacturer": " ",
 "short_description": " ",
 "long_description": " ",
 "has_promotion": True}
```

## Simulate new sessions with the new data

We simulate new sessions. If the explored doc's UPC is in `wants_to_purchase` or `might_purchase`, we mark it as clicked with a probability of 0.8 or 0.5 respectively. Any other doc is clicked with a probability of only 0.01.

In [31]:
def generate_simulated_exploration_sessions(query, sessions,
                                            logged_judgments, features, n=500):
    """Conducts N (500) searches with the query and returns session data with
       the simulated user behavior"""
    wants_to_purchase = [97360724240, 97360722345, 826663114164, 97360810042, 93624956037]
    might_purchase = [97363455349, 97361312743, 97361372389,
                      97361312804, 97363532149, 97363560449]
    explore_on_rank = 2.0
    with_explore_sessions = sessions.copy()
    query_sessions = with_explore_sessions[with_explore_sessions["query"] == query]
    for i in range(0, n):
        explore_doc = explore(query, logged_judgments, features)
        if explore_doc:
            explore_upc = int(explore_doc["upc"])
            sess_ids = list(set(query_sessions["sess_id"].tolist()))
            random.shuffle(sess_ids)
            new_session = query_sessions[query_sessions["sess_id"] == sess_ids[0]].copy()
            new_session["sess_id"] = 100000 + i
            new_session.loc[new_session["rank"] == explore_on_rank, "doc_id"] = explore_upc
            draw = random.random()
            click = ((explore_upc in wants_to_purchase and draw < 0.8) or
                     (explore_upc in might_purchase and draw < 0.5) or
                     draw < 0.01)
            if click:
                print(f"Search {i} resulted in a click on {explore_upc}")
            new_session.loc[new_session["rank"] == explore_on_rank, "clicked"] = click
            
            with_explore_sessions = pandas.concat([with_explore_sessions, new_session])
        else:
            print(f"Search {i} no docs")
            
    return with_explore_sessions
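
The click rule inside the loop can be checked in isolation. The sketch below pulls the same boolean expression into a small function and estimates the click rate for each group over many draws. The function name `simulated_click`, the shortened UPC lists, and the trial count are assumptions made for this illustration.

```python
import random

def simulated_click(upc, wants_to_purchase, might_purchase, rng):
    """Same rule as in the loop above: 0.8 / 0.5 / 0.01 click probability."""
    draw = rng.random()
    return ((upc in wants_to_purchase and draw < 0.8) or
            (upc in might_purchase and draw < 0.5) or
            draw < 0.01)

rng = random.Random(0)            # local generator, leaves the global seed alone
wants_to_purchase = [826663114164]
might_purchase = [97363455349]
trials = 10000

click_rates = {}
for label, upc in [("wants", 826663114164),
                   ("might", 97363455349),
                   ("other", 400192926087)]:
    clicks = sum(simulated_click(upc, wants_to_purchase, might_purchase, rng)
                 for _ in range(trials))
    click_rates[label] = clicks / trials

print(click_rates)
```

The estimated rates should sit near 0.8, 0.5, and 0.01. Even irrelevant docs pick up the occasional click, much as they would in real traffic.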

## Listing 12.10 - Update judgments from new sessions

Have we added any new docs that appear to be getting more clicks?

In [32]:
random.seed(1234)

query = "transformers dvd"
sessions_with_exploration = generate_simulated_exploration_sessions(
    query, sessions, logged_transformers_judgments, explore_features)
training_data_with_exploration = generate_training_data(sessions_with_exploration)
training_data_with_exploration.loc["transformers dvd"]

Search 2 resulted in a click on 97360722345
Search 3 resulted in a click on 97360722345
Search 4 resulted in a click on 97360724240
Search 5 resulted in a click on 97360724240
Search 10 resulted in a click on 97360724240
Search 15 resulted in a click on 826663114164
Search 21 resulted in a click on 97360724240
Search 27 resulted in a click on 97360722345
Search 32 resulted in a click on 97360722345
Search 36 resulted in a click on 74108007469
Search 41 resulted in a click on 826663114164
Search 49 resulted in a click on 97360724240
Search 51 resulted in a click on 826663114164
Search 55 resulted in a click on 97360722345
Search 56 resulted in a click on 97360724240
Search 60 resulted in a click on 97360724240
Search 61 resulted in a click on 97360722345
Search 62 resulted in a click on 97360724240
Search 66 resulted in a click on 97360722345
Search 71 resulted in a click on 826663114164
Search 75 resulted in a click on 826663114164
Search 92 resulted in a click on 97360724240
Search 95

Unnamed: 0_level_0,clicked,examined,grade,beta_grade
doc_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
826663114164,41,41,1.0,0.843137
97360724240,38,43,0.883721,0.754717
97360722345,23,27,0.851852,0.675676
97363455349,731,2119,0.344974,0.344293
97361312804,726,2105,0.344893,0.344208
97363560449,733,2129,0.344293,0.343619
97361312743,708,2075,0.341205,0.340528
97363532149,692,2096,0.330153,0.329535
97361372389,673,2086,0.322627,0.322042
400192926087,1,21,0.047619,0.096774
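
`generate_training_data` is defined earlier, so this section does not show how `beta_grade` is computed. The rows above are, however, consistent with a smoothed click-through rate of `(clicked + 2) / (examined + 10)`. The sketch below checks that guess against a few rows. The prior counts 2 and 10 are inferred from the printed numbers, not taken from the source code.

```python
# (clicked, examined, grade, beta_grade) copied from the table above
rows = {
    826663114164: (41, 41, 1.0, 0.843137),
    97360724240: (38, 43, 0.883721, 0.754717),
    97363455349: (731, 2119, 0.344974, 0.344293),
    400192926087: (1, 21, 0.047619, 0.096774),
}

PRIOR_CLICKS = 2     # inferred from the output, an assumption
PRIOR_EXAMINES = 10  # inferred from the output, an assumption

recomputed = {}
for doc_id, (clicked, examined, grade, beta_grade) in rows.items():
    raw = clicked / examined
    smoothed = (clicked + PRIOR_CLICKS) / (examined + PRIOR_EXAMINES)
    recomputed[doc_id] = (round(raw, 6), round(smoothed, 6))
    print(doc_id, recomputed[doc_id], (grade, beta_grade))
```

Smoothing of this kind pulls docs with few examinations toward the prior. That is why the newly explored doc, clicked 41 times out of 41, gets a `beta_grade` of about 0.84 rather than 1.0, while the heavily examined docs barely move.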


## Listing 12.11 - Rebuild model using updated judgments

After showing the new document to users, we can rebuild the model using judgments that cover this feature blindspot.

In [33]:
random.seed(1234)

promotion_feature_set = [
    ltr.generate_fuzzy_query_feature(feature_name="name_fuzzy",
                                     field_name="name"),
    ltr.generate_bigram_query_feature(feature_name="name_bigram",
                                      field_name="name"),
    ltr.generate_bigram_query_feature(feature_name="short_description_bigram",
                                      field_name="short_description"),
    ltr.generate_query_feature(feature_name="has_promotion",
                               field_name="has_promotion",
                               value="true",
                               constant_score=True)]

print(json.dumps(promotion_feature_set, indent=2))
train_and_evaluate_model(sessions_with_exploration, "ltr_model_variant_3",
                         promotion_feature_set)

[
  {
    "name": "name_fuzzy",
    "class": "org.apache.solr.ltr.feature.SolrFeature",
    "params": {
      "q": "name_ngram:(${keywords})"
    }
  },
  {
    "name": "name_bigram",
    "class": "org.apache.solr.ltr.feature.SolrFeature",
    "params": {
      "q": "{!edismax qf=name pf2=name}(${keywords})"
    }
  },
  {
    "name": "short_description_bigram",
    "class": "org.apache.solr.ltr.feature.SolrFeature",
    "params": {
      "q": "{!edismax qf=short_description pf2=short_description}(${keywords})"
    }
  },
  {
    "name": "has_promotion",
    "class": "org.apache.solr.ltr.feature.SolrFeature",
    "params": {
      "q": "has_promotion:true^=1"
    }
  }
]
[LibLinear]

{'dryer': 0.12737002598513025,
 'blue ray': 0.08461538461538462,
 'headphones': 0.12110565745285455,
 'dark of moon': 0.14680203327814995,
 'transformers dvd': 0.2584434491113084}

In [34]:
ltr.search_with_model("ltr_model_variant_3",
                      query="transformers dvd",
                      rerank=60000,
                      limit=5)["docs"]

[{'upc': '97360722345',
  'name': 'Transformers/Transformers: Revenge of the Fallen: Two-Movie Mega Collection [2 Discs] - Widescreen - DVD',
  'manufacturer': ' ',
  'short_description': ' ',
  'long_description': ' ',
  'rank': 0},
 {'upc': '97360724240',
  'name': 'Transformers: Revenge of the Fallen - Widescreen - DVD',
  'manufacturer': ' ',
  'short_description': ' ',
  'long_description': ' ',
  'rank': 1},
 {'upc': '400192926087',
  'name': 'Transformers: Dark of the Moon - Original Soundtrack - CD',
  'manufacturer': 'Reprise',
  'short_description': ' ',
  'long_description': ' ',
  'rank': 2},
 {'upc': '826663114164',
  'name': 'Transformers: The Complete Series [25th Anniversary Matrix of Leadership Edition] [16 Discs] - DVD',
  'manufacturer': ' ',
  'short_description': ' ',
  'long_description': ' ',
  'rank': 3},
 {'upc': '47875842328',
  'name': 'Transformers: Dark of the Moon Stealth Force Edition - Nintendo Wii',
  'manufacturer': 'Activision',
  'short_description':

## Listing 12.12 - Rerun A/B test on new `promotion` model

In [35]:
random.seed(1234)

simulate_user_a_b_test(query="transformers dvd",
                       model_a="ltr_model_variant_1",
                       model_b="ltr_model_variant_3",
                       number_of_users=1000) 

{'ltr_model_variant_1': 21, 'ltr_model_variant_3': 145}
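
A gap this large is unlikely to be noise, but it is worth checking. The sketch below runs a rough sign test using a normal approximation. It assumes the two counts are purchases and that each user was equally likely to be served either model. Both are assumptions about `simulate_user_a_b_test`, which is defined earlier in the notebook.

```python
from math import sqrt
from scipy.stats import norm

results = {"ltr_model_variant_1": 21, "ltr_model_variant_3": 145}

total = sum(results.values())
successes_b = results["ltr_model_variant_3"]

# Null hypothesis: a purchase is equally likely to come from either model.
expected = total * 0.5
std_dev = sqrt(total * 0.5 * 0.5)

z_score = (successes_b - expected) / std_dev
p_value = 2 * norm.sf(abs(z_score))  # two-sided, normal approximation

print(round(z_score, 2), p_value)
```

Under those assumptions the z-score is far beyond any conventional threshold, so the improvement from `ltr_model_variant_3` looks real for this query.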

## Listing 12.13 - Fully Automated LTR Loop

These lines expand Listing 12.13 from the book (the book shows a truncated form of what's below). You could put this in a loop and keep trying new features to get closer to a generalized ranking solution that reflects what users actually want.

In [None]:
random.seed(1234)
ltr.delete_feature_store("aips_feature_store")

def get_exploit_features():
    return [
        ltr.generate_fuzzy_query_feature("name_fuzzy", "name"),
        ltr.generate_query_feature("long_description_bm25", "long_description"),
        ltr.generate_query_feature("short_description_match", "short_description", True)]

def gather_latest_sessions(query, sessions, model_name, features):
    """For the sake of the examples, returns a static list of session data.
       In a production environment, this would be the most up-to-date user interactions"""
    training_data = generate_training_data(sessions)
    logged_judgments = generate_logged_judgments(training_data, features, model_name)
    latest_sessions = generate_simulated_exploration_sessions(query,
                                                              sessions,
                                                              logged_judgments,
                                                              features)
    return latest_sessions

def is_improvement(evaluation1, evaluation2):
    # Model comparison is stubbed out
    return True
    
def wait_for_more_sessions(t):
    time.sleep(t)

def ltr_retraining_loop(latest_sessions, iterations=sys.maxsize,
                        retrain_frequency=60 * 60 * 24):
    for i in range(0, iterations):
        training_data = generate_training_data(latest_sessions)
        train, test = split_training_data(training_data)
        if i == 0:
            exploit_features = get_exploit_features()
            train_and_upload_model(train,
                                   "exploit",
                                   exploit_features)
        else:
            previous_explore_model_name = f"explore_variant_{i-1}"
            exploit_model_evaluation = evaluate_model(test, "exploit", training_data, log=True)
            explore_model_evaluation = evaluate_model(test, previous_explore_model_name, training_data, log=True)
            print(f"Exploit evaluation: {exploit_model_evaluation}")
            print(f"Explore evaluation: {explore_model_evaluation}")
            if is_improvement(explore_model_evaluation, exploit_model_evaluation):
                print("Promoting previous explore model")
                train_and_upload_model(train,
                                       "exploit",
                                       explore_features)
                
        explore_features = get_latest_explore_features()
        train_and_upload_model(train,
                               f"explore_variant_{i}",
                               explore_features)
        
        wait_for_more_sessions(retrain_frequency)
        latest_sessions = gather_latest_sessions("transformers dvd", latest_sessions,
                                                 f"explore_variant_{i}", explore_features)

ltr_retraining_loop(sessions, 5, 0)
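
`is_improvement` is stubbed out above and always returns `True`. One possible way to fill it in is sketched below, assuming each evaluation is a dict of per-query scores like the one printed after Listing 12.11. The name `is_improvement_by_mean`, the `min_gain` parameter, and the sample scores are all hypothetical. Comparing mean scores is just one simple choice among many.

```python
def is_improvement_by_mean(candidate_evaluation, baseline_evaluation, min_gain=0.0):
    """Hypothetical comparison: True if the candidate's mean per-query score
    beats the baseline's by more than min_gain, over the queries both share."""
    shared_queries = candidate_evaluation.keys() & baseline_evaluation.keys()
    if not shared_queries:
        return False  # nothing to compare, so do not promote
    candidate_mean = (sum(candidate_evaluation[q] for q in shared_queries)
                      / len(shared_queries))
    baseline_mean = (sum(baseline_evaluation[q] for q in shared_queries)
                     / len(shared_queries))
    return candidate_mean - baseline_mean > min_gain

# Made-up scores for illustration only
baseline = {"dryer": 0.10, "transformers dvd": 0.05}
candidate = {"dryer": 0.12, "transformers dvd": 0.25}

print(is_improvement_by_mean(candidate, baseline))
```

The argument order mirrors the call in `ltr_retraining_loop`, where the explore model's evaluation comes first and the exploit model's second.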

Up next: [Chapter 13: Semantic Search with Dense Vectors](../ch13/1.setting-up-the-outdoors-dataset.ipynb)