# A/B Testing Simulation to Active Learning

In this notebook, users have a hidden preference for the results of a single query, `transformers dvd`. We use it to explore A/B testing, checking whether a given LTR model actually gives users what they want.

Then we ask, much like in real life, how can we learn what the user _actually_ wants? We employ active learning to try to escape the 'echo chamber' of presentation bias we learned about at the end of chapter 11. After all, users can't click on results that never show up in their search results!

## 🚨 We're putting it all together in this chapter

As this chapter puts together everything from chapters 10 and 11, much of the setup code below wraps those chapters up into 'single functions', so we can easily run through each step as a 'one-liner'.

### Getting training data (Ch 11)

Chapter 11 is all about turning raw clickstream data into search training data (aka judgments). This involves overcoming biases in how users perceive search. Here we wrap all of that in a single function, `sessions_to_sdbn`.

### Train a model (Ch 10)

Chapter 10 is about training an LTR model, including interacting with Solr to extract features, how a ranking model works, how to train a model, and how to perform a good test/train split for search. Here we similarly wrap that up in a handful of functions: `test_train_split`, `ranksvm_ltr`, and `eval_model`.

*Long story short: if you see a reference to chapters 10 or 11, the related code is probably omitted from chapter 12*, so don't expect it to be covered there extensively.


## Setup - gather some sessions (omitted)

To get started, we first load a set of simulated search sessions for all queries. 

Much of this setup is omitted from the chapter. This first part is just loading and synthesizing a bunch of clickstream sessions, like we used in chapter 11.

In [None]:
import numpy as np
import pandas as pd
import random; random.seed(0)
import glob

import requests
import sys
sys.path.append('..')
from aips import *
from ltr.client.solr_client import SolrClient
engine = get_engine()
client = SolrClient(solr_base=SOLR_URL)

In [None]:
def all_sessions():
    sessions = pd.concat([pd.read_csv(f, compression='gzip')
                          for f in glob.glob('ch12/retrotech/*_sessions.gz')])
    return sessions.rename(columns={'clicked_doc_id': 'doc_id'})

sessions = all_sessions()
sessions

In [None]:
sessions["query"].unique()

## Setup Part 2 - Add some more query sessions (omitted)

Here we copy some of the simulated queries above under new query strings (variations such as `head phones` or `apple laptop`), flipping a small fraction (about 4%) of the clicks to non-clicks. This fills out our data a bit, giving us more queries to work with.

In [None]:
random.seed(0)
np.random.seed(0)  # copy_query_sessions draws from numpy's RNG, so seed it too

def copy_query_sessions(sessions, src_query, dest_query, flip_probability=0.04):
    """Copy src_query's sessions under dest_query, unclicking ~flip_probability of the clicks."""
    new_sessions = sessions[sessions["query"] == src_query].copy()
    new_sessions["draw"] = np.random.rand(len(new_sessions))
    new_sessions.loc[new_sessions["clicked"] & (new_sessions["draw"] < flip_probability), "clicked"] = False
    new_sessions["query"] = dest_query
    return pd.concat([sessions, new_sessions.drop("draw", axis=1)])


sessions = copy_query_sessions(sessions, "transformers dark of the moon", "transformers dark of moon")
sessions = copy_query_sessions(sessions, "transformers dark of the moon", "dark of moon")
sessions = copy_query_sessions(sessions, "transformers dark of the moon", "dark of the moon")
sessions = copy_query_sessions(sessions, "headphones", "head phones")
sessions = copy_query_sessions(sessions, "lcd tv", "lcd television")
sessions = copy_query_sessions(sessions, "lcd tv", "television, lcd")
sessions = copy_query_sessions(sessions, "macbook", "apple laptop")
sessions = copy_query_sessions(sessions, "iphone", "apple iphone")
sessions = copy_query_sessions(sessions, "kindle", "amazon kindle")
sessions = copy_query_sessions(sessions, "kindle", "amazon ereader")
sessions = copy_query_sessions(sessions, "blue ray", "blueray")

sessions

In [None]:
sessions["query"].unique()

## Setup Part 3 - Our test query, `transformers dvd`, with hidden, 'true' preferences

We add a new query, `transformers dvd`, to our set of queries, and record the users' hidden preferences in three lists of product UPCs: what they actually want (`desired_transformers_movies`), what they consider mediocre (`meh_transformers_movies`), and what is not at all relevant (`irrelevant_transformers_products`).

This simulates presentation bias in the data: users never see (and hence never click) the items they actually want. Only the mediocre and irrelevant products are displayed, in a new random order each session. If a desired product were shown, it would get a much higher click probability (0.65) than a mediocre one (0.13) or an irrelevant one (0.005).
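A minimal sketch of that click rule in plain Python, assuming a toy catalog where each doc id is labeled with its hidden-preference class (`simulate_clicks` and the doc ids are hypothetical, not part of the notebook). Because the desired doc is never displayed, it collects zero clicks no matter how many sessions we simulate, so no click model can ever grade it highly.

```python
import random

# Click probabilities per hidden-preference class, mirroring the setup cell below
CLICK_PROB = {"desired": 0.65, "meh": 0.13, "irrelevant": 0.005}

def simulate_clicks(displayed, labels, num_sessions=5000, seed=0):
    """Count clicks per class over many sessions of a shuffled result list.

    `displayed` is the list of doc ids shown; `labels` maps doc id -> class.
    Docs that are never displayed can never be clicked."""
    rng = random.Random(seed)
    clicks = {label: 0 for label in CLICK_PROB}
    for _ in range(num_sessions):
        ranking = displayed[:]
        rng.shuffle(ranking)  # new random order each session
        for doc_id in ranking:
            label = labels[doc_id]
            if rng.random() < CLICK_PROB[label]:
                clicks[label] += 1
    return clicks

labels = {"d1": "desired", "m1": "meh", "m2": "meh", "i1": "irrelevant"}
# The desired doc "d1" is labeled but never displayed
clicks = simulate_clicks(["m1", "m2", "i1"], labels)
print(clicks)
```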

In [None]:
# Start after the largest existing session id so the new sessions don't collide with old ones
next_sess_id = sessions["sess_id"].max() + 1

# For some reason, the sessions only capture examines of the "dubbed" transformers movies,
# i.e. the Japanese shows brought to an English-speaking market. But we'll see this is not
# what the user wants (i.e. presentation bias). These are "meh": mildly interesting. There
# are also many completely irrelevant products.

# What the user wants, but never visible! Never gets clicked!
# These are the widescreen transformers dvds of the hollywood movies
desired_transformers_movies = ["97360724240", "97360722345", "97368920347"] 

# Bunch of random merchandise
irrelevant_transformers_products = ["708056579739", "93624995012", "47875819733", "47875839090", "708056579746",
                                     "47875332911", "47875842328", "879862003524", "879862003517", "93624974918"] 

# Other transformer movies
meh_transformers_movies = ["97363455349", "97361312743", "97361372389", "97361312804", "97363532149", "97363560449"]

displayed_transformer_products = meh_transformers_movies + irrelevant_transformers_products

new_sessions = []
for i in range(5000):
    # Show the same products in a new random order each session
    random.shuffle(displayed_transformer_products)

    for rank, upc in enumerate(displayed_transformer_products):
        draw = random.random()
        # Click probability depends on the user's hidden preference for this product
        clicked = (upc in meh_transformers_movies and draw < 0.13) or \
                  (upc in irrelevant_transformers_products and draw < 0.005) or \
                  (upc in desired_transformers_movies and draw < 0.65)

        new_sessions.append({"sess_id": next_sess_id + i,
                             "query": "transformers dvd",
                             "rank": rank,
                             "clicked": clicked,
                             "doc_id": upc})


sessions = pd.concat([sessions, pd.DataFrame(new_sessions)])
sessions

## Setup Part 4 - Chapter 11 in one function (omitted)

Wrapping up Chapter 11 in a single function `sessions_to_sdbn`. 

This function computes a relevance grade from raw clickstream data. Recall that the SDBN (Simplified Dynamic Bayesian Network) click model from chapter 11 helps overcome position bias by only counting a result as examined if it ranks at or above the session's last click. We also apply a beta prior, so that a document with a single click doesn't get the same confidence as one observed hundreds of times.
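To make the two ideas concrete, here is a toy, pure-Python sketch of SDBN with a beta prior. It assumes each session is a list of `(doc_id, clicked)` tuples in rank order; `sdbn_grades` is a hypothetical helper, not the pandas implementation in the next cell. A doc clicked once out of one examination gets a raw grade of 1.0, but its beta grade falls below that of a doc clicked 60 times out of 100.

```python
from collections import defaultdict

def sdbn_grades(sessions, prior_weight=10, prior_grade=0.2):
    """Toy SDBN with a beta prior.

    A doc counts as examined if it is ranked at or above the session's last
    click; sessions without any clicks contribute nothing."""
    clicked = defaultdict(int)
    examined = defaultdict(int)
    for session in sessions:
        click_ranks = [rank for rank, (_, was_clicked) in enumerate(session) if was_clicked]
        if not click_ranks:
            continue
        last_click = max(click_ranks)
        for doc_id, was_clicked in session[: last_click + 1]:
            examined[doc_id] += 1
            clicked[doc_id] += int(was_clicked)

    # Beta prior: prior_weight pseudo-observations with mean prior_grade
    prior_a = prior_grade * prior_weight
    prior_b = (1 - prior_grade) * prior_weight
    grades = {}
    for doc_id in examined:
        posterior_a = prior_a + clicked[doc_id]
        posterior_b = prior_b + examined[doc_id] - clicked[doc_id]
        grades[doc_id] = {
            "grade": clicked[doc_id] / examined[doc_id],
            "beta_grade": posterior_a / (posterior_a + posterior_b),
        }
    return grades

sessions = ([[("rare", True)]]
            + [[("popular", True)]] * 60
            + [[("popular", False), ("other", True)]] * 40)
g = sdbn_grades(sessions)
print(g["rare"], g["popular"])
```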

In [None]:
def sessions_to_sdbn(sessions, prior_weight=10, prior_grade=0.2) -> pd.DataFrame:
    """ Compute SDBN of the provided query as a dataframe.
        Where we left off at end of 'overcoming confidence bias' 
        """
    all_sdbn = pd.DataFrame()
    for query in sessions["query"].unique():
        sdbn_sessions = sessions[sessions["query"] == query].copy().set_index("sess_id")

        last_click_per_session = sdbn_sessions.groupby(["clicked", "sess_id"])["rank"].max()[True]

        sdbn_sessions["last_click_rank"] = last_click_per_session
        sdbn_sessions["examined"] = sdbn_sessions["rank"] <= sdbn_sessions["last_click_rank"]

        sdbn = sdbn_sessions[sdbn_sessions["examined"]].groupby("doc_id")[["clicked", "examined"]].sum()
        sdbn["grade"] = sdbn["clicked"] / sdbn["examined"]
        sdbn["query"] = query

        sdbn = sdbn.sort_values("grade", ascending=False)

        sdbn["prior_a"] = prior_grade*prior_weight
        sdbn["prior_b"] = (1-prior_grade)*prior_weight

        sdbn["posterior_a"] = sdbn["prior_a"] +  sdbn["clicked"]
        sdbn["posterior_b"] = sdbn["prior_b"] + (sdbn["examined"] - sdbn["clicked"])

        sdbn["beta_grade"] = sdbn["posterior_a"] / (sdbn["posterior_a"] + sdbn["posterior_b"])

        sdbn.sort_values("beta_grade", ascending=False)
        all_sdbn = pd.concat([all_sdbn, sdbn])
    return all_sdbn[["query", "clicked", "examined", "grade", "beta_grade"]].reset_index().set_index(["query", "doc_id"])



## Listing 12.1 - Convert raw sessions to SDBN

We kick off with the data where we left off in chapter 11.

In this listing we use our "chapter 11 in one function" `sessions_to_sdbn` to rebuild the training data.

In [None]:
sdbn = sessions_to_sdbn(sessions,
                        prior_weight=10,
                        prior_grade=0.2)
sdbn

## Chapter 10 Functions (omitted from book)

Now with the chapter 11 setup out of the way, we'll need to give Chapter 10's code a similar treatment, wrapping that LTR system into a black box.

All of the following are support functions for the chapter:

1. Convert the sdbn dataframe into individual `Judgment` objects needed for training the model from chapter 10
2. Pairwise transformation of the data
3. Normalization of the data
4. Training the model
5. Uploading the model to Solr

All of these steps are covered in Chapter 10.
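Step 2, the pairwise transformation, is the least obvious of these. Below is a minimal pure-Python sketch for a single query, assuming each judgment is a `(features, grade)` pair; `pairwise_rows` is a hypothetical name, and it mirrors `pairwise_transform` with `weigh_difference=False`. Every ordered pair of docs with different grades becomes one training row: the feature difference, labeled +1 if the first doc is more relevant and -1 otherwise. A linear classifier trained on these rows learns feature weights that can then rank documents.

```python
def pairwise_rows(judgments):
    """Turn graded (features, grade) rows for ONE query into pairwise training rows."""
    deltas, labels = [], []
    for features_a, grade_a in judgments:
        for features_b, grade_b in judgments:
            if grade_a == grade_b:
                continue  # equal grades carry no preference signal
            deltas.append([fa - fb for fa, fb in zip(features_a, features_b)])
            labels.append(1 if grade_a > grade_b else -1)
    return deltas, labels

# Two features per doc, grades on the 0-10 scale used by sdbn_to_judgments
query_judgments = [([1.0, 0.0], 8), ([0.0, 1.0], 2), ([0.5, 0.5], 2)]
deltas, labels = pairwise_rows(query_judgments)
print(deltas, labels)
```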

In [None]:
import json
import math
from itertools import groupby

import numpy as np
import requests
from sklearn import svm

from ltr import download
from ltr.judgments import (Judgment, judgments_from_file, judgments_open,
                           judgments_to_nparray, judgments_writer)
from ltr.log import FeatureLogger

def sdbn_to_judgments(sdbn):
    """Turn pandas dataframe into ltr judgments objects."""
    judgments = []
    queries = {}  # query string -> qid, assigned in order of first appearance
    for row_dict in sdbn.reset_index().to_dict(orient="records"):
        # Round grade to 10ths, Map 0.3 -> 3, etc
        grade = round(row_dict['beta_grade'], 1) * 10
        qid = queries.setdefault(row_dict['query'], len(queries))

        judgments.append(Judgment(doc_id=row_dict['doc_id'],
                                  keywords=row_dict['query'],
                                  qid=qid,
                                  grade=int(grade)))
    return judgments


sdbn_to_judgments(sdbn)


def write_judgments(judgments, dest='retrotech_judgments.txt'):
    with judgments_writer(open(dest, 'wt')) as writer:
        for judgment in judgments:
            writer.write(judgment)
            
write_judgments(sdbn_to_judgments(sdbn))
!cat retrotech_judgments.txt


def normalize_features(logged_judgments):
    all_features = []
    means = [0] * len(logged_judgments[0].features)
    for judgment in logged_judgments:
        for idx, f in enumerate(judgment.features):
            means[idx] += f
        all_features.append(judgment.features)
    
    for i in range(len(means)):
        means[i] /= len(logged_judgments)
      
    std_devs = [0.0] * len(logged_judgments[0].features)
    for judgment in logged_judgments:
        for idx, f in enumerate(judgment.features):
            std_devs[idx] += (f - means[idx])**2
            
    for i in range(len(std_devs)):
        std_devs[i] /= len(logged_judgments)
        std_devs[i] = math.sqrt(std_devs[i])
        
    # Normalize!
    normed_judgments = []
    for judgment in logged_judgments:
        normed_features = [0.0] * len(judgment.features)
        for idx, f in enumerate(judgment.features):
            normed = 0.0
            if std_devs[idx] > 0: 
                normed = (f - means[idx]) / std_devs[idx]
            normed_features[idx] = normed
        normed_judgment=Judgment(qid=judgment.qid,
                                 keywords=judgment.keywords,
                                 doc_id=judgment.doc_id,
                                 grade=judgment.grade,
                                 features=normed_features)
        normed_judgment.old_features=judgment.features
        normed_judgments.append(normed_judgment)

    return means, std_devs, normed_judgments


def pairwise_transform(normed_judgments, weigh_difference = True):
        
    predictor_deltas = []
    feature_deltas = []
    
    # For each query's judgments
    for qid, query_judgments in groupby(normed_judgments, key=lambda j: j.qid):

        # Annoying issue consuming python iterators, we ensure we have two
        # full copies of each query's judgments
        query_judgments_copy_1 = list(query_judgments) 
        query_judgments_copy_2 = list(query_judgments_copy_1)

        # Examine every judgment combo for this query, 
        # if they're different, store the pairwise difference:
        # +1 if judgment1 more relevant
        # -1 if judgment2 more relevant
        for judgment1 in query_judgments_copy_1:
            for judgment2 in query_judgments_copy_2:
                
                j1_features=np.array(judgment1.features)
                j2_features=np.array(judgment2.features)
                
                if judgment1.grade > judgment2.grade:
                    diff = judgment1.grade - judgment2.grade if weigh_difference else 1.0
                    predictor_deltas.append(+1)
                    feature_deltas.append(diff * (j1_features-j2_features))
                elif judgment1.grade < judgment2.grade:
                    diff = judgment2.grade - judgment1.grade if weigh_difference else 1.0
                    predictor_deltas.append(-1)
                    feature_deltas.append(diff * (j1_features-j2_features))

    # For training purposes, we return these as numpy arrays
    return np.array(feature_deltas), np.array(predictor_deltas)

def upload_model(model, model_name, means, std_devs, feature_set):
    linear_model = {
      "store": "aips_feature_store",
      "class": "org.apache.solr.ltr.model.LinearModel",
      "name": model_name,
      "features": [
      ],
      "params": {
          "weights": {
          }
      }
    }

    ftr_model = {}
    ftr_names = [ftr['name'] for ftr in feature_set]
    for idx, ftr_name in enumerate(ftr_names):
        config = {
            "name": ftr_name,
            "norm": {
                "class": "org.apache.solr.ltr.norm.StandardNormalizer",
                "params": {
                    "avg": str(means[idx]),
                    "std": str(std_devs[idx])
                }
            }
        }
        linear_model['features'].append(config)
        linear_model['params']['weights'][ftr_name] =  model.coef_[0][idx] 

    # Delete old model
    resp = requests.delete(f"{SOLR_URL}/products/schema/model-store/{model_name}")

    # Upload the model
    resp = requests.put(f"{SOLR_URL}/products/schema/model-store", json=linear_model)
    resp.text
    requests.get(f"{SOLR_URL}/admin/collections?action=RELOAD&name=products&wt=xml")


    
## TODO - can't easily do a test/train split on these few queries
##   make more queries?

def ranksvm_ltr(sdbn, model_name, feature_set):
    """Train a RankSVM model via Solr, store in Solr."""
    judgments = sdbn_to_judgments(sdbn)
    judgments_path = 'retrotech_judgments.txt'
    write_judgments(judgments, judgments_path)
    
    # For more on this code, review Chapter 10
    requests.delete(f"{SOLR_URL}/products/schema/feature-store/aips_feature_store")
    
    resp = requests.put(f"{SOLR_URL}/products/schema/feature-store",
                    json=feature_set)

    ftr_logger=FeatureLogger(client, index='products', feature_set="aips_feature_store", id_field='upc')

    with judgments_open(judgments_path) as judgment_list:
        for qid, query_judgments in groupby(judgments, key=lambda j: j.qid):
            ftr_logger.log_for_qid(judgments=query_judgments, 
                                   qid=qid,
                                   keywords=judgment_list.keywords(qid))

    logged_judgments = ftr_logger.logged
    means, std_devs, normed_judgments = normalize_features(logged_judgments)
    feature_deltas, predictor_deltas = pairwise_transform(normed_judgments)

    model = svm.LinearSVC(max_iter=10000, verbose=1)
    model.fit(feature_deltas, predictor_deltas)  
    upload_model(model, model_name, means, std_devs, feature_set)


## Also Chapter 10 - Perform a test / train split on the SDBN data (omitted)

This function is broken out from the model training. It lets us train a model on one set of data (reusing the chapter 10 training code), reserving test queries for evaluation.
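The split happens at the query level, not the row level: if rows for the same query landed in both sets, the model would be evaluated on queries it had already seen. A minimal stdlib sketch of that idea (`split_queries` and the sample queries are hypothetical):

```python
import random

def split_queries(queries, train=0.8, seed=1234):
    """Shuffle unique queries and split them so no query lands in both sets."""
    unique = sorted(set(queries))  # sort first so the shuffle is reproducible
    rng = random.Random(seed)
    rng.shuffle(unique)
    split_point = int(len(unique) * train)
    return unique[:split_point], unique[split_point:]

judged_queries = ["ipad", "ipad", "kindle", "macbook", "iphone", "lcd tv"]
train_queries, test_queries = split_queries(judged_queries, train=0.8)
print(train_queries, test_queries)
```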

In [None]:
from math import floor

def test_train_split(sdbn, train):
    """Split queries in sdbn into train / test sets, with the `train` proportion going to training."""
    queries = sdbn.index.get_level_values('query').unique().tolist()
    random.shuffle(queries)
    split_point = floor(len(queries) * train)

    train_queries = queries[:split_point]
    test_queries = queries[split_point:]
    return sdbn.loc[train_queries, :], sdbn.loc[test_queries, :]


## Chapter 10 - Search Code (omitted)

Also from Chapter 10, a simple function to search using the LTR model and return a list of search results.
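The `rq` parameter below uses Solr's LTR query parser (`{!ltr ...}`) to rescore the top `reRankDocs` results of the main `edismax` query with the named model. Each `efi.*` value is passed as external feature information, which feature definitions can reference as `${...}` (this notebook's features use `${keywords}`). As a sketch, the hypothetical helper below rebuilds the same string for one query so you can see exactly what gets sent:

```python
def build_ltr_rerank(query, model_name, rerank_docs=60000, rerank_weight=10.0):
    """Rebuild the `rq` local-params string that search_with_model sends to Solr."""
    # Each keyword prefixed with ~ for the fuzzy feature, e.g. "~transformers ~dvd"
    fuzzy_kws = "~" + " ~".join(query.split())
    # All keywords glued together, e.g. "transformersdvd"
    squeezed_kws = "".join(query.split())
    return (f"{{!ltr reRankDocs={rerank_docs} reRankWeight={rerank_weight} model={model_name}"
            f' efi.fuzzy_keywords="{fuzzy_kws}"'
            f' efi.squeezed_keywords="{squeezed_kws}"'
            f' efi.keywords="{query}"}}')

rq = build_ltr_rerank("transformers dvd", "click_model_basic")
print(rq)
```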

In [None]:
def search_with_model(query, model_name, at=10, log=False):
    """ Search using test_model LTR model (see rq to and qf params below). """
    fuzzy_kws = "~" + ' ~'.join(query.split())
    squeezed_kws = "".join(query.split())
    
    rq = \
        "{!ltr reRankDocs=60000 reRankWeight=10.0 model=" + model_name \
        + " efi.fuzzy_keywords=\"" + fuzzy_kws + "\" " \
        + "efi.squeezed_keywords=\"" + squeezed_kws +"\" " \
        + "efi.keywords=\"" + query + "\"}"

    request = {
            "fields": ["upc", "name", "manufacturer", "score"],
            "limit": at,
            "params": {
              "rq": rq,
              "qf": "name name_ngram upc manufacturer shortDescription longDescription",
              "defType": "edismax",
              "q": query
            }
        }
    
    if log:
        print(request)

    resp = requests.post(f"{SOLR_URL}/products/select", 
                                   json=request).json()
        
    if log:
        print(resp)
        
    search_results = resp['response']['docs']

    for rank, result in enumerate(search_results):
        result['rank'] = rank
        
    return search_results

def search_and_grade(query, model_name, sdbn, desired=()):
    """Search with `model_name`, flag `desired` UPCs, and join each result with its SDBN grades."""
    results = pd.DataFrame(search_with_model(query, model_name, at=10))
    results['desired'] = results['upc'].isin(desired)

    sdbn_query = sdbn.loc[query].copy().reset_index()
    # SDBN doc ids may be parsed as ints from the CSV sessions; UPCs come back from Solr as strings
    sdbn_query['doc_id'] = sdbn_query['doc_id'].astype(str)
    return results.merge(sdbn_query, left_on='upc', right_on='doc_id', how='left')

## Chapter 10 - Evaluate the model on the test set (omitted)

This function computes the model's performance on a set of test queries (`test`) that the model was not trained on. Each query's score is the sum of the `beta_grade` of its top `at` results divided by `at`: a graded precision@k in which unjudged results count as 0.
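A minimal sketch of that per-query metric on plain Python data, assuming `grades` maps doc ids to hypothetical `beta_grade` values (`graded_precision_at_k` is a hypothetical name). Note that it always divides by `k`, so a query that returns fewer than `k` results is penalized, just as `eval_model` divides by `at`.

```python
def graded_precision_at_k(result_ids, grades, k=10):
    """Sum the grades of the top-k result ids, then divide by k.

    Unjudged results count as 0, and we always divide by k."""
    return sum(grades.get(doc_id, 0.0) for doc_id in result_ids[:k]) / k

grades = {"a": 0.9, "b": 0.5}  # hypothetical beta_grade values
score = graded_precision_at_k(["a", "x", "b"], grades, k=10)
print(score)  # about 0.14
```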

In [None]:
def eval_model(test, model_name, sdbn, at=10):
    """Score each test query: summed beta_grade of its top `at` results, divided by `at`."""
    queries = test.index.get_level_values("query").unique()

    query_results = {}

    for query in queries:
        search_results = search_with_model(query, model_name, at=at)

        results = pd.DataFrame(search_results).reset_index()
        judgments = sdbn.loc[query, :].copy().reset_index()
        judgments["doc_id"] = judgments["doc_id"].astype(str)
        if len(results) == 0:
            print(f"No Results for {query}")
            query_results[query] = 0
        else:
            graded_results = results.merge(judgments, left_on="upc", right_on="doc_id", how="left")
            # Results without a judgment count as grade 0
            grade_cols = ["clicked", "examined", "grade", "beta_grade"]
            graded_results[grade_cols] = graded_results[grade_cols].fillna(0)

            query_results[query] = graded_results["beta_grade"].sum() / at
    return query_results

## Listing 12.2 - model training

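We wrap all the important decisions from chapter 10 in a few lines: choose a feature set, split the queries into train and test sets, train and upload a RankSVM model, and evaluate it on the test queries.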

In [None]:
random.seed(1234)

feature_set = [
{
  "name": "long_description_bm25",
  "store": "aips_feature_store",
  "class": "org.apache.solr.ltr.feature.SolrFeature",
  "params": {"q" : "longDescription:(${keywords})"}
},
{
  "name": "short_description_constant",
  "store": "aips_feature_store",
  "class": "org.apache.solr.ltr.feature.SolrFeature",
  "params": {"q": "shortDescription:(${keywords})^=1"}
}]

train, test = test_train_split(sdbn, train=0.8)
ranksvm_ltr(train, "click_model_basic", feature_set=feature_set)
eval_model(test, "click_model_basic", sdbn=sdbn)

In [None]:
# What the user wants, but never visible! Never gets clicked!
# These are the widescreen transformers dvds of the hollywood movies
desired_movies = ["97360724240", "97360722345", "97368920347"] 
result = search_and_grade('transformers dvd', "click_model_basic", sdbn, desired_movies)
upcs1 = result['upc']
result

## Listing 12.3

Train a model called `click_model_improved` that performs better offline.

In [None]:
random.seed(1234)

feature_set_improved = [
{
  "name": "name_fuzzy",
  "store": "aips_feature_store",
  "class": "org.apache.solr.ltr.feature.SolrFeature",
  "params": {"q" : "name_ngram:(${keywords})"}
},
{
  "name": "name_pf2",
  "store": "aips_feature_store",
  "class": "org.apache.solr.ltr.feature.SolrFeature",
  "params": {"q": "{!edismax qf=name name pf2=name}(${keywords})"}
},
{
  "name": "shortDescription_pf2",
  "store": "aips_feature_store",
  "class": "org.apache.solr.ltr.feature.SolrFeature",
  "params": { 
    "q": "{!edismax qf=shortDescription pf2=shortDescription}(${keywords})"
  }
}]

sdbn = sessions_to_sdbn(sessions) # chapter 11: generate training data

train, test = test_train_split(sdbn, train=0.8)
ranksvm_ltr(train, "click_model_improved", feature_set_improved) # chapter 10: train the model -> the 'LTR engine'
eval_model(test, "click_model_improved", sdbn)

## Simulate a user querying, clicking, purchasing (omitted)

This function simulates a user running a query and possibly purchasing a product as they scan down the results.
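It can help to see the simulation's behavior analytically. Under the same rules (at each rank the user buys with that rank's probability; otherwise they quit with `quit_probability` before moving on), the exact chance of a purchase from one ranked list is a sum over ranks. The hypothetical `purchase_probability` helper below sketches this; with the default probabilities, putting a desired product at the top instead of third raises the purchase chance from about 11% to about 16%.

```python
def purchase_probability(purchase_probs, quit_probability=0.2):
    """Exact chance that a simulated user buys something from one ranked list.

    At each rank the user buys with that rank's probability; otherwise they
    quit with `quit_probability` before moving to the next rank."""
    reach = 1.0  # probability the user is still scanning at this rank
    total = 0.0
    for p in purchase_probs:
        total += reach * p
        reach *= (1 - p) * (1 - quit_probability)
    return total

# 0.15 = desired product, 0.01 = uninterested product (the function's defaults)
desired_first = purchase_probability([0.15, 0.01, 0.01])
desired_third = purchase_probability([0.01, 0.01, 0.15])
print(desired_first, desired_third)
```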

In [None]:
def simulate_user_purchase(query, model_name, desired_products, indifferent_products,
                           desired_probability=0.15,
                           indifferent_probability=0.03,
                           uninterested_probability=0.01,
                           quit_per_result_probability=0.2):
    """Simulates a user 'query' where purchase probability depends on which
       of three sets the product's upc is in (desired, indifferent, or neither).

       Users purchase at most a single product per session.

       Users quit with `quit_per_result_probability` after scanning each result.
       """
    search_results = search_with_model(query, model_name, at=10)

    results = pd.DataFrame(search_results).reset_index()
    for doc in results.to_dict(orient="records"): 
        draw = random.random()
        
        if doc["upc"] in desired_products:
            if draw < desired_probability:
                return True
        elif doc["upc"] in indifferent_products:
            if draw < indifferent_probability:
                return True
        elif draw < uninterested_probability:
            return True
        if random.random() < quit_per_result_probability:
            return False
        
    return False


## Listing 12.4 - Simulated A/B test on just `transformers dvd` query

Here we pretend 1000 users search for `transformers dvd`, each randomly assigned to one of two rankings (`click_model_basic` or `click_model_improved`). Based on the hidden preferences here (`wants_to_purchase` and `might_purchase`), we see which model leads to more purchases (conversions).
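Raw purchase counts don't say whether a difference is more than noise. One common check, sketched here as an assumption rather than something this notebook does, is a two-sided two-proportion z-test on the conversion rates. It also needs the number of users assigned to each model, which the cell below does not record; the counts in the example are hypothetical.

```python
from math import erf, sqrt

def two_proportion_z_test(conversions_a, users_a, conversions_b, users_b):
    """Two-sided z-test for a difference in conversion rates (pooled variance)."""
    rate_a = conversions_a / users_a
    rate_b = conversions_b / users_b
    pooled = (conversions_a + conversions_b) / (users_a + users_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / users_a + 1 / users_b))
    z = (rate_b - rate_a) / std_err
    # Standard normal CDF via the error function
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Hypothetical counts: 30 of 500 users bought with model A, 60 of 500 with model B
z, p_value = two_proportion_z_test(30, 500, 60, 500)
print(z, p_value)
```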

In [None]:
random.seed(1234)

wants_to_purchase = ["97360724240", "97363560449", "97363532149",
                     "97360810042"]
might_purchase = ["97361312743", "97363455349", "97361372389"]

def model_a_or_b(query, model_a, model_b):
    """Randomly assign this user to a or b"""
    draw = random.random()
    model_name = model_a if draw < 0.5 else model_b
    
    purchase_made = simulate_user_purchase(query, model_name, 
                                           wants_to_purchase,
                                           might_purchase)
    return (model_name, purchase_made)

number_of_users = 1000
purchases = {"click_model_basic": 0, "click_model_improved": 0}
for _ in range(number_of_users): 
    model_name, purchase_made = model_a_or_b("transformers dvd", 
                                             "click_model_basic",
                                             "click_model_improved")
    if purchase_made:
        purchases[model_name] += 1 
    
purchases

In [None]:
sdbn = sessions_to_sdbn(sessions)
sdbn.loc["transformers dvd"]

## New helper: show the features for each SDBN entry (omitted)

This function shows us the logged features of each training row for the given sdbn data for debugging.

So not just

| query   | doc      | grade
|---------|----------|---------
|transformers dvd | 1234 | 1.0

But also a record of the feature matches that occurred

| query           | doc      | grade    | short_desc_match  | long_desc_match |...
|-----------------|----------|----------|-------------------|-----------------|---
|transformers dvd | 1234     | 1.0      | 0.0               | 1.0             |...

In [None]:
def associate_sdbn_with_features(sdbn, feature_set):
    """Log features alongside sdbn into a dataframe"""
    judgments = sdbn_to_judgments(sdbn)
    judgments_path = "retrotech_judgments.txt"
    write_judgments(judgments, judgments_path)
    
    # For more on this code, review Chapter 10
    requests.delete(f"{SOLR_URL}/products/schema/feature-store/explore")
    
    resp = requests.put(f"{SOLR_URL}/products/schema/feature-store",
                    json=feature_set)

    ftr_logger=FeatureLogger(client, index="products", feature_set="explore", id_field="upc")
    
    with judgments_open(judgments_path) as judgment_list:
        for qid, query_judgments in groupby(judgments, key=lambda j: j.qid):
            ftr_logger.log_for_qid(judgments=query_judgments, 
                                   qid=qid,
                                   keywords=judgment_list.keywords(qid))

    logged_judgments = ftr_logger.logged
    features, predictors = judgments_to_nparray(logged_judgments)
    logged_judgments_dataframe = pd.concat([pd.DataFrame(predictors),
                                            pd.DataFrame(features)], 
                                           axis=1,
                                           ignore_index=True)
    columns = {idx + 2: ftr["name"] for idx, ftr in enumerate(feature_set)}
    columns[0] = "grade"
    columns[1] = "qid"
    
    qid_to_query = {}
    for j in logged_judgments:
        qid_to_query[j.qid] = j.keywords
        
    qid_to_query = pd.DataFrame(list(qid_to_query.items()), columns=["qid", "query"])
    
    logged_judgments_dataframe = logged_judgments_dataframe.rename(columns=columns)
    logged_judgments_dataframe = logged_judgments_dataframe.merge(qid_to_query, how="left", on="qid")
    cols_order = ["query", "grade"] + [ftr["name"] for idx, ftr in enumerate(feature_set)]
    logged_judgments_dataframe["grade"] = logged_judgments_dataframe["grade"] / 10.0 
    return logged_judgments_dataframe[cols_order].sort_values("query")

## Listing 12.5 - Output matches for one feature set

Another way to think about presentation bias is to look at the kinds of documents not being shown to users, so we can strategically show those to them. Below we show the value of each feature in `explore_feature_set` for each `transformers dvd` document in the SDBN judgments.

In [None]:
sdbn = sessions_to_sdbn(sessions)

explore_feature_set = [
{
  "name": "long_desc_match",
  "store": "explore",
  "class": "org.apache.solr.ltr.feature.SolrFeature",
  "params": {"q": "longDescription:(${keywords})^=1"}
},
{
  "name": "short_desc_match",
  "store": "explore",
  "class": "org.apache.solr.ltr.feature.SolrFeature",
  "params": {"q": "shortDescription:(${keywords})^=1"}
},
{
  "name": "name_match",
  "store": "explore",
  "class": "org.apache.solr.ltr.feature.SolrFeature",
  "params": {"q": "name:(${keywords})^=1"}
},
{
  "name": "has_promotion",
  "store": "explore",
  "class": "org.apache.solr.ltr.feature.SolrFeature",
  "params": {"q": "promotion_b:true"}
}]

sdbn_with_features = associate_sdbn_with_features(sdbn, explore_feature_set)
sdbn_with_features[sdbn_with_features["query"] == "transformers dvd"]

## Listing 12.6 - Train Gaussian Process Regressor

We train on just the `transformers dvd` rows of the training data.

NOTE: we could also train on the full SDBN data to see globally what's missing. However, it's often convenient to zero in on specific queries to round out their training data.

In [None]:
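from sklearn.gaussian_process import GaussianProcessRegressor

# Only the `transformers dvd` rows logged in Listing 12.5
transformers_dvds = sdbn_with_features[sdbn_with_features["query"] == "transformers dvd"]

x_train = transformers_dvds[["long_desc_match", "short_desc_match",
                             "name_match", "has_promotion"]]
y_train = transformers_dvds["grade"]

gpr = GaussianProcessRegressor()
gpr.fit(x_train, y_train)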

## Listing 12.7: Predict on every value

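Here `gpr` predicts a grade, and its uncertainty (standard deviation), for every combination of the four binary feature values (2^4 = 16 combinations). This lets us analyze which combination of feature values to use when exploring with users.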
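The candidate grid is just the Cartesian product of 0/1 over the four features. A stdlib sketch of the same enumeration that `pd.MultiIndex.from_product` performs in the cell below (the `candidates` list of dicts is an illustrative stand-in for the `to_explore` dataframe):

```python
from itertools import product

FEATURES = ["long_desc_match", "short_desc_match", "name_match", "has_promotion"]

# Every 0/1 assignment of the four binary features: 2**4 = 16 candidate vectors
candidates = [dict(zip(FEATURES, values))
              for values in product([0, 1], repeat=len(FEATURES))]
print(len(candidates), candidates[0], candidates[-1])
```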

In [None]:
zero_or_one = [0, 1]

index = pd.MultiIndex.from_product(
    [zero_or_one] * 4, names=["long_desc_match", "short_desc_match",
                              "name_match", "has_promotion"])
to_explore = pd.DataFrame(index=index).reset_index()

predictions_with_std = \
    gpr.predict(to_explore[["long_desc_match", "short_desc_match",
                                 "name_match", "has_promotion"]],
                return_std=True)
to_explore["predicted_grade"] = predictions_with_std[0]
to_explore["prediction_stddev"] = predictions_with_std[1]

to_explore.sort_values("prediction_stddev")

## Listing 12.8 - Calculate Expected Improvement


We use [Expected Improvement](https://distill.pub/2020/bayesian-optimization/) scoring to select candidates for exploration within the `transformers dvd` query.
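For a candidate whose predicted grade is normally distributed with mean μ and standard deviation σ, expected improvement over a baseline `best` with exploration margin θ is `opportunity * Φ(z) + σ * φ(z)`, where `opportunity = μ - best - θ` and `z = opportunity / σ`. A stdlib-only sketch (the helper names are hypothetical; the cell below uses the mean SDBN grade as the baseline and `theta = 0.6`). With the same predicted mean, the more uncertain candidate scores higher, which is what pushes exploration toward feature combinations we know little about.

```python
from math import erf, exp, pi, sqrt

def norm_cdf(x):
    """Standard normal CDF (Φ)."""
    return 0.5 * (1 + erf(x / sqrt(2)))

def norm_pdf(x):
    """Standard normal density (φ)."""
    return exp(-0.5 * x * x) / sqrt(2 * pi)

def expected_improvement(mean, std, best, theta=0.6):
    """EI of a candidate whose predicted grade is N(mean, std**2) over `best`."""
    opportunity = mean - best - theta
    if std <= 0:
        return max(opportunity, 0.0)  # no uncertainty: improvement is deterministic
    z = opportunity / std
    return opportunity * norm_cdf(z) + std * norm_pdf(z)

ei_uncertain = expected_improvement(0.5, 0.4, best=0.2)
ei_certain = expected_improvement(0.5, 0.1, best=0.2)
print(ei_uncertain, ei_certain)
```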

In [None]:
from scipy.stats import norm

theta = 0.6
to_explore["opportunity"] = to_explore["predicted_grade"] - \
                            sdbn["grade"].mean() - theta

to_explore["prob_of_improvement"] = \
    norm.cdf(to_explore["opportunity"]) / to_explore["prediction_stddev"]

to_explore["expected_improvement"] = \
    to_explore["opportunity"] * to_explore["prob_of_improvement"] + \
    to_explore["prediction_stddev"] * \
    norm.pdf(to_explore["opportunity"] / to_explore["prediction_stddev"])

to_explore.sort_values("expected_improvement", ascending=False).head()


## Create a query to fetch 'explore' docs (omitted)

Based on the feature vector chosen from the GaussianProcessRegressor predictions (ranked by expected improvement), we create a query to fetch a doc that has those features.

In [None]:
def explore_query(explore_vector, query):
    """Build a Solr query from explore_vector: a feature value of 1 requires a match
       on that feature's field, -1 excludes matches, anything else adds no clause."""
    config_explore = {
        "long_desc_match": {"field": "longDescription", "query_dependent": True},
        "short_desc_match": {"field": "shortDescription", "query_dependent": True},
        "name_match": {"field": "name", "query_dependent": True},
        "long_description_bm25": {"field": "longDescription", "query_dependent": True},
        "manufacturer_match": {"field": "manufacturer", "query_dependent": True},
        "has_promotion": {"field": "promotion_b", "query_dependent": False, "1_value": "true"}
    }
    clauses = []
    for col_name, config in config_explore.items():
        try:
            clause = ""
            if explore_vector[col_name] == 1.0:
                clause = f"+{config['field']}:"
            elif explore_vector[col_name] == -1.0:
                clause = f"-{config['field']}:"
            if len(clause) > 0:
                if config["query_dependent"]:
                    clause += f"({query})"
                else:
                    clause += f"{config['1_value']}"

            clauses.append(clause)
        except KeyError:
            # This feature isn't part of explore_vector, so it adds no clause
            pass

    final_query = " ".join(clauses)
    final_query = final_query.strip()
    if len(final_query) == 0:
        return "*:*"
    return final_query

## Listing 12.9 - Find document to explore from Solr

Here we fetch a random document matching the feature combination with the highest expected improvement (a kind of document missing from our training data) so we can show it to users.

In [None]:
random.seed(1234)

products_collection = engine.get_collection("products")
fields = ["long_desc_match", "short_desc_match",
          "name_match", "has_promotion"]
explore_vector = to_explore.sort_values("expected_improvement",
                                        ascending=False) \
                            .head().iloc[0][fields]

def explore(collection, query, explore_vector):
    """ Explore according to the provided explore vector, select
        a random doc from that group."""
    draw = random.random()
    q = explore_query(explore_vector, query)
    request = {
        "fields": ["upc", "name", "manufacturer", "score"],
        "limit": 1,
        "params": {"q": q, "sort": f"random_{draw} DESC"}
    }
    
    response = collection.search(request)
    return engine.docs_from_response(response)[0]["upc"]

explore(products_collection, "transformers dvd", explore_vector)

## Simulate new sessions with the new data

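(This takes a while to run.)

We simulate 500 new `transformers dvd` sessions by copying a random existing session and swapping the explored document into rank 2. If the explored UPC is in `wants_to_purchase`, the simulated user clicks it with probability 0.8; if it is in `might_purchase`, with probability 0.5; any other document is clicked with probability 0.01.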

In [None]:
import random
random.seed(1234)

wants_to_purchase = ["97360724240", "97363560449", "97363532149", "97360810042", "97368920347"]
might_purchase = ["97361312743", "97363455349", "97361372389"]
explore_on_rank = 2.0

with_explore_sessions = sessions.copy()
for i in range(0, 500):
    # Pick a document to explore and log it
    explore_upc = explore(products_collection, "transformers dvd", explore_vector)
    print(i, explore_upc)

    # Use a random existing "transformers dvd" session as a template
    sess_ids = list(set(sessions[sessions["query"] == "transformers dvd"]["sess_id"].tolist()))
    random.shuffle(sess_ids)
    new_session = sessions[sessions["sess_id"] == sess_ids[0]].copy()
    new_session["sess_id"] = 100000 + i

    # Swap the explored doc into the explore slot, unclicked by default
    explore_slot = new_session["rank"] == explore_on_rank
    new_session.loc[explore_slot, "doc_id"] = explore_upc
    new_session.loc[explore_slot, "clicked"] = False

    # Hidden user preference: click probability depends on the doc
    if explore_upc in wants_to_purchase:
        click_prob = 0.8
    elif explore_upc in might_purchase:
        click_prob = 0.5
    else:
        click_prob = 0.01
    if random.random() < click_prob:
        print(f"click {explore_upc}")
        new_session.loc[explore_slot, "clicked"] = True

    with_explore_sessions = pd.concat([with_explore_sessions, new_session])

with_explore_sessions[with_explore_sessions["sess_id"] == 100049]
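The simulated user in this loop reduces to one Bernoulli draw per explored document, with a click probability that depends on which hidden-preference list the UPC belongs to. Here's a minimal standalone sketch of just that click model; the helper `simulated_click` and the toy UPCs are hypothetical, and the check simply confirms that the empirical click rates land near 0.8, 0.5, and 0.01:

```python
import random

def simulated_click(upc, wants_to_purchase, might_purchase, rng):
    """Hypothetical helper: one Bernoulli click draw for an explored doc."""
    if upc in wants_to_purchase:
        click_prob = 0.8   # users who want this product usually click it
    elif upc in might_purchase:
        click_prob = 0.5   # users who might buy it click half the time
    else:
        click_prob = 0.01  # everything else is almost never clicked
    return rng.random() < click_prob

# Toy UPCs standing in for the real preference lists
rng = random.Random(42)
wants, might = {"want-1"}, {"maybe-1"}
trials = 10_000
click_rates = {
    upc: sum(simulated_click(upc, wants, might, rng) for _ in range(trials)) / trials
    for upc in ["want-1", "maybe-1", "other-1"]
}
print(click_rates)
```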

## Listing 12.10 - Update judgments from new sessions

Have we added any new docs that appear to be getting more clicks?

In [None]:
reimproved_sdbn = sessions_to_sdbn(with_explore_sessions)
reimproved_sdbn.loc['transformers dvd']
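`sessions_to_sdbn` wraps up the chapter 11 click model. As a reminder of the idea, here is a minimal sketch of SDBN-style grading with a Bayesian prior. It is a hypothetical helper (`sdbn_sketch`), not the notebook's `sessions_to_sdbn`, and it assumes that users examine every result down to their last click, that sessions without clicks carry no signal, and that `prior_weight` pseudo-observations at `prior_grade` are blended in (echoing the parameter names used later in this notebook):

```python
import pandas as pd

def sdbn_sketch(sessions, prior_weight=10, prior_grade=0.2):
    """Hypothetical SDBN-style grades per (query, doc_id)."""
    # Rank of the last click in each session
    last_click = (sessions[sessions["clicked"]]
                  .groupby("sess_id")["rank"].max()
                  .rename("last_click_rank")
                  .reset_index())
    # Results at or above the last click count as examined; sessions
    # without any click drop out of the inner join
    examined = sessions.merge(last_click, on="sess_id", how="inner")
    examined = examined[examined["rank"] <= examined["last_click_rank"]]

    stats = examined.groupby(["query", "doc_id"]).agg(
        clicks=("clicked", "sum"), examined=("clicked", "size"))
    # Blend observed click-through with the prior pseudo-counts
    stats["grade"] = ((stats["clicks"] + prior_weight * prior_grade)
                      / (stats["examined"] + prior_weight))
    return stats

# Two toy sessions for one query
toy = pd.DataFrame({
    "sess_id": [1, 1, 1, 2, 2, 2],
    "query":   ["q"] * 6,
    "doc_id":  ["a", "b", "c", "a", "b", "c"],
    "rank":    [1, 2, 3, 1, 2, 3],
    "clicked": [False, True, False, True, False, False],
})
grades = sdbn_sketch(toy)
print(grades)
```

Document `c` never appears because it sits below the last click in both sessions, so under this assumption it was never examined.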

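## New heavily clicked doc is promoted!

The explored document below, UPC 97360810042, is one of the products in `wants_to_purchase`. Its Solr record has `promotion_b` set to `true`: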

```
      {
        "upc":"97360810042",
        "name":"Transformers: Dark of the Moon - Blu-ray Disc",
        "name_ngram":"Transformers: Dark of the Moon - Blu-ray Disc",
        "name_omit_norms":"Transformers: Dark of the Moon - Blu-ray Disc",
        "name_txt_en_split":"Transformers: Dark of the Moon - Blu-ray Disc",
        "manufacturer":"\\N",
        "shortDescription":"\\N",
        "longDescription":"\\N",
        "promotion_b":true,
        "id":"72593b1c-313b-4f25-a4f2-04eae29d858b",
        "_version_":1710117636920049669
      },
```

## Listing 12.11 - Rebuild model using updated judgments

After showing the new document to users, we can rebuild the model using judgments that cover this feature blindspot.

In [None]:
random.seed(1234)

# {'blue ray': 0.0,
# 'dryer': 0.07068309073137659,
# 'headphones': 0.06426395939086295,
# 'dark of moon': 0.25681268708548055,
# 'transformers dvd': 0.10077083021678328}

feature_set_reimproved = [
{
  "name": "name_fuzzy",
  "store": "aips_feature_store",
  "class": "org.apache.solr.ltr.feature.SolrFeature",
  "params": {"q" : "name_ngram:(${keywords})"}
},
{
  "name": "name_pf2",
  "store": "aips_feature_store",
  "class": "org.apache.solr.ltr.feature.SolrFeature",
  "params": {"q": "{!edismax qf=name name pf2=name}(${keywords})"
  }
},
{
  "name": "shortDescription_pf2",
  "store": "aips_feature_store",
  "class": "org.apache.solr.ltr.feature.SolrFeature",
  "params": {
    "q": "{!edismax qf=shortDescription pf2=shortDescription}(${keywords})"
  }
},
{
  "name": "has_promotion",
  "store": "aips_feature_store",
  "class": "org.apache.solr.ltr.feature.SolrFeature",
  "params": {"q": "promotion_b:true^=1.0"}
}]

# Train on the updated judgments, which now cover the explored documents
train, test = test_train_split(reimproved_sdbn, train=0.8)
ranksvm_ltr(train, "click_model_reimproved", feature_set_reimproved)
eval_model(test, "click_model_reimproved", sdbn=reimproved_sdbn)
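`ranksvm_ltr` wraps up the chapter 10 training step. The core idea of RankSVM is a pairwise transform: for every pair of documents judged for the same query with different grades, train a linear classifier on the difference of their feature vectors. Here's a minimal standalone sketch using scikit-learn's `LinearSVC` on toy data; the helper `pairwise_transform` and the numbers are hypothetical, and this is not the notebook's `ranksvm_ltr`:

```python
import itertools
import numpy as np
from sklearn.svm import LinearSVC

def pairwise_transform(X, grades, query_ids):
    """Turn graded docs into (feature difference, +1/-1) training pairs."""
    diffs, labels = [], []
    for i, j in itertools.combinations(range(len(grades)), 2):
        # Only compare docs from the same query with different grades
        if query_ids[i] != query_ids[j] or grades[i] == grades[j]:
            continue
        diffs.append(X[i] - X[j])
        labels.append(1 if grades[i] > grades[j] else -1)
    return np.array(diffs), np.array(labels)

# Toy judgments: feature 0 separates good docs from bad ones
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]])
grades = np.array([0.9, 0.1, 0.8, 0.05])
query_ids = ["q", "q", "q", "q"]

X_pairs, y_pairs = pairwise_transform(X, grades, query_ids)
model = LinearSVC(fit_intercept=False).fit(X_pairs, y_pairs)
weights = model.coef_[0]
print(weights)
```

The learned weights concentrate on the first feature, the one that separates high-grade from low-grade documents here; ranking a document then amounts to a dot product with `weights`.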

## Listing 12.12 - Rerun A/B test on the new `click_model_reimproved` model

In [None]:
number_of_users = 1000
purchases = {"click_model_basic": 0, "click_model_reimproved": 0}
for _ in range(0, number_of_users):
    # Send each simulated user to model A or model B and record purchases
    model_name, purchase_made = model_a_or_b("transformers dvd",
                                             "click_model_basic",
                                             "click_model_reimproved")
    if purchase_made:
        purchases[model_name] += 1

purchases
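Raw purchase counts alone don't tell us whether the gap between the two models could be chance. One common check, not part of the original notebook, is a two-proportion z-test on each model's purchase rate. This sketch assumes you also record how many users each model served (the loop above only counts purchases), and the counts in the usage example are hypothetical, not results from this simulation:

```python
from math import sqrt
from scipy.stats import norm

def two_proportion_z_test(purchases_a, users_a, purchases_b, users_b):
    """Two-sided z-test for a difference in purchase rates between A and B."""
    rate_a = purchases_a / users_a
    rate_b = purchases_b / users_b
    # Pooled rate under the null hypothesis that both models convert equally
    pooled = (purchases_a + purchases_b) / (users_a + users_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / users_a + 1 / users_b))
    z = (rate_b - rate_a) / std_err
    p_value = 2 * norm.sf(abs(z))  # two-sided tail probability
    return z, p_value

# Hypothetical counts: 50 of 500 users bought with model A, 80 of 500 with B
z, p_value = two_proportion_z_test(50, 500, 80, 500)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```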

## Listings 12.6-12.8 in one function (omitted)

We wrap the core of the active learning steps from this chapter into a single function, `best_explore_candidate`, which returns the feature combination with the highest expected improvement: the kind of document we should explore next.

In [None]:
from sklearn.gaussian_process import GaussianProcessRegressor
from scipy.stats import norm


def best_explore_candidate(sdbn, feature_set, theta=0.6):
    """Return the feature combination with the highest expected improvement."""
    # Replace the "explore" feature store in Solr with the given feature set
    requests.delete(f"{SOLR_URL}/products/schema/feature-store/explore")
    requests.put(f"{SOLR_URL}/products/schema/feature-store",
                 json=feature_set)

    # Log feature values for each judged "transformers dvd" doc
    sdbn_ftrs = associate_sdbn_with_features(sdbn, feature_set)
    transformers_dvds = sdbn_ftrs[sdbn_ftrs["query"] == "transformers dvd"]

    y_train = transformers_dvds["grade"]
    feature_names = [ftr["name"] for ftr in feature_set]
    x_train = transformers_dvds[feature_names]

    # Fit a Gaussian process predicting grade (and its uncertainty)
    gpr = GaussianProcessRegressor()
    gpr.fit(x_train, y_train)

    # Enumerate every 0/1 combination of the features as candidates
    zero_or_one = [0, 1]
    index = pd.MultiIndex.from_product([zero_or_one] * len(feature_names),
                                       names=feature_names)
    to_explore = pd.DataFrame(index=index).reset_index()

    predicted_grade, prediction_stddev = gpr.predict(to_explore[feature_names],
                                                     return_std=True)
    to_explore["predicted_grade"] = predicted_grade
    to_explore["prediction_stddev"] = prediction_stddev

    # Expected improvement over the mean grade, with exploration margin theta
    to_explore["opportunity"] = to_explore["predicted_grade"] - sdbn["grade"].mean() - theta
    z = to_explore["opportunity"] / to_explore["prediction_stddev"]
    to_explore["prob_of_improvement"] = norm.cdf(z)
    to_explore["expected_improvement"] = (
        to_explore["opportunity"] * to_explore["prob_of_improvement"]
        + to_explore["prediction_stddev"] * norm.pdf(z))

    # Return the feature values of the highest-EI candidate
    best = to_explore.sort_values("expected_improvement", ascending=False).iloc[0]
    return best[feature_names]


explore_feature_set = [
    {
      "name" : "manufacturer_match",
      "store": "explore",
      "class" : "org.apache.solr.ltr.feature.SolrFeature",
      "params" : {
        "q" : "manufacturer:(${keywords})^=1"
      }
    },
    {
      "name" : "name_fuzzy",
      "store": "explore",
      "class" : "org.apache.solr.ltr.feature.SolrFeature",
      "params" : {
        "q" : "name_ngram:(${keywords})"
      }
    },
    {
      "name" : "long_description_bm25",
      "store": "explore",
      "class" : "org.apache.solr.ltr.feature.SolrFeature",
      "params" : {
        "q" : "longDescription:(${keywords})"
      }
    },
    {
      "name" : "short_description_constant",
      "store": "explore",
      "class" : "org.apache.solr.ltr.feature.SolrFeature",
      "params" : {
        "q" : "shortDescription:(${keywords})^=1"
      }
    }]



best_explore_candidate(sdbn, explore_feature_set)
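Why does this favor feature combinations missing from the judgments? A Gaussian process is confident near the points it was trained on and uncertain far from them, so unseen combinations get a large `prediction_stddev`, which raises their expected improvement. The toy sketch below uses synthetic binary features and grades, not the notebook's data: it trains scikit-learn's `GaussianProcessRegressor` only on combinations where the third feature is 0, then predicts all eight combinations:

```python
import itertools
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

# All 8 combinations of three binary features
candidates = np.array(list(itertools.product([0, 1], repeat=3)), dtype=float)

# Train only where the third feature is 0, leaving a "blind spot"
seen_mask = candidates[:, 2] == 0
x_train = candidates[seen_mask]
y_train = np.array([0.1, 0.3, 0.2, 0.4])  # synthetic grades

gpr = GaussianProcessRegressor()  # default kernel with fixed hyperparameters
gpr.fit(x_train, y_train)
predicted_grade, prediction_stddev = gpr.predict(candidates, return_std=True)

for combo, std in zip(candidates.astype(int), prediction_stddev):
    print(combo, round(float(std), 3))
```

The four seen combinations come back with near-zero standard deviation, while every combination with the third feature set to 1 is markedly uncertain; those are exactly the candidates expected improvement pushes us to explore.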

## Listing 12.13 - Fully Automated LTR Loop

These lines expand Listing 12.13 from the book (the book shows a truncated form of the code below). You could run these steps in a loop, continually trying new features, to get closer to a generalized ranking solution that captures what users actually want.

In [None]:
exploit_feature_set = [
{
  "name": "name_fuzzy",
  "store": "exploit_store",
  "class": "org.apache.solr.ltr.feature.SolrFeature",
  "params": {"q": "name_ngram:(${keywords})"}
},
{
  "name": "long_description_bm25",
  "store": "exploit_store",
  "class": "org.apache.solr.ltr.feature.SolrFeature",
  "params": {"q": "longDescription:(${keywords})"}
},
{
  "name": "short_description_constant",
  "store": "exploit_store",
  "class": "org.apache.solr.ltr.feature.SolrFeature",
  "params": {"q": "shortDescription:(${keywords})^=1"}
}]

train, test = test_train_split(sdbn, train=0.8) 
ranksvm_ltr(train, "exploit_model", exploit_feature_set)
eval_model(test, "exploit_model", sdbn=reimproved_sdbn)

# ===============
# EXPLORE

explore_feature_set = [
{
  "name": "manufacturer_match",
  "store": "explore",
  "class": "org.apache.solr.ltr.feature.SolrFeature",
  "params": {"q": "manufacturer:(${keywords})^=1"}
},
{
  "name": "name_fuzzy",
  "store": "explore",
  "class": "org.apache.solr.ltr.feature.SolrFeature",
  "params": {"q": "name_ngram:(${keywords})"}
},
{
  "name": "long_description_bm25",
  "store": "explore",
  "class": "org.apache.solr.ltr.feature.SolrFeature",
  "params": {"q": "longDescription:(${keywords})"}
},
{
  "name": "short_description_constant",
  "store": "explore",
  "class": "org.apache.solr.ltr.feature.SolrFeature",
  "params": {"q": "shortDescription:(${keywords})^=1"}
}]

explore_vector = best_explore_candidate(sdbn, explore_feature_set, theta=0.6)
explore_upc = explore(products_collection, "transformers dvd", explore_vector)


# =========
# GATHER                                   
sdbn = sessions_to_sdbn(sessions,            
                        prior_weight=10,    
                        prior_grade=0.2)    
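Run repeatedly, the three stages above (exploit, explore, gather) form a simple active learning cycle. Here's a minimal skeleton of that control flow with each stage passed in as a function; `active_learning_loop` and the stub stages in the usage example are hypothetical placeholders, not functions from this notebook:

```python
from typing import Callable, List

def active_learning_loop(exploit: Callable[[object], object],
                         explore: Callable[[object], str],
                         gather: Callable[[str], object],
                         judgments: object,
                         rounds: int) -> List[str]:
    """Hypothetical driver alternating exploit, explore, and gather."""
    explored_docs = []
    for _ in range(rounds):
        exploit(judgments)            # retrain the production model
        doc_id = explore(judgments)   # choose a doc from a feature blind spot
        judgments = gather(doc_id)    # show it, log sessions, rebuild judgments
        explored_docs.append(doc_id)
    return explored_docs

# Stub stages that only record the order in which they run
calls = []

def stub_exploit(judgments):
    calls.append("exploit")

def stub_explore(judgments):
    return f"doc-{judgments}"

def stub_gather(doc_id):
    calls.append("gather")
    return len(calls)  # stand-in for an updated judgment set

explored = active_learning_loop(stub_exploit, stub_explore, stub_gather,
                                judgments=0, rounds=3)
print(explored, calls)
```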

Up next: [Chapter 13: Semantic Search with Dense Vectors](../ch13/1.setting-up-the-outdoors-dataset.ipynb)