# Train a Model, Evaluate, and Use in the Search Engine

In these last few steps, we train a model on the pairwise training set generated [in the previous step](3.pairwise-transform.ipynb), evaluate it on held-out queries, and upload it to the search engine.

In [1]:
from itertools import groupby
import numpy
import random
import sys
sys.path.append('../..')
from aips import *
import json

engine = get_engine()
tmdb_collection = engine.get_collection("tmdb")
ltr = get_ltr_engine(tmdb_collection)

## Reload judgments & training set

Load the dataset generated [from the previous section](3.pairwise-transform.ipynb).

In [2]:
from ltr.judgments import judgments_open

predictor_deltas = numpy.load("data/predictor_deltas.npy")
feature_data = numpy.load("data/feature_data.npy")

std_devs = feature_data[-1]
means = feature_data[-2]
feature_deltas = feature_data[:-2]

normed_judgments = []
with judgments_open("data/normed_judgments.txt") as judg_list:
    for j in judg_list:
        normed_judgments.append(j)

Parsing QID 100
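
The slicing above implies a particular layout for `feature_data`: the pairwise feature deltas come first, followed by one row of per-feature means and one row of per-feature standard deviations. A minimal sketch of that layout, using plain Python lists and invented numbers (the real file is a NumPy array written by the previous notebook, which is not shown here):

```python
# Hypothetical stand-in for the contents of data/feature_data.npy.
# All numbers are invented for illustration.
feature_data = [
    [0.5, -1.2, 0.3],    # feature delta for pair 1
    [-0.5, 1.2, -0.3],   # feature delta for pair 2
    [0.7, 0.6, 1993.0],  # second-to-last row: per-feature means
    [1.6, 1.5, 20.0],    # last row: per-feature standard deviations
]

std_devs = feature_data[-1]
means = feature_data[-2]
feature_deltas = feature_data[:-2]

print(len(feature_deltas), means, std_devs)
```

Keeping the normalization statistics alongside the deltas means that any new document can later be normalized the same way the training data was.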


## Listing 10.12

Train the model with the fully transformed dataset

In [3]:
random.seed(0)

from sklearn import svm
model = svm.LinearSVC(max_iter=10000)
model.fit(feature_deltas, predictor_deltas)
display(model.coef_[0])

array([0.37611453, 0.26665264, 0.09971659])
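
Training a classifier on feature differences gives us a ranking function because a linear score with no intercept term distributes over subtraction: the weighted sum of `a - b` is positive exactly when `a` scores higher than `b`. A small sketch of that identity, with invented weights and feature vectors (not the trained values above):

```python
def dot(weights, features):
    """Linear score: sum of weight * feature."""
    return sum(w * f for w, f in zip(weights, features))

# Hypothetical weights and normalized feature vectors for two documents.
weights = [0.4, 0.3, 0.1]
doc_a = [2.0, 1.0, -0.5]
doc_b = [-0.4, -0.4, 0.2]

delta = [a - b for a, b in zip(doc_a, doc_b)]

# Classifying the delta as positive is the same as scoring doc_a above doc_b.
delta_is_positive = dot(weights, delta) > 0
a_outranks_b = dot(weights, doc_a) > dot(weights, doc_b)
print(delta_is_positive, a_outranks_b)
```

This is why the learned coefficients can be reused directly as per-feature weights when scoring a single document.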

## A few sample features (omitted from book)

We gather features for two movies, "Star Trek II: The Wrath of Khan" and "Star Trek III: The Search for Spock", to kick the tires of our model.

In [15]:
# If you want to confirm Wrath of Khan's features against the search engine

ids = ["154"]  # document to log features for (Wrath of Khan)
options = {"keywords": "wrath of khan"}
response = ltr.get_logged_features("movie_model", ids, options=options, fields=["id", "title"])

# Features from the search engine
# Wrath of Khan
wok_features = [5.9217176, 3.401492, 1982.0]
# Search For Spock
spock_features = [0.0,0.0,1984.0]

# Wrath of Khan
normed_wok_features = [0, 0, 0]
for idx, f in enumerate(wok_features):
    normed_wok_features[idx] = (f - means[idx]) / std_devs[idx]

normed_spock_features = [0, 0, 0]
for idx, f in enumerate(spock_features):
    normed_spock_features[idx] = (f - means[idx]) / std_devs[idx]
    
display(normed_spock_features)

[-0.4300788728590096, -0.4553915319569039, -0.44511783129197596]
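
The two loops above apply the standard-score (z-score) transform: subtract the feature's mean, then divide by its standard deviation. A compact sketch of the same arithmetic follows. The means and standard deviations are copied from the model output printed under Listing 10.16 further down, on the assumption that they are the same `means` and `std_devs` loaded at the top of this notebook.

```python
# Means and standard deviations as printed in the Listing 10.16 model output.
means = [0.7240581864747018, 0.6904480748206504, 1993.045660222131]
std_devs = [1.683547442499148, 1.5161636226604036, 20.32194530575292]

# Raw features: [title_bm25, overview_bm25, release_year]
spock_features = [0.0, 0.0, 1984.0]

normed_spock_features = [
    (f - mean) / std
    for f, mean, std in zip(spock_features, means, std_devs)
]
print(normed_spock_features)
```

A document with no title or overview match still ends up with nonzero normalized values, because a raw score of 0 sits below the mean of each BM25 feature.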

## Taking the model for a test drive... (omitted from book)

Here we score a few documents with the model. This code is omitted from the book, but is explored in section 10.6.2.

In [16]:
def score_one(features, model):
    score = 0.0
    for idx, f in enumerate(features):
        this_coef = model.coef_[0][idx].item()
        score += f * this_coef
    
    return score

def rank(query_judgments, model):
    for j in query_judgments:
        j.score = score_one(j.features, model)
    
    return sorted(query_judgments, key=lambda j: j.score, reverse=True)

score_one(normed_spock_features, model)

-0.3050363943771231

Wrath of Khan should score higher:

In [17]:
score_one(normed_wok_features, model)

1.4957538659013554
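
`score_one` is a dot product between the normalized features and the model's weights. The sketch below repeats that arithmetic without a model object. One assumption to flag: the weights are copied from the Listing 10.14 output rather than Listing 10.12, because those are the values that reproduce the Search for Spock score of roughly -0.305 printed above. That suggests the cells were last run out of order.

```python
# Weights as printed under Listing 10.14 (assumed to be the model in memory
# when the two scores above were produced).
weights = [0.34455363, 0.26599357, 0.08024811]

# Normalized Search for Spock features, as displayed earlier.
normed_spock_features = [-0.4300788728590096,
                         -0.4553915319569039,
                         -0.44511783129197596]

def linear_score(features, weights):
    """Weighted sum of features; the same arithmetic as score_one."""
    return sum(f * w for f, w in zip(features, weights))

spock_score = linear_score(normed_spock_features, weights)
print(round(spock_score, 4))
```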

## Listing 10.13 - Test/training split

In [18]:
random.seed(1234)

In [19]:
all_qids = list(set([j.qid for j in normed_judgments]))
random.shuffle(all_qids)
proportion_test = 0.1  # fraction of queries held out for testing

split_index = int(len(all_qids) * proportion_test)
test_qids = all_qids[:split_index]
train_qids = all_qids[split_index:]

train_data, test_data= [], []
for j in normed_judgments:
    if j.qid in train_qids:
        train_data.append(j)
    elif j.qid in test_qids:
        test_data.append(j)
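
The split is made over query ids rather than individual judgments, so every judgment for a given query lands on one side only. A self-contained sketch of the same idea on invented data. `Judgment` here is a hypothetical, simplified record, and the held-out fraction is named `proportion_test` in this sketch:

```python
import random
from collections import namedtuple

# Hypothetical, simplified judgment record; the real objects come from ltr.judgments.
Judgment = namedtuple("Judgment", ["qid", "doc_id", "grade"])

# Ten made-up queries with three judged documents each.
judgments = [Judgment(qid, f"doc{qid}_{n}", n % 2)
             for qid in range(1, 11) for n in range(3)]

random.seed(1234)
all_qids = sorted({j.qid for j in judgments})
random.shuffle(all_qids)

proportion_test = 0.1
split_index = int(len(all_qids) * proportion_test)
test_qids = set(all_qids[:split_index])    # sets give fast membership checks
train_qids = set(all_qids[split_index:])

train_data = [j for j in judgments if j.qid in train_qids]
test_data = [j for j in judgments if j.qid in test_qids]

# No query contributes judgments to both sides of the split.
print(len(train_qids), len(test_qids), train_qids.isdisjoint(test_qids))
```

Splitting by query avoids leaking pairs from a test query into training, which would make the evaluation look better than it is.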

## Repeated from earlier - pairwise transform

You've already seen this code in the third notebook, so you can move on. We just need it here to do a pairwise_transform of the training data to train a model.

In [20]:
import numpy

def pairwise_transform(normed_judgments):
        
    from itertools import groupby
    predictor_deltas = []
    feature_deltas = []
    
    # For each query's judgments
    for qid, query_judgments in groupby(normed_judgments, key=lambda j: j.qid):

        # groupby yields a one-shot iterator, so we materialize it into
        # two full lists that can each be looped over
        query_judgments_copy_1 = list(query_judgments) 
        query_judgments_copy_2 = list(query_judgments_copy_1)

        # Examine every judgment combo for this query, 
        # if they're different, store the pairwise difference:
        # +1 if judgment1 more relevant
        # -1 if judgment2 more relevant
        for judgment1 in query_judgments_copy_1:
            for judgment2 in query_judgments_copy_2:
                
                j1_features=numpy.array(judgment1.features)
                j2_features=numpy.array(judgment2.features)
                
                if judgment1.grade > judgment2.grade:
                    predictor_deltas.append(+1)
                    feature_deltas.append(j1_features - j2_features)
                elif judgment1.grade < judgment2.grade:
                    predictor_deltas.append(-1)
                    feature_deltas.append(j1_features - j2_features)

    # For training purposes, we return these as numpy arrays
    return numpy.array(feature_deltas), numpy.array(predictor_deltas)
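
To see what the transform produces, here is a pure-Python sketch on one invented query with three judged documents. It uses plain lists instead of NumPy arrays, and `Judgment` is a hypothetical, simplified record:

```python
from collections import namedtuple
from itertools import groupby

# Hypothetical, simplified judgment record used only for this illustration.
Judgment = namedtuple("Judgment", ["qid", "grade", "features"])

judgments = [
    Judgment(qid=1, grade=1, features=[2.0, 1.0]),
    Judgment(qid=1, grade=0, features=[0.5, 0.0]),
    Judgment(qid=1, grade=0, features=[0.0, 0.5]),
]

feature_deltas, predictor_deltas = [], []
for qid, group in groupby(judgments, key=lambda j: j.qid):
    group = list(group)  # materialize so the group can be looped over twice
    for j1 in group:
        for j2 in group:
            if j1.grade == j2.grade:
                continue  # equal grades carry no preference
            delta = [a - b for a, b in zip(j1.features, j2.features)]
            feature_deltas.append(delta)
            predictor_deltas.append(+1 if j1.grade > j2.grade else -1)

for delta, label in zip(feature_deltas, predictor_deltas):
    print(label, delta)
```

The two documents with equal grades produce no pair, and every remaining pair appears twice with opposite signs. Note that `groupby` only groups consecutive items, so the judgments must already be ordered by `qid`.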

## Listing 10.14 - train on just train data

We repeat the model training process on only the training subset of the queries. Because our test/training split is at the query level, we must repeat the pairwise transform from earlier, this time on just the training judgments.

In [21]:
train_feature_deltas, train_predictor_deltas = pairwise_transform(train_data)

from sklearn import svm
model = svm.LinearSVC(max_iter=10000, verbose=1)
model.fit(train_feature_deltas, train_predictor_deltas)
display(model.coef_[0])

[LibLinear]

array([0.34455363, 0.26599357, 0.08024811])

## Listing 10.15 - eval model on test data

Here we compute a simple precision metric (the proportion of relevant results in the top `k`), averaged over all the test queries. Note that this is not a very robust statistical analysis of the model's quality. For that, we would want to draw multiple test/training samples and test for statistical significance between this experiment and a baseline.

In [22]:
def evaluate_model(test_data, model, k=5):
    total_precision = 0
    unique_queries = groupby(test_data, lambda j: j.qid)
    num_groups = 0
    for qid, query_judgments in unique_queries:
        num_groups += 1
        ranked = rank(list(query_judgments), model)
        total_relevant = len([j for j in ranked[:k] if j.grade == 1])
        total_precision += total_relevant / float(k)
    return total_precision / num_groups

evaluation = evaluate_model(test_data, model)
print(evaluation)

0.28
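
As a worked example of the metric, the sketch below computes precision at `k` for two invented queries whose results are already sorted by model score, then averages them. The helper name `precision_at_k` is ours and does not appear in the notebook's code:

```python
def precision_at_k(ranked_grades, k=5):
    """Fraction of the top k results whose grade is 1 (relevant)."""
    relevant = sum(1 for grade in ranked_grades[:k] if grade == 1)
    return relevant / float(k)

# Made-up grades for two queries, already sorted by model score (best first).
query_1 = [1, 0, 1, 0, 0, 1]   # two relevant results in the top 5
query_2 = [0, 0, 0, 1, 0]      # one relevant result in the top 5

per_query = [precision_at_k(q) for q in (query_1, query_2)]
mean_precision = sum(per_query) / len(per_query)
print(per_query, mean_precision)
```

As in `evaluate_model`, the denominator is always `k`, so a query with fewer than `k` judged documents can never reach a precision of 1.0.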


## Listing 10.16 - A search engine LTR model

This turns the model into one usable by the search engine. The uploaded model contains:

- The weights for each (normalized) feature
- The means used to normalize each feature
- The standard deviations used to normalize each feature

In [23]:
model_name = "movie_model"
feature_names = ["title_bm25", "overview_bm25", "release_year"]
linear_model = ltr.generate_model(model_name, feature_names,
                                  means, std_devs, model.coef_[0])
response = ltr.upload_model(linear_model)
display(linear_model)

{'store': 'movie_model',
 'class': 'org.apache.solr.ltr.model.LinearModel',
 'name': 'movie_model',
 'features': [{'name': 'title_bm25',
   'norm': {'class': 'org.apache.solr.ltr.norm.StandardNormalizer',
    'params': {'avg': '0.7240581864747018', 'std': '1.683547442499148'}}},
  {'name': 'overview_bm25',
   'norm': {'class': 'org.apache.solr.ltr.norm.StandardNormalizer',
    'params': {'avg': '0.6904480748206504', 'std': '1.5161636226604036'}}},
  {'name': 'release_year',
   'norm': {'class': 'org.apache.solr.ltr.norm.StandardNormalizer',
    'params': {'avg': '1993.045660222131', 'std': '20.32194530575292'}}}],
 'params': {'weights': {'title_bm25': 0.34455363381155396,
   'overview_bm25': 0.2659935690511117,
   'release_year': 0.08024810647908147}}}
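
As we understand it, a linear model with standard normalizers scores a document by normalizing each raw feature with its `avg` and `std`, then taking the weighted sum. The sketch below applies the parameters printed above to the raw Wrath of Khan features from earlier in this notebook. It illustrates the arithmetic only and is not the search engine's own code. The `model_params` layout and the function name are ours.

```python
# Abbreviated copy of the model shown above: feature name -> (avg, std, weight).
model_params = {
    "title_bm25": (0.7240581864747018, 1.683547442499148, 0.34455363381155396),
    "overview_bm25": (0.6904480748206504, 1.5161636226604036, 0.2659935690511117),
    "release_year": (1993.045660222131, 20.32194530575292, 0.08024810647908147),
}

def linear_model_score(raw_features, params):
    """Standard-normalize each raw feature, then take the weighted sum."""
    score = 0.0
    for name, value in raw_features.items():
        avg, std, weight = params[name]
        score += weight * (value - avg) / std
    return score

# Raw Wrath of Khan features from earlier in this notebook.
wok = {"title_bm25": 5.9217176, "overview_bm25": 3.401492, "release_year": 1982.0}
wok_score = linear_model_score(wok, model_params)
print(round(wok_score, 3))
```

The result agrees with the Wrath of Khan score computed by `score_one` earlier, which is a useful sanity check that the uploaded parameters describe the same model we tested in Python.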

## Listing 10.17 - Search with the trained LTR model

Executing a search ranked with the LTR model (expensive).

In [24]:
request = {"query": "harry potter",
           "query_fields": ["title", "overview"],
           "return_fields": ["title", "id", "score"]}
response = ltr.search_with_model("movie_model", **request)
print("\nReturned Documents:")
display(response["docs"])


Returned Documents:


[{'id': '570724', 'title': 'The Story of Harry Potter', 'score': 2.1900952},
 {'id': '116972',
  'title': 'Discovering the Real World of Harry Potter',
  'score': 2.0560164},
 {'id': '672',
  'title': 'Harry Potter and the Chamber of Secrets',
  'score': 1.8415655},
 {'id': '671',
  'title': "Harry Potter and the Philosopher's Stone",
  'score': 1.8207769},
 {'id': '54507', 'title': 'A Very Potter Musical', 'score': 1.80214}]

## Listing 10.18 - A baseline search reranked with the model

Issuing a simple lexical query, then reranking the top 500 documents with the LTR model (optimized).

In [25]:
request = {"query": "harry potter",
           "query_fields": ["title", "overview"],
           "return_fields": ["title", "id", "score"],
           "rerank_query": "harry potter",
           "rerank_count": 500}
response = ltr.search_with_model("movie_model", **request)
print("\nReturned Documents:")
display(response["docs"])


Returned Documents:


[{'id': '570724', 'title': 'The Story of Harry Potter', 'score': 2.1900952},
 {'id': '116972',
  'title': 'Discovering the Real World of Harry Potter',
  'score': 2.0560164},
 {'id': '672',
  'title': 'Harry Potter and the Chamber of Secrets',
  'score': 1.8415655},
 {'id': '671',
  'title': "Harry Potter and the Philosopher's Stone",
  'score': 1.8207769},
 {'id': '54507', 'title': 'A Very Potter Musical', 'score': 1.80214}]
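
Conceptually, reranking means that a cheap baseline query chooses and orders the candidates, and the model only rescores the first `rerank_count` of them. The sketch below shows that idea on invented hits. `rerank_top_n` and the hit dictionaries are hypothetical, and this is not how the search engine implements the feature internally.

```python
def rerank_top_n(baseline_hits, model_score, n):
    """Re-sort the first n baseline hits by model score; leave the rest as-is."""
    head = sorted(baseline_hits[:n], key=model_score, reverse=True)
    return head + baseline_hits[n:]

# Made-up hits, already ordered by a cheap baseline (e.g. lexical) score.
baseline_hits = [
    {"id": "a", "model_score": 0.2},
    {"id": "b", "model_score": 1.9},
    {"id": "c", "model_score": 0.7},
    {"id": "d", "model_score": 5.0},  # outside the rerank window, never rescored
]

reranked = rerank_top_n(baseline_hits, lambda hit: hit["model_score"], n=3)
print([hit["id"] for hit in reranked])
```

The trade-off is visible in document `d`: it would have scored highest under the model, but it stays last because the baseline did not place it inside the rerank window. A larger window catches more such documents at a higher cost per query.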

## Rinse and repeat!

What would you change about this model or the features used? Maybe revisit [the features](2.judgments-and-logging.ipynb) to explore some different ideas?

Up next: [Chapter 11: Automating Learning to Rank with Click Models](../ch11/0.setup.ipynb)