# Train a Model, Evaluate, and Use in the Search Engine

In these last few steps, we train a model using the pairwise training set generated [in the previous step](3.pairwise-transform.ipynb), evaluate it, and use it in the search engine.

In [1]:
from itertools import groupby
import numpy as np
import random
import requests
import sys
sys.path.append('..')
from aips import *
import json
import math

engine = get_engine()
ltr = get_ltr_engine()
tmdb_collection = engine.get_collection("tmdb")

## Reload judgments & training set

Load the dataset generated [in the previous section](3.pairwise-transform.ipynb).

In [2]:
from ltr.judgments import judgments_open

predictor_deltas = np.load("data/predictor_deltas.npy")
feature_data = np.load("data/feature_data.npy")

std_devs = feature_data[-1]
means = feature_data[-2]
feature_deltas = feature_data[:-2]

normed_judgments = []
with judgments_open("data/normed_judgments.txt") as judg_list:
    for j in judg_list:
        normed_judgments.append(j)

Parsing QID 100
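The slicing above assumes `feature_data.npy` packs everything into one 2-D array: the rows of feature deltas first, then one row of per-feature means, then one row of per-feature standard deviations. A minimal sketch of how that layout round-trips, assuming exactly that arrangement; `pack_feature_data` and `unpack_feature_data` are hypothetical helpers, not part of the `aips` or `ltr` packages:

```python
import numpy as np

def pack_feature_data(feature_deltas, means, std_devs):
    """Stack deltas, then means, then std devs into one 2-D array."""
    # means and std_devs are 1-D (one value per feature); vstack turns each into a row
    return np.vstack([feature_deltas, means, std_devs])

def unpack_feature_data(feature_data):
    """Inverse of pack_feature_data, mirroring the slicing in the cell above."""
    std_devs = feature_data[-1]
    means = feature_data[-2]
    feature_deltas = feature_data[:-2]
    return feature_deltas, means, std_devs

# Made-up values: two delta rows for three features
deltas = np.array([[1.0, -0.5, 0.2], [-1.0, 0.5, -0.2]])
mu = np.array([0.72, 0.67, 1993.3])
sigma = np.array([1.68, 1.50, 19.96])

packed = pack_feature_data(deltas, mu, sigma)
print(packed.shape)  # (4, 3)
```

This only works because every delta row has one column per feature, the same width as the means and standard deviations rows.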


## Listing 10.12

Train the model with the fully transformed dataset

In [3]:
from sklearn import svm
model = svm.LinearSVC(max_iter=10000, verbose=1)
model.fit(feature_deltas, predictor_deltas)
model.coef_

[LibLinear]

array([[0.40512184, 0.29006366, 0.14451721]])
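Why does a classifier trained on feature *differences* give us a ranking function? Because the model is linear, the score gap between two documents equals the weight vector applied to their feature difference. So the sign the classifier predicts for a pair is exactly the order in which the two documents would be ranked. A quick numeric check, using the coefficients printed above (truncated as displayed) and two made-up normalized feature vectors; `linear_score` is a hypothetical helper:

```python
import numpy as np

# Coefficients as printed by model.coef_ above (truncated to 8 decimals)
w = np.array([0.40512184, 0.29006366, 0.14451721])

def linear_score(features, weights):
    """Score a document as the dot product of weights and normalized features."""
    return float(np.dot(weights, features))

# Two illustrative (made-up) normalized feature vectors
doc_a = np.array([1.2, 0.3, -0.1])
doc_b = np.array([-0.4, 0.8, 0.5])

pair_margin = float(np.dot(w, doc_a - doc_b))                # what the pairwise classifier sees
score_gap = linear_score(doc_a, w) - linear_score(doc_b, w)  # what ranking compares

print(np.isclose(pair_margin, score_gap))  # True
print("doc_a ranks first" if pair_margin > 0 else "doc_b ranks first")  # doc_a ranks first
```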

## A few sample features (omitted from book)

Here we gather features for two movies, "Star Trek II: The Wrath of Khan" and "Star Trek III: The Search for Spock", to kick the tires of our model.

In [4]:
# To confirm Wrath of Khan's features, log them from Solr:

ids = ["154"]  # id of the Wrath of Khan document
options = {"keywords": "wrath of khan"}
response = ltr.log_query(tmdb_collection, "movies", ids, options=options, fields=["id", "title"])

# Features Solr returns
# Wrath of Khan
wok_features = [5.9217176, 3.401492, 1982.0]
# Search For Spock
spock_features = [0.0,0.0,1984.0]

# Wrath of Khan
normed_wok_features = [0, 0, 0]
for idx, f in enumerate(wok_features):
    normed_wok_features[idx] = (f - means[idx]) / std_devs[idx]

normed_spock_features = [0, 0, 0]
for idx, f in enumerate(spock_features):
    normed_spock_features[idx] = (f - means[idx]) / std_devs[idx]
    
normed_spock_features

[-0.4319807665098383, -0.444478207387438, -0.4675688993325839]
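The two loops above apply a standard z-score normalization, (x - mean) / std, one feature at a time. With NumPy broadcasting, the same thing is one expression over the whole vector. A sketch using the means and standard deviations that appear in the Solr model output later in this notebook, so the result should match `normed_spock_features` above; `z_normalize` is a hypothetical helper:

```python
import numpy as np

# Per-feature means and std devs (title_bm25, overview_bm25, release_year),
# copied from the StandardNormalizer params in the model JSON printed later on
means = np.array([0.7245440735518126, 0.6662927508611409, 1993.3349740932642])
std_devs = np.array([1.6772600303613545, 1.4990448120673643, 19.964916628520722])

def z_normalize(features, means, std_devs):
    """Vectorized z-score: subtract each feature's mean, divide by its std dev."""
    return (np.asarray(features, dtype=float) - means) / std_devs

wok_features = [5.9217176, 3.401492, 1982.0]  # The Wrath of Khan
spock_features = [0.0, 0.0, 1984.0]           # The Search for Spock

normed_wok = z_normalize(wok_features, means, std_devs)
normed_spock = z_normalize(spock_features, means, std_devs)
print(normed_spock)
```

The same function also accepts a 2-D array with one document per row, since broadcasting applies the per-column means and standard deviations to every row.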

## Taking the model for a test drive (omitted from book)

Here we score a few documents with the model. This code is omitted from the book, but it is explored in section 10.6.2.

In [5]:
def score_one(features, model):
    score = 0.0
    for idx, f in enumerate(features):
        this_coef = model.coef_[0][idx].item()
        score += f * this_coef
    
    return score

def rank(query_judgments, model):
    for j in query_judgments:
        j.score = score_one(j.features, model)
    
    return sorted(query_judgments, key=lambda j: j.score, reverse=True)

score_one(normed_spock_features, model)

-0.3715035724290449

Wrath of Khan should score higher:

In [6]:
score_one(normed_wok_features, model)

1.702523726774056
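`score_one` is a dot product between the coefficient vector and a normalized feature vector, and `rank` sorts by that score. A vectorized sketch of both steps: it scores a matrix of documents at once with `X @ w` and orders them with `np.argsort`. The coefficients are the ones printed by the first training run (truncated as displayed), and the normalization parameters come from the model JSON printed later; `score_all` and `rank_indices` are hypothetical helpers:

```python
import numpy as np

# model.coef_[0] as printed by the first training run (truncated to 8 decimals)
w = np.array([0.40512184, 0.29006366, 0.14451721])
# Normalization params from the model JSON printed later in this notebook
means = np.array([0.7245440735518126, 0.6662927508611409, 1993.3349740932642])
std_devs = np.array([1.6772600303613545, 1.4990448120673643, 19.964916628520722])

def score_all(normed_features, weights):
    """Score every row (one document per row) with one matrix-vector product."""
    return np.asarray(normed_features, dtype=float) @ weights

def rank_indices(normed_features, weights):
    """Return row indices from highest to lowest score, plus the scores."""
    scores = score_all(normed_features, weights)
    return np.argsort(-scores, kind="stable"), scores

raw = np.array([
    [0.0, 0.0, 1984.0],             # row 0: The Search for Spock
    [5.9217176, 3.401492, 1982.0],  # row 1: The Wrath of Khan
])
normed = (raw - means) / std_devs  # broadcast z-score per feature column
order, scores = rank_indices(normed, w)
print(order.tolist())  # [1, 0]: Wrath of Khan ranks first
print(scores)
```

The scores agree with the `score_one` outputs above to within the rounding of the printed coefficients.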

## Listing 10.13 - Test/training split

In [7]:
all_qids = list(set([j.qid for j in normed_judgments]))

random.seed(1234)
random.shuffle(all_qids)
proportion_test = 0.1  # fraction of queries held out for testing

test_train_split_idx = int(len(all_qids) * proportion_test)
test_qids = all_qids[:test_train_split_idx]
train_qids = all_qids[test_train_split_idx:]

train_data = []
test_data = []
for j in normed_judgments:
    if j.qid in train_qids:
        train_data.append(j)
    elif j.qid in test_qids:
        test_data.append(j)
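Splitting at the query level matters: if judgments for the same query landed in both sets, the test score would be inflated by queries the model had already seen in training. A minimal sketch of the same idea as a reusable function that also checks for leakage. The `Judgment` namedtuple is a hypothetical stand-in for the judgment objects from `ltr.judgments`, and `split_by_query` is a hypothetical helper. It sorts the query ids before shuffling and uses a set for membership tests, so it will not reproduce the exact split above:

```python
import random
from collections import namedtuple

# Hypothetical stand-in for ltr.judgments' judgment objects (only fields used here)
Judgment = namedtuple("Judgment", ["qid", "doc_id", "grade"])

def split_by_query(judgments, proportion_test=0.1, seed=1234):
    """Hold out a fraction of *queries* (not judgments) as the test set."""
    all_qids = sorted({j.qid for j in judgments})  # sorted so the shuffle is reproducible
    rng = random.Random(seed)
    rng.shuffle(all_qids)
    split_idx = int(len(all_qids) * proportion_test)
    test_qids = set(all_qids[:split_idx])  # set: O(1) membership checks
    train = [j for j in judgments if j.qid not in test_qids]
    test = [j for j in judgments if j.qid in test_qids]
    # Every query must land entirely on one side of the split
    assert not ({j.qid for j in train} & {j.qid for j in test})
    return train, test

# Made-up data: 20 queries with 3 judged documents each
judgments = [Judgment(qid, f"doc{d}", d % 2) for qid in range(1, 21) for d in range(3)]
train, test = split_by_query(judgments)
print(len(train), len(test))  # 54 6
```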

## Repeated from earlier - pairwise transform

You've already seen this code in the third notebook, so feel free to skip ahead. We need it here only to apply the pairwise transform to the training data before training a model.

In [8]:
import numpy as np
from ltr.judgments import judgments_from_file, judgments_to_nparray

def pairwise_transform(normed_judgments):
        
    from itertools import groupby
    predictor_deltas = []
    feature_deltas = []
    
    # For each query's judgments
    for qid, query_judgments in groupby(normed_judgments, key=lambda j: j.qid):

        # groupby yields one-shot iterators, so materialize each query's
        # judgments into lists we can loop over twice (nested loop below)
        query_judgments_copy_1 = list(query_judgments) 
        query_judgments_copy_2 = list(query_judgments_copy_1)

        # Examine every judgment combo for this query, 
        # if they're different, store the pairwise difference:
        # +1 if judgment1 more relevant
        # -1 if judgment2 more relevant
        for judgment1 in query_judgments_copy_1:
            for judgment2 in query_judgments_copy_2:
                
                j1_features=np.array(judgment1.features)
                j2_features=np.array(judgment2.features)
                
                if judgment1.grade > judgment2.grade:
                    predictor_deltas.append(+1)
                    feature_deltas.append(j1_features - j2_features)
                elif judgment1.grade < judgment2.grade:
                    predictor_deltas.append(-1)
                    feature_deltas.append(j1_features - j2_features)

    # For training purposes, we return these as numpy arrays
    return np.array(feature_deltas), np.array(predictor_deltas)
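Two properties of this transform are easy to see on a toy query: only pairs with different grades produce a training row, and every such pair appears in both orders (once as +1 with delta a - b, once as -1 with delta b - a), so the two classes are balanced. The sketch below is a compact equivalent of the loop above written with `itertools.permutations`, which skips self-pairs that the original loop discards anyway because their grades are equal. The `J` namedtuple and `pairwise_transform_compact` are hypothetical names:

```python
from collections import namedtuple
from itertools import groupby, permutations

import numpy as np

J = namedtuple("J", ["qid", "grade", "features"])  # hypothetical stand-in judgment type

def pairwise_transform_compact(judgments):
    """Same output as pairwise_transform above, written with permutations."""
    feature_deltas, predictor_deltas = [], []
    for _, group in groupby(judgments, key=lambda j: j.qid):
        # Ordered pairs (a, b) with a != b, in the same order as the nested loop
        for a, b in permutations(list(group), 2):
            if a.grade != b.grade:
                feature_deltas.append(np.array(a.features) - np.array(b.features))
                predictor_deltas.append(1 if a.grade > b.grade else -1)
    return np.array(feature_deltas), np.array(predictor_deltas)

toy = [
    J(1, 1, [2.0, 1.0]),  # relevant
    J(1, 0, [0.5, 0.0]),  # not relevant
    J(1, 0, [0.0, 0.5]),  # not relevant
]
X, y = pairwise_transform_compact(toy)
print(y.tolist())  # [1, 1, -1, -1]
```

The two non-relevant documents share a grade, so their pair produces no row.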

## Listing 10.14 - train on the training data only

We repeat the model training process, this time on only the training subset of queries. Because our test/training split is at the query level, we redo the pairwise transform from earlier on just the training judgments.

In [9]:
train_feature_deltas, train_predictor_deltas = pairwise_transform(train_data)

from sklearn import svm
model = svm.LinearSVC(max_iter=10000, verbose=1)
model.fit(train_feature_deltas, train_predictor_deltas)
model.coef_

[LibLinear]

array([[0.37486805, 0.28187457, 0.12097922]])

## Listing 10.15 - eval model on test data

Here we compute a simple precision metric, precision@N (the proportion of relevant results in the top N), averaged over all test queries. Note that this is not a robust statistical analysis of the model's quality. For that, we would want to evaluate over multiple test/training samples and perform statistical significance testing between this experiment and a baseline.
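Concretely, precision@N for one query counts how many of the top N ranked results are relevant and divides by N; the reported number is the mean over test queries. A small worked sketch on made-up graded rankings (1 = relevant, 0 = not), following the same conventions as `evaluate_model`, including dividing by N even when a query has fewer than N results. `precision_at_n` and `mean_precision_at_n` are hypothetical helpers:

```python
def precision_at_n(ranked_grades, n=4):
    """Fraction of the top-n results that are relevant (grade == 1)."""
    relevant = sum(1 for g in ranked_grades[:n] if g == 1)
    return relevant / float(n)  # divides by n even if fewer than n results exist

def mean_precision_at_n(rankings, n=4):
    """Average precision@n over queries; `rankings` maps qid -> grades in rank order."""
    return sum(precision_at_n(g, n) for g in rankings.values()) / len(rankings)

# Hypothetical rankings: grades listed in the order the model ranked the documents
rankings = {
    "q1": [1, 0, 1, 1, 0],  # 3 relevant in top 4 -> 0.75
    "q2": [0, 0, 1, 0, 1],  # 1 relevant in top 4 -> 0.25
}
print(mean_precision_at_n(rankings))  # 0.5
```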

In [10]:
def evaluate_model(test_data, model, n=4):
    total_precision = 0
    # groupby only groups adjacent items, so sort by qid first
    unique_queries = groupby(sorted(test_data, key=lambda j: j.qid), lambda j: j.qid)
    num_groups = 0
    for qid, query_judgments in unique_queries:
        num_groups += 1
        ranked = rank(list(query_judgments), model)
        total_relevant = len([j for j in ranked[:n] if j.grade == 1])
        total_precision += total_relevant / float(n)
    return total_precision / num_groups

evaluate_model(test_data, model)

0.425
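Significance testing can be done in several ways. One common choice, sketched here as an assumption rather than the book's method, is a paired randomization (sign-flip) test on per-query scores. If the model and a baseline were equally good, the sign of each per-query difference would be arbitrary, so we count how often randomly flipped signs produce a mean difference at least as large as the observed one. `paired_randomization_test` is a hypothetical helper, and the per-query numbers are made up for illustration:

```python
import random

def paired_randomization_test(scores_a, scores_b, iterations=10000, seed=0):
    """Two-sided sign-flip test on paired per-query scores; returns a p-value."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs) / len(diffs))
    rng = random.Random(seed)
    at_least_as_extreme = 0
    for _ in range(iterations):
        # Under the null hypothesis each difference is equally likely to be + or -
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(sum(flipped) / len(flipped)) >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / iterations

# Hypothetical per-query precision@4 for the LTR model and a baseline on the same queries
model_p = [0.75, 0.5, 0.5, 0.25, 0.75, 0.5, 0.25, 0.5, 0.75, 0.5]
baseline_p = [0.5, 0.25, 0.5, 0.25, 0.5, 0.25, 0.25, 0.25, 0.5, 0.5]
print(paired_randomization_test(model_p, baseline_p))
```

Here six queries improve by 0.25 and four tie, so only the two sign patterns where all six nonzero differences agree are as extreme as the observed mean; the estimate should land near 2/64, roughly 0.03.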

## Listing 10.16 - A Solr model

This converts the model into one Solr can use by telling Solr:

- The weight for each (normalized) feature
- The mean used to normalize each feature
- The standard deviation used to normalize each feature
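The model definition itself is plain JSON, so it can be assembled without a helper. A sketch of a hypothetical `build_linear_model` function that produces the same structure as the `linear_model` printed in the output of this cell: a Solr `LinearModel` whose features each carry a `StandardNormalizer` with `avg` and `std` params (serialized as strings, as in that output), plus a `weights` map keyed by feature name. It only builds the dictionary; uploading is still left to `ltr.upload_model`:

```python
import json

def build_linear_model(store, name, feature_names, means, std_devs, weights):
    """Assemble a Solr LTR LinearModel definition with standard-normalized features."""
    features = []
    for fname, avg, std in zip(feature_names, means, std_devs):
        features.append({
            "name": fname,
            "norm": {
                "class": "org.apache.solr.ltr.norm.StandardNormalizer",
                # avg/std as strings, matching the model JSON printed in this notebook
                "params": {"avg": str(avg), "std": str(std)},
            },
        })
    return {
        "store": store,
        "class": "org.apache.solr.ltr.model.LinearModel",
        "name": name,
        "features": features,
        "params": {"weights": {f: float(w) for f, w in zip(feature_names, weights)}},
    }

# Values copied from the printed model below
model_def = build_linear_model(
    "movies", "movie_model",
    ["title_bm25", "overview_bm25", "release_year"],
    [0.7245440735518126, 0.6662927508611409, 1993.3349740932642],
    [1.6772600303613545, 1.4990448120673643, 19.964916628520722],
    [0.37486804920929745, 0.2818745653954488, 0.12097922060239638],
)
print(json.dumps(model_def, indent=2))
```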

In [11]:
feature_names = ["title_bm25", "overview_bm25", "release_year"]
linear_model = ltr.generate_model("movies", "movie_model", feature_names,
                                  means, std_devs, model.coef_[0])
response = ltr.upload_model(tmdb_collection, linear_model)

print(json.dumps(linear_model, indent=2))

{
  "store": "movies",
  "class": "org.apache.solr.ltr.model.LinearModel",
  "name": "movie_model",
  "features": [
    {
      "name": "title_bm25",
      "norm": {
        "class": "org.apache.solr.ltr.norm.StandardNormalizer",
        "params": {
          "avg": "0.7245440735518126",
          "std": "1.6772600303613545"
        }
      }
    },
    {
      "name": "overview_bm25",
      "norm": {
        "class": "org.apache.solr.ltr.norm.StandardNormalizer",
        "params": {
          "avg": "0.6662927508611409",
          "std": "1.4990448120673643"
        }
      }
    },
    {
      "name": "release_year",
      "norm": {
        "class": "org.apache.solr.ltr.norm.StandardNormalizer",
        "params": {
          "avg": "1993.3349740932642",
          "std": "19.964916628520722"
        }
      }
    }
  ],
  "params": {
    "weights": {
      "title_bm25": 0.37486804920929745,
      "overview_bm25": 0.2818745653954488,
      "release_year": 0.12097922060239638
    }
  }
}


## Listing 10.17 - Solr query w/ model

Here we issue a Solr query that executes the model on nearly the full corpus (note `reRankDocs=60000` in the logged request).
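The logged request shows the shape of what gets sent: the whole query is an `{!ltr}` local-params query, with `reRankDocs` set high enough to cover nearly every document and the user's keywords passed to the feature queries as external feature information (`efi.keywords`). A sketch of building that request body by hand; `ltr_local_params` and `full_corpus_ltr_request` are hypothetical helpers, and the values simply mirror the logged request:

```python
import json

def ltr_local_params(model, keywords, rerank_docs):
    """Build the {!ltr ...} local-params string that invokes a Solr LTR model."""
    # Assumes keywords contain no double quotes (no escaping is attempted here)
    return f'{{!ltr reRankDocs={rerank_docs} model={model} efi.keywords="{keywords}"}}'

def full_corpus_ltr_request(model, keywords, fields, rerank_docs=60000, limit=5):
    """JSON Request API body whose main query is the LTR model itself."""
    return {
        "query": ltr_local_params(model, keywords, rerank_docs),
        "limit": limit,
        "fields": fields,
    }

request = full_corpus_ltr_request("movie_model", "harry potter", ["title", "id", "score"])
print(json.dumps(request, indent=2))
```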

In [12]:
request = ltr.generate_query("movie_model", "harry potter",
                             ["title", "id", "score"], log=True)
response = tmdb_collection.search(**request)
documents = response["docs"]

print("\nReturned Documents:")
print(json.dumps(documents, indent="  "))

Search Request:
{
  "query": "{!ltr reRankDocs=60000 model=movie_model efi.keywords=\"harry potter\"}",
  "limit": 5,
  "params": {
    "ident": "true",
    "omitHeader": "true",
    "log": true
  },
  "fields": [
    "title",
    "id",
    "score"
  ]
}

Returned Documents:
[
  {
    "id": "570724",
    "title": "The Story of Harry Potter",
    "score": 2.57654
  },
  {
    "id": "116972",
    "title": "Discovering the Real World of Harry Potter",
    "score": 2.3650618
  },
  {
    "id": "672",
    "title": "Harry Potter and the Chamber of Secrets",
    "score": 2.129026
  },
  {
    "id": "671",
    "title": "Harry Potter and the Philosopher's Stone",
    "score": 2.109296
  },
  {
    "id": "393135",
    "title": "Harry Potter and the Ten Years Later",
    "score": 2.048223
  }
]


## Listing 10.18 - Solr query w/ model and reranking

Here we issue a Solr query that uses the model to rerank only the top 500 documents returned by a simpler baseline `edismax` search.
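Reranking changes which documents the model ever sees. As we understand Solr's rerank behavior, the baseline `edismax` query retrieves and orders the matches, only the top `reRankDocs` of them are rescored by the model and reordered, and anything below that cutoff keeps its baseline order after the reranked block. A toy simulation of that behavior (not Solr's implementation); `rerank_top_k` is a hypothetical helper and the scores are made up:

```python
def rerank_top_k(baseline_ranking, model_scores, k):
    """Rescore only the top-k baseline results; the tail keeps its baseline order."""
    head = baseline_ranking[:k]
    tail = baseline_ranking[k:]
    # sorted() is stable even with reverse=True, so ties keep their baseline order
    reranked_head = sorted(head, key=lambda doc: model_scores[doc], reverse=True)
    return reranked_head + tail

baseline = ["d1", "d2", "d3", "d4", "d5"]  # order from the baseline query
scores = {"d1": 0.2, "d2": 0.9, "d3": 0.5, "d4": 3.0, "d5": 1.0}  # made-up model scores

print(rerank_top_k(baseline, scores, k=3))  # ['d2', 'd3', 'd1', 'd4', 'd5']
```

Note that `d4` has the highest model score but stays below the reranked block because the baseline never placed it in the top 3, which is why a small rerank window can return different results than scoring the full corpus.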

In [13]:
request = ltr.generate_query("movie_model", "harry potter", ["title", "id", "score"], rerank=500, log=True)
response = tmdb_collection.search(**request)
documents = response["docs"]

print("\nReturned Documents:")
print(json.dumps(documents, indent="  "))

Search Request:
{
  "query": "harry potter",
  "limit": 5,
  "params": {
    "defType": "edismax",
    "ident": "true",
    "omitHeader": "true",
    "log": true,
    "rq": "{!ltr reRankDocs=500 model=movie_model efi.keywords=\"harry potter\"}",
    "qf": [
      "title",
      "overview"
    ]
  },
  "fields": [
    "title",
    "id",
    "score"
  ]
}

Returned Documents:
[
  {
    "id": "570724",
    "title": "The Story of Harry Potter",
    "score": 2.57654
  },
  {
    "id": "116972",
    "title": "Discovering the Real World of Harry Potter",
    "score": 2.3650618
  },
  {
    "id": "672",
    "title": "Harry Potter and the Chamber of Secrets",
    "score": 2.129026
  },
  {
    "id": "671",
    "title": "Harry Potter and the Philosopher's Stone",
    "score": 2.109296
  },
  {
    "id": "54507",
    "title": "A Very Potter Musical",
    "score": 2.0859373
  }
]


## Rinse and repeat!

What would you change about this model or the features used? Maybe revisit [the features](2.judgments-and-logging.ipynb) to explore some different ideas?

Up next: [Chapter 11: Automating Learning to Rank with Click Models](../ch11/0.setup.ipynb)