## [Chapter 3] Controlling Relevance

In the [last notebook](1.ch3-vectors-and-text-similarity.ipynb), we introduced how queries and documents can be represented as vectors, how cosine similarity can be used as a relevance function to compare queries and documents, and how TF-IDF ranking can be used to create a feature weight that balances both strength of occurrence (TF) and significance of a term (IDF) for each term in a term-based vector.

In this notebook, we will show how a full relevance function can be specified and controlled in a search engine (Apache Solr). Let's start by showing off the default similarity calculation leveraged by all Lucene-based search engines: BM25.

### BM25 (Okapi Best Matching 25)
The BM25 algorithm is the default similarity function in Apache Lucene, Apache Solr, Elasticsearch, Lucidworks Fusion, and other Lucene-based search engines. The algorithm leverages TF and IDF, but also incorporates many additional configurable options. The full BM25 calculation is found in Figure 3.7 in the book.

Instead of reimplementing the full BM25 algorithm in Python, let's go ahead and switch over to using our search engine (Apache Solr) and see how it performs the calculation.

### Creating our Collection

### Listing 3.9

In [2]:
import sys
sys.path.append('..')
from aips import *

collection = "cat_in_the_hat"
create_collection(collection)

#Ensure the fields we need are available
upsert_text_field(collection, "title")
upsert_text_field(collection, "description")

Wiping 'cat_in_the_hat' collection
Status: Success

Creating cat_in_the_hat' collection
Status: Success

Adding 'title' field to collection
Status: Success

Adding 'description' field to collection
Status: Success


### Add some documents

### Listing 3.10

In [3]:
docs = [
    {
        "id": "doc1",
        "title": "Worst",
        "description": "The interesting thing is that the person in the wrong made the right decision in the end."
    },
    {
        "id": "doc2",
        "title": "Best",
        "description": "My favorite book is the cat in the hat, which is about a crazy cat who breaks into a house and creates a crazy afternoon for two kids."
        
    },
    {
        "id": "doc3",
        "title": "Okay",
        "description": "My neighbors let the stray cat stay in their garage, which resulted in my favorite hat that I let them borrow being ruined."        
    }
]
print("\nAdding Documents to '" + collection + "' collection")
response = requests.post(solr_url + collection + "/update?commit=true", json=docs).json()
# Parenthesize the conditional so that "Status: " is printed in both cases
print("Status: " + ("Success" if response["responseHeader"]["status"] == 0 else "Failure"))



Adding Documents to 'cat_in_the_hat' collection
Status: Success


### Inspecting BM25 Score
Now, let's execute a query for `the cat in the hat`, and see how each document scores using the BM25 similarity calculation:

### Listing 3.11

In [4]:
query = "the cat in the hat"
request = {
    "query": query,
    "fields": ["id", "title", "description", "score", "[explain style=html]"],
    "params": {
      "qf": "description",
      "defType": "edismax",
      "indent": "true"
    }
}
from IPython.display import display, HTML
display(HTML("<br/><strong>Query: </strong><i>" + query + "</i><br/><br/><strong>Ranked Docs:</strong>"))
response = str(requests.post(solr_url + collection + "/select", json=request).json()["response"]["docs"]).replace('\\n', '').replace(", '", ",<br/>'")
display(HTML(response))
#print(str(response))

While the BM25 calculation is much more complex than the TF-IDF feature weight calculations we saw in the [last notebook](ch3-vectors-and-text-similarity.ipynb), it is fundamentally still derived from TF-IDF at its core. As such, you'll notice that the search results actually return in the exact same order as our TF-IDF calculations from the last notebook:

```
doc2: 0.6878265
doc3: 0.6248112
doc1: 0.31257337
```

Our query for *the cat in the hat* can still very much be thought of as a vector of the BM25 scores for each of the terms: ["the", "cat", "in", "the", "hat"].

What may not be obvious, however, is that the feature weights for each of these features are actually just overridable functions. For example, this query could alternatively be expressed as the vector:

```
[ query("the"), query("cat"), query("in"), query("the"), query("hat") ]
```

In Solr syntax, this would be:

```
q={!func}query("the") {!func}query("cat") {!func}query("in") {!func}query("the") {!func}query("hat")
```

For that query, we get the following:

### Listing 3.12

In [5]:
query = '{!func}query("the") {!func}query("cat") {!func}query("in") {!func}query("the") {!func}query("hat")'
request = {
    "query": query,
    "fields": ["id", "title", "score"],
    "params": {
      "qf": "description",
      "defType": "edismax",
      "indent": "true"
    }
}
display(HTML("<strong>Query</strong>: <i>" + query + "</i><br/><br/><strong>Results:</strong>"))
response = str(requests.post(solr_url + collection + "/select", json=request).json()["response"]["docs"]).replace('\\n', '').replace(", '", ",<br/>'")
display(HTML(response))


The scores are exactly the same! Not only that, but once we realize that every term in a query is nothing more than a configurable scoring function, it opens up tremendous possibilities for manipulating that scoring function.
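To avoid writing the per-term function query by hand, it can be generated from the keyword string. The helper below is a minimal sketch: the name `to_function_query` is hypothetical (it is not part of the `aips` module), and it assumes simple whitespace-separated terms with no quotes or special characters that would need escaping.

```python
def to_function_query(keywords: str) -> str:
    """Wrap each whitespace-separated term in an explicit {!func}query(...) clause."""
    terms = keywords.split()
    # Each term becomes its own scoring function, mirroring the query vector above
    return " ".join('{!func}query("' + term + '")' for term in terms)


function_query = to_function_query("the cat in the hat")
print(function_query)
```

The resulting string can be passed as the `query` value in the same request structure used in Listing 3.12.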

### Functions, Functions, Everywhere!
Now that we've seen that the relevance score for each term in our queries is simply a function operating on that term to generate a feature weight, the next logical question is "what OTHER kinds of functions can I use in my queries?".

We've already encountered the *query* function, which is effectively the default calculation that executes whenever no explicit function is specified, and which uses the BM25 similarity algorithm by default.

But what if we want to consider some other features in our scoring calculation, perhaps some that are not text-based?

Here is a partial list of common relevance techniques:
- Geospatial Boosting (documents near the user running the query should rank higher)
- Date Boosting (newer documents should get a higher relevancy boost)
- Popularity Boosting (documents which are more popular should get a higher relevancy boost)
- Field Boosting (terms matching in certain fields should get a higher weight than in other fields)
- Category Boosting (documents in categories related to query terms should get a higher relevancy boost)
- Phrase Boosting (documents matching multi-term phrases in the query should rank higher than those only matching the words separately)
- ...

Many of these techniques are built into specific query parsers in Solr, either through query syntax or through query parser options. For example, field boosting can be accomplished through the `qf` parameter on the `edismax` query parser:

```
q={!type=edismax qf="title^10 description^2.5"}the cat in the hat
```

Boosting on full phrase matching, on two-word phrases, and on three-word phrases is also a native feature of the edismax query parser:

- Boost docs containing the exact phrase "the cat in the hat":
```
q={!type=edismax qf="title description" pf=description}the cat in the hat
``` 

- Boost docs containing "the cat", "cat in", "in the", or "the hat":
```
q={!type=edismax qf="title description" pf2=description}the cat in the hat
``` 

- Boost docs containing "the cat in", "cat in the", or "in the hat":
```
q={!type=edismax qf="title description" pf3=description}the cat in the hat
``` 
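To see which phrases the two-word and three-word options target, the word shingles of a query can be enumerated in a few lines of Python. This is only an illustrative sketch: the `shingles` helper is hypothetical, it splits on whitespace, and it ignores any text analysis (such as stopword handling) that the search engine may apply. The request dictionary follows the same JSON request structure used in the earlier listings and is shown as an assumption about how the three phrase options could be combined.

```python
from typing import List


def shingles(query: str, size: int) -> List[str]:
    """Return every run of `size` consecutive terms in the query."""
    terms = query.split()
    return [" ".join(terms[i:i + size]) for i in range(len(terms) - size + 1)]


query = "the cat in the hat"
two_word_phrases = shingles(query, 2)    # phrases targeted by pf2
three_word_phrases = shingles(query, 3)  # phrases targeted by pf3
print(two_word_phrases)
print(three_word_phrases)

# Sketch of a request combining full-phrase, two-word, and three-word boosts
request = {
    "query": query,
    "fields": ["id", "title", "score"],
    "params": {
        "defType": "edismax",
        "qf": "title description",
        "pf": "description",
        "pf2": "description",
        "pf3": "description",
    },
}
```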


Many of the relevancy boosting techniques will require constructing your own features leveraging function queries, however. For example, if we wanted to create a query that did nothing more than boost the relevance ranking of documents geographically closest to the user running the search (relevance based on distance away), we could issue the following query:

```
q=*:*&sort=geodist(location, $user_latitude, $user_longitude) asc&user_latitude=33.748&user_longitude=-84.39
```

The above uses the `sort` parameter to strictly order documents by the value calculated by the `geodist` function. This works great if we want to order results by a single feature, but what if we want to construct a more nuanced sort based upon multiple features? To accomplish this, we will update our query to include a function for each feature in each document's relevance calculation, and then sort by the relevance score:

```
  q={!func}scale(query($keywords),0,25) 
     {!func}recip(geodist($lat_long_field,$user_latitude,$user_longitude),1,25,1)
     {!func}recip(ms(NOW/HOUR,modify_date),3.16e-11,25,1)
     {!func}scale(popularity,0,25)
     &keywords="basketball"
     &lat_long_field=location
     &user_latitude=33.748
     &user_longitude=-84.391
```     

The above query does a few interesting things:
- It constructs a query vector containing four features: BM25 Keywords relevance score (higher is better), geo distance (lower is better), publication date (newer is better), and popularity (higher is better).
- Each of the feature values is scaled between 0 and 25 so that they are all comparable, with the best keyword/geo/publication date/popularity score getting a score of 25, and the worst getting a score close to zero.
- Thus a "perfect score" would add up to 100 (25 + 25 + 25 + 25), and the worst score would be approximately 0.
- Since the relative contribution of 25 is specified as part of the query for each feature, we can easily change the weights of any feature on the fly to give preference to certain features in the final relevance calculation.

With the above query, we have thus fully taken the relevance function into our own hands by modeling our relevance features and giving them weights. While this is very powerful, it still requires significant manual effort and testing to figure out which features matter for our domain, and what their relative weights should be. In chapter ? we will walk through building Machine-learned Ranking models to automatically make those decisions for us (a process known as "Learning to Rank"). For now, however, our goal was to ensure you understood the mechanics for modeling features in our query vectors, and controlling their weights.
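To make the arithmetic behind that four-feature query more concrete, here is a rough Python sketch of how the individual feature scores combine. It assumes that `recip(x,m,a,b)` computes `a / (m*x + b)` and that `scale` performs min-max scaling across the matching documents, which is how the Solr Reference Guide describes them. The two documents and all of their values are invented purely for illustration, and the handling of identical values in `scale` is an arbitrary choice for this sketch.

```python
def recip(x: float, m: float, a: float, b: float) -> float:
    """Reciprocal function: a / (m * x + b)."""
    return a / (m * x + b)


def scale(values, target_min, target_max):
    """Min-max scale a list of values into [target_min, target_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Arbitrary choice for this sketch when all values are identical
        return [target_max for _ in values]
    return [target_min + (v - lo) * (target_max - target_min) / (hi - lo) for v in values]


MS_PER_YEAR = 365 * 24 * 60 * 60 * 1000

# Hypothetical documents: BM25 score, distance from the user, age, and popularity
docs = {
    "doc_a": {"bm25": 2.0, "distance_km": 0.0, "age_ms": 0, "popularity": 10},
    "doc_b": {"bm25": 4.0, "distance_km": 24.0, "age_ms": MS_PER_YEAR, "popularity": 50},
}

ids = list(docs)
keyword_scores = scale([docs[i]["bm25"] for i in ids], 0, 25)
popularity_scores = scale([docs[i]["popularity"] for i in ids], 0, 25)

totals = {}
for idx, doc_id in enumerate(ids):
    doc = docs[doc_id]
    geo_score = recip(doc["distance_km"], 1, 25, 1)      # 25 when distance is 0
    date_score = recip(doc["age_ms"], 3.16e-11, 25, 1)   # roughly 12.5 after one year
    totals[doc_id] = keyword_scores[idx] + geo_score + date_score + popularity_scores[idx]

print(totals)
```

Changing any of the `25` constants changes the maximum contribution of that feature, which is the "weight on the fly" adjustment described in the list above.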

If you'd like to learn more about how to utilize function queries, I recommend reviewing chapter 7 of [Solr in Action](http://solrinaction.com), "Complex Ranking Functions", for a much fuller exposition. For a full list of available function queries in Solr, you can also check out the [Solr Reference Guide](https://lucene.apache.org/solr/guide/8_3/function-queries.html).

## Matching vs. Ranking
Thus far, we've only really spoken of queries as feature vectors, and we've only discussed relevance ranking as a process of calculating and adding up scores for each feature (keyword or function) in the query. This may seem a bit strange, since most search books start with coverage of matching keywords in the search engine's inverted index and filtering result sets well before discussing relevance.

We've delayed the discussion of "filtering" results until this point on purpose, however, in order to frame what search engines do as two explicitly different actions:
1) Matching: Filtering results to a known set of possible answers
2) Ranking: Ordering all of the possible answers by relevance

In reality, we can completely skip step 1 (filtering) and we'd still see the exact same results on page one (and for many pages), since the most relevant results show up first. If you think back to chapter 2, we even saw some vector scoring calculations (comparing feature vectors for food items - i.e. "apple juice" vs. "donut") that were unable to filter results and had to try to score every document to determine which ones to return based upon relevance.

So if the initial Matching phase is really optional, then why do it at all? The simple answer is that it is primarily a performance optimization. Instead of iterating through every single document and calculating a relevance score, by filtering the initial result set to a smaller set of documents which are logical matches, we can significantly speed up both our relevance calculations and the overall response time of our search engine.
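The following toy sketch illustrates the point. It builds a tiny inverted index over shortened versions of our three documents and compares scoring every document against scoring only the documents that match at least one query term. The scoring function here is a simple term-occurrence count standing in for BM25, and all names are hypothetical; the aim is only to show that the top results are identical while fewer documents need to be scored.

```python
from collections import defaultdict

docs = {
    "doc1": "the person in the wrong made the right decision",
    "doc2": "my favorite book is the cat in the hat",
    "doc3": "my neighbors let the stray cat stay in their garage",
}

# Build a tiny inverted index: term -> set of ids of documents containing it
inverted_index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.split():
        inverted_index[term].add(doc_id)


def score(query_terms, text):
    """Toy relevance score: total occurrences of the query terms (a stand-in for BM25)."""
    tokens = text.split()
    return sum(tokens.count(term) for term in query_terms)


def rank(query_terms, candidate_ids):
    """Score the candidates and order them best-first (ties broken by id)."""
    scored = [(score(query_terms, docs[d]), d) for d in candidate_ids]
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))


query_terms = ["cat", "hat"]

# Ranking only: every document gets scored
ranked_all = rank(query_terms, docs.keys())

# Matching, then ranking: only documents containing a query term get scored
candidates = set().union(*(inverted_index.get(t, set()) for t in query_terms))
ranked_matches = rank(query_terms, candidates)

print(ranked_all)
print(ranked_matches)
```

With three documents the saving is trivial, but the same idea applies when the index holds millions of documents and only a small fraction contain any query term.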

Of course, there are some additional benefits to filtering our result sets, in that the total document count is reduced and we can provide analytics (facets) on the set of logically-matching documents in order to help the user further explore and refine their result set.

But when thinking about constructing relevance functions, like we did in the last section, the idea of filtering and scoring can often get mixed up, particularly since Solr itself mixes concerns in the query parameter.

### Separating Concerns: Filtering vs. Scoring
Solr has two primary ways to control filtering and scoring, the "query" (`q` parameter) and the "filters" (`fq` parameters). Consider the following request:
```
q=the cat in the hat&fq=category:books&fq=audience:kid&defType=edismax&mm=100%&qf=description
```

In this query, Solr is being instructed to filter the possible result set down to only documents with a value of `books` in the `category` field and also a value of `kid` in the `audience` field. In addition to those filters, however, the query itself also acts as a filter, so the result set gets further filtered down to only documents that also contain all (100%) of the terms `the`, `cat`, `in`, `the`, and `hat` in the `description` field.

The logical difference between the `q` and `fq` parameters is that the `fq` only acts as a filter, whereas the `q` acts as BOTH a filter and a feature vector for relevance ranking. This is useful default behavior for queries, but mixing the concerns of filtering and scoring in the same `q` parameter can be suboptimal, especially if we're simply trying to manipulate the relevance calculation rather than arbitrarily remove results from our document set.

There are a few ways to address this:
1. Model the "q" parameter as a function (functions only count toward relevance and do not filter) 
```
q={!func}query("{!type=edismax qf=description mm=100% v=$query}")
    &fq={!cache=false v=$query}
    &query=the cat in the hat
```

2. Make your query match all documents (no filtering or scoring) and apply a Boost Query (`bq`) parameter to influence relevance without filtering
```
  q=*:*
    &bq={!type=edismax qf=description mm=100% v=$query}
    &fq={!cache=false v=$query}
    &query=the cat in the hat
```

Both of these approaches are logically equivalent, but we'll go with option 2 throughout this book since it is a bit cleaner to use the dedicated `bq` parameter, which was designed to contribute toward the relevance calculation without filtering.
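As a rough illustration, the second option could be expressed in the same JSON request structure used in the earlier listings. This is a sketch under several assumptions: the helper name `build_boost_request` and the `user_query` parameter name are hypothetical, `defType` is set to `edismax` on the assumption that `bq` is being handled by that query parser, and the filters shown are the ones from the earlier `fq` example.

```python
def build_boost_request(user_query: str, filters: list) -> dict:
    """Match all documents, filter explicitly, and score only through a boost query."""
    return {
        "query": "*:*",       # matches everything, contributes no meaningful ranking
        "filter": filters,    # filtering only, no effect on the score
        "fields": ["id", "title", "score"],
        "params": {
            "defType": "edismax",
            # Scoring only: the boost query dereferences the user's keywords
            "bq": "{!type=edismax qf=description mm=100% v=$user_query}",
            "user_query": user_query,
        },
    }


request = build_boost_request("the cat in the hat", ["category:books", "audience:kid"])
print(request)
# The request could then be sent as in the earlier listings, for example:
# requests.post(solr_url + collection + "/select", json=request)
```

Keeping the filters and the boost query in separate keys makes it easier to change the scoring logic without accidentally changing which documents are returned.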

## Multiplicative vs. Additive Boosting
One last topic to address concerning how we control our relevance functions is the idea of using multiplicative vs. additive boosting of relevance features.

In all of our examples to this point, we have added multiple features together into our query vector to contribute to the score. For example, the following queries will all yield equivalent relevance calculations assuming they are all filtered down to the same result set (i.e. `fq=the cat in the hat`):
- `q=the cat in the hat` 
- `q={!func}query("the") {!func}query("cat") {!func}query("in") {!func}query("the") {!func}query("hat")`
- `q=the cat&bq=in the hat`
- `q=*:*&bq=the cat in the hat`
- `q={!func}query("the cat in the hat")`

This kind of relevance is known as Additive Boosting, and maps well to our concept of a query as nothing more than a vector of features in the query that need to have their similarity compared across documents.

In many cases, however, we are likely to want to specify Multiplicative boosts as part of our relevance calculations. Instead of adding additional features to our vector, multiplicative boosts increase the relevance of an entire document by multiplying the document's score by some multiplier.

For example, if we wanted to query for `the cat in the hat`, but wanted the popularity of documents (those with a higher number in the `popularity` field) to have a less constrained effect, we can't easily do this by just adding another feature into our query vector - at least not without modifying the weights of all the other features, plus any additional normalization that may be applied by the BM25 ranking function. If we wanted to apply multiple boosts like this (for example, boosting both on popularity AND on publication date), then the option of modeling this as an additive boost becomes unreasonably complex and harder to control.

In Figure X, we were able to successfully utilize additive boosting by explicitly constraining the minimum and maximum values of each feature in our query vector so that each feature provided a known contribution to the overall relevance function.

Multiplicative boosting enables boosts to "pile up" upon each other, however, because each of the boosts is multiplied against the overall relevance score for the document, resulting in a much fuzzier match and removing the need for the kind of tight constraints we had to supply for our additive boost example.

To supply a multiplicative boost, you can either use the `{!boost}` query parser in your query vector or, if you are using the `edismax` query parser, the simplified `boost` query param. For example, to multiply a document's relevance score by ten times the value in the popularity field, you would do either:
```
q=the cat in the hat&defType=edismax&boost=mul(popularity,10)
```
OR
```
q={!boost b=mul(popularity,10)}the cat in the hat
```

In general, multiplicative boosts give you greater flexibility to combine different relevance features without having to explicitly pre-define an exact relevance formula accounting for every potential contributing factor. On the other hand, this flexibility can lead to unexpected consequences if the multiplicative boost values for particular features get too high and overshadow other features. Additive boosts can be a pain to manage, because you have to explicitly scale them so that they can be combined together and maintain a predictable contribution to the overall score, but once you've done this you maintain tight control over the relevance scoring calculation and range of scores.
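The difference is easy to see with a little arithmetic. The sketch below uses three invented documents with made-up keyword scores and popularity values; the additive weight of `0.01` is an arbitrary assumption, and the multiplicative version mirrors the `mul(popularity,10)` boost. It is meant only to show how the two approaches can produce different orderings, not to reproduce Solr's exact scores.

```python
# Hypothetical documents with a keyword (BM25-style) score and a popularity value
docs = {
    "doc_a": {"bm25": 0.69, "popularity": 1},
    "doc_b": {"bm25": 0.62, "popularity": 3},
    "doc_c": {"bm25": 0.31, "popularity": 5},
}


def additive(doc, popularity_weight=0.01):
    """Popularity is added as one more weighted feature in the vector."""
    return doc["bm25"] + popularity_weight * doc["popularity"]


def multiplicative(doc):
    """The whole keyword score is multiplied by ten times the popularity."""
    return doc["bm25"] * (doc["popularity"] * 10)


additive_ranking = sorted(docs, key=lambda d: additive(docs[d]), reverse=True)
multiplicative_ranking = sorted(docs, key=lambda d: multiplicative(docs[d]), reverse=True)

print(additive_ranking)
print(multiplicative_ranking)
```

With these values the additive ranking keeps the keyword order, while the multiplicative boost lets popularity reorder the results, pushing the best keyword match to last place. That is the kind of overshadowing effect to watch for when boost values grow large.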

Both additive and multiplicative boosts can be useful in different scenarios, so it's best to consider the problem at hand and experiment with what gets you the best results.