Permalink
Switch branches/tags
Nothing to show
Find file Copy path
Fetching contributors…
Cannot retrieve contributors at this time
116 lines (102 sloc) 3.95 KB
---
# Copyright 2017 Yahoo Holdings. Licensed under the terms of the Apache 2.0 license. See LICENSE in the project root.
title: "The Dot Product Search Operator"
---
<p>
<code>DotProductItem</code> is a query item available in the search
container that can be used to search for documents matching a subset
of weighted tokens while at the same time calculating the sparse dot
product between token weights in the query item and the corpus.
This is the query operator for which Parallel Wand is an optimization.
</p>
<h2>When to Use</h2>
<p>
You have a collection of weighted tokens produced by an algorithm and
want to perform matching against a corpus containing weighted tokens
produced by another algorithm in order to implement personalized content exploration.
</p><p>
Generally, the use-cases for <em>dot product</em> are the same as
for <a href="advanced-search-operators.html#parallel-wand">wand</a>.
The <a href="../ranking.html#raw-scores-and-query-item-labeling">raw scores</a> produced by
<em>dot product</em> operators are equivalent to those produced by <em>Parallel Wand</em>.
</p><p>
The difference is that <em>Parallel Wand</em> will perform local
optimizations in order to retrieve the top-k results that would be
returned by the <em>dot product</em> operator.
This optimization will only yield correct results if the overall ranking is equal to the
score produced by the <em>dot product</em> operator itself.
</p><p>
It might make sense to start out using <em>dot product</em> and later
switch to wand if the performance gain outweighs the reduction in
flexibility and correctness.
Also note that benchmarking should be involved in such a switch,
to quantify the possible gain in performance (which might be negative).
</p><p>
Here follows a list of cases where <em>dot product</em> might be
preferable to <em>wand</em> (Better means more correct):
<ul>
<li>Might be more efficient with few tokens or few total results.</li>
<li>Works better with arbitrary rank expressions and compound queries.</li>
<li>Works better with grouping.</li>
<li>Scales better when partitioning the problem space.</li>
</ul>
</p>
<h2>How to Use</h2>
<p>
<code>DotProductItem</code> is an advanced feature. You may need a
custom searcher to prepare the query. Before that, you will need a
corpus with weighted tokens to be matched. Finally, it is all tied
together by using the score produced by the <em>dot product</em>
operator in a ranking expression.
</p>
<h3>Prepare the Corpus</h3>
<p>
Use a weighted set field in the document to store the tokens.
Use an attribute vector for best performance:
<pre>
field features type weightedset&lt;string&gt; {
indexing: summary | attribute
attribute: fast-search
}
</pre>
</p>
<h3>Prepare the Query</h3>
<p>
The query needs to be prepared by a custom searcher or sent
using <a href=" ../query-language.html#dot-product">YQL</a>.
The code below shows the relevant part.
If you use multiple dot products in the same query it is a good idea to label them.
This enables us to use individual dot product scores when ranking results later.
<pre>
Item makeDotProduct(String label, String field, Map&lt;String, Integer&gt; token_map) {
DotProductItem item = new DotProductItem(field);
item.setLabel(label);
for (Map.Entry&lt;String, Integer&gt; entry : token_map.entrySet()) {
item.addToken(entry.getKey(), entry.getValue());
}
return item;
}
</pre>
</p>
<h3>Ranking</h3>
<p>
The <em>dot product</em> operator produces
<a href="../ranking.html#raw-scores-and-query-item-labeling">raw scores</a>
that may be used in a ranking expression.
The simplest approach is to use the sum of all raw scores for the field containing the tokens:
<pre>
rank-profile default {
first-phase {
expression: rawScore(features)
}
}
</pre>
For better control, label each dot product in the query and use their scores individually:
<pre>
rank-profile default {
first-phase {
expression: itemRawScore(dp1) + itemRawScore(dp2)
}
}
</pre>
</p>