Switch branches/tags
Nothing to show
Find file Copy path
Fetching contributors…
Cannot retrieve contributors at this time
92 lines (84 sloc) 4.05 KB
# Copyright 2017 Yahoo Holdings. Licensed under the terms of the Apache 2.0 license. See LICENSE in the project root.
title: "The WeightedSetItem Term"
WeightedSetItem is a query term available in
the container that can be used to search for a collection of tokens
with individual weights. It has similar semantics to EQUIV, since it
acts as a single term in the query. However, the restriction
dictating that it contains a collection of weighted tokens directly
enables specific back-end optimizations that improves performance for
large sets of tokens compared to using the generic EQUIV or OR operators.
<h2 id="use-case">Use Case</h2>
The actual use cases for this feature are for limiting the search
result to documents with specific properties that can have a large
number of distinct values. Some concrete use cases are:
<li>We know who the user is, and want to restrict to documents
written by one of the user's friends</li>
<li>We have the topic area the user is interested in, and want to
restrict to the top-ranked authors for this topic</li>
<li>We have recorded the documents that have been clicked by users
in the last 10 minutes, and want to search only in these</li>
Note that in most actual use cases the field we are searching in is
some sort of user ID, topic ID, group ID, or document ID and can often
be modeled as a number &ndash; usually in a field of
type <code>long</code> (or <code>array&lt;long&gt;</code> if multiple
values are needed). If a string field is used, it will usually also
be some sort of ID; if you have data in a string field intended for
searching with WeightedSetItem then using <code>match: word</code> for
that field is recommended.
<h2 id="activate">How to Introduce WeightedSetItem terms in Your Application</h2>
The decision to use a <code>WeightedSetItem</code> must be taken by application-specific logic.
This must be in the form of a Container plugin where the query object can be manipulated as follows:
<li>Decide if <code>WeightedSetItem</code> should be used</li>
<li>Create a new <code>WeightedSetItem</code> for the field you want to use as filter</li>
<li>Find the tokens and optionally weights to insert into the set</li>
<li>Combine new <code>WeightedSetItem</code> with the original query
by using an <code>AndItem</code></li>
Here is a simple code example adding a hardcoded filter containing 10 tokens:
private Result hardCoded(Query query, Execution execution) {
WeightedSetItem filter = new WeightedSetItem("author");
filter.addToken("magazine1", 2);
filter.addToken("magazine2", 2);
filter.addToken("magazine3", 2);
filter.addToken("tv", 3);
filter.addToken("tabloid1", 1);
filter.addToken("tabloid2", 1);
filter.addToken("tabloid3", 1);
filter.addToken("tabloid4", 1);
filter.addToken("tabloid5", 1);
filter.addToken("tabloid6", 1);
QueryTree tree = query.getModel().getQueryTree();
Item oldroot = tree.getRoot();
AndItem newtop = new AndItem();
query.trace("FriendFilterSearcher added hardcoded filter: ", true, 2);
The biggest challenge here is finding the tokens to insert; normally
the incoming search request URL should not contain all the tokens directly.
For example, the search request could contain the user id
and a lookup (in a database or a Vespa index) would fetch the friends list.
Since the tokens are inserted directly into the query without going
through the Search Container query parsing and query handling, they won't be
subject to transforms such as lowercasing, stemming, or phrase generation.
This means that if the field is a string field you'll
need to insert lowercased tokens only, and only single tokens in the index can be matched.
For more examples on how the code might look there is
<a href="">container javadoc</a> available.