---
# Copyright 2019 Oath Inc. Licensed under the terms of the Apache 2.0 license. See LICENSE in the project root.
title: "Query Language Reference"
---
<script>
function replace(encodedText) {
var body_element = document.getElementsByTagName('body')[0];
var selection = window.getSelection();
var newdiv = document.createElement('div');
body_element.appendChild(newdiv);
newdiv.innerHTML = encodedText;
newdiv.style.position='absolute';
newdiv.style.left='-99999px';
selection.selectAllChildren(newdiv);
window.setTimeout(function() {
body_element.removeChild(newdiv);
},0);
}
function init() {
var elements = document.getElementsByClassName("urlunencode");
var len = elements.length
for (var i = 0 ; i < len; i++)
{
var original = elements[i].innerHTML;
elements[i].innerHTML = decodeURIComponent(original);
elements[i].getAttributeNode("oncopy").nodeValue = "replace(\""+original+"\");";
}
}
</script>
<p>
Vespa accepts unstructured human input and structured queries for application logic separately,
then combines them into a single data structure for execution.
Human input is parsed heuristically, while application queries are formulated in YQL.
</p><p>
A query URL looks like:
<pre>
http://myhost.mydomain.com:8080/search/?yql=select%20%2A%20from%20sources%20%2A%20where%20text%20contains%20%22blues%22%3B
</pre>
In other words, <em>yql</em> contains:
<pre class="urlunencode" oncopy="">
select%20%2A%20from%20sources%20%2A%20where%20text%20contains%20%22blues%22%3B
</pre>
This <a href="search-definitions-reference.html#match">matches</a> all documents
where the field named <em>text</em> contains the word <em>blues</em>.
</p>
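<p>
The encoding used in the query URL can be reproduced with a few lines of Python.
This is a minimal sketch using only the standard library;
the helper name <code>yql_url</code> is made up for illustration,
and the host is the placeholder from the example URL.
</p>

```python
from urllib.parse import quote, unquote

def yql_url(base_url, yql):
    """Return base_url with the YQL statement percent-encoded as the yql parameter."""
    # safe="" makes quote() encode every reserved character, including "/" and "*"
    return base_url + "?yql=" + quote(yql, safe="")

statement = 'select * from sources * where text contains "blues";'
url = yql_url("http://myhost.mydomain.com:8080/search/", statement)
print(url)
# Decoding the yql parameter gives back the readable statement
print(unquote(url.split("?yql=", 1)[1]))
```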
<h2 id="select">select</h2>
<p>
<em>select</em> is the list of <a href="../search-definitions.html#indexing">summary fields</a> requested
(a field with the "summary" index attribute).
Vespa will hide other fields in the matching documents.
<pre class="urlunencode" oncopy="">
select%20price,isbn%20from%20sources%20%2A%20where%20title%20contains%20%22madonna%22%3B
</pre>
The above explicitly requests the fields "price" and "isbn" (from all sources).
To request all fields, use an asterisk as field selection:
<pre class="urlunencode" oncopy="">
select%20*%20from%20sources%20%2A%20where%20title%20contains%20%22madonna%22%3B
</pre>
</p>
<h2 id="from-sources">from sources</h2>
<!-- ToDo: describe how this is equal to model.sources -->
<p>
<em>from sources</em> specifies which document
<a href="search-api-reference.html#model.sources">sources</a> to search. Example:
<pre class="urlunencode" oncopy="">
select%20%2A%20from%20music%20where%20title%20contains%20%22madonna%22%3B
</pre>
searches in <em>music</em> documents. Search in:
<table class="table">
<thead></thead><tbody>
<tr><td>all sources</td>
<td><code>select … from <strong>sources *</strong> where …</code></td></tr>
<tr><td>a set of sources</td>
<td><code>select … from <strong>sources source1, source2</strong> where …</code></td></tr>
<tr><td>a single source</td>
<td><code>select … from <strong>source1</strong> where …</code></td></tr>
</tbody>
</table>
In other words, <em>sources</em> is used for querying some/all sources.
If only a single source is queried, the <em>sources</em> keyword is dropped.
</p>
<h2 id="where">where</h2>
<p>
Operators
<table class="table">
<thead></thead><tbody>
<tr id="numeric"><th>numeric</th><td>
<p>
The following numeric operators are available:
<em>=, &lt;, &gt;, &lt;=, &gt;=, range(field, lower bound, upper bound)</em>
<pre class="urlunencode" oncopy="">
where%20500%20%3E%3D%20price%3B
</pre>
<pre class="urlunencode" oncopy="">
where%20range%28fieldname%2C%200%2C%205000000000L%29%3B
</pre>
Numbers must be in the signed 32-bit range;
to input a 64-bit signed number, use <em>L</em> as suffix.
</p><p>
The interval is by default a closed interval.
If the lower bound is exclusive, set the annotation "bounds" to "leftOpen". <!-- ToDo annot -->
If the upper bound is exclusive, set the same annotation to "rightOpen".
If both bounds are exclusive, set the annotation to "open". <!-- ToDo: example here! -->
</p><p>
The numeric operators support an extra annotation, the integer "hitLimit".
This is used for <em>capped range search</em>.
An alternative to using negative and positive values for "hitLimit"
is to always use a positive number of hits
(as a negative number of hits does not make much sense)
and combine this with either of the boolean annotations "ascending" and "descending" (but not both).
Then "[{"hitLimit": 38, "descending": true}]" is equivalent to setting "hitLimit" to -38,
i.e. only populate with 38 hits, starting from the upper boundary (descending order).
</p>
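<p>
The rules for numeric literals and range annotations can be sketched in Python as follows.
The helper names (<code>yql_int</code>, <code>range_clause</code>) are hypothetical,
placing the annotation directly in front of <code>range()</code> is an assumption
based on how annotations are written elsewhere in this reference,
and mapping a positive "hitLimit" to "ascending" is inferred from the "descending" case described above.
</p>

```python
import json

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def yql_int(value):
    """Format an integer literal, adding the L suffix outside the signed 32-bit range."""
    fits_in_32_bits = INT32_MAX >= value >= INT32_MIN
    return str(value) if fits_in_32_bits else f"{value}L"

def range_clause(field, lower, upper, bounds=None, hit_limit=None):
    """Build a range() expression.

    bounds is None (closed interval), "leftOpen", "rightOpen" or "open".
    hit_limit is a signed hit count; it is written as a positive count
    plus a direction flag (assumed: positive means ascending).
    """
    expression = f"range({field}, {yql_int(lower)}, {yql_int(upper)})"
    annotations = {}
    if bounds is not None:
        annotations["bounds"] = bounds
    if hit_limit is not None:
        annotations["hitLimit"] = abs(hit_limit)
        annotations["ascending" if hit_limit > 0 else "descending"] = True
    if annotations:
        expression = "[" + json.dumps(annotations) + "]" + expression
    return expression

print(range_clause("fieldname", 0, 5000000000))
print(range_clause("year", 2000, 2019, bounds="rightOpen"))
print(range_clause("price", 0, 100, hit_limit=-38))
```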
</td></tr>
<tr id="boolean"><th>boolean</th><td>
<p>
The boolean operator is: =
<pre class="urlunencode" oncopy="">
where%20alive%20%3D%20true%3B
</pre>
</p>
</td></tr>
<tr id="contains"><th>contains</th><td>
<p>
The right hand side argument of the contains operator is either a string literal,
or a function, like <em>phrase</em>.
</p>
<p>
<em>contains</em> is the basic building block for text matching.
The kind of <a href="search-definitions-reference.html#match">matching</a>
to be done depends on the field settings in the search definition.
<pre class="urlunencode" oncopy="">
where%20title%20contains%20%22madonna%22%3B
</pre>
The matched field must be an
<a href="../search-definitions.html#indexing">indexed field or attribute</a>.
</p><p>
Fields inside structs are referenced using dot notation,
e.g. <code>mystruct.mystructfield</code>.
</p><p>
By default, the string will be <a href="../linguistics.html#tokenization">tokenized</a>
to match the field(s) searched.
Explicitly control tokenization by using annotations:
<pre class="urlunencode" oncopy="">
where%20title%20contains%20%28%5B%7B%22stem%22%3A%20false%7D%5D%22madonna%22%29%3B
</pre>
Note the use of parentheses to control precedence.
</p>
<table class="table">
<tr id="and"><th>and</th><td>
<p>
<em>and</em> accepts other <em>and</em> statements, <em>or</em> statements,
<a href="#userquery">userQuery</a>, logically inverted statements
and <em>contains</em> statements as arguments:
<pre class="urlunencode" oncopy="">
where%20title%20contains%20%22madonna%22%20and%20title%20contains%20%22saint%22%3B
</pre>
</p>
</td></tr>
<tr id="or"><th>or</th><td>
<p>
<em>or</em> accepts other <em>or</em> statements, <em>and</em> statements,
<a href="#userquery">userQuery</a> and <em>contains</em> statements as arguments:
<pre class="urlunencode" oncopy="">
where%20title%20contains%20%22madonna%22%20or%20title%20contains%20%22saint%22%3B
</pre>
</p>
</td></tr>
<tr id="andnot"><th>andnot</th><td>
<p>
As Vespa does recall as opposed to filtering,
the only <em>excluding</em> operator in Vespa is <em>andnot</em>.
In YQL, this is expressed by letting the right-hand side argument of the <em>and</em> operator,
and only the right-hand side, be a logically inverted expression,
i.e. an expression using the <em>!</em> operator:
<pre class="urlunencode" oncopy="">
where%20title%20contains%20%22madonna%22%20and%20%21%28title%20contains%20%22saint%22%29%3B
</pre>
</p>
</td></tr>
</table>
<table class="table">
<tr id="phrase"><th>phrase</th><td>
<p>
YQL has no native definition of e.g. phrase matching.
Here the Vespa integration uses a function:
<pre class="urlunencode" oncopy="">
where%20text%20contains%20phrase%28%22st%22%2C%20%22louis%22%2C%20%22blues%22%29%3B
</pre>
It can be necessary to pass along extra information about a search term,
for instance when specifying a term should not be stemmed before matching.
This is done by using YQL annotations:
<pre class="urlunencode" oncopy="">
where%20text%20contains%20%28%5B%7B%22stem%22%3A%20false%7D%5D%22blues%22%29%3B
</pre>
</p>
</td></tr>
<tr id="near"><th>near</th><td>
<p>
<em>near()</em> matches if all argument terms occur close to each other in the same document.
It supports the <em>distance</em>-annotation which controls
how many words are allowed to separate the argument terms.
The default value is 2.
<pre class="urlunencode" oncopy="">
where%20text%20contains%20%28%5B%20%7B%22distance%22%3A%205%7D%20%5Dnear%28%22a%22%2C%20%22b%22%29%29%3B
</pre>
</p>
</td></tr>
<tr id="onear"><th>onear</th><td>
<p>
<em>onear()</em> (ordered near) is like <em>near()</em>,
but also requires the terms in the document to have the same order
as given in the function (i.e. it is a phrase allowing other words interleaved).
</p>
</td></tr>
<tr id="sameelement"><th>sameElement</th><td>
<p>
<em>sameElement()</em> is an operator that requires the terms to match within the same struct element in an array or a map field. Example:
<pre>
struct person {
field first_name type string {}
field last_name type string {}
field year_of_birth type int {}
}
field persons type array&lt;person&gt; {
indexing: summary
struct-field first_name { indexing: attribute }
struct-field last_name { indexing: attribute }
struct-field year_of_birth { indexing: attribute }
}
field identities type map&lt;string, person&gt; {
indexing: summary
struct-field key { indexing: attribute }
struct-field value.first_name { indexing: attribute }
struct-field value.last_name { indexing: attribute }
struct-field value.year_of_birth { indexing: attribute }
}
</pre>
With a normal <em>AND</em>, the query <code>persons.first_name AND persons.last_name</code>
will usually not give the intended result.
It matches if a document has a <em>persons</em> element with a matching <em>first_name</em>
<em>AND</em> any element with a matching <em>last_name</em>.
This gives many false positives, since nothing limits the matches to the same element.
<em>sameElement</em> ensures exactly that:
<pre class="urlunencode" oncopy="">
where%20persons%20contains%20sameElement%28first_name%20contains%20'Joe',
%20last_name%20contains%20'Smith',%20year_of_birth%20%3C%201940%29%3B
</pre>
The above returns all documents containing Joe Smith born before 1940 in the <em>persons</em> array.
</p><p>
Searching in a map is similar to searching in an array of struct.
The difference is that you have an extra synthetic struct with the field members <em>key</em> and <em>value</em>.
The above example with the <em>identities</em> map looks like this:
<pre class="urlunencode" oncopy="">
where%20identities%20contains%20sameElement%28key%20contains%20'father',
%20value.first_name%20contains%20'Joe',%20value.last_name%20contains%20'Smith',%20value.year_of_birth%20%3C%201940%29%3B
</pre>
The above returns all documents that have tagged Joe Smith born before 1940 as a 'father'.
The important point here is the indirection through <em>key</em> and <em>value</em>
to address the keys and the values of the map.
</p>
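<p>
The difference between a plain <em>AND</em> and <em>sameElement</em> can be modelled in a few lines of Python.
This is only an illustration of the matching semantics described above, not how Vespa evaluates the query:
the sample data is invented, and exact string equality stands in for <em>contains</em>.
</p>

```python
persons = [
    {"first_name": "Joe", "last_name": "Jones", "year_of_birth": 1975},
    {"first_name": "Anna", "last_name": "Smith", "year_of_birth": 1930},
]

def matches_with_and(elements):
    """Plain AND: each condition may be satisfied by a different element."""
    return (any(p["first_name"] == "Joe" for p in elements)
            and any(p["last_name"] == "Smith" for p in elements)
            and any(1940 > p["year_of_birth"] for p in elements))

def matches_same_element(elements):
    """sameElement: all conditions must hold within one and the same element."""
    return any(p["first_name"] == "Joe"
               and p["last_name"] == "Smith"
               and 1940 > p["year_of_birth"]
               for p in elements)

print(matches_with_and(persons))      # True: a false positive
print(matches_same_element(persons))  # False: no single element satisfies all three conditions
```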
</td></tr>
<tr id="equiv"><th>equiv</th><td>
<p>
If two terms in the same field should give exactly the same behavior when matched,
use the <em>equiv()</em> operator, which behaves like a special case of "or".
<pre class="urlunencode" oncopy="">
where%20fieldName%20contains%20equiv%28%22A%22%2C%22B%22%29%3B
</pre>
</p><p>
In many cases, the OR operator will give the same results as an EQUIV.
The matching logic is exactly the same,
and an OR does not have the limitations that EQUIV does (below).
The difference is in how matches are visible to ranking functions.
All words that are children of an OR count for ranking.
When using an EQUIV however, it looks like a single word:
<ul>
<li>Counts as only +1 for queryTermCount</li>
<li>Counts as 1 word for completeness measures</li>
<li>Proximity will not discriminate different words inside the EQUIV</li>
<li>Connectivity can be set between the entire EQUIV and the word before and after</li>
<li>Items inside the EQUIV are not directly visible to ranking features,
so weight and connectivity on those will have no effect</li>
</ul>
Limitations on how <em>equiv</em> can be used in a query:
<ul>
<li><em>equiv</em> may not appear inside a phrase</li>
<li>It may only contain <code>TermItem</code> and <code>PhraseItem</code> instances.
Operators like <em>and</em> cannot be placed inside <em>equiv</em></li>
<li><code>PhraseItems</code> inside <em>equiv</em> will rank as if they have size 1</li>
</ul>
Learn how to use <a href="../query-language.html#equiv">equiv</a>.
</p>
</td></tr>
</table></td>
</tr>
<tr id="matches"><th>matches</th><td>
<p>
Regular expressions are supported using
<a href="http://en.wikibooks.org/wiki/Regular_Expressions/POSIX_Extended_Regular_Expressions">POSIX extended syntax</a>
with the limitation that matching is <strong>case insensitive</strong>.
Replace <code>contains</code> with <code>matches</code> to run a regex search.
This example becomes a substring search:
<pre class="urlunencode" oncopy="">
where%20title%20matches%20%22madonna%22%3B
</pre>
This example matches <code>madona</code>, <code>madonna</code>, and any other variant with one or more <code>n</code>s:
<pre class="urlunencode" oncopy="">
where%20title%20matches%20%22mado%5Bn%5D%2Ba%22%3B
</pre>
Here you match any string starting with <code>mad</code>:
<pre class="urlunencode" oncopy="">
where%20title%20matches%20%22^mad%22%3B
</pre>
</p><p>
<strong>Note:</strong> Only <a href="search-definitions-reference.html#attribute">attribute</a>
fields in <a href="services-content.html#document">documents</a> that have <code>mode="index"</code> are supported.
Regular expression search is also not optimized.
An expression with a prefix anchored by <code>^</code> will be faster than one without.
</p>
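<p>
To get a feel for the three example expressions, they can be tried out locally.
The sketch below uses Python's <code>re</code> module, which is not a POSIX extended implementation,
but these particular patterns mean the same in both syntaxes.
The helper name and the sample titles are made up for illustration.
</p>

```python
import re

def title_matches(pattern, title):
    """Case-insensitive search anywhere in the title, mirroring the examples above."""
    return re.search(pattern, title, re.IGNORECASE) is not None

print(title_matches("madonna", "The Madonna Collection"))  # True: substring match
print(title_matches("mado[n]+a", "Madona"))                # True: one n is enough
print(title_matches("mado[n]+a", "madoa"))                 # False: at least one n is required
print(title_matches("^mad", "Madness"))                    # True: anchored at the start
print(title_matches("^mad", "Nomad"))                      # False: "mad" is not at the start
```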
</td></tr>
<tr id="userinput"><th>userInput</th><td>
<p>
<em>userInput()</em> is a robust way of mixing user input and a formal query.
It allows controlling whether the user input is to be stemmed, lowercased, etc,
but it also allows for controlling whether it should be treated as a raw string,
whether it should simply be segmented or parsed as a query.
<pre class="urlunencode" oncopy="">
yql=select%20%2A%20from%20sources%20%2A%20where%20userInput%28%40animal%29%3B&amp;animal=panda
</pre>
Here, the userInput() function will access the query property "animal",
and parse the property value as an "ALL" query, resulting in the following expression:
<pre class="urlunencode" oncopy="">
select%20%2A%20from%20sources%20%2A%20where%20default%20contains%20%22panda%22%3B
</pre>
Now, if we changed the value of "animal" without changing the rest of the expression:
<pre class="urlunencode" oncopy="">
yql=select%20%2A%20from%20sources%20%2A%20where%20userInput%28%40animal%29%3B&amp;animal=panda%20smokey
</pre>
The result would be:
<pre class="urlunencode" oncopy="">
select%20%2A%20from%20sources%20%2A%20where%20default%20contains%20%22panda%22%20and%20default%20contains%20%22smokey%22%3B
</pre>
Now, let's assume we want to combine multiple query properties and have a more complex expression as well:
<pre class="urlunencode" oncopy="">
yql=select%20%2A%20from%20sources%20%2A%20where%20range%28year%2C%201963%2C%202014%29
%20and%20%28userInput%28%40animal%29%20or%20userInput%28%40teddy%29%29%3B&amp;animal=panda&amp;teddy=bear%20roosevelt
</pre>
The resulting YQL expression will be:
<pre class="urlunencode" oncopy="">
select%20%2A%20from%20sources%20%2A%20where%20range%28year%2C%201963%2C%202014%29
%20and%20%28default%20contains%20%22panda%22%20or%20%28default%20contains%20%22bear%22%20and%20default%20contains%20%22roosevelt%22%29%29%3B
</pre>
Now, assume we do not want the "teddy" value to be treated as its own query segment;
it should only be segmented with the linguistic libraries to get recall.
We can do this by adding a "grammar" annotation to the userInput() call:
<pre class="urlunencode" oncopy="">
yql=select%20%2A%20from%20sources%20%2A%20where%20range%28year%2C%201963%2C%202014%29
%20and%20%28userInput%28%40animal%29%20or%20%5B%7B%22grammar%22%3A%20%22segment%22%7D%5DuserInput%28%40teddy%29%29%3B&amp;animal=panda&amp;teddy=bear%20roosevelt
</pre>
Then, the linguistic library will split on space, and the resulting expression is:
<pre class="urlunencode" oncopy="">
select%20%2A%20from%20sources%20%2A%20where%20range%28year%2C%201963%2C%202014%29
%20and%20%28default%20contains%20%22panda%22%20or%20default%20contains%20phrase%28%22bear%22%2C%20%22roosevelt%22%29%29%3B
</pre>
Instead of a variable reference,
the <em>userInput()</em> function also accepts raw strings as arguments,
but this would obviously not be suited for parametrizing the query from a query profile.
It is mostly intended as a test feature.
</p><p>
<em>userInput()</em> control annotations:
<table class="table">
<thead>
<tr>
<th>Name</th><th>Default</th><th>Values</th><th>Effect</th>
</tr>
</thead>
<tr>
<td>grammar</td>
<td><code>all</code></td>
<td><code>raw</code>, <code>segment</code> and all values accepted for the
<a href="search-api-reference.html#model.type">model.type</a> argument
in the search API.
</td>
<td>How to parse the user input. "raw" will treat the user input as a
string to be matched without any processing, "segment" will do a
first pass through the linguistic libraries, while the rest of the
values will treat the string as a query to be parsed. If query parsing
fails, an error message will be returned.
</td>
</tr>
<tr>
<td>defaultIndex</td>
<td><code>default</code></td>
<td>Any searchable field in the system's search definition.</td>
<td>Same as <a
href="search-api-reference.html#model.defaultIndex">model.defaultIndex</a>
in the search API. If "grammar" is set to "raw" or "segment",
this will be the field searched.
</td>
</tr>
<tr>
<td>language</td>
<td><em>Autodetect</em></td>
<td>RFC 3066 language code</td>
<td>Language setting for the linguistics treatment of this userInput() call,
also see <a
href="search-api-reference.html#model.language">model.language</a> in
the search API reference.
</td>
</tr>
<tr>
<td>allowEmpty</td>
<td><code>false</code></td>
<td>Boolean true or false.</td>
<td>Whether to allow empty input for query parsing and search terms.
If this is true, a NullItem instance is inserted in the proper place in the query tree.
If "allowEmpty" is false, the query will fail
if the user provided data can not be parsed or is empty.
</td>
</tr>
</table>
</p><p>
In addition, other annotations, like <em>stem</em> or <em>ranked</em>, will take effect as normal.
</p><p>
The query parsing mechanism currently has certain limitations in propagating annotations.
Therefore, for any value of <em>grammar</em> other than <em>raw</em> or <em>segment</em>,
only the following annotations will take effect:
<ul>
<li><code>ranked</code></li>
<li><code>filter</code></li>
<li><code>stem</code></li>
<li><code>normalizeCase</code></li>
<li><code>accentDrop</code></li>
<li><code>usePositionData</code></li>
</ul>
</p>
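<p>
A request that combines <em>userInput()</em> references, an annotation and the referenced query properties
can be assembled as in the sketch below.
It uses only the Python standard library; the helper <code>user_input</code> is hypothetical,
and the property names and values are taken from the examples above.
</p>

```python
import json
from urllib.parse import urlencode, quote, parse_qs

def user_input(reference, **annotations):
    """Return a userInput() call, prefixed by an annotation map if any are given."""
    call = f"userInput(@{reference})"
    if annotations:
        call = "[" + json.dumps(annotations) + "]" + call
    return call

yql = ("select * from sources * where range(year, 1963, 2014) and ("
       + user_input("animal") + " or " + user_input("teddy", grammar="segment") + ");")
params = {"yql": yql, "animal": "panda", "teddy": "bear roosevelt"}

# quote_via=quote encodes spaces as %20 rather than "+"
query_string = urlencode(params, quote_via=quote)
print(yql)
print(query_string)
# Round trip: decoding the query string gives back the original values
print(parse_qs(query_string)["teddy"])
```

<p>
Keeping the user-supplied text in separate request parameters, rather than pasting it into the YQL string,
means the statement itself never has to be escaped against user input.
</p>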
</td></tr>
<tr id="userquery"><th>userQuery</th><td>
<p>
<em>userQuery()</em> evaluates to the parsed user query,
i.e. the HTTP API parameter named <em>query</em>
(including the <em>filter</em> part, if this is available).
The function userQuery represents where the heuristically parsed query
is to be inserted as a sub-tree into the YQL query.
In other words, this is not a string substitution,
the user query is first parsed with any of the supported grammars,
then the resulting tree is inserted into the corresponding place in the YQL query tree:
<pre>
http://myhost.mydomain.com:8080/search/?query=abc%20def%20-ghi&amp;type=all&amp;
yql=select%20%2A%20from%20sources%20%2A%20where%20vendor%20contains%20%22brick%20and%20mortar%22%20AND%20price%20%3C%2050
%20AND%20userQuery%28%29%3B
</pre>
Breakdown:
<table class="table">
<tr><td>query</td>
<td>abc def -ghi</td></tr>
<tr><td>type</td>
<td>all</td></tr>
<tr><td>yql</td>
<td>select * from sources * where vendor contains "brick and mortar" AND price &lt; 50 AND userQuery();</td></tr>
</table>
The above example will in other words evaluate to a query
where the numeric field <em>price</em> must have a value lower than 50,
<em>vendor</em> must match the term <em>brick and mortar</em>,
<em>and</em> the default index must contain the two terms <em>abc</em> and <em>def</em>
while <em>not</em> containing the term <em>ghi</em>.
The spaces in the vendor term will not be used to split this into several new terms by YQL.
The string specified by the search will be used.
Query transformers may convert the string at a later stage,
but it is not necessary to do anything "special" to create a search term containing arbitrary characters.
</p>
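<p>
The breakdown of the example URL can be reproduced by decoding its parameters.
A minimal sketch with the Python standard library; the URL is the one shown above, joined into a single string:
</p>

```python
from urllib.parse import urlsplit, parse_qs

url = ("http://myhost.mydomain.com:8080/search/?query=abc%20def%20-ghi&type=all&"
       "yql=select%20%2A%20from%20sources%20%2A%20where%20vendor%20contains%20"
       "%22brick%20and%20mortar%22%20AND%20price%20%3C%2050%20AND%20userQuery%28%29%3B")

# parse_qs decodes each parameter; every value comes back as a list
parameters = {name: values[0] for name, values in parse_qs(urlsplit(url).query).items()}
for name, value in parameters.items():
    print(f"{name}: {value}")
```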
</td></tr>
<tr id="rank"><th>rank</th><td>
<p>
The first, and only the first, argument of the <em>rank()</em> function
determines whether a document is a match,
but all arguments are used for calculating rank score.
<pre class="urlunencode" oncopy="">
where%20rank%28a%20contains%20%22A%22%2C%20b%20contains%20%22B%22%29%3B
</pre>
</p>
</td></tr>
<tr id="dotproduct"><th>dotProduct</th><td>
<p>
<em>dotProduct</em> calculates the dot product between the weighted set
in the query and a weighted set field in the document as its rank score contribution:
<pre class="urlunencode" oncopy="">
where%20dotProduct%28description%2C%20%7B%22a%22%3A1%2C%20%22b%22%3A2%7D%29%3B
</pre>
The result is stored as a <a href="../advanced-ranking.html#raw-scores-and-query-item-labeling">raw score</a>.
</p><p>
A normal use case is a collection of weighted tokens produced by an algorithm,
to match against a corpus containing weighted tokens
produced by another algorithm in order to implement personalized content exploration.
</p><p>
Refer to <a href="../advanced-ranking.html">advanced ranking</a>
for a discussion of usage and examples.
</p>
<table class="table">
<thead></thead><tbody>
<tr>
<th>Field type</th>
<td>Weighted set attribute with fast-search. Note: Also supported for regular attribute or
index fields, but then with much weaker performance.</td>
</tr><tr>
<th>Query model</th>
<td>Weighted set with {token, weight} pairs</td>
</tr><tr>
<th>Matching</th>
<td>Documents where the weighted set field contains at least one of the tokens in the query.</td>
</tr><tr>
<th>Ranking</th>
<td>Dot product score between the weights of the matched query tokens and field tokens.
This score is available using <code>rawScore</code> or <code>itemRawScore</code> rank features.</td>
</tr><tr>
<th style="white-space:nowrap;">Java Query Item</th>
<td><a href="http://javadoc.io/page/com.yahoo.vespa/container-search/latest/com/yahoo/prelude/query/DotProductItem.html">DotProductItem</a></td>
</tr>
</tbody>
</table>
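<p>
The matching and ranking rules in the table can be illustrated with a small Python model.
This is a sketch of the arithmetic only, not of the backend implementation;
the function name and the sample field contents are invented.
</p>

```python
def dot_product(query_tokens, field_tokens):
    """Return the dot product over shared tokens, or None if no token matches."""
    shared = [token for token in query_tokens if token in field_tokens]
    if not shared:
        return None  # the document is not a match
    return sum(query_tokens[token] * field_tokens[token] for token in shared)

query = {"a": 1, "b": 2}
print(dot_product(query, {"a": 10, "c": 5}))  # 10: only "a" is shared
print(dot_product(query, {"a": 3, "b": 4}))   # 11: 1*3 + 2*4
print(dot_product(query, {"c": 7}))           # None: no shared token
```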
</td></tr>
<tr id="weightedset"><th>weightedSet</th><td>
<p>
When using <em>weightedSet</em> to search a field,
all tokens present in the searched field will be matched against the weighted set in the query.
This means that using a weighted set to search a single-value attribute field
will have similar semantics to using a normal term to search a weighted set field.
The low-level matching information resulting from matching a document with a weighted set in the query
will contain the weights of all the matched tokens in descending order.
Each matched weight will be represented as a standard occurrence on position 0 in element 0.
<pre class="urlunencode" oncopy="">
where%20weightedSet%28description%2C%20%7B%22a%22%3A1%2C%20%22b%22%3A2%7D%29%3B
</pre>
<em>weightedSet</em> has similar semantics to <a href="#equiv">equiv</a>,
as it acts as a single term in the query.
However, the restriction dictating that it contains a collection of weighted tokens directly
enables specific back-end optimizations that improve performance
for large sets of tokens compared to using the generic <a href="#equiv">equiv</a> or <a href="#or">or</a> operators.
</p><p>
Refer to <a href="../advanced-ranking.html">advanced ranking</a>
for a discussion of usage and examples.
</p>
<table class="table">
<thead></thead><tbody>
<tr>
<th>Field type</th>
<td>Singlevalue or <a href="../search-definitions.html#multivalue-fields">multivalue</a>
attribute or index field.
(Note: Most use cases operate on a single value field).</td>
</tr><tr>
<th>Query model</th>
<td>Weighted set with {token, weight} pairs.</td>
</tr><tr>
<th>Matching</th>
<td>Documents where the field contains at least one of the tokens in the query.</td>
</tr><tr>
<th>Ranking</th>
<td>The operator will act as a single term in the back-end.
The query term weight is the weight assigned to the operator itself
and the match weight is the largest weight among matching tokens from the weighted set.
This operator does not produce a raw score.
Due to better ranking and performance we recommend using <a href="#dotproduct">dotProduct</a> instead.</td>
</tr><tr>
<th style="white-space:nowrap;">Java Query Item</th>
<td><a href="http://javadoc.io/page/com.yahoo.vespa/container-search/latest/com/yahoo/prelude/query/WeightedSetItem.html">WeightedSetItem</a></td>
</tr>
</tbody>
</table>
</td></tr>
<tr id="wand"><th>wand</th><td>
<p>
<em>wand</em> can be used to search for documents
where weighted tokens in a field match a subset of weighted tokens in the query.
At the same time, it internally calculates the dot product between token weights in the query and the field.
<em>wand</em> is guaranteed to return the top-k hits according to its internal dot product rank score.
It is an operator that scales adaptively from <a href="#or">or</a> to <a href="#and">and</a>.
</p><p>
<em>wand</em> optimizes the performance of using multiple threads per search in the backend,
and is also called <em>Parallel Wand</em>.
</p><p>
<em>wand</em> also allows numeric arguments; the search argument is then an array of arrays of length two.
In each pair, the first number is the search term, the second its weight:
<pre class="urlunencode" oncopy="">
where%20wand%28description%2C%20%5B%5B11%2C1%5D%2C%20%5B37%2C2%5D%5D%29%3B
</pre>
Both <em>wand</em> and <a href="#weakand">weakAnd</a> support the annotations <em>scoreThreshold</em>,
which is a double giving the minimum rank score for hits to include, and <em>targetNumHits</em>
which is the wanted number of hits.
By default, set <em>targetNumHits</em> equal to the number of hits to return.
If additional second phase ranking with rerank-count is used,
do not set <em>targetNumHits</em> less than the configured rank-profile's rerank-count.
<pre class="urlunencode" oncopy="">
where%20%5B%20%7B%22scoreThreshold%22%3A%200.13%2C%20%22targetNumHits%22%3A%207%7D%20%5Dwand%28description%2C%20%7B%22a%22%3A1%2C%20%22b%22%3A2%7D%29%3B
</pre>
Refer to <a href="../advanced-ranking.html">advanced ranking</a>
for a discussion of usage and examples.
</p>
<table class="table">
<thead></thead><tbody>
<tr>
<th>Field type</th>
<td>Weighted set attribute with fast-search.
Note: Also supported for regular attribute or index fields,
but then with much weaker performance.</td>
</tr><tr>
<th>Query model</th>
<td>Weighted set with {token, weight} pairs.</td>
</tr><tr>
<th>Matching</th>
<td>Documents where the weighted set field contains at least one of the tokens in the query
and where the internal dot product score for this document
is larger than the worst among the current top-k best hits.
This means that more than top-k documents are matched and returned for ranking.
It also means that many documents are skipped,
even if they match several tokens in the query, because their dot product score is too low.
This skipping makes <em>wand</em> faster than <a href="#dotproduct">dotProduct</a> in some cases.
</td>
</tr><tr>
<th>Ranking</th>
<td>Dot product score between the weights of the matched query tokens and field tokens.
This score is available using <code>rawScore</code> or <code>itemRawScore</code> rank features.
Note that the top-k best hits are only guaranteed to be returned
when using this internal score as the final ranking expression.
</td>
</tr><tr>
<th style="white-space:nowrap;">Java Query Item</th>
<td><a href="http://javadoc.io/page/com.yahoo.vespa/container-search/latest/com/yahoo/prelude/query/WandItem.html">WandItem</a></td>
</tr>
</tbody>
</table>
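<p>
The matching rule in the table can be modelled with a heap holding the current top-k hits.
The sketch below is a brute-force Python illustration that scores every document;
it does not implement the actual <em>wand</em> algorithm, which avoids scoring most documents.
The function name and the sample documents are invented.
</p>

```python
import heapq

def top_k_by_dot_product(query, documents, k):
    """Keep a document only if its score beats the worst of the current top-k
    (or if fewer than k hits have been collected so far)."""
    heap = []     # min-heap of (score, doc_id); heap[0] is the worst kept hit
    matched = []  # every document that passed the threshold when it was seen
    for doc_id, tokens in documents.items():
        shared = [token for token in query if token in tokens]
        if not shared:
            continue  # no token in common: never a match
        score = sum(query[token] * tokens[token] for token in shared)
        if k > len(heap):
            heapq.heappush(heap, (score, doc_id))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, doc_id))
        else:
            continue  # skipped: the score is too low
        matched.append(doc_id)
    return matched, sorted(heap, reverse=True)

documents = {
    "d1": {"a": 1},
    "d2": {"a": 2, "b": 1},
    "d3": {"b": 1},
    "d4": {"a": 5, "b": 5},
    "d5": {"a": 3},
    "d6": {"c": 9},
}
matched, top = top_k_by_dot_product({"a": 1, "b": 2}, documents, k=2)
print(matched)  # ['d1', 'd2', 'd3', 'd4']: more than k documents are matched
print(top)      # [(15, 'd4'), (4, 'd2')]: the final top-k
```

<p>
In this run, <code>d5</code> shares a token with the query but is skipped because its score (3)
does not beat the worst kept hit (4) at that point.
</p>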
</td></tr>
<tr id="weakand"><th>weakAnd</th><td>
<p>
<em>weakAnd</em> is sometimes called <em>Vespa Wand</em>.
Unlike <a href="#wand">wand</a>, it accepts arbitrary word matches (across arbitrary fields) as arguments.
Only a limited number of documents are returned for ranking (default is 100),
but it does not guarantee to return the best k hits.
This function can be seen as an optimized <a href="#or">or</a>:
<pre class="urlunencode" oncopy="">
where%20weakAnd%28a%20contains%20%22A%22%2C%20b%20contains%20%22B%22%29%3B
</pre>
Both <a href="#wand">wand</a> and <em>weakAnd</em> support the annotations <em>scoreThreshold</em>,
which is a double giving the minimum rank score for hits to include, and <em>targetNumHits</em>
which is the wanted number of hits:
<pre class="urlunencode" oncopy="">
where%20%5B%7B%22scoreThreshold%22%3A%200.41%2C%20%22targetNumHits%22%3A%207%7D%5DweakAnd%28a%20contains%20%22A%22%2C%20b%20contains%20%22B%22%29%3B
</pre>
Unlike <a href="#wand">wand</a>, <em>weakAnd</em> can be used
to search across several fields of various types,
but it does NOT guarantee to return the top-k best hits.
It can however be combined with any ranking expression.
Keep in mind that this expression should correlate with its simple internal ranking score
that uses query term weight and inverse document frequency for matching terms.
</p><p>
Refer to <a href="../advanced-ranking.html">advanced ranking</a>
for a discussion of usage and examples.
</p>
<table class="table">
<thead></thead><tbody>
<tr>
<th>Field type</th>
<td>Multiple fields of all types (both attribute and index).</td>
</tr><tr>
<th>Query model</th>
<td>Arbitrary number of query items searching across different fields.</td>
</tr><tr>
<th>Matching</th>
<td>Documents that match at least one of the tokens in the query
and where the internal operator score for this document
is larger than the worst among the current top-k best hits.
As with <a href="#wand">wand</a>, this means that typically more than top-k documents are matched
and a lot of documents are skipped.
</td>
</tr><tr>
<th>Ranking</th>
<td>Internal ranking score based on query term weight
and inverse document frequency for matching terms to find the top-k hits.
This score is currently not available to the ranking framework.
Matching terms are exposed to the ranking framework
(same as when using <a href="#and">and</a> or <a href="#or">or</a>),
so an arbitrary ranking expression can be used in combination with this operator.
Note that the ranking expression used should correlate with this internal ranking score.
<code>nativeFieldMatch</code> and <code>nativeDotProduct</code> are good starting points.
</td>
</tr><tr>
<th style="white-space:nowrap;">Java Query Item</th>
<td><a href="http://javadoc.io/page/com.yahoo.vespa/container-search/latest/com/yahoo/prelude/query/WeakAndItem.html">WeakAndItem</a></td>
</tr>
</tbody>
</table>
</td></tr>
<tr id="nonempty"><th>nonEmpty</th><td>
<p>
<em>nonEmpty</em> takes as its only argument an arbitrary search expression.
It will then perform a set of checks on that expression.
If all the checks pass, the result is the same expression, otherwise the query will fail.
The checks are as follows:
<ol>
<li>No empty search term</li>
<li>No empty operators, like phrases without terms</li>
<li>No null markers (NullItem) from e.g. failed query parsing</li>
</ol>
<pre class="urlunencode" oncopy="">
yql=select%20%2A%20from%20sources%20%2A%20where%20bar%20contains%20%22a%22%20and%20nonEmpty%28bar%20contains%20%22bar%22%20and%20foo%20contains%20%40foo%29&amp;foo=
</pre>
Note how "foo" is empty in this case, which will force the query to fail.
If "foo" contained a searchable term, the query would not have failed.
</p>
</td></tr>
<tr id="predicate"><th>predicate</th><td>
<p>
<em>predicate()</em> specifies a predicate query -
see <a href="predicate-fields.html">predicate fields</a>.
It takes three arguments: the predicate field to search, a map of attributes, and a map of range attributes:
<pre class="urlunencode" oncopy="">
where%20predicate(predicate_field%2C%7B%22gender%22%3A%22Female%22%7D%2C%7B%22age%22%3A20L%7D)%3B
</pre>
Due to a quirk in YQL parsing, an empty map cannot be specified; use the number 0 instead:
<pre class="urlunencode" oncopy="">
where%20predicate(predicate_field%2C0%2C%7B%22age%22%3A20L%7D)%3B
</pre>
</p>
</td></tr>
</tbody>
</table>
</p>
<h2 id="order-by">order by</h2>
<!-- ToDo ref http://localhost:4000/documentation/reference/search-api-reference.html#ranking.sorting -->
<p>
Sort using <code>order by</code>.
Add <code>asc</code> or <code>desc</code> after the name of an
<a href="../attributes.html">attribute</a> to set sort order -
ascending order is default.
<pre class="urlunencode" oncopy="">
where%20title%20contains%20%22madonna%22%20order%20by%20price%20asc%2C%20releasedate%20desc%3B
</pre>
Sorting function, locale and strength are defined using the annotations "function", "locale" and "strength", as in:
<pre class="urlunencode" oncopy="">
where%20title%20contains%20%22madonna%22%20order%20by%20%5B%7B%22function%22%3A%20%22uca%22%2C%20%22locale%22%3A%20%22en_US%22%2C%20%22strength%22%3A%20%22IDENTICAL%22%7D%5Dother%20desc%2C
%20%5B%7B%22function%22%3A%20%22lowercase%22%7D%5Dsomething%3B
</pre>
<strong>Note: </strong> <a href="search-definitions-reference.html#match-phase">match-phase</a>
is enabled when sorting - refer to the <a href="sorting.html">sorting reference</a>.
</p>
<h2 id="limit-offset">limit / offset</h2>
<p>
To specify a slice / limit the number of hits returned / do pagination,
use <code>limit</code> and/or <code>offset</code>:
<pre class="urlunencode" oncopy="">
where%20title%20contains%20%22madonna%22%20limit%2031%20offset%2029%3B
</pre>
The above will return two hits (if there are sufficiently many hits matching the query),
skipping the first 29 documents.
</p>
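<p>
As the example above suggests, <code>limit</code> acts as the end position of the slice rather than as a hit count,
so the number of hits returned is <code>limit</code> minus <code>offset</code>.
A minimal Python sketch of building the clause from an offset and a hit count follows.
The helper name <code>slice_clause</code> is hypothetical and not part of any Vespa API:
</p>

```python
def slice_clause(offset, hits):
    """Build a YQL 'limit ... offset ...' clause for a window of results.

    Assumption taken from the example in the text: 'limit' is the end
    position of the slice, so limit = offset + number of hits wanted.
    """
    if offset < 0 or hits < 1:
        raise ValueError("offset must be >= 0 and hits must be >= 1")
    return f"limit {offset + hits} offset {offset}"


def page_clause(page, hits_per_page):
    """Same clause for zero-based page numbers (hypothetical convenience)."""
    return slice_clause(page * hits_per_page, hits_per_page)


# Two hits, skipping the first 29 documents, as in the example above.
two_hits = slice_clause(29, 2)
# Third page when showing ten hits per page.
third_page = page_clause(2, 10)
```

<p>
The resulting string is appended to the <code>where</code> clause, before the terminating semicolon.
</p>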
<h2 id="timeout">timeout</h2>
<p>
Set query timeout in milliseconds using <code>timeout</code>:
<pre class="urlunencode" oncopy="">
where%20title%20contains%20%22madonna%22%20timeout%2070%3B
</pre>
Only literal numbers are valid, i.e. setting another unit is not supported.
</p>
<h2 id="annotations">Annotations</h2>
<p>
Terms and phrases can be annotated to manipulate the behavior.
Add an annotation using <code>[]</code>, like:
<pre class="urlunencode" oncopy="">
where%20text%20contains%20%28%5B%20%7B%22distance%22%3A%205%7D%20%5Dnear%28%22a%22%2C%20%22b%22%29%29%3B
</pre>
</p>
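<p>
The annotation examples in this document have the form of a JSON-style map inside square brackets, placed directly in front of the annotated expression.
Assuming that form, the following Python sketch builds an annotated expression as text.
The helper name <code>annotate</code> is hypothetical, and the sketch relies on <code>json.dumps</code> producing the same
<code>true</code>/<code>false</code> literals and double-quoted strings as the examples here:
</p>

```python
import json


def annotate(annotations, expression):
    """Prefix a YQL expression with an annotation map, e.g. [{"distance": 5}].

    'annotations' is a plain dict; json.dumps renders Python True/False
    as true/false and strings with double quotes.
    """
    return "[" + json.dumps(annotations) + "]" + expression


# Annotate a near() function with a distance.
near = annotate({"distance": 5}, 'near("a", "b")')
# Turn off stemming for a single term.
unstemmed = annotate({"stem": False}, '"c"')
# Wrap in parentheses when used as the right-hand side of "contains".
where_clause = "where text contains (" + near + ");"
```

<p>
The generated statement still has to be URL-quoted before it is sent as the <em>yql</em> parameter.
</p>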
<h3>Annotations supported by strings</h3>
<p>
These annotations are supported by the string arguments to functions like
phrase() and near(), and also by the string argument to the "contains" operator.
</p>
<table class="table">
<tr><td>"nfkc": true|false</td>
<td>NFKC <a href="../linguistics.html#normalization">normalization</a>. Default on.</td>
</tr>
<tr><td style="white-space:nowrap;">"implicitTransforms": true|false</td>
<td>Implicit term transformations (field defaults), default on.
If implicitTransforms is active, the settings for the field in the search
definition will be honored in term transforms, e.g. if the field has stemming, this term will be stemmed.
If implicitTransforms are turned off,
the search backend will receive the term exactly as written in the initial YQL expression.
This is in other words a top level switch to turn off all other
<a href="../linguistics.html#stemming">stemming</a>, accent removal, Unicode
<a href="../linguistics.html#normalization">normalizations</a> and so on.</td>
</tr>
<tr><td>"annotations": {<br/>
&nbsp;&nbsp;"string": "string"<br/>}</td>
<td>Custom term annotations. This is by default empty.</td>
</tr>
<tr><td>"origin": {<br/>
&nbsp;&nbsp;"original": "string",<br/>
&nbsp;&nbsp;"offset": int,<br/>
&nbsp;&nbsp;"length": int<br/>}</td>
<td>The (sub-)string which produced this term. Default unset.</td>
</tr>
<tr><td>"usePositionData": true|false</td>
<td>Use position data for ranking algorithm. Default true.
This is <em>term</em> position, not to be confused with
<a href="search-api-reference.html#geographical-searches">geo searches</a></td>
</tr>
<tr><td>"stem": true|false</td>
<td>Stem this term if it is the setting for this field, default on.</td>
</tr>
<tr><td>"normalizeCase": true|false</td>
<td>Normalize casing of this term if it is the setting for this field, default on.</td>
</tr>
<tr><td>"accentDrop": true|false</td>
<td>Remove accents from this term if it is the setting for this field, default on.</td>
</tr>
<tr><td>"andSegmenting": true|false</td>
<td>Force phrase or AND operator if re-segmenting (e.g. in stemming) this
term results in multiple terms. Default is choosing from language
settings.</td>
</tr>
<tr><td>"prefix": true|false</td>
<td>Do prefix matching for this word. Default false. (Search for
"word*".)</td>
</tr>
<tr><td>"suffix": true|false</td>
<td>Do suffix matching for this word. Default false. (Search for
"*word".)</td>
</tr>
<tr><td>"substring": true|false</td>
<td>Do substring matching for this word if available in the index. Default
false. (Search for "*word*".) Only supported for streaming
search.</td>
</tr>
</table>
<h3>Annotations supported by strings and functions</h3>
<p>
These annotations are supported by strings and by the functions which
are handled like leaf nodes internally in the query tree:
phrase(), near(), onear(), range(), equiv(), weightedSet(), weakAnd() and wand().
</p>
<table class="table">
<tr><td>"id": int</td>
<td>Unique ID used for e.g. connectivity.</td>
</tr>
<tr><td>"connectivity": {<br/>
&nbsp;&nbsp;"id": int,<br/>
&nbsp;&nbsp;"weight": double<br/>}</td>
<td>Map with the ID and weight of the explicit connectivity of this item.</td>
</tr>
<tr><td>"significance": double</td>
<td>Significance value for ranking.</td>
</tr>
<tr><td>"annotations": {<br/>
&nbsp;&nbsp;"string": "string"<br/>}</td>
<td>Custom annotations. No special semantics inside the YQL layer.</td>
</tr>
<tr><td>"filter": true|false</td>
<td>Regard this term as a "filter" term. Default false.</td>
</tr>
<tr><td>"ranked": true|false</td>
<td>Include this term for ranking calculation. Default true.</td>
</tr>
<tr><td>"label": "string"</td>
<td>Label for referring to this term during ranking.</td>
</tr>
<tr><td>"weight": int</td>
<td>Term weight, used in some ranking calculations.</td>
</tr>
</table>
<h3 id="annotations-of-sub-expressions">Annotations of sub-expressions</h3>
<p>
Consider the following query:
<pre class="urlunencode" oncopy="">
select%20%2A%20from%20sources%20%2A%20where%20%28%5B%7B%22stem%22%3A%20false%7D%5D%28foo%20contains%20%22a%22%20and%20bar%20contains%20%22b%22%29%29
%20or%20foo%20contains%20%28%5B%7B%22stem%22%3A%20false%7D%5D%22c%22%29%3B
</pre>
The "stem" annotation controls whether a given term should be stemmed if its
field is configured as a stemmed field (default is "true").
The "AND" operator itself has no internal API for whether its operands should be stemmed or not,
but we can still annotate as such,
because when the value of a given annotation is determined,
the expression tree is followed from the term in question and up through its ancestors.
Traversing the tree stops when a value is found (or there is nothing more to traverse).
In other words, none of the terms in this example will be stemmed.
</p><p>
How annotations behave may be easier to understand by expressing a boolean query in the style of an S-expression:
<pre>
(AND term1 term2 (OR term3 term4) (OR term5 (AND term6 term7)))
</pre>
The annotation scopes would then be as follows, i.e. annotations on
which elements will be checked when determining the settings for a given term:
<table class="table">
<thead></thead><tbody>
<tr><td>term1</td><td>term1 itself, and the first AND</td></tr>
<tr><td>term2</td><td>term2 itself, and the first AND</td></tr>
<tr><td>term3</td><td>term3 itself, the first OR and the first AND</td></tr>
<tr><td>term4</td><td>term4 itself, the first OR and the first AND</td></tr>
<tr><td>term5</td><td>term5 itself, the second OR and the first AND</td></tr>
<tr><td>term6</td><td>term6 itself, the second AND, the second OR and the first AND</td></tr>
<tr><td>term7</td><td>term7 itself, the second AND, the second OR and the first AND</td></tr>
</tbody>
</table>
</p>
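<p>
The lookup described above (start at the term, walk up through its ancestors, stop at the first value found) can be modelled in a few lines of Python.
This is only an illustration of the scoping rule, not Vespa's implementation; the <code>Node</code> class and the
<code>resolve</code> and <code>scope</code> functions are hypothetical names:
</p>

```python
class Node:
    """A query tree node with optional annotations and child nodes."""

    def __init__(self, name, annotations=None, children=()):
        self.name = name
        self.annotations = dict(annotations or {})
        self.parent = None
        self.children = list(children)
        for child in self.children:
            child.parent = self


def scope(node):
    """Names of the elements checked for a node: itself, then its ancestors."""
    names = []
    while node is not None:
        names.append(node.name)
        node = node.parent
    return names


def resolve(node, key, default):
    """Return the nearest value for 'key', walking from the node upwards."""
    while node is not None:
        if key in node.annotations:
            return node.annotations[key]
        node = node.parent
    return default


# (AND term1 term2 (OR term3 term4) (OR term5 (AND term6 term7)))
terms = {n: Node(f"term{n}") for n in range(1, 8)}
second_and = Node("second AND", children=[terms[6], terms[7]])
first_or = Node("first OR", children=[terms[3], terms[4]])
second_or = Node("second OR", children=[terms[5], second_and])
first_and = Node("first AND", {"stem": False},
                 [terms[1], terms[2], first_or, second_or])

term6_scope = scope(terms[6])
term6_stem = resolve(terms[6], "stem", True)
```

<p>
With <code>"stem": false</code> set only on the first AND, every term in this model resolves "stem" to false,
while an annotation placed directly on a term would take precedence because the traversal stops there.
</p>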
<h2 id="query-properties">Query properties</h2>
<p>
Use YQL variable syntax to initialize words in phrases and as single terms.
This removes the need for caring about quoting a term in YQL, as well as URL quoting.
The term will be used <em>exactly</em> as it is in the URL.
As an example, look at a query with a YQL argument, and the properties
<em>animal</em> and <em>syntaxExample</em>:
<pre class="urlunencode" oncopy="">
yql=select%20%2A%20from%20sources%20%2A%20where%20foo%20contains%20%40animal
%20and%20foo%20contains%20phrase%28%40animal%2C%20%40syntaxExample%2C%20%40animal%29%3B&amp;animal=panda&amp;syntaxExample=syntactic
</pre>
This YQL expression will then access the query properties <em>animal</em> and
<em>syntaxExample</em> and evaluate to:
<pre class="urlunencode" oncopy="">
select%20%2A%20from%20sources%20%2A%20where%20%28foo%20contains%20%22panda%22%20AND%20foo%20contains%20phrase%28%22panda%22%2C%20%22syntactic%22%2C%20%22panda%22%29%29%3B
</pre>
</p>
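<p>
A client can build such a request with a standard URL-encoding routine, passing the YQL statement and the query properties as separate parameters.
The Python sketch below uses <code>urllib.parse</code> from the standard library;
the function name <code>build_query_string</code> is hypothetical, and the property names are the ones from the example above:
</p>

```python
from urllib.parse import parse_qs, quote, urlencode


def build_query_string(yql, **properties):
    """URL-encode a YQL statement together with its query properties.

    quote_via=quote encodes spaces as %20 (as in the examples in this
    document) instead of the '+' that urlencode uses by default.
    """
    params = {"yql": yql}
    params.update(properties)
    return urlencode(params, quote_via=quote)


yql = ("select * from sources * where foo contains @animal "
       "and foo contains phrase(@animal, @syntaxExample, @animal);")

query_string = build_query_string(yql, animal="panda",
                                  syntaxExample="syntactic")

# Decoding the query string gives back the original values.
decoded = parse_qs(query_string)
```

<p>
Only the URL encoding is handled on the client in this sketch; substituting <em>@animal</em> and <em>@syntaxExample</em> is left to the YQL evaluation described above.
</p>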
<h2 id="yql-in-query-profiles">YQL in query profiles</h2>
<p>
YQL must be URL-quoted to be included in a URL.
Since YQL is well suited to application logic, while not being intended for end users,
a solution to this is storing the application's YQL queries in
<a href="../query-profiles.html">query profiles</a>.
To add a default query profile, add <em>search/query-profiles/default.xml</em> to the
<a href="../cloudconfig/application-packages.html">application package</a>:
<pre>
&lt;query-profile id="default"&gt;
&lt;field name="yql"&gt;select * from sources * where default contains "latest" or userQuery();&lt;/field&gt;
&lt;/query-profile&gt;
</pre>
This will add <em>latest</em> as an <em>OR term</em> to all queries not having an explicit query profile parameter.
The important thing to note is how it is not necessary to URL-quote anything in the query profiles files.
They operate independently of the HTTP parsing as such.
</p>
<h2 id="query-rewriting-in-searchers">Query rewriting in Searchers</h2>
<p>
Searchers which modify the textual YQL statement (not recommended)
should be annotated with <em>@Before("ExternalYql")</em>.
Searchers modifying the query tree produced from an input YQL statement
should be annotated with <em>@After("ExternalYql")</em>.
</p>
<h2 id="grouping">Grouping</h2>
<p>
Group / aggregate results by adding a grouping expression after a <code>|</code> -
<a href="../grouping.html">read more</a>.
<pre class="urlunencode" oncopy="">
select%20*%20from%20sources%20*%20where%20sddocname%20contains%20%27purchase%27%20%7C%20all(group(customer)%20each(output(sum(price))))%3B
</pre></p>
<script>
window.onload=init();
</script>