Permalink
Switch branches/tags
Nothing to show
Find file Copy path
Fetching contributors…
Cannot retrieve contributors at this time
246 lines (225 sloc) 9.48 KB
---
# Copyright 2017 Yahoo Holdings. Licensed under the terms of the Apache 2.0 license. See LICENSE in the project root.
title: "Sorting Reference"
---
<p>
A sorting specification in a query consists of one or more
sorting expressions. Each sorting expression is an optional sort order
followed by an attribute name or a function over an attribute.
Multiple expressions are separated by a single SPACE character.
</p><p>
Using more than one sort expression will give you multilevel
sorting. In this case, the most significant expression is the
first, while subsequent expressions detail sorting within groups
of equal values for the previous expression.
<pre>
Sorting ::= SortExpr ( ' ' SortExpr )*
SortExpr ::= [SortOrder] SortObject
SortObject ::= SortAttribute | Function
Function ::= LOWERCASE '(' SortAttribute ')' |
RAW '(' SortAttribute ')' |
UCA '(' SortAttribute [ ',' Locale [ ',' Strength] ] ')'
LOWERCASE ::= 'lowercase'
UCA ::= 'uca'
RAW ::= 'raw'
Locale ::= An identifier following <a href="http://www.unicode.org/reports/tr35/#Unicode_Language_and_Locale_Identifiers">unicode locale identifiers,</a> fx 'en_US'.
Strength :: 'PRIMARY' | 'SECONDARY' | 'TERTIARY' | 'QUATERNARY' | 'IDENTICAL'
SortOrder ::= '+' | '-'
SortAttribute ::= ID /* A valid attribute name */
</pre>
</p><p>
Find sorting examples in the
<a href="../tutorials/blog-search.html#sorting-and-grouping">Blog search tutorial</a>.
See <a href="../geo-search.html">Geo search</a> for sorting by distance.
Refer to <a href="../query-language.html#order-by">
YQL Vespa reference</a> for how to set sorting parameters in YQL.
</p>
<h2 id="sort-order">Sort order</h2>
<p>
For sort order, <code>+</code> denotes ascending sorting order,
while <code>-</code> gives descending order.
</p><p>
Ascending order is defined as lowest values first for numerical
attributes. Strings are sorted according to the sort function chosen.
Descending order is the reverse of ascending order.
</p><p>
If +/- is omitted, the default will be used, either the system wide default of <code>+</code>
or any override in <a href="search-definitions-reference.html#sorting">searchdefinition</a>.
Also note that when composing the query URL, <code>+</code> has to be encoded as %2B.
For consistency, <code>-</code> can be encoded as %2D.
</p>
<h3 id="default-sort-order">Default sorting order</h3>
<p>
Default sort order is <code>+</code> or ascending, except for the special builtin <code>[rank]</code>
which has <code>-</code> or descending.
</p>
<h2 id="sort-attribute">Sort attribute</h2>
<p>
The sorting attribute in a sort expression is the name of an
attribute in the index. Attribute names will often be the same
as field names. In the search definition, an attribute is
indicated by the indexing language fragment for a field having
an <a href="search-definitions-reference.html#attribute">attribute</a>
statement.
</p><p>
When sorting on attributes, it is recommended to use the built
in <em>unranked</em> rank-profile, this allows the search kernel
to execute the query significantly faster then execution the
ranking framework for many hits and then finally ignore this
score and sort by the specified sorting specification.
</p>
<h2 id="sort-function">Sort function</h2>
<p>
It is possible to specify how sorting should be done.
The default depends on configuration of attributes and
language, if specified in query.
Fallback when no sorting could be done is using the order that
the documents were indexed in (on the backend search node).
</p><p>
You can specify that you want to sort on raw binary values,
or a simple and cheap lowercase, but usually the more expensive,
but linguistically correct,
<a href="http://www.unicode.org/unicode/reports/tr10/">UCA sorting</a>
is used.
Default sort function for strings is <code>uca</code>, unless overridden in
<a href="search-definitions-reference.html#sorting">searchdefinition</a>,
or given explicitly in sortspec.
Numeric fields are numerically sorted.
</p>
<table class="table">
<thead></thead><tbody>
<tr>
<th style="width:150px" id="raw">Raw byteorder</th>
<td>
Here a simple and fast ordering based on memcmp of utf8 for strings
and correct sort order compliant binary rep for other fields is done.
However that is not correct for anything except computers,
looking only at the binary representation.
</td>
</tr><tr>
<th id="lowercase">Lowercase</th>
<td>
This improves the sorting by first lowercasing and normalising the strings before sorting.
This is slightly more correct and might be enough for what you want.
It is not that much more costly than raw sort.
</td>
</tr><tr>
<th id="uca">UCA</th>
<td><p>
This sorting is based on the <a href="http://site.icu-project.org/">icu</a> library
that follows the <a href="http://www.unicode.org/unicode/reports/tr10/">
Universal Collation Algorithm</a>.
The specification of
<a href="http://icu-project.org/apiref/icu4j/com/ibm/icu/util/ULocale.html#ULocale-java.lang.String-">locale</a>
and <a href="http://icu-project.org/apiref/icu4j/com/ibm/icu/text/Collator.html">strength</a>
are identical to how <a href="http://site.icu-project.org/">icu</a> specifies them.
The default is strength <code>PRIMARY</code>
which only sorts on primary differentiating characteristics;
this means that letters in uppercase/lowercase
or with differences in accents only are considered equal.
</p><p>
Note that both locale and strength are optional,
but that if you need to set strength, you must also specify locale:
</p>
<table class="table">
<thead></thead><tbody>
<tr>
<th style="width:200px">Default sort locale</th>
<td>
Locale is default derived from query unless overridden in
<a href="search-definitions-reference.html#sorting">searchdefinition</a>,
or given explicitly in sortspec.
Note that if neither the query sortspec, nor the searchdefinition,
nor the query itself specifies a language or locale
then UCA sorting won't be used by default anymore;
in this case the fallback is to use the lowercase function instead.
</td>
</tr><tr>
<th>Default sort strength</th>
<td>
Strength is <code>PRIMARY</code> unless overridden in
<a href="search-definitions-reference.html#sorting">searchdefinition</a>,
or given explicitly in sortspec.
</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
<h2 id="special-sorting-attributes">Special sorting attributes</h2>
<p>
Three special attributes are available for sorting in addition to the index specific attributes:
<table class="table">
<thead>
</thead><tbody>
<tr><th>[relevance]</th>
<td>
The document's relevance score for this query.
This is the same as the default ordering when no sort specification
is given ([rank] is a legacy alias for the same thing).
</td>
</tr><tr><th>[source]</th>
<td>
The document's source name. This is only relevant when querying multiple sources.
</td>
</tr><tr><th>[docid]</th>
<td>
The document's identification in the search backend.
This will typically give you the documents in indexing order.
<strong>Keep in mind that this id is unique only to the backend node</strong>.
The same document might have different id on a different node.
The same way a different document might have the same id on another node.
This is just intended as a cheap way of getting an almost stable sort order.
</td>
</tr>
</tbody>
</table>
These special attributes are most useful as secondary sort expressions in a multilevel sort.
This will allow you to sort groups of equal values for the primary expression
in either relevancy or indexing order.
Without this additional sort expression, the order within each equal group is not deterministic.
</p>
<h2 id="limitations">Limitations</h2>
<p>
<strong>Note that it is only possible to sort on attributes</strong>.
Trying to sort on a plain field, without an associated attribute, will not work.
Trying to sort on a multivalued attribute will also not work; the sort expression will be ignored.
</p><p>
Also note that <a href="search-definitions-reference.html#match-phase">match-phase</a>
is enabled when sorting.
</p>
<h2 id="examples">Examples</h2>
<p>
Sort by surname in ascending order:
<pre>+surname</pre>
</p>
<p>
Sort by surname in ascending order after lowercasing the surname:
<pre>+lowercase(surname)</pre>
</p>
<p>
Sort by surname in ascending English US locale collation order.
<pre>+uca(surname,en_US)</pre>
</p>
<p>
Sort by surname in ascending Norwegian 'Bokm&aring;l' locale collation order:
<pre>+uca(surname,nb_NO)</pre>
</p>
<p>
Sort by surname in ascending Norwegian 'Bokm&aring;l' locale collation order,
but more attributes of a character are used to distinguish.
Now it is case sensitive and 'aa' and 'å' are different:
<pre>+uca(surname,nb_NO,TERTIARY)</pre>
</p>
<p>
Sort by surname, with the youngest ones first when the names are equal:
<pre>+surname -yearofbirth</pre>
</p>
<p>
Sort in ascending order birthyear groups,
and sort by relevancy within each group of equal values
with the highest rank first:
<pre>+yearofbirth -[rank]</pre>
</p>