Skip to content

Commit

Permalink
ValueRangeProcessor -> RangeProcessor
Browse files Browse the repository at this point in the history
  • Loading branch information
ojwb committed Oct 28, 2016
1 parent 94b21e7 commit 656a419
Showing 1 changed file with 21 additions and 21 deletions.
42 changes: 21 additions & 21 deletions howtos/range_queries.rst
Original file line number Diff line number Diff line change
Expand Up @@ -44,7 +44,7 @@ queries using these operators to restrict matches suitably.
Since we want to expose this functionality generally to users, we want
them to be able to type in a query that will include one or more range
restrictions; the QueryParser contains support for doing this, using
*value range processors*, subclasses of :xapian-class:`ValueRangeProcessor`.
*range processors*, subclasses of :xapian-class:`RangeProcessor`.
Xapian comes with some standard ones itself, or you can write your own.

Since document values are stored as strings in Xapian, and the
Expand Down Expand Up @@ -87,9 +87,9 @@ Searching with ranges
---------------------

All we need to do once we've got the document values in place is to
tell the QueryParser about them. The simplest value range processor is
:xapian-class:`StringValueRangeProcessor`, but here we need two
:xapian-class:`NumberValueRangeProcessor` instances.
tell the QueryParser about them. The simplest range processor is
:xapian-class:`RangeProcessor` itself, but here we need two
:xapian-class:`NumberRangeProcessor` instances.

To distinguish between the two different ranges, we'll require that
dimensions must be specified with the suffix 'mm', but years are just
Expand Down Expand Up @@ -140,7 +140,7 @@ prefixes.
If you get the rules wrong, the QueryParser will raise a
`QueryParserError`, which in production code you could catch and
either signal to the user or perhaps try the query again without the
`ValueRangeProcessor` that tripped up:
`RangeProcessor` that tripped up:

.. xapianrunexample:: search_ranges
:args: db 1000mm..
Expand All @@ -152,7 +152,7 @@ Handling dates

To restrict to a date range, we need to decide how to both store the
date in a document value, and how we want users to input the date
range in their query. :xapian-class:`DateValueRangeProcessor`, which is part of
range in their query. :xapian-class:`DateRangeProcessor`, which is part of
Xapian, works by storing the date as a string in the form 'YYYYMMDD',
and can take dates in either US style (month/day/year) or European
style (day/month/year).
Expand Down Expand Up @@ -190,11 +190,11 @@ give to the QueryParser.
.. xapianexample:: search_ranges2
:marker: date example code

The :xapian-class:`DateValueRangeProcessor` is working on value slot 2, with an
The :xapian-class:`DateRangeProcessor` is working on value slot 2, with an
"epoch" of 1860 (so two digit years will be considered as starting at
1860 and going forward as far 1959). The second parameter is whether
it should prefer US style dates or not; since we're looking at US
states, we've gone for US dates. The :xapian-class:`NumberValueRangeProcessor`
states, we've gone for US dates. The :xapian-class:`NumberRangeProcessor`
is as we saw before, which means that it can't cope with two digit years.

This enables us to search for any state that talks about the Spanish
Expand All @@ -208,28 +208,28 @@ or for all states admitted in the 19th century:
.. xapianrunexample:: search_ranges2
:args: statesdb 1800..1899

That uses the :xapian-class:`NumberValueRangeProcessor` on value slot 1, as in
That uses the :xapian-class:`NumberRangeProcessor` on value slot 1, as in
our previous example. Let's be more specific and ask for only those
between November 8th 1889, when Montana became part of the Union, and
July 10th 1890, when Wyoming joined:

.. xapianrunexample:: search_ranges2
:args: statesdb 11/08/1889..07/10/1890

That uses the :xapian-class:`DateValueRangeProcessor` on value slot 2; it can't
That uses the :xapian-class:`DateRangeProcessor` on value slot 2; it can't
cope with year ranges, which is why we indexed to both slots 1 and 2.

Writing your own ValueRangeProcessor
------------------------------------
Writing your own RangeProcessor
-------------------------------

We haven't yet done anything with population. What we want is
something that behaves like :xapian-class:`NumberValueRangeProcessor`, but knows
something that behaves like :xapian-class:`NumberRangeProcessor`, but knows
what reason possible values are. If we insert it *before* the
:xapian-class:`NumberValueRangeProcessor` on slot 1 (year), it can pick up
:xapian-class:`NumberRangeProcessor` on slot 1 (year), it can pick up
anything that should be treated as a population, and let everything else be
treated as a year range.

To do this, we need to know how a :xapian-class:`ValueRangeProcessor` gets
To do this, we need to know how a :xapian-class:`RangeProcessor` gets
called by the QueryParser. What happens is that each processor in turn is
passed the start and end of the range. If it doesn't understand the range, it
should return :xapian-constant:`BAD_VALUENO`. If it *does* understand
Expand All @@ -238,7 +238,7 @@ the range, it should return the value number to use with
modify the start and end values (to convert them to the correct format for
the string comparison which :xapian-constant:`OP_VALUE_RANGE` uses).

What we're going to do is to write a custom :xapian-class:`ValueRangeProcessor`
What we're going to do is to write a custom :xapian-class:`RangeProcessor`
that accepts numbers in the range 500,000 to 50,000,000; these can't
possibly be years in our data set, and encompass the full range of
populations. If either number is outside that range, we will return
Expand All @@ -253,11 +253,11 @@ range in the query string; either but not both can be the empty
string, which indicates an open-ended range. In python this method
should return a tuple of the value slot and the two strings modified
so they can be used for :xapian-just-constant:`OP_VALUE_RANGE`. Rather than
re-implement :xapian-class:`NumberValueRangeProcessor`, we wrap it to do the
re-implement :xapian-class:`NumberRangeProcessor`, we wrap it to do the
serialisation (due to the way python interacts with the API it's currently not
possible to subclass it successfully here).

Value range processors are called in the order they're added, so our
Range processors are called in the order they're added, so our
custom one gets a chance to look at all ranges, but will only 'claim'
ranges which use integer numbers within the 500 thousand to 50 million
range.
Expand All @@ -277,14 +277,14 @@ With a little more work, we could support ranges such as '..5m' to
mean up to 5 million, or '..750k' for up to 750 thousand.

Similarly, it would be possible to use the same approach to create a custom
:xapian-class:`ValueRangeProcessor` that could restrict to a range of years, and
cope with two digit years, as our :xapian-class:`DateValueRangeProcessor` did for
:xapian-class:`RangeProcessor` that could restrict to a range of years, and
cope with two digit years, as our :xapian-class:`DateRangeProcessor` did for
full dates.

Performance limitations
-----------------------

Without other terms in a query, a :xapian-class:`ValueRangeProcessor` can cause
Without other terms in a query, a :xapian-class:`RangeProcessor` can cause
a value operation to be performed across the whole database, which means
loading all the values in a given slot. On a small database, this
isn't a problem, but for a large one it can have performance
Expand Down

0 comments on commit 656a419

Please sign in to comment.