Update for changes in 1.4
Should be current for 1.4.1
ojwb committed Nov 1, 2016
1 parent 6f04a75 commit 7ee829d
Showing 9 changed files with 121 additions and 26 deletions.
4 changes: 2 additions & 2 deletions advanced/replication.rst
Original file line number Diff line number Diff line change
@@ -46,7 +46,7 @@ modifications are made.
Backend Support
===============

Replication is supported by the chert, and brass database backends,
Replication is supported by the chert and glass database backends,
and can cleanly handle the
master switching database type (a full copy is sent in this situation). It
doesn't make a lot of sense to support replication for the remote backend.
@@ -151,7 +151,7 @@ switched atomically after a database copy has occurred. The
this situation, so ends up attempting to read the old database which has been
deleted.

We intend to fix this issue in the Brass backend (currently under development)
We intend to fix this issue in the future
by eliminating this hidden use of a stub database file.

Alternative approaches
4 changes: 2 additions & 2 deletions advanced/scalability.rst
@@ -48,7 +48,7 @@ typically mean that you only need to cache a few percent of the database
to eliminate most disk cache misses).

It also means that reducing the database size is usually a win. The
Chert backend compresses the information in the tables in ways which
backend compresses the information in the tables in ways which
work well given the nature of the data but aren't too expensive to
unpack (e.g. lists of sorted docids are stored as differences with
smaller values encoded in fewer bytes). There is further potential for
@@ -79,7 +79,7 @@ documents should expire from the index.
Size Limits in Xapian
=====================

The chert backend (which is currently the default and recommended
The glass backend (which is currently the default and recommended
backend) stores the indexes in several files containing Btree tables. If
you're indexing with positional information (for phrase searching) the
term positions table is usually the largest.
8 changes: 4 additions & 4 deletions concepts/indexing/databases.rst
@@ -24,7 +24,7 @@ Backends
Xapian databases store data in custom formats which allow searches to be
performed extremely quickly; Xapian does not use a relational database as
its datastore. There are several database backends; the main backend in
the 1.2 release series of Xapian is called the *Chert* backend. This
the 1.4 release series of Xapian is called the *Glass* backend. This
stores information in the filesystem (under a given path).

It is possible to perform searches across multiple databases at once, and
@@ -42,14 +42,14 @@ remote databases.
On-disk databases
-----------------

As mentioned, Xapian 1.2 has a default database type called *Chert*;
As mentioned, Xapian 1.4 has a default database type called *Glass*;
:ref:`earlier formats can be upgraded using Xapian's copydatabase utility
<upgrading-databases>`. When opening an existing database, Xapian will
automatically figure out the backend to use.

If you're
familiar with data storage structures, you might be interested to know that
Chert and Brass both use a copy-on-write B+-tree structure - but don't worry
both Chert and Glass use a copy-on-write B+-tree structure - but don't worry
if that doesn't mean anything to you!

Stub database files
@@ -77,7 +77,7 @@ Xapian has an *inmemory* database type, which may be useful for testing and
perhaps some short-term usage. However it is inefficient, and does not support
all of Xapian's features (such as spelling correction, synonyms or replication),
so for production systems it is often better to use an on-disk database such
as *Chert*, with the files stored in a RAM disk.
as *Glass*, with the files stored in a RAM disk.

Remote databases and replication
--------------------------------
21 changes: 13 additions & 8 deletions concepts/indexing/limitations.rst
@@ -11,9 +11,9 @@ do, but it's worth knowing about them.
Term length
-----------

Terms are limited to 245 bytes in length (at least with the "chert"
backend), but each zero byte in a term is currently internally encoded as
two bytes, so the limit is less for a term which contains zero bytes.
Terms are limited to 245 bytes in length (at least with the "glass" and
"chert" backends), but each zero byte in a term is currently internally encoded
as two bytes, so the limit is less for a term which contains zero bytes.
It's rarely useful to have longer terms, but one situation where it can be
is if you're using something like a URL as an ID term; `there is some
discussion of this as one of our FAQs
@@ -37,12 +37,17 @@ document values longer than a few tens of bytes, as reading multiple
Document ID
-----------

Document IDs are (currently) 32-bit which limits you to 2\ :sup:`32`-1
(nearly 4.3 billion) documents in a database. Document IDs for deleted
documents aren't reused when automatically assigning a new document ID,
so this limit also includes documents you've deleted. You can effectively
Document IDs are (currently) 32-bit by default which limits you to
2\ :sup:`32`-1 (nearly 4.3 billion) documents in a database. Document IDs for
deleted documents aren't reused when automatically assigning a new document
ID, so this limit also includes documents you've deleted. You can effectively
reclaim such no-longer-used document IDs by compacting the database.

If you configure xapian-core with `--enable-64bit-docid` then 64-bit docids
will be used instead. You may well also want to make termcounts 64-bit
with `--enable-64bit-termcount`. Note that these options change type sizes and
hence the ABI of the library.
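
As a rough illustration of the arithmetic above, the following sketch computes the largest document ID for each docid width and how many automatically assigned IDs would be left. ``max_docid`` and ``remaining_docids`` are hypothetical helper names (not part of the Xapian API), and the sketch assumes new IDs are always allocated above the highest ID used so far:

```python
def max_docid(bits=32):
    """Largest document ID that fits in an unsigned integer of `bits` bits."""
    return 2 ** bits - 1


def remaining_docids(last_docid, bits=32):
    """IDs still available for automatic assignment.

    Assumes (as described above) that IDs of deleted documents are not
    reused, so only IDs above `last_docid` can still be handed out.
    """
    if not 0 <= last_docid <= max_docid(bits):
        raise ValueError("last_docid is out of range for this docid width")
    return max_docid(bits) - last_docid


limit_32 = max_docid(32)  # default build
limit_64 = max_docid(64)  # build configured with --enable-64bit-docid
left = remaining_docids(4_000_000_000)  # hypothetical highest ID used so far
print(limit_32, limit_64, left)
```

A check like this may be useful for estimating when a long-lived database with heavy document churn is likely to need compacting.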

B-tree block number
-------------------

@@ -68,7 +73,7 @@ allows files and filesystems up to 16EB (figures from Wikipedia).

Total document length limit
---------------------------
Chert stores the total length (i.e. number of terms) of all the documents
Glass stores the total length (i.e. number of terms) of all the documents
in a database so it can calculate the average document length. This is
currently stored as an unsigned 64-bit quantity so you're almost certain
to hit another limit first.
7 changes: 4 additions & 3 deletions concepts/indexing/uniqueness.rst
@@ -11,13 +11,14 @@ Xapian index. There are two ways of approaching this.

One is to use a one-to-one mapping between your identifiers and Xapian
docids. This will work if your identifiers are positive integers and they
all fit within 32 bits (under about 4 billion).
all fit within 32 bits (under about 4 billion), or if they are 64-bit
and you configure xapian-core with `--enable-64bit-docid`.

The other is to use a special term containing your identifier, which will
work for any type of identifier. Typically you will prefix this (by
convention with 'Q') to avoid collisions with other terms. Terms have a
limited length (245 bytes in chert), so if your unique identifiers are
really long you'll need to do something more complicated.
limited length (245 bytes in glass and chert), so if your unique identifiers
are really long you'll need to do something more complicated.
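
One possible way to handle long identifiers (a sketch only, and not necessarily what the FAQ recommends) is to fall back to a hash of the identifier when the prefixed term would be too long. The helper names and the choice of SHA-1 are assumptions made for illustration; the 245 byte limit and the double counting of zero bytes are the limits quoted for the glass and chert backends:

```python
import hashlib

MAX_TERM_BYTES = 245  # term length limit quoted for glass and chert


def encoded_length(term):
    """Length of `term` in bytes, counting each zero byte twice."""
    return len(term) + term.count(b"\x00")


def make_id_term(identifier, prefix="Q"):
    """Build a unique-ID term, hashing identifiers that are too long.

    Hypothetical helper: short identifiers are used verbatim after the
    prefix; long ones are replaced by the hex SHA-1 digest of the identifier.
    """
    raw = identifier.encode("utf-8")
    term = prefix.encode("utf-8") + raw
    if encoded_length(term) <= MAX_TERM_BYTES:
        return term
    digest = hashlib.sha1(raw).hexdigest().encode("ascii")  # 40 ASCII bytes
    return prefix.encode("utf-8") + digest


short_term = make_id_term("doc42")
long_term = make_id_term("http://example.com/" + "a" * 300)
print(short_term, len(long_term))
```

With this scheme a hashed term could in principle coincide with a short identifier that happens to look like a digest, so you might prefer a distinct prefix for hashed terms.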

For more information on both techniques, `see our FAQ on this`_.

4 changes: 2 additions & 2 deletions conf.py
@@ -71,9 +71,9 @@
# built documents.
#
# The short X.Y version.
version = '1.2'
version = '1.4'
# The full version, including alpha/beta/rc tags.
release = '1.2.19'
release = '1.4.1'

# General information about the project.
_project = u'Getting Started with Xapian'
2 changes: 1 addition & 1 deletion howtos/spelling.rst
@@ -182,7 +182,7 @@ search would be prohibitively expensive for many uses.
Backend Support
---------------

Currently spelling correction is supported for chert, and brass databases. It
Currently spelling correction is supported for chert and glass databases. It
works with a single database or multiple databases (use
:xapian-method:`Database::add_database()` as usual). We've no plans to support
it for the InMemory backend, but we do intend to support it for the remote
2 changes: 1 addition & 1 deletion howtos/synonyms.rst
@@ -88,7 +88,7 @@ yet though.
Backend Support
---------------

Currently synonyms are supported by the chert and brass databases. They work
Currently synonyms are supported by the chert and glass databases. They work
with a single database or multiple databases (use
:xapian-method:`Database::add_database()` as usual). We've no plans to support
them for the InMemory backend, but we do intend to support them for the remote
95 changes: 92 additions & 3 deletions language_specific/python/index.rst
@@ -120,6 +120,57 @@ The lazy evaluation is mainly transparent, but does become visible in one situat
- Document.termlist (also accessible as Document.__iter__): **termfreq** and **positer**
- Database.postlist: **positer**

In older releases, the pythonic iterators returned lists representing the
appropriate item when their ``next()`` method was called. These were
removed in Xapian 1.1.0.


Non-Pythonic Iterators
######################

Before the pythonic iterator wrappers were added, the Python bindings provided
thin wrappers around the C++ iterators. However, these iterators don't behave
like most iterators do in Python, so the pythonic iterators were implemented to
replace them. The non-pythonic iterators were removed in Xapian 1.3.0 -
the documentation below is provided to aid migration away from them.

All non-pythonic iterators support ``next()`` and
``equals()`` methods
to move through and test iterators (as for all language bindings).
MSetIterator and ESetIterator also support ``prev()``.
Python-wrapped iterators also support direct comparison, so something like the
following works:

::

    m=mset.begin()
    while m!=mset.end():
        # do something
        m.next()

C++ iterators are often dereferenced to get information, e.g.
``(*it)``. With Python these are all mapped to named methods, as
follows:

+------------------+----------------------+
| Iterator         | Dereferencing method |
+==================+======================+
| PositionIterator | ``get_termpos()``    |
+------------------+----------------------+
| PostingIterator  | ``get_docid()``      |
+------------------+----------------------+
| TermIterator     | ``get_term()``       |
+------------------+----------------------+
| ValueIterator    | ``get_value()``      |
+------------------+----------------------+
| MSetIterator     | ``get_docid()``      |
+------------------+----------------------+
| ESetIterator     | ``get_term()``       |
+------------------+----------------------+


Other methods, such as ``MSetIterator.get_document()``, are
available unchanged.
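
When migrating, one option is to wrap an old-style loop in a generator so that calling code can use a normal ``for`` loop. The sketch below uses a hypothetical stand-in class (``StandInIterator`` is not a Xapian class) that mimics only the ``next()``, comparison and ``get_docid()`` behaviour described above; ``iterate`` is likewise an illustrative helper name:

```python
class StandInIterator:
    """Hypothetical stand-in for an old-style wrapped iterator."""

    def __init__(self, items, pos):
        self._items = items
        self._pos = pos

    def next(self):
        # Advance in place, like the non-pythonic iterators did.
        self._pos += 1

    def get_docid(self):
        # Named dereferencing method, standing in for C++ (*it).
        return self._items[self._pos]

    def __eq__(self, other):
        return self._items is other._items and self._pos == other._pos


def iterate(begin, end, dereference):
    """Yield dereferenced values from begin up to (not including) end."""
    it = begin
    while it != end:
        yield dereference(it)
        it.next()


docids = [3, 7, 12]
begin = StandInIterator(docids, 0)
end = StandInIterator(docids, len(docids))
result = list(iterate(begin, end, lambda it: it.get_docid()))
print(result)
```

With the pythonic iterators this adapter should not be needed, since the wrapped objects can be iterated directly.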

MSet
####

@@ -138,11 +189,50 @@ work using the C++ array dereferencing):
| ``get_docid(index)`` | ``get_hit(index).get_docid()`` |
+------------------------------------+----------------------------------------+

Additionally, the MSet has a property, ``mset.items``, which returns a
list of tuples representing the MSet. This is now deprecated - please use the
property API instead (it works in Xapian 1.0.x too). The tuple members and the
equivalent property names are as follows:


+--------------------------+---------------+----------------------------------------+
| Index                    | Property name | Contents                               |
+==========================+===============+========================================+
| ``xapian.MSET_DID``      | docid         | Document id                            |
+--------------------------+---------------+----------------------------------------+
| ``xapian.MSET_WT``       | weight        | Weight                                 |
+--------------------------+---------------+----------------------------------------+
| ``xapian.MSET_RANK``     | rank          | Rank                                   |
+--------------------------+---------------+----------------------------------------+
| ``xapian.MSET_PERCENT``  | percent       | Percentage weight                      |
+--------------------------+---------------+----------------------------------------+
| ``xapian.MSET_DOCUMENT`` | document      | Document object (Note: this member of  |
|                          |               | the tuple was never actually set!)     |
+--------------------------+---------------+----------------------------------------+


Two MSet objects are equal if they have the same number and maximum possible
number of members, and if every document member of the first MSet exists at the
same index in the second MSet, with the same weight.


ESet
####

The ESet has a property, ``eset.items``, which returns a list of
tuples representing the ESet. This is now deprecated - please use the
property API instead (it works in Xapian 1.0.x too). The tuple members and the
equivalent property names are as follows:


+------------------------+---------------+-----------+
| Index                  | Property name | Contents  |
+========================+===============+===========+
| ``xapian.ESET_TNAME``  | term          | Term name |
+------------------------+---------------+-----------+
| ``xapian.ESET_WT``     | weight        | Weight    |
+------------------------+---------------+-----------+


Non-Class Functions
###################

@@ -155,9 +245,8 @@ wrapped like so for Python:
- ``Xapian::minor_version()`` is wrapped as ``xapian.minor_version()``
- ``Xapian::revision()`` is wrapped as ``xapian.revision()``
- ``Xapian::Auto::open_stub()`` is wrapped as ``xapian.open_stub()`` (now deprecated)
- ``Xapian::Brass::open()`` is wrapped as ``xapian.brass_open()`` (now deprecated)
- ``Xapian::Chert::open()`` is wrapped as ``xapian.chert_open()`` (now deprecated)
- ``Xapian::InMemory::open()`` is wrapped as ``xapian.inmemory_open()``
- ``Xapian::InMemory::open()`` is wrapped as ``xapian.inmemory_open()`` (now deprecated)
- ``Xapian::Remote::open()`` is wrapped as ``xapian.remote_open()`` (both the TCP and "program" versions are wrapped - the SWIG wrapper checks the parameter list to decide which to call).
- ``Xapian::Remote::open_writable()`` is wrapped as ``xapian.remote_open_writable()`` (both the TCP and "program" versions are wrapped - the SWIG wrapper checks the parameter list to decide which to call).

@@ -181,7 +270,7 @@ a mixture of terms and queries if you wish. For example:
MatchAll and MatchNothing
-------------------------

These are wrapped as ``xapian.Query.MatchAll`` and
As of 1.1.1, these are wrapped as ``xapian.Query.MatchAll`` and
``xapian.Query.MatchNothing``.

