Added further documentation about SearchIndex/``RealTimeSearchIndex``.
toastdriven committed Nov 25, 2009
1 parent 2edbdb2 commit 93e01d0
Showing 2 changed files with 51 additions and 0 deletions.
37 changes: 37 additions & 0 deletions docs/searchindex_api.rst
@@ -140,6 +140,43 @@ And finally, in ``search/search.html``::
{% endfor %}


Keeping The Index Fresh
=======================

There are several approaches to keeping the search index in sync with your
database. None is more correct than the others; the right choice depends on
the traffic you see, the churn rate of your data and which concerns matter
most to you (CPU load, how fresh the results need to be, et cetera).

The conventional method is to use ``SearchIndex`` in combination with cron
jobs. Running ``./manage.py update_index`` every couple of hours will keep
your data in sync within that timeframe and will handle the updates in a very
efficient batch. Additionally, Whoosh (and, to a lesser extent, Xapian)
behaves better when using this approach.
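
As a rough sketch (the paths here are placeholders and will vary by
deployment), a crontab entry that updates the whole index every three hours
might look like::

    # Update the entire search index every three hours.
    0 */3 * * * /path/to/virtualenv/bin/python /path/to/project/manage.py update_index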

Another option is to use ``RealTimeSearchIndex``, which uses Django's signals
to immediately update the index any time a model is saved/deleted. This
yields a much more current search index at the expense of being fairly
inefficient. Solr is the only backend that handles this well under load, and
even then, you should make sure you have the server capacity to spare.
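
As a sketch, assuming a hypothetical ``Note`` model with a ``NoteIndex``
registered in the usual way, switching over is just a matter of changing the
base class::

    from haystack import indexes
    from haystack import site
    from myapp.models import Note


    class NoteIndex(indexes.RealTimeSearchIndex):
        text = indexes.CharField(document=True, use_template=True)
        pub_date = indexes.DateTimeField(model_attr='pub_date')


    site.register(Note, NoteIndex)

Once the index is registered, it takes care of hooking up the save/delete
signals itself, so no other changes are required.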

A third option is to develop a custom ``QueueSearchIndex`` that, much like
``RealTimeSearchIndex``, uses Django's signals to enqueue messages for
updates/deletes. A management command then consumes these messages in
batches, yielding a nice compromise between the previous two options.

.. note::

Haystack doesn't ship with a ``QueueSearchIndex``, largely because there is
such a diversity of lightweight queuing options and they tend to polarize
developers. Queuing is outside Haystack's goals (providing good, powerful
search) and, as such, is left to the developer.

Additionally, the implementation is relatively trivial: you extend the same
four methods as ``RealTimeSearchIndex`` and simply add messages to the queue
of your choice. A sketch follows this note.
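
As a very rough sketch (``my_queue`` below is a stand-in for whichever
queuing client you prefer, and the hook names simply mirror the ones
``RealTimeSearchIndex`` overrides), such an index might look like::

    from django.db.models import signals

    from haystack import indexes

    # ``my_queue`` is hypothetical -- substitute your queuing client of choice.
    from myapp.queues import my_queue


    class QueueSearchIndex(indexes.SearchIndex):
        # Enqueue a message on save/delete instead of hitting the search
        # backend directly.
        def _setup_save(self, model):
            signals.post_save.connect(self.enqueue_save, sender=model)

        def _setup_delete(self, model):
            signals.post_delete.connect(self.enqueue_delete, sender=model)

        def _teardown_save(self, model):
            signals.post_save.disconnect(self.enqueue_save, sender=model)

        def _teardown_delete(self, model):
            signals.post_delete.disconnect(self.enqueue_delete, sender=model)

        def enqueue_save(self, instance, **kwargs):
            # Record something like "update.myapp.note.42" for later processing.
            my_queue.push('update.%s.%s.%s' % (instance._meta.app_label,
                                               instance._meta.module_name,
                                               instance.pk))

        def enqueue_delete(self, instance, **kwargs):
            # Record something like "delete.myapp.note.42" for later processing.
            my_queue.push('delete.%s.%s.%s' % (instance._meta.app_label,
                                               instance._meta.module_name,
                                               instance.pk))

The matching management command would then pop messages off the queue in
batches, look up the affected objects and call the index's
``update_object``/``remove_object`` as appropriate.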


Advanced Data Preparation
=========================

14 changes: 14 additions & 0 deletions docs/tutorial.rst
@@ -308,6 +308,20 @@ command to make this process easy.
Simply run ``./manage.py rebuild_index``. You'll get some totals of how many
models were processed and placed in the index.

.. note::

Using the standard ``SearchIndex``, your search index content is only
updated whenever you run either ``./manage.py update_index`` or start
afresh with ``./manage.py rebuild_index``.

You should cron up a ``./manage.py update_index`` job at whatever interval
works best for your site (using ``--age=<num_hours>`` reduces the number of
things to update); a sample crontab entry follows this note.

Alternatively, if you have low traffic and/or your search engine can handle
it, the ``RealTimeSearchIndex`` automatically handles updates/deletes
for you.
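
As a rough sample of that crontab entry (the paths are placeholders; adjust
the interval and ``--age`` to taste)::

    # Every hour, update anything that changed within the last hour.
    0 * * * * /path/to/virtualenv/bin/python /path/to/project/manage.py update_index --age=1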


Complete!
=========
