Added further documentation about SearchIndex/``RealTimeSearchIndex``.
toastdriven committed Nov 25, 2009
1 parent 2edbdb2 commit 93e01d0
Showing 2 changed files with 51 additions and 0 deletions.
37 changes: 37 additions & 0 deletions docs/searchindex_api.rst
@@ -140,6 +140,43 @@ And finally, in ``search/search.html``::
{% endfor %}


Keeping The Index Fresh
=======================

There are several approaches to keeping the search index in sync with your
database. None is more correct than the others; the right choice depends on
the traffic you see, the churn rate of your data and which concerns matter
most to you (CPU load, how fresh the results need to be, et cetera).

The conventional method is to use ``SearchIndex`` in combination with cron
jobs. Running ``./manage.py update_index`` every couple of hours will keep
your data in sync within that timeframe and will handle the updates in a very
efficient batch. Additionally, Whoosh (and, to a lesser extent, Xapian)
behaves better when using this approach.
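
As a rough sketch (the paths here are placeholders and will vary by
deployment), a crontab entry that updates the whole index every three hours
might look like::

    # Update the entire search index every three hours.
    0 */3 * * * /path/to/virtualenv/bin/python /path/to/project/manage.py update_index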

Another option is to use ``RealTimeSearchIndex``, which uses Django's signals
to immediately update the index any time a model is saved/deleted. This
yields a much more current search index at the expense of being fairly
inefficient. Solr is the only backend that handles this well under load, and
even then, you should make sure you have the server capacity to spare.
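
As a sketch, assuming a hypothetical ``Note`` model with a ``NoteIndex``
registered in the usual way, switching over is just a matter of changing the
base class::

    from haystack import indexes
    from haystack import site
    from myapp.models import Note


    class NoteIndex(indexes.RealTimeSearchIndex):
        text = indexes.CharField(document=True, use_template=True)
        pub_date = indexes.DateTimeField(model_attr='pub_date')


    site.register(Note, NoteIndex)

Once the index is registered, it takes care of hooking up the save/delete
signals itself, so no other changes are required.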

A third option is to develop a custom ``QueueSearchIndex`` that, much like
``RealTimeSearchIndex``, uses Django's signals to enqueue messages for
updates/deletes. A management command then consumes these messages in
batches, yielding a nice compromise between the previous two options.

.. note::

Haystack doesn't ship with a ``QueueSearchIndex``, largely because there is
such a diversity of lightweight queuing options and they tend to polarize
developers. Queuing is outside Haystack's goals (providing good, powerful
search) and, as such, is left to the developer.

Additionally, the implementation is relatively trivial: you extend the same
four methods as ``RealTimeSearchIndex`` and simply add messages to the queue
of your choice. A sketch follows this note.
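
As a very rough sketch (``my_queue`` below is a stand-in for whichever
queuing client you prefer, and the hook names simply mirror the ones
``RealTimeSearchIndex`` overrides), such an index might look like::

    from django.db.models import signals

    from haystack import indexes

    # ``my_queue`` is hypothetical -- substitute your queuing client of choice.
    from myapp.queues import my_queue


    class QueueSearchIndex(indexes.SearchIndex):
        # Enqueue a message on save/delete instead of hitting the search
        # backend directly.
        def _setup_save(self, model):
            signals.post_save.connect(self.enqueue_save, sender=model)

        def _setup_delete(self, model):
            signals.post_delete.connect(self.enqueue_delete, sender=model)

        def _teardown_save(self, model):
            signals.post_save.disconnect(self.enqueue_save, sender=model)

        def _teardown_delete(self, model):
            signals.post_delete.disconnect(self.enqueue_delete, sender=model)

        def enqueue_save(self, instance, **kwargs):
            # Record something like "update.myapp.note.42" for later processing.
            my_queue.push('update.%s.%s.%s' % (instance._meta.app_label,
                                               instance._meta.module_name,
                                               instance.pk))

        def enqueue_delete(self, instance, **kwargs):
            # Record something like "delete.myapp.note.42" for later processing.
            my_queue.push('delete.%s.%s.%s' % (instance._meta.app_label,
                                               instance._meta.module_name,
                                               instance.pk))

The matching management command would then pop messages off the queue in
batches, look up the affected objects and call the index's
``update_object``/``remove_object`` as appropriate.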


Advanced Data Preparation
=========================

14 changes: 14 additions & 0 deletions docs/tutorial.rst
@@ -308,6 +308,20 @@ command to make this process easy.
Simply run ``./manage.py rebuild_index``. You'll get some totals of how many
models were processed and placed in the index.

.. note::

Using the standard ``SearchIndex``, your search index content is only
updated whenever you run either ``./manage.py update_index`` or start
afresh with ``./manage.py rebuild_index``.

You should cron up a ``./manage.py update_index`` job at whatever interval
works best for your site (using ``--age=<num_hours>`` reduces the number of
things to update); a sample crontab entry follows this note.

Alternatively, if you have low traffic and/or your search engine can handle
it, the ``RealTimeSearchIndex`` automatically handles updates/deletes
for you.
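
As a rough sample of that crontab entry (the paths are placeholders; adjust
the interval and ``--age`` to taste)::

    # Every hour, update anything that changed within the last hour.
    0 * * * * /path/to/virtualenv/bin/python /path/to/project/manage.py update_index --age=1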


Complete!
=========
