Add source for docs

1 parent 2080e7e commit 11c10199e5d84a4ce2e50f4def6e148e52519363 Toby White committed May 17, 2011
130 docs/Makefile
@@ -0,0 +1,130 @@
+# Makefile for Sphinx documentation
+#
+
+# You can set these variables from the command line.
+SPHINXOPTS =
+SPHINXBUILD = sphinx-build
+PAPER =
+BUILDDIR = _build
+
+# Internal variables.
+PAPEROPT_a4 = -D latex_paper_size=a4
+PAPEROPT_letter = -D latex_paper_size=letter
+ALLSPHINXOPTS = -d $(BUILDDIR)/doctrees $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) .
+
+.PHONY: help clean html dirhtml singlehtml pickle json htmlhelp qthelp devhelp epub latex latexpdf text man changes linkcheck doctest
+
+help:
+ @echo "Please use \`make <target>' where <target> is one of"
+ @echo " html to make standalone HTML files"
+ @echo " dirhtml to make HTML files named index.html in directories"
+ @echo " singlehtml to make a single large HTML file"
+ @echo " pickle to make pickle files"
+ @echo " json to make JSON files"
+ @echo " htmlhelp to make HTML files and a HTML help project"
+ @echo " qthelp to make HTML files and a qthelp project"
+ @echo " devhelp to make HTML files and a Devhelp project"
+ @echo " epub to make an epub"
+ @echo " latex to make LaTeX files, you can set PAPER=a4 or PAPER=letter"
+ @echo " latexpdf to make LaTeX files and run them through pdflatex"
+ @echo " text to make text files"
+ @echo " man to make manual pages"
+ @echo " changes to make an overview of all changed/added/deprecated items"
+ @echo " linkcheck to check all external links for integrity"
+ @echo " doctest to run all doctests embedded in the documentation (if enabled)"
+
+clean:
+ -rm -rf $(BUILDDIR)/*
+
+html:
+ $(SPHINXBUILD) -b html $(ALLSPHINXOPTS) $(BUILDDIR)/html
+ @echo
+ @echo "Build finished. The HTML pages are in $(BUILDDIR)/html."
+
+dirhtml:
+ $(SPHINXBUILD) -b dirhtml $(ALLSPHINXOPTS) $(BUILDDIR)/dirhtml
+ @echo
+ @echo "Build finished. The HTML pages are in $(BUILDDIR)/dirhtml."
+
+singlehtml:
+ $(SPHINXBUILD) -b singlehtml $(ALLSPHINXOPTS) $(BUILDDIR)/singlehtml
+ @echo
+ @echo "Build finished. The HTML page is in $(BUILDDIR)/singlehtml."
+
+pickle:
+ $(SPHINXBUILD) -b pickle $(ALLSPHINXOPTS) $(BUILDDIR)/pickle
+ @echo
+ @echo "Build finished; now you can process the pickle files."
+
+json:
+ $(SPHINXBUILD) -b json $(ALLSPHINXOPTS) $(BUILDDIR)/json
+ @echo
+ @echo "Build finished; now you can process the JSON files."
+
+htmlhelp:
+ $(SPHINXBUILD) -b htmlhelp $(ALLSPHINXOPTS) $(BUILDDIR)/htmlhelp
+ @echo
+ @echo "Build finished; now you can run HTML Help Workshop with the" \
+ ".hhp project file in $(BUILDDIR)/htmlhelp."
+
+qthelp:
+ $(SPHINXBUILD) -b qthelp $(ALLSPHINXOPTS) $(BUILDDIR)/qthelp
+ @echo
+ @echo "Build finished; now you can run "qcollectiongenerator" with the" \
+ ".qhcp project file in $(BUILDDIR)/qthelp, like this:"
+ @echo "# qcollectiongenerator $(BUILDDIR)/qthelp/sunburnt.qhcp"
+ @echo "To view the help file:"
+ @echo "# assistant -collectionFile $(BUILDDIR)/qthelp/sunburnt.qhc"
+
+devhelp:
+ $(SPHINXBUILD) -b devhelp $(ALLSPHINXOPTS) $(BUILDDIR)/devhelp
+ @echo
+ @echo "Build finished."
+ @echo "To view the help file:"
+ @echo "# mkdir -p $$HOME/.local/share/devhelp/sunburnt"
+ @echo "# ln -s $(BUILDDIR)/devhelp $$HOME/.local/share/devhelp/sunburnt"
+ @echo "# devhelp"
+
+epub:
+ $(SPHINXBUILD) -b epub $(ALLSPHINXOPTS) $(BUILDDIR)/epub
+ @echo
+ @echo "Build finished. The epub file is in $(BUILDDIR)/epub."
+
+latex:
+ $(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
+ @echo
+ @echo "Build finished; the LaTeX files are in $(BUILDDIR)/latex."
+ @echo "Run \`make' in that directory to run these through (pdf)latex" \
+ "(use \`make latexpdf' here to do that automatically)."
+
+latexpdf:
+ $(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
+ @echo "Running LaTeX files through pdflatex..."
+ make -C $(BUILDDIR)/latex all-pdf
+ @echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex."
+
+text:
+ $(SPHINXBUILD) -b text $(ALLSPHINXOPTS) $(BUILDDIR)/text
+ @echo
+ @echo "Build finished. The text files are in $(BUILDDIR)/text."
+
+man:
+ $(SPHINXBUILD) -b man $(ALLSPHINXOPTS) $(BUILDDIR)/man
+ @echo
+ @echo "Build finished. The manual pages are in $(BUILDDIR)/man."
+
+changes:
+ $(SPHINXBUILD) -b changes $(ALLSPHINXOPTS) $(BUILDDIR)/changes
+ @echo
+ @echo "The overview file is in $(BUILDDIR)/changes."
+
+linkcheck:
+ $(SPHINXBUILD) -b linkcheck $(ALLSPHINXOPTS) $(BUILDDIR)/linkcheck
+ @echo
+ @echo "Link check complete; look for any errors in the above output " \
+ "or in $(BUILDDIR)/linkcheck/output.txt."
+
+doctest:
+ $(SPHINXBUILD) -b doctest $(ALLSPHINXOPTS) $(BUILDDIR)/doctest
+ @echo "Testing of doctests in the sources finished, look at the " \
+ "results in $(BUILDDIR)/doctest/output.txt."
15 docs/about.rst
@@ -0,0 +1,15 @@
+.. _about:
+
+About Sunburnt
+==============
+
+Sunburnt is a library to interface with a Solr instance from Python. It was written by Toby White <toby@timetric.com>, originally for use with the Timetric platform (http://timetric.com).
+
+Sunburnt is designed to provide a high level API for
+
+ * querying Solr in a Pythonic way, without having to understand Solr's query syntax in depth, and
+ * inserting Python objects into a Solr index with the minimum of fuss,
+
+and particularly importantly, to provide Python-level error-checking. If you make a mistake, sunburnt will do its best to tell you why, rather than just throwing back an obscure Solr error.
+
+For an overview of the design choices, see http://eaddrinu.se/blog/2010/sunburnt.html.
93 docs/addingdocuments.rst
@@ -0,0 +1,93 @@
+.. _addingdocuments:
+
+Adding documents
+================
+
+The easiest way to add data to your Solr instance via sunburnt is with a Python dictionary, of exactly the same form as a query result. The dictionary keys are the names of the fields, and the dictionary values are the values of the corresponding fields.
+
+::
+
+ document = {"id":"0553573403",
+ "cat":"book",
+ "name":"A Game of Thrones",
+ "price":7.99,
+ "inStock": True,
+ "author_t":
+ "George R.R. Martin",
+ "series_t":"A Song of Ice and Fire",
+ "sequence_i":1,
+ "genre_s":"fantasy"}
+
+ si.add(document)
+
+You can add lists of dictionaries in the same way. Given the example "books.csv" file, you could feed it to sunburnt like so:
+
+::
+
+ import csv
+
+ lines = csv.reader(open("books.csv"))
+ field_names = lines.next()
+ documents = [dict(zip(field_names, line)) for line in lines]
+ si.add(documents)
+ si.commit()
+
+.. note:: Committing changes
+
+ Solr separates out the act of adding documents to the index (with ``add()`` above)
+ and committing them (with ``commit()``). Only after they are committed will they
+ be searchable. However, you can set your Solr instance up to *autocommit* after
+ adding documents, so that you don’t need to do a separate commit step. See
+ http://wiki.apache.org/solr/SolrConfigXml#Update_Handler_Section. For simple Solr
+ instances, this is probably the easiest approach. For heavily used instances, you
+ should think carefully about your committing strategy.
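
   As an illustrative sketch (the thresholds here are examples, not
   recommendations), an ``autoCommit`` section in ``solrconfig.xml``
   might look like:

   ::

     <autoCommit>
       <maxDocs>1000</maxDocs>  <!-- commit after every 1000 added documents -->
       <maxTime>60000</maxTime> <!-- or after 60000 ms, whichever comes first -->
     </autoCommit>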
+
+If your data is coming from somewhere else, though, you may not already have it in the
+form of a dictionary. So sunburnt will accept arbitrary Python objects as input to ``add()``.
+To extract the fields, it will inspect the objects for attributes or methods corresponding
+to field names, and use the values of the attributes (or, the result of calling the methods) as values.
+
+So in the case above, we might have an object that looked like this:
+
+::
+
+ class Book(object):
+ name = "A Game of Thrones"
+ author_t = "George R.R. Martin"
+ id = "0553573403"
+ series_t = "A Song of Ice and Fire"
+ sequence_i = 1
+
+ def price(self):
+ return check_current_price(self)
+
+ def inStock(self):
+ return check_stock_levels(self) > 0
+
+
+Adding this to the Solr index is as simple as:
+
+::
+
+ si.add(Book())
+
+(and you can add a list of books in the same way)
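
Under the hood, the field extraction sunburnt performs is conceptually
similar to the following sketch (a simplification for illustration only;
``extract_fields`` is not part of sunburnt's API):

```python
def extract_fields(obj, field_names):
    """Build a document dict from an object: plain attributes are
    used directly; callables (methods) are called for their value."""
    doc = {}
    for name in field_names:
        if not hasattr(obj, name):
            continue  # fields the object lacks are simply skipped
        value = getattr(obj, name)
        doc[name] = value() if callable(value) else value
    return doc

class Book(object):
    name = "A Game of Thrones"
    sequence_i = 1

    def inStock(self):
        return True

doc = extract_fields(Book(), ["name", "sequence_i", "inStock", "publisher"])
# doc == {"name": "A Game of Thrones", "sequence_i": 1, "inStock": True}
```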
+
+This is particularly powerful if you’re using something like Django,
+which provides you with ORM objects - you can drop these ORM objects
+straight into Solr. Given a Django ``Book`` model, you could add the
+whole contents of your database with the single call:
+
+::
+
+ si.add(Book.objects.all())
+
+When adding very large quantities of data, you might have a source
+which is lazily constructed. With Django, you'd rather construct
+an ORM iterator and have sunburnt work its way through it
+lazily, in multiple updates, than try to construct a single
+huge update POST. You can do this as follows:
+
+::
+
+ si.add(Book.objects.iterator(), chunk=1000)
+
+where ``chunk`` controls how many documents are put into each update chunk.
216 docs/conf.py
@@ -0,0 +1,216 @@
+# -*- coding: utf-8 -*-
+#
+# sunburnt documentation build configuration file, created by
+# sphinx-quickstart on Sat Mar 12 15:37:32 2011.
+#
+# This file is execfile()d with the current directory set to its containing dir.
+#
+# Note that not all possible configuration values are present in this
+# autogenerated file.
+#
+# All configuration values have a default; values that are commented out
+# serve to show the default.
+
+import sys, os
+
+# If extensions (or modules to document with autodoc) are in another directory,
+# add these directories to sys.path here. If the directory is relative to the
+# documentation root, use os.path.abspath to make it absolute, like shown here.
+#sys.path.insert(0, os.path.abspath('.'))
+
+# -- General configuration -----------------------------------------------------
+
+# If your documentation needs a minimal Sphinx version, state it here.
+#needs_sphinx = '1.0'
+
+# Add any Sphinx extension module names here, as strings. They can be extensions
+# coming with Sphinx (named 'sphinx.ext.*') or your custom ones.
+extensions = []
+
+# Add any paths that contain templates here, relative to this directory.
+templates_path = ['_templates']
+
+# The suffix of source filenames.
+source_suffix = '.rst'
+
+# The encoding of source files.
+#source_encoding = 'utf-8-sig'
+
+# The master toctree document.
+master_doc = 'index'
+
+# General information about the project.
+project = u'sunburnt'
+copyright = u'2011, Toby White'
+
+# The version info for the project you're documenting, acts as replacement for
+# |version| and |release|, also used in various other places throughout the
+# built documents.
+#
+# The short X.Y version.
+version = '0.5'
+# The full version, including alpha/beta/rc tags.
+release = '0.5'
+
+# The language for content autogenerated by Sphinx. Refer to documentation
+# for a list of supported languages.
+#language = None
+
+# There are two options for replacing |today|: either, you set today to some
+# non-false value, then it is used:
+#today = ''
+# Else, today_fmt is used as the format for a strftime call.
+#today_fmt = '%B %d, %Y'
+
+# List of patterns, relative to source directory, that match files and
+# directories to ignore when looking for source files.
+exclude_patterns = ['_build']
+
+# The reST default role (used for this markup: `text`) to use for all documents.
+#default_role = None
+
+# If true, '()' will be appended to :func: etc. cross-reference text.
+#add_function_parentheses = True
+
+# If true, the current module name will be prepended to all description
+# unit titles (such as .. function::).
+#add_module_names = True
+
+# If true, sectionauthor and moduleauthor directives will be shown in the
+# output. They are ignored by default.
+#show_authors = False
+
+# The name of the Pygments (syntax highlighting) style to use.
+pygments_style = 'sphinx'
+
+# A list of ignored prefixes for module index sorting.
+#modindex_common_prefix = []
+
+
+# -- Options for HTML output ---------------------------------------------------
+
+# The theme to use for HTML and HTML Help pages. See the documentation for
+# a list of builtin themes.
+html_theme = 'default'
+
+# Theme options are theme-specific and customize the look and feel of a theme
+# further. For a list of options available for each theme, see the
+# documentation.
+#html_theme_options = {}
+
+# Add any paths that contain custom themes here, relative to this directory.
+#html_theme_path = []
+
+# The name for this set of Sphinx documents. If None, it defaults to
+# "<project> v<release> documentation".
+#html_title = None
+
+# A shorter title for the navigation bar. Default is the same as html_title.
+#html_short_title = None
+
+# The name of an image file (relative to this directory) to place at the top
+# of the sidebar.
+#html_logo = None
+
+# The name of an image file (within the static path) to use as favicon of the
+# docs. This file should be a Windows icon file (.ico) being 16x16 or 32x32
+# pixels large.
+#html_favicon = None
+
+# Add any paths that contain custom static files (such as style sheets) here,
+# relative to this directory. They are copied after the builtin static files,
+# so a file named "default.css" will overwrite the builtin "default.css".
+html_static_path = ['_static']
+
+# If not '', a 'Last updated on:' timestamp is inserted at every page bottom,
+# using the given strftime format.
+#html_last_updated_fmt = '%b %d, %Y'
+
+# If true, SmartyPants will be used to convert quotes and dashes to
+# typographically correct entities.
+#html_use_smartypants = True
+
+# Custom sidebar templates, maps document names to template names.
+#html_sidebars = {}
+
+# Additional templates that should be rendered to pages, maps page names to
+# template names.
+#html_additional_pages = {}
+
+# If false, no module index is generated.
+#html_domain_indices = True
+
+# If false, no index is generated.
+#html_use_index = True
+
+# If true, the index is split into individual pages for each letter.
+#html_split_index = False
+
+# If true, links to the reST sources are added to the pages.
+#html_show_sourcelink = True
+
+# If true, "Created using Sphinx" is shown in the HTML footer. Default is True.
+#html_show_sphinx = True
+
+# If true, "(C) Copyright ..." is shown in the HTML footer. Default is True.
+#html_show_copyright = True
+
+# If true, an OpenSearch description file will be output, and all pages will
+# contain a <link> tag referring to it. The value of this option must be the
+# base URL from which the finished HTML is served.
+#html_use_opensearch = ''
+
+# This is the file name suffix for HTML files (e.g. ".xhtml").
+#html_file_suffix = None
+
+# Output file base name for HTML help builder.
+htmlhelp_basename = 'sunburntdoc'
+
+
+# -- Options for LaTeX output --------------------------------------------------
+
+# The paper size ('letter' or 'a4').
+#latex_paper_size = 'letter'
+
+# The font size ('10pt', '11pt' or '12pt').
+#latex_font_size = '10pt'
+
+# Grouping the document tree into LaTeX files. List of tuples
+# (source start file, target name, title, author, documentclass [howto/manual]).
+latex_documents = [
+ ('index', 'sunburnt.tex', u'sunburnt Documentation',
+ u'Toby White', 'manual'),
+]
+
+# The name of an image file (relative to this directory) to place at the top of
+# the title page.
+#latex_logo = None
+
+# For "manual" documents, if this is true, then toplevel headings are parts,
+# not chapters.
+#latex_use_parts = False
+
+# If true, show page references after internal links.
+#latex_show_pagerefs = False
+
+# If true, show URL addresses after external links.
+#latex_show_urls = False
+
+# Additional stuff for the LaTeX preamble.
+#latex_preamble = ''
+
+# Documents to append as an appendix to all manuals.
+#latex_appendices = []
+
+# If false, no module index is generated.
+#latex_domain_indices = True
+
+
+# -- Options for manual page output --------------------------------------------
+
+# One entry per manual page. List of tuples
+# (source start file, name, description, authors, manual section).
+man_pages = [
+ ('index', 'sunburnt', u'sunburnt Documentation',
+ [u'Toby White'], 1)
+]
101 docs/connectionconfiguration.rst
@@ -0,0 +1,101 @@
+.. _connectionconfiguration:
+
+Configuring a connection
+========================
+
+Whether you're querying or updating a solr server, you first need to
+set up a connection to it. Pass the URL of the solr server to a
+``SolrInterface`` object.
+
+::
+
+ solr_interface = sunburnt.SolrInterface("http://localhost:8983/solr/")
+
+If you are using `a multicore setup
+<http://wiki.apache.org/solr/CoreAdmin>`_ (which is strongly recommended,
+even if you only use a single core), then you need to pass the full URL
+to the core in question.
+
+::
+
+ solr_interface = sunburnt.SolrInterface("http://localhost:8983/solr/master/")
+
+The ``SolrInterface`` object can take four additional optional
+parameters.
+
+* ``schemadoc``. By default, sunburnt will query the solr instance for its
+ currently active schema. If you want to use a different schema for
+ any reason, pass in a file object here which yields a schema
+ document.
+
+* ``http_connection``. By default, sunburnt will open a new ``httplib2.Http``
+ object to talk to the solr instance. If you want to re-use an
+ existing connection, or set up your own ``Http`` object with different
+ options, then ``http_connection`` can be any object which supports
+ the ``Http.request()`` method (see :ref:`http-caching`).
+
+* ``mode``. A common solr configuration is to use different cores for
+ writing and reading, since they have very different performance
+ characteristics. You can enforce this through sunburnt by setting
+ ``mode='r'`` or ``mode='w'``. In either case, sunburnt will throw an
+ exception if you later try to perform the wrong sort of operation on
+ the interface, i.e. trying to update the index on a read-only core, or
+ trying to run queries on a write-only core. By default, all
+ ``SolrInterface`` objects are opened read/write.
+
+* ``retry_timeout``. By default, if sunburnt fails to connect to the
+ Solr server, it will fail, throwing a ``socket.error``. If you
+ specify ``retry_timeout`` (as a positive number) then when
+ sunburnt encounters a failure, it will wait ``retry_timeout``
+ seconds before retrying. It will only retry once, and then throw
+ the same ``socket.error`` exception if it fails again. This is
+ useful when access to the Solr server might occasionally and
+ briefly disappear but you don't want any processes which talk to
+ Solr to fail: for example, if you are in control of the Solr
+ server and want to restart it to reload its configuration.
+
+.. _http-caching:
+
+HTTP caching
+------------
+
+It's generally sensible not to use the default ``http_connection``,
+which doesn't do any caching. If you're likely to find your program
+making the same requests more than once (because perhaps your users
+make the same common searches), then you should use a caching http
+connection. Solr does very good internal caching of search results, but
+also supports proper HTTP-level caching, and you'll get much better performance
+by taking advantage of that. To do that, set up your interface object
+like so:
+
+::
+
+ solr_url = "http://localhost:8983/solr"
+ h = httplib2.Http(cache="/var/tmp/solr_cache")
+ solr_interface = SolrInterface(url=solr_url, http_connection=h)
+
+
+Schema migrations
+-----------------
+
+Sometimes it's necessary to make changes to your Solr schema. You may
+want to add new fields, or change the configuration of existing
+fields.
+
+There are various ways to approach this. One of the most transparent
+ways is to duplicate an existing core, update its schema offline, and
+then use Solr's multicore commands to change which core
+is exposed. This can be done entirely transparently to any clients
+which are currently connected.
+
+However, the SolrInterface object is set up with a single schema when
+it's initialized (whether by reading the schema from the Solr
+instance, or by the schema being passed in as a parameter). If the
+core is changed to have a different schema, the SolrInterface object
+will not reflect this change until you tell it to re-read the schema:
+
+::
+
+ si = SolrInterface(solr_server)
+ # Elsewhere, restart solr with a different schema
+ si.init_schema()
46 docs/deletingdocuments.rst
@@ -0,0 +1,46 @@
+.. _deletingdocuments:
+
+Deleting documents
+==================
+
+You can delete documents individually, or delete all documents resulting from a query.
+
+To delete documents individually, you need to pass a list of the documents to
+sunburnt. You can pass them as dictionaries or objects, as for ``add()``. Note
+that in this case, matching will be done by id, not by matching the full document.
+If you pass in a document which is different from that in the index, the indexed
+document with the same id will be deleted, even if all the other attributes are different.
+
+::
+
+ si.delete(obj) # you can pass a single object (or dictionary)
+ si.delete(list_of_objs) # or a list of objects or dictionaries.
+
+You can also simply pass in an id, or a list of ids, rather than the whole document.
+
+::
+
+ si.delete("0553573403")
+ si.delete(["0553573403", "0553579908"])
+
+To delete documents by query, you construct one or more queries from ``Q`` objects,
+in the same way that you construct a query as explained in :ref:`optional-terms`.
+You then pass those queries into the ``delete()`` method:
+
+::
+
+ si.delete(queries=si.Q("game")) # or a list of queries
+
+If you need to, you can mix and match individual deletion and deletion by query.
+
+::
+
+ si.delete(docs=list_of_docs, queries=list_of_queries)
+
+To clear the entire index, there is a shortcut which simply deletes every document in the index.
+
+::
+
+ si.delete_all()
+
+Deletions, like additions, only take effect after a commit (or autocommit).
44 docs/index.rst
@@ -0,0 +1,44 @@
+.. sunburnt documentation master file, created by
+ sphinx-quickstart on Sat Mar 12 15:37:32 2011.
+ You can adapt this file completely to your liking, but it should at least
+ contain the root `toctree` directive.
+
+Welcome to sunburnt's documentation!
+====================================
+
+Sunburnt is a library to help Python programs interact with `Apache
+Solr <http://lucene.apache.org/solr/>`_.
+
+`Apache Solr <http://lucene.apache.org/solr/>`_ is a search engine.
+
+Support
+-------
+
+For initial help with problems see our `mailing list
+<http://groups.google.com/sunburnt-users>`_. Please file any bugs in
+the `github issue tracker <https://github.com/tow/sunburnt/issues>`_.
+
+Contents:
+
+.. toctree::
+ :maxdepth: 2
+
+ about
+ installation
+ solrbackground
+ connectionconfiguration
+ queryingsolr
+ addingdocuments
+ deletingdocuments
+ indexmanagement
+
+..
+ Indices and tables
+..
+ ==================
+..
+ * :ref:`genindex`
+..
+ * :ref:`modindex`
+..
+ * :ref:`search`
42 docs/indexmanagement.rst
@@ -0,0 +1,42 @@
+.. _indexmanagement:
+
+Managing your index
+===================
+
+We mentioned the use of ``commit()`` above.
+There are a couple of other housekeeping methods that might also be useful.
+
+Optimizing
+----------
+
+After updating an index with new data, the index becomes fragmented and
+performance suffers, so you need to optimize it periodically. When and
+how often you do this is something to decide on a case-by-case basis.
+If you only add data infrequently, you should optimize after every new update;
+if you trickle in data on a frequent basis, you need to think more about it.
+See http://wiki.apache.org/solr/SolrPerformanceFactors#Optimization_Considerations.
+
+Either way, to optimize an index, simply call:
+
+::
+
+ si.optimize()
+
+A Solr optimize also performs a commit, so if you’re about to ``optimize()`` anyway,
+you can leave off the preceding ``commit()``. It doesn’t particularly hurt to do both though.
+
+Both ``commit()`` and ``optimize()`` take two optional arguments,
+``wait_flush`` and ``wait_searcher``, which you almost never need to
+worry about. See http://wiki.apache.org/solr/UpdateXmlMessages for details.
+
+Rollback
+--------
+
+If you have added or deleted documents since the last commit, but not yet committed those changes, you can issue a rollback to revert the index state to that of the last commit.
+
+::
+
+ si.rollback()
76 docs/installation.rst
@@ -0,0 +1,76 @@
+.. _installation:
+
+Installing Sunburnt
+===================
+
+Sunburnt's current release is ``0.5``.
+
+You can install sunburnt via pip, you can download a release, or you
+can pull from the git repository.
+
+To use sunburnt, you'll need an Apache Solr installation. Sunburnt
+currently requires at least version 1.4 of Apache Solr.
+
+
+Using pip
+---------
+
+If you have `pip <http://www.pip-installer.org>`_ installed, just type:
+
+::
+
+ pip install sunburnt
+
+If you've got an old version of sunburnt installed, and want to
+upgrade, then type:
+
+::
+
+ pip install -U sunburnt
+
+That's all you need to do; all dependencies will be pulled in automatically.
+
+
+Using a downloaded release
+--------------------------
+
+You can get versions of sunburnt from pypi.
+
+::
+
+ wget http://pypi.python.org/packages/source/s/sunburnt/sunburnt-0.5.tar.gz
+ tar xzf sunburnt-0.5.tar.gz
+ cd sunburnt-0.5
+ python setup.py install
+
+Before using sunburnt, you need to make sure you have `httplib2
+<http://code.google.com/p/httplib2/>`_ and `lxml <http://lxml.de>`_ installed.
+
+
+Using git
+---------
+
+You can install the latest code from github by doing
+
+::
+
+ git clone http://github.com/tow/sunburnt.git
+ cd sunburnt
+ python setup.py install
+
+Again, you'll need to have `httplib2
+<http://code.google.com/p/httplib2/>`_ and `lxml <http://lxml.de>`_ installed.
+
+Note that there are no guarantees that the latest git version will be
+particularly stable!
+
+
+Installing and configuring Solr
+===============================
+
+If you're using sunburnt to connect to an existing Solr installation,
+then you won't need further instructions.
+
+Otherwise, the solr wiki contains `helpful instructions on installing and
+configuring Solr
+<http://wiki.apache.org/solr/FrontPage#Installation_and_Configuration>`_,
+and you can set up a simple server by following the `tutorial <http://lucene.apache.org/solr/tutorial.html>`_.
170 docs/make.bat
@@ -0,0 +1,170 @@
+@ECHO OFF
+
+REM Command file for Sphinx documentation
+
+if "%SPHINXBUILD%" == "" (
+ set SPHINXBUILD=sphinx-build
+)
+set BUILDDIR=_build
+set ALLSPHINXOPTS=-d %BUILDDIR%/doctrees %SPHINXOPTS% .
+if NOT "%PAPER%" == "" (
+ set ALLSPHINXOPTS=-D latex_paper_size=%PAPER% %ALLSPHINXOPTS%
+)
+
+if "%1" == "" goto help
+
+if "%1" == "help" (
+ :help
+ echo.Please use `make ^<target^>` where ^<target^> is one of
+ echo. html to make standalone HTML files
+ echo. dirhtml to make HTML files named index.html in directories
+ echo. singlehtml to make a single large HTML file
+ echo. pickle to make pickle files
+ echo. json to make JSON files
+ echo. htmlhelp to make HTML files and a HTML help project
+ echo. qthelp to make HTML files and a qthelp project
+ echo. devhelp to make HTML files and a Devhelp project
+ echo. epub to make an epub
+ echo. latex to make LaTeX files, you can set PAPER=a4 or PAPER=letter
+ echo. text to make text files
+ echo. man to make manual pages
+ echo. changes to make an overview of all changed/added/deprecated items
+ echo. linkcheck to check all external links for integrity
+ echo. doctest to run all doctests embedded in the documentation if enabled
+ goto end
+)
+
+if "%1" == "clean" (
+ for /d %%i in (%BUILDDIR%\*) do rmdir /q /s %%i
+ del /q /s %BUILDDIR%\*
+ goto end
+)
+
+if "%1" == "html" (
+ %SPHINXBUILD% -b html %ALLSPHINXOPTS% %BUILDDIR%/html
+ if errorlevel 1 exit /b 1
+ echo.
+ echo.Build finished. The HTML pages are in %BUILDDIR%/html.
+ goto end
+)
+
+if "%1" == "dirhtml" (
+ %SPHINXBUILD% -b dirhtml %ALLSPHINXOPTS% %BUILDDIR%/dirhtml
+ if errorlevel 1 exit /b 1
+ echo.
+ echo.Build finished. The HTML pages are in %BUILDDIR%/dirhtml.
+ goto end
+)
+
+if "%1" == "singlehtml" (
+ %SPHINXBUILD% -b singlehtml %ALLSPHINXOPTS% %BUILDDIR%/singlehtml
+ if errorlevel 1 exit /b 1
+ echo.
+ echo.Build finished. The HTML pages are in %BUILDDIR%/singlehtml.
+ goto end
+)
+
+if "%1" == "pickle" (
+ %SPHINXBUILD% -b pickle %ALLSPHINXOPTS% %BUILDDIR%/pickle
+ if errorlevel 1 exit /b 1
+ echo.
+ echo.Build finished; now you can process the pickle files.
+ goto end
+)
+
+if "%1" == "json" (
+ %SPHINXBUILD% -b json %ALLSPHINXOPTS% %BUILDDIR%/json
+ if errorlevel 1 exit /b 1
+ echo.
+ echo.Build finished; now you can process the JSON files.
+ goto end
+)
+
+if "%1" == "htmlhelp" (
+ %SPHINXBUILD% -b htmlhelp %ALLSPHINXOPTS% %BUILDDIR%/htmlhelp
+ if errorlevel 1 exit /b 1
+ echo.
+ echo.Build finished; now you can run HTML Help Workshop with the ^
+.hhp project file in %BUILDDIR%/htmlhelp.
+ goto end
+)
+
+if "%1" == "qthelp" (
+ %SPHINXBUILD% -b qthelp %ALLSPHINXOPTS% %BUILDDIR%/qthelp
+ if errorlevel 1 exit /b 1
+ echo.
+ echo.Build finished; now you can run "qcollectiongenerator" with the ^
+.qhcp project file in %BUILDDIR%/qthelp, like this:
+ echo.^> qcollectiongenerator %BUILDDIR%\qthelp\sunburnt.qhcp
+ echo.To view the help file:
+ echo.^> assistant -collectionFile %BUILDDIR%\qthelp\sunburnt.qhc
+ goto end
+)
+
+if "%1" == "devhelp" (
+ %SPHINXBUILD% -b devhelp %ALLSPHINXOPTS% %BUILDDIR%/devhelp
+ if errorlevel 1 exit /b 1
+ echo.
+ echo.Build finished.
+ goto end
+)
+
+if "%1" == "epub" (
+ %SPHINXBUILD% -b epub %ALLSPHINXOPTS% %BUILDDIR%/epub
+ if errorlevel 1 exit /b 1
+ echo.
+ echo.Build finished. The epub file is in %BUILDDIR%/epub.
+ goto end
+)
+
+if "%1" == "latex" (
+ %SPHINXBUILD% -b latex %ALLSPHINXOPTS% %BUILDDIR%/latex
+ if errorlevel 1 exit /b 1
+ echo.
+ echo.Build finished; the LaTeX files are in %BUILDDIR%/latex.
+ goto end
+)
+
+if "%1" == "text" (
+ %SPHINXBUILD% -b text %ALLSPHINXOPTS% %BUILDDIR%/text
+ if errorlevel 1 exit /b 1
+ echo.
+ echo.Build finished. The text files are in %BUILDDIR%/text.
+ goto end
+)
+
+if "%1" == "man" (
+ %SPHINXBUILD% -b man %ALLSPHINXOPTS% %BUILDDIR%/man
+ if errorlevel 1 exit /b 1
+ echo.
+ echo.Build finished. The manual pages are in %BUILDDIR%/man.
+ goto end
+)
+
+if "%1" == "changes" (
+ %SPHINXBUILD% -b changes %ALLSPHINXOPTS% %BUILDDIR%/changes
+ if errorlevel 1 exit /b 1
+ echo.
+ echo.The overview file is in %BUILDDIR%/changes.
+ goto end
+)
+
+if "%1" == "linkcheck" (
+ %SPHINXBUILD% -b linkcheck %ALLSPHINXOPTS% %BUILDDIR%/linkcheck
+ if errorlevel 1 exit /b 1
+ echo.
+ echo.Link check complete; look for any errors in the above output ^
+or in %BUILDDIR%/linkcheck/output.txt.
+ goto end
+)
+
+if "%1" == "doctest" (
+ %SPHINXBUILD% -b doctest %ALLSPHINXOPTS% %BUILDDIR%/doctest
+ if errorlevel 1 exit /b 1
+ echo.
+ echo.Testing of doctests in the sources finished, look at the ^
+results in %BUILDDIR%/doctest/output.txt.
+ goto end
+)
+
+:end
840 docs/queryingsolr.rst
@@ -0,0 +1,840 @@
+.. _queryingsolr:
+
+Querying Solr
+=============
+
+For the examples in this chapter, I'll be assuming that you've
+loaded your server up with the books data supplied with the
+example Solr setup.
+
+The data itself you can see at ``$SOLR_SOURCE_DIR/example/exampledocs/books.csv``.
+To load it into a server running with the example schema:
+
+::
+
+ cd example/exampledocs
+ curl http://localhost:8983/solr/update/csv \
+ --data-binary @books.csv \
+ -H 'Content-type:text/plain; charset=utf-8'
+
+If you're working through this manual tutorial-style, you might
+want to keep a copy of the ``books.csv`` file open in an editor
+to check the expected results of some of the queries we'll try.
+
+Throughout the examples, I'll assume you've set up a ``SolrInterface`` object
+pointing at your server, called ``si``.
+
+::
+
+ si = SolrInterface(SOLR_SERVER_URL)
+
+
+Searching your solr instance
+----------------------------
+
+Sunburnt uses a chaining API, and will hopefully look quite familiar
+to anyone who has used the Django ORM.
+
+The ``books.csv`` data uses a schema which looks like this:
+
++----------------+------------+
+| Field | Field Type |
++================+============+
+| ``id`` | string |
++----------------+------------+
+| ``cat`` | string |
++----------------+------------+
+| ``name`` | text |
++----------------+------------+
+| ``price`` | float |
++----------------+------------+
+| ``author_t``   | text       |
++----------------+------------+
+| ``series_t`` | text |
++----------------+------------+
+| ``sequence_i`` | integer |
++----------------+------------+
+| ``genre_s`` | string |
++----------------+------------+
+
+and the default search field is a generated field called ``text``, which is generated from ``cat`` and ``name``.
+
+.. note:: Dynamic fields.
+
+ The last four fields are named with a suffix. This is because they are dynamic fields - see :doc:`solrbackground`.
+
+A simple search for one word, in the default search field.
+
+::
+
+ si.query("game") # to search for any books with "game" in the title.
+
+Maybe you want to search in the (non-default) field author_t for authors called Martin
+
+::
+
+ si.query(author_t="martin")
+
+Maybe you want to search for books with "game" in their title, by an author called "Martin".
+
+::
+
+ si.query(name="game", author_t="Martin")
+
+Perhaps your initial, default, search is more complex, and has more than one word in it:
+
+::
+
+ si.query(name="game").query(name="thrones")
+
+.. note:: Sunburnt query strings are not solr query strings
+
+ When you do a simple query like ``query("game")``, this is just a query on
+ the default field. It is *not* a solr query string. This means that the
+ following query might not do what you expect:
+
+ ``si.query("game thrones")``
+
+ If you're familiar with solr, you might expect that to return any documents
+ which contain both "game" and "thrones", somewhere in the default field.
+ Actually, it doesn't. This searches for documents containing *exactly* the
+ string "``game thrones``"; the two words next to each other, separated only
+ by whitespace.
+
+ If you want to search for documents containing both strings but you don't
+ care in what order or how close together, then you follow the example
+ above and do ``si.query("game").query("thrones")``. If you want to search
+ for documents that contain ``game`` ``OR`` ``thrones``, then see :ref:`optional-terms`.
+
+
+Since queries are chainable, the name/author query above could also be written
+
+::
+
+ si.query(name="game").query(author_t="Martin")
+
+You can keep on adding more and more queries in this way; the effect is to
+``AND`` all the queries. The results which come back will fulfil all of the
+criteria which are selected. Often it will be simplest to put all the
+queries into the same ``query()`` call, but in a more complex environment,
+it can be useful to partially construct a query in one part of your program,
+then modify it later on in a separate part.
+
+
+Executing queries and interpreting the response
+-----------------------------------------------
+
+Sunburnt is lazy in constructing queries. The examples in the previous section
+don’t actually perform the query - they just create a "query object" with the
+correct parameters. To actually get the results of the query, you’ll need to execute it:
+
+::
+
+ response = si.query("game").execute()
+
+This will return a ``SolrResponse`` object. If you treat this object as a list,
+then each member of the list will be a document, in the form of a Python dictionary
+containing the relevant fields:
+
+For example, if you run the first example query above, you should see a response like this:
+
+::
+
+ >>> for result in si.query("game").execute():
+ ... print result
+
+ {'author_t': u'George R.R. Martin',
+ 'cat': (u'book',),
+ 'genre_s': u'fantasy',
+ 'id': u'0553573403',
+ 'inStock': True,
+ 'name': u'A Game of Thrones',
+ 'price': 7.9900000000000002,
+ 'sequence_i': 1,
+ 'series_t': u'A Song of Ice and Fire'}
+ {'author_t': u'Orson Scott Card',
+ 'cat': (u'book',),
+ 'genre_s': u'scifi',
+ 'id': u'0812550706',
+ 'inStock': True,
+ 'name': u"Ender's Game",
+ 'price': 6.9900000000000002,
+ 'sequence_i': 1,
+ 'series_t': u'Ender'}
+
+Solr has returned two results. Each result is a dictionary, containing all the fields which we initially uploaded.
+
+.. note:: Multivalued fields
+
+ Because ``cat`` is declared in the schema as a multivalued field,
+ sunburnt has returned the ``cat`` field as a tuple of results -
+ albeit in this case both books only have one category assigned to
+ them, so the value of the ``cat`` field is a length-one tuple.
+
+.. note:: Floating-point numbers
+
+ In both cases, although we initially provided the price to two
+ decimal places, Solr stores the answer as a floating point number.
+ When the result comes back, it suffers from the common problem of
+ representing decimal numbers in binary, and the answer looks
+ slightly unexpected.
+
+
+Of course, often you don’t want your results in the form of a dictionary,
+you want an object. Perhaps you have the following class defined in your code:
+
+::
+
+ class Book:
+ def __init__(self, name, author_t, **other_kwargs):
+ self.title = name
+ self.author = author_t
+ self.other_kwargs = other_kwargs
+
+ def __repr__(self):
+        return 'Book("%s", "%s")' % (self.title, self.author)
+
+
+You can tell sunburnt to give you ``Book`` instances back by telling ``execute()`` to use the class as a constructor.
+
+::
+
+    >>> for result in si.query("game").execute(constructor=Book):
+ ... print result
+
+ Book("A Game of Thrones", "George R.R. Martin")
+ Book("Ender's Game", "Orson Scott Card")
+
+The ``constructor`` argument most often will be a class, but it can be any callable; it will always be called as ``constructor(**response_dict)``.
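Because the constructor is always invoked with keyword arguments, a plain function works just as well as a class. A minimal sketch (the dictionary below is a hand-built stand-in for one of the book documents above, not a live Solr response):

```python
# Any callable can serve as a constructor; here a plain function picks
# out just the fields we care about, ignoring the rest.
def book_summary(name=None, author_t=None, **other_fields):
    return "%s, by %s" % (name, author_t)

# Hand-built stand-in for a result document (not fetched from Solr):
doc = {'id': '0553573403', 'name': 'A Game of Thrones',
       'author_t': 'George R.R. Martin', 'price': 7.99}

print(book_summary(**doc))   # A Game of Thrones, by George R.R. Martin
```

In real code you would pass ``book_summary`` as ``execute(constructor=book_summary)`` and sunburnt would call it once per result document.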
+
+
+You can extract more information from the response than simply the list of results. The SolrResponse object has the following attributes:
+
+* ``response.status`` : status of query. (If this is not ‘0’, then something went wrong).
+* ``response.QTime`` : how long did the query take in milliseconds.
+* ``response.params`` : the params that were used in the query.
+
+and the results themselves are in the following attributes
+
+* ``response.results`` : the results of your main query.
+* ``response.facet_counts`` : see `Faceting`_ below.
+* ``response.highlighting`` : see `Highlighting`_ below.
+* ``response.more_like_these`` : see `More Like This`_ below.
+
+Finally, ``response.results`` itself has the following attributes
+
+* ``response.results.numFound`` : total number of docs in the index which fulfilled the query.
+* ``response.results.docs`` : the actual results themselves (more easily extracted as ``list(response)``).
+* ``response.results.start`` : if the number of docs is less than numFound, then this is the pagination offset.
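As a sketch of how these attributes fit together, here is a hypothetical helper that checks the status and pulls out the headline numbers. Nothing here is sunburnt API beyond the attribute names listed above; ``response`` would normally come from something like ``si.query("game").execute()``.

```python
# Hypothetical helper: summarise a SolrResponse-like object using only
# the attributes listed above.
def summarise(response):
    if response.status != 0:
        raise RuntimeError("query failed with status %r" % response.status)
    return {'found': response.results.numFound,
            'start': response.results.start,
            'qtime_ms': response.QTime}
```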
+
+
+Pagination
+----------
+
+By default, Solr will only return the first 10 results
+(this is configurable in ``solrconfig.xml``). To get at more
+results, you need to tell solr to paginate further through
+the results. You do this by applying the ``paginate()`` method,
+which takes two parameters, ``start`` and ``rows``:
+
+::
+
+ si.query("black").paginate(start=10, rows=30)
+
+will query for documents containing "black", and then return the
+11th to 40th results. Solr starts counting at 0, so ``start=10``
+will return the 11th result, and ``rows=30`` will return the next 30 results,
+up to the 40th.
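Pagination also makes it straightforward to walk an entire result set in fixed-size pages. A sketch (``query`` stands for any sunburnt query object supporting ``paginate()`` and ``execute()``; the helper and its stopping rule are our own, not part of sunburnt):

```python
# Sketch: yield every matching document, fetching page_size rows per
# request. A short page is taken to mean we've reached the end.
def all_results(query, page_size=30):
    start = 0
    while True:
        page = list(query.paginate(start=start, rows=page_size).execute())
        for doc in page:
            yield doc
        if len(page) < page_size:
            return
        start += page_size
```

Used as ``for doc in all_results(si.query("black")): ...``, each iteration fetches at most one extra page from Solr.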
+
+
+Returning different fields
+--------------------------
+
+By default, Solr will return all stored fields in the results. You
+might only be interested in a subset of those fields. To restrict
+the fields Solr returns, you apply the ``field_limit()`` method.
+
+::
+
+ si.query("game").field_limit("id") # only return the id of each document
+ si.query("game").field_limit(["id", "name"]) # only return the id and name of each document
+
+You can use the same option to get hold of the relevancy score that Solr
+has calculated for each document in the query:
+
+::
+
+ si.query("game").field_limit(score=True) # Return the score alongside each document
+    si.query("game").field_limit("id", score=True) # return just the id and score.
+
+The results appear just like the normal dictionary responses, but with a different
+selection of fields.
+
+::
+
+    >>> for result in si.query("game").field_limit("id", score=True).execute():
+ ... print result
+
+ {'score': 1.1931472000000001, 'id': u'0553573403'}
+ {'score': 1.1931472000000001, 'id': u'0812550706'}
+
+
+
+More complex queries
+--------------------
+
+Solr can index not only text fields but numbers, booleans and dates.
+As of version 3.1, it can also index spatial points (though sunburnt
+does not yet have support for spatial queries). This means you can
+refine your textual searches by also querying on associated numbers,
+booleans, or dates.
+
+In our books example, there are two numerical fields - the ``price``
+(which is a float) and ``sequence_i`` (which is an integer).
+Numerical fields can be queried:
+
+* exactly
+* by comparison (``<`` / ``<=`` / ``>=`` / ``>``)
+* by range (between two values)
+
+Exact queries
+.............
+
+Don’t try to query floats exactly unless you really know what you’re doing (http://download.oracle.com/docs/cd/E19957-01/806-3568/ncg_goldberg.html). Solr will let you, but you almost certainly don’t want to. Querying integers exactly is fine though.
+
+::
+
+ si.query(sequence_i=1) # query for all books which are first in their sequence.
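To see why exact float queries are discouraged, note that ordinary decimal values are not exactly representable in binary floating point:

```python
# Decimal fractions rarely have exact binary representations, so an
# "exact" float match compares against a value you didn't quite store.
print(0.1 + 0.2 == 0.3)    # False
print(repr(0.1 + 0.2))     # '0.30000000000000004'
```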
+
+Comparison queries
+..................
+
+These use a new syntax:
+
+::
+
+    si.query(price__lt=7) # notice the double-underscore separating "price" from "lt".
+
+will search for all books whose price is less than 7 (dollars,
+I guess - the example leaves currency unspecified!). You can do similar searches
+on any float or integer field, and you can use:
+
+* ``gt`` : greater than, ``>``
+* ``gte`` : greater than or equal to, ``>=``
+* ``lt`` : less than, ``<``
+* ``lte`` : less than or equal to, ``<=``
+
+
+Range queries
+.............
+
+As an extension of a comparison query, you can query for values that are within a
+range, ie between two different numbers.
+
+::
+
+ si.query(price__range=(5, 7)) # Search for all books with prices between $5 and $7.
+
+This range query is *inclusive* - it will return prices of books which are priced at
+exactly $5 or exactly $7. You can also make an *exclusive* search:
+
+::
+
+ si.query(price__rangeexc=(5, 7))
+
+which will exclude books priced at exactly $5 or $7.
+
+Finally, you can also do a completely open range search:
+
+::
+
+ si.query(price__any=True)
+
+will search for a book which has *any* price. Why would you do this? Well, if
+you had a schema where price was optional, then this search would return all
+books which had a price - and exclude any books which didn’t have a price.
+
+
+Date queries
+............
+
+You can query on dates the same way as you can query on numbers: exactly, by comparison,
+or by range. The example books data doesn’t include any date fields, so we’ll look at
+the example hardware data, which includes a ``manufacturedate_dt`` field.
+
+Be warned, though, that exact searching on date suffers from similar problems to exact
+searching on floating point numbers. Solr stores all dates to microsecond precision;
+exact searching will fail unless the date requested is also correct to microsecond precision.
+
+::
+
+    si.query(manufacturedate_dt=datetime.datetime(2006, 2, 13))
+
+will search for items whose manufacture date is *exactly* zero microseconds after
+midnight on the 13th February, 2006.
+
+More likely you’ll want to search by comparison or by range:
+
+::
+
+    # all items manufactured on or after the 1st January 2006
+    si.query(manufacturedate_dt__gte=datetime.datetime(2006, 1, 1))
+
+    # all items manufactured in Q1 2006.
+    si.query(manufacturedate_dt__range=(datetime.datetime(2006, 1, 1), datetime.datetime(2006, 4, 1)))
+
+The argument to a date query can be any object that looks roughly like
+a Python ``datetime`` object (so ``mx.DateTime`` objects will also work),
+or a string in W3C Datetime notation (http://www.w3.org/TR/NOTE-datetime)
+
+::
+
+ si.query(manufacturedate_dt__gte="2006")
+ si.query(manufacturedate_dt__lt="2009-04-13")
+ si.query(manufacturedate_dt__range=("2010-03-04 00:34:21", "2011-02-17 09:21:44"))
+
+All of the above queries will work as you expect - bearing in mind that solr will
+still be working to microsecond precision. The first query above will return all
+results later than, or on, exactly zero microseconds after midnight, 1st January, 2006.
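You can also compute range endpoints with ordinary ``datetime`` arithmetic. The helper below is hypothetical (not part of sunburnt), but the tuple it returns is exactly the shape a ``__range`` query expects:

```python
import datetime

# Hypothetical helper: a (start, end) tuple covering the last `days`
# days, suitable for a __range query. Solr compares to microsecond
# precision, so passing full datetimes is fine.
def last_n_days(days, now=None):
    now = now or datetime.datetime.utcnow()
    return (now - datetime.timedelta(days=days), now)

# e.g. si.query(manufacturedate_dt__range=last_n_days(30))
```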
+
+
+Boolean fields
+..............
+
+Boolean fields are flags on a document. In the example hardware specs, documents
+carry an ``inStock`` field. We can select on that by doing:
+
+::
+
+ si.query("Samsung", inStock=True) # all Samsung hardware which is in stock
+
+
+Sorting results
+---------------
+
+Unless told otherwise, Solr will return results in “relevancy” order. How
+Solr determines relevancy is a complex question, and can depend highly on
+your specific setup. However, it’s possible to override this and sort query
+results by another field. This field must be sortable, so most likely you’d
+use a numerical or date field.
+
+::
+
+ si.query("game").sort_by("price") # Sort by ascending price
+ si.query("game").sort_by("-price") # Sort by descending price (because of the minus sign)
+
+You can also sort on multiple factors:
+
+::
+
+ si.query("game").sort_by("-price").sort_by("score")
+
+This query will sort first by descending price, and then by increasing "score" (which is what solr calls relevancy).
+
+
+Excluding results from queries
+------------------------------
+
+In the examples above, we’ve only considered narrowing our search with positive
+requirements. What if we want to *exclude* results by some criteria?
+Returning to the books data again, we can exclude all
+Lloyd Alexander books by doing:
+
+::
+
+ si.exclude(author_t="Lloyd Alexander")
+
+``exclude()`` methods chain in the same way as ``query()`` methods, so you can mix and match:
+
+::
+
+ si.query(price__gt=7).exclude(author_t="Lloyd Alexander")
+ # return all books costing more than $7, except for those authored by Lloyd Alexander.
+
+
+.. _optional-terms:
+
+Optional terms and combining queries
+------------------------------------
+
+Sunburnt queries can be chained together in all sorts of ways, with
+query and exclude terms being applied. So far, you’ve only seen
+examples which have compulsory terms, either positive (``query()``)
+or negative(``exclude()``). What if you want to have *optional* terms?
+
+The syntax for this is a little uglier. Let’s imagine we want books
+which *either* have the word "game" *or* the word "black" in their titles.
+
+What we do is construct two *query objects*, one for each condition, and ``OR`` them together.
+
+::
+
+ si.query(si.Q("game") | si.Q("black"))
+
+The ``Q`` object can contain an arbitrary query, and can then be combined using
+Boolean logic (here, using ``|``, the OR operator). The result can then be
+passed to a normal ``si.query()`` call for execution.
+
+``Q`` objects can be combined using any of the Boolean operators, so
+also ``&`` (``AND``) and ``~`` (``NOT``), and can be nested within each
+other. You’re unlikely to care about this unless you are constructing queries
+programmatically, but it’s possible to express arbitrarily complex queries in this way.
+
+A moderately complex query could be written:
+
+::
+
+    si.query(si.Q(si.Q("game") & ~si.Q(author_t="orson"))
+             | si.Q(si.Q("black") & ~si.Q(author_t="lloyd")))
+
+which will return all results which fulfil the criteria:
+
+* Either (books with "game" in the title which are not by authors called "orson")
+* Or (books with "black" in the title which are not by authors called "lloyd")
+
+
+Wildcard searching
+------------------
+
+Sometimes you want to search for partial matches for a word. Depending on how
+your Solr schema does stemming, this may be done automatically for you. For
+example, in the example schema, if you search for "parse", then documents
+containing "parsing" will also be returned, because Solr will reduce both
+the search term and the term in the document to their stem, which is "pars".
+
+However, sometimes you need to do partial matches that Solr doesn’t know
+about. You can use asterisks and question marks in the normal way, except
+that you may not use leading wildcards - ie no wildcards at the beginning
+of a term.
+
+Using the books example again:
+
+::
+
+ si.query(name="thr*")
+
+will search for all books which have a word beginning with "thr" in their title. (So it will return "A Game of Thrones" and "The Book of Three").
+
+::
+
+ si.query(name="b*k")
+ # will return "The Black Company", "The Book of Three" and "The Black Cauldron"
+
+The results of a wildcard search are highly dependent on your Solr configuration, and in
+particular depend on what text analysis it performs. You may find you need to lowercase
+your search term even if the original document was mixed case, because Solr has
+lowercased the document before indexing it. (We have done this here).
+
+If, for some reason, you want to search exactly for a string with an asterisk or a question mark in it then you need to tell Solr to special case it:
+
+::
+
+    si.query(id=RawString("055323933?*"))
+
+This will search for a document whose id contains *exactly* the string given,
+including the question mark and asterisk. (Since there isn't one in our index,
+that will return no results.)
+
+
+Filter queries and caching
+--------------------------
+
+Solr implements several internal caching layers, and to some extent you can
+control when and how they're used. (This is separate from the :ref:`http-caching` layer).
+
+Often, you find that you can partition your query; one part is run many times
+without change, or with very limited change, and another part varies much more.
+(See http://wiki.apache.org/solr/FilterQueryGuidance for more guidance.)
+
+You can get Solr to cache the infrequently-varying part of the query by use
+of the FilterCache. For example, in the books case, you might provide standard
+functionality to filter results by various price ranges: less than $7.50, or greater
+than $7.50. This portion of your search will be run identically for nearly
+every query, while the main textual part of the query varies lots.
+
+If you separate out these two parts to the query, you can mark the price query
+as being cacheable, by doing a *filter query* instead of a normal query for
+that part of the search.
+
+If you're taking search input from the user, you would write:
+
+::
+
+ si.query(name=user_input).filter(price__lt=7.5)
+ si.query(name=user_input).filter(price__gte=7.5)
+
+The ``filter()`` method has the same functionality as the ``query()``
+method, in terms of datatypes and query types. However, it also
+tells Solr to separate out that part of the query and cache the
+results. In this case, Solr will precompute the price portion of
+the query and cache the results, so that as the user-driven queries
+vary, Solr only has to perform in full the unique portion of the
+query, the name query, and the price filter can be applied much more rapidly.
+
+You can filter any sort of query, simply by using ``filter()`` instead
+of ``query()``. And if your filtering involves an exclusion, then ``filter_exclude()``
+has the same functionality as ``exclude()``.
+
+::
+
+ si.query(title="black").filter_exclude(author_t="lloyd")
+    # Might be useful if a substantial portion of your users hate authors called "Lloyd".
+
+If it’s useful, you can mix and match ``query()`` and ``filter()`` calls as much as
+you like while chaining. The resulting filter queries will be combined
+and cached together.
+
+::
+
+ si.query(...).filter(...).exclude(...).filter_exclude(...)
+
+and the argument to a ``filter()`` or ``filter_exclude()`` call can be a
+Boolean combination of ``si.Q`` objects.
+
+
+Query boosting
+--------------
+
+Solr provides a mechanism for "boosting" results according to the values
+of various fields (See http://wiki.apache.org/solr/SolrRelevancyCookbook#Boosting_Ranking_Terms
+for a full explanation). This is only useful where you're doing a search with optional terms,
+and you want to specify that some of these terms are more important than others.
+
+For example, imagine you are searching for books which either have "black" in the title, or
+have an author named "lloyd". Let’s say that although either will do, you care more about
+the author than the title. You can express this in sunburnt by raising a ``Q`` object to
+a power equivalent to the boost you want.
+
+::
+
+ si.query(si.Q("black") | si.Q(author_t="lloyd")**3)
+
+This boosts the importance of the author field by 3. The number is a fairly arbitrary
+parameter, and it’s something of a black art to choose the relevant value.
+
+A more common pattern is that you want all books with "black" in the title *and you have
+a preference for those authored by Lloyd Alexander*. This is different from the last query;
+the last query would return books by Lloyd Alexander which did not have "black" in the
+title. Achieving this in solr is possible, but a little awkward; sunburnt provides a
+shortcut for this pattern.
+
+::
+
+ si.query("black").boost_relevancy(3, author_t="lloyd")
+
+This is fully chainable, and ``boost_relevancy`` can take an arbitrary
+collection of query objects.
+
+
+Faceting
+--------
+
+For background, see http://wiki.apache.org/solr/SimpleFacetParameters.
+
+Sunburnt lets you apply faceting to any query, with the ``facet_by()`` method, chainable
+on a query object. The ``facet_by()`` method needs, at least, a field (or list of fields) to
+facet on:
+
+::
+
+ facet_query = si.query("game").facet_by("sequence_i").paginate(rows=0)
+
+The above fragment will search for books matching "game",
+and facet the results according to the value of ``sequence_i``. It
+will return no documents (because of ``rows=0``), just the facet output.
+
+::
+
+ >>> print facet_query.execute().facet_counts.facet_fields
+
+ {'sequence_i': [('1', 2), ('2', 0), ('3', 0)]}
+
+The ``facet_counts`` object contains several sets of results - here, we're only
+interested in the ``facet_fields`` object. This contains a dictionary of results,
+keyed by each field where faceting was requested. (In this case, we only requested
+faceting on one field). The dictionary value is a list of two-tuples, mapping the
+value of the faceted field (in this case, ``sequence_i`` takes the values '1', '2', or '3')
+to the numbers of results for each value.
+
+You can read the above result as saying: 'of all the books which have "game" in their
+title, 2 of them have ``sequence_i=1``, 0 of them have ``sequence_i=2``, and 0 of them have
+``sequence_i=3``'.
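Since the facet counts come back as plain Python tuples, post-processing them is easy. For example, to keep only the values that actually matched (the literal below is copied from the output above):

```python
# Facet output copied from the example above; drop zero-count values.
facet_fields = {'sequence_i': [('1', 2), ('2', 0), ('3', 0)]}

nonzero = dict((value, count)
               for value, count in facet_fields['sequence_i']
               if count > 0)
print(nonzero)   # {'1': 2}
```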
+
+You can facet on more than one field at a time:
+
+::
+
+    si.query(...).facet_by(fields=["field1", "field2", ...])
+
+and the ``facet_fields`` dictionary will have more than one key.
+
+Solr supports a number of parameters to the faceting operation. All of the basic options
+are exposed through sunburnt:
+
+::
+
+ fields, prefix, sort, limit, offset, mincount, missing, method, enum.cache.minDf
+
+All of these can be used as keyword arguments to the ``facet_by()`` call, except of course the
+last one, since it contains periods. To pass keyword arguments with periods in them, you
+can use ``**`` syntax:
+
+::
+
+    facet_by(**{"enum.cache.minDf": 25})
+
+You can also facet on the result of one or more queries, using the ``facet_query()`` method. For example:
+
+::
+
+ >>> fquery = si.query("game").facet_query(price__lt=7).facet_query(price__gte=7)
+ >>> print fquery.execute().facet_counts.facet_queries
+
+ [('price:[7.0 TO *]', 1), ('price:{* TO 7.0}', 1)]
+
+This will facet the results according to the two queries specified, so you can see
+how many of the results cost less than $7, and how many cost more.
+
+The results come back this time in the ``facet_queries`` object, but have the same form as before.
+The facets are shown as a list of tuples, mapping query to number of results. You can read
+the above as saying '*of the results, 1 of them fulfilled the first facet-query (price greater
+than or equal to $7) and 1 of them fulfilled the second facet-query (price less than $7)*'.
+
+.. note:: Other types of facet
+
+   Faceting by date and range is not currently supported (though some of their functionality can be replicated by using ``facet_query()``). Nor are LocalParams or pivot faceting.
+
+
+Highlighting
+------------
+
+For background, see http://wiki.apache.org/solr/HighlightingParameters.
+
+Alongside the normal search results, you can ask solr to return fragments of
+the documents, with relevant search terms highlighted. You do this with the
+chainable ``highlight()`` method. By default this will highlight values in
+the default search field. In our books example, the default search field is
+a generated field, not returned in the results, so we’ll need to explicitly
+specify which field we would like to see highlighted:
+
+::
+
+ >>> highlight_query = si.query("game").highlight("name")
+ >>> print highlight_query.execute().highlighting
+
+ {'0553573403': {'name': ['A <em>Game</em> of Thrones']},
+ '0812550706': {'name': ["Ender's <em>Game</em>"]}}
+
+The highlighting results live in the ``highlighting`` attribute on the SolrResponse object.
+The results are shown as a dictionary of dictionaries. The top-level key is the ID
+(or ``uniqueKey``) of each document returned. For each document, you then have a dictionary
+mapping field names to fragments of highlighted text. In this case we only asked for
+highlighting on the ``name`` field. Multiple fragments might be returned for each field,
+though in this case we only get one fragment each. The text is highlighted with HTML, and
+the fragments should be suitable for dropping straight into a search template.
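When rendering results, a common pattern is to prefer the highlighted fragment and fall back to the stored field for documents without one. A sketch, using hand-built stand-ins shaped like the output above (in real code both would come from an executed query):

```python
# Stand-ins shaped like response.highlighting and the result docs.
highlighting = {'0553573403': {'name': ['A <em>Game</em> of Thrones']}}
docs = [{'id': '0553573403', 'name': 'A Game of Thrones'},
        {'id': '0812550706', 'name': "Ender's Game"}]

def display_name(doc):
    # Use the first highlighted fragment if solr returned one.
    fragments = highlighting.get(doc['id'], {}).get('name')
    return fragments[0] if fragments else doc['name']

print([display_name(d) for d in docs])
# ['A <em>Game</em> of Thrones', "Ender's Game"]
```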
+
+Again, Solr supports a large number of options to the highlighting command,
+and all of these are exposed through sunburnt. The full list of supported options is:
+
+::
+
+    fields, snippets, fragsize, mergeContinuous, requireFieldMatch, maxAnalyzedChars,
+    alternateField, maxAlternateFieldLength, formatter, simple.pre, simple.post,
+    fragmenter, usePhraseHighlighter, highlightMultiTerm, regex.slop, regex.pattern,
+    regex.maxAnalyzedChars
+
+See the note above in `Faceting`_ about using keyword arguments with periods.
+
+
+More Like This
+--------------
+
+For background, see http://wiki.apache.org/solr/MoreLikeThis. Alongside a set of
+search results, Solr can suggest other documents that
+are similar to each of the documents in the search result.
+
+.. note:: Query handlers
+
+ Sunburnt only supports ``MoreLikeThis`` through the ``StandardQueryHandler``,
+ not through the separate ``MoreLikeThisHandler``. That is, it only supports
+ more-like-this searches on documents that are already in its index.
+
+More-like-this searches are accomplished with the ``mlt()`` chainable
+option. You need to tell solr which fields to consider when deciding
+similarity.
+
+::
+
+    >>> mlt_query = si.query(id="0553573403").mlt("name", mintf=1, mindf=1)
+ >>> mlt_results = mlt_query.execute().more_like_these
+ >>> print mlt_results
+
+ {'0553573403': <sunburnt.schema.SolrResult object at 0x4b10510>}
+
+ >>> print mlt_results['0553573403'].docs
+
+ [{'author_t': u'Orson Scott Card',
+ 'cat': (u'book',),
+ 'genre_s': u'scifi',
+ 'id': u'0812550706',
+ 'inStock': True,
+ 'name': u"Ender's Game",
+ 'price': 6.9900000000000002,
+ 'sequence_i': 1,
+ 'series_t': u'Ender'}]
+
+Here we used ``mlt()`` options to alter the default behaviour (because our
+corpus is so small that Solr wouldn't find any similar documents with the
+standard behaviour).
+
+The ``SolrResponse`` object has a ``more_like_these`` attribute. This is
+a dictionary of ``SolrResult`` objects, one dictionary entry for each
+result of the main query. Here, the query only produced one result (because
+we searched on the ``uniqueKey``). Inspecting the ``SolrResult`` object, we
+find that it contains only one document.
+
+We can read the above result as saying that under the ``mlt()`` parameters
+requested, there was only one document similar to the search result.
+
+In this case, only one document was returned by the original query. When
+that happens, there is a shortcut attribute: ``more_like_this`` instead of
+``more_like_these``.
+
+::
+
+ >>> print mlt_query.execute().more_like_this.docs
+
+ [{'author_t': u'Orson Scott Card',
+ ...
+
+to avoid having to do the extra dictionary lookup.
+
+``mlt()`` also takes a list of options (see the Solr documentation for a full explanation):
+
+::
+
+    fields, count, mintf, mindf, minwl, maxwl, maxqt, maxntp, boost
+
+
+Spatial fields
+--------------
+
+From version 3.1 of Solr, spatial field-types are supported in the schema. This means
+you can have fields on a document representing (latitude, longitude) pairs.
+(Indeed, you can have fields representing points in an arbitrary number of dimensions.)
+
+Although sunburnt deals correctly with storage and retrieval of such fields,
+currently no querying beyond exact matching is supported (in particular, no
+spatial queries).
+
+sunburnt expects spatial fields to be supplied as iterables of length
+two, and will always return them as two-tuples.
+
+
+Binary fields
+-------------
+
+From version 3.1 of Solr, fields for binary data are supported in the schema. In
+Solr these are stored as base64-encoded blobs, but as a sunburnt user you don’t
+have to care about this. Sunburnt will automatically transcode to and from base64
+as appropriate, and your results will contain a binary string where appropriate.
+(Querying on Binary Fields is not supported, and doesn’t make much sense anyway).
82 docs/solrbackground.rst
@@ -0,0 +1,82 @@
+.. _Solrbackground:
+
+Reading a Solr Schema
+=====================
+
+This is not the place for a full description of a Solr schema,
+but you need to understand certain concepts to use sunburnt.
+
+For a better understanding of what’s going on, start with
+http://wiki.apache.org/solr/SchemaXml and http://wiki.apache.org/solr/SchemaDesign.
+
+The examples in this documentation can be run against the example
+data and schema, though you will need to understand the concepts
+below.
+
+You can find the example schema at "``$SOLR_SOURCE_DIR/example/solr/conf/schema.xml``".
+
+* documents
+
+ A Solr index lets you search over multiple documents. Each document is composed
+ of multiple fields, each field having a fieldtype. The list of available fieldtypes
+ and fields defines what a document is for your purposes, and this is specified
+ in the Solr ``schema.xml``.
+
+* fieldtypes
+
+ A schema will define several fieldtypes, which for sunburnt's purposes
+ are roughly equivalent to data types - you can have booleans,
+ numbers (of various precisions), dates, and strings. (As of Solr 3.1, you can also have geographical
+ points and blobs). An important distinction should be made between
+
+ - *strings*, which need not contain human-readable words, and where
+ searching will mostly be exact; and
+
+ - *text*, which largely will contain human-readable words, and where
+ searching will usually be fuzzier.
+
+ Most of Solr’s cleverness is in making sense of text fields.
+
+* fields
+
+ A document schema consists of defining a number of fields, each of
+ which has a name, a fieldtype, plus several options. In contrast to a
+ traditional RDBMS schema, most fields in a document schema will be
+ optional. Fields also may be *indexed* (ie, you can query on their
+ contents) and/or *stored* (ie, when a document is returned from a
+ search, a stored field will be part of the result). Fields can be
+ any combination of these - eg you can have stored fields that aren’t
+ queryable, or queryable fields which won’t be returned in the result.
+
+ Although the latter seems pointless, it’s very often used because
+ you can have generated fields; fields that don’t exist in the
+ original documents, but are useful for querying. Often you might
+ have a default text field, composed of the title, subtitle, and
+ contents of a document. You want to search on the combined field,
+ but you don’t want to return it in the results - results should
+ only have the fields available on the original document.
+
+ - *multivalued*
+
+ Fields can also be *multivalued*. A common pattern might be giving
+ tags to a document. One document can have many tags, so the tags
+ field is multivalued. When you query on tags, all the tags will be
+ searched, and when the document is returned, all the tags will be in the result.
+
+ - *default*
+
+ A schema will usually define a *default* field for the document. This is the
+ field which will be searched on if no other field is specified.
+
+ - *uniqueKey*
+
+ A schema will also usually define a *uniqueKey* - this acts as an ID
+ field for the document. If this is defined, then every document in
+ the index must have a unique value for this field.
+
+ - *dynamic*
+
+   A schema can define *dynamic* fields. These don't have a set name;
+   instead they are called, for example, ``*_i``. This means that when
+   Solr encounters a document which has any field ending in ``_i``, it
+   will use the fieldtype associated with the ``*_i`` field.
