documentation on synonyms
parthg committed Nov 1, 2016
1 parent 3d4d6dd commit ebec589
Showing 4 changed files with 133 additions and 10 deletions.
61 changes: 61 additions & 0 deletions code/python/search_synonyms.py
@@ -0,0 +1,61 @@
#!/usr/bin/env python

import json
import logging
import sys
import xapian
import support

### Start of example code.
def search(dbpath, querystring, offset=0, pagesize=10):
    # offset - defines starting point within result set
    # pagesize - defines number of records to retrieve

    # Open the database we're going to search (writable, so we can add synonyms).
    db = xapian.WritableDatabase(dbpath)

    # Start of adding synonyms
    db.add_synonym("time", "calendar")
    # End of adding synonyms

    # Set up a QueryParser with a stemmer and suitable prefixes
    queryparser = xapian.QueryParser()
    queryparser.set_stemmer(xapian.Stem("en"))
    queryparser.set_stemming_strategy(queryparser.STEM_SOME)
    # Start of prefix configuration.
    queryparser.add_prefix("title", "S")
    queryparser.add_prefix("description", "XD")
    # End of prefix configuration.

    # Start of set database
    queryparser.set_database(db)
    # End of set database

    # And parse the query
    query = queryparser.parse_query(querystring, queryparser.FLAG_SYNONYM)

    # Use an Enquire object on the database to run the query
    enquire = xapian.Enquire(db)
    enquire.set_query(query)

    # And print out something about each match
    matches = []
    for match in enquire.get_mset(offset, pagesize):
        fields = json.loads(match.document.get_data())
        print(u"%(rank)i: #%(docid)3.3i %(title)s" % {
            'rank': match.rank + 1,
            'docid': match.docid,
            'title': fields.get('TITLE', u''),
            })
        matches.append(match.docid)

    # Finally, make sure we log the query and displayed results
    support.log_matches(querystring, offset, pagesize, matches)
### End of example code.

if len(sys.argv) < 3:
    print("Usage: %s DBPATH QUERYTERM..." % sys.argv[0])
    sys.exit(1)

logging.basicConfig(level=logging.INFO)
search(dbpath = sys.argv[1], querystring = " ".join(sys.argv[2:]))
10 changes: 10 additions & 0 deletions code/python/search_synonyms.py.db_time.out
@@ -0,0 +1,10 @@
1: #065 Electric time piece with hands but without dial (no pendulum
2: #058 The "Empire" clock, to show the time at various longitudes,
3: #041 Frequency and time measuring instrument type TSA3436 by Venn
4: #056 Single sandglass in 4 pillared wood mount, running time 15 1
5: #043 Loughborough-Hayes automatic timing apparatus. Used by the R
6: #011 "Timetrunk" by Hines and Co., Glasgow (a sandglass for timin
7: #016 Copy of the gearing of the Byzantine sundial-calendar (1983-
8: #045 Master clock of the "Silent Electric" type made by the Magne
9: #018 Solar/Sidereal verge watch with epicyclic maintaining power
INFO:xapian.search:'time'[0:10] = 65 58 41 56 43 11 16 45 18
11 changes: 11 additions & 0 deletions code/python/search_synonyms.py.out
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
1: #016 Copy of the gearing of the Byzantine sundial-calendar (1983-
2: #072 German Perpetual Calendar in gilt metal
3: #065 Electric time piece with hands but without dial (no pendulum
4: #068 Ornate brass Perpetual Calendar
5: #058 The "Empire" clock, to show the time at various longitudes,
6: #041 Frequency and time measuring instrument type TSA3436 by Venn
7: #056 Single sandglass in 4 pillared wood mount, running time 15 1
8: #043 Loughborough-Hayes automatic timing apparatus. Used by the R
9: #026 Sundial and compass with perpetual calendar and lunar circles
10: #036 Universal 'Tri-Compax' chronographic wrist watch
INFO:xapian.search:'~time'[0:10] = 16 72 65 68 58 41 56 43 26 36
61 changes: 51 additions & 10 deletions howtos/synonyms.rst
@@ -17,7 +17,35 @@ synonym operator (``~``).
.. note::
    Xapian doesn't offer automated generation of the synonym dictionary.

.. todo:: write an example using one of our data sets.
Here is an example of a search program with synonym functionality.

.. xapianexample:: search_synonyms

First, here are the search results without the ``~`` operator:

.. xapianrunexample:: index1
    :silent:
    :args: data/100-objects-v1.csv db

.. xapianrunexample:: delete1
    :silent:
    :args: db 1953-448 1985-438

.. xapianrunexample:: search_synonyms
    :args: db time

Notice the difference when the ``~`` operator is applied to ``time``, where ``calendar`` is specified as its synonym:

.. xapianrunexample:: index1
    :silent:
    :args: data/100-objects-v1.csv db

.. xapianrunexample:: delete1
    :silent:
    :args: db 1953-448 1985-438

.. xapianrunexample:: search_synonyms
    :args: db ~time

Model
=====
@@ -27,13 +55,23 @@ terms can have one or more synonym terms. A group of consecutive terms is
specified in the dictionary by simply joining them with a single space between
each one.

.. todo:: Discuss interactions with stemming (ie, should the input and/or
output values in the synonym table be stemmed).
If a term to be synonym expanded will be stemmed by the QueryParser, then
synonyms will be checked for the unstemmed form first, and then for the stemmed
form, so you can provide different synonyms for particular unstemmed forms
if you want to.
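
The lookup order can be illustrated with a small sketch. This is not Xapian code: the ``dict`` stands in for the database's synonym table, ``toy_stem`` is a hypothetical stand-in for a real stemmer, and the sketch assumes the stemmed form is consulted only when the unstemmed form has no entry.

```python
def lookup_synonyms(term, synonyms, stem):
    """Return synonyms for `term`, trying the unstemmed form before the stem.

    `synonyms` is a plain dict standing in for the synonym table and
    `stem` is any callable mapping a word to its stem. Both are
    hypothetical stand-ins for illustration, not Xapian APIs.
    """
    # An entry for the exact (unstemmed) form takes priority.
    if term in synonyms:
        return synonyms[term]
    # Otherwise fall back to the entry for the stemmed form, if any.
    return synonyms.get(stem(term), [])


def toy_stem(word):
    # Crude stand-in for a real stemmer: strip a trailing "s".
    return word[:-1] if word.endswith("s") else word


table = {
    "time": ["calendar"],
    "times": ["multiplied by"],  # specific to this unstemmed form
}

print(lookup_synonyms("times", table, toy_stem))
print(lookup_synonyms("time", table, toy_stem))
print(lookup_synonyms("calendars", {"calendar": ["almanac"]}, toy_stem))
```

Here ``times`` gets its own synonyms because an entry exists for the unstemmed form, while ``calendars`` has no entry of its own and picks up the entry for its stem.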

.. todo:: Discuss interactions with stemming (ie, should the input and/or output values in the synonym table be stemmed).

Adding Synonyms
===============

.. todo:: Document this!
Synonyms can be added with :xapian-method:`WritableDatabase::add_synonym()`. In the following
example ``calendar`` is specified as a synonym for ``time``. You could similarly write a loop to load all
the synonyms from a dictionary file.

.. xapianexample:: search_synonyms
    :start-after: Start of adding synonyms
    :end-before: End of adding synonyms
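
A loop that loads synonyms from a dictionary file might look like the sketch below. The file format (``term: synonym, synonym`` per line, with ``#`` comments) and the helper names ``parse_synonym_lines`` and ``load_synonyms`` are assumptions made for illustration; the only Xapian call relied on is ``add_synonym(term, synonym)``. Whitespace in keys is collapsed so that a group of terms is joined by a single space, as described in the Model section.

```python
def parse_synonym_lines(lines):
    """Yield (term, synonym) pairs from lines like 'time: calendar, clock'.

    The line format is a hypothetical one chosen for this sketch.
    """
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, values = line.partition(":")
        if not sep:
            continue  # no ':' separator, so skip the malformed line
        # Join a group of terms with exactly one space between each.
        term = " ".join(key.split())
        for value in values.split(","):
            synonym = " ".join(value.split())
            if term and synonym:
                yield term, synonym


def load_synonyms(db, lines):
    """Add every parsed pair to `db` and return how many were added.

    `db` is any object with an add_synonym(term, synonym) method,
    such as a xapian.WritableDatabase.
    """
    count = 0
    for term, synonym in parse_synonym_lines(lines):
        db.add_synonym(term, synonym)
        count += 1
    return count


# Usage with a real database (not run here; "synonyms.txt" is hypothetical):
#   db = xapian.WritableDatabase(dbpath, xapian.DB_CREATE_OR_OPEN)
#   with open("synonyms.txt") as f:
#       load_synonyms(db, f)
#   db.commit()
```

Keeping the parsing separate from the database calls makes the file format easy to change and lets the parser be checked without a database.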

QueryParser Integration
=======================
@@ -42,21 +80,24 @@ In order for any of the synonym features of the QueryParser to work, you must
call :xapian-method:`QueryParser::set_database()` to specify the database to
use.

.. xapianexample:: search_synonyms
    :start-after: Start of set database
    :end-before: End of set database

If ``FLAG_SYNONYM`` is passed to :xapian-method:`QueryParser::parse_query()`
then the QueryParser will recognise ``~`` in front of a term as indicating a
request for synonym expansion. If ``FLAG_LOVEHATE`` is also specified, you can
request for synonym expansion.

If ``FLAG_LOVEHATE`` is also specified, you can
use ``+`` and ``-`` before the ``~`` to indicate that you love or hate the
synonym expanded expression.

.. todo:: Check whether the following statement is correct.

A synonym-expanded term becomes the term itself OR-ed with any listed synonyms,
so ``~truck`` might expand to ``truck OR lorry OR van``. A group of terms is
handled in much the same way.
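
The expansion described above can be sketched on plain strings. This only illustrates the behaviour as the text states it; it is not how the QueryParser builds its query objects, and ``expand_with_synonyms`` and the ``dict`` table are hypothetical names used for the example.

```python
def expand_with_synonyms(term, synonyms):
    """Return `term` OR-ed with its listed synonyms, as a query string.

    `synonyms` is a plain dict standing in for the synonym table.
    """
    alternatives = [term]
    for synonym in synonyms.get(term, []):
        if synonym not in alternatives:  # avoid repeating an alternative
            alternatives.append(synonym)
    return " OR ".join(alternatives)


table = {"truck": ["lorry", "van"]}

print(expand_with_synonyms("truck", table))  # truck OR lorry OR van
print(expand_with_synonyms("bus", table))    # bus
```

A term with no listed synonyms is left unchanged, so requesting expansion for it has no effect on the query.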

If a term to be synonym expanded will be stemmed by the QueryParser, then
synonyms will be checked for the unstemmed form first, and then for the stemmed
form, so you can provide different synonyms for particular unstemmed forms
if you want to.

If ``FLAG_AUTO_SYNONYMS`` is passed to
:xapian-method:`QueryParser::parse_query()` then the QueryParser will
automatically expand any term which has synonyms, unless the term is in a phrase