Remove references to TextIndexNG3 and FieldedTextIndex [ci skip]
.. as both are obsolete.

modified:   docs/zopebook/SearchingZCatalog.rst
jugmac00 committed Dec 3, 2019
1 parent 4265356 commit 960ee2d
Showing 1 changed file with 2 additions and 207 deletions.
209 changes: 2 additions & 207 deletions docs/zopebook/SearchingZCatalog.rst
@@ -1527,9 +1527,8 @@ results. For example::
This method, however, does not remove duplicates, so
a painting of a sunset by VanGogh would appear twice.
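
One way to drop such duplicates is to merge the result sets by a unique
key. The following is a minimal sketch, not part of the original document:
`Record` and `merge_unique` are hypothetical names, and `Record` is only a
stand-in for a catalog result. With real catalog results, the record id
would presumably play the role of the key.

```python
from collections import namedtuple

# Hypothetical stand-in for a catalog result; "rid" is an assumed unique id.
Record = namedtuple("Record", ["rid", "title"])


def merge_unique(*result_sets, key=lambda record: record.rid):
    """Concatenate result sets, keeping only the first record per key."""
    seen = set()
    merged = []
    for results in result_sets:
        for record in results:
            k = key(record)
            if k not in seen:
                seen.add(k)
                merged.append(record)
    return merged


# The same painting (rid 2) matches both the title and the artist search.
by_title = [Record(1, "Sunset"), Record(2, "Sunset over Arles")]
by_artist = [Record(2, "Sunset over Arles"), Record(3, "Starry Night")]
merged = merge_unique(by_title, by_artist)
```

The merged list keeps the order of first appearance, so results from the
first search stay ahead of results that only the second search found.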

For alternate strategies about searching in two places,
see **PrincipiaSearchSource** and **FieldedTextIndex**, below,
both of which can be used as possible workarounds.
For an alternate strategy about searching in two places,
see **PrincipiaSearchSource** below.

Indexing a Field With Two Index Types
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -1657,210 +1656,6 @@ objects have a **SearchableText** method that returns things
like title, description, body, etc., so that they can be
general-text searched.

Add-On Index Types
------------------

TextIndexNG
~~~~~~~~~~~

TextIndexNG is a new text index that competes with ZCTextIndex.
Unlike ZCTextIndex, TextIndexNG is an add-on product that must be
separately installed. It offers a large number of features:

- Document Converters

If your attribute value isn't plain text, TextIndexNG can convert
it to text to index it. This will allow you to store, for
instance, a PDF file in Zope
and be able to search the text of that PDF file. Current
formats it can convert are: HTML, PDF, Postscript, Word,
Powerpoint, and OpenOffice.

- Stemmer Support

Reduces words to a stem (removes verb endings and
plural-endings), so a user can search for "car" and get "car"
and "cars", without having to try the search twice. It
knows how to perform stemming in 13 different languages.

- Similarity Search

Can find words that are "similar" to your words, based on
the Levenshtein algorithm. Essentially, this measures the
distance between two terms using indicators such as how
many letters differ from one to another.

- Near Search

Can look for words that are near each other. For example,
a search for "Zope near Book" would find results where
these words were close to each other in the document.

- Customizable Parsers

Rather than having only one way to express a query, TextIndexNG
uses a "pluggable" architecture where Python programmers can
create new parsers. For example, to find a document that
includes the word "snake" but not the word "python", you'd
search for "snake andnot python" in the default parser.
However, given your users' expectations (and native language),
they might prefer to say "snake and not python" or "snake
-python" or such. TextIndexNG comes with three different
parsers: a rich, default one, a simple one that is suitable for
more general searching, and a German one that uses
German-language words ("nicht" for "not", for example).
Although writing a new parser is an advanced task, it would be
possible for you to do so if you wanted to let users express
the question in a different form.

- Stop Words

You can customize the list of "stop words" that are too common
to be worth indexing or searching for.

- Wildcard Search

You can use a "wildcard" to search for part of a word, such as
"doc*" to find all words starting with "doc". Unlike
ZCTextIndex, you can also use wildcards at the start of a
word, such as "\*doc" to find all words ending with "doc".

- Normalization Support

Removing accented characters so that users can search for an
accented word without getting the accents exactly right.

- Auto-Expansion

This optional feature allows you to get better search results
when some of the query terms could not be found. In this
case, it uses a similarity matching to "expand" the query
term to find more matches.

- Ranking Support

Sorting of results based on their word frequencies,
similar to the sorting capabilities of ZCTextIndex.
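
The Similarity Search item above names the Levenshtein algorithm. As a
rough illustration of what that distance measures, here is a minimal
sketch in plain Python. It is not TextIndexNG's implementation, and the
names `levenshtein` and `similar_terms` are hypothetical.

```python
def levenshtein(a, b):
    """Return the number of single-character edits needed to turn a into b."""
    # previous[j] is the distance between the processed prefix of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep a match
            ))
        previous = current
    return previous[-1]


def similar_terms(query, vocabulary, max_distance=1):
    """Return the vocabulary terms within max_distance edits of query."""
    return [term for term in vocabulary
            if levenshtein(query, term) <= max_distance]


matches = similar_terms("zope", ["zope", "zopes", "rope", "book"])
```

A small `max_distance` keeps the matches close to what the user typed;
larger values trade precision for recall.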

TextIndexNG is an excellent replacement for ZCTextIndex,
especially if you have non-English language documents or expect to
have users that will want to use a rich query syntax.

Full information on TextIndexNG is available at
https://pypi.org/project/Products.TextIndexNG3/.

FieldedTextIndex
~~~~~~~~~~~~~~~~

FieldedTextIndex is a new index type that is not (yet) a standard
part of Zope, but is a separate product that can be installed
and used with a standard catalog.

Often, a site will have a combined field (normally
'PrincipiaSearchSource' or 'SearchableText', as described above)
for site-wide searching, and individual fields for more
content-aware searching, such as the indexes on 'latin_name',
'environment', etc.

Since it slows down performance to concatenate catalog result
sets directly, the best strategy for searching across many fields
is often to use the 'PrincipiaSearchSource'/'SearchableText'
strategy of a single text index. However, this can be *too*
limiting, as sometimes users want to search in several fields at
once, rather than in all.

FieldedTextIndex solves these problems by extending the standard
ZCTextIndex so that it can receive and index the textual data of an
object's field attributes as a mapping of field names to field
text. The index itself performs the aggregation of the fielded
data and allows queries to be performed across all fields (like a
standard text index) or any subset of the fields which have been
encountered in the objects indexed.

In other words, a normal 'PrincipiaSearchSource' method would
look something like this::

# concatenate all fields user might want to search
def PrincipiaSearchSource(self):
    return (self.title + ' ' + self.description + ' '
            + self.latin_name + ' ' + self.environment)

However, you have to search this all at once--you can't opt to
search just 'title' and 'latin_name', unless you created separate
indexes for these fields. Creating separate indexes for these
fields is a waste of space and memory, though, as the same
information is indexed several times.

With FieldedTextIndex, your 'PrincipiaSearchSource' method would
look like this::

# return all fields user might want to search
def PrincipiaSearchSource(self):
    return {'title': self.title,
            'description': self.description,
            'latin_name': self.latin_name,
            'environment': self.environment}

This index can be searched with the normal methods::

# search like a normal index
zcat=context.AnimalCatalog
results=zcat({'PrincipiaSearchSource':'jungle'})

In addition, it can be searched indicating which fields you want
to search::

# search only specific fields
zcat=context.AnimalCatalog
results=zcat(
    {'PrincipiaSearchSource': {'query': 'jungle',
                               'fields': ['title', 'latin_name']}})

In this second example, only 'title' and 'latin_name' will be
searched.

In addition, FieldedTextIndexes support *weighting*, so that a
match in a more heavily weighted field counts for more in the query
and appears earlier in the result list. For example, in our zoo,
matching part of an animal's 'latin_name' should count very highly,
matching part of the 'title' should count highly, and matching part
of the description should count less so.

We can specify the weighting like this::

# search with weighting
zcat=context.AnimalCatalog
results=zcat(
    {'PrincipiaSearchSource': {'query': 'jungle',
                               'field_weights': {
                                   'latin_name': 3,
                                   'title': 2,
                                   'description': 1}}})

This is a *very* powerful feature for building a comprehensive
search strategy for a site, since it lets us control the results to
better give the user what they probably want, rather than returning
documents based solely on how many times their search word appears.
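
To make the idea of field selection and field weights concrete, here is a
minimal sketch in plain Python. It does not use FieldedTextIndex or show
how that product ranks results; `search_fields` and the sample `animals`
data are hypothetical, and the score is a simple weighted count of exact
word matches.

```python
def search_fields(documents, term, fields=None, field_weights=None):
    """Rank document ids by weighted occurrences of term in chosen fields."""
    term = term.lower()
    field_weights = field_weights or {}
    scored = []
    for doc_id, field_map in documents.items():
        score = 0
        for name, text in field_map.items():
            if fields is not None and name not in fields:
                continue  # field was not selected for this search
            hits = text.lower().split().count(term)
            score += hits * field_weights.get(name, 1)  # default weight is 1
        if score:
            scored.append((score, doc_id))
    # Highest score first; ties are broken by document id.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]


# Hypothetical sample data: a mapping of field names to field text per object.
animals = {
    "jaguar": {"title": "Jaguar", "latin_name": "Panthera onca",
               "description": "A big cat of the jungle"},
    "junglefowl": {"title": "Jungle fowl", "latin_name": "Gallus gallus",
                   "description": "A bird"},
}

all_fields = search_fields(animals, "jungle")
title_only = search_fields(animals, "jungle", fields=["title", "latin_name"])
weighted = search_fields(animals, "jungle",
                         field_weights={"title": 2, "description": 1})
```

With equal weights the two animals tie; giving 'title' a higher weight
moves the title match ahead of the description match.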

The examples given here search a FieldedTextIndex from Python
Scripts; however, it can also be searched directly from the
REQUEST in a form, like other indexes.

Since a FieldedTextIndex can act just like a normal ZCTextIndex if
queried with just a search string, yet offer additional features
above and beyond the normal ZCTextIndex, it's a good idea to use
this for any text index where you'd concatenate more than one
attribute or method result together, such as for 'SearchableText'
or 'PrincipiaSearchSource'.

FieldedTextIndex can be downloaded at
http://old.zope.org/Members/Caseman/FieldedTextIndex/folder_contents.
Full documentation on how to create this type of index, and further
information on how to search it, including how to search it from
web forms, is available in the README file that comes with this
product.

Conclusion
----------