diff --git a/ZCatalog.txt b/ZCatalog.txt
new file mode 100755
index 00000000..bc78752a
--- /dev/null
+++ b/ZCatalog.txt
@@ -0,0 +1,464 @@
+ZCatalog Tutorial
+
+  This document provides a tutorial for 'ZCatalog', the new search
+  engine machinery in Zope. The audience for the document is content
+  managers.
+
+  Contents
+
+    o What is it? What's it for? Why's it so cool?
+
+    o Installing ZCatalog
+
+    o ZCatalog Objects
+
+    o Example using ZCatalog
+
+    o Creating Search Forms And Result Reports
+
+    o Using ZCatalog In A Zope Site
+
+    o ZCatalog vs. Catalog
+
+  What is it? What's it for? Why's it so cool?
+
+    The 'ZCatalog' provides powerful indexing and searching on a Zope
+    database using a Zope management interface. A 'ZCatalog' is a
+    Zope object that can be added to a Folder, managed through the
+    web, and extended in many ways.
+
+    The 'ZCatalog' is a very significant project, providing a number
+    of compelling features:
+
+    o **Searches are fast**. The data structures used by the index
+      provide extremely quick searches without consuming much memory.
+
+    o **Searches are robust**. The 'ZCatalog' supports boolean
+      search terms, proximity searches, synonyms and stopwords.
+
+    o **Indexing is wildly flexible**. A 'ZCatalog' can catalog
+      custom properties and track unique values. Since 'ZCatalog'
+      catalogs objects instead of file handles, you can index any
+      content that can have a Python object wrapped around it. This
+      also lets objects participate in how they are cataloged,
+      e.g. de-HTML-ifying contents or extracting PDF properties.
+
+    o **Usable outside of Zope**. The software is broken into a
+      Python 'Catalog' which is wrapped by a 'ZCatalog'. The Python
+      'Catalog' can be used in any Python program; all it requires is
+      the Z object database and the indexing machinery from Zope.
+
+    o **Transactional**. An indexing operation is part of a Zope
+      transaction. If something goes wrong after content is indexed,
+      the index is restored to its previous condition. This also means
+      that Undo will restore an index to its previous condition.
+      Finally, a 'ZCatalog' can be altered privately in a Version,
+      meaning no one else can see the changes to the index.
+
+    o **Cache-friendly**. The index is internally broken into
+      different "buckets", with each bucket being a separate Zope
+      database object. Thus, only the part of the index that is needed
+      is loaded into memory. Conversely, a part of the index that is no
+      longer needed can be dropped from memory.
+
+    o **Results are lazy**. A search that matches a tremendous
+      number of objects doesn't load a large result set. Only the
+      part of the results that is actually used, such as the second
+      batch of twenty, is retrieved.
+
+    The 'ZCatalog' is a free, Open Source part of the Zope software
+    repository and thus is covered under the same license as Zope. It
+    is being developed in conjunction with the Zope Portal Toolkit
+    effort. However, the 'ZCatalog' product is managed as its own
+    module in CVS.
+
+  Installing ZCatalog
+
+    'ZCatalog' can be downloaded from the Zope download area and is
+    also a module in the public CVS for Zope. Untar it while in the
+    root directory of your Zope installation::
+
+      $ cd Zope-2.0.0a3-src/
+      $ tar xzf ../ZCatalog-x.x.tgz
+
+    Windows users can use WinZip or a similar utility to accomplish
+    the same thing.
+
+    Also, Zope 2.0.0a3 does not ship the latest versions of UnIndex and
+    UnTextIndex, which fix a couple of bugs present in alpha 3. You
+    *must* use the latest SearchIndex package from CVS.
+
+    Remember, you have to restart your Zope server before you will see
+    'ZCatalog'.
+
+  ZCatalog Objects
+
+    A 'ZCatalog' performs two activities: indexing information and
+    performing searches.
+
+    Most of the work is done in the first step, which is getting objects
+    into the index. This is done in two ways. First, if your objects
+    are ZCatalog-aware, they automatically update the index when they
+    are added, edited or directly deleted. A ZCatalog-aware object is
+    one that is an instance of a 'Z Class' that informs the 'ZCatalog'
+    of changes. *Directly deleted* means the object itself was deleted
+    from a Folder, not removed along with a containing Folder.
+
+    The second way that site contents get updated is by "finding"
+    information "into" the 'ZCatalog'. An operation based on Zope's
+    Find view traverses Folders looking for objects matching the
+    search criteria. The objects are then registered with the Catalog.
+    Objects in the index but no longer in the site are removed from
+    the Catalog.
+
+    Either way -- automatically updating or walking the Folders --
+    'ZCatalog' indexes the objects it finds. The 'ZCatalog' is set up
+    to look for properties, each of which is added to the index.
+
+    There are two kinds of indexes, called FieldIndex and TextIndex.
+    A FieldIndex treats data atomically: the entire contents of a
+    FieldIndex-indexed property are treated as a unit. With a
+    TextIndex, the property is broken into words which are indexed
+    individually. A TextIndex is also known as a *full-text index*.
+
+    Note that the 'ZCatalog' doesn't track ZCatalog-unaware objects
+    after it has indexed them. This means that the 'ZCatalog' must
+    reindex such objects occasionally, whenever they have been
+    changed. Out-of-date indexes can be avoided by basing your objects
+    on a ZCatalog-aware class, which tells the 'ZCatalog' to reindex
+    them whenever a change is made. Just such a class will be included
+    with the Portal Toolkit.
+
+    ZCatalogs are "searchable objects", meaning they cooperate with the
+    Z Search Interfaces documented in Z SQL Methods. Creating a search
+    form for a 'ZCatalog' is a simple matter of adding a Z Search
+    Interface from the management screen and filling in a form.
+    ZCatalogs can also be queried directly from DTML, as shown in the
+    example below.
+
+  Example using ZCatalog
+
+    The first example shows how to give your Zope site a long-desired
+    feature: full-text searches of your content. The example assumes
+    you already have a number of DTML Methods/Documents to catalog.
+
+    o Install 'ZCatalog' as instructed above.
+
+    o In the root folder of your Zope server, add a 'ZCatalog'.
+
+    o Type in the Id 'catalog' and click 'Add'.
+
+    You now have a brand-new 'ZCatalog' named 'catalog' in your root
+    folder.
+
+    o Click on it.
+
+    Now you are looking at the 'ZCatalog' 'Contents' view. It says
+    the catalog is empty. We'll catalog some objects in a moment, but
+    first we have to tell it which parts of objects we are
+    specifically interested in.
+
+    o Click on 'Indexes'.
+
+    This management view is where the attributes to be indexed are
+    defined.
+
+    o In the 'Add index' field, type 'raw'.
+
+    o Click 'Add'.
+
+    Now that the indexes are defined, a set of objects can be selected
+    for cataloging.
+
+    o Click on 'Find items to ZCatalog'.
+
+    For this example, we are only interested in DTML Documents and
+    Methods.
+
+    o Deselect 'All types'.
+
+    o Select 'DTML Method' and 'DTML Document'.
+
+    o Click 'Find'.
+
+    ZCatalog will report how many items it found, and then present an
+    interface for excluding specific objects.
+
+    o Click 'Catalog Items'.
+
+    Now that the catalog is stocked, we can create a user
+    interface for it.
+
+    o Return to the root folder's management view.
+
+    o Add a 'Z Search Interface'.
+
+    'ZCatalog' participates in the Zope Search architecture. You
+    simply have to fill in this form, and a basic user interface will
+    be created.
+
+    o Select 'catalog' in the list beside 'Select one or more searchable
+      objects'.
+
+    o Beside 'Report Id', type 'report'.
+
+    o Beside 'Search Input Id', type 'search'.
+
+    'report' and 'search' are the Ids of two DTML Methods which will
+    be created in your root folder.
+
+    o Click 'Add'.
+
+    Congratulations! If all has gone well, you can now find references
+    to any word in your DTML pages. Try it by viewing 'search'. Type
+    a common word in the 'Raw' field, and you should be presented with
+    a list of hits. However, the returned results are not yet
+    clickable links. To fix this, go to the management view of 'report'.
+    'report' is called by 'search' to display the results from
+    'catalog'. 'report' is just a simple DTML loop over the results
+    with a few refinements. 'catalog' knows which results to return
+    by looking at the REQUEST variable, which contains the input from
+    the 'search' form.
+
+    o In the source of 'report', find the following line::
+
+