diff --git a/ZCatalog.txt b/ZCatalog.txt new file mode 100755 index 00000000..bc78752a --- /dev/null +++ b/ZCatalog.txt @@ -0,0 +1,464 @@ +ZCatalog Tutorial + + This document provides a tutorial for 'ZCatalog', the new search + engine machinery in Zope. The audience for the document is content + managers. + + Contents + + o What is it? What's it for? Why's it so cool? + + o Installing ZCatalog + + o ZCatalog Objects + + o Example using ZCatalog + + o Creating Search Forms And Result Reports + + o Using ZCatalog In A Zope Site + + o ZCatalog vs. Catalog + + What is it? What's it for? Why's it so cool? + + The 'ZCatalog' provides powerful indexing and searching on a Zope + database using a Zope management interface. A 'ZCatalog' is a + Zope object that can be added to a Folder, managed through the + web, and extended in many ways. + + The 'ZCatalog' is a very significant project, providing a number + of compelling features: + + o **Searches are fast**. The data structures used by the index + provide extremely quick searches without consuming much memory. + + o **Searches are robust**. The 'ZCatalog' supports boolean + search terms, proximity searches, synonyms and stopwords. + + o **Indexing is wildly flexible**. A 'ZCatalog' can catalog + custom properties and track unique values. Since 'ZCatalog' + catalogs objects instead of file handles, you can index any + content that can have a Python object wrapped around it. This + also lets objects participate in how they are cataloged, + e.g. de-HTML-ifying contents or extracting PDF properties. + + o **Usable outside of Zope**. The software is broken into a + Python 'Catalog' which is wrapped by a 'ZCatalog'. The Python + 'Catalog' can be used in any Python program; all it requires is + the Z object database and the indexing machinery from Zope. + + o **Transactional**. An indexing operation is part of a Zope + transaction. If something goes wrong after content is indexed, + the index is restored to its previous condition. 
This also means + that Undo will restore an index to its previous condition. + Finally, a 'ZCatalog' can be altered privately in a Version, + meaning no one else can see the changes to the index. + + o **Cache-friendly**. The index is internally broken into + different "buckets", with each bucket being a separate Zope + database object. Thus, only the part of the index that is needed + is loaded into memory. Alternatively, an un-needed part of the + index can be removed from memory. + + o **Results are lazy**. A search that returns a tremendous + number of matches won't return a large result set. Only the + requested part of the results, such as the second batch of twenty, is + returned. + + The 'ZCatalog' is a free, Open Source part of the Zope software + repository and thus is covered under the same license as Zope. It + is being developed in conjunction with the Zope Portal Toolkit + effort. However, the 'ZCatalog' product is managed as its own + module in CVS. + + Installing ZCatalog + + 'ZCatalog' can be downloaded from the Zope download area and is + also a module in the public CVS for Zope. Untar it while in the + root directory of your Zope installation:: + + $ cd Zope-2.0.0a3-src/ + $ tar xzf ../ZCatalog-x.x.tgz + + Windows users can use WinZip or a similar utility to accomplish + the same thing. + + Also, Zope 2.0.0a3 does not have the latest version of UnIndex and + UnTextIndex which fix a couple of bugs in the alpha 3 versions. + The latest CVS of the SearchIndex packages *must* be used. + + Remember, you have to restart your Zope server before you will see + 'ZCatalog'. + + ZCatalog Objects + + A 'ZCatalog' performs two activities: indexing information and + performing searches. + + Most of the work is done in the first step, which is getting objects + into the index. This is done in two ways. First, if your objects + are ZCatalog-aware they automatically update the index when they + are added, edited or directly deleted. 
A ZCatalog-aware object is + one that is an instance of a 'Z Class' that informs the 'ZCatalog' + of changes. *Directly deleted* means the object was deleted from + a Folder, not the deletion of a containing Folder. + + The second way that site contents get updated is by "finding" + information "into" the 'ZCatalog'. An operation based on Zope's + Find view traverses Folders looking for objects matching the + criterion. The objects are then registered with the Catalog. + Objects in the index but no longer in the site are removed from + the Catalog. + + Either way -- automatically updating or walking the Folders -- + 'ZCatalog' indexes the objects it finds. The 'ZCatalog' is set up + to look for properties, each of which is added to the index. + + There are two kinds of indexes, called FieldIndex and TextIndex. + FieldIndex indexes treat data atomically. The entire contents of a + FieldIndex-indexed property is treated as a unit. With a + TextIndex index, the property is broken into words which are indexed + individually. A TextIndex is also known as a *full-text index*. + + Note that the 'ZCatalog' doesn't track ZCatalog-unaware objects + after it has indexed them. This means that the 'ZCatalog' must + reindex its objects occasionally when the objects have been + changed. Out of date indexes can be prevented by inheriting from + a ZCatalog-aware class which can tell the 'ZCatalog' to reindex it + whenever a change is made. Just such a class will be included + with the Portal toolkit. + + ZCatalogs are "searchable objects", meaning they cooperate with Z + Search Interfaces documented in Z SQL Methods. Creating a search + form for a 'ZCatalog' is a simple matter of adding a Z Search + Interface from the management screen and filling in a form. + ZCatalogs can also be queried directly from DTML, as shown in the + example below. + + Example using ZCatalog + + The first example shows how to give your Zope site a long-desired + feature: full text-searches of your content. 
The example assumes + you already have a number of DTML Methods/Documents to catalog. + + o Install 'ZCatalog' as instructed above. + + o In the root folder of your Zope server, add a 'ZCatalog'. + + o Type in the id 'catalog' and hit 'Add'. + + You now have a brand new 'ZCatalog' named 'catalog' in your root + folder. + + o Click on it. + + Now you are looking at the 'ZCatalog' 'Contents' view. It says + the catalog is empty. We'll catalog some objects in a moment, but + first we have to tell it what portions of objects we are + specifically interested in. + + o Click on 'Indexes'. + + This management view is where the attributes to be indexed are + defined. + + o In the 'Add index' field, type 'raw'. + + o Click 'Add'. + + Now that the indexes are defined, a set of objects can be selected + for cataloging. + + o Click on 'Find items to ZCatalog'. + + For this example, we are only interested in DTML Documents and + Methods. + + o Deselect 'All types'. + + o Select 'DTML Method' and 'DTML Document'. + + o Click 'Find'. + + ZCatalog will report how many items it found, and then present an + interface for excluding specific objects. + + o Click 'Catalog Items'. + + Great, now that the catalog is stocked, we can create a user + interface to it. + + o Return to the root folder's management view. + + o Add a 'Z Search Interface'. + + 'ZCatalog' participates in the Zope Search architecture. You + simply have to fill in this form, and a basic user interface will + be created. + + o Select 'catalog' in the list beside 'Select one or more searchable + objects'. + + o Beside 'Report Id', type 'report'. + + o Beside 'Search Input Id', type 'search'. + + 'report' and 'search' are the Ids of two DTML Methods which will + be created in your root folder. + + o Click 'Add'. + + Congratulations, if all has gone well, you can now find references + to any word in your DTML pages. Try it by viewing 'search'. 
Type + a common word in the 'Raw' field, and you should be presented with + a list of hits. However, none of the results returned can be + clicked on. To fix this, go to the management view of 'report'. + 'report' is called by 'search' to display the results from + 'catalog'. 'report' is just a simple '<!--#in catalog-->' loop + with a few refinements. 'catalog' knows which results to return + by looking at the REQUEST variable, which contains the input from + the 'search' form. + + o In the source of 'report', find the following line:: + + <td><!--#var title--></td> + + o Replace it with this:: + + <td> + <a href="<!--#var "catalog.getpath(data_record_id_)"-->"> + <!--#var title--> + </a> + </td> + + This is a little confusing at first. Keep in mind that ZCatalog + does not return a list of your database objects. What it returns + are actually fairly unintelligent instances of a Record subclass. + These record objects contain copies of data from attributes of + cataloged objects. The 'ZCatalog' 'MetaData Table' view defines + which attributes are copied. + + (By default, these record objects are just SLIGHTLY more + intelligent than a raw tuple. 'Catalog' can be told to use a + custom, intelligent class for results. Please see the 'Catalog' + __init__ method in 'lib/python/Products/ZCatalog/Catalog.py' for + more information.) + + Fortunately, ZCatalog provides a utility function for going from + result objects to the object's path. It is called, aptly enough, + 'getpath'. 'getpath' expects to be passed the unique integer + identifier of the cataloged object. Results store that id as + 'data_record_id_'. + + Commit this change, and perform another search. Now the title can + be clicked on to take you to the full page. + + Example cataloging custom objects + + As if full-text searches of your entire site weren't good enough, + ZCatalog can also catalog Z Classes, Products, and in fact any + Python object you can put in a ZODB. Here is an example using a Z + Class, but the principles apply to any kind of object. + + First, we're going to need something to catalog. 
Follow the 'Z + Class' tutorial to create the CD 'Z Class'. Back? Good. + + o Create a folder, 'CDs', and create a number of instances of + the CD Z Class in it. + + 'cd1' through 'cd5' should be plenty. Remember to fill them each + in from their Properties view. + + Now we want to create a searchable catalog of CDs. + + o Go to the 'CDs' folder and create a 'ZCatalog' with an ID 'cd_cat'. + + o Click on the object's 'Indexes' view. + + This screen shows that, by default, 'ZCatalog' is interested in an + object's 'id', 'title', 'meta_type', and + 'bobobase_modification_time'. You will almost always want to + index additional information. In this case, we would also like to + index the artist and description of CDs. + + o Type 'artist' into the 'Add Index' field. + + For the sake of example, we're going to use a FieldIndex index for + artist. This will give us the option of putting an HTML SELECT + box for artists on the search form. + + o Select FieldIndex from the Index type drop down, and click + 'Add'. + + o Also add an index for 'description', but leave TextIndex + selected. + + This will allow us to search for individual words within the + description. + + o Click on 'MetaData Table'. + + This is where we tell the 'ZCatalog' what attributes of cataloged + objects to cache. These cached values are available from search + results without having to look up the actual indexed object. The + tradeoff for the speed is extra memory, as information from the + content is duplicated in the 'ZCatalog'. + + You will probably want to keep the schema light-weight, so we're + not going to add 'description' to it. Type 'artist' in the 'Add + column' field and click 'Add'. + + o Click on the 'Find Items to Catalog' view. + + This is the interface you use to tell the 'ZCatalog' which items + to index. Right now, beside 'Find objects of type:', 'All types' + is selected. + + o Deselect 'All types'. + + o Scroll down and select CD. 
+ + You could use the rest of the form to be more specific, but since + we want to catalog all the CDs, + + o Click 'Find'. + + 'ZCatalog' will report 'Found 5 items.' It is now giving you an + opportunity to exclude some of the matched items from the index. + Again, we want all of them, so, + + o Click 'Catalog Items'. + + You should at this point see a list of the indexed objects. Also + of note is the 'Update Catalog' button. You have to use it + whenever you want your 'ZCatalog' to notice changes you've made to + the objects it's indexed. + + Creating Search Forms And Result Reports + + This catalog isn't much good without some way of querying it. + + o Go back to your 'CDs' folder's management screen and add a Z + Search Interface. + + The search add form will automatically detect your cd_cat + 'ZCatalog' and offer it as a searchable document. Make sure it is + selected. + + o Fill in 'cd_report' for 'Report ID' and 'cd_search' for + 'Search Input ID'. + + Those are the ids of two DTML methods that will be generated in + the 'CDs' folder. + + o Click 'Add'. + + o View the 'cd_search' method (at, for example, + http://localhost:9673/CDs/cd_search). + + You will see a basic search interface, with fields for searching + on 'title', modification date, 'id', 'artist', 'meta type' and + 'description'. If you fill in one or more of the fields and + click 'Submit Query', cd_report will be displayed. It is passed + the search criteria and uses them to get a list from cd_cat to + iterate over. It is merely displaying the information from the + ZCatalog's MetaData table, but of course it can be enriched. + + Try a few more searches. You'll find that you can type any single + word from the title or description and get a match, but for artist + you must type the exact string. That's because artist was indexed + as a FieldIndex, which gives us an opportunity to present a more + convenient interface. 
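The difference in matching behaviour between the two index kinds can be pictured with a small, self-contained Python sketch. This is only a toy model for illustration, not the actual FieldIndex or TextIndex code from Zope: the class names 'ToyFieldIndex' and 'ToyTextIndex', their methods, and the sample CD data are all made up here. A field index keeps each value whole, so only an exact match hits, while a text index splits the value into words.

```python
from collections import defaultdict


class ToyFieldIndex:
    """Hypothetical stand-in for a field index: values are kept whole."""

    def __init__(self):
        self._by_value = defaultdict(set)

    def index(self, record_id, value):
        # The entire value is the key; it is never split up.
        self._by_value[value].add(record_id)

    def search(self, value):
        # Only an exact match on the whole value counts as a hit.
        return set(self._by_value.get(value, set()))

    def unique_values(self):
        # Every distinct value seen so far, e.g. to fill a SELECT box.
        return sorted(self._by_value)


class ToyTextIndex:
    """Hypothetical stand-in for a text index: values are split into words."""

    def __init__(self):
        self._by_word = defaultdict(set)

    def index(self, record_id, text):
        # Each lower-cased word becomes its own key.
        for word in text.lower().split():
            self._by_word[word].add(record_id)

    def search(self, word):
        return set(self._by_word.get(word.lower(), set()))


# Made-up sample data: record id -> (artist, description).
cds = {
    1: ("Miles Davis", "Classic modal jazz"),
    2: ("John Coltrane", "Spiritual jazz suite"),
}

artists = ToyFieldIndex()
descriptions = ToyTextIndex()
for record_id, (artist, description) in cds.items():
    artists.index(record_id, artist)
    descriptions.index(record_id, description)

partial_hits = artists.search("Miles")       # partial artist name
exact_hits = artists.search("Miles Davis")   # exact artist string
word_hits = descriptions.search("jazz")      # single word from description
```

In this sketch a search for a partial artist name finds nothing, the exact artist string finds its record, and a single word from a description finds every record containing it. Because the field index stores a finite set of whole values, listing them (as 'unique_values' does here) is cheap, which is what makes a drop-down of artists practical.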
+ + Go back to the 'cd_search' management interface, and change + its source to look like this:: + + + <!--#var standard_html_header--> + <form action="cd_report" method="get"> + <h2><!--#var document_title--></h2> + Enter query parameters:<br><table> + <tr><th>Title</th> + <td><input name="title" + width=30 value=""></td></tr> + <tr><th>Artist</th> + <td> + <select name="artist"> + <option value="">All</option> + <!--#in expr="cd_cat.uniqueValuesFor('artist')"--> + <option value="<!--#var sequence-item-->"> + <!--#var sequence-item--> + </option> + <!--#/in--> + </select> + </td> + </tr> + <tr><th>Description</th> + <td><input name="description" + width=30 value=""></td></tr> + <tr><td colspan=2 align=center> + <input type="SUBMIT" name="SUBMIT" value="Submit Query"> + </td></tr> + </table> + </form> + <!--#var standard_html_footer--> + + + This is a search form somewhat more appropriate for the CD 'Z + Class'. Unrelated fields have been removed, and the 'artist' + field has been changed to a drop-down menu. Let's augment the + output of 'cd_report' to make the title a link to the actual CD + object. + + Taking a look at 'cd_report', note that the search results are + obtained with a simple '<!--#in cd_cat-->' tag. The search + criteria are automatically obtained by the 'ZCatalog' from the form + input. The line we're interested in is this one:: + + <td><!--#var title--></td> + + Change it to read:: + + <td> + <a href="<!--#var "cd_cat.getpath(data_record_id_)"-->"> + <!--#var title--> + </a> + </td> + + Now, assuming you have added the index_html document template to + your CD 'Z Class', clicking on a search result will take you to + the CD's detailed display. + + Using 'ZCatalog' In A Zope Site + + The 'ZCatalog' provides high-speed access to what is on your site. + Thus, the 'ZCatalog' can be used to re-engineer the way your site + is laid out. + + For instance, a Slashdot-style presentation is simple. Just + insert some DTML that asks the 'ZCatalog' for recent items. + Alternatively, a Site Map is nothing more than presenting the + contents of the catalog. 
A page with tree-based browsing of + software packages by category is also easy. Perhaps you'd like to + provide a link that lists all the packages the current user has + authored. + + Thus, the 'ZCatalog' isn't just about searching. It can be used + as the DTML-scriptable engine for browsing a site as well. + + Since the 'ZCatalog' is a normal Zope folderish object, you can + also create DTML Methods inside it to present the catalog + contents. For instance, perhaps you'd like to dump the contents + of the site as an RDF stream, or do content syndication with RSS. + These are just DTML Methods that change the 'Content-Type:' and + send back XML. All without actually waking up any of the content + objects in the site. + + ZCatalog vs. Catalog + + The real star of this package is the 'Catalog' module. All the + heavy lifting is done by 'Catalog'. 'ZCatalog' is basically a + Zope-aware wrapper around Catalog, which can be used on its own + outside the Zope framework. The only requirement is that you are + using ZODB as your object store.