
Proper handling of stop words (separately from dict/affix files). Added ability to reset the segment with the shared_ispell_reset() function, easier to get info about the memory usage with shared_ispell_mem_used() and shared_ispell_mem_available().
1 parent ccd9ef0 commit 84ad0312f95144659bd270e29c19d37122a3ad0d @tvondra committed Jan 3, 2012
Showing with 288 additions and 72 deletions.
  1. +24 −4 README
  2. +21 −4 sql/shared_ispell--1.0.0.sql
  3. +231 −64 src/shared_ispell.c
  4. +12 −0 src/spell.h
28 README
@@ -51,12 +51,19 @@ the shared segment. This is a hard limit, the shared segment is not
extensible and you need to set it so that all the dictionaries fit
into it and not much memory is wasted.
-Set it higher than you need, load all the dictionaries and check the
-log - after loading each dictionary, there's a LOG message with info
-about how much memory is available. Use that to tweak the GUC.
+To find out how much memory you actually need, use a large value
+(e.g. 200MB) and load all the dictionaries you want to use. Then use
+the shared_ispell_mem_used() function to find out how much memory
+was actually used (and set the max_size GUC variable accordingly).
+
+Don't set it exactly to that value, leave some free space,
+so that you can reload the dictionaries without changing the GUC
+max_size limit (which requires a restart of the DB). Something
+like 512kB should be just fine.
The shared segment can contain several dictionaries at the same time,
-the amount of memory is the only limit.
+the amount of memory is the only limit. There's no limit on the number
+of dictionaries / words etc. Just the max_size GUC variable.
Using the dictionary
@@ -84,3 +91,16 @@ and then do the usual stuff, e.g.
SELECT ts_lexize('czech_shared', 'automobile');
or whatever you want.
+
+
+Resetting the dictionary
+------------------------
+If you need to reset the dictionary (e.g. so that you can reload the
+updated files from disk), use the shared_ispell_reset() function. Everyone
+who already uses the dictionaries will be forced to reinitialize the
+data (the first one will rebuild the dictionary in the shared segment, the
+others will reuse it).
+
+ SELECT shared_ispell_reset();
+
+That's all for now ...
@@ -1,18 +1,34 @@
-CREATE OR REPLACE FUNCTION shared_dispell_init(internal)
+CREATE OR REPLACE FUNCTION shared_ispell_init(internal)
RETURNS internal
AS 'MODULE_PATHNAME', 'dispell_init'
LANGUAGE C IMMUTABLE;
-CREATE OR REPLACE FUNCTION shared_dispell_lexize(internal,internal,internal,internal)
+CREATE OR REPLACE FUNCTION shared_ispell_lexize(internal,internal,internal,internal)
RETURNS internal
AS 'MODULE_PATHNAME', 'dispell_lexize'
LANGUAGE C IMMUTABLE;
+CREATE OR REPLACE FUNCTION shared_ispell_reset()
+ RETURNS void
+ AS 'MODULE_PATHNAME', 'dispell_reset'
+ LANGUAGE C IMMUTABLE;
+
+CREATE OR REPLACE FUNCTION shared_ispell_mem_used()
+ RETURNS integer
+ AS 'MODULE_PATHNAME', 'dispell_mem_used'
+ LANGUAGE C IMMUTABLE;
+
+CREATE OR REPLACE FUNCTION shared_ispell_mem_available()
+ RETURNS integer
+ AS 'MODULE_PATHNAME', 'dispell_mem_available'
+ LANGUAGE C IMMUTABLE;
+
CREATE TEXT SEARCH TEMPLATE shared_ispell (
- INIT = shared_dispell_init,
- LEXIZE = shared_dispell_lexize
+ INIT = shared_ispell_init,
+ LEXIZE = shared_ispell_lexize
);
+/*
CREATE TEXT SEARCH DICTIONARY czech_shared (
TEMPLATE = shared_ispell,
DictFile = czech,
@@ -26,3 +42,4 @@ ALTER TEXT SEARCH CONFIGURATION czech_shared
ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
word, hword, hword_part
WITH czech_shared;
+*/
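
The sizing workflow described in the README hunk can be sketched in SQL. This is a minimal sketch with two assumptions: the czech_shared dictionary from the commented-out example has been created, and both memory functions report bytes. The diff declares them as returning integer but does not state the unit.

```sql
-- Force the dictionary to be loaded into the shared segment.
SELECT ts_lexize('czech_shared', 'automobile');

-- Check how much of the segment is in use and how much is left.
-- Assumption: both values are in bytes.
SELECT shared_ispell_mem_used()      AS mem_used,
       shared_ispell_mem_available() AS mem_available;

-- Rough target for the max_size GUC: the used amount plus about
-- 512kB of headroom, as the README suggests.
SELECT shared_ispell_mem_used() + 512 * 1024 AS suggested_max_size;
```

Run this once with a deliberately large max_size (the README mentions 200MB), then lower the GUC to roughly the suggested value and restart the database.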