Updates to Zope's keyphrase extractor (forked from 1.1.0)
Clone or download
Latest commit b7d8fda Nov 25, 2011


This package determines important terms within a given piece of content. It
uses linguistic tools such as Parts-Of-Speech (POS) and some simple
statistical analysis to determine the terms and their strength.

NOTE: This is a fork by Joseph Turian of topia.termextract 1.1.0
    * Unicode alphabetic characters are tokenized correctly.
    I changed TERM_SPEC in topic.termextract.tag:
        Old = [u'S', u'\xe3o', u'Paulo', u'was', u'home', u'to']
        New = [u'S\xe3o', u'Paulo', u'was', u'home', u'to']
    * extractor.extract() now has a parameter KEEP_ORIGINAL_SPACING=True,
    which allows you to keep the original spacing of the term:
        Old = [u'Mr . Smith']
        New = [u'Mr. Smith']
    * Fixed a bug where a term wouldn't be found if it was literally
    the last token of the sentence.
    * Fixed a bug (?) where unigram terms were included even if their
    tokens were part of a multiterm.