added list of stopwords, much more aggressive than before, feel free to use your own
1 parent 16f0b5f commit fa9ad21f2bed1a17551806aa0b3b3f1cec00b79b @thedatachef committed Apr 29, 2011
Showing with 425 additions and 3 deletions.
  1. +0 −1 FIXME.txt
  2. +420 −0 src/main/java/varaha/text/StopWords.java
  3. +5 −2 src/main/java/varaha/text/TokenizeText.java
@@ -1,2 +1 @@
-- text tokenizer needs some aggressive filtering of stopwords. Look at other Lucene analyzers
 - standardize the way jars are dealt with, using relative paths in the scripts themselves is crufty and doesn't scale