pytextpreprocess
================

written by Joseph Turian
released under a BSD license

Preprocess text for NLP (tokenizing, lowercasing, stemming, sentence splitting, etc.)

REQUIREMENTS:
    * My Python common library:
        http://github.com/turian/common
      and sub-requirements thereof.
    * NLTK, for word tokenization
        e.g. apt-get install python-nltk
    * Splitta, if you want to sentence tokenize

The English stoplist is from:
    http://jmlr.csail.mit.edu/papers/volume5/lewis04a/a11-smart-stop-list/english.stop
However, I added words at the top (above "a").
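
To make the steps named in the description concrete (sentence splitting, tokenizing, lowercasing, stoplist filtering, stemming), here is a minimal standard-library sketch. It is not this package's API: the function names `split_sentences`, `naive_stem`, and `preprocess` are hypothetical, the tiny `STOPWORDS` set stands in for the SMART stoplist linked above, the regex splitter stands in for Splitta, and the suffix stripper is a toy substitute for a real stemmer such as the ones shipped with NLTK.

```python
import re

# Assumption: a tiny stand-in stoplist. The package itself uses the
# SMART English stoplist referenced in this README.
STOPWORDS = {"a", "an", "and", "the", "of", "is", "are", "to", "in"}

# Words made of letters/digits, optionally followed by an apostrophe part.
_TOKEN_RE = re.compile(r"[A-Za-z0-9]+(?:'[A-Za-z]+)?")
# Split on whitespace that follows sentence-final punctuation.
_SENTENCE_RE = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text):
    """Naive sentence splitter (a stand-in for Splitta)."""
    return [s for s in _SENTENCE_RE.split(text.strip()) if s]


def naive_stem(word):
    """Toy suffix stripper; NOT the Porter stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least three characters of the stem.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Return one list of lowercased, stoplist-filtered, stemmed tokens per sentence."""
    result = []
    for sentence in split_sentences(text):
        tokens = [t.lower() for t in _TOKEN_RE.findall(sentence)]
        result.append([naive_stem(t) for t in tokens if t not in STOPWORDS])
    return result


if __name__ == "__main__":
    print(preprocess("The cats are sleeping. Dogs barked loudly!"))
    # → [['cat', 'sleep'], ['dog', 'bark', 'loudly']]
```

The order matters in this sketch: stopwords are removed before stemming so that the stoplist is matched against surface forms, which is how a word list like the SMART one is normally applied.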