Common Lisp NLP toolset


CL-NLP -- a Lisp NLP toolkit

Brief description

Eventually, CL-NLP will provide a comprehensive and extensible set of tools to solve natural language processing problems in Common Lisp.

The goals of the project include the following:

  • support for constructing arbitrary NLP pipelines on top of it
  • support for easy and fast experimentation and development of new models and approaches
  • serve as a good framework for teaching NLP concepts

It consists of a number of utility/horizontal and end-user/vertical modules that implement the basic functions and provide a way to add your own extensions and models.

The utility layer includes:

  • tools for transforming raw natural language text, as well as various corpora, into a form suitable for further processing
  • basic support for language modelling
  • support for a number of linguistic concepts
  • support for working with machine learning models and a number of training algorithms

The end-user layer will provide:

  • POS taggers
  • constituency parsers
  • dependency parsers
  • other tools (to be added step by step; suggestions are welcome)

How to start working with CL-NLP

The project is already useful to its primary author: for instance, it supports my current language modelling experiments by providing easy access to treebanks and other utilities.

Yet it is far from production-ready, so if you want to use it for production tasks, expect to bleed on the bleeding edge.

Otherwise, if you want to contribute to developing the toolkit, you're very welcome. Here are a few write-ups to give you a sense of the project and to help you get started:

You will probably also need to track the latest version of RUTILS from git.

For CL-NLP to reach v0.1, which may be considered suitable for limited use by non-contributors, the following work (currently in progress) should be finished:

  • implement a comprehensive test suite and fix all bugs encountered in the process
  • describe available models and their quality metrics

Technical notes

Dependencies

For development:

License

The license of CL-NLP is Apache 2.0.

Specific models may have a different license due to restrictions imposed by the datasets they are built from. See the <model>.license file accompanying each model for details.

(c) 2013-2014, Vsevolod Dyomkin vseloved@gmail.com