Common Lisp NLP toolset

CL-NLP -- a Lisp NLP toolkit

Brief description

Eventually, CL-NLP will provide a comprehensive and extensible set of tools to solve natural language processing problems in Common Lisp.

The goals of the project include the following:

  • support for constructing arbitrary NLP pipelines on top of it
  • support for easy and fast experimentation and development of new models and approaches
  • serving as a good framework for teaching NLP concepts

It comprises a number of utility (horizontal) and end-user (vertical) modules that implement the basic functions and provide a way to add your own extensions and models.

The utility layer includes:

  • tools for transforming raw natural language text, as well as various corpora, into a form suitable for further processing
  • basic support for language modelling
  • support for a number of linguistic concepts
  • support for working with machine learning models and a number of training algorithms

The end-user layer will provide:

  • POS taggers
  • constituency parsers
  • dependency parsers
  • other tools (to be added step by step; suggestions are welcome)

How to start working with CL-NLP

The project is already useful to me, its primary author: for instance, it supports my current language modelling experiments by providing easy access to treebanks and other utilities.

Yet, it is far from being production-ready. So, if you want to use it for production tasks, expect to bleed on the bleeding edge.

Otherwise, if you want to contribute to developing the toolkit, you're very welcome. Here are a few write-ups to give you a sense of the project and to help you get started:

You will also probably need to track the latest version of RUTILS from git.

For CL-NLP to reach v.0.1, which may be considered suitable for limited use by non-contributors, the following tasks (work in progress) should be finished:

  • implement a comprehensive test suite and fix all bugs encountered in the process
  • describe available models and their quality metrics

Technical notes

Current limitations:

  • targets the English language only

Dependencies

For development:

License

The license of CL-NLP is Apache 2.0.

Specific models may have a different license due to the limitations of the datasets they are built with. Please see the <model>.license file accompanying each model for details.

(c) 2013-2014, Vsevolod Dyomkin <vseloved@gmail.com>