TexNLP: Texas Natural Language Processing tools

This is the site for the TexNLP code used in the following papers:

  • Jason Baldridge. 2008. Weakly supervised supertagging with grammar-informed initialization. In Proceedings of COLING-2008. Manchester, UK.

  • Jason Baldridge and Alexis Palmer. 2009. How well does active learning actually work? Time-based evaluation of cost-reduction strategies for language documentation. In Proceedings of EMNLP-09. Singapore.

  • Alexis Palmer, Taesun Moon, Jason Baldridge, Katrin Erk, Eric Campbell, and Telma Can. 2010. Computational strategies for reducing annotation effort in language documentation: A case study in creating interlinear texts for Uspanteko. Linguistic Issues in Language Technology. 3(4):1-42.

The code supports supervised and semi-supervised learning of Hidden Markov Models for tagging, as well as standard supervised Maximum Entropy Markov Models (using the TADM toolkit). It also provides support for working with Combinatory Categorial Grammar categories, particularly for supertagging with CCGbank.
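For readers unfamiliar with HMM tagging, the following is a minimal sketch of the supervised case only: count-based estimation followed by Viterbi decoding. It is not TexNLP code and does not reflect its API. The function names (`train_hmm`, `viterbi`, `log_prob`), the boundary markers `<s>` and `</s>`, and the add-one smoothing are all assumptions made for illustration. The semi-supervised (EM) setting is not shown.

```python
import math
from collections import Counter, defaultdict

START, END = "<s>", "</s>"  # hypothetical sentence-boundary markers


def train_hmm(tagged_sentences):
    """Collect transition and emission counts from (word, tag) sequences."""
    trans = defaultdict(Counter)  # trans[prev_tag][tag] -> count
    emit = defaultdict(Counter)   # emit[tag][word] -> count
    vocab = set()
    for sentence in tagged_sentences:
        prev = START
        for word, tag in sentence:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            vocab.add(word)
            prev = tag
        trans[prev][END] += 1
    return trans, emit, sorted(emit), vocab


def log_prob(counter, key, num_outcomes):
    """Add-one smoothed log probability (an illustrative choice of smoothing)."""
    return math.log((counter[key] + 1) / (sum(counter.values()) + num_outcomes))


def viterbi(words, model):
    """Return the highest-scoring tag sequence for words under the model."""
    if not words:
        return []
    trans, emit, tags, vocab = model
    n_trans = len(tags) + 1   # every tag plus the end marker
    n_emit = len(vocab) + 1   # known words plus one slot for unknown words

    # best[i][tag] = (log score of best path ending in tag at i, previous tag)
    best = [{t: (log_prob(trans[START], t, n_trans)
                 + log_prob(emit[t], words[0], n_emit), None) for t in tags}]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            e = log_prob(emit[t], words[i], n_emit)
            column[t] = max(
                (best[i - 1][p][0] + log_prob(trans[p], t, n_trans) + e, p)
                for p in tags
            )
        best.append(column)

    # Pick the best final tag, including the transition to the end marker.
    last = max(tags, key=lambda t: best[-1][t][0] + log_prob(trans[t], END, n_trans))

    # Follow back-pointers from the last position to the first.
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        path.append(best[i][path[-1]][1])
    return path[::-1]
```

Scores are kept in log space to avoid underflow on longer sentences. A supertagger for CCG uses the same decoding scheme, with a much larger tag set of lexical categories in place of part-of-speech tags.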

Please cite Baldridge (2008) if you use this software. Note that the software is not user-friendly and is poorly documented; if you have questions about getting things working, please email Jason Baldridge (jbaldrid@mail.utexas.edu).

Download: TexNLP v0.2.0

License: LGPL

Contributors: Jason Baldridge, Taesun Moon, Elias Ponvert

The development of this software and the research behind it were carried out as part of the EARL project, supported under NSF grant No. 06651988, "Reducing Annotation Effort in the Documentation of Languages using Machine Learning and Active Learning."