@trec-kba

TREC KBA & StreamCorpus

A common data interchange format for document processing pipelines that apply natural language processing tools to large streams of text.

  • Stop word lists in several languages

    Python 15 14 Updated Mar 25, 2017
  • Framework for making streamcorpus data

    HTML 9 5 MIT Updated Mar 11, 2017
  • MOVED to

    Updated Dec 31, 2014
  • Tools for working with TREC KBA entities, training data, and run submissions

    Python 5 3 Updated Nov 16, 2014
  • Scoring tools for TREC KBA

    Python 1 5 MIT Updated Nov 16, 2014
  • Python 1 MIT Updated Aug 25, 2014
  • MIT Updated Jul 20, 2014
  • Integrates the FACTORIE language analyzer into streamcorpus-pipeline

    Python 1 MIT Updated Jun 26, 2014
  • Wrappers for generating one-word-per-line output representing all the annotations from Stanford CoreNLP, so that it can be included in the KBA stream corpus.

    Java 4 Updated Jan 17, 2013
  • Tools for working with TREC KBA Corpora

    Python 4 4 Updated Dec 14, 2012
  • Hadoop code for working with the TREC Knowledge Base Acceleration dataset. It provides classes to read and write topic files, read and write run files, and expose the documents in the Thrift files as Hadoop-readable objects.

    Java 5 Updated Jul 24, 2012