Stop word lists in several languages
Framework for creating streamcorpus data
Tools for working with TREC KBA entities, training data, and run submissions
Scoring tools for TREC KBA
Integrate the factorie language analyzer into streamcorpus-pipeline
Wrappers that generate one-word-per-line output containing all of the Stanford CoreNLP annotations, so that they can be included in the KBA stream corpus.
Tools for working with TREC KBA Corpora
This project contains Hadoop code for working with the TREC Knowledge Base Acceleration (KBA) dataset. In particular, it provides classes to read and write topic files, read and write run files, and expose the documents stored in the Thrift files as Hadoop-readable objects.