Generic code for slidehub pages
The repository for the data in the SIGIR paper "A User Study on Snippet Generation: Text Reuse vs. Paraphrases"
Draft of the new Webis hub page
Web Annotation Tool
Code for constructing TLDR corpus from Reddit dataset
The downloads directory for the webis.de web page. History will be deleted irregularly.
Source code and scripts for the Webis Web Archiver
Implementation of the tri-training algorithm for authorship attribution described by Qian et al. (2014)
Attempt to implement "Authorship Attribution and Verification with Many Authors and Limited Data" by Luyckx et al. (2008)
Implementation of Disjoint Author-Document Topic Model
Implementation of "Author Identification by Automatic Learning" from Fréry et al.
This is a reimplementation of the approach to authorship attribution originally described in S. Raghavan, A. Kovashka, and R. Mooney. Authorship Attribution Using Probabilistic Context-Free Grammars. In Proc. of the ACL 2010 Conference Short Papers (pp. 38-42). Association for Computational Linguistics, 2010.
Python translation of the implemented algorithm
2017 Studienstiftung Natur- und Ingenieurswissenschaftliches Forschungskolleg
Attempt to implement the algorithms from the paper "Determining if two Documents are written by the same author" by Koppel and Winter (2014)
Tracking current Wikipedia edits from IRC to detect vandalism
Wikipedia Vandalism Detection with JRuby
The experiment software underlying two papers published at ECIR-2015 and SEMEVAL-2015.
Analyzing Wikipedia Vandalism with JRuby and Hadoop