A geiger counter for online radioactivity. This repo emerged as a playground for text classification approaches to a variety of competitions and shared tasks related to online toxicity and abuse detection, including:
A makefile has been supplied for conveniently downloading resources and data.
make coling-english
: downloads and unpacks the English training and development data from the Coling Trolling, Aggression, and Cyber-bullying shared task and places it under data/.
make fastata
: downloads fastText vectors to resources/.
make install
: pulls submodule dependencies, including Babylon's fastText multilingual, and Python library dependencies.
make toxic
: downloads and unpacks the dataset from the Toxic Classification Challenge. Note that this requires Kaggle's CLI to be installed and properly configured.
make clean
: removes data in resources/.
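
Since the `make toxic` target depends on a working Kaggle CLI, a quick pre-flight check can save a failed download. The sketch below is not part of this repo: the function name `kaggle_cli_ready` is hypothetical, and it assumes the usual Kaggle CLI setup, where credentials live either in the `KAGGLE_USERNAME` and `KAGGLE_KEY` environment variables or in `~/.kaggle/kaggle.json`. It does not cover a custom config directory.

```python
import os
import shutil
from pathlib import Path


def kaggle_cli_ready(which=shutil.which, env=None, home=None):
    """Return (ok, reason): a rough guess at whether `make toxic` can run.

    Hypothetical helper, not part of the repo. `which`, `env`, and `home`
    are injectable so the check can be exercised without a real install.
    """
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)

    # The Makefile target shells out to the CLI, so it must be on PATH.
    if which("kaggle") is None:
        return False, "kaggle executable not found on PATH"

    # Credentials supplied through environment variables.
    if env.get("KAGGLE_USERNAME") and env.get("KAGGLE_KEY"):
        return True, "credentials found in environment variables"

    # Credentials supplied through the default token file.
    token = home / ".kaggle" / "kaggle.json"
    if token.is_file():
        return True, f"credentials found at {token}"

    return False, "no Kaggle credentials found"


if __name__ == "__main__":
    ok, reason = kaggle_cli_ready()
    print(("ready: " if ok else "not ready: ") + reason)
```

If the script reports "not ready", install and configure the Kaggle CLI before running `make toxic`.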