TransparencyToolkit/Catalyst

Catalyst

Catalyst is text mining software designed to help automatically process data collected by Harvester and add useful information to LookingGlass archives.

Dependencies

  • rails
  • ruby
  • openjdk-8-jdk
  • openjdk-8-jre
  • libcurl3
  • libcurl3-gnutls
  • libcurl4-openssl-dev
  • sqlite3
  • hunspell
  • libidn11-dev
  • libsqlite3-dev
  • DocManager
  • Stanford NER

Setup

Install Dependencies

Please install DocManager (https://github.com/TransparencyToolkit/DocManager) and LookingGlass (https://github.com/TransparencyToolkit/LookingGlass) first. If you have DocManager and LookingGlass, you will already have the correct version of many dependencies.

In addition to installing LookingGlass and DocManager, run:

apt-get install openjdk-8-jdk openjdk-8-jre libcurl3 libcurl3-gnutls \
                libcurl4-openssl-dev sqlite3
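
After installing, it can help to confirm that the expected command-line tools are actually on your PATH. The following is a minimal sketch, not part of Catalyst: `missing_commands` is a hypothetical helper name, and the command names in the usage line are assumptions based on the dependency list above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Catalyst): prints each named command
# that cannot be found on PATH, one per line. Prints nothing if all exist.
missing_commands() {
  local cmd
  for cmd in "$@"; do
    # `command -v` succeeds only if the shell can resolve the command.
    if ! command -v "$cmd" >/dev/null 2>&1; then
      printf '%s\n' "$cmd"
    fi
  done
}
```

For example, `missing_commands ruby rails java sqlite3 hunspell` lists whichever of those tools still need to be installed. Libraries such as libcurl3 are not commands, so this check does not cover them.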

Download Stanford NER

Download and unzip Stanford NER from https://nlp.stanford.edu/software/CRF-NER.html#Download

Install Gems

bundle install

If cld fails to install, you may need to run: CFLAGS="-Wno-narrowing" CXXFLAGS="$CFLAGS" gem install cld

Setup database

rake db:create
rake db:reset

Preparing to Run Catalyst

Start DocManager and (optionally) LookingGlass

See the LookingGlass and DocManager repositories for current instructions.

Start Named Entity Recognition

Change into the stanford-ner directory, then run:

java -mx1000m -cp stanford-ner.jar:lib/* edu.stanford.nlp.ie.NERServer \
     -loadClassifier classifiers/english.all.3class.distsim.crf.ser.gz \
     -port 9002 -outputFormat inlineXML
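
The NER server can take a while to load its classifier, so you may want to wait until it accepts connections before starting anything that depends on it. The sketch below is one way to do that, assuming bash (it relies on bash's `/dev/tcp` redirection). `wait_for_port` is a hypothetical helper name, not something shipped with Catalyst or Stanford NER.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Catalyst or Stanford NER):
# returns 0 as soon as HOST:PORT accepts a TCP connection,
# or 1 after TRIES failed attempts (one second apart).
wait_for_port() {
  local host="$1" port="$2" tries="${3:-30}"
  local i
  for ((i = 0; i < tries; i++)); do
    # Opening /dev/tcp/HOST/PORT makes bash attempt a TCP connection.
    # The subshell keeps the file descriptor from leaking into this shell.
    if (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; then
      return 0
    fi
    sleep 1
  done
  return 1
}
```

With the port used above, `wait_for_port localhost 9002 60` would wait up to about a minute for the NER server. The same helper could be pointed at whatever port Catalyst itself is started on.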

Start Catalyst

From the Catalyst repository directory, run:

rails server -p 9004

Run Catalyst

Run a script that tells Catalyst what to do.
