Setup for generating tokens used in Tanimoto experiments

Token data generation

A small collection of scripts for generating token data for use in Tanimoto query experiments. The tokens are derived from molecular data (dense and sparse fingerprints, and lingos) and from an actor name data set (q-grams).
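To make the terms concrete, here is a minimal sketch of how a string could be turned into q-gram tokens and how two token sets are compared with the Tanimoto coefficient. The function names, the choice of q = 3, the lack of padding, and the handling of empty sets are assumptions for illustration; the scripts in this repository may tokenize differently.

```python
def qgrams(text, q=3):
    """Return the set of overlapping substrings of length q in text.

    Assumption: no padding characters are added, and a string shorter
    than q is kept as a single token.
    """
    if len(text) < q:
        return {text} if text else set()
    return {text[i:i + q] for i in range(len(text) - q + 1)}


def tanimoto(a, b):
    """Tanimoto coefficient of two token sets: |A & B| / |A | B|."""
    if not a and not b:
        # Convention chosen for this sketch: two empty sets are identical.
        return 1.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    x = qgrams("harrison ford")
    y = qgrams("harrison fjord")
    print(sorted(x & y))
    print(round(tanimoto(x, y), 3))
```

A Tanimoto query then asks for all records whose token set has a coefficient above some threshold with respect to the query's token set.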

To generate the data, clone the repository and run generate_all.sh. If something fails, the data can be generated again by running clean.sh and then generate_all.sh.

The setup requires:

  • wget
  • curl
  • bash
  • lein
  • python
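
Before starting a long run, it may be worth checking that the tools listed above are on the PATH. The following is a small sketch using only the Python standard library; the helper name `missing_tools` is hypothetical and not part of this repository. Note that on some systems the interpreter is only installed as `python3`, in which case `python` would be reported as missing.

```python
import shutil

# Tools named in the requirements list above.
REQUIRED_TOOLS = ["wget", "curl", "bash", "lein", "python"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools: " + ", ".join(missing))
    else:
        print("All required tools found on PATH.")
```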

A full download and generation of the data will take some time. Run the main script, generate_all.sh, under nohup so that it keeps running after you log out.