
MeTA: ModErn Text Analysis

Please visit our web page for information about MeTA!

Overview

MeTA is a modern C++ data sciences toolkit featuring

  • text tokenization, including deep semantic features like parse trees

  • inverted and forward indexes with compression and various caching strategies

  • various ranking functions for the indexes

  • topic modeling algorithms

  • language modeling algorithms

  • clustering and similarity algorithms

  • classification algorithms

  • wrappers for liblinear and slda
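This README does not show MeTA's own index or ranker classes, so the following is a standalone sketch and not MeTA's API. It illustrates what an inverted index with a ranking function does: the hypothetical toy_inverted_index maps each term to a postings list (document id to term frequency) and scores documents with Okapi BM25. BM25 (with k1 = 1.2, b = 0.75, and a Lucene-style idf that stays positive) is one widely used ranking function, chosen here for illustration. Tokenization is naive whitespace splitting, unlike the richer tokenizers described above.

```cpp
// Hypothetical toy example; NOT MeTA's index or ranker API.
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

class toy_inverted_index
{
  public:
    // Adds a whitespace-tokenized document and returns its 0-based id.
    std::size_t add_document(const std::string& text)
    {
        std::size_t doc_id = doc_lengths_.size();
        std::istringstream in{text};
        std::string term;
        std::size_t length = 0;
        while (in >> term)
        {
            ++postings_[term][doc_id]; // term -> doc -> frequency
            ++length;
        }
        doc_lengths_.push_back(length);
        total_length_ += length;
        return doc_id;
    }

    // Scores every document containing at least one query term (Okapi BM25).
    std::map<std::size_t, double> score(const std::string& query,
                                        double k1 = 1.2, double b = 0.75) const
    {
        std::map<std::size_t, double> scores;
        const double n = static_cast<double>(doc_lengths_.size());
        assert(n > 0 && "score() needs at least one document");
        const double avg_len = static_cast<double>(total_length_) / n;

        std::istringstream in{query};
        std::string term;
        while (in >> term)
        {
            auto it = postings_.find(term);
            if (it == postings_.end())
                continue; // unseen term contributes nothing
            const double df = static_cast<double>(it->second.size());
            const double idf = std::log((n - df + 0.5) / (df + 0.5) + 1.0);
            for (const auto& posting : it->second)
            {
                const double tf = static_cast<double>(posting.second);
                const double len
                    = static_cast<double>(doc_lengths_[posting.first]);
                // Length normalization: longer-than-average docs are damped.
                const double norm = k1 * (1.0 - b + b * len / avg_len);
                scores[posting.first] += idf * tf * (k1 + 1.0) / (tf + norm);
            }
        }
        return scores;
    }

  private:
    std::unordered_map<std::string, std::map<std::size_t, std::size_t>>
        postings_;
    std::vector<std::size_t> doc_lengths_;
    std::size_t total_length_ = 0;
};
```

For example, after adding "the cat sat on the mat" and "the dog sat", the query "sat" ranks the shorter second document higher: both contain the term once, and the b parameter penalizes the longer document.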

Doxygen documentation is also available and can be generated locally with the doc build target. Note that it is probably not updated as frequently as it should be.

Our current goal is to publish MeTA in the Journal of Machine Learning Research (JMLR) Machine Learning Open Source Software track.


Project setup

  • This project requires a highly standards-conforming C++11 compiler. Currently, clang is the de facto compiler for this project.

  • You will also need a conforming implementation of the C++11 standard library and ABI; currently, libc++ and libc++abi are the best options. See your distribution's package manager for information on installing these dependencies.

  • Windows users: YMMV. Windows is not currently supported, but things may work. You will likely need Visual Studio 2013 for the required C++11 features.

  • This project uses several git submodules. To initialize them, run:

```bash
git submodule init
git submodule update
```

  • If you plan on using the svm_wrapper class, then once the submodules are initialized, go to deps/libsvm-modules and run make in both the liblinear and libsvm directories.

  • To compile for the first time, run the following commands:

```bash
mkdir build
cd build
# omit CXX=clang++ if you want to use your default compiler
CXX=clang++ cmake ../ -DCMAKE_BUILD_TYPE=Debug
make
```

  • The generated Makefiles also provide clean, tidy, and doc targets. After you have run cmake once, you can simply run make as usual while developing; it detects when CMakeLists.txt has changed and regenerates the Makefiles if needed.
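Because the build depends on a well-conforming C++11 compiler and standard library, it can help to confirm the toolchain works before running cmake. The following is a hypothetical smoke test, not part of MeTA's build. The handful of C++11 features it exercises (auto, lambdas, range-based for, std::unordered_map, std::make_shared, static_assert) are an assumption for illustration, not the exact set MeTA requires. Note that MSVC reports 199711L for __cplusplus by default, so the static_assert would reject Visual Studio 2013 even where C++11 features work.

```cpp
// Hypothetical toolchain smoke test; not part of MeTA itself.
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

// Fails at compile time if the compiler is not in C++11 (or later) mode.
static_assert(__cplusplus >= 201103L, "a C++11 (or later) compiler is required");

// Exercises a few C++11 language and library features; returns the total
// length of the sample words.
std::size_t cxx11_smoke_test()
{
    auto words = std::vector<std::string>{"modern", "text", "analysis"};
    std::unordered_map<std::string, std::size_t> lengths;
    for (const auto& w : words)
        lengths.emplace(w, w.size());
    assert(lengths.size() == words.size());

    auto total = std::make_shared<std::size_t>(0);
    auto add = [&total](std::size_t n) { *total += n; };
    for (const auto& kv : lengths)
        add(kv.second);
    return *total;
}
```

Compiling it with, for example, clang++ -std=c++11 -stdlib=libc++ -c check.cpp (where check.cpp is a hypothetical file name) is a quick way to confirm that clang and libc++ are installed and usable.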