Skip to content

information theory library: entropy, chisquared, Killback-Liebler, etc...

License

Notifications You must be signed in to change notification settings

weldongoree/libith

Repository files navigation

This is a library for computing entropy and other information-theory-related functions on data. Currently it will find the entropy, Kullback-Leibler, or chi squared of a file or stdin in bits, nats, or digits over the file's bits, bytes, ints, characters, or alphabetic words. 

As of December 2013 it "works" for many values of "works", but it's a ways away from a release.

Also, it is absurdly non-optimized as of now; I'm working on making sure it's correct at the expense of higher memory usage and slower i/o.

make will produce a shared library called libith.so.0.0 and a binary called "ith". make install will move these into /usr/local/bin and /usr/local/lib respectively, as well as create the symlink libith.so. make install understands DESTDIR, PREFIX, BINDIR, LIBDIR, MANDIR, INCLUDEDIR (though those only matter for the targets man-install and header-install, respectively).

ith takes the following commands and options:

Commands:
-e print the entropy of the input (this is the default if no command switch is given)
-c print the extent to which the possible alphabet domain was not covered by the sample (only works for numeric alphabets)
-k print the Kullback-Leibler divergence of the actual distribution from a uniform distribution over the sample's alphabet
-K like -k, but use the entire possible requested alphabet (see -c)
-x print the Pearson chi-squared against the null hypothesis of uniform distribution over the requested alphabet
-X like -x, but use the entire possible requested alphabet (see -c)
-p display the PMF calculated
-h usage
-A equivalent to -e -c -k -x

Options:
-a 1 | 8 | 16 | 32 | 64 | char | wchar | word | wword (width of alphabet; consider the file a sequence of bits, bytes, shorts, etc.)
-t be terse; simply print the answer (default prints an English sentence -- I should localize this at some point...)
-u 2 | 10 | e (compute information in bits, digits, and nats respectively; bits is default)
-f file (default reads from stdin)

ith is a combined driver for what I hope will end up being a more general shared library.

About

information theory library: entropy, chisquared, Killback-Liebler, etc...

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published