GitHub - valderman/dpress: Generating random sequences using n-grams in Haskell

valderman / dpress Public

Notifications You must be signed in to change notification settings
Fork 1
Star 4

Generating random sequences using n-grams in Haskell

Notifications

Name		Name	Last commit message	Last commit date
Latest commit History 82 Commits
DissociatedPress		DissociatedPress
DissociatedPress.hs		DissociatedPress.hs
LICENSE		LICENSE
Makefile		Makefile
README		README
Setup.hs		Setup.hs
compile.hs		compile.hs
dpress.cabal		dpress.cabal
merge.hs		merge.hs
server.hs		server.hs
test.hs		test.hs
tget.hs		tget.hs

Repository files navigation

** Contents

1. Dependencies and building
2. Running and dictionaries
3. About

---


1. Building and running

To build the entire thing, simply run 'make'.
dpress depends on the GHC Haskell compiler and will not build without it.
DissociatedPress.Storage depends on the packages binary and zlib.

To install these on Debian or Ubuntu, just do (as root):
# apt-get install ghc libghc-{binary,zlib,utf8-string,random,network}-dev

On Windows and other platforms where the official Haskell Platform is the
preferred Haskell distribution, simply download the latest version of
Haskell Platform from http://www.haskell.org.

If you're building on Windows, you might not have make installed; if this is
the case, you can build the software manually by running any of the following:
ghc -O2 --make compile.hs
ghc -O2 --make test.hs
ghc -O2 --make server.hs
ghc -O2 --make tget.hs
ghc -O2 --make merge.hs

This will build the dictionary compiler, conversation tester, TCP server, TCP
client and dictionary merger respectively. The bare minimum to get anything out
of the software would be building compile.hs and test.hs.

---


2. Running and dictionaries

In order to do something useful, for certain very low values of 'useful', with
this package, you first need to build yourself a dictionary. To do this, find
a large-ish body of text (at the very least 2000 words or so, >10 000 strongly
recommended) and compile it:
$ ./compile < your-text-file.txt > resulting-dictionary.dict

Presently, building dictionaries from text is a pretty memory intensive task.
Building a dictionary of about a million words takes roughly 1.3 gigabytes of
RAM. Loading the resulting 1M word dictionary requires roughly 600 megabytes of
RAM. Both of these, particularly the dictionary building issue, are being
worked on and will hopefully improve in the near future.

Also, languages that don't use spaces to separate words (Japanese, Chinese, for
example) are not supported.

When you have a dictionary, you can run the conversation tester:
$ ./test resulting-dictionary.dict

Type something nice into it. You'll notice that the conversation tester learns
from your input. This is not saved to the dictionary file though.

You can also merge two or more dictionaries:
$ ./merge dict1.dict dict2.dict dict3.dict

The resulting dictionary will be called out.dict.

If you want to provide a text generation service to other people or programs,
or if you just want to keep a persistent state of the generator's learning,
run the server instead:
./server resulting-dictionary.dict

The server will be listening on port 1917. You can either telnet to it, connect
using your own TCP code or use the provided tget utility to query it:
$ ./tget localhost 1917 "is this the real life?"

The query can also be given on stdin:
$ echo 'is this just fantasy?' | ./tget localhost 1917

Like the conversation tester, the server will learn from your input, but will
not save any changes to the dictionary.

Please note that all executables in this archive are purely experimental and
intended for testing. Thus, they're rough, primitive, unpolished and lack many
features that you'd probably want in any software generating random sequences.
The preferred way to interact with dpress is using it as a library, so run
along and write your own software with it.

---


3. About

dpress (henceforth "the software") is written by Anton Ekblad (anton@ekblad.cc)
and is made available to you under the terms of the WTFPL version 2.
The full text of the license can be found at http://sam.zoy.org/wtfpl/.

The author(s) of the software do not accept any responsibility or liability for
anything, related to the software or not. The software comes with no warranty
whatsoever.

Although this is not required by the license, the author(s) would be very happy
if you contributed any improvements you make back to dpress.