-
Notifications
You must be signed in to change notification settings - Fork 1
Generating random sequences using n-grams in Haskell
License
valderman/dpress
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
** Contents 1. Dependencies and building 2. Running and dictionaries 3. About --- 1. Building and running To build the entire thing, simply run 'make'. dpress depends on the GHC Haskell compiler and will not build without it. DissociatedPress.Storage depends on the packages binary and zlib. To install these on Debian or Ubuntu, just do (as root): # apt-get install ghc libghc-{binary,zlib,utf8-string,random,network}-dev On Windows and other platforms where the official Haskell Platform is the preferred Haskell distribution, simply download the latest version of Haskell Platform from http://www.haskell.org. If you're building on Windows, you might not have make installed; if this is the case, you can build the software manually by running any of the following: ghc -O2 --make compile.hs ghc -O2 --make test.hs ghc -O2 --make server.hs ghc -O2 --make tget.hs ghc -O2 --make merge.hs This will build the dictionary compiler, conversation tester, TCP server, TCP client and dictionary merger respectively. The bare minimum to get anything out of the software would be building compile.hs and test.hs. --- 2. Running and dictionaries In order to do something useful, for certain very low values of 'useful', with this package, you first need to build yourself a dictionary. To do this, find a large-ish body of text (at the very least 2000 words or so, >10 000 strongly recommended) and compile it: $ ./compile < your-text-file.txt > resulting-dictionary.dict Presently, building dictionaries from text is a pretty memory intensive task. Building a dictionary of about a million words takes roughly 1.3 gigabytes of RAM. Loading the resulting 1M word dictionary requires roughly 600 megabytes of RAM. Both of these, particularly the dictionary building issue, are being worked on and will hopefully improve in the near future. Also, languages that don't use spaces to separate words (Japanese, Chinese, for example) are not supported. When you have a dictionary, you can run the conversation tester: $ ./test resulting-dictionary.dict Type something nice into it. You'll notice that the conversation tester learns from your input. This is not saved to the dictionary file though. You can also merge two or more dictionaries: $ ./merge dict1.dict dict2.dict dict3.dict The resulting dictionary will be called out.dict. If you want to provide a text generation service to other people or programs, or if you just want to keep a persistent state of the generator's learning, run the server instead: ./server resulting-dictionary.dict The server will be listening on port 1917. You can either telnet to it, connect using your own TCP code or use the provided tget utility to query it: $ ./tget localhost 1917 "is this the real life?" The query can also be given on stdin: $ echo 'is this just fantasy?' | ./tget localhost 1917 Like the conversation tester, the server will learn from your input, but will not save any changes to the dictionary. Please note that all executables in this archive are purely experimental and intended for testing. Thus, they're rough, primitive, unpolished and lack many features that you'd probably want in any software generating random sequences. The preferred way to interact with dpress is using it as a library, so run along and write your own software with it. --- 3. About dpress (henceforth "the software") is written by Anton Ekblad (anton@ekblad.cc) and is made available to you under the terms of the WTFPL version 2. The full text of the license can be found at http://sam.zoy.org/wtfpl/. The author(s) of the software do not accept any responsibility or liability for anything, related to the software or not. The software comes with no warranty whatsoever. Although this is not required by the license, the author(s) would be very happy if you contributed any improvements you make back to dpress.
About
Generating random sequences using n-grams in Haskell
Resources
License
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published