Zach Bornheimer's Morpheme Extraction Mechanism
corpus @ 839da3d
test-corpus
.gitignore
.gitmodules
LICENSE
Makefile
README
alphabet.c
alphabet.h
constants.h
data_types.h
directory.c
directory.h
externs.h
file.c
file.h
functions.c
functions.h
macros.h
morpheme_list_t.c
morpheme_list_t.h
morpheme_t.c
morpheme_t.h
ngram_t.c
ngram_t.h
nlp.c
nlp.h
structs.h
word_t.c
word_t.h

README

Morpheme Extraction System
==========================

This software allows for the programmatic 
extraction of morpheme candidates from a 
corpus into a defined morpheme-list location.

Licensed under the GPLv2.

If you change something or get something to
work better, please let me know; it will help
me improve in C and will help the project :-)

The research paper that accompanies this project is coming soon.

Software Required for Functionality:
    gcc (with OpenMP compatibility enabled)
    make
    
How to install?
Choose one of the following:
    make optimized
    make debug
    make all
    

Command-line Arguments:

Verbose Mode:      --verbose
Serial Processing: --serial, --sequential, or --process-sequentially
Full Processing:   --process
Output File:       --output-file REL-FILE-PATH
Corpus Dir:        --corpus-dir  REL-CORPUS-PATH

where REL-FILE-PATH and REL-CORPUS-PATH are relative paths to a
desired filename and/or corpus directory.
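
As an illustration of how the flags listed above could be read, here is
a minimal sketch using getopt_long from <getopt.h>. It is not taken from
this project's source: the struct cli_options type and the parse_cli
function are hypothetical names, and treating the three serial flags as
aliases of one another is an assumption based on the list above.

```c
#include <getopt.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical container for the documented options (not the project's own type). */
struct cli_options {
    bool verbose;            /* --verbose */
    bool serial;             /* --serial, --sequential, --process-sequentially */
    bool process;            /* --process */
    const char *output_file; /* --output-file REL-FILE-PATH */
    const char *corpus_dir;  /* --corpus-dir REL-CORPUS-PATH */
};

/* Parse argv into opts. Returns 0 on success, -1 on an unknown
 * option or a missing argument. */
int parse_cli(int argc, char **argv, struct cli_options *opts)
{
    static const struct option long_opts[] = {
        {"verbose",              no_argument,       NULL, 'v'},
        {"serial",               no_argument,       NULL, 's'},
        {"sequential",           no_argument,       NULL, 's'},
        {"process-sequentially", no_argument,       NULL, 's'},
        {"process",              no_argument,       NULL, 'p'},
        {"output-file",          required_argument, NULL, 'o'},
        {"corpus-dir",           required_argument, NULL, 'c'},
        {NULL, 0, NULL, 0}
    };
    int c;

    *opts = (struct cli_options){0};
    optind = 1; /* start scanning at the first argument after the program name */

    /* The empty short-option string means only long options are accepted. */
    while ((c = getopt_long(argc, argv, "", long_opts, NULL)) != -1) {
        switch (c) {
        case 'v': opts->verbose = true;      break;
        case 's': opts->serial = true;       break;
        case 'p': opts->process = true;      break;
        case 'o': opts->output_file = optarg; break;
        case 'c': opts->corpus_dir = optarg;  break;
        default:  return -1; /* unknown flag or missing argument */
        }
    }
    return 0;
}
```

The path arguments are stored as given, so they stay relative to the
directory the program is started from.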

Verbose Mode prints more output, but it slows processing.

Serial Processing yields separate results for each file processed,
    as opposed to one combined result for the whole corpus.

Full Processing yields both per-file and combined results, as if
    you had run the program once with --serial and then a second
    time without that flag.

Output File is the file to which results are appended
    (existing data is not overwritten).
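
The append-only behaviour described for the output file matches what
fopen does in "a" mode. The sketch below shows that technique only; the
append_result function is a hypothetical helper, not a function from
this project.

```c
#include <stdio.h>

/* Append one line of text to the file at path.
 * Mode "a" creates the file if it does not exist and never truncates
 * it, so earlier results are preserved.
 * Returns 0 on success, -1 on any I/O error. */
int append_result(const char *path, const char *line)
{
    FILE *fp = fopen(path, "a");
    if (fp == NULL)
        return -1;

    int ok = fputs(line, fp) != EOF && fputc('\n', fp) != EOF;

    /* fclose flushes buffered data, so its result matters too. */
    if (fclose(fp) != 0)
        ok = 0;

    return ok ? 0 : -1;
}
```

Calling append_result twice with the same path leaves both lines in the
file, in the order they were written.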

Corpus Dir is the place where all the files that need to be
    processed reside.
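
For readers unfamiliar with walking a directory in C, the following is
a minimal sketch using the POSIX opendir/readdir interface. It only
counts the entries in a directory; the count_corpus_entries function is
a hypothetical name and is not claimed to reflect how this project
reads its corpus.

```c
#include <dirent.h>
#include <string.h>

/* Count the entries in dir_path, skipping "." and "..".
 * Returns -1 if the directory cannot be opened. */
int count_corpus_entries(const char *dir_path)
{
    DIR *dir = opendir(dir_path);
    if (dir == NULL)
        return -1;

    int count = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        if (strcmp(entry->d_name, ".") == 0 ||
            strcmp(entry->d_name, "..") == 0)
            continue; /* not corpus files */
        count++;
    }

    closedir(dir);
    return count;
}
```

A relative path such as the one passed to --corpus-dir is resolved
against the current working directory by opendir.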