GitHub - zverok/stempunk: Experimenting on understanding Snowball stemmers, in Ruby

This repo contains an attempt (not very impressive, so far) to deconstruct the Snowball stemmers using Ruby.

The goals of the investigation were:

understand better how the stemmers work;
understand better Snowball programming language;
experiment with a different DSL/language to describe such algorithms.

I used Ruby as my primary "tool of thinking," and while goals (1) and (2) were successfully met, I am not impressed with my own attempts towards goal (3), and starting to think that it is not something that should be approached the way I did.

Anyway, publishing the (intermediate?) result here for posterity.

Contents of the repo:

sbl/: original stemmers in Snowball language;
data/: stemmer-testing data from the snowball-data repo;
ruby/: code of stemmers ported to Ruby:
- english_naive.rb: very straitforward word-by-word port, mostly for my own understanding that I learned to read Snowball algorithms correctly;
- english_clean.rb and ukrainian_clean.rb: DSL-y ports (no cool metaprogramming, just several helper methods to try and express the algorithm declartively);
- base.rb: those base methods/DSL definitions;
test/: code for testing ports; super-naive scripts that are used like this:
- ruby test/test_word.rb ruby/english_clean.rb international: treats the first argument as a path to a stemmer implementaiton, and the second as a word to stem, runs stemming in debug mode (with output of intermediate steps)
- ruby test/test_dictionary.rb ruby/english_clean.rb: full-dictionary test, guesses the test data by the script name, and takes from data/<lang_name>/voc.txt (input words) and output.txt (expected stemmings), writes output to test/out/<lang>_<algo>.txt

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
data		data
ruby		ruby
sbl		sbl
test		test
.gitignore		.gitignore
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

About

Releases

Packages

Languages

zverok/stempunk

Folders and files

Latest commit

History

Repository files navigation

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages