Skip to content

A pure Ruby implementation of the Aho-Corasick string matching algorithm

License

Notifications You must be signed in to change notification settings

timcowlishaw/aho_corasick

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

18 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Aho-Corasick string matching

The Aho-Corasick string matching algorithm will find all instances of a set of terms that are substrings of a certain string in constant time with respect to the length of the string. It does this by pre-building a NFA-like structure which it uses to efficiently parse the string for all the terms in the set to be matched against.

Our implementation is written in pure Ruby, and is used like so:

require 'aho_corasick'
matcher = AhoCorasick.new("woodchuck", "chuck", "could")
matcher.match("How much wood would a woodchuck chuck if a woodchuck could chuck wood.")
#=> ["woodchuck", "chuck", "woodchuck", "could", "chuck"]

You can insert additional terms into the matcher after instantiation, however each call to insert takes linear time with respect to the total number of terms to be matched against. You can call insert passing multiple terms to mitigate against this:

matcher.insert("would", "wood")
matcher.match("How much wood would a woodchuck chuck if a woodchuck could chuck wood.")
#=> ["wood", "would", "wood", "woodchuck", "chuck", "wood", "woodchuck", "could", "chuck", "wood"] 

To install: gem install aho_corasick or add gem "aho_corasick" to your Gemfile. We've tested it with Ruby 1.9.2, but there's nothing in there that should stop it working with any other Ruby. If you find any bugs, let us know - Feature requests and pull requests also welcome!

About

A pure Ruby implementation of the Aho-Corasick string matching algorithm

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages