Simple library to find the longest recognised name in a piece of text.

Name Finder

Find names from a known list in a text, taking account of names that may overlap. For example, Waterloo and Waterloo East are separate stations; NameFinder, knowing both, will not give a false match for Waterloo in a text that mentions Waterloo East.


require "name_finder"

stations = [
  "Bermondsey",
  "South Bermondsey",
  "Waterloo",
  "Waterloo East"
]

nf = NameFinder.new
stations.each do |station|
  nf.add station
end
It can find the best matching name even when one name is the same as part of another, whether they overlap at the start:

nf.find_in "Change here for trains from Waterloo East"
# => "Waterloo East"

nf.find_in "This train terminates at Waterloo"
# => "Waterloo"

or at the end:

nf.find_in "Escalator closed at Bermondsey station"
# => "Bermondsey"

nf.find_in "Use South Bermondsey station for Millwall FC"
# => "South Bermondsey"

It can also find all the matching names, without false positives for names that are part of a longer name:

nf.find_all_in "South Bermondsey and Waterloo East"
# => ["South Bermondsey", "Waterloo East"]

Names that are part of a longer name are still found when they also appear on their own in the text, however:

nf.find_all_in "South Bermondsey and Bermondsey"
# => ["South Bermondsey", "Bermondsey"]
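
The behaviour shown above can be approximated with a greedy, word-by-word scan that always prefers the longest known name starting at the current word. The following is only a minimal sketch of that idea, not NameFinder's actual implementation: the class name LongestMatchSketch and its internals are hypothetical, and it assumes names and text are compared as lower-cased runs of the letters a-z.

```ruby
# Hypothetical sketch of longest-match name finding.
# This is NOT the NameFinder implementation, only an illustration.
class LongestMatchSketch
  def initialize
    @names = {} # normalized words (Array) => original name (String)
  end

  def add(name)
    @names[words(name)] = name
  end

  # Scan left to right. At each word, try the longest candidate first;
  # on a match, skip past it so that shorter names inside it are not
  # reported as separate (false) matches.
  def find_all_in(text)
    tokens = words(text)
    max_len = @names.keys.map(&:length).max || 0
    found = []
    i = 0
    while i < tokens.length
      len = [max_len, tokens.length - i].min
      len -= 1 while len > 0 && !@names.key?(tokens[i, len])
      if len > 0
        found << @names[tokens[i, len]]
        i += len
      else
        i += 1
      end
    end
    found
  end

  private

  # Assumption: lower-case the input and keep only runs of a-z as words.
  def words(text)
    text.downcase.scan(/[a-z]+/)
  end
end

sketch = LongestMatchSketch.new
["Bermondsey", "South Bermondsey", "Waterloo", "Waterloo East"].each do |name|
  sketch.add name
end

sketch.find_all_in "South Bermondsey and Bermondsey"
# => ["South Bermondsey", "Bermondsey"]
```

Because the scan resumes after each match, "Bermondsey" is reported only where it is not already consumed by "South Bermondsey".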


The present implementation handles only the letters A-Z. This can be customised by subclassing NameFinder and changing the implementation of normalize. The normalize method must use the same delimiter between words as is returned by the delimiter method (normally a single space).
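
As an illustration of the kind of change described above, here is a sketch of a normalization that also keeps digits. It is written as a stand-alone method so that it runs without the library. The method name normalize_with_digits and the DELIMITER constant are hypothetical, and it assumes that normalize receives a string and returns a string, and that lower-casing is acceptable; check the library source before relying on either assumption.

```ruby
# Hypothetical stand-alone sketch; names and signature are assumptions,
# not taken from the NameFinder source.
DELIMITER = " " # assumed to match what the delimiter method returns

# Lower-case the text, collapse every run of characters other than
# a-z and 0-9 into the delimiter, and trim the ends.
def normalize_with_digits(text)
  text.downcase.gsub(/[^a-z0-9]+/, DELIMITER).strip
end

normalize_with_digits("Heathrow Terminal 4!")
# => "heathrow terminal 4"

# In a subclass, the same body might be used like this (assumed signature):
#
#   class DigitNameFinder < NameFinder
#     def normalize(text)
#       text.downcase.gsub(/[^a-z0-9]+/, delimiter).strip
#     end
#   end
```

The important point is that words are joined by exactly the delimiter, so that multi-word names are compared consistently.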