Simple library to find the longest recognised name in a piece of text.

Name Finder

Find names from a known list in a text, taking account of names that may overlap. For example, Waterloo and Waterloo East are separate stations; NameFinder, knowing both, will not give a false match for Waterloo in a text that mentions Waterloo East.

Examples

require "name_finder"

stations = [
  "Bermondsey",
  "South Bermondsey",
  "Southwark",
  "Waterloo",
  "Waterloo East"
]

nf = NameFinder.new
stations.each do |station|
  nf.add station
end

It can find the best matching name even when one name is the same as part of another, whether they overlap at the start:

nf.find_in "Change here for trains from Waterloo East"
# => "Waterloo East"

nf.find_in "This train terminates at Waterloo"
# => "Waterloo"

or at the end:

nf.find_in "Escalator closed at Bermondsey station"
# => "Bermondsey"

nf.find_in "Use South Bermondsey station for Millwall FC"
# => "South Bermondsey"

It can also find all the matching names, without false positives for names that are part of a longer name:

nf.find_all_in "South Bermondsey and Waterloo East"
# => ["South Bermondsey", "Waterloo East"]

Names that are part of a longer name are still found when listed separately, however:

nf.find_all_in "South Bermondsey and Bermondsey"
# => ["South Bermondsey", "Bermondsey"]
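The README does not describe how the matching works internally. As a rough illustration of the behaviour shown in these examples, here is a minimal sketch of a greedy longest-match scan over words. It is not NameFinder's implementation: `greedy_find_all` is a hypothetical helper, and the lowercase, letters-only word splitting is an assumption made for this sketch.

```ruby
# Hypothetical illustration only; this is NOT how NameFinder is implemented.
# At each word position, prefer the known name with the most words.
def greedy_find_all(names, text)
  # Pair each name with its lowercase words, longest names first.
  candidates = names.map { |name| [name, name.downcase.split] }
                    .sort_by { |_, name_words| -name_words.length }
  words = text.downcase.scan(/[a-z]+/)

  found = []
  i = 0
  while i < words.length
    # The first candidate that matches here is the longest one, because of the sort.
    match = candidates.find { |_, name_words| words[i, name_words.length] == name_words }
    if match
      found << match[0]
      i += match[1].length # skip past the words the match consumed
    else
      i += 1
    end
  end
  found
end

stations = ["Bermondsey", "South Bermondsey", "Southwark", "Waterloo", "Waterloo East"]

p greedy_find_all(stations, "South Bermondsey and Waterloo East")
# => ["South Bermondsey", "Waterloo East"]
p greedy_find_all(stations, "South Bermondsey and Bermondsey")
# => ["South Bermondsey", "Bermondsey"]
```

Because a match consumes its words before the scan moves on, Bermondsey is not reported a second time from inside South Bermondsey, yet it is still found where it appears on its own.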

Limitations

The present implementation handles only the letters A-Z. This can be customised by subclassing NameFinder and changing the implementation of normalize. The normalize method must use the same delimiter between words as is returned by the delimiter method (normally a single space).
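As a sketch of what such a customisation might look like, the class below defines only the two methods named above, so that it runs without the gem. The class name `UnicodeNormalizerSketch` is hypothetical, and it is an assumption that `normalize` takes a string and returns a string; check the library source before copying the method bodies into a real NameFinder subclass.

```ruby
# Hypothetical stand-in, not part of the name_finder gem.
# Assumption: normalize receives a String and returns a String.
class UnicodeNormalizerSketch
  # Separator placed between words (the README says this is normally a single space).
  def delimiter
    " "
  end

  # Lowercase the text, replace every run of characters that is not a
  # Unicode letter or digit with the delimiter, then trim surrounding whitespace.
  def normalize(text)
    text.downcase.gsub(/[^\p{L}\p{N}]+/, delimiter).strip
  end
end

sketch = UnicodeNormalizerSketch.new
p sketch.normalize("King's Cross St. Pancras")
# => "king s cross st pancras"
```

Using `delimiter` inside `normalize`, rather than a hard-coded space, keeps the two methods in agreement as the paragraph above requires.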
