Cookbook: fuzzy string search. Closes #176.
onyxfish committed Sep 4, 2015
1 parent 14f41d1 commit 3ccd5e7
Showing 4 changed files with 32 additions and 5 deletions.
1 change: 1 addition & 0 deletions CHANGELOG
@@ -1,6 +1,7 @@
0.7.0
-----

+* Cookbook: fuzzy string search example. (#176)
* Values to coerce to true/false can now be overridden for BooleanType.
* Values to coerce to null can now be overridden for all ColumnType subclasses. (#206)
* Add key_type argument to TableSet and Table.group_by. (#205)
1 change: 1 addition & 0 deletions docs/cookbook.rst
@@ -8,6 +8,7 @@ Cookbook
     cookbook/basics
     cookbook/filtering
     cookbook/sorting
+    cookbook/searching
     cookbook/statistics
     cookbook/calculations
     cookbook/ranking
8 changes: 3 additions & 5 deletions docs/cookbook/calculations.rst
@@ -92,7 +92,7 @@ Implementing Levenshtein requires writing a custom :class:`.Computation`. To sav
 .. code-block:: python
 
     import agate
-    from Levenshtein.StringMatcher import StringMatcher
+    from Levenshtein import distance
     import six
 
     class LevenshteinDistance(agate.Computation):
@@ -101,7 +101,7 @@ Implementing Levenshtein requires writing a custom :class:`.Computation`. To sav
         """
         def __init__(self, column_name, compare_string):
             self._column_name = column_name
-            self._matcher = StringMatcher(seq2=six.text_type(compare_string))
+            self._compare_string = compare_string
 
         def get_computed_column_type(self, table):
             """
@@ -127,9 +127,7 @@ Implementing Levenshtein requires writing a custom :class:`.Computation`. To sav
             if val is None:
                 return None
 
-            self._matcher.set_seq1(val)
-
-            return self._matcher.distance()
+            return distance(val, self._compare_string)
 
 This code can now be applied to any :class:`.Table` just as any other :class:`.Computation` would be:
27 changes: 27 additions & 0 deletions docs/cookbook/searching.rst
@@ -0,0 +1,27 @@
=========
Searching
=========

Basic search
============

Find all individuals with the last_name "Groskopf":

.. code-block:: python

    family = table.where(lambda r: r['last_name'] == 'Groskopf')

Fuzzy search by edit distance
=============================

Using an `existing Python library <https://pypi.python.org/pypi/python-Levenshtein/>`_ for computing the `Levenshtein edit distance <https://en.wikipedia.org/wiki/Levenshtein_distance>`_, it is trivially easy to implement a fuzzy string search.

For example, to find all names within 2 edits of "Groskopf":

.. code-block:: python

    from Levenshtein import distance

    fuzzy_family = table.where(lambda r: distance(r['last_name'], 'Groskopf') <= 2)

These results will now include all those "Grosskopfs" and "Groskoffs" whose mail I am always getting.
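
If the ``last_name`` column can contain null values, passing ``None`` straight to ``distance`` would likely raise an error. The sketch below is one way to guard against that. It is not part of the commit: ``levenshtein`` is a plain-Python stand-in for the library's ``distance`` so the example runs without extra dependencies, ``fuzzy_match`` is a hypothetical helper name, and the rows are made-up dictionaries rather than an agate table.

```python
def levenshtein(a, b):
    """Return the Levenshtein edit distance between strings a and b."""
    # Keep the shorter string in the inner loop so each row stays small.
    if len(a) < len(b):
        a, b = b, a

    # previous[j] holds the distance between a[:i - 1] and b[:j].
    previous = list(range(len(b) + 1))

    for i, char_a in enumerate(a, start=1):
        current = [i]

        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))

        previous = current

    return previous[-1]


def fuzzy_match(value, target, max_edits=2):
    """Hypothetical helper: True if value is within max_edits of target.

    None values never match, so rows with missing names are skipped.
    """
    if value is None:
        return False

    return levenshtein(value, target) <= max_edits


# Made-up rows standing in for a table, for illustration only.
rows = [
    {'last_name': 'Groskopf'},
    {'last_name': 'Grosskopf'},
    {'last_name': 'Groskoff'},
    {'last_name': 'Smith'},
    {'last_name': None},
]

matches = [r['last_name'] for r in rows if fuzzy_match(r['last_name'], 'Groskopf')]
```

Under the same assumptions, the helper could be passed to the table directly, for example ``table.where(lambda r: fuzzy_match(r['last_name'], 'Groskopf'))``, with ``levenshtein`` swapped for the library's ``distance`` where it is installed.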
