add a lot of content and fix the example in the usage docs page
taleinat committed Apr 11, 2020
1 parent b252b5c commit 55fc21e
Showing 1 changed file with 55 additions and 8 deletions.
63 changes: 55 additions & 8 deletions docs/usage.rst
@@ -10,14 +10,53 @@ parameters:

.. code:: python
>>> from fuzzysearch import find_near_matches
>>> from fuzzysearch.generic_search import find_near_matches_generic_ngrams
>>> find_near_matches('PATTERN', 'aaaPATERNaaa', max_l_dist=1)
[Match(start=3, end=9, dist=1, matched='PATERN')]
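
The ``dist=1`` in the result above is a Levenshtein distance: the minimum number of single-character insertions, deletions and substitutions needed to turn the pattern into the matched text. As a rough illustration only, the distance between two whole strings can be computed as below. ``levenshtein_sketch`` is a hypothetical helper written for this page; it is not part of fuzzysearch and is not how the library searches.

```python
def levenshtein_sketch(a, b):
    """Return the minimum number of single-item insertions, deletions
    and substitutions needed to turn sequence ``a`` into sequence ``b``."""
    # previous[j] holds the distance between the first i-1 items of a
    # and the first j items of b; start with i-1 == 0.
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]  # turning i items of a into an empty prefix of b
        for j, item_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete item_a
                current[j - 1] + 1,                    # insert item_b
                previous[j - 1] + (item_a != item_b),  # substitute or keep
            ))
        previous = current
    return previous[-1]


# 'PATERN' is 'PATTERN' with one 'T' deleted, hence a distance of 1.
pattern_distance = levenshtein_sketch('PATTERN', 'PATERN')
```

With ``max_l_dist=1``, any stretch of the searched text whose distance from the pattern is at most 1 counts as a match.
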
- Advanced Example
- ----------------
- If needed you can choose a specific search implementation, such as
- `find_near_matches_with_ngrams()`:
Internal Functions
------------------
If needed you can choose a specific internal search implementation. These are
not as easy to use, since they do not use the same interface as exposed by
``find_near_matches()``. There is a complete working example below.

Many of these functions require passing the fuzzy matching parameters as a
``LevenshteinSearchParams`` object, which can be created after importing the
class from ``fuzzysearch.common``.

Many of these functions also return numerous overlapping search results.
These can be consolidated using the ``consolidate_overlapping_matches()``
function, also found in the ``fuzzysearch.common`` module.
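
To make the idea of consolidation concrete, here is a minimal sketch of one way to group overlapping matches and keep a single match per group. It assumes a stand-in ``Match`` tuple with the four fields shown in this page's output, and a hypothetical ``consolidate_sketch`` helper; the selection rule (smallest distance, then longest span) is an assumption and may differ from what ``consolidate_overlapping_matches()`` actually does.

```python
from collections import namedtuple

# Stand-in for fuzzysearch's Match, with the fields shown in this page's output.
Match = namedtuple('Match', ['start', 'end', 'dist', 'matched'])


def _pick_best(group):
    # Assumed rule: smallest distance first, then the longest span.
    return min(group, key=lambda m: (m.dist, -(m.end - m.start)))


def consolidate_sketch(matches):
    """Group matches whose index ranges overlap; keep one match per group."""
    best = []
    group = []
    group_end = None
    for m in sorted(matches, key=lambda m: (m.start, m.end)):
        if group and m.start < group_end:
            # Overlaps the current group: extend it.
            group.append(m)
            group_end = max(group_end, m.end)
        else:
            # Starts a new group: flush the previous one, if any.
            if group:
                best.append(_pick_best(group))
            group = [m]
            group_end = m.end
    if group:
        best.append(_pick_best(group))
    return best


# Made-up results: three overlapping candidates plus one separate match.
raw = [
    Match(2, 9, 2, 'aPATERN'),
    Match(3, 9, 1, 'PATERN'),
    Match(3, 10, 2, 'PATERNa'),
    Match(20, 27, 0, 'PATTERN'),
]
consolidated = consolidate_sketch(raw)
```

Here ``consolidated`` keeps ``Match(3, 9, 1, 'PATERN')`` from the first group and the separate exact match from the second.
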

Finally, some of these functions are generator functions, which yield results
lazily rather than returning a list. Wrap the call with ``list()`` if a list
is needed.
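
The practical difference is easy to miss, so here is a small illustration using a stand-in generator. ``iter_exact_sketch`` is a hypothetical helper written for this page, not a fuzzysearch function; it only demonstrates how any generator function behaves.

```python
def iter_exact_sketch(subsequence, sequence):
    """Yield the start index of every exact occurrence, overlaps included."""
    start = sequence.find(subsequence)
    while start != -1:
        yield start
        start = sequence.find(subsequence, start + 1)


gen = iter_exact_sketch('aa', 'aaaPATERNaaa')

# A generator has no len() and cannot be indexed; materialize it first.
indexes = list(gen)

# A generator can only be consumed once: a second pass yields nothing.
leftover = list(gen)
```

After this runs, ``indexes`` holds all four start positions, while ``leftover`` is empty because the generator was already exhausted.
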

The available internal functions, by module:

* ``fuzzysearch.search_exact``

  * ``search_exact(subsequence, sequence, start_index=0, end_index=None)``

* ``fuzzysearch.generic_search``: Supports specifying any combination of fuzzy matching limitations.

  * ``find_near_matches_generic``
  * ``find_near_matches_generic_linear_programming``
  * ``find_near_matches_generic_ngrams``
  * ``has_near_match_generic_ngrams``

* ``fuzzysearch.levenshtein``: Supports only specifying the max. distance.

  * ``find_near_matches_levenshtein``
  * ``find_near_matches_levenshtein_linear_programming``
  * ``find_near_matches_levenshtein_ngrams``

* ``fuzzysearch.substitutions_only``: Allow only substitutions (fast!).

  * ``find_near_matches_substitutions()``
  * ``has_near_match_substitutions()``
  * ``find_near_matches_substitutions_lp()``
  * ``find_near_matches_substitutions_ngrams()``
  * ``has_near_match_substitutions_ngrams()``

* ``fuzzysearch.no_deletions``: Slightly faster when deletions are not allowed.

  * ``find_near_matches_no_deletions_ngrams()``
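
As an intuition for why the substitutions-only case can be fast: when insertions and deletions are not allowed, every match has exactly the same length as the pattern, so each window of the sequence can be compared position by position. The sketch below shows that idea in plain Python. ``find_substitution_matches_sketch`` is a hypothetical helper, and this naive scan is an assumption for illustration, not the algorithm used in ``fuzzysearch.substitutions_only``.

```python
def find_substitution_matches_sketch(subsequence, sequence, max_substitutions):
    """Return (start, end, dist) for every window of ``sequence`` that
    differs from ``subsequence`` in at most ``max_substitutions`` positions."""
    n = len(subsequence)
    results = []
    for start in range(len(sequence) - n + 1):
        window = sequence[start:start + n]
        dist = 0
        for expected, actual in zip(subsequence, window):
            if expected != actual:
                dist += 1
                if dist > max_substitutions:
                    break  # too many mismatches; give up on this window
        else:
            # The inner loop finished without exceeding the limit.
            results.append((start, start + n, dist))
    return results


# 'PATTERM' differs from 'PATTERN' by a single substituted character.
hits = find_substitution_matches_sketch('PATTERN', 'aaaPATTERMaaa', 1)
```

Here ``hits`` contains one window, starting at index 3 and ending at index 10, with a distance of 1.
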

Internal Function Usage Example
+++++++++++++++++++++++++++++++

An example of using ``find_near_matches_generic_ngrams()``:

.. code:: python
@@ -29,6 +68,14 @@ If needed you can choose a specific search implementation, such as
>>> subsequence = 'TGCACTGTAGGGATAACAAT' #distance 1
>>> max_distance = 2
- >>> from fuzzysearch import find_near_matches_with_ngrams
- >>> find_near_matches_with_ngrams(subsequence, sequence, max_distance)
- [Match(start=3, end=24, dist=1, matched='TAGCACTGTAGGGATAACAAT')]
>>> from fuzzysearch.generic_search import find_near_matches_generic_ngrams
>>> from fuzzysearch import LevenshteinSearchParams
>>> params = LevenshteinSearchParams(max_l_dist=max_distance)
>>> # note: this will return many overlapping results
>>> results = find_near_matches_generic_ngrams(subsequence, sequence, params)
>>> len(results)
16
>>> # consolidate the overlapping results, keeping a "good" one from each group
>>> from fuzzysearch.common import consolidate_overlapping_matches
>>> consolidate_overlapping_matches(results)
[Match(start=3, end=24, dist=1, matched='TAGCACTGTAGGGATAACAAT')]
