Examples

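This document shows usage examples for SymSpellCppPy, a Python binding for a C++ implementation of the SymSpell spelling-correction algorithm. The examples cover dictionary loading, single-word and sentence-level spelling correction, word segmentation, and saving a loaded dictionary for reuse.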

Loading the dictionary

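import SymSpellCppPy

symSpell = SymSpellCppPy.SymSpell()
# Each line of the dictionary file holds a word and its frequency separated by a space:
# term_index is the column of the word, count_index the column of its count.
symSpell.load_dictionary(corpus="resources/frequency_dictionary_en_82_765.txt", term_index=0, count_index=1, separator=" ")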
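The term_index, count_index and separator arguments describe how each line of the dictionary file is split into a word and its count. The pure-Python sketch below only illustrates what those parameters mean. The function name parse_frequency_dictionary is hypothetical, and summing the counts of a repeated term is an assumption of this sketch, not a statement about the library.

```python
def parse_frequency_dictionary(lines, term_index=0, count_index=1, separator=" "):
    """Build a {term: count} map from delimited lines (illustrative sketch only)."""
    counts = {}
    for line in lines:
        parts = line.rstrip("\n").split(separator)
        # Skip blank or malformed lines that lack the requested columns
        if len(parts) <= max(term_index, count_index):
            continue
        term, count = parts[term_index], parts[count_index]
        if not count.isdigit():
            continue
        # Assumption of this sketch: a repeated term accumulates its counts
        counts[term] = counts.get(term, 0) + int(count)
    return counts


# Hypothetical sample lines in the same "word count" layout
sample = ["the 100\n", "of 60\n", "\n", "the 5\n"]
print(parse_frequency_dictionary(sample))  # {'the': 105, 'of': 60}
```

A file with the columns swapped or tab-separated would be read by passing term_index=1, count_index=0, separator="\t".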

Checking dictionary properties

The SymSpell class provides methods to inspect the loaded dictionary:

To check the number of words in the dictionary, use the word_count() method:

print(symSpell.word_count())  # Outputs: 82781

To find the length of the longest word in the dictionary, use the max_length() method:

print(symSpell.max_length())  # Outputs: 28

To get the number of unique delete variants that SymSpell precomputed from the dictionary words, use the entry_count() method:

print(symSpell.entry_count())  # Outputs: 661047
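entry_count() reflects SymSpell's symmetric-delete precomputation. Each dictionary word is indexed under every string obtained by deleting up to the maximum edit distance of characters from it, and only its first prefix_length characters are used. The pure-Python sketch below shows that idea, assuming SymSpell's usual defaults of 2 and 7. delete_variants is a hypothetical name and is not part of the SymSpellCppPy API.

```python
def delete_variants(word, max_distance=2, prefix_length=7):
    """Strings reachable from word's prefix by deleting up to max_distance characters."""
    key = word[:prefix_length]  # only the prefix is indexed
    variants = {key}
    frontier = {key}
    for _ in range(max_distance):
        # Remove one more character from every string found in the previous round
        frontier = {s[:i] + s[i + 1:] for s in frontier for i in range(len(s))}
        variants |= frontier
    return variants


print(sorted(delete_variants("tke", max_distance=1)))  # ['ke', 'te', 'tk', 'tke']
# Input and dictionary word meet on a shared variant, so "take" is a candidate for "tke"
print(delete_variants("tke", 1) & delete_variants("take", 1))  # {'tke'}

# Counting unique variants over a whole vocabulary is analogous to entry_count()
vocab = ["take", "the", "tea"]
print(len(set().union(*(delete_variants(w, 1) for w in vocab))))  # 12
```

At lookup time, the same kind of variants are generated for the input and matched against this index. As a result, the exact edit distance only has to be computed for the few candidates that share a variant.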

Spelling correction

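The lookup method returns a list of suggestions for a single term. Each suggestion's term attribute holds the suggested spelling, and the Verbosity argument controls which suggestions are returned.

To get every suggestion at the smallest edit distance found, ordered by word frequency, use SymSpellCppPy.Verbosity.CLOSEST: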

terms = symSpell.lookup("tke", SymSpellCppPy.Verbosity.CLOSEST)
print(terms[0].term)  # Outputs: "take"

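You can also pass max_edit_distance to consider only terms within that many edits of the input. It must not exceed the maximum edit distance the SymSpell object was created with (2 by default). "extreme" is two substitutions (i to e, n to m) away from "extrine", so it is found with a limit of 2 but not with a limit of 1: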

terms = symSpell.lookup("extrine", SymSpellCppPy.Verbosity.CLOSEST, max_edit_distance=2)
print(terms[0].term)  # Outputs: "extreme"

terms = symSpell.lookup("extrine", SymSpellCppPy.Verbosity.CLOSEST, max_edit_distance=1)
print(terms)  # Outputs: []
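SymSpell measures edit distance with the Damerau-Levenshtein distance in its optimal string alignment (OSA) form. In this form, insertions, deletions, substitutions, and transpositions of two adjacent characters each cost 1. The minimal dynamic-programming sketch below shows why "extreme" drops out at max_edit_distance=1. The function name osa_distance is illustrative and not part of the library.

```python
def osa_distance(a, b):
    """Optimal string alignment distance between strings a and b."""
    # d[i][j] = distance between a[:i] and b[:j]
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i  # delete all of a[:i]
    for j in range(len(b) + 1):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Transposition of two adjacent characters
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]


print(osa_distance("extrine", "extreme"))  # 2: within max_edit_distance=2, not 1
print(osa_distance("huse", "hues"))        # 1: one adjacent transposition
```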

Error fixing

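SymSpellCppPy can also correct whole sentences, including words that were wrongly split or joined, and insert missing spaces into text written without them.

To correct a sentence, use the lookup_compound method. It returns a list of suggestions, and the first one holds the corrected sentence: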

terms = symSpell.lookup_compound("whereis th elove hehad dated forImuch of thepast who couqdn'tread in sixthgrade and ins pired him")
print(terms[0].term)
# Outputs: "whereas to love head dated for much of theist who couldn't read in sixth grade and inspired him"
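lookup_compound handles three kinds of sentence-level errors: a space inserted inside a word, a space missing between two words, and ordinary misspellings of individual words. The toy sketch below covers only the first two cases, using exact membership in a small hypothetical vocabulary. The library additionally relies on edit-distance lookups and word frequencies, so toy_compound_fix illustrates the merge/split idea rather than reproducing the library's algorithm.

```python
def toy_compound_fix(text, vocab):
    """Merge or split whitespace-separated tokens using exact dictionary hits only."""
    tokens = text.split()
    fixed = []
    i = 0
    while i < len(tokens):
        token = tokens[i]
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        # Merge two tokens that only form a known word together, e.g. "ins pired"
        if nxt is not None and token + nxt in vocab and not (token in vocab and nxt in vocab):
            fixed.append(token + nxt)
            i += 2
            continue
        # Split an unknown token at the first cut where both halves are known, e.g. "hehad"
        if token not in vocab:
            for cut in range(1, len(token)):
                left, right = token[:cut], token[cut:]
                if left in vocab and right in vocab:
                    token = f"{left} {right}"
                    break
        fixed.append(token)
        i += 1
    return " ".join(fixed)


vocab = {"where", "is", "the", "love", "he", "had", "and", "inspired", "him"}
print(toy_compound_fix("whereis the love hehad and ins pired him", vocab))
# where is the love he had and inspired him
```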

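To insert missing spaces into text written without them, use the word_segmentation method. The returned object exposes the segmented text as segmented_string and a spelling-corrected version as corrected_string: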

segmented_info = symSpell.word_segmentation("thequickbrownfoxjumpsoverthelazydog")
print(segmented_info.segmented_string)
# Outputs: "the quick brown fox jumps over the lazy dog"
print(segmented_info.corrected_string)
# Outputs: "they quick brown fox jumps over therapy dog"
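word_segmentation picks the split whose words are most probable according to the dictionary frequencies. The sketch below expresses that idea as a dynamic program over split points that maximizes the summed log10 word probabilities from a hypothetical unigram table. It does no spelling correction, so it corresponds to segmented_string only. Its length-based penalty for unknown words is a common heuristic, not a claim about the library's internals.

```python
import math


def segment(text, counts, max_word_length=20):
    """Split text into the word sequence with the highest summed log10 probability."""
    total = sum(counts.values())

    def log_prob(word):
        if word in counts:
            return math.log10(counts[word] / total)
        # Unknown words get a probability that shrinks tenfold per extra character
        return math.log10(10.0 / (total * 10 ** len(word)))

    # best[i] holds (score, words) for the best segmentation of text[:i]
    best = [(0.0, [])] + [(-math.inf, [])] * len(text)
    for end in range(1, len(text) + 1):
        for start in range(max(0, end - max_word_length), end):
            word = text[start:end]
            score = best[start][0] + log_prob(word)
            if score > best[end][0]:
                best[end] = (score, best[start][1] + [word])
    return " ".join(best[-1][1])


# Hypothetical unigram counts
counts = {"the": 100, "quick": 10, "brown": 10, "fox": 10,
          "jumps": 5, "over": 20, "lazy": 5, "dog": 10}
print(segment("thequickbrownfoxjumpsoverthelazydog", counts))
# the quick brown fox jumps over the lazy dog
```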

Saving and Loading SymSpell object

To save the internal state of a loaded SymSpell object to a binary file, so that later runs can skip rebuilding the dictionary, use the save_pickle method rather than Python's standard pickle module:

symSpell.save_pickle("symspell_binary.bin")

To restore that saved state into a new SymSpell object, use the load_pickle method:

anotherSymSpell = SymSpellCppPy.SymSpell()
anotherSymSpell.load_pickle("symspell_binary.bin")
terms = anotherSymSpell.lookup("tke", SymSpellCppPy.Verbosity.CLOSEST)
print(terms[0].term)  # Outputs: "take", the same as the original object
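A common pattern is to build the dictionary once and reuse the saved binary on later runs. The helper below is a hypothetical sketch that relies only on the load_dictionary, save_pickle and load_pickle calls shown above. It takes the class to instantiate as a parameter (for example SymSpellCppPy.SymSpell), which also lets it be exercised with a stand-in object.

```python
import os


def load_or_build(factory, dictionary_path, cache_path):
    """Return a SymSpell-like object, reusing a saved binary when one exists."""
    spell = factory()
    if os.path.exists(cache_path):
        # Fast path: restore the precomputed state
        spell.load_pickle(cache_path)
    else:
        # Slow path: build from the text dictionary, then cache the result
        spell.load_dictionary(corpus=dictionary_path, term_index=0,
                              count_index=1, separator=" ")
        spell.save_pickle(cache_path)
    return spell


# Usage with the real library (not executed here):
# symSpell = load_or_build(SymSpellCppPy.SymSpell,
#                          "resources/frequency_dictionary_en_82_765.txt",
#                          "symspell_binary.bin")
```

The cached file is not refreshed automatically, so delete it whenever the text dictionary changes.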

Bigram and Trigram Suggestions

The SymSpellCppPy library also supports generating bigram and trigram suggestions:

# To generate bigram suggestions, use the `lookup_bigram` method:
terms = symSpell.lookup_bigram("in te dh", SymSpellCppPy.Verbosity.CLOSEST)
print(terms[0].term)  # Outputs: "in the dark"

# To generate trigram suggestions, use the `lookup_trigram` method:
terms = symSpell.lookup_trigram("an plesant day", SymSpellCppPy.Verbosity.CLOSEST)
print(terms[0].term)  # Outputs: "a pleasant day"
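The toy sketch below shows how word-pair frequencies can choose between per-word candidates. The candidate lists and bigram counts are hypothetical, best_phrase is not a library function, and this is not a description of how lookup_bigram works internally. Exhaustive enumeration is only practical for short phrases.

```python
import math
from itertools import product


def best_phrase(candidates, bigram_counts):
    """Choose one candidate per position, maximizing summed log(bigram count + 1)."""
    def score(words):
        return sum(math.log(bigram_counts.get(pair, 0) + 1)
                   for pair in zip(words, words[1:]))

    # Exhaustive search: fine for short phrases, exponential in phrase length
    return " ".join(max(product(*candidates), key=score))


# Hypothetical per-word candidates for "in te dh" and made-up bigram counts
candidates = [["in"], ["the", "to", "tea"], ["dark", "do", "dog"]]
bigram_counts = {("in", "the"): 50, ("the", "dark"): 20, ("the", "dog"): 15,
                 ("in", "to"): 5, ("to", "do"): 30}
print(best_phrase(candidates, bigram_counts))  # in the dark
```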

Top N suggestions

You can also request the top N suggestions for a given word:

# Verbosity.TOP returns only the single best suggestion. To get the top 5, request
# every suggestion within max_edit_distance with Verbosity.ALL (sorted by edit
# distance, then by frequency) and keep the first five:
terms = symSpell.lookup("huse", SymSpellCppPy.Verbosity.ALL, max_edit_distance=2, include_unknown=True)
for term in terms[:5]:
    print(term.term)
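The three Verbosity values differ only in which candidates they keep and in what order. ALL keeps every suggestion within max_edit_distance, sorted by edit distance and then by descending frequency. CLOSEST keeps only the suggestions at the smallest distance found, and TOP keeps just the first of those. The sketch below reproduces that filtering on hypothetical candidates with made-up frequencies. Mode and Candidate are stand-ins, not SymSpellCppPy types.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    """Hypothetical stand-in for SymSpellCppPy.Verbosity."""
    TOP = 0
    CLOSEST = 1
    ALL = 2


@dataclass
class Candidate:
    term: str
    distance: int
    count: int


def filter_suggestions(candidates, mode, max_edit_distance):
    # Keep candidates within range, nearest first, then most frequent first
    ranked = sorted((c for c in candidates if c.distance <= max_edit_distance),
                    key=lambda c: (c.distance, -c.count))
    if mode is Mode.ALL or not ranked:
        return ranked
    closest = [c for c in ranked if c.distance == ranked[0].distance]
    return closest[:1] if mode is Mode.TOP else closest


# Distances are real OSA distances from "huse"; the counts are made up
pool = [Candidate("house", 1, 500), Candidate("use", 1, 900),
        Candidate("hues", 1, 10), Candidate("horse", 2, 300)]
for mode in Mode:
    print(mode.name, [c.term for c in filter_suggestions(pool, mode, 2)])
# TOP ['use']
# CLOSEST ['use', 'house', 'hues']
# ALL ['use', 'house', 'hues', 'horse']
```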

Ignoring case and digits

By default, SymSpellCppPy is case-sensitive and treats digits as ordinary characters. The following parameters change this behavior:

# To ignore case when checking a term, use the `ignore_case` parameter:
terms = symSpell.lookup("THe", SymSpellCppPy.Verbosity.CLOSEST, ignore_case=True)
print(terms[0].term)  # Outputs: "the"

# To ignore digits when checking a term, use the `ignore_digit` parameter:
terms = symSpell.lookup("3rd", SymSpellCppPy.Verbosity.CLOSEST, ignore_digit=True)
print(terms[0].term)  # Outputs: "red"
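You can also wrap the lookup yourself so that the suggestion keeps the capitalization of the input instead of coming back in lower case. The helpers below are a hypothetical sketch. The correct parameter stands for any function that maps a lower-case word to its correction, such as a small wrapper around lookup, and the correction table is made up.

```python
def match_case(original, suggestion):
    """Re-apply the capitalization pattern of original to suggestion."""
    if original.isupper():
        return suggestion.upper()
    if original[:1].isupper():
        return suggestion[:1].upper() + suggestion[1:]
    return suggestion


def correct_keeping_case(word, correct):
    # Look the word up in lower case, then restore the caller's capitalization
    return match_case(word, correct(word.lower()))


def correct(word):
    # Hypothetical stand-in for a wrapper around symSpell.lookup
    return {"teh": "the"}.get(word, word)


print(correct_keeping_case("TEH", correct))  # THE
print(correct_keeping_case("Teh", correct))  # The
```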

Ignoring words with numbers

You may also choose to ignore words containing numbers:

# To ignore words with numbers when checking a term, use the `ignore_word_with_number` parameter:
terms = symSpell.lookup("l33t", SymSpellCppPy.Verbosity.CLOSEST, ignore_word_with_number=True)
print(terms[0].term)  # Outputs: "let"
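To keep tokens that contain digits exactly as written (for example version strings or identifiers) when correcting a whole sentence, you can filter them out before any lookup. The helper below is a hypothetical sketch. The correct parameter stands for any per-word correction function, such as a small wrapper around lookup, and the correction table is made up.

```python
def correct_sentence(text, correct):
    """Correct each whitespace-separated token, leaving tokens with digits unchanged."""
    return " ".join(
        token if any(ch.isdigit() for ch in token) else correct(token)
        for token in text.split()
    )


def correct(word):
    # Hypothetical stand-in for a wrapper around symSpell.lookup
    return {"teh": "the", "grat": "great"}.get(word, word)


print(correct_sentence("teh l33t hax0r is grat", correct))  # the l33t hax0r is great
```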
