Compound-aware automatic spelling correction
SymSpellCompound supports compound-aware automatic spelling correction of multi-word input strings.
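It is built on top of SymSpell's Symmetric Delete spelling correction algorithm, which is reported to be up to 1 million times faster than the standard approach of generating every delete, transpose, replace, and insert.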
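The speed comes from the Symmetric Delete idea: only deletions are generated (of every dictionary word ahead of time, and of the input term at lookup time), and the two sides meet in a hash table instead of enumerating every possible edit of the input. The following is a minimal Python sketch of that idea, not SymSpell's implementation. It assumes optimal string alignment (restricted Damerau-Levenshtein) as the distance, ignores word frequencies, and uses hypothetical helper names (deletes, osa_distance, build_index, lookup).

```python
def deletes(word: str, max_distance: int) -> set[str]:
    """All strings obtainable from word by deleting up to max_distance characters (word included)."""
    result = {word}
    frontier = {word}
    for _ in range(max_distance):
        frontier = {w[:i] + w[i + 1:] for w in frontier for i in range(len(w))}
        result |= frontier
    return result


def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: insertion, deletion, substitution,
    and transposition of two adjacent characters each cost 1."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[len(a)][len(b)]


def build_index(dictionary, max_distance):
    """Precomputation (done once): map each delete-variant back to its dictionary word(s)."""
    index: dict[str, set[str]] = {}
    for word in dictionary:
        for variant in deletes(word, max_distance):
            index.setdefault(variant, set()).add(word)
    return index


def lookup(term, index, max_distance):
    """Dictionary words within max_distance of term, nearest first."""
    candidates = set()
    for variant in deletes(term, max_distance):
        candidates |= index.get(variant, set())
    # Two strings sharing a delete-variant can still be up to 2 * max_distance apart,
    # so every candidate is verified with the real distance.
    scored = [(osa_distance(term, w), w) for w in candidates]
    return [w for d, w in sorted(scored) if d <= max_distance]
```

For instance, after index = build_index(["spelling", "mistakes", "message"], 2), the call lookup("msitakes", index, 1) returns ["mistakes"]. The costly step, generating the deletes of every dictionary word, happens once up front; each lookup only generates the deletes of the input term.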
1. Compound splitting & decompounding
The original SymSpell treats every input string as a single term. SymSpellCompound adds compound splitting and decompounding for three cases:
- a space mistakenly inserted within a correct word, splitting it into two incorrect terms
- a space mistakenly omitted between two correct words, merging them into one incorrect term
- multiple input terms, with or without spelling errors
Splitting errors, concatenation errors, substitution errors, transposition errors, deletion errors, and insertion errors can be mixed within the same word.
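The three cases can be illustrated with a simplified dynamic program over the input tokens: at each position it either corrects a token on its own, splits it into two dictionary words (omitted space), or merges it with the next token (inserted space), and it keeps the sequence with the lowest total edit count. This is a hedged sketch, not SymSpellCompound's actual algorithm. It assumes a plain set as the dictionary, scanned linearly instead of through a Symmetric Delete index; it uses no word frequencies; it assumes each added or removed space costs one edit; and it gives an unmatched token a hypothetical penalty of one edit per character. The names correct_compound and best_match are illustrative.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance (insert, delete, substitute, adjacent transpose)."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]


def best_match(term, dictionary, max_distance):
    """Closest dictionary word within max_distance as (word, distance), or None.
    Ties are broken alphabetically here; a real corrector would use word frequency."""
    best = None
    for word in dictionary:
        d = osa_distance(term, word)
        if d <= max_distance and (best is None or (d, word) < (best[1], best[0])):
            best = (word, d)
    return best


def correct_compound(text, dictionary, max_distance=2):
    """Return (corrected_text, total_edits) for a multi-word input string."""
    tokens = text.lower().split()
    n = len(tokens)
    # best[i] = (cost, words): cheapest correction of tokens[i:]
    best = [None] * n + [(0, [])]
    for i in range(n - 1, -1, -1):
        tok = tokens[i]
        cost_rest, words_rest = best[i + 1]
        options = []
        # Correct the token on its own.
        single = best_match(tok, dictionary, max_distance)
        if single:
            options.append((single[1] + cost_rest, [single[0]] + words_rest))
        # Omitted space: split the token into two words (+1 for the restored space).
        for k in range(1, len(tok)):
            left = best_match(tok[:k], dictionary, max_distance)
            right = best_match(tok[k:], dictionary, max_distance)
            if left and right:
                options.append((left[1] + right[1] + 1 + cost_rest,
                                [left[0], right[0]] + words_rest))
        # Inserted space: merge with the next token (+1 for the removed space).
        if i + 1 < n:
            merged = best_match(tok + tokens[i + 1], dictionary, max_distance)
            if merged:
                cost_after, words_after = best[i + 2]
                options.append((merged[1] + 1 + cost_after, [merged[0]] + words_after))
        # Fallback: keep the token unchanged (hypothetical penalty: one edit per character).
        options.append((len(tok) + cost_rest, [tok] + words_rest))
        # min() returns the first of several equally cheap options, so corrections beat the fallback.
        best[i] = min(options, key=lambda option: option[0])
    cost, words = best[0]
    return " ".join(words), cost
```

Run on the last example listed below with a dictionary made only of that example's ten target words, the sketch returns "can you read this message despite the horrible spelling mistakes" with a total cost of 9, in line with the 9 edits given for it (case differences are ignored because the input is lowercased). With a realistic dictionary, word frequencies would be needed to choose among equally close candidates.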
2. Automatic spelling correction
- Large document collections make manual correction infeasible and require unsupervised, fully automatic spelling correction.
- In conventional spelling correction of a single token, the user is presented with a list of suggestions to choose from. For automatic spelling correction of long multi-word text, the algorithm itself has to make an educated choice.
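The difference between the two modes can be sketched as follows: an interactive speller hands the user the whole ranked list, while an automatic corrector commits to its first entry. The ranking key used here (lowest edit distance first, then highest corpus frequency) is an assumption for this sketch, and the Suggestion type, its fields, and the counts in the example are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Suggestion:
    term: str
    distance: int  # edit distance from the input token
    count: int     # corpus frequency of the suggested term (illustrative)


def ranked(suggestions: list[Suggestion]) -> list[Suggestion]:
    """Interactive mode: the full list, nearest first, more frequent first on ties.
    Assumed ranking key; the final term comparison only makes the order deterministic."""
    return sorted(suggestions, key=lambda s: (s.distance, -s.count, s.term))


def auto_correct(token: str, suggestions: list[Suggestion]) -> str:
    """Automatic mode: commit to the top-ranked suggestion, or keep the token if there is none."""
    top = ranked(suggestions)
    return top[0].term if top else token
```

With made-up candidates for the token "th", such as Suggestion("tho", 1, 5), Suggestion("the", 1, 900), and Suggestion("thee", 2, 300), auto_correct picks "the": all three distance-1 candidates tie on distance, and frequency decides.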
Examples (input → correction):
- whereis th elove hehad dated forImuch of thepast who couqdn'tread in sixthgrade and ins pired him
  → where is the love he had dated for much of the past who couldn't read in sixth grade and inspired him (9 edits)
- in te dhird qarter oflast jear he hadlearned ofca sekretplan y iran
  → in the third quarter of last year he had learned of a secret plan by iran (10 edits)
- the bigjest playrs in te strogsommer film slatew ith plety of funn
  → the biggest players in the strong summer film slate with plenty of fun (9 edits)
- Can yu readthis messa ge despite thehorible sppelingmsitakes
  → can you read this message despite the horrible spelling mistakes (9 edits)
Performance: 0.2 milliseconds per word, i.e. 5000 words per second (single core on a 2012 MacBook Pro).
Applications:
- Query correction (10–15% of queries contain misspelled terms)
- OCR post-processing
- Automated proofreading