Description
I'm currently using both Hunspell and SymSpell as my main spelling correction systems. Both work fine, and SymSpell works great (quality, performance, etc.). That said, I have a question about Norvig's probabilistic spell checker, which I'll illustrate with a simple case.
In some romanized languages, there is no one-to-one relation between a term in the source script and its English (romanized) counterpart. So, given the romanization of, say, a Hindi word, you get several possible English words as the destination. A typical output of such a system is: 1 (Hindi) word -> N (eng) words.
Deciding which of the N words is best is typically done with an algorithm such as beam search or Viterbi, but there are many cases where the ambiguity remains.
In the other direction we also have eng (N) -> hi (M), so the mapping is not bijective at all.
Given that a spell checker knows all (or most) of the words in a language, and supposing I need context (as in this case) to go back from eng (N) -> hi (M), do you think SymSpell or Norvig's probabilistic model could give a valid hint about the M choices (or the N choices in the opposite direction)? What's your opinion on that?
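
To make the question concrete, here is a minimal Python sketch of the kind of hint I have in mind. It is not SymSpell's API and not Norvig's actual code: it only borrows the idea of a word-frequency prior, and adds a bigram term for context as an assumption of mine. The toy corpus, the candidate words, and the names `rank_candidates`, `log_unigram`, and `log_bigram` are all hypothetical; a real setup would use a proper frequency dictionary for the target language.

```python
import math
from collections import Counter

# Toy corpus standing in for a real frequency dictionary (assumption).
CORPUS = (
    "i want to meet you . we will meet at the station . "
    "i eat meat every day . fresh meat is sold here"
).split()

UNIGRAMS = Counter(CORPUS)
BIGRAMS = Counter(zip(CORPUS, CORPUS[1:]))
TOTAL = sum(UNIGRAMS.values())
VOCAB_SIZE = len(UNIGRAMS)


def log_unigram(word):
    """Add-one smoothed log P(word): the frequency prior."""
    return math.log((UNIGRAMS[word] + 1) / (TOTAL + VOCAB_SIZE))


def log_bigram(prev, word):
    """Add-one smoothed log P(word | prev): the context term."""
    return math.log((BIGRAMS[(prev, word)] + 1) / (UNIGRAMS[prev] + VOCAB_SIZE))


def rank_candidates(candidates, prev_word=None):
    """Order the N candidate words from most to least likely.

    Without prev_word only the frequency prior is used; with it,
    the bigram term is added so that context can break ties.
    """
    def score(word):
        s = log_unigram(word)
        if prev_word is not None:
            s += log_bigram(prev_word, word)
        return s

    return sorted(candidates, key=score, reverse=True)


# Two hypothetical candidates produced for one source word.
candidates = ["meet", "meat"]
print(rank_candidates(candidates, prev_word="eat"))
print(rank_candidates(candidates, prev_word="to"))
```

In this toy corpus both candidates are equally frequent, so the frequency prior alone cannot decide between them; only the preceding word separates them. That is exactly the situation I am asking about: whether the dictionary and frequency data a spell checker already carries are enough to give this kind of hint.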