[Question] About SymSpell model and probabilistic models (Norvig, etc.) #61

@loretoparisi

I'm currently using both Hunspell and SymSpell as my main spelling correction systems. Both work fine, and SymSpell works great (quality, performance, etc.). That said, I have a question about Norvig's probabilistic spell checker, which I'll illustrate with a simple case.
When romanizing some languages, there is no one-to-one mapping from a term in the source script to its English (romanized) form. So given the romanization of, say, Hindi, a single source word can yield several candidate English words. A typical output of such a system is: 1 (Hindi) word -> N (eng) words.
Deciding which of the N words is best is typically done with an algorithm such as beam search or Viterbi, but in many cases the ambiguity remains.
In the other direction we also have eng (N) -> hi (M), so the mapping is not bijective at all.
Given that a spell checker knows all (or most of) the words in a language, and supposing I need context (as in this case) to go back from eng (N) -> hi (M), do you think SymSpell or Norvig's probabilistic model could give a useful hint for choosing among the M candidates (or the N candidates in the opposite direction)? What's your opinion on that?
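
To make the question concrete, here is a minimal sketch of the kind of hint I have in mind: rank the N candidates by word probability, optionally conditioned on the previous word. Everything in it is an assumption for illustration. The toy corpus, the candidate words (`ram`, `raam`, `rum`) and the helper names are hypothetical, and the add-one-smoothed bigram score is a simple stand-in rather than SymSpell's or Norvig's actual model (Norvig's checker combines a unigram word probability with an error model).

```python
from collections import Counter

# Toy corpus standing in for a real frequency dictionary (hypothetical data).
CORPUS = (
    "shri ram is here shri ram went home "
    "drink rum with cola drink rum drink rum drink rum"
).split()

UNIGRAMS = Counter(CORPUS)                  # word -> count
BIGRAMS = Counter(zip(CORPUS, CORPUS[1:]))  # (prev, word) -> count
TOTAL = sum(UNIGRAMS.values())
VOCAB = len(UNIGRAMS)


def unigram_prob(word):
    """P(word) with add-one smoothing, so unseen words get a small non-zero score."""
    return (UNIGRAMS[word] + 1) / (TOTAL + VOCAB)


def bigram_prob(prev, word):
    """P(word | prev) with add-one smoothing."""
    return (BIGRAMS[(prev, word)] + 1) / (UNIGRAMS[prev] + VOCAB)


def rank_candidates(candidates, prev=None):
    """Sort candidates from most to least probable.

    Without context, the plain word frequency decides.
    With a previous word, the bigram probability decides.
    """
    if prev is None:
        return sorted(candidates, key=unigram_prob, reverse=True)
    return sorted(candidates, key=lambda w: bigram_prob(prev, w), reverse=True)


if __name__ == "__main__":
    candidates = ["ram", "raam", "rum"]
    print(rank_candidates(candidates))                # frequency only
    print(rank_candidates(candidates, prev="shri"))   # context favours "ram"
    print(rank_candidates(candidates, prev="drink"))  # context favours "rum"
```

Without context the most frequent word always wins, which is exactly the case where the indecision stays; with even one word of context the ranking can flip. Whether a spell checker's dictionary carries enough of this information is what I'm asking about.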