
Better transliteration for index #140

@ernestmarcinko

Description

While it is impossible to store every transliterated variation of a string in the index table, the fully "unaccented" version should still be stored.

$transliterator = Transliterator::createFromRules(':: Any-Latin; :: Latin-ASCII; :: NFD; :: [:Nonspacing Mark:] Remove; :: Lower(); :: NFC;', Transliterator::FORWARD);
$str = $transliterator->transliterate($str);
WARNING: check that the Transliterator class exists before using it, as the intl extension may not be enabled.
  • The Str utils class should implement a removeAccents method
  • Make sure removeAccents also uses the core remove_accents() function, so that some accent removal still happens when Transliterator is not available
  • Use removeAccents in the tokenizer: compare the original words with their tokenized (accent-removed) forms and append the differing ones to the additional keywords
  • An option could be added to the search logic to "Force remove accents" from the input phrase (a bad idea for now)
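A minimal sketch of the steps above, not the plugin's actual code. It assumes a static utility class named `Str` and a helper `Str::appendUnaccentedKeywords()` standing in for the tokenizer step; both names are hypothetical, and the real tokenizer and keyword storage are not shown in the issue. `removeAccents` prefers the ICU rules quoted above when intl is loaded, then falls back to WordPress core `remove_accents()` if that function exists.

```php
<?php
// Hypothetical sketch: the `Str` class name and appendUnaccentedKeywords()
// are illustrative assumptions; only removeAccents, Transliterator and
// remove_accents() come from the issue.
final class Str {
    /** Cached ICU transliterator (null until created or if creation failed). */
    private static $transliterator = null;

    public static function removeAccents(string $str): string {
        // The intl extension may be disabled, so check that the class exists.
        if (class_exists('Transliterator')) {
            if (self::$transliterator === null) {
                self::$transliterator = Transliterator::createFromRules(
                    ':: Any-Latin; :: Latin-ASCII; :: NFD; :: [:Nonspacing Mark:] Remove; :: Lower(); :: NFC;',
                    Transliterator::FORWARD
                );
            }
            if (self::$transliterator !== null) {
                $result = self::$transliterator->transliterate($str);
                if ($result !== false) {
                    return $result;
                }
            }
        }
        // Fallback: WordPress core remove_accents(), only when it is loaded.
        if (function_exists('remove_accents')) {
            $str = remove_accents($str);
        }
        // ASCII lowercasing, roughly matching the Lower() rule above.
        return strtolower($str);
    }

    /**
     * Hypothetical tokenizer step: compare each original word with its
     * accent-free form and append the forms that differ (ignoring ASCII
     * case) and are not already present to the additional keywords.
     */
    public static function appendUnaccentedKeywords(array $words, array $keywords): array {
        foreach ($words as $word) {
            $plain = self::removeAccents($word);
            if ($plain !== '' && $plain !== strtolower($word) && !in_array($plain, $keywords, true)) {
                $keywords[] = $plain;
            }
        }
        return $keywords;
    }
}
```

For example, with intl enabled, `Str::appendUnaccentedKeywords(['Árvíztűrő', 'test'], [])` would add only `arvizturo`, since `test` has no accents to strip. Words that differ from their stripped form only by ASCII case are skipped, on the assumption that the index already stores lowercased words.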

Metadata

Labels

feature (New feature or request)

Projects

Status

Done
