Icu4j filter plugin for Embulk
Normalizes and transliterates string values with ICU4J, for example Unicode normalization (NFKC), kana conversion, and case conversion.
See http://site.icu-project.org/
Overview
- Plugin type: filter
Configuration
- key_names: target key names. (list, required)
- keep_input: whether to keep the input columns. (bool, default: true)
- settings: list of conversion settings applied to each target column. (list, required)
  - suffix: suffix for the output column name. If null, the column is overwritten. (string, default: null)
  - transliterators: comma-separated transliterator IDs. See http://hondou.homedns.org/pukiwiki/pukiwiki.php?Java%20ICU4J. (string)
  - case: upper or lower. (string, default: null)
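The naming rule for `suffix` can be sketched as follows. This is an illustration in Python, not the plugin's own code: the helper `output_column_name` is hypothetical, and plain concatenation of key and suffix is an assumption inferred from the example output further down (`catchcopy` with `_katakana` gives `catchcopy_katakana`).

```python
from typing import Optional


def output_column_name(key: str, suffix: Optional[str]) -> str:
    """Hypothetical helper: name of the column that one setting writes to."""
    # A null suffix means the converted value overwrites the original column.
    if suffix is None:
        return key
    # Assumption: the suffix is appended directly to the key name.
    return key + suffix
```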
Example: NFKC normalization
filters:
  - type: icu4j
    key_names:
      - title
    settings:
      - { transliterators: 'Any-NFKC', case: upper }
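To get a feel for what `Any-NFKC` combined with `case: upper` does to a value, here is a rough sketch using Python's standard `unicodedata` module instead of ICU4J. It only approximates the plugin: the function name `nfkc_upper` is made up for illustration, and ICU's case mapping may differ from Python's `str.upper()` in edge cases.

```python
import unicodedata


def nfkc_upper(value: str) -> str:
    """Approximate 'Any-NFKC' followed by case: upper (stdlib stand-in for ICU4J)."""
    # NFKC folds compatibility forms such as full-width Latin letters and
    # half-width katakana into their standard equivalents.
    normalized = unicodedata.normalize("NFKC", value)
    return normalized.upper()


print(nfkc_upper("ｅｍｂｕｌｋ"))  # EMBULK
print(nfkc_upper("ﾎｹﾞﾎｹﾞ"))      # ホゲホゲ
```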
Example
filters:
  - type: icu4j
    keep_input: false
    key_names:
      - catchcopy
    settings:
      - { suffix: _katakana, transliterators: 'Katakana-Hiragana,Fullwidth-Halfwidth', case: upper }
      - { transliterators: 'Katakana-Hiragana', case: lower }
      - { suffix: _romaji_lower, transliterators: 'Katakana-Hiragana,Hiragana-Latin', case: lower }
Input
{
  "catchcopy" : "ホゲホゲ"
}
Output
{
  "catchcopy" : "ほげほげ",
  "catchcopy_katakana" : "ホゲホゲ",
  "catchcopy_romaji_lower" : "hogehoge"
}
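The `Katakana-Hiragana` step that turns the input value into the overwritten `catchcopy` value can be approximated without ICU4J. The sketch below relies on the fact that most katakana code points sit exactly 0x60 above their hiragana counterparts. The function `katakana_to_hiragana` is hypothetical, and ICU's own rule set may treat additional characters differently.

```python
def katakana_to_hiragana(value: str) -> str:
    """Rough stand-in for ICU's 'Katakana-Hiragana' transliterator."""
    out = []
    for ch in value:
        code = ord(ch)
        # Katakana U+30A1..U+30F6 map to hiragana U+3041..U+3096 by subtracting 0x60.
        if 0x30A1 <= code <= 0x30F6:
            out.append(chr(code - 0x60))
        else:
            # Everything else (Latin, digits, punctuation) is passed through unchanged.
            out.append(ch)
    return "".join(out)


print(katakana_to_hiragana("ホゲホゲ"))  # ほげほげ
```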
Transliterator rules
See http://hondou.homedns.org/pukiwiki/pukiwiki.php?Java%20ICU4J
Build
$ ./gradlew gem # -t to watch change of files and rebuild continuously