Icu4j filter plugin for Embulk

Normalizes string values with ICU4J transliterators, including Unicode normalization forms such as NFKC.

For background on ICU, see http://site.icu-project.org/

Overview

  • Plugin type: filter

Configuration

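  • key_names: names of the columns to transform. (list, required)
  • keep_input: whether to keep the input columns. (bool, default: true)
  • settings: list of conversion settings; each entry may set suffix, transliterators, and case, as in the examples below. (list, required)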

Example: NFKC normalization

filters:
  - type: icu4j
    key_names:
      - title
    settings:
      - { transliterators: 'Any-NFKC', case: upper }
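
To give a feel for what this configuration does to a value, here is a minimal sketch in Python. It is not the plugin's code: it uses the standard library's unicodedata module as a stand-in for ICU's Any-NFKC transliterator, and the function name nfkc_upper is hypothetical. The order in which the plugin applies transliteration and case conversion is assumed here, not confirmed by this README.

```python
import unicodedata


def nfkc_upper(value: str) -> str:
    """Approximate `transliterators: 'Any-NFKC', case: upper` for one value.

    Assumption: normalization runs first, then upper-casing.
    """
    normalized = unicodedata.normalize("NFKC", value)
    return normalized.upper()


# Fullwidth Latin letters fold to ASCII and are then upper-cased.
print(nfkc_upper("ｅｍｂｕｌｋ"))
# Halfwidth katakana (with a separate voiced mark) composes to fullwidth katakana.
print(nfkc_upper("ﾎｹﾞ"))
```

NFKC is a compatibility normalization, so width variants and similar presentation forms collapse into one canonical spelling, which is useful before matching or deduplicating titles.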

Example: multiple settings for one key

filters:
  - type: icu4j
    keep_input: false
    key_names:
      - catchcopy
    settings:
      - { suffix: _katakana, transliterators: 'Katakana-Hiragana,Fullwidth-Halfwidth', case: upper }
      - { transliterators: 'Katakana-Hiragana', case: lower }
      - { suffix: _romaji_lower, transliterators: 'Katakana-Hiragana,Hiragana-Latin', case: lower }

Input

{
    "catchcopy" : "ホゲホゲ"
}

Output

{
    "catchcopy" : "ほげほげ",
    "catchcopy_katakana" : "ホゲホゲ",
    "catchcopy_romaji_lower" : "hogehoge"
}
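
The output keys above appear to be formed by appending each setting's suffix to the key name, with an empty suffix reusing the original key. The sketch below illustrates that naming rule together with a simplified katakana-to-hiragana conversion. Both helpers are hypothetical and written for illustration only; the conversion shifts code points by a fixed offset and does not cover everything ICU's Katakana-Hiragana transliterator handles.

```python
# Katakana ァ..ヶ (U+30A1..U+30F6) sits exactly 0x60 above hiragana ぁ..ゖ.
KATAKANA_START, KATAKANA_END = 0x30A1, 0x30F6
OFFSET = 0x60


def katakana_to_hiragana(text: str) -> str:
    """Simplified stand-in for ICU's Katakana-Hiragana (assumption, not the plugin's code)."""
    return "".join(
        chr(ord(ch) - OFFSET) if KATAKANA_START <= ord(ch) <= KATAKANA_END else ch
        for ch in text
    )


def output_key(key_name: str, suffix: str = "") -> str:
    """Name of the output column for one setting: key name plus optional suffix."""
    return key_name + suffix


print(output_key("catchcopy"), katakana_to_hiragana("ホゲホゲ"))
print(output_key("catchcopy", "_romaji_lower"))
```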

Transliterator rules

For a list of transliterator rules, see http://hondou.homedns.org/pukiwiki/pukiwiki.php?Java%20ICU4J

Build

$ ./gradlew gem  # -t to watch change of files and rebuild continuously
