toyama0919/embulk-filter-icu4j

Icu4j filter plugin for Embulk

Normalizes Unicode string values.

This plugin is built on ICU4J. See http://site.icu-project.org/

Overview

  • Plugin type: filter

Configuration

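  • key_names: names of the columns to transform. (list, required)
  • keep_input: whether to keep the input columns in the output. (bool, default: true)
  • settings: transformations to apply to each target column. As the examples below show, an entry can set suffix, transliterators, and case. (list, required)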

Example: NFKC normalization

filters:
  - type: icu4j
    key_names:
      - title
    settings:
      - { transliterators: 'Any-NFKC', case: upper }
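
For readers unfamiliar with NFKC, the following is a minimal sketch of what this setting does to a value, written with Python's standard `unicodedata` module rather than ICU4J. The function name `normalize_nfkc_upper` is hypothetical, and applying the case conversion after normalization is an assumption, not something this README states about the plugin.

```python
import unicodedata


def normalize_nfkc_upper(text):
    """Approximate 'Any-NFKC' followed by 'case: upper' (order is an assumption)."""
    # NFKC folds compatibility characters (e.g. fullwidth letters) to canonical forms.
    normalized = unicodedata.normalize("NFKC", text)
    return normalized.upper()


# Fullwidth Latin letters become ASCII, then are upper-cased.
title = "ｅｍｂｕｌｋ"
print(normalize_nfkc_upper(title))
```

This stand-in covers only the normalization step. ICU4J transliterator IDs such as 'Katakana-Hiragana' have no equivalent in the Python standard library.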

Example: multiple settings with suffixes

filters:
  - type: icu4j
    keep_input: false
    key_names:
      - catchcopy
    settings:
      - { suffix: _katakana, transliterators: 'Katakana-Hiragana,Fullwidth-Halfwidth', case: upper }
      - { transliterators: 'Katakana-Hiragana', case: lower }
      - { suffix: _romaji_lower, transliterators: 'Katakana-Hiragana,Hiragana-Latin', case: lower }

Input

{
    "catchcopy" : "ホゲホゲ"
}

Output

{
    "catchcopy" : "ほげほげ",
    "catchcopy_katakana" : "ホゲホゲ",
    "catchcopy_romaji_lower" : "hogehoge"
}
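
The way output keys are named in this example can be sketched as follows. This is an illustration inferred from the example only, not the plugin's code: `apply_settings`, `katakana_to_hiragana`, and the `steps` field are hypothetical, and the code-point shift is a rough stand-in for ICU4J's 'Katakana-Hiragana' transliterator. It assumes each setting writes to the key name plus its suffix, and that a setting without a suffix writes back to the original key.

```python
# Offset between the katakana block (U+30A1..U+30F6) and the hiragana block.
KATAKANA_START, KATAKANA_END, OFFSET = 0x30A1, 0x30F6, 0x60


def katakana_to_hiragana(text):
    """Rough stand-in for 'Katakana-Hiragana': shift code points by a fixed offset."""
    return "".join(
        chr(ord(ch) - OFFSET) if KATAKANA_START <= ord(ch) <= KATAKANA_END else ch
        for ch in text
    )


def apply_settings(record, key_names, settings, keep_input=True):
    """Build output columns named key + suffix (assumed naming rule)."""
    if keep_input:
        output = dict(record)
    else:
        # Assumption: target columns are dropped unless a setting rewrites them.
        output = {k: v for k, v in record.items() if k not in key_names}
    for key in key_names:
        for setting in settings:
            value = record[key]
            for step in setting["steps"]:
                value = step(value)
            output[key + setting.get("suffix", "")] = value
    return output


record = {"catchcopy": "ホゲホゲ"}
settings = [
    {"suffix": "_hiragana", "steps": [katakana_to_hiragana]},
    {"steps": [katakana_to_hiragana]},  # no suffix: writes to "catchcopy"
]
result = apply_settings(record, ["catchcopy"], settings, keep_input=False)
print(result)
```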

Transliterator rules

See http://hondou.homedns.org/pukiwiki/pukiwiki.php?Java%20ICU4J

Build

$ ./gradlew gem  # add -t to watch for file changes and rebuild continuously
