Skip to content

yskn67/embulk-filter-split_column

master
Switch branches/tags

Name already in use

A tag already exists with the provided branch name. Many Git commands accept both tag and branch names, so creating this branch may cause unexpected behavior. Are you sure you want to create this branch?
Code

Latest commit

 

Git stats

Files

Permalink
Failed to load latest commit information.
Type
Name
Latest commit message
Commit time
src
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Split Column filter plugin for Embulk

A filter plugin for Embulk to split one string column to several any type columns.

Configuration

  • delimiter: delimiter for split column (string, required, default: ',')
  • is_skip: if true, skip the line when output_columns num and split target column num are not matched. if false, throw the exception (boolean, optional, default: true)
  • target_key: string column key you want to split(string, required)
  • output_columns: description (array of hash, required)
    • This values is same for columns in parser

Example

Say input.csv is as follows:

id,account,time,purchase,comment
1,32864,2015-01-27 19:23:49,20150127,a|1|1.1|True|2016-01-26
2,14824,2015-01-27 19:01:23,20150127,b|2|2.2|False|2016-01-27
3,27559,2015-01-28 02:20:02,20150128,c|3|3.3|False|2016-01-28
4,11270,2015-01-29 11:54:36,20150129,d|4|4.4|True|2016-01-29

In parse phase, split by ',':

parser:
  type: csv
  delimiter: ','
  columns:
  - {name: id, type: long}
  - {name: account, type: long}
  - {name: time, type: timestamp, format: '%Y-%m-%d %H:%M:%S'}
  - {name: purchase, type: timestamp, format: '%Y%m%d'}
  - {name: comment, type: string}
+---------+--------------+-------------------------+-------------------------+--------------------------+
| id:long | account:long |          time:timestamp |      purchase:timestamp |           comment:string |
+---------+--------------+-------------------------+-------------------------+--------------------------+
|       1 |       32,864 | 2015-01-27 19:23:49 UTC | 2015-01-27 00:00:00 UTC |  a|1|1.1|True|2016-01-26 |
|       2 |       14,824 | 2015-01-27 19:01:23 UTC | 2015-01-27 00:00:00 UTC | b|2|2.2|False|2016-01-27 |
|       3 |       27,559 | 2015-01-28 02:20:02 UTC | 2015-01-28 00:00:00 UTC | c|3|3.3|False|2016-01-28 |
|       4 |       11,270 | 2015-01-29 11:54:36 UTC | 2015-01-29 00:00:00 UTC |  d|4|4.4|True|2016-01-29 |
+---------+--------------+-------------------------+-------------------------+--------------------------+

In additionally, you want to split comment by '|' using split_column filter:

filters:
  - type: split_column
    delimiter: '|'
    is_skip: true
    target_key: comment
    output_columns:
      - {name: alph, type: string}
      - {name: num, type: long}
      - {name: dbl, type: double}
      - {name: bool, type: boolean}
      - {name: ts, type: timestamp, format: '%Y-%m-%d'}
+---------+--------------+-------------------------+-------------------------+-------------+----------+------------+--------------+-------------------------+
| id:long | account:long |          time:timestamp |      purchase:timestamp | alph:string | num:long | dbl:double | bool:boolean |            ts:timestamp |
+---------+--------------+-------------------------+-------------------------+-------------+----------+------------+--------------+-------------------------+
|       1 |       32,864 | 2015-01-27 19:23:49 UTC | 2015-01-27 00:00:00 UTC |           a |        1 |        1.1 |         true | 2016-01-26 00:00:00 UTC |
|       2 |       14,824 | 2015-01-27 19:01:23 UTC | 2015-01-27 00:00:00 UTC |           b |        2 |        2.2 |        false | 2016-01-27 00:00:00 UTC |
|       3 |       27,559 | 2015-01-28 02:20:02 UTC | 2015-01-28 00:00:00 UTC |           c |        3 |        3.3 |        false | 2016-01-28 00:00:00 UTC |
|       4 |       11,270 | 2015-01-29 11:54:36 UTC | 2015-01-29 00:00:00 UTC |           d |        4 |        4.4 |         true | 2016-01-29 00:00:00 UTC |
+---------+--------------+-------------------------+-------------------------+-------------+----------+------------+--------------+-------------------------+

Todo

  • Write Test
  • Support default value

Version

  • 0.1.0: first release
  • 0.1.1: bugfix
  • 0.1.2: add confing option 'is_skip'
  • 0.1.3: add failed log to exception msg

Build

$ ./gradlew gem

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published