Embulk parser plugin for URL-encoded key value pairs

Query String parser plugin for Embulk

Transforms a line such as key=value&key2=value2 into {key: "value", key2: "value2"} (HTTP query string to Hash).

Currently, this plugin supports only the minimal case. The following edge cases are unsupported:

  • Duplicated key (e.g. key=1&key=2)
  • Array parameter (e.g. key[]=1&key[]=2)
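
To make the transformation concrete, here is a minimal sketch using Ruby's standard library. It is not the plugin's actual code: the helper name `query_string_to_hash` is hypothetical, and the use of `URI.decode_www_form` is an assumption chosen for illustration. It also shows why a naive key-to-value mapping cannot represent duplicated keys or array parameters.

```ruby
require "uri"

# Hypothetical helper for illustration; not taken from the plugin source.
# URI.decode_www_form returns [key, value] pairs, and Array#to_h keeps the
# last value when the same key appears more than once.
def query_string_to_hash(line)
  URI.decode_www_form(line).to_h
end

basic      = query_string_to_hash("key=value&key2=value2")
duplicated = query_string_to_hash("key=1&key=2")
array_like = query_string_to_hash("key[]=1&key[]=2")

puts basic["key"]        # value
puts basic["key2"]       # value2
puts duplicated["key"]   # 2 (the first value, 1, is lost)
puts array_like["key[]"] # 2 (the literal key is "key[]", and 1 is lost)
```

With a flat hash, only one value per key survives, which is one plausible reason the two cases above fall outside the minimal case.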

Overview

Requires Embulk >= 0.7.2.

NOTE: If you use Embulk < 0.7, use embulk-parser-query_string <= 0.1.3.

  • Plugin type: parser
  • Guess supported: yes

Configuration

  • strip_quote: If the lines in your file are quoted, such as "foo=FOO&bar=BAR", set this to true to strip the quotes. (bool, default: true)
  • strip_whitespace: Strip whitespace from each line before parsing, so that indented lines such as ' foo=FOO' are parsed correctly. (bool, default: true)
  • capture: Capture the relevant text from each line using a Regexp. The first capture group (a.k.a. $1) is used. See also partial-config.yml. (string, default: nil)
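
The three options can be pictured as a small preprocessing step applied to each line before it is parsed. The sketch below is an assumption-laden illustration, not the plugin's implementation: the function name `preprocess_line`, the order in which the options are applied, the handling of only double quotes, and returning nil when the pattern does not match are all hypothetical.

```ruby
# Hypothetical preprocessing sketch; option names mirror the configuration
# keys above, but the order of the steps is an assumption.
def preprocess_line(line, strip_quote: true, strip_whitespace: true, capture: nil)
  # strip_whitespace: remove surrounding whitespace such as the indent in ' foo=FOO'
  line = line.strip if strip_whitespace

  # strip_quote: drop one pair of enclosing double quotes, if present
  if strip_quote && line.length >= 2 && line.start_with?('"') && line.end_with?('"')
    line = line[1..-2]
  end

  # capture: keep only the first capture group ($1) of the given pattern
  if capture
    match = Regexp.new(capture).match(line)
    return nil unless match && match[1] # assumed behavior for non-matching lines
    line = match[1]
  end

  line
end

puts preprocess_line(' foo=FOO')                                   # foo=FOO
puts preprocess_line('"foo=FOO&bar=BAR"')                          # foo=FOO&bar=BAR
puts preprocess_line('GET /path?foo=FOO HTTP/1.1', capture: '\?(\S+)') # foo=FOO
```

The last call shows the kind of input capture is meant for: a line where the query string is embedded in other text, so a pattern is needed to pick it out.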

Example

Suppose you have a text file (target_file.txt) like this:

"user_id=42&some_param=ABC"
"user_id=43&some_param=EFG"
"user_id=44&some_param=XYZ"

And a partial-config.yml like this:

in:
  type: file
  path_prefix: ./target_file
  parser:
    strip_quote: true
    strip_whitespace: true
exec: {}
out: {type: stdout}

Run embulk guess.

$ embulk guess -g query_string partial-config.yml -o guessed.yml

This produces guessed.yml as below:

in:
  type: file
  path_prefix: ./target_file
  parser:
    strip_quote: true
    strip_whitespace: true
    charset: ISO-8859-2
    newline: CRLF
    type: query_string
    columns:
    - {name: user_id, type: long}
    - {name: some_param, type: string}
exec: {}
out: {type: stdout}
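
The columns section above is inferred from the sample lines. As a rough illustration of how such an inference could work, here is a simplified sketch. It is not how the plugin or Embulk's guess actually decides types: the function name `guess_columns` and the integer-only rule are hypothetical, and the real guess also handles other types.

```ruby
require "uri"

# Hypothetical, simplified type inference for illustration only.
# A column is guessed as "long" when every sampled value is an integer
# literal, and as "string" otherwise.
def guess_columns(lines)
  rows = lines.map { |line| URI.decode_www_form(line).to_h }
  names = rows.flat_map(&:keys).uniq

  names.map do |name|
    values = rows.map { |row| row[name] }.compact
    type = values.all? { |v| v.match?(/\A-?\d+\z/) } ? "long" : "string"
    { name: name, type: type }
  end
end

sample = [
  "user_id=42&some_param=ABC",
  "user_id=43&some_param=EFG",
  "user_id=44&some_param=XYZ",
]

guess_columns(sample).each do |column|
  puts "#{column[:name]}: #{column[:type]}"
end
# user_id: long
# some_param: string
```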

Finally, run embulk with the generated guessed.yml.

$ embulk run guessed.yml

You can see the parsed records on STDOUT.

Install plugin

$ embulk gem install embulk-parser-query_string