Elasticsearch input plugin for Embulk

Overview

  • Plugin type: input
  • Resume supported: yes
  • Cleanup supported: yes
  • Guess supported: no

Configuration

  • nodes: Elasticsearch nodes to connect to (array, required)
    • host: node host name (string, required)
    • port: node port (integer, required)
  • queries: Lucene query strings to run (array, required)
  • index: index to search (string, required)
  • index_type: document type to search (string)
  • request_timeout: request timeout (integer)
  • per_size: number of documents fetched per search request (integer, required, default: 1000)
  • limit_size: maximum number of documents fetched per query (integer, default: unlimited)
  • num_threads: number of threads used to run the queries (integer, default: 1)
  • retry_on_failure: number of retries on failure; 0 means retry forever (integer, default: 5)
  • sort: sort order, given as a hash of field name to asc or desc (hash, default: nil)
  • scroll: how long to keep the search context alive between scroll requests (string, default: '1m')
  • fields: fields to read from each hit (array, required)
    • name: field name (string, required)
    • type: field type, one of the supported types (string, required)
    • metadata: set to true when the field is hit metadata such as _id, _type, _index, or _score rather than a document field (boolean, default: false)
    • time_format: time format for timestamp fields (string)
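
The tuning options above (scroll, retry_on_failure, request_timeout, per_size, limit_size) can be combined as in the following sketch. The values are illustrative assumptions rather than recommendations, and the index and field names are borrowed from the example in this README.

```yaml
in:
  type: elasticsearch
  nodes:
    - {host: localhost, port: 9200}
  queries:
    - 'page_type: HP'
  index: crawl
  request_timeout: 60     # illustrative value
  per_size: 500           # smaller batches per search request (illustrative)
  limit_size: 10000       # stop after 10000 documents per query (illustrative)
  retry_on_failure: 3     # give up after 3 retries; 0 would retry forever
  scroll: '5m'            # keep the search context longer than the '1m' default
  fields:
    - { name: _id, type: string, metadata: true }
    - { name: title, type: string }
```

A longer scroll value may help when each batch takes a long time to process downstream, since the search context has to stay alive between consecutive requests.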

Example

in:
  type: elasticsearch
  nodes:
    - {host: localhost, port: 9200}
  queries:
    - 'page_type: HP'
    - 'page_type: GP'
  index: crawl
  index_type: m_corporation_page
  request_timeout: 60
  per_size: 1000
  limit_size: 200000
  num_threads: 2
  sort:
    m_corporation_id: desc
    employee_range: asc
  fields:
    - { name: _id, type: string, metadata: true }
    - { name: _type, type: string, metadata: true }
    - { name: _index, type: string, metadata: true }
    - { name: _score, type: double, metadata: true }
    - { name: page_type, type: string }
    - { name: corp_name, type: string }
    - { name: corp_key, type: string }
    - { name: title, type: string }
    - { name: body, type: string }
    - { name: url, type: string }
    - { name: employee_range, type: long }
    - { name: m_corporation_id, type: long }
    - { name: cg_lv1, type: json }
    - { name: cg_lv2, type: json }
    - { name: cg_lv3, type: json }
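
The example above shows only the in: section. To try it end to end, it needs an out: section as well; a minimal sketch, assuming Embulk's built-in stdout output plugin and a reduced field list, could look like this:

```yaml
in:
  type: elasticsearch
  nodes:
    - {host: localhost, port: 9200}
  queries:
    - 'page_type: HP'
  index: crawl
  per_size: 1000
  fields:
    - { name: _id, type: string, metadata: true }
    - { name: page_type, type: string }
    - { name: title, type: string }
out:
  type: stdout   # prints each record to the console; swap in any other output plugin
```

Saved as config.yml (a hypothetical file name), this would typically be checked with embulk preview config.yml before a full embulk run config.yml.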

Supported types

  • string
  • long
  • double
  • timestamp
  • json
  • boolean
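
The timestamp and boolean types do not appear in the example configuration. The fragment below sketches how such fields might be declared. The field names created_at and is_active are hypothetical, and the strptime-style format string is an assumption, since this README does not document the time_format syntax.

```yaml
  fields:
    # hypothetical timestamp field; the format string syntax is assumed
    - { name: created_at, type: timestamp, time_format: '%Y-%m-%dT%H:%M:%S' }
    # hypothetical boolean field
    - { name: is_active, type: boolean }
```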

Test

Setup

curl -o embulk.jar --create-dirs -L "http://dl.embulk.org/embulk-latest.jar"
chmod +x embulk.jar
./embulk.jar gem install bundler
./embulk.jar bundle install --path vendor/bundle

Run tests

./embulk.jar bundle exec rake test

Build

$ rake

About

Elasticsearch input plugin for Embulk, with parallel query support.
