
Azure Search output plugin for Embulk

embulk-output-azuresearch is an Embulk output plugin that dumps records to Azure Search. Embulk is an open-source bulk data loader that helps transfer data between various databases, storages, file formats, and cloud services. See the Embulk documentation for details.

Overview

  • Plugin type: output
  • Load all or nothing: no
  • Resume supported: no
  • Cleanup supported: yes

Installation

$ embulk gem install embulk-output-azuresearch

Configuration

Azure Search

To use Microsoft Azure Search, you must create an Azure Search service in the Azure Portal. You also need an index, the persisted document store to which embulk-output-azuresearch writes records. A sample index schema follows:

Sample Index Schema: sampleindex01

{
    "name": "sampleindex01",
    "fields": [
        { "name":"id", "type":"Edm.String", "key": true, "searchable": false },
        { "name":"title", "type":"Edm.String", "analyzer":"en.microsoft" },
        { "name":"speakers", "type":"Edm.String" },
        { "name":"url", "type":"Edm.String", "searchable": false, "filterable":false, "sortable":false, "facetable":false },
        { "name":"text", "type":"Edm.String", "filterable":false, "sortable":false, "facetable":false, "analyzer":"en.microsoft" }
    ]
}
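
If you prefer to script index creation rather than clicking through the Azure Portal, the schema above can be created with a single REST call. The snippet below is a minimal sketch in Python using the requests library; it is not part of this plugin, and the endpoint, admin key, and api-version shown are placeholders and assumptions you would replace with your own values.

import json
import requests

endpoint = "https://yoichikademo.search.windows.net"  # your Azure Search service endpoint
api_key = "YOUR-ADMIN-API-KEY"                        # admin key from the Azure Portal
api_version = "2016-09-01"                            # assumed api-version; use one your service supports

index_schema = {
    "name": "sampleindex01",
    "fields": [
        {"name": "id", "type": "Edm.String", "key": True, "searchable": False},
        {"name": "title", "type": "Edm.String", "analyzer": "en.microsoft"},
        {"name": "speakers", "type": "Edm.String"},
        {"name": "url", "type": "Edm.String", "searchable": False,
         "filterable": False, "sortable": False, "facetable": False},
        {"name": "text", "type": "Edm.String", "filterable": False,
         "sortable": False, "facetable": False, "analyzer": "en.microsoft"},
    ],
}

# PUT /indexes/{name} creates the index (or updates its definition).
resp = requests.put(
    f"{endpoint}/indexes/{index_schema['name']}",
    params={"api-version": api_version},
    headers={"Content-Type": "application/json", "api-key": api_key},
    data=json.dumps(index_schema),
)
resp.raise_for_status()
print(resp.status_code)  # 201 Created on first creation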

Embulk Configuration (config.yml)

out:
  type: azuresearch
  endpoint: https://yoichikademo.search.windows.net
  api_key:  9E55964F8254BB4504DX3F66A39AF5EB
  search_index: sampleindex01
  column_names: id,title,speakers,text,url
  key_names: id,title,speakers,description,link

  • endpoint (required) - Azure Search service endpoint URI
  • api_key (required) - Azure Search API key
  • search_index (required) - Azure Search index name to insert records into
  • column_names (required) - Column names in the target Azure Search index. Separate names with commas.
  • key_names (optional) - Default: nil. Key names in the incoming record to insert. Separate names with commas. By default, key_names is the same as column_names (see the mapping sketch below).
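
Conceptually, the plugin pairs each entry of key_names with the column_names entry at the same position: the value is read from the incoming record under the key name and written to the index field named by the column name. The following is an illustrative Python sketch of that mapping, not the plugin's actual implementation:

# Illustrative sketch of the column_names / key_names mapping; not the plugin's actual code.
column_names = ["id", "title", "speakers", "text", "url"]       # fields in the Azure Search index
key_names = ["id", "title", "speakers", "description", "link"]  # keys in the incoming record

record = {
    "id": "1",
    "title": "Moving to the Cloud",
    "speakers": "Narayan Annamalai",
    "description": "Benefits of moving your applications to cloud",
    "link": "https://s.ch9.ms/Events/Build/2016/P576",
}

# Each index column takes its value from the record key at the same position.
doc = {column: record[key] for column, key in zip(column_names, key_names)}
doc["@search.action"] = "mergeOrUpload"  # the action used in the request bodies shown below
print(doc)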

Sample Configurations

(1) Case: column_names and key_names are same

Suppose you have the following config.yml and the sample Azure Search index schema shown above:

config.yml

in:
  type: file
  path_prefix: samples/sample_01.csv
  parser:
    charset: UTF-8
    newline: CRLF
    type: csv
    delimiter: ','
    quote: '"'
    escape: '"'
    null_string: 'NULL'
    trim_if_not_quoted: false
    skip_header_lines: 1
    allow_extra_columns: false
    allow_optional_columns: false
    columns:
    - {name: id, type: string}
    - {name: title, type: string}
    - {name: speakers, type: string}
    - {name: text, type: string}
    - {name: url, type: string}
out:
  type: azuresearch
  endpoint: https://yoichikademo.search.windows.net
  api_key:  9E55964F8254BBXX04D53F66A39AF5EB
  search_index: sampleindex01
  column_names: id,title,speakers,text,url

The plugin will dump records out to Azure Search like this:

Input CSV

id,title,speakers,text,url
1,Moving to the Cloud,Narayan Annamalai,Benefits of moving your applications to cloud,https://s.ch9.ms/Events/Build/2016/P576
2,Building Big Data Applications using Spark and Hadoop,Maxim Lukiyanov,How to leverage Spark to build intelligence into your application,https://s.ch9.ms/Events/Build/2016/P420
3,Service Fabric Deploying and Managing Applications with Service Fabric,Chacko Daniel,Service Fabric deploys and manages distributed applications built as microservices,https://s.ch9.ms/Events/Build/2016/P431

Output JSON Body to Azure Search

{"value":
    [
        {"id":"1","title":"Moving to the Cloud","speakers":"Narayan Annamalai","text":"Benefits of moving your applications to cloud","url":"https://s.ch9.ms/Events/Build/2016/P576","@search.action":"mergeOrUpload"},
        {"id":"2","title":"Building Big Data Applications using Spark and Hadoop","speakers":"Maxim Lukiyanov","text":"How to leverage Spark to build intelligence into your application","url":"https://s.ch9.ms/Events/Build/2016/P420","@search.action":"mergeOrUpload"},
        {"id":"3","title":"Service Fabric Deploying and Managing Applications with Service Fabric","speakers":"Chacko Daniel","text":"Service Fabric deploys and manages distributed applications built as microservices","url":"https://s.ch9.ms/Events/Build/2016/P431","@search.action":"mergeOrUpload"}
    ]
}
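
This is the request body for Azure Search's document-indexing REST call (presumably what the plugin issues under the hood). If you want to reproduce or debug such an upload by hand, the following Python sketch posts an equivalent batch with the requests library; it is not the plugin's code, and the api-version value is an assumption.

import requests

endpoint = "https://yoichikademo.search.windows.net"
api_key = "YOUR-API-KEY"
search_index = "sampleindex01"
api_version = "2016-09-01"  # assumed api-version

batch = {
    "value": [
        {"id": "1", "title": "Moving to the Cloud",
         "speakers": "Narayan Annamalai",
         "text": "Benefits of moving your applications to cloud",
         "url": "https://s.ch9.ms/Events/Build/2016/P576",
         "@search.action": "mergeOrUpload"},
    ]
}

# POST /indexes/{index}/docs/index accepts a batch of document actions.
resp = requests.post(
    f"{endpoint}/indexes/{search_index}/docs/index",
    params={"api-version": api_version},
    headers={"Content-Type": "application/json", "api-key": api_key},
    json=batch,
)
resp.raise_for_status()
print(resp.json())  # per-document status for each action in the batch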

(2) Case: column_names and key_names are NOT same

Suppose you have the following config.yml and the sample Azure Search index schema shown above:

config.yml

in:
  type: file
  path_prefix: samples/sample_01.csv
  parser:
    charset: UTF-8
    newline: CRLF
    type: csv
    delimiter: ','
    quote: '"'
    escape: '"'
    null_string: 'NULL'
    trim_if_not_quoted: false
    skip_header_lines: 1
    allow_extra_columns: false
    allow_optional_columns: false
    columns:
    - {name: id, type: string}
    - {name: title, type: string}
    - {name: speakers, type: string}
    - {name: description, type: string}
    - {name: link, type: string}
out:
  type: azuresearch
  endpoint: https://yoichikademo.search.windows.net
  api_key:  9E55964F8254BBXX04D53F66A39AF5EB
  search_index: sampleindex01
  column_names: id,title,speakers,text,url
  key_names: id,title,speakers,description,link

The plugin will dump records out to Azure Search like this:

Input CSV

id,title,speakers,description,link
1,Moving to the Cloud,Narayan Annamalai,Benefits of moving your applications to cloud,https://s.ch9.ms/Events/Build/2016/P576
2,Building Big Data Applications using Spark and Hadoop,Maxim Lukiyanov,How to leverage Spark to build intelligence into your application,https://s.ch9.ms/Events/Build/2016/P420
3,Service Fabric Deploying and Managing Applications with Service Fabric,Chacko Daniel,Service Fabric deploys and manages distributed applications built as microservices,https://s.ch9.ms/Events/Build/2016/P431

Output JSON Body to Azure Search

{"value":
    [
        {"id":"1","title":"Moving to the Cloud","speakers":"Narayan Annamalai","text":"Benefits of moving your applications to cloud","url":"https://s.ch9.ms/Events/Build/2016/P576","@search.action":"mergeOrUpload"},
        {"id":"2","title":"Building Big Data Applications using Spark and Hadoop","speakers":"Maxim Lukiyanov","text":"How to leverage Spark to build intelligence into your application","url":"https://s.ch9.ms/Events/Build/2016/P420","@search.action":"mergeOrUpload"},
        {"id":"3","title":"Service Fabric Deploying and Managing Applications with Service Fabric","speakers":"Chacko Daniel","text":"Service Fabric deploys and manages distributed applications built as microservices","url":"https://s.ch9.ms/Events/Build/2016/P431","@search.action":"mergeOrUpload"}
    ]
}
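
After a run, you can confirm the documents are searchable with a quick query against the index. Here is a minimal Python sketch using the Azure Search REST API; it is not part of the plugin, and the api-version value is an assumption.

import requests

endpoint = "https://yoichikademo.search.windows.net"
api_key = "YOUR-QUERY-OR-ADMIN-KEY"
search_index = "sampleindex01"
api_version = "2016-09-01"  # assumed api-version

# GET /indexes/{index}/docs runs a search query against the index.
resp = requests.get(
    f"{endpoint}/indexes/{search_index}/docs",
    params={"api-version": api_version, "search": "Spark", "$select": "id,title,speakers"},
    headers={"api-key": api_key},
)
resp.raise_for_status()
for doc in resp.json()["value"]:
    print(doc["id"], doc["title"])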

Build, Install, and Run

$ rake

$ embulk gem install pkg/embulk-output-azuresearch-0.1.0.gem

$ embulk preview config.yml

$ embulk run config.yml

Change log

Links

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/yokawasa/embulk-output-azuresearch.

Copyright

Copyright (c) 2016- Yoichi Kawasaki
License: MIT