embulk-output-azuresearch is an Embulk output plugin that dumps records to Azure Search. Embulk is an open-source bulk data loader that helps data transfer between various databases, storages, file formats, and cloud services. See the Embulk documentation for details.
- Plugin type: output
- Load all or nothing: no
- Resume supported: no
- Cleanup supported: yes
$ embulk gem install embulk-output-azuresearch
To use Microsoft Azure Search, you must create an Azure Search service in the Azure Portal. You also need an index, the persistent store of documents to which embulk-output-azuresearch writes records. The sample index schema below is used in the configurations that follow:
Sample Index Schema: sampleindex01
{
  "name": "sampleindex01",
  "fields": [
    { "name":"id", "type":"Edm.String", "key": true, "searchable": false },
    { "name":"title", "type":"Edm.String", "analyzer":"en.microsoft" },
    { "name":"speakers", "type":"Edm.String" },
    { "name":"url", "type":"Edm.String", "searchable": false, "filterable":false, "sortable":false, "facetable":false },
    { "name":"text", "type":"Edm.String", "filterable":false, "sortable":false, "facetable":false, "analyzer":"en.microsoft" }
  ]
}
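You can create the index from the Azure Portal or push the schema with the Azure Search REST API. The following is only a minimal sketch: the api-version value and the sampleindex01.json file name (the schema above saved to a local file) are assumptions, so adjust them to your service.

# Sketch only: create the index via the REST API (api-version and file name are assumptions)
$ curl -X PUT "https://yoichikademo.search.windows.net/indexes/sampleindex01?api-version=2016-09-01" \
    -H "Content-Type: application/json" \
    -H "api-key: <YOUR ADMIN API KEY>" \
    -d @sampleindex01.json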
out:
  type: azuresearch
  endpoint: https://yoichikademo.search.windows.net
  api_key: 9E55964F8254BB4504DX3F66A39AF5EB
  search_index: sampleindex01
  column_names: id,title,speakers,text,url
  key_names: id,title,speakers,description,link
- endpoint (required) - Azure Search service endpoint URI
- api_key (required) - Azure Search API key
- search_index (required) - Name of the Azure Search index to insert records into
- column_names (required) - Column names in the target Azure Search index, separated by commas
- key_names (optional) - Default: nil. Key names in the incoming records to insert, separated by commas. By default, key_names is the same as column_names; see the second example below for a case where they differ.
Suppose you have the following config.yml and the sample Azure Search index schema shown above:
config.yml
in:
  type: file
  path_prefix: samples/sample_01.csv
  parser:
    charset: UTF-8
    newline: CRLF
    type: csv
    delimiter: ','
    quote: '"'
    escape: '"'
    null_string: 'NULL'
    trim_if_not_quoted: false
    skip_header_lines: 1
    allow_extra_columns: false
    allow_optional_columns: false
    columns:
    - {name: id, type: string}
    - {name: title, type: string}
    - {name: speakers, type: string}
    - {name: text, type: string}
    - {name: url, type: string}
out:
  type: azuresearch
  endpoint: https://yoichikademo.search.windows.net
  api_key: 9E55964F8254BBXX04D53F66A39AF5EB
  search_index: sampleindex01
  column_names: id,title,speakers,text,url
The plugin will dump the records to Azure Search like this:
Input CSV
id,title,speakers,text,url
1,Moving to the Cloud,Narayan Annamalai,Benefits of moving your applications to cloud,https://s.ch9.ms/Events/Build/2016/P576
2,Building Big Data Applications using Spark and Hadoop,Maxim Lukiyanov,How to leverage Spark to build intelligence into your application,https://s.ch9.ms/Events/Build/2016/P420
3,Service Fabric Deploying and Managing Applications with Service Fabric,Chacko Daniel,Service Fabric deploys and manages distributed applications built as microservices,https://s.ch9.ms/Events/Build/2016/P431
Output JSON Body to Azure Search
{"value":
[
{"id":"1","title":"Moving to the Cloud","speakers":"Narayan Annamalai","text":"Benefits of moving your applications to cloud","url":"https://s.ch9.ms/Events/Build/2016/P576","@search.action":"mergeOrUpload"},
{"id":"2","title":"Building Big Data Applications using Spark and Hadoop","speakers":"Maxim Lukiyanov","text":"How to leverage Spark to build intelligence into your application","url":"https://s.ch9.ms/Events/Build/2016/P420","@search.action":"mergeOrUpload"},
{"id":"3","title":"Service Fabric Deploying and Managing Applications with Service Fabric","speakers":"Chacko Daniel","text":"Service Fabric deploys and manages distributed applications built as microservices","url":"https://s.ch9.ms/Events/Build/2016/P431","@search.action":"mergeOrUpload"}
]
}
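For reference, a body like the one above goes to Azure Search's documents index API, so a hand-rolled equivalent of the upload could be posted with a call like the sketch below; the api-version value and the body.json file name (the JSON above saved locally) are assumptions.

# Sketch only: post the JSON body above to the documents index API (api-version and file name are assumptions)
$ curl -X POST "https://yoichikademo.search.windows.net/indexes/sampleindex01/docs/index?api-version=2016-09-01" \
    -H "Content-Type: application/json" \
    -H "api-key: <YOUR API KEY>" \
    -d @body.json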
Now suppose the keys in your incoming records differ from the index column names. With the same index schema as above and the following config.yml, key_names maps the incoming keys (description, link) to the index columns (text, url):
config.yml
in:
  type: file
  path_prefix: samples/sample_01.csv
  parser:
    charset: UTF-8
    newline: CRLF
    type: csv
    delimiter: ','
    quote: '"'
    escape: '"'
    null_string: 'NULL'
    trim_if_not_quoted: false
    skip_header_lines: 1
    allow_extra_columns: false
    allow_optional_columns: false
    columns:
    - {name: id, type: string}
    - {name: title, type: string}
    - {name: speakers, type: string}
    - {name: description, type: string}
    - {name: link, type: string}
out:
  type: azuresearch
  endpoint: https://yoichikademo.search.windows.net
  api_key: 9E55964F8254BBXX04D53F66A39AF5EB
  search_index: sampleindex01
  column_names: id,title,speakers,text,url
  key_names: id,title,speakers,description,link
The plugin will dump the records to Azure Search like this:
Input CSV
id,title,speakers,description,link
1,Moving to the Cloud,Narayan Annamalai,Benefits of moving your applications to cloud,https://s.ch9.ms/Events/Build/2016/P576
2,Building Big Data Applications using Spark and Hadoop,Maxim Lukiyanov,How to leverage Spark to build intelligence into your application,https://s.ch9.ms/Events/Build/2016/P420
3,Service Fabric Deploying and Managing Applications with Service Fabric,Chacko Daniel,Service Fabric deploys and manages distributed applications built as microservices,https://s.ch9.ms/Events/Build/2016/P431
Output JSON Body to Azure Search
{"value":
[
{"id":"1","title":"Moving to the Cloud","speakers":"Narayan Annamalai","text":"Benefits of moving your applications to cloud","url":"https://s.ch9.ms/Events/Build/2016/P576","@search.action":"mergeOrUpload"},
{"id":"2","title":"Building Big Data Applications using Spark and Hadoop","speakers":"Maxim Lukiyanov","text":"How to leverage Spark to build intelligence into your application","url":"https://s.ch9.ms/Events/Build/2016/P420","@search.action":"mergeOrUpload"},
{"id":"3","title":"Service Fabric Deploying and Managing Applications with Service Fabric","speakers":"Chacko Daniel","text":"Service Fabric deploys and manages distributed applications built as microservices","url":"https://s.ch9.ms/Events/Build/2016/P431","@search.action":"mergeOrUpload"}
]
}
$ rake                                                         # build the gem into pkg/
$ embulk gem install pkg/embulk-output-azuresearch-0.1.0.gem   # install the built plugin gem
$ embulk preview config.yml                                    # preview the records to be loaded
$ embulk run config.yml                                        # run the bulk load
- http://yokawasa.github.io/embulk-output-azuresearch
- https://rubygems.org/gems/embulk-output-azuresearch/
- http://unofficialism.info/posts/embulk-plugins-for-microsoft-azure-services/
Bug reports and pull requests are welcome on GitHub at https://github.com/yokawasa/embulk-output-azuresearch.
| Copyright | Copyright (c) 2016- Yoichi Kawasaki |
|:---|:---|
| License | MIT |