Are you running Elasticsearch? Want to take your data and get the heck outta Dodge? Blaze provides everything you need in a neat, blazing fast package!
- Uses the Elasticsearch sliced scroll API to get your data hella fast.
- Written in modern C++ using libcurl and RapidJSON.
- Distributed as a single, tiny binary.
Blaze was compared to other Elasticsearch dump tools on an index with ~3.5M rows,
~5GB in size. Each tool was run under time to measure how long it takes to write
a simple JSON dump file.
Get the binary for your platform from the Releases page or compile it yourself.
If you use it often it might make sense to put it in your
$ blaze --host=http://localhost:9200 --index=massive_1 > dump.ndjson
This connects to Elasticsearch on the specified host and starts writing the
massive_1 index to stdout. Make sure to redirect the output somewhere, such as
a JSON file.
Blaze will dump everything to stdout in a format compatible with the
Elasticsearch Bulk API, meaning you can use
curl to put the data back.
curl -H "Content-Type: application/x-ndjson" -XPOST localhost:9200/other_data/_bulk --data-binary "@dump.ndjson"
One issue when working with large datasets is that Elasticsearch has an upper
limit on the size of HTTP requests (2GB). The solution is to split the file
with something like parallel. The split must happen after an even number of
lines, since each command in the file takes up two lines.
cat dump.ndjson | parallel --pipe -l 50000 curl -s -H "Content-Type: application/x-ndjson" -XPOST localhost:9200/other_data/_bulk --data-binary "@-"
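If parallel is not available, the same even-line split can be sketched with split from coreutils. The helper name split_dump and the chunk_ prefix below are made up for illustration and are not part of Blaze; the upload loop in the comments reuses the curl command from above.

```shell
# split_dump is a hypothetical helper, not part of Blaze.
# Usage: split_dump <dump-file> <lines-per-chunk> <output-prefix>
split_dump() {
  sd_file=$1
  sd_lines=$2
  sd_prefix=$3
  # Each bulk command spans two lines, so refuse an odd chunk size
  # that would cut a command in half.
  if [ $((sd_lines % 2)) -ne 0 ]; then
    echo "lines per chunk must be even, got $sd_lines" >&2
    return 1
  fi
  # Writes <prefix>aa, <prefix>ab, ... with at most sd_lines lines each.
  split -l "$sd_lines" "$sd_file" "$sd_prefix"
}

# Possible usage (assumes Elasticsearch on localhost:9200):
#   split_dump dump.ndjson 50000 chunk_
#   for chunk in chunk_*; do
#     curl -s -H "Content-Type: application/x-ndjson" \
#       -XPOST localhost:9200/other_data/_bulk --data-binary "@$chunk"
#   done
```

Unlike the parallel pipeline, this writes the chunks to disk first and uploads them one at a time, which is slower but easier to resume if one request fails.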
Command line options
--host=<value> - the host where Elasticsearch is running.
--index=<value> - the index to dump.
--slices=<value> - (optional) the number of slices to split the scroll into. Should be set to the number of shards for the index (as seen on /_cat/indices). Defaults to 5.
--size=<value> - (optional) the size of the response (i.e., the length of the hits array). Defaults to 5000.
--dump-mappings - specify this flag to dump the index mappings instead of the source.
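To pick a value for --slices, the shard count can be read from /_cat/indices. The sketch below assumes the verbose output (?v) with its usual header row containing the index and pri columns; primary_shards is a hypothetical helper name, not something Blaze ships.

```shell
# primary_shards is a hypothetical helper, not part of Blaze.
# Reads the output of /_cat/indices?v on stdin and prints the value of
# the "pri" (primary shards) column for the index given as $1.
primary_shards() {
  awk -v idx="$1" '
    NR == 1 {
      # Locate the columns by header name instead of by fixed position.
      for (i = 1; i <= NF; i++) {
        if ($i == "index") ic = i
        if ($i == "pri") pc = i
      }
      next
    }
    $ic == idx { print $pc }
  '
}

# Possible usage (assumes Elasticsearch on localhost:9200):
#   slices=$(curl -s "http://localhost:9200/_cat/indices?v" | primary_shards massive_1)
#   blaze --host=http://localhost:9200 --index=massive_1 --slices="$slices" > dump.ndjson
```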
To use HTTP Basic authentication you need to pass the following options. Note that passing a password on the command line will put it in your terminal history, so please use with care.
--auth=basic - enable HTTP Basic authentication.
--basic-username=foo - the username.
--basic-password=bar - the password.
--insecure - for HTTPS connections, specify this flag to skip server certificate validation.
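One way to keep the password out of your shell history is to read it from a file instead of typing it on the command line. This is only a sketch: blaze_basic_auth and the password file are hypothetical, and the wrapper only passes the options documented above. The password is still given to blaze as an argument, so it remains visible in the process list while the dump runs.

```shell
# blaze_basic_auth is a hypothetical wrapper, not part of Blaze.
# Usage: blaze_basic_auth <username> <password-file> [other blaze options...]
blaze_basic_auth() {
  ba_user=$1
  ba_password_file=$2
  shift 2
  # Read the password from the file; command substitution strips the
  # trailing newline.
  ba_password=$(cat "$ba_password_file") || return 1
  blaze --auth=basic \
    --basic-username="$ba_user" \
    --basic-password="$ba_password" \
    "$@"
}

# Possible usage (the file should be readable only by you, e.g. chmod 600):
#   blaze_basic_auth foo ~/.blaze_password \
#     --host=https://localhost:9200 --index=massive_1 > dump.ndjson
```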
Building from source
Building Blaze is easy. It requires
On Linux (and OSX)
$ git submodule update --init
$ make
Run it from docker
docker build -t blaze .
docker run -it blaze blaze
Copyright © Viktor Elofsson and contributors.
Blaze is provided as-is under the MIT license. For more information see LICENSE.
- For libcurl, see https://curl.haxx.se/docs/copyright.html
- For RapidJSON, see https://github.com/Tencent/rapidjson/blob/master/license.txt