
Protobuf filter plugin for Embulk

An Embulk filter plugin for converting between Protocol Buffers messages and JSON.

Overview

  • Plugin type: filter

Configuration

  • serialize, deserialize
    • whether to serialize or deserialize (boolean, default: false)
    • when serialize: true, JSON is converted to an encoded protobuf message
    • when deserialize: true, an encoded protobuf message is converted to JSON
    • exactly one of the two must be true; an exception is thrown when both are true or both are false
  • encoding: encoding type; currently only Base64 is supported (string, required)
  • protobuf_jar_path: path to the jar generated from your .proto
  • columns: input columns (array of hash, required)
    • name: name of the column (string, required)
    • message: package namespace of the message in protobuf_jar_path (string, required)
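
The serialize/deserialize rule above amounts to an exclusive-or check. The following is a minimal Python sketch of that rule for illustration only; it is not the plugin's own code, and the function name resolve_mode is hypothetical.

```python
def resolve_mode(serialize=False, deserialize=False):
    """Return the selected mode, rejecting both-true and both-false.

    Hypothetical helper: it mirrors the documented rule, not the plugin source.
    """
    if serialize == deserialize:
        raise ValueError("exactly one of serialize / deserialize must be true")
    return "serialize" if serialize else "deserialize"


print(resolve_mode(serialize=True))
print(resolve_mode(deserialize=True))
```

Both options default to false, so a config that sets neither is rejected in the same way as one that sets both.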

Preparation

Install protoc

See official installation guide.

Since this plugin depends on protobuf-3.1.0,
it is recommended that you install the same version.

Get protobuf-java-x.x.x.jar

You can get protobuf-java-x.x.x.jar here.

For the same reason, protobuf-java-3.1.0 is recommended.

Generate jar from your .proto

The following uses addressbook.proto, the file from the Protocol Buffers Java tutorial, as an example.

Commands like the ones below generate AddressBookProtos.jar.
You can pass this AddressBookProtos.jar to the protobuf_jar_path option in the config.

$ protoc --java_out=./ addressbook.proto
$ javac -classpath protobuf-java-3.1.0.jar com/example/tutorial/AddressBookProtos.java
$ jar cvf AddressBookProtos.jar com/example/tutorial/AddressBookProtos*.class

For your own data, generate a .jar file from your .proto in the same way
and pass it to the protobuf_jar_path option.

Example

Run the commands below.

$ ./gradlew package
$ cd example
$ ./generate_jar_from_proto

This generates example/AddressBookProtosProto2Syntax.jar and example/AddressBookProtosProto3Syntax.jar.

Serialization Example (JSON -> encoded protobuf message)

See example_serialize.yml and example_serialize.json for details.

input:

// John as JSON generated from proto2-syntax .proto
{"name":"John Doe","id":1234,"email":"jdoe@example.com","phone":[{"number":"111-0000","type":"MOBILE"},{"number":"555-4321","type":"HOME"}]}
// John as JSON generated from proto3-syntax .proto
{"name":"John Doe","id":1234,"email":"jdoe@example.com","phone":[{"number":"111-0000"},{"number":"555-4321","type":"HOME"}]}
// Jane as JSON generated from proto2-syntax .proto
{"name":"Jane Doe","id":1235,"phone":[{"number":"999-8888","type":"MOBILE"}]}
// Jane as JSON generated from proto3-syntax .proto
{"name":"Jane Doe","id":1235,"phone":[{"number":"999-8888"}]}

When you pass example/AddressBookProtosProto2Syntax.jar to protobuf_jar_path:

$ embulk run example/example_serialize.yml
CghKb2huIERvZRDSCRoQamRvZUBleGFtcGxlLmNvbSIMCggxMTEtMDAwMBAAIgwKCDU1NS00MzIxEAE=
CghKb2huIERvZRDSCRoQamRvZUBleGFtcGxlLmNvbSIKCggxMTEtMDAwMCIMCgg1NTUtNDMyMRAB
CghKYW5lIERvZRDTCSIMCgg5OTktODg4OBAA
CghKYW5lIERvZRDTCSIKCgg5OTktODg4OA==

When you pass example/AddressBookProtosProto3Syntax.jar to protobuf_jar_path:

$ embulk run example/example_serialize.yml
CghKb2huIERvZRDSCRoQamRvZUBleGFtcGxlLmNvbSIKCggxMTEtMDAwMCIMCgg1NTUtNDMyMRAB
CghKb2huIERvZRDSCRoQamRvZUBleGFtcGxlLmNvbSIKCggxMTEtMDAwMCIMCgg1NTUtNDMyMRAB
CghKYW5lIERvZRDTCSIKCgg5OTktODg4OA==
CghKYW5lIERvZRDTCSIKCgg5OTktODg4OA==

With a .jar generated from a proto3-syntax .proto,
JSON with or without enum default values is converted to the same encoded string,
because default values are not serialized in proto3 messages.

See Default Values in proto3 language guide.
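
To see where the two encodings differ, the Base64 strings can be decoded and walked field by field. The sketch below is a minimal wire-format reader written for this illustration (it handles only varint and length-delimited fields and is not part of the plugin). The field numbers are read from the bytes themselves; treating field 4 as the phone entry and nested field 2 as its type is an assumption based on the tutorial's addressbook.proto.

```python
import base64


def read_varint(buf, pos):
    """Decode a base-128 varint starting at pos; return (value, next_pos)."""
    value, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def read_fields(buf):
    """Return [(field_number, value)] for wire types 0 (varint) and 2 (bytes)."""
    fields, pos = [], 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        fields.append((number, value))
    return fields


def phone_fields(message):
    """Parse the first field-4 entry (assumed to be the nested phone message)."""
    phone = [value for number, value in read_fields(message) if number == 4][0]
    return read_fields(phone)


# Jane's two encodings, copied from the serialization output above.
with_default = base64.b64decode("CghKYW5lIERvZRDTCSIMCgg5OTktODg4OBAA")
without_default = base64.b64decode("CghKYW5lIERvZRDTCSIKCgg5OTktODg4OA==")

print(phone_fields(with_default))     # [(1, b'999-8888'), (2, 0)]
print(phone_fields(without_default))  # [(1, b'999-8888')]
```

The first message carries an explicit field 2 with value 0 inside the phone entry, while the second omits it entirely, which accounts for the two-byte difference between the strings.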

Deserialization Example (encoded protobuf message -> JSON)

See example_deserialize.yml and example_deserialize.csv for more details.

input:

// John as an encoded message generated from proto2-syntax .proto
CghKb2huIERvZRDSCRoQamRvZUBleGFtcGxlLmNvbSIMCggxMTEtMDAwMBAAIgwKCDU1NS00MzIxEAE=
// John as an encoded message generated from proto3-syntax .proto
CghKb2huIERvZRDSCRoQamRvZUBleGFtcGxlLmNvbSIKCggxMTEtMDAwMCIMCgg1NTUtNDMyMRAB
// Jane as an encoded message generated from proto2-syntax .proto
CghKYW5lIERvZRDTCSIMCgg5OTktODg4OBAA
// Jane as an encoded message generated from proto3-syntax .proto
CghKYW5lIERvZRDTCSIKCgg5OTktODg4OA==

When you pass example/AddressBookProtosProto2Syntax.jar to protobuf_jar_path:

$ embulk run example/example_deserialize.yml
{"name":"John Doe","id":1234,"email":"jdoe@example.com","phone":[{"number":"111-0000","type":"MOBILE"},{"number":"555-4321","type":"HOME"}]}
{"name":"John Doe","id":1234,"email":"jdoe@example.com","phone":[{"number":"111-0000"},{"number":"555-4321","type":"HOME"}]}
{"name":"Jane Doe","id":1235,"phone":[{"number":"999-8888","type":"MOBILE"}]}
{"name":"Jane Doe","id":1235,"phone":[{"number":"999-8888"}]}

When you pass example/AddressBookProtosProto3Syntax.jar to protobuf_jar_path:

$ embulk run example/example_deserialize.yml
{"name":"John Doe","id":1234,"email":"jdoe@example.com","phone":[{"number":"111-0000"},{"number":"555-4321","type":"HOME"}]}
{"name":"John Doe","id":1234,"email":"jdoe@example.com","phone":[{"number":"111-0000"},{"number":"555-4321","type":"HOME"}]}
{"name":"Jane Doe","id":1235,"phone":[{"number":"999-8888"}]}
{"name":"Jane Doe","id":1235,"phone":[{"number":"999-8888"}]}

Here again, with the proto3-syntax .jar, encoded messages with or without default values are converted to the same JSON,
for the same reason as in the serialization example.

TODO

  • Support other encoding methods (Base16, Base32, ...)
  • Allow JSON type for input in serialization and output in deserialization

Build

$ ./gradlew gem  # add -t to watch for file changes and rebuild continuously