Protobuf filter plugin for Embulk
An Embulk filter plugin for interconversion between Protocol Buffer message and JSON.
Overview
- Plugin type: filter
Configuration
- serialize, deserialize
- whether to serialize or deserialize (boolean, default:
false
) - when
serialize: true
, convert JSON to encoded protobuf message - when
deserialize: true
, convert encoded protobuf message to JSON - either one must be
true
and exception is thrown when both aretrue
or both arefalse
- whether to serialize or deserialize (boolean, default:
- encoding: encoding type, currently only Base64 is supported (string, required)
- protobuf_jar_path: jar path generated from your .proto
- columns: Input columns (array of hash, required)
- name: name of the column (string, required)
- message: package namespace of the message in
protobuf_jar_path
(string, required)
Preparation
Install protoc
See official installation guide.
Since this plugin depends on protobuf-3.1.0,
it is recommended that you install the same version.
Get protobuf-java-x.x.x.jar
You can get protobuf-java-x.x.x.jar here.
For the same reason above, getting protobuf-java-3.1.0 is recommended.
Generate jar from your .proto
Here take addressbook.proto
as an example, which is used in java tutorial of Protocol Buffer.
Commands like below will generate AddressBookProtos.jar
.
You can pass this AddressBookProtos.jar
to protobuf_jar_path
option in the config.
$ protoc --java_out=./ addressbook.proto
$ javac -classpath protobuf-java-3.1.0.jar com/example/tutorial/AddressBookProtos.java
$ jar cvf AddressBookProtos.jar com/example/tutorial/AddressBookProtos*.class
You should already have your .proto
file,
so generate .jar
file from your .proto
and pass it to protobuf_jar_path
option.
Example
Run commands below.
$ ./gradlew package
$ cd example
$ ./generate_jar_from_proto
This will generate example/AddressBookProtosProto2Syntax.jar
and example/AddressBookProtosProto3Syntax.jar
Serialization Example (JSON -> encoded protobuf message)
See example_serialize.yml and example_serialize.json for details.
input:
// John as JSON generated from proto2-syntax .proto
{"name":"John Doe","id":1234,"email":"jdoe@example.com","phone":[{"number":"111-0000","type":"MOBILE"},
{"number":"555-4321","type":"HOME"}]}
// John as JSON generated from proto3-syntax .proto
{"name":"John Doe","id":1234,"email":"jdoe@example.com","phone":[{"number":"111-0000"},{"number":"555-4321","type":"HOME"}]}
// Jane as JSON generated from proto2-syntax .proto
{"name":"Jane Doe","id":1235,"phone":[{"number":"999-8888","type":"MOBILE"}]}
// Jane as JSON generated from proto3-syntax .proto
{"name":"Jane Doe","id":1235,"phone":[{"number":"999-8888"}]}
When you pass example/AddressBookProtosProto2Syntax.jar
to protobuf_jar_path
:
$ embulk run example/example_serialize.yml
CghKb2huIERvZRDSCRoQamRvZUBleGFtcGxlLmNvbSIMCggxMTEtMDAwMBAAIgwKCDU1NS00MzIxEAE=
CghKb2huIERvZRDSCRoQamRvZUBleGFtcGxlLmNvbSIKCggxMTEtMDAwMCIMCgg1NTUtNDMyMRAB
CghKYW5lIERvZRDTCSIMCgg5OTktODg4OBAA
CghKYW5lIERvZRDTCSIKCgg5OTktODg4OA==
When you pass example/AddressBookProtosProto3Syntax.jar
to protobuf_jar_path
:
$ embulk run example/example_serialize.yml
CghKb2huIERvZRDSCRoQamRvZUBleGFtcGxlLmNvbSIKCggxMTEtMDAwMCIMCgg1NTUtNDMyMRAB
CghKb2huIERvZRDSCRoQamRvZUBleGFtcGxlLmNvbSIKCggxMTEtMDAwMCIMCgg1NTUtNDMyMRAB
CghKYW5lIERvZRDTCSIKCgg5OTktODg4OA==
CghKYW5lIERvZRDTCSIKCgg5OTktODg4OA==
Using .jar
generated from proto3-syntax .proto
,
json with or without enum default values are converted to the same encoded string
because default values is not presented in message in proto3.
See Default Values in proto3 language guide.
Deserialization Example (encoded protobuf message -> JSON)
See example_deserialize.yml and example_deserialize.csv for more details.
input:
// John as encoded generated from proto2-syntax .proto
CghKb2huIERvZRDSCRoQamRvZUBleGFtcGxlLmNvbSIMCggxMTEtMDAwMBAAIgwKCDU1NS00MzIxEAE=
// John as encoded generated from proto3-syntax .proto
CghKb2huIERvZRDSCRoQamRvZUBleGFtcGxlLmNvbSIKCggxMTEtMDAwMCIMCgg1NTUtNDMyMRAB
// Jane as encoded generated from proto3-syntax .proto
CghKYW5lIERvZRDTCSIMCgg5OTktODg4OBAA
// Jane as encoded generated from proto3-syntax .proto
CghKYW5lIERvZRDTCSIKCgg5OTktODg4OA==
When you pass example/AddressBookProtosProto2Syntax.jar
to protobuf_jar_path
:
$ embulk run example/example_deserialize.yml
{"name":"John Doe","id":1234,"email":"jdoe@example.com","phone":[{"number":"111-0000","type":"MOBILE"},{"number":"555-4321","type":"HOME"}]}
{"name":"John Doe","id":1234,"email":"jdoe@example.com","phone":[{"number":"111-0000"},{"number":"555-4321","type":"HOME"}]}
{"name":"Jane Doe","id":1235,"phone":[{"number":"999-8888","type":"MOBILE"}]}
{"name":"Jane Doe","id":1235,"phone":[{"number":"999-8888"}]}
When you pass example/AddressBookProtosProto3Syntax.jar
to protobuf_jar_path
:
$ embulk run example/example_deserialize.yml
{"name":"John Doe","id":1234,"email":"jdoe@example.com","phone":[{"number":"111-0000"},{"number":"555-4321","type":"HOME"}]}
{"name":"John Doe","id":1234,"email":"jdoe@example.com","phone":[{"number":"111-0000"},{"number":"555-4321","type":"HOME"}]}
{"name":"Jane Doe","id":1235,"phone":[{"number":"999-8888"}]}
{"name":"Jane Doe","id":1235,"phone":[{"number":"999-8888"}]}
Here again, encoded messages with or without default values are converted to the same JSON.
This is because of the same reason mentioned above.
TODO
- Support other encoding method (Base16, Base32, ...)
- Allow JSON type for input in serialization and output in deserialiazation
Build
$ ./gradlew gem # -t to watch change of files and rebuild continuously