Avro Resolving Serialization #55

Closed
wants to merge 17 commits

2 participants

@mumrah

Adding support for use of Avro's resolving decoder. Schemas are versioned and stored in the <schema-info> element in stores.xml (like the JSON schemas). The schema version is used as the first byte of the serialized value.

Let me know if there is anything I need to add to contribute this (docs, more tests, etc.).
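
For reference, the avro-store.xml added in this diff shows how a store opts into the new serializer type; the relevant piece of stores.xml looks like this:

    <value-serializer>
        <type>avro-generic-resolving</type>
        <schema-info version="1">{"name":"TestRecord","type":"record","fields":[{"name":"f1","type":"string"}]}</schema-info>
        <schema-info version="2">{"name":"TestRecord","type":"record","fields":[{"name":"f1","type":"string"},{"name":"f2","type":"string","default":"d2"}]}</schema-info>
    </value-serializer>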

mumrah and others added some commits
@mumrah mumrah Begin AvroResolvingSpecificSerializer work f4a28b2
David Arthur Avro resolving serialization
New abstract class, AvroResolvingSerializer, provides more efficient Avro
serialization than was possible before. Instead of writing out the Schema along
with every entry in a Store, a 4-byte version is stored at the beginning of
every entry.

The version number is the hashCode of the string representation of the Schema,
i.e., Schema.parse(avroSchemaAsString).toString().hashCode(). Converting the
Schema back to a String captures any modification the Avro Schema parser
makes to the schema.

Schemas are managed in stores.xml just like the JSON schemas. Internally, the
AvroResolvingSerializer uses a Map of hashCode->Schema rather than the
Integer->String map the SerializerDefinition provides. Two new serializer
types have been added to the DefaultSerializerFactory:

 * avro-specific-resolving
 * avro-generic-resolving

It is up to the user to maintain their schemas in stores.xml.
6ab08c3
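
A minimal round trip through the new generic resolving serializer, sketched from the unit tests in this diff (the schemaInfo map mirrors the versioned schema-info entries in stores.xml; the class name here is just for illustration):

    import java.util.HashMap;
    import java.util.Map;

    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericData;
    import org.apache.avro.util.Utf8;

    import voldemort.serialization.Serializer;
    import voldemort.serialization.SerializerDefinition;
    import voldemort.serialization.avro.AvroResolvingGenericSerializer;

    public class ResolvingRoundTrip {

        public static void main(String[] args) {
            // Versioned schemas, as they would appear in <schema-info version="N"> elements
            Map<Integer, String> schemaInfo = new HashMap<Integer, String>();
            schemaInfo.put(1, "{\"type\":\"record\",\"name\":\"TestRecord\","
                              + "\"fields\":[{\"name\":\"f1\",\"type\":\"string\"}]}");

            SerializerDefinition def = new SerializerDefinition("test", schemaInfo, true, null);
            Serializer<GenericData.Record> serializer =
                    new AvroResolvingGenericSerializer<GenericData.Record>(def);

            GenericData.Record datum = new GenericData.Record(Schema.parse(schemaInfo.get(1)));
            datum.put("f1", new Utf8("foo"));

            byte[] bytes = serializer.toBytes(datum);             // bytes[0] is the schema version
            GenericData.Record back = serializer.toObject(bytes); // resolved against version 1
            System.out.println(back.get("f1"));                   // prints "foo"
        }
    }
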
David Arthur Adding unit tests
A few test Avro schemas added to test/common. Only TestSpecificRecord.avsc gets
compiled into a class (during buildtest Ant target), the others are used in the
unit tests for Generic Avro records.
f96386f
David Arthur Didn't mean to commit that ad8e344
David Arthur Changed AvroResolvingGenericSerializer to be generic
Rather than requiring a GenericData.Record
622c2ec
David Arthur Merge branch 'avro-resolving' into integration 9612180
@mumrah mumrah Begin AvroResolvingSpecificSerializer work 3486dcb
@mumrah mumrah Avro resolving serialization
New abstract class, AvroResolvingSerializer, provides more efficient Avro
serialization than was possible before. Instead of writing out the Schema along
with every entry in a Store, a 4-byte version is stored at the beginning of
every entry.

The version number is the hashCode of the string representation of the Schema,
i.e., Schema.parse(avroSchemaAsString).toString().hashCode(). Converting the
Schema back to a String captures any modification the Avro Schema parser
makes to the schema.

Schemas are managed in stores.xml just like the JSON schemas. Internally, the
AvroResolvingSerializer uses a Map of hashCode->Schema rather than the
Integer->String map the SerializerDefinition provides. Two new serializer
types have been added to the DefaultSerializerFactory:

 * avro-specific-resolving
 * avro-generic-resolving

It is up to the user to maintain their schemas in stores.xml.
85be38b
@mumrah mumrah Adding unit tests
A few test Avro schemas added to test/common. Only TestSpecificRecord.avsc gets
compiled into a class (during buildtest Ant target), the others are used in the
unit tests for Generic Avro records.
27e23fd
@mumrah mumrah Didn't mean to commit that d2f8d63
@mumrah mumrah Changed AvroResolvingGenericSerializer to be generic
Rather than requiring a GenericData.Record
375221e
@mumrah mumrah Use Schema.hashCode() instead of Schema.toString().hashCode()
Adding a unit test to make sure Schema.hashCode() works like we expect it to.
Namely, we want to make sure it ignores the "doc" field.
a480588
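
That expectation is spelled out in testSchemaHashCode in the diff below; the core assertion is:

    Schema s1 = Schema.parse("{\"type\":\"record\",\"name\":\"TestRecord\",\"fields\":[{\"name\":\"f1\",\"type\":\"string\"}]}");
    Schema s2 = Schema.parse("{\"type\":\"record\",\"name\":\"TestRecord\",\"fields\":[{\"name\":\"f1\",\"doc\":\"field docs\",\"type\":\"string\"}]}");
    assertEquals(s1.hashCode(), s2.hashCode()); // the "doc" attribute does not affect the hash
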
@mumrah mumrah Adding unit test for avro resolving serializer in stores.xml 68e249b
David Arthur Use version from the stores.xml schema-info
Seems to be some weirdness with using Schema.hashCode. Plus, this way we only
use up one extra byte for storing the schema version rather than four.
a88e5d5
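
With that change, a stored value has the following layout (a sketch; see toBytes/toObject in AvroResolvingSerializer below, where serializer and datum are assumed in scope):

    byte[] bytes = serializer.toBytes(datum);
    // bytes[0]   : the schema version, i.e. the number from <schema-info version="N">
    //              (checked to fit in a signed byte, so at most Byte.MAX_VALUE versions)
    // bytes[1..] : the Avro binary encoding of the datum under that version's Schema
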
David Arthur Updated unit tests for new versioning 3e0b860
David Arthur Merge branch 'avro-resolving'
Conflicts:
	src/java/voldemort/serialization/avro/AvroResolvingGenericSerializer.java
	src/java/voldemort/serialization/avro/AvroResolvingSerializer.java
	src/java/voldemort/serialization/avro/AvroResolvingSpecificSerializer.java
	test/common/voldemort/VoldemortTestConstants.java
75e959b
David Arthur Fixing unit tests for versioning 6c9e99d
@icefury71
Collaborator

I eyeballed the diffs - looks fine from the first pass. I'll investigate further whether we really need versioning support for Avro.
If this is meant to accommodate non-backwards-compatible schema changes, it might be needed.

@icefury71
Collaborator

After some discussion, it seems we should not do explicit versioning with Avro. It goes against the basic philosophy of Avro supporting evolutionary changes. Yes, versioning is there in JSON, but for historical reasons; we should not propagate it forward into the new serialization formats.

Having said that, do you have a compelling use case for this?

@mumrah

The initial motivation for this was:

  • Desire for typed fields (along with generated Java classes)
  • Large number of small payloads
  • Evolvable schema

In our system, ZooKeeper is used as a repository of schemas. When an application starts, it reads the known schemas from ZooKeeper and maintains a cache of (de)serializers. Storing the schema along with the data (one of the Avro principles) is nice when you're dealing with data files where len(data) >> len(schema), but that isn't the case in Voldemort: each schema would get serialized with its data, and len(schema) > len(data) in most cases.

Now, due to string de-duplication in the file compression (gzip, or whatever Voldemort/BDB uses), the data files would probably not be significantly bigger with the JSON schemas prepended to each payload; however, it would significantly increase how much data moves around on the wire.

There might also be some performance gain by not having to parse a schema and create a DatumReader each time you deserialize a payload.
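
To make the small-payload argument concrete, here is a rough size comparison using the one-field TestRecord schema from this PR (a sketch against the Avro 1.4-era API used in this diff; the class name is just for illustration, and exact sizes may vary):

    import java.io.ByteArrayOutputStream;

    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericData;
    import org.apache.avro.generic.GenericDatumWriter;
    import org.apache.avro.io.BinaryEncoder;
    import org.apache.avro.util.Utf8;

    public class SchemaVsDataSize {

        public static void main(String[] args) throws Exception {
            String json = "{\"type\":\"record\",\"name\":\"TestRecord\","
                          + "\"fields\":[{\"name\":\"f1\",\"type\":\"string\"}]}";
            Schema schema = Schema.parse(json);
            GenericData.Record datum = new GenericData.Record(schema);
            datum.put("f1", new Utf8("foo"));

            ByteArrayOutputStream out = new ByteArrayOutputStream();
            GenericDatumWriter<GenericData.Record> writer =
                    new GenericDatumWriter<GenericData.Record>(schema);
            BinaryEncoder encoder = new BinaryEncoder(out);
            writer.write(datum, encoder);
            encoder.flush();

            // The schema string is ~80 bytes; the encoded datum is 4 bytes
            // (a varint length prefix plus "foo"), so len(schema) >> len(data).
            System.out.println("schema: " + json.length() + " bytes, datum: "
                               + out.size() + " bytes");
        }
    }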

It's been quite a while since I looked at this code, and I don't actually work for the company that I wrote it for in the first place. If it doesn't seem like a good fit - please feel free to close this issue.

Cheers,
David

@icefury71
Collaborator

I"d argue that len(schema) is not greater than len(data) in most cases. In our LinkedIn production environment, we see a common average value size of about 50 - 100KB (some are even more). I agree, it would benefit the small value use case though.

Again, I'd like to emphasize the seamless-evolution philosophy: having multiple explicit versions is kinda ugly.

@icefury71 icefury71 closed this
Commits on Sep 9, 2011 (full messages above; the first five commits were later re-authored under @mumrah, so they appear twice in the branch)
  1. Begin AvroResolvingSpecificSerializer work (mumrah authored, David Arthur committed)
  2. Avro resolving serialization (David Arthur)
  3. Adding unit tests (David Arthur)
  4. Didn't mean to commit that (David Arthur)
  5. Changed AvroResolvingGenericSerializer to be generic (David Arthur)
  6. Merge branch 'avro-resolving' into integration (David Arthur)
  7-11. Re-commits of 1-5, authored by @mumrah
Commits on Sep 12, 2011
  1. Use Schema.hashCode() instead of Schema.toString().hashCode() (@mumrah)
  2. Adding unit test for avro resolving serializer in stores.xml (@mumrah)
Commits on Sep 23, 2011
  1. Use version from the stores.xml schema-info (David Arthur)
  2. Updated unit tests for new versioning (David Arthur)
Commits on Nov 7, 2011
  1. Merge branch 'avro-resolving' (David Arthur)
  2. Fixing unit tests for versioning (David Arthur)
build.xml (20 lines changed)
@@ -86,6 +86,7 @@
</target>
<target name="buildtest" description="Compile test classes">
+ <java-avro-compiler src="${commontestsrc.dir}/voldemort/config" classpath="test-classpath"/>
<replace-dir dir="${testclasses.dir}" />
<copy todir="${testclasses.dir}">
<fileset dir="${commontestsrc.dir}">
@@ -130,7 +131,7 @@
<arg line="${proto.sources}"/>
</exec>
</target>
-
+
<target name="jar" depends="build" description="Build server jar file">
<jar destfile="${dist.dir}/${name}-${curr.release}.jar">
<fileset dir="${classes.dir}">
@@ -336,6 +337,23 @@
</tar>
</sequential>
</macrodef>
+
+ <macrodef name="java-avro-compiler">
+ <attribute name="src"/>
+ <attribute name="classpath"/>
+ <sequential>
+ <taskdef name="schema" classname="org.apache.avro.specific.SchemaTask">
+ <classpath refid="@{classpath}" />
+ </taskdef>
+
+ <schema destdir="${commontestsrc.dir}">
+ <fileset dir="@{src}">
+ <include name="**/*.avsc" />
+ </fileset>
+ </schema>
+ </sequential>
+ </macrodef>
+
<target name="snapshot" description="Create a release-snapshot zip file with everything pre-built.">
<create-release-artifacts version="${curr.release.snapshot}" />
src/java/voldemort/serialization/DefaultSerializerFactory.java (10 lines changed)
@@ -24,6 +24,8 @@
import voldemort.serialization.avro.AvroGenericSerializer;
import voldemort.serialization.avro.AvroReflectiveSerializer;
+import voldemort.serialization.avro.AvroResolvingGenericSerializer;
+import voldemort.serialization.avro.AvroResolvingSpecificSerializer;
import voldemort.serialization.avro.AvroSpecificSerializer;
import voldemort.serialization.json.JsonTypeDefinition;
import voldemort.serialization.json.JsonTypeSerializer;
@@ -49,6 +51,8 @@
private static final String AVRO_GENERIC_TYPE_NAME = "avro-generic";
private static final String AVRO_SPECIFIC_TYPE_NAME = "avro-specific";
private static final String AVRO_REFLECTIVE_TYPE_NAME = "avro-reflective";
+ private static final String AVRO_RESOLVING_SPECIFIC_TYPE_NAME = "avro-specific-resolving";
+ private static final String AVRO_RESOLVING_GENERIC_TYPE_NAME = "avro-generic-resolving";
public Serializer<?> getSerializer(SerializerDefinition serializerDef) {
String name = serializerDef.getName();
@@ -72,13 +76,17 @@
} else if(name.equals(PROTO_BUF_TYPE_NAME)) {
return new ProtoBufSerializer<Message>(serializerDef.getCurrentSchemaInfo());
} else if(name.equals(THRIFT_TYPE_NAME)) {
- return new ThriftSerializer<TBase<?,?>>(serializerDef.getCurrentSchemaInfo());
+ return new ThriftSerializer<TBase<?, ?>>(serializerDef.getCurrentSchemaInfo());
} else if(name.equals(AVRO_GENERIC_TYPE_NAME)) {
return new AvroGenericSerializer(serializerDef.getCurrentSchemaInfo());
} else if(name.equals(AVRO_SPECIFIC_TYPE_NAME)) {
return new AvroSpecificSerializer<SpecificRecord>(serializerDef.getCurrentSchemaInfo());
} else if(name.equals(AVRO_REFLECTIVE_TYPE_NAME)) {
return new AvroReflectiveSerializer<Object>(serializerDef.getCurrentSchemaInfo());
+ } else if(name.equals(AVRO_RESOLVING_SPECIFIC_TYPE_NAME)) {
+ return new AvroResolvingSpecificSerializer<SpecificRecord>(serializerDef);
+ } else if(name.equals(AVRO_RESOLVING_GENERIC_TYPE_NAME)) {
+ return new AvroResolvingGenericSerializer(serializerDef);
} else {
throw new IllegalArgumentException("No known serializer type: "
+ serializerDef.getName());
src/java/voldemort/serialization/avro/AvroResolvingGenericSerializer.java (53 lines changed)
@@ -0,0 +1,53 @@
+package voldemort.serialization.avro;
+
+import java.util.HashMap;
+import java.util.Map;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericDatumReader;
+import org.apache.avro.generic.GenericDatumWriter;
+import org.apache.avro.io.DatumReader;
+import org.apache.avro.io.DatumWriter;
+
+import voldemort.serialization.SerializerDefinition;
+
+public class AvroResolvingGenericSerializer<T> extends AvroResolvingSerializer<T> {
+
+ public AvroResolvingGenericSerializer(SerializerDefinition serializerDef) {
+ super(serializerDef);
+ }
+
+ @Override
+ protected DatumWriter<T> createDatumWriter(Schema schema) {
+ return new GenericDatumWriter<T>(schema);
+ }
+
+ @Override
+ protected DatumReader<T> createDatumReader(Schema writerSchema, Schema readerSchema) {
+ return new GenericDatumReader<T>(writerSchema, readerSchema);
+ }
+
+ @Override
+ protected Map<Byte, Schema> loadSchemas(Map<Integer, String> allSchemaInfos) {
+ Map<Byte, Schema> schemaVersions = new HashMap<Byte, Schema>();
+ for(Map.Entry<Integer, String> entry: allSchemaInfos.entrySet()) {
+ // Make sure we can parse the schema
+ Schema schema = Schema.parse(entry.getValue());
+ // Check that the version fits in a signed byte
+ Integer version = entry.getKey();
+ if(version > Byte.MAX_VALUE) {
+ throw new IllegalArgumentException("Cannot have schema version higher than "
+ + Byte.MAX_VALUE);
+ }
+ schemaVersions.put(version.byteValue(), schema);
+ LOG.info("Loaded schema version (" + version + ")");
+ }
+ return schemaVersions;
+ }
+
+ @Override
+ protected Schema getCurrentSchema(SerializerDefinition serializerDef) {
+ String schemaInfo = serializerDef.getCurrentSchemaInfo();
+ return Schema.parse(schemaInfo);
+ }
+}
src/java/voldemort/serialization/avro/AvroResolvingSerializer.java (105 lines changed)
@@ -0,0 +1,105 @@
+package voldemort.serialization.avro;
+
+import java.io.ByteArrayOutputStream;
+import java.nio.ByteBuffer;
+import java.util.HashMap;
+import java.util.Map;
+
+import org.apache.avro.Schema;
+import org.apache.avro.io.BinaryEncoder;
+import org.apache.avro.io.DatumReader;
+import org.apache.avro.io.DatumWriter;
+import org.apache.avro.io.Decoder;
+import org.apache.avro.io.DecoderFactory;
+import org.apache.avro.io.Encoder;
+import org.apache.commons.logging.Log;
+import org.apache.commons.logging.LogFactory;
+
+import voldemort.serialization.SerializationException;
+import voldemort.serialization.Serializer;
+import voldemort.serialization.SerializerDefinition;
+
+public abstract class AvroResolvingSerializer<T> implements Serializer<T> {
+
+ protected static final Log LOG = LogFactory.getLog(AvroResolvingSerializer.class);
+ private final Map<Byte, Schema> avroSchemaVersions;
+ private final Schema currentSchema;
+ private Byte currentAvroSchemaVersion;
+ private final DatumWriter<T> writer;
+ private final Map<Byte, DatumReader<T>> readers = new HashMap<Byte, DatumReader<T>>();
+ private DecoderFactory decoderFactory = new DecoderFactory();
+
+ public AvroResolvingSerializer(SerializerDefinition serializerDef) {
+ Map<Integer, String> allSchemaInfos = serializerDef.getAllSchemaInfoVersions();
+
+ // Parse the SerializerDefinition and load up the Schemas into a map
+ avroSchemaVersions = loadSchemas(allSchemaInfos);
+
+ // Make sure the "current" schema is loaded
+ currentSchema = getCurrentSchema(serializerDef);
+ for(Map.Entry<Byte, Schema> entry: avroSchemaVersions.entrySet()) {
+ if(entry.getValue().equals(currentSchema)) {
+ currentAvroSchemaVersion = entry.getKey();
+ break;
+ }
+ }
+ if(currentAvroSchemaVersion == null) {
+ throw new IllegalArgumentException("Most recent Schema is not included in the schema-info");
+ }
+
+ // Create a DatumReader for each schema and a DatumWriter for the
+ // current schema
+ for(Map.Entry<Byte, Schema> entry: avroSchemaVersions.entrySet()) {
+ readers.put(entry.getKey(), createDatumReader(entry.getValue(), currentSchema));
+ }
+ writer = createDatumWriter(currentSchema);
+ }
+
+ protected abstract Schema getCurrentSchema(SerializerDefinition serializerDef);
+
+ protected abstract Map<Byte, Schema> loadSchemas(Map<Integer, String> allSchemaInfos);
+
+ protected abstract DatumWriter<T> createDatumWriter(Schema schema);
+
+ protected abstract DatumReader<T> createDatumReader(Schema writerSchema, Schema readerSchema);
+
+ public byte[] toBytes(T object) {
+ try {
+ ByteArrayOutputStream out = new ByteArrayOutputStream();
+ // Write the version as the first byte
+ byte[] versionBytes = ByteBuffer.allocate(1).put(currentAvroSchemaVersion).array();
+ out.write(versionBytes);
+ // Write the serialized Avro object as the remaining bytes
+ Encoder encoder = new BinaryEncoder(out);
+ writer.write(object, encoder);
+ encoder.flush();
+ out.close();
+ // Convert to byte[] and return
+ return out.toByteArray();
+ } catch(Exception e) {
+ throw new RuntimeException(e);
+ }
+ }
+
+ public T toObject(byte[] bytes) {
+ try {
+ ByteBuffer bb = ByteBuffer.wrap(bytes);
+ // First byte is the version
+ Byte version = bb.get();
+ if(avroSchemaVersions.containsKey(version) == false) {
+ throw new SerializationException("Unknown Schema version (" + version
+ + ") found in serialized value");
+ }
+ // Read the remaining bytes, this is the serialized Avro object
+ byte[] b = new byte[bb.remaining()];
+ bb.get(b);
+
+ // Read the bytes into T object
+ DatumReader<T> datumReader = readers.get(version);
+ Decoder decoder = decoderFactory.createBinaryDecoder(b, null);
+ return datumReader.read(null, decoder);
+ } catch(Exception e) {
+ throw new RuntimeException(e);
+ }
+ }
+}
src/java/voldemort/serialization/avro/AvroResolvingSpecificSerializer.java (84 lines changed)
@@ -0,0 +1,84 @@
+package voldemort.serialization.avro;
+
+import java.util.HashMap;
+import java.util.Map;
+
+import org.apache.avro.Schema;
+import org.apache.avro.io.DatumReader;
+import org.apache.avro.io.DatumWriter;
+import org.apache.avro.specific.SpecificDatumReader;
+import org.apache.avro.specific.SpecificDatumWriter;
+import org.apache.avro.specific.SpecificRecord;
+
+import voldemort.serialization.SerializationException;
+import voldemort.serialization.SerializerDefinition;
+
+public class AvroResolvingSpecificSerializer<T extends SpecificRecord> extends
+ AvroResolvingSerializer<T> {
+
+ public AvroResolvingSpecificSerializer(SerializerDefinition serializerDef) {
+ super(serializerDef);
+ }
+
+ @Override
+ protected DatumWriter<T> createDatumWriter(Schema schema) {
+ return new SpecificDatumWriter<T>(schema);
+ }
+
+ @Override
+ protected DatumReader<T> createDatumReader(Schema writerSchema, Schema readerSchema) {
+ return new SpecificDatumReader<T>(writerSchema, readerSchema);
+ }
+
+ @Override
+ protected Map<Byte, Schema> loadSchemas(Map<Integer, String> allSchemaInfos) {
+ Map<Byte, Schema> schemaVersions = new HashMap<Byte, Schema>();
+ String fullName = null;
+ for(Map.Entry<Integer, String> entry: allSchemaInfos.entrySet()) {
+ Schema schema = Schema.parse(entry.getValue());
+ // Make sure each version of the Schema is for the same class name
+ if(fullName == null) {
+ fullName = schema.getFullName();
+ } else {
+ if(schema.getFullName().equals(fullName) == false) {
+ throw new IllegalArgumentException("Avro schema must all reference the same class");
+ }
+ }
+ // Make sure the Schema is a Record
+ if(schema.getType() != Schema.Type.RECORD) {
+ throw new IllegalArgumentException("Avro schema must be a \"record\" type schema");
+ }
+ Integer version = entry.getKey();
+ if(version > Byte.MAX_VALUE) {
+ throw new IllegalArgumentException("Cannot have schema version higher than "
+ + Byte.MAX_VALUE);
+ }
+ schemaVersions.put(version.byteValue(), schema);
+ LOG.info("Loaded schema version (" + version + ")");
+ }
+ return schemaVersions;
+ }
+
+ @Override
+ protected Schema getCurrentSchema(SerializerDefinition serializerDef) {
+ try {
+ // The current schema is the one extracted from the class
+ String schemaInfo = serializerDef.getCurrentSchemaInfo();
+ Schema schema = Schema.parse(schemaInfo);
+ // Make sure we can instantiate the class, and that it extends
+ // SpecificRecord
+ String fullName = schema.getFullName();
+ Class<T> clazz = (Class<T>) Class.forName(fullName);
+ if(!SpecificRecord.class.isAssignableFrom(clazz))
+ throw new IllegalArgumentException("Class provided should implement SpecificRecord");
+ T inst = clazz.newInstance();
+ return inst.getSchema();
+ } catch(ClassNotFoundException e) {
+ throw new SerializationException(e);
+ } catch(IllegalAccessException e) {
+ throw new SerializationException(e);
+ } catch(InstantiationException e) {
+ throw new SerializationException(e);
+ }
+ }
+}
test/common/voldemort/VoldemortTestConstants.java (23 lines changed)
@@ -118,4 +118,27 @@ public static String getSingleStore322Xml() {
return readString("config/single-store-322.xml");
}
+ public static String getTestSpecificRecordSchemaWithNamespace1() {
+ return readString("config/TestSpecificRecordNS.avsc.v1");
+ }
+
+ public static String getTestSpecificRecordSchema1() {
+ return readString("config/TestSpecificRecord.avsc.v1");
+ }
+
+ public static String getTestSpecificRecordSchemaWithNamespace2() {
+ return readString("config/TestSpecificRecordNS.avsc.v2");
+ }
+
+ public static String getTestSpecificRecordSchema2() {
+ return readString("config/TestSpecificRecord.avsc.v2");
+ }
+
+ public static String getTestSpecificRecordSchemaActual() {
+ return readString("config/TestSpecificRecord.avsc");
+ }
+
+ public static String getAvroStoresXml() {
+ return readString("config/avro-store.xml");
+ }
}
test/common/voldemort/config/TestSpecificRecord.avsc (10 lines changed)
@@ -0,0 +1,10 @@
+{
+ "type": "record",
+ "name": "TestRecord",
+ "namespace": "voldemort",
+ "fields" : [
+ {"name": "f1", "type": "string"},
+ {"name": "f2", "type": "string", "default": "d2"},
+ {"name": "f3", "type": "int", "default": 3}
+ ]
+}
test/common/voldemort/config/TestSpecificRecord.avsc.v1 (7 lines changed)
@@ -0,0 +1,7 @@
+{
+ "type": "record",
+ "name": "TestRecord",
+ "fields" : [
+ {"name": "f1", "type": "string"}
+ ]
+}
test/common/voldemort/config/TestSpecificRecord.avsc.v2 (8 lines changed)
@@ -0,0 +1,8 @@
+{
+ "type": "record",
+ "name": "TestRecord",
+ "fields" : [
+ {"name": "f1", "type": "string"},
+ {"name": "f2", "type": "string", "default": "d2"}
+ ]
+}
test/common/voldemort/config/TestSpecificRecordNS.avsc.v1 (8 lines changed)
@@ -0,0 +1,8 @@
+{
+ "type": "record",
+ "name": "TestRecord",
+ "namespace": "voldemort",
+ "fields" : [
+ {"name": "f1", "type": "string"}
+ ]
+}
test/common/voldemort/config/TestSpecificRecordNS.avsc.v2 (9 lines changed)
@@ -0,0 +1,9 @@
+{
+ "type": "record",
+ "name": "TestRecord",
+ "namespace": "voldemort",
+ "fields" : [
+ {"name": "f1", "type": "string"},
+ {"name": "f2", "type": "string", "default": "d2"}
+ ]
+}
test/common/voldemort/config/avro-store.xml (22 lines changed)
@@ -0,0 +1,22 @@
+<?xml version="1.0"?>
+<stores>
+ <store>
+ <name>test-avro-evolving-schema</name>
+ <persistence>memory</persistence>
+ <routing>client</routing>
+ <replication-factor>1</replication-factor>
+ <preferred-reads>1</preferred-reads>
+ <required-reads>1</required-reads>
+ <preferred-writes>1</preferred-writes>
+ <required-writes>1</required-writes>
+ <key-serializer>
+ <type>string</type>
+ <schema-info>UTF-8</schema-info>
+ </key-serializer>
+ <value-serializer>
+ <type>avro-generic-resolving</type>
+ <schema-info version="1">{"name":"TestRecord","type":"record","fields":[{"name":"f1","type":"string"}]}</schema-info>
+ <schema-info version="2">{"name":"TestRecord","type":"record","fields":[{"name":"f1","type":"string"},{"name":"f2","type":"string","default":"d2"}]}</schema-info>
+ </value-serializer>
+ </store>
+</stores>
test/unit/voldemort/serialization/avro/AvroResolvingSerializerTest.java (222 lines changed)
@@ -0,0 +1,222 @@
+package voldemort.serialization.avro;
+
+import java.io.StringReader;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+import junit.framework.TestCase;
+
+import org.apache.avro.Schema;
+import org.apache.avro.SchemaParseException;
+import org.apache.avro.generic.GenericData;
+import org.apache.avro.specific.SpecificRecord;
+import org.apache.avro.util.Utf8;
+
+import voldemort.TestRecord;
+import voldemort.VoldemortTestConstants;
+import voldemort.serialization.Serializer;
+import voldemort.serialization.SerializerDefinition;
+import voldemort.store.StoreDefinition;
+import voldemort.utils.ByteUtils;
+import voldemort.xml.StoreDefinitionsMapper;
+
+public class AvroResolvingSerializerTest extends TestCase {
+
+ String SCHEMA1 = VoldemortTestConstants.getTestSpecificRecordSchema1();
+ String SCHEMA2 = VoldemortTestConstants.getTestSpecificRecordSchema2();
+ String SCHEMA1NS = VoldemortTestConstants.getTestSpecificRecordSchemaWithNamespace1();
+ String SCHEMA2NS = VoldemortTestConstants.getTestSpecificRecordSchemaWithNamespace2();
+ String SCHEMA = TestRecord.SCHEMA$.toString();
+
+ public void testSchemaHashCode() {
+ // Check various things that affect hashCode
+ // Base case
+ assertEquals(Schema.parse(SCHEMA1).hashCode(), Schema.parse(SCHEMA1).hashCode());
+ // Check docs
+ Schema s1 = Schema.parse("{\"type\":\"record\",\"name\":\"TestRecord\",\"fields\":[{\"name\":\"f1\",\"type\":\"string\"}]}");
+ Schema s2 = Schema.parse("{\"type\":\"record\",\"name\":\"TestRecord\",\"fields\":[{\"name\":\"f1\",\"doc\":\"field docs\",\"type\":\"string\"}]}");
+ assertEquals(s1.hashCode(), s2.hashCode());
+ // Check field order
+ s1 = Schema.parse("{\"type\":\"record\",\"name\":\"TestRecord\",\"fields\":[{\"name\":\"f1\",\"type\":\"string\"},{\"name\":\"f2\",\"type\":\"string\"}]}");
+ s2 = Schema.parse("{\"type\":\"record\",\"name\":\"TestRecord\",\"fields\":[{\"name\":\"f2\",\"type\":\"string\"},{\"name\":\"f1\",\"type\":\"string\"}]}");
+ assertNotSame(s1.hashCode(), s2.hashCode());
+ // Check namespace
+ assertNotSame(Schema.parse(SCHEMA1).hashCode(), Schema.parse(SCHEMA1NS).hashCode());
+ // Different fields
+ assertNotSame(Schema.parse(SCHEMA1).hashCode(), Schema.parse(SCHEMA2).hashCode());
+ // Field types
+ s1 = Schema.parse("{\"type\":\"record\",\"name\":\"TestRecord\",\"fields\":[{\"name\":\"f1\",\"type\":\"string\"}]}");
+ s2 = Schema.parse("{\"type\":\"record\",\"name\":\"TestRecord\",\"fields\":[{\"name\":\"f1\",\"type\":\"int\"}]}");
+ assertNotSame(s1.hashCode(), s2.hashCode());
+ }
+
+ public void testStoresXML() {
+ StoreDefinitionsMapper mapper = new StoreDefinitionsMapper();
+ List<StoreDefinition> storeDefs = mapper.readStoreList(new StringReader(VoldemortTestConstants.getAvroStoresXml()));
+ StoreDefinition storeDef = storeDefs.get(0);
+ SerializerDefinition serializerDef = storeDef.getValueSerializer();
+
+ // Create a serializer with version=1
+ Map<Integer, String> schemaInfo = new HashMap<Integer, String>();
+ schemaInfo.put(1, serializerDef.getSchemaInfo(1));
+ SerializerDefinition newSerializerDef = new SerializerDefinition("test",
+ schemaInfo,
+ true,
+ null);
+
+ Serializer<GenericData.Record> serializer = new AvroResolvingGenericSerializer<GenericData.Record>(newSerializerDef);
+ GenericData.Record datum = new GenericData.Record(Schema.parse(newSerializerDef.getCurrentSchemaInfo()));
+ datum.put("f1", new Utf8("foo"));
+ byte[] bytes = serializer.toBytes(datum);
+
+ // Create a serializer with all versions
+ serializer = new AvroResolvingGenericSerializer<GenericData.Record>(serializerDef);
+ GenericData.Record datum1 = serializer.toObject(bytes);
+ assertTrue(datum1.get("f1").equals(datum.get("f1")));
+ assertTrue(datum1.get("f2")
+ .equals(new Utf8(Schema.parse(serializerDef.getCurrentSchemaInfo())
+ .getField("f2")
+ .defaultValue()
+ .getTextValue())));
+ }
+
+ public void testSingleInvalidSchemaInfo() {
+ SerializerDefinition serializerDef = new SerializerDefinition("test", "null");
+ try {
+ new AvroResolvingSpecificSerializer<SpecificRecord>(serializerDef);
+ } catch(SchemaParseException e) {
+ return;
+ }
+ fail("Should have failed due to an invalid schema");
+ }
+
+ public void testOneSpecificSchemaInfo() {
+ SerializerDefinition serializerDef = new SerializerDefinition("test", SCHEMA);
+ Serializer<SpecificRecord> serializer = new AvroResolvingSpecificSerializer<SpecificRecord>(serializerDef);
+ TestRecord obj = new TestRecord();
+ obj.f1 = new Utf8("foo");
+ obj.f2 = new Utf8("bar");
+ obj.f3 = 42;
+ byte[] bytes1 = serializer.toBytes(obj);
+ byte[] bytes2 = serializer.toBytes(obj);
+ assertEquals(ByteUtils.compare(bytes1, bytes2), 0);
+ assertTrue(obj.equals(serializer.toObject(bytes1)));
+ assertTrue(obj.equals(serializer.toObject(bytes2)));
+ }
+
+ public void testManySpecificSchemaInfo() {
+ Map<Integer, String> schemaInfo = new HashMap<Integer, String>();
+ schemaInfo.put(1, SCHEMA1NS);
+ schemaInfo.put(2, SCHEMA2NS);
+ schemaInfo.put(3, SCHEMA);
+ SerializerDefinition serializerDef = new SerializerDefinition("test",
+ schemaInfo,
+ true,
+ null);
+ Serializer<TestRecord> serializer = new AvroResolvingSpecificSerializer<TestRecord>(serializerDef);
+ TestRecord obj = new TestRecord();
+ obj.f1 = new Utf8("foo");
+ obj.f2 = new Utf8("bar");
+ obj.f3 = 42;
+ byte[] bytes1 = serializer.toBytes(obj);
+ byte[] bytes2 = serializer.toBytes(obj);
+ assertEquals(ByteUtils.compare(bytes1, bytes2), 0);
+ assertTrue(obj.equals(serializer.toObject(bytes1)));
+ assertTrue(obj.equals(serializer.toObject(bytes2)));
+ }
+
+ public void testVersionBytes() {
+ SerializerDefinition serializerDef = new SerializerDefinition("test", SCHEMA);
+ Serializer<SpecificRecord> serializer = new AvroResolvingSpecificSerializer<SpecificRecord>(serializerDef);
+ TestRecord obj = new TestRecord();
+ obj.f1 = new Utf8("foo");
+ obj.f2 = new Utf8("bar");
+ obj.f3 = 42;
+ byte[] bytes = serializer.toBytes(obj);
+ Byte versionByte = bytes[0];
+ assertEquals((Integer) (versionByte.intValue()),
+ (Integer) serializerDef.getCurrentSchemaVersion());
+ }
+
+ public void testMissingSchema() {
+ SerializerDefinition serializerDef = new SerializerDefinition("test", SCHEMA);
+ Serializer<SpecificRecord> serializer = new AvroResolvingSpecificSerializer<SpecificRecord>(serializerDef);
+ TestRecord obj = new TestRecord();
+ obj.f1 = new Utf8("foo");
+ obj.f2 = new Utf8("bar");
+ obj.f3 = 42;
+ byte[] bytes = serializer.toBytes(obj);
+ // Change the version bytes to something bogus
+ bytes[0] = ((Integer) 0xFF).byteValue();
+ try {
+ serializer.toObject(bytes);
+
+ } catch(Exception e) {
+ return;
+ }
+ fail("Should have failed due to a missing Schema");
+ }
+
+ public void testMigrateSpecificDatum() {
+ // First serializer
+ SerializerDefinition serializerDef1 = new SerializerDefinition("test", SCHEMA1);
+ Serializer<GenericData.Record> serializer1 = new AvroResolvingGenericSerializer<GenericData.Record>(serializerDef1);
+
+ // Second serializer
+ Map<Integer, String> schemaInfo = new HashMap<Integer, String>();
+ schemaInfo.put(1, SCHEMA1NS);
+ schemaInfo.put(2, SCHEMA2NS);
+ schemaInfo.put(3, SCHEMA);
+ SerializerDefinition serializerDef = new SerializerDefinition("test",
+ schemaInfo,
+ true,
+ null);
+ Serializer<TestRecord> serializer = new AvroResolvingSpecificSerializer<TestRecord>(serializerDef);
+ GenericData.Record datum = new GenericData.Record(Schema.parse(SCHEMA1));
+ datum.put("f1", new Utf8("foo"));
+
+ // Write it as the current Schema
+ byte[] bytes = serializer1.toBytes(datum);
+ // Fix the version bytes so it resolves the correct Schema
+ bytes[0] = ((Integer) 0x01).byteValue();
+
+ TestRecord datum1 = serializer.toObject(bytes);
+ assertTrue(datum1.f1.equals(datum.get("f1")));
+ assertTrue(datum1.f2.equals(new Utf8(Schema.parse(SCHEMA)
+ .getField("f2")
+ .defaultValue()
+ .getTextValue())));
+ assertEquals(datum1.f3, Schema.parse(SCHEMA).getField("f3").defaultValue().getIntValue());
+ }
+
+ public void testMigrateGenericDatum() {
+ // First serializer
+ SerializerDefinition serializerDef1 = new SerializerDefinition("test", SCHEMA1);
+ Serializer<GenericData.Record> serializer1 = new AvroResolvingGenericSerializer<GenericData.Record>(serializerDef1);
+
+ // Second serializer
+ Map<Integer, String> schemaInfo2 = new HashMap<Integer, String>();
+ schemaInfo2.put(0, SCHEMA1);
+ schemaInfo2.put(1, SCHEMA2);
+ SerializerDefinition serializerDef2 = new SerializerDefinition("test",
+ schemaInfo2,
+ true,
+ null);
+ Serializer<GenericData.Record> serializer2 = new AvroResolvingGenericSerializer<GenericData.Record>(serializerDef2);
+
+ GenericData.Record datum = new GenericData.Record(Schema.parse(SCHEMA1));
+ datum.put("f1", new Utf8("foo"));
+
+ // Write it as the current Schema
+ byte[] bytes = serializer1.toBytes(datum);
+
+ // Read it back as a different Schema
+ GenericData.Record datum1 = serializer2.toObject(bytes);
+ assertTrue(datum1.get("f1").equals(datum.get("f1")));
+ assertTrue(datum1.get("f2").equals(new Utf8(Schema.parse(SCHEMA2)
+ .getField("f2")
+ .defaultValue()
+ .getTextValue())));
+ }
+}
test/unit/voldemort/serialization/avro/AvroResolvingSpecificSerializerTest.java (173 lines changed)
@@ -0,0 +1,173 @@
+package voldemort.serialization.avro;
+
+import java.util.HashMap;
+import java.util.Map;
+
+import junit.framework.TestCase;
+
+import org.apache.avro.Schema;
+import org.apache.avro.SchemaParseException;
+import org.apache.avro.generic.GenericData;
+import org.apache.avro.specific.SpecificRecord;
+import org.apache.avro.util.Utf8;
+
+import voldemort.TestRecord;
+import voldemort.VoldemortTestConstants;
+import voldemort.serialization.Serializer;
+import voldemort.serialization.SerializerDefinition;
+import voldemort.utils.ByteUtils;
+
+public class AvroResolvingSpecificSerializerTest extends TestCase {
+
+ String SCHEMA1 = VoldemortTestConstants.getTestSpecificRecordSchema1();
+ String SCHEMA2 = VoldemortTestConstants.getTestSpecificRecordSchema2();
+ String SCHEMA1NS = VoldemortTestConstants.getTestSpecificRecordSchemaWithNamespace1();
+ String SCHEMA2NS = VoldemortTestConstants.getTestSpecificRecordSchemaWithNamespace2();
+ String SCHEMA = TestRecord.SCHEMA$.toString();
+
+ public void testSingleInvalidSchemaInfo() {
+ SerializerDefinition serializerDef = new SerializerDefinition("test", "null");
+ try {
+ new AvroResolvingSpecificSerializer<SpecificRecord>(serializerDef);
+ } catch(SchemaParseException e) {
+ return;
+ }
+ fail("Should have failed due to an invalid schema");
+ }
+
+ public void testOneSpecificSchemaInfo() {
+ SerializerDefinition serializerDef = new SerializerDefinition("test", SCHEMA);
+ Serializer<SpecificRecord> serializer = new AvroResolvingSpecificSerializer<SpecificRecord>(serializerDef);
+ TestRecord obj = new TestRecord();
+ obj.f1 = new Utf8("foo");
+ obj.f2 = new Utf8("bar");
+ obj.f3 = 42;
+ byte[] bytes1 = serializer.toBytes(obj);
+ byte[] bytes2 = serializer.toBytes(obj);
+ assertEquals(ByteUtils.compare(bytes1, bytes2), 0);
+ assertTrue(obj.equals(serializer.toObject(bytes1)));
+ assertTrue(obj.equals(serializer.toObject(bytes2)));
+ }
+
+ public void testManySpecificSchemaInfo() {
+ Map<Integer, String> schemaInfo = new HashMap<Integer, String>();
+ schemaInfo.put(1, SCHEMA1NS);
+ schemaInfo.put(2, SCHEMA2NS);
+ schemaInfo.put(3, SCHEMA);
+ SerializerDefinition serializerDef = new SerializerDefinition("test",
+ schemaInfo,
+ true,
+ null);
+ Serializer<TestRecord> serializer = new AvroResolvingSpecificSerializer<TestRecord>(serializerDef);
+ TestRecord obj = new TestRecord();
+ obj.f1 = new Utf8("foo");
+ obj.f2 = new Utf8("bar");
+ obj.f3 = 42;
+ byte[] bytes1 = serializer.toBytes(obj);
+ byte[] bytes2 = serializer.toBytes(obj);
+ assertEquals(ByteUtils.compare(bytes1, bytes2), 0);
+ assertTrue(obj.equals(serializer.toObject(bytes1)));
+ assertTrue(obj.equals(serializer.toObject(bytes2)));
+ }
+
+ public void testVersionBytes() {
+ Map<Integer, String> schemaInfo = new HashMap<Integer, String>();
+ schemaInfo.put(1, SCHEMA);
+ SerializerDefinition serializerDef = new SerializerDefinition("test",
+ schemaInfo,
+ true,
+ null);
+ Serializer<SpecificRecord> serializer = new AvroResolvingSpecificSerializer<SpecificRecord>(serializerDef);
+ TestRecord obj = new TestRecord();
+ obj.f1 = new Utf8("foo");
+ obj.f2 = new Utf8("bar");
+ obj.f3 = 42;
+ byte[] bytes = serializer.toBytes(obj);
+ assertEquals(bytes[0], 1);
+ }
+
+ public void testMissingSchema() {
+ SerializerDefinition serializerDef = new SerializerDefinition("test", SCHEMA);
+ Serializer<SpecificRecord> serializer = new AvroResolvingSpecificSerializer<SpecificRecord>(serializerDef);
+ TestRecord obj = new TestRecord();
+ obj.f1 = new Utf8("foo");
+ obj.f2 = new Utf8("bar");
+ obj.f3 = 42;
+ byte[] bytes = serializer.toBytes(obj);
+ // Change the version bytes to something bogus
+ bytes[0] ^= (byte) 0xFF;
+ try {
+ serializer.toObject(bytes);
+
+ } catch(Exception e) {
+ return;
+ }
+ fail("Should have failed due to a missing Schema");
+ }
+
+ public void testMigrateSpecificDatum() {
+ // First serializer
+ Map<Integer, String> schemaInfo1 = new HashMap<Integer, String>();
+ schemaInfo1.put(1, SCHEMA1NS);
+ SerializerDefinition serializerDef1 = new SerializerDefinition("test",
+ schemaInfo1,
+ true,
+ null);
+ Serializer<GenericData.Record> serializer1 = new AvroResolvingGenericSerializer<GenericData.Record>(serializerDef1);
+
+ // Second serializer
+ Map<Integer, String> schemaInfo2 = new HashMap<Integer, String>();
+ schemaInfo2.put(1, SCHEMA1NS);
+ schemaInfo2.put(2, SCHEMA);
+ SerializerDefinition serializerDef2 = new SerializerDefinition("test",
+ schemaInfo2,
+ true,
+ null);
+ Serializer<TestRecord> serializer2 = new AvroResolvingSpecificSerializer<TestRecord>(serializerDef2);
+
+ // Write it as the old Schema
+ GenericData.Record datum = new GenericData.Record(Schema.parse(SCHEMA1NS));
+ datum.put("f1", new Utf8("foo"));
+ byte[] bytes = serializer1.toBytes(datum);
+ // Fix the version bytes so it resolves the correct Schema
+ bytes[0] = 1;
+
+ // Read it back with the new serializer
+ Schema schema2 = Schema.parse(SCHEMA2NS);
+ TestRecord datum1 = serializer2.toObject(bytes);
+ assertTrue(datum1.f1.equals(datum.get("f1")));
+ assertTrue(datum1.f2.equals(new Utf8(schema2.getField("f2").defaultValue().getTextValue())));
+ }
+
+ public void testMigrateGenericDatum() {
+ // First serializer
+ SerializerDefinition serializerDef1 = new SerializerDefinition("test", SCHEMA1);
+ Serializer<GenericData.Record> serializer1 = new AvroResolvingGenericSerializer<GenericData.Record>(serializerDef1);
+
+ // Second serializer
+ Map<Integer, String> schemaInfo2 = new HashMap<Integer, String>();
+ schemaInfo2.put(1, SCHEMA1);
+ schemaInfo2.put(2, SCHEMA2);
+ SerializerDefinition serializerDef2 = new SerializerDefinition("test",
+ schemaInfo2,
+ true,
+ null);
+ Serializer<GenericData.Record> serializer2 = new AvroResolvingGenericSerializer<GenericData.Record>(serializerDef2);
+
+ GenericData.Record datum = new GenericData.Record(Schema.parse(SCHEMA1));
+ datum.put("f1", new Utf8("foo"));
+
+ // Write it as the current Schema
+ byte[] bytes = serializer1.toBytes(datum);
+ // Fix the version byte
+ bytes[0] = 1;
+
+ // Read it back as a different Schema
+ GenericData.Record datum1 = serializer2.toObject(bytes);
+ assertTrue(datum1.get("f1").equals(datum.get("f1")));
+ assertTrue(datum1.get("f2").equals(new Utf8(Schema.parse(SCHEMA2)
+ .getField("f2")
+ .defaultValue()
+ .getTextValue())));
+ }
+}