Fixes #12: Implement Schema.serializable #83

thinkharderdev · 2021-06-27T13:27:57Z

Create a MetaSchema data structure to represent the abstract structure of a schema and an associated Schema.Meta(spec: MetaSchema) extends Schema[Schema[_]] which is the "serialized" representation of a Schema[A].

This is about 90% implemented. The only outstanding items are to support diffing of Schema.Meta and add some additional unit tests.

Limitations and known issues:

The main limitation is that we lose all information about a Schema[A] except for the structure when serializing it (since we cannot serialize the transform, extractFieldN, deconstruct, and construct lambdas). As such, the reified Schema that you get when you deserialize a Schema.Meta is always going to be a generic version of the original (i.e. Schema.CaseClassN -> Schema.GenericRecord and Schema.EnumN -> Schema.Enumeration).
It's not really obvious what we should do with Schema.CaseObject, and the current approach discards all type information about the original object represented by the schema.

jdegoes · 2021-06-28T01:25:50Z

@thinkharderdev Thanks for working on this!

Rather than add a new subtype of Schema, I think you can implement Schema#serializable just by dynamically building up a new Schema which does not include Transform nodes (and which genericizes enums and records, I suppose). Now, this is not "type safe", because there is no way to know, looking at the schema, that it can be serialized and deserialized. But if we want to address that, we can in other ways (e.g. phantom type, path-dependent type, fixed point data, etc.).
I think the problem with case objects suggests we do not have a correct representation for them. Probably, we need to modify CaseObject to store an id (as a String?), and then transform it. Or maybe change it to mean some fixed, static value, which can then be transformed to the case object. The key thing is we are missing a generic notion of "this is fixed, static data" in a way that doesn't bundle it with a user-defined data type; whereas for records and enumerations, we have the ability to "lose" the user-defined data types by genericizing (converting to generic records / generic enumerations with just the labels and schemas for the terms).

jdegoes · 2021-06-28T01:42:10Z

Case objects are isomorphic to unit. So maybe that's another way to handle them: represent them as unit, which is then transformed to the case object. In a sum of case objects, the term ids would be the case object names (names of the subtype of the sum type).
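
A rough sketch of the unit idea, in case it helps the discussion. It uses a toy `Schema` ADT defined inline, not the real zio-schema types; `UnitSchema`, `singleton`, `structure`, and the `Color` example are all hypothetical names chosen for illustration.

```scala
// Toy stand-in for a schema ADT; these are NOT the real zio-schema types.
sealed trait Schema[A]
object Schema {
  case object UnitSchema extends Schema[Unit]
  // Carries plain functions, which cannot be serialized.
  final case class Transform[A, B](underlying: Schema[A], to: A => B, from: B => A)
      extends Schema[B]
  // Generic sum: only the term ids and the schema of each term.
  final case class Enumeration(cases: List[(String, Schema[_])]) extends Schema[(String, Any)]
}

sealed trait Color
case object Red   extends Color
case object Green extends Color

// A case object is the unit schema transformed to the single value of its type.
def singleton[A](value: A): Schema[A] =
  Schema.Transform[Unit, A](Schema.UnitSchema, _ => value, _ => ())

// Structure-only view: strip Transform nodes and keep everything else.
def structure(schema: Schema[_]): Schema[_] = schema match {
  case t: Schema.Transform[_, _] => structure(t.underlying)
  case other                     => other
}

val redSchema: Schema[Red.type] = singleton[Red.type](Red)

// Serializable description of Color: the term ids are the case object names.
val colorStructure: Schema.Enumeration =
  Schema.Enumeration(List("Red" -> Schema.UnitSchema, "Green" -> Schema.UnitSchema))
```

With this shape, `structure(redSchema)` is just the unit schema, so nothing tied to the user-defined type survives serialization except the term id in the enclosing enumeration.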

jdegoes · 2021-06-29T20:05:30Z

zio-schema/shared/src/main/scala/zio/schema/MetaSchema.scala

+    case Schema.Sequence(schema, _, _)                  => Sequence(fromSchema(schema))
+    case Schema.Fail(message)                           => Fail(message)
+    case Schema.Transform(schema, _, _)                 => fromSchema(schema)
+    case lzy @ Schema.Lazy(_)                           => fromSchema(lzy.schema)


We should test this on a schema like JSON which is recursive. I think our "schema language" may have to embed references in order to handle recursion (or some sort of "fixed point" operator).

Yeah, that is going to be a problem. I can work on this today.

The JSON Schema doc uses "pointers" for recursion.

To track recursion, you can use a map based on object identity. This will allow you to detect if you processed the same field before. It might also be possible to define equals / hashCode on the ordinary Schema classes in such a way that it terminates even for recursive structures.
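
A minimal sketch of the identity-map approach, assuming a toy `Node` type in place of `Schema` (all names here are hypothetical). `java.util.IdentityHashMap` compares keys by reference, so it never calls `equals` or `hashCode` on a possibly cyclic structure.

```scala
import java.util.IdentityHashMap

// Toy stand-in for a schema tree; not the real zio-schema types.
sealed trait Node
case object Primitive extends Node
final case class Record(fields: List[(String, Node)]) extends Node
// Defers evaluation so that a structure can refer back to itself.
final class Lazy(thunk: () => Node) extends Node { lazy val node: Node = thunk() }

// Returns true if some node is reachable from itself.
def isRecursive(root: Node): Boolean = {
  // Nodes on the current path from the root, keyed by object identity.
  val onPath = new IdentityHashMap[Node, java.lang.Boolean]()

  def loop(node: Node): Boolean =
    if (onPath.containsKey(node)) true
    else {
      onPath.put(node, java.lang.Boolean.TRUE)
      val found = node match {
        case Primitive      => false
        case Record(fields) => fields.exists { case (_, child) => loop(child) }
        case lzy: Lazy      => loop(lzy.node)
      }
      // Remove on the way out so shared, non-cyclic subtrees are not flagged.
      onPath.remove(node)
      found
    }

  loop(root)
}

// A JSON-like structure whose "children" field refers back to the whole record.
lazy val json: Node =
  Record(List("value" -> Primitive, "children" -> new Lazy(() => json)))
```

Here `isRecursive(json)` detects the cycle through the `Lazy` node, while a flat record with two primitive fields is reported as non-recursive.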

jdegoes · 2021-06-29T20:06:17Z

zio-schema/shared/src/main/scala/zio/schema/MetaSchema.scala

+  final case class Duration(units: TemporalUnit) extends MetaSchema
+
+  object Duration {
+    implicit val schema: Schema[Duration] = Schema[String].transformOrFail(


Really nice!

jdegoes · 2021-06-29T20:06:38Z

zio-schema/shared/src/main/scala/zio/schema/MetaSchema.scala

+    implicit val chunkSchema: Schema[Chunk[MetaCase]] = Schema.chunk(schema)
+  }
+
+  final case class Value(valueType: StandardType[_]) extends MetaSchema


Love the added type safety. ❤️

This allows us to check for cyclic references while building the tree and replace recursive references with pointers. Also fixed a bug with JSON encoding/decoding of recursive data types and fixed derivation of recursive ADTs by inserting laziness into the implicit conversions.

thinkharderdev · 2021-07-04T14:41:16Z

Went back to the drawing board and came up with what I think is a much better encoding for the schema serialization.

Modelling the "meta schema" as an AST now, which has two nice benefits:

We can handle recursive references when we are building the tree and replace recursive subtrees with a pointer to the ancestor.
The resulting JSON encoding is more intuitive and human-readable. I think it will also transform more naturally to other encodings (JSON Schema, Avro schema, etc.).
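
To illustrate the first point, here is a simplified sketch and not the actual code in this PR. `Source`, `Ast`, `Ref`, and `toAst` are hypothetical names; the sketch assumes a pointer is encoded as the path of field labels from the root to the ancestor, with the empty path meaning the root itself.

```scala
// Hypothetical source structure standing in for Schema.
sealed trait Source
case object Prim extends Source
final case class Rec(fields: List[(String, Source)]) extends Source
final class Defer(thunk: () => Source) extends Source { lazy val get: Source = thunk() }

// Hypothetical serializable AST with a pointer node for recursion.
sealed trait Ast
object Ast {
  case object Value extends Ast
  final case class Product(fields: List[(String, Ast)]) extends Ast
  // Path of field labels from the root to an ancestor; Nil is the root.
  final case class Ref(path: List[String]) extends Ast
}

def toAst(root: Source): Ast = {
  // `ancestors` holds the records being expanded, each with its path from the root.
  def loop(node: Source, path: List[String], ancestors: List[(Source, List[String])]): Ast =
    ancestors.find { case (seen, _) => seen eq node } match {
      case Some((_, target)) => Ast.Ref(target) // recursive subtree becomes a pointer
      case None =>
        node match {
          case Prim     => Ast.Value
          case d: Defer => loop(d.get, path, ancestors)
          case Rec(fields) =>
            val next = (node, path) :: ancestors
            Ast.Product(fields.map { case (label, child) =>
              label -> loop(child, path :+ label, next)
            })
        }
    }
  loop(root, Nil, Nil)
}

// A record whose "children" field refers back to the record itself.
lazy val tree: Source =
  Rec(List("value" -> Prim, "children" -> new Defer(() => tree)))
```

Because the result contains no cycles, it can be encoded with an ordinary derived codec, and a decoder can resolve each `Ref` lazily when rebuilding the schema.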

Also fixed the issue with JSON serialization/deserialization, which turned out to be just a regular bug. Still having an issue with the protobuf codecs on recursive data types, but I assume that is just a bug as well. Need to do some more debugging to be sure.

…finite recurse

jdegoes · 2021-07-11T18:24:40Z

zio-schema/shared/src/test/scala/zio/schema/SchemaAssertions.scala

-    case (_: Schema.Lazy[_], _) => true // equalsAst(expected.schema, actual)
-    case (_, _: Schema.Lazy[_]) => true // equalsAst(expected, actual.schema)
-    case _                      => false
+      equalsAst(expected, actual, depth)


Laziness can be tested via reference equality, assuming we are not making mistakes anywhere. It should not be tested for structural equality since that will lead to infinite checks (= stack overflows).

I think reference equality isn't what we want here. We need to test that expected and actual have the same AST. They may not have the same type (and in most of the cases we care about in tests, they will not).
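
One possible middle ground, sketched on a toy `Tree` type rather than `Schema` (all names are hypothetical, and this is not what `equalsAst` does in this PR): compare structurally, but remember by reference which pairs are already being compared and treat a repeated pair as equal. This assumes each lazy node always resolves to the same instance, so the number of distinct pairs is finite.

```scala
// Toy stand-in for a schema tree; not the real zio-schema types.
sealed trait Tree
case object Leaf extends Tree
final case class Branch(fields: List[(String, Tree)]) extends Tree
final class Later(thunk: () => Tree) extends Tree { lazy val get: Tree = thunk() }

def sameShape(expected: Tree, actual: Tree): Boolean = {
  // `inProgress` holds the pairs currently being compared, matched by reference.
  def loop(l: Tree, r: Tree, inProgress: List[(Tree, Tree)]): Boolean =
    if (inProgress.exists { case (a, b) => (a eq l) && (b eq r) }) true
    else
      (l, r) match {
        case (lz: Later, _) => loop(lz.get, r, (l, r) :: inProgress)
        case (_, rz: Later) => loop(l, rz.get, (l, r) :: inProgress)
        case (Leaf, Leaf)   => true
        case (Branch(fs), Branch(gs)) =>
          fs.length == gs.length && fs.zip(gs).forall { case ((n1, c1), (n2, c2)) =>
            n1 == n2 && loop(c1, c2, (l, r) :: inProgress)
          }
        case _ => false
      }
  loop(expected, actual, Nil)
}

// Builds a fresh recursive tree on each call, so two results share no references.
def mkTree(): Tree = {
  lazy val t: Tree = Branch(List("value" -> Leaf, "children" -> new Later(() => t)))
  t
}
```

Two independently built recursive trees compare equal under `sameShape` without overflowing the stack, which reference equality alone would not allow.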

jdegoes · 2021-07-11T22:36:29Z