Fixes #12: Implement Schema.serializable #83
@thinkharderdev Thanks for working on this!
Related: Case objects are isomorphic to unit. So maybe that's another way to handle them: represent them as unit, which is then transformed to the case object. In a sum of case objects, the term ids would be the case object names (the names of the subtypes of the sum type).
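A rough sketch of the unit idea, for illustration only. Iso, EnumOfObjects, singleton, and the Color example are hypothetical names made up here; none of this is the zio.schema API. Each case object is modelled as a transformation to and from Unit, and the sum is keyed by the case object names:

```scala
// Hypothetical stand-ins; NOT part of zio.schema.
// A value that is isomorphic to Unit: it carries no data of its own.
final case class Iso[A](to: Unit => A, from: A => Unit)

// A sum of case objects, keyed by term id (the case object name).
final case class EnumOfObjects[A](cases: Map[String, Iso[A]]) {
  def termIds: Set[String] = cases.keySet

  // Decoding only needs the term id: the payload is always Unit.
  def decode(termId: String): Option[A] = cases.get(termId).map(_.to(()))

  // Encoding finds the term id whose singleton equals the given value.
  def encode(value: A): Option[String] =
    cases.collectFirst { case (id, iso) if iso.to(()) == value => id }
}

sealed trait Color
case object Red   extends Color
case object Green extends Color

object CaseObjectAsUnit {
  // Represent a singleton as Unit transformed to the singleton and back.
  def singleton[A](value: A): Iso[A] =
    Iso[A]((_: Unit) => value, (_: A) => ())

  val colors: EnumOfObjects[Color] =
    EnumOfObjects[Color](Map("Red" -> singleton[Color](Red), "Green" -> singleton[Color](Green)))
}
```

With this encoding, only the term id needs to be serialized; the case object itself is recovered by the transformation.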
case Schema.Sequence(schema, _, _) => Sequence(fromSchema(schema))
case Schema.Fail(message) => Fail(message)
case Schema.Transform(schema, _, _) => fromSchema(schema)
case lzy @ Schema.Lazy(_) => fromSchema(lzy.schema)
We should test this on a schema like JSON which is recursive. I think our "schema language" may have to embed references in order to handle recursion (or some sort of "fixed point" operator).
Yeah, that is going to be a problem. I can work on this today.
The JSON Schema doc uses "pointers" for recursion.
To track recursion, you can use a map based on object identity. This will allow you to detect whether you have already processed the same field. It might also be possible to define equals / hashCode on the ordinary Schema classes in such a way that it terminates even for recursive structures.
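A minimal sketch of the identity-map suggestion, assuming a toy recursive Node type rather than the real Schema classes. Node, Prim, Rec, Tree, Leaf, Branch, Ref, and toTree are all hypothetical names. Nodes that have already been visited (by reference) are replaced with a pointer to their id:

```scala
import java.util.IdentityHashMap

// Toy schema-like structure; NOT the zio.schema classes.
sealed trait Node
final case class Prim(name: String) extends Node
// The by-name parameter lets a record refer to itself.
final class Rec(val label: String, fields0: => List[Node]) extends Node {
  lazy val fields: List[Node] = fields0
}

// Serializable tree in which recursion is expressed as Ref(id).
sealed trait Tree
final case class Leaf(name: String)                                   extends Tree
final case class Branch(id: Int, label: String, children: List[Tree]) extends Tree
final case class Ref(id: Int)                                         extends Tree

object IdentityTracking {
  def toTree(root: Node): Tree = {
    // Keys are compared with reference equality, so lookups never
    // traverse (and never loop on) a recursive structure.
    val seen = new IdentityHashMap[Node, Integer]()

    def go(node: Node): Tree = node match {
      case Prim(name) => Leaf(name)
      case rec: Rec =>
        val known = seen.get(rec)
        if (known != null) Ref(known.intValue) // already visited: emit a pointer
        else {
          val id = seen.size()
          seen.put(rec, Int.box(id)) // register before descending into fields
          Branch(id, rec.label, rec.fields.map(go))
        }
    }

    go(root)
  }

  // A record that contains itself, loosely in the spirit of a JSON schema.
  lazy val json: Rec = new Rec("Json", List(Prim("string"), json))
}
```

Here IdentityTracking.toTree(IdentityTracking.json) yields Branch(0, "Json", List(Leaf("string"), Ref(0))). Registering the node before visiting its fields is what makes the self-reference resolve to a pointer instead of recursing forever.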
final case class Duration(units: TemporalUnit) extends MetaSchema

object Duration {
  implicit val schema: Schema[Duration] = Schema[String].transformOrFail(
Really nice!
implicit val chunkSchema: Schema[Chunk[MetaCase]] = Schema.chunk(schema)
}

final case class Value(valueType: StandardType[_]) extends MetaSchema
Love the added type safety. ❤️
This allows us to check for cyclic references while building the tree and replace recursive references with pointers. Also fixed a bug with JSON encoding/decoding of recursive data types and fixed derivation of recursive ADTs by inserting laziness into the implicit conversions.
Went back to the drawing board and came up with what I think is a much better encoding for the schema serialization. The "meta schema" is now modelled as an AST, which has two nice benefits:
Also fixed the issue with JSON serialization/deserialization, which turned out to be just a regular bug. Still having an issue with the protobuf codecs on recursive data types, but I assume that is just a bug as well. I need to do some more debugging to be sure.
case (_: Schema.Lazy[_], _) => true // equalsAst(expected.schema, actual)
case (_, _: Schema.Lazy[_]) => true // equalsAst(expected, actual.schema)
case _ => false
equalsAst(expected, actual, depth)
Laziness can be tested via reference equality, assuming we are not making mistakes anywhere. It should not be tested for structural equality since that will lead to infinite checks (= stack overflows).
I think reference equality isn't what we want here. We need to test that expected and actual have the same AST. They may not have the same type (and in most of the cases we care about w/r/t tests, they will not).
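One way to reconcile the two concerns is a structural comparison that forces lazy nodes but gives up after a fixed depth. The sketch below is an illustration under stated assumptions: S, P, Rcd, Lzy, sameShape, and the maxDepth budget of 10 are hypothetical and are not the equalsAst from this PR.

```scala
// Toy schema-like structure; NOT the zio.schema classes.
sealed trait S
final case class P(name: String)              extends S
final case class Rcd(fields: List[(String, S)]) extends S
// A deferred node, used to tie recursive knots.
final class Lzy(thunk: () => S) extends S {
  lazy val force: S = thunk()
}

object DepthLimitedEquals {
  // Structural comparison. Every time a lazy node is forced, the depth
  // counter grows; once the budget is spent the remaining (possibly
  // infinite) structure is assumed to match.
  def sameShape(a: S, b: S, depth: Int = 0, maxDepth: Int = 10): Boolean =
    if (depth > maxDepth) true
    else
      (a, b) match {
        case (P(x), P(y)) => x == y
        case (Rcd(fa), Rcd(fb)) =>
          fa.length == fb.length && fa.zip(fb).forall {
            case ((na, sa), (nb, sb)) => na == nb && sameShape(sa, sb, depth, maxDepth)
          }
        case (l: Lzy, _) => sameShape(l.force, b, depth + 1, maxDepth)
        case (_, l: Lzy) => sameShape(a, l.force, depth + 1, maxDepth)
        case _           => false
      }

  // Builds a fresh recursive "list of elem" structure on every call, so two
  // results are structurally alike but never reference-equal.
  def listOf(elem: String): S = {
    lazy val self: S = Rcd(List("head" -> P(elem), "tail" -> new Lzy(() => self)))
    self
  }
}
```

The trade-off is that two structures which differ only below the depth budget compare as equal, so this is suited to tests rather than to a general equals.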
import zio.{ Chunk, ChunkBuilder }

sealed trait Ast { self =>
My main concern here is with the name Ast. Because we are in the zio.schema package, it would be nice if we can shoot for import zio.schema._ in most code bases. But the name Ast is quite common.
Yeah, that's a good point. Maybe SchemaAst?
case Schema.Meta(ast) => ast
}

def subtree(schema: Schema[_], lineage: Chunk[Int], optional: Boolean = false, dimensions: Int = 0): Ast =
Can we make this one package private?
}
}

def materialize(ast: Ast, refs: Map[Int, Ast] = Map.empty): Schema[_] = {
Package private?
}

object AstRenderer {
Package private?
}
}

implicit lazy val schema: Schema[Ast] = DeriveSchema.gen[Ast]
Maybe expand for long-term stability? What do you think?
A giant undertaking. Let's do any revisions as followups. Thanks for your work on this!
* Remove setup gpg
* Update setup-java
* Test
* test2
* test 3
* Fixes zio#78: Implement diffing between instances described by a schema
* Specs and additional implementations
* Rename DiffAlgorithm to Differ
* Add diffs for product and sum types
* Copy Myers diff implementation from zio-test
* Implement binary diff and unit tests
* Fix various temporal diffs and add unit tests
* formatting
* Test coverage and tweaks
* Fixes zio#12: Implement Schema.serializable
* Formatting
* Remove commented code
* Refactoring for clarity and type safety
* Refactor for clarity
* Test coverage and (ignored) test cases for recursive data types
* linting
* Better encoding of serializable schema with an Abstract Syntax Tree. This allows us to check for cyclic references while building the tree and replace recursive references with pointers. Also fixed a bug with JSON encoding/decoding of recursive data types and fixed derivation of recursive ADTs by inserting laziness into the implicit conversions.
* Fix bugs in protobuf decoding
* 2.12 does not support by-name implicit parameters
* Re-enable dynamic value test for recursive data types
* AST materialization and unit tests
* Remove ref method from Schema
* Preserve ref map on recursive materializations
* Add multi-dimensionality to schema to account for sequences of sequences
* Do AST comparison on Lazy schemas by limiting stack depth to avoid infinite recursion
* Rename Ast to SchemaAst to avoid conflicts
* Appease the 2.12 compiler

Co-authored-by: thinkharder <thinkharderdev@users.noreply.github.com>
Create a MetaSchema data structure to represent the abstract structure of a schema and an associated Schema.Meta(spec: MetaSchema) extends Schema[Schema[_]] which is the "serialized" representation of a Schema[A].

This is about 90% implemented. The only outstanding items are to support diffing of Schema.Meta and add some additional unit tests.

Limitations and known issues:
* … Schema[A] except for the structure when serializing it (since we cannot serialize the transform, extractFieldN, deconstruct, and construct lambdas). As such, the reified Schema that you get when you deserialize a Schema.Meta is always going to be a generic version of the original (i.e. Schema.CaseClassN -> Schema.GenericRecord and Schema.EnumN -> Schema.Enumeration).
* … Schema.CaseObject and the current approach discards all type information about the original object represented by the schema.