Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding Quill-Application-Types (Quats) to AST #1911

Merged
merged 4 commits into from Jul 22, 2020

Conversation

deusaquilus
Copy link
Collaborator

@deusaquilus deusaquilus commented Jul 17, 2020

Here's the brief summary:

  • Many features are not possible to implement in Quill without having a proper type-system i.e. every Ident knowing it's type. This includes proper handling of booleans in SQL databases which do not support true/false values (SQL Server Boolean Literals Incorrectly Translated #1685).
  • Query Expansion with Spark does not work properly when tuples are used, this is because Spark needs to select aliases of inner queries by name instead of by an expansion the way regular SQL does (i.e. in ExpandNestedQueries). Issues Incorrect SQL Expansion of Joined Query With Impure Infix #1843 and Improper Handling of Spark Nested Aliases for Tuple Selection Clauses #1912 are symptoms of this.
  • As it is, ExpandNestedQueries is buggy. For example, a subquery with .nested will not always include every needed field in the super-query select expansion . Additionally, ExpandNestedQueries has never been able to properly expand things like escaped clauses. For example, the Oracle functionality to escape all clauses starting with the "_" character still does not properly work .
  • The phase RenameProperties has become very complicated as a result of having to propagate schemas of embedded objects, a mechanism called "Schema Protraction" has been created to address this issue but it substantially adds to the knowledge-debt of this phase. It has had a continual stream of issues due to its complex nature here and with the added complexity, it is becoming unmaintainable.
  • Many phases such as ExpandDistinct are hard to reason about because they do not adhere to any notion of type-fullness. This has caused things like Additional Fixes for Embedded Entities in Nested Queries #1628.
  • In order to provide a way forward, a basic Quill-specific type system must be introduced which can incorporate more and more use-cases in the future. Not all Quill query transformation functionality should be replaced at the same time as that would be too great an impact. Rather, an evolutionary strategy going forward should be introduced (metaphorically speaking, think Introns/Exons in flagella that took everything to the next level ;-)).

Here's the slightly longer version. I have been maintaining Quill for several years now and as I began fixing some complex issues, a pattern seemed to emerge where more and more problems began to arise as I introduced more and more complex behaviors to remedy various bugs. At some point, I realized that Quill can only be stabilized if the space of possibility of the AST itself is substantially reduced.

More specifically, unless we introduce some notion of types into AST, we will never be able to expand subqueries correctly, treat expressions correctly, or even alias/rename clauses correctly. This problem most notably manifests itself in ExpandNestedQueries and RenameProperties where we have built very, very complex machinery to unfold parent-queries into sub-queries in order to understand what is actually going on. As I piled more and more complexity into this machinery in order to fix existing bugs, more and more bugs appeared on the horizon and at some point, I realized that I was proverbially chasing my own tail. Trying to expand sub-queries based on things selected from super-queries is a fundamentally flawed proposition. We need to know what fields an alias actually has inside of it to get this kind of expansion correct.

To make matters worse, strange behaviors began to emerge. In order to fix tuple handling in quill-spark, I had to treat every single kind of SelectValue Ident as a tuple which required prefixing things with "_1" which where single-value selections. Queries would start or stop working of .nested was applied to contexts that seemed to make no sense. Much of this resulted from relaxing restrictions about the kinds of expressions that infix and express (see #1534) which added to the variety of nested-ASTs possible, which then caused a multitude of nested query expansion bugs like #1864 and #1905.

Reflecting on various approaches to types, a bare minimal strategy was chosen. Every AST node will have a .quat field that represents it's Quill-Application-Type. Since most AST elements merely reflect their inner types, only things like Ident, Infix, and several others actually need to have a Quat-type directly inferenced from the parser. Things like FlatMap(a,b,c).quat for example are just c.quat.
For now, I have decided to only introduce three Quat-Types with the intention of adding more (e.g. Booleans) in the future. The are:

Quat.Product
  Representing both Case Classes and Tuples. The latter which will have all their fields as _1, _2 etc...

Quat.Value
  Representing values (e.g. which can potentially be encoded/decoded). Since checking this requires 
  summoning encoders over and over again which is compotationally expensive a "rule of thumb" can 
  be introduced that treates anything which is not a case class type, a type-bounds type and/or things 
  that are final as a Quat.Value. This may need to be refined in the future and various caches may also 
  need to be added.
  (Currently there is a type->quat cache but a encoder cache may also be needed)

Quat.Null and Quat.Generic
  These are needed to represent types whose values are not fully known yet. Typically they should be
  "filled in" by beta reduction of various transformation phases.

Fixes #1912, #1913, and paves the way for many others.

(Continue reading below)

@deusaquilus deusaquilus changed the title Adding Quill-Application-Types (Quats) to AST [WIP] Adding Quill-Application-Types (Quats) to AST Jul 17, 2020
@deusaquilus deusaquilus force-pushed the quats_compressed branch 3 times, most recently from 0b52d0d to a164791 Compare Jul 21, 2020
@deusaquilus
Copy link
Collaborator Author

deusaquilus commented Jul 21, 2020

Adding Quats to the Quill AST is a very large endeavor because all of the transformation phases now need to synthesize Quat-Type-correct trees. ApplyMap for example needs to replace Map-subtrees and reduce various Idents, these now need to be typed as Quat Tuples.

In SymbolicReduction, the AttachToEntity phase can no longer arbitrarily use an "x" alias to synthesize an unused Ident because this can and will conflict with "x" aliases from outer FlatMap nodes, causing the Normalization to think that they belong to those outer clauses. For this reason, I have made AttachToEntity create an Alias which uses a UUID (i.e. so it is not guaranteed to conflict with anything) and I have added a final stage in AvoidAliasConflict Normalization to change it back to "x" or the first free x[1-9]+ Ident available.

I have left ExpandJoin alone despite a desire to re-write it. Since this phase introduces aliases that technically don't exist in the functional code (i.e. the Scala equivalent of the Quill transformation), I have decided to leave it alone for now. The following gist presents an alternative implementation that is type-safe. (More details are presented in the Comments there).

I have decided to add a check to BetaReduction that makes sure that any reduction of one Ast element into another has a corresponding Quat that can actually be reduced. That is to say, for any beta reduction (x:Ast) -> (y:Ast), the y.quat must be equivalent or a sub-type of x.quat. Since it is impractical to implement a whole inheritance hierarchy into Quats, I have decided to use the duck-typing paradigm instead and make sure to check that if y.quat contains all of the fields of x.quat, the replacement can indeed be done. The one exception to this is if either x.quat or y.quat are Quat.Generic or Quat.Null. In this case, the Quat of the Beta Reduction will be whatever quat (of either x or y) is more concrete. If one of these Quats is a Quat.Product that has nulls or generics inside, this rule is applied recursively.

In general, I have tried to add various comments to Transformation phases that I have found particularly difficult to understand (e.g. the first case in Symbolic reduction, some ApplyMap cases etc...). This by no means is a complete coverage but should provide the beginnings of a documentation body of Quill internals.

@deusaquilus
Copy link
Collaborator Author

deusaquilus commented Jul 21, 2020

One note about AST changes. In things like Ident it is important to exclude the Quat construct from the hashcode/equals methods that BetaReduction relies on since Quat.Generic or Quat.Null (or even a sub-type Quat.Product) can be reduced to/from an Ident that should match (i.e. because it has the same alias). For this reason, I have introduced elements such as IdentId into the AST. These are strictly responsible for only the equals/hashcode methods of things like Ident so that it is properly treated in beta-reduction.

Another important thing to mention is the performance impact of such a change. I have profiled the QuatMaking class which parses Scala AST into a Quat in order to detect macro slowness. The best indication of whether a Type is actually a Quat.Value is actually to check if there is an Encoder for that type. Since it is a very large performance impact to attempt summoning an encoder for every single Scala Ident's q"$id".tpe, I have decided to check this only in a last resort and also treat any final class as a value class. A Encoder only needs to be summoned in the event that there is a type-boundary element in order to clearly indicate that the plugged-in expression should be polymorphic.
Additionally, since when resolving a generic parameter, the asSeenFrom command needs to be used (and this is also quite slow), a cache of what Quats have been generated from what Types has been added.

Since the change introducing Quats generally has required additional normalizations resulting from the new requirements (see #1913 for more detail, also thanks @mjsmith707 for bringing this to my attention), it has become necessary to address the inefficiencies of BetaReduction that have been introduced as a result of Ast Opinions. This is fairly straightforward because Opinions are never supposed to be considered part of the equals/hashcode methods in the first place.

Another performance impact has affected the build itself. Because Quill currently builds quill-core and quill-sql (both JVM and JS variants) inside of the db_build which among other things requires pulling a 16GB Oracle image, there does not remain enough memory on the machine to build the quill-sqlJS deliverable. For this reason, I have moved the JS builds out into a separate container and have allocated more memory to their sbt launches.

@deusaquilus
Copy link
Collaborator Author

deusaquilus commented Jul 21, 2020

One other consequence of using reflection to produce Quats is that this needs to be done from the Dynamic API as well as the static one. The Dynamic API relies on ClassTags as opposed to Type tags and generally has looser restrictions on types since fewer inferences are possible as type-erasure has already happened. For this reason, the phase RepropagateQuats has been introduced which takes Quats from Entities inside AST elements such as Map(a,b,c) and FlatMap(a,b,c) writing them into the Idents (i.e. from a to b variables in the example above). This has been added to translateDynamic in ContextMacro.

@deusaquilus deusaquilus changed the title [WIP] Adding Quill-Application-Types (Quats) to AST Adding Quill-Application-Types (Quats) to AST Jul 21, 2020
@juliano
Copy link
Collaborator

juliano commented Jul 21, 2020

Thank you for your work here @deusaquilus, it is remarkable. A very smart approach, clearly based on your background maintaining the tool for quite some time, which can address most of our current problems, while paving Quill's future! 👏

Given the amount of knowledge you have gathered, I would suggest to open loads of issues regarding the addition of other Quats (and probably there are many more related changes). I believe it will help to spread the knowledge among current maintainers/contributors and possibly welcome new ones, assuming it would be a good amount of new issues and possibly some of them could be labelled as "good-first-issue".

@deusaquilus deusaquilus merged commit 47690c1 into zio:master Jul 22, 2020
1 check passed
@deusaquilus
Copy link
Collaborator Author

deusaquilus commented Jul 22, 2020

@juliano

I would suggest to open loads of issues regarding the addition of other Quats

Here's a post which is hopefully the first of many. This is how to implement a fix booleans in SQL Server and Oracle using Quats and StatelessTransformation:
#1685 (comment)

I have also added the quats-feature to mark issues that are now fixable with Quats.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants