Express Infix Clauses not in Select List #1597

Merged
merged 2 commits into from
Sep 11, 2019

deusaquilus
Collaborator

@deusaquilus deusaquilus commented Sep 8, 2019

Fixes (first issue) #1583, (second issue) #1598, (third issue) #1564
Also fixes #1580

Problem

First Issue

As explained in #1583, select elements that are infix clauses cannot simply be removed from a query when they are not being selected (at least if they are not pure). For example:

val q = quote {
  query[Person].map(p => (infix"DISTINCT ON (${p.other})".as[Int], p.name, p.id)).map(t => (t._2, t._3))
}
// Normally Produces:
// SELECT p.name, p.id FROM person p
// ... this totally breaks the intent of the query

This typically occurs where multiple map clauses are combined into a single one. Since this clearly cannot be done for the aforementioned infix clauses, we need to express the chained map clauses as a sub-query and an outer query, which Quill already does. However, when deciding which select-values should be in the sub-query, Quill excludes the values of elements that are not in the outer select list. This is perfectly acceptable in normal cases, e.g.:

run {
  query[Person].map(p => (p.id, p.name, p.other)).nested.map(t => (t._1, t._2))
}
// SELECT p.id, p.name FROM (SELECT x.id, x.name FROM person x) AS p
// The 'p.other' property is excluded, this is perfectly reasonable and actually is an optimization.
// (SIDE NOTE: Although technically the SQL optimizer should do it for us... it doesn't always do it, especially when this column is actually coming from a view, from a view, from a view etc...)

However, this is not acceptable in situations where infix clauses are being used, as explained above.
For this reason, we need to find which infixes of every sub-query have not been expressed in the expandSelect (now refactored into ExpandSelect) method and then put them back into the sub-query. This is tricky because these infixes could be inside tuples or case classes whose sibling elements have already been selected, so blindly including all SelectValue objects with infixes would cause duplicate fields to be selected. For this reason, we need to recursively traverse all SelectValue objects containing case classes and tuples and extract the infix values inside.

During the traversal, we need to keep track of which element inside the respective case class or tuple we are in and check if that element has already been expressed as a SelectValue because it matched some Property of the output. Since arbitrary things can be selected inside of arbitrary paths inside case-classes and tuples, we need to keep track of not only the order of elements in the initial selection (of the sub-query) but also which element of the respective tuple (or sub-tuple, or sub-sub tuple) the infix is in. For this reason, we created the OrderedSelect object which keeps a List[Int] that represents the "path" of an element inside a list of select values. For example, say you have a query like this:

case class Person(id:Int, name:String, age:String)
val q = quote {
  query[Person]
  .map(p => (p.id, (p.name, infix"RANK() OVER(ORDER BY ${p.age})")))
  .map(t => (t._1, t._2._1))
}

Once the SqlQuery(ast) has been created, it looks something like this:

Map(
  Map(
    Entity("Person", List()),
    Ident("p"),
    Tuple(List(Property(Ident("p"), "id"), Tuple(List(Property(Ident("p"), "name"), Infix(List("RANK() OVER(ORDER BY ", ")"), List(Property(Ident("p"), "age")), false)))))
  ),
  Ident("t"),
  Tuple(List(Property(Ident("t"), "_1"), Property(Property(Ident("t"), "_2"), "_1")))
)

The "orders" of the elements are the following:
Order = 1 - Property(Ident("p"), "id")
Order = 2 - Tuple(List(Property(Ident("p"), "name"), Infix(List("RANK() OVER(ORDER BY ", ")"), List(Property(Ident("p"), "age")), false)))
Order = 2,1 - Property(Ident("p"), "name")
Order = 2,2 - Infix(List("RANK() OVER(ORDER BY ", ")"), List(Property(Ident("p"), "age")), false)

Now, since we have already selected element 2,1 (i.e. Property(Ident("p"), "name")), we cannot just select element 2 (i.e. the entire tuple), since that would cause the name property to be selected twice (i.e. SELECT ... FROM (SELECT p.id, p.name, p.name, RANK() ...)). For this reason, we need to search down into element 2, reach element 2,2, and pull out the infix. Then we need to put this infix into the correct place in the resulting query.
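The path bookkeeping described above can be sketched roughly as follows. This is a minimal illustration, not Quill's actual ExpandSelect code: the Ast, Property, Infix, Tuple and Ordered types are simplified, hypothetical stand-ins, and paths are 1-based to match the "Order = 2,2" notation used here.

```scala
object InfixPathSketch {
  // Hypothetical, simplified stand-ins for Quill's AST nodes (not the real classes).
  sealed trait Ast
  final case class Property(owner: String, name: String) extends Ast
  final case class Infix(sql: String) extends Ast
  final case class Tuple(values: List[Ast]) extends Ast

  // A select element tagged with its 1-based path, e.g. List(2, 2).
  final case class Ordered(path: List[Int], ast: Ast)

  // Recursively assign a path to every element, descending into tuples.
  def paths(values: List[Ast], prefix: List[Int] = Nil): List[Ordered] =
    values.zipWithIndex.flatMap { case (ast, i) =>
      val path = prefix :+ (i + 1)
      ast match {
        case Tuple(inner) => Ordered(path, ast) :: paths(inner, path)
        case _            => List(Ordered(path, ast))
      }
    }

  // Infixes that are not covered by any path the outer query already selected.
  // An infix counts as covered if its own path or an ancestor path was selected.
  def missingInfixes(values: List[Ast], selected: Set[List[Int]]): List[Ordered] =
    paths(values).collect {
      case o @ Ordered(path, _: Infix) if !selected.exists(s => path.startsWith(s)) => o
    }
}
```

For the example query, the outer map selects paths 1 and 2,1, so calling missingInfixes with Set(List(1), List(2, 1)) would report only the infix at path 2,2, which is the element that has to be put back into the sub-query.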

Second Issue

An issue related to #1583 (#1598) is where tuples are mapped to ad-hoc case classes and then are part of a nested select. This typically breaks because, in certain situations, multiple nested clauses exist in the AST (i.e. Nested(Nested(q))) and sub-query fields cannot be attached to the corresponding elements because of how the nested query expansion works.

case class TestEntity(s: String, i: Int, l: Long, o: Option[Int]) extends Embedded
case class Dual(ta: TestEntity, tb: TestEntity)

val qr1 = quote {
  query[TestEntity]
}

val q = quote {
  qr1.join(qr1).on((a, b) => a.i == b.i).nested.map(both => both match { case (a, b) => Dual(a, b) }).nested
}

println(run(q).string)

This fails during macro expansion with the following exception:

cmd21.sc:1: exception during macro expansion: 
java.lang.IndexOutOfBoundsException: 1
	at scala.collection.LinearSeqOptimized$class.apply(LinearSeqOptimized.scala:65)
	at scala.collection.immutable.List.apply(List.scala:84)
	at io.getquill.context.sql.norm.ExpandNestedQueries$.io$getquill$context$sql$norm$ExpandNestedQueries$$expandReference$1(ExpandNestedQueries.scala:90)
	at io.getquill.context.sql.norm.ExpandNestedQueries$.io$getquill$context$sql$norm$ExpandNestedQueries$$expandReference$1(ExpandNestedQueries.scala:80)
	at io.getquill.context.sql.norm.ExpandNestedQueries$$anonfun$expandSelect$1.apply(ExpandNestedQueries.scala:108)
	at io.getquill.context.sql.norm.ExpandNestedQueries$$anonfun$expandSelect$1.apply(ExpandNestedQueries.scala:108)
	at scala.collection.immutable.List.map(List.scala:288)

This is due to the fact that there are multiple Nested clauses in the AST:

// show(SqlNormalize(q.ast)) 
Map(
  Nested(
    Nested(
      Map(
        Join(
          InnerJoin,
          Entity("TestEntity", List()),
          Entity("TestEntity", List()),
          Ident("a"),
          Ident("b"),
          BinaryOperation(Property(Ident("a"), "i"), ==, Property(Ident("b"), "i"))
        ),
        Ident("ab"),
        Tuple(List(Ident("a"), Ident("b")))
      )
    )
  ),
  Ident("both"),
  CaseClass(List(("ta", Property(Ident("both"), "_1")), ("tb", Property(Ident("both"), "_2"))))
)

This leads to an AST that looks like this:

// show(SqlQuery(SqlNormalize(q.ast))) 
FlattenSqlQuery(
  List(
    QueryContext(
      FlattenSqlQuery(
        List(
          QueryContext(
            FlattenSqlQuery(
              List(
                JoinContext(
                  InnerJoin,
                  TableContext(Entity("TestEntity", List()), "a"),
                  TableContext(Entity("TestEntity", List()), "b"),
                  BinaryOperation(Property(Ident("a"), "i"), ==, Property(Ident("b"), "i"))
                )
              ),
              None,
              None,
              List(),
              None,
              None,
              List(SelectValue(Ident("a"), None, false), SelectValue(Ident("b"), None, false)),
              false
            ),
            "x"
          )
        ),
        None,
        None,
        List(),
        None,
        None,
        List(SelectValue(Ident("x"), None, false)), // (b) From here?
        false
      ),
      "both"
    )
  ),
  None,
  None,
  List(),
  None,
  None,
  List(
    SelectValue(
      // (a) We need to look up _1 and _2......
      CaseClass(List(("ta", Property(Ident("both"), "_1")), ("tb", Property(Ident("both"), "_2")))),
      None,
      false
    )
  ),
  false
)

Notice that we need to look up the properties _1 and _2 from the x variable in the middle. This will obviously fail because x does not contain these values. The problem here is that the extra nested clause breaks the correspondence between the _1 and _2 keys and the a and b select values to which they refer. Collapsing the Nested(Nested(q)) inside of SqlQuery solves this problem.
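As a rough illustration of the collapsing step, the sketch below removes directly repeated Nested wrappers on a toy AST. The Entity, Nested and MapQ types are hypothetical simplifications invented for this example; the real transformation operates on Quill's own AST inside SqlQuery and may differ in detail.

```scala
object CollapseNestedSketch {
  // Hypothetical, simplified stand-ins for Quill's AST nodes (not the real classes).
  sealed trait Ast
  final case class Entity(name: String) extends Ast
  final case class Nested(query: Ast) extends Ast
  final case class MapQ(query: Ast, alias: String, body: String) extends Ast

  // Collapse any run of directly nested Nested(...) wrappers into a single one.
  def collapse(ast: Ast): Ast = ast match {
    case Nested(Nested(inner)) => collapse(Nested(inner))
    case Nested(inner)         => Nested(collapse(inner))
    case MapQ(q, alias, body)  => MapQ(collapse(q), alias, body)
    case e: Entity             => e
  }
}
```

With only one Nested layer left, there is a single sub-query between the outer map and the join, so the _1 and _2 lookups can be resolved against the select values a and b directly.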

Third Issue

The third issue involves using an embedded entity inside of a query using distinct. In addition to potentially having a double nesting issue (Second Issue), this kind of query fails in the VerifySqlQuery step because its elements are not properly expanded. This occurs because root-level tuples in map-clauses and root-level ad-hoc case classes are not treated equivalently. Take for instance the following nested query that maps to a tuple:

case class Emb(a: Int, b: Int)
val q = quote {
  query[Emb].map(e => (1, e)).distinct 
}
run(q)
// SELECT e._1, e._2a, e._2b FROM (SELECT DISTINCT 1 AS _1, e.a AS _2a, e.b AS _2b FROM emb e) AS e

The SqlQuery gets expanded to the following:

// SqlQuery(SqlNormalize(q.ast)) 
FlattenSqlQuery(
  List(TableContext(Entity("Emb", List()), "e")),
  None,
  None,
  List(),
  None,
  None,
  List(SelectValue(Constant(1), None, false), SelectValue(Ident("e"), None, false)),
  true
)

Then take the following nested query that maps to an ad-hoc case class:

case class Parent(id: Int, emb1: Emb)
case class Emb(a: Int, b: Int) extends Embedded
val q = quote { 
  query[Emb].map(e => Parent(1, e)).distinct
}
run(q)
// java.lang.IllegalStateException: The monad composition can't be expressed using applicative joins. Faulty expression: '(1, e)'. Free variables: 'List(e)'.
// 	at io.getquill.util.Messages$.fail(Messages.scala:21)
// 	at io.getquill.context.sql.idiom.SqlIdiom$$anonfun$3.apply(SqlIdiom.scala:43)
// 	at io.getquill.context.sql.idiom.SqlIdiom$$anonfun$3.apply(SqlIdiom.scala:43)
// 	at scala.Option.map(Option.scala:146)

The SqlQuery gets expanded to the following:

// SqlQuery(SqlNormalize(q.ast)) 
FlattenSqlQuery(
  List(TableContext(Entity("Emb", List()), "e")),
  None,
  None,
  List(),
  None,
  None,
  List(SelectValue(CaseClass(List(("id", Constant(1)), ("emb1", Ident("e")))), None, false)),
  true
)

Notice that in the former, the tuple has been flattened into a list of SelectValue elements, whereas in the latter the case class has not.

The difference in behavior also has an impact on ExpandNestedQueries. Notice for instance that tuple indices are used to de-reference the Nth element of a given select:

case pp @ Property(_, TupleIndex(idx)) =>
  select(idx) match {
    case OrderedSelect(o, SelectValue(ast, alias, c)) =>
      OrderedSelect(o, SelectValue(ast, concat(alias, idx), c))
  }

The reason why Quill behaves differently for Tuples and Ad-Hoc case classes is that tuples are used both as a row-coproduct type (most notably from applicative joins) and as an element type from a standard map method. Due to this double meaning, Quill automatically expands element-type tuples inside of SqlQuery so that they behave the same way as coproduct-type tuples. Now, when using Ad-Hoc case classes as coproduct-types (i.e. with the use of Embedded), some additional effort needs to be taken in order to expand them properly prior to verification in VerifySqlQuery. This allows VerifySqlQuery to properly exclude sub-element identities (i.e. identities inside of SelectValue(CaseClass(...)) elements).
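To make the intended symmetry concrete, here is a small sketch that flattens a root-level tuple and a root-level ad-hoc case class into one SelectValue per element. The types are hypothetical simplifications of Quill's AST, and using the case class field name as the alias is an assumption made for illustration only; tuple elements keep a None alias, matching the expanded SqlQuery shown earlier.

```scala
object ExpandSelectSketch {
  // Hypothetical, simplified stand-ins for Quill's AST nodes (not the real classes).
  sealed trait Ast
  final case class Constant(value: Int) extends Ast
  final case class Ident(name: String) extends Ast
  final case class Tuple(values: List[Ast]) extends Ast
  final case class CaseClass(fields: List[(String, Ast)]) extends Ast

  final case class SelectValue(ast: Ast, alias: Option[String] = None)

  // Flatten a root-level tuple or ad-hoc case class into one SelectValue per element.
  // Anything else stays a single SelectValue.
  def expandRoot(ast: Ast): List[SelectValue] = ast match {
    case Tuple(values)     => values.map(v => SelectValue(v))
    // Assumption: the field name is carried along as the alias.
    case CaseClass(fields) => fields.map { case (name, v) => SelectValue(v, Some(name)) }
    case other             => List(SelectValue(other))
  }
}
```

Under this sketch, both (1, e) and Parent(1, e) yield two select values, Constant(1) and Ident("e"), so a later verification step sees the same shape in both cases.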

Potential Issues in Future

Are there situations where SELECT ... FROM (SELECT C1, .... RANK() (...)) queries will change their results by the mere exclusion of a column C1 that is not used in the outer select? If this is the case, there should exist a mode that disallows ExpandSelect from excluding ANY columns from sub-queries. This is fairly straightforward to do now since we are basically doing this for infixes; instead of just infixes, we would do it for all kinds of columns.
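One way such a mode could look is sketched below. This is purely hypothetical: the Column type, the subQueryColumns function and the keepAll flag do not exist in Quill and are named here only to illustrate the idea of widening the "always keep infixes" rule to all columns.

```scala
object PruneSketch {
  // Hypothetical column model: a name plus a flag marking infix expressions.
  final case class Column(name: String, isInfix: Boolean = false)

  // Decide which sub-query columns survive.
  // Default: keep columns referenced by the outer select, plus every infix.
  // keepAll = true: keep every column, pruning nothing.
  def subQueryColumns(
    all: List[Column],
    referenced: Set[String],
    keepAll: Boolean = false
  ): List[Column] =
    if (keepAll) all
    else all.filter(c => c.isInfix || referenced.contains(c.name))
}
```

The default branch mirrors the behavior described for infixes, while the keepAll branch corresponds to the stricter mode raised as a possibility here.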

Trace

Due to the complex nature of ExpandNestedQueries and the AST transformations in general, I have decided to add some tracing code that can give the user more insight into what is going on with these operations. This has been implemented as an Interpolator so as to clearly distinguish itself from the surrounding code. Since these things are considered a "side effect" and are to be avoided in functional code (at least by some schools of thought), I have introduced various methods such as andReturn and andContinue to the trace Interpolator that should allow the user to keep code in the functional style, at least in some places.
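The general shape of such a tracing interpolator might look like the sketch below. It is not the Interpolator added in this PR: the Tracer and Traceable classes, the sink parameter and the exact behavior of andReturn and andContinue are assumptions made for illustration, built only on Scala's standard StringContext mechanism.

```scala
object TraceSketch {
  // Hypothetical traced message; not Quill's actual Interpolator API.
  final class Traceable(message: String, sink: String => Unit) {
    // Log the message, evaluate the expression, then log and return its result.
    def andReturn[T](command: => T): T = {
      sink(message)
      val result = command
      sink(s"> $result")
      result
    }

    // Log the message and evaluate the expression without logging the result.
    def andContinue[T](command: => T): T = {
      sink(message)
      command
    }
  }

  // Holds the output sink and provides the trace"..." interpolator via import.
  final class Tracer(sink: String => Unit) {
    implicit class TraceInterpolator(sc: StringContext) {
      def trace(args: Any*): Traceable = {
        // Interleave literal parts and arguments (escape sequences are not processed).
        val text = sc.parts.zipAll(args.map(_.toString), "", "")
          .map { case (part, arg) => part + arg }
          .mkString
        new Traceable(text, sink)
      }
    }
  }
}
```

A caller would create a tracer (for example new TraceSketch.Tracer(println)), import its members, and then write trace"Expanding $x".andReturn(expand(x)), so the traced expression still evaluates to the computed value and the logging stays off to the side.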

Conclusion

Since map-chained infixes are the most typical cause of nested queries, and distincts are a close second, all three of these issues are closely related and require the same set of functionality in ExpandNestedQueries and VerifySqlQuery in order to function properly. Therefore, I have chosen to bundle them into a single PR.

Checklist

  • Unit test all changes
  • Update README.md if applicable
  • Add [WIP] to the pull request title if it's work in progress
  • Squash commits that aren't meaningful changes
  • Run sbt scalariformFormat test:scalariformFormat to make sure that the source files are formatted

@getquill/maintainers

@deusaquilus deusaquilus force-pushed the fix_missing_infixes branch 8 times, most recently from da90a2f to 86f9965 Compare September 10, 2019 00:06
@deusaquilus deusaquilus changed the title [WIP] Express Infix Clauses not in Select List Express Infix Clauses not in Select List Sep 10, 2019
Successfully merging this pull request may close these issues.

Nested Query with Multi-Element Tuple Crashes
1 participant