Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect SQL Expansion of Joined Query With Impure Infix #1843

Open
deusaquilus opened this issue May 3, 2020 · 0 comments
Open

Incorrect SQL Expansion of Joined Query With Impure Infix #1843

deusaquilus opened this issue May 3, 2020 · 0 comments

Comments

@deusaquilus
Copy link
Collaborator

deusaquilus commented May 3, 2020

Version: (e.g. 3.5.1)
Module: (e.g. quill-spark)
Database: (e.g. spark)

Setup:

import io.getquill._
import io.getquill.context.spark.SparkDialect
import io.getquill.context.spark.SparkIdiom 
import QuillSparkContext._

val sparkSession =
  SparkSession
    .builder()
    .config("spark.sql.shuffle.partitions", 2) // Default shuffle partitions is 200, too much for tests
    .config("spark.ui.enabled", "false")
    .master("local[1]")
    .appName("spark test")
    .getOrCreate()

Actual Behavior

When creating a query with an impure infix and a nested join:

case class Person(id: Int, name: String, age: Int) 
case class Address(ownerFk: Int, street: String) 

val joinAddresses = quote {
  (p: Person) =>
    for { a <- query[Address].join(a => a.ownerFk == p.id) } yield a
}

// The following SQL is produced
// SELECT a._1 AS _1, a._2 AS _2, a._3 AS _3 FROM Person p, (SELECT struct(p.*), struct(a.*), foo AS _3 
// FROM INNER JOIN Address a ON a.ownerFk = p.id) AS a

The SQL syntax is invalid because FROM INNER JOIN ... needs to have a table after the FROM part e.g. FROM (ds10) INNER JOIN ....
Also, the query itself is incorrect.

Expected Behavior

First of all, the Person p clause should be in the inner select. Also, there should only be one clause in the top select.

// SELECT a._1 AS _1, a._2 AS _2, a._3 AS _3 FROM (SELECT struct(p.*), struct(a.*), foo AS _3 
// FROM Person p INNER JOIN Address a ON a.ownerFk = p.id) AS a

Additionally, the struct selects of the inner query should have aliases:

// SELECT a._1 AS _1, a._2 AS _2, a._3 AS _3 FROM (SELECT struct(p.*) AS _1, struct(a.*) as _2, foo AS _3 
// FROM Person p INNER JOIN Address a ON a.ownerFk = p.id) AS a

Note

Alternatively, if you create the a context like this:

val ctx = new SqlMirrorContext[SparkIdiom, Literal](SparkDialect, Literal) 

... and then run the query:

// The following SQL is produced (which is also invalid):
// SELECT a._1.id AS _1, a._1.name AS _2, a._1.age AS _3, a._2.ownerFk AS _4, a._2.street AS _5, a._3 AS _6 
// FROM Person p, (SELECT struct(p.*), struct(a.*), foo AS _3 FROM INNER JOIN Address a ON a.ownerFk = p.id) AS a

This query is a bit different but also invalid. They likely use a common problematic functionality.

Workaround

Do not use impure infixes with Spark queries.

@getquill/maintainers

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant