feat: pass ignore_nulls flag to first and last #1866
base: main
Conversation
Force-pushed from 2b122ed to 692332f
The linked PR did not add a test. Would you be able to?
Codecov Report — Attention: Patch coverage is

```
@@            Coverage Diff             @@
##              main    #1866      +/-  ##
============================================
+ Coverage    56.12%   59.44%    +3.31%
- Complexity     976     1151      +175
============================================
  Files          119      130       +11
  Lines        11743    12663      +920
  Branches      2251     2374      +123
============================================
+ Hits          6591     7527      +936
+ Misses        4012     3924       -88
- Partials      1140     1212       +72
```
All the tests for first and last are currently disabled.

Yes, we should figure out a test approach for these non-deterministic functions before we start making changes to the implementation.

@andygrove perhaps we can merge this while we wait for the tests to be made more accurate?

I will create a PR today to add correctness tests for first/last. We can then rebase this PR and make sure that there are no regressions.
@rluvaton Here is a test that fails in main and passes with your changes in this PR. Could you add this?

```scala
test("first/last with ignore null") {
  val data = Range(0, 8192).flatMap(n => Seq((n, 1), (n, 2))).toDF("a", "b")
  withTempDir { dir =>
    val filename = s"${dir.getAbsolutePath}/first_last_ignore_null.parquet"
    data.write.parquet(filename)
    withSQLConf(CometConf.COMET_BATCH_SIZE.key -> "100") {
      spark.read.parquet(filename).createOrReplaceTempView("t1")
      for (expr <- Seq("first", "last")) {
        // deterministic query that should return one non-null value per group
        val df = spark.sql(
          s"SELECT a, $expr(IF(b==1,null,b)) IGNORE NULLS FROM t1 GROUP BY a ORDER BY a")
        checkSparkAnswerAndOperator(df)
      }
    }
  }
}
```
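Since the existing first/last tests are disabled, it may help to spell out the semantics the test above checks. The following self-contained Rust sketch is illustrative only — the `first`/`last` helpers are hypothetical models, not Comet's native implementation — and shows what the `ignore_nulls` flag changes for one group of the test, where `b == 1` is mapped to null:

```rust
// Hypothetical model of first/last semantics (not Comet's actual code):
// with IGNORE NULLS, null inputs are skipped; without it, a leading or
// trailing null is returned as the result.
fn first(values: &[Option<i32>], ignore_nulls: bool) -> Option<i32> {
    if ignore_nulls {
        // Skip nulls: take the first non-null value, if any.
        values.iter().copied().flatten().next()
    } else {
        // Respect nulls: take whatever the first value is, even null.
        values.first().copied().flatten()
    }
}

fn last(values: &[Option<i32>], ignore_nulls: bool) -> Option<i32> {
    if ignore_nulls {
        values.iter().rev().copied().flatten().next()
    } else {
        values.last().copied().flatten()
    }
}

fn main() {
    // One group from the test: the row with b == 1 becomes null, b == 2 stays.
    let group = vec![None, Some(2)];
    assert_eq!(first(&group, true), Some(2)); // IGNORE NULLS skips the null
    assert_eq!(first(&group, false), None);   // RESPECT NULLS returns the null
    assert_eq!(last(&group, true), Some(2));
    println!("ok");
}
```

The small `COMET_BATCH_SIZE` in the test forces groups to span batch boundaries, which is exactly where a dropped `ignore_nulls` flag would produce a wrong (null) answer.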
Which issue does this PR close?
N/A
Rationale for this change
Actually use the `ignore_nulls` flag from the `ignoreNulls` support added to `first_value` and `last_value` in #1626.

What changes are included in this PR?
Forward `ignore_nulls` in `first` and `last`, and no longer mark these functions as unsupported.
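A hedged sketch of the shape of this change, with hypothetical names (`AggSpec` and `plan_first` are illustrative, not Comet's actual API): previously the Spark-side `ignoreNulls` flag was effectively dropped when `first`/`last` were translated to the native plan; the fix threads it through:

```rust
// Illustrative only: models forwarding the flag from the Spark expression
// into the native aggregate spec, instead of silently discarding it.
#[derive(Debug, PartialEq)]
struct AggSpec {
    func: &'static str,
    ignore_nulls: bool,
}

// After the change: the flag from the Spark expression is forwarded as-is.
fn plan_first(spark_ignore_nulls: bool) -> AggSpec {
    AggSpec {
        func: "first",
        ignore_nulls: spark_ignore_nulls,
    }
}

fn main() {
    assert_eq!(plan_first(true).ignore_nulls, true);
    assert_eq!(plan_first(false).ignore_nulls, false);
    println!("forwarded");
}
```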
How are these changes tested?
This has the same problem as #1626: all the tests for `first` and `last` are disabled.