-
Notifications
You must be signed in to change notification settings - Fork 947
[KYUUBI #7106] Make response.results.columns optional #7107
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Fixes #7106 |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #7107 +/- ##
======================================
Coverage 0.00% 0.00%
======================================
Files 697 697
Lines 43214 43214
Branches 5855 5855
======================================
Misses 43214 43214 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
A bugfix to handle Spark 3.5 behavior when response.results.columns is None by making columns optional in the fetch results operation.
- Introduces a new boolean flag (has_new_data) to check for actual returned data.
- Safely updates the request state to finished when no new data is available.
zip(response.results.columns, schema)] | ||
new_data = list(zip(*columns)) | ||
self._data += new_data | ||
has_new_data = (True if new_data else False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] Consider simplifying the assignment by using 'bool(new_data)' instead of the ternary operator to improve readability and clarity.
has_new_data = (True if new_data else False) | |
has_new_data = bool(new_data) |
Copilot uses AI. Check for mistakes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks
@fbertsch thank you for fixing this issue. Do you happen to know which Spark PR causes this behavior change? |
I haven't been able to confirm, but I believe it's this change: https://issues.apache.org/jira/browse/SPARK-39041 That redid all the HiveThriftServer responses, and probably also changed the column responses. |
@turboFei are you able to release a new version of PyHive with this included? |
Thanks, merged to master.
This is definitely on our TODO list, hopefully we can achieve the first release in July. |
Why are the changes needed?
Bugfix. Spark 3.5 is returning
None
forresponse.results.columns
, while Spark 3.3 returned actual values.The response here: https://github.com/apache/kyuubi/blob/master/python/pyhive/hive.py#L507
For a query that does nothing (mine was an
add jar s3://a/b/c.jar
), here are the responses I received.Spark 3.3:
Spark 3.5:
How was this patch tested?
I tested by applying it locally and running my query against Spark 3.5. I was not able to get any unit tests running, sorry!
Was this patch authored or co-authored using generative AI tooling?
No.