[SPARK-51594][SQL] Use empty schema when saving a view which is not Hive compatible #50367
Conversation
Note: this also fixes a regression caused by https://github.com/apache/spark/pull/49506/files#diff-45c9b065d76b237bcfecda83b8ee08c1ff6592d6f85acca09c0fa01472e056afL587. Before #49506, the malformed view would be created in non-Hive-compatible mode because the save operation failed.
// Hive compatible, we can set schema to empty so that Spark can still read this
// view as the schema is also encoded in the table properties.
case schema if schema.exists(f => !isHiveCompatibleDataType(f.dataType)) &&
    tableDefinition.tableType == CatalogTableType.VIEW =>
Switch the order of the 2 guards?
@@ -271,7 +271,15 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
         ignoreIfExists)
     } else {
       val tableWithDataSourceProps = tableDefinition.copy(
-        schema = hiveCompatibleSchema,
+        schema = hiveCompatibleSchema match {
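Putting the two fragments above together, here is a minimal hedged sketch of the whole expression, not the verbatim Spark source: the identifiers tableDefinition, hiveCompatibleSchema, and isHiveCompatibleDataType come from the diff hunks in this thread, while the pass-through arm and the empty-schema value are assumptions based on the PR description.

// Assumed shape of the surrounding code in HiveExternalCatalog.
// StructType is org.apache.spark.sql.types.StructType; CatalogTableType is
// org.apache.spark.sql.catalyst.catalog.CatalogTableType.
val tableWithDataSourceProps = tableDefinition.copy(
  schema = hiveCompatibleSchema match {
    // If any column type is not Hive compatible and the object is a view,
    // store an empty schema: Hive never sees the incompatible types, and
    // Spark can still reconstruct the real schema from the table properties,
    // where it is also encoded.
    case schema if schema.exists(f => !isHiveCompatibleDataType(f.dataType)) &&
        tableDefinition.tableType == CatalogTableType.VIEW =>
      new StructType()
    // Otherwise keep the Hive-compatible schema as-is.
    case schema => schema
  })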
nit: can we rename this variable? It seems a bit odd that we are looking for incompatible types in a schema named hiveCompatibleSchema.
Thanks for the review, merging to master/4.0!
Closes #50367 from cloud-fan/view.
Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 9b51820)
What changes were proposed in this pull request?
This is a long-standing issue. Spark always tries to save views in the Hive-compatible way, and only sets the schema to empty if the save operation fails. However, for certain Hive compatibility issues, the save operation succeeds but subsequent read operations fail.
This PR fixes the issue by setting the view schema to empty when it is not Hive compatible.
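As a hedged illustration of the user-facing effect, runnable in spark-shell where the spark session is predefined. The column type below is only an assumed example of a data type the Hive metastore cannot represent; the actual trigger depends on the Hive version in use.

// Hypothetical repro sketch: TIMESTAMP_NTZ is assumed here to be a
// non-Hive-compatible column type; substitute whatever type fails against
// your Hive metastore version.
spark.sql("CREATE VIEW v AS SELECT TIMESTAMP_NTZ '2020-01-01 00:00:00' AS ts")

// Before this PR: the CREATE above could succeed in Hive-compatible mode,
// yet this read could fail because Hive could not interpret the stored schema.
// After this PR: the Hive-visible schema is empty and Spark recovers the real
// schema from the table properties, so the read works.
spark.sql("SELECT * FROM v").show()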
Why are the changes needed?
To avoid creating malformed views that no one can read.
Does this PR introduce any user-facing change?
Yes. Such a view is now saved in a non-Hive-compatible way so that Spark can still read it.
How was this patch tested?
Updated an existing test case.
Was this patch authored or co-authored using generative AI tooling?
No.