
[SPARK-51594][SQL] Use empty schema when saving a view which is not Hive compatible #50367

Closed
Conversation

cloud-fan
Contributor

@cloud-fan cloud-fan commented Mar 24, 2025

What changes were proposed in this pull request?

This is a long-standing issue. Spark always tries to save views in a Hive-compatible way, and only sets the schema to empty if the save operation fails. However, for certain Hive compatibility issues, the save operation succeeds but subsequent read operations fail.

This PR fixes the issue by setting the view schema to empty when it is not Hive compatible.
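The change can be sketched as follows. This is a minimal, hypothetical model of the decision, not Spark's actual implementation; the helper names and the `timestamp_ntz` stand-in type are assumptions for illustration.

```python
# Minimal sketch of the new decision, not Spark's actual implementation.
# Assumption: "timestamp_ntz" stands in for any column type that Hive
# cannot represent in its metastore.
HIVE_INCOMPATIBLE_TYPES = {"timestamp_ntz"}

def is_hive_compatible(data_type: str) -> bool:
    """Return True if Hive can store a column of this type."""
    return data_type not in HIVE_INCOMPATIBLE_TYPES

def schema_to_persist(schema):
    """Schema to store in the Hive metastore for a view.

    Old behavior: always attempt the Hive-compatible save and fall back
    to an empty schema only if the save fails. New behavior: check up
    front, so a view that Hive would accept but Spark could not read
    back is never created. The real schema is kept in the table
    properties either way.
    """
    if any(not is_hive_compatible(dt) for _, dt in schema):
        return []
    return schema
```

With this up-front check, the failure mode described above (save succeeds, read fails) cannot occur for views.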

Why are the changes needed?

To avoid creating malformed views that no one can read.

Does this PR introduce any user-facing change?

Yes. Such views will now be saved in the non-Hive-compatible way so that Spark can still read them.
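Saving with an empty metastore schema loses nothing for Spark readers because the schema is also serialized into the table properties. A toy round-trip sketch, with an assumed property key (Spark's real encoding splits the JSON schema across several numbered keys):

```python
import json

# Toy sketch of the schema round trip through table properties.
# Assumption: the property key below is illustrative only.
SCHEMA_PROP = "spark.sql.sources.schema"

def encode_schema_props(schema):
    """Serialize the real schema into table properties at save time."""
    return {SCHEMA_PROP: json.dumps(schema)}

def restore_schema(metastore_schema, props):
    """Prefer the schema from table properties when the metastore one is empty."""
    if not metastore_schema and SCHEMA_PROP in props:
        return json.loads(props[SCHEMA_PROP])
    return metastore_schema
```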

How was this patch tested?

Updated an existing test case.

Was this patch authored or co-authored using generative AI tooling?

no

@github-actions github-actions bot added the SQL label Mar 24, 2025
@cloud-fan
Contributor Author

Note: this also fixes a regression caused by https://github.com/apache/spark/pull/49506/files#diff-45c9b065d76b237bcfecda83b8ee08c1ff6592d6f85acca09c0fa01472e056afL587

Before #49506, such malformed views were created in non-Hive-compatible mode because the save operation failed.

cc @yaooqinn @dongjoon-hyun

// If the schema is not
// Hive compatible, we can set schema to empty so that Spark can still read this
// view as the schema is also encoded in the table properties.
case schema if schema.exists(f => !isHiveCompatibleDataType(f.dataType)) &&
    tableDefinition.tableType == CatalogTableType.VIEW =>
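A language-agnostic sketch of the guard in the excerpt above, with illustrative names. Checking the cheap table-type condition before scanning the schema lets the expression short-circuit for non-views:

```python
def use_empty_schema(table_type, field_types, is_hive_compatible):
    """True when the table is a VIEW and some column type is not Hive compatible.

    The table-type check comes first so that for ordinary tables the
    per-column schema scan is skipped entirely (short-circuit evaluation).
    """
    return (table_type == "VIEW"
            and any(not is_hive_compatible(t) for t in field_types))
```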
Member

Switch the order of the 2 guards?

Member

@@ -271,7 +271,15 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
       ignoreIfExists)
   } else {
     val tableWithDataSourceProps = tableDefinition.copy(
-      schema = hiveCompatibleSchema,
+      schema = hiveCompatibleSchema match {
Member

nit: can we rename the variable here? It seems a bit weird that we are finding incompatible types from a compatible schema.

@cloud-fan
Contributor Author

thanks for the review, merging to master/4.0!

@cloud-fan cloud-fan closed this in 9b51820 Mar 25, 2025
cloud-fan added a commit that referenced this pull request Mar 25, 2025
[SPARK-51594][SQL] Use empty schema when saving a view which is not Hive compatible


Closes #50367 from cloud-fan/view.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 9b51820)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
kazemaksOG pushed a commit to kazemaksOG/spark-custom-scheduler that referenced this pull request Mar 27, 2025