
[SPARK-51594][SQL] Use empty schema when saving a view which is not Hive compatible #50367

Closed
Conversation

cloud-fan
Contributor

@cloud-fan cloud-fan commented Mar 24, 2025

What changes were proposed in this pull request?

This is a long-standing issue. Spark always tries to save views in a Hive-compatible way, and only sets the schema to empty if the save operation fails. However, for certain Hive compatibility issues, the save operation succeeds but subsequent read operations fail.

This PR fixes the issue by setting the view schema to empty when it is not Hive compatible.
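The change can be sketched as follows. This is a minimal, hypothetical model of the decision, not Spark's actual implementation; the helper names and the `timestamp_ntz` stand-in type are assumptions for illustration.

```python
# Minimal sketch of the new decision, not Spark's actual implementation.
# Assumption: "timestamp_ntz" stands in for any column type that Hive
# cannot represent in its metastore.
HIVE_INCOMPATIBLE_TYPES = {"timestamp_ntz"}

def is_hive_compatible(data_type: str) -> bool:
    """Return True if Hive can store a column of this type."""
    return data_type not in HIVE_INCOMPATIBLE_TYPES

def schema_to_persist(schema):
    """Schema to store in the Hive metastore for a view.

    Old behavior: always attempt the Hive-compatible save and fall back
    to an empty schema only if the save fails. New behavior: check up
    front, so a view that Hive would accept but Spark could not read
    back is never created. The real schema is kept in the table
    properties either way.
    """
    if any(not is_hive_compatible(dt) for _, dt in schema):
        return []
    return schema
```

With this up-front check, the failure mode described above (save succeeds, read fails) cannot occur for views.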

Why are the changes needed?

To avoid creating malformed views that no one can read.

Does this PR introduce any user-facing change?

Yes. Such views will now be saved in the non-Hive-compatible way so that Spark can still read them.
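Saving with an empty metastore schema loses nothing for Spark readers because the schema is also serialized into the table properties. A toy round-trip sketch, with an assumed property key (Spark's real encoding splits the JSON schema across several numbered keys):

```python
import json

# Toy sketch of the schema round trip through table properties.
# Assumption: the property key below is illustrative only.
SCHEMA_PROP = "spark.sql.sources.schema"

def encode_schema_props(schema):
    """Serialize the real schema into table properties at save time."""
    return {SCHEMA_PROP: json.dumps(schema)}

def restore_schema(metastore_schema, props):
    """Prefer the schema from table properties when the metastore one is empty."""
    if not metastore_schema and SCHEMA_PROP in props:
        return json.loads(props[SCHEMA_PROP])
    return metastore_schema
```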

How was this patch tested?

Updated an existing test case.

Was this patch authored or co-authored using generative AI tooling?

no

@github-actions github-actions bot added the SQL label Mar 24, 2025
@cloud-fan
Contributor Author

Note: this also fixes a regression caused by https://github.com/apache/spark/pull/49506/files#diff-45c9b065d76b237bcfecda83b8ee08c1ff6592d6f85acca09c0fa01472e056afL587

Before #49506, such malformed views were created in non-Hive-compatible mode because the save operation failed.

cc @yaooqinn @dongjoon-hyun

// If the schema is not
// Hive compatible, we can set schema to empty so that Spark can still read this
// view as the schema is also encoded in the table properties.
case schema if schema.exists(f => !isHiveCompatibleDataType(f.dataType)) &&
    tableDefinition.tableType == CatalogTableType.VIEW =>
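A language-agnostic sketch of the guard in the excerpt above, with illustrative names. Checking the cheap table-type condition before scanning the schema lets the expression short-circuit for non-views:

```python
def use_empty_schema(table_type, field_types, is_hive_compatible):
    """True when the table is a VIEW and some column type is not Hive compatible.

    The table-type check comes first so that for ordinary tables the
    per-column schema scan is skipped entirely (short-circuit evaluation).
    """
    return (table_type == "VIEW"
            and any(not is_hive_compatible(t) for t in field_types))
```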
Member

Switch the order of the 2 guards?

Member

@@ -271,7 +271,15 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
       ignoreIfExists)
   } else {
     val tableWithDataSourceProps = tableDefinition.copy(
-      schema = hiveCompatibleSchema,
+      schema = hiveCompatibleSchema match {
Member

nit: can we rename the variable here? It seems a bit weird that we are finding incompatible types from a compatible schema.

@cloud-fan
Contributor Author

thanks for the review, merging to master/4.0!

@cloud-fan cloud-fan closed this in 9b51820 Mar 25, 2025
cloud-fan added a commit that referenced this pull request Mar 25, 2025
[SPARK-51594][SQL] Use empty schema when saving a view which is not Hive compatible


Closes #50367 from cloud-fan/view.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 9b51820)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
kazemaksOG pushed a commit to kazemaksOG/spark-custom-scheduler that referenced this pull request Mar 27, 2025