[SPARK-47309][SQL][XML] Fix schema inference issues in XML
### What changes were proposed in this pull request?

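This PR fixes two XML schema inference issues:

1. When a field is an empty tag, `inferField` now consumes the `EndElement` event before returning `NullType`.

2. When merging schemas, a repeated field whose new type is `NullType` is now wrapped in an array type like any other repeated field.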

### Why are the changes needed?

To fix bugs in XML schema inference for empty tags and `NullType` fields.

### Does this PR introduce _any_ user-facing change?

Yes

### How was this patch tested?

Unit tests. There's a follow-up [PR](apache#45411) that introduces comprehensive tests for schema inference.

### Was this patch authored or co-authored using generative AI tooling?

No

Closes apache#45426 from shujingyang-db/fix-xml-schema-inference.

Authored-by: Shujing Yang <shujing.yang@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
shujingyang-db authored and HyukjinKwon committed Mar 8, 2024
1 parent 7a5bb5d commit 9cac2bb
Showing 1 changed file with 4 additions and 2 deletions.
@@ -195,7 +195,9 @@ class XmlInferSchema(options: XmlOptions, caseSensitive: Boolean)

   private def inferField(parser: XMLEventReader): DataType = {
     parser.peek match {
-      case _: EndElement => NullType
+      case _: EndElement =>
+        parser.nextEvent()
+        NullType
       case _: StartElement => inferObject(parser)
       case _: Characters =>
         val structType = inferObject(parser).asInstanceOf[StructType]
@@ -450,7 +452,7 @@ class XmlInferSchema(options: XmlOptions, caseSensitive: Boolean)
       oldTypeOpt match {
         // If the field name already exists,
         // merge the type and infer the combined field as an array type if necessary
-        case Some(oldType) if !oldType.isInstanceOf[ArrayType] && !newType.isInstanceOf[NullType] =>
+        case Some(oldType) if !oldType.isInstanceOf[ArrayType] =>
           ArrayType(compatibleType(caseSensitive, options.valueTag)(oldType, newType))
         case Some(oldType) =>
           compatibleType(caseSensitive, options.valueTag)(oldType, newType)
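
The `inferField` change relies on `XMLEventReader.peek` not consuming the event it returns. A minimal sketch of that behavior, using only the JDK's `javax.xml.stream` API; the object name `PeekVsNextEvent`, the method `run`, and the sample XML are made up for illustration and are not part of Spark:

```scala
import java.io.StringReader
import javax.xml.stream.XMLInputFactory

// Hypothetical demo object, not part of Spark.
object PeekVsNextEvent {
  // Returns (peek saw </a>, second peek still saw </a>, name of the element after consuming </a>).
  def run(): (Boolean, Boolean, String) = {
    val xml = "<ROW><a></a><b>1</b></ROW>"
    val reader = XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml))

    reader.nextEvent() // StartDocument
    reader.nextEvent() // <ROW>
    reader.nextEvent() // <a>

    // peek only looks at the next event; calling it twice returns the same </a>.
    val firstPeekIsEnd = reader.peek().isEndElement
    val secondPeekIsEnd = reader.peek().isEndElement

    // Only nextEvent advances the reader past </a>.
    reader.nextEvent()
    val nextName = reader.peek().asStartElement().getName.getLocalPart

    (firstPeekIsEnd, secondPeekIsEnd, nextName)
  }

  def main(args: Array[String]): Unit = println(run())
}
```

If the empty-tag branch returned `NullType` after `peek` alone, the closing tag would still be the next event for whatever code reads from the parser afterwards, which is what the added `parser.nextEvent()` call avoids.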
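
The effect of dropping the `NullType` guard can be sketched with a toy model of the merge step. Everything here is an assumption for illustration: `DType`, `NullT`, `StringT`, `ArrayT`, `compatible`, and `addOrUpdate` are simplified stand-ins, not Spark's `DataType` hierarchy or its `compatibleType` implementation. Only the shape of the `match` follows the patched code above.

```scala
// Simplified stand-ins for Spark's data types (hypothetical, for illustration only).
sealed trait DType
case object NullT extends DType
case object StringT extends DType
final case class ArrayT(element: DType) extends DType

object MergeSketch {
  // Toy replacement for compatibleType: NullT yields to the other type,
  // arrays merge element-wise, and anything else falls back to StringT.
  def compatible(a: DType, b: DType): DType = (a, b) match {
    case (NullT, t)             => t
    case (t, NullT)             => t
    case (ArrayT(x), ArrayT(y)) => ArrayT(compatible(x, y))
    case (ArrayT(x), y)         => ArrayT(compatible(x, y))
    case (x, ArrayT(y))         => ArrayT(compatible(x, y))
    case (x, y) if x == y       => x
    case _                      => StringT
  }

  // Mirrors the patched match: a repeated non-array field becomes an array,
  // whether or not the new occurrence was inferred as NullT.
  def addOrUpdate(fields: Map[String, DType], name: String, newType: DType): Map[String, DType] =
    fields.get(name) match {
      case Some(old) if !old.isInstanceOf[ArrayT] =>
        fields.updated(name, ArrayT(compatible(old, newType)))
      case Some(old) =>
        fields.updated(name, compatible(old, newType))
      case None =>
        fields.updated(name, newType)
    }
}
```

In this model, a field seen first as `StringT` and then as an empty tag (`NullT`) ends up as `ArrayT(StringT)`. With the old guard, the `NullT` occurrence would have skipped the first case and gone through the plain merge instead.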
