HIVE-13748: TypeInfoParser cannot handle symbols in the field name of STRUCT #5767
What changes were proposed in this pull request?
https://issues.apache.org/jira/browse/HIVE-13748
I assume Hive's STRUCT type derives from the ROW type of ANSI SQL. According to "4.10 Row types" of SQL:2023 Part 2, a row type is a sequence of (<field name>, <data type>) pairs, where <field name> is any identifier. This is consistent with our parser's definition. "6.2 <field definition>" and "5.4 Names and identifiers" give the syntax rules, and I don't see any restriction on the content of a field name.
The approach is still controversial. If we follow the ANSI standard, we should accept any identifier. My first draft is slightly more defensive: it allows only characters that are not already used in the syntax of type definitions.
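A minimal sketch of the defensive rule, assuming that `<`, `>`, `:`, `,`, `(` and `)` are the only characters with syntactic meaning in a type string. The class `DefensiveTypeTokenizer` is hypothetical and is not Hive's actual `TypeInfoParser` code. Any run of non-delimiter characters becomes one token, so a field name may contain symbols such as `-`, `.` or spaces, but never one of the delimiters.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not Hive's TypeInfoParser.
public class DefensiveTypeTokenizer {
    // Assumed set of characters that carry meaning in a type definition string.
    private static final String DELIMITERS = "<>:,()";

    // Splits a type string into delimiter tokens and "everything else" tokens.
    static List<String> tokenize(String typeString) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : typeString.toCharArray()) {
            if (DELIMITERS.indexOf(c) >= 0) {
                // Flush the pending name/type token before emitting the delimiter.
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
                tokens.add(String.valueOf(c));
            } else {
                // Any other character, symbols included, is part of a name or type token.
                current.append(c);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("struct<a-b:int,c.d e:string>"));
    }
}
```

Running `main` prints `[struct, <, a-b, :, int, ,, c.d e, :, string, >]`. The trade-off is that names containing a delimiter, such as `a:b`, are still rejected, which is narrower than what the standard allows.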
A complete solution would require reimplementing the type parser and ensuring that all Hive code serializes and deserializes type definitions correctly.
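To illustrate why serialization matters once arbitrary field names are allowed, here is a hypothetical naive serializer/deserializer pair (`NaiveStructRoundTrip` is not Hive code, and it handles only flat STRUCTs of primitive types). It writes `name:type` pairs without any quoting, so a field named `a:b` does not survive a round trip.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not Hive's serialization code.
public class NaiveStructRoundTrip {
    // Naive serialization: joins "name:type" pairs with commas, with no quoting or escaping.
    static String serialize(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("struct<");
        boolean first = true;
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (!first) {
                sb.append(',');
            }
            sb.append(e.getKey()).append(':').append(e.getValue());
            first = false;
        }
        return sb.append('>').toString();
    }

    // Naive deserialization: splits on ',' and then on the first ':' of each part.
    static Map<String, String> deserialize(String type) {
        String body = type.substring("struct<".length(), type.length() - 1);
        Map<String, String> fields = new LinkedHashMap<>();
        for (String part : body.split(",")) {
            int colon = part.indexOf(':');
            fields.put(part.substring(0, colon), part.substring(colon + 1));
        }
        return fields;
    }

    public static void main(String[] args) {
        Map<String, String> original = new LinkedHashMap<>();
        original.put("a:b", "int");
        String serialized = serialize(original);
        Map<String, String> parsed = deserialize(serialized);
        System.out.println(serialized);              // struct<a:b:int>
        System.out.println(parsed);                  // {a=b:int}
        System.out.println(original.equals(parsed)); // false
    }
}
```

Ordinary names such as `ab` round-trip fine; the mismatch appears only when a name contains a character that the serialized form also uses as syntax.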
Why are the changes needed?
Hive may be unable to read Iceberg tables written by other engines when a STRUCT field name contains symbols.
Does this PR introduce any user-facing change?
Hive's STRUCT type will accept a wider range of field names.
Is the change a dependency upgrade?
No.
How was this patch tested?
Added unit tests and integration tests.