HIVE-13748: TypeInfoParser cannot handle symbols in the field name of STRUCT #5767

okumin · 2025-04-12T10:25:31Z

What changes were proposed in this pull request?

https://issues.apache.org/jira/browse/HIVE-13748

I assume the STRUCT type of Hive derives from the ROW type of ANSI SQL. Based on "4.10 Row types" of SQL:2023 part 2, a row type is a sequence of (<field name>, <data type>) pairs, where "field name" is any identifier. This is consistent with our parser's definition. "6.2 <field definition>" and "5.4 Names and identifiers" contain the syntax rules, and I don't see any restrictions on the content.

The approach is still controversial. If we follow the ANSI standard, we should accept any identifier. My first draft is slightly more defensive: it allows only characters that are not used by type definitions.

To do this perfectly, we would have to reimplement the type parser and ensure that all Hive code correctly serializes and deserializes type definitions.
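To illustrate the defensive idea, here is a minimal sketch in Python. It is not Hive's TypeInfoParser, and the names `DELIMITERS`, `tokenize`, and `struct_field_names` are hypothetical. It assumes the structural characters of a type string are `<`, `>`, `,`, `:`, `(`, and `)`, and treats every other character as part of a name.

```python
# Hypothetical sketch; not Hive's TypeInfoParser.
# Assumption: these are the only characters with structural meaning.
DELIMITERS = set("<>,:()")


def tokenize(type_string):
    """Split a type string into delimiter tokens and name tokens."""
    tokens, current = [], []
    for ch in type_string:
        if ch in DELIMITERS:
            if current:
                tokens.append("".join(current))
                current = []
            tokens.append(ch)
        else:
            # Any non-delimiter character, including symbols, joins the name.
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens


def struct_field_names(type_string):
    """Return the top-level field names of a struct<...> type string."""
    tokens = tokenize(type_string)
    if tokens[:2] != ["struct", "<"] or tokens[-1] != ">":
        raise ValueError("not a struct type: " + type_string)
    inner = tokens[2:-1]
    names, depth = [], 0
    for i, tok in enumerate(inner):
        if tok in ("<", "("):
            depth += 1
        elif tok in (">", ")"):
            depth -= 1
        elif tok == ":" and depth == 0:
            # At depth 0, the token before ':' must be a field name.
            if i == 0 or inner[i - 1] in DELIMITERS:
                raise ValueError("missing field name before ':'")
            names.append(inner[i - 1])
    return names


print(struct_field_names("struct<user-id:int,e@mail:string>"))
```

A field name that itself contains a delimiter such as `:` or `,` stays ambiguous under this scheme, which is why accepting every identifier would need a reimplemented parser.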

Why are the changes needed?

It's possible that Hive can't read Iceberg tables written by other engines.

Does this PR introduce any user-facing change?

Our STRUCT type will be more generic.

Is the change a dependency upgrade?

No.

How was this patch tested?

Added unit tests and integration tests.


github-actions · 2025-04-12T10:28:25Z

@check-spelling-bot Report

🔴 Please review

See the files view or the action log for details.

Unrecognized words (2)

DFB
user'id

Previously acknowledged words that are now absent

aarry bytecode HIVEFETCHOUTPUTSERDE timestamplocal yyyy

To accept these unrecognized words as correct (and remove the previously acknowledged and now absent words), run the following commands

... in a clone of the git@github.com:okumin/hive.git repository
on the HIVE-13748-struct-name branch:

update_files() {
perl -e '
my @expect_files=qw('".github/actions/spelling/expect.txt"');
@ARGV=@expect_files;
my @stale=qw('"$patch_remove"');
my $re=join "|", @stale;
my $suffix=".".time();
my $previous="";
sub maybe_unlink { unlink($_[0]) if $_[0]; }
while (<>) {
if ($ARGV ne $old_argv) { maybe_unlink($previous); $previous="$ARGV$suffix"; rename($ARGV, $previous); open(ARGV_OUT, ">$ARGV"); select(ARGV_OUT); $old_argv = $ARGV; }
next if /^(?:$re)(?:(?:\r|\n)*$| .*)/; print;
}; maybe_unlink($previous);'
perl -e '
my $new_expect_file=".github/actions/spelling/expect.txt";
use File::Path qw(make_path);
use File::Basename qw(dirname);
make_path (dirname($new_expect_file));
open FILE, q{<}, $new_expect_file; chomp(my @words = <FILE>); close FILE;
my @add=qw('"$patch_add"');
my %items; @items{@words} = @words x (1); @items{@add} = @add x (1);
@words = sort {lc($a)."-".$a cmp lc($b)."-".$b} keys %items;
open FILE, q{>}, $new_expect_file; for my $word (@words) { print FILE "$word\n" if $word =~ /\w/; };
close FILE;
system("git", "add", $new_expect_file);
'
}

comment_json=$(mktemp)
curl -L -s -S \
-H "Content-Type: application/json" \
"https://api.github.com/repos/apache/hive/issues/comments/2798776518" > "$comment_json"
comment_body=$(mktemp)
jq -r ".body // empty" "$comment_json" > $comment_body
rm $comment_json

patch_remove=$(perl -ne 'next unless s{^</summary>(.*)</details>$}{$1}; print' < "$comment_body")

patch_add=$(perl -e '$/=undef; $_=<>; if (m{Unrecognized words[^<]*</summary>\n*```\n*([^<]*)```\n*</details>$}m) { print "$1" } elsif (m{Unrecognized words[^<]*\n\n((?:\w.*\n)+)\n}m) { print "$1" };' < "$comment_body")

update_files
rm $comment_body
git add -u

If the flagged items do not appear to be text

If items relate to a ...

well-formed pattern.

If you can write a pattern that would match it, try adding it to the patterns.txt file.

Patterns are Perl 5 Regular Expressions - you can test yours before committing to verify it will match your lines.

Note that patterns can't match multiline strings.

binary file.

Please add a file path to the excludes.txt file matching the containing file.

File paths are Perl 5 Regular Expressions - you can test yours before committing to verify it will match your files.

^ refers to the file's path from the root of the repository, so ^README\.md$ would exclude README.md (on whichever branch you're using).

sonarqubecloud · 2025-04-12T16:08:37Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

HIVE-13748: TypeInfoParser cannot handle symbols in the field name of STRUCT

ff24dd8

asf-ci-hive added the tests pending label Apr 12, 2025

Add some words to expect.txt

e549b26

asf-ci-hive added tests failed tests pending and removed tests pending tests failed labels Apr 12, 2025

asf-ci-hive added tests passed and removed tests pending labels Apr 12, 2025

okumin changed the title ~~[WIP] HIVE-13748: TypeInfoParser cannot handle symbols in the field name of STRUCT~~ HIVE-13748: TypeInfoParser cannot handle symbols in the field name of STRUCT Apr 14, 2025

okumin marked this pull request as ready for review April 14, 2025 03:13
