HIVE-13748: TypeInfoParser cannot handle symbols in the field name of STRUCT #5767

Open. okumin wants to merge 2 commits into base: master

okumin (Contributor) commented Apr 12, 2025

What changes were proposed in this pull request?

https://issues.apache.org/jira/browse/HIVE-13748

I assume Hive's STRUCT type derives from the ROW type of ANSI SQL. According to "4.10 Row types" of SQL:2023 Part 2, a row type is a sequence of (<field name>, <data type>) pairs, where a field name is any identifier. That is consistent with our parser's definition. "6.2" and "5.4 Names and identifiers" contain the syntax rules, and I don't see any restrictions on the content of an identifier.

The approach is still open to debate. If we follow the ANSI standard, we should accept any identifier. My first draft is slightly more defensive: it allows only characters that type definitions do not use.

To do this properly, we would have to reimplement the type parser and ensure that all Hive code correctly serializes and deserializes type definitions.
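The defensive idea can be illustrated with a minimal sketch. This is not Hive's TypeInfoParser; the class name `StructFieldNameSketch`, the `tokenize` method, and the delimiter set `<>,:()` are assumptions made up for illustration. The sketch treats every character that is not one of the assumed type-definition delimiters as part of a name, so a field name such as `user'id` or `e-mail` stays in one token.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch only; it does not reproduce Hive's TypeInfoParser.
final class StructFieldNameSketch {
  // Assumption: these are the only characters reserved by type-definition syntax.
  private static final String DELIMITERS = "<>,:()";

  private static boolean isDelimiter(char c) {
    return DELIMITERS.indexOf(c) >= 0;
  }

  // Splits a type string into single-character delimiter tokens and name tokens.
  // Any run of non-delimiter characters becomes one name token.
  static List<String> tokenize(String typeString) {
    List<String> tokens = new ArrayList<>();
    int i = 0;
    while (i < typeString.length()) {
      char c = typeString.charAt(i);
      if (isDelimiter(c)) {
        tokens.add(String.valueOf(c));
        i++;
      } else {
        int start = i;
        while (i < typeString.length() && !isDelimiter(typeString.charAt(i))) {
          i++;
        }
        tokens.add(typeString.substring(start, i));
      }
    }
    return tokens;
  }

  public static void main(String[] args) {
    // Field names containing symbols are kept as single tokens.
    System.out.println(tokenize("struct<user'id:int,e-mail:string>"));
  }
}
```

A field name that itself contains one of the delimiters (for example a colon) would still be split in this sketch, which is the kind of case that a full parser rewrite would have to address.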

Why are the changes needed?

Without this change, Hive may be unable to read Iceberg tables written by other engines when STRUCT field names contain symbols.

Does this PR introduce any user-facing change?

Our STRUCT type will be more generic.

Is the change a dependency upgrade?

No.

How was this patch tested?

Added unit tests and integration tests.

github-actions bot commented Apr 12, 2025

@check-spelling-bot Report

🔴 Please review

See the files view or the action log for details.

Unrecognized words (2)

DFB
user'id

Previously acknowledged words that are now absent: aarry, bytecode, HIVEFETCHOUTPUTSERDE, timestamplocal, yyyy

To accept these unrecognized words as correct (and remove the previously acknowledged and now absent words), run the following commands:

... in a clone of the git@github.com:okumin/hive.git repository
on the HIVE-13748-struct-name branch:

update_files() {
  perl -e '
    my @expect_files=qw('".github/actions/spelling/expect.txt"');
    @ARGV=@expect_files;
    my @stale=qw('"$patch_remove"');
    my $re=join "|", @stale;
    my $suffix=".".time();
    my $previous="";
    sub maybe_unlink { unlink($_[0]) if $_[0]; }
    while (<>) {
      if ($ARGV ne $old_argv) { maybe_unlink($previous); $previous="$ARGV$suffix"; rename($ARGV, $previous); open(ARGV_OUT, ">$ARGV"); select(ARGV_OUT); $old_argv = $ARGV; }
      next if /^(?:$re)(?:(?:\r|\n)*$| .*)/; print;
    }; maybe_unlink($previous);'
  perl -e '
    my $new_expect_file=".github/actions/spelling/expect.txt";
    use File::Path qw(make_path);
    use File::Basename qw(dirname);
    make_path (dirname($new_expect_file));
    open FILE, q{<}, $new_expect_file; chomp(my @words = <FILE>); close FILE;
    my @add=qw('"$patch_add"');
    my %items; @items{@words} = @words x (1); @items{@add} = @add x (1);
    @words = sort {lc($a)."-".$a cmp lc($b)."-".$b} keys %items;
    open FILE, q{>}, $new_expect_file; for my $word (@words) { print FILE "$word\n" if $word =~ /\w/; };
    close FILE;
    system("git", "add", $new_expect_file);
  '
}

comment_json=$(mktemp)
curl -L -s -S \
  -H "Content-Type: application/json" \
  "https://api.github.com/repos/apache/hive/issues/comments/2798776518" > "$comment_json"
comment_body=$(mktemp)
jq -r ".body // empty" "$comment_json" > $comment_body
rm $comment_json

patch_remove=$(perl -ne 'next unless s{^</summary>(.*)</details>$}{$1}; print' < "$comment_body")

patch_add=$(perl -e '$/=undef; $_=<>; if (m{Unrecognized words[^<]*</summary>\n*```\n*([^<]*)```\n*</details>$}m) { print "$1" } elsif (m{Unrecognized words[^<]*\n\n((?:\w.*\n)+)\n}m) { print "$1" };' < "$comment_body")

update_files
rm $comment_body
git add -u
If the flagged items do not appear to be text

If items relate to a ...

  • well-formed pattern.

    If you can write a pattern that would match it,
    try adding it to the patterns.txt file.

    Patterns are Perl 5 Regular Expressions - you can test yours before committing to verify it will match your lines.

    Note that patterns can't match multiline strings.

  • binary file.

    Please add a file path to the excludes.txt file matching the containing file.

    File paths are Perl 5 Regular Expressions - you can test yours before committing to verify it will match your files.

    ^ refers to the file's path from the root of the repository, so ^README\.md$ would exclude README.md (on whichever branch you're using).

okumin changed the title from "[WIP] HIVE-13748: TypeInfoParser cannot handle symbols in the field name of STRUCT" to "HIVE-13748: TypeInfoParser cannot handle symbols in the field name of STRUCT" on Apr 14, 2025
okumin marked this pull request as ready for review on April 14, 2025, 03:13