Characterize Hive ACID schema evolution #6316
The simplest case of schema evolution is column renaming. The last SELECT in the sequence below fails on Presto, because it gets a value of NULL for column3. Hive knows that column3 is the new name for column1, and gets the right value, 11:
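The test sequence itself was stripped from this comment; a minimal sketch of what it presumably looked like (the table name, types, and transactional property are assumptions, not the original test):

```sql
-- Sketch of the rename scenario (assumed table name and types).
CREATE TABLE test_rename (column1 INT, column2 INT)
    STORED AS ORC TBLPROPERTIES ('transactional' = 'true');
INSERT INTO test_rename VALUES (11, 12);
-- Rename column1 to column3; Hive records the mapping in the metastore.
ALTER TABLE test_rename CHANGE COLUMN column1 column3 INT;
-- Hive returns 11; Presto (before the fix) returned NULL.
SELECT column3 FROM test_rename;
```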
This test fails on Presto for transactional tables, but passes for non-transactional tables. Hive doesn't rewrite data files when schema evolution takes place, and in the example above the data files will refer only to "column1". So the schema history needed to correctly interpret column3 must be in the metastore itself. Looking over the metastore, I see that the Schema object has a properties map. I'm guessing that's where the history information is kept.
What is the progress on this issue? @djsstarburst
Hi, @sjx782392329, I've returned this week to looking at this issue. This test shows that Hive tracks column values across multiple renames, even though in this case Presto is doing the renames:
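A sketch of such a multiple-rename sequence, in Trino syntax since Presto/Trino is doing the renames here (table and column names are assumptions):

```sql
-- Trino (sketch): rename the same column twice.
CREATE TABLE test_multi_rename (a INT)
    WITH (format = 'ORC', transactional = true);
INSERT INTO test_multi_rename VALUES (1);
ALTER TABLE test_multi_rename RENAME COLUMN a TO b;
ALTER TABLE test_multi_rename RENAME COLUMN b TO c;
-- Hive still finds the value under the newest name.
SELECT c FROM test_multi_rename;
```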
I believe I've found the Hive code that deals with column renames. There seem to be multiple places in the Hive code base where schema reconciliation could happen when ORC ACID data files are read, but I'm unsure which places are actually used. I posted this to the ASF #hive Slack room:
In my view, your test does not cover my test case. My problem is changing the type of a Hive field: for example, from integer to byte, and then from byte back to integer. Presto v344 cannot read it correctly, but Presto v0.214 can. @djsstarburst
I captured the metastore traffic generated by a …
RE: #6070 This test program shows that widening an integral column type preserves the data in Hive:
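The test program was stripped from this comment; it presumably resembled this sketch (table and column names and the value are assumed):

```sql
-- Hive (sketch): widening an integral column preserves the data.
CREATE TABLE test_widen (age TINYINT) STORED AS ORC;
INSERT INTO test_widen VALUES (42);
ALTER TABLE test_widen CHANGE COLUMN age age INT;
SELECT age FROM test_widen;  -- still 42
```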
However, in this test, the ALTER TABLE fails, because the new column type, TINYINT, is narrower than the original type, INT:
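The failing narrowing case presumably resembled this sketch (names assumed):

```sql
-- Hive (sketch): narrowing an integral column is rejected.
CREATE TABLE test_narrow (age INT) STORED AS ORC;
INSERT INTO test_narrow VALUES (42);
-- Fails: TINYINT is narrower than INT.
ALTER TABLE test_narrow CHANGE COLUMN age age TINYINT;
```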
I tested this case on my local machine. I can change the field type from integer to tinyint by executing this …
When I run the test changing column age from INT to TINYINT, I get this backtrace:
@djsstarburst I altered the table by executing the command in Hive; it seems that you changed the field type using Presto. I think you should use Hive to modify fields instead of Presto.
The failure shown by @djsstarburst comes from an ALTER done via HiveServer2 ("in Hive"). @sjx782392329 what Hive version are you comparing with?
Hi, the Hive version is 1.2.1. @findepi @djsstarburst
We're using Hive 3.1.2-5, and as the link above shows, that version contains the error check. The error check is correct in general -- narrowing an integral type can cause data loss.
@djsstarburst But it works in Presto v0.214. If I upgrade our Presto version, SQL previously executed by users will fail, and they will complain about the upgrade.
I am confused. To the best of my knowledge, 0.214 does not support transactional tables, does it?
I don't know why this case is related to transactional tables. Presto v0.214 can read the data correctly even though the field type had been changed from integer -> tinyint -> integer.
An old version's behavior should not be treated as a "requirement" for the future; we wouldn't be able to fix bugs otherwise. The rule here is: if Hive allows certain schema evolution, and can read from such a table, Trino should read as well.
It is related to transactional tables, as per this issue's title. I'd suggest opening a separate issue for the problem you're experiencing. Since schema evolution needs to be handled by the file reader, please make sure to provide exact repro steps, including the file format in use.
With Hive 3.x I can query the table correctly, but Hive 1.2.1 can't read it correctly.
I just read through HiveAlterHandler, which implements all of Hive's variations of `ALTER TABLE`. Trino must do the same thing -- ignore the column names from the data files.
Here is the code in
Near the start of
It seems to me that the way to ensure data durability across column renames, compatible with Hive, is to post-process the …
I created PR #6479 to update Trino to use the latest column names from the metastore when accessing Hive ORC ACID tables. It changes Presto to match Hive behavior after column renaming(s).
#6479 has been merged, so this issue can be closed. |
Executive Summary
#6070 and #6280 show that Trino queries of Hive ACID tables that have undergone schema evolution don't produce the same results as Hive itself. PR #6479 changes Trino to match Hive's behavior by default for column rename/add/drop in the ORC and Parquet formats.
PR #6479 does not address column type changes; that is left for a follow-on PR.
Kinds of Schema Evolution
There are multiple schema evolution cases to consider: column renaming, column type changes, and adding or dropping columns.
This ticket was opened to characterize Hive's behavior in response to schema evolution, and to recommend changes to Presto to match Hive's results.
Here is a summary of what I've learned running schema evolution tests against Hive 3.1.2-5 and Trino before the fixes in #6479.
Summary of Parquet Results Before #6479 Changes
Parquet Column Renaming
The results for non-partitioned Parquet tables are simple:

- If `hive.parquet.use-column-names` is false, Trino sees the old values of column data in data files created before the column was renamed. This does not match Hive behavior.
- If `hive.parquet.use-column-names` is true, Trino reads nulls as the column data from data files created before the column was renamed. This matches Hive behavior.
- If parameter `hive.partition-use-column-values` is true, then `hive.parquet.use-column-names` must be true or an exception is thrown.

These rules are succinctly encoded in this table:

| `hive.parquet.use-column-names` | Trino's value for pre-rename data | Matches Hive? |
|---|---|---|
| false | old column values | no |
| true | NULL | yes |
Adding or Dropping Parquet Columns
If Trino drops the last non-partition column in a populated Parquet table, and then adds it back with the same name, a Hive query of the table returns the old data for the last column. Trino exactly matches Hive's behavior whether the parameter `hive.parquet.use-column-names` is true or false.

If Trino drops the last non-partition column in a populated Parquet table, and then adds it back with a different name, a Hive query of the table returns nulls for the value of the last column. Trino exactly matches Hive's behavior if `hive.parquet.use-column-names` is true. If `hive.parquet.use-column-names` is false, Trino sees the old values for the last column, which does not match Hive's behavior.

Summary of ORC Results Before #6479 Changes
These tests were run using file format ORC, though some used transactional tables and some used non-transactional tables.
The changes required to make Trino match the behavior we see in Hive have been made in PR #6479, which also contains all the tests that demonstrate the behavior of Hive and Trino.
ORC Column Renaming
By default, Trino ORC identifies columns exclusively by column order. If you insert rows, using either Trino or Hive, and then rename a column, using either Trino or Hive, a query of the old data using the new column name will always succeed in Hive. This is true for non-partition columns whether or not the table is transactional, and whether or not the table is partitioned. It's also true after a series of renames of the same non-partition column. Hive accomplishes this by matching the non-partition columns in files it reads based on column order and column type and not on the column name in the data file.
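A sketch of the by-position matching described above (names assumed; the data file retains the old internal column name, but Hive ignores it):

```sql
-- Hive (sketch): the ORC file still records "col_old" internally,
-- but Hive matches columns by position and type, not by name.
CREATE TABLE t (col_old BIGINT) STORED AS ORC;
INSERT INTO t VALUES (100);
ALTER TABLE t CHANGE COLUMN col_old col_new BIGINT;
SELECT col_new FROM t;  -- returns the old data under the new name
```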
Like Parquet, ORC supports a pair of config parameters that enable tracking of columns by name: `hive.partition-use-column-values` and `hive.orc.use-column-names`. Both default to false. This table specifies the behavior when these parameters have non-default values:

If parameter `hive.partition-use-column-values` is true, then `hive.orc.use-column-names` must be true or an exception is thrown. If `hive.partition-use-column-values` is false and `hive.orc.use-column-names` is true, …

Renaming partition columns is not allowed. An attempt to rename a partition column results in this error from the Hive metastore: `Renaming partition columns is not supported`.

ORC Column Type Changes
Whether the table is transactional or non-transactional, Hive `ALTER TABLE CHANGE COLUMN old_name new_name new_type` will succeed only if `new_type` is at least as wide as the old type. For example, a change from SMALLINT to INT is allowed; a change from INT to SMALLINT is not, and the `ALTER TABLE` statement fails, as shown above.

Trino does not provide a way to change a column type, but Hive does. As new tests in #6479 show, if a column type is widened using Hive's `ALTER TABLE ... CHANGE COLUMN old_name new_name new_type`, a subsequent Trino query will fail, saying that the types are incompatible even though the same Hive query succeeds.

Adding and Dropping ORC Columns
Whether the table is transactional or non-transactional, if Trino drops the last non-partition column in a populated table, and then adds it back, perhaps with a new name and a new, wider type, when Hive queries the table, the old data for the last column will be returned. With the #6479 changes, Trino exactly matches the Hive behavior.
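A sketch of the drop-and-re-add scenario in Trino syntax (table and column names and types are assumptions):

```sql
-- Trino (sketch): drop the last non-partition column, then re-add it
-- with a new name and a wider type.
CREATE TABLE t2 (id INT, val SMALLINT)
    WITH (format = 'ORC', transactional = true);
INSERT INTO t2 VALUES (1, SMALLINT '7');
ALTER TABLE t2 DROP COLUMN val;
ALTER TABLE t2 ADD COLUMN val2 INT;
-- Hive (and Trino with #6479) returns the old data for the last column.
SELECT val2 FROM t2;
```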
Hive 3.1.2-5 does not support the syntax `ALTER TABLE ADD COLUMN ...` or `ALTER TABLE DROP COLUMN ...`. Instead it has `ALTER TABLE REPLACE COLUMNS (col1 col1_type, ...)`. However, dropping any column using the `ALTER TABLE REPLACE COLUMNS ...` syntax fails in Hive with this error message: `Replacing columns cannot drop columns for table default.test_test_hive_renames_false_NONE_4739xndjz4di. SerDe may be incompatible.` Looking at the Hive 3.1.2-5 codebase, this error is raised if the table format is ORC and the number of columns in the replace list is less than the number of columns in the table.

Summary of AVRO Results
AVRO does not support parameter `partition_use_column_names`, and has no analog to parameters like `orc_use_column_names`. Tests show that AVRO tracks columns exclusively by column name. This means that column values in a data file written before a column rename will all be null after the rename.
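A sketch of the AVRO behavior (names assumed):

```sql
-- Hive (sketch): AVRO matches columns by name, so data written
-- under the old name reads back as NULL after a rename.
CREATE TABLE t3 (x INT) STORED AS AVRO;
INSERT INTO t3 VALUES (5);
ALTER TABLE t3 CHANGE COLUMN x y INT;
SELECT y FROM t3;  -- NULL
```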