The OpenX JSON format will deserialize JSON arrays as rows. But if the user stores the top-level JSON values as arrays and a query selects only specific columns, or selects them in a different order than the table definition, the results will be incorrect.
This isn't impacting me, but I found it while implementing my own Hive Format and wanted to raise it, in case others care. I do not know if/how it worked or didn't with the Hive SerDes, but this seems like a correctness issue regardless.
Here is a unit test I put together that runs against 454 and shows the issue:
/*
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package io.trino.plugin.hive;

import io.trino.filesystem.Location;
import io.trino.filesystem.TrinoFileSystem;
import io.trino.testing.QueryRunner;
import org.junit.jupiter.api.Test;

import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.UUID;

import static io.trino.testing.TestingNames.randomNameSuffix;

public class OpenXTopLevelArrayTest
        extends BaseHiveConnectorTest
{
    @Override
    protected QueryRunner createQueryRunner()
            throws Exception
    {
        return createHiveQueryRunner(HiveQueryRunner.builder());
    }

    @Test
    public void testOpenXArrayAsTopLevelValueWithColumnReorder()
            throws IOException
    {
        // this needs to be made protected in BaseHiveConnectorTest
        TrinoFileSystem fileSystem = getTrinoFileSystem();
        Location tempDir = Location.of("local:///temp_" + UUID.randomUUID());
        fileSystem.createDirectory(tempDir);
        Location dataFile = tempDir.appendPath("data.json");

        // passes when json is object
        // String jsonText = "{ first_col: 31, skipped_col: \"second value\", third_col: \"third value\" }";
        // fails when json is corresponding array
        String jsonText = "[31, \"second value\", \"third value\"]";

        try (OutputStream out = fileSystem.newOutputFile(dataFile).create()) {
            out.write(jsonText.getBytes(StandardCharsets.UTF_8));
            out.flush();
        }

        String table = "test_openx_" + randomNameSuffix();
        assertUpdate("""
                create table %s (
                    first_col int,
                    skipped_col varchar,
                    third_col varchar)
                with (format = 'openx_json', external_location = '%s')"""
                .formatted(table, tempDir));

        assertQuery("""
                select third_col, first_col
                from %s""".formatted(table),
                "VALUES ('third value', 31)");
    }
}
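To make the expected semantics concrete, here is a minimal standalone sketch of what I would expect a reader to do with a top-level array: treat each element as the value of the table column at the same ordinal, and resolve projected columns back to their ordinal in the table definition. This is not Trino code. The class name `TopLevelArrayProjection`, the `project` method, and the hard-coded column list are all made up for illustration, and I am not claiming this is how the OpenX JSON reader is (or should be) implemented.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names exist in Trino.
public class TopLevelArrayProjection
{
    // Table columns in declaration order, matching the test table.
    static final List<String> TABLE_COLUMNS = List.of("first_col", "skipped_col", "third_col");

    // For each projected column, look up its ordinal in the table definition
    // and read the array element at that ordinal (null if the array is too short).
    static List<Object> project(List<Object> topLevelArray, List<String> projectedColumns)
    {
        List<Object> row = new ArrayList<>();
        for (String column : projectedColumns) {
            int tableOrdinal = TABLE_COLUMNS.indexOf(column);
            if (tableOrdinal < 0) {
                throw new IllegalArgumentException("Unknown column: " + column);
            }
            row.add(tableOrdinal < topLevelArray.size() ? topLevelArray.get(tableOrdinal) : null);
        }
        return row;
    }

    public static void main(String[] args)
    {
        List<Object> json = List.of(31, "second value", "third value");
        // Same projection as the query in the test: select third_col, first_col
        System.out.println(project(json, List.of("third_col", "first_col")));
        // prints [third value, 31]
    }
}
```

The point is only that the lookup is by the column's position in the table definition, not by its position in the select list, which is what the assertion in the test expects.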
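For example, with a JSON value of [31, "second value", "third value"] in a table defined as (first_col int, skipped_col varchar, third_col varchar) with format = 'openx_json', a query of select third_col, first_col returns an incorrect result instead of the expected ('third value', 31).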