SQL cannot run using Presto, but can run using Hive, when the type of a column in the ORC file doesn't match the type defined in the Hive table #3679

Open
Xuchaojing opened this issue May 9, 2020 · 10 comments

Comments

@Xuchaojing

Query 20200509_032316_00682_9pmv8 failed: Error opening Hive split hdfs://xxxxxxx (offset=33554432, length=33554432): Malformed ORC file. Can not read SQL type bigint from ORC stream .id of type STRING

So, is there any suggestion about this issue?

@ebyhr
Member

ebyhr commented May 9, 2020

Could you try hive.orc.use-column-names=true?

Also, I would suggest you join the community Slack
https://prestosql.io/slack.html
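
For reference, a minimal sketch of where this property would go, assuming a standard Hive catalog file at etc/catalog/hive.properties and that the property is available in the deployed version (the connector name and metastore URI below are placeholders, not taken from this issue):

  connector.name=hive-hadoop2
  hive.metastore.uri=thrift://example-metastore:9083
  # match ORC file columns to table columns by name instead of by ordinal position
  hive.orc.use-column-names=true

All coordinator and worker nodes need to be restarted for catalog property changes to take effect.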

@Xuchaojing
Author

Xuchaojing commented May 9, 2020

hive.orc.use-column-names doesn't solve this issue.
When I add this property to catalog/hive.properties, restart the Presto cluster, and then run the SQL, the error still occurs.

@Xuchaojing Xuchaojing reopened this May 9, 2020
@findepi
Member

findepi commented May 9, 2020

(For hive.orc.use-column-names, please track #1558)

@Xuchaojing can you provide the full error stacktrace with and without hive.orc.use-column-names=true, as well as the table schema (SHOW CREATE TABLE xxx)?

@Xuchaojing
Author

Xuchaojing commented May 9, 2020

To protect the security of the information, I changed the name of the table; the real name is replaced with "abc" below.
The following is the table schema:

CREATE TABLE `ods.abc`(
  `id` bigint,
  `identifier` string,
  `identifier_md5` string,
  `channel` bigint,
  `os_version` bigint,
  `request_url` string)
COMMENT 'abc'
PARTITIONED BY (
  `pt` string COMMENT 'full_partition')
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
WITH SERDEPROPERTIES (
  'field.delim'='\u0001',
  'line.delim'='\n',
  'serialization.format'='\u0001')
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
  'hdfs://testHadoop/dataware/ods/abc'
TBLPROPERTIES (
  'transient_lastDdlTime'='1574322027')

The full error stacktrace when running the SQL using Presto:
Query 20200509_161449_04577_9pmv8 failed: Error opening Hive split hdfs://testHadoop/dataware/ods/abc/pt=20200425/part-__0f22af46_648b_45dd_896b_6cd088339140 (offset=134217728, length=33554432): Malformed ORC file. Can not read SQL type bigint from ORC stream .id of type STRING [hdfs://testHadoop/dataware/ods/abc/pt=20200425/part-__0f22af46_648b_45dd_896b_6cd088339140]

@ebyhr
Member

ebyhr commented May 10, 2020

@Xuchaojing You can get the full stacktrace with the --debug CLI option or the Web UI. (The pasted message isn't the full stacktrace.)
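
For example, a hedged sketch of re-running the failing query from the Presto CLI with --debug (the server URL and the query below are placeholders):

  presto --server http://coordinator:8080 --catalog hive --schema ods --debug \
    --execute "SELECT id FROM abc WHERE pt = '20200425' LIMIT 10"

With --debug the CLI prints the full Java stacktrace when the query fails; the Web UI shows the same detail on the failed query's page.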

@Xuchaojing
Author

Xuchaojing commented May 10, 2020

@Xuchaojing You can get the full stacktrace with the --debug CLI option or the Web UI. (The pasted message isn't the full stacktrace.)

This is the full stacktrace. The version of Presto (prestosql) I used is 317.

Query 20200510_052523_00189_9pmv8 failed: Error opening Hive split hdfs://testHadoop/dataware/ods/abc/pt=20200426/part-__170d9cc5_7b28_4828_bc34_ad141740ac89 (offset=134217728, length=33554432): Malformed ORC file. Can not read SQL type bigint from ORC stream .id of type STRING [hdfs://testHadoop/dataware/ods/abc/pt=20200426/part-__170d9cc5_7b28_4828_bc34_ad141740ac89]
io.prestosql.spi.PrestoException: Error opening Hive split hdfs://testHadoop/dataware/ods/abc/pt=20200426/part-__170d9cc5_7b28_4828_bc34_ad141740ac89 (offset=134217728, length=33554432): Malformed ORC file. Can not read SQL type bigint from ORC stream .id of type STRING [hdfs://testHadoop/dataware/ods/abc/pt=20200426/part-__170d9cc5_7b28_4828_bc34_ad141740ac89]
	at io.prestosql.plugin.hive.orc.OrcPageSourceFactory.createOrcPageSource(OrcPageSourceFactory.java:235)
	at io.prestosql.plugin.hive.orc.OrcPageSourceFactory.createPageSource(OrcPageSourceFactory.java:121)
	at io.prestosql.plugin.hive.HivePageSourceProvider.createHivePageSource(HivePageSourceProvider.java:164)
	at io.prestosql.plugin.hive.HivePageSourceProvider.createPageSource(HivePageSourceProvider.java:98)
	at io.prestosql.spi.connector.classloader.ClassLoaderSafeConnectorPageSourceProvider.createPageSource(ClassLoaderSafeConnectorPageSourceProvider.java:45)
	at io.prestosql.split.PageSourceManager.createPageSource(PageSourceManager.java:56)
	at io.prestosql.operator.TableScanOperator.getOutput(TableScanOperator.java:283)
	at io.prestosql.operator.Driver.processInternal(Driver.java:379)
	at io.prestosql.operator.Driver.lambda$processFor$8(Driver.java:283)
	at io.prestosql.operator.Driver.tryWithLock(Driver.java:675)
	at io.prestosql.operator.Driver.processFor(Driver.java:276)
	at io.prestosql.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1075)
	at io.prestosql.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:163)
	at io.prestosql.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:484)
	at io.prestosql.$gen.Presto_317_dirty____20200428_034006_1.run(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: io.prestosql.orc.OrcCorruptionException: Malformed ORC file. Can not read SQL type bigint from ORC stream .id of type STRING [hdfs://testHadoop/dataware/ods/abc/pt=20200426/part-__170d9cc5_7b28_4828_bc34_ad141740ac89]
	at io.prestosql.orc.reader.ReaderUtils.verifyStreamType(ReaderUtils.java:40)
	at io.prestosql.orc.reader.SliceStreamReader.<init>(SliceStreamReader.java:63)
	at io.prestosql.orc.reader.StreamReaders.createStreamReader(StreamReaders.java:48)
	at io.prestosql.orc.OrcRecordReader.createStreamReaders(OrcRecordReader.java:559)
	at io.prestosql.orc.OrcRecordReader.<init>(OrcRecordReader.java:240)
	at io.prestosql.orc.OrcReader.createRecordReader(OrcReader.java:249)
	at io.prestosql.plugin.hive.orc.OrcPageSourceFactory.createOrcPageSource(OrcPageSourceFactory.java:205)

@ebyhr
Member

ebyhr commented May 10, 2020

@Xuchaojing Could you share the two full stacktraces with:

  1. hive.orc.use-column-names=true
  2. hive.orc.use-column-names=false

@Xuchaojing
Author

The error stacktrace is the same whether hive.orc.use-column-names is true or false.

@ebyhr
Member

ebyhr commented May 10, 2020

Let's continue in this Slack thread https://prestosql.slack.com/archives/CGB0QHWSW/p1589011535253100

@ArvinZheng
Member

I think this is because the ORC reader that Hive uses converts the data from the file type to the reader type when possible, but Presto does not currently support that conversion.
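
For illustration, a minimal HiveQL sketch of how this kind of mismatch can arise and one possible workaround. The assumption that the files were written while id was still declared as string is inferred from the error message only, not confirmed from the data:

  -- Files written while the column was (presumably) still declared as string:
  --   CREATE TABLE ods.abc (`id` string, ...) STORED AS ORC;
  -- Table metadata later changed to bigint, leaving the existing ORC files untouched:
  --   ALTER TABLE ods.abc CHANGE COLUMN id id bigint;

  -- Hive's ORC reader converts the file type (string) to the reader type (bigint),
  -- so this succeeds in Hive:
  SELECT id FROM ods.abc WHERE pt = '20200425' LIMIT 10;

  -- Presto 317 does not apply that conversion and fails with
  -- "Can not read SQL type bigint from ORC stream .id of type STRING".
  -- One possible workaround is to declare the column with the file's actual type
  -- and cast in queries:
  --   ALTER TABLE ods.abc CHANGE COLUMN id id string;
  --   SELECT CAST(id AS bigint) AS id FROM ods.abc WHERE pt = '20200425';

Another option is to rewrite the affected partitions (for example, INSERT OVERWRITE from Hive) so that the ORC files actually contain bigint values for id.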
