SQL cannot run using Presto, but can run using Hive, when the type of a column in the ORC file doesn't match the type defined in the Hive table #3679

Open
Xuchaojing opened this issue May 9, 2020 · 10 comments

Comments

@Xuchaojing

Query 20200509_032316_00682_9pmv8 failed: Error opening Hive split hdfs://xxxxxxx (offset=33554432, length=33554432): Malformed ORC file. Can not read SQL type bigint from ORC stream .id of type STRING

So, is there any suggestion about this issue?

@ebyhr
Member

ebyhr commented May 9, 2020

Could you try hive.orc.use-column-names=true?

Also, I would suggest you join the community Slack
https://prestosql.io/slack.html
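
For reference, a minimal sketch of where this property would go, assuming a standard Hive catalog file at etc/catalog/hive.properties and that the property is available in the deployed version (the connector name and metastore URI below are placeholders, not taken from this issue):

  connector.name=hive-hadoop2
  hive.metastore.uri=thrift://example-metastore:9083
  # match ORC file columns to table columns by name instead of by ordinal position
  hive.orc.use-column-names=true

All coordinator and worker nodes need to be restarted for catalog property changes to take effect.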

@Xuchaojing
Author

Xuchaojing commented May 9, 2020

hive.orc.use-column-names doesn't solve this issue.
When I add this property to catalog/hive.properties, restart the Presto cluster, and then run the SQL, the error still occurs.

@Xuchaojing Xuchaojing reopened this May 9, 2020
@findepi
Member

findepi commented May 9, 2020

(For hive.orc.use-column-names, please track #1558)

@Xuchaojing can you provide the full error stacktrace with and without hive.orc.use-column-names=true, as well as the table schema (SHOW CREATE TABLE xxx)?

@Xuchaojing
Author

Xuchaojing commented May 9, 2020

To protect the security of the information, I changed the name of the table; the real name is replaced with "abc" below.
The following is the table schema:

CREATE TABLE `ods.abc`(
  `id` bigint,
  `identifier` string,
  `identifier_md5` string,
  `channel` bigint,
  `os_version` bigint,
  `request_url` string)
COMMENT 'abc'
PARTITIONED BY (
  `pt` string COMMENT 'full_partition')
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
WITH SERDEPROPERTIES (
  'field.delim'='\u0001',
  'line.delim'='\n',
  'serialization.format'='\u0001')
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
  'hdfs://testHadoop/dataware/ods/abc'
TBLPROPERTIES (
  'transient_lastDdlTime'='1574322027')

The full error stacktrace when running the SQL using Presto:
Query 20200509_161449_04577_9pmv8 failed: Error opening Hive split hdfs://testHadoop/dataware/ods/abc/pt=20200425/part-__0f22af46_648b_45dd_896b_6cd088339140 (offset=134217728, length=33554432): Malformed ORC file. Can not read SQL type bigint from ORC stream .id of type STRING [hdfs://testHadoop/dataware/ods/abc/pt=20200425/part-__0f22af46_648b_45dd_896b_6cd088339140]

@ebyhr
Member

ebyhr commented May 10, 2020

@Xuchaojing You can get the full stacktrace with the --debug CLI option or the Web UI. (The pasted message isn't the full stacktrace.)
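
For example, a hedged sketch of re-running the failing query from the Presto CLI with --debug (the server URL and the query below are placeholders):

  presto --server http://coordinator:8080 --catalog hive --schema ods --debug \
    --execute "SELECT id FROM abc WHERE pt = '20200425' LIMIT 10"

With --debug the CLI prints the full Java stacktrace when the query fails; the Web UI shows the same detail on the failed query's page.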

@Xuchaojing
Author

Xuchaojing commented May 10, 2020

@Xuchaojing You can get the full stacktrace with the --debug CLI option or the Web UI. (The pasted message isn't the full stacktrace.)

This is the full stacktrace. The version of Presto (prestosql) I used is 317.

Query 20200510_052523_00189_9pmv8 failed: Error opening Hive split hdfs://testHadoop/dataware/ods/abc/pt=20200426/part-__170d9cc5_7b28_4828_bc34_ad141740ac89 (offset=134217728, length=33554432): Malformed ORC file. Can not read SQL type bigint from ORC stream .id of type STRING [hdfs://testHadoop/dataware/ods/abc/pt=20200426/part-__170d9cc5_7b28_4828_bc34_ad141740ac89]
io.prestosql.spi.PrestoException: Error opening Hive split hdfs://testHadoop/dataware/ods/abc/pt=20200426/part-__170d9cc5_7b28_4828_bc34_ad141740ac89 (offset=134217728, length=33554432): Malformed ORC file. Can not read SQL type bigint from ORC stream .id of type STRING [hdfs://testHadoop/dataware/ods/abc/pt=20200426/part-__170d9cc5_7b28_4828_bc34_ad141740ac89]
	at io.prestosql.plugin.hive.orc.OrcPageSourceFactory.createOrcPageSource(OrcPageSourceFactory.java:235)
	at io.prestosql.plugin.hive.orc.OrcPageSourceFactory.createPageSource(OrcPageSourceFactory.java:121)
	at io.prestosql.plugin.hive.HivePageSourceProvider.createHivePageSource(HivePageSourceProvider.java:164)
	at io.prestosql.plugin.hive.HivePageSourceProvider.createPageSource(HivePageSourceProvider.java:98)
	at io.prestosql.spi.connector.classloader.ClassLoaderSafeConnectorPageSourceProvider.createPageSource(ClassLoaderSafeConnectorPageSourceProvider.java:45)
	at io.prestosql.split.PageSourceManager.createPageSource(PageSourceManager.java:56)
	at io.prestosql.operator.TableScanOperator.getOutput(TableScanOperator.java:283)
	at io.prestosql.operator.Driver.processInternal(Driver.java:379)
	at io.prestosql.operator.Driver.lambda$processFor$8(Driver.java:283)
	at io.prestosql.operator.Driver.tryWithLock(Driver.java:675)
	at io.prestosql.operator.Driver.processFor(Driver.java:276)
	at io.prestosql.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1075)
	at io.prestosql.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:163)
	at io.prestosql.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:484)
	at io.prestosql.$gen.Presto_317_dirty____20200428_034006_1.run(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: io.prestosql.orc.OrcCorruptionException: Malformed ORC file. Can not read SQL type bigint from ORC stream .id of type STRING [hdfs://testHadoop/dataware/ods/abc/pt=20200426/part-__170d9cc5_7b28_4828_bc34_ad141740ac89]
	at io.prestosql.orc.reader.ReaderUtils.verifyStreamType(ReaderUtils.java:40)
	at io.prestosql.orc.reader.SliceStreamReader.<init>(SliceStreamReader.java:63)
	at io.prestosql.orc.reader.StreamReaders.createStreamReader(StreamReaders.java:48)
	at io.prestosql.orc.OrcRecordReader.createStreamReaders(OrcRecordReader.java:559)
	at io.prestosql.orc.OrcRecordReader.<init>(OrcRecordReader.java:240)
	at io.prestosql.orc.OrcReader.createRecordReader(OrcReader.java:249)
	at io.prestosql.plugin.hive.orc.OrcPageSourceFactory.createOrcPageSource(OrcPageSourceFactory.java:205)

@ebyhr
Member

ebyhr commented May 10, 2020

@Xuchaojing Could you share the two full stacktraces with:

  1. hive.orc.use-column-names=true
  2. hive.orc.use-column-names=false

@Xuchaojing
Author

The error stacktrace is the same whether hive.orc.use-column-names is true or false.

@ebyhr
Member

ebyhr commented May 10, 2020

Let's continue in this Slack thread https://prestosql.slack.com/archives/CGB0QHWSW/p1589011535253100

@ArvinZheng
Member

I think this is because the ORC reader that Hive uses converts the data from the file type to the reader type when possible, but Presto does not currently support that conversion.
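
For illustration, a minimal HiveQL sketch of how this kind of mismatch can arise and one possible workaround. The assumption that the files were written while id was still declared as string is inferred from the error message only, not confirmed from the data:

  -- Files written while the column was (presumably) still declared as string:
  --   CREATE TABLE ods.abc (`id` string, ...) STORED AS ORC;
  -- Table metadata later changed to bigint, leaving the existing ORC files untouched:
  --   ALTER TABLE ods.abc CHANGE COLUMN id id bigint;

  -- Hive's ORC reader converts the file type (string) to the reader type (bigint),
  -- so this succeeds in Hive:
  SELECT id FROM ods.abc WHERE pt = '20200425' LIMIT 10;

  -- Presto 317 does not apply that conversion and fails with
  -- "Can not read SQL type bigint from ORC stream .id of type STRING".
  -- One possible workaround is to declare the column with the file's actual type
  -- and cast in queries:
  --   ALTER TABLE ods.abc CHANGE COLUMN id id string;
  --   SELECT CAST(id AS bigint) AS id FROM ods.abc WHERE pt = '20200425';

Another option is to rewrite the affected partitions (for example, INSERT OVERWRITE from Hive) so that the ORC files actually contain bigint values for id.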
