Trino 358. A query will execute successfully between zero and two times (usually once), and then the cache no longer works. Restarting Trino solves the issue. The error doesn't seem to be helpful, though.
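For context, "the cache" here is the Hive connector's RubiX-backed file cache. A minimal sketch of the kind of catalog properties involved (property names per the Trino docs; the location is a placeholder, not our actual configuration):

```properties
# Enable RubiX-based caching in the Hive catalog
hive.cache.enabled=true
# Local directory on each worker where cached data is stored (placeholder path)
hive.cache.location=/opt/hive-cache
```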
Error log (different query ID, but this is the same error we always get):
2021-10-25T16:03:42.927Z WARN 20211025_160037_84338_u4v8f.3.1-179-66 com.qubole.rubix.spi.fop.SocketObjectFactory Unable to open connection to host 192.168.9.55
org.apache.thrift.shaded.transport.TTransportException: java.net.SocketTimeoutException: connect timed out
at org.apache.thrift.shaded.transport.TSocket.open(TSocket.java:226)
at com.qubole.rubix.spi.fop.SocketObjectFactory.create(SocketObjectFactory.java:44)
at com.qubole.rubix.spi.fop.SocketObjectFactory.create(SocketObjectFactory.java:24)
at com.qubole.rubix.spi.fop.ObjectPoolPartition.increaseObjects(ObjectPoolPartition.java:128)
at com.qubole.rubix.spi.fop.ObjectPoolPartition.getObject(ObjectPoolPartition.java:85)
at com.qubole.rubix.spi.fop.ObjectPool.getObject(ObjectPool.java:95)
at com.qubole.rubix.spi.fop.ObjectPool.borrowObject(ObjectPool.java:81)
at com.qubole.rubix.spi.BookKeeperFactory.createBookKeeperClient(BookKeeperFactory.java:75)
at com.qubole.rubix.core.NonLocalRequestChain.<init>(NonLocalRequestChain.java:75)
at com.qubole.rubix.core.CachingInputStream.setupReadRequestChains(CachingInputStream.java:404)
at com.qubole.rubix.core.CachingInputStream.readInternal(CachingInputStream.java:254)
at com.qubole.rubix.core.CachingInputStream.read(CachingInputStream.java:223)
at org.apache.hadoop.fs.FSInputStream.readFully(FSInputStream.java:121)
at org.apache.hadoop.fs.BufferedFSInputStream.readFully(BufferedFSInputStream.java:113)
at org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:111)
at org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:111)
at io.trino.plugin.hive.parquet.HdfsParquetDataSource.readFully(HdfsParquetDataSource.java:135)
at io.trino.plugin.hive.parquet.HdfsParquetDataSource$ReferenceCountedReader.read(HdfsParquetDataSource.java:301)
at io.trino.plugin.hive.parquet.HdfsParquetDataSource$1.read(HdfsParquetDataSource.java:208)
at io.trino.parquet.reader.ParquetReader.readPrimitive(ParquetReader.java:276)
at io.trino.parquet.reader.ParquetReader.readColumnChunk(ParquetReader.java:337)
at io.trino.parquet.reader.ParquetReader.readBlock(ParquetReader.java:320)
at io.trino.plugin.hive.parquet.ParquetPageSource$ParquetBlockLoader.load(ParquetPageSource.java:214)
at io.trino.spi.block.LazyBlock$LazyData.load(LazyBlock.java:390)
at io.trino.spi.block.LazyBlock$LazyData.getTopLevelBlock(LazyBlock.java:359)
at io.trino.spi.block.LazyBlock.getBlock(LazyBlock.java:264)
at io.trino.spi.block.LazyBlock.isNull(LazyBlock.java:253)
at io.trino.$gen.PageFilter_20211025_160110_125.filter(Unknown Source)
at io.trino.$gen.PageFilter_20211025_160110_125.filter(Unknown Source)
at io.trino.operator.project.PageProcessor.createWorkProcessor(PageProcessor.java:121)
at io.trino.operator.ScanFilterAndProjectOperator$SplitToPages.lambda$processPageSource$1(ScanFilterAndProjectOperator.java:280)
at io.trino.operator.WorkProcessorUtils.lambda$flatMap$4(WorkProcessorUtils.java:245)
at io.trino.operator.WorkProcessorUtils$3.process(WorkProcessorUtils.java:319)
at io.trino.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:372)
at io.trino.operator.WorkProcessorUtils$3.process(WorkProcessorUtils.java:306)
at io.trino.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:372)
at io.trino.operator.WorkProcessorUtils$3.process(WorkProcessorUtils.java:306)
at io.trino.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:372)
at io.trino.operator.WorkProcessorUtils.getNextState(WorkProcessorUtils.java:221)
at io.trino.operator.WorkProcessorUtils.lambda$processStateMonitor$2(WorkProcessorUtils.java:200)
at io.trino.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:372)
at io.trino.operator.WorkProcessorUtils.lambda$flatten$6(WorkProcessorUtils.java:277)
at io.trino.operator.WorkProcessorUtils$3.process(WorkProcessorUtils.java:319)
at io.trino.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:372)
at io.trino.operator.WorkProcessorUtils$3.process(WorkProcessorUtils.java:306)
at io.trino.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:372)
at io.trino.operator.WorkProcessorUtils.getNextState(WorkProcessorUtils.java:221)
at io.trino.operator.WorkProcessorUtils.lambda$processStateMonitor$2(WorkProcessorUtils.java:200)
at io.trino.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:372)
at io.trino.operator.WorkProcessorUtils.getNextState(WorkProcessorUtils.java:221)
at io.trino.operator.WorkProcessorUtils.lambda$finishWhen$3(WorkProcessorUtils.java:215)
at io.trino.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:372)
at io.trino.operator.WorkProcessorSourceOperatorAdapter.getOutput(WorkProcessorSourceOperatorAdapter.java:149)
at io.trino.operator.Driver.processInternal(Driver.java:387)
at io.trino.operator.Driver.lambda$processFor$9(Driver.java:291)
at io.trino.operator.Driver.tryWithLock(Driver.java:683)
at io.trino.operator.Driver.processFor(Driver.java:284)
at io.trino.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1075)
at io.trino.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:163)
at io.trino.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:484)
at io.trino.$gen.Trino_358____20211021_011419_2.run(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.net.SocketTimeoutException: connect timed out
at java.base/java.net.PlainSocketImpl.socketConnect(Native Method)
at java.base/java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:399)
at java.base/java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:242)
at java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:224)
at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.base/java.net.Socket.connect(Socket.java:609)
at org.apache.thrift.shaded.transport.TSocket.open(TSocket.java:221)
... 63 more
2021-10-25T16:03:42.927Z WARN 20211025_160037_84338_u4v8f.3.1-179-66 com.qubole.rubix.spi.fop.ObjectPoolPartition Pool: bks-pool : Host: 192.168.9.55 : Could not increase pool size. Pool state: totalCount=0 queueSize=0 delta=1
2021-10-25T16:03:42.927Z ERROR 20211025_160037_84338_u4v8f.3.1-179-66 com.qubole.rubix.core.NonLocalRequestChain Could not get cache status from bookkeeper server at 192.168.9.55
java.lang.RuntimeException: Could not add connections to pool
at com.qubole.rubix.spi.fop.ObjectPoolPartition.getObject(ObjectPoolPartition.java:93)
at com.qubole.rubix.spi.fop.ObjectPool.getObject(ObjectPool.java:95)
at com.qubole.rubix.spi.fop.ObjectPool.borrowObject(ObjectPool.java:81)
at com.qubole.rubix.spi.BookKeeperFactory.createBookKeeperClient(BookKeeperFactory.java:75)
at com.qubole.rubix.core.NonLocalRequestChain.<init>(NonLocalRequestChain.java:75)
at com.qubole.rubix.core.CachingInputStream.setupReadRequestChains(CachingInputStream.java:404)
at com.qubole.rubix.core.CachingInputStream.readInternal(CachingInputStream.java:254)
at com.qubole.rubix.core.CachingInputStream.read(CachingInputStream.java:223)
at org.apache.hadoop.fs.FSInputStream.readFully(FSInputStream.java:121)
at org.apache.hadoop.fs.BufferedFSInputStream.readFully(BufferedFSInputStream.java:113)
at org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:111)
at org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:111)
at io.trino.plugin.hive.parquet.HdfsParquetDataSource.readFully(HdfsParquetDataSource.java:135)
at io.trino.plugin.hive.parquet.HdfsParquetDataSource$ReferenceCountedReader.read(HdfsParquetDataSource.java:301)
at io.trino.plugin.hive.parquet.HdfsParquetDataSource$1.read(HdfsParquetDataSource.java:208)
at io.trino.parquet.reader.ParquetReader.readPrimitive(ParquetReader.java:276)
at io.trino.parquet.reader.ParquetReader.readColumnChunk(ParquetReader.java:337)
at io.trino.parquet.reader.ParquetReader.readBlock(ParquetReader.java:320)
at io.trino.plugin.hive.parquet.ParquetPageSource$ParquetBlockLoader.load(ParquetPageSource.java:214)
at io.trino.spi.block.LazyBlock$LazyData.load(LazyBlock.java:390)
at io.trino.spi.block.LazyBlock$LazyData.getTopLevelBlock(LazyBlock.java:359)
at io.trino.spi.block.LazyBlock.getBlock(LazyBlock.java:264)
at io.trino.spi.block.LazyBlock.isNull(LazyBlock.java:253)
at io.trino.$gen.PageFilter_20211025_160110_125.filter(Unknown Source)
at io.trino.$gen.PageFilter_20211025_160110_125.filter(Unknown Source)
at io.trino.operator.project.PageProcessor.createWorkProcessor(PageProcessor.java:121)
at io.trino.operator.ScanFilterAndProjectOperator$SplitToPages.lambda$processPageSource$1(ScanFilterAndProjectOperator.java:280)
at io.trino.operator.WorkProcessorUtils.lambda$flatMap$4(WorkProcessorUtils.java:245)
at io.trino.operator.WorkProcessorUtils$3.process(WorkProcessorUtils.java:319)
at io.trino.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:372)
at io.trino.operator.WorkProcessorUtils$3.process(WorkProcessorUtils.java:306)
at io.trino.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:372)
at io.trino.operator.WorkProcessorUtils$3.process(WorkProcessorUtils.java:306)
at io.trino.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:372)
at io.trino.operator.WorkProcessorUtils.getNextState(WorkProcessorUtils.java:221)
at io.trino.operator.WorkProcessorUtils.lambda$processStateMonitor$2(WorkProcessorUtils.java:200)
at io.trino.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:372)
at io.trino.operator.WorkProcessorUtils.lambda$flatten$6(WorkProcessorUtils.java:277)
at io.trino.operator.WorkProcessorUtils$3.process(WorkProcessorUtils.java:319)
at io.trino.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:372)
at io.trino.operator.WorkProcessorUtils$3.process(WorkProcessorUtils.java:306)
at io.trino.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:372)
at io.trino.operator.WorkProcessorUtils.getNextState(WorkProcessorUtils.java:221)
at io.trino.operator.WorkProcessorUtils.lambda$processStateMonitor$2(WorkProcessorUtils.java:200)
at io.trino.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:372)
at io.trino.operator.WorkProcessorUtils.getNextState(WorkProcessorUtils.java:221)
at io.trino.operator.WorkProcessorUtils.lambda$finishWhen$3(WorkProcessorUtils.java:215)
at io.trino.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:372)
at io.trino.operator.WorkProcessorSourceOperatorAdapter.getOutput(WorkProcessorSourceOperatorAdapter.java:149)
at io.trino.operator.Driver.processInternal(Driver.java:387)
at io.trino.operator.Driver.lambda$processFor$9(Driver.java:291)
at io.trino.operator.Driver.tryWithLock(Driver.java:683)
at io.trino.operator.Driver.processFor(Driver.java:284)
at io.trino.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1075)
at io.trino.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:163)
at io.trino.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:484)
at io.trino.$gen.Trino_358____20211021_011419_2.run(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
nbayle changed the title from "Query causes hive connector to crash; requires restart of Trino" to "Query causes hive cache to crash; requires restart of Trino" on Oct 26, 2021.
We use Trino 380 and see the same error after enabling the cache. We are running Trino on Kubernetes. We see the error on a bunch of workers, but one worker in particular has more of these errors, and the rest of the workers log "Unable to open connection to host: <ip of the problematic worker>". The symptom is that the problematic worker has much higher CPU usage while the other workers have low CPU usage, even though there are a bunch of queries running on the cluster. Can someone look into this?
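Since the root cause in the trace is a plain `connect timed out`, one way to narrow this down might be to check from another worker whether the problematic worker's BookKeeper port is reachable at all. A minimal probe sketch, assuming the default RubiX BookKeeper port of 8899 (adjust to whatever your deployment configures; the host is the one from the log above):

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class BookKeeperProbe {
    // Attempt a TCP connect to the given host/port within the timeout and
    // report the outcome as a string instead of throwing.
    static String probe(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return "connected";
        } catch (IOException e) {
            return "failed: " + e;
        }
    }

    public static void main(String[] args) {
        // 192.168.9.55 is the worker from the log above; 8899 is an
        // assumption about the default RubiX BookKeeper server port.
        System.out.println(probe("192.168.9.55", 8899, 1000));
    }
}
```

If the probe fails from every peer while the process is still running, that would point at the BookKeeper thread being wedged (consistent with the high CPU on the one worker) rather than a network issue.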