
Blocks read inconsistent #101

Closed

shenghui361 opened this issue Apr 5, 2022 · 6 comments

Comments

@shenghui361

spark version: 3.2.1
rss version: master
sql: tpc-ds[10T] query17

spark parameters:
spark.shuffle.manager=org.apache.spark.shuffle.RssShuffleManager
spark.rss.storage.type=HDFS
spark.rss.base.path=hdfs://ns/tmp/rss/hdfs_base_path
spark.rss.data.replica=2
spark.dynamicAllocation.enabled=false
spark.shuffle.service.enabled=false
spark.rss.coordinator.quorum=coordinator1:19999,coordinator2:19999

```
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2455)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2404)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2403)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2403)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1160)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1160)
at scala.Option.foreach(Option.scala:407)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1160)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2643)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2585)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2574)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
Caused by: com.tencent.rss.common.exception.RssException: Blocks read inconsistent: expected 310 blocks, actual 0 blocks
at com.tencent.rss.client.impl.ShuffleReadClientImpl.checkProcessedBlockIds(ShuffleReadClientImpl.java:222)
at org.apache.spark.shuffle.reader.RssShuffleDataIterator.hasNext(RssShuffleDataIterator.java:126)
at org.apache.spark.shuffle.reader.RssShuffleReader$MultiPartitionIterator.hasNext(RssShuffleReader.java:213)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage10.sort_addToSorter_0$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage10.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:759)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage11.smj_findNextJoinRows_0$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage11.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$2.hasNext(WholeStageCodegenExec.scala:778)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
at org.apache.spark.shuffle.writer.RssShuffleWriter.write(RssShuffleWriter.java:138)
at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
at org.apache.spark.scheduler.Task.run(Task.scala:131)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1462)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

```

@colinmjj
Collaborator

colinmjj commented Apr 6, 2022

"Blocks read inconsistent" means the client can't read the expected blocks that were sent to the shuffle server. It may be caused by a write problem. For the storage type, please use MEMORY_HDFS instead of HDFS.
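This suggestion amounts to a one-line client-side change. A minimal sketch, assuming MEMORY_HDFS is an accepted value for spark.rss.storage.type in this build:

```properties
# spark-defaults.conf (client side): switch from HDFS-only to memory-backed storage
spark.rss.storage.type=MEMORY_HDFS
spark.rss.base.path=hdfs://ns/tmp/rss/hdfs_base_path
```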

@shenghui361
Author

MEMORY_LOCALFILE has the same problem.

@latincross

With the local_hdfs mode, you will find that the data stored locally and the data stored on HDFS are not the same, and then you get a file-not-found error.

@jerqi
Collaborator

jerqi commented Apr 7, 2022

> MEMORY_LOCALFILE has the same problem.

What's the configuration of the shuffle server?

@shenghui361
Author

shenghui361 commented Apr 8, 2022

> MEMORY_LOCALFILE has the same problem.

> What's the configuration of shuffle server?

rss.rpc.server.port 19999
rss.jetty.http.port 19998
rss.storage.basePath /HDATA/1/rssdata,/HDATA/2/rssdata,/HDATA/3/rssdata,/HDATA/4/rssdata,/HDATA/5/rssdata,/HDATA/6/rssdata
#rss.storage.type MEMORY_LOCALFILE
rss.storage.type MEMORY_LOCALFILE_HDFS
rss.coordinator.quorum coordinator1:19999, coordinator2:19999
rss.server.buffer.capacity 40gb
rss.server.buffer.spill.threshold 22gb
rss.server.partition.buffer.size 150mb
rss.server.read.buffer.capacity 20gb
rss.server.flush.thread.alive 50
rss.server.flush.threadPool.size 100

rss.server.hdfs.base.path hdfs://ns/tmp/rss/hdfs_base_path

# multistorage config
rss.server.multistorage.enable true
#rss.server.uploader.enable false
#rss.server.uploader.base.path hdfs://ns/tmp/rss/uploader_base_path
#rss.server.uploader.thread.number 32
# rss.server.disk.capacity 1011550697553

@jerqi
Collaborator

jerqi commented Apr 11, 2022

The client should keep its storage type consistent with the server's. Here the client was launched with spark.rss.storage.type=HDFS while the server was configured with rss.storage.type MEMORY_LOCALFILE_HDFS.
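A consistent client/server pair might look like the sketch below (assuming the client-side spark.rss.storage.type property accepts the same MEMORY_LOCALFILE_HDFS value as the server; the paths are the ones posted earlier in this thread):

```properties
# shuffle server conf
rss.storage.type MEMORY_LOCALFILE_HDFS
rss.server.hdfs.base.path hdfs://ns/tmp/rss/hdfs_base_path

# spark client conf: storage type must match the server
spark.rss.storage.type=MEMORY_LOCALFILE_HDFS
spark.rss.base.path=hdfs://ns/tmp/rss/hdfs_base_path
```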

@jerqi jerqi closed this as completed Apr 19, 2022