
spark-connector 2.0: querying vertices occasionally throws ArrayIndexOutOfBoundsException #48

Status: Closed
Labels: affects/none (this bug affects no version) · need info (need more information) · process/done · severity/none · type/bug (something is unexpected)

lanzhoujiaoya opened this issue on May 14, 2022 · 2 comments

@lanzhoujiaoya commented:
The tag has 24 properties in total, but fetching its schema through com.vesoft.nebula.client.storage.StorageClient with
`Schema schema = metaManager.getTag(spaceName, tagName).getSchema();` returns only 15 properties, so the array index goes out of bounds while the row values are being assembled.
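
For reference, here is a minimal sketch of how this mismatch can be checked. Only the `getTag(spaceName, tagName).getSchema()` call is quoted from this report; the `MetaManager` construction and the `getColumns()` accessor are assumptions about the nebula-java client and may differ between client versions:

```scala
// Hedged reproduction sketch: compare the property count the meta service
// reports with the 24 properties the tag was defined with. The space and
// tag names below come from the log output in this issue.
import java.util.Arrays
import com.vesoft.nebula.client.graph.data.HostAddress
import com.vesoft.nebula.client.meta.MetaManager

object SchemaCountSketch extends App {
  // Assumption: a MetaManager constructor taking a host list exists in the
  // client version in use; adjust to your client's API if it differs.
  val metaManager = new MetaManager(Arrays.asList(new HostAddress("127.0.0.1", 9559)))
  val schema = metaManager.getTag("AssoNet", "cust").getSchema()
  // The tag was defined with 24 properties, but only 15 came back here:
  println(s"meta reports ${schema.getColumns.size()} properties for tag cust")
}
```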
The error log is below:
22/05/14 19:50:43 INFO connector.NebulaDataSource: create reader
22/05/14 19:50:43 INFO connector.NebulaDataSource: options {spacename=AssoNet, nocolumn=false, metaaddress=xxxxxxx label=cust, type=vertex, connectionretry=2, timeout=6000, executionretry=1, paths=[], limit=10, returncols=, partitionnumber=14}
root
|-- _vertexId: string (nullable = false)
|-- cust_no: string (nullable = true)
|-- cust_name: string (nullable = true)
xxxxxx (schema output truncated; 24 properties in total)

22/05/14 19:50:44 INFO spark.ContextCleaner: Cleaned accumulator 1
22/05/14 19:50:44 INFO codegen.CodeGenerator: Code generated in 158.8538 ms
22/05/14 19:50:44 INFO codegen.CodeGenerator: Code generated in 7.8005 ms
22/05/14 19:50:44 INFO codegen.CodeGenerator: Code generated in 22.0784 ms
22/05/14 19:50:44 INFO spark.SparkContext: Starting job: count at Nebula2HiveVertexTest.scala:54
22/05/14 19:50:44 INFO scheduler.DAGScheduler: Registering RDD 7 (count at Nebula2HiveVertexTest.scala:54)
22/05/14 19:50:44 INFO scheduler.DAGScheduler: Got job 0 (count at Nebula2HiveVertexTest.scala:54) with 1 output partitions
22/05/14 19:50:44 INFO scheduler.DAGScheduler: Final stage: ResultStage 1 (count at Nebula2HiveVertexTest.scala:54)
22/05/14 19:50:44 INFO scheduler.DAGScheduler: Parents of final stage: List(ShuffleMapStage 0)
22/05/14 19:50:44 INFO scheduler.DAGScheduler: Missing parents: List(ShuffleMapStage 0)
22/05/14 19:50:44 INFO scheduler.DAGScheduler: Submitting ShuffleMapStage 0 (MapPartitionsRDD[7] at count at Nebula2HiveVertexTest.scala:54), which has no missing parents
22/05/14 19:50:44 INFO memory.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 36.9 KB, free 1983.3 MB)
22/05/14 19:50:44 INFO memory.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 13.9 KB, free 1983.3 MB)
22/05/14 19:50:44 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on SMY-U1-0040.smyoa.com:62328 (size: 13.9 KB, free: 1983.3 MB)
22/05/14 19:50:44 INFO spark.SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1161
22/05/14 19:50:44 INFO scheduler.DAGScheduler: Submitting 14 missing tasks from ShuffleMapStage 0 (MapPartitionsRDD[7] at count at Nebula2HiveVertexTest.scala:54) (first 15 tasks are for partitions Vector(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13))
22/05/14 19:50:44 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 14 tasks
22/05/14 19:50:44 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, executor driver, partition 0, PROCESS_LOCAL, 10783 bytes)
22/05/14 19:50:44 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, executor driver, partition 1, PROCESS_LOCAL, 10783 bytes)
22/05/14 19:50:44 INFO executor.Executor: Running task 1.0 in stage 0.0 (TID 1)
22/05/14 19:50:44 INFO executor.Executor: Running task 0.0 in stage 0.0 (TID 0)
22/05/14 19:50:45 ERROR meta.MetaManager: space basketballplayer part 1 has not allocation host.
22/05/14 19:50:45 ERROR meta.MetaManager: space basketballplayer part 3 has not allocation host.
22/05/14 19:50:45 ERROR meta.MetaManager: space basketballplayer part 5 has not allocation host.
22/05/14 19:50:45 ERROR meta.MetaManager: space basketballplayer part 7 has not allocation host.
22/05/14 19:50:45 ERROR meta.MetaManager: space basketballplayer part 9 has not allocation host.
22/05/14 19:50:45 ERROR meta.MetaManager: space basketballplayer part 11 has not allocation host.
22/05/14 19:50:45 ERROR meta.MetaManager: space basketballplayer part 13 has not allocation host.
22/05/14 19:50:45 ERROR meta.MetaManager: space basketballplayer part 15 has not allocation host.
22/05/14 19:50:45 INFO reader.NebulaVertexPartitionReader: partition index: 2, scanParts: List(2, 16, 30, 44, 58)
22/05/14 19:50:45 INFO reader.NebulaVertexPartitionReader: partition index: 1, scanParts: List(1, 15, 29, 43, 57)
22/05/14 19:50:45 WARN storage.BlockManager: Putting block rdd_2_1 failed due to exception java.lang.ArrayIndexOutOfBoundsException: 16.
22/05/14 19:50:45 WARN storage.BlockManager: Block rdd_2_1 could not be removed as it was not found on disk or in memory
22/05/14 19:50:45 ERROR executor.Executor: Exception in task 1.0 in stage 0.0 (TID 1)
java.lang.ArrayIndexOutOfBoundsException: 16
at com.smy.connector.reader.NebulaPartitionReader$$anonfun$get$1.apply$mcVI$sp(NebulaPartitionReader.scala:75)
at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
at com.smy.connector.reader.NebulaPartitionReader.get(NebulaPartitionReader.scala:74)
at com.smy.connector.reader.NebulaPartitionReader.get(NebulaPartitionReader.scala:29)
at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.next(DataSourceRDD.scala:59)
at org.apache.spark.InterruptibleIterator.next(InterruptibleIterator.scala:40)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$13$$anon$1.hasNext(WholeStageCodegenExec.scala:636)
at org.apache.spark.sql.execution.columnar.CachedRDDBuilder$$anonfun$1$$anon$1.hasNext(InMemoryRelation.scala:125)
at org.apache.spark.storage.memory.MemoryStore.putIterator(MemoryStore.scala:221)
at org.apache.spark.storage.memory.MemoryStore.putIteratorAsValues(MemoryStore.scala:299)
at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1165)
at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1156)
at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1091)
at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1156)
at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:882)
at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:286)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
at org.apache.spark.scheduler.Task.run(Task.scala:123)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
22/05/14 19:50:45 INFO scheduler.TaskSetManager: Starting task 2.0 in stage 0.0 (TID 2, localhost, executor driver, partition 2, PROCESS_LOCAL, 10783 bytes)
22/05/14 19:50:45 INFO executor.Executor: Running task 2.0 in stage 0.0 (TID 2)
22/05/14 19:50:45 WARN scheduler.TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1, localhost, executor driver): java.lang.ArrayIndexOutOfBoundsException: 16
	... (stack trace identical to the executor exception above)

22/05/14 19:50:45 ERROR scheduler.TaskSetManager: Task 1 in stage 0.0 failed 1 times; aborting job
22/05/14 19:50:45 INFO reader.NebulaVertexPartitionReader: partition index: 3, scanParts: List(3, 17, 31, 45, 59)
22/05/14 19:50:45 INFO scheduler.TaskSchedulerImpl: Cancelling stage 0
22/05/14 19:50:45 INFO scheduler.TaskSchedulerImpl: Killing all running tasks in stage 0: Stage cancelled
22/05/14 19:50:45 INFO executor.Executor: Executor is trying to kill task 0.0 in stage 0.0 (TID 0), reason: Stage cancelled
22/05/14 19:50:45 INFO executor.Executor: Executor is trying to kill task 2.0 in stage 0.0 (TID 2), reason: Stage cancelled
22/05/14 19:50:45 INFO scheduler.TaskSchedulerImpl: Stage 0 was cancelled
22/05/14 19:50:45 INFO scheduler.DAGScheduler: ShuffleMapStage 0 (count at Nebula2HiveVertexTest.scala:54) failed in 0.499 s due to Job aborted due to stage failure: Task 1 in stage 0.0 failed 1 times, most recent failure: Lost task 1.0 in stage 0.0 (TID 1, localhost, executor driver): java.lang.ArrayIndexOutOfBoundsException: 16
	... (stack trace identical to the executor exception above)

Driver stacktrace:
Screenshots of the program execution:
(https://user-images.githubusercontent.com/31941244/168424814-2a51fa2f-b8ea-4e3e-8653-81f8009934df.png)
(https://user-images.githubusercontent.com/31941244/168424821-7d8db61f-a6ec-49c9-816f-e0c1a194cf55.png)
(https://user-images.githubusercontent.com/31941244/168424834-7d890aee-1c87-4a11-b001-3cf1a28f332b.png)

Nicole00 (Contributor) commented on Jul 9, 2022:

Which version of the connector are you using?
Has the schema been updated, i.e. were more properties added to the tag?
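
If the tag did gain properties after the connector cached its schema, a row read from storage and the cached schema can disagree on length, and an unguarded copy loop will overrun one of the arrays. A minimal illustration of that hypothesis, with entirely hypothetical names (this is not the connector's actual reader code):

```scala
// Illustrative sketch of the stale-schema hypothesis: the values array
// built from the cached schema (15 props + vertex id = 16 entries in this
// report) is shorter than the expected column count, so indexing it by the
// expected width fails at index 16. Bounding the loop by the smaller of
// the two sizes avoids the ArrayIndexOutOfBoundsException.
object StaleSchemaSketch extends App {
  val cachedValues    = Array("v1", "001", "Alice")            // stands in for the 16 entries
  val expectedColumns = Array("_vertexId", "cust_no", "cust_name", "extra") // stands in for 25
  val n = math.min(cachedValues.length, expectedColumns.length)
  for (i <- 0 until n) println(s"${expectedColumns(i)} = ${cachedValues(i)}")
}
```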

Sophie-Xie added the need info and type/bug labels on Nov 29, 2022
HarrisChu added the affects/none and severity/none labels on Dec 1, 2022
Sophie-Xie (Contributor) commented:
Closing this for now; if you have new information, feel free to reopen the issue.

github-actions bot added the process/fixed label on Dec 2, 2022
HarrisChu added the process/done label on Jan 5, 2023
github-actions bot removed the process/fixed label on Jan 5, 2023