Description
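I encountered the following problem when using xgboost4j-spark 0.72 with Spark 2.4 and Scala 2.12. At the end of training, the Rabit tracker raises an AssertionError in accept_slaves (`assert s.rank not in wait_conn`) right after it receives the shutdown signals, and executors 65 and 43 are then lost.
Can someone help me? The driver log is below: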
25/05/26 03:04:59 INFO java.RabitTracker$TrackerProcessLogger: 2025-05-26 11:04:59,111 INFO [8] train-auc:0.671623 test-auc:0.670459
25/05/26 03:05:07 INFO java.RabitTracker$TrackerProcessLogger: 2025-05-26 11:05:07,494 INFO [9] train-auc:0.671457 test-auc:0.670350
25/05/26 03:05:07 INFO java.RabitTracker$TrackerProcessLogger: 2025-05-26 11:05:07,509 DEBUG Recieve shutdown signal from 0
25/05/26 03:05:07 INFO java.RabitTracker$TrackerProcessLogger: 2025-05-26 11:05:07,509 DEBUG Recieve shutdown signal from 65
25/05/26 03:05:07 INFO java.RabitTracker$TrackerProcessLogger: Exception in thread Thread-1:
25/05/26 03:05:07 INFO java.RabitTracker$TrackerProcessLogger: Traceback (most recent call last):
25/05/26 03:05:07 INFO java.RabitTracker$TrackerProcessLogger: File "/usr/lib64/python2.6/threading.py", line 532, in __bootstrap_inner
25/05/26 03:05:07 INFO java.RabitTracker$TrackerProcessLogger: self.run()
25/05/26 03:05:07 INFO java.RabitTracker$TrackerProcessLogger: File "/usr/lib64/python2.6/threading.py", line 484, in run
25/05/26 03:05:07 INFO java.RabitTracker$TrackerProcessLogger: self.__target(*self.__args, **self.__kwargs)
25/05/26 03:05:07 INFO java.RabitTracker$TrackerProcessLogger: File "/tmp/tracker7706002261293739680.py", line 325, in run
25/05/26 03:05:07 INFO java.RabitTracker$TrackerProcessLogger: self.accept_slaves(nslave)
25/05/26 03:05:07 INFO java.RabitTracker$TrackerProcessLogger: File "/tmp/tracker7706002261293739680.py", line 276, in accept_slaves
25/05/26 03:05:07 INFO java.RabitTracker$TrackerProcessLogger: assert s.rank not in wait_conn
25/05/26 03:05:07 INFO java.RabitTracker$TrackerProcessLogger: AssertionError
25/05/26 03:05:07 INFO java.RabitTracker$TrackerProcessLogger:
25/05/26 03:05:07 INFO storage.BlockManagerInfo: Added rdd_62_88 in memory on dx-hadoop47.dx:23123 (size: 344.0 B, free: 15.2 GB)
25/05/26 03:05:07 INFO storage.BlockManagerInfo: Added rdd_62_7 in memory on dx-hadoop68.dx:59869 (size: 344.0 B, free: 15.2 GB)
25/05/26 03:05:07 INFO java.RabitTracker: Tracker Process ends with exit code 0
25/05/26 03:05:07 INFO java.RabitTracker$TrackerProcessLogger: Tracker Process ends with exit code 0
25/05/26 03:05:07 INFO XGBoostSpark: Rabit returns with exit code 0
25/05/26 03:05:07 INFO scheduler.TaskSetManager: Finished task 7.0 in stage 12.0 (TID 3704) in 232680 ms on dx-hadoop68.dx (executor 56) (2/100)
25/05/26 03:05:07 INFO spark.SparkContext: Starting job: first at XGBoost.scala:392
25/05/26 03:05:07 INFO scheduler.DAGScheduler: Got job 12 (first at XGBoost.scala:392) with 1 output partitions
25/05/26 03:05:07 INFO scheduler.DAGScheduler: Final stage: ResultStage 13 (first at XGBoost.scala:392)
25/05/26 03:05:07 INFO scheduler.DAGScheduler: Parents of final stage: List(ShuffleMapStage 11)
25/05/26 03:05:07 INFO scheduler.DAGScheduler: Missing parents: List()
25/05/26 03:05:07 INFO scheduler.DAGScheduler: Submitting ResultStage 13 (ZippedPartitionsRDD2[62] at zipPartitions at XGBoost.scala:117), which has no missing parents
25/05/26 03:05:07 INFO memory.MemoryStore: Block broadcast_26 stored as values in memory (estimated size 6.6 KB, free 14.0 GB)
25/05/26 03:05:07 INFO memory.MemoryStore: Block broadcast_26_piece0 stored as bytes in memory (estimated size 4.1 KB, free 14.0 GB)
25/05/26 03:05:07 INFO storage.BlockManagerInfo: Added broadcast_26_piece0 in memory on dx-daikuanmodel00.dx:22477 (size: 4.1 KB, free: 14.0 GB)
25/05/26 03:05:07 INFO spark.SparkContext: Created broadcast 26 from broadcast at DAGScheduler.scala:1163
25/05/26 03:05:07 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 13 (ZippedPartitionsRDD2[62] at zipPartitions at XGBoost.scala:117) (first 15 tasks are for partitions Vector(0))
25/05/26 03:05:07 INFO cluster.YarnScheduler: Adding task set 13.0 with 1 tasks
25/05/26 03:05:07 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 13.0 (TID 3905, dx-hadoop68.dx, executor 56, partition 0, PROCESS_LOCAL, 8243 bytes)
25/05/26 03:05:07 INFO storage.BlockManagerInfo: Added broadcast_26_piece0 in memory on dx-hadoop68.dx:59869 (size: 4.1 KB, free: 15.2 GB)
25/05/26 03:05:07 INFO storage.BlockManagerInfo: Added rdd_62_22 in memory on dx-hadoop53.dx:9114 (size: 344.0 B, free: 15.2 GB)
25/05/26 03:05:07 INFO scheduler.TaskSetManager: Finished task 88.0 in stage 12.0 (TID 3785) in 232705 ms on dx-hadoop47.dx (executor 69) (3/100)
25/05/26 03:05:07 INFO scheduler.TaskSetManager: Finished task 22.0 in stage 12.0 (TID 3719) in 233093 ms on dx-hadoop53.dx (executor 99) (4/100)
25/05/26 03:05:08 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Disabling executor 65.
25/05/26 03:05:08 INFO scheduler.DAGScheduler: Executor lost: 65 (epoch 4)
25/05/26 03:05:08 INFO storage.BlockManagerMasterEndpoint: Trying to remove executor 65 from BlockManagerMaster.
25/05/26 03:05:08 INFO storage.BlockManagerMasterEndpoint: Removing block manager BlockManagerId(65, dx-hadoop106.dx, 20612, None)
25/05/26 03:05:08 INFO storage.BlockManagerMaster: Removed 65 successfully in removeExecutor
25/05/26 03:05:08 INFO scheduler.DAGScheduler: Shuffle files lost for executor: 65 (epoch 4)
25/05/26 03:05:08 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Disabling executor 43.
25/05/26 03:05:08 INFO scheduler.DAGScheduler: Executor lost: 43 (epoch 5)
25/05/26 03:05:08 INFO storage.BlockManagerMasterEndpoint: Trying to remove executor 43 from BlockManagerMaster.
25/05/26 03:05:08 INFO storage.BlockManagerMasterEndpoint: Removing block manager BlockManagerId(43, dx-hadoop60.dx, 55740, None)