Open
Description
SynapseML version
com.microsoft.azure:synapseml_2.12:0.11.4-spark3.3
System information
- Language version (e.g. python 3.8, scala 2.12): python 3.9
- Spark Version (e.g. 3.2.3): 3.3.2
- Spark Platform (e.g. Synapse, Databricks): Databricks
Describe the problem
I run LightGBM fit jobs in a for loop for rolling-back validation.
On a multi-node cluster the job fails with a Connection Refused error in the log. After checking the failed tasks, I found that the executor failed with java.lang.ArrayIndexOutOfBoundsException, which in turn caused the Connection Refused error.
The same job runs on a single-node cluster without any issue.
The DataFrame passed to the model has around 48,000 records, partitioned as follows:
Partition 0 has 19000 records
Partition 1 has 18000 records
Partition 2 has 7000 records
Partition 3 has 4000 records
The issue is not fixed by calling df.repartition(5).
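For anyone trying to reproduce the skew, per-partition counts like the ones listed above can be collected with a short helper. This is only a sketch: `partition_counts` and `format_report` are hypothetical names that are not part of the original report, and the pyspark import is deferred so the formatting helper can be used without a Spark installation.

```python
def partition_counts(df):
    """Return {partition_id: row_count} for a PySpark DataFrame.

    Hypothetical helper, not part of the original report.
    """
    # Deferred import so the module loads even where pyspark is absent.
    from pyspark.sql import functions as sf

    rows = (
        df.groupBy(sf.spark_partition_id().alias("pid"))
        .count()
        .collect()
    )
    return {row["pid"]: row["count"] for row in rows}


def format_report(counts):
    """Render the counts in the same style as the listing in the report."""
    return [
        f"Partition {pid} has {n} records"
        for pid, n in sorted(counts.items())
    ]


# Counts taken from the report; on a live cluster they would come from
# partition_counts(tmp_train_df) instead.
reported = {0: 19000, 1: 18000, 2: 7000, 3: 4000}
for line in format_report(reported):
    print(line)
```

Running the same helper before and after df.repartition(5) would show whether the repartition actually changed the distribution.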

Code to reproduce issue
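from pyspark.sql import functions as sf  # assumed meaning of the `sf` alias

# Keep only rows before the cutoff date for this validation round
max_base_date = '2024-09-01'
tmp_train_df = train_merged_df.where(sf.col('base_date') < max_base_date).cache()
tmp_actual_df = actual_merged_df.where(sf.col('base_date') < max_base_date).cache()
model.fit(tmp_train_df, tmp_actual_df)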
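The snippet above shows a single iteration. The surrounding loop is not included in the report, so the following is only a guess at how the cutoff dates for rolling-back validation might be generated. The function name `rolling_cutoffs`, the monthly step, and the use of month-start dates are all assumptions.

```python
from datetime import date


def rolling_cutoffs(latest: str, n_folds: int) -> list[str]:
    """Return n_folds month-start cutoff dates, newest first.

    Hypothetical helper: the report does not say how cutoffs are chosen.
    Any day-of-month in `latest` is ignored; cutoffs fall on the 1st.
    """
    start = date.fromisoformat(latest)
    year, month = start.year, start.month
    cutoffs = []
    for _ in range(n_folds):
        cutoffs.append(date(year, month, 1).isoformat())
        # Step back one calendar month, wrapping across the year boundary.
        month -= 1
        if month == 0:
            year, month = year - 1, 12
    return cutoffs


for max_base_date in rolling_cutoffs("2024-09-01", 3):
    # In the real job, each cutoff would filter the DataFrames and
    # trigger one model.fit(...) call as in the snippet above.
    print(max_base_date)
```

ISO date strings compare correctly as strings, so each cutoff can be passed to the `base_date` filter in the same form as the literal used in the report.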
Other info / logs
No response
What component(s) does this bug affect?
area/cognitive: Cognitive project
area/core: Core project
area/deep-learning: DeepLearning project
area/lightgbm: Lightgbm project
area/opencv: Opencv project
area/vw: VW project
area/website: Website
area/build: Project build system
area/notebooks: Samples under notebooks folder
area/docker: Docker usage
area/models: models related issue
What language(s) does this bug affect?
language/scala: Scala source code
language/python: Pyspark APIs
language/r: R APIs
language/csharp: .NET APIs
language/new: Proposals for new client languages
What integration(s) does this bug affect?
integrations/synapse: Azure Synapse integrations
integrations/azureml: Azure ML integrations
integrations/databricks: Databricks integrations
Activity
Fix java.lang.ArrayIndexOutOfBoundsException in multi-node cluster run
bjm88620 commented on Sep 11, 2024
Hi @dciborow, I can see that the fix PR has been created. Will the fix be available for com.microsoft.azure:synapseml_2.12:0.11.4-spark3.3? Thanks in advance.