[BUG]java.lang.ArrayIndexOutOfBoundsException on multi-node cluster run #2278

@bjm88620

Description

SynapseML version

com.microsoft.azure:synapseml_2.12:0.11.4-spark3.3

System information

  • Language version (e.g. python 3.8, scala 2.12): python 3.9
  • Spark Version (e.g. 3.2.3): 3.3.2
  • Spark Platform (e.g. Synapse, Databricks): Databricks

Describe the problem

I have a job that fits LightGBM models in a for-loop for rolling backtest validation.
The job fails on a multi-node cluster with a Connection Refused error in the log. After checking the failed tasks, I found that an executor had failed with java.lang.ArrayIndexOutOfBoundsException, which in turn caused the Connection Refused error.

The same job runs on a single-node cluster without any issue.

The DataFrame passed to the model has around 48,000 rows, partitioned as follows:

Partition 0 has 19000 records
Partition 1 has 18000 records
Partition 2 has 7000 records
Partition 3 has 4000 records
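As a quick check on how uneven this split is, the sketch below takes the per-partition counts listed above and computes each partition's share of the rows and the ratio of the largest partition to the mean. It is plain Python so it runs anywhere; in PySpark the counts themselves can be collected with something like df.groupBy(sf.spark_partition_id()).count().

```python
# Per-partition row counts as reported in the issue
partition_counts = {0: 19000, 1: 18000, 2: 7000, 3: 4000}


def partition_skew(counts):
    """Return (total rows, share of rows per partition, max-to-mean ratio)."""
    total = sum(counts.values())
    mean = total / len(counts)
    shares = {pid: n / total for pid, n in counts.items()}
    return total, shares, max(counts.values()) / mean


total, shares, skew = partition_skew(partition_counts)
print(total)  # 48000
for pid, share in shares.items():
    print(f"partition {pid}: {share:.1%}")
print(f"max/mean: {skew:.2f}")
```

With these numbers the largest partition holds about 39.6% of the rows, roughly 1.58 times the mean partition size.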

Repartitioning with df.repartition(5) does not fix the issue.
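For reference, repartition(n) without partitioning columns uses round-robin partitioning in Spark, so it should even out the sizes above. The plain-Python sketch below is a simplified model of that behaviour, not Spark's own code: each input partition deals its rows out one at a time starting from some offset (Spark picks a per-partition start position; here it is just a parameter). Because every reported count is divisible by 5, each of the 5 output partitions ends up with exactly 9,600 rows whatever the offsets, so after repartition(5) the imbalance is gone, yet according to the report the job still fails.

```python
def round_robin_sizes(input_counts, num_partitions, offsets=None):
    """Simplified model of round-robin repartitioning.

    Each input partition i deals its rows to output partitions starting at
    offsets[i]. Returns the number of rows landing in each output partition.
    """
    if offsets is None:
        offsets = [0] * len(input_counts)
    sizes = [0] * num_partitions
    for count, start in zip(input_counts, offsets):
        full_rounds, remainder = divmod(count, num_partitions)
        # Every output partition receives one row per full round
        for p in range(num_partitions):
            sizes[p] += full_rounds
        # The leftover rows go to the `remainder` partitions starting at `start`
        for k in range(remainder):
            sizes[(start + k) % num_partitions] += 1
    return sizes


reported = [19000, 18000, 7000, 4000]
print(round_robin_sizes(reported, 5))                # [9600, 9600, 9600, 9600, 9600]
print(round_robin_sizes(reported, 5, [3, 1, 4, 2]))  # [9600, 9600, 9600, 9600, 9600]
```

The offsets argument only matters when a count is not a multiple of num_partitions; even then, output sizes differ by at most one row per input partition.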

[Screenshot, 2024-09-04 21:16:29]

Code to reproduce issue

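from pyspark.sql import functions as sf

# One iteration of the rolling validation loop: keep only rows before the cutoff date
max_base_date = '2024-09-01'
tmp_train_df = train_merged_df.where(sf.col('base_date') < max_base_date).cache()
tmp_actual_df = actual_merged_df.where(sf.col('base_date') < max_base_date).cache()
# train_merged_df, actual_merged_df and model are defined elsewhere (not shown)
model.fit(tmp_train_df, tmp_actual_df)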
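The snippet above is a single iteration of the rolling validation loop mentioned in the description. As a hypothetical sketch of that loop (the monthly step and the helper name monthly_cutoffs are assumptions, not taken from the report), the cutoffs can be generated as ISO-8601 date strings. Zero-padded YYYY-MM-DD strings sort in the same order as the dates they represent, so the string comparison on base_date also orders correctly if base_date is stored as such a string. The Spark calls from the repro appear only as comments because the DataFrames and the model are not defined here.

```python
from datetime import date


def monthly_cutoffs(last_cutoff: str, n: int) -> list:
    """Return n month-start cutoff dates (YYYY-MM-DD) ending at last_cutoff, oldest first.

    Hypothetical helper: the monthly step is an assumption for illustration.
    """
    d = date.fromisoformat(last_cutoff)
    year, month = d.year, d.month
    cutoffs = []
    for _ in range(n):
        cutoffs.append(date(year, month, 1).isoformat())
        # Step back one month, wrapping from January to December of the previous year
        year, month = (year, month - 1) if month > 1 else (year - 1, 12)
    return cutoffs[::-1]


for max_base_date in monthly_cutoffs('2024-09-01', 3):
    # Per iteration, as in the repro (requires Spark and the reporter's model):
    # tmp_train_df = train_merged_df.where(sf.col('base_date') < max_base_date).cache()
    # tmp_actual_df = actual_merged_df.where(sf.col('base_date') < max_base_date).cache()
    # model.fit(tmp_train_df, tmp_actual_df)
    print(max_base_date)  # 2024-07-01, then 2024-08-01, then 2024-09-01
```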

Other info / logs

No response

What component(s) does this bug affect?

  • area/cognitive: Cognitive project
  • area/core: Core project
  • area/deep-learning: DeepLearning project
  • area/lightgbm: Lightgbm project
  • area/opencv: Opencv project
  • area/vw: VW project
  • area/website: Website
  • area/build: Project build system
  • area/notebooks: Samples under notebooks folder
  • area/docker: Docker usage
  • area/models: models related issue

What language(s) does this bug affect?

  • language/scala: Scala source code
  • language/python: Pyspark APIs
  • language/r: R APIs
  • language/csharp: .NET APIs
  • language/new: Proposals for new client languages

What integration(s) does this bug affect?

  • integrations/synapse: Azure Synapse integrations
  • integrations/azureml: Azure ML integrations
  • integrations/databricks: Databricks integrations

Activity

A commit referencing this issue was added on Sep 7, 2024: 11bfba1

bjm88620 (author) commented on Sep 11, 2024:

Hi @dciborow, I see that a PR with the fix has been created. Will the fix be made available for com.microsoft.azure:synapseml_2.12:0.11.4-spark3.3? Thanks in advance.

        [BUG]java.lang.ArrayIndexOutOfBoundsException on multi-node cluster run · Issue #2278 · microsoft/SynapseML