
Using Python trained LightGBM model to predict on Spark DataFrame (in Databricks) #6938

Open
@jmpanfil

Description


I am curious whether there is a straightforward way to use a regular Python-trained LightGBM model to predict on a distributed Spark DataFrame. The model was trained with version 4.3.0 and the input data is a scipy.sparse csr_matrix. I have used XGBoost in a similar fashion: the solution there was to create a Spark DataFrame where each row had an ID column and a feature column holding an org.apache.spark.ml.linalg.Vectors.sparse vector, load the regular non-distributed XGBoost model into its Spark model equivalent, and then call model.xgbRegressionModel.transform(df_sparse). Is there a similar solution for LightGBM, where I can load a non-distributed trained model into Spark and predict over a distributed DataFrame?
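For concreteness, here is a minimal sketch of the shape of that input DataFrame (the feature count, IDs, and values are made-up placeholders):

```python
from pyspark.ml.linalg import Vectors
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Toy stand-in for the real data: one ID column plus one sparse feature column.
n_features = 5
rows = [
    (1, Vectors.sparse(n_features, {0: 1.0, 3: 2.5})),
    (2, Vectors.sparse(n_features, {1: 0.7, 4: 1.2})),
]
df_sparse = spark.createDataFrame(rows, ["id", "features"])
df_sparse.show(truncate=False)
```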

I am using Databricks with LightGBM version 4.3.0 (4.5.0 is also available). Python or Scala solutions are both welcome.
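The closest thing I have come up with so far is a pandas UDF along the lines below (only a sketch; the model path and column names are placeholders, and df_sparse is the DataFrame from the example above), but I am not sure this is the recommended approach:

```python
import lightgbm as lgb
import numpy as np
import pandas as pd
from pyspark.ml.functions import vector_to_array
from pyspark.sql import functions as F

# Load the regular Python-trained model and broadcast it as a string
# (the path below is a placeholder).
booster = lgb.Booster(model_file="/dbfs/models/lgbm_model.txt")
bc_model_str = spark.sparkContext.broadcast(booster.model_to_string())

@F.pandas_udf("double")
def lgbm_predict(features: pd.Series) -> pd.Series:
    # Rebuild the Booster on the executor from the broadcast model string.
    local_booster = lgb.Booster(model_str=bc_model_str.value)
    X = np.vstack(features.values)  # each element is a dense feature array
    return pd.Series(local_booster.predict(X))

# "features" is the Vectors.sparse column; vector_to_array densifies it.
df_pred = (
    df_sparse.withColumn("features_arr", vector_to_array("features"))
             .withColumn("prediction", lgbm_predict("features_arr"))
)
```

One concern with this is that vector_to_array densifies every row, which defeats the purpose of keeping the features sparse, so a sparse-friendly or more idiomatic solution would be much appreciated. Thank you!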
