Description
I am curious whether there is a straightforward way to use a regular Python-trained LightGBM model on a distributed Spark DataFrame. The model was trained with version 4.3.0, and the input data is a scipy.sparse csr_matrix. I have used XGBoost in a similar fashion: the solution there was to create a Spark DataFrame where each row had an ID column and a feature column holding an org.apache.spark.ml.linalg.Vectors.sparse vector, load the regular non-distributed XGBoost model into the corresponding Spark model class, and then call model.xgbRegressionModel.transform(df_sparse). Is there a similar solution for LightGBM, where I can load a non-distributed trained model into Spark and predict over a distributed DataFrame?
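For context, here is a minimal sketch of how I built the ID-plus-sparse-vector DataFrame from the csr_matrix for the XGBoost case (names like `X` and `df_sparse` are illustrative):

```python
# Minimal sketch: turn a scipy csr_matrix into an (id, sparse vector) DataFrame.
from pyspark.ml.linalg import Vectors
from scipy.sparse import csr_matrix

def csr_to_rows(X: csr_matrix):
    """Yield one (id, sparse_vector) pair per csr_matrix row."""
    n_features = X.shape[1]
    for i in range(X.shape[0]):
        row = X.getrow(i)
        yield (i, Vectors.sparse(n_features, row.indices.tolist(), row.data.tolist()))

# `spark` is the active SparkSession (predefined on Databricks).
df_sparse = spark.createDataFrame(list(csr_to_rows(X)), ["id", "features"])
```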
I am using Databricks with LightGBM version 4.3.0 (4.5.0 is also available). Python or Scala solutions are both welcome. Thank you!
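For completeness, the only workaround I know of is scoring with the plain Python Booster inside a pandas UDF, which I would rather avoid in favor of a native loader if one exists. A rough sketch, assuming the `df_sparse` DataFrame from above (the model path and column names are just examples):

```python
# Sketch of the pandas-UDF fallback (model path and column names are examples).
import lightgbm as lgb
import numpy as np
import pandas as pd
from pyspark.ml.functions import vector_to_array
from pyspark.sql.functions import pandas_udf
from pyspark.sql.types import DoubleType

# Broadcast the model as a string so each executor can rebuild it.
booster = lgb.Booster(model_file="/dbfs/models/lgbm_model.txt")
bc_model = spark.sparkContext.broadcast(booster.model_to_string())

@pandas_udf(DoubleType())
def predict_udf(features: pd.Series) -> pd.Series:
    # Rebuilt per batch for simplicity; could be cached per executor.
    model = lgb.Booster(model_str=bc_model.value)
    X = np.vstack(features.to_numpy())
    return pd.Series(model.predict(X))

# Arrow-backed pandas UDFs don't accept VectorUDT columns directly,
# so densify the sparse vectors to arrays first (at a memory cost).
scored = (
    df_sparse
    .withColumn("features_arr", vector_to_array("features"))
    .withColumn("prediction", predict_udf("features_arr"))
)
```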