Description
I am curious whether there is a straightforward way to use a regular Python-trained LightGBM model on a distributed Spark DataFrame. The model was trained with version 4.3.0, and the input data is a scipy.sparse csr_matrix. I have used XGBoost in a similar fashion: the solution there was to create a Spark DataFrame where each row had an ID column and a feature column holding an org.apache.spark.ml.linalg.Vectors.sparse vector, load the regular non-distributed XGBoost model into the corresponding Spark model class, and then call model.xgbRegressionModel.transform(df_sparse). Is there a similar solution for LightGBM, where I can load a non-distributed trained model into Spark and predict over a distributed DataFrame?
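For context, here is a minimal sketch of how I built the ID-plus-sparse-vector DataFrame from the csr_matrix for the XGBoost case (names like `X` and `df_sparse` are illustrative):

```python
# Minimal sketch: turn a scipy csr_matrix into an (id, sparse vector) DataFrame.
from pyspark.ml.linalg import Vectors
from scipy.sparse import csr_matrix

def csr_to_rows(X: csr_matrix):
    """Yield one (id, sparse_vector) pair per csr_matrix row."""
    n_features = X.shape[1]
    for i in range(X.shape[0]):
        row = X.getrow(i)
        yield (i, Vectors.sparse(n_features, row.indices.tolist(), row.data.tolist()))

# `spark` is the active SparkSession (predefined on Databricks).
df_sparse = spark.createDataFrame(list(csr_to_rows(X)), ["id", "features"])
```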
I am using Databricks with LightGBM version 4.3.0 (4.5.0 is also available). Python or Scala solutions are both welcome. Thank you!
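For completeness, the only workaround I know of is scoring with the plain Python Booster inside a pandas UDF, which I would rather avoid in favor of a native loader if one exists. A rough sketch, assuming the `df_sparse` DataFrame from above (the model path and column names are just examples):

```python
# Sketch of the pandas-UDF fallback (model path and column names are examples).
import lightgbm as lgb
import numpy as np
import pandas as pd
from pyspark.ml.functions import vector_to_array
from pyspark.sql.functions import pandas_udf
from pyspark.sql.types import DoubleType

# Broadcast the model as a string so each executor can rebuild it.
booster = lgb.Booster(model_file="/dbfs/models/lgbm_model.txt")
bc_model = spark.sparkContext.broadcast(booster.model_to_string())

@pandas_udf(DoubleType())
def predict_udf(features: pd.Series) -> pd.Series:
    # Rebuilt per batch for simplicity; could be cached per executor.
    model = lgb.Booster(model_str=bc_model.value)
    X = np.vstack(features.to_numpy())
    return pd.Series(model.predict(X))

# Arrow-backed pandas UDFs don't accept VectorUDT columns directly,
# so densify the sparse vectors to arrays first (at a memory cost).
scored = (
    df_sparse
    .withColumn("features_arr", vector_to_array("features"))
    .withColumn("prediction", predict_udf("features_arr"))
)
```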