Add Possibility to Load XGBoost Models as PySpark Models #11449

Open
wants to merge 1 commit into
base: master
from

Conversation

ayoub317
Contributor

Closes #11400

Default values for device, use_gpu, and tree_method were added because of this function call and, eventually, this one, which assumes they already exist in _paramMap or _defaultParamMap.
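
For readers unfamiliar with the lookup described above, here is a minimal sketch of why a missing default can break a later call. It is a simplified stand-in, not PySpark's actual Params class: the class name ParamsSketch, its methods, the helper lookup_fails, and the default values shown are all hypothetical. Only the lookup order (explicit values in _paramMap first, then _defaultParamMap) is meant to mirror the behavior discussed.

```python
class ParamsSketch:
    """Simplified stand-in for a params holder; NOT the real PySpark class."""

    def __init__(self):
        self._paramMap = {}         # values set explicitly by the user
        self._defaultParamMap = {}  # values registered as defaults

    def set_default(self, **kwargs):
        # Register fallback values used when nothing was set explicitly.
        self._defaultParamMap.update(kwargs)

    def set(self, **kwargs):
        # Record explicit values; these take priority over defaults.
        self._paramMap.update(kwargs)

    def get_or_default(self, name):
        # Explicit values win; otherwise fall back to the default map.
        if name in self._paramMap:
            return self._paramMap[name]
        return self._defaultParamMap[name]  # raises KeyError if no default


def lookup_fails(params, name):
    """Return True if reading `name` raises because it is in neither map."""
    try:
        params.get_or_default(name)
    except KeyError:
        return True
    return False


# A model object built without defaults fails on the first lookup.
bare = ParamsSketch()

# Registering defaults up front makes the same lookup succeed.
# The values below are illustrative placeholders, not the PR's actual defaults.
with_defaults = ParamsSketch()
with_defaults.set_default(device="cpu", use_gpu=False, tree_method="hist")
```

In this sketch, lookup_fails(bare, "device") is true, while the same call on with_defaults is false, which is the situation the added defaults are meant to avoid.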

@ayoub317 ayoub317 force-pushed the load-model-as-spark-model branch 5 times, most recently from 39d1379 to a3acff2 Compare May 11, 2025 16:09
@ayoub317 ayoub317 force-pushed the load-model-as-spark-model branch from a3acff2 to 1688ddb Compare May 11, 2025 16:20
@trivialfis
Member

cc @WeichenXu123 @wbo4958

@trivialfis trivialfis self-requested a review May 14, 2025 19:49
raise NotImplementedError()

@classmethod
def convert_sklearn_model_to_spark_xgb_model(
Contributor

Will this function be exposed to users?

Contributor Author

I think it is worth exposing to users. If they have already loaded or trained an sklearn model and want to make predictions on a large dataset, they can do so without first saving the model to disk and then loading it back with the load_model method.

Member

@trivialfis trivialfis left a comment

Thank you for working on this. But what's the difference between the approach taken here and the following snippet?

from xgboost.spark import SparkXGBRegressorModel
from xgboost import XGBRegressor

reg = XGBRegressor()

SparkXGBRegressorModel(reg, None)

@ayoub317
Contributor Author

Hello @trivialfis,
The code snippet you provided doesn't seem to load an XGBoost model and make predictions. Without this commit, I expect the loading part to work but an error to occur during prediction. I think this commit also packages loading the model as an sklearn model and converting it to a Spark model in a convenient way.

@wbo4958
Contributor

wbo4958 commented May 16, 2025

LGTM.

Development

Successfully merging this pull request may close these issues.

Load XGBoost model made in Python to spark
3 participants