Description
Since Spark 3.5, PySpark includes a new module, `pyspark.ml.connect`, which supports a few ML algorithms that run in Spark Connect mode. This is the design doc:
https://www.google.com/url?q=https://docs.google.com/document/d/1LHzwCjm2SluHkta_08cM3jxFSgfF-niaCZbtIThG-H8/edit&sa=D&source=calendar&ust=1700005806011038&usg=AOvVaw2VEdVyMYg40yDLpElhcRAu
We should make the estimators defined in `xgboost.spark` support Spark Connect mode. To achieve this, we need to:
- Make these estimator classes inherit from `pyspark.ml.connect.Estimator` when running in Spark Connect mode.
- Ensure all implementation code calls only the Spark Connect API (i.e. the Spark DataFrame API).
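
A rough sketch of how the base class could be chosen at runtime, offered only as an illustration and not as the planned implementation. The names `resolve_estimator_base`, `load_class`, `make_estimator_class`, and `SketchXGBoostEstimator` are hypothetical. The sketch assumes `pyspark.ml.Estimator` is the base used outside Spark Connect mode, and it defers the PySpark import so the helpers can be read and exercised without a Spark installation.

```python
import importlib


def resolve_estimator_base(connect_mode: bool) -> str:
    """Return the dotted path of the Estimator base class to inherit from."""
    if connect_mode:
        # Base class named in this issue for Spark Connect mode.
        return "pyspark.ml.connect.Estimator"
    # Assumption: the classic base class is used otherwise.
    return "pyspark.ml.Estimator"


def load_class(dotted_path: str) -> type:
    """Import a class lazily from a 'package.module.ClassName' path."""
    module_name, _, class_name = dotted_path.rpartition(".")
    return getattr(importlib.import_module(module_name), class_name)


def make_estimator_class(connect_mode: bool) -> type:
    """Build a (hypothetical) estimator class on top of the selected base."""
    base = load_class(resolve_estimator_base(connect_mode))

    class SketchXGBoostEstimator(base):  # hypothetical class name
        def _fit(self, dataset):
            # A real implementation would restrict itself to DataFrame
            # operations here (for example, no access to dataset.rdd).
            raise NotImplementedError("illustrative placeholder only")

    return SketchXGBoostEstimator
```

In a live session the `connect_mode` flag would have to come from the runtime; `pyspark.sql.utils.is_remote()` is one candidate for that check, though whether it fits here is an assumption that needs confirming against the targeted PySpark version.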