Training a multi-output regression model on GPU with XGBoost is roughly 6x slower than with CatBoost.
Setup:
- 50k samples, 500 features, 15 targets
- Default GPU params
Observation:
XGBoost appears to build one tree per target per boosting round, while CatBoost handles multi-output targets natively with a single tree per round, which accounts for the large performance gap.
Could native multi-output support be added on GPU, similar to CatBoost's, to close this performance gap?