Variability in results using sklearnex with ExtraTrees and RandomForest classifiers #1916
Comments
Not sure whether it's related, but if I use Intelex, I get a warning.
@YoochanMyung About the results not matching exactly between sklearnex and stock scikit-learn: apart from the generated random numbers, the implementation here also differs in other ways, such as using histogram-based methods instead of sorting-based methods to find splits. The outputs therefore won't match down to the last decimal, but, as you found, the evaluation metrics on the two results are very similar.

About the second issue: I'm unable to reproduce the problematic behavior. You should see that warning if the estimator is fitted to data with column names (such as a pandas DataFrame) and then used to predict on data without names (such as a NumPy array). Are you able to provide a reproducible example where the warning is issued in a situation in which it shouldn't be?
Hi @YoochanMyung, I have the same issue as you with my own data. The predictive performance of sklearnex and scikit-learn is really different. I have also found that re-running the exact same algorithm twice with the same random seed produces different results with RandomForestRegressor. Did you manage to solve your issue?
@leorene97490 Thanks for following up here. Would you be able to provide a reproducible example with random or public data in which different results are generated despite the seeds, or in which scikit-learn-intelex produces much worse predictions?
Hi David, thanks for following up. I haven't been able to reproduce the error with any other random or public dataset. Scikit-learn-intelex doesn't degrade prediction quality; in fact, it significantly improves it. What really concerns me is this: in my notebook, running the pipeline cell yields the same "score 1" every time, but if I restart the kernel and run it again, I get a completely different "score 2". The difference is not just in the last few decimals; it's substantial. I've already fixed random_state in the RandomForestRegressor, but the scores still vary drastically after a kernel restart.
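One way to mimic the kernel-restart scenario outside a notebook is to run the same pipeline in several fresh Python processes and compare the scores. In this sketch the diabetes dataset, the split, and n_estimators=50 are hypothetical stand-ins for the real pipeline. To exercise the patched path, one could add "from sklearnex import patch_sklearn" and "patch_sklearn()" as the very first lines of SCRIPT, assuming scikit-learn-intelex is installed.

```python
import subprocess
import sys
import textwrap

# Hypothetical stand-in for the notebook's pipeline cell: every source of
# randomness under our control (the split and the forest) gets an explicit seed.
SCRIPT = textwrap.dedent(
    """
    from sklearn.datasets import load_diabetes
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import train_test_split

    X, y = load_diabetes(return_X_y=True)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X_tr, y_tr)
    # repr of a Python float round-trips exactly through float().
    print(repr(float(model.score(X_te, y_te))))
    """
)


def score_in_fresh_process():
    """Run SCRIPT in a brand-new interpreter (like a kernel restart) and return its R^2."""
    out = subprocess.run(
        [sys.executable, "-c", SCRIPT],
        capture_output=True,
        text=True,
        check=True,
    )
    # The score is the last line printed on stdout.
    return float(out.stdout.strip().splitlines()[-1])


scores = [score_in_fresh_process() for _ in range(3)]
print(scores)
print("identical across restarts:", len(set(scores)) == 1)
```

With stock scikit-learn and fixed seeds the three scores should be identical. If they only diverge once the patch is enabled, the script itself is a self-contained reproducer.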
Describe the bug
Getting different results when turning sklearnex on or off with the ExtraTrees and RandomForest algorithms.
This issue occurs starting with version 2024.1. I found it with my own dataset, and it's also reproducible with the breast_cancer dataset, but not with the iris dataset.

To Reproduce
Expected behavior
Same results between using sklearnex and original sklearn.
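The expected behavior can be checked with a sketch like the following. It is not the reporter's reproduction script (that code is not included above); the split, n_estimators=100, and random_state=42 are arbitrary assumptions, and the sklearnex comparison only runs if scikit-learn-intelex is installed. Following the maintainer's explanation, it reports both the fraction of identical predictions and the accuracy of each model, since exact agreement between the two implementations is not expected.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import ExtraTreesClassifier as StockExtraTrees
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

try:
    # Accelerated estimator from scikit-learn-intelex, if it is installed.
    from sklearnex.ensemble import ExtraTreesClassifier as IntelExtraTrees
except ImportError:
    IntelExtraTrees = None

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)


def fit_predict(cls):
    """Fit one estimator class with a fixed seed and return test-set predictions."""
    model = cls(n_estimators=100, random_state=42).fit(X_tr, y_tr)
    return model.predict(X_te)


def compare(pred_a, pred_b):
    """Fraction of identical predictions plus the accuracy of each model."""
    return {
        "exact_match": float((pred_a == pred_b).mean()),
        "acc_a": accuracy_score(y_te, pred_a),
        "acc_b": accuracy_score(y_te, pred_b),
    }


stock_pred = fit_predict(StockExtraTrees)

# Baseline: the same stock estimator fitted twice with the same seed.
results = {"stock_vs_stock": compare(stock_pred, fit_predict(StockExtraTrees))}
if IntelExtraTrees is not None:
    results["stock_vs_sklearnex"] = compare(stock_pred, fit_predict(IntelExtraTrees))

for name, r in results.items():
    print(name, r)
```

The stock-vs-stock row is a reproducibility baseline. For the stock-vs-sklearnex row, similar accuracies with an exact_match below 1.0 would be consistent with the implementation differences described in the thread.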
Output/Screenshots
Before patching sklearnex with ExtraTrees
After patching sklearnex with ExtraTrees
Environment: