Variability in results using sklearnex with ExtraTrees and RandomForest classifiers #1916

YoochanMyung · 2024-07-04T22:47:47Z

Describe the bug
Results differ depending on whether sklearnex is turned on or off with the ExtraTrees and RandomForest algorithms.
This issue occurs starting with version 2024.1. I found it with my own dataset, and it's also reproducible with the breast_cancer dataset, but not with the iris dataset.

To Reproduce

Install 'scikit-learn==1.5.1' (any version from 1.2.1 onward)
Install 'scikit-learn-intelex==2024.1' (any version from 2024.1 onward)
Run the following test code:

from sklearnex import patch_sklearn
patch_sklearn()  # comment out these two lines to get the stock scikit-learn results

# Import estimators only after patching so the sklearnex versions are picked up
from sklearn.datasets import load_breast_cancer, load_iris  # load_iris does not reproduce the issue
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier  # swap in RandomForestClassifier to test it too
from sklearn.metrics import confusion_matrix, matthews_corrcoef
from sklearn.model_selection import cross_val_predict, train_test_split

N_CORES = 16

# Toy data
data = load_breast_cancer()
X = data['data']
y = data['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

# ExtraTrees: one model for 10-fold CV predictions on the training set, one for the held-out test set
classifier_cv = ExtraTreesClassifier(n_estimators=300, random_state=1, n_jobs=N_CORES)
classifier_test = ExtraTreesClassifier(n_estimators=300, random_state=1, n_jobs=N_CORES)

cv_results = cross_val_predict(classifier_cv, X_train, y_train, cv=10)
classifier_test.fit(X_train, y_train)
test_results = classifier_test.predict(X_test)

print("###CV###")
print(matthews_corrcoef(y_train, cv_results))
print(confusion_matrix(y_train, cv_results).ravel())  # [tn, fp, fn, tp]

print("###TEST###")
print(matthews_corrcoef(y_test, test_results))
print(confusion_matrix(y_test, test_results).ravel())  # [tn, fp, fn, tp]

Expected behavior
The same results with sklearnex as with the original scikit-learn.

Output/Screenshots

Without sklearnex (stock scikit-learn), ExtraTreesClassifier:

###CV###
0.935861738490973
[144   5   7 242]
###TEST###
0.9247930594534806
[ 58   5   1 107]

With sklearnex patched in, ExtraTreesClassifier:

Intel(R) Extension for Scikit-learn* enabled (https://github.com/intel/scikit-learn-intelex)
###CV###
0.9409328452526324
[143   6   5 244]
###TEST###
0.8992907835033845
[ 57   6   2 106]
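As a sanity check on the numbers above (a sketch, not part of the original report): on binary labels `confusion_matrix(...).ravel()` returns `[tn, fp, fn, tp]`, and the Matthews correlation coefficient follows directly from those four counts as (tp·tn - fp·fn) / sqrt((tp+fp)(tp+fn)(tn+fp)(tn+fn)). Recomputing it by hand shows that the test-set gap comes from one extra false positive and one extra false negative out of 171 test samples. The helper name `mcc_from_counts` is illustrative only.

```python
import math


def mcc_from_counts(tn, fp, fn, tp):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator


# Counts copied from the outputs above, in ravel() order [tn, fp, fn, tp]
runs = {
    "cv_stock": (144, 5, 7, 242),
    "cv_sklearnex": (143, 6, 5, 244),
    "test_stock": (58, 5, 1, 107),
    "test_sklearnex": (57, 6, 2, 106),
}

for name, counts in runs.items():
    print(f"{name}: n={sum(counts)}, mcc={mcc_from_counts(*counts):.4f}")
```

The recomputed values match the printed MCCs up to floating-point rounding.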

Environment:

OS: Ubuntu 22.04.4 LTS
scikit-learn==1.5.1, but I also tested on 1.2.1, 1.3.x, 1.4.x, etc.


YoochanMyung · 2024-07-04T23:46:33Z

Not sure whether it's related, but when I use Intelex I get the warning "UserWarning: X does not have valid feature names, but ExtraTreesClassifier was fitted with feature names". Maybe there is a glitch in how Intelex handles feature names or their order?

david-cortes-intel · 2025-01-02T08:52:57Z

@YoochanMyung About the results not matching exactly between scikit-learn and scikit-learn-intelex: this is expected. These are randomized algorithms, and scikit-learn-intelex uses different, more performant random number generators than scikit-learn, so the same random_state does not produce the same random draws in both libraries.
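The point about random number generators can be illustrated with NumPy alone. This is an analogy only; it does not use the generators inside scikit-learn or scikit-learn-intelex. Seeding two different generator algorithms with the same integer yields different streams, so the bootstrap sample drawn for a tree differs even with a fixed seed, while the same algorithm with the same seed is reproducible. `bootstrap_indices` is a hypothetical helper.

```python
import numpy as np

N_SAMPLES = 10
SEED = 1


def bootstrap_indices(draw):
    """Draw one bootstrap sample (row indices with replacement) using the given sampler."""
    return draw(0, N_SAMPLES, size=N_SAMPLES)


# Same seed, two different generator algorithms
mt19937 = bootstrap_indices(np.random.RandomState(SEED).randint)  # legacy Mersenne Twister
pcg64 = bootstrap_indices(np.random.default_rng(SEED).integers)   # PCG64, NumPy's default

# Same seed, same algorithm, fresh generator object: reproducible
mt19937_again = bootstrap_indices(np.random.RandomState(SEED).randint)

print("MT19937:", mt19937)
print("PCG64:  ", pcg64)
print("same generator reproducible:", np.array_equal(mt19937, mt19937_again))
print("different generators agree: ", np.array_equal(mt19937, pcg64))
```

A seed only fixes the stream for one particular generator; switching the generator changes every downstream random choice in the forest.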

Apart from the generated random numbers, the implementation also differs in other ways, such as using histogram-based methods instead of sorting-based methods to find splits, so the two won't match exactly down to the last decimal in their outputs. But, as you found, the evaluation metrics on the two results are very similar.
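To make the sorting versus histogram distinction concrete, here is a simplified sketch of how the two approaches choose candidate split thresholds for a single feature. The binning shown (interior quantile edges with `max_bins=8`) is an assumption for illustration, not the exact scheme used by scikit-learn-intelex. Because the histogram approach only considers bin edges, the best split it finds can differ from the sorting-based one even on identical data and identical random draws.

```python
import numpy as np


def exact_thresholds(x):
    """Sorting-based: every midpoint between consecutive distinct sorted values is a candidate."""
    v = np.unique(x)  # np.unique returns the distinct values already sorted
    return (v[:-1] + v[1:]) / 2.0


def histogram_thresholds(x, max_bins=8):
    """Histogram-based (simplified): only interior quantile bin edges are candidates."""
    qs = np.linspace(0.0, 1.0, max_bins + 1)[1:-1]  # drop the 0% and 100% edges
    return np.unique(np.quantile(x, qs))


rng = np.random.default_rng(0)
x = rng.normal(size=200)  # one continuous feature

exact = exact_thresholds(x)
hist = histogram_thresholds(x, max_bins=8)
print(f"sorting-based candidates:   {len(exact)}")
print(f"histogram-based candidates: {len(hist)}")
```

The histogram variant evaluates far fewer thresholds, which is where its speed comes from, at the cost of not reproducing the sorting-based splits exactly.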

About the second issue - I'm unable to reproduce the problematic behavior. You should see that warning if the estimator is fitted to data with column names (such as a pandas DataFrame) and then it is used to predict on data without names (such as a NumPy array).

Are you able to provide a reproducible example where the warning would be issued in a situation in which it shouldn't?

leorene97490 · 2025-05-06T20:02:06Z

Hi @YoochanMyung,

I have the same issue as yours with my own data. The predictive power of sklearnex and scikit-learn is really different. I have also found that rerunning the exact same algorithm twice, with the same random seed, produces different results with RandomForestRegressor.

Did you manage to solve your issue ?

david-cortes-intel · 2025-05-07T06:33:56Z

@leorene97490 Thanks for following up here. Would you be able to provide a reproducible example with random or public data in which different results are generated despite the seeds, or in which scikit-learn-intelex produces much worse predictions?

leorene97490 · 2025-05-07T07:35:55Z

Hi David,

Thanks for following up. I haven’t been able to reproduce the error using any other random or public dataset.

Scikit-learn-intelex doesn’t degrade prediction quality; in fact, it significantly improves it.

What really concerns me is this: in my notebook, running the pipeline cell yields the same “score 1” every time. But if I restart the kernel and run it again, I get a completely different “score 2”, and the difference isn’t just a few decimals; it’s substantial.

I’ve already fixed the random_state in the RandomForestRegressor, but the scores still vary drastically after a kernel restart.
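One way to document variation across kernel restarts is to record a fingerprint of the predictions and compare it between sessions. The sketch below assumes a public dataset (`load_diabetes`) in place of the private data and uses a hypothetical `prediction_fingerprint` helper: it fits a freshly constructed `RandomForestRegressor` with a fixed `random_state` and hashes its predictions. Running the same cell after each restart and comparing the printed digests makes any cross-session difference easy to spot and to attach to a report.

```python
import hashlib

import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor


def prediction_fingerprint(seed=0):
    """Fit a fresh forest with a fixed seed and return a short hash of its predictions."""
    X, y = load_diabetes(return_X_y=True)
    model = RandomForestRegressor(n_estimators=50, random_state=seed)
    model.fit(X, y)
    # Hash the exact bytes so that even last-digit differences change the digest
    preds = np.ascontiguousarray(model.predict(X), dtype=np.float64)
    return hashlib.sha256(preds.tobytes()).hexdigest()[:16]


print("fingerprint:", prediction_fingerprint())
```

To exercise the sklearnex path, call `patch_sklearn()` before the scikit-learn imports, as in the reproduction script at the top of the issue.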

YoochanMyung added the bug label Jul 4, 2024
