
Prediction on training data does not match with summary output #51

Open
zanocom opened this issue May 31, 2023 · 0 comments


zanocom commented May 31, 2023

Hi all,
I get a strange result when I compare the output of summary() for a model
with the output of predict() on the same training set.

The same units are not classified the same way, so I end up with two different confusion matrices.
The issue arises with trials > 1 and at least 3 variables in the training set.
I would expect the same results, but maybe I have misunderstood the inner workings of the algorithm.

I am using R 4.1.1 and C50 0.1.8.

Here is code that reproduces the issue with the credit_data dataset:

##################################################################################

library(modeldata)
data(credit_data)

vars <- c("Home", "Seniority", "Job")

# a simple split
set.seed(2411)
in_train <- sample(1:nrow(credit_data), size = 3000)
train_data_example <- credit_data[ in_train,]
test_data_example  <- credit_data[-in_train, ]

library(C50)
library(yardstick)
library(tibble)  # provides tibble(), used below
# boosted model with 10 trials; the seed is passed through C5.0Control()
tree_mod <- C5.0(x = train_data_example[, vars],
                 y = train_data_example$Status,
                 trials = 10,
                 control = C5.0Control(seed = 65))


summary(tree_mod)

prediction_df_train <- tibble(
  value   = train_data_example$Status,
  predict = predict(tree_mod, newdata = train_data_example[, vars])
)

conf_mat(prediction_df_train, truth = value, estimate = predict)

# The confusion matrix printed by summary(tree_mod) is different
# from the confusion matrix built from predict().

##################################################################################
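To rule out yardstick and tibble as the source of the difference, the training-set confusion matrix can also be tabulated with base R table(). The sketch below rebuilds the same split and model so it runs on its own; it assumes the seed is meant to be set through C5.0Control(seed = 65). The printed table can then be compared by eye with the training-data evaluation shown by summary(tree_mod), and tree_mod$trials shows how many boosting iterations were requested versus actually built.

```r
# Cross-check of the training-set confusion matrix using base R only.
# Rebuilds the same split and boosted model as the reproduction above.
library(modeldata)
library(C50)

data(credit_data)

vars <- c("Home", "Seniority", "Job")

set.seed(2411)
in_train <- sample(1:nrow(credit_data), size = 3000)
train_data_example <- credit_data[in_train, ]

# Assumption: the intended seed is passed via C5.0Control()
tree_mod <- C5.0(x = train_data_example[, vars],
                 y = train_data_example$Status,
                 trials = 10,
                 control = C5.0Control(seed = 65))

# Class predictions on the training rows; without an explicit `trials`
# argument, predict() uses the number of boosting iterations actually built.
train_pred <- predict(tree_mod, newdata = train_data_example[, vars])

# Rows = observed class, columns = predicted class
cm_base <- table(observed = train_data_example$Status, predicted = train_pred)
print(cm_base)

# Requested vs. actual number of boosting iterations
print(tree_mod$trials)
```

If this table agrees with the conf_mat() output but not with summary(), the mismatch lies between predict() and the summary output rather than in yardstick.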

Thank you,
Massimo
