Error in na.fail.default and #461 fix #479

Closed
farbodr opened this issue Aug 28, 2016 · 8 comments

Comments

@farbodr

farbodr commented Aug 28, 2016

I'm not sure whether this is a new issue. I am getting the error below and don't quite understand the fix that was applied for #461, which this appears to be related to.

Here is my call to train. The data set contains no NA:

fitWithXGBLinear <- train(target ~ .,
                          data = train,
                          method = 'xgbLinear',
                          tuneLength = 3,
                          trControl = ctrl,
                          metric = "logLoss"
)

This produces the error:

Error in na.fail.default(list(target = c(10L, 8L, 8L, 12L, 1L, 1L, 4L,  : 
  missing values in object

As shown below, my data set contains no NA:

> lapply(train, function(x) any(is.na(x)))
$a
[1] FALSE

$target
[1] FALSE

$b
[1] FALSE

$c
[1] FALSE

This data set is from a current Kaggle competition. Its structure:

'data.frame':   51336 obs. of  4 variables:
 $ a     :Class 'integer64'  num [1:51336] 1.06e+05 -9.81e+71 3.76e+234 -4.97e-249 -3.98e+136 ...
 $ target: Factor w/ 12 levels "F23.","F24.26",..: 10 8 8 12 1 1 4 4 10 6 ...
 $ b     : chr  "小米" "TCL" "TCL" "小米" ...
 $ c     : chr  "MI 3" "么么哒" "么么哒" "MI 4" ...

If I change the call to train to this, it seems to run. I've never had to do this before so not sure I understand the need.

fitWithXGBLinear <- train(target ~ .,
                          data = train,
                          method = 'xgbLinear',
                          tuneLength = 3,
                          trControl = ctrl,
                          na.action = na.omit,
                          metric = "logLoss"
)
@topepo
Owner

topepo commented Aug 28, 2016

It is not a bug. Originally it was set up to stop execution when the data contain NA values unless you use na.action = na.pass or another option. #461 fixed the bug (in R) that erroneously made the default na.pass.

The issue here is that people had missing data and did not realize that it was being omitted. Even when they chose an imputation method, the missing data were excluded before train was able to pre-process.
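
For readers unfamiliar with the three na.action options mentioned above, here is a minimal sketch using base R's model.frame. It illustrates the general behaviour of na.fail, na.omit and na.pass only; it is not caret's internal code, and the toy data frame is hypothetical.

```r
# Toy data with one missing predictor value (hypothetical, not the issue's data).
toy <- data.frame(
  y = factor(c("a", "b", "a", "b")),
  x = c(1.5, NA, 3.2, 4.8)
)

# na.fail stops as soon as the model frame contains an NA.
failed <- tryCatch(
  model.frame(y ~ x, data = toy, na.action = na.fail),
  error = function(e) conditionMessage(e)
)
print(failed)

# na.omit silently drops the incomplete row.
omitted <- model.frame(y ~ x, data = toy, na.action = na.omit)
print(nrow(omitted))

# na.pass keeps every row, NA included, so a later step can impute.
passed <- model.frame(y ~ x, data = toy, na.action = na.pass)
print(nrow(passed))
```

With na.omit the incomplete row disappears before any imputation could see it, which is the silent data loss described above.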

@topepo topepo closed this as completed Aug 28, 2016
@farbodr
Author

farbodr commented Aug 28, 2016

Thanks for the explanation. I am still not clear why I got the missing value error when there are no NAs in the data.

@topepo
Owner

topepo commented Aug 28, 2016

We would need the data to figure that out.

@NBRAYKO

NBRAYKO commented Jan 2, 2017

@farbodr I had the same issue and it annoyed the hell out of me. I'm still not sure what exactly causes it, but I found that I'd passed my object as a tibble, and it all works when I convert it to a data.frame. Weirdly, I only got the error when I'd set up multiple cores with doMC.
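
The workaround above can be sketched in base R as follows. This is only a diagnostic sketch: count_non_finite is a hypothetical helper, the toy data stand in for the real training set, and nothing here confirms the actual cause of the error in this thread.

```r
# Hypothetical helper: count NA, NaN and infinite values per column.
count_non_finite <- function(df) {
  vapply(df, function(col) {
    if (is.numeric(col)) sum(!is.finite(col)) else sum(is.na(col))
  }, integer(1))
}

# Toy data standing in for the real training set (an assumption).
toy <- data.frame(
  a = c(1, NaN, Inf, 4),
  b = c("x", "y", NA, "z"),
  stringsAsFactors = FALSE
)

# as.data.frame() drops tibble-specific classes when the input is a tibble;
# a plain data.frame comes back with class "data.frame" as before.
plain <- as.data.frame(toy)
print(class(plain))

# A per-column count that also catches NaN and Inf, which a quick
# any(is.na(x)) check does not fully cover (Inf is not NA).
print(count_non_finite(plain))
```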

@khari14

khari14 commented Mar 8, 2018

titanicData$Pclass <- as.factor(titanicData$Pclass)
titanicData$Survived <- as.factor(titanicData$Survived)
titanicData$Title <- as.factor(titanicData$Title)
titanicData$FsizeD <- as.factor(titanicData$FsizeD)
titanicData$Embarked <- as.factor(titanicData$Embarked)

index<- sample(1:nrow(titanicData),nrow(titanicData)*0.75, replace = F)
train<- titanicData[index,]
test<- titanicData[-index,]

library(randomForest)
rf_model <- randomForest(Survived ~ Pclass + Sex + Age + SibSp + Parch + Fare + Title + FsizeD, data = train, na.action = na.omit)

I am getting the following error when fitting the random forest:

Error in randomForest.default(m, y, ...) :
NA/NaN/Inf in foreign function call (arg 1)
In addition: Warning message:
In data.matrix(x) : NAs introduced by coercion

Can you please explain?
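
The warning "NAs introduced by coercion" may point to a non-numeric column (for example a character column) being coerced to numbers, although that cannot be confirmed without the data. A minimal sketch of one way to rule this out is to convert every character column to a factor before fitting. The toy data frame is hypothetical and only borrows column names from the call above.

```r
# Toy data standing in for titanicData (an assumption).
toy <- data.frame(
  Survived = c(0, 1, 1, 0),
  Sex = c("male", "female", "female", "male"),
  Age = c(22, 38, NA, 35),
  stringsAsFactors = FALSE
)

# Convert every character column to a factor in one pass.
is_chr <- vapply(toy, is.character, logical(1))
toy[is_chr] <- lapply(toy[is_chr], factor)

# The response must be a factor for classification.
toy$Survived <- factor(toy$Survived)

# Inspect the resulting column classes and the remaining NA counts.
print(vapply(toy, function(col) class(col)[1], character(1)))
print(colSums(is.na(toy)))
```

Checking the classes first also makes assignment mistakes visible, such as a conversion being written to the wrong column.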

@topepo
Owner

topepo commented Mar 8, 2018

Please open a new issue and follow the issue template.

@DhanaLakshmi15

str(data)
data$Selector<-as.factor(data$Selector)
table(data$Selector)
set.seed(123)
ind<-sample(2, nrow(data), replace=TRUE, prob=c(0.7,0.3))
train<-data[ind==1,]
test<-data[ind==2,]
library(randomForest)
set.seed(222)
rf<-randomForest(Selector~., data=train)

I am getting this error when I run the "rf" statement:
Error in randomForest.default(m, y, ...) : cannot handle categorical predictors with more than 53 categories.
Can you explain and fix the problem?

@topepo
Owner

topepo commented Mar 19, 2018

@DhanaLakshmi15

This is unrelated to caret. Also, I don't think that there is a fix but you can try ranger instead.
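
To find which predictors trigger the message, one option is to count the levels of each factor column. This is a minimal base R sketch on hypothetical toy data; the threshold of 53 is taken from the error message quoted above.

```r
# Toy data: one factor with 60 levels, one with 2 (hypothetical).
toy <- data.frame(
  id_like = factor(sprintf("id%02d", 1:60)),
  group = factor(rep(c("a", "b"), 30)),
  value = seq_len(60)
)

# Number of levels per column; non-factor columns report 0.
n_levels <- vapply(toy, nlevels, integer(1))

# 53 is the limit quoted in the error message.
too_many <- names(n_levels)[n_levels > 53]
print(too_many)
```

Columns flagged this way are often identifiers or free-text fields, which can be dropped or recoded before fitting.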
