Error in na.fail.default and #461 fix #479

farbodr · 2016-08-28T13:01:58Z

I'm not sure whether this is a new issue. I am getting the error below, which appears to be related to the fix that was applied for #461, and I don't quite understand that fix.

Here is my call to train. The data set contains no NA:

fitWithXGBLinear <- train(target ~ .,
                          data = train,
                          method = "xgbLinear",
                          tuneLength = 3,
                          trControl = ctrl,
                          metric = "logLoss")

This produces the error:

Error in na.fail.default(list(target = c(10L, 8L, 8L, 12L, 1L, 1L, 4L,  : 
  missing values in object

As shown below, my data set contains no NA:

> lapply(train, function(x) any(is.na(x)))
$a
[1] FALSE

$target
[1] FALSE

$b
[1] FALSE

$c
[1] FALSE

The data set is from a current Kaggle competition. Here is its structure:

'data.frame':   51336 obs. of  4 variables:
 $ a     :Class 'integer64'  num [1:51336] 1.06e+05 -9.81e+71 3.76e+234 -4.97e-249 -3.98e+136 ...
 $ target: Factor w/ 12 levels "F23.","F24.26",..: 10 8 8 12 1 1 4 4 10 6 ...
 $ b     : chr  "小米" "TCL" "TCL" "小米" ...
 $ c     : chr  "MI 3" "么么哒" "么么哒" "MI 4" ...

If I add na.action = na.omit to the call to train, it seems to run. I've never had to do this before, so I'm not sure I understand the need.

fitWithXGBLinear <- train(target ~ .,
                          data = train,
                          method = "xgbLinear",
                          tuneLength = 3,
                          trControl = ctrl,
                          na.action = na.omit,
                          metric = "logLoss")


topepo · 2016-08-28T19:35:36Z

It is not a bug. Originally, train was set up to stop execution when the data contain missing values, unless you use na.action = na.pass or another option. #461 fixed the bug (in R) that erroneously made the default na.pass.

The issue here is that people had missing data and did not realize that it was being omitted. Even when they chose an imputation method, the missing data were excluded before train was able to pre-process.
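
For readers unfamiliar with the three options mentioned above, here is a minimal base-R sketch on made-up data. The `toy` data frame is hypothetical and not from this issue, and the sketch calls `model.frame` directly rather than `train`, on the assumption that the formula interface handles `na.action` in the same way.

```r
# Toy data with one missing predictor value (hypothetical, not from the issue)
toy <- data.frame(y = c(1, 2, 3, 4), x = c(10, NA, 30, 40))

# na.fail: stop as soon as any row is incomplete
failed <- tryCatch(
  {
    model.frame(y ~ x, data = toy, na.action = na.fail)
    FALSE
  },
  error = function(e) TRUE
)

# na.omit: drop the incomplete row without any message
omitted <- model.frame(y ~ x, data = toy, na.action = na.omit)

# na.pass: keep every row, including the one with NA
passed <- model.frame(y ~ x, data = toy, na.action = na.pass)

print(failed)
print(nrow(omitted))
print(nrow(passed))
```

The silent row drop under `na.omit` is the behaviour described above: rows disappear before any imputation step gets a chance to see them.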

farbodr · 2016-08-28T20:24:21Z

Thanks for the explanation. I am still not clear why I got the missing value error when there are no NAs in the data.

topepo · 2016-08-28T20:58:15Z

We would need the data to figure that out.
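
While the data are not available, one check that may help narrow this down is to compare the two notions of "missing" involved. `na.fail` relies on `complete.cases`, which inspects the stored values and ignores the column's class, whereas `is.na` can be overridden by a class-specific method. The sketch below uses a made-up class, `rawbits`, and a hypothetical helper, `compare_na_checks`, to show how the two can disagree. Whether this is what happened with the `integer64` column above is an assumption that the thread does not confirm.

```r
# Hypothetical helper: compare is.na() with complete.cases() column by column.
compare_na_checks <- function(df) {
  data.frame(
    column     = names(df),
    class      = vapply(df, function(x) class(x)[1], character(1)),
    any_is_na  = vapply(df, function(x) any(is.na(x)), logical(1)),
    incomplete = vapply(df, function(x) !all(complete.cases(x)), logical(1)),
    row.names  = NULL
  )
}

# Made-up class whose is.na() method never reports a missing value,
# even though one of the stored doubles is NaN.
is.na.rawbits <- function(x) rep(FALSE, length(x))

toy <- data.frame(y = 1:3)
toy$a <- structure(c(1, NaN, 3), class = "rawbits")

report <- compare_na_checks(toy)
print(report)

# na.fail() follows complete.cases(), so it stops on this data frame.
stopped <- tryCatch(
  {
    na.fail(toy)
    FALSE
  },
  error = function(e) TRUE
)
print(stopped)
```

A column where `any_is_na` is FALSE but `incomplete` is TRUE would be a candidate for the "missing values in object" error despite a clean `is.na` check.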

NBRAYKO · 2017-01-02T22:44:46Z

@farbodr I had the same issue and it annoyed the hell out of me. I'm still not sure what exactly causes it, but I found that I'd passed my object as a tibble, and it all works when I convert it to a data.frame. Weirdly, I only got the error when I'd set up multiple cores with doMC.

khari14 · 2018-03-08T12:00:29Z

# Convert the categorical columns to factors
titanicData$Pclass <- as.factor(titanicData$Pclass)
titanicData$Survived <- as.factor(titanicData$Survived)
titanicData$Title <- as.factor(titanicData$Title)
# Assign each conversion back to its own column rather than overwriting Pclass
titanicData$FsizeD <- as.factor(titanicData$FsizeD)
titanicData$Embarked <- as.factor(titanicData$Embarked)

index <- sample(1:nrow(titanicData), nrow(titanicData) * 0.75, replace = FALSE)
train <- titanicData[index, ]
test <- titanicData[-index, ]

library(randomForest)
rf_model <- randomForest(Survived ~ Pclass + Sex + Age + SibSp + Parch + Fare + Title + FsizeD, data = train, na.action = na.omit)

I am getting the following error when fitting the random forest:

Error in randomForest.default(m, y, ...) :
NA/NaN/Inf in foreign function call (arg 1)
In addition: Warning message:
In data.matrix(x) : NAs introduced by coercion

Can you please explain?
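
The thread does not diagnose this one. A plausible cause, stated here as an assumption, is that some predictor is still stored as character (for example `Sex`, which the code above never converts), so coercing it to numbers produces NAs. A minimal sketch for finding and converting such columns follows; the helper `chars_to_factors` and the `toy` data are hypothetical.

```r
# Hypothetical helper: turn every character column into a factor.
chars_to_factors <- function(df) {
  is_chr <- vapply(df, is.character, logical(1))
  df[is_chr] <- lapply(df[is_chr], factor)
  df
}

# Made-up stand-in for the Titanic data
toy <- data.frame(
  Survived = c(0, 1, 1, 0),
  Sex      = c("male", "female", "female", "male"),
  Age      = c(22, 38, 26, 35),
  stringsAsFactors = FALSE
)

# Columns that are still character and would need converting
chr_cols <- names(toy)[vapply(toy, is.character, logical(1))]
print(chr_cols)

fixed <- chars_to_factors(toy)
str(fixed)
```

Running the same check on the real training set before calling `randomForest` would show whether any character columns remain.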

topepo · 2018-03-08T20:18:11Z

Please open a new issue and follow the issue template.

DhanaLakshmi15 · 2018-03-19T09:19:06Z

str(data)
data$Selector<-as.factor(data$Selector)
table(data$Selector)
set.seed(123)
ind<-sample(2, nrow(data), replace=TRUE, prob=c(0.7,0.3))
train<-data[ind==1,]
test<-data[ind==2,]
library(randomForest)
set.seed(222)
rf<-randomForest(Selector~., data=train)

I am getting this error when I run the statement that creates rf:
Error in randomForest.default(m, y, ...) : cannot handle categorical predictors with more than 53 categories.
Can you explain and fix the problem?

topepo · 2018-03-19T16:47:21Z

@DhanaLakshmi15

This is unrelated to caret. Also, I don't think that there is a fix but you can try ranger instead.
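
To see which predictors trigger the message, one option is to count the distinct categories in each factor or character column. The sketch below is a minimal base-R example on made-up data. The helper `category_counts` is hypothetical, and the threshold of 53 is taken from the error message above.

```r
# Hypothetical helper: number of distinct categories per factor/character column.
category_counts <- function(df) {
  is_cat <- vapply(df, function(x) is.factor(x) || is.character(x), logical(1))
  vapply(df[is_cat], function(x) length(unique(x)), integer(1))
}

# Made-up data: an ID-like factor with 60 levels and a small factor with 3
set.seed(1)
toy <- data.frame(
  id    = factor(sprintf("id%03d", 1:60)),
  group = factor(rep(c("a", "b", "c"), 20)),
  value = rnorm(60)
)

counts <- category_counts(toy)
too_many <- counts[counts > 53]
print(counts)
print(too_many)
```

Columns flagged this way are often identifiers or free text, and dropping or recoding them is a modelling decision that depends on the data.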

topepo closed this as completed Aug 28, 2016
