add checkCondtionalX to preProc of train()? #334
Comments
Thanks a lot! If this is added to preProc, I hope that predictors with an empty conditional distribution will be automatically removed from the training data. May I get a note when this is implemented?
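For reference, caret already exports a helper that identifies such predictors, which a user can call directly before modeling. A minimal sketch (the toy data frame, variable names, and the constant-within-class predictor are made up for illustration; `checkConditionalX()` is assumed to return the column indices of the offending predictors):

```r
library(caret)

set.seed(1)
# Toy data: predictor "bad" is constant within class B, so its
# class-conditional distribution is degenerate ("empty") for B.
x <- data.frame(good = rnorm(20),
                bad  = c(rnorm(10), rep(0, 10)))
y <- factor(rep(c("A", "B"), each = 10))

# Indices of predictors with problematic conditional distributions
drop_cols <- checkConditionalX(x, y)

# Drop them before fitting a model such as klaR::NaiveBayes()
x_clean <- if (length(drop_cols) > 0) x[, -drop_cols, drop = FALSE] else x
```

This is the same check that the proposed `preProc = "conditionalX"` option would run automatically inside `train()`.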
I'll update this thread as it happens.
That should do it. If you could do some testing, that would be a big help.
Thanks for the quick response, and sorry for the slow testing! I'm not familiar with GitHub, and it also took me hours to install devtools on my CentOS machine because of missing dependencies. Anyway, I finally installed it and verified that the caret version is caret_6.0-63. But here is my question: is the warning message below from nb due to the empty conditional distribution of a predictor? If it is, adding
Here is my code:

```r
rm(list = ls())
library(caret)
library(klaR)

data(mdrr)
set.seed(107)
preds1 <- mdrrDescr[1:50, 1:10]
resps1 <- mdrrClass[1:50]
table(resps1)
data1 <- cbind(preds1, Class = resps1)

inTrain <- createDataPartition(y = data1$Class, p = .75,
                               list = FALSE)
str(inTrain)
training <- data1[inTrain, ]
testing  <- data1[-inTrain, ]

ctrl <- trainControl(method = "repeatedcv",
                     repeats = 3,
                     number = 10,
                     classProbs = TRUE,
                     summaryFunction = twoClassSummary)

nbFit <- train(Class ~ .,
               data = training,
               method = "nb",
               tuneLength = 15,
               trControl = ctrl,
               metric = "ROC",
               preProc = c("center", "scale", "conditionalX"))

nbFit2 <- NaiveBayes(Class ~ ., data = training[-(20:25), ])
```
That is a warning generated by the underlying naive Bayes code. Basically, it wants you to know that, for one sample, all of the elements that make up the posterior probability (besides the prior) are zero.
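The arithmetic behind that warning can be sketched directly: the unnormalized naive Bayes posterior for a class is the prior times the product of the per-predictor class-conditional densities, so a single zero density zeroes out every element except the prior. A minimal illustration with made-up N(0, 1) densities (the values and class names here are hypothetical, not from the mdrr example):

```r
# Unnormalized naive Bayes posterior for one sample:
#   prior * product of per-predictor class-conditional densities
prior  <- c(A = 0.5, B = 0.5)

# The second predictor's value lies so far out that its density underflows to 0
dens_A <- c(dnorm(0.2), dnorm(1e10))
dens_B <- c(dnorm(0.1), dnorm(1e10))

post_A <- unname(prior["A"] * prod(dens_A))
post_B <- unname(prior["B"] * prod(dens_B))
# Both posterior elements are zero, so the probabilities cannot be
# normalized -- which is exactly what the warning is telling you.
```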
but should
No, I don't think that it is related to this. These predictors are all continuous (some with fairly degenerate distributions), and the density is zero where one of the holdout samples falls.
Thanks for the quick response! I am not very good at statistics, so please understand if my questions are naive.
The warning most likely occurs when samples are being predicted (perhaps the holdouts during resampling). There can be a zero value from a perfectly good density: if your data are X ~ N(0, 1), the density at X = 10^10 would be zero since it is a highly unlikely value. During resampling, the pre-processing statistics are computed on the same data being used to fit the model, and they are applied to that data as well as the holdout. After resampling, the statistics are computed on the entire training set and applied to any data that are subsequently predicted. I'm going to close this since it is no longer about this particular bug.
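The N(0, 1) point above can be checked directly in R: the normal density is positive everywhere in exact arithmetic, but it underflows to zero in double precision far from the mean.

```r
dnorm(0)     # 0.3989423 -- the N(0,1) mode
dnorm(10)    # tiny (~7.7e-23) but still strictly positive
dnorm(1e10)  # exactly 0 in floating point, despite the density being
             # well-defined everywhere
```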
It will reduce the chance of naive Bayes crashing. Thanks a lot!