Error calling gbm #145
Comments
Please post a minimal, reproducible example here and we'll take a look at it. |
Sure, here is an rda file with the objects issueDataframe and issueResponse. It has only 20 rows, but all 14 variables. To reproduce the error, run: BTW: with the latest github versions of caret and gbm, I had to define "gbm.fit = gbm". But it didn't change the error. |
It's much more helpful to have an example fully encapsulated in a runnable R script. Try your script with the iris dataset (3 classes) and see if it works. |
Ok, it throws the same error. The script is:
And the first warning message.
|
Can you send the results of
I'll look into this but I don't get an error with the iris data:
set.seed(1)
mod <- train(iris[,1:4], iris[,5], method="gbm", verbose = T)
Results:
|
I've just cleared my environment and it started to work for the iris dataset, will check if it works as part of my RMarkdown document (where there are many imports, scripts and definitions). |
It still fails with the data file I mentioned before. It's just a portable small file and can be loaded with load(), with variables properly coded. Could you please try it? |
I'm not going to Often the act of producing the minimal example will identify the source of the bug, but once you've created it, I'll be happy to help you solve the problem. |
BTW, the key element of a MRR is:
|
The "entire environment" is a 20x14 dataframe and a 20 length vector. I could upload a CSV if it makes you happy, but claiming that the package works only with "anything that's accessible by the standard R installation" doesn't fix the issue either. Also, I could use an R command to load that CSV from the internet so you can run it in the console, but that won't change the fact that the code fails for that. I repeat, there are NO other packages or variables loaded other than that small dataframe. |
Then please use |
Furthermore, it might be interesting to investigate how many columns you can remove from the dataframe before the bug goes away. |
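One way to approach the column-removal idea is a small greedy loop. The sketch below is only an illustration: `shrink_columns` and `fails` are hypothetical names, not part of caret, and the "bug" here is a toy predicate standing in for a real `train()` call wrapped in `try()`.

```r
# Hypothetical helper: greedily drop columns while `fails(df)` is still TRUE.
# `fails` is any function that returns TRUE when the bug reproduces.
shrink_columns <- function(df, fails) {
  for (col in names(df)) {
    candidate <- df[, setdiff(names(df), col), drop = FALSE]
    # Keep the smaller data frame only if the bug still shows up with it.
    if (ncol(candidate) > 0 && fails(candidate)) df <- candidate
  }
  df
}

# Toy stand-in for "the bug": it reproduces whenever column `b` is present.
toy <- data.frame(a = 1:3, b = 4:6, c = 7:9)
fails <- function(d) "b" %in% names(d)

culprit <- shrink_columns(toy, fails)
names(culprit)
```

In a real session, `fails` could instead wrap the failing call, for example by checking whether `try(train(d, y, method = "gbm"), silent = TRUE)` returns an object of class `"try-error"`.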
Here is the script, which loads the dataframe from a gist. I added more rows so that it doesn't fail because of not enough data. The problem seems to be related to libraries being imported before (if I remove them, it works).
Edit: removed unnecessary modules. |
Interesting! How many libraries can you remove before it starts working?
|
dplyr is causing the issue, even if I load plyr before as the documentation suggests |
Interesting! Can you send the output of sessionInfo()? There might be a bug between caret and dplyr.
|
|
All right, so the problem is that the gbm functions in caret don't work with plyr's
> class(issueDataframe)
[1] "tbl_df" "data.frame"
For now, coerce to a regular data.frame:
fitRF = train(as.data.frame(issueDataframe), issueResponse, method = "gbm")
We'll look into adding support for |
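As a minimal sketch of the coercion workaround described above: the helper name `as_plain_df` is hypothetical, and the `tbl_df` object is simulated with base R so the snippet runs without dplyr installed. The `train()` call is left as a comment because it needs caret and gbm.

```r
# Hypothetical helper (not part of caret): strip the tbl_df/tbl classes so
# that downstream code sees an ordinary data.frame.
as_plain_df <- function(x) {
  if (inherits(x, "tbl_df")) as.data.frame(x) else x
}

# Simulate a dplyr-style table by setting the class vector directly.
fake_tbl <- structure(iris[, 1:4], class = c("tbl_df", "tbl", "data.frame"))
class(fake_tbl)

plain <- as_plain_df(fake_tbl)
class(plain)

# With caret and gbm installed, the call from the thread would then be:
# fit <- caret::train(plain, iris[, 5], method = "gbm", verbose = FALSE)
```

Checking `class()` before and after the coercion is a quick way to confirm that the object passed to `train()` is a plain data.frame.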
It looks like other caret models work with |
In general it helps to do |
Thanks a lot! |
No problem! |
Just got a very similar problem with my own custom caret model based on |
Check the class of the object. If you are using |
Any plans to implement acceptance of data_frame & tbl_df objects? They are likely the future of R objects & are faster/cleaner to deal with. |
I faced the same problem when training with xgbTree too, not just gbm. |
It is a fairly complex issue. Most models end up using matrices no matter what you start with. I used to automatically do an The only way that I can see around this is to do something like
train.tbl_df <- function(x, y, ...) train.default(as.data.frame(x), y, ...)
but that is really no different than you using
train.default(as.data.frame(x), y, ...)
yourself (as @zachmayer suggests). In the near future, I strongly believe that there will be modeling workflows that can use other data types directly =] |
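The dispatch pattern in that comment can be tried out without caret. In the sketch below, the generic `fit` and its methods are hypothetical stand-ins for `train`, `train.default`, and the proposed `train.tbl_df`; it only illustrates how an S3 method for `tbl_df` can coerce and then delegate.

```r
# Toy generic standing in for caret's train(); all names are hypothetical.
fit <- function(x, y, ...) UseMethod("fit")

# Default method: report what the "model" actually received.
fit.default <- function(x, y, ...) {
  list(x_class = class(x), n = nrow(x))
}

# Method for dplyr tables: coerce to data.frame, then delegate.
fit.tbl_df <- function(x, y, ...) fit.default(as.data.frame(x), y, ...)

# Simulate a tbl_df with base R so dplyr is not required.
fake_tbl <- structure(iris[, 1:4], class = c("tbl_df", "tbl", "data.frame"))

res <- fit(fake_tbl, iris[, 5])
res$x_class
```

Because the method only wraps the coercion, it behaves the same as calling the default method on `as.data.frame(x)` directly, which is the point made in the comment.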
I faced this issue today while fitting a glm model. However, after I cleared the workspace I didn't encounter this error. All my data were in a data frame (not a tbl_df). This may help to narrow down the bug. 1st run: Error in requireNamespaceQuietStop("pROC") : package pROC is required " Fold01: parameter=none At this moment I cleared the workspace but did not start a new session. It may have something to do with pROC, but I am not sure. Let me know if you need any more detail. |
Without a reproducible example and the results of If you add these, please start another issue. |
I've asked this question on Stack Overflow, but it may also be a bug.
http://stackoverflow.com/questions/29802216/caret-error-using-gbm-but-not-without-caret
I could run more tests or provide more information if it's helpful to fix this issue.