xgboost #43
@Athospd - Thank you so much for this PR!! I'll review it soon.
Two more things. I just made a test with a real model from my work and it took hours. Maybe it's worth investing a little bit in efficiency.
package ‘xgboost’ version 0.82.1 (latest from CRAN)
Hi @Athospd , I'm getting a test failure when I run your code locally. Do R checks fail on your side?
I think the issue is the changes made to the
R/model-xgboost.R
Outdated
dplyr::mutate_at(dplyr::vars(Yes, No, Missing), ~stringr::str_replace(., "^.*-", "")) %>%
dplyr::mutate_at(dplyr::vars(Yes, No, Missing), ~as.integer(.) + 1) %>% # xgboost is 0-indexed
dplyr::group_by(Tree) %>%
tidyr::nest(.key = "tree_data")
Would it be possible to avoid dependencies on tidyr please? I'm also trying to reduce dependencies on dplyr, but if we can't avoid those, we'll need to add the @importFrom for them in the R/tidypredict.R
R/model-xgboost.R
Outdated
}

get_xgb_trees.character <- function(xgb_dump_text_with_stats, feature_names) {
  feature_names_tbl <- tibble::enframe(feature_names, "Feature", "feature_name") %>% dplyr::mutate(Feature = as.character(Feature - 1))
Can we avoid dependencies on tibble please?
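For reference, a minimal base R sketch of the same lookup table, assuming `feature_names` is an unnamed character vector. The feature names below are hypothetical, and this is not necessarily what the PR ended up using:

```r
# Hypothetical feature names, for illustration only
feature_names <- c("mpg", "cyl", "disp")

# Base R stand-in for tibble::enframe() plus the index shift:
# position i in feature_names maps to xgboost's 0-based feature id i - 1
feature_names_tbl <- data.frame(
  Feature = as.character(seq_along(feature_names) - 1),
  feature_name = feature_names,
  stringsAsFactors = FALSE
)
```

The resulting data frame can be joined on `Feature` with `merge()` or `dplyr::left_join()`, whichever dependency is kept.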
library(magrittr)
library(RSQLite)
library(DBI)
I believe that these library calls can be moved to tests/testthat.R
R/model-xgboost.R
Outdated

trees <- xgb.model.dt.tree(text = xgb_dump_text_with_stats) %>%
  dplyr::left_join(feature_names_tbl, by = "Feature") %>%
  dplyr::mutate_at(dplyr::vars(Yes, No, Missing), ~stringr::str_replace(., "^.*-", "")) %>%
Can we remove the stringr dependency please?
I made a "base" version of get_xgb_trees.character() at this commit db1a933
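As an illustration of the kind of change involved (a sketch only, not the contents of that commit), the `stringr::str_replace()` and `mutate_at()` steps can be expressed with `sub()` and `lapply()`. The toy `trees` data frame below is made up, and it assumes the Yes/No/Missing columns hold IDs of the form "tree-node", with NA on leaf rows:

```r
# Toy stand-in for the tree table; values are hypothetical
trees <- data.frame(
  Yes = c("0-1", NA, NA),
  No = c("0-2", NA, NA),
  Missing = c("0-1", NA, NA),
  stringsAsFactors = FALSE
)

node_cols <- c("Yes", "No", "Missing")

# Drop everything up to the last "-" and shift to 1-based row indices,
# since xgboost node ids are 0-indexed; NA entries stay NA
trees[node_cols] <- lapply(
  trees[node_cols],
  function(x) as.integer(sub("^.*-", "", x)) + 1L
)
```

`sub()` uses the same greedy regular expression as the original `str_replace()` call, so the two should agree on these inputs.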
For some reason, my Review comments were lost. Just wanted to say thank you for this PR. And that my comments were mainly centered around avoiding the use of
Thanks, Edgar, contributing to this pkg is an honor for me.
That's just awesome!! I'm glad that you are able to take advantage of this package. At this point the Travis CI checks are still failing, but it looks like that is because of a missing package dependency. It looks like we're almost there. So would you mind also adding the NEWS item please? Please make sure to reference your GH user and the Issue ID that this PR covers at the end and in parentheses, like this:
I don't know if I'm missing something, but I was only able to make the tests run successfully after putting the 'library' calls back inside the test_xgboost.R file. I'd put them inside testthat.R as we'd agreed. Maybe that explains why the Travis CI checks were failing.
Are you getting any Warnings? I think the current failure is because Travis is set to treat Warnings as Errors
Ok, I just ran
tests/testthat.R
Outdated
library(xgboost)
library(purrr)
library(dplyr)
library(magrittr)
I think we need to remove any references to magrittr completely, unless you are using a function that dplyr does not import from that package. I just reviewed the Travis logs and it seems that the warning is generated by not having this package listed as a dependency in the DESCRIPTION file:
* checking for unstated dependencies in ‘tests’ ... WARNING
'library' or 'require' call not declared from: ‘magrittr’
* checking tests ...
Running ‘testthat.R’ [12s/12s]
OK
* checking PDF version of manual ... OK
* DONE
Status: 1 WARNING
Sorry, this change was hung on my local branch.
Outstanding work @Athospd ! Thank you again.
Overall
for object of class xgb.Booster
tidypredict_fit()
tidypredict_sql()
tidypredict_to_column()
parse_model()
build_fit_formula_xgb()
It is based on `xgb.dump(dump_format = "text")` output. I tried to keep the code structure as similar to the randomForest code as I could.
todo
Comments
xgb.train() that do not store objective when custom function is passed by user.
tidypredict_fit() returns a call instead of a list (like randomForest). This way it is possible to make tidypredict_to_column() work despite it being a tree based model.
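To illustrate why a single call is convenient here, a minimal sketch in base R. The expression below is a hypothetical one-split tree written by hand, not the formula this PR actually generates:

```r
# Hypothetical fitted formula for a one-split tree, stored as an unevaluated call
fit_call <- quote(ifelse(x < 1.5, 0.4, -0.4))

# Toy data, for illustration only
df <- data.frame(x = c(1, 2, 3))

# One call evaluates directly against the data, yielding one new column
df$fit <- eval(fit_call, df)
```

A list of per-tree expressions would first have to be combined into one expression before it could be added as a column in this way.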