Skip to content

added support for case_weights#72

Merged
simonpcouch merged 11 commits into
tidymodels:mainfrom
p-schaefer:main
May 8, 2024
Merged

added support for case_weights#72
simonpcouch merged 11 commits into
tidymodels:mainfrom
p-schaefer:main

Conversation

@p-schaefer
Copy link
Copy Markdown

I noticed there wasn't much progress on integrating case_weights into lightgbm through bonsai so I've taken a crack at adding it. This is in reference to 65.

Copy link
Copy Markdown
Contributor

@simonpcouch simonpcouch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A strong start! Some pointers to move this closer:

Comment thread R/lightgbm.R Outdated
#' @keywords internal
#' @export
train_lightgbm <- function(x, y, max_depth = -1, num_iterations = 100, learning_rate = 0.1,
train_lightgbm <- function(x, y,weights=NULL, max_depth = -1, num_iterations = 100, learning_rate = 0.1,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
train_lightgbm <- function(x, y,weights=NULL, max_depth = -1, num_iterations = 100, learning_rate = 0.1,
train_lightgbm <- function(x, y, weights = NULL, max_depth = -1, num_iterations = 100, learning_rate = 0.1,

Spaces after commas and before and after operators; could you apply this style elsewhere in the PR?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I've corrected these

Comment thread R/lightgbm.R
if (!is.null(val_index)) {
args$main$valids <-
list(validation =
lightgbm::lgb.Dataset(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a number of unrelated style edits in this PR like this one. Could you revert?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I've corrected these? Let me know if thats not the case.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost there. If you look in this diff, there are several spaces before the function call and arguments that didn't used to be there.

Comment thread tests/testthat/test-lightgbm.R Outdated
Comment thread tests/testthat/test-lightgbm.R Outdated
Patrick Schaefer added 4 commits December 15, 2023 11:57
…er vector specifying one or more variables in `x`used for grouping observations with the same value to either the training or validation set.
@p-schaefer
Copy link
Copy Markdown
Author

Thanks for the review @simonpcouch. Note, I am also working on a method that allows some ability to customize validation data beyond just specifying a proportion of training data. I find this quite necessary for my work (since my data are not entirely independent - hence needing case weights as well), but I don't know if its robust enough to include in Bonsai proper.

Comment thread R/lightgbm.R
# this function ensures that multiclass classification predictions are always
# returned as a [num_observations, num_classes] matrix, regardless of lightgbm version
reshape_lightgbm_multiclass_preds <- function(preds, num_rows) {
n_preds_per_case <- length(preds) / num_rows
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More space deletions here that are unrelated to the PR.

Comment thread R/lightgbm.R
p <- stats::predict(object$fit, prepare_df_lgbm(new_data), type = "raw", ...)
p <- stats::predict(object$fit, prepare_df_lgbm(new_data), type = "raw", ...)
} else {
p <- stats::predict(object$fit, prepare_df_lgbm(new_data), rawscore = TRUE, ...)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here as well.

Comment thread tests/testthat/_snaps/lightgbm.new.md Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file, lightgbm.new.md, is an intermediate file from testthat. To merge its changes into the existing test-lightgbm.R, use testthat::snapshot_accept().

@simonpcouch
Copy link
Copy Markdown
Contributor

Thanks again for your work here. Given the lack of movement on this PR, I'm going to go ahead and close. :)

@simonpcouch simonpcouch closed this Apr 7, 2024
@joranE
Copy link
Copy Markdown

joranE commented Apr 8, 2024

I have found a need for this feature recently, so I may try to resurrect this pull request and finish cleaning it up sometime this week. Anything I should know besides your comments above, @simonpcouch ?

@simonpcouch
Copy link
Copy Markdown
Contributor

That'd be wonderful, @joranE! Conceptually, the implementation here looks solid—following the current review comments would put this PR in a good spot.

@p-schaefer
Copy link
Copy Markdown
Author

p-schaefer commented Apr 8, 2024 via email

@joranE
Copy link
Copy Markdown

joranE commented Apr 8, 2024

@p-schaefer Sounds good, I'll let you tackle it! If you find yourself pulled away from it again for any reason, just let me know.

@p-schaefer
Copy link
Copy Markdown
Author

I've fixed the spacing issues, and dealt with the intermediate file from testthat. Should I submit a new PR?

@simonpcouch
Copy link
Copy Markdown
Contributor

You're welcome to push to the same branch! I'll reopen this one.

@simonpcouch simonpcouch reopened this Apr 9, 2024
@p-schaefer
Copy link
Copy Markdown
Author

Sorry, forgot to rerun the tests and checks. Looks good to me now, let me know if there is anything else.

@joranE
Copy link
Copy Markdown

joranE commented Apr 9, 2024

Thanks for closing the loop on this @p-schaefer! Now I just need to figure out if there's an easy way to tweak this interface to allow for the use of the linear_tree parameter for piecewise linear trees, which "naturally" needs to be passed to lgb.Dataset() as well, instead of lgb.train(). 🙄

@p-schaefer
Copy link
Copy Markdown
Author

No problem. It might be worth opening a new issue to discuss how to deal with linear_tree (if you want some help). I think we can probably come up with a suitably tidy workflow for dealing with arguments that should be passed to lgb.Dataset() vs lgb.train(). I've been thinking of playing around with linear_tree as well.

@joranE
Copy link
Copy Markdown

joranE commented Apr 9, 2024

I added a new issue here.

Comment thread tests/testthat/test-lightgbm.R Outdated
@simonpcouch
Copy link
Copy Markdown
Contributor

Test failures are unrelated, feel free to ignore. :)

@p-schaefer
Copy link
Copy Markdown
Author

@simonpcouch Was there anything else you need from me?

@simonpcouch
Copy link
Copy Markdown
Contributor

Nice work, @p-schaefer! :)

@simonpcouch simonpcouch merged commit f40d55a into tidymodels:main May 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants