[R-package] lgb.Dataset with free_raw_data = FALSE still raises an error #6008

Open
mocista opened this issue Jul 26, 2023 · 3 comments · May be fixed by #6844


mocista commented Jul 26, 2023

Description

I define an lgb.Dataset with free_raw_data = FALSE, slice it into two parts, and use them as the training and validation sets in lightgbm (R). The call to lgb.train(...) fails with the error:

please set ‘free_raw_data = FALSE’ when you construct lgb.Dataset

I don't understand why, since free_raw_data = FALSE is already set. Can anyone help, please?

Reproducible example

library(lightgbm)

boston = MASS::Boston
str(boston)
dim(boston)

set.seed(12)
boston_lgb_dataset = lgb.Dataset(scale(boston[, -14]), label = boston[, 14], free_raw_data = FALSE)

dtrain = lightgbm::slice(boston_lgb_dataset, 1:350)
dtest = lightgbm::slice(boston_lgb_dataset, 351:506)

params = list(
  objective = "regression"
  , metric = "l2"
  , min_data = 1L
  , learning_rate = .3
)

model = lgb.train(
  params = params
  , data = dtrain
  , nrounds = 20L
  , valids = list(test = dtest)
)

Environment info

R version 4.2.0

LightGBM version: 3.3.5

jameslamb changed the title from "lgb.Dataset with free_raw_data = FALSE still raises an error" to "[R-package] lgb.Dataset with free_raw_data = FALSE still raises an error" Aug 8, 2023
jameslamb (Collaborator) commented

Thanks for using LightGBM and for the clear writeup with a reproducible example!

I've edited the text a bit to make it clearer which parts are your own words, logs printed by LightGBM, and the code you ran. If you're unfamiliar with how to do that, please see https://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax.

I'll investigate this today.

jameslamb (Collaborator) commented

I ran this code on {lightgbm} v3.3.5 and v4.0.0 (built from source here, since that isn't on CRAN yet), and got the same error both times. Here is the fuller error message, in case it helps:

Error in valid_data$set_reference(data) :
set_reference: cannot set reference after freeing raw data,
please set ‘free_raw_data = FALSE’ when you construct lgb.Dataset

That looks like a bug to me. Maybe because the raw data isn't being passed through in Dataset$slice()?

slice = function(idxset) {
  return(
    Dataset$new(
      data = NULL

Sorry about that! Are you interested in trying to contribute a fix?

If not, someone here will pick it up when we have time.
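
Until a fix is merged, one possible workaround is to avoid slicing altogether: subset the raw matrix first, build the training Dataset from it, and build the validation Dataset with lgb.Dataset.create.valid() so that it references the training set. This is only a sketch and is not confirmed anywhere in this thread; the names X, y, train_idx, valid_idx and dvalid are illustrative, not from the original report.

```r
library(lightgbm)

boston <- MASS::Boston
X <- scale(as.matrix(boston[, -14]))  # features, scaled as in the original example
y <- boston[, 14]                     # label column (medv)

# Illustrative row split matching the one in the report
train_idx <- 1:350
valid_idx <- 351:506

# Build the training Dataset directly from the row subset instead of slicing
dtrain <- lgb.Dataset(X[train_idx, ], label = y[train_idx])

# Build the validation Dataset with dtrain as its reference,
# so it reuses the training set's feature binning
dvalid <- lgb.Dataset.create.valid(dtrain, X[valid_idx, ], label = y[valid_idx])

params <- list(
  objective = "regression"
  , metric = "l2"
  , min_data = 1L
  , learning_rate = 0.3
)

model <- lgb.train(
  params = params
  , data = dtrain
  , nrounds = 20L
  , valids = list(test = dvalid)
)
```

Because the validation set is created with the training set as its reference from the start, lgb.train() should not need to re-set the reference on it, which is the step that raises the error above.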

jameslamb added the bug label Aug 8, 2023

walkerjameschris commented Feb 28, 2025

I opened a PR for this as a possible fix. I also reproduced the problem using the lgb.slice.Dataset interface and got the same error.

library(lightgbm)

boston = MASS::Boston
str(boston)
dim(boston)

set.seed(12)
boston_lgb_dataset = lgb.Dataset(scale(boston[, -14]), label = boston[, 14], free_raw_data = FALSE)

dtrain = lgb.slice.Dataset(boston_lgb_dataset, 1:350)
dtest = lgb.slice.Dataset(boston_lgb_dataset, 351:506)

params = list(
  objective = "regression"
  , metric = "l2"
  , min_data = 1L
  , learning_rate = .3
)

model = lgb.train(
  params = params
  , data = dtrain
  , nrounds = 20L
  , valids = list(test = dtest)
)
