Shrinkage different on first tree between default and custom objective #6853
Thanks for using LightGBM and for the excellent investigation! First, I just want to mention... when you provide links to other code on GitHub, it's really helpful to use bare links that aren't wrapped in a label like "here". GitHub will then render them inline, like this:

LightGBM/src/boosting/gbdt.cpp Lines 416 to 418 in f7c641d
I'll share my understanding from reading this code (thanks for the links!). I believe it's intentional.
LightGBM/include/LightGBM/tree.h Lines 228 to 229 in 6437645

In `GBDT::TrainOneIter()`, `init_scores` is a vector with one entry per tree per iteration:

LightGBM/src/boosting/gbdt.cpp Line 346 in f7c641d

For regression, that means it has a single value. It'd only have more than one value for multi-class classification (where LightGBM grows one tree per class per iteration). This part you're pointing to:

LightGBM/src/boosting/gbdt.cpp Lines 347 to 354 in f7c641d

does not mean "must be using the default (built-in) objective". It means "we haven't yet done any boosting". That's the point where the "boost from average" initial score is computed:

LightGBM/src/boosting/gbdt.cpp Lines 322 to 323 in f7c641d

That will be set to 0.0 unless ALL of the following are true (roughly, as I read `BoostFromAverage()`):

- there are not yet any trees on the Booster
- the training Dataset doesn't already carry an `init_score`
- a built-in objective is being used (no gradients/hessians passed in)
- `boost_from_average=true` (the default)
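As a self-contained sketch of that gating (my paraphrase of `GBDT::BoostFromAverage()`; the function and argument names here are mine, not LightGBM's):

```python
# Rough Python paraphrase of the gating described above; the real logic is
# C++ in src/boosting/gbdt.cpp, so treat this as a sketch, not the source.
def first_tree_init_score(labels, n_existing_trees, has_dataset_init_score,
                          built_in_objective, boost_from_average):
    """Return the bias used for the current tree: 0.0 unless ALL conditions hold."""
    if (
        n_existing_trees == 0            # no boosting has happened yet
        and not has_dataset_init_score   # Dataset carries no user init_score
        and built_in_objective           # gradients/hessians were not passed in
        and boost_from_average           # parameter left at its default (True)
    ):
        return sum(labels) / len(labels)  # e.g. the label mean, for regression
    return 0.0
```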
That "there are not yet any trees on the Booster" is the most important part for your question... As it says at https://lightgbm.readthedocs.io/en/latest/Parameters.html#boost_from_average, boosting from the average for the first tree helps the model converge faster. And so for the first tree, with a built-in objective the leaf value will be like:
So the shrinkage is still applied based on whatever you passed via params, but THEN the bias is added... and, as far as I can tell, adding that bias is also what resets the shrinkage recorded on the tree to 1.

Thanks very much for putting in the effort to create a reproducible example. Here's one using LightGBM 4.6.0. Notice a few relevant changes compared to the one posted above.
```python
import numpy as np
import lightgbm as lgb
from sklearn.datasets import make_regression


def custom_mse_objective(preds, train_data):
    labels = train_data.get_label()
    residual = preds - labels
    grad = residual
    hess = np.ones_like(labels)
    return grad, hess


def _summarize(booster):
    model_json = booster.dump_model()
    print(f"shrinkage (tree=0): {model_json['tree_info'][0]['shrinkage']}")
    print(f"shrinkage (tree=1): {model_json['tree_info'][1]['shrinkage']}")
    print("--- first 2 trees ---")
    df = booster.trees_to_dataframe()
    # just first 2 trees
    df = df[df["tree_index"].isin([0, 1])]
    # only leaf nodes
    df = df[df["left_child"].isna()]
    cols_to_keep = ["tree_index", "value", "weight", "count"]
    print(df[cols_to_keep])


# create Dataset
X, y = make_regression(n_samples=1_000, n_features=5, n_informative=5, random_state=312)

# LightGBM uses float32 for label data
label_mean = np.mean(y.astype(np.float32))
print(label_mean)
# -1.3294673

params = {
    'num_leaves': 3,
    'max_depth': 3,
    'learning_rate': 0.15,
    'verbose': -1,
    'seed': 708,
    'deterministic': True,
    'n_estimators': 5,
}

# case 1: custom objective
bst1 = lgb.train(
    params={**params, "objective": custom_mse_objective},
    train_set=lgb.Dataset(X, label=y)
)
_summarize(bst1)
# shrinkage (tree=0): 0.15
# shrinkage (tree=1): 0.15
# --- first 2 trees ---
#    tree_index      value  weight  count
# 2           0 -17.444587     268    268
# 3           0   0.497709     302    302
# 4           0  10.059119     430    430
# 7           1 -19.598573     174    174
# 8           1  -2.385923     371    371
# 9           1   9.067741     455    455

# case 2: built-in objective, with boost_from_average=False
bst2 = lgb.train(
    params={**params, "objective": "regression", "boost_from_average": False},
    train_set=lgb.Dataset(X, label=y)
)
_summarize(bst2)
# shrinkage (tree=0): 0.15
# shrinkage (tree=1): 0.15
# --- first 2 trees ---
#    tree_index      value  weight  count
# 2           0 -17.444587     268    268
# 3           0   0.497709     302    302
# 4           0  10.059119     430    430
# 7           1 -19.598573     174    174
# 8           1  -2.385923     371    371
# 9           1   9.067741     455    455

# case 3: built-in objective, boost_from_average=False, learning_rate=1.0 (to observe the raw values)
bst3 = lgb.train(
    params={**params, "objective": "regression", "boost_from_average": False, "learning_rate": 1.0},
    train_set=lgb.Dataset(X, label=y)
)
_summarize(bst3)
# shrinkage (tree=0): 1
# shrinkage (tree=1): 1
# --- first 2 trees ---
#    tree_index       value  weight  count
# 2           0 -116.297247     268    268
# 3           0    3.318062     302    302
# 4           0   67.060792     430    430
# 7           1 -108.530284     136    136
# 8           1  -27.076909     425    425
# 9           1   59.835546     439    439

# case 4: built-in objective, with boost_from_average=True (the default)
bst4 = lgb.train(
    params={**params, "objective": "regression"},
    train_set=lgb.Dataset(X, label=y)
)
_summarize(bst4)
# shrinkage (tree=0): 1
# shrinkage (tree=1): 0.15
# --- first 2 trees ---
#    tree_index      value  weight  count
# 2           0 -18.574634     268    268
# 3           0  -0.632337     302    302
# 4           0   8.929072     430    430
# 7           1 -19.429066     174    174
# 8           1  -2.216416     371    371
# 9           1   9.237247     455    455
```

First, notice that the first 2 cases produce identical models:
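One way to check that, reusing `bst1` and `bst2` from the script above:

```python
# the tree structures (thresholds, leaf values, shrinkage) should match exactly
print(bst1.dump_model()["tree_info"] == bst2.dump_model()["tree_info"])
# expect True, based on the identical outputs above
```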
Next, look at the leaf values. First, notice that the value from the first tree in case 3 (trained with learning_rate=1.0), multiplied by the learning rate 0.15, reproduces the leaf values in the first two cases (e.g. -116.297247 * 0.15 = -17.444587).
And then that the ultimate leaf value in case 4 is pretty close to that + the mean of the target.
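Checking that arithmetic with the numbers printed above:

```python
learning_rate = 0.15
raw_leaf = -116.297247    # first leaf of tree 0 in case 3 (learning_rate=1.0)
label_mean = -1.3294673   # float32 mean of the target, printed earlier

print(raw_leaf * learning_rate)               # -17.44458705 -> matches cases 1 and 2
print(raw_leaf * learning_rate + label_mean)  # -18.77405435 -> vs. -18.574634 in case 4
```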
I'm guessing there's some small numeric precision issue that's resulting in that last number not quite matching (or maybe I've made some mistake that @jmoralez or @shiyu1994 could correct).
Version
lightgbm==4.5.0
Install
Question
I found that if we use a custom objective, the shrinkage of the first tree is equal to the learning rate, and if we use a default objective like MSE, the shrinkage of the first tree is equal to 1.
Test code is like this:
outputs:
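A minimal example consistent with that description (my own sketch rather than the original snippet; the data and parameters are arbitrary):

```python
import numpy as np
import lightgbm as lgb
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=100, n_features=5, random_state=0)


def custom_mse(preds, train_data):
    # gradient and hessian of 0.5 * (preds - labels)^2
    residual = preds - train_data.get_label()
    return residual, np.ones_like(residual)


for objective in ["regression", custom_mse]:
    bst = lgb.train(
        params={"objective": objective, "learning_rate": 0.15, "verbose": -1},
        train_set=lgb.Dataset(X, label=y),
        num_boost_round=2,
    )
    print(bst.dump_model()["tree_info"][0]["shrinkage"])
# per the description above: 1 for "regression", 0.15 for custom_mse
```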
As the boosting code in bool GBDT::TrainOneIter shows,
it only sets init_scores when gradients and hessians are both nullptr, which means the default objective must be in use.
And as the shrinkage code shows,
only when fabs(init_scores) > 0 does it set new_tree.shrinkage = 1.
Thus, if we use a custom objective, the shrinkage of the first tree will be the learning rate, and if we use a default objective like MSE, the shrinkage of the first tree will be 1. It's not clear to me whether this is a deliberate design or a bug. If it's by design, can anyone explain it to me? Thanks a lot!