
Shrinkage different on first tree between default and custom objective #6853

Shanxce opened this issue Mar 4, 2025 · 2 comments

Shanxce commented Mar 4, 2025

Version

lightgbm==4.5.0

Install

pip install lightgbm

Question

I found that if we use a custom objective, the shrinkage of the first tree is equal to the learning rate, while if we use a default objective like MSE, the shrinkage of the first tree is equal to 1.

Test code is like this:

import numpy as np
import lightgbm as lgb

X = np.random.randn(10000, 5)
y = np.random.randn(10000)

def custom_mse_objective(preds, train_data):
    labels = train_data.get_label()
    residual = preds - labels
    grad = residual
    hess = np.ones_like(labels)
    return grad, hess

params = {
    'num_leaves': 3,
    'max_depth': 3,
    'learning_rate': 0.15,
    'verbose': -1, 
    'seed': 1000,
    'linear_tree': True,
    'deterministic': True,
    'force_row_wise': True,
    'n_estimators': 100,
}


print("Training with default MSE objective...")
train_data = lgb.Dataset(X, label=y)
model_default = lgb.train(params, train_data)

print("\nTraining with custom MSE objective...")
params['objective'] = custom_mse_objective
train_data = lgb.Dataset(X, label=y)
model_custom = lgb.train(params, train_data)

print(model_default.dump_model()['tree_info'][0]['shrinkage'])
print(model_custom.dump_model()['tree_info'][0]['shrinkage'])

outputs:

Training with default MSE objective...

Training with custom MSE objective...
1
0.15

As the boosting code in bool GBDT::TrainOneIter shows:

if (gradients == nullptr || hessians == nullptr) {
  for (int cur_tree_id = 0; cur_tree_id < num_tree_per_iteration_; ++cur_tree_id) {
    init_scores[cur_tree_id] = BoostFromAverage(cur_tree_id, true);
  }
  ...
} else {
  ...
}

it only sets init_scores when gradients and hessians are both nullptr, which means the default objective must be in use.

And as the shrinkage code shows:

if (std::fabs(init_scores[cur_tree_id]) > kEpsilon) {
    new_tree->AddBias(init_scores[cur_tree_id]);
}

only when std::fabs(init_scores[cur_tree_id]) > kEpsilon does it call AddBias(), which sets new_tree's shrinkage to 1.

Thus, if we use a custom objective, the shrinkage of the first tree will be the learning rate, and if we use a default objective like MSE, the shrinkage of the first tree will be 1. It's not clear to me whether this is a deliberate design or a bug; if it's by design, could anyone explain it to me? Thanks a lot!

@jameslamb (Collaborator) commented

Thanks for using LightGBM, and for the excellent investigation!

First, I just want to mention... when you provide links to other code on GitHub, it's really helpful to use bare links that aren't wrapped in a label like "here". That way, GitHub will render them inline, like this:

if (std::fabs(init_scores[cur_tree_id]) > kEpsilon) {
    new_tree->AddBias(init_scores[cur_tree_id]);
}

@jameslamb (Collaborator) commented

It's not clear to me whether this is a deliberate design or a bug; if it's by design, could anyone explain it to me? Thanks a lot!

I'll share my understanding from reading this code (thanks for the links!).

I believe it's intentional.

AddBias() directly sets the tree's shrinkage_ back to 1.0:

// force to 1.0
shrinkage_ = 1.0f;

init_scores there in GBDT::TrainOneIter() is not like the init_score you might set on the Dataset. It's a vector of doubles with num_tree_per_iteration_ elements.

std::vector<double> init_scores(num_tree_per_iteration_, 0.0);

For regression, that means it has a single value. It'd only have more than one value for multi-class classification (where num_tree_per_iteration_ is equal to the number of classes in the target).
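
To make the multi-class case concrete, here's a minimal sketch (separate from the issue's example; the dataset, names, and parameters are illustrative): with num_class=3, each boosting round adds 3 trees, one per class.

import lightgbm as lgb
from sklearn.datasets import make_classification

# illustrative 3-class dataset
X_mc, y_mc = make_classification(
    n_samples=500, n_features=6, n_informative=4, n_classes=3, random_state=0
)
bst_mc = lgb.train(
    {"objective": "multiclass", "num_class": 3, "verbose": -1},
    lgb.Dataset(X_mc, label=y_mc),
    num_boost_round=4,
)
# 4 boosting rounds x 3 classes = 12 trees in total
print(bst_mc.num_trees())  # 12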

This part you're pointing to:

// boosting first
if (gradients == nullptr || hessians == nullptr) {
  for (int cur_tree_id = 0; cur_tree_id < num_tree_per_iteration_; ++cur_tree_id) {
    init_scores[cur_tree_id] = BoostFromAverage(cur_tree_id, true);
  }
  Boosting();
  gradients = gradients_pointer_;
  hessians = hessians_pointer_;

does not mean "must be using the default (built-in) objective". It means "we haven't yet done any boosting".

That's the point where init_scores is initialized, and it's guarded by this check in BoostFromAverage():

if (models_.empty() && !train_score_updater_->has_init_score() && objective_function_ != nullptr) {
  if (config_->boost_from_average || (train_data_ != nullptr && train_data_->num_features() == 0)) {

That will be set to 0.0 unless ALL of the following are true:

  • there are not yet any trees on the Booster
  • a custom init_score array has not been provided in the training Dataset
  • using a LightGBM built-in objective (not a custom function)
  • parameter boost_from_average is set to true (the default) OR there are 0 features in the input data

That "there are not yet any trees on the Booster" is the most important part for your question... std::fabs(init_scores[cur_tree_id]) > kEpsilon will only be true for the first tree added to the model.

As it says at https://lightgbm.readthedocs.io/en/latest/Parameters.html#boost_from_average, boosting from the average for the first tree helps the model converge faster.

And so for the first tree, with a built-in objective the leaf value will be like:

{leaf_value} * {shrinkage_rate} + {init_score}

So the shrinkage is still applied based on whatever you passed via params, but THEN the bias is added... shrinkage_rate: 1.0 is set afterwards to ensure that the bias isn't also scaled by the learning rate when scores are computed.
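
One way to convince yourself of that: because AddBias() folds the bias into the stored leaf values and resets shrinkage_ to 1.0, raw predictions from a single-tree model should match the leaf values reported by trees_to_dataframe() exactly. A hedged, self-contained sketch (it assumes trees_to_dataframe() names leaf nodes like "0-L0", which is my reading of the Python API):

import numpy as np
import lightgbm as lgb
from sklearn.datasets import make_regression

X_chk, y_chk = make_regression(n_samples=1_000, n_features=5, random_state=0)
bst_chk = lgb.train(
    {"objective": "regression", "learning_rate": 0.15, "verbose": -1},
    lgb.Dataset(X_chk, label=y_chk),
    num_boost_round=1,
)
df_chk = bst_chk.trees_to_dataframe()
leaves = df_chk[df_chk["left_child"].isna()]
leaf_value = dict(zip(leaves["node_index"], leaves["value"]))
# map each sample to the stored value of the leaf it lands in
leaf_idx = bst_chk.predict(X_chk, pred_leaf=True).ravel()
mapped = np.array([leaf_value[f"0-L{i}"] for i in leaf_idx])
# stored leaf values already include both the shrinkage and the bias
print(np.allclose(bst_chk.predict(X_chk), mapped))  # expect True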

Thanks very much for putting in the effort to create a reproducible example. Here's one using LightGBM 4.6.0. Notice a few relevant changes compared to the one posted above.

  • uses data where the features are related to the target (instead of purely random)
  • removes "linear_tree": true (this behavior is not specific to linear trees)
  • reduces n_estimators to 5 (we are only looking at the first few trees here)
import numpy as np
import lightgbm as lgb
from sklearn.datasets import make_regression

def custom_mse_objective(preds, train_data):
    labels = train_data.get_label()
    residual = preds - labels
    grad = residual
    hess = np.ones_like(labels)
    return grad, hess

def _summarize(booster):
    model_json = booster.dump_model()
    print(f"shrinkage (tree=0): {model_json['tree_info'][0]['shrinkage']}")
    print(f"shrinkage (tree=1): {model_json['tree_info'][1]['shrinkage']}")
    print("--- first 2 trees ---")
    df = booster.trees_to_dataframe()
    # just first 2 trees
    df = df[df["tree_index"].isin([0, 1])]
    # only leaf nodes
    df = df[df["left_child"].isna()]
    cols_to_keep = ["tree_index", "value", "weight", "count"]
    print(df[cols_to_keep])

# create Dataset
X, y = make_regression(n_samples=1_000, n_features=5, n_informative=5, random_state=312)

# LightGBM uses float32 for label data
label_mean = np.mean(y.astype(np.float32))
print(label_mean)
# -1.3294673

params = {
    'num_leaves': 3,
    'max_depth': 3,
    'learning_rate': 0.15,
    'verbose': -1, 
    'seed': 708,
    'deterministic': True,
    'n_estimators': 5,
}

# case 1: custom objective
bst1 = lgb.train(
    params={**params, "objective": custom_mse_objective},
    train_set=lgb.Dataset(X, label=y)
)
_summarize(bst1)
# shrinkage (tree=0): 0.15
# shrinkage (tree=1): 0.15
# --- first 2 trees ---
#    tree_index      value  weight  count
# 2           0 -17.444587     268    268
# 3           0   0.497709     302    302
# 4           0  10.059119     430    430
# 7           1 -19.598573     174    174
# 8           1  -2.385923     371    371
# 9           1   9.067741     455    455

# case 2: built-in objective, with boost_from_average=False
bst2 = lgb.train(
    params={**params, "objective": "regression", "boost_from_average": False},
    train_set=lgb.Dataset(X, label=y)
)
_summarize(bst2)
# shrinkage (tree=0): 0.15
# shrinkage (tree=1): 0.15
# --- first 2 trees ---
#    tree_index      value  weight  count
# 2           0 -17.444587     268    268
# 3           0   0.497709     302    302
# 4           0  10.059119     430    430
# 7           1 -19.598573     174    174
# 8           1  -2.385923     371    371
# 9           1   9.067741     455    455

# case 3: built-in objective, boost_from_average=False, learning_rate=1.0 (to observe the raw values)
bst3 = lgb.train(
    params={**params, "objective": "regression", "boost_from_average": False, "learning_rate": 1.0},
    train_set=lgb.Dataset(X, label=y)
)
_summarize(bst3)
# shrinkage (tree=0): 1
# shrinkage (tree=1): 1
# --- first 2 trees ---
#    tree_index       value  weight  count
# 2           0 -116.297247     268    268
# 3           0    3.318062     302    302
# 4           0   67.060792     430    430
# 7           1 -108.530284     136    136
# 8           1  -27.076909     425    425
# 9           1   59.835546     439    439

# case 4: built-in objective, with boost_from_average=True (the default)
bst4 = lgb.train(
    params={**params, "objective": "regression"},
    train_set=lgb.Dataset(X, label=y)
)
_summarize(bst4)
# shrinkage (tree=0): 1
# shrinkage (tree=1): 0.15
# --- first 2 trees ---
#    tree_index      value  weight  count
# 2           0 -18.574634     268    268
# 3           0  -0.632337     302    302
# 4           0   8.929072     430    430
# 7           1 -19.429066     174    174
# 8           1  -2.216416     371    371
# 9           1   9.237247     455    455

First, notice that the first 2 cases produce identical models (a quick check follows this list):

  • custom MSE implementation
  • built-in "regression" objective + setting boost_from_average=False
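
If you want to verify that quickly (reusing bst1 and bst2 from above; if the two models really are identical, their tree dataframes should match row for row):

print(bst1.trees_to_dataframe().equals(bst2.trees_to_dataframe()))  # expect True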

Next, look at the leaf values on that bst4 model (which is just using the built-in "regression" objective).

Notice that the leaf value from the learning_rate=1.0 model (bst3) multiplied by 0.15 is exactly the value from the learning_rate=0.15 models with the custom objective and with boost_from_average=False!

-116.297247   # leaf value with no shrinkage
x
0.15          # shrinkage being applied in all the other models
=
-17.444587    # leaf value from models where BoostFromAverage() returned 0.0

And then notice that the final leaf value (on bst4) is pretty close to that plus the mean of the target.

-17.444587    # leaf value from models where BoostFromAverage() returned 0.0
+
-1.3294673    # mean of the target (after casting all values to float32)
=
-18.7740543
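
Both steps are easy to re-check directly (just restating the numbers above):

print(-116.297247 * 0.15)       # -17.44458705
print(-17.444587 + -1.3294673)  # -18.7740543 (vs. -18.574634 observed on bst4)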

I'm guessing there's some small numeric precision issue that's resulting in that last number not quite matching (or maybe I've made some mistake that @jmoralez or @shiyu1994 could correct).
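
One possible explanation for the gap, reasoning from the numbers above rather than from the C++ itself: with boost_from_average, the first tree's gradients are computed relative to the mean, so its shrunken leaf output is 0.15 * (leaf_label_mean - label_mean), and AddBias() then adds the full mean back. That predicts bst4's stored leaf value to be the bst1 leaf value plus (1 - learning_rate) * label_mean, which matches the observed value to the printed precision:

print(-17.444587 + (1 - 0.15) * -1.3294673)
# about -18.5746342 -- matches bst4's stored leaf value of -18.574634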
