
Out of bound access when dataset in continual training has fewer features than in the loaded model #5156

Open
shiyu1994 opened this issue Apr 17, 2022 · 0 comments · May be fixed by #5157
Description

When the dataset used for continual training has fewer features than the dataset the loaded model was trained on, an out-of-bound access can happen in at least one place:

feature_importances[models_[iter]->split_feature(split_idx)] += 1.0;

Here feature_importances is sized to the number of features in the continual-training dataset, while the feature indices stored in the trees come from the loaded model and can therefore exceed that size, causing the out-of-bound access.
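The failure mode can be sketched in Python (a hypothetical mimic of the C++ loop around the quoted line, not LightGBM's actual code; Python's IndexError stands in for the unchecked C++ write):

```python
def feature_importance_sketch(split_features_from_loaded_model,
                              num_features_in_new_dataset):
    """Count splits per feature, mimicking the loop around the quoted line."""
    feature_importances = [0.0] * num_features_in_new_dataset
    for feat in split_features_from_loaded_model:
        # The C++ code performs this write unchecked; in this sketch Python
        # raises IndexError when the loaded model's feature index is out of
        # range for the new, smaller dataset.
        feature_importances[feat] += 1.0
    return feature_importances

# Loaded model was trained with 5 features and one tree splits on feature 4;
# the continual-training dataset has only 3 features.
try:
    feature_importance_sketch([0, 2, 4], num_features_in_new_dataset=3)
except IndexError:
    print("out-of-bound feature index from loaded model")
```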

Reproducible example

A reproducible example will be added in the PR fixing this bug, also serving as a test case.

Environment info

LightGBM version or commit hash:
LightGBM master branch

Additional Comments

When the input dataset comes from a LibSVM file, this can be a genuine bug: it is possible that all values of a feature are missing in the continual-training dataset, so that when the dataset is loaded, the inferred number of features is smaller than in the original training data.
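A small sketch of why sparse LibSVM input shrinks the feature count (hypothetical loader logic, shown only to illustrate the mechanism): the format stores only non-missing values as index:value pairs, so the feature count must be inferred from the largest index actually observed.

```python
def infer_num_features(libsvm_lines):
    """Infer the feature count from the largest 1-based feature index that
    actually appears in the sparse index:value pairs."""
    max_idx = 0
    for line in libsvm_lines:
        for token in line.split()[1:]:  # skip the label
            idx = int(token.split(":")[0])
            max_idx = max(max_idx, idx)
    return max_idx

original = ["1 1:0.5 5:1.2", "0 2:0.3 5:0.7"]   # model sees 5 features
continual = ["1 1:0.5 3:1.2", "0 2:0.3"]        # feature 5 entirely missing
print(infer_num_features(original), infer_num_features(continual))  # → 5 3
```

Because feature 5 never appears in the continual-training file, the loader sees only 3 features, while the loaded model's trees may still split on feature 5.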

For other input formats, however, this problem should be classified as misuse, and a proper warning or fatal message should be provided.
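Such a check could look like the following sketch (a hypothetical helper, not the actual fix; ValueError and warnings.warn stand in for LightGBM's fatal and warning messages):

```python
import warnings

def check_model_compatibility(num_features_loaded_model,
                              num_features_new_dataset,
                              strict=True):
    """Fail fast (or warn) before any tree indexes past the new dataset's
    feature count."""
    if num_features_new_dataset < num_features_loaded_model:
        msg = ("continual-training dataset has %d features, but the loaded "
               "model was trained with %d"
               % (num_features_new_dataset, num_features_loaded_model))
        if strict:
            raise ValueError(msg)  # stands in for a fatal message
        warnings.warn(msg)         # stands in for a warning

# Misuse case: loaded model expects 5 features, new dataset has only 3.
check_model_compatibility(5, 3, strict=False)  # warns instead of crashing
```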
