
LGBMRegressor on California Housing dataset is 0.68 >> 0.46 #36

Closed
fingertap opened this issue Aug 23, 2022 · 2 comments

Comments

@fingertap

I used the sample code to prepare the dataset:

import sklearn.datasets
import sklearn.model_selection
import sklearn.preprocessing
import torch

device = 'cpu'
dataset = sklearn.datasets.fetch_california_housing()
task_type = 'regression'

X_all = dataset['data'].astype('float32')
y_all = dataset['target'].astype('float32')
n_classes = None

X = {}
y = {}
X['train'], X['test'], y['train'], y['test'] = sklearn.model_selection.train_test_split(
    X_all, y_all, train_size=0.8
)
X['train'], X['val'], y['train'], y['val'] = sklearn.model_selection.train_test_split(
    X['train'], y['train'], train_size=0.8
)

# not the best way to preprocess features, but enough for the demonstration
preprocess = sklearn.preprocessing.StandardScaler().fit(X['train'])
X = {
    k: torch.tensor(preprocess.fit_transform(v), device=device)
    for k, v in X.items()
}
y = {k: torch.tensor(v, device=device) for k, v in y.items()}

# !!! CRUCIAL for neural networks when solving regression problems !!!
y_mean = y['train'].mean().item()
y_std = y['train'].std().item()
y = {k: (v - y_mean) / y_std for k, v in y.items()}

y = {k: v.float() for k, v in y.items()}

Then I trained an LGBMRegressor with the default hyperparameters:

import lightgbm as lgb

model = lgb.LGBMRegressor()
model.fit(X['train'], y['train'])

But when I evaluate on the test fold, I find that the RMSE is 0.68:

>>> test_pred = model.predict(X['test'])
>>> test_pred = torch.from_numpy(test_pred)
>>> rmse = torch.nn.functional.mse_loss(
...     test_pred.view(-1), y['test'].view(-1)) ** 0.5 * y_std
>>> print(f'Test RMSE: {rmse:.2f}.')
Test RMSE: 0.68.
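For reference, the same de-standardization step can be cross-checked without torch, e.g. with scikit-learn's `mean_squared_error`. A minimal sketch with hypothetical toy values standing in for the variables above (not the California Housing data):

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Toy stand-ins for the variables from the snippets above.
y_std = 1.15                                                 # std of the raw train targets
y_test_std = np.array([0.1, -0.4, 0.8], dtype=np.float32)    # standardized targets
test_pred = np.array([0.0, -0.5, 1.0], dtype=np.float32)     # model predictions

# RMSE in standardized units, then rescaled back to the original target units.
rmse = mean_squared_error(y_test_std, test_pred) ** 0.5 * y_std
print(f'Test RMSE: {rmse:.2f}')
```

Multiplying by `y_std` after the square root recovers the RMSE in the original target units, which is what the paper's numbers are reported in.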

Even using the model from rtdl gives me 0.56 RMSE:

(epoch) 57 (batch) 0 (loss) 0.1885
(epoch) 57 (batch) 10 (loss) 0.1315
(epoch) 57 (batch) 20 (loss) 0.1735
(epoch) 57 (batch) 30 (loss) 0.1197
(epoch) 57 (batch) 40 (loss) 0.1952
(epoch) 57 (batch) 50 (loss) 0.1167
Epoch 057 | Validation score: 0.7334 | Test score: 0.5612 <<< BEST VALIDATION EPOCH

Is there anything I missed? How can I reproduce the performance in your paper? Thanks!

@fingertap (Author)

Problem solved. There is a bug in the example code. Change

k: torch.tensor(preprocess.fit_transform(v), device=device)

to

k: torch.tensor(preprocess.transform(v), device=device)
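The difference matters because `fit_transform` re-estimates the scaler's statistics on whatever data it is given, so the val/test splits end up scaled by their own mean/std instead of the train statistics. A minimal sketch with hypothetical toy arrays (not the California Housing data):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.array([[0.0], [2.0], [4.0]])
X_test = np.array([[10.0], [12.0]])

scaler = StandardScaler().fit(X_train)  # statistics come from train only

# Correct: reuse the train statistics (mean=2, std≈1.633).
ok = scaler.transform(X_test)

# Bug: refits the scaler on X_test, erasing the train statistics.
leaky = scaler.fit_transform(X_test)

print(ok.ravel())     # large values: test points lie far outside the train range
print(leaky.ravel())  # [-1., 1.]: test set looks "standardized" regardless
```

With `fit_transform`, the model never sees the test features on the same scale it was trained on, which explains the inflated 0.68 RMSE.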

@Yura52 Yura52 transferred this issue from yandex-research/rtdl-revisiting-models Aug 23, 2022
@Yura52 Yura52 reopened this Aug 23, 2022
@Yura52 (Collaborator) commented Aug 23, 2022

@fingertap thank you for reporting this!

@Yura52 Yura52 closed this as completed in 448d590 Aug 23, 2022