
[CUDA] Potential missed inversion of discounts within CUDA implementation of LambdarankNDCG #6847

Open
DJ-Hayden opened this issue Feb 28, 2025 · 1 comment

DJ-Hayden commented Feb 28, 2025

Description

Hello, I believe there might be a missed discount inversion within the CUDA implementation of LightGBM's LambdarankNDCG rank objective class. I'm unfortunately better at math than at optimized C++/CUDA, so I may be missing something simple. I'll try to link to the appropriate code/lines to justify my reasoning.

  1. Within the standard, non-CUDA implementation of GetGradientsForOneQuery in rank_objective.hpp (here), there are two calls to DCGCalculator::GetDiscount to obtain the discounts.

  2. I believe those calls resolve to DCGCalculator::GetDiscount (here), which looks up pre-computed discounts.

  3. Those discounts are pre-computed (here) during the DCGCalculator::Init call. Notably, each discount is 1.0 / std::log2(2.0 + i), which aligns with the math (i.e., the discounts are inverted).

  4. Within the CUDA implementation of GetGradientsKernel_LambdarankNDCG in cuda_rank_objective.cu (here), the discounts are not pre-computed.

  5. Instead, each discount (here and here) is computed directly as log2(2.0f + i), without the inversion.

The discount does not appear to be inverted later in the code. In fact, the non-CUDA and CUDA implementations are nigh-identical within that lambda-gradient calculation loop (aside from the inversion discrepancy above and a pre-computed sigmoid lookup table).

Reproducible example

The code runs, trains, and seemingly works fine even without the inversion, so in all likelihood I might just be misunderstanding something.

Environment info

LightGBM Version 4.6.0

Additional Comments

Since the ranks are always >= 0, I believe the bug can be fixed with simple inversions within the function:

  • const double high_discount = log2(2.0f + high_rank); -> const double high_discount = 1.0f / log2(2.0f + high_rank);
  • const double low_discount = log2(2.0f + low_rank); -> const double low_discount = 1.0f / log2(2.0f + low_rank);
@jameslamb (Collaborator) commented:

@shiyu1994 or @metpavel can you please investigate this one?

@jameslamb jameslamb changed the title Potential missed inversion of discounts within CUDA implementation of LambdarankNDCG [CUDA] Potential missed inversion of discounts within CUDA implementation of LambdarankNDCG Mar 3, 2025
@shiyu1994 shiyu1994 self-assigned this Mar 4, 2025