
[CUDA] Potential missed inversion of discounts within CUDA implementation of LambdarankNDCG #6847

Open
DJ-Hayden opened this issue Feb 28, 2025 · 1 comment

DJ-Hayden commented Feb 28, 2025

Description

Hello, I believe there might be a missed discount inversion within the CUDA implementation of LightGBM's LambdarankNDCG rank objective class. I'm unfortunately better at math than at optimized C++/CUDA, so I may be missing something simple. I'll try to link to the appropriate code/lines to justify my reasoning.

  1. Within the standard, non-CUDA implementation of GetGradientsForOneQuery in rank_objective.hpp (here), there are two calls to DCGCalculator::GetDiscount to obtain the discounts.

  2. I believe those calls resolve to DCGCalculator::GetDiscount (here), which looks up pre-computed discounts.

  3. Those discounts are pre-computed (here) during the DCGCalculator::Init call. Notably, each discount is 1.0 / std::log2(2.0 + i), which aligns with the math (i.e., the discounts are inverted).

  4. Within the CUDA implementation of GetGradientsKernel_LambdarankNDCG in cuda_rank_objective.cu (here), the discounts are not pre-computed.

  5. Instead, each discount (here and here) is computed directly as log2(2.0f + i), without the inversion.

The discount does not appear to be inverted later in the code. In fact, the non-CUDA and CUDA implementations are nigh-identical within that lambda-gradient calculation loop (aside from the inversion discrepancy above and a pre-computed sigmoid lookup table).

Reproducible example

The code runs, trains, and seemingly works fine even without the inversion, so in all likelihood I might just be misunderstanding something.

Environment info

LightGBM Version 4.6.0

Additional Comments

Since the ranks are always >= 0, I believe the bug can be fixed with simple inversions within the function:

  • const double high_discount = log2(2.0f + high_rank); -> const double high_discount = 1.0f / log2(2.0f + high_rank);
  • const double low_discount = log2(2.0f + low_rank); -> const double low_discount = 1.0f / log2(2.0f + low_rank);
@jameslamb (Collaborator) commented:

@shiyu1994 or @metpavel can you please investigate this one?

@jameslamb jameslamb changed the title Potential missed inversion of discounts within CUDA implementation of LambdarankNDCG [CUDA] Potential missed inversion of discounts within CUDA implementation of LambdarankNDCG Mar 3, 2025
@shiyu1994 shiyu1994 self-assigned this Mar 4, 2025