Skip to content

[style] Introduce clang-format #6895

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: master
Choose a base branch
from
Open

[style] Introduce clang-format #6895

wants to merge 5 commits into from

Conversation

borchero
Copy link
Collaborator

@borchero borchero commented Apr 27, 2025

Motivation

Every time I touch the C++ code, I have to manually format it. This always feels annoying and it's impossible to ensure style consistency.

To make it simpler for myself and other contributors to write easily readable C++ code, this PR introduces clang-format which is (to the best of my knowledge) the de-facto standard for formatting in the C/C++ ecosystem.

Changes

  • Add a .clang-format file with the basic style configuration based on the "Google" style (which cpplint also follows)
  • Add clang-format to the .pre-commit-config.yaml
  • Adjust cpplint to not generate false positives after the formatting
  • Ignore most parts of include/LightGBM/config.h and all of src/io/config_auto.cpp from formatting (the latter via .clang-format-ignore) as clang-format messes up the current parameter documentation otherwise
  • Remove duplicate semicolons in three places which made cpplint fail after formatting

Once this PR is merged, we should probably add the merge commit to .git-blame-ignore-revs.

In a follow-up PR, we might want to replace cpplint with clang-tidy as a much more powerful alternative.

@borchero borchero self-assigned this Apr 27, 2025
@borchero borchero marked this pull request as draft April 27, 2025 12:13
@borchero borchero closed this Apr 27, 2025
@borchero borchero reopened this Apr 27, 2025
@borchero borchero marked this pull request as ready for review April 27, 2025 16:45
Copy link
Collaborator

@jameslamb jameslamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this!

I'm broadly supportive of adding clang-format and autoformatting of C++ code with it... but a +16,379 −17,112 PR is way too large to review.

Could you instead do this the way that @StrikerRUS rolled out yamllint, with a first PR that introduces the tool along with a configuration that skips almost all checks, and then subsequent PRs that gradually add more rules?

@@ -0,0 +1,4 @@
BasedOnStyle: Google
ColumnLimit: 99
# Sorting includes currently breaks the code so we have to disable it
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you elaborate on this, please? What specifically does "breaks the code" mean?

If there are issues caused by the ordering of includes, it would be good to identify them... for example, is LightGBM #define-ing things with names that conflict with #defines from third-party headers?

It'd be fine to document that in a separate issue and link the issue in the comment in this file, but I'd like to see a bit more documentation here than just "breaks the code", please.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there are issues caused by the ordering of includes, it would be good to identify them...

I agree. I can create an issue but I'm not sure how to comprehensively identify files where the import order plays a role as compilation fails on the first file where there is an issue 👀

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! Totally fine, no need to fully investigate the problem just to open the issue.

At least having the plaintext error message(s) from the compiler would be helpful for anyone else facing a similar thing to find it from search, and for anyone wanting to contribute to know where to start.

@@ -0,0 +1 @@
src/io/config_auto.cpp
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see this in the PR description:

Ignore most parts of include/LightGBM/config.h and all of src/io/config_auto.cpp from formatting (the latter via .clang-format-ignore) as clang-format messes up the current parameter documentation otherwise

How specifically does clang-format "mess it up"? Is this your personal opinion about the style produced by clang-format, or do you mean that it actually breaks the ability to generate https://lightgbm.readthedocs.io/en/latest/Parameters.html?

We always have the option to modify https://github.com/microsoft/LightGBM/blob/master/.ci/parameter-generator.py to generate differently-formatted code to work with these tools.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

include/LightGBM/config.h

This file is ignored in parts (via // clang-format off and // clang-format on comments) as comments are re-formatted, breaking the current logic that extracts the documentation. I don't think it's worth making a lot of changes there now, i.e. I think disabling formatting there is reasonable.

src/io/config_auto.cpp

This file would currently be needed to be reformatted by clang-format. I agree that we can adjust the code generation though so that we don't need to ignore it from formatting.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, thanks for the explanation! Totally fine with that. I do wish that this didn't involve adding 2 new root-level config files, but I understand that just excluding files in .pre-commit-config.yaml probably would mean that IDEs would still make clang-format suggestions about src/io/config_auto.cpp.

I'm ok with leaving this as-is.

@@ -0,0 +1,4 @@
BasedOnStyle: Google
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
BasedOnStyle: Google
# configuration for 'clang-format'
#
# for details, see https://clang.llvm.org/docs/ClangFormatStyleOptions.html
BasedOnStyle: Google

Can you please add a comment here on how to find these configuration options' meanings and the list of possible options?

static void CreateRandomDenseData(int32_t nrows, int32_t ncols, int32_t nclasses,
std::vector<double>* features, std::vector<float>* labels,
std::vector<float>* weights, std::vector<double>* init_scores,
std::vector<int32_t>* groups);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see lots of changes like this, where one-argument-per-line is being replaces with a bunching of arguments based on the maximum line length.

Could we instead enforce 1-argument-per-line? I find the new form really difficult to read.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I generally agree and I will check whether there's an option for that. However, for simplicity, I'd prefer to not deviate from the default style settings too much. Generally, any decision that we make here can be easily changed as clang-format auto-formats.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's use this thread to continue the discussion in quote comments from #6895 (comment).

I'd prefer to not deviate from the default style settings too much.

As long as the CI passes, I am pretty certain that there can't be an issue

I really strongly disagree with this.

"the code doesn't break" is a minimal requirement for accepting this change, but not sufficient. The code in this repo isn't just used by compilers... it also needs to be read by humans.

I'm supportive of using a tool for auto-formating C++ Code, but not of such a large change to the style of the code without review and well-informed approval.

There are many customizations available via https://clang.llvm.org/docs/ClangFormatStyleOptions.html and I'd like to see a mix of them used to reduce the size of this diff as much as possible. That'd also help with reducing merge conflicts for large PRs like #6182 and #6138.

I think breaking up the PR in the same way as for the introduction of yamllint is neither viable not necessary.

I disagree that it is not "viable". Here's an example of how this could be done:

  • PR 1: introduce the tooling + a very small batch of initial code changes
  • PR2: add another batch of files
    • remove a chunk of files from .clang-format-ignore, maybe everything in src/objective
  • PR3-{n}: add more batches of files
    • by removing more exclusions from .clang-format-ignore
    • trying to keep the diff to a manageable size for human reviewers
    • keep doing these until everything is formatted

This would make the task of "modify the configuration to minimize the diff as much as possible" more manageable. And at each PR, the diff would be small enough that we could have informed conversations about how to change configs to match whatever style we want the code to have.

If you're open to this approach generally but don't have the time / willingness to do all of these PRs, then I'll propose this: you do "PR 1" (this PR), and I'll do all the subsequent ones.

cc @StrikerRUS @jmoralez @shiyu1994 for your thoughts as well.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not comfortable to approve PRs without at least reading all changes in them. So I'm +1 for splitting this huge PR into a smaller reviewable ones.

@@ -22,6 +22,11 @@ repos:
hooks:
- id: yamllint
args: ["--strict"]
- repo: https://github.com/pre-commit/mirrors-clang-format
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some questions about how this will affect the development process.

  • does this require installing clang separately, or will pre-commit set that up for you?
  • does this work on Windows?

@borchero
Copy link
Collaborator Author

but a +16,379 −17,112 PR is way too large to review.

Fair enough. However, I think breaking up the PR in the same way as for the introduction of yamllint is neither viable not necessary. As clang-format does not yield linting failures that need to be fixed manually but auto-formats the code, I don't think its changes require careful review. As long as the CI passes, I am pretty certain that there can't be an issue. According to this Stackoverflow answer, the only way that clang-format can break the code is by sorting imports (which I'd propose to simply disable for the sake of the introducing PR).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants