[style] Introduce clang-format #6895

borchero · 2025-04-27T12:10:30Z

Motivation

Every time I touch the C++ code, I have to manually format it. This always feels annoying and it's impossible to ensure style consistency.

To make it simpler for myself and other contributors to write easily readable C++ code, this PR introduces clang-format which is (to the best of my knowledge) the de-facto standard for formatting in the C/C++ ecosystem.

Changes

Add a .clang-format file with the basic style configuration based on the "Google" style (which cpplint also follows)
Add clang-format to the .pre-commit-config.yaml
Adjust cpplint to not generate false positives after the formatting
Ignore most parts of include/LightGBM/config.h and all of src/io/config_auto.cpp from formatting (the latter via .clang-format-ignore) as clang-format messes up the current parameter documentation otherwise
Remove duplicate semicolons in three places which made cpplint fail after formatting

Once this PR is merged, we should probably add the merge commit to .git-blame-ignore-revs.

In a follow-up PR, we might want to replace cpplint with clang-tidy as a much more powerful alternative.

jameslamb

Thanks for this!

I'm broadly supportive of adding clang-format and autoformatting of C++ code with it... but a +16,379 −17,112 PR is way too large to review.

Could you instead do this the way that @StrikerRUS rolled out yamllint, with a first PR that introduces the tool along with a configuration that skips almost all checks, and then subsequent PRs that gradually add more rules?

jameslamb · 2025-04-27T17:52:13Z

.clang-format

@@ -0,0 +1,4 @@
+BasedOnStyle: Google
+ColumnLimit: 99
+# Sorting includes currently breaks the code so we have to disable it


Can you elaborate on this, please? What specifically does "breaks the code" mean?

If there are issues caused by the ordering of includes, it would be good to identify them... for example, is LightGBM #define-ing things with names that conflict with #defines from third-party headers?

It'd be fine to document that in a separate issue and link the issue in the comment in this file, but I'd like to see a bit more documentation here than just "breaks the code", please.

If there are issues caused by the ordering of includes, it would be good to identify them...

I agree. I can create an issue but I'm not sure how to comprehensively identify files where the import order plays a role as compilation fails on the first file where there is an issue 👀

Thank you! Totally fine, no need to fully investigate the problem just to open the issue.

At least having the plaintext error message(s) from the compiler would be helpful for anyone else facing a similar thing to find it from search, and for anyone wanting to contribute to know where to start.

jameslamb · 2025-04-27T17:53:09Z

.clang-format-ignore

@@ -0,0 +1 @@
+src/io/config_auto.cpp


I see this in the PR description:

Ignore most parts of include/LightGBM/config.h and all of src/io/config_auto.cpp from formatting (the latter via .clang-format-ignore) as clang-format messes up the current parameter documentation otherwise

How specifically does clang-format "mess it up"? Is this your personal opinion about the style produced by clang-format, or do you mean that it actually breaks the ability to generate https://lightgbm.readthedocs.io/en/latest/Parameters.html?

We always have the option to modify https://github.com/microsoft/LightGBM/blob/master/.ci/parameter-generator.py to generate differently-formatted code to work with these tools.

include/LightGBM/config.h

This file is ignored in parts (via // clang-format off and // clang-format on comments) as comments are re-formatted, breaking the current logic that extracts the documentation. I don't think it's worth making a lot of changes there now, i.e. I think disabling formatting there is reasonable.

src/io/config_auto.cpp

This file would currently be needed to be reformatted by clang-format. I agree that we can adjust the code generation though so that we don't need to ignore it from formatting.

Ok, thanks for the explanation! Totally fine with that. I do wish that this didn't involve adding 2 new root-level config files, but I understand that just excluding files in .pre-commit-config.yaml probably would mean that IDEs would still make clang-format suggestions about src/io/config_auto.cpp.

I'm ok with leaving this as-is.

jameslamb · 2025-04-27T17:54:55Z

.clang-format

@@ -0,0 +1,4 @@
+BasedOnStyle: Google


Suggested change

BasedOnStyle: Google

# configuration for 'clang-format'

#

# for details, see https://clang.llvm.org/docs/ClangFormatStyleOptions.html

BasedOnStyle: Google

Can you please add a comment here on how to find these configuration options' meanings and the list of possible options?

jameslamb · 2025-04-27T17:59:49Z

tests/cpp_tests/testutils.h

+  static void CreateRandomDenseData(int32_t nrows, int32_t ncols, int32_t nclasses,
+                                    std::vector<double>* features, std::vector<float>* labels,
+                                    std::vector<float>* weights, std::vector<double>* init_scores,
+                                    std::vector<int32_t>* groups);


I see lots of changes like this, where one-argument-per-line is being replaces with a bunching of arguments based on the maximum line length.

Could we instead enforce 1-argument-per-line? I find the new form really difficult to read.

I generally agree and I will check whether there's an option for that. However, for simplicity, I'd prefer to not deviate from the default style settings too much. Generally, any decision that we make here can be easily changed as clang-format auto-formats.

Let's use this thread to continue the discussion in quote comments from #6895 (comment).

I'd prefer to not deviate from the default style settings too much.

As long as the CI passes, I am pretty certain that there can't be an issue

I really strongly disagree with this.

"the code doesn't break" is a minimal requirement for accepting this change, but not sufficient. The code in this repo isn't just used by compilers... it also needs to be read by humans.

I'm supportive of using a tool for auto-formating C++ Code, but not of such a large change to the style of the code without review and well-informed approval.

There are many customizations available via https://clang.llvm.org/docs/ClangFormatStyleOptions.html and I'd like to see a mix of them used to reduce the size of this diff as much as possible. That'd also help with reducing merge conflicts for large PRs like #6182 and #6138.

I think breaking up the PR in the same way as for the introduction of yamllint is neither viable not necessary.

I disagree that it is not "viable". Here's an example of how this could be done:

PR 1: introduce the tooling + a very small batch of initial code changes

this PR introduces the tooling, configuration, CI integration, etc.

it only targets 1 file (maybe a medium-sized one, like https://github.com/microsoft/LightGBM/blob/master/src/treelearner/serial_tree_learner.cpp)

"only targets 1 file" could be achieved by listing all other files in .clang-format-ignore

PR2: add another batch of files

remove a chunk of files from .clang-format-ignore, maybe everything in src/objective

PR3-{n}: add more batches of files

by removing more exclusions from .clang-format-ignore

trying to keep the diff to a manageable size for human reviewers

keep doing these until everything is formatted

This would make the task of "modify the configuration to minimize the diff as much as possible" more manageable. And at each PR, the diff would be small enough that we could have informed conversations about how to change configs to match whatever style we want the code to have.

If you're open to this approach generally but don't have the time / willingness to do all of these PRs, then I'll propose this: you do "PR 1" (this PR), and I'll do all the subsequent ones.

cc @StrikerRUS @jmoralez @shiyu1994 for your thoughts as well.

I'm not comfortable to approve PRs without at least reading all changes in them. So I'm +1 for splitting this huge PR into a smaller reviewable ones.

jameslamb · 2025-04-27T18:01:54Z

.pre-commit-config.yaml

@@ -22,6 +22,11 @@ repos:
    hooks:
      - id: yamllint
        args: ["--strict"]
+  - repo: https://github.com/pre-commit/mirrors-clang-format


Some questions about how this will affect the development process.

does this require installing clang separately, or will pre-commit set that up for you?

does this work on Windows?

borchero · 2025-04-27T19:54:27Z

but a +16,379 −17,112 PR is way too large to review.

Fair enough. However, I think breaking up the PR in the same way as for the introduction of yamllint is neither viable not necessary. As clang-format does not yield linting failures that need to be fixed manually but auto-formats the code, I don't think its changes require careful review. As long as the CI passes, I am pretty certain that there can't be an issue. According to this Stackoverflow answer, the only way that clang-format can break the code is by sorting imports (which I'd propose to simply disable for the sake of the introducing PR).

borchero self-assigned this Apr 27, 2025

borchero requested review from guolinke, jameslamb, shiyu1994, jmoralez and StrikerRUS as code owners April 27, 2025 12:10

borchero marked this pull request as draft April 27, 2025 12:13

borchero closed this Apr 27, 2025

borchero force-pushed the clang-format branch from 9f4b397 to a725360 Compare April 27, 2025 13:05

Fix

25f7e0d

borchero reopened this Apr 27, 2025

borchero added 4 commits April 27, 2025 18:05

Make fixes

6f532ec

Fix config

1f0014a

Fix

72397f0

Ignore

e160e7f

borchero marked this pull request as ready for review April 27, 2025 16:45

borchero added the awaiting review label Apr 27, 2025

jameslamb requested changes Apr 27, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[style] Introduce clang-format #6895

[style] Introduce clang-format #6895

Uh oh!

borchero commented Apr 27, 2025 •

edited

Loading

Uh oh!

jameslamb left a comment

Uh oh!

jameslamb Apr 27, 2025

Uh oh!

borchero Apr 27, 2025

Uh oh!

jameslamb Apr 28, 2025

Uh oh!

jameslamb Apr 27, 2025

Uh oh!

borchero Apr 27, 2025

Uh oh!

jameslamb Apr 28, 2025

Uh oh!

jameslamb Apr 27, 2025

Uh oh!

jameslamb Apr 27, 2025

Uh oh!

borchero Apr 27, 2025

Uh oh!

jameslamb Apr 28, 2025

Uh oh!

StrikerRUS Apr 29, 2025

Uh oh!

jameslamb Apr 27, 2025

Uh oh!

borchero commented Apr 27, 2025

Uh oh!

Uh oh!

[style] Introduce clang-format #6895

Are you sure you want to change the base?

[style] Introduce clang-format #6895

Uh oh!

Conversation

borchero commented Apr 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Motivation

Changes

Uh oh!

jameslamb left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

borchero commented Apr 27, 2025

Uh oh!

Uh oh!

borchero commented Apr 27, 2025 •

edited

Loading