-
Notifications
You must be signed in to change notification settings - Fork 3.9k
[style] Introduce clang-format #6895
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this!
I'm broadly supportive of adding clang-format
and autoformatting of C++ code with it... but a +16,379 −17,112
PR is way too large to review.
Could you instead do this the way that @StrikerRUS rolled out yamllint
, with a first PR that introduces the tool along with a configuration that skips almost all checks, and then subsequent PRs that gradually add more rules?
@@ -0,0 +1,4 @@ | |||
BasedOnStyle: Google | |||
ColumnLimit: 99 | |||
# Sorting includes currently breaks the code so we have to disable it |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you elaborate on this, please? What specifically does "breaks the code" mean?
If there are issues caused by the ordering of includes, it would be good to identify them... for example, is LightGBM #define
-ing things with names that conflict with #define
s from third-party headers?
It'd be fine to document that in a separate issue and link the issue in the comment in this file, but I'd like to see a bit more documentation here than just "breaks the code", please.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there are issues caused by the ordering of includes, it would be good to identify them...
I agree. I can create an issue but I'm not sure how to comprehensively identify files where the import order plays a role as compilation fails on the first file where there is an issue 👀
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! Totally fine, no need to fully investigate the problem just to open the issue.
At least having the plaintext error message(s) from the compiler would be helpful for anyone else facing a similar thing to find it from search, and for anyone wanting to contribute to know where to start.
@@ -0,0 +1 @@ | |||
src/io/config_auto.cpp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see this in the PR description:
Ignore most parts of include/LightGBM/config.h and all of src/io/config_auto.cpp from formatting (the latter via .clang-format-ignore) as clang-format messes up the current parameter documentation otherwise
How specifically does clang-format
"mess it up"? Is this your personal opinion about the style produced by clang-format
, or do you mean that it actually breaks the ability to generate https://lightgbm.readthedocs.io/en/latest/Parameters.html?
We always have the option to modify https://github.com/microsoft/LightGBM/blob/master/.ci/parameter-generator.py to generate differently-formatted code to work with these tools.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
include/LightGBM/config.h
This file is ignored in parts (via // clang-format off
and // clang-format on
comments) as comments are re-formatted, breaking the current logic that extracts the documentation. I don't think it's worth making a lot of changes there now, i.e. I think disabling formatting there is reasonable.
src/io/config_auto.cpp
This file would currently be needed to be reformatted by clang-format
. I agree that we can adjust the code generation though so that we don't need to ignore it from formatting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, thanks for the explanation! Totally fine with that. I do wish that this didn't involve adding 2 new root-level config files, but I understand that just excluding files in .pre-commit-config.yaml
probably would mean that IDEs would still make clang-format
suggestions about src/io/config_auto.cpp
.
I'm ok with leaving this as-is.
@@ -0,0 +1,4 @@ | |||
BasedOnStyle: Google |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BasedOnStyle: Google | |
# configuration for 'clang-format' | |
# | |
# for details, see https://clang.llvm.org/docs/ClangFormatStyleOptions.html | |
BasedOnStyle: Google |
Can you please add a comment here on how to find these configuration options' meanings and the list of possible options?
static void CreateRandomDenseData(int32_t nrows, int32_t ncols, int32_t nclasses, | ||
std::vector<double>* features, std::vector<float>* labels, | ||
std::vector<float>* weights, std::vector<double>* init_scores, | ||
std::vector<int32_t>* groups); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see lots of changes like this, where one-argument-per-line is being replaces with a bunching of arguments based on the maximum line length.
Could we instead enforce 1-argument-per-line? I find the new form really difficult to read.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I generally agree and I will check whether there's an option for that. However, for simplicity, I'd prefer to not deviate from the default style settings too much. Generally, any decision that we make here can be easily changed as clang-format
auto-formats.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's use this thread to continue the discussion in quote comments from #6895 (comment).
I'd prefer to not deviate from the default style settings too much.
As long as the CI passes, I am pretty certain that there can't be an issue
I really strongly disagree with this.
"the code doesn't break" is a minimal requirement for accepting this change, but not sufficient. The code in this repo isn't just used by compilers... it also needs to be read by humans.
I'm supportive of using a tool for auto-formating C++ Code, but not of such a large change to the style of the code without review and well-informed approval.
There are many customizations available via https://clang.llvm.org/docs/ClangFormatStyleOptions.html and I'd like to see a mix of them used to reduce the size of this diff as much as possible. That'd also help with reducing merge conflicts for large PRs like #6182 and #6138.
I think breaking up the PR in the same way as for the introduction of yamllint is neither viable not necessary.
I disagree that it is not "viable". Here's an example of how this could be done:
- PR 1: introduce the tooling + a very small batch of initial code changes
- this PR introduces the tooling, configuration, CI integration, etc.
- it only targets 1 file (maybe a medium-sized one, like https://github.com/microsoft/LightGBM/blob/master/src/treelearner/serial_tree_learner.cpp)
- "only targets 1 file" could be achieved by listing all other files in
.clang-format-ignore
- PR2: add another batch of files
- remove a chunk of files from
.clang-format-ignore
, maybe everything insrc/objective
- remove a chunk of files from
- PR3-
{n}
: add more batches of files- by removing more exclusions from
.clang-format-ignore
- trying to keep the diff to a manageable size for human reviewers
- keep doing these until everything is formatted
- by removing more exclusions from
This would make the task of "modify the configuration to minimize the diff as much as possible" more manageable. And at each PR, the diff would be small enough that we could have informed conversations about how to change configs to match whatever style we want the code to have.
If you're open to this approach generally but don't have the time / willingness to do all of these PRs, then I'll propose this: you do "PR 1" (this PR), and I'll do all the subsequent ones.
cc @StrikerRUS @jmoralez @shiyu1994 for your thoughts as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not comfortable to approve PRs without at least reading all changes in them. So I'm +1 for splitting this huge PR into a smaller reviewable ones.
@@ -22,6 +22,11 @@ repos: | |||
hooks: | |||
- id: yamllint | |||
args: ["--strict"] | |||
- repo: https://github.com/pre-commit/mirrors-clang-format |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some questions about how this will affect the development process.
- does this require installing
clang
separately, or willpre-commit
set that up for you? - does this work on Windows?
Fair enough. However, I think breaking up the PR in the same way as for the introduction of |
Motivation
Every time I touch the C++ code, I have to manually format it. This always feels annoying and it's impossible to ensure style consistency.
To make it simpler for myself and other contributors to write easily readable C++ code, this PR introduces clang-format which is (to the best of my knowledge) the de-facto standard for formatting in the C/C++ ecosystem.
Changes
.clang-format
file with the basic style configuration based on the "Google" style (whichcpplint
also follows)clang-format
to the.pre-commit-config.yaml
cpplint
to not generate false positives after the formattinginclude/LightGBM/config.h
and all ofsrc/io/config_auto.cpp
from formatting (the latter via.clang-format-ignore
) asclang-format
messes up the current parameter documentation otherwisecpplint
fail after formattingOnce this PR is merged, we should probably add the merge commit to
.git-blame-ignore-revs
.In a follow-up PR, we might want to replace
cpplint
withclang-tidy
as a much more powerful alternative.