-
Notifications
You must be signed in to change notification settings - Fork 14.4k
[clang-format] Handle Trailing Whitespace After Line Continuation (P2223R2) #145243
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[clang-format] Handle Trailing Whitespace After Line Continuation (P2223R2) #145243
Conversation
…223R2) Fixes llvm#145226. Implement P2223R2 in clang-format to correctly handle cases where a backslash '\' is followed by trailing whitespace before the newline. Previously, clang-format failed to properly detect and handle such cases, leading to misformatted code. With this, clang-format matches the behavior already implemented in Clang's lexer and DependencyDirectivesScanner.cpp, which allow trailing whitespace after a line continuation in any C++ standard.
@llvm/pr-subscribers-clang @llvm/pr-subscribers-clang-format Author: Naveen Seth Hanig (naveen-seth) ChangesFixes #145226. Implement P2223R2 in clang-format to correctly handle cases where a backslash '\' is followed by trailing whitespace before the newline. With this, Full diff: https://github.com/llvm/llvm-project/pull/145243.diff 3 Files Affected:
diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index 96477ef6ddc9a..7c8e231655e86 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -1025,6 +1025,9 @@ clang-format
``enum`` enumerator lists.
- Add ``OneLineFormatOffRegex`` option for turning formatting off for one line.
- Add ``SpaceAfterOperatorKeyword`` option.
+- Support trailing whitespace in line splicing.
+ (P2223R2 <https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2021/p2223r2.pdf>_, #GH145226)
+
clang-refactor
--------------
diff --git a/clang/lib/Format/FormatTokenLexer.cpp b/clang/lib/Format/FormatTokenLexer.cpp
index 4cc4f5f22db0d..bcc8e6ffd91ab 100644
--- a/clang/lib/Format/FormatTokenLexer.cpp
+++ b/clang/lib/Format/FormatTokenLexer.cpp
@@ -14,6 +14,7 @@
#include "FormatTokenLexer.h"
#include "FormatToken.h"
+#include "clang/Basic/CharInfo.h"
#include "clang/Basic/SourceLocation.h"
#include "clang/Basic/SourceManager.h"
#include "clang/Format/Format.h"
@@ -1205,14 +1206,23 @@ static size_t countLeadingWhitespace(StringRef Text) {
while (Cur < End) {
if (isspace(Cur[0])) {
++Cur;
- } else if (Cur[0] == '\\' && (Cur[1] == '\n' || Cur[1] == '\r')) {
- // A '\' followed by a newline always escapes the newline, regardless
- // of whether there is another '\' before it.
+ } else if (Cur[0] == '\\') {
+ // A '\' followed by a optional horizontal whitespace (P22232R2) and then
+ // newline always escapes the newline, regardless of whether there is
+ // another '\' before it.
// The source has a null byte at the end. So the end of the entire input
// isn't reached yet. Also the lexer doesn't break apart an escaped
// newline.
- assert(End - Cur >= 2);
- Cur += 2;
+ const unsigned char *Lookahead = Cur + 1;
+ while (isHorizontalWhitespace(*Lookahead))
+ ++Lookahead;
+ if (*Lookahead == '\n' || *Lookahead == '\r') {
+ // Splice found, consume it.
+ Cur = Lookahead + 1;
+ continue;
+ }
+ // No line splice found; the '\' is a token.
+ break;
} else if (Cur[0] == '?' && Cur[1] == '?' && Cur[2] == '/' &&
(Cur[3] == '\n' || Cur[3] == '\r')) {
// Newlines can also be escaped by a '?' '?' '/' trigraph. By the way, the
@@ -1295,13 +1305,18 @@ FormatToken *FormatTokenLexer::getNextToken() {
case '/':
// The text was entirely whitespace when this loop was entered. Thus
// this has to be an escape sequence.
- assert(Text.substr(i, 2) == "\\\r" || Text.substr(i, 2) == "\\\n" ||
- Text.substr(i, 4) == "\?\?/\r" ||
+ assert(Text.substr(i, 4) == "\?\?/\r" ||
Text.substr(i, 4) == "\?\?/\n" ||
(i >= 1 && (Text.substr(i - 1, 4) == "\?\?/\r" ||
Text.substr(i - 1, 4) == "\?\?/\n")) ||
(i >= 2 && (Text.substr(i - 2, 4) == "\?\?/\r" ||
- Text.substr(i - 2, 4) == "\?\?/\n")));
+ Text.substr(i - 2, 4) == "\?\?/\n")) ||
+ (Text[i] == '\\' && [&]() -> bool {
+ size_t j = i + 1;
+ while (j < Text.size() && isHorizontalWhitespace(Text[j]))
+ ++j;
+ return j < Text.size() && (Text[j] == '\n' || Text[j] == '\r');
+ }()));
InEscape = true;
break;
default:
diff --git a/clang/test/Format/splice-trailing-whitespace-p2223r2.cpp b/clang/test/Format/splice-trailing-whitespace-p2223r2.cpp
new file mode 100644
index 0000000000000..3cc08cc631965
--- /dev/null
+++ b/clang/test/Format/splice-trailing-whitespace-p2223r2.cpp
@@ -0,0 +1,14 @@
+// RUN: grep -Ev "// *[A-Z-]+:" %s \
+// RUN: | clang-format -style='{BasedOnStyle: LLVM, AlignEscapedNewlines: DontAlign}' \
+// RUN: | FileCheck -strict-whitespace %s
+
+// CHECK: {{^#define TAG\(\.\.\.\) \\}}
+// CHECK: {{^ struct a \{\};}}
+// There is whitespace following v this backslash!
+#define TAG(...) struct a { \
+ };
+
+// CHECK: {{^int i;}}
+// The comment below eats its following line because of the line splice.
+// I also have trailing whitespace. Nom nom nom \
+int i;
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please wait for @owenca .
Text.substr(i, 4) == "\?\?/\r" || | ||
assert(Text.substr(i, 4) == "\?\?/\r" || | ||
Text.substr(i, 4) == "\?\?/\n" || | ||
(i >= 1 && (Text.substr(i - 1, 4) == "\?\?/\r" || | ||
Text.substr(i - 1, 4) == "\?\?/\n")) || | ||
(i >= 2 && (Text.substr(i - 2, 4) == "\?\?/\r" || | ||
Text.substr(i - 2, 4) == "\?\?/\n"))); | ||
Text.substr(i - 2, 4) == "\?\?/\n")) || | ||
(Text[i] == '\\' && [&]() -> bool { | ||
size_t j = i + 1; | ||
while (j < Text.size() && isHorizontalWhitespace(Text[j])) | ||
++j; | ||
return j < Text.size() && (Text[j] == '\n' || Text[j] == '\r'); | ||
}())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd delete the assertion. WDYT @HazardyKnusperkeks ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The question is, why was it put there. But I have a hard time understanding it, how can Text.substr(i, 4)
be equal to ??/\r
, when Text[i]
is /
?
I think an assert is fine, but it needs to be reformulated, and maybe add a comment about trigraph. I at least keep forgetting, that these things exist(ed).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why was it put there? Before my patch, the code did not handle trigraphs. It looked at the character following the backslash. My patch added support for trigraphs. To continue checking for the character following the trigraph meant I had to add more code. I realized that I did not have to add code for checking past the first character because the entire block only ran when there was whitespace. So I removed the code for checking past the first character. Then @owenca asked why it was unnecessary to check further than the first character. I thought that if someone had to ask then that meant the reasoning was not obvious. So I added an assertion. I thought it could help developers understand the code.
Why can Text.substr(i, 4)
be equal to ??/\r
, when Text[i]
is /
? If you click on the arrow to see more lines around the changed code, you will see that the block also runs when the character is \
or ?
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did look around, but did not consider this while reading the assert...
Then the assert should stay, but maybe add a comment, as I said trigraphs are not on everyones mind.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you want him to add a comment in this pull request, or do you want me to add a comment separately?
I can expand the comment on line 1296 like this.
The code preceding the loop and in the countLeadingWhitespace
function guarantees that Text
is entirely whitespace, not including comments but including escaped newlines which may be escaped with a trigraph. So if 1 of these characters show up, then it has to be in an escape sequence.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it’s best if you author the comment, but I (or you) can add your suggested comment to this PR and credit you as a co-author (@sstwcw). Otherwise, I’ll merge it as is.
If you’d like me to add it in this PR, do I need to re-request a review afterward? @HazardyKnusperkeks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be done separately.
…223R2) (llvm#145243) Fixes llvm#145226. Implement [P2223R2](https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2021/p2223r2.pdf) in clang-format to correctly handle cases where a backslash '\\' is followed by trailing whitespace before the newline. Previously, `clang-format` failed to properly detect and handle such cases, leading to misformatted code. With this, `clang-format` matches the behavior already implemented in Clang's lexer and `DependencyDirectivesScanner.cpp`, which allow trailing whitespace after a line continuation in any C++ standard.
…223R2) (llvm#145243) Fixes llvm#145226. Implement [P2223R2](https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2021/p2223r2.pdf) in clang-format to correctly handle cases where a backslash '\\' is followed by trailing whitespace before the newline. Previously, `clang-format` failed to properly detect and handle such cases, leading to misformatted code. With this, `clang-format` matches the behavior already implemented in Clang's lexer and `DependencyDirectivesScanner.cpp`, which allow trailing whitespace after a line continuation in any C++ standard.
Fixes #145226.
Implement P2223R2 in clang-format to correctly handle cases where a backslash '\' is followed by trailing whitespace before the newline.
Previously,
clang-format
failed to properly detect and handle such cases, leading to misformatted code.With this,
clang-format
matches the behavior already implemented in Clang's lexer andDependencyDirectivesScanner.cpp
, which allow trailing whitespace after a line continuation in any C++ standard.