-
Notifications
You must be signed in to change notification settings - Fork 14.5k
[analyzer] Avoid unnecessary super region invalidation in CStringChecker
#146212
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-clang Author: None (flovent) ChangesBounded string functions takes smallest of two values as it's copy size ( for Currently when one of two values is unknown or This patch add check to see if one of these two values is definitely in-bound, if so Closes #143807. Full diff: https://github.com/llvm/llvm-project/pull/146212.diff 2 Files Affected:
diff --git a/clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp b/clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp
index 4d12fdcec1f1a..433fd2ce5f292 100644
--- a/clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp
@@ -272,7 +272,8 @@ class CStringChecker : public Checker< eval::Call,
static ProgramStateRef
invalidateDestinationBufferBySize(CheckerContext &C, ProgramStateRef S,
const Expr *BufE, ConstCFGElementRef Elem,
- SVal BufV, SVal SizeV, QualType SizeTy);
+ SVal BufV, SVal SizeV, QualType SizeTy,
+ bool CouldAccessOutOfBound = true);
/// Operation never overflows, do not invalidate the super region.
static ProgramStateRef invalidateDestinationBufferNeverOverflows(
@@ -1211,14 +1212,17 @@ bool CStringChecker::isFirstBufInBound(CheckerContext &C, ProgramStateRef State,
ProgramStateRef CStringChecker::invalidateDestinationBufferBySize(
CheckerContext &C, ProgramStateRef S, const Expr *BufE,
- ConstCFGElementRef Elem, SVal BufV, SVal SizeV, QualType SizeTy) {
+ ConstCFGElementRef Elem, SVal BufV, SVal SizeV, QualType SizeTy,
+ bool CouldAccessOutOfBound) {
auto InvalidationTraitOperations =
- [&C, S, BufTy = BufE->getType(), BufV, SizeV,
- SizeTy](RegionAndSymbolInvalidationTraits &ITraits, const MemRegion *R) {
+ [&C, S, BufTy = BufE->getType(), BufV, SizeV, SizeTy,
+ CouldAccessOutOfBound](RegionAndSymbolInvalidationTraits &ITraits,
+ const MemRegion *R) {
// If destination buffer is a field region and access is in bound, do
// not invalidate its super region.
if (MemRegion::FieldRegionKind == R->getKind() &&
- isFirstBufInBound(C, S, BufV, BufTy, SizeV, SizeTy)) {
+ (!CouldAccessOutOfBound ||
+ isFirstBufInBound(C, S, BufV, BufTy, SizeV, SizeTy))) {
ITraits.setTrait(
R,
RegionAndSymbolInvalidationTraits::TK_DoNotInvalidateSuperRegion);
@@ -2223,6 +2227,67 @@ void CStringChecker::evalStrcpyCommon(CheckerContext &C, const CallEvent &Call,
Result = lastElement;
}
+ // For bounded method, amountCopied take the minimum of two values,
+ // for ConcatFnKind::strlcat:
+ // amountCopied = min (size - dstLen - 1 , srcLen)
+ // for others:
+ // amountCopied = min (srcLen, size)
+ // So even if we don't know about amountCopied, as long as one of them will
+ // not cause an out-of-bound access, the whole function's operation will not
+ // too, that will avoid invalidating the superRegion of data member in that
+ // situation.
+ bool CouldAccessOutOfBound = true;
+ if (IsBounded && amountCopied.isUnknown()) {
+ // Get the max number of characters to copy.
+ SizeArgExpr lenExpr = {{Call.getArgExpr(2), 2}};
+ SVal lenVal = state->getSVal(lenExpr.Expression, LCtx);
+
+ // Protect against misdeclared strncpy().
+ lenVal =
+ svalBuilder.evalCast(lenVal, sizeTy, lenExpr.Expression->getType());
+
+ std::optional<NonLoc> lenValNL = lenVal.getAs<NonLoc>();
+
+ auto CouldAccessOutOfBoundForSVal = [&](NonLoc Val) -> bool {
+ return !isFirstBufInBound(C, state, C.getSVal(Dst.Expression),
+ Dst.Expression->getType(), Val,
+ C.getASTContext().getSizeType());
+ };
+
+ if (strLengthNL) {
+ CouldAccessOutOfBound = CouldAccessOutOfBoundForSVal(*strLengthNL);
+ }
+
+ if (CouldAccessOutOfBound && lenValNL) {
+ switch (appendK) {
+ case ConcatFnKind::none:
+ case ConcatFnKind::strcat: {
+ CouldAccessOutOfBound = CouldAccessOutOfBoundForSVal(*lenValNL);
+ break;
+ }
+ case ConcatFnKind::strlcat: {
+ if (!dstStrLengthNL)
+ break;
+
+ SVal freeSpace = svalBuilder.evalBinOpNN(state, BO_Sub, *lenValNL,
+ *dstStrLengthNL, sizeTy);
+ if (!isa<NonLoc>(freeSpace))
+ break;
+
+ freeSpace =
+ svalBuilder.evalBinOp(state, BO_Sub, freeSpace,
+ svalBuilder.makeIntVal(1, sizeTy), sizeTy);
+ std::optional<NonLoc> freeSpaceNL = freeSpace.getAs<NonLoc>();
+ if (!freeSpaceNL)
+ break;
+
+ CouldAccessOutOfBound = CouldAccessOutOfBoundForSVal(*freeSpaceNL);
+ break;
+ }
+ }
+ }
+ }
+
// Invalidate the destination (regular invalidation without pointer-escaping
// the address of the top-level region). This must happen before we set the
// C string length because invalidation will clear the length.
@@ -2232,7 +2297,7 @@ void CStringChecker::evalStrcpyCommon(CheckerContext &C, const CallEvent &Call,
// string, but that's still an improvement over blank invalidation.
state = invalidateDestinationBufferBySize(
C, state, Dst.Expression, Call.getCFGElementRef(), *dstRegVal,
- amountCopied, C.getASTContext().getSizeType());
+ amountCopied, C.getASTContext().getSizeType(), CouldAccessOutOfBound);
// Invalidate the source (const-invalidation without const-pointer-escaping
// the address of the top-level region).
diff --git a/clang/test/Analysis/cstring-should-not-invalidate.cpp b/clang/test/Analysis/cstring-should-not-invalidate.cpp
new file mode 100644
index 0000000000000..14c92447c52ca
--- /dev/null
+++ b/clang/test/Analysis/cstring-should-not-invalidate.cpp
@@ -0,0 +1,107 @@
+// RUN: %clang_analyze_cc1 -analyzer-checker=core,debug.ExprInspection
+// -analyzer-config c++-inlining=constructors -verify %s
+
+// expected-no-diagnostics
+
+typedef unsigned int size_t;
+
+char *strncpy(char *dest, const char *src, size_t x);
+
+// issue 143807
+struct strncpyTestClass {
+ int *m_ptr;
+ char m_buff[1000];
+
+ void KnownLen(char *src) {
+ m_ptr = new int;
+ strncpy(m_buff, src, sizeof(m_buff)); // known len but unknown src size
+ delete m_ptr; // no warning
+ }
+
+ void KnownSrcLen(size_t n) {
+ m_ptr = new int;
+ strncpy(m_buff, "xyz", n); // known src size but unknown len
+ delete m_ptr; // no warning
+ }
+};
+
+void strncpyTest(char *src, size_t n) {
+ strncpyTestClass rep;
+ rep.KnownLen(src);
+ rep.KnownSrcLen(n);
+}
+
+size_t strlcpy(char *dest, const char *src, size_t size);
+
+struct strlcpyTestClass {
+ int *m_ptr;
+ char m_buff[1000];
+
+ void KnownLen(char *src) {
+ m_ptr = new int;
+ strlcpy(m_buff, src, sizeof(m_buff)); // known len but unknown src size
+ delete m_ptr; // no warning
+ }
+
+ void KnownSrcLen(size_t n) {
+ m_ptr = new int;
+ strlcpy(m_buff, "xyz", n); // known src size but unknown len
+ delete m_ptr; // no warning
+ }
+};
+
+void strlcpyTest(char *src, size_t n) {
+ strlcpyTestClass rep;
+ rep.KnownLen(src);
+ rep.KnownSrcLen(n);
+}
+
+char *strncat(char *s1, const char *s2, size_t n);
+
+struct strncatTestClass {
+ int *m_ptr;
+ char m_buff[1000];
+
+ void KnownLen(char *src) {
+ m_ptr = new int;
+ strncat(m_buff, src, sizeof(m_buff)); // known len but unknown src size
+ delete m_ptr; // no warning
+ }
+
+ void KnownSrcLen(size_t n) {
+ m_ptr = new int;
+ strncat(m_buff, "xyz", n); // known src size but unknown len
+ delete m_ptr; // no warning
+ }
+};
+
+void strncatTest(char *src, size_t n) {
+ strncatTestClass rep;
+ rep.KnownLen(src);
+ rep.KnownSrcLen(n);
+}
+
+size_t strlcat(char *dst, const char *src, size_t size);
+
+struct strlcatTestClass {
+ int *m_ptr;
+ char m_buff[1000];
+
+ void KnownLen(char *src) {
+ m_ptr = new int;
+ strlcat(m_buff, src, sizeof(m_buff)); // known len but unknown src size
+ delete m_ptr; // no warning
+ }
+
+ void KnownSrcLen(size_t n) {
+ m_ptr = new int;
+ strlcat(m_buff, "xyz", n); // known src size but unknown len
+ delete m_ptr; // no warning
+ }
+};
+
+void strlcatTest(char *src, size_t n) {
+ strlcatTestClass rep;
+ rep.KnownLen(src);
+ rep.KnownSrcLen(n);
+}
|
@llvm/pr-subscribers-clang-static-analyzer-1 Author: None (flovent) ChangesBounded string functions takes smallest of two values as it's copy size ( for Currently when one of two values is unknown or This patch add check to see if one of these two values is definitely in-bound, if so Closes #143807. Full diff: https://github.com/llvm/llvm-project/pull/146212.diff 2 Files Affected:
diff --git a/clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp b/clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp
index 4d12fdcec1f1a..433fd2ce5f292 100644
--- a/clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp
@@ -272,7 +272,8 @@ class CStringChecker : public Checker< eval::Call,
static ProgramStateRef
invalidateDestinationBufferBySize(CheckerContext &C, ProgramStateRef S,
const Expr *BufE, ConstCFGElementRef Elem,
- SVal BufV, SVal SizeV, QualType SizeTy);
+ SVal BufV, SVal SizeV, QualType SizeTy,
+ bool CouldAccessOutOfBound = true);
/// Operation never overflows, do not invalidate the super region.
static ProgramStateRef invalidateDestinationBufferNeverOverflows(
@@ -1211,14 +1212,17 @@ bool CStringChecker::isFirstBufInBound(CheckerContext &C, ProgramStateRef State,
ProgramStateRef CStringChecker::invalidateDestinationBufferBySize(
CheckerContext &C, ProgramStateRef S, const Expr *BufE,
- ConstCFGElementRef Elem, SVal BufV, SVal SizeV, QualType SizeTy) {
+ ConstCFGElementRef Elem, SVal BufV, SVal SizeV, QualType SizeTy,
+ bool CouldAccessOutOfBound) {
auto InvalidationTraitOperations =
- [&C, S, BufTy = BufE->getType(), BufV, SizeV,
- SizeTy](RegionAndSymbolInvalidationTraits &ITraits, const MemRegion *R) {
+ [&C, S, BufTy = BufE->getType(), BufV, SizeV, SizeTy,
+ CouldAccessOutOfBound](RegionAndSymbolInvalidationTraits &ITraits,
+ const MemRegion *R) {
// If destination buffer is a field region and access is in bound, do
// not invalidate its super region.
if (MemRegion::FieldRegionKind == R->getKind() &&
- isFirstBufInBound(C, S, BufV, BufTy, SizeV, SizeTy)) {
+ (!CouldAccessOutOfBound ||
+ isFirstBufInBound(C, S, BufV, BufTy, SizeV, SizeTy))) {
ITraits.setTrait(
R,
RegionAndSymbolInvalidationTraits::TK_DoNotInvalidateSuperRegion);
@@ -2223,6 +2227,67 @@ void CStringChecker::evalStrcpyCommon(CheckerContext &C, const CallEvent &Call,
Result = lastElement;
}
+ // For bounded method, amountCopied take the minimum of two values,
+ // for ConcatFnKind::strlcat:
+ // amountCopied = min (size - dstLen - 1 , srcLen)
+ // for others:
+ // amountCopied = min (srcLen, size)
+ // So even if we don't know about amountCopied, as long as one of them will
+ // not cause an out-of-bound access, the whole function's operation will not
+ // too, that will avoid invalidating the superRegion of data member in that
+ // situation.
+ bool CouldAccessOutOfBound = true;
+ if (IsBounded && amountCopied.isUnknown()) {
+ // Get the max number of characters to copy.
+ SizeArgExpr lenExpr = {{Call.getArgExpr(2), 2}};
+ SVal lenVal = state->getSVal(lenExpr.Expression, LCtx);
+
+ // Protect against misdeclared strncpy().
+ lenVal =
+ svalBuilder.evalCast(lenVal, sizeTy, lenExpr.Expression->getType());
+
+ std::optional<NonLoc> lenValNL = lenVal.getAs<NonLoc>();
+
+ auto CouldAccessOutOfBoundForSVal = [&](NonLoc Val) -> bool {
+ return !isFirstBufInBound(C, state, C.getSVal(Dst.Expression),
+ Dst.Expression->getType(), Val,
+ C.getASTContext().getSizeType());
+ };
+
+ if (strLengthNL) {
+ CouldAccessOutOfBound = CouldAccessOutOfBoundForSVal(*strLengthNL);
+ }
+
+ if (CouldAccessOutOfBound && lenValNL) {
+ switch (appendK) {
+ case ConcatFnKind::none:
+ case ConcatFnKind::strcat: {
+ CouldAccessOutOfBound = CouldAccessOutOfBoundForSVal(*lenValNL);
+ break;
+ }
+ case ConcatFnKind::strlcat: {
+ if (!dstStrLengthNL)
+ break;
+
+ SVal freeSpace = svalBuilder.evalBinOpNN(state, BO_Sub, *lenValNL,
+ *dstStrLengthNL, sizeTy);
+ if (!isa<NonLoc>(freeSpace))
+ break;
+
+ freeSpace =
+ svalBuilder.evalBinOp(state, BO_Sub, freeSpace,
+ svalBuilder.makeIntVal(1, sizeTy), sizeTy);
+ std::optional<NonLoc> freeSpaceNL = freeSpace.getAs<NonLoc>();
+ if (!freeSpaceNL)
+ break;
+
+ CouldAccessOutOfBound = CouldAccessOutOfBoundForSVal(*freeSpaceNL);
+ break;
+ }
+ }
+ }
+ }
+
// Invalidate the destination (regular invalidation without pointer-escaping
// the address of the top-level region). This must happen before we set the
// C string length because invalidation will clear the length.
@@ -2232,7 +2297,7 @@ void CStringChecker::evalStrcpyCommon(CheckerContext &C, const CallEvent &Call,
// string, but that's still an improvement over blank invalidation.
state = invalidateDestinationBufferBySize(
C, state, Dst.Expression, Call.getCFGElementRef(), *dstRegVal,
- amountCopied, C.getASTContext().getSizeType());
+ amountCopied, C.getASTContext().getSizeType(), CouldAccessOutOfBound);
// Invalidate the source (const-invalidation without const-pointer-escaping
// the address of the top-level region).
diff --git a/clang/test/Analysis/cstring-should-not-invalidate.cpp b/clang/test/Analysis/cstring-should-not-invalidate.cpp
new file mode 100644
index 0000000000000..14c92447c52ca
--- /dev/null
+++ b/clang/test/Analysis/cstring-should-not-invalidate.cpp
@@ -0,0 +1,107 @@
+// RUN: %clang_analyze_cc1 -analyzer-checker=core,debug.ExprInspection
+// -analyzer-config c++-inlining=constructors -verify %s
+
+// expected-no-diagnostics
+
+typedef unsigned int size_t;
+
+char *strncpy(char *dest, const char *src, size_t x);
+
+// issue 143807
+struct strncpyTestClass {
+ int *m_ptr;
+ char m_buff[1000];
+
+ void KnownLen(char *src) {
+ m_ptr = new int;
+ strncpy(m_buff, src, sizeof(m_buff)); // known len but unknown src size
+ delete m_ptr; // no warning
+ }
+
+ void KnownSrcLen(size_t n) {
+ m_ptr = new int;
+ strncpy(m_buff, "xyz", n); // known src size but unknown len
+ delete m_ptr; // no warning
+ }
+};
+
+void strncpyTest(char *src, size_t n) {
+ strncpyTestClass rep;
+ rep.KnownLen(src);
+ rep.KnownSrcLen(n);
+}
+
+size_t strlcpy(char *dest, const char *src, size_t size);
+
+struct strlcpyTestClass {
+ int *m_ptr;
+ char m_buff[1000];
+
+ void KnownLen(char *src) {
+ m_ptr = new int;
+ strlcpy(m_buff, src, sizeof(m_buff)); // known len but unknown src size
+ delete m_ptr; // no warning
+ }
+
+ void KnownSrcLen(size_t n) {
+ m_ptr = new int;
+ strlcpy(m_buff, "xyz", n); // known src size but unknown len
+ delete m_ptr; // no warning
+ }
+};
+
+void strlcpyTest(char *src, size_t n) {
+ strlcpyTestClass rep;
+ rep.KnownLen(src);
+ rep.KnownSrcLen(n);
+}
+
+char *strncat(char *s1, const char *s2, size_t n);
+
+struct strncatTestClass {
+ int *m_ptr;
+ char m_buff[1000];
+
+ void KnownLen(char *src) {
+ m_ptr = new int;
+ strncat(m_buff, src, sizeof(m_buff)); // known len but unknown src size
+ delete m_ptr; // no warning
+ }
+
+ void KnownSrcLen(size_t n) {
+ m_ptr = new int;
+ strncat(m_buff, "xyz", n); // known src size but unknown len
+ delete m_ptr; // no warning
+ }
+};
+
+void strncatTest(char *src, size_t n) {
+ strncatTestClass rep;
+ rep.KnownLen(src);
+ rep.KnownSrcLen(n);
+}
+
+size_t strlcat(char *dst, const char *src, size_t size);
+
+struct strlcatTestClass {
+ int *m_ptr;
+ char m_buff[1000];
+
+ void KnownLen(char *src) {
+ m_ptr = new int;
+ strlcat(m_buff, src, sizeof(m_buff)); // known len but unknown src size
+ delete m_ptr; // no warning
+ }
+
+ void KnownSrcLen(size_t n) {
+ m_ptr = new int;
+ strlcat(m_buff, "xyz", n); // known src size but unknown len
+ delete m_ptr; // no warning
+ }
+};
+
+void strlcatTest(char *src, size_t n) {
+ strlcatTestClass rep;
+ rep.KnownLen(src);
+ rep.KnownSrcLen(n);
+}
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the detailed PR summary. It makes sense. I had to think about it carefully but I agree with the motivation.
When I looked at the code it looked really complicated. Probably more than I expected it to be.
I think adding another bool parameter to the already crowded function is not ideal. We should look for some other way, maybe a different overload, or restructuring the code in other ways.
I just find out there are three member functions whose name start with |
CI failures seem unlreated to the new change in this PR. |
I resign from review. I don't have time to review 100+ lines PRs right now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall I like this change, but I added a few suggestions in inline comments.
Also please follow the Coding Standard and use UpperCamelCase
variables for variables (including ones that store lambda functions), while lowerCamelCase
is for methods. Of course, it is also right to follow the "local customs" of older code fragments, but I'd suggest gradually transitioning to the global standard by following the coding standard for the freshly added variables and perhaps transitioning the variables which are mostly used in the code that you edit.
I totally agree using new standard in new code, i think i didn't notice that because i basiclly copy the orginal code to here because of the same logic and clangd doesn't give warning about clang-tidy's |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the updates!
I added a few minor comments, but I hope that after those the PR could be merged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to the CI Checks, one of your testcases is failing:
https://github.com/llvm/llvm-project/actions/runs/16055061425?pr=146212
error: 'expected-warning' diagnostics expected but not seen:
File /home/gha/actions-runner/_work/llvm-project/llvm-project/clang/test/Analysis/cstring-should-not-invalidate.cpp Line 106: TRUE
1 error generated.
Oh, sorry about that. It seems an error node will be generated when emitting |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for the updates!
Before merging the commit consider adding a note to the PR description / commit message that in the case of strlcat
you use size
as an approximation of the exact amountCopied
.
@NagyDonat, I add a note in the end of PR description and I don't have a merge access, if that looks good please help me merge it, thank you! |
Bounded string functions takes smallest of two values as it's copy size (
amountCopied
variable inevalStrcpyCommon
), and it's used to decided whether this operation will cause out-of-bound access and invalidate it's super region if it does.for
strlcat
:amountCopied = min (size - dstLen - 1 , srcLen)
for others:
amountCopied = min (srcLen, size)
Currently when one of two values is unknown or
SValBuilder
can't decide which one is smaller,amountCopied
will remainUnknownVal
, which will invalidate copy destination's super region unconditionally.This patch add check to see if one of these two values is definitely in-bound, if so
amountCopied
has to be in-bound too, because it‘s less than or equal to them, we can avoid the invalidation of super region and some related false positives in this situation.Note: This patch uses
size
as an approximation ofsize - dstLen - 1
instrlcat
case because currently analyzer doesn't handle complex expressions like this very well.Closes #143807.