[CPP-435] Calls to `memset` and `ZeroMemory` may be deleted by the compiler #1933

zlaski-semmle · 2019-09-13T23:02:21Z

No description provided.

geoffw0

Results on 75 projects here: https://lgtm.com/query/2089939943018653184/

I think we're seeing a lot of false positives on this [early] version of the query:

(1) memset(&variable, ..., ...), where &variable doesn't flow to anywhere but nevertheless variable continues to be used.

(2) similarly memset(buf+offset, ..., ...) where there's no flow from buf+offset but buf continues to be used.

(3) memset to a pointer which, though it isn't used again, points to the same buffer as another pointer that is. i.e.

x = y;
memset(x, 0, sizeof(*x));
... use y

In all of these cases I don't think an optimizing compiler can remove the calls to memset.
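A minimal C sketch of the three patterns above, for reference. The function and variable names are invented for illustration and do not come from the query's test suite. In each function the cleared memory is read again after the call, so the call is not a dead store.

```c
#include <string.h>

/* (1) &counter does not flow anywhere else, but counter is read afterwards. */
int case_variable(void) {
    int counter = 42;
    memset(&counter, 0, sizeof counter);
    return counter;               /* reads the cleared object */
}

/* (2) buf + 4 is never used again, but buf is. */
int case_offset(void) {
    char buf[8] = "abcdefg";
    memset(buf + 4, 0, 4);        /* clears the last four bytes */
    return (int)strlen(buf);      /* the read through buf sees the cleared tail */
}

/* (3) x is not used after the call, but y points to the same object. */
int case_alias(void) {
    int value = 7;
    int *y = &value;
    int *x = y;
    memset(x, 0, sizeof(*x));
    return *y;                    /* reads the cleared object through the alias */
}
```

Each function returns a value that depends on the memset having run, which is why removing the call would change observable behaviour.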

geoffw0 · 2019-09-16T13:08:12Z

cpp/ql/src/Likely Bugs/Memory Management/MemsetMayBeDeleted.ql

+ *              by the compiler if said buffer is not subsequently used.  This is not desirable
+ *              behavior if the buffer contains sensitive data that could be exploited by an attacker.
+ *              The workaround is to use `memset_s` or `SecureZeroMemory`, use the `-fno-builtin-memset` compiler flag, or
+ *              to write one's own buffer-clearing routine.  See also


or to write one's own buffer-clearing routine

There's a theoretical risk that a sufficiently smart optimizing compiler might optimize out a call even to a user-defined buffer clearing routine. Can anybody comment about whether this is a real concern with current compilers? Should we be recommending this approach?

See https://jira.semmle.com/browse/CPP-435. We should be recommending the buffer-clearing approaches in https://www.usenix.org/conference/usenixsecurity17/technical-sessions/presentation/yang. @zlaski-semmle I recommend reading that paper. I think the recommendations in the CWE-14 entry are simplistic and not very helpful.

The paper summarizes several scrubbing approaches: separate compilation or weak linking, volatile function or data accesses, memory barriers, assembly-language implementation, disabled use of __builtin_memset. But the technique of choice is to use the secure scrubbing functions provided by each platform.

The most sensible strategy (in my opinion) is the one adopted by Tor: use SecureZeroMemory (Win) if available, then RtlSecureZeroMemory (Win) if available, then BSD’s explicit_bzero if available, then C11's memset_s if available, and then fall back on assembly language implementations and then finally the volatile function pointer technique.
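The fallback order described above could be sketched roughly as follows. This is an illustration, not Tor's code: `scrub_buffer` is a made-up name, `HAVE_EXPLICIT_BZERO` is an assumed configure-style macro that a build system would have to define, and the `RtlSecureZeroMemory` and assembly-language steps are left out for brevity.

```c
#define __STDC_WANT_LIB_EXT1__ 1   /* ask for memset_s where Annex K is available */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#if defined(_WIN32)
#include <windows.h>
#endif

/* Last-resort fallback: call memset through a volatile function pointer, so
   the compiler has to load the pointer at run time and cannot assume that
   the callee is the memset it knows how to elide. */
static void *(*const volatile memset_ptr)(void *, int, size_t) = memset;

/* Hypothetical helper; picks the first scrubbing primitive that is available. */
void scrub_buffer(void *buf, size_t len) {
    assert(buf != NULL || len == 0);
#if defined(_WIN32)
    SecureZeroMemory(buf, len);
#elif defined(HAVE_EXPLICIT_BZERO)   /* assumed macro, set by the build system */
    explicit_bzero(buf, len);
#elif defined(__STDC_LIB_EXT1__)
    (void)memset_s(buf, len, 0, len);
#else
    (memset_ptr)(buf, 0, len);
#endif
}
```

A caller would use it as `scrub_buffer(secret, sizeof secret);` immediately before the buffer goes out of scope.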

But what do we tell our users? To read the paper? I still think we should point them to the platform-supplied functions, and then perhaps refer them to secure_memzero implemented in https://compsec.sysnet.ucsd.edu/secure_memzero.h.

First, we shouldn't write anything about mitigation in @description. The @description should be rewritten to follow https://github.com/Semmle/ql/blob/master/docs/query-metadata-style-guide.md#query-descriptions-description. That means we don't need to boil our mitigation advice down to a single sentence.

I suggest we tell users to prefer using a platform-specific library function if one is available. We can list them by name, but I don't see a reason we should recommend some of them over others. If they are on a platform without such a library function, we can recommend using https://compsec.sysnet.ucsd.edu/secure_memzero.h. If they want to write their own function, we can refer them to Section 3 of the paper. I don't think we should take it upon ourselves to explain the volatile-based techniques directly. There are too many caveats, and it's too easy to get it wrong.

geoffw0 · 2019-09-16T13:11:22Z

cpp/ql/src/Likely Bugs/Memory Management/MemsetMayBeDeleted.ql

+
+from FunctionCall memset
+where
+  memset.getTarget().getName() = "memset" or memset.getTarget().getName() = "ZeroMemory" and


Or wmemset. But in the long run, I'd like to add something to the models library to cover this.

Don't forget bzero.

geoffw0 · 2019-09-16T13:13:50Z

cpp/ql/src/Likely Bugs/Memory Management/MemsetMayBeDeleted.qhelp

+
+
+<li>MITRE
+<a href="https://cwe.mitre.org/data/definitions/14.html">CWE-14</a>.</li>


I think the CWE reference will be added automatically as the query has a cwe tag. Check what similar queries do.

I just generated the markdown from the .qhelp, and CWE appears only once (the one I added). Or is there a different toolchain I should be using?

jbj · 2019-09-16T17:08:50Z

Overall, I don't think this query should use data flow or taint tracking as it will give results that are hard to explain. It should instead mimic what a reasonable compiler does.

zlaski-semmle · 2019-09-18T20:06:58Z

Overall, I don't think this query should use data flow or taint tracking as it will give results that are hard to explain. It should instead mimic what a reasonable compiler does.

Let's discuss it during our next meeting. I'm not sure how to make QL act as a "reasonable compiler", nor do I understand why taint tracking is not a proper solution.

jbj · 2019-09-23T09:54:09Z

cpp/ql/src/Likely Bugs/Memory Management/MemsetMayBeDeleted.ql

@@ -17,12 +13,24 @@

 import semmle.code.cpp.dataflow.TaintTracking


I think we want data flow here, not taint tracking. Taint tracking extends data flow with some heuristic rules about how data content may influence other data content, but the correctness of this query does not need to depend on such heuristics.

So I just ran some tests. With DataFlow::localFlow, torvalds/linux produces 599 alerts. With TaintTracking::localTaint, that number goes down to 510 alerts. Intuitively, this makes sense, as modifying the first argument to memset taints more subsequent statements/expressions, and hence more alerts are suppressed.

But what puzzles me is that we are still left with numerous false positives, way too many for this query to be usable. I've been extracting C/C++ code triggering the alert, but then haven't been able to reproduce the alert.

jbj · 2019-09-23T10:14:15Z

cpp/ql/src/Likely Bugs/Memory Management/MemsetMayBeDeleted.ql

+  ) and
+  not exists(Parameter parm |
+    TaintTracking::localTaint(DataFlow::parameterNode(parm),
+      DataFlow::exprNode(arg))


This checking of flow from Parameter is just one case out of the many ways that arg can be aliased. It could be a whack-a-mole game to enumerate them all. I suggest that we make the query the other way around: give an alert only if there's provably no way that the memory could be read after being cleared. That means, to begin with, that we only support stack-allocated arrays.
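To make the "stack-allocated arrays only" idea concrete, here is a small C sketch with invented names. In the first function the local array is never read after the call, so a compiler may treat the memset as a dead store; in the second, a later read keeps it alive.

```c
#include <stddef.h>
#include <string.h>

/* Candidate for an alert: pw is a local array and nothing reads it after
   the memset, so an optimizer may remove the call as a dead store. */
unsigned sum_then_clear(void) {
    char pw[8] = "hunter2";
    unsigned sum = 0;
    for (size_t i = 0; pw[i] != '\0'; ++i)
        sum += (unsigned char)pw[i];
    memset(pw, 0, sizeof pw);
    return sum;
}

/* Not a candidate: pw[0] is read after the memset, so the store is live. */
unsigned sum_clear_reread(void) {
    char pw[8] = "hunter2";
    unsigned sum = 0;
    for (size_t i = 0; pw[i] != '\0'; ++i)
        sum += (unsigned char)pw[i];
    memset(pw, 0, sizeof pw);
    return sum + (unsigned char)pw[0];
}
```

Both functions return the same value whether or not the first memset is kept, which is exactly why the compiler is free to drop it and why the secret may stay on the stack.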

The easy way to check whether a variable may escape is to use https://github.com/Semmle/ql/blob/master/cpp/ql/src/semmle/code/cpp/dataflow/EscapesTree.qll#L254. Unfortunately it's very conservative, so it would be better to use the IR.

I suggest that we make the query the other way around: give an alert only if there's provably no way that the memory could be read after being cleared.

How would one obtain such a proof? I'm in the dark here.

The easy way to check whether a variable may escape is to use https://github.com/Semmle/ql/blob/master/cpp/ql/src/semmle/code/cpp/dataflow/EscapesTree.qll#L254.

I'm still wrapping my head around this one. So does the VariableAccess parameter correspond to the pointer/array variable that is the first argument to memset? What about the Expr parameter? The description in the sources is underwhelming. I'm looking at ReturnStackAllocatedMemory.ql and it seems that the second parameter could correspond to the value of the return statement, but not always.

Unfortunately it's very conservative, so it would be better to use the IR.

So perhaps it's time for me to start learning about the IR.

zlaski-semmle · 2019-09-24T03:05:44Z

So I've played with TaintTracking::localTaint some more, and discovered some inconsistencies. For example, in

	__builtin_memset(&pw1a[3], 0, PW_SIZE); // GOOD
	return pw1a[4];

the taint from the memset is correctly propagated to the return, whereas in

	__builtin_memset(pw1a + 3, 0, PW_SIZE); // GOOD [FALSE POSITIVE]
	return pw1a[4];

the taint is not propagated, leading to a spurious "memset may be deleted" alert.
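For reference, the two spellings denote the same address in C, so both calls clear the same bytes. The sketch below uses plain `memset` instead of `__builtin_memset` for portability, and the value of `PW_SIZE` and the array contents are assumptions, since the thread does not give them.

```c
#include <string.h>

#define PW_SIZE 4   /* assumed value; not stated in the thread */

/* Clears bytes 3..6 using the subscript form of the address. */
char clear_with_subscript(void) {
    char pw1a[8] = "secret!";
    memset(&pw1a[3], 0, PW_SIZE);
    return pw1a[4];
}

/* Clears the same bytes using pointer arithmetic. */
char clear_with_arithmetic(void) {
    char pw1a[8] = "secret!";
    memset(pw1a + 3, 0, PW_SIZE);
    return pw1a[4];
}
```

Since `&pw1a[3]` is defined as `pw1a + 3`, an analysis that treats the two forms differently is reporting a difference that the language does not have.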

zlaski-semmle · 2019-09-24T03:12:34Z

Results on 75 projects here: https://lgtm.com/query/2089939943018653184/

I think we're seeing a lot of false positives on this [early] version of the query:

Indeed, we still are, even after the version bump.

jbj · 2019-09-24T06:30:51Z

I've put this on the agenda for today's team meeting so we can discuss the big picture before diving further into any of these details.

zlaski-semmle · 2019-09-25T01:28:11Z

I have committed an initial version of the Memset.qll model. (I've also committed some other changes but they are not exciting.) I based my work on the existing Memcpy.qll model, but obviously could have gone wrong somewhere. For the time being, I did not add Memset.qll to the import list in Models.qll, since I don't know what that would entail.

jbj

Please move the memset modelling to a separate PR. It can be merged independently and likely much sooner than the full query. Otherwise this PR will end up with 100+ comments on it and will become impossible to follow.

jbj · 2019-09-25T07:09:26Z

cpp/ql/src/semmle/code/cpp/models/implementations/Memset.qll

+    (
+      output.isOutParameterPointer(0) or
+      output.isOutReturnPointer()
+    )


I don't see the value in any of the four flow combinations defined by this predicate. I can't think of a practical query where we'd want such flow.

So what do you think is the correct taint model here? In my way of thinking, both the initializer value and the length affect the output memory buffer.

That's technically true, but traditionally we haven't had much luck with taint through low-bandwidth channels like strlen. It has been the source of many false positives and was recently disabled from the security.TaintTracking library.

I suggest that we don't give a taint model for memset at all unless we have an example of a query and a snapshot where it would be beneficial.

jbj · 2019-09-25T07:10:01Z

cpp/ql/src/semmle/code/cpp/models/implementations/Memset.qll

+    (
+      this.hasName("memset") or
+      this.hasName("__builtin_memset") or
+      this.hasName("FillMemory")


It looks like FillMemory is a macro.

Apparently ZeroMemory and RtlZeroMemory are macros too, but bzero (used on BSD and Mac) is a real function.

Good catch. They both resolve to... memset.
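A sketch of why matching on the macro names finds nothing: after preprocessing, only the memset call is left. The `MY_` definitions below are illustrative stand-ins modelled on the Windows SDK macros (note the destination, length, fill argument order); they are not copied from the SDK headers.

```c
#include <string.h>

/* Hypothetical stand-ins for FillMemory and ZeroMemory. */
#define MY_FillMemory(dst, len, fill) memset((dst), (fill), (len))
#define MY_ZeroMemory(dst, len)       memset((dst), 0, (len))

/* Returns 1 if both macros behaved like the memset calls they expand to. */
int fill_and_zero(void) {
    unsigned char buf[4];

    MY_FillMemory(buf, sizeof buf, 0xAB);
    int filled = buf[0] == 0xAB && buf[3] == 0xAB;

    MY_ZeroMemory(buf, sizeof buf);
    int zeroed = buf[0] == 0 && buf[3] == 0;

    return filled && zeroed;
}
```

Under this assumption, a model that covers `memset` already covers every use of the macros, while `bzero` still needs its own entry because it is a real function.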

jbj · 2019-09-25T07:15:20Z

cpp/ql/src/semmle/code/cpp/models/implementations/Memset.qll

+    this instanceof TopLevelFunction and
+    (
+      this.hasName("memset") or
+      this.hasName("__builtin_memset") or


The modern way to match these function names is to use hasGlobalName and the multi-argument hasQualifiedName predicates. The even more modern way (#1585) is not merged yet, unfortunately, so you'll have to add hasQualifiedName("std", "memset") to make sure you also match std::memset.

Done in #2027.

zlaski-semmle · 2019-10-02T18:47:43Z

To continue working on this, I would like #2027 to be merged first.

zlaski-semmle · 2019-10-15T02:17:26Z

I've committed my first stab at an IR-based query. Presently it is quite simple, checking for the presence of a LoadInstruction dominated by the MemsetCallInstruction. This is quite primitive, but amazingly the test results are not that bad.

I've tried tightening the restrictions on the LoadInstruction, e.g., by attempting to match a call argument to the MemsetCallFunction with the address operand of the LoadInstruction, but that proved to be overly restrictive -- (presumably) no LoadInstruction satisfying these criteria could be found, and so the query would flag every single memset call.

cpp/ql/test/query-tests/Likely Bugs/Memory Management/MemsetMayBeDeleted/MemsetMayBeDeleted.cpp

zlaski-semmle · 2019-10-16T21:40:51Z

Thanks for the new test cases, @geoffw0 ! They will be extremely helpful.

jbj · 2019-10-29T09:24:08Z

@zlaski-semmle I think the Jira ticket you're looking for is CPP-438: Query for pointer address wrapping.

zlaski · 2019-10-29T18:13:03Z

@zlaski-semmle I think the Jira ticket you're looking for is CPP-438: Query for pointer address wrapping.

Yes, indeed. At any rate, I moved the bits to zlaski/pointer-overflow-check.

zlaski · 2019-11-08T22:49:27Z

@dbartol This PR may contain bits suitable for yours (#2207).

geoffw0 reviewed Sep 16, 2019

geoffw0 added the C++ label Sep 16, 2019

jbj reviewed Sep 23, 2019

jbj reviewed Sep 25, 2019

jbj mentioned this pull request Sep 26, 2019

[zlaski/memset-model] QL model for memset and friends #2027

Merged

zlaski-semmle force-pushed the zlaski/cpp435 branch from f921979 to 81add37 Compare October 14, 2019 16:53

geoffw0 reviewed Oct 16, 2019

zlaski-semmle force-pushed the zlaski/cpp435 branch from ff62d60 to 304f869 Compare October 16, 2019 21:47

zlaski-semmle force-pushed the zlaski/cpp435 branch from 304f869 to 078bb9a Compare October 24, 2019 01:17

jbj mentioned this pull request Oct 28, 2019

Insecure MemSet #2207

Draft

zlaski-semmle added 11 commits October 28, 2019 12:13

[CPP-435] Initial version of query.

da5671e

[CPP-435] Slight refactor of MemsetMayBeDeleted.ql.

3dcf0a8

[CPP-435] Further tweaks to MemsetMayBeDeleted.{qhelp,ql,c,qlref}.

849f01a

[CPP-435] Add source snippets for .qhelp.

29deb07

[CPP-435] Next version of query + test cases. The test cases function as expected, but there are still false positives in, e.g., the Linux kernel.

60042ec

[CPP-435] Enhancements to test cases, help doc.

2c1c129

[CPP-435] Slight tweak in description.

3135bd8

[CPP-435] A few more test cases, for use with IR analysis.

0f80626

[CPP-435] Incremental change to MemsetMayBeDeleted.ql. Not yet usable.

0deeace

[CPP-435] Initial IR-based version of query.

905f113

[CPP-435] Re-worked IR query. Not yet functional.

7e574cf

zlaski-semmle added 4 commits October 28, 2019 12:13

[CPP-435] Incremental commit; waiting for github#2149.

1bacacd

[CPP-435] Incremental improvement of query.

b749f4b

[CPP-435] A much-improved IR query, still some false negatives.

8aee70d

[CPP-435] Replace said with that and the.

c6e18fe

zlaski-semmle force-pushed the zlaski/cpp435 branch from 078bb9a to c6e18fe Compare October 28, 2019 19:13

zlaski-semmle added 2 commits October 28, 2019 18:58

[CPP-435] Initial version.

9d6e8a5

Revert "[CPP-435] Initial version."

165db5b

It belongs in [zlaski/pointer-overflow-check] branch. This reverts commit 9d6e8a5.

jbj assigned dbartol Nov 11, 2019

kamarcum unassigned dbartol Apr 28, 2020

adityasharad changed the base branch from master to main August 14, 2020 18:35


