Skip to content

[CPP-370] Non-constant format arguments to printf and friends #1251

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 33 commits into from
Jul 1, 2019

Conversation

zlaski-semmle
Copy link
Contributor

No description provided.

@zlaski-semmle zlaski-semmle requested a review from a team as a code owner April 15, 2019 20:21
@zlaski-semmle zlaski-semmle self-assigned this Apr 15, 2019
@zlaski-semmle
Copy link
Contributor Author

Before I go any further, I have a question regarding cpp/ql/test/query-tests/Likely Bugs/Format/NonConstantFormat/NonConstantFormat.c. The test flags lines 30 and 41 as being problematic, but what about lines 34, 36, 38 and 40? They all should also be flagged, in my opinion.

In cpp/ql/src/Likely Bugs/Format/NonConstantFormat.ql, the mumblegettext functions are being treated specially for some reason. (Since their use is to translate phrases to foreign languages, it is impossible to guarantee that their output would make a suitable format string.) Then there is the mysterious _ (underscore) macro. Are we not dealing with preprocessed text?

I'm basically trying to obtain a "baseline" as to what is and is not correct behavior for this query.

@rdmarsh2
Copy link
Contributor

Since their use is to translate phrases to foreign languages, it is impossible to guarantee that their output would make a suitable format string.

On the other hand, the translation of a valid format string will almost always be a valid format string; producing a result everywhere that a translated string is formatted will mean that the vast majority of results in a project using gettext are false positives. The msgfmt command that generates gettext's translation files can check that the translated string is usable with the same arguments as the untranslated string.

It looks like the _(X) macro is commonly #defined to gettext((X)), which is why it's excluded here.

@rdmarsh2
Copy link
Contributor

We don't include any preprocessor logic as AST elements, but we do capture information about what preprocessor directives were encountered, what macro expansions occurred, and what AST elements were partially or completely generated by a particular macro expansion. Take a look at https://help.semmle.com/qldoc/cpp/semmle/code/cpp/Macro.qll/module.Macro.html for the classes and predicates involved in macro handling.

@zlaski-semmle
Copy link
Contributor Author

So it looks like we're baking in a lot of assumptions regarding ...gettext and msgfmt. Is this what we want moving forward? Even if we somehow convince ourselves that ...gettext() produce valid format strings, the strings are not constant, which is what this query is supposed to catch.

@rdmarsh2
Copy link
Contributor

The main danger this rule is intended to prevent is that two strings with different expected format parameters could flow to the same formatting function call, and therefore one of them is provided the wrong set of parameters. If the gettext toolchain is used properly, that situation is prevented. I think it's better to bake those assumptions in than to produce a huge number of results on every GNU project when we know that the vast majority of them will be false positives.

@geoffw0
Copy link
Contributor

geoffw0 commented Apr 16, 2019

I have a question regarding cpp/ql/test/query-tests/Likely Bugs/Format/NonConstantFormat/NonConstantFormat.c. The test flags lines 30 and 41 as being problematic, but what about lines 34, 36, 38 and 40? They all should also be flagged, in my opinion.

These are deliberate exclusions to keep the 'false positive' (or perhaps I should say 'noisy result') rates down. See https://jira.semmle.com/browse/ODASA-5357 (I think there was a specific one for _() as well but I can't find it right now). These are not results which are technically incorrect (the format string isn't constant) and they're not guaranteed to be safe (gettext could do anything) - but in practice flagging these with alerts leads to hundreds of unhelpful results in projects that use these patterns.

Since their use is to translate phrases to foreign languages, it is impossible to guarantee that their output would make a suitable format string.

A translated string is likely to be loaded from some kind of data store which the developers have similar control over to the source code (admittedly, the truth of this claim will vary) - hence it may be considered 'safe' to a similar degree as a constant string written in the source code.

I'm basically trying to obtain a "baseline" as to what is and is not correct behavior for this query.

I find it helpful to imagine that I 'own' the source code in question and come across the alert on it - am I likely to change the code as a result of seeing this, or just moan that those LGTM developers don't understand why my code is good? Furthermore on LGTM we bias towards high accuracy (low false positives) rather than widest coverage (low false negatives), as frankly most people are pushed for time and would rather look at a smaller set of higher quality alerts - though for critical security queries there's some flexibility here.

geoffw0
geoffw0 previously approved these changes Apr 16, 2019
Copy link
Contributor

@geoffw0 geoffw0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your qldoc change LGTM.

@zlaski-semmle
Copy link
Contributor Author

Thanks guys for the ...gettext() feedback. I have seen the light :-)

@geoffw0
Copy link
Contributor

geoffw0 commented Apr 17, 2019

Regarding this PR, are you planning to push any more changes or shall I merge it now?

@zlaski-semmle
Copy link
Contributor Author

@geoffw0, there will be more changes once I figure out how to implement them. :-)

@zlaski-semmle
Copy link
Contributor Author

So I've pushed my initial attempt at using the global DataFlow library. There are now 2 areas in which I need assistance:
(1) Analysis of NonConstantFormat.c and test.cpp to determine which cases should or should not get flagged by the query. Whereas the current results for NonConstantFormat.c look reasonable, I suspect that test.cpp contains both false positives and false negatives;
(2) How to implement the isAdditionalFlowStep(Node, Node) predicate to insert additional edges into the data flow graph. Here I need to determine what is missing to begin with, and then what an implementation of isAdditionalFlowStep(Node, Node) might look like. I've grepped the ql tree but can't really find a usable example.

Copy link
Contributor

@jbj jbj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll continue reviewing when I can read the .expected file and when we've settled the big-picture discussions on what this query should do.

// For the following `...gettext` functions, we assume that
// all translations preserve the type and order of `%` specifiers
// (and hence are safe to use as format strings). This is
// assumption is hard-coded into the query.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this comment would be more helpful in the query itself. In any case, please tweak the formatting so the line numbers in the test don't get shifted down. Otherwise it's too hard to review NonConstantFormat.expected, and that's the most important file to review.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm about to push a "new" version of NonConstantFormat.expected with the corrected line numbers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new version of NonConstantFormat.expected that is in sync with NonConstantFormat.c and test.cpp has been pushed. Please let me know if you're still encountering problems.

@jbj
Copy link
Contributor

jbj commented Apr 24, 2019

Whereas the current results for NonConstantFormat.c look reasonable, I suspect that test.cpp contains both false positives and false negatives;

I looked through test.cpp and found myself agreeing with its annotations of OK and NOT OK. Are there any that you don't agree with? I hope the .expected file matches the annotations in test.cpp, but I didn't actually check that.

@zlaski-semmle
Copy link
Contributor Author

I think the annotations look reasonable (though the false positives would be nice to fix, obviously). But the NonConstantFormat.ql query is still a work in progress and so the .expected output does not contain what is desired.

I need to figure out how to add additional edges to the dataflow graph via isAdditionalFlowStep(). At present, I do not know how to do so (and could not find a suitable example when greping the source tree) but will consult with @adityasharad about it.

@zlaski-semmle
Copy link
Contributor Author

zlaski-semmle commented Apr 24, 2019

On second thought, I do not see why using const_wash in test.cpp should lead to positives (as it signals programmer intent). What do you all think?

@jbj
Copy link
Contributor

jbj commented Apr 24, 2019

On second thought, I do not see why using const_wash in test.cpp should lead to positives (as it signals programmer intent). What do you all think?

You mean that if a non-constant format string is cast to const, then it's fine? Or that it's fine if it passes through a function even if that function is not gettext?

I expect that if you get the interprocedural data flow library working, then the const_wash FP will go away. It arises because the query does too simplistic modelling of parameters and return values, and that's exactly what the data flow library does right.

@zlaski-semmle
Copy link
Contributor Author

You mean that if a non-constant format string is cast to const, then it's fine? Or that it's fine if it passes through a function even if that function is not gettext?

The latter.

printf(_("Hello, World\n")); // OK
{
char hello[] = "hello, World\n";
hello[0] = 'H';
printf(hello); // NOT OK
printf(_(hello)); // NOT OK
printf(_(hello)); // OK
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with the way you think here: everything that comes out of gettext is safe (for the purpose of this query).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The query actually goes a bit further and assumes that _(...) is always OK, regardless of where the _ macro points at. Do you think this is overly lenient?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could look into adjusting it, but I'd prefer keeping this PR focused on the data flow change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok

|
// we disallow parameters, since they may be bound to unsafe arguments
// at various call sites.
not v instanceof Parameter and source.asExpr() instanceof StringLiteral
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Joining the two conjuncts on a single line makes it look like and binds tighter than not when in fact the precedence is the other way around.

// we disallow parameters, since they may be bound to unsafe arguments
// at various call sites.
not v instanceof Parameter and source.asExpr() instanceof StringLiteral
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This predicate doesn't relate source and sink, so it computes a Cartesian product. Let's discuss how to fix it in the next meeting.

@zlaski-semmle
Copy link
Contributor Author

I just checked in a rewrite of NonConstantFormat.ql using the taint tracking library.

Feel free to review the query as you like, because it is almost certainly not its final version.

I've run a 66-project query (https://lgtm.com/query/1255083408835090613/) and see a lot of hits in dotnet/coreclr (among others). I think what we're seeing is a cascade of function calls terminating in a call to printf. So, for example, function a() has a call b(get_a_string_func()), b(src) calls c(src), and c(src) calls printf(src). In this situation, the query flags the call site b(get_a_string_func()).

From a data flow perspective, I believe that the query is correct, but its results are confusing to say the least. What do you guys think should be done? We could just check for string literals at the calls to printf, but that would not require any data flow at all!

@zlaski-semmle
Copy link
Contributor Author

Just discovered that I broke test/query-tests/Security/CWE/CWE-134/semmle/consts/NonConstantFormat.qlref, which references the same NonConstantFormat.ql. The new .actual file differs from the .expected, though both files are erroneous in many respects.

@geoffw0
Copy link
Contributor

geoffw0 commented May 9, 2019

This behaviour sounds fine to me, and most of the results look OK as well at a glance. Could you call out one in particular you're not happy with?

@jbj
Copy link
Contributor

jbj commented May 10, 2019

From a data flow perspective, I believe that the query is correct, but its results are confusing to say the least.

I agree that the results from this query can be confusing, but is this PR making it much worse? It would be great to add path explanations to this query, but can that be deferred to a follow-up PR?

@jbj
Copy link
Contributor

jbj commented May 27, 2019

It's my impression you're now happy with the test suite results relative to what we had before. Then the next step is to test the query on our set of LGTM test projects. If it's hard to get an overview of what's changed, you can try to combine the old and the new queries into two "diff queries", where one shows only the results that were gained relative to the old query, and the other shows only the results that were lost.

@jbj jbj added the C++ label Jun 2, 2019
@zlaski-semmle
Copy link
Contributor Author

Before: https://lgtm.com/query/3266353611601973701/
After: https://lgtm.com/query/2338642181633737976/

At first glance, it appears that the DataFlow-based query is strictly more sensitive than the old query, but I will investigate in more detail.

@zlaski-semmle
Copy link
Contributor Author

The new query (https://lgtm.com/query/2338642181633737976/) generates lots of new alerts (esp. for dotnet/coreclr) and most appear to be true positives. However, there are a few false negatives as well. Not sure if this is worth pursuing further, since the query results are fuzzy/noisy to begin with.

Copy link
Contributor

@jbj jbj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pleased to see that the changes lead to new good results for this query. However, I sampled the first ~10 differences between the old and the new queries, and I think there are too many new FPs and FNs. Please try to address some of them so we can confidently say that this query is better than the previous version.

class blab {
void out1(void) {
char *fmt = (char *)__builtin_alloca(10);
diagnostic(fmt); // GOOD
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Surely this is BAD, not GOOD? It's flagged by the query, so if you think it's GOOD, then it's at least GOOD [FALSE POSITIVE]. But I think it's BAD since the only reason to allocate a writeable buffer is to write non-constant data to it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you meant BAD [FALSE POSITIVE] in the comment above. But you're right, this is just plain BAD.

@@ -18,7 +18,7 @@ extern "C" int snprintf ( char * s, int n, const char * format, ... );
struct A {
void do_print(const char *fmt0) {
char buf[32];
snprintf(buf, 32, fmt0); // BAD [should detect at top-most call]
snprintf(buf, 32, fmt0); // GOOD
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean BAD? This line is still flagged by the query. We've been a bit back and forth about formats from parameters being passed to non-vprintf functions, and I think you've convinced me that it's BAD.

Overall, it looks like this file has come out of sync with its entries in NonConstantFormat.expected. Please go through this file (and the others) to ensure that the comment starts with // BAD if and only if the line is listed in the expected file.

Copy link
Contributor Author

@zlaski-semmle zlaski-semmle Jun 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had the opposite recollection, i.e., that we should leave things as they are (wrt handling vprintf-like and non-vprintf-like calls). Also, is it true that // BAD always imply an alert being signaled? In nested.cpp above, I (originally) marked line 21 as // BAD since the alert was in the wrong place (i.e., a false positive), but I also marked line 42 as // BAD since that is where the alert should have gone (a false negative). Am I getting my nomenclature mixed up?

Copy link
Contributor Author

@zlaski-semmle zlaski-semmle Jun 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will tentatively mark line 21 as // BAD [FALSE POSITIVE] and line 42 as // GOOD [NOT DETECTED]. This will adhere to your rule that // BAD iff the line gets flagged.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please go through this file (and the others) to ensure that the comment starts with // BAD if and only if the line is listed in the expected file.

What I wrote there did not make sense, sorry. Let me try again.

We write // BAD to mean that the format call ought to be flagged by a perfect query. If it's not actually flagged in the expected file, we add [NOT DETECTED]. We write // GOOD to mean that the format call should not be flagged by a perfect query. If it's actually flagged in the expected file, we add [FALSE POSITIVE].

Copy link
Contributor Author

@zlaski-semmle zlaski-semmle Jun 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that makes more sense, thanks. I still think we should use // FLAGGED and // NOT FLAGGED, but that's a discussion for another day. :-)

@zlaski-semmle
Copy link
Contributor Author

zlaski-semmle commented Jun 5, 2019

For the following code fragment

/*****************************************************************************
 *
 *  Display an ARM64 condition code for the conditional instructions
 */
void emitter::emitDispCond(insCond cond)
{
    const static char* armCond[16] = {"eq", "ne", "hs", "lo", "mi", "pl", "vs", "vc",
                                      "hi", "ls", "ge", "lt", "gt", "le", "AL", "NV"}; // The last two are invalid
    unsigned imm = (unsigned)cond;
    assert((0 <= imm) && (imm < ArrLen(armCond)));
    printf(armCond[imm]);
}

the new query generates an alert whereas the old query did not. The programmer clearly knows what they're doing, so the question really is whether it is acceptable to have a false positive here.

Another problem I'm running into is that I absolutely cannot reproduce this result in a smaller file containing just the function above (plus all the dependencies). (The original function is found in the dotnet/coreclr project.)

@zlaski-semmle
Copy link
Contributor Author

After my latest commit (which took care of handling _ macros), the result differences with the original version are as follows:

"small" projects:
==========================
lastpass/lastpass-cli: 1 true positive
curl/curl: 1 false negative
tesseract-ocr/tesseract: 1 false negative (which is identical to two other true positives!)
Blizzard/s2client-api: 1 negative (true/false?)
musescore/MuseScore: 3 true positives
systemd/systemd: 1 true positive
domoticz/domoticz: 1 true positive
cxong/cdogs-sdl: 1 true positive
ntpsec/ntpsec: 1 false negative
pgsql-jp/jpug-doc: 1 false negative
fawkesrobotics/fawkes: 3 negatives (true/false?)
nishadg246/pybullet-play: 3 true positives
numpy/numpy: 3 true positives
protocolbuffers/protobuf: 1 false negative

"big" projects:
==========================
dotnet/coreclr: 231 -> 363
libretro/RetroArch: 59 -> 54
torvalds/linux: 16 -> 15
wireshark/wireshark: 41 -> 34
coreutils/coreutils: 46 -> 85
ManaPlus/ManaPlus: 12 -> 5
bloomberg/comdb2: 48 -> 39
microsoft/ChakraCore: 16 -> 18
vim/vim: 43 -> 9
FFmpeg/FFmpeg: 4 -> 7

I have not yet analyzed the "big" projects, but if the smaller projects are any indication, I think I would find a smattering of true positives and false negatives.

Can anyone think of how to attack the false negative problem?

@jbj
Copy link
Contributor

jbj commented Jun 12, 2019

I'd like to help with the investigation of those false negatives, but it's unlikely that I'll have time this week.

@zlaski-semmle
Copy link
Contributor Author

I have some good news and some bad news. The good news is that most of the false negatives listed above are true negatives. (The one exception involves aliasing writes that the TaintTracking library may possibly not support yet -- see https://jira.semmle.com/browse/CPP-394).

The bad news is that we're taking a performance hit. I don't have exact numbers, just anecdotal experience with running the new vs. old queries inside of Eclipse.

@zlaski-semmle
Copy link
Contributor Author

zlaski-semmle commented Jun 13, 2019

Some dotnet/coreclr numbers: 23 FNs and 0 TNs for the old query, 7 FPs and 148 TPs for the new query. It is possible that some FNs morphed into TPs due to changes in where the alert was signaled.

As of yet, I did not analyze the vast majority of alerts that are common to both queries; there are probably some FPs there.

@zlaski-semmle
Copy link
Contributor Author

@zlaski-semmle
Copy link
Contributor Author

For coreutils/coreutils, we have (delta) 12 FNs and 2 TNs for the old query and 0 FPs and 53 TPs for the new query.

So a pattern is emerging in which we are finding lots of new good results while losing some existing results.

@zlaski-semmle
Copy link
Contributor Author

vim/vim: 30 FNs and 4 TNs in the old query (no new results in the new query). So we've basically cut down on the noise.

@zlaski-semmle
Copy link
Contributor Author

I think the query needs a bit more tweaking. We treat assignment as a non-const operation (which it is), but I think we should allow it if the RHS is a string literal. I keep seeing the idiom whereby a (const) char * is declared without an initializer, one of several values is assigned to it in subsequent control flow, and the result is finally used in a printf-like call.

@zlaski-semmle
Copy link
Contributor Author

Modified new query run: https://lgtm.com/query/9048479769501953958/

@yh-semmle yh-semmle removed the request for review from a team June 15, 2019 00:06
jbj added 2 commits June 20, 2019 14:00
To speed up the taint analysis in `NonConstantFormat.ql` and to remove
FPs that were due to taint spreading from `i` to `a[i]`, this commit
stops the taint tracking in `NonConstantFormat.ql` at every node that
could not possibly contain a string.

I tested performance on Wireshark, and it's fine. Pulling out the
`isSanitizerNode` prevented `isSanitizer` from turning into four
half-slow RA predicates due to both CPE and `#antijoin_rhs`
transformations happening.
@jbj
Copy link
Contributor

jbj commented Jun 20, 2019

I've opened zlaski-semmle#1 as a PR against this PR. It should fix the array access problems.

C++: NonConstantFormat taint only for string types
@zlaski-semmle
Copy link
Contributor Author

zlaski-semmle commented Jun 29, 2019

After merging in jbj/NonConstantFormat-ArrayExpr , we have improved alert quality:

dotnet/coleclr: 316->300, of which 4 TP and 20 FN.
torvalds/linux: 15->14, 1 FN
bloomberg/comdb2: 38->37, 1 FN
FFmpeg/FFmpeg: 7->4, 3 FN

@zlaski-semmle
Copy link
Contributor Author

Created a separate PR - #1533 - to deal with the 1.22 change note describing the improved query.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants