CPP: Add query for CWE-369: Divide By Zero. #10431

ihsinme · 2022-09-15T06:49:03Z

good afternoon.
this query should find situations where the return value from some function might become part of the denominator and result in a division by zero.
this error is widely represented in projects.

detecting: CVE-2018-18190.
it is worth noting that the detection would be much wider, for example, it would cover colors such as CVE-2021-34069 if GlobalValueNumbering could work with FieldAccess.

I also ask for help in optimizing this query, after trying to make it look laconic, I got extremely poor performance, while quite trivial changes, such as moving msg to a common place or removing divFn, lead to an even greater slowdown.
It is difficult for me to move without having an understanding of the work of the optimizer, so I ask for any advice. if you don’t have them, I’ll have to move back, moving away from the laconic look.

Thanks.

cpp/ql/src/experimental/Security/CWE/CWE-369/DivideByZeroUsingReturnValue.ql

geoffw0

Hi @ihsinme, sorry it took me a while to getting around to looking at this. I agree that there's a performance problem that needs to be fixed here - in the vast majority of projects I tried this query on, it timed out. I was able to produce a handful of promising results, however.

cpp/ql/src/experimental/Security/CWE/CWE-369/DivideByZeroUsingReturnValue.ql

ihsinme · 2022-10-10T09:48:05Z

@geoffw0, thanks for the idea, I fixed it.

it would be great if you could give advice on how to compare the speed of different sections of code, different predicates, functions, or at least different queries.
for example I have a hypothesis that in the first part of the query
this partition

Expr getMulDivOperand(Expr e1) {
  result = e1 or
  result = e1.(MulExpr).getAnOperand() or
  result = e1.(DivExpr).getLeftOperand()
}
...
getMulDivOperand(globalValueNumber(div.getRV()).getAnExpr()) = findVal and

is the slowest and I don't know if it will work faster if I change it to

Expr getMulDivOperand(Expr e1) {
  result = globalValueNumber(e1).getAnExpr() or
  result = globalValueNumber(e1).getAnExpr().(MulExpr).getAnOperand() or
  result = globalValueNumber(e1).getAnExpr().(DivExpr).getLeftOperand()
}
...
getMulDivOperand(div.getRV()) = findVal and

in general, it is very difficult for me to understand how the optimizer will behave in a given situation, but simply comparing the execution time is not the best idea (there is caching and the next run may be faster than the previous one)

Of course, I can split this query into 2 or more and they will work better, but it's not very pretty.

geoffw0 · 2022-10-10T12:03:32Z

it would be great if you could give advice on how to compare the speed of different sections of code, different predicates, functions, or at least different queries.

There's a document about troubleshooting performance issues here: https://codeql.github.com/docs/writing-codeql-queries/troubleshooting-query-performance/

That said its tricky, there tends to be quite a lot of wobble in timings and QL is optimized globally so a change can have a large and unexpected effect on something else far away. There is usually something we can ultimately 'blame' and fix, and usually its a predicate that's generating a very large set of results when it doesn't need to (such as a 'cartesian product').

You can look at the evaluator log (in particular the "Most expensive predicates" table) for some hints about where to start. You can access this after running a query in VSCode by right-clicking on the query run in the query history, "Show Evaluator Log (Summary Text)".

Of course, I can split this query into 2 or more and they will work better, but it's not very pretty.

Not necessarily. We generally only split queries when that is clearer or more useful to the end user, not for performance reasons.

geoffw0 · 2022-10-10T12:48:36Z

My intuition is that your first version of getMulDivOperand will be faster, by the way, as it creates fewer rows of data by not applying global value numbering. It does mean more computation has to be done elsewhere, however, so I could be wrong.

ihsinme · 2022-10-11T10:15:01Z

@geoffw0 i made some adjustments, please rescan the code.

tell me what this line (38s) Starting to evaluate predicate let0#shared#3/9@3ba7d58g means
especially the hash at the end

Thank you.

geoffw0 · 2022-10-11T15:20:13Z

tell me what this line (38s) Starting to evaluate predicate let0#shared#3/9@3ba7d58g means

I don't know, I saw something similar in the evaluator log myself. I'll see if I can find someone who can explain it...

geoffw0 · 2022-10-11T15:51:51Z

Hi again @ihsinme,

Apparently when a predicate is large and complicated (particularly lots of negation, aggregates or nested disjunction) the evaluator sometimes breaks it up, and the component parts don't always get given useful names. That's where let0 comes from. The #shared may mean this chunk of code is common to more than one of the component parts. The 3ba7d58g is probably not useful information to us, just a hash of some data or other.

None of this is really a problem in itself, but if you break up some of your bigger predicates you might get more directly useful output from the evaluator log.

ihsinme · 2022-11-08T08:23:17Z

Hi @geoffw0.
I noticed a few points and would like to discuss them with you.

most of all, the code slows down the use of globalValueNumber, but if we reduce it, we will significantly narrow the detection
this is a late constraint, for example, changeInt limits the function after we have calculated changeInt from the entire set of functions. It would be great to find a technique for sequential constraint.
My experiment shows better performance if I get rid of findVal by inserting globalValueNumber(fn.getACallToThisFunction()).getAnExpr() everywhere, although logic suggests it should be the other way around, this can only explain the effect of the number of functions passed to the call. if so, could you suggest, for example, how to evaluate how many function calls are transferred to one or another predicate. The top stats that VS give are not very helpful, they are tied to time and can vary greatly from launch to launch.

Thanks.

geoffw0 · 2022-11-08T11:11:40Z

most of all, the code slows down the use of globalValueNumber, but if we reduce it, we will significantly narrow the detection

Global value numbering is expensive to compute but shouldn't be causing a timeout by itself - and should be cached once its been computed for one query on any particular project.

this is a late constraint, for example, changeInt limits the function after we have calculated changeInt from the entire set of functions. It would be great to find a technique for sequential constraint.

I don't understand what you're trying to say here.

My experiment shows better performance if I get rid of findVal by inserting globalValueNumber(fn.getACallToThisFunction()).getAnExpr() everywhere, although logic suggests it should be the other way around, this can only explain the effect of the number of functions passed to the call.

It may be that in each place where findVal is used, the functions where it needs to be evaluated is narrowed down by surrounding context making the overall computation cheaper when its done in place.

Have you tried the middle road of making it:

findVal = globalValueNumber(fn.getACallToThisFunction())

and replacing each use of findVal with findVal.getAnExpr()?

could you suggest, for example, how to evaluate how many function calls are transferred to one or another predicate. The top stats that VS give are not very helpful, they are tied to time and can vary greatly from launch to launch.

Its not so simple as one thing calling another. Predicates may be inlined into other predicates, evaluated differently relative to different contexts, broken into parts by the optimizer etc. You can look through the log file for areas that took a long time (clock time) and then look for Starting to evaluate predicate ... and Tuple counts lines nearby for an idea of where the problem may be. It can be quite confusing though.

ihsinme · 2022-11-09T08:52:07Z

@geoffw0 thanks for the direction, if I can interpret the entry 5 rows. as the number of elements in the predicate argument, then that's what I need.

however, the caching issue remains, ie. the first time it returns some values, but the second time they are nulled or changed.
could you elaborate on how i can disable startup results caching.
Thanks.

geoffw0 · 2022-11-09T08:57:01Z

could you elaborate on how i can disable startup results caching.

That depends how your running your queries. In VSCode, you can press control+shift+P and select the command "CodeQL: Clear Cache". If you're running from the command line I believe there are arguments to do it.

ihsinme · 2022-11-11T08:41:21Z

@geoffw0 look please. I think I fixed the problem with speed.

ihsinme · 2022-11-20T10:29:54Z

@geoffw0 i hope you have time to look at my corrections.

in addition, we can discuss the addition of divFc.getArgument(posArg) != findVal.getAnExpr() in the region of 238-239. this adds speed, although not significantly.

geoffw0 · 2022-11-21T08:56:16Z

Yep, sorry, I'll take a look today...

geoffw0 · 2022-11-21T18:22:28Z

This is definitely an improvement. I tried the query on several projects locally, all of which timed out before the change, and now one of them finishes (the others still time out). 👍

It appears much of the remaining slowness is caused by a large evaluation of getValueOperand, I think because the optimizer is struggling to figure out which inputs are usefully constrained. We can help it by changing the three calls of the form:

a = getValueOperand(x, y, z)

to:

pragma[only_bind_into](a) = getValueOperand(x, y, z)

which tells the optimizer to figure out the right-hand-side of the equality first, then feed it into the left (and not vice-versa). Please keep use of pragmas to a minimum though, the optimizer should generally make good decisions and they can do more harm than good.

There still appears to be a problem with the way GVN::getAnExpr is used, possibly in mayBeReturnValue.

in addition, we can discuss the addition of divFc.getArgument(posArg) != findVal.getAnExpr() in the region of 238-239. this adds speed, although not significantly.

I'm not sure if this helps or makes no difference. Certainly doesn't seem to do any harm.

ihsinme · 2022-11-22T07:12:23Z

There still appears to be a problem with the way GVN::getAnExpr is used, possibly in mayBeReturnValue.

thanks for the idea i will think about it

ihsinme · 2022-11-24T14:27:04Z

hi @geoffw0. fixed predicate mayBeReturnValue and did 2 dry runs,
https://lgtm.com/query/284701396232437153/
https://lgtm.com/query/7034272461887222108/
i think now i need to re-look at your suggestion to use pragma[only_bind_into](a) = getValueOperand(x, y, z)

ihsinme · 2022-12-03T11:31:17Z

hi @geoffw0. fixed predicate mayBeReturnValue and did 2 dry runs, https://lgtm.com/query/284701396232437153/ https://lgtm.com/query/7034272461887222108/ i think now i need to re-look at your suggestion to use pragma[only_bind_into](a) = getValueOperand(x, y, z)

@geoffw0 I'm afraid you won't see it. lgtm.com will be closed.

geoffw0 · 2022-12-05T09:10:07Z

lgtm.com will be closed.

I know, I feel a bit sad about it given the amount I've used the platform. Still, code scanning lives on...

This branch is still timing out on a little over half of the runs. I feel like we're getting closer but we're not quite there.

ihsinme · 2022-12-17T21:41:19Z

I improved the query, now all projects from my collection are scanned.
@geoffw0 look please.

ihsinme · 2023-01-12T15:32:21Z

@geoffw0 take a look, please

geoffw0 · 2023-01-24T20:23:09Z

Hi @ihsinme - the new version is definitely an improvement on smaller projects but still doesn't work well at all on medium-to-large projects. It looks like the issue has to do with the liberal use of global value numbering in this query.

I have struggled to find any significant improvements without just removing some of these GVN uses. However I am not sure which ones are essential for getting good results from this query and which ones are not.

ihsinme · 2023-01-26T09:13:07Z

@geoffw0 thanks for the answer.
please give me the names of the projects where you see performance issues.
I don't see any problems in all my collections.

geoffw0 · 2023-01-26T11:32:22Z

The query finished on a little under half the projects I ran it on. Some (arbitrary) examples where it didn't finish: apple/cups, fish-shell/fish-shell and vim/vim.

ihsinme · 2023-02-06T16:43:52Z

Good afternoon @geoffw0 .
In the release, I missed one inline, so the assembly turned out to be not the same as in my tests, the projects that you sent me are not assembled, although not quickly.
in order to move this query forward, I propose to limit the condition in predicate checkConditions1, so we lose 5 simple detects from the test file, but the speed of work will increase significantly?

geoffw0 · 2023-02-07T10:02:08Z

in order to move this query forward, I propose to limit the condition in predicate checkConditions1, so we lose 5 simple detects from the test file, but the speed of work will increase significantly?

That sounds like a good tradeoff. Feel free to leave comments describing the removed logic in case we decide to repair it in future.

ihsinme · 2023-02-12T19:00:45Z

@geoffw0 could you tell me if the current version of the request meets the requirements for speed.
if it is, then i will fix the test file.
If not, I'll try to limit the request more.

Thank you.

geoffw0 · 2023-02-17T13:21:48Z

Hi @ihsinme , yes, this is performing nicely now and completed on all the projects I tried it on! Please do update the tests then I'll run the PR checks.

ihsinme · 2023-02-19T18:44:35Z

@geoffw0 i fixed the test suite.

github-actions · 2023-02-20T12:55:45Z

QHelp previews:

cpp/ql/src/experimental/Security/CWE/CWE-369/DivideByZeroUsingReturnValue.qhelp

Divide by zero using return value

Possible cases of division by zero when using the return value from functions.

Example

The following example shows the use of a function with an error when using the return value and without an error.

...
  a = getc(f);
  if (a < 123) ret = 123/a; // BAD
...
  if (a != 0) ret = 123/a; // GOOD
...

References

CERT Coding Standard: INT33-C. Ensure that division and remainder operations do not result in divide-by-zero errors - SEI CERT C Coding Standard - Confluence.
Common Weakness Enumeration: CWE-369.

geoffw0

Tests passed, performance is good, I'm happy with this PR!

Thank you for your contribution.

ihsinme · 2023-03-03T10:41:05Z

@geoffw0 This PR is not merged, can I do anything?

geoffw0 · 2023-03-03T10:42:09Z

Sorry, I had approved it, I thought you could merge it.

Merged.

create new branchihsinme-patch-111 in fork

50105c1

ihsinme requested a review from a team as a code owner September 15, 2022 06:49

github-actions bot added C++ documentation labels Sep 15, 2022

ihsinme mentioned this pull request Sep 26, 2022

CPP: Add query for CWE-369: Divide By Zero. github/securitylab#705

Closed

2 tasks

github-advanced-security bot found potential problems Oct 3, 2022

View reviewed changes

Update DivideByZeroUsingReturnValue.ql

392c948

geoffw0 reviewed Oct 7, 2022

View reviewed changes

cpp/ql/src/experimental/Security/CWE/CWE-369/DivideByZeroUsingReturnValue.ql Outdated Show resolved Hide resolved

Update DivideByZeroUsingReturnValue.ql

01b357e

Update DivideByZeroUsingReturnValue.ql

3954da5

Update DivideByZeroUsingReturnValue.ql

0fb1ded

Update DivideByZeroUsingReturnValue.ql

5ceab40

ihsinme added 3 commits December 18, 2022 00:34

Update DivideByZeroUsingReturnValue.ql

c790b0f

Update DivideByZeroUsingReturnValue.expected

945cdef

Update test.cpp

a2836dc

Update DivideByZeroUsingReturnValue.ql

065ca3c

ihsinme added 3 commits February 19, 2023 21:41

Update DivideByZeroUsingReturnValue.ql

49af5ec

Update test.cpp

54acbf7

Update DivideByZeroUsingReturnValue.expected

213abc6

geoffw0 added the ready-for-internal-ci label Feb 20, 2023

qlucie bot removed the ready-for-internal-ci label Feb 20, 2023

geoffw0 approved these changes Feb 20, 2023

View reviewed changes

geoffw0 merged commit 7b596f4 into github:main Mar 3, 2023

CPP: Add query for CWE-369: Divide By Zero. #10431

CPP: Add query for CWE-369: Divide By Zero. #10431

Uh oh!

Conversation

ihsinme commented Sep 15, 2022

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

geoffw0 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

ihsinme commented Oct 10, 2022

Uh oh!

geoffw0 commented Oct 10, 2022

Uh oh!

geoffw0 commented Oct 10, 2022

Uh oh!

ihsinme commented Oct 11, 2022

Uh oh!

geoffw0 commented Oct 11, 2022

Uh oh!

geoffw0 commented Oct 11, 2022

Uh oh!

ihsinme commented Nov 8, 2022

Uh oh!

geoffw0 commented Nov 8, 2022

Uh oh!

ihsinme commented Nov 9, 2022

Uh oh!

geoffw0 commented Nov 9, 2022

Uh oh!

ihsinme commented Nov 11, 2022

Uh oh!

ihsinme commented Nov 20, 2022

Uh oh!

geoffw0 commented Nov 21, 2022

Uh oh!

geoffw0 commented Nov 21, 2022

Uh oh!

ihsinme commented Nov 22, 2022

Uh oh!

ihsinme commented Nov 24, 2022

Uh oh!

ihsinme commented Dec 3, 2022

Uh oh!

geoffw0 commented Dec 5, 2022

Uh oh!

ihsinme commented Dec 17, 2022

Uh oh!

ihsinme commented Jan 12, 2023

Uh oh!

geoffw0 commented Jan 24, 2023

Uh oh!

ihsinme commented Jan 26, 2023

Uh oh!

geoffw0 commented Jan 26, 2023

Uh oh!

ihsinme commented Feb 6, 2023

Uh oh!

geoffw0 commented Feb 7, 2023

Uh oh!

ihsinme commented Feb 12, 2023

Uh oh!

geoffw0 commented Feb 17, 2023

Uh oh!

ihsinme commented Feb 19, 2023

Uh oh!

github-actions bot commented Feb 20, 2023

Divide by zero using return value

Example

References

Uh oh!

geoffw0 left a comment