Skip to content

CPP: Add query for CWE-369: Divide By Zero. #10431

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
Mar 3, 2023

Conversation

ihsinme
Copy link
Contributor

@ihsinme ihsinme commented Sep 15, 2022

good afternoon.
this query should find situations where the return value from some function might become part of the denominator and result in a division by zero.
this error is widely represented in projects.

detecting: CVE-2018-18190.
it is worth noting that the detection would be much wider, for example, it would cover colors such as CVE-2021-34069 if GlobalValueNumbering could work with FieldAccess.

I also ask for help in optimizing this query, after trying to make it look laconic, I got extremely poor performance, while quite trivial changes, such as moving msg to a common place or removing divFn, lead to an even greater slowdown.
It is difficult for me to move without having an understanding of the work of the optimizer, so I ask for any advice. if you don’t have them, I’ll have to move back, moving away from the laconic look.

Thanks.

Copy link
Contributor

@geoffw0 geoffw0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @ihsinme, sorry it took me a while to getting around to looking at this. I agree that there's a performance problem that needs to be fixed here - in the vast majority of projects I tried this query on, it timed out. I was able to produce a handful of promising results, however.

@ihsinme
Copy link
Contributor Author

ihsinme commented Oct 10, 2022

@geoffw0, thanks for the idea, I fixed it.

it would be great if you could give advice on how to compare the speed of different sections of code, different predicates, functions, or at least different queries.
for example I have a hypothesis that in the first part of the query
this partition

Expr getMulDivOperand(Expr e1) {
  result = e1 or
  result = e1.(MulExpr).getAnOperand() or
  result = e1.(DivExpr).getLeftOperand()
}
...
getMulDivOperand(globalValueNumber(div.getRV()).getAnExpr()) = findVal and

is the slowest and I don't know if it will work faster if I change it to

Expr getMulDivOperand(Expr e1) {
  result = globalValueNumber(e1).getAnExpr() or
  result = globalValueNumber(e1).getAnExpr().(MulExpr).getAnOperand() or
  result = globalValueNumber(e1).getAnExpr().(DivExpr).getLeftOperand()
}
...
getMulDivOperand(div.getRV()) = findVal and

in general, it is very difficult for me to understand how the optimizer will behave in a given situation, but simply comparing the execution time is not the best idea (there is caching and the next run may be faster than the previous one)

Of course, I can split this query into 2 or more and they will work better, but it's not very pretty.

@geoffw0
Copy link
Contributor

geoffw0 commented Oct 10, 2022

it would be great if you could give advice on how to compare the speed of different sections of code, different predicates, functions, or at least different queries.

There's a document about troubleshooting performance issues here: https://codeql.github.com/docs/writing-codeql-queries/troubleshooting-query-performance/

That said its tricky, there tends to be quite a lot of wobble in timings and QL is optimized globally so a change can have a large and unexpected effect on something else far away. There is usually something we can ultimately 'blame' and fix, and usually its a predicate that's generating a very large set of results when it doesn't need to (such as a 'cartesian product').

You can look at the evaluator log (in particular the "Most expensive predicates" table) for some hints about where to start. You can access this after running a query in VSCode by right-clicking on the query run in the query history, "Show Evaluator Log (Summary Text)".

Of course, I can split this query into 2 or more and they will work better, but it's not very pretty.

Not necessarily. We generally only split queries when that is clearer or more useful to the end user, not for performance reasons.

@geoffw0
Copy link
Contributor

geoffw0 commented Oct 10, 2022

My intuition is that your first version of getMulDivOperand will be faster, by the way, as it creates fewer rows of data by not applying global value numbering. It does mean more computation has to be done elsewhere, however, so I could be wrong.

@ihsinme
Copy link
Contributor Author

ihsinme commented Oct 11, 2022

@geoffw0 i made some adjustments, please rescan the code.

tell me what this line (38s) Starting to evaluate predicate let0#shared#3/9@3ba7d58g means
especially the hash at the end

Thank you.

@geoffw0
Copy link
Contributor

geoffw0 commented Oct 11, 2022

tell me what this line (38s) Starting to evaluate predicate let0#shared#3/9@3ba7d58g means

I don't know, I saw something similar in the evaluator log myself. I'll see if I can find someone who can explain it...

@geoffw0
Copy link
Contributor

geoffw0 commented Oct 11, 2022

Hi again @ihsinme,

Apparently when a predicate is large and complicated (particularly lots of negation, aggregates or nested disjunction) the evaluator sometimes breaks it up, and the component parts don't always get given useful names. That's where let0 comes from. The #shared may mean this chunk of code is common to more than one of the component parts. The 3ba7d58g is probably not useful information to us, just a hash of some data or other.

None of this is really a problem in itself, but if you break up some of your bigger predicates you might get more directly useful output from the evaluator log.

@ihsinme
Copy link
Contributor Author

ihsinme commented Nov 8, 2022

Hi @geoffw0.
I noticed a few points and would like to discuss them with you.

  1. most of all, the code slows down the use of globalValueNumber, but if we reduce it, we will significantly narrow the detection
  2. this is a late constraint, for example, changeInt limits the function after we have calculated changeInt from the entire set of functions. It would be great to find a technique for sequential constraint.
  3. My experiment shows better performance if I get rid of findVal by inserting globalValueNumber(fn.getACallToThisFunction()).getAnExpr() everywhere, although logic suggests it should be the other way around, this can only explain the effect of the number of functions passed to the call. if so, could you suggest, for example, how to evaluate how many function calls are transferred to one or another predicate. The top stats that VS give are not very helpful, they are tied to time and can vary greatly from launch to launch.

Thanks.

@geoffw0
Copy link
Contributor

geoffw0 commented Nov 8, 2022

most of all, the code slows down the use of globalValueNumber, but if we reduce it, we will significantly narrow the detection

Global value numbering is expensive to compute but shouldn't be causing a timeout by itself - and should be cached once its been computed for one query on any particular project.

this is a late constraint, for example, changeInt limits the function after we have calculated changeInt from the entire set of functions. It would be great to find a technique for sequential constraint.

I don't understand what you're trying to say here.

My experiment shows better performance if I get rid of findVal by inserting globalValueNumber(fn.getACallToThisFunction()).getAnExpr() everywhere, although logic suggests it should be the other way around, this can only explain the effect of the number of functions passed to the call.

It may be that in each place where findVal is used, the functions where it needs to be evaluated is narrowed down by surrounding context making the overall computation cheaper when its done in place.

Have you tried the middle road of making it:

findVal = globalValueNumber(fn.getACallToThisFunction())

and replacing each use of findVal with findVal.getAnExpr()?

could you suggest, for example, how to evaluate how many function calls are transferred to one or another predicate. The top stats that VS give are not very helpful, they are tied to time and can vary greatly from launch to launch.

Its not so simple as one thing calling another. Predicates may be inlined into other predicates, evaluated differently relative to different contexts, broken into parts by the optimizer etc. You can look through the log file for areas that took a long time (clock time) and then look for Starting to evaluate predicate ... and Tuple counts lines nearby for an idea of where the problem may be. It can be quite confusing though.

@ihsinme
Copy link
Contributor Author

ihsinme commented Nov 9, 2022

@geoffw0 thanks for the direction, if I can interpret the entry 5 rows. as the number of elements in the predicate argument, then that's what I need.

however, the caching issue remains, ie. the first time it returns some values, but the second time they are nulled or changed.
could you elaborate on how i can disable startup results caching.
Thanks.

@geoffw0
Copy link
Contributor

geoffw0 commented Nov 9, 2022

could you elaborate on how i can disable startup results caching.

That depends how your running your queries. In VSCode, you can press control+shift+P and select the command "CodeQL: Clear Cache". If you're running from the command line I believe there are arguments to do it.

@ihsinme
Copy link
Contributor Author

ihsinme commented Nov 11, 2022

@geoffw0 look please. I think I fixed the problem with speed.

@ihsinme
Copy link
Contributor Author

ihsinme commented Nov 20, 2022

@geoffw0 i hope you have time to look at my corrections.

in addition, we can discuss the addition of divFc.getArgument(posArg) != findVal.getAnExpr() in the region of 238-239. this adds speed, although not significantly.

@geoffw0
Copy link
Contributor

geoffw0 commented Nov 21, 2022

Yep, sorry, I'll take a look today...

@geoffw0
Copy link
Contributor

geoffw0 commented Nov 21, 2022

This is definitely an improvement. I tried the query on several projects locally, all of which timed out before the change, and now one of them finishes (the others still time out). 👍


It appears much of the remaining slowness is caused by a large evaluation of getValueOperand, I think because the optimizer is struggling to figure out which inputs are usefully constrained. We can help it by changing the three calls of the form:

a = getValueOperand(x, y, z)

to:

pragma[only_bind_into](a) = getValueOperand(x, y, z)

which tells the optimizer to figure out the right-hand-side of the equality first, then feed it into the left (and not vice-versa). Please keep use of pragmas to a minimum though, the optimizer should generally make good decisions and they can do more harm than good.


There still appears to be a problem with the way GVN::getAnExpr is used, possibly in mayBeReturnValue.


in addition, we can discuss the addition of divFc.getArgument(posArg) != findVal.getAnExpr() in the region of 238-239. this adds speed, although not significantly.

I'm not sure if this helps or makes no difference. Certainly doesn't seem to do any harm.

@ihsinme
Copy link
Contributor Author

ihsinme commented Nov 22, 2022

There still appears to be a problem with the way GVN::getAnExpr is used, possibly in mayBeReturnValue.

thanks for the idea i will think about it

@ihsinme
Copy link
Contributor Author

ihsinme commented Nov 24, 2022

hi @geoffw0. fixed predicate mayBeReturnValue and did 2 dry runs,
https://lgtm.com/query/284701396232437153/
https://lgtm.com/query/7034272461887222108/
i think now i need to re-look at your suggestion to use pragma[only_bind_into](a) = getValueOperand(x, y, z)

@ihsinme
Copy link
Contributor Author

ihsinme commented Dec 3, 2022

hi @geoffw0. fixed predicate mayBeReturnValue and did 2 dry runs, https://lgtm.com/query/284701396232437153/ https://lgtm.com/query/7034272461887222108/ i think now i need to re-look at your suggestion to use pragma[only_bind_into](a) = getValueOperand(x, y, z)

@geoffw0 I'm afraid you won't see it. lgtm.com will be closed.

@geoffw0
Copy link
Contributor

geoffw0 commented Dec 5, 2022

lgtm.com will be closed.

I know, I feel a bit sad about it given the amount I've used the platform. Still, code scanning lives on...


This branch is still timing out on a little over half of the runs. I feel like we're getting closer but we're not quite there.

@ihsinme
Copy link
Contributor Author

ihsinme commented Dec 17, 2022

I improved the query, now all projects from my collection are scanned.
@geoffw0 look please.

@ihsinme
Copy link
Contributor Author

ihsinme commented Jan 12, 2023

@geoffw0 take a look, please

@geoffw0
Copy link
Contributor

geoffw0 commented Jan 24, 2023

Hi @ihsinme - the new version is definitely an improvement on smaller projects but still doesn't work well at all on medium-to-large projects. It looks like the issue has to do with the liberal use of global value numbering in this query.

I have struggled to find any significant improvements without just removing some of these GVN uses. However I am not sure which ones are essential for getting good results from this query and which ones are not.

@ihsinme
Copy link
Contributor Author

ihsinme commented Jan 26, 2023

@geoffw0 thanks for the answer.
please give me the names of the projects where you see performance issues.
I don't see any problems in all my collections.

@geoffw0
Copy link
Contributor

geoffw0 commented Jan 26, 2023

The query finished on a little under half the projects I ran it on. Some (arbitrary) examples where it didn't finish: apple/cups, fish-shell/fish-shell and vim/vim.

@ihsinme
Copy link
Contributor Author

ihsinme commented Feb 6, 2023

Good afternoon @geoffw0 .
In the release, I missed one inline, so the assembly turned out to be not the same as in my tests, the projects that you sent me are not assembled, although not quickly.
in order to move this query forward, I propose to limit the condition in predicate checkConditions1, so we lose 5 simple detects from the test file, but the speed of work will increase significantly?

@geoffw0
Copy link
Contributor

geoffw0 commented Feb 7, 2023

in order to move this query forward, I propose to limit the condition in predicate checkConditions1, so we lose 5 simple detects from the test file, but the speed of work will increase significantly?

That sounds like a good tradeoff. Feel free to leave comments describing the removed logic in case we decide to repair it in future.

@ihsinme
Copy link
Contributor Author

ihsinme commented Feb 12, 2023

@geoffw0 could you tell me if the current version of the request meets the requirements for speed.
if it is, then i will fix the test file.
If not, I'll try to limit the request more.

Thank you.

@geoffw0
Copy link
Contributor

geoffw0 commented Feb 17, 2023

Hi @ihsinme , yes, this is performing nicely now and completed on all the projects I tried it on! Please do update the tests then I'll run the PR checks.

@ihsinme
Copy link
Contributor Author

ihsinme commented Feb 19, 2023

@geoffw0 i fixed the test suite.

@github-actions
Copy link
Contributor

QHelp previews:

cpp/ql/src/experimental/Security/CWE/CWE-369/DivideByZeroUsingReturnValue.qhelp

Divide by zero using return value

Possible cases of division by zero when using the return value from functions.

Example

The following example shows the use of a function with an error when using the return value and without an error.

...
  a = getc(f);
  if (a < 123) ret = 123/a; // BAD
...
  if (a != 0) ret = 123/a; // GOOD
...

References

Copy link
Contributor

@geoffw0 geoffw0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests passed, performance is good, I'm happy with this PR!

Thank you for your contribution.

@ihsinme
Copy link
Contributor Author

ihsinme commented Mar 3, 2023

@geoffw0 This PR is not merged, can I do anything?

@geoffw0 geoffw0 merged commit 7b596f4 into github:main Mar 3, 2023
@geoffw0
Copy link
Contributor

geoffw0 commented Mar 3, 2023

Sorry, I had approved it, I thought you could merge it.

Merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants