Description
Hi 👋,
I'm trying to pinpoint CodeQL's limitations in finding vulnerabilities in certain vulnerable npm packages.
I've noticed that breaks in the call graph (CallNodes without resolved callees) seem to be a good place to start looking.
I have a query to find CallNodes without resolved callees, which works fairly well, except I've noticed that for some results data flow continues even though the callee is missing.
For instance, path.join() is a CallNode without a resolved callee in my data, yet it does not break data flow. Take this simple example:
const path = require('path');
function run() {
  let input = location.hash.substring(1);
  foo(path.join(input, ''));
}
function foo(input) {
  eval(input);
}
run();
Here path.join() does not break data flow, and the vulnerability is still caught. Most, if not all, of these cases seem to be popular methods that I believe have propagation rules modeled in the libraries. I think path.join()'s logic is here
Whereas in this example:
const path = require('path'); // so CodeQL can handle callbacks
function getUserInput(input) {
  return input;
}
function run(callback) {
  let input = location.hash.substring(1);
  // store callback func in array
  let callbacks = [];
  callbacks.push(callback);
  // call the callback
  foo(callbacks[0](input));
}
function foo(input) {
  eval(input);
}
run(getUserInput);
The CallNode callbacks[0](input) has no resolved callee, which causes data flow to stop and the vulnerability to be missed.
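To make the contrast concrete, here is a minimal Node.js sketch showing that at runtime the value reaches the sink in both examples, so both are real flows and only the second is missed by the analysis. The names `source`, `sink`, `reached`, `runJoin`, and `runCallback` are hypothetical stand-ins introduced for illustration: `source` replaces `location.hash.substring(1)` and `sink` replaces `eval`, so the sketch can run outside a browser without executing anything.

```javascript
const path = require('path');

// Hypothetical stand-ins: `source` replaces location.hash.substring(1),
// and `sink` replaces eval by recording what it receives.
const reached = [];
function source() { return 'alert(1)'; }
function sink(value) { reached.push(value); }

// Case 1: the value passes through path.join().
function runJoin() {
  sink(path.join(source(), ''));
}

// Case 2: the value passes through a callback stored in an array.
function getUserInput(input) { return input; }
function runCallback(callback) {
  const callbacks = [];
  callbacks.push(callback);
  sink(callbacks[0](source()));
}

runJoin();
runCallback(getUserInput);
console.log(reached); // [ 'alert(1)', 'alert(1)' ]
```

In both cases the sink receives the same string as the source, because the test string contains no path separators for path.join() to normalize.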
My issue is that I want to filter out CallNodes that technically lack a resolved callee but still allow data flow, like path.join(), since they are not relevant to the vulnerability being missed.
I've tried to solve this by checking whether any flow edges emanate from the CallNode:
import javascript

// Calls excluded up front: require(...) and calls on console.
predicate filtered_call(DataFlow::CallNode node) {
  node.getCalleeName() = "require"
  or
  node.getReceiver().toString() = "console"
}

// Holds if an additional or shared flow step leads from `call` to `next`.
predicate missing_callee_with_flow_step(DataFlow::CallNode call, DataFlow::Node next) {
  DataFlow::AdditionalFlowStep::step(call, next)
  or
  DataFlow::SharedFlowStep::step(call, next)
}

from DataFlow::CallNode node
where
  not exists(node.getACallee(0)) and
  not filtered_call(node) and
  not missing_callee_with_flow_step(node, _)
select node,
  "missing callee from call node " + node.toString() +
    " | Callee Name: " + node.getCalleeName() +
    " | found at line: " + node.getStartLine() +
    " | column: " + node.getStartColumn() +
    " | file: " + node.getFile().getAbsolutePath()
Unfortunately this doesn't seem to work well. I'm having a hard time finding a way to filter out these CallNodes.
I suppose I could manually filter out all these nodes, but that would be very tedious and also inaccurate in cases where duplicate receiver and callee names exist.
My main question is: what can I add to my query to filter out all CallNodes that still propagate data flow?
Or, put differently, how can I find all CallNodes that truly break data flow?
Thanks!