How to find CallNodes that break data flow? #19037

Closed
@jghebre

Hi 👋,

I'm trying to pinpoint CodeQL's limitations in finding vulnerabilities in certain vulnerable npm packages.
I've noticed that breaks in the call graph (CallNodes without resolved callees) seem to be a good place to start looking.

I have a query that finds CallNodes without resolved callees. It works fairly well, except that for some results data flow continues even though the callee is missing.

For instance, in my data path.join() is a CallNode without a resolved callee, yet it does not break data flow.

Consider this simple example:

const path = require('path');

function run() {
    let input = location.hash.substring(1);
    foo(path.join(input, ''));
}

function foo(input) {
    eval(input);
}

run();

Here path.join() does not break data flow, and the vulnerability is still caught. Most, if not all, of these cases seem to be popular methods that I believe have propagation rules modeled in the libraries. I think path.join()'s logic is here
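As a side note, a quick Node.js check of what path.join() does to a string at runtime may help show why treating it as a propagating call is reasonable: the result is derived from the argument, even when it is not the identical value. This is only a sketch; the two payload strings are made up for illustration and stand in for location.hash.substring(1).

```javascript
const path = require('path');

// Hypothetical attacker-controlled strings (stand-ins for location.hash.substring(1)).
const payload = 'alert(1)';
const messy = 'a/../alert(1)';

// Joining with an empty segment passes a simple string through.
const joined = path.join(payload, '');

// A string containing '..' segments is normalized first, so the output is
// derived from the input rather than being the same value.
const normalized = path.join(messy, '');

console.log(joined);
console.log(normalized);
```

Both results still carry the attacker-controlled text, which is the property a taint-style propagation rule is meant to capture.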

By contrast, in this example:

const path = require('path'); // so CodeQL can handle callbacks

function getUserInput(input) {
    return input
}

function run(callback) {
    let input = location.hash.substring(1);

    //store callback func in array
    let callbacks = []
    callbacks.push(callback) 

    //call the callback
    foo(callbacks[0](input))
}

function foo(input) {
    eval(input);
}

run(getUserInput);

Here the CallNode callbacks[0](input) has no resolved callee, which causes data flow to stop and the vulnerability to be missed.
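To confirm that the flow is real at runtime and is only lost statically, the example can be run under Node.js with stand-ins for the browser-only parts. In this sketch, TAINTED replaces location.hash.substring(1) and sinkLog records what would have been passed to eval(); both names are hypothetical and not part of the original example.

```javascript
// Hypothetical stand-in for location.hash.substring(1).
const TAINTED = 'alert(1)';

// Records every value that would have reached eval().
const sinkLog = [];

function getUserInput(input) {
    return input;
}

function run(callback) {
    let input = TAINTED;

    // Store the callback in an array, as in the example above.
    let callbacks = [];
    callbacks.push(callback);

    // Call it indirectly; this is the call with no resolved callee.
    foo(callbacks[0](input));
}

function foo(input) {
    sinkLog.push(input); // stand-in for eval(input)
}

run(getUserInput);
console.log(sinkLog);
```

The tainted string ends up in sinkLog, so the source-to-sink path exists dynamically even though the static call graph has no edge for the indirect call.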

My issue is that I want to filter out CallNodes that technically lack a resolved callee but still allow data flow, like path.join(), since they are not relevant to the vulnerability being missed.

I've tried to solve this by checking whether any flow edges emanate from the CallNodes:

import javascript

// Calls that are not interesting for this analysis.
predicate filtered_call(DataFlow::CallNode node) {
  node.getCalleeName() = "require"
  or
  node.getReceiver().toString() = "console"
}

// Holds if a registered flow step starts at the call node `callee`.
predicate missing_callee_with_flow_step(DataFlow::CallNode callee, DataFlow::Node next) {
  DataFlow::AdditionalFlowStep::step(callee, next)
  or
  DataFlow::SharedFlowStep::step(callee, next)
}

from DataFlow::CallNode node
where
  not exists(node.getACallee(0)) and
  not filtered_call(node) and
  not missing_callee_with_flow_step(node, _)
select node,
  "missing callee from call node " + node.toString() + " | Callee Name: " + node.getCalleeName() +
    " | found at line: " + node.getStartLine() + " | column: " + node.getStartColumn() +
    " | file: " + node.getFile().getAbsolutePath()

Unfortunately this doesn't seem to work well. I'm having a hard time finding a way to filter out these CallNodes.

I suppose I could manually filter out all these nodes, but that would be very tedious and also inaccurate in cases where duplicate receiver and callee names exist.

My main question is: what can I add to my query to filter out all CallNodes that still propagate data flow? Or, put differently,
how can I find all CallNodes that truly break data flow?

Thanks!

Labels: question
