C/C++: how to optimize function pointer tracing? #13198

Open
iiins0mn1a opened this issue May 17, 2023 · 14 comments
Labels: question (Further information is requested)

Comments

@iiins0mn1a

I am trying to extract the functions that each VariableCall may call in my kernel database... And this is my query:

predicate relatedFunctionApproximate(VariableCall vc, Function func) {
    count(vc.getAnArgument()) = count(func.getAParameter()) and
    forall(int index | exists(vc.getArgument(index)) |
        sameType(vc.getArgument(index), func.getParameter(index))
    )
}
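
/*
 * Note: sameType is a small helper not shown here; roughly, it just checks
 * that the argument's type matches the parameter's type (the real helper may
 * be stricter than this sketch).
 */
predicate sameType(Expr arg, Parameter param) {
    arg.getUnspecifiedType() = param.getUnspecifiedType()
}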

class FunctionPointerFlow extends DataFlow4::Configuration {
    VariableCall vc;
    FunctionPointerFlow() { this = "FunctionPointerFlow" }
    
    override predicate isSource(DataFlow::Node node) {
        node.asExpr() instanceof FunctionAccess and
        relatedFunctionApproximate(vc, node.asExpr().(FunctionAccess).getTarget())
    }
    
    override predicate isSink(DataFlow::Node node) {
        node.asExpr().(VariableAccess).getType() instanceof FunctionPointerType and 
        vc.getExpr() = node.asExpr()
    }
    //...
}

class FunctionPointerVariableAccess extends VariableAccess {
    FunctionPointerVariableAccess() {
        this.getType() instanceof FunctionPointerType
    }
}

Function traceFunctionPointer(FunctionPointerVariableAccess fp) {
    exists(
        DataFlow::Node src, DataFlow::Node sk, FunctionAccess source, FunctionPointerFlow config |
        src.asExpr() = source and sk.asExpr() = fp and
        result = source.getTarget() and
        config.hasFlow(src, sk)
        )
}

predicate relatedFunction(VariableCall vc, Function func) {
    func = traceFunctionPointer(vc.getExpr())
}

As you can see, I tried to reduce the set of sources/sinks with some restrictions, but the performance is still too bad to be usable.

For example, suppose there are 100 VariableCalls in my database, and each VariableCall has 3 related candidate functions.

If I want to figure out the related functions for these 100 VariableCalls within one DataFlow configuration, like the code posted above, there will be 300 sources and 100 sinks; due to the bottom-up fashion of CodeQL, the dataflow procedure will have to be repeated 300 * 100 times to get the results.

I already know that each sink node relates to only 3 source nodes, and this information can be fetched with relatedFunctionApproximate(). So I could cut the whole process into 100 little parts, one dataflow config for one specific sink node, each with its 3 corresponding source nodes. In that situation, the dataflow procedure would only have to be repeated 3 * 100 times.

But obviously, writing 100 dataflow configs is just not practical. What should I do to optimize this query? How can I let the dataflow procedure understand the relationship between source and sink? Is there any useful feature I can use to solve this problem?

Please help me T-T

@iiins0mn1a added the question (Further information is requested) label on May 17, 2023
@MathiasVP
Contributor

Hi @iiins0mn1a,

Unless I'm missing something, I think you're overthinking the configuration needed for your dataflow analysis. For example, I don't see why you need the VariableCall member variable in your FunctionPointerFlow class.

Instead, I think you'd get much better performance by just tracing flow from all FunctionAccesses that aren't used directly in a function call (those we can easily resolve using Call::getTarget). Your traceFunctionPointer predicate can then be split into two cases: either we can resolve the call trivially (and no dataflow is needed), or we need to perform a dataflow analysis to find the use of the function pointer.

Based on the above points, I've slightly modified your query to get rid of the complexity added by keeping the VariableCall in the configuration. I also "modernized" your query slightly to make use of the new API that uses a module for dataflow instead of a configuration. This isn't strictly necessary, but I just really like the new API 😂

/**
 * @kind problem
 */

import cpp
import semmle.code.cpp.dataflow.new.DataFlow

predicate isSourceImpl(DataFlow::Node node, FunctionAccess fa) {
  fa = node.asExpr() and
  not any(ExprCall call).getExpr() = fa
}

predicate isSinkImpl(DataFlow::Node node, ExprCall call) {
  node.asExpr() = call.getExpr() and
  not exists(call.getTarget())
}

module FunctionPointerConfig implements DataFlow::ConfigSig {
  predicate isSource(DataFlow::Node node) { isSourceImpl(node, _) }

  predicate isSink(DataFlow::Node node) { isSinkImpl(node, _) }
}

module MyDataFlow = DataFlow::Global<FunctionPointerConfig>;

Function traceFunctionPointer(Call call) {
  result = call.getTarget()
  or
  exists(DataFlow::Node source, DataFlow::Node sink, FunctionAccess fa |
    MyDataFlow::flow(source, sink) and
    isSourceImpl(source, fa) and
    isSinkImpl(sink, call) and
    result = fa.getTarget()
  )
}

from Function func, Call call
where func = traceFunctionPointer(call)
select call, "This calls $@", func, func.toString()

I ran the above query on a large database with about 3 million lines of code, and it seemed to work fine on my machine. Obviously, it would be even faster if you could somehow restrict the set of FunctionAccesses in isSourceImpl even further (maybe you're only interested in accesses to a function with a particular name or in a particular compilation unit?).
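
For instance (purely illustrative; the name and path patterns below are placeholders, not something taken from your code), the extra restriction could look roughly like this:

predicate isSourceImpl(DataFlow::Node node, FunctionAccess fa) {
  fa = node.asExpr() and
  not any(ExprCall call).getExpr() = fa and
  // Placeholder restrictions: only keep accesses to functions whose name or
  // location you actually care about.
  (
    fa.getTarget().getName().matches("my_prefix_%")
    or
    fa.getTarget().getFile().getAbsolutePath().matches("%/drivers/%")
  )
}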

If you want to make the analysis more precise by checking whether the number of arguments matches the number of parameters (and whether the types of the arguments match the types of the parameters), you can do so in traceFunctionPointer after the dataflow analysis has finished (just be mindful of variadic functions, though!).
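
A sketch of what that post-hoc filtering could look like (just one way to do it; traceFunctionPointerChecked is a made-up name, and the variadic handling is only approximate):

Function traceFunctionPointerChecked(ExprCall call) {
  result = traceFunctionPointer(call) and
  // Arity check: exact for non-variadic candidates, "at least the fixed
  // parameters" for variadic ones.
  (
    not result.isVarargs() and
    result.getNumberOfParameters() = call.getNumberOfArguments()
    or
    result.isVarargs() and
    result.getNumberOfParameters() <= call.getNumberOfArguments()
  ) and
  // Type check: every argument that has a corresponding fixed parameter must
  // have a matching unspecified type.
  forall(int i | exists(result.getParameter(i)) and exists(call.getArgument(i)) |
    call.getArgument(i).getUnspecifiedType() = result.getParameter(i).getUnspecifiedType()
  )
}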

It's probably possible to do this checking earlier in the analysis by using a FlowState, but unless your database is particularly large I don't think it's worth the effort.

I hope that helps!

@iiins0mn1a
Author

Hi @MathiasVP, thank you for your reply~

The reason I involved the VariableCall vc in the configuration is that I am trying to build a relationship between source and sink, which can reduce the number of source nodes and maybe get better performance QAQ. More specifically, a FunctionAccess to a function whose pattern (like the number of args and their types) doesn't match any VariableCall among the sink nodes should not be in the set of source nodes.
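
Roughly, what I have in mind is something like restricting the sources with that approximation (just a sketch, reusing relatedFunctionApproximate from my first post):

predicate isSourceImpl(DataFlow::Node node, FunctionAccess fa) {
    fa = node.asExpr() and
    // Only keep accesses to functions that match at least one candidate call
    // site; this prunes the sources, but it still doesn't tell the dataflow
    // procedure which source belongs to which sink.
    exists(VariableCall vc | relatedFunctionApproximate(vc, fa.getTarget()))
}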

Thanks for the modernized coding style lol, it looks good to me and I will follow it. I will try this query on my machine later.

For example, suppose there are 100 VariableCalls in my database, and each VariableCall has 3 related candidate functions.

If I want to figure out the related functions for these 100 VariableCalls within one DataFlow configuration, like the code posted above, there will be 300 sources and 100 sinks; due to the bottom-up fashion of CodeQL, the dataflow procedure will have to be repeated 300 * 100 times to get the results.

I already know that each sink node relates to only 3 source nodes, and this information can be fetched with relatedFunctionApproximate(). So I could cut the whole process into 100 little parts, one dataflow config for one specific sink node, each with its 3 corresponding source nodes. In that situation, the dataflow procedure would only have to be repeated 3 * 100 times.

But obviously, writing 100 dataflow configs is just not practical. What should I do to optimize this query? How can I let the dataflow procedure understand the relationship between source and sink? Is there any useful feature I can use to solve this problem?

But I am still confused about the example I raised before. Once we have some prior knowledge, like the relationship between source and sink, is there any method to make the dataflow procedure aware of it? Theoretically it could improve the performance dramatically. And does my "theory" make any sense?

I really appreciate your suggestion about variadic functions, it does help.

It's probably possible to do this checking earlier in the analysis by using a FlowState, but unless your database is particularly large I don't think it's worth the effort.

Unfortunately XD, my database is particularly large (I've mentioned that it's an allmodconfig kernel...), and I am trying everything to make the query run better. Could you tell me how to use FlowState, or give me some related link?

Again, thank you, it's been my pleasure to talk with you.

@iiins0mn1a
Author

iiins0mn1a commented May 18, 2023

Thanks for the modernized coding style lol, it looks good to me and I will follow it. I will try this query on my machine later.

I tried it on my kernel database; it ran for about 2 hours and exited with code 137, same as before. I've read issue #11676, so I set --ram to a smaller number and left about 16GB for other tasks on my 96GB RAM machine. But it still fails with code 137, maybe an OOM. Please help me.

Here is the analysis log:

[2023-05-18 03:16:53] Calling plumbing command: codeql resolve ram --ram=81920 --format=json
[2023-05-18 03:16:53] Plumbing command codeql resolve ram completed:
                      [
                        "-J-Xmx40960M",
                        "--off-heap-ram=40960"
                      ]
[2023-05-18 03:16:53] Spawning plumbing command: execute queries -J-Xmx40960M --off-heap-ram=40960 --verbosity=progress --logdir=/home/insomnia/codeql-linux-v6.0/database/log --save-cache --thread-multiplication=1 --evaluator-log=/home/insomnia/codeql-linux-v6.0/database/log/evaluator-log/evaluator.log --evaluator-log-level=5 --threads=24 --warnings=show --qlconfig-file=/home/insomnia/codeql-linux-v6.0/qlconfig.yml --rerun --output=/home/insomnia/codeql-linux-v6.0/database/results -- /home/insomnia/codeql-linux-v6.0/database/db-cpp path:/home/insomnia/query/new-starter/codeql-custom-queries-cpp/vp.ql
[2023-05-18 05:24:41] [ERROR] Spawned process exited abnormally (code 137; tried to run: [/home/insomnia/codeql/tools/linux64/java/bin/java, -Xmx40960M, -Djava.library.path=/usr/java/packages/lib:/usr/lib64:/lib64:/lib:/usr/lib, -cp, /home/insomnia/codeql/tools/codeql.jar, com.semmle.cli2.CodeQL, env_argv, execute, queries, -J-Xmx40960M, _, _, _, --save-cache, _, _, _, _, _, _, --rerun, _, --, /home/insomnia/codeql-linux-v6.0/database/db-cpp, _])
[2023-05-18 05:24:41] Plumbing command codeql execute queries terminated with status 137.
[2023-05-18 05:24:41] Plumbing command codeql database run-queries completed with status 137.
[2023-05-18 05:24:41] Exiting with code 137

evaluator log:

Most expensive predicates for unfinished query vp.ql:

        time         | evals |  max @ iter | predicate
        -------------|-------|-------------|----------
               7m36s |       |             | SSAConstruction#2b11997e::DefUse::hasNonPhiDefinition#4#ffff@6ffd0bh6
               5m39s |       |             | shortestDistances@IRBlock#896e97af::startsBasicBlock#1#1@IRBlock#896e97af::adjacentInBlock#2#2#fff@9eaa9609
               5m30s |       |             | project#SSAConstruction#2b11997e::DefUse::hasDefinition#4#ffff@e852a8s0
               4m57s |       |             | SSAConstruction#2b11997e::Cached::getInstructionSuccessor#2#fff@6700d98s
               4m47s |       |             | SSAConstruction#2b11997e::DefUse::hasUse#4#ffff@42a844lp
               4m21s |       |             | SSAConstruction#19829a6f::Cached::getInstructionSuccessor#2#fff@3c3464gq
                4m6s |       |             | SSAConstruction#2b11997e::DefUse::hasDefinition#4#ffff@d898e1oi
               3m42s |       |             | SSAConstruction#2b11997e::PhiInsertion::definitionHasRedefinition#3#fff@99230frc
               .......

@MathiasVP
Contributor

I tried it on my kernel database; it ran for about 2 hours and exited with code 137, same as before. I've read issue #11676, so I set --ram to a smaller number and left about 16GB for other tasks on my 96GB RAM machine. But it still fails with code 137, maybe an OOM. Please help me.

Here is the analysis log:

[2023-05-18 03:16:53] Calling plumbing command: codeql resolve ram --ram=81920 --format=json
[2023-05-18 03:16:53] Plumbing command codeql resolve ram completed:
                      [
                        "-J-Xmx40960M",
                        "--off-heap-ram=40960"
                      ]
[2023-05-18 03:16:53] Spawning plumbing command: execute queries -J-Xmx40960M --off-heap-ram=40960 --verbosity=progress --logdir=/home/insomnia/codeql-linux-v6.0/database/log --save-cache --thread-multiplication=1 --evaluator-log=/home/insomnia/codeql-linux-v6.0/database/log/evaluator-log/evaluator.log --evaluator-log-level=5 --threads=24 --warnings=show --qlconfig-file=/home/insomnia/codeql-linux-v6.0/qlconfig.yml --rerun --output=/home/insomnia/codeql-linux-v6.0/database/results -- /home/insomnia/codeql-linux-v6.0/database/db-cpp path:/home/insomnia/query/new-starter/codeql-custom-queries-cpp/vp.ql
[2023-05-18 05:24:41] [ERROR] Spawned process exited abnormally (code 137; tried to run: [/home/insomnia/codeql/tools/linux64/java/bin/java, -Xmx40960M, -Djava.library.path=/usr/java/packages/lib:/usr/lib64:/lib64:/lib:/usr/lib, -cp, /home/insomnia/codeql/tools/codeql.jar, com.semmle.cli2.CodeQL, env_argv, execute, queries, -J-Xmx40960M, _, _, _, --save-cache, _, _, _, _, _, _, --rerun, _, --, /home/insomnia/codeql-linux-v6.0/database/db-cpp, _])
[2023-05-18 05:24:41] Plumbing command codeql execute queries terminated with status 137.
[2023-05-18 05:24:41] Plumbing command codeql database run-queries completed with status 137.
[2023-05-18 05:24:41] Exiting with code 137

Yes, that does look like an OOM.

evaluator log:

Most expensive predicates for unfinished query vp.ql:

        time         | evals |  max @ iter | predicate
        -------------|-------|-------------|----------
               7m36s |       |             | SSAConstruction#2b11997e::DefUse::hasNonPhiDefinition#4#ffff@6ffd0bh6
               5m39s |       |             | shortestDistances@IRBlock#896e97af::startsBasicBlock#1#1@IRBlock#896e97af::adjacentInBlock#2#2#fff@9eaa9609
               5m30s |       |             | project#SSAConstruction#2b11997e::DefUse::hasDefinition#4#ffff@e852a8s0
               4m57s |       |             | SSAConstruction#2b11997e::Cached::getInstructionSuccessor#2#fff@6700d98s
               4m47s |       |             | SSAConstruction#2b11997e::DefUse::hasUse#4#ffff@42a844lp
               4m21s |       |             | SSAConstruction#19829a6f::Cached::getInstructionSuccessor#2#fff@3c3464gq
                4m6s |       |             | SSAConstruction#2b11997e::DefUse::hasDefinition#4#ffff@d898e1oi
               3m42s |       |             | SSAConstruction#2b11997e::PhiInsertion::definitionHasRedefinition#3#fff@99230frc
               .......

It's difficult to get a lot from that evaluator log snippet, really. The predicates listed in the above table don't really have anything to do with your query. Instead, this is all part of the underlying libraries that will drive the dataflow analysis in the end (i.e., control-flow, alias analysis, dataflow, etc.). If you upload the full log somewhere I can take a deeper look at it.

Here is a go at the "FlowState"-based version I hinted at:

/**
 * @kind problem
 */

import cpp
import semmle.code.cpp.dataflow.new.DataFlow

predicate isSourceImpl(DataFlow::Node node, TypeList sig, FunctionAccess fa) {
  fa = node.asExpr() and
  sig = getTypeListFromAccess(fa, 0) and
  not any(ExprCall call).getExpr() = fa
}

predicate isSinkImpl(DataFlow::Node node, TypeList sig, ExprCall call) {
  not exists(call.getTarget()) and
  exists(Expr e |
    e = call.getExpr() and
    node.asExpr() = e and
    sig = getTypeListFromType(e.getUnspecifiedType(), 0)
  )
}

TypeList getTypeListFromType(FunctionPointerType type, int start) {
  start = type.getNumberOfParameters() and
  result = MkNil(type.getReturnType().getUnspecifiedType())
  or
  exists(Type head, TypeList tail |
    getTypeListFromType(type, start, head, tail) and
    result = MkCons(head, tail)
  )
}

private newtype TypeList =
  MkNil(Type returnType) {
    returnType = any(FunctionAccess fa).getTarget().getUnspecifiedType()
    or
    returnType =
      any(ExprCall call)
          .getExpr()
          .getUnspecifiedType()
          .(FunctionPointerType)
          .getReturnType()
          .getUnspecifiedType()
  } or
  MkCons(Type head, TypeList tail) {
    getTypeListFromAccess(_, _, head, tail)
    or
    getTypeListFromType(_, _, head, tail)
  }

private TypeList getTypeListFromAccess(FunctionAccess fa, int start) {
  start = fa.getTarget().getNumberOfParameters() and
  result = MkNil(fa.getTarget().getUnspecifiedType())
  or
  exists(Type head, TypeList tail |
    getTypeListFromAccess(fa, start, head, tail) and
    result = MkCons(head, tail)
  )
}

private predicate getTypeListFromAccess(FunctionAccess fa, int start, Type head, TypeList tail) {
  head = fa.getTarget().getParameter(start).getUnspecifiedType() and
  tail = getTypeListFromAccess(fa, start + 1)
}

private predicate getTypeListFromType(FunctionPointerType type, int start, Type head, TypeList tail) {
  head = type.getParameterType(start).getUnspecifiedType() and
  tail = getTypeListFromType(type, start + 1)
}

module FunctionPointerConfig implements DataFlow::StateConfigSig {
  class FlowState = TypeList;

  predicate isSource(DataFlow::Node node, FlowState sig) { isSourceImpl(node, sig, _) }

  predicate isSink(DataFlow::Node node, FlowState sig) { isSinkImpl(node, sig, _) }

  predicate isBarrier(DataFlow::Node node, FlowState signature) { none() }

  predicate isAdditionalFlowStep(
    DataFlow::Node node1, FlowState signature1, DataFlow::Node node2, FlowState signature2
  ) {
    none()
  }
}

module MyDataFlow = DataFlow::GlobalWithState<FunctionPointerConfig>;

Function traceFunctionPointer(Call call) {
  result = call.getTarget()
  or
  exists(DataFlow::Node source, DataFlow::Node sink, FunctionAccess fa |
    MyDataFlow::flow(source, sink) and
    isSourceImpl(source, _, fa) and
    isSinkImpl(sink, _, call) and
    result = fa.getTarget()
  )
}

from Function func, Call call
where func = traceFunctionPointer(call)
select call, "This calls $@", func, func.toString()

It's the most efficient version I could come up with without any more knowledge of what specifically you're looking for. This implementation uses a "FlowState" to carry the parameter types and the return type from the function access to the function call, to ensure that the types match. I didn't see a super big speedup in my local experiment on a 3M LOC database, but maybe it'll have a bigger impact on your DB.

In the end, you'll probably get a much more memory-efficient version if you can somehow restrict the set of FunctionAccesses.

Note that there are several ways this could make the analysis miss some results. For instance, if the function pointer is cast to a function pointer of a different type (for example, casting it to a function pointer with another return type).

@iiins0mn1a
Author

Hi @MathiasVP, happy to see your reply~

It's difficult to get a lot from that evaluator log snippet, really. The predicates listed in the above table don't really have anything to do with your query. Instead, this is all part of the underlying libraries that will drive the dataflow analysis in the end (i.e., control-flow, alias analysis, dataflow, etc.). If you upload the full log somewhere I can take a deeper look at it.

Does this mean that maybe this task is not that suitable for CodeQL...? I will upload the whole log to my private repo later, and you will be able to access it. Thank you, sincerely.

As for the code, ok, I'm almost crying (just an exaggerated expression QAQ), I really appreciate it. I will try this query later, diligently study the techniques in it, and try to absorb them.

@MathiasVP
Contributor

MathiasVP commented May 18, 2023

Does this mean that maybe this task is not that suitable for CodeQL...?

I don't think so, no. It means that it's simply not feasible to do a precise interprocedural dataflow analysis to resolve all calls through function pointers in such a large database (without an even larger machine than what you have available). So your options are:

  • Restrict the set of function pointers you're interested in. This could be restrictions on which accesses you care about - in which case you can restrict your set of sources (maybe you only care about calls that take at least 1 non-constant pointer as an argument?) - or you could restrict the set of calls you care about - in which case you can restrict your set of sinks (maybe you only care about calls that aren't guarded by a bounds check on one of the arguments?). Either of these (or a combination of them!) will increase your likelihood of success.
  • Give up on interprocedurality and use a local flow analysis (see https://codeql.github.com/docs/codeql-language-guides/analyzing-data-flow-in-cpp-new/#local-data-flow); a minimal sketch is included after this list. I would guess that you can do a local flow analysis without restricting your sources and sinks, but the analysis might still take a long time on such a big database.
  • Finally (and I don't really recommend this approach): you could get a larger machine 😉.
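
For the second option, a minimal local-flow sketch (just an assumption of what it could look like; same source/sink idea as before, but using DataFlow::localFlow so flow is only tracked within a single function):

/**
 * @kind problem
 */

import cpp
import semmle.code.cpp.dataflow.new.DataFlow

from FunctionAccess fa, ExprCall call, DataFlow::Node source, DataFlow::Node sink
where
  source.asExpr() = fa and
  not any(ExprCall c).getExpr() = fa and
  sink.asExpr() = call.getExpr() and
  not exists(call.getTarget()) and
  // Local flow only: much cheaper than global flow, but it misses function
  // pointers that are passed between functions or stored in data structures.
  DataFlow::localFlow(source, sink)
select call, "This may call $@", fa.getTarget(), fa.getTarget().toString()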

I will upload the whole log to my private repo later, and you will be able to access it. Thank you, sincerely.

Sounds good. Thank you!

As for the code, ok, I'm almost crying (just an exaggerated expression QAQ), I really appreciate it. I will try this query later, diligently study the techniques in it, and try to absorb them.

Awesome. Let me know if you have any questions 👍.

@MathiasVP
Contributor

MathiasVP commented May 18, 2023

I took a brief look at the log file you just sent, and it seems like we're OOM'ing before getting to any of the query-specific code. It would be fantastic if you could rerun the analysis and provide the full evaluator log.

I also note that it looks like you're OOM'ing in an area where I attempted (and I think succeeded) to fix an OOM just yesterday. So if you wouldn't mind, it would be super helpful if you could also attempt to rerun the evaluation on top of #13207, as it may help with your OOM issue.

@iiins0mn1a
Author

iiins0mn1a commented May 18, 2023

Ok, I will do it right now.

@MathiasVP I only have one machine, so I will rerun for the OOM first and then try the latest version; I will update things in my repo in maybe 14 hours.

@iiins0mn1a
Author

@MathiasVP the logs are uploaded to the repo; it still failed with code 137 after switching the QL checkout to the branch MathiasVP:use-equiv-class-in-getInstruction.

@MathiasVP
Contributor

That's a shame. To make matters worse, I now see that the evaluator log is missing a crucial piece of information. Would it be possible to run it again, but with the additional flags --tuple-counting and --debug?

@iiins0mn1a
Author

iiins0mn1a commented May 20, 2023

Hi @MathiasVP, I've uploaded logs with your additional flags to my repo, check it out.

Edit: NOT DONE yet, it's too big for a GitHub repo, I will find a way later.

@intrigus-lgtm
Contributor

You can create a release and add the log as an artifact.

@iiins0mn1a
Author

You can create a release and add the log as an artifact.

Thanks for your suggestion, I've managed to do so. If you're interested in this issue, I can also share the logs with you.

@iiins0mn1a
Author

That's a shame. To make matters worse, I now see that the evaluator log is missing a crucial piece of information. Would it be possible to run it again, but with the additional flags --tuple-counting and --debug?

Ok, sorry for the delay, you can find the logs in the release now...
