Skip to content

C/C++: how to optimize function pointer tracing? #13198

Open
@iiins0mn1a

Description

@iiins0mn1a

I am trying to extract functions VariableCalls may want to call, in my kernel database... And this is my query:

predicate relatedFunctionApproximate(VariableCall vc, Function func) {
    count(vc.getAnArgument()) = count(func.getAParameter()) and
    forall(
        int index | exists(vc.getArgument(index)) | sameType( vc.getArgument(index), func.getParameter(index) )
    ) 
}

class FunctionPointerFlow extends DataFlow4::Configuration {
    VariableCall vc;
    FunctionPointerFlow() { this = "FunctionPointerFlow" }
    
    override predicate isSource(DataFlow::Node node) {
        node.asExpr() instanceof FunctionAccess and
        relatedFunctionApproximate(vc, node.asExpr().(FunctionAccess).getTarget())
    }
    
    override predicate isSink(DataFlow::Node node) {
        node.asExpr().(VariableAccess).getType() instanceof FunctionPointerType and 
        vc.getExpr() = node.asExpr()
    }
    //...
}

class FunctionPointerVariableAccess extends VariableAccess {
    FunctionPointerVariableAccess() {
        this.getType() instanceof FunctionPointerType
    }
}

Function traceFunctionPointer(FunctionPointerVariableAccess fp) {
    exists(
        DataFlow::Node src, DataFlow::Node sk, FunctionAccess source, FunctionPointerFlow config |
        src.asExpr() = source and sk.asExpr() = fp and
        result = source.getTarget() and
        config.hasFlow(src, sk)
        )
}

predicate relatedFunction(VariableCall vc, Function func) {
    func = traceFunctionPointer(vc.getExpr())
}

As you can see, I tried to reduce the set of Source/Sink via some restriction, but its performance is still too bad to work normally.

For example, there is 100 VariableCalls in my database, and for each VariableCall, it has 3 related canditate functions.

If I want to figure out these 100 VariableCall's related functions within one DataFlow configuration, like my code posted above, there will be 300 sources and 100 sinks, due to bottom-up fashion of codeql, dataflow procedure will have to be repeated for 300 * 100 times to get the results.

I already know that one sink node can only relates to 3 source nodes, and this infomation can be fetched with relatedFunctionApproximate(), then I can just cut this whole process into 100 little parts, one Dataflow config for one specific sink node, and its corresponding source nodes count 3. In this situation, the total dataflow procedure will only have to be repeated for 3 * 100 times.

But obviously, coding 100 dataflow configs is just not practical, what should I do to optimize this query? how can I let the dataflow procedure understand the relationship between source and sink?Is there any useful feature I can use to solve this problem?

Please help me T-T

Metadata

Metadata

Assignees

No one assigned

    Labels

    questionFurther information is requested

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions