Description
I am trying to extract the functions that a `VariableCall` may call, in my kernel database. This is my query:
```ql
predicate relatedFunctionApproximate(VariableCall vc, Function func) {
  count(vc.getAnArgument()) = count(func.getAParameter()) and
  forall(int index |
    exists(vc.getArgument(index)) |
    sameType(vc.getArgument(index), func.getParameter(index))
  )
}

class FunctionPointerFlow extends DataFlow4::Configuration {
  VariableCall vc;

  FunctionPointerFlow() { this = "FunctionPointerFlow" }

  override predicate isSource(DataFlow::Node node) {
    node.asExpr() instanceof FunctionAccess and
    relatedFunctionApproximate(vc, node.asExpr().(FunctionAccess).getTarget())
  }

  override predicate isSink(DataFlow::Node node) {
    node.asExpr().(VariableAccess).getType() instanceof FunctionPointerType and
    vc.getExpr() = node.asExpr()
  }
  // ...
}

class FunctionPointerVariableAccess extends VariableAccess {
  FunctionPointerVariableAccess() { this.getType() instanceof FunctionPointerType }
}

Function traceFunctionPointer(FunctionPointerVariableAccess fp) {
  exists(DataFlow::Node src, DataFlow::Node sk, FunctionAccess source, FunctionPointerFlow config |
    src.asExpr() = source and
    sk.asExpr() = fp and
    result = source.getTarget() and
    config.hasFlow(src, sk)
  )
}

predicate relatedFunction(VariableCall vc, Function func) {
  func = traceFunctionPointer(vc.getExpr())
}
```
As you can see, I tried to reduce the sets of sources and sinks with these restrictions, but the query's performance is still far too poor to be usable.
For example, suppose there are 100 `VariableCall`s in my database, and each `VariableCall` has 3 candidate related functions. If I try to find the related functions of all 100 `VariableCall`s with a single DataFlow configuration, as in the code above, there will be 300 sources and 100 sinks. Because of CodeQL's bottom-up evaluation, the dataflow procedure has to consider 300 * 100 source/sink pairs to produce the results.
I already know that each sink node can relate to only 3 source nodes, and this information can be fetched with `relatedFunctionApproximate()`. So I could cut the whole process into 100 small parts: one dataflow configuration per sink node, each with its 3 corresponding source nodes. In that case, the dataflow procedure would only have to be repeated 3 * 100 times.
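One feature that may fit this pairing problem is dataflow *flow state*. The sketch below is an assumption, not a verified fix: it presumes a CodeQL version whose legacy `DataFlow::Configuration` API offers the two-argument `isSource`/`isSink` overrides with a string-valued `DataFlow::FlowState`. The idea is to tag every source with a state naming its candidate `VariableCall` and require the matching state at the sink, so `hasFlow` only connects related pairs within a single configuration:

```ql
// Sketch only: assumes FlowState support in the legacy
// DataFlow::Configuration API (string-valued flow states).
class FunctionPointerFlow extends DataFlow4::Configuration {
  FunctionPointerFlow() { this = "FunctionPointerFlow" }

  override predicate isSource(DataFlow::Node node, DataFlow::FlowState state) {
    exists(VariableCall vc, FunctionAccess fa |
      fa = node.asExpr() and
      relatedFunctionApproximate(vc, fa.getTarget()) and
      // The state names the candidate call, pairing this source
      // with exactly the sinks that carry the same state.
      state = vc.getLocation().toString()
    )
  }

  override predicate isSink(DataFlow::Node node, DataFlow::FlowState state) {
    exists(VariableCall vc |
      vc.getExpr() = node.asExpr() and
      state = vc.getLocation().toString()
    )
  }
}
```

Note that flow state multiplies the facts the solver tracks by the number of distinct states, so this only helps if the state set stays small (here, one state per `VariableCall`).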
But obviously, hand-writing 100 dataflow configurations is just not practical. What should I do to optimize this query? How can I make the dataflow procedure understand the relationship between sources and sinks? Is there any useful feature I can use to solve this problem?
Please help me T-T