Skip to content

C++: Model indirect data flow through external functions #19151

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
fbesler opened this issue Mar 28, 2025 · 2 comments
Closed

C++: Model indirect data flow through external functions #19151

fbesler opened this issue Mar 28, 2025 · 2 comments
Labels
question Further information is requested

Comments

@fbesler
Copy link

fbesler commented Mar 28, 2025

I'm trying to model indirect flow through external functions, similar to the example below. I want to follow the taint from taint_source to taint_sink through process_taint and process_taint2. Therefore, I need to model that the taint of the data member is propagated to the outputs.

struct S {
  int data;
  int dummy
};

int taint_source();

S* process_taint(S* input);
void process_taint2(S* input, S* output);

void taint_sink(int tainted);

void df1() {
  S* s = new S();
  S* t;

  s->data = taint_source();

  process_taint2(s, t);

  taint_sink(t->data);
  taint_sink(t->dummy);
}

void df2() {
  S* u = new S();
  S* v;

  u->data = taint_source();

  v = process_taint(u);

  taint_sink(v->data);
  taint_sink(v->dummy);
}

int main(int argc, char* argv[]) {
  df1();
  df2();

  return 0;
}

I use this basic query for the example

/**
 * @kind path-problem
 */

import cpp
import semmle.code.cpp.dataflow.new.TaintTracking
import MyFlow::PathGraph

module MyFlowConf implements DataFlow::ConfigSig {
  predicate isSource(DataFlow::Node source) {
    source.asExpr() = any(Call c | c.getTarget().hasName("taint_source))
  }

  predicate isSink(DataFlow::Node sink) {
    sink.asExpr() = any(Call c | c.getTarget().hasName("taint_sink)).getAnArgument()
  }
}

module MyFlow = TaintTracking::Global<MyFlowConf>;

from MyFlow::PathNode source, MyFlow::PathNode sink
where MyFlow::flowPath(source, sink)
select sink, source, sink, "Flow"

Using the following MaD I can get a correct taint flow.

extensions:
  - addsTo:
    pack: codeql/cpp-all
    extensible: summaryModel
  data:
    - ["", "", False, "process_taint", "", "", "Argument[*0].Field[data]", "ReturnValue[*].Field[data]", "taint", "manual"]
    - ["", "", False, "process_taint2", "", "", "Argument[*0].Field[data]", "Argument[*1].Field[data]", "taint", "manual"]

However, as far as I can see that would require to copy the rule for every member of S using the same pattern. For JS there seems to be an AnyMember keyword but it looks like this is not available in C++. Is there a wildcard to specify the same field/access path in the input and output?

Alternatively I tried to model it as an additional flow step like this

predicate isAdditionalFlowStep2(DataFlow::Node source, DataFlow::Node sink) {
  exists(Assignment a |
    a.getLValue().getAChild() = sink.asIndirectExpr()
    and a.getRValue() = source.asExpr()
    and source.asExpr().(Call).getTarget().hasName("taint_source")
  )
  or
  exists(Call c |
    c.getTarget().hasName("process_taint")
    and sink.asIndiretExpr() = c
    and source.asIndirectExpr() = c.getAnArgument()
  )
  or
  exists(Call c |
    c.getTarget().hasName("process_taint2")
    and source.asIndirectExpr() = c.getArgument(0)
    and sink.asDefiningArgument() = c.getArgument(1)
  )
  or
  source.asIndirectExpr() = sink.asExpr().(FieldAccess).getQualifier()
}

Which finds the flows but also produces false positives where dummy is given to taint_sink as it should not be tainted.

How can I model the propagation of indirect data flow with the correct access path for external functions?

@fbesler fbesler added the question Further information is requested label Mar 28, 2025
@jketema
Copy link
Contributor

jketema commented Mar 31, 2025

Hi @fbesler

Thanks for your question. C++ does not have AnyMemberkeyword like JS does. However, from what you shared, I would assume that ignoring the fields and making your MaD rows "value-preserving" will work:

    - ["", "", False, "process_taint", "", "", "Argument[*0]", "ReturnValue[*]", "value", "manual"]
    - ["", "", False, "process_taint2", "", "", "Argument[*0]", "Argument[*1]", "value", "manual"]

This means that dataflow will remember that data is tainted after s->data = taint_source();, and (because we modeled them as value-preserving) the calls to process_taint and process_taint2 will also imply dataflow for all the members of S.

Unfortunately, isAdditionalFlowStep cannot be used here. There's an unwritten rule here, that when the access path is non-empty (i.e., we are tracking a field that’s been written into an object, but not yet read), then the flows added via isAdditionalFlowStep aren’t used. This also explains why you needed to add

source.asIndirectExpr() = sink.asExpr().(FieldAccess).getQualifier()

to get any kind of flow to a field.

With the above MaD row you should be able to delete your isAdditionalFlowStep altogether.

@fbesler
Copy link
Author

fbesler commented Apr 1, 2025

Thank you for the clarification. The solution works for my scenario.

@fbesler fbesler closed this as completed Apr 1, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

No branches or pull requests

2 participants