Control flow and Pointer tainting

Wei Ming Khoo edited this page Aug 5, 2020 · 4 revisions

There are at least three types of taint dependency. Let x be tainted.

  1. Direct/Data-flow dependence, e.g. y = x
  2. Indirect/Control-flow dependence, e.g. if(x){ y = 2; }
  3. Address/Pointer dependence, e.g. y = a[x] and y = *x
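To make the three cases concrete, here is a minimal C sketch with one function per dependency type. The function names (`data_flow`, `control_flow`, `pointer_dep`) are made up for illustration and are not part of Taintgrind; in each one, think of `x` as the tainted input and `y` as the value whose taint is in question.

```c
#include <stddef.h>

/* 1. Direct/data-flow dependence: y is computed directly from x. */
long data_flow(long x) {
    long y = x;
    return y;
}

/* 2. Indirect/control-flow dependence: x never flows into y as data,
 *    it only decides whether the assignment y = 2 executes. */
long control_flow(long x) {
    long y = 0;
    if (x) {
        y = 2;
    }
    return y;
}

/* 3. Address/pointer dependence: x is only used to compute the address
 *    that y is loaded from. The caller must keep x within bounds. */
long pointer_dep(const long *a, size_t x) {
    long y = a[x];
    return y;
}
```

In all three functions the returned value changes when `x` changes, but only in the first does `x` reach `y` through an assignment or arithmetic operation.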

Taintgrind, following Valgrind's Memcheck, implements only 1, not 2 or 3. It can therefore under-taint, i.e. miss some dependencies. On the other hand, handling 2 and 3 is tricky, as doing so can lead to over-tainting, i.e. reporting dependencies where there are none.
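As a rough illustration of under-tainting, the toy model below attaches one taint bit to each value and compares two propagation rules for `y = a[x]`. This is a sketch under assumptions, not how Taintgrind or Memcheck represent shadow state: the `tval` type and both function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical shadow value: a concrete value plus a single taint bit. */
typedef struct {
    long value;
    bool tainted;
} tval;

/* y = a[x] under a data-flow-only rule: y inherits the taint of the
 * element that was loaded; the taint of the index x is ignored. */
tval load_direct_only(const tval *a, tval x) {
    return a[x.value];
}

/* The same load with a pointer-tainting rule added: y is also tainted
 * whenever the index used to compute the address is tainted. */
tval load_with_pointer_taint(const tval *a, tval x) {
    tval y = a[x.value];
    y.tainted = y.tainted || x.tainted;
    return y;
}
```

With an untainted lookup table and a tainted index, `load_direct_only` returns an untainted value even though the result is chosen by the input, which is the kind of dependency a data-flow-only tool misses.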

Let's take an example (Fig. 3 of http://bitblaze.cs.berkeley.edu/papers/dta%2B%2B-ndss11.pdf):

    char output[256];
    long input = user_input();
    long len = 0;
    if (input > 100) {
        strcpy(output, "large");
        len = 5;
    } else {
        strcpy(output, "small");
        len = 5;
    }
    print_output(output, len);

In this case, len has a control-flow dependence on input, since both assignments to len sit under a branch on input. However, the value of len does not actually depend on input, because it is 5 no matter which branch is taken, so transferring taint from input to len may be considered incorrect, i.e. over-tainting.
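The effect can be sketched by running the example under a naive control-flow rule. The rule used here (anything assigned under a branch on a tainted condition becomes tainted) is an assumption chosen for illustration, and the `tlong` type and `classify_len` function are hypothetical names, not taken from the paper or from Taintgrind.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical shadow value: a long plus a single taint bit. */
typedef struct {
    long value;
    bool tainted;
} tlong;

/* The example above, with a naive control-flow taint rule applied:
 * every variable assigned under a branch whose condition is tainted
 * is marked tainted. */
tlong classify_len(tlong input, char output[256]) {
    tlong len = {0, false};
    bool branch_tainted = input.tainted; /* taint of the condition input > 100 */

    if (input.value > 100) {
        strcpy(output, "large");
        len.value = 5;
    } else {
        strcpy(output, "small");
        len.value = 5;
    }

    /* len was assigned in both arms of the tainted branch. */
    len.tainted = branch_tainted;
    return len;
}
```

For any tainted input, the returned `len.value` is 5 and `len.tainted` is true, so the rule reports a dependence of len on input that does not exist. The contents of `output`, by contrast, really do differ between the two branches.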

Other Dynamic Taint Analysis Tools

Some other dynamic taint analysis tools I'm aware of, but have not tried (and the info may not be up-to-date):

  1. libdft: According to their paper, "in this work, we do not consider cases of implicit data flow that are in accordance with previous work on the subject". (http://nsl.cs.columbia.edu/papers/2012/libdft.vee12.pdf, Pg. 2)
  2. triton: "Dynamic Taint Analysis. (DTA) aims to detect which data and instructions along an execution depend on user input. We consider direct tainting." (https://triton.quarkslab.com/files/DIMVA2018-deobfuscation-salwan-bardin-potet.pdf, Pg. 6)
  3. bap: If you want to experiment with and implement different taint rules, I hear that bap will let you do that (but again, I have not tried it).
  4. polytracker: Polytracker is an LLVM pass that instruments the programs it compiles to track which bytes of an input file are operated on by which functions. It outputs a JSON file containing the function-to-input-bytes mapping.

For more information on the theoretical aspects of taint analysis, check out https://users.ece.cmu.edu/~aavgerin/papers/Oakland10.pdf.