Snowman should use a more sophisticated disassembly technique than linear #10

Open
jrmuizel opened this issue Jun 3, 2015 · 9 comments

@jrmuizel

jrmuizel commented Jun 3, 2015

ARM decompilation currently seems to suffer quite a bit from confusing code and data, and a better disassembly technique should help there.

There are lots of options for a better technique:

  • mcsema has something better but I don't know much about it.
  • ByteWeight http://security.ece.cmu.edu/byteweight/ seems to be what BAP is switching to, or is at least a good candidate.
  • Dagger uses MCObjectDisassembler (a recursive traversal disassembler) from LLVM, which made it upstream but was later removed. In my experience it did not work very well.
  • I haven't looked at what radare uses.
@yegord
Owner

yegord commented Jun 3, 2015

I agree. I would try something like:

  1. Scan through the executable section, try to disassemble N instructions in a row (you can also use various hints, e.g., symbols and the executable's entry point, for the starting points).
  2. If disassembly succeeds, run recursive traversal from the address of the first instruction.

I have already implemented the traversal once, although I never got around to actually using it: 7f1e836. The traversal constructs the control-flow graph on the fly and uses DataflowAnalyzer to perform abstract interpretation, so it should be able to determine the jump destinations and, in particular, switches to and from THUMB mode.

One can take the above code as a starting point and do some experiments with it.
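
For illustration only, the two-step idea (probe for N instructions that decode in a row, then run recursive traversal from the first one) could be sketched roughly as below. This is not the code from 7f1e836 and does not use Snowman's classes: the toy 2-byte instruction set, the names `decode`, `probe`, `traverse`, and `discover`, and the default `n=3` are all assumptions made up for the sketch. It also leaves out the abstract interpretation and the ARM/THUMB mode switches mentioned above.

```python
# Toy fixed-width ISA invented for this sketch (not ARM, not Snowman's API).
# Each instruction is 2 bytes: opcode, operand (an absolute target address).
OP, JMP, BR, RET = 0x01, 0x02, 0x03, 0x04
SIZE = 2


def decode(code, addr):
    """Return the successor addresses of the instruction at addr,
    or None if the bytes there do not decode."""
    if addr < 0 or addr + SIZE > len(code):
        return None
    opcode, operand = code[addr], code[addr + 1]
    if opcode == OP:            # ordinary instruction: falls through
        return [addr + SIZE]
    if opcode == JMP:           # unconditional jump: target only
        return [operand]
    if opcode == BR:            # conditional branch: fall-through and target
        return [addr + SIZE, operand]
    if opcode == RET:           # return: no successors
        return []
    return None                 # anything else is treated as data


def probe(code, addr, n):
    """Step 1: check whether n instructions in a row decode, starting at addr."""
    for _ in range(n):
        if decode(code, addr) is None:
            return False
        addr += SIZE
    return True


def traverse(code, start, found):
    """Step 2: recursive traversal; adds reachable instruction addresses to found."""
    worklist = [start]
    while worklist:
        addr = worklist.pop()
        if addr in found:
            continue
        successors = decode(code, addr)
        if successors is None:
            continue            # target does not decode: do not follow it
        found.add(addr)
        worklist.extend(successors)


def discover(code, n=3, hints=()):
    """Try hinted addresses first (e.g. symbols, entry point), then scan the section."""
    found = set()
    candidates = list(hints) + list(range(0, len(code), SIZE))
    for addr in candidates:
        if addr not in found and probe(code, addr, n):
            traverse(code, addr, found)
    return sorted(found)


# Code at 0..5, four bytes of data at 6..9, more code at 10..15.
image = bytes([0x01, 0x00, 0x01, 0x00, 0x02, 0x0A,
               0xFF, 0xFF, 0xFF, 0xFF,
               0x03, 0x0E, 0x01, 0x00, 0x04, 0x00])
print(discover(image))
```

On this toy image the result is `[0, 2, 4, 10, 12, 14]`: the jump at address 4 skips the embedded data at 6..9, which a linear sweep would have tried to decode. The probe is only a heuristic, so data that happens to decode for N instructions in a row would still be accepted as a starting point.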

@nihilus

nihilus commented Jun 16, 2015

Well, couldn't this already piggyback on IDA? However, a purely free ARM decompiler would be welcome.

@yegord
Owner

yegord commented Jun 16, 2015

IDA already knows the ranges of addresses belonging to a function. I am not sure, though, whether it includes data in these ranges. If it does not, the IDA plug-in should not have problems with interpreting data as code. If it does, we may need to find where exactly the instructions are (IDA has a getFlags() function for this) and update IdaFrontend::functionAddresses() to report only the ranges of addresses containing executable code.
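
As a rough sketch of that second case, the function's address ranges could be split into the maximal subranges that contain only code. The predicate `is_code` below is hypothetical: it stands in for whatever the flags obtained through getFlags() say about an address. No real IDA SDK calls are made, and the actual signature of IdaFrontend::functionAddresses() is not reproduced here.

```python
def code_only_ranges(ranges, is_code):
    """Split half-open [start, end) ranges into maximal subranges
    in which is_code(addr) holds for every address."""
    result = []
    for start, end in ranges:
        run_start = None                      # start of the current code run, if any
        for addr in range(start, end):
            if is_code(addr):
                if run_start is None:
                    run_start = addr
            elif run_start is not None:
                result.append((run_start, addr))
                run_start = None
        if run_start is not None:             # close a run that reaches the range end
            result.append((run_start, end))
    return result


# Hypothetical function at 0x1000..0x1020 with embedded data at 0x1010..0x1018.
data_bytes = set(range(0x1010, 0x1018))
ranges = code_only_ranges([(0x1000, 0x1020)], lambda a: a not in data_bytes)
print([(hex(s), hex(e)) for s, e in ranges])
```

Checking every address one by one keeps the sketch short; a real plug-in would more likely step from item to item.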

But this does not remove the need for better code discovery.

@yegord
Owner

yegord commented Jul 12, 2015

> seems to suffer quite a bit from confusing code

Can you provide an example?

@hlide

hlide commented Jul 12, 2015

Are you speaking about stuff like ROP gadgets?

@yegord
Owner

yegord commented Jul 12, 2015

Related: #14 (comment)

yegord pushed a commit that referenced this issue Aug 9, 2015
@yegord
Owner

yegord commented Oct 5, 2015

Related: #51

@nihilus

nihilus commented Oct 5, 2015

Moreover #59 is an actual way to achieve this.

@yegord
Owner

yegord commented Oct 5, 2015

No, it's not.


4 participants