Adding ALLEGREX architecture to snowman #36

Open
hlide opened this issue Aug 5, 2015 · 129 comments

Comments

@hlide

hlide commented Aug 5, 2015

I plan to add a new architecture instead of using the current MIPS architecture, which is a work in progress, because ALLEGREX is not recognized by the capstone framework, and the fact that capstone is built on LLVM makes implementing ALLEGREX there too complex. There are subtle differences which make the MIPS architecture not viable for decompiling ALLEGREX code.

So, I am about to provide a specific disassembler (the one provided by pspdecompiler, based on the prxtools one, but with the addition of a decomposer) for the architecture analyzer. That means handling all instructions, including VFPU.

I am pretty sure people from the uOFW project may be interested as well. But I am also pretty sure that it will be hard to decompile kernel modules, because they may use some tricks which are not ABI compliant, so I expect more work than just making a simple disassembler/analyzer.

As for the author of http://lists.derevenets.com/pipermail/snowman/2015-August/000002.html, it would be great if he/she contributed here as well (PRX handling).

@hlide
Author

hlide commented Aug 5, 2015

By the way, if MIPS code accesses a memory slot which is not a stack variable, that access must imperatively be kept in the C++ output. In particular, kernel modules have a large address range for MMIO, whose controllers may be triggered even if the access looks like a dummy one to the decompiler. The decompiler should not try to be smarter than the programmer, because it does not know the context of such a memory access.

@yegord
Owner

yegord commented Aug 5, 2015

Following your reasoning, one can go far. :-) For example, the decompiler should then not reconstruct expressions: reconstruction of expressions involves reordering of operations, including memory accesses, which may change the semantics of the program. We can then just print IR in a C form (like the LLVM C backend did, before it was removed), and go home. :-)

My proposal is to first find such examples, and then discuss what to do with them.

P.S. Assume you have a load to r1 from address r2 that is not used. What do you want to be generated for it? r1 = *r2;? Just *r2;?

@hlide
Author

hlide commented Aug 5, 2015

*r2; is fine, but I would expect something more like *((int8_t*)0xb8c0004); which could mean sending an ack to a controller by reading at this address, OR just reading a byte from a FIFO queue and discarding it because it is a byte we don't care about. What I mean is: if a memory access (a read or a write) is done and is not relative to the stack, there is a reason for it to be there, so it should be kept. Register or stack slots that are assigned the value of a memory access can be discarded, but not the memory access itself. I may be wrong, but I had the impression that snowman also discards the memory access if its value is not propagated, as if it were the same as a register.

@hlide
Author

hlide commented Aug 5, 2015

And yes, you should NEVER reorder memory accesses (I mean those not involving stack variables). Remember the case I gave you where there was something like an exchange between two memory slots: the C++ result was wrong because it reversed the order, so the old value was no longer saved in a safe place first. And for MMIO, keeping the order is extremely important, or you will mess up the controller.
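
To make the ordering concern concrete, here is a minimal Python sketch. It is a toy model, not snowman code: the `run` helper, the op tuples, and the slot names `a` and `b` are all made up for illustration. It shows how moving a load past a store to the same slot turns a swap into a copy.

```python
def run(program, memory):
    """Execute ("load", reg, addr) and ("store", addr, reg) ops in order."""
    mem = dict(memory)
    regs = {}
    for kind, a, b in program:
        if kind == "load":      # a = register, b = address
            regs[a] = mem[b]
        elif kind == "store":   # a = address, b = register
            mem[a] = regs[b]
        else:
            raise ValueError(f"unknown op {kind!r}")
    return mem


# Machine order: both slots are read before either one is written (a swap).
swap = [
    ("load", "t0", "a"),
    ("load", "t1", "b"),
    ("store", "a", "t1"),
    ("store", "b", "t0"),
]

# Hypothetical bad rewrite: the load of `a` is moved down to its only use,
# past the store that overwrites `a`.
reordered = [
    ("load", "t1", "b"),
    ("store", "a", "t1"),
    ("load", "t0", "a"),
    ("store", "b", "t0"),
]

initial = {"a": 1, "b": 2}
print(run(swap, initial))       # {'a': 2, 'b': 1}
print(run(reordered, initial))  # {'a': 2, 'b': 2}
```

The second result has lost the old value of `a`, which is the kind of wrong output described above.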

@yegord
Owner

yegord commented Aug 5, 2015

Here, https://github.com/yegord/snowman/blob/master/src/nc/core/ir/cgen/DefinitionGenerator.cpp#L636, instead of returning null, one can recursively traverse assignment->right(), find all the memory accesses that you want to preserve, aggregate them into a single expression using, say, the comma operator, and return an ExpressionStatement with this expression. If you really want this feature so much, you can prepare a patch.

@hlide
Author

hlide commented Aug 5, 2015

Are memory accesses to stack variables transformed into some kind of virtual registers, so that I can distinguish them from the other memory accesses and keep only the latter?

@hlide
Author

hlide commented Aug 5, 2015

Is there any reason to return a big expression with operator comma instead of a block of expressions? I mean the left value is discarded (lhs = rhs-> rhs). Oh you mean something like:
lhs = f(rhs1, rhs2, rhs3); --> rhs1, rhs2; instead of f(rhs1, rhs2, rhs3); when rhs1 and rhs2 are memory accesses to keep and lhs is discardable?

@yegord
Owner

yegord commented Aug 5, 2015

On Wed, Aug 05, 2015 at 02:59:02PM -0700, hlide wrote:

Do memory accesses to variable in stack are transformed into some
virtual registers so I can distinguish them from the other memory
accesses and keep only the latter?

No. But you can query the memory location of a given term by calling
dataflow_.getMemoryLocation(term). By looking at the location's domain
you can find out, where the object identified by the term is: on the
stack, in the global memory, in a register (see MemoryDomain.h).

Yegor Derevenets

@hlide
Author

hlide commented Aug 5, 2015

Ok, I will not work on it right away. The priority is to get the Allegrex analyzer working and to implement as many instructions as possible.

@yegord
Owner

yegord commented Aug 5, 2015

On Wed, Aug 05, 2015 at 03:06:28PM -0700, hlide wrote:

Oh you mean something like: lhs = f(rhs1, rhs2, rhs3); --> rhs1, rhs2; instead of f(rhs1, rhs2, rhs3); when rhs1 and rhs2 are
memory access to keep and lhs is discardable?

Yes.

In practice, the commas should be rarely needed.

You can keep the whole expression, if this is what you want to.

Ah, and yes, you will have to modify LivenessAnalyzer to call makeLive
on the terms representing these rhs1, rhs2, rhs3. But it is no harder
than modifying DefinitionGenerator.

And one more thing: for recursive descent through the terms you can use
Term::callOnChildren().

Yegor Derevenets
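
The traversal discussed above (walk the right-hand side of a dead assignment, keep the non-stack memory reads, join them with the comma operator) can be sketched on a toy expression tree. This is only an illustration in Python: `Load`, `Op`, `loads_to_keep` and `render_discarded_assignment` are hypothetical names, and the `domain` field merely stands in for whatever the real memory-location query returns.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Load:
    """A memory read. `domain` is "stack", "global", ... (toy stand-in)."""
    address: str
    domain: str


@dataclass(frozen=True)
class Op:
    """Any operator or call node; children are Load, Op, or str leaves."""
    name: str
    children: tuple


def loads_to_keep(expr):
    """Collect non-stack loads in left-to-right order by recursive descent."""
    if isinstance(expr, Load):
        return [] if expr.domain == "stack" else [expr]
    if isinstance(expr, Op):
        kept = []
        for child in expr.children:
            kept.extend(loads_to_keep(child))
        return kept
    return []  # register or constant leaf


def render_discarded_assignment(rhs):
    """Return a C-like statement for a dead `lhs = rhs`, or None to drop it."""
    kept = loads_to_keep(rhs)
    if not kept:
        return None
    return ", ".join("*" + load.address for load in kept) + ";"


rhs = Op("f", (
    Load("(int8_t*)0xb8c0004", "global"),
    Load("(sp + 8)", "stack"),
    Op("+", (Load("r2", "global"), "1")),
))
print(render_discarded_assignment(rhs))  # *(int8_t*)0xb8c0004, *r2;
```

An assignment whose right-hand side only reads stack slots still disappears entirely, since `render_discarded_assignment` returns None in that case.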

@nihilus

nihilus commented Aug 6, 2015

Without having read through all the posts, I'd say that @hlide is on to something, and I'm kind of into a subset of this 'something' too.

Bringing in the ALLEGREX ISA (@hlide and I also talked about some Lexra ISA support) is more than welcome as far as I'm concerned, even if it means that @hlide and I have to split up into two separate efforts. I rest assured that @hlide will help improve my understanding (as he/she has done all along), and vice versa. And @hlide knows more than me for sure 👍

@nihilus

nihilus commented Aug 6, 2015

@yegord @hlide ... hate to admit this, and it is 'off the record', but of us all @hlide chose the most loving and kind avatar... :--)

@hlide you indicated before that you are in the Central European time zone, is this correct (one of my brothers still uses free.fr when working in Provence)?

@hlide
Author

hlide commented Aug 6, 2015

@nihilus I'm in France, and a Frenchman despite my avatar :-).

As for ALLEGREX, there are too many MIPS dialects, and capstone is pretty irritating and lacking when used as anything other than a simple disassembler. Besides, VFPU is too specific and complex to be handled as a MIPS dialect (VFPU is an awesome kind of proprietary coprocessor). But MIPS and ALLEGREX will benefit from each other.

@hlide
Author

hlide commented Aug 6, 2015

Ok, disassembler seems ok:

image

CPU, FPU and VFPU instructions can be seen in this image. I also changed the architecture detection to select allegrex only if elfhdr.e_flags contains the special value that marks the machine as ALLEGREX, so the MIPS architecture can still be used as a fallback.

@nihilus

nihilus commented Aug 6, 2015

@hlide is that capstone?

@hlide
Author

hlide commented Aug 6, 2015

nope, AllegrexDisassembler has its own disassembler.

@nihilus

nihilus commented Aug 6, 2015

@hlide neat that you integrated it.

@hlide
Author

hlide commented Aug 7, 2015

@yegord I have an issue with branch likely instructions (B_xx_L) when generating AST.

See this example:

case I_BEQL: {
    AllegrexExpressionFactoryCallback then(factory, program->createBasicBlock(), instruction);
    _[
        jump(gpr(0) == gpr(1),
             (delayslot(then)[jump(imm(2))]).basicBlock(),
             directSuccessorButOne())
    ];
    break;
}

As you can see, this instruction NEVER jumps to directSuccessor (address + 4) but to directSuccessorButOne (address + 8). The instruction at address + 4 is inserted just before the jump as a branch delay slot when taken. I think it may fool the function IRGenerator::addJumpToDirectSuccessor.
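
As background, the difference between a normal branch and a branch-likely can be written down as a tiny Python model (the function name and the list-of-addresses output are just for illustration). For a normal branch the delay slot always executes; for a branch-likely it executes only when the branch is taken and is skipped otherwise, which is why the fall-through successor is address + 8.

```python
def branch_trace(addr, target, taken, likely):
    """Addresses executed for a branch at `addr`, up to the next instruction."""
    delay_slot = addr + 4
    if taken:
        # Taken: the delay slot runs, then control goes to the target.
        return [addr, delay_slot, target]
    if likely:
        # Branch-likely, not taken: the delay slot is nullified.
        return [addr, addr + 8]
    # Normal branch, not taken: the delay slot runs as an ordinary instruction.
    return [addr, delay_slot, addr + 8]


print([hex(a) for a in branch_trace(0x100, 0x200, taken=False, likely=True)])
# ['0x100', '0x108']
```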

@hlide
Author

hlide commented Aug 7, 2015

@yegord more info here:

Source:

case I_BLTZL: {
    AllegrexExpressionFactoryCallback then(factory, program->createBasicBlock(), instruction);
    _[
        jump(signed_(gpr(0)) < constant(0),
             (delayslot(then)[jump(imm(1))]).basicBlock(),
             directSuccessorButOne())
    ];
}

image

image

diff --git a/src/nc/core/ir/cgen/DefinitionGenerator.cpp b/src/nc/core/ir/cgen/DefinitionGenerator.cpp
index c81c88c..9567cd8 100644
--- a/src/nc/core/ir/cgen/DefinitionGenerator.cpp
+++ b/src/nc/core/ir/cgen/DefinitionGenerator.cpp
@@ -24,6 +24,7 @@

 #include "DefinitionGenerator.h"

+#include <nc/common/LogToken.h>
 #include <nc/common/Foreach.h>
 #include <nc/common/Range.h>
 #include <nc/common/Unreachable.h>
@@ -507,6 +508,12 @@ std::unique_ptr<likec::Expression> DefinitionGenerator::makeExpression(const cfl
             std::unique_ptr<likec::Expression> expression;

             if (const Jump *jump = statement->asJump()) {
+
+                if (jump != basicNode->basicBlock()->getJump())
+                {
+                    nc::LogToken::instance()->error(QString("%1: %2").arg(jump->instruction()->addr(), -8, 16).arg(jump->instruction()->toString()));
+                }
+
                 assert(jump == basicNode->basicBlock()->getJump());

                 expression = makeExpression(jump->condition());
@@ -710,7 +717,6 @@ std::unique_ptr<likec::Statement> DefinitionGenerator::doMakeStatement(const Sta
                                         parent().makeType(parent().types().getType(returnValueTerm)),
                                         std::move(callOperator))));
                         }
-                        }
                     }
                 }
             }

auto delayslot = [&](AllegrexExpressionFactoryCallback & callback) -> AllegrexExpressionFactoryCallback & {
    auto delayslot = checked_cast<const AllegrexInstruction *>(instructions_->get(instruction->endAddr()).get());
    if (delayslot) {
        createStatements(callback, delayslot, program);
    }
    else {
        throw core::irgen::InvalidInstructionException(tr("Cannot find a delay slot at 0x%1.").arg(instruction->endAddr(), 0, 16));
    }
    return callback;
};

core::ir::BasicBlock *cachedDirectSuccessor = nullptr;
core::ir::BasicBlock *cachedNextDirectSuccessor = nullptr;
auto directSuccessor = [&]() -> core::ir::BasicBlock * {
    if (!cachedDirectSuccessor) {
        cachedDirectSuccessor = program->createBasicBlock(instruction->endAddr());
    }
    return cachedDirectSuccessor;
};
auto directSuccessorButOne = [&]() -> core::ir::BasicBlock * {
    if (!cachedNextDirectSuccessor) {
        cachedNextDirectSuccessor = program->createBasicBlock(instruction->endAddr() + instruction->size());
    }
    return cachedNextDirectSuccessor;
};

@hlide
Author

hlide commented Aug 7, 2015

As expected, if I comment all branch likely instructions (B_xx_L), I can decompile:

image

@nihilus

nihilus commented Aug 7, 2015

@hlide that should go for MIPS as well then... :-/ Cool and fast work, though.

@hlide
Author

hlide commented Aug 7, 2015

@nihilus yeah the issue is the same for MIPS. And I hope @yegord knows a solution because I'm clueless about it.

@yegord
Owner

yegord commented Aug 7, 2015

On Fri, Aug 07, 2015 at 05:20:14AM -0700, hlide wrote:

The instruction at address + 4 is inserted just before the jump as a
branch delay slot when taken. I think it may fool the function
IRGenerator::addJumpToDirectSuccessor.

ir::Program::createBasicBlock(ByteAddr) and cgen::DefinitionGenerator::
isDominating expect that the addresses of the instructions of the
statements in a single basic block constitute a non-descending sequence.

When generating statements for the delay-slot instruction, you should
pass the preceding instruction to the factory callback, to make these
statements appear to originate from this preceding instruction.

Yegor Derevenets

@hlide
Author

hlide commented Aug 7, 2015

Isn't that what I am already doing with:

AllegrexExpressionFactoryCallback then(factory, program->createBasicBlock(), instruction);

?

instruction is the preceding instruction of delayslot, that is, the branch instruction.

Just for your information, the non-likely branch instructions appear to work fine:

case I_BEQ: {
    AllegrexExpressionFactoryCallback then(factory, program->createBasicBlock(), instruction);
    _[
        jump(gpr(0) == gpr(1),
             (delayslot(then)[jump(imm(2))]).basicBlock(),
             directSuccessor())
    ];
    break;
}

while this one doesn't work:

case I_BEQL: {
    AllegrexExpressionFactoryCallback then(factory, program->createBasicBlock(), instruction);
    _[
        jump(gpr(0) == gpr(1),
             (delayslot(then)[jump(imm(2))]).basicBlock(),
             directSuccessorButOne())
    ];
    break;
}

The only difference is directSuccessor() versus directSuccessorButOne() as the last argument to jump.

@yegord
Owner

yegord commented Aug 8, 2015

On Fri, Aug 07, 2015 at 03:54:29PM -0700, hlide wrote:

is it now what I am already doing with:

AllegrexExpressionFactoryCallback then(factory, program->createBasicBlock(), instruction);

?

Yes. After rereading the code I got it.

I have pushed small well-formedness checks in 82f0a71. Do they fail?
If yes, it should be easy to find the problem source (e.g., print the
basic block with the problem). If no, you can give me the commit id, the
input file, and the steps to reproduce the problem — I can have a look.

Yegor Derevenets

@hlide
Author

hlide commented Aug 8, 2015

Ok, I merged your changes. Now it also fails on non-likely branches.

I have this assert:

#ifndef NDEBUG
    /*
     * Check that jump is always the last instruction in a basic block.
     */
    foreach (auto basicBlock, program_->basicBlocks()) {
        foreach (auto statement, basicBlock->statements()) {
            if (auto jump = statement->asJump()) {
                if (jump != basicBlock->statements().back()) {
                    nc::LogToken::instance()->error(QString("%1: %2").arg(jump->instruction()->addr(), -8, 16).arg(jump->instruction()->toString()));
                    nc::LogToken::instance()->error(QString("jump: %1").arg(jump->toString()));
                    nc::LogToken::instance()->error(QString("basicBlock->statements().back(): %1").arg(basicBlock->statements().back()->toString()));
                }
                assert(jump == basicBlock->statements().back());
            }
        }
    }
#endif

Log:

[Info] Decompiling.
[Info] Creating intermediate representation of the program.
[Error] 890179c : blez       $s0, 0x08901844
[Error] jump: if (<1000:512..543> (signed)< 0x11) goto basic block 0xa88bb8d80 else goto basic block 0xa88bb8e10

[Error] basicBlock->statements().back(): goto address 0x8901844

[Error] basicBlock: basicBlock0xa88bb8510 [shape=box,label="Address: None\n<1000:0..31> = 0x0\nif (<1000:512..543> (signed)< 0x11) goto basic block 0xa88bb8d80 else goto basic block 0xa88bb8e10\ngoto address 0x8901844\n"];

I dumped jump and basicBlock->statements().back() and basicBlock.

EDIT: added basicBlock.

@yegord
Owner

yegord commented Aug 8, 2015

On Sat, Aug 08, 2015 at 01:12:08PM -0700, hlide wrote:

Log:

[Info] Decompiling.
[Info] Creating intermediate representation of the program.
[Error] 890179c : blez       $s0, 0x08901844
[Error] jump: if (<1000:512..543> (signed)< 0x11) goto basic block 0x9402543ac0 else goto basic block 0x9402544a80

[Error] basicBlock->statements().back(): goto address 0x8901844

I dumped jump and basicBlock->statements().back().

Can you print the whole basic block? I would like to see: all
statements, all instructions, whether the basic block has an address.

Yegor Derevenets

@hlide
Author

hlide commented Aug 8, 2015

Done, see my edited post.

@hlide
Author

hlide commented Aug 8, 2015

Always check my posts on GitHub; I often edit them again (mistakes, more info, etc.).

@nihilus

nihilus commented Aug 9, 2015

Yes, it would be nice to transform it into ulw / usw.

@hlide
Author

hlide commented Aug 9, 2015

On the other hand, I wonder if my disassembler/decomposer should be able to handle a sequence of two instructions under one instruction ID. Snowman is able to handle instructions of variable size, so I could handle LA, ULW and USW each with one instruction ID and an 8-byte size instead of the usual 4-byte size.

@nihilus

nihilus commented Aug 9, 2015

Is there any easy way to log what 'auto memval = *(ea & constant(-4));' actually is?

@hlide
Author

hlide commented Aug 9, 2015

memval.toString() ?

@yegord
Owner

yegord commented Aug 9, 2015

Concerning commenting out well-formedness checks — fix your code. :)
Concerning LWL/LWR pairs, I would make a separate pass transforming them into usual loads. (Similar to how implicit zero-extend on x86 is handled.)
Concerning memval, you can make an IR statement with it, using the usual [] stuff, and then print the statement, using operator<< or Statement::toString().
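
As an illustration of the "separate pass" idea for LWL/LWR pairs, here is a toy Python version that works on plain tuples rather than on snowman's IR. It assumes a little-endian target, where an unaligned word load at offset `off` is typically emitted as `lwl rt, off+3(base)` together with `lwr rt, off(base)`. It only fuses adjacent pairs, and the `ulw` tuple and the function name are hypothetical.

```python
def merge_unaligned_loads(insns):
    """Fuse adjacent lwl/lwr pairs into one pseudo ("ulw", rt, offset, base).

    Each load is a tuple (mnemonic, rt, offset, base). Other instructions
    are passed through untouched.
    """
    out = []
    i = 0
    while i < len(insns):
        if i + 1 < len(insns):
            pair = {insns[i][0]: insns[i], insns[i + 1][0]: insns[i + 1]}
            if set(pair) == {"lwl", "lwr"}:
                _, lt, loff, lbase = pair["lwl"]
                _, rt, roff, rbase = pair["lwr"]
                # Same destination and base, destination must not clobber
                # the base, and the offsets must differ by exactly 3.
                if lt == rt and lbase == rbase and lt != lbase and loff == roff + 3:
                    out.append(("ulw", rt, roff, rbase))
                    i += 2
                    continue
        out.append(insns[i])
        i += 1
    return out


code = [("lwl", "t0", 3, "a1"), ("lwr", "t0", 0, "a1"), ("addiu", "a0", "a0", 4)]
print(merge_unaligned_loads(code))
# [('ulw', 't0', 0, 'a1'), ('addiu', 'a0', 'a0', 4)]
```

Pairs separated by other instructions, or with mismatched registers or offsets, are left alone, which keeps the pass conservative.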

@hlide
Author

hlide commented Aug 9, 2015

oh you mean to output on log window, I added some stuff to do so...

From 6d67bbf9164ac422dba55a009e0865217ae6a53c Mon Sep 17 00:00:00 2001
From: hlide
Date: Mon, 10 Aug 2015 01:03:04 +0200
Subject: [PATCH] Added log facility

---
 src/nc/common/LogToken.h  | 2 ++
 src/nc/gui/MainWindow.cpp | 2 +-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/nc/common/LogToken.h b/src/nc/common/LogToken.h
index 01a705b..2135e6c 100644
--- a/src/nc/common/LogToken.h
+++ b/src/nc/common/LogToken.h
@@ -40,6 +40,8 @@ class LogToken {
     std::shared_ptr<Logger> logger_;

 public:
+    static LogToken* instance(LogToken* token = nullptr) { static LogToken* token___ = nullptr; if (token && !token___) token___ = token; return token___; }
+
     /**
      * Default constructor.
      *
diff --git a/src/nc/gui/MainWindow.cpp b/src/nc/gui/MainWindow.cpp
index 082c1f6..0450595 100644
--- a/src/nc/gui/MainWindow.cpp
+++ b/src/nc/gui/MainWindow.cpp
@@ -86,7 +86,7 @@ MainWindow::MainWindow(Branding branding, QWidget *parent):
     connect(logger.get(), SIGNAL(onMessage(const QString &)), progressDialog_, SLOT(setLabelText(const QString &)));
     connect(logger.get(), SIGNAL(onMessage(const QString &)), this, SLOT(setStatusText(const QString &)));

-    logToken_ = LogToken(logger);
+    LogToken::instance(&(logToken_ = LogToken(logger)));

     settings_ = new QSettings(branding_.organizationName(), branding_.applicationName(), this);
     loadSettings();
-- 
1.9.4.msysgit.2

@nihilus

nihilus commented Aug 9, 2015

ah, thx

@yegord
Owner

yegord commented Aug 9, 2015

You could just use qDebug(); its output is redirected to the log window.

@hlide
Author

hlide commented Aug 9, 2015

@yegord I'm not sure I can fix the issue with sorted addresses because you need to put a delay slot instruction (address + 4) BEFORE the real branch instruction (address).

@nihilus

nihilus commented Aug 9, 2015

@yegord I've also started to get 'Assertion failed: (statement == basicBlock->statements().back()), function generate, file /Users/nietzsche/Downloads/snowman/src/nc/core/irgen/IRGenerator.cpp, line 110.'

@hlide
Author

hlide commented Aug 9, 2015

@nihilus the issue you have is almost the same as the one I had with Allegrex.

@yegord
Owner

yegord commented Aug 9, 2015

@hlide You need to mark the statements of the delay slot instruction that you put before the statements of the delay slot owner as belonging to the delay slot owner. In my proposed patch I do that.

@nihilus This is good. Fix your code.

@hlide
Author

hlide commented Aug 9, 2015

Your proposed patch did not work, so I reverted it while keeping some of your ideas. But I will try the thing about the delay slot owner.

@hlide
Author

hlide commented Aug 9, 2015

@nihilus when sober (:-)), just read all the posts above regarding the issue with branch instructions; it is the same for MIPS.

@hlide
Author

hlide commented Aug 9, 2015

Good news. With all the checking activated, it passes now.

@yegord
Owner

yegord commented Aug 9, 2015

Just a small comment on why statements in a basic block should be sorted by instruction address.
IRGenerator sometimes wants to split an existing basic block, when there is a jump into the middle of it.
So, if you have a basic block with instructions with addresses 1, 2, 3, 2, 1 and a jump to address 2, it is not very clear, how to split the basic block, and what the boundaries of the new two basic blocks should be.
I think, in case of MIPS, you can only have something like 1, 2, 4, 3, and the current implementation of splitting will de-facto work, but I would prefer to stay on the side where things are simple.
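
A small Python sketch of why the non-descending order matters when a block has to be split at a jump target. It models a basic block as nothing more than the list of instruction addresses of its statements; `split_block` is a made-up helper, not the IRGenerator code.

```python
from bisect import bisect_left


def split_block(addresses, target):
    """Split a block's statement addresses at the first statement of `target`."""
    if any(a > b for a, b in zip(addresses, addresses[1:])):
        raise ValueError("addresses are not non-descending; split is ambiguous")
    i = bisect_left(addresses, target)
    if i == len(addresses) or addresses[i] != target:
        raise ValueError("no statement at the target address")
    return addresses[:i], addresses[i:]


# Delay-slot statements tagged with the address of the branch that owns them
# (0x10 here) keep the sequence sorted, so the split point is unique.
print(split_block([0x8, 0xC, 0x10, 0x10], 0x10))  # ([8, 12], [16, 16])
```

With a sequence such as 1, 2, 3, 2, 1 the helper refuses to split, because there is no single boundary for address 2.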

@uxmal

uxmal commented Aug 10, 2015

You've probably studied this already, but here is a paper [http://www.cs.tufts.edu/~nr/pubs/xtoplas-acm.pdf] that details how to cope with delay slots. I believe this is what the boomerang folks based their implementation on.

@yegord
Owner

yegord commented Aug 10, 2015

Hmm. In the paper the authors even try to give semantics to jumps in delay slots, something that seems to be not needed for MIPS:

If a branch or jump instruction is placed in the branch delay slot, the operation of both instructions is undefined.

http://electronics.stackexchange.com/questions/28444/mips-pic32-branch-vs-branch-likely

@hlide
Author

hlide commented Aug 10, 2015

@yegord that is fine, I know how to handle it correctly.

As for Allegrex, you may find some games using a branch instruction inside (surprise!) the delay slot of another branch instruction. I know what happens, but that's specific to Allegrex, which has a 7-stage pipeline. Bxx[L] has two bubbles (three cycles of latency) and Jxx has one bubble (two cycles of latency). If both nested branch instructions would be taken, the second one wins in all combinations except one:

# fetch(pc) is assumed to decode and return the instruction at address pc.

def interpret_delay_slot(pc, is_cond):
    # Returns the delay-slot instruction if it takes over the jump, else None.
    # is_cond: True when the owning (outer) branch is conditional.
    insn = fetch(pc)
    if insn.is_cond_branch():
        if insn.test_cond_branch():
            return insn
        else:
            return None
    elif insn.is_uncond_branch():
        if not is_cond:
            return insn
        else:
            return None
    else:
        ...

def interpret(pc, delay_slot):
    # Returns the next pc. delay_slot: True when this instruction is itself
    # being executed in a delay slot.
    insn = fetch(pc)
    if insn.is_cond_branch():
        if insn.test_cond_branch():
            if not delay_slot:
                delay_slot_insn = interpret_delay_slot(pc + 4, True)
                if delay_slot_insn is not None:
                    return delay_slot_insn.target_branch()
            return insn.target_branch()
        else:
            return pc + 4
    elif insn.is_uncond_branch():
        if not delay_slot:
            delay_slot_insn = interpret_delay_slot(pc + 4, False)
            if delay_slot_insn is not None:
                return delay_slot_insn.target_branch()
        return insn.target_branch()
    else:
        ...

1. All conditional branch instructions, including the COP1 and COP2 ones, have one delay slot + 2 bubbles when the branch is taken.

1.1) When not taken, the next instruction is executed as a normal instruction, not as a delay slot instruction.

1.2) When taken, if its delay slot instruction is also a conditional branch instruction which is taken, only the second instruction jumps to its target.

1.3) When taken, if its delay slot instruction is an unconditional branch instruction, only the first instruction jumps to its target.

2. All unconditional branch instructions have one delay slot + 1 bubble when the branch is taken.

2.1) When not taken, the next instruction is executed as a normal instruction, not as a delay slot instruction.

2.2) When taken, if its delay slot instruction is a branch instruction which is taken, only the second instruction jumps to its target.
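
The rules above can be condensed into a small self-contained Python model for experimenting with the combinations. `Insn`, the `code` dictionary mapping addresses to instructions, and `next_pc` are hypothetical names; the model only decides where control goes next and ignores the side effects of ordinary delay-slot instructions and the pipeline bubbles.

```python
from dataclasses import dataclass


@dataclass
class Insn:
    kind: str            # "cond", "uncond", or "other"
    target: int = 0      # branch target, for branches
    taken: bool = False  # outcome of the condition, for kind == "cond"


def next_pc(code, pc):
    """Where control goes after the instruction at `pc` (and its delay slot)."""
    insn = code[pc]
    if insn.kind == "other":
        return pc + 4
    if insn.kind == "cond" and not insn.taken:
        return pc + 4                      # rule 1.1
    slot = code.get(pc + 4)
    if slot is not None:
        if slot.kind == "cond" and slot.taken:
            return slot.target             # rules 1.2 and 2.2
        if slot.kind == "uncond" and insn.kind == "uncond":
            return slot.target             # rule 2.2
    return insn.target                     # includes rule 1.3


# Rule 1.3: taken conditional branch with an unconditional branch in its slot.
code = {0: Insn("cond", 0x100, True), 4: Insn("uncond", 0x200)}
print(hex(next_pc(code, 0)))  # 0x100
```

The single exception mentioned in the post is the case shown in the usage lines: an unconditional branch sitting in the delay slot of a taken conditional branch does not win.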

@nihilus

nihilus commented Aug 10, 2015

@uxmal: No, we did not base boomerang upon that paper (or should I say van Emmerik)... Every machine is described in a language called SLED, invented by Norman Ramsey, using the NJ Machine Code Toolkit as found here: https://www.cs.tufts.edu/~nr/toolkit/

From there everything gets an intermediate representation, which is basically a minimalistic Turing machine. Etc. etc.

@hlide
Author

hlide commented Aug 11, 2015

Interesting stuff. Is there a git-mirrored Boomerang repository? And one for this toolkit? They may give some ideas for a multi-platform assembler/disassembler with a focus on a composer/decomposer.

@hlide
Author

hlide commented Aug 11, 2015

OFF TOPIC MODE: on

I was starting a run-time assembler with a kind of composer in pure C++ templates here, but x86 is a horrid beast to tame :(.

OFF TOPIC: off

@uxmal

uxmal commented Aug 11, 2015

There are many Boomerang repositories on github, here's one: https://github.com/nemerle/boomerang

@nihilus

nihilus commented Aug 11, 2015

@uxmal yes, but the official site is at http://boomerang.SF.net. However, I am looking for a new admin.

@uxmal

uxmal commented Aug 11, 2015

Is any active development being done on boomerang? I see a lot of punters cloning the project but it seems to me that they're mostly doing code polishing, but no actual development.

@nihilus

nihilus commented Aug 11, 2015

@uxmal: I asked nemerle to take over, but he didn't respond. No, I don't do anything actively on it.

It is very accurate but unstable and has got no smart pointers etc.

IMO using Capstone with Snowman is much easier to get things working since the NJMC ML is outdated and I cannot get it running natively on OS X. Ramsey is too occupied to care about it. Etc etc.

@nihilus

nihilus commented Aug 11, 2015

@hlide see @uxmal :-) Yes, boomerang has some nice features, like function recognition based on C header files.

There is also a branch for a plugin API, which would be nice to have for snowman as well.

However, it implements its own object parsers, like Snowman, instead of reusing code.

There is a static Linux binary for NJML / NJMC and some MIPS stubs there, which were written ages ago.

@yegord
Owner

yegord commented Aug 11, 2015

On Mon, Aug 10, 2015 at 02:56:13PM -0700, Markus Gothe wrote:

@yegord 'Concerning memval, you can make an IR statement with it,
using the usual [] stuff, and then print the statement, using
operator<< or Statement::toString().'

Could you please give me a hint. I included QDebug etc... But how to
convert memval to a statement :-/?

Something like:

_[*zero ^= memval];
qDebug() << _.basicBlock()->statements().back()->asAssignment()
->right()->toString();

Or, without adding to the basic block:

qDebug() << factory.createStatement(*zero ^= memval)
->asAssignment()->right()->toString();

Yegor Derevenets
