-
Notifications
You must be signed in to change notification settings - Fork 310
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding ALLEGREX architecture to snowman #36
Comments
By the way, if a MIPS code is accessing a memory slot which is not a variable in the stack, it should be imperatively kept in the c++. Especially, kernel modules have a large range of address for MMIO, the controllers of which may be triggered even if the access may look like a dummy one by the decompiler. Decompiler should not be smarter than the programmer because it does not know the context of such memory access. |
Following you reasoning, one can go far. :-) For example, the decompiler should not reconstruct expressions then: reconstruction of expressions involves reordering of operations, including memory accesses, which may change the semantics of the program. We can then just print IR in a C form (like LLVM C backend did, before it was removed), and go home. :-) My proposal is to first find such examples, and then discuss what to do with them. P.S. Assume you have a load to |
|
And yes, you SHOULD never reorder the memory accesses (I mean those not involving with variable in stack) - remember a case I gave you where there was like an exchange between two memory slots : c++ result was wrong because it reverses order, preventing to keep the old value in safe place. And for MMIO, it is extremely sensitive to keep the order or you will mess up with the controller. |
One can here https://github.com/yegord/snowman/blob/master/src/nc/core/ir/cgen/DefinitionGenerator.cpp#L636, instead of returning null, recursively traverse |
Are memory accesses to variable in stack transformed into some virtual registers so I can distinguish them from the other memory accesses and keep only the latter? |
Is there any reason to return a big expression with operator comma instead of a block of expressions? I mean the left value is discarded ( |
On Wed, Aug 05, 2015 at 02:59:02PM -0700, hlide wrote:
No. But you can query the memory location of a given term by calling Yegor Derevenets |
Ok, I will not work on it at once. Priority is to make the Allegrex analyzer to work and implement every instructions as many as possible. |
On Wed, Aug 05, 2015 at 03:06:28PM -0700, hlide wrote:
In practice, the commas should be rarely needed. You can keep the whole expression, if this is what you want to. Ah, and yes, you will have to modify LivenessAnalyzer to call makeLive And one more thing: for recursive descent through the terms you can use Yegor Derevenets |
Not reading thru all the post I'd say that @hlide is on to something and I'm to kinda into this subset of this 'something' Bringing in the ALLEGREX ISA (as well as me @hlide talked about some Lexra ISA support) is more than welcomed by me even if it necessary by means split up me and @hlide into two diversions. I rest assure that @hlide will help and improve my (like he/she has done all the way down) understanding and vice-versa. And @hlide knows more than me for for sure 👍 |
@nihilus I'm in France and a french man despite of my avatar :-). As for ALLEGREX, there are too many dialects for MIPS and capstone is pretty irritating and lacking when not used as a simple disassembler. Beside, VFPU is too specific and complex to handle it as a dialect for MIPS (VFPU is an awesome kind of proprietary coprocessor). But MIPS and ALLEGREX will benefit together. |
@hlide is that capstone? |
nope, AllegrexDisassembler has its own disassembler. |
@hlide neat that you integrated it. |
@yegord I have an issue with branch likely instructions (B_xx_L) when generating AST. See this example: case I_BEQL: {
AllegrexExpressionFactoryCallback then(factory, program->createBasicBlock(), instruction);
_[
jump(gpr(0) == gpr(1),
(delayslot(then)[jump(imm(2))]).basicBlock(),
directSuccessorButOne())
];
break;
} As you can see, this instruction NEVER jumps to directSuccessor (address + 4) but to directSuccessorButOne (address + 8). The instruction at address + 4 is inserted just before the jump as a branch delay slot when taken. I think it may fool the function IRGenerator::addJumpToDirectSuccessor. |
@yegord more info here: Source: case I_BLTZL: {
AllegrexExpressionFactoryCallback then(factory, program->createBasicBlock(), instruction);
_[
jump(signed_(gpr(0)) < constant(0),
(delayslot(then)[jump(imm(1))]).basicBlock(),
directSuccessorButOne())
];
} diff --git a/src/nc/core/ir/cgen/DefinitionGenerator.cpp b/src/nc/core/ir/cgen/DefinitionGenerator.cpp
index c81c88c..9567cd8 100644
--- a/src/nc/core/ir/cgen/DefinitionGenerator.cpp
+++ b/src/nc/core/ir/cgen/DefinitionGenerator.cpp
@@ -24,6 +24,7 @@
#include "DefinitionGenerator.h"
+#include <nc/common/LogToken.h>
#include <nc/common/Foreach.h>
#include <nc/common/Range.h>
#include <nc/common/Unreachable.h>
@@ -507,6 +508,12 @@ std::unique_ptr<likec::Expression> DefinitionGenerator::makeExpression(const cfl
std::unique_ptr<likec::Expression> expression;
if (const Jump *jump = statement->asJump()) {
+
+ if (jump != basicNode->basicBlock()->getJump())
+ {
+ nc::LogToken::instance()->error(QString("%1: %2").arg(jump->instruction()->addr(), -8, 16).arg(jump->instruction()->toString()));
+ }
+
assert(jump == basicNode->basicBlock()->getJump());
expression = makeExpression(jump->condition());
@@ -710,7 +717,6 @@ std::unique_ptr<likec::Statement> DefinitionGenerator::doMakeStatement(const Sta
parent().makeType(parent().types().getType(returnValueTerm)),
std::move(callOperator))));
}
- }
}
}
} auto delayslot = [&](AllegrexExpressionFactoryCallback & callback) -> AllegrexExpressionFactoryCallback & {
auto delayslot = checked_cast<const AllegrexInstruction *>(instructions_->get(instruction->endAddr()).get());
if (delayslot) {
createStatements(callback, delayslot, program);
}
else {
throw core::irgen::InvalidInstructionException(tr("Cannot find a delay slot at 0x%1.").arg(instruction->endAddr(), 0, 16));
}
return callback;
};
core::ir::BasicBlock *cachedDirectSuccessor = nullptr;
core::ir::BasicBlock *cachedNextDirectSuccessor = nullptr;
auto directSuccessor = [&]() -> core::ir::BasicBlock * {
if (!cachedDirectSuccessor) {
cachedDirectSuccessor = program->createBasicBlock(instruction->endAddr());
}
return cachedDirectSuccessor;
};
auto directSuccessorButOne = [&]() -> core::ir::BasicBlock * {
if (!cachedNextDirectSuccessor) {
cachedNextDirectSuccessor = program->createBasicBlock(instruction->endAddr() + instruction->size());
}
return cachedNextDirectSuccessor;
}; |
@hlide that should go for MIPS as well then... :-/ However cool and fast work. |
On Fri, Aug 07, 2015 at 05:20:14AM -0700, hlide wrote:
ir::Program::createBasicBlock(ByteAddr) and cgen::DefinitionGenerator:: When generating statements for the delay-slot instruction, you should Yegor Derevenets |
is it not what I am already doing with: AllegrexExpressionFactoryCallback then(factory, program->createBasicBlock(), instruction); ? instruction is the preceding instruction of delayslot, that is, the branch instruction. Just for you information, branch instruction not unlikely appear to work fine: case I_BEQ: {
AllegrexExpressionFactoryCallback then(factory, program->createBasicBlock(), instruction);
_[
jump(gpr(0) == gpr(1),
(delayslot(then)[jump(imm(2))]).basicBlock(),
directSuccessor())
];
break;
} while this one doesn't work: case I_BEQL: {
AllegrexExpressionFactoryCallback then(factory, program->createBasicBlock(), instruction);
_[
jump(gpr(0) == gpr(1),
(delayslot(then)[jump(imm(2))]).basicBlock(),
directSuccessorButOne())
];
break;
} the only diffrence is about directSuccessor() and directSuccessorButOne() as the last argument for jump. |
On Fri, Aug 07, 2015 at 03:54:29PM -0700, hlide wrote:
I have pushed small well-formedness checks in 82f0a71. Do they fail? Yegor Derevenets |
Ok, I merged your changes. Now it also fails on branch not likely too. I have this assert: #ifndef NDEBUG
/*
* Check that jump is always the last instruction in a basic block.
*/
foreach (auto basicBlock, program_->basicBlocks()) {
foreach (auto statement, basicBlock->statements()) {
if (auto jump = statement->asJump()) {
if (jump != basicBlock->statements().back()) {
nc::LogToken::instance()->error(QString("%1: %2").arg(jump->instruction()->addr(), -8, 16).arg(jump->instruction()->toString()));
nc::LogToken::instance()->error(QString("jump: %1").arg(jump->toString()));
nc::LogToken::instance()->error(QString("basicBlock->statements().back(): %1").arg(basicBlock->statements().back()->toString()));
}
assert(jump == basicBlock->statements().back());
}
}
}
#endif Log:
I dumped EDIT: added basicBlock. |
On Sat, Aug 08, 2015 at 01:12:08PM -0700, hlide wrote:
Can you print the whole basic block? I would like to see: all Yegor Derevenets |
done after edited |
always check my posts on github, I often edited them again (mistakes, more info, etc.) |
Yes, this is would be nice to transform it into ulw / usw. |
On the other hand, I wonder if my disassemble/decomposer should be able to handle a sequence of two instructions with one instruction ID. Snowman is able to handle instructions with variable size. So I could handle LA and ULW, and USW with an instruction ID and a 8-byte size instead of the usual 4-byte size. |
Is there any easy way to log what 'auto memval = *(ea & constant(-4));' actually is? |
|
Concerning commenting out well-formedness checks — fix your code. :) |
oh you mean to output on log window, I added some stuff to do so... From 6d67bbf9164ac422dba55a009e0865217ae6a53c Mon Sep 17 00:00:00 2001
From: hlide
Date: Mon, 10 Aug 2015 01:03:04 +0200
Subject: [PATCH] Added log facility
---
src/nc/common/LogToken.h | 2 ++
src/nc/gui/MainWindow.cpp | 2 +-
2 files changed, 3 insertions(+), 1 deletion(-)
diff --git a/src/nc/common/LogToken.h b/src/nc/common/LogToken.h
index 01a705b..2135e6c 100644
--- a/src/nc/common/LogToken.h
+++ b/src/nc/common/LogToken.h
@@ -40,6 +40,8 @@ class LogToken {
std::shared_ptr<Logger> logger_;
public:
+ static LogToken* instance(LogToken* token = nullptr) { static LogToken* token___ = nullptr; if (token && !token___) token___ = token; return token___; }
+
/**
* Default constructor.
*
diff --git a/src/nc/gui/MainWindow.cpp b/src/nc/gui/MainWindow.cpp
index 082c1f6..0450595 100644
--- a/src/nc/gui/MainWindow.cpp
+++ b/src/nc/gui/MainWindow.cpp
@@ -86,7 +86,7 @@ MainWindow::MainWindow(Branding branding, QWidget *parent):
connect(logger.get(), SIGNAL(onMessage(const QString &)), progressDialog_, SLOT(setLabelText(const QString &)));
connect(logger.get(), SIGNAL(onMessage(const QString &)), this, SLOT(setStatusText(const QString &)));
- logToken_ = LogToken(logger);
+ LogToken::instance(&(logToken_ = LogToken(logger)));
settings_ = new QSettings(branding_.organizationName(), branding_.applicationName(), this);
loadSettings();
--
1.9.4.msysgit.2 |
ah, thx |
You could just use qDebug() — it's output is redirected to the log window. |
@yegord I'm not sure I can fix the issue with sorted addresses because you need to put a delay slot instruction (address + 4) BEFORE the real branch instruction (address). |
@yegord I've also started to get 'Assertion failed: (statement == basicBlock->statements().back()), function generate, file /Users/nietzsche/Downloads/snowman/src/nc/core/irgen/IRGenerator.cpp, line 110.' |
@nihilus the issue you have is almost the same I had with Allegrex. |
You proposed patch did not work so I reverted while keeping some ideas of yours. But I will try the thing about delay slot owner. |
@nihilus when sober (:-)), just read above all the posts regarding issue with branch instructions - they are the same for MIPS. |
Good news. With all the checking activated, it passes now. |
Just a small comment on why statements in a basic block should be sorted by instruction's addresses. |
You've probably studied this already, but here is a paper [http://www.cs.tufts.edu/~nr/pubs/xtoplas-acm.pdf] details how to cope with delay slots. I believe this is what the boomerang folks based their implementation on. |
Hmm. In the paper the authors even try to give semantics to jumps in delay slots, something that seems to be not needed for MIPS:
http://electronics.stackexchange.com/questions/28444/mips-pic32-branch-vs-branch-likely |
@yegord that is fine, I know how to handle correctly. As for Allegrex, you may find some games using a branch instruction inside - surprise ! - a branch instruction. I know what happens but that's specific to Allegrex which has a 7-stage pipeline. Bxx[L] has two bubbles (three cycles of latency) and Jxx has one bubble (two cycles of latency). If both nested branch isntructions are taken-able, the second is taken for all combinations except for one : def interpret_delay_slot(pc, is_cond):
insn = fetch(pc)
if insn.is_cond_branch() is True:
if insn.test_cond_branch() is True:
return insn
else:
return None
elif insn.is_uncond_branch() is true:
if is_cond is false:
return insn
else:
return None
else:
...
def interpret(pc, delay_slot):
insn = fetch(pc)
if insn.is_cond_branch() is True:
if insn.test_cond_branch() is True:
if delay_slot is False:
delay_slot_insn = interpret_delay_slot(pc + 4, True)
if delay_slot_insn is not None:
return delay_slot_insn.target_branch()
return insn.target_branch()
else:
return pc + 4
elif insn.is_uncond_branch() is true:
if delay_slot is False:
delay_slot_insn = interpret_delay_slot(pc + 4, False)
if delay_slot_insn is not None:
return delay_slot_insn.target_branch()
return insn.target_branch()
else:
...
1.1) When not taken, the next instruction is executed as a normal instruction, not as delay slot instruction. 1.2) When taken, if its delay slot instruction is also a conditional branch instruction which is taken, only the second instruction jumps to its target. 1.3) When taken, if its delay slot instruction is a unconditional branch instruction, only the first instruction jumps to its target.
2.1) When not taken, the next instruction is executed as a normal instruction, not as delay slot instruction. 2.2) When taken, if its delay slot instruction is a branch instruction which is taken, only the second instruction jumps to its target. |
@uxmal: No we did not based boomerang upon that paper (or should I say van Emmerik)... Every machine is represented in a language called SLED invented by Norman Ramsey and the NJ Machine Code Toolkit as found here: https://www.cs.tufts.edu/~nr/toolkit/ from there everything will get an immediate representation which is basically a minimalistic Turning-machine. Etcetc. |
Interesting stuff. Is there a git-mirrored Boomerang repository? and for this toolkit? they may give some ideas for a multi-platform assembler/disassembler with a focus upon composer/decomposer. |
OFF TOPIC MODE: onI was starting a run-time assembler with kinda of composer in pure c++ template here but x86 is a horrid beast to tame :(. OFF TOPIC: off |
There are many Boomerang repositories on github, here's one: https://github.com/nemerle/boomerang |
@uxmul yes but the official site is at http://boomerang.SF.net however I am looking for a new admin. |
Is any active development being done on boomerang? I see a lot of punters cloning the project but it seems to me that they're mostly doing code polishing, but no actual development. |
@uxmal: I asked nermerle to take over but he didnt respond. No I dont do anything activly on it. It is very accurate but unstable and has got no smart pointers etc. IMO using Capstone with Snowman is much easier to get things working since the NJMC ML is outdated and I cannot get it running natively on OS X. Ramsey is too occupied to care about it. Etc etc. |
@hlide see @uxmal :-) Yes, boomerang have some nice features like function recognition based on C-header files. There is also a branch for a plugin-API, which would be nice to have for snowman as well. However it does implement it own object parsers like Snowman instead of reusing code. There is a static linux binary for NJML / NJMC and some MIPS-stubs there which wrote ages ago. |
On Mon, Aug 10, 2015 at 02:56:13PM -0700, Markus Gothe wrote:
Something like: _[*zero ^= memval]; Or, without adding to the basic block: qDebug() << factory.createStatement(*zero ^= memval) Yegor Derevenets |
I may plan to add a new architecture instead of using the current MIPS architecture which is a work in progress. because ALLEGREX is not recognized by capstone framework and the fact the latter is using LLVM makes the implementation of ALLEGREX too complex. There are subtle differences which make the use of MIPS architecture not viable for decompiling ALLEGREX code.
So, I am about to provide a specific disassembler (the one provided by pspdecompiler, based on prxtools one, but with addition of a decomposer) for the architecture analyser. It means handling of all instructions including VFPU.
I am pretty sure people from uOFW project may be interested as well. But I am also pretty sure that it will be hard to decompile kernel modules because they may use some tricks which are not ABI compliant, so I am expecting for more tasks to do than making a simple disassembler/analyzer.
As for the author in http://lists.derevenets.com/pipermail/snowman/2015-August/000002.html, it may be great that he/she contributes as well here (PRX handling).
The text was updated successfully, but these errors were encountered: