IR Optimization #123

ghost · 2017-04-04T16:03:04Z

Consider this assembly code:
41b00f: lea eax, [edx*4+0x0] 41b016: sub eax, edx 41b018: add eax, eax 41b01a: mov edx, eax 41b01c: shl eax, 0x3 41b01f: add edx, eax
Which translates to:
(edx10 * 4 - edx10) * 2 + (edx10 * 4 - edx10) * 2 * 8)
but is actually a multiplication by 54
We can fold this in the LikeC tree, but struct members and global variable types are determined prior to tree generation from what I can tell. So I wrote up some proof-of-concept code for analyzing preceding IR statements in the current basic block and merging them. Is this a good idea, or would you prefer it be implemented a different way?

The text was updated successfully, but these errors were encountered:

ghost · 2017-04-04T16:30:19Z

The following is what the prologue of createStatements in X86InstructionAnalyzer looks like with this added:
`
assert(instr != nullptr);
assert(program != nullptr);
using namespace nc::core::ir;
currentInstruction_ = instr;
BasicBlock *thisBlock = program->getBasicBlockForInstruction(instr);
auto disassembleInstr = [this](const X86Instruction *x86instr) {
ud_set_pc(&ud_obj_, x86instr->addr());
ud_set_input_buffer(&ud_obj_, const_cast<uint8_t *>(x86instr->bytes()),
checked_caststd::size_t(x86instr->size()));
ud_disassemble(&ud_obj_);
};
auto disassembleCurrent = &disassembleInstr, instr { disassembleInstr(instr); };

    disassembleCurrent();
    assert(ud_obj_.mnemonic != UD_Iinvalid);

    BasicBlock *cachedDirectSuccessor = nullptr;
    auto directSuccessor = [&]() -> BasicBlock * {
        if (!cachedDirectSuccessor) {
            cachedDirectSuccessor = program->createBasicBlock(instr->endAddr());
        }
        return cachedDirectSuccessor;
    };

    auto gatherBlockX86Instructions = [](BasicBlock *blk) -> std::vector<const X86Instruction *> {
        std::vector<const X86Instruction *> result;
        foreach (auto stmt, blk->statements()) {
            bool alreadyAdded = false;
            const X86Instruction *currInstr = static_cast<const X86Instruction *>(stmt->instruction());
            foreach (auto instr, result) {
                if (instr == currInstr) {
                    alreadyAdded = true;
                    break;
                }
            }
            if (!alreadyAdded)
                result.push_back(currInstr);
        }
        return std::move(result);
    };
    auto gatherIRStmtsForInsn = [thisBlock](const X86Instruction *x86insn) {
        std::vector<Statement *> ownedIR;
        foreach (Statement *stmt, thisBlock->statements()) {
            if (stmt->instruction() == x86insn)
                ownedIR.push_back(stmt);
        }
        return ownedIR;
    };
    auto findMemLocWrite = [this](std::vector<Statement *> stmts,
                                  const MemoryLocation &memloc) -> core::ir::Statement * {

        foreach (auto stmt, stmts) {
            /*
             * register potentially clobbered
             */
            if (stmt->is<Call>() || stmt->is<Jump>() || stmt->is<InlineAssembly>())
                return nullptr;
            if (auto asg = stmt->as<Assignment>()) {
                auto assigned = asg->left();
                if (auto locAccess = assigned->asMemoryLocationAccess()) {
                    if (locAccess->memoryLocation().covers(memloc) || locAccess->memoryLocation().overlaps(memloc))
                        return stmt;
                }
            }
        }
        // no preceding write in this range
        return nullptr;
    };
    auto findRegisterLocWrite = [this, &findMemLocWrite](std::vector<Statement *> stmts,
                                                         ud_type reg) -> core::ir::Statement * {
        auto regAccess = createRegisterAccess(reg);
        // if(!regAccess)
        //    return nullptr;
        assert(regAccess);
        auto regMemAccess = regAccess->asMemoryLocationAccess();
        // if(!regMemAccess)
        //    return nullptr;
        assert(regMemAccess);
        auto regMemLoc = regMemAccess->memoryLocation();

        return findMemLocWrite(stmts, regMemLoc);
    };

    X86ExpressionFactory factory(architecture_);
    X86ExpressionFactoryCallback _(factory, thisBlock, instr);

    using namespace core::irgen::expressions;

    auto getPreceding = [&]() -> const X86Instruction * {
        // auto stmts = thisBlock->statements();
        if (thisBlock->statements().size()) {
            auto back = thisBlock->statements().back();
            if (back) {
                return static_cast<const X86Instruction *>(back->instruction());
            }
        }
        return nullptr;
    };
    auto removeIRStatements = [&thisBlock](std::vector<Statement *> stmts) {
        foreach (auto stmt, stmts)
            thisBlock->erase(stmt);

    };
    auto pred = getPreceding();

    const core::arch::Instruction *predecessor = getPreceding();
    ud_operand udOp0 = std::move(ud_obj_.operand[0]);
    ud_operand udOp1 = std::move(ud_obj_.operand[1]);
    ud_operand udOp2 = std::move(ud_obj_.operand[2]);

And here is a (messy) example of it being used to combine instructions:
case UD_Iadd: {

        if (udOp0.type == UD_OP_REG && udOp0.type == udOp1.type && udOp0.base == udOp1.base) {
            if (pred) {
                auto predIR = gatherIRStmtsForInsn(pred);
                auto lastWrite = findRegisterLocWrite(predIR, udOp0.base);
                // only one preceding IR statement, and its a write to this one
                if (predIR.size() == 1 && lastWrite) {
                    auto fusedTerm = predIR[0]->as<Assignment>()->right()->clone();
                    auto asBinary = fusedTerm->as<BinaryOperator>();
                    if (asBinary && asBinary->operatorKind() == BinaryOperator::MUL) {
                        assert(asBinary->right()->is<Constant>());
                        auto constantRight = asBinary->right()->as<Constant>();

                        SizedValue distributed(constantRight->size(), constantRight->value().value() * 2);
                        _[temporary(operand(0).size()) ^= TermExpression(std::make_unique<BinaryOperator>(
                              BinaryOperator::MUL, lastWrite->as<Assignment>()->left()->clone(),
                              static_cast<std::unique_ptr<Term>>(std::make_unique<Constant>(distributed)),
                              asBinary->size()))];
                    } else
                        _[temporary(operand(0).size()) ^= TermExpression(std::move(fusedTerm)) * constant(2)];
                    removeIRStatements(predIR);
                    goto flags_add;
                } else if (lastWrite) {
                    disassembleInstr(pred);
                    auto predo0 = std::move(ud_obj_.operand[0]);
                    auto predo1 = std::move(ud_obj_.operand[1]);
                    if (ud_obj_.mnemonic == UD_Iadd) {
                        if (predo0.type == udOp0.type && predo1.type == udOp1.type && predo0.base == udOp0.base &&
                            predo1.base == udOp1.base) {
                            assert(predIR.size() > 1);
                            auto right = lastWrite->as<Assignment>()->right();
                            if (!right->is<BinaryOperator>()) {
                                assert(right->is<MemoryLocationAccess>());
                                /*
                                 * find the write to the temporary register instead
                                 */
                                auto tempWrite =
                                    findMemLocWrite(predIR, right->asMemoryLocationAccess()->memoryLocation());
                                assert(tempWrite->is<Assignment>());
                                right = tempWrite->as<Assignment>()->right();
                                assert(right->is<BinaryOperator>());
                            }
                            auto asBinary = right->as<BinaryOperator>();
                            assert(asBinary->operatorKind() == BinaryOperator::MUL);
                            auto constantRight = asBinary->right()->as<Constant>();
                            SizedValue distributed(constantRight->size(), constantRight->value().value() * 2);

                            _[temporary(operand(0).size()) ^= TermExpression(std::make_unique<BinaryOperator>(
                                  BinaryOperator::MUL, lastWrite->as<Assignment>()->left()->clone(),
                                  static_cast<std::unique_ptr<Term>>(std::make_unique<Constant>(distributed)),
                                  asBinary->size()))];

                            disassembleCurrent();
                            removeIRStatements(predIR);
                            goto flags_add;
                        }
                    }
                    disassembleCurrent();
                }
            }
            _[temporary(operand(0).size()) ^= operand(0) * constant(2)];
        } else
            _[temporary(operand(0).size()) ^= operand(0) + operand(1)];
    flags_add:
        _[cf ^= unsigned_(temporary(operand(0).size())) < unsigned_(operand(0)),
          operand(0) ^= temporary(operand(0).size()), pf ^= intrinsic(), zf ^= operand(0) == constant(0),
          sf ^= signed_(operand(0)) < constant(0), of ^= intrinsic(), af ^= intrinsic(), less ^= ~(sf == of),
          less_or_equal ^= less | zf, below_or_equal ^= cf | zf];

        break;
    }

`
This combines multiple adds that have the same destination and source into a single multiplication IR statement, removing the IR statements generated from the previous instruction, including ones that modify the flags, as their values are discarded by the current instruction anyway.

yegord · 2017-04-04T21:14:59Z

On Tue, Apr 04, 2017 at 09:03:05AM -0700, chrisps wrote: Is this a good idea, or would you prefer it be implemented a different way?

These transformations do not depend on the architecture, so, I would not implement them in InstructionAnalyzer. Probably, I would put them in a separate pass over IR, for example, next after the MasterAnalyzer:: createProgram() call in MasterAnalyzer::decompile(). You might want to use a basic-block-level dataflow analysis, to discover which register is defined where, without searching manually. There is an example of how to do it in IRGenerator::computeJumpTargets().

…

-- Yegor Derevenets

yegord · 2017-04-04T21:25:25Z

Probably you will benefit from reading nc/core/ir/misc/PatternRecognition.cpp — you might want to implement a function similar to those in that file, called, e.g., recognizeMultiplicationByConstant().

Once you have this function, you could also change recognizeArrayAccess() to use it.

ghost · 2017-04-04T22:02:57Z

Thank you for the information, it can be a little difficult figuring out the best way to do these things sometimes when you're not familiar with the code. I'll try adding something after decompile()

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

IR Optimization #123

IR Optimization #123

ghost commented Apr 4, 2017

ghost commented Apr 4, 2017 •

edited by ghost

yegord commented Apr 4, 2017 via email

yegord commented Apr 4, 2017 •

edited

ghost commented Apr 4, 2017

IR Optimization #123

IR Optimization #123

Comments

ghost commented Apr 4, 2017

ghost commented Apr 4, 2017 • edited by ghost

yegord commented Apr 4, 2017 via email

yegord commented Apr 4, 2017 • edited

ghost commented Apr 4, 2017

ghost commented Apr 4, 2017 •

edited by ghost

yegord commented Apr 4, 2017 •

edited