unrustled-jimmies/CodeDefenderAnalysis
An Analysis and Static Deobfuscation of CodeDefender Binaries

Warning

This is unrustledjimmies (UJ). I've been analyzing CodeDefender's obfuscation over the past few days and am sharing my findings here. I reserve the right to be wrong on any or all points. These observations are made through the narrow lens of a handful of protected binaries, so some conclusions may only hold for these specific samples and not generalize to all CodeDefender configurations.

Caution

This research is published for educational purposes only. We do not support or assist in bypassing anti-cheat systems, circumventing software protection for unauthorized use, or software piracy.

CodeDefender is a next-generation software protection tool that provides protection against tampering, exploitation, and reverse-engineering, while ensuring stability, speed, and support for modern Windows security mechanisms. Unlike VMProtect or Themida, CodeDefender is not a virtual machine — there is no bytecode, no handler table, no virtual program counter. Instead, it applies inline code mutation, expanding each original instruction into thousands of junk instructions with the real operation buried in noise.

In this article I walk through every obfuscation technique I observed across three CodeDefender-protected binaries, then show how I statically recovered the original code. The samples I analyzed:

  • codedefender_helloworld_sample.exe — A sample where the full PE is obfuscated. The original main function was obfuscated to ~2,400 instructions across 11KB of code, deobfuscated back down to 6 instructions.
  • codedefender_battleye_shellcode_sample.bin — A real-world example demonstrating CodeDefender's full PE obfuscation support, where every function is obfuscated.
  • codedefender_ntoskrnl2_sample.exe — A CodeDefender-obfuscated Windows kernel demonstrating kernel-level obfuscation. Selected functions including NtOpenFile, IopCreateFile, ExAllocatePoolWithTag, NtQuerySystemInformation, MmCopyMemory, and others are obfuscated while the rest of the kernel runs unmodified.

This article documents every obfuscation technique I encountered across the feature set these binaries exercised at the time of writing. The upshot is that I was able to explore the obfuscated control flow graph — linear flow, branches, loops, and convergence points (diamond DAG, where two branches reconverge, demonstrating phi recovery) — and deobfuscate each of these shapes into readable code.

CodeDefender Architecture

Code Layout

Opening codedefender_helloworld_sample.exe in IDA, I see three distinct regions:

0x140001000 - 0x140001d6c  .text     Original function thunks (preserved)
0x140004000 - 0x140004174  .text_2   Original code (CRT helpers, unobfuscated)
0x140007000 - 0x140521000  (unnamed) ALL obfuscated code — r-x, ~5MB

The Thunk Pattern

In codedefender_helloworld_sample.exe, obfuscated functions keep a tiny stub in .text with this byte sequence:

48 83 EC 28 90 E8 ?? ?? ?? ?? 90 48 83 C4 28 C3

Disassembled:

sub  rsp, 0x28
nop
call <obfuscated_body>    ; E8 rel32 — the only varying bytes
nop
add  rsp, 0x28
ret

The nop padding before and after the call is one of several stub shapes I observed. The sub rsp, 0x28 / add rsp, 0x28 frame is the standard Windows x64 shadow space allocation — the thunk itself does no real work. The call target is a relative offset into the obfuscated section, where the actual function body begins.
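Since only the rel32 varies, these thunks are trivially pattern-matched. A sketch that scans a section image for this byte shape and resolves each call target (`find_thunk_targets` is my own helper name; it assumes you have the raw section bytes and the section's virtual address):

```python
import re
import struct

# The observed stub shape: 48 83 EC 28 90 E8 ?? ?? ?? ?? 90 48 83 C4 28 C3,
# with the four rel32 bytes captured as a wildcard group. DOTALL is needed
# because a rel32 byte may be 0x0A, which '.' would otherwise skip.
THUNK = re.compile(
    rb"\x48\x83\xEC\x28\x90\xE8(....)\x90\x48\x83\xC4\x28\xC3", re.DOTALL
)

def find_thunk_targets(code: bytes, section_va: int) -> list[int]:
    """Return the VA each thunk's E8 call resolves to."""
    targets = []
    for m in THUNK.finditer(code):
        rel32 = struct.unpack("<i", m.group(1))[0]
        # rel32 is relative to the end of the call: E8 sits at match
        # offset 5, so the next instruction is at match offset 10.
        call_end = section_va + m.start() + 6 + 4
        targets.append(call_end + rel32)
    return targets
```

Running this over the .text section of the hello world sample directly yields the entry point of every obfuscated function body.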

In the other samples, the thunk pattern differs. codedefender_ntoskrnl2_sample.exe uses 5-byte E9 (jmp rel32) stubs at each obfuscated function entry:

0x140482310:  jmp <belabs_section_dispatcher>   ; E9 xx xx xx xx

And codedefender_battleye_shellcode_sample.bin similarly uses E9 jmp thunks from its .text section into the obfuscated code.

Constant Obfuscation

CodeDefender uses two distinct mechanisms for hiding constants, serving different purposes.

Simple Constant Obfuscation (User Code Constants)

Constants that appear in the original program — NTSTATUS codes, magic values, flags — are hidden behind XMM opaque predicates and XOR splitting. The constant is split into two halves where one half is derived from an XMM chain and the other is a literal immediate. XOR combines them at runtime.

From the VEH exception handler in codedefender_battleye_shellcode_sample.bin, here's how STATUS_BREAKPOINT (0x80000003) is derived:

// XMM opaque predicate chain produces rax = 0xF0
char rax = _mm_extract_epi64(
    _mm_cmpeq_epi16(_mm_cvtepi16_epi32(arg1[0].q), arg1), 8).b;

int32_t rbx;
rbx.b = rax + 0x10;
if (rax >= 0xf0)
    rbx = -0x4ebceace;       // = 0xB1431532

int32_t rbx_1 = rbx ^ 0x31431531;  // 0xB1431532 ^ 0x31431531 = 0x80000003
int32_t rax_2 = **arg2;             // load exception code from EXCEPTION_RECORD
bool match = rax_2 == rbx_1;        // exception_code == STATUS_BREAKPOINT?

After deobfuscation, this entire chain collapses to:

arg9[0x32bfae1e].b = r11 == 0x80000003

Neither 0x80000003 nor its XOR halves reveal the constant statically — the XMM chain must be evaluated to recover the first half.

Hash-Based Constant Encoding

In codedefender_helloworld_sample.exe, two constants — the string pointer 0x140002250 and the printf wrapper address 0x140001A61 — are hidden through a different mechanism: two 64-iteration SplitMix64-style loops whose outputs are combined post-loop through imul and add (covered in detail in the Loop-Based Constant Encoding section).

0x14050f5be:  and  rax, r11              ; 0x2D45FA4 & loop-computed R11 = 0x2D401A4
0x14050f5c6:  imul rdx, rdx, 0x8E2A3EB4 ; 0x2D401A4 * 0xFFFFFFFF8E2A3EB4 = 0xFEBE0EBE925EDF50
0x14050f5cd:  lea  rsi, [rbp+r15]        ; 0xC9F0A576C7DBB8 + 0x78009D36D96748 = 0x141F142ADA14300
0x14050f5da:  lea  rdx, [rdx+rsi]        ; 0xFEBE0EBE925EDF50 + 0x141F142ADA14300 = 0x140002250
  0xFEBE0EBE925EDF50  (imul result)
+ 0x0141F142ADA14300  (loop1 output + loop2 output)
= 0x0000000140002250

The printf address 0x140001A61 uses a RIP-relative anchor combined with loop outputs through a separate chain:

0x14050f3e7:  lea  r13, [rip-0x2C3B72]      ; R13 = 0x14024B87C
0x14050f69a:  mov  r9, 0x0
0x14050f6a5:  sub  r9, r13                   ; R9 = -0x14024B87C
0x14050f6af:  sub  r10, r9                   ; R10 = 0x14024B87C (double negation)
0x14050f6b6:  imul rdx, r10, 0x3F887345      ; RDX = 0x4F73AD0F4EA56D6C
; ... accumulated junk from loops ...
0x14050f6bd:  lea  rdx, [rdx+r8]             ; RDX = 0x140001A61
  0x4F73AD0F4EA56D6C  (imul of anchor)
+ 0xB08C52F1F15AACF5  (accumulated from loops)
= 0x0000000140001A61

Verification

# Chain 1: String pointer
assert (0x2D401A4 * 0xFFFFFFFF8E2A3EB4) % 2**64 == 0xFEBE0EBE925EDF50
assert (0xC9F0A576C7DBB8 + 0x78009D36D96748) % 2**64 == 0x141F142ADA14300
assert (0xFEBE0EBE925EDF50 + 0x141F142ADA14300) % 2**64 == 0x140002250

# Chain 2: Printf address
assert (0x14024B87C * 0x3F887345) % 2**64 == 0x4F73AD0F4EA56D6C
assert (0x578B81843354123C + 0x5900D16DBE069AB9) % 2**64 == 0xB08C52F1F15AACF5
assert (0x4F73AD0F4EA56D6C + 0xB08C52F1F15AACF5) % 2**64 == 0x140001A61

# STATUS_BREAKPOINT derivation
assert (0xB1431532 ^ 0x31431531) == 0x80000003

Neither the string pointer, the printf address, nor STATUS_BREAKPOINT appear as literals anywhere in the obfuscated binary.

XMM Opaque Predicates

CodeDefender threads SIMD operations through the entire function body, producing conditions that always resolve the same way given known XMM state. A static analyzer cannot prove them constant without tracking all 128 bits through dozens of shuffles, compares, and extractions.

Here's the core pattern, from codedefender_helloworld_sample.exe at 0x14050eb65:

pcmpgtd xmm3, xmm4     ; compare dwords: xmm3 > xmm4?
                         ; both contain 5368741968 → result = 0 (not greater)
pxor    xmm3, xmm4      ; xmm3 = 0 xor 5368741968
movmskpd eax, xmm4       ; extract sign bits → GPR
; ...
cmovle  rcx, rbx         ; conditional move based on XMM-derived flags
cmovns  rcx, rbx         ; always takes the same path

By the time the result reaches a GPR, it has passed through enough SIMD operations that proving the outcome requires full 128-bit symbolic evaluation. The obfuscator knows the concrete answer — it generated the chain to produce a specific branch direction — but any tool that doesn't fully model 128-bit SIMD state will see an opaque branch.

The variety is deliberate. I observed 91 unique SSE mnemonics across the function:

pslldq   xmm7, 0xd
pcmpgtd  xmm4, xmm5
movq     r8, xmm4         ; extract to GPR
cmp      r8, 0x40
cmovnp   r8, r9            ; conditional on ptest-derived flags
phminposuw xmm2, xmm3     ; horizontal minimum — used purely for its flag side-effects
ptest    xmm0, xmm1        ; set ZF/CF from XMM comparison
pshufhw  xmm5, xmm6, 0x1  ; shuffle high words
pextrq   rax, xmm2, 0x1   ; extract qword lane

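To see why such a chain is constant, the lane semantics can be replayed exactly. A reference model of pcmpgtd and the movmskpd-style sign extraction, holding XMM registers as 128-bit Python ints (helper names are mine; the semantics follow Intel's per-lane definitions):

```python
MASK64 = (1 << 64) - 1

def pcmpgtd(a: int, b: int) -> int:
    """Per-dword signed compare a > b; each true lane becomes all-ones."""
    out = 0
    for lane in range(4):
        x = (a >> (32 * lane)) & 0xFFFFFFFF
        y = (b >> (32 * lane)) & 0xFFFFFFFF
        sx = x - (1 << 32) if x >> 31 else x
        sy = y - (1 << 32) if y >> 31 else y
        if sx > sy:
            out |= 0xFFFFFFFF << (32 * lane)
    return out

def movmskpd(v: int) -> int:
    """Sign bit of each 64-bit lane packed into a 2-bit GPR value."""
    return ((v >> 63) & 1) | (((v >> 127) & 1) << 1)

x = 5368741968                 # both XMM registers hold the same value
cmp_result = pcmpgtd(x, x)     # equal → "not greater" in every lane
assert cmp_result == 0         # the predicate always resolves the same way
assert cmp_result ^ x == x     # the following pxor is then an identity
```

Given known incoming XMM state, every step is fully determined, which is exactly what my deobfuscator exploits.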

Flag Manipulation Stubs

CodeDefender uses small callable stubs to inject predetermined flag state into the CPU flags register. Each stub reads a byte from a specific offset deep in the caller's stack frame and ORs it into EFLAGS via the lahf/sahf mechanism. Subsequent cmov or jcc instructions then branch based on these injected flags — producing opaque predicates that are delivered through the stack rather than computed inline.

Here are three stubs from codedefender_helloworld_sample.exe, all sharing the same skeleton but reading from different stack offsets with different bit manipulations:

;; Stub at 0x14051EC6D — reads [rsp+0x208], uses btr ax,8
lea rsp, [rsp - 8]           ; allocate scratch
seto byte ptr [rsp]           ; save OF (lahf/sahf can't touch OF)
mov  word ptr [rsp + 2], ax  ; save AX
lahf                          ; load SF/ZF/AF/PF/CF into AH
mov  al, byte ptr [rsp + 0x208]  ; read predetermined byte from caller's stack
btr  ax, 8                   ; clear CF in AH, old CF → carry flag
or   ah, al                  ; OR stack byte into flags
ror  byte ptr [rsp], 1       ; rotate saved OF
sahf                          ; write modified flags back
mov  ax, word ptr [rsp + 2]  ; restore AX
lea  rsp, [rsp + 8]          ; deallocate scratch
ret
;; Stub at 0x14051AA18 — reads [rsp+0x20a], shifts and uses btr ax,0xe
lahf
mov  al, byte ptr [rsp + 0x20a]
shl  al, 6                   ; shift stack byte left 6
btr  ax, 0xe                 ; clear bit 14 of AX
or   ah, al
sahf
;; Stub at 0x14051970E — reads [rsp+0x209], shifts and uses btr ax,0xf
lahf
mov  al, byte ptr [rsp + 0x209]
shl  al, 7                   ; shift stack byte left 7
btr  ax, 0xf                 ; clear bit 15 of AX
or   ah, al
sahf

The mechanism: the obfuscator places specific bytes at stack offsets +0x208, +0x209, +0x20a etc. earlier in the function. Each stub reads its assigned byte and ORs it into the flags. The seto/ror at the beginning handles the overflow flag separately, since lahf/sahf only cover SF, ZF, AF, PF, and CF — not OF.

The result is a flag-based opaque predicate delivery system. Instead of computing the branch condition inline (like the XMM predicates), the desired flag state is pre-stored on the stack and loaded back at the right moment. A tool that doesn't follow these call/ret pairs will see flag-dependent branches with no visible flag-setting instruction.
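The flag arithmetic of the first stub can be modeled in a few lines. A toy Python model (`stub_14051ec6d` is my name, not a symbol from the binary; the OF save/restore and AX preservation are omitted for clarity):

```python
SF, ZF, AF, PF, CF = 7, 6, 4, 2, 0   # lahf bit positions inside AH

def stub_14051ec6d(flags_ah: int, stack_byte: int) -> int:
    """Model of the stub's core: fold a predetermined stack byte into the
    lahf image before sahf writes it back to EFLAGS."""
    ax = (flags_ah << 8) | stack_byte    # lahf → AH, mov al, [rsp+0x208]
    ax &= ~(1 << 8)                      # btr ax, 8 → clears CF in AH
    ah, al = ax >> 8, ax & 0xFF
    return ah | al                       # or ah, al; sahf writes this back

# Whatever flag state the CPU arrives with, a stack byte with ZF set
# forces ZF on after the stub returns:
for incoming in range(256):
    assert stub_14051ec6d(incoming, 1 << ZF) & (1 << ZF)
```

This is why the stack bytes planted earlier in the function fully determine the direction of the jcc/cmov that follows the call.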

SSE-Based Branchless Conditional Jmp

CodeDefender replaces branches with an SSE-based computation that produces the target address in a general-purpose register, ending in an indirect jmp reg. The two possible destinations are encoded as constants within the dispatch. I observed this pattern in codedefender_ntoskrnl2_sample.exe. The pseudocode below shows the shape of the computation with branch targets shown as symbolic constants — these won't appear as literal bytes in the binary's instruction stream.

Stage 1: Condition Extraction

The original comparison result flows through two shifts that extract its sign bit:

%shifted1    = lshr i64 %comparison_result, 30
%sign_bit    = lshr i64 %shifted1, 33           ; total shift: 63 → sign bit

%widened     = zext i64 %sign_bit to i128
%masked      = and i128 %widened, 18446744073709551615   ; mask to 64-bit
%low_byte    = trunc i128 %masked to i8
%condition   = icmp ult i8 0, %low_byte          ; 0 < byte → core boolean

%condition is the single i1 that drives everything downstream. Every select in the cascade is gated on this value.

Stage 2: Embedding the Two Targets

The targets never appear as branch operands. They're literal constants in AND masks:

; Target A (0x14085CC83) — masked into the i128 PMINUB path:
%target_a_masked = and i128 %inverted_sub, 5377477763     ; = 0x14085CC83

; Target B (0x140860D0C) — masked at i64 level in the final assembly:
%target_b_masked = and i64 %sub_result, 5377494284        ; = 0x140860D0C

Target A enters the i128 cascade. Target B stays in the i64 path. The final OR merges them.

Stage 3: PMINUB Byte Cascade

The ones-complement of the sub result is widened to i128 and AND-masked with Target A. Then a byte-by-byte unsigned minimum compares the full value against the masked value:

%inverted    = zext i64 %ones_complement to i128
%and_target  = and i128 %inverted, 5377477763              ; & 0x14085CC83

; Per byte lane (0, 8, 16, 24, 32):
%byte_full   = trunc(lshr(%inverted, N)) to i8
%byte_masked = trunc(lshr(%and_target, N)) to i8
%cmp         = icmp ult i8 %byte_full, %byte_masked
%min         = select i1 %cmp, i8 %byte_full, i8 %byte_masked

; Packed back via zext+shl+or into %packed_min (i128)

This builds an i128 with the byte-level minimum across 5 lanes (low 40 bits). Upper bytes pass through from the target constant.

Stage 4: PMAXUW → PAVGW → PSIGNW

The packed minimum feeds through three more SSE-emulated stages, all operating on i16 word lanes:

PMAXUW — word-by-word unsigned maximum:

%word_target = trunc(lshr(%and_target, N)) to i16
%word_min    = trunc(lshr(%packed_min, N)) to i16
%cmp         = icmp ugt i16 %word_target, %word_min
%max         = select i1 %cmp, i16 %word_target, i16 %word_min

PAVGW — word averaging with rounding:

%sum   = add i32 (zext %a), (zext %b)
%sum1  = add i32 %sum, 1
%avg   = trunc(lshr(%sum1, 1)) to i16

PSIGNW — sign-dependent negate or zero, per word lane:

%neg     = sub i16 0, %data_word
%is_neg  = icmp slt i16 %sign_word, 0
%is_zero = icmp eq i16 %sign_word, 0
%inner   = select i1 %is_zero, i16 0, i16 %data_word
%result  = select i1 %is_neg, i16 %neg, i16 %inner

Every one of these selects traces back to %condition. They're all correlated — when the condition is true, every lane takes the same arm.
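The word-lane semantics being emulated here are easy to pin down with a reference model. A sketch of PAVGW and PSIGNW over 128-bit values held as Python ints (helper names are mine, following Intel's documented per-lane behavior):

```python
def pavgw(a: int, b: int) -> int:
    """Per-word unsigned average with rounding: (a + b + 1) >> 1."""
    out = 0
    for lane in range(8):
        x = (a >> (16 * lane)) & 0xFFFF
        y = (b >> (16 * lane)) & 0xFFFF
        out |= ((x + y + 1) >> 1) << (16 * lane)
    return out

def psignw(data: int, sign: int) -> int:
    """Per-word: negate data if sign < 0, zero it if sign == 0, else pass."""
    out = 0
    for lane in range(8):
        d = (data >> (16 * lane)) & 0xFFFF
        s = (sign >> (16 * lane)) & 0xFFFF
        s_signed = s - 0x10000 if s >> 15 else s
        if s_signed < 0:
            d = (-d) & 0xFFFF
        elif s_signed == 0:
            d = 0
        out |= d << (16 * lane)
    return out

assert pavgw(2, 3) == 3             # (2 + 3 + 1) >> 1 in the low lane
assert psignw(5, 0xFFFF) == 0xFFFB  # sign word -1 → negated data (-5)
assert psignw(5, 0) == 0            # sign word 0 → zeroed lane
```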

Stage 5: MOVMSKPD Sign Extraction

The PSIGNW result gets its sign bits extracted — emulating MOVMSKPD to produce a 2-bit value from the 128-bit vector:

%upper_masked = and i128 %psignw_result, -18446744073709551616  ; upper 64 bits
%low_word     = trunc i128 %psignw_result to i16

; Extract sign of each 64-bit lane:
%bit63        = lshr i128 %broadcast, 63
%movmsk_b0    = and i128 %bit63, 1              ; sign of low qword
%bit127       = lshr i128 %broadcast, 127
%movmsk_b1    = shl i128 %bit127, 1             ; sign of high qword
%movmskpd     = or i128 %movmsk_b0, %movmsk_b1 ; 2-bit result
%mask_i64     = trunc i128 %movmskpd to i64

Stage 6: Final Address Assembly

Two paths converge through OR to produce the jump target:

; Path B: sub_result AND'd with Target B
%path_b      = and i64 %sub_result, 5377494284          ; & 0x140860D0C

; OR in the MOVMSKPD extraction
%path_b_mask = or i64 %path_b, %mask_i64

; Path A: truncate the i128 cascade to i64
%path_a      = trunc i128 %packed_cascade to i64

; Final: combine both paths
%target      = or i64 %path_b_mask, %path_a

; Jump to computed address
%target_ptr  = inttoptr i64 %target to ptr
; jmp rax

When the condition is true, the cascade zeroes out the Target B path and lets Target A (0x14085CC83) through. When false, the cascade zeroes Target A and the AND/OR lets Target B (0x140860D0C) through.

SSE-based branchless conditional jmp data flow

The entire pipeline exists to implement what was originally a single jcc. The two target addresses 0x14085CC83 and 0x140860D0C are embedded as constants within the cascade.
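Stripped of the SIMD plumbing, the whole pipeline reduces to a mask-select. A toy reduction (the two target constants are from the sample; the mask derivation here stands in for Stages 1–5, which compute the same all-ones/all-zeros mask the long way):

```python
MASK64 = (1 << 64) - 1
TARGET_A = 0x14085CC83   # taken path
TARGET_B = 0x140860D0C   # fallthrough path

def branchless_target(comparison_result: int) -> int:
    """What the cascade computes: the comparison's sign bit widens to a
    full-width mask, and AND/OR picks one of the two embedded targets."""
    sign = (comparison_result >> 63) & 1          # lshr 30 then 33 → bit 63
    mask = MASK64 if sign else 0
    return (TARGET_A & mask) | (TARGET_B & ~mask & MASK64)

assert branchless_target((-1) & MASK64) == TARGET_A  # negative → condition true
assert branchless_target(1) == TARGET_B              # non-negative → false
```

Recovering the original jcc therefore amounts to proving which AND-masked constant survives each arm of the select cascade.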

Loop-Based Constant Generation

Not all constants are produced by the simple XMM/XOR splitting described above. In codedefender_helloworld_sample.exe, I observed a second mechanism where constants are generated through loops — two sequential loops, each running exactly 64 iterations of an obfuscated SplitMix64-style hash-mixing accumulation.

Counter Mechanism

Both loops share the same counter pattern. A register starts at 1 and doubles each iteration via shl, but the shift amount is buried in a 7-instruction XMM opaque predicate chain:

pextrq rcx, xmm7, 0x20       ; extract from XMM chain → always 0x3E
xor    rcx, 0x3F              ; 0x3E ^ 0x3F = 1
shl    r8, cl                 ; shift counter left by 1 (CL=1)

The real operation is shl r8, 1. Counter evolution: 1 → 2 → 4 → ... → 2^63 → 0. After 64 doublings the bit overflows, producing zero, and the exit condition fires.
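The counter's lifetime is easy to confirm — doubling a 64-bit register that starts at 1 reaches zero after exactly 64 shifts:

```python
MASK64 = (1 << 64) - 1

counter, iterations = 1, 0
while counter != 0:
    counter = (counter << 1) & MASK64   # the deobfuscated shl r8, 1
    iterations += 1

assert iterations == 64   # the set bit walks off the top on the 64th shift
```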

Exit Condition

Both loops test their counter through XMM laundering. Loop 1:

movq   xmm7, r11             ; load counter into XMM
psrldq xmm7, 0xB             ; shift right 88 bits → zeroes any 64-bit value
;; opaque chain → EAX = 0
add    rax, r11               ; 0 + R11 = R11, sets ZF iff R11 == 0
je     exit                   ; exits when counter overflows to 0

Loop 2 uses a different wrapper but the same principle:

movq   xmm7, rbx             ; load counter (RBX) into XMM
psrlq  xmm6, 0x40            ; shift by 64 → 0
pmaxud xmm6, xmm7            ; max(0, counter) = counter
movq   rcx, xmm6             ; extract counter
or     rcx, rbx              ; rcx = counter | counter = counter
je     exit                   ; exit if counter == 0

What Each Iteration Computes

The loop body matches SplitMix64's next() function with each xor replaced by its MBA equivalent:

uint64_t z = (x += 0x9E3779B97F4A7C15);                              // golden ratio increment
z = ((z | (z >> 30)) - (z & (z >> 30))) * 0xBF58476D1CE4E5B9;        // (A|B)-(A&B) = A^B
z = ((z | (z >> 27)) - (z & (z >> 27))) * 0x94D049BB133111EB;
return (z | (z >> 31)) - (z & (z >> 31));

In the actual disassembly, this looks like:

mov    rdx, 0x9E3779B97F4A7C15   ; golden ratio × 2^64
imul   rax, rdx                  ; accumulator step
mov    rdx, 0xBF58476D1CE4E5B9   ; SplitMix64 mix constant
imul   rcx, rdx
shr    rdx, 0x21                 ; >> 33
;; (A|B)-(A&B) replacing A^B:
or     r10, rbx
and    rbx, r10
sub    ...                       ; MBA rewrite of xor-shift

XMM opaque predicates (phminposuw, ptest, cmovbe) are interleaved between steps.
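The MBA rewrite can be checked against the stock xor-shift form: (A|B) − (A&B) equals A^B for all inputs, so the two loop bodies below are bit-for-bit identical (a verification sketch, not recovered code):

```python
MASK64 = (1 << 64) - 1

def splitmix64(x: int) -> tuple[int, int]:
    """Reference SplitMix64 next(): returns (new_state, output)."""
    x = (x + 0x9E3779B97F4A7C15) & MASK64
    z = x
    z = ((z ^ (z >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    z = ((z ^ (z >> 27)) * 0x94D049BB133111EB) & MASK64
    return x, z ^ (z >> 31)

def splitmix64_mba(x: int) -> tuple[int, int]:
    """Same function with every xor rewritten as (A|B) - (A&B)."""
    mba_xor = lambda a, b: (a | b) - (a & b)
    x = (x + 0x9E3779B97F4A7C15) & MASK64
    z = x
    z = (mba_xor(z, z >> 30) * 0xBF58476D1CE4E5B9) & MASK64
    z = (mba_xor(z, z >> 27) * 0x94D049BB133111EB) & MASK64
    return x, mba_xor(z, z >> 31)

state = 0x1234
for _ in range(64):                      # 64 iterations, like the loops
    assert splitmix64(state) == splitmix64_mba(state)
    state = splitmix64(state)[0]
```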

Loop Outputs

Both loops' outputs feed into both final constants together:

Loop-based constant generation data flow

Loop 1 produces 0x78009D36D96748. Loop 2 produces 0x2D401A4 and 0xC9F0A576C7DBB8. These are combined post-loop through imul and add (see the Constant Obfuscation section). Neither loop independently produces a final target value — the outputs are entangled in the post-loop mixing.

Polymorphic Constants

The hash constants observed in codedefender_helloworld_sample.exe — SplitMix64 (0x9E3779B97F4A7C15, 0xBF58476D1CE4E5B9, 0x94D049BB133111EB) and MurmurHash3 fmix64 (0xFF51AFD7ED558CCD) — and the per-value constants (0x2D45FA4, 0x8E2A3EB4) appear zero times in both codedefender_battleye_shellcode_sample.bin and codedefender_ntoskrnl2_sample.exe. I have only observed this loop-based mechanism once, so this may or may not be the generic implementation.

Import Obfuscation

In the hello world sample (codedefender_helloworld_sample.exe), the obfuscated code calculates the printf wrapper address 0x140001A61 into RBX (see the Constant Obfuscation and Loop-Based Constant Generation sections), then call rbx invokes the wrapper which eventually reaches __stdio_common_vfprintf.

BattlEye shellcode: full runtime resolution

The BattlEye shellcode sample (codedefender_battleye_shellcode_sample.bin) resolves every API at runtime through a manual GetProcAddress implementation. This could be a CodeDefender obfuscation-time rewrite of the original imports, or it could be a hand-rolled import resolver in the original code that was then obfuscated by CodeDefender. In either case, the CodeDefender-obfuscated manual GetProcAddress was fully deobfuscated. The decompiled initialization routine shows string construction followed by GetProcAddress calls:

0040023d  __builtin_strcpy(dest: arg14 - -0xcafebed8, src: "GetSystemTimeAsFileTime")
0040043e  __builtin_strncpy(dest: arg14 - -0xcafebe18, src: "GetCurrentThreadId", count: 0x13)
004004a3  void* rax_5 = (&GetProcAddress)(hModule, lpProcName: 0xcafebe18)  // "GetCurrentThreadId"
004004ce  __builtin_strncpy(dest: arg14 - -0xcafebec0, src: "NtProtectVirtualMemory", count: 0x17)
0040054b  *(arg14 - -0xcafec1a8) = (&GetProcAddress)(hModule: hModule_1, lpProcName: 0xcafebec0)
00400575  __builtin_strcpy(dest: arg14 - -0xcafebf10, src: "AddVectoredExceptionHandler")
004005e0  void* (* AddVectoredExceptionHandler)(uint32_t First, PVECTORED_EXCEPTION_HANDLER Handler) =
              (&GetProcAddress)(hModule: hModule_2, lpProcName: 0xcafebf10)

The resolved function is immediately used. The VEH installation:

00400625  void* result = AddVectoredExceptionHandler(First: 1, Handler: 0x7ffdad3c10e0)

The custom GetProcAddress

The deobfuscated output reveals a manual PE export directory walk. In the original binary, this logic was spread across multiple handlers connected by opaque jumps — what you see here is the recovered CFG after deobfuscation. The outer loop iterates over NumberOfNames, resolving each export name RVA and comparing byte-by-byte against the target string:

0040068f  while (true)
              IMAGE_EXPORT_DIRECTORY* rax = rax_2
              int64_t rdx_4 = sx.q(r14_5)
              uint64_t rdx_6 =
                  zx.q(*(zx.q(*(&rax_2->AddressOfNames + arg14)) + arg14 + (rdx_4 << 2)))

The inner loop walks the export name character by character, comparing against the target string at offset 0x14e078:

004006e8      while (true)
                  int32_t r11_4 = sx.d(*(arg14 + rax_12 + 0x14e078))  // target string byte
                  int32_t r8_4  = sx.d(*(arg14 + rdx_6 + rax_12))     // export name byte
                  int32_t rdx_8 = r11_4 - r8_4                        // difference

On mismatch, the branch at 0x00400779 falls through to increment r12_4 and continue the inner loop. On match (when the difference chain resolves to zero and the null terminator is hit), it dispatches through the SSE-based branchless dispatch to extract the function address via AddressOfNameOrdinals and AddressOfFunctions.
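The recovered comparison logic is an ordinary strcmp-style walk once the flag laundering is removed. A toy Python model of the inner loop's difference chain (`match_export_name` is my own name for it):

```python
def match_export_name(export_name: bytes, target: bytes) -> bool:
    """Walk both NUL-terminated strings and compute the per-byte difference;
    a match is the difference staying zero all the way to the terminator.
    The sample drives the same test through its opaque flag machinery."""
    i = 0
    while True:
        t = target[i] if i < len(target) else 0
        e = export_name[i] if i < len(export_name) else 0
        if t - e != 0:
            return False        # mismatch → advance to the next export name
        if t == 0:
            return True         # joint NUL terminator → full match
        i += 1

assert match_export_name(b"AddVectoredExceptionHandler\x00",
                         b"AddVectoredExceptionHandler\x00")
assert not match_export_name(b"GetProcAddress\x00", b"GetProcAddr\x00")
```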

GetProcAddress implementation

There's a fallback path: when NumberOfNames is zero (the check at 0x004003f3), the code skips the export walk entirely and calls the real GetProcAddress directly:

NumberOfNames zero fallback

Deobfuscation

The techniques used to fully devirtualize VMProtect and Themida — iteratively lifting each handler and calculating the outgoing edges — do not directly apply here. CodeDefender is not a virtual machine, so there are no handlers to lift. The obfuscated code is native x64 with the real operations buried in noise, opaque predicates gating every branch, and constants hidden behind loops and XMM chains.

To recover the control flow graph — linear flow, branches, loops, and convergence points — I had to unlock the vault: defeat each of the techniques above. This article is a high-level view of the obfuscation techniques I observed. A follow-up article will cover the deobfuscation engine and how I recovered the CFG in detail.

Results

codedefender_helloworld_sample.exe

The obfuscated main — ~2,400 instructions spread across multiple handlers connected by indirect jumps — deobfuscated back down to 6 instructions.

Obfuscated main Deobfuscated main

codedefender_battleye_shellcode_sample.bin (BattlEye shellcode)

Obfuscated PE export resolver, spread across multiple handlers via indirect jumps, deobfuscated back into a function with loops, branches, and merge points.

I don't want to include too much deobfuscated BattlEye code here. Here is the GetProcAddress implementation already discussed in the Import Obfuscation section, fully deobfuscated. In the original binary, this logic was spread across multiple handlers connected by opaque jumps:

Obfuscated GetProcAddress Deobfuscated GetProcAddress

00400126        void* r14 = arg12
00400175        *(r14 - -0xcafeba80) = 0x7ffdad3c75a5
0040017e        *(r14 - -0xcafeba78) = arg4
00400187        *(r14 - -0xcafeba70) = arg6
00400190        *(r14 - -0xcafeba68) = arg7
00400199        *(r14 - -0xcafeba60) = arg5
004001a2        *(r14 - -0xcafeba58) = arg8
004001ab        *(r14 - -0xcafeba50) = arg9
004001b4        *(r14 - -0xcafeba48) = arg10
004001bd        *(r14 - -0xcafeba40) = arg11
004001c6        __builtin_memset(dest: r14 - -0xcafeb9e8, ch: 0, count: 0x50)
00400223        uint64_t r10_1 = zx.q(*(r14 + sx.q(*(r14 + 0x3c)) + 0x88))
00400230        *(r14 - -0xcafeba90) = r10_1
00400239        *(r14 - -0xcafeba8c) = 0
00400241        int32_t rax_2 = *(r14 + r10_1 + 0x18)
00400248        int32_t rdx_2 = neg.d(rax_2)
00400251        int32_t r9_1 = sx.d(rdx_2.b)
00400251        
0040025f        if (r9_1 u>> 0x18 u<= rdx_2 u>> 0x18)
0040025f            r9_1 = rdx_2
0040025f        
00400269        if (((r9_1 & rdx_2) | rax_2) s< 0)
0040026f            void* rcx_1 = 0xcafeba90 + r14
00400277            int32_t* rdi_1 = 0xcafeba8c + r14
00400285            char* r8_3 = 0xcafeb9d8 + r14
0040028d            uint64_t* rcx_2 = 0xcafeba98 + r14
00400296            int64_t* r11_3 = 0xcafeb9d0 + r14
0040029f            char* r15_1 = 0xcafeb9c8 + r14
004002a8            int16_t* r12_1 = 0xcafeb9ca + r14
004002b0            int32_t* rax_3 = 0xcafeba88 + r14
004002bb            int32_t rdx_3 = 0
004002df            uint64_t* var_a0_1 = rcx_2
004002e4            uint64_t var_a8_1 = r10_1
004002e9            char* var_88_1 = r8_3
004002fd        label_4002fd:
004002fd            int64_t rdx_4 = sx.q(rdx_3)
00400309            uint64_t rax_5 = zx.q(*(zx.q(*(r14 + r10_1 + 0x20)) + r14 + (rdx_4 << 2)))
0040030c            *r8_3 = 0
00400310            *rcx_2 = rax_5
00400316            *r11_3 = 0x7ffdad3c6a06
0040031c            *r12_1 = 0
00400326            *r15_1 = 0
0040032f            void* rax_6 = rax_5 + r14
0040033a            int32_t rdx_6 = 0
0040033a            
00400348            while (true)
00400348                *rax_3 = rdx_6
0040034a                int64_t r8_4 = sx.q(rdx_6)
0040034d                void* r11_4 = r14
00400350                int32_t rax_8 = sx.d(*(r14 + r8_4 + 0x14e078))
00400361                int32_t rdi_2 = sx.d(*(rax_6 + r8_4))
00400369                int32_t r8_6 = rax_8 - rdi_2
0040036f                int32_t rcx_5 = neg.d(r8_6)
00400371                int32_t r8_7 = r8_6 | rcx_5
00400371                
00400385                if (rcx_5 u>> 0x18 u> r8_7 u>> 0x18)
00400385                    r8_7 = rcx_5
00400385                
00400389                uint64_t r8_8 = zx.q(r8_7 s>> 0x1f)
0040038d                int64_t rcx_6 = sx.q(r8_8.d)
0040039b                int64_t r10_5 = rcx_6 & 0x7ffdad3c63b0
004003a1                uint64_t r14_4 = zx.q(rcx_6.d) & 0xad3c63b0
004003a1                
004003ab                if (rcx_6.d u< r10_5.d)
004003ab                    r14_4 = r8_8
004003ab                
004003c3                uint64_t r8_9 = r8_8 << 0x20
004003c3                
004003cd                if (rcx_6.d u>= (r10_5 u>> 0x20).d)
004003cd                    r8_9 = rcx_6 & 0x7ffd00000000
004003cd                
004003df                if (((not.q(rcx_6) & 0x7ffdad3c6b93) | r14_4 | r8_9) == 0x7ffdad3c63b0)
0040041e                    *rdi_1 = rdx_3 + 1
00400420                    r14 = r11_4
00400428                    int32_t rcx_12 = *(r11_4 + var_a8_1 + 0x18)
00400441                    int32_t r9_8 = (rdx_3 + 1 - rcx_12) & (rcx_12 | (0xfffffffe - rdx_3))
0040044b                    int32_t r10_7 = sx.d(r9_8.b)
0040044b                    
00400459                    if (r10_7 u>> 0x18 u<= r9_8 u>> 0x18)
00400459                        r10_7 = r9_8
00400459                    
00400462                    bool cond:0_1 = ((r10_7 & r9_8) | ((0xfffffffe - rdx_3) & rcx_12)) s< 0
00400465                    r10_1 = var_a8_1
00400468                    rcx_2 = var_a0_1
0040046d                    r8_3 = var_88_1
0040047a                    rdx_3 += 1
0040047a                    
00400483                    if (cond:0_1)
00400483                        goto label_4002fd
00400483                    
004004a5                    0x7ffdad3c75a5(zx.q(rdi_2), rax_5, arg8, arg9, r14)
004004a7                    goto label_4004e8
004004a7                
004003e6                if (rax_8.b == 0)
004004a9                    r14 = r11_4
004004e5                    0x7ffdad3c75a5(
004004e5                        zx.q(*(zx.q(*(r11_4 + var_a8_1 + 0x1c)) + r11_4 + (
004004e5                            zx.q(*(zx.q(*(r11_4 + var_a8_1 + 0x24)) + r11_4 + (rdx_4 << 1)))
004004e5                            << 2))), 
004004e5                        rdx_4, arg8, arg9, r11_4)
004004e8                label_4004e8:
004004e8                    void* var_b8_3 = r14
004004f2                    (*rcx_1)()
004004f4                    void* var_b8_4 = r14
004004fe                    (*var_a0_1)()
00400500                    void* var_b8_5 = r14
0040050a                    (*(r14 - -0xcafebaa0))()
0040050a                    break
0040050a                
004003f3                *r11_3 = 0x7ffdad3c684c
004003f6                *r12_1 = 0x8ca5
004003fc                *r15_1 = 0x80
00400401                rdx_6 += 1
00400403                r14 = r11_4
00400403        
00400523        return 0

Summary

I was just messing with this stuff after work. If you find anything interesting as well, I'm happy to accept pull requests.

About

An analysis and static deobfuscation of codedefender.io protected samples.
