# Chrysalis Unpacking Walkthrough (Notebook)

This notebook demonstrates the **offline + emulation-assisted** steps we used to unpack the Lotus Blossom/Chrysalis sample chain described by Rapid7.

It covers three practical steps:
- `input/encrypted_shellcode.bin` (Rapid7's `BluetoothService`): extract and decrypt the RC4-encrypted config (offline)
- `input/log.dll`: emulate `LogWrite` to decrypt the stage1 buffer (Unicorn-based, still 100% Python)
- Main module: decrypt and materialize it as a PE-like memory image and a patched PE (offline transform)

## Requirements
- Install dependencies into the same Python environment your Jupyter kernel uses: `unicorn`, `pefile`, `capstone` (required) and `bokeh` (optional, plots only); see `requirements.txt`.
- Inputs must exist under `input/`:
  - `input/log.dll`
  - `input/encrypted_shellcode.bin`
  - `input/BluetoothService.exe`

Outputs are written under `output/` (created if missing).

## Optional plots
The byte histogram in section 4 uses `bokeh` when it can be imported; otherwise the notebook falls back to text summaries.

For interactive plots, install `bokeh` into the same Python environment your Jupyter kernel is using.
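
A quick pre-flight check can confirm what the kernel can actually import before running the rest of the notebook. This is a minimal sketch: the helper name `missing_modules` is ours, and it assumes the import names match the package names listed under Requirements.

```python
import importlib.util

# Assumption: import names equal the package names from the Requirements list.
REQUIRED = ("unicorn", "pefile", "capstone")
OPTIONAL = ("bokeh",)

def missing_modules(names):
    """Return the names the import system cannot locate in this environment."""
    return [n for n in names if importlib.util.find_spec(n) is None]

print("missing required:", missing_modules(REQUIRED))
print("missing optional:", missing_modules(OPTIONAL))
```

An empty "missing required" list means the emulation cells should at least import cleanly; a missing `bokeh` only disables the plot.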


In [1]:
from __future__ import annotations

import hashlib
import struct
import sys
from pathlib import Path

# Ensure repo root is on sys.path even if Jupyter started in notebooks/.
_cwd = Path.cwd().resolve()
for _d in [_cwd, *_cwd.parents]:
    if (_d / 'chrysalis_notebook_lib.py').exists():
        sys.path.insert(0, str(_d))
        break

from chrysalis_notebook_lib import (
    ArgStruct,
    ConfigExtractor,
    MainModuleMaterializer,
    Stage1Decryptor,
    find_repo_root,
    sha256_bytes,
    shannon_entropy,
)

ROOT = find_repo_root(Path.cwd())
INPUT_DIR = ROOT / 'input'
OUTPUT_DIR = ROOT / 'output'
OUTPUT_DIR.mkdir(parents=True, exist_ok=True)

LOG_DLL = INPUT_DIR / 'log.dll'
ENC_SHELLCODE = INPUT_DIR / 'encrypted_shellcode.bin'
CONTAINER_EXE = INPUT_DIR / 'BluetoothService.exe'

for _p in [LOG_DLL, ENC_SHELLCODE, CONTAINER_EXE]:
    assert _p.exists(), f'Missing input file: {_p}'

def sha256_path(p: Path) -> str:
    return hashlib.sha256(p.read_bytes()).hexdigest()

print('ROOT:', ROOT)
print('log.dll sha256:', sha256_path(LOG_DLL))
print('encrypted_shellcode.bin sha256:', sha256_path(ENC_SHELLCODE))
print('BluetoothService.exe sha256:', sha256_path(CONTAINER_EXE))


ROOT: /Users/yakovgoldberg/Downloads/malware
log.dll sha256: 3bdc4c0637591533f1d4198a72a33426c01f69bd2e15ceee547866f65e26b7ad
encrypted_shellcode.bin sha256: 77bfea78def679aa1117f569a35e8fd1542df21f7e00e27f192c907e61d63a2e
BluetoothService.exe sha256: 2da00de67720f5f13b17e9d985fe70f10f153da60c9ab1086fe58f069a156924


In [2]:
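# Local helper definitions, shown here for reference.
# They shadow the same-named helpers imported from chrysalis_notebook_lib in In [1].
def sha256_bytes(b: bytes) -> str:
    return hashlib.sha256(b).hexdigest()

def sha256_path(p: Path) -> str:
    return sha256_bytes(p.read_bytes())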

def shannon_entropy(b: bytes) -> float:
    # Shannon entropy in bits per byte: 0.0 for empty or constant input, at most 8.0.
    import math
    if not b:
        return 0.0
    counts = [0] * 256
    for x in b:
        counts[x] += 1
    n = len(b)
    ent = 0.0
    for c in counts:
        if c == 0:
            continue
        p = c / n
        ent -= p * math.log2(p)
    return ent

def hexdump(b: bytes, start: int = 0, length: int = 0x100) -> str:
    chunk = b[start:start+length]
    out = []
    for i in range(0, len(chunk), 16):
        row = chunk[i:i+16]
        hs = " ".join(f"{x:02x}" for x in row)
        asc = "".join(chr(x) if 32 <= x < 127 else "." for x in row)
        out.append(f"{start+i:08x}  {hs:<47}  {asc}")
    return "\n".join(out)

print("log.dll sha256:", sha256_path(LOG_DLL))
print("encrypted_shellcode.bin sha256:", sha256_path(ENC_SHELLCODE))
print("BluetoothService.exe sha256:", sha256_path(CONTAINER_EXE))

log.dll sha256: 3bdc4c0637591533f1d4198a72a33426c01f69bd2e15ceee547866f65e26b7ad
encrypted_shellcode.bin sha256: 77bfea78def679aa1117f569a35e8fd1542df21f7e00e27f192c907e61d63a2e
BluetoothService.exe sha256: 2da00de67720f5f13b17e9d985fe70f10f153da60c9ab1086fe58f069a156924


## 1) Config Extraction + RC4 Decryption (Offline)

Rapid7 notes the encrypted configuration is stored in `BluetoothService` (our `input/encrypted_shellcode.bin`) at:
- offset `0x30808`
- size `0x980`
- RC4 key `qwhvb^435h&*7`

We'll implement RC4 in Python and show recovered plaintext strings.

In [3]:
def rc4_crypt(data: bytes, key: bytes) -> bytes:
    # Classic RC4 KSA+PRGA
    S = list(range(256))
    j = 0
    for i in range(256):
        j = (j + S[i] + key[i % len(key)]) & 0xFF
        S[i], S[j] = S[j], S[i]
    i = 0
    j = 0
    out = bytearray(len(data))
    for n, x in enumerate(data):
        i = (i + 1) & 0xFF
        j = (j + S[i]) & 0xFF
        S[i], S[j] = S[j], S[i]
        K = S[(S[i] + S[j]) & 0xFF]
        out[n] = x ^ K
    return bytes(out)

cfg_off = 0x30808
cfg_len = 0x980
cfg_key = b"qwhvb^435h&*7"

enc = ENC_SHELLCODE.read_bytes()
assert cfg_off + cfg_len <= len(enc)
cfg_enc = enc[cfg_off:cfg_off+cfg_len]
cfg_plain = rc4_crypt(cfg_enc, cfg_key)

cfg_out = OUTPUT_DIR / "notebook_config_decrypted.bin"
cfg_out.write_bytes(cfg_plain)

print("Encrypted cfg sha256:", sha256_bytes(cfg_enc), "entropy", f"{shannon_entropy(cfg_enc):.3f}")
print("Decrypted cfg sha256:", sha256_bytes(cfg_plain), "entropy", f"{shannon_entropy(cfg_plain):.3f}")
print("Wrote:", cfg_out)
print()
print("First 0x120 bytes of decrypted config:")
print(hexdump(cfg_plain, 0, 0x120))

Encrypted cfg sha256: e07764c6d2d84f34f0be62c09ea047e031ab7d092d768bf09b04851544e416c0 entropy 7.915
Decrypted cfg sha256: aad018195c5ee6c2e3c00bc3c95313cb4301218534765472124ebc7b5fb7bcb1 entropy 1.163
Wrote: /Users/yakovgoldberg/Downloads/malware/output/notebook_config_decrypted.bin

First 0x120 bytes of decrypted config:
00000000  68 74 74 70 73 3a 2f 2f 61 70 69 2e 73 6b 79 63  https://api.skyc
00000010  6c 6f 75 64 63 65 6e 74 65 72 2e 63 6f 6d 2f 61  loudcenter.com/a
00000020  2f 63 68 61 74 2f 73 2f 37 30 35 32 31 64 64 66  /chat/s/70521ddf
00000030  2d 61 32 65 66 2d 34 61 64 66 2d 39 63 66 30 2d  -a2ef-4adf-9cf0-
00000040  36 64 38 65 32 34 61 61 61 38 32 31 00 00 00 00  6d8e24aaa821....
00000050  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00000060  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00000070  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00000080  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
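
Because RC4 is a symmetric stream cipher, the same routine both encrypts and decrypts. Before trusting a hand-written implementation on the config blob, it helps to check it against a widely published test vector (key `Key`, plaintext `Plaintext`). The sketch below restates the cipher so that it runs on its own; it is a sanity check, not part of the sample's logic.

```python
def rc4_crypt(data: bytes, key: bytes) -> bytes:
    # Key-scheduling algorithm (KSA): permute S under the key.
    S = list(range(256))
    j = 0
    for i in range(256):
        j = (j + S[i] + key[i % len(key)]) & 0xFF
        S[i], S[j] = S[j], S[i]
    # Pseudo-random generation (PRGA): XOR the keystream into the data.
    i = j = 0
    out = bytearray()
    for x in data:
        i = (i + 1) & 0xFF
        j = (j + S[i]) & 0xFF
        S[i], S[j] = S[j], S[i]
        out.append(x ^ S[(S[i] + S[j]) & 0xFF])
    return bytes(out)

ct = rc4_crypt(b"Plaintext", b"Key")
print(ct.hex())                 # published vector: bbf316e8d940af0ad3
print(rc4_crypt(ct, b"Key"))    # applying RC4 again restores the plaintext
```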

In [4]:
def extract_printable_ascii(b: bytes, min_len: int = 6) -> list[str]:
    out = []
    cur = bytearray()
    for x in b:
        if 32 <= x < 127:
            cur.append(x)
        else:
            if len(cur) >= min_len:
                out.append(cur.decode("ascii", errors="ignore"))
            cur.clear()
    if len(cur) >= min_len:
        out.append(cur.decode("ascii", errors="ignore"))
    return out

print("ASCII strings in decrypted config (min_len=6):")
for s in extract_printable_ascii(cfg_plain, 6):
    print("-", s)

ASCII strings in decrypted config (min_len=6):
- https://api.skycloudcenter.com/a/chat/s/70521ddf-a2ef-4adf-9cf0-6d8e24aaa821
- BluetoothService
- Accept: */*
- User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.4044.92 Safari/537.36
- Content-Type: text/html
- Accept-Language: en-US,en;q=0.9


## 2) Stage1 Decryption via log.dll (Emulation-Assisted)

The stage1 buffer is decrypted by `log.dll` at runtime.

Instead of re-implementing the full LCG-based decrypt routine from scratch, we run the **actual malware decryption function** inside a controlled x86 emulator (Unicorn) and dump the decrypted bytes.

This is still *pure Python* end-to-end, and it's reproducible on macOS/ARM.
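
For intuition only, here is what an LCG-driven keystream XOR generally looks like. Everything specific in this sketch is an assumption: the multiplier and increment are the well-known Numerical Recipes constants, and the seed handling and the choice of which state bits form the key byte are illustrative. None of it was recovered from `log.dll` in this notebook, which is exactly why we emulate the real routine instead.

```python
def lcg_keystream(seed: int, n: int, a: int = 1664525, c: int = 1013904223) -> bytes:
    """Generate n key bytes from a 32-bit LCG (illustrative constants)."""
    state = seed & 0xFFFFFFFF
    out = bytearray()
    for _ in range(n):
        state = (a * state + c) & 0xFFFFFFFF   # state = a*state + c mod 2^32
        out.append((state >> 16) & 0xFF)       # hypothetical choice of output bits
    return bytes(out)

def lcg_xor(data: bytes, seed: int) -> bytes:
    """XOR data with the LCG keystream; the same call encrypts and decrypts."""
    ks = lcg_keystream(seed, len(data))
    return bytes(x ^ k for x, k in zip(data, ks))

blob = lcg_xor(b"stage1 demo bytes", seed=0)
print(lcg_xor(blob, seed=0))   # round-trips back to the input
```

A real reimplementation would also need the exact seed, constants, and any extra per-byte mixing, all of which the emulation approach gets for free.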

In [5]:
# Run the Unicorn-based log.dll emulator directly (in-kernel).
stage1_runner = Stage1Decryptor()
stage1_runner.run(
    log_dll_path=LOG_DLL,
    encrypted_payload_path=ENC_SHELLCODE,
    output_dir=OUTPUT_DIR,
    mode='logwrite',
    stop_at='none',
)

stage1 = OUTPUT_DIR / 'shellcode.bin'
stage1_full = OUTPUT_DIR / 'shellcode_full.bin'
assert stage1.exists(), f'Missing {stage1}'
assert stage1_full.exists(), f'Missing {stage1_full}'

b1 = stage1.read_bytes()
bfull = stage1_full.read_bytes()
print('stage1 len:', len(b1), 'sha256:', sha256_bytes(b1), 'entropy:', f'{shannon_entropy(b1):.3f}')
print('stage1_full len:', len(bfull), 'sha256:', sha256_bytes(bfull), 'entropy:', f'{shannon_entropy(bfull):.3f}')


[*] emulate_logwrite_dump_shellcode.py 2026-02-06u mode=logwrite stop_at=none
[+] Running init at 0x10001000
[+] LogWrite RVA: 0x00001B20  VA: 0x10001B20
[+] Breakpoint RVA: 0x00001C11 VA: 0x10001C11
[+] Starting emulation...
[*] STUB GetModuleHandleA(0x0) -> EAX=0x10000000 RET=0x1000101E
[*] VirtualAlloc mapped 0x30000000-0x30001000 size=0x1000
[*] STUB HeapAlloc(0x0, 0x0, 0x110) -> EAX=0x30000000 RET=0x10006BC6
[*] Seed set at 0x10016354 = 0x00000000
[*] STUB GetModuleHandleA(0x10012c88) -> EAX=0x50000000 RET=0x10001097
[*] STUB GetProcessHeap() -> EAX=0x70000000 RET=0x100010A9
[*] VirtualAlloc mapped 0x30001000-0x30201000 size=0x200000
[*] STUB HeapAlloc(0x70000000, 0x0, 0x200000) -> EAX=0x30001000 RET=0x100010B0
[*] STUB HeapFree(0x0, 0x0, 0x30000000) -> EAX=0x00000001 RET=0x10006D12
[*] STUB EnterCriticalSection(0x10015ec8) -> EAX=0x00000000 RET=0x10006B44
[*] STUB HeapSize(0x0, 0x0, 0x993aed03) -> EAX=0x00000000 RET=0x1000A04A
[*] VirtualAlloc mapped 0x30201000-0x30202000 size=0x

## 3) Main Module Materialization (Offline Transform)

Rapid7 describes the main module decryption as a simple byte transform using key `gQ2JR&9;`.

The repo script `offline_extract_stage2.py` does two useful things:
- injects the stage1-provided regions into a known PE container (`BluetoothService.exe`) at 5 RVAs
- applies the decrypt transform to those regions

We generate two artifacts:
- `output/main_module_patched.exe` (easy to open as PE, signature invalid after modification)
- `output/main_module_mem.bin` (decrypted memory image; often the best artifact for RE)
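
The injection step can be pictured as copying chunks of a source blob to fixed RVAs inside a zero-filled image. The sketch below assumes the chunks are packed back to back in the source blob, which is a simplification of ours and not a statement about how `stage1_full` is actually laid out; `place_regions` is a hypothetical helper, and the decrypt transform is left out.

```python
def place_regions(image_size: int, regions, blob: bytes) -> bytes:
    """Copy consecutive chunks of blob to the given (rva, size) slots."""
    image = bytearray(image_size)
    cursor = 0
    for rva, size in regions:
        chunk = blob[cursor:cursor + size]
        # Guard against short input or a slot that overruns the image;
        # an unequal slice assignment would silently resize the bytearray.
        if len(chunk) != size or rva + size > image_size:
            raise ValueError(f"region {rva:#x}+{size:#x} does not fit")
        image[rva:rva + size] = chunk
        cursor += size
    return bytes(image)

demo = place_regions(0x40, [(0x10, 4), (0x20, 2)], b"AAAABB")
print(demo[0x10:0x14], demo[0x20:0x22], len(demo))
```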


### 3a) The Byte Transform (Rapid7 key `gQ2JR&9;`)

Rapid7's pseudocode per byte is:

```
x = x + k
x = x ^ k
x = x - k
```

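A useful property is that this transform is its own inverse for any key byte `k` (all arithmetic is mod 256): on a second pass the `+ k` cancels the first pass's `- k`, the two XORs cancel each other, and the final `- k` cancels the original `+ k`.
So applying it twice yields the original byte stream again.

(In our scripts you may see “rounds=5”; for this transform, any odd number of rounds is equivalent to 1 round, and any even number of rounds is a no-op.)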
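
The involution property is cheap to verify exhaustively, since there are only 256 × 256 combinations of data byte and key byte. The check below is a standalone sketch that restates the per-byte transform from the pseudocode above.

```python
def bxform_byte(x: int, k: int) -> int:
    x = (x + k) & 0xFF   # add key byte, wrap to 8 bits
    x ^= k               # xor with key byte
    return (x - k) & 0xFF  # subtract key byte, wrap to 8 bits

# Apply the transform twice for every (data byte, key byte) pair.
is_involution = all(
    bxform_byte(bxform_byte(x, k), k) == x
    for k in range(256)
    for x in range(256)
)
print("involution for every (x, k):", is_involution)
```

Because this holds for every key byte, the property does not depend on the specific key `gQ2JR&9;`.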


In [6]:
KEY = b"gQ2JR&9;"

def bxform_byte(x: int, k: int) -> int:
    x = (x + k) & 0xFF
    x = x ^ k
    x = (x - k) & 0xFF
    return x

def bxform(data: bytes, key: bytes, rounds: int = 1) -> bytes:
    out = bytearray(data)
    for _ in range(rounds):
        for i in range(len(out)):
            k = key[i % len(key)]  # key repeats; for the 8-byte KEY this equals i & 7
            out[i] = bxform_byte(out[i], k)
    return bytes(out)

# Quick sanity: involution property on a small test buffer
test = b"hello world" * 32
t1 = bxform(test, KEY, rounds=1)
t2 = bxform(t1, KEY, rounds=1)
assert t2 == test

print("byte-transform sanity: ok (transform is an involution)")
print("rounds=1 == rounds=5 ?", bxform(test, KEY, rounds=1) == bxform(test, KEY, rounds=5))
print("rounds=1 == rounds=3 ?", bxform(test, KEY, rounds=1) == bxform(test, KEY, rounds=3))
print("rounds=2 restores original ?", bxform(test, KEY, rounds=2) == test)

byte-transform sanity: ok (transform is an involution)
rounds=1 == rounds=5 ? True
rounds=1 == rounds=3 ? True
rounds=2 restores original ? True


In [7]:
# This arg-struct is sample-specific and comes from stage1 runtime data.
# For the Rapid7 sample it is stable and matches the described layout (5 regions).
ARG_STRUCT = [
    0x116A7, 0x5, 0x1000, 0x24000, 0x2D000,
    0x30000, 0x31000, 0x0, 0x0, 0x23000,
    0x8E00, 0xC00, 0x200, 0x1C00, 0x0,
    0x0, 0x400000, 0x0, 0x31000, 0x2C5D0,
    0x30001000, 0x100014C0, 0x100014D0, 0x30000000, 0x0,
]
arg_struct = ArgStruct.from_iterable(ARG_STRUCT)

patched_exe = OUTPUT_DIR / 'main_module_patched.exe'
mem_img = OUTPUT_DIR / 'main_module_mem.bin'

mat = MainModuleMaterializer(key=b'gQ2JR&9;', rounds=1)
info = mat.materialize(
    container_pe_path=CONTAINER_EXE,
    stage1_full_path=stage1_full,
    arg_struct=arg_struct,
    out_patched_pe_path=patched_exe,
    out_mem_image_path=mem_img,
)

print('patched_exe:', patched_exe, 'sha256:', info['patched_pe_sha256'], 'size:', patched_exe.stat().st_size)
print('mem_img:   ', mem_img, 'sha256:', info['mem_image_sha256'], 'size:', mem_img.stat().st_size)
print('regions:', [(hex(r), hex(s)) for (r,s) in info['regions']])


patched_exe: /Users/yakovgoldberg/Downloads/malware/output/main_module_patched.exe sha256: dcdafc1e261daaff1e4384835915b444fa7310fbeeb9cb1f2bd8024bde4db884 size: 950592
mem_img:    /Users/yakovgoldberg/Downloads/malware/output/main_module_mem.bin sha256: c9c03d293eedc8d9404b41416c79ea1e26e49ba8da3e841b1a86afb54d82a314 size: 942080
regions: [('0x1000', '0x23000'), ('0x24000', '0x8e00'), ('0x2d000', '0xc00'), ('0x30000', '0x200'), ('0x31000', '0x1c00')]
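
Comparing `ARG_STRUCT` with the printed regions suggests a simple layout: index 1 holds the region count, RVAs start at index 2, and sizes start at index 9. That layout is our inference from the values above, not a description of how `ArgStruct` is implemented. Under that assumption, the regions can be decoded and checked against the 942080-byte image size reported above.

```python
ARG_STRUCT = [
    0x116A7, 0x5, 0x1000, 0x24000, 0x2D000,
    0x30000, 0x31000, 0x0, 0x0, 0x23000,
    0x8E00, 0xC00, 0x200, 0x1C00, 0x0,
    0x0, 0x400000, 0x0, 0x31000, 0x2C5D0,
    0x30001000, 0x100014C0, 0x100014D0, 0x30000000, 0x0,
]

def regions_from_arg_struct(words, rva_base=2, size_base=9):
    """Decode (rva, size) pairs; the index layout is an inferred assumption."""
    count = words[1]
    return [(words[rva_base + i], words[size_base + i]) for i in range(count)]

def check_regions(regions, size_of_image):
    """Raise if regions overlap or extend past the end of the image."""
    prev_end = 0
    for rva, size in sorted(regions):
        if rva < prev_end:
            raise ValueError(f"region at {rva:#x} overlaps the previous one")
        if rva + size > size_of_image:
            raise ValueError(f"region at {rva:#x} exceeds the image")
        prev_end = rva + size

regions = regions_from_arg_struct(ARG_STRUCT)
check_regions(regions, 942080)
print([(hex(r), hex(s)) for r, s in regions])
```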


In [8]:
def parse_pe_minimal(path: Path):
    b = path.read_bytes()
    if b[:2] != b"MZ":
        raise ValueError("Not MZ")
    pe = struct.unpack_from("<I", b, 0x3C)[0]
    if b[pe:pe+4] != b"PE\0\0":
        raise ValueError("Not PE")
    coff = pe + 4
    opt = coff + 20
    magic = struct.unpack_from("<H", b, opt)[0]
    if magic != 0x10B:
        raise ValueError(f"Unexpected PE magic {magic:#x}")
    ep_rva = struct.unpack_from("<I", b, opt + 0x10)[0]
    imgbase = struct.unpack_from("<I", b, opt + 0x1C)[0]
    szimg = struct.unpack_from("<I", b, opt + 0x38)[0]
    return {
        "image_base": imgbase,
        "entry_rva": ep_rva,
        "entry_va": imgbase + ep_rva,
        "size_of_image": szimg,
    }

print("patched exe PE summary:")
print(parse_pe_minimal(patched_exe))

print("\nmem image PE summary:")
print(parse_pe_minimal(mem_img))

patched exe PE summary:
{'image_base': 4194304, 'entry_rva': 291248, 'entry_va': 4485552, 'size_of_image': 942080}

mem image PE summary:
{'image_base': 4194304, 'entry_rva': 291248, 'entry_va': 4485552, 'size_of_image': 942080}
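
The summaries print in decimal, which is awkward to compare with the RVAs used elsewhere in this notebook. A small formatting helper (the name `hex_summary` is ours) renders the same values in hex and checks that they are internally consistent.

```python
# Values copied from the PE summary output above.
summary = {
    "image_base": 4194304,
    "entry_rva": 291248,
    "entry_va": 4485552,
    "size_of_image": 942080,
}

def hex_summary(d: dict) -> dict:
    """Render every integer field as a hex string."""
    return {k: f"{v:#x}" for k, v in d.items()}

# Consistency checks: VA = base + RVA, and the entry point lies inside the image.
assert summary["entry_va"] == summary["image_base"] + summary["entry_rva"]
assert summary["entry_rva"] < summary["size_of_image"]

print(hex_summary(summary))
```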


## 4) Quick Visualizations

We'll compare entropy and a byte histogram for:
- encrypted input (`encrypted_shellcode.bin`)
- decrypted stage1 (`shellcode.bin`)
- decrypted main module memory image (`main_module_mem.bin`)

If `bokeh` is unavailable, we'll print text summaries instead.

In [9]:
samples = {
    "encrypted_shellcode.bin": ENC_SHELLCODE.read_bytes(),
    "stage1_shellcode.bin": stage1.read_bytes(),
    "main_module_mem.bin": mem_img.read_bytes(),
}

for name, b in samples.items():
    print(f"{name:24s} len={len(b):8d} sha256={sha256_bytes(b)} ent={shannon_entropy(b):.3f}")

try:
    from bokeh.io import output_notebook
    from bokeh.plotting import figure, show

    output_notebook()

    def hist_counts(b: bytes):
        c = [0] * 256
        for x in b:
            c[x] += 1
        return c

    p = figure(
        title="Byte Histogram (0..255)",
        width=1100,
        height=320,
        x_axis_label="byte value",
        y_axis_label="count",
        tools="pan,wheel_zoom,box_zoom,reset,save,hover",
    )
    p.toolbar.logo = None

    x = list(range(256))
    palette = [
        "#1f77b4",
        "#ff7f0e",
        "#2ca02c",
        "#d62728",
        "#9467bd",
        "#8c564b",
    ]
    for i, (name, b) in enumerate(samples.items()):
        y = hist_counts(b)
        p.line(x, y, line_width=2, alpha=0.85, legend_label=name, color=palette[i % len(palette)])

    p.legend.location = "top_right"
    p.legend.click_policy = "hide"
    show(p)

except Exception as e:
    # Reached when bokeh is not installed, and also on any plotting error.
    print("bokeh plot unavailable:", e)
    print("(skipping plots)")

encrypted_shellcode.bin  len=  201096 sha256=77bfea78def679aa1117f569a35e8fd1542df21f7e00e27f192c907e61d63a2e ent=7.999
stage1_shellcode.bin     len=  201096 sha256=4416729d92e22ccb93e26c6896efe056b851a914727969f0ff604da4ef18ccfa ent=7.991
main_module_mem.bin      len=  942080 sha256=c9c03d293eedc8d9404b41416c79ea1e26e49ba8da3e841b1a86afb54d82a314 ent=6.983
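
A whole-file entropy figure hides structure, so a text-only alternative to the plot is entropy per fixed-size window. The sketch below is self-contained; the 4096-byte window and the ASCII bar rendering are arbitrary choices of ours, and the demo buffer stands in for a real sample.

```python
import math
from collections import Counter

def shannon_entropy(b: bytes) -> float:
    """Entropy in bits per byte (0.0 for empty or constant input)."""
    if not b:
        return 0.0
    n = len(b)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(b).values())

def windowed_entropy(data: bytes, window: int = 4096):
    """Return (offset, entropy) for each consecutive window of data."""
    return [
        (off, shannon_entropy(data[off:off + window]))
        for off in range(0, len(data), window)
    ]

# Demo input: one all-zero window, then one window with a uniform byte mix.
demo = bytes(4096) + bytes(range(256)) * 16
for off, ent in windowed_entropy(demo):
    print(f"{off:#08x}  {ent:5.3f}  {'#' * round(ent * 4)}")
```

Run against `main_module_mem.bin`, low-entropy windows would typically correspond to padding or headers and high-entropy windows to packed or encrypted data.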


## Next Steps (Manual RE)

If you're opening the decrypted main module in IDA:
- prefer `output/main_module_mem.bin` (memory image)
- or open `output/main_module_patched.exe` and jump to the EntryPoint.

You can also re-run the emulator in `--mode full` for additional runtime telemetry. The analysis here intentionally avoids full stage1 emulation because the shellcode uses exception-driven, anti-analysis control flow.
