Add pickle package for generating exploit payloads #587
The package mirrors the dotnet package's gadget-creation API but for
Python pickle. Two layers:
- Low-level opcode emitters (Proto, Global, BinUnicode, Tuple1, ...)
plus value encoders (EncodeString, EncodeTuple, EncodeAny) named
after CPython's Lib/pickle.py opcodes for easy cross-reference.
- Fragment composition API (Str, Int, Bool, TupleOf, ListOf, DictOf,
Call, CallFragment, Method, Build, Dump) where every primitive
returns a Fragment that pushes one value onto the unpickler stack.
Calls compose freely, so nested gadgets like
eval(open(path).read()) build naturally.
CreateOSSystem / CreateNTSystem / CreateExec / CreateEval /
CreateSubprocessPopen wrap the typical RCE shapes; the legacy
Reduce(module, attr, []any) entry point covers callers that have
untyped Go values.
A pure-Go Disassemble walks a stream into a list of named opcodes; the
test suite uses it to validate gadget structure without ever calling
pickle.loads on attacker-shaped bytes.
The package locks to protocol 2 by default: supported by every Python
2.3+ and 3.x release, no protocol-4-only opcodes (SHORT_BINUNICODE,
STACK_GLOBAL, MEMOIZE, FRAME) needed for typical gadgets, and the
emitted bytes are deterministic across CPython versions.
21 tests covering the low-level emitters with byte-exact golden hex,
the high-level shortcuts, the Fragment composition API including a
nested call chain, dict and BUILD shapes, and the disassembler error
paths.
Signed-off-by: Valentin Lobstein <281638514+vlobstein-vc@users.noreply.github.com>
Audit pass to remove duplication:
- values.go and the legacy Reduce(module, attr, []any) in gadgets.go duplicated the Fragment API. They are gone; the typed Dump(Call(module, attr, ...)) path is the single way to compose.
- Length-prefixed emitters (BinUnicode, BinUnicode8, ShortBinUnicode, BinBytes, ShortBinBytes) now share a single emitLengthPrefixed helper instead of repeating the make/copy ceremony five times.
- Single-byte opcode helpers (Stop, Mark, Tuple, Tuple1, Tuple2, Tuple3, EmptyTuple, EmptyList, EmptyDict, ReduceOp, Appends) route through one tiny `single` constructor.
- Disasm collapses six prefixed string/bytes decoders into two (decodeStringArg / decodeBytesArg) backed by a shared readPrefixedBytes that itself reuses readInt for the length.
- Fragment composites (TupleOf, ListOf, DictOf, CallFragment, Method, Build) drop the bytes.Buffer ceremony in favour of a small concatFragments helper that allocates exactly once.

Net diff is roughly -450 lines for the same public API surface (modulo the dropped Encode* / legacy Reduce, which had no callers). Tests stay at 14 functions and pass; lint stays at zero issues.
Signed-off-by: Valentin Lobstein <281638514+vlobstein-vc@users.noreply.github.com>
Doc strings on every exported symbol were stating the obvious. Tightened them to one-liners where the function name and signature already communicate the behaviour, and kept the rationale on each helper that makes a non-obvious choice (protocol gating, panic conditions, opcode shape).
Signed-off-by: Valentin Lobstein <281638514+vlobstein-vc@users.noreply.github.com>
The decoder had a 41-line switch with one error-wrapping branch per byte count and a separate signed-int path on the 4-byte case. Replace with a single readUint that reads N little-endian bytes into a uint64 through one io.ReadFull plus a tiny shift loop. Sign-extension moves to decodeIntArg where it actually matters (BININT). readPrefixedBytes now shares the same primitive. Net: disasm.go drops from 253 to 233 lines, all 17 tests pass.
Signed-off-by: Valentin Lobstein <281638514+vlobstein-vc@users.noreply.github.com>
decodeOp had a 30-line switch that mapped each opcode to (name, byteCount, signed) by hand. Replace with a single argOps map[byte]argSpec and one tiny dispatcher: lookup, call the spec's reader, wrap the arg into Op. Adding a new opcode now means one map entry instead of one switch case plus one helper. Net: same LOC (237 vs 233), but the source of truth for "which opcodes carry args, what shape, what name" is one block of data rather than scattered across switch cases and per-shape helper signatures.
Signed-off-by: Valentin Lobstein <281638514+vlobstein-vc@users.noreply.github.com>
ListOf and DictOf shared the same shape (empty case + MARK..close case) with only the opener / closer opcodes and the item-flatten step differing. Extract collection(emptyOp, closeOp, items) so both end up as 3-line wrappers. A future SetOf (proto-4 EMPTY_SET / ADDITEMS) drops in as one collection call. TupleOf's per-arity switch becomes a tupleSmallClosers byte array indexed by len(elems); the large-arity branch keeps the MARK..TUPLE fallback. Same byte output, less dispatch code.
Signed-off-by: Valentin Lobstein <281638514+vlobstein-vc@users.noreply.github.com>
Stop / Mark / ReduceOp / EmptyTuple / EmptyList / EmptyDict / Appends /
Tuple / Tuple1..3 were one-line returns of []byte{OpXxx}. Internal
callers already use Fragment{OpXxx} directly, so these wrappers were
exported-but-dead. Drop them along with TestSingleByteOpcodes; the byte
constants stay and serve any caller that needs the byte form.
BinUnicode8 covers strings >4 GiB; gadget payloads never reach that
size and at protocol 2 the opcode would not even decode. Drop.
Signed-off-by: Valentin Lobstein <281638514+vlobstein-vc@users.noreply.github.com>
Pull request overview
Adds a new pickle/ package to generate attacker-controlled Python pickle byte streams (protocol 2 by default) and to disassemble pickle streams into named opcodes for inspection/testing, aligning with existing deserialization-sink support in the framework.
Changes:
- Introduces low-level opcode emitters (`Proto`, `Global`, `BinUnicode`, integer encoders, etc.) and higher-level composable `Fragment` builders (`Str`, `Int`, `TupleOf`, `ListOf`, `DictOf`, `Call`, `Method`, `Build`, `Dump`).
- Adds common exploit “shortcut” gadgets (e.g., `CreateOSSystem`, `CreateExec`, `CreateSubprocessPopen`).
- Adds a pure-Go disassembler and a golden/disassembly-based test suite for deterministic verification without needing Python at test time.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| pickle/pickle.go | Package docs and protocol constant for the new pickle payload generator. |
| pickle/opcodes.go | Low-level opcode constants and byte emitters for pickle encoding. |
| pickle/fragments.go | High-level fragment composition API and Dump wrapper. |
| pickle/gadgets.go | Convenience gadget constructors for common RCE primitives. |
| pickle/disasm.go | Pure-Go pickle disassembler used by tests and for safe inspection. |
| pickle/pickle_test.go | Golden-hex and disassembly-shape tests covering emitters, gadgets, and disassembler behavior. |
…add BinUnicode8 + BinBytes8
Signed-off-by: Valentin Lobstein <281638514+vlobstein-vc@users.noreply.github.com>
…ary-side panics
Signed-off-by: Valentin Lobstein <281638514+vlobstein-vc@users.noreply.github.com>
Signed-off-by: Valentin Lobstein <281638514+vlobstein-vc@users.noreply.github.com>
lobsterjerusalem left a comment
Just added the bit about standardizing the length checking throughout.
Btw this is awesome, it was actually something on my TODO list that I can knock off now 👍.
… (value, bool) through wrappers
Signed-off-by: Valentin Lobstein <281638514+vlobstein-vc@users.noreply.github.com>
lobsterjerusalem left a comment
Made a slight change to the pickle docs for example usage to match the update. But looks good to go.
Summary
The framework already ships dedicated packages for the deserialization sinks it sees most: `dotnet/` for .NET BinaryFormatter / SOAPFormatter and `java/gadgets/` for Java. Python pickle has not had the same treatment. This PR fills that gap with a `pickle/` package modelled on the existing dotnet API.

og-rek already exposes a `Call{Callable: Class{Module, Name}, Args: Tuple{...}}` type whose encoder emits `GLOBAL` + tuple + `REDUCE`, so the byte-level capability has been there. The package this PR ships re-implements that primitive in-tree to keep the framework dependency-free and adds an offsec-shaped API on top:

- Low-level opcode emitters (`Proto`, `Global`, `BinUnicode`, `Tuple1/2/3`, ...) named after `Lib/pickle.py` for easy cross-reference, plus a length-prefix helper that drives `BinUnicode` / `BinUnicode8` / `ShortBinUnicode` / `BinBytes` / `ShortBinBytes` from a single function.
- Typed Fragment composition: primitives `Str` / `Int` / `Bool` / `Bytes` / `None`, composites `TupleOf` / `ListOf` / `DictOf`, callables `Call` / `CallFragment` / `Method` / `Build`, and the top-level wrapper `Dump`. Every primitive returns a `Fragment` that pushes one value onto the unpickler stack, and Fragments nest arbitrarily.
- `Create*` shortcuts ship for the canonical gadgets: `CreateOSSystem`, `CreateNTSystem`, `CreateExec`, `CreateEval`, `CreateSubprocessPopen`. `Method` / `Build` cover the `getattr` and `__setstate__` chain patterns that go-exploit modules will reach for.
- A pure-Go `Disassemble` walks a stream into named opcodes; the test suite uses it instead of calling `pickle.loads` on attacker-shaped bytes. 17 tests, all golden hex or disassembly round-trip, no Python interpreter required at test time.

Default protocol is 2: universally supported across Python 2.3+ and 3.x, with deterministic output across CPython versions.