Reverse-engineered Python implementation of the xcaptcha widget protocol
(xcaptcha.com). Replicates every HTTP exchange the browser does, builds the
encrypted fingerprint payload byte-for-byte, and exposes the flow as a CLI, a
Python package, and a small HTTP API.
The image-recognition step is intentionally left as a stub for you to plug your own OCR / vision model into — see Implementing OCR.
xcap/
├── solver/
│ ├── __init__.py
│ ├── crypto.py AES-256-GCM encrypt/decrypt
│ ├── fingerprint.py d-id/bfp builder, answer encoder
│ ├── submit.py HTTP layer (curl_cffi, edge101 TLS)
│ ├── task_solver.py OCR stub + manual fallback + helpers
│ ├── session.py Full flow orchestrator
│ ├── fp_gen.py Fingerprint generator
│ └── fingerprints/ Pre-generated fingerprint JSONs
├── canvas_hash.js Replicates app.js ge.get() canvas hash
├── decrypted_payload.json Reference payload (used as base for fp_gen)
├── main.py CLI entry point
├── server.py HTTP API (Flask)
├── decrypt.js Standalone payload decryptor (Node)
├── send.py One-shot test sender (re-encrypts and posts)
├── did_gen.py Standalone d-id encode/decode utility
└── xcaptcha.com.har Reference HAR capture
xcaptcha ships three obfuscated bundles, served from
static.xcaptcha.com:
- api.js: the host-page widget. Builds the iframe URLs and tracks mouse-out coordinates (the r= parameter). Trivial to read.
- visitor.js: the fingerprint collector loaded inside the iframe. Heavily obfuscated with a string-decoder helper (_0x337a). Builds the fingerprint dict (fpCollect), encrypts it with AES-256-GCM, and POSTs it to /visitor and /watch.
- app.js: the Vue widget that drives the challenge UI, talks to /init and /task, packs the answer, and emits the success token via postMessage.
visitor.js exports an object _0x22712a with a 32-byte hex key
('RqcVY'). The string is split across seven obfuscated constants that
concatenate to:
1947903c1b 1ae59b7301 3040b06eb9 6ab05bf23c
1288e52919 c3ebe51440 dafd
The wire format is base64( iv[12] || ciphertext || authTag[16] ). WebCrypto
appends the auth tag to the ciphertext, so on the wire the layout is just
iv || ct_with_tag. See solver/crypto.py for the 30
lines that implement encrypt/decrypt.
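The layout can be illustrated without touching any crypto. The sketch below only splits and joins the wire format described above. The names split_wire and join_wire are hypothetical and are not taken from solver/crypto.py.

```python
import base64

IV_LEN = 12   # AES-GCM nonce length, as stated in the wire format above
TAG_LEN = 16  # GCM auth tag length, as stated in the wire format above


def split_wire(payload_b64: str) -> tuple[bytes, bytes, bytes]:
    """Split base64(iv || ciphertext || tag) into (iv, ciphertext, tag)."""
    raw = base64.b64decode(payload_b64)
    if len(raw) < IV_LEN + TAG_LEN:
        raise ValueError("payload too short to hold an iv and a tag")
    iv = raw[:IV_LEN]
    ciphertext = raw[IV_LEN:-TAG_LEN]
    tag = raw[-TAG_LEN:]
    return iv, ciphertext, tag


def join_wire(iv: bytes, ct_with_tag: bytes) -> str:
    """Build base64(iv || ct_with_tag); the tag is already appended."""
    if len(iv) != IV_LEN:
        raise ValueError("iv must be 12 bytes")
    return base64.b64encode(iv + ct_with_tag).decode("ascii")
```

Python's cryptography package (AESGCM.encrypt) also returns the ciphertext with the tag appended, so its output can be passed to a helper like join_wire unchanged.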
app.js fe.get builds the d-id header from four pieces:

    fe = { get(cb) {
      me.get().run((audio) => {
        var canvas = ge.get("test1") === ge.get("test2") ? btoa(ge.get("test1")) : "";
        var renderer = ge.info("test3").renderer;
        var languages = navigator.languages.join(";");
        var inner = [btoa(audio), canvas, btoa(renderer), languages].join(":");
        cb(btoa(inner));
      });
    }};

So d-id = btoa(b64(audio_sum) + ":" + b64(canvas_hash) + ":" + b64(gpu_renderer) + ":" + languages_raw).
The canvas chunk is empty when the two canvas renders disagree.
The languages chunk is not base64'd. We found this by decoding a real
captured d-id and seeing the trailing chunk arrive as raw text.
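A minimal Python sketch of that layout, for checking a captured value by hand. The function names are hypothetical (did_gen.py may be organised differently), and the inputs are taken as strings because JavaScript's number-to-string conversion is not reproduced here.

```python
import base64


def _b64(text: str) -> str:
    # Stand-in for btoa(); assumes the input is plain ASCII/Latin-1 text.
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def encode_did(audio_sum: str, canvas_hash: str, gpu_renderer: str,
               languages: list[str]) -> str:
    """Pack the four chunks: three base64'd, languages left raw."""
    canvas_chunk = _b64(canvas_hash) if canvas_hash else ""
    inner = ":".join([
        _b64(audio_sum),
        canvas_chunk,
        _b64(gpu_renderer),
        ";".join(languages),  # raw, not base64'd
    ])
    return _b64(inner)


def decode_did(did: str) -> dict:
    """Reverse encode_did(); handy for inspecting a captured value."""
    inner = base64.b64decode(did).decode("utf-8")
    # maxsplit=3 keeps the raw languages chunk intact.
    audio_b64, canvas_b64, gpu_b64, languages_raw = inner.split(":", 3)
    return {
        "audio_sum": base64.b64decode(audio_b64).decode("utf-8"),
        "canvas_hash": base64.b64decode(canvas_b64).decode("utf-8"),
        "gpu_renderer": base64.b64decode(gpu_b64).decode("utf-8"),
        "languages": languages_raw.split(";"),
    }
```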
app.js ge.get(id) paints fixed shapes + text on a hidden canvas, calls
toDataURL(), then computes a Java-style 32-bit hashCode:

    o = (o << 5) - o + charCodeAt(r) | 0;   // i.e. o = o*31 + c, wrapping int32

Hash values differ across machines because canvas text renders with subtly
different anti-aliasing per OS / driver. We replicate this in
canvas_hash.js using the canvas npm package and a
seeded PRNG that nudges the radius / shadow blur / rotation just enough to
produce a different hash per fingerprint.
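The hash step on its own is easy to reproduce. Below is a sketch of the same Java-style hashCode in Python; js_hash_code is a hypothetical name, and the code assumes the hashed string is ASCII (as a data URL is), so ord() matches charCodeAt().

```python
def js_hash_code(text: str) -> int:
    """Java-style String.hashCode: h = h*31 + c, wrapped to signed int32."""
    h = 0
    for ch in text:
        # Keep only the low 32 bits, mirroring the `| 0` truncation in JS.
        h = (h * 31 + ord(ch)) & 0xFFFFFFFF
    # Reinterpret the unsigned 32-bit value as a signed int32.
    return h - 0x100000000 if h >= 0x80000000 else h
```

Python integers never overflow, so the mask and the final sign conversion have to be done explicitly.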
me.get uses an OfflineAudioContext(1, 44100, 44100) -> triangle
oscillator @ 10 kHz -> DynamicsCompressor(thr=-50, ratio=12) ->
getChannelData(0)[4500..4999]. The compressor attenuates the signal by
~8 dB; combined with band-limited resampling, the output is very close to a
pure sine. Real captures show mean(|x|) ≈ 0.248, peak ≈ 0.395,
abs_sum ≈ 124 (500 samples × 0.248). See solver/fp_gen.py for the
two-line generator that reproduces these statistics.
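As a sanity check on those numbers, the sketch below computes the same statistics for an ideal 10 kHz sine at the observed peak. This is not the generator from solver/fp_gen.py; ideal_sine_stats is a hypothetical helper. An ideal sine has mean(|x|) = 2/π × peak, which is about 0.25 here, slightly above the captured 0.248, so it fits "very close to" rather than identical.

```python
import math

SAMPLE_RATE = 44100      # from OfflineAudioContext(1, 44100, 44100)
FREQ_HZ = 10_000         # oscillator frequency
FIRST, LAST = 4500, 4999 # inclusive sample window read by the widget


def ideal_sine_stats(peak: float) -> tuple[float, float]:
    """Return (mean_abs, abs_sum) of an ideal sine over the sample window."""
    samples = [
        peak * math.sin(2 * math.pi * FREQ_HZ * n / SAMPLE_RATE)
        for n in range(FIRST, LAST + 1)
    ]
    abs_sum = sum(abs(x) for x in samples)
    return abs_sum / len(samples), abs_sum


mean_abs, abs_sum = ideal_sine_stats(0.395)
print(f"mean(|x|) = {mean_abs:.4f}, abs_sum = {abs_sum:.2f}")
```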
app.js TaskText.send builds the answer like this:

    checked = { btoa("colxrow"): (row-1)*blocks_x + col, ... };   // exactly 2 keys
    answer  = btoa(JSON.stringify(checked));
    fetch(`/captcha/{sk}/task/${answer}`, { headers: { "Wcaptcha-Key": task.key } });

encode_answer() in solver/fingerprint.py does
exactly this.
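A sketch of that packing in Python. It is not the actual encode_answer() from solver/fingerprint.py: the name encode_answer_sketch is hypothetical, and it assumes "colxrow" means the column, a literal x, and the row (for example "2x1"), both 1-indexed.

```python
import base64
import json


def encode_answer_sketch(selections: list[tuple[int, int]], blocks_x: int) -> str:
    """Pack two (col, row) picks the way the widget pseudocode above does."""
    if len(selections) != 2:
        raise ValueError("the widget expects exactly 2 selections")
    checked = {}
    for col, row in selections:
        key = base64.b64encode(f"{col}x{row}".encode("ascii")).decode("ascii")
        checked[key] = (row - 1) * blocks_x + col  # 1-based linear cell index
    # JSON.stringify emits no whitespace; json.dumps needs explicit separators
    # to produce the same bytes.
    compact = json.dumps(checked, separators=(",", ":"))
    return base64.b64encode(compact.encode("ascii")).decode("ascii")
```

The separators argument matters: Python's default json.dumps output contains spaces, which would change the base64 string.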
The full flow lives in solver/session.py — eight HTTP
calls in the same order Edge fires them:
| # | Method | URL | Purpose |
|---|---|---|---|
| 1 | GET | api.xcaptcha.com/i | latency probe (cosmetic) |
| 2 | GET | api-ping-2.xcaptcha.com:8443/ | returns { token } for /watch |
| 3 | GET | api.xcaptcha.com/captcha/{sk}/ | parses CAPTCHA_SESSION from HTML |
| 4 | POST | api.xcaptcha.com/visitor | encrypted fingerprint -> visitor_id |
| 5 | POST | api2.xcaptcha.com/watch | telemetry mirror |
| 6 | GET | api.xcaptcha.com/captcha/{sk}/init | sends Bfp + Wparams headers |
| 7 | GET | api.xcaptcha.com/captcha/{sk}/task | fetches challenge image |
| 8 | GET | api.xcaptcha.com/captcha/{sk}/task/{answer_b64} | submits answer, returns token |
We use curl_cffi with
impersonate="edge101" so the TLS fingerprint matches a real Microsoft Edge
build. accept-language, sec-ch-ua-* and the cookie behaviour are picked
to match a Windows 10 + Edge 147 baseline.

    # Generate 100 realistic fingerprints
    python -m solver.fp_gen --100
    # Or:
    python -m solver.fp_gen -n 50
    python -m solver.fp_gen -n 1 --show   # print the first one to stdout

Each fingerprint varies:
- Canvas hash: distinct integer per seed, computed by canvas_hash.js.
- Audio samples: 10 kHz sine + per-machine float-precision noise, so abs_sum matches the observed Chrome distribution.
- GPU: 15 real Windows 10/11 strings (AMD / NVIDIA / Intel).
- Screen / hardware / language / timezone: realistic combinations only.
- UA: Edge 136 .. 147 with matching uaData.brands + appVersion.
- Heap / storage / collectedAt: per-machine variance.
The challenge image is a single PNG containing a target band (top or
bottom) showing the assembled code with a noise/squiggle overlay, plus a
blocks.x × blocks.y grid of clean fragments. The widget asks the user to
pick the 2 fragments whose concatenation equals the target.
solve_ocr(task) in solver/task_solver.py is
intentionally left as a stub — drop in your favourite OCR / vision model.
A solve_manual(task) fallback that prompts on the terminal is provided
out of the box.
The framework gives you a clean numpy split via split_task_image(task):

    from itertools import combinations

    from solver.task_solver import split_task_image

    def solve_ocr(task):
        target_arr, cells = split_task_image(task)
        # target_arr : np.ndarray (H, W, 3), the noisy target band
        # cells      : { (col, row) -> np.ndarray (H, W, 3) }, clean fragments
        target_text = your_ocr(target_arr)   # your_ocr: placeholder for your model
        cell_texts = {k: your_ocr(v) for k, v in cells.items()}
        # Try every unordered pair of cells in both concatenation orders.
        for (a, ta), (b, tb) in combinations(cell_texts.items(), 2):
            if ta + tb == target_text:
                return [a, b]
            if tb + ta == target_text:
                return [b, a]
        raise RuntimeError("no match")

Suggested approaches (none of them require Tesseract):
- EasyOCR (PyTorch, ~150 MB models, downloaded on first run)
- PaddleOCR (smaller, faster on CPU)
- Custom CNN trained on captured (image, label) pairs
- Visual template matching — render each candidate concatenation as text and pixel-match against the squiggle-removed target
The target band is best read after a denoising pass (squiggle lines are much thinner than letter strokes, so morphological opening with a 2-3 px kernel removes them while preserving glyphs).
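To show why opening removes thin strokes, here is a dependency-free sketch on a binary image stored as nested lists (1 = ink). It is an illustration only: in practice you would binarise target_arr and use an optimised routine from OpenCV or SciPy. The helper names are hypothetical, and only odd kernel sizes are handled.

```python
def _erode(img, k):
    """Keep a pixel only if its whole k x k neighbourhood is ink."""
    h, w, r = len(img), len(img[0]), k // 2
    return [[int(all(
                0 <= y + dy < h and 0 <= x + dx < w and img[y + dy][x + dx]
                for dy in range(-r, r + 1) for dx in range(-r, r + 1)))
             for x in range(w)] for y in range(h)]


def _dilate(img, k):
    """Set a pixel if any pixel in its k x k neighbourhood is ink."""
    h, w, r = len(img), len(img[0]), k // 2
    return [[int(any(
                0 <= y + dy < h and 0 <= x + dx < w and img[y + dy][x + dx]
                for dy in range(-r, r + 1) for dx in range(-r, r + 1)))
             for x in range(w)] for y in range(h)]


def opening(img, k=3):
    """Morphological opening: erosion followed by dilation."""
    if k % 2 == 0:
        raise ValueError("this sketch only supports odd kernel sizes")
    return _dilate(_erode(img, k), k)


# A 5x5 "glyph" crossed by a 1-px horizontal "squiggle" on row 4.
glyph = [[int(2 <= y <= 6 and 2 <= x <= 6) for x in range(9)] for y in range(9)]
noisy = [[1 if y == 4 else glyph[y][x] for x in range(9)] for y in range(9)]
cleaned = opening(noisy, 3)
```

In this toy case the 1-px line is narrower than the kernel and disappears under erosion, while the 5x5 block survives and is restored by the dilation.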

    # Random fingerprint, auto-mode (OCR if implemented, otherwise manual)
    python main.py
    # Specific fingerprint
    python main.py --fp solver/fingerprints/fp_0398945.json
    # Force manual or OCR
    python main.py --manual
    python main.py --ocr

To run the HTTP API:

    pip install flask
    python server.py               # binds 0.0.0.0:8080
    python server.py --port 9000

Run the full flow. If your solve_ocr is implemented, the server returns the token in
one call:

    POST /api/solve
    {
      "sitekey": "11aa62606fb968f3674742df60598957",
      "lang": "en",
      "fingerprint": "solver/fingerprints/fp_0398945.json"
    }

    200 OK
    { "success": true, "token": "res_54c33e05ce1e2db6fd3929c4381df09d-..." }

If OCR is not implemented, the server pauses after step 7 and returns
the captcha image plus a job_id:

    200 OK
    {
      "success": false,
      "need_answer": true,
      "job_id": "abc...",
      "task": {
        "type": "text",
        "img": "<base64 PNG>",
        "blocks": { "x": 2, "y": 4 },
        "size": { "width": 300, "height": 458 }
      }
    }

You then solve the image externally and submit:

    POST /api/answer/abc...
    { "selections": [[2, 1], [1, 3]] }   // (col, row), 1-indexed

    200 OK
    { "success": true, "token": "res_..." }

The server also reports its status:

    {
      "ok": true,
      "fingerprints": 100,
      "ocr_implemented": false
    }

Jobs are stored in-memory with a 120-second TTL (matches the xcaptcha task
timeout). Single-process only. For production, swap JobStore for Redis.
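For reference, an in-memory store with a TTL can be as small as the sketch below. It is not the JobStore from server.py: the class name, method names, and the injectable clock are all assumptions made for illustration.

```python
import threading
import time
import uuid


class TTLJobStore:
    """Minimal thread-safe in-memory store whose entries expire after a TTL."""

    def __init__(self, ttl_seconds: float = 120.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so expiry can be tested
        self._jobs = {}              # job_id -> (deadline, payload)
        self._lock = threading.Lock()

    def _purge(self) -> None:
        # Caller must hold the lock.
        now = self._clock()
        for job_id in [k for k, (deadline, _) in self._jobs.items() if deadline <= now]:
            del self._jobs[job_id]

    def put(self, payload) -> str:
        """Store a payload and return a fresh job id."""
        job_id = uuid.uuid4().hex
        with self._lock:
            self._purge()
            self._jobs[job_id] = (self._clock() + self._ttl, payload)
        return job_id

    def pop(self, job_id: str):
        """Return and remove the payload, or None if unknown or expired."""
        with self._lock:
            self._purge()
            entry = self._jobs.pop(job_id, None)
        return entry[1] if entry else None
```

Because the dictionary lives in one process, a second worker would never see the job, which is why a shared store is needed for multi-process deployments.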

    pip install curl_cffi cryptography numpy flask
    npm install canvas   # only needed for fp_gen

Tested on Python 3.11 + Node 20 (Windows 10 / Linux).
- solver/crypto.py: 30 lines, AES-GCM
- solver/fingerprint.py: d-id, answer encoder
- solver/submit.py: every HTTP call, one method each
- solver/session.py: orchestrates the 8 steps
- solver/fp_gen.py: fingerprint generator
- canvas_hash.js: canvas-hash replica via node-canvas
- server.py: Flask API with the manual-answer flow
For research, security testing, and CTF-style work only. Don't be a jerk.