XCAPTCHA captcha solver

Reverse-engineered Python implementation of the xcaptcha widget protocol (xcaptcha.com). It replicates every HTTP exchange the browser makes, builds the encrypted fingerprint payload byte-for-byte, and exposes the flow as a CLI, a Python package, and a small HTTP API.

The image-recognition step is intentionally left as a stub for you to plug your own OCR / vision model into — see Implementing OCR.

Repository layout

xcap/
├── solver/
│   ├── __init__.py
│   ├── crypto.py            AES-256-GCM encrypt/decrypt
│   ├── fingerprint.py       d-id/bfp builder, answer encoder
│   ├── submit.py            HTTP layer (curl_cffi, edge101 TLS)
│   ├── task_solver.py       OCR stub + manual fallback + helpers
│   ├── session.py           Full flow orchestrator
│   ├── fp_gen.py            Fingerprint generator
│   └── fingerprints/        Pre-generated fingerprint JSONs
├── canvas_hash.js           Replicates app.js ge.get() canvas hash
├── decrypted_payload.json   Reference payload (used as base for fp_gen)
├── main.py                  CLI entry point
├── server.py                HTTP API (Flask)
├── decrypt.js               Standalone payload decryptor (Node)
├── send.py                  One-shot test sender (re-encrypts and posts)
├── did_gen.py               Standalone d-id encode/decode utility
└── xcaptcha.com.har         Reference HAR capture

How we reversed it

xcaptcha ships three obfuscated bundles, served from static.xcaptcha.com:

api.js — the host-page widget. Builds the iframe URLs and tracks mouse-out coordinates (the r= parameter). Trivial to read.
visitor.js — the fingerprint collector loaded inside the iframe. Heavily obfuscated with a string-decoder helper (_0x337a). Builds the fingerprint dict (fpCollect), encrypts it with AES-256-GCM and POSTs to /visitor and /watch.
app.js — the Vue widget that drives the challenge UI, talks to /init and /task, packs the answer and emits the success token via postMessage.

Encryption

visitor.js exports an object _0x22712a that holds a 32-byte key ('RqcVY'), written as 64 hex characters. The string is split across seven obfuscated constants that concatenate to:

1947903c1b 1ae59b7301 3040b06eb9 6ab05bf23c
1288e52919 c3ebe51440 dafd

The wire format is base64( iv[12] || ciphertext || authTag[16] ). WebCrypto appends the auth tag to the ciphertext, so on the wire the layout is just iv || ct_with_tag. See solver/crypto.py for the 30 lines that implement encrypt/decrypt.
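The framing on its own can be illustrated without a crypto library. The sketch below only splits and joins the base64 envelope described above. The names split_wire and join_wire are hypothetical (they are not taken from solver/crypto.py), and the AES-256-GCM step itself is deliberately left out.

```python
import base64

IV_LEN = 12   # AES-GCM nonce length, per the wire format above
TAG_LEN = 16  # GCM authentication tag length


def split_wire(payload_b64: str) -> tuple[bytes, bytes, bytes]:
    """Split base64(iv || ciphertext || tag) into (iv, ciphertext, tag)."""
    raw = base64.b64decode(payload_b64)
    if len(raw) < IV_LEN + TAG_LEN:
        raise ValueError("payload too short to hold an IV and a tag")
    iv = raw[:IV_LEN]
    ciphertext = raw[IV_LEN:len(raw) - TAG_LEN]
    tag = raw[len(raw) - TAG_LEN:]
    return iv, ciphertext, tag


def join_wire(iv: bytes, ct_with_tag: bytes) -> str:
    """Build the envelope from an IV and a ciphertext that already ends with its tag."""
    if len(iv) != IV_LEN:
        raise ValueError("IV must be 12 bytes")
    return base64.b64encode(iv + ct_with_tag).decode("ascii")
```

Most GCM libraries, like WebCrypto, return the ciphertext with the tag already appended, which is why join_wire takes a single ct_with_tag argument.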

`d-id` / `Bfp` header

app.js fe.get builds the header using four pieces:

fe = { get(cb) {
  me.get().run((audio) => {
    var canvas    = ge.get("test1") === ge.get("test2") ? btoa(ge.get("test1")) : "";
    var renderer  = ge.info("test3").renderer;
    var languages = navigator.languages.join(";");
    var inner     = [btoa(audio), canvas, btoa(renderer), languages].join(":");
    cb(btoa(inner));
  });
}};

So d-id = btoa(b64(audio_sum) + ":" + b64(canvas_hash) + ":" + b64(gpu_renderer) + ":" + languages_raw). The canvas chunk is empty when the two test renders hash differently. The languages chunk is not base64'd; we found this by decoding a real captured d-id and seeing the trailing chunk arrive as raw text.
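As a rough illustration of that layout, here is a minimal Python sketch of the encode and decode directions. build_did and parse_did are hypothetical names (not necessarily those used in did_gen.py), and the sketch assumes the audio sum and canvas hash have already been converted to strings the way JavaScript would coerce them.

```python
import base64


def b64(text: str) -> str:
    """Rough equivalent of JavaScript btoa() for Latin-1 text."""
    return base64.b64encode(text.encode("latin-1")).decode("ascii")


def build_did(audio_sum: str, canvas_hash: str, renderer: str, languages: list[str]) -> str:
    # The canvas chunk stays empty when no stable canvas hash is available.
    canvas = b64(canvas_hash) if canvas_hash else ""
    # Languages are joined with ';' and left as raw text, not base64'd.
    inner = ":".join([b64(audio_sum), canvas, b64(renderer), ";".join(languages)])
    return b64(inner)


def parse_did(did: str) -> dict:
    inner = base64.b64decode(did).decode("latin-1")
    # The base64 alphabet has no ':', so the first three separators are unambiguous.
    audio, canvas, renderer, languages = inner.split(":", 3)
    return {
        "audio_sum": base64.b64decode(audio).decode("latin-1"),
        "canvas_hash": base64.b64decode(canvas).decode("latin-1"),
        "renderer": base64.b64decode(renderer).decode("latin-1"),
        "languages": languages.split(";"),
    }
```

Decoding a captured header with parse_did is a quick way to confirm that the trailing chunk really is plain text.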

Canvas hash

app.js ge.get(id) paints fixed shapes + text on a hidden canvas, calls toDataURL(), then computes a Java-style 32-bit hashCode:

o = (o << 5) - o + charCodeAt(r) | 0;     // i.e. o = o*31 + c, wrapping int32

Hash values differ across machines because canvas text renders with subtly different anti-aliasing per OS / driver. We replicate this in canvas_hash.js using the canvas npm package and a seeded PRNG that nudges the radius / shadow blur / rotation just enough to produce a different hash per fingerprint.
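The hash step itself is easy to reproduce outside the browser. The following is a minimal Python sketch of the same 32-bit wrapping hash; java_hash is a hypothetical name, and the sketch assumes the input is ASCII (as a data URL is), so that Python's ord matches JavaScript's charCodeAt.

```python
def java_hash(text: str) -> int:
    """Java-style String.hashCode: h = h*31 + c, wrapped to a signed 32-bit int."""
    h = 0
    for ch in text:
        # Keep the running value in unsigned 32-bit range at every step.
        h = (h * 31 + ord(ch)) & 0xFFFFFFFF
    # Reinterpret the unsigned result as signed, as JavaScript's `| 0` does.
    return h - 0x100000000 if h >= 0x80000000 else h


print(java_hash("abc"))    # 96354
print(java_hash("hello"))  # 99162322
```

Because the result is signed, roughly half of all canvas hashes come out negative.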

Audio fingerprint

me.get uses an OfflineAudioContext(1, 44100, 44100) -> triangle oscillator @ 10 kHz -> DynamicsCompressor(thr=-50, ratio=12) -> getChannelData(0)[4500..4999]. The compressor attenuates the signal by ~8 dB; combined with band-limited resampling, the output is very close to a pure sine. Real captures show mean(|x|) ≈ 0.248, peak ≈ 0.395, abs_sum ≈ 124 over the 500-sample window. See solver/fp_gen.py for the two-line generator that reproduces these statistics.
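The "close to a pure sine" claim can be sanity-checked with a few lines of arithmetic. This sketch is not the generator in solver/fp_gen.py; it simply evaluates an ideal 10 kHz sine at the quoted peak over the same sample window, under the assumption that no compressor ripple or noise is present.

```python
import math

SAMPLE_RATE = 44100
FREQ_HZ = 10_000
PEAK = 0.395  # peak amplitude quoted from real captures


def sine_window_stats(start: int = 4500, stop: int = 5000) -> tuple[float, float]:
    """Return (mean |x|, sum |x|) of an ideal sine over samples [start, stop)."""
    samples = [
        PEAK * math.sin(2 * math.pi * FREQ_HZ * n / SAMPLE_RATE)
        for n in range(start, stop)
    ]
    abs_sum = sum(abs(x) for x in samples)
    return abs_sum / len(samples), abs_sum


mean_abs, abs_sum = sine_window_stats()
print(round(mean_abs, 3), round(abs_sum, 1))
```

For an ideal sine, mean(|x|) is 2/π times the peak, about 0.251 here, which is within a couple of percent of the captured 0.248. The small gap is consistent with the output being nearly, but not exactly, sinusoidal.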

Task answer

app.js TaskText.send builds the answer like this:

checked = { btoa("colxrow"): (row-1)*blocks_x + col, ... };   // exactly 2 keys
answer  = btoa(JSON.stringify(checked));
fetch(`/captcha/${sk}/task/${answer}`, { headers: { "Wcaptcha-Key": task.key } });

encode_answer() in solver/fingerprint.py does exactly this.

How the solver works

The full flow lives in solver/session.py — eight HTTP calls in the same order Edge fires them:

| # | Method | URL | Purpose |
|---|--------|-----|---------|
| 1 | GET | `api.xcaptcha.com/i` | latency probe (cosmetic) |
| 2 | GET | `api-ping-2.xcaptcha.com:8443/` | returns `{ token }` for `/watch` |
| 3 | GET | `api.xcaptcha.com/captcha/{sk}/` | parses `CAPTCHA_SESSION` from HTML |
| 4 | POST | `api.xcaptcha.com/visitor` | encrypted fingerprint -> `visitor_id` |
| 5 | POST | `api2.xcaptcha.com/watch` | telemetry mirror |
| 6 | GET | `api.xcaptcha.com/captcha/{sk}/init` | sends `Bfp` + `Wparams` headers |
| 7 | GET | `api.xcaptcha.com/captcha/{sk}/task` | fetches challenge image |
| 8 | GET | `api.xcaptcha.com/captcha/{sk}/task/{answer_b64}` | submits answer, returns token |

We use curl_cffi with impersonate="edge101" so the TLS fingerprint matches a real Microsoft Edge build. accept-language, sec-ch-ua-* and the cookie behaviour are picked to match a Windows 10 + Edge 147 baseline.

Fingerprint generator

# Generate 100 realistic fingerprints
python -m solver.fp_gen --100

# Or:
python -m solver.fp_gen -n 50
python -m solver.fp_gen -n 1 --show       # print the first one to stdout

Each fingerprint varies:

Canvas hash: distinct integer per seed, computed by canvas_hash.js.
Audio samples: 10 kHz sine + per-machine float-precision noise, so abs_sum matches the observed Chrome distribution.
GPU: 15 real Windows 10/11 strings (AMD / NVIDIA / Intel).
Screen / hardware / language / timezone: realistic combinations only.
UA: Edge 136 .. 147 with matching uaData.brands + appVersion.
Heap / storage / collectedAt: per-machine variance.

Image solver (OCR)

The challenge image is a single PNG containing a target band (top or bottom) showing the assembled code with a noise/squiggle overlay, plus a blocks.x × blocks.y grid of clean fragments. The widget asks the user to pick the 2 fragments whose concatenation equals the target.

solve_ocr(task) in solver/task_solver.py is intentionally left as a stub — drop in your favourite OCR / vision model. A solve_manual(task) fallback that prompts on the terminal is provided out of the box.

Implementing OCR

The framework gives you a clean numpy split via split_task_image(task):

from itertools import combinations

from solver.task_solver import split_task_image

def solve_ocr(task):
    target_arr, cells = split_task_image(task)
    # target_arr : np.ndarray (H, W, 3), the noisy target band
    # cells      : { (col, row) -> np.ndarray (H, W, 3) }, clean fragments

    # your_ocr is a placeholder for whatever OCR / vision model you plug in
    target_text = your_ocr(target_arr)
    cell_texts  = { k: your_ocr(v) for k, v in cells.items() }

    # Try every unordered pair of cells, in both concatenation orders
    for (a, ta), (b, tb) in combinations(cell_texts.items(), 2):
        if ta + tb == target_text:
            return [a, b]
        if tb + ta == target_text:
            return [b, a]
    raise RuntimeError("no match")

Suggested approaches (none of them require Tesseract):

EasyOCR (PyTorch, ~150 MB models, downloaded on first run)
PaddleOCR (smaller, faster on CPU)
Custom CNN trained on captured (image, label) pairs
Visual template matching — render each candidate concatenation as text and pixel-match against the squiggle-removed target

The target band is best read after a denoising pass (squiggle lines are much thinner than letter strokes, so morphological opening with a 2-3 px kernel removes them while preserving glyphs).
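To make the denoising idea concrete, here is a dependency-free sketch of binary morphological opening (erosion followed by dilation) on a 0/1 image stored as nested lists. It is illustrative only and assumes the band has already been thresholded to black and white; the function names are hypothetical and the kernel size k is assumed to be odd.

```python
def _filter(img, k, reducer):
    """Apply min (erosion) or max (dilation) over a k x k window; outside pixels count as 0."""
    h, w = len(img), len(img[0])
    r = k // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [
                img[yy][xx] if 0 <= yy < h and 0 <= xx < w else 0
                for yy in range(y - r, y + r + 1)
                for xx in range(x - r, x + r + 1)
            ]
            out[y][x] = reducer(window)
    return out


def erode(img, k=3):
    # A pixel survives only if its whole neighbourhood is foreground.
    return _filter(img, k, min)


def dilate(img, k=3):
    # A pixel turns on if any neighbour is foreground.
    return _filter(img, k, max)


def opening(img, k=3):
    # Erosion deletes strokes thinner than k; dilation restores what survived.
    return dilate(erode(img, k), k)
```

On real images the same operation is available in OpenCV as cv2.morphologyEx with cv2.MORPH_OPEN, which is far faster than these pure-Python loops.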

CLI usage

# Random fingerprint, auto-mode (OCR if implemented, otherwise manual)
python main.py

# Specific fingerprint
python main.py --fp solver/fingerprints/fp_0398945.json

# Force manual or OCR
python main.py --manual
python main.py --ocr

HTTP API

pip install flask
python server.py                    # binds 0.0.0.0:8080
python server.py --port 9000

`POST /api/solve`

Run the full flow. If your solve_ocr is implemented, returns the token in one call:

POST /api/solve
{
  "sitekey": "11aa62606fb968f3674742df60598957",
  "lang":    "en",
  "fingerprint": "solver/fingerprints/fp_0398945.json"
}

200 OK
{ "success": true, "token": "res_54c33e05ce1e2db6fd3929c4381df09d-..." }

If OCR is not implemented, the server pauses after step 7 and returns the captcha image plus a job_id:

200 OK
{
  "success": false,
  "need_answer": true,
  "job_id": "abc...",
  "task": {
    "type":   "text",
    "img":    "<base64 PNG>",
    "blocks": { "x": 2, "y": 4 },
    "size":   { "width": 300, "height": 458 }
  }
}

You then solve the image externally and submit:

`POST /api/answer/<job_id>`

POST /api/answer/abc...
{ "selections": [[2, 1], [1, 3]] }      // (col, row), 1-indexed

200 OK
{ "success": true, "token": "res_..." }
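The two-step flow above can be driven from any HTTP client. As a sketch, the helpers below build the two requests with the standard library only. The helper names are hypothetical, and treating "fingerprint" as optional and sending a JSON Content-Type header are assumptions rather than documented behaviour of server.py.

```python
import json
import urllib.request


def _post_json(url: str, body: dict) -> urllib.request.Request:
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed to be expected
        method="POST",
    )


def build_solve_request(base_url, sitekey, lang="en", fingerprint=None):
    """Request for POST /api/solve; the fingerprint path is assumed optional."""
    body = {"sitekey": sitekey, "lang": lang}
    if fingerprint is not None:
        body["fingerprint"] = fingerprint
    return _post_json(f"{base_url}/api/solve", body)


def build_answer_request(base_url, job_id, selections):
    """Request for POST /api/answer/<job_id>; selections are (col, row), 1-indexed."""
    return _post_json(f"{base_url}/api/answer/{job_id}", {"selections": selections})
```

Each request can then be sent with urllib.request.urlopen(req) and the response body parsed with json.loads. A client would check need_answer in the first response before deciding whether the second call is required.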

`GET /api/health`

{
  "ok": true,
  "fingerprints": 100,
  "ocr_implemented": false
}

Jobs are stored in-memory with a 120-second TTL (matches the xcaptcha task timeout). Single-process only — for production, swap JobStore for Redis.
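For orientation, an in-memory store with a TTL can be as small as the sketch below. This is not the JobStore class from server.py; TTLJobStore and its methods are hypothetical, and the injectable clock exists only to make expiry easy to test.

```python
import time
import uuid


class TTLJobStore:
    """Minimal in-memory job store whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float = 120.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._jobs = {}  # job_id -> (expires_at, job)

    def put(self, job) -> str:
        self._purge()
        job_id = uuid.uuid4().hex
        self._jobs[job_id] = (self._clock() + self._ttl, job)
        return job_id

    def pop(self, job_id: str):
        """Return and remove the job, or None if it is unknown or expired."""
        self._purge()
        entry = self._jobs.pop(job_id, None)
        return entry[1] if entry else None

    def _purge(self) -> None:
        now = self._clock()
        expired = [j for j, (expires_at, _) in self._jobs.items() if expires_at <= now]
        for job_id in expired:
            del self._jobs[job_id]
```

The sketch is not thread-safe; a threaded server would need a lock around these methods. A Redis-backed replacement would typically lean on key expiry instead of the manual purge.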

Installation

pip install curl_cffi cryptography numpy flask
npm install canvas                  # only needed for fp_gen

Tested on Python 3.11 + Node 20 (Windows 10 / Linux).

Files of interest, in increasing order of complexity

solver/crypto.py — 30 lines, AES-GCM
solver/fingerprint.py — d-id, answer encoder
solver/submit.py — every HTTP call, one method each
solver/session.py — orchestrates the 8 steps
solver/fp_gen.py — fingerprint generator
canvas_hash.js — canvas-hash replica via node-canvas
server.py — Flask API with the manual-answer flow

Disclaimer

For research, security testing, and CTF-style work only. Don't be a jerk.
