tuffuserofpython/xcaptcha-captcha-solver

XCAPTCHA captcha solver

Reverse-engineered Python implementation of the xcaptcha widget protocol (xcaptcha.com). Replicates every HTTP exchange the browser makes, builds the encrypted fingerprint payload byte-for-byte, and exposes the flow as a CLI, a Python package, and a small HTTP API.

The image-recognition step is intentionally left as a stub for you to plug your own OCR / vision model into — see Implementing OCR.


Repository layout

xcap/
├── solver/
│   ├── __init__.py
│   ├── crypto.py            AES-256-GCM encrypt/decrypt
│   ├── fingerprint.py       d-id/bfp builder, answer encoder
│   ├── submit.py            HTTP layer (curl_cffi, edge101 TLS)
│   ├── task_solver.py       OCR stub + manual fallback + helpers
│   ├── session.py           Full flow orchestrator
│   ├── fp_gen.py            Fingerprint generator
│   └── fingerprints/        Pre-generated fingerprint JSONs
├── canvas_hash.js           Replicates app.js ge.get() canvas hash
├── decrypted_payload.json   Reference payload (used as base for fp_gen)
├── main.py                  CLI entry point
├── server.py                HTTP API (Flask)
├── decrypt.js               Standalone payload decryptor (Node)
├── send.py                  One-shot test sender (re-encrypts and posts)
├── did_gen.py               Standalone d-id encode/decode utility
└── xcaptcha.com.har         Reference HAR capture

How we reversed it

xcaptcha ships three obfuscated bundles, served from static.xcaptcha.com:

  1. api.js — the host-page widget. Builds the iframe URLs and tracks mouse-out coordinates (the r= parameter). Trivial to read.
  2. visitor.js — the fingerprint collector loaded inside the iframe. Heavily obfuscated with a string-decoder helper (_0x337a). Builds the fingerprint dict (fpCollect), encrypts it with AES-256-GCM and POSTs to /visitor and /watch.
  3. app.js — the Vue widget that drives the challenge UI, talks to /init and /task, packs the answer and emits the success token via postMessage.

Encryption

visitor.js exports an object _0x22712a that holds a 32-byte key as a 64-character hex string (property 'RqcVY'). The string is split across seven obfuscated constants that concatenate to:

1947903c1b 1ae59b7301 3040b06eb9 6ab05bf23c
1288e52919 c3ebe51440 dafd

The wire format is base64( iv[12] || ciphertext || authTag[16] ). WebCrypto appends the auth tag to the ciphertext, so on the wire the layout is just iv || ct_with_tag. See solver/crypto.py for the 30 lines that implement encrypt/decrypt.
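The layout described above can be illustrated without any crypto library. The following is a minimal sketch, not the code in solver/crypto.py; the function names pack_wire and unpack_wire are hypothetical and only show how the three fields sit inside the base64 blob.

```python
import base64

IV_LEN = 12   # nonce length from the wire format above
TAG_LEN = 16  # auth tag length from the wire format above

def pack_wire(iv: bytes, ct_with_tag: bytes) -> str:
    """Join iv || ciphertext || tag and base64-encode the result."""
    if len(iv) != IV_LEN:
        raise ValueError("iv must be 12 bytes")
    return base64.b64encode(iv + ct_with_tag).decode("ascii")

def unpack_wire(blob_b64: str) -> tuple[bytes, bytes, bytes]:
    """Split a base64 blob into (iv, ciphertext, tag)."""
    raw = base64.b64decode(blob_b64)
    if len(raw) < IV_LEN + TAG_LEN:
        raise ValueError("payload too short for iv + tag")
    iv = raw[:IV_LEN]
    ciphertext = raw[IV_LEN:-TAG_LEN]  # empty when the plaintext was empty
    tag = raw[-TAG_LEN:]
    return iv, ciphertext, tag
```

The AESGCM class in the cryptography package also returns the ciphertext with the 16-byte tag appended, so its encrypt() output can be passed straight to a helper like pack_wire.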

d-id / Bfp header

app.js fe.get builds the header using four pieces:

fe = { get(cb) {
  me.get().run((audio) => {
    var canvas    = ge.get("test1") === ge.get("test2") ? btoa(ge.get("test1")) : "";
    var renderer  = ge.info("test3").renderer;
    var languages = navigator.languages.join(";");
    var inner     = [btoa(audio), canvas, btoa(renderer), languages].join(":");
    cb(btoa(inner));
  });
}};

So d-id = btoa(b64(audio_sum) + ":" + b64(canvas_hash) + ":" + b64(gpu_renderer) + ":" + languages_raw). The languages chunk is not base64'd — we found this by decoding a real captured d-id and seeing the trailing chunk arrive as raw text.
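The formula above can be sketched in Python as follows. This is an illustration rather than the contents of did_gen.py: the names build_did and parse_did are hypothetical, all inputs are assumed to be already-stringified values, and Latin-1 is used to mirror the character range that btoa accepts.

```python
import base64

def b64(text: str) -> str:
    """Rough equivalent of btoa for Latin-1 text."""
    return base64.b64encode(text.encode("latin-1")).decode("ascii")

def build_did(audio_sum: str, canvas_hash: str, renderer: str, languages: list[str]) -> str:
    # An empty canvas_hash models the branch where the two canvas hashes differ.
    canvas = b64(canvas_hash) if canvas_hash else ""
    inner = ":".join([b64(audio_sum), canvas, b64(renderer), ";".join(languages)])
    return b64(inner)

def parse_did(did: str) -> dict:
    inner = base64.b64decode(did).decode("latin-1")
    # The base64 alphabet contains no ':', so the first three separators are unambiguous.
    audio_b64, canvas_b64, renderer_b64, languages = inner.split(":", 3)
    return {
        "audio_sum": base64.b64decode(audio_b64).decode("latin-1"),
        "canvas_hash": base64.b64decode(canvas_b64).decode("latin-1"),
        "renderer": base64.b64decode(renderer_b64).decode("latin-1"),
        "languages": languages.split(";"),  # raw text, not base64
    }
```

Decoding a captured header with a helper like parse_did is how the raw trailing languages chunk becomes visible.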

Canvas hash

app.js ge.get(id) paints fixed shapes + text on a hidden canvas, calls toDataURL(), then computes a Java-style 32-bit hashCode:

o = (o << 5) - o + charCodeAt(r) | 0;     // i.e. o = o*31 + c, wrapping int32

Hash values differ across machines because canvas text renders with subtly different anti-aliasing per OS / driver. We replicate this in canvas_hash.js using the canvas npm package and a seeded PRNG that nudges the radius / shadow blur / rotation just enough to produce a different hash per fingerprint.
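For reference, the 32-bit hashCode loop quoted above can be written in Python like this. It is a sketch of the hashing step only (the name java_hash_code is ours) and assumes ASCII input such as a data URL, where ord() and charCodeAt agree.

```python
def java_hash_code(text: str) -> int:
    """Java-style String.hashCode: h = h*31 + c, wrapped to a signed int32."""
    h = 0
    for ch in text:
        # Masking to 32 bits each step matches the effect of JavaScript's `| 0`.
        h = (h * 31 + ord(ch)) & 0xFFFFFFFF
    # Reinterpret the unsigned 32-bit value as signed.
    return h - 0x1_0000_0000 if h >= 0x8000_0000 else h
```

For example, java_hash_code("abc") returns 96354, the same value Java produces for "abc".hashCode().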

Audio fingerprint

me.get uses an OfflineAudioContext(1, 44100, 44100) -> triangle oscillator @ 10 kHz -> DynamicsCompressor(thr=-50, ratio=12) -> getChannelData(0)[4500..4999], i.e. a 500-sample window. The compressor attenuates the signal by ~8 dB; combined with band-limited resampling, the output is very close to a pure sine. Real captures show mean(|x|) ≈ 0.248, peak ≈ 0.395, abs_sum ≈ 124 (500 × 0.248); see solver/fp_gen.py for the two-line generator that reproduces these statistics.

Task answer

app.js TaskText.send builds the answer like this:

checked = { btoa("colxrow"): (row-1)*blocks_x + col, ... };   // exactly 2 keys
answer  = btoa(JSON.stringify(checked));
fetch(`/captcha/{sk}/task/${answer}`, { headers: { "Wcaptcha-Key": task.key } });

encode_answer() in solver/fingerprint.py does exactly this.
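A minimal Python sketch of that packing step follows. It is not the repository's encode_answer(): the name encode_answer_sketch is hypothetical, and the key format "{col}x{row}" (1-indexed) is our reading of the "colxrow" string in the snippet above.

```python
import base64
import json

def cell_index(col: int, row: int, blocks_x: int) -> int:
    """1-indexed (col, row) -> 1-indexed linear cell number, row-major."""
    return (row - 1) * blocks_x + col

def encode_answer_sketch(selections: list[tuple[int, int]], blocks_x: int) -> str:
    checked = {}
    for col, row in selections:
        # Assumption: the key is base64 of the literal text "<col>x<row>".
        key = base64.b64encode(f"{col}x{row}".encode("ascii")).decode("ascii")
        checked[key] = cell_index(col, row, blocks_x)
    # Compact separators mimic the output of JSON.stringify.
    payload = json.dumps(checked, separators=(",", ":"))
    return base64.b64encode(payload.encode("ascii")).decode("ascii")
```

With a 2 × 4 grid, the selections (2, 1) and (1, 3) map to cell numbers 2 and 5.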


How the solver works

The full flow lives in solver/session.py — eight HTTP calls in the same order Edge fires them:

#  Method  URL                                              Purpose
1  GET     api.xcaptcha.com/i                               latency probe (cosmetic)
2  GET     api-ping-2.xcaptcha.com:8443/                    returns { token } for /watch
3  GET     api.xcaptcha.com/captcha/{sk}/                   parses CAPTCHA_SESSION from HTML
4  POST    api.xcaptcha.com/visitor                         encrypted fingerprint -> visitor_id
5  POST    api2.xcaptcha.com/watch                          telemetry mirror
6  GET     api.xcaptcha.com/captcha/{sk}/init               sends Bfp + Wparams headers
7  GET     api.xcaptcha.com/captcha/{sk}/task               fetches challenge image
8  GET     api.xcaptcha.com/captcha/{sk}/task/{answer_b64}  submits answer, returns token

We use curl_cffi with impersonate="edge101" so the TLS fingerprint matches a real Microsoft Edge build. The accept-language and sec-ch-ua-* headers and the cookie behaviour are chosen to match a Windows 10 + Edge 147 baseline.

Fingerprint generator

# Generate 100 realistic fingerprints
python -m solver.fp_gen --100

# Or:
python -m solver.fp_gen -n 50
python -m solver.fp_gen -n 1 --show       # print the first one to stdout

Each fingerprint varies:

  • Canvas hash: distinct integer per seed, computed by canvas_hash.js.
  • Audio samples: 10 kHz sine + per-machine float-precision noise, so abs_sum matches the observed Chrome distribution.
  • GPU: 15 real Windows 10/11 strings (AMD / NVIDIA / Intel).
  • Screen / hardware / language / timezone: realistic combinations only.
  • UA: Edge 136 .. 147 with matching uaData.brands + appVersion.
  • Heap / storage / collectedAt: per-machine variance.

Image solver (OCR)

The challenge image is a single PNG containing a target band (top or bottom) showing the assembled code with a noise/squiggle overlay, plus a blocks.x × blocks.y grid of clean fragments. The widget asks the user to pick the 2 fragments whose concatenation equals the target.

solve_ocr(task) in solver/task_solver.py is intentionally left as a stub — drop in your favourite OCR / vision model. A solve_manual(task) fallback that prompts on the terminal is provided out of the box.

Implementing OCR

The framework gives you a clean numpy split via split_task_image(task):

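from itertools import combinations

from solver.task_solver import split_task_image

def solve_ocr(task):
    target_arr, cells = split_task_image(task)
    # target_arr : np.ndarray (H, W, 3), the noisy target band
    # cells      : { (col, row) -> np.ndarray (H, W, 3) }, clean fragments

    # your_ocr is a placeholder for whatever OCR / vision model you plug in;
    # it takes an image array and returns the recognised string.
    target_text = your_ocr(target_arr)
    cell_texts  = { k: your_ocr(v) for k, v in cells.items() }

    # Try every unordered pair of cells, in both concatenation orders.
    for (a, ta), (b, tb) in combinations(cell_texts.items(), 2):
        if ta + tb == target_text:
            return [a, b]
        if tb + ta == target_text:
            return [b, a]
    raise RuntimeError("no match")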

Suggested approaches (none of them require Tesseract):

  • EasyOCR (PyTorch, ~150 MB models, downloaded on first run)
  • PaddleOCR (smaller, faster on CPU)
  • Custom CNN trained on captured (image, label) pairs
  • Visual template matching — render each candidate concatenation as text and pixel-match against the squiggle-removed target

The target band is best read after a denoising pass (squiggle lines are much thinner than letter strokes, so morphological opening with a 2-3 px kernel removes them while preserving glyphs).
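To make the denoising idea concrete, here is a dependency-free sketch of binary morphological opening (erosion followed by dilation) with a 3 × 3 square kernel. It assumes the band has already been thresholded to a 0/1 grid with ink as 1; the function names are ours, and a real pipeline would more likely call scipy.ndimage.binary_opening or cv2.morphologyEx.

```python
def _window(img, r, c, radius):
    """Yield pixels in the square window centred on (r, c); out-of-bounds counts as 0."""
    h, w = len(img), len(img[0])
    for dr in range(-radius, radius + 1):
        for dc in range(-radius, radius + 1):
            rr, cc = r + dr, c + dc
            yield img[rr][cc] if 0 <= rr < h and 0 <= cc < w else 0

def erode(img, radius=1):
    """Keep a pixel only if its whole window is foreground."""
    return [[int(all(_window(img, r, c, radius))) for c in range(len(img[0]))]
            for r in range(len(img))]

def dilate(img, radius=1):
    """Set a pixel if any pixel in its window is foreground."""
    return [[int(any(_window(img, r, c, radius))) for c in range(len(img[0]))]
            for r in range(len(img))]

def opening(img, radius=1):
    """Erosion then dilation: removes features thinner than the kernel."""
    return dilate(erode(img, radius), radius)
```

A one-pixel-wide line disappears during erosion and is never restored, while a stroke at least as wide as the kernel shrinks and then grows back to its original extent.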


CLI usage

# Random fingerprint, auto-mode (OCR if implemented, otherwise manual)
python main.py

# Specific fingerprint
python main.py --fp solver/fingerprints/fp_0398945.json

# Force manual or OCR
python main.py --manual
python main.py --ocr

HTTP API

pip install flask
python server.py                    # binds 0.0.0.0:8080
python server.py --port 9000

POST /api/solve

Run the full flow. If your solve_ocr is implemented, returns the token in one call:

POST /api/solve
{
  "sitekey": "11aa62606fb968f3674742df60598957",
  "lang":    "en",
  "fingerprint": "solver/fingerprints/fp_0398945.json"
}

200 OK
{ "success": true, "token": "res_54c33e05ce1e2db6fd3929c4381df09d-..." }

If OCR is not implemented, the server pauses after step 7 and returns the captcha image plus a job_id:

200 OK
{
  "success": false,
  "need_answer": true,
  "job_id": "abc...",
  "task": {
    "type":   "text",
    "img":    "<base64 PNG>",
    "blocks": { "x": 2, "y": 4 },
    "size":   { "width": 300, "height": 458 }
  }
}

You then solve the image externally and submit:

POST /api/answer/<job_id>

POST /api/answer/abc...
{ "selections": [[2, 1], [1, 3]] }      // (col, row), 1-indexed

200 OK
{ "success": true, "token": "res_..." }

GET /api/health

{
  "ok": true,
  "fingerprints": 100,
  "ocr_implemented": false
}

Jobs are stored in-memory with a 120-second TTL (matches the xcaptcha task timeout). Single-process only — for production, swap JobStore for Redis.
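An in-memory store with a TTL can be sketched as below. This is an illustration of the idea, not the JobStore class in server.py; the class name TTLJobStore, its methods, and the injectable clock parameter are all hypothetical.

```python
import threading
import time
import uuid
from typing import Any, Callable, Optional

class TTLJobStore:
    """Holds pending jobs in a dict and drops them once their deadline passes."""

    def __init__(self, ttl_seconds: float = 120.0,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so expiry can be tested
        self._jobs: dict[str, tuple[float, Any]] = {}
        self._lock = threading.Lock()

    def _purge(self) -> None:
        now = self._clock()
        for job_id in [k for k, (deadline, _) in self._jobs.items() if deadline <= now]:
            del self._jobs[job_id]

    def put(self, payload: Any) -> str:
        with self._lock:
            self._purge()
            job_id = uuid.uuid4().hex
            self._jobs[job_id] = (self._clock() + self._ttl, payload)
            return job_id

    def pop(self, job_id: str) -> Optional[Any]:
        """Return and remove the job, or None if it is unknown or expired."""
        with self._lock:
            self._purge()
            entry = self._jobs.pop(job_id, None)
            return entry[1] if entry else None
```

Because the dict lives inside one process, a second worker process would not see jobs created by the first, which is why an external store is suggested for multi-process deployments.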


Installation

pip install curl_cffi cryptography numpy flask
npm install canvas                  # only needed for fp_gen

Tested on Python 3.11 + Node 20 (Windows 10 / Linux).


Files of interest, in increasing order of complexity

  1. solver/crypto.py — 30 lines, AES-GCM
  2. solver/fingerprint.py — d-id, answer encoder
  3. solver/submit.py — every HTTP call, one method each
  4. solver/session.py — orchestrates the 8 steps
  5. solver/fp_gen.py — fingerprint generator
  6. canvas_hash.js — canvas-hash replica via node-canvas
  7. server.py — Flask API with the manual-answer flow

Disclaimer

For research, security testing, and CTF-style work only. Don't be a jerk.
