Chinese Documentation | English
Challenge data repository for benchmark-platform.
Repository layout:

```
xbow/      # 74 challenges from xbow-validation-benchmarks
custom/    # 4 custom challenges (XSS, Auth, etc.)
argus/     # 60 challenges from argus-validation-benchmarks
```
| Source | Count | Link | Description |
|---|---|---|---|
| xbow | 74 | xbow-validation-benchmarks | Web app vulnerabilities across diverse frameworks |
| argus | 60 | argus-validation-benchmarks | SSRF, XSS, SQLi, RCE, IDOR, deserialization (Next.js, Flask, Express, Go, Django, Rails, Spring Boot) |
| custom | 4 | — | Hand-crafted challenges for specific scenarios |
This repo is consumed by benchmark-platform's Challenge Store feature. Challenges are automatically packaged and published as GitHub Release assets on each push.
You can browse and download challenges directly from the platform's Web UI sidebar (Challenge Store), or clone manually:
```sh
git clone https://github.com/wgpsec/benchmark-challenges /tmp/benchmarks
cp -r /tmp/benchmarks/xbow challenges/xbow
cp -r /tmp/benchmarks/custom challenges/custom
cp -r /tmp/benchmarks/argus challenges/argus
```

To contribute a new challenge:

1. Create a directory under the appropriate category: `xbow/XBEN-XXX-24/`, `argus/APEX-XXX-25/`, or `custom/MY-CHALLENGE/`
2. Include at minimum: `docker-compose.yml`, `benchmark.json`, and `.env` (a compose sketch follows this list)
3. Push to main; the GitHub Action will package and publish it automatically
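For orientation, here is a minimal sketch of the kind of `docker-compose.yml` a single-service challenge might ship. The service name, build context, and port mapping are illustrative assumptions, not requirements from the spec:

```sh
# Writes an illustrative compose file; adjust to your challenge.
# Assumptions: the app is built from app/ and listens on port 80.
cat > custom/MY-CHALLENGE/docker-compose.yml <<'EOF'
services:
  web:
    build: ./app        # challenge application code
    ports:
      - "8080:80"       # exposed for local testing
    env_file:
      - .env            # injects the FLAG variable into the container
EOF
```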
Want to build a custom challenge? Two resources can help:
- Format Specification: a complete reference for all required files, fields, and constraints. Copy it into any AI tool to generate compliant challenges.
- Claude Code Skill: an interactive skill for Claude Code users. Describe your attack scenario and it generates all files, ready to import.
Quick start with AI:
1. Copy docs/challenge-spec.md content to your AI assistant
2. Describe the vulnerability scenario you want
3. Get a complete challenge directory generated
4. Run `docker compose up --build` to test locally
5. Pack as `category--name.zip` and import via the platform UI
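Steps 4 and 5 as concrete commands, assuming a generated challenge in `custom/MY-CHALLENGE`; the archive layout is an assumption here, and `docs/challenge-spec.md` is the authoritative reference:

```sh
# Build and run the generated challenge locally
cd custom/MY-CHALLENGE
docker compose up --build

# Pack it for import via the platform UI,
# following the category--name.zip convention
cd ..
zip -r custom--MY-CHALLENGE.zip MY-CHALLENGE
```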
A challenge directory looks like this:

```
XBEN-001-24/
├── docker-compose.yml   # Required
├── benchmark.json       # Metadata (name, description, level, points)
├── benchmark.yaml       # Optional: multi-flag definitions
├── .env                 # FLAG environment variable
└── app/ mysql/ ...      # Application code
```
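To make the metadata concrete, here is a minimal sketch of the two smallest files. Field names follow the comments in the tree above; the value types and any additional required fields are assumptions, so defer to `docs/challenge-spec.md`:

```sh
# Illustrative benchmark.json; "level" and "points" types are assumptions
cat > benchmark.json <<'EOF'
{
  "name": "XBEN-001-24",
  "description": "Example challenge description",
  "level": 1,
  "points": 100
}
EOF

# The flag the platform checks for; the value is a placeholder
cat > .env <<'EOF'
FLAG=FLAG{example-placeholder}
EOF
```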
On every push to main, the GitHub Action:
- Detects which challenge directories changed
- Packs each changed challenge into a zip archive
- Generates `manifest.json` listing all challenges
- Uploads assets to the `latest` GitHub Release
Only changed challenges are re-packaged (incremental).
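The actual workflow lives in this repo's GitHub Actions config; purely as an illustration of the incremental approach (not the repo's real script), change detection and packing could be sketched like this, reusing the `category--name.zip` convention above:

```sh
# Illustrative sketch only; the real workflow may differ.
# Find challenge directories touched by the last push...
changed=$(git diff --name-only HEAD~1 HEAD \
  | grep -E '^(xbow|argus|custom)/' \
  | cut -d/ -f1,2 | sort -u)

# ...and re-zip only those (incremental packaging)
mkdir -p dist
for dir in $changed; do
  name="${dir%/*}--${dir#*/}"   # e.g. xbow/XBEN-001-24 -> xbow--XBEN-001-24
  (cd "${dir%/*}" && zip -r "../dist/${name}.zip" "${dir#*/}")
done
```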
benchmark-challenges is the data layer of the WgpSec Agentic Ecosystem — providing real-world vulnerable environments for evaluating AI agent offensive capabilities.
```
┌──────────────────────── WgpSec Agentic Ecosystem ────────────────────────┐
│                                                                          │
│   Knowledge    ➜    Service    ➜  Execution    ➜   Evaluation            │
│                                                                          │
│ AboutSecurity ──▶ context1337 ──▶ tchkiller ──▶ benchmark-platform       │
│(Knowledge Base)   (MCP Server)   (Pentest Agent)    (Platform)           │
│                                                         ▲                │
│                                               PoJun (General Solver)     │
│                                                         │                │
│                                          benchmark-challenges (this repo)│
│                                                                          │
└──────────────────────────────────────────────────────────────────────────┘
```
| Project | Role |
|---|---|
| AboutSecurity | Structured pentest knowledge base (Skills, Dic, Payload, Vuln) |
| context1337 | MCP Server — turns AboutSecurity into a searchable API for AI agents |
| tchkiller | Autonomous pentest agent with multi-round decision-making and team collaboration |
| benchmark-platform | CTF challenge platform for evaluating agent offensive capabilities |
| benchmark-challenges | Challenge data repository — packed & distributed via GitHub Releases |
| PoJun | General-purpose AI problem-solving engine (private) |