4thwall – Transformer-based Hybrid WAF

A hybrid Web Application Firewall that combines ModSecurity (rule-based) with a Transformer (DistilBERT) classifier in a two-layer inline defense pipeline. The ML model is hosted on HuggingFace Hub and downloaded automatically at startup.

Architecture

                    ┌─────────────────┐
                    │     Client      │
                    └────────┬────────┘
                             │
                       port 80
                             │
                             ▼
                    ┌─────────────────┐
                    │     Nginx       │
                    │  + ModSecurity  │     Layer 1: Rule-based detection
                    │  (CRS rules)    │
                    └────────┬────────┘
                             │
                   ┌─────────┴─────────┐
                   │                   │
             MALICIOUS            BENIGN
                   │                   │
                   ▼                   ▼
            403 Forbidden     ┌──────────────────┐
            + log to          │   FastAPI API    │
            access.log        │  WAFClassifier   │     Layer 2: ML-based detection
                              │  (DistilBERT)    │
                              └────────┬─────────┘
                                       │
                             ┌─────────┴─────────┐
                             │                   │
                       MALICIOUS            BENIGN
                             │                   │
                             ▼                   ▼
                      403 Forbidden       ┌──────────────┐
                      + log to            │     DVWA     │
                      decision_log        │  (backend)   │
                                          └──────────────┘

        ┌─────────────────┐
        │  Background:    │     Tails access.log every 60s,
        │  modsec_block   │     logs ModSec 403s to decision_log
        │  _logger        │     so they appear in the dashboard.
        └────────┬────────┘
                 │
                 ▼
        decision_log.json
        (BLOCK/ALERT/ALLOW + blocked_by: modsec | ml_model)
                 │
                 ▼
        ┌─────────────────┐
        │  Dashboard UI   │     port 3000
        │  (React + Vite) │
        └─────────────────┘

Request Flow

1. Client → Nginx/ModSecurity (port 80): CRS rules evaluate the request.
   - If malicious → Nginx returns 403 Forbidden directly. The background modsec_block_logger picks up the 403 from access.log and writes it to decision_log.json with blocked_by: "modsec".
   - If benign → the request is forwarded to the API.
2. API (ML scoring): The FastAPI catch-all route scores the request with the DistilBERT WAF classifier.
   - If malicious (ML score > 0.9) → returns 403 Forbidden and logs to decision_log.json with blocked_by: "ml_model".
   - If medium risk (0.6 < ML score ≤ 0.9) → logs an ALERT but allows the request through.
   - If benign (ML score ≤ 0.6) → proxies the request to DVWA and returns the response.
3. Dashboard (port 3000): Reads decision_log.json and shows live traffic, attack distribution (ModSec vs ML blocks), and AI model status.
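
The ModSecurity branch of this flow depends on turning 403 lines in access.log into dashboard records. The following is a minimal sketch of that step, not the project's actual modsec_block_logger. It assumes the Nginx "combined" log format; the regex and the function name `modsec_blocks` are illustrative. It also treats every 403 as a ModSecurity block, whereas a real tailer would likely need to tell these apart from 403s returned by the API.

```python
import re
from typing import Dict, Iterable, Iterator

# Assumption: Nginx "combined" log format, for example:
# 1.2.3.4 - - [10/Oct/2024:13:55:36 +0000] "GET /?id=1 HTTP/1.1" 403 146 "-" "curl/8.0"
COMBINED_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<target>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<user_agent>[^"]*)"'
)


def modsec_blocks(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one record per access-log line whose status is 403."""
    for line in lines:
        match = COMBINED_RE.match(line)
        if match is None or match.group("status") != "403":
            continue  # unparseable line or not a block
        # Split the request target into path and query string.
        path, _, query = match.group("target").partition("?")
        yield {
            "method": match.group("method"),
            "path": path,
            "query": query,
            "user_agent": match.group("user_agent"),
            "action": "BLOCK",
            "blocked_by": "modsec",
        }
```

Running such a function over only the lines appended since the previous pass, once per interval, would give the 60-second batching described in the diagram.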

Prerequisites

Docker and Docker Compose

For local development (training only):

Python 3.10+

Quick Start

Clone and enter the project:
```
cd 4thwall
```
Copy environment template (the command below is for Windows; on Linux/macOS use cp instead of copy):
```
copy .env.example .env
```
Edit .env if you need different settings.
Build and start all services:
```
docker compose up --build -d
```
On first startup, the API container downloads the model from HuggingFace Hub (~260MB). Subsequent starts use the cached model from a Docker volume.
Access the stack:

Service	URL	Notes
Nginx (WAF)	http://localhost:80	Entry point: all DVWA traffic goes through ModSec + ML.
Dashboard UI	http://localhost:3000	Live traffic, attack stats, AI model status.

Note: The API (port 8000) and DVWA (port 80 inside container) are internal only — not exposed to the host. All traffic must go through Nginx on port 80.
Useful commands:
- View logs: docker compose logs -f
- Stop: docker compose down
- Rebuild API only: docker compose build api

Model

The WAF classifier is a fine-tuned DistilBERT model hosted at viitheone/waf-model on HuggingFace Hub.

Pipeline: Text Classification (binary: benign vs malicious)
Format: Safetensors
Download: Automatic at API startup via transformers.AutoModelForSequenceClassification.from_pretrained()
Caching: Stored in a named Docker volume (hf_cache) to persist across container restarts

To use a local model instead (e.g. after retraining), set MODEL_PATH to a local directory:

MODEL_PATH=./models/waf_model
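
For orientation, loading and scoring with this model might look roughly like the sketch below. It assumes transformers and torch are installed, that the tokenizer is published alongside the model, and that class index 1 is the malicious label (matching the 0=benign, 1=malicious convention used for training). The names `load_classifier`, `score`, and `malicious_probability` are illustrative, not the API's actual functions.

```python
import math
import os
from typing import Optional, Sequence


def malicious_probability(logits: Sequence[float]) -> float:
    """Softmax over [benign, malicious] logits; returns P(malicious)."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    return exps[1] / sum(exps)


def load_classifier(model_path: Optional[str] = None):
    """Load tokenizer and model from MODEL_PATH (Hub repo id or local dir)."""
    # Imported lazily so the helper above works without the ML stack.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    path = model_path or os.environ.get("MODEL_PATH", "viitheone/waf-model")
    tokenizer = AutoTokenizer.from_pretrained(path)
    model = AutoModelForSequenceClassification.from_pretrained(path)
    model.eval()  # inference mode: disables dropout
    return tokenizer, model


def score(text: str, tokenizer, model) -> float:
    """Return the risk score in [0.0, 1.0] for one serialized request."""
    import torch

    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    return malicious_probability(logits)
```

Because `from_pretrained()` accepts either a Hub repository id or a local directory, the same code path would cover both the default model and a retrained one.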

Training

Place your training CSV in data/. Expected columns: method, path (or url), query (or args), status, user_agent, request_time, and label, where label is 0 for benign and 1 for malicious.
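
Because two of the columns may appear under either name, a loader has to normalize them before building model input. The sketch below shows one way to do that. The serialized text layout and the names `serialize_row` and `load_examples` are assumptions for illustration; the format actually used by ml.train may differ.

```python
import csv
import io
from typing import Iterator, Tuple


def serialize_row(row: dict) -> Tuple[str, int]:
    """Turn one CSV row into (model_input_text, label)."""
    # Accept either spelling of the path and query columns.
    path = row.get("path") or row.get("url") or ""
    query = row.get("query") or row.get("args") or ""
    label = int(row["label"])
    if label not in (0, 1):
        raise ValueError(f"label must be 0 or 1, got {label}")
    # Hypothetical text layout; the real serializer may differ.
    text = (
        f"method={row.get('method', '')} path={path} query={query} "
        f"status={row.get('status', '')} ua={row.get('user_agent', '')} "
        f"time={row.get('request_time', '')}"
    )
    return text, label


def load_examples(csv_text: str) -> Iterator[Tuple[str, int]]:
    """Parse CSV text with a header row and yield serialized examples."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        yield serialize_row(row)
```

Validating the label up front surfaces a malformed row at load time rather than partway through training.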

Install dependencies locally:

python -m venv .venv
.venv\Scripts\activate   # Windows
# source .venv/bin/activate   # Linux/macOS
pip install -r requirements.txt

Run training:
```
python -m ml.train --data_path data/train.csv
```
Optional: --output_dir ./models/waf_model and --epochs 3.
Model and tokenizer are saved under MODEL_SAVE_PATH (default ./models/waf_model). Best model is chosen by validation F1.
To use your retrained model, either:
- Upload it to HuggingFace Hub and update MODEL_PATH in .env
- Or set MODEL_PATH=./models/waf_model and mount the models directory in docker-compose.yml

Policy Settings

Condition	Action	Reason
ModSecurity blocked	BLOCK	ModSecurity rule triggered
ML score > 0.9	BLOCK	High ML risk score
0.6 < ML score ≤ 0.9	ALERT	Medium ML risk score
Otherwise	ALLOW	Low risk
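
The table can be read as an ordered set of checks where the first match wins. A minimal sketch of that logic follows; the `Decision` type, the `decide` function, and the constant names are illustrative and not taken from the project's policy module.

```python
from typing import NamedTuple, Optional

BLOCK_THRESHOLD = 0.9  # ML score above this is blocked
ALERT_THRESHOLD = 0.6  # ML score above this (but not blocked) raises an alert


class Decision(NamedTuple):
    action: str                # "BLOCK", "ALERT", or "ALLOW"
    reason: str                # human-readable explanation
    blocked_by: Optional[str]  # "modsec", "ml_model", or None


def decide(modsecurity_blocked: bool, ml_score: Optional[float]) -> Decision:
    """Apply the policy rows in order; the first matching row wins."""
    if modsecurity_blocked:
        return Decision("BLOCK", "ModSecurity rule triggered", "modsec")
    if ml_score is not None and ml_score > BLOCK_THRESHOLD:
        return Decision("BLOCK", "High ML risk score", "ml_model")
    if ml_score is not None and ml_score > ALERT_THRESHOLD:
        return Decision("ALERT", "Medium ML risk score", None)
    return Decision("ALLOW", "Low risk", None)
```

Both comparisons are strict, so a score of exactly 0.9 yields ALERT and a score of exactly 0.6 yields ALLOW.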

Decision Logging

Every request that passes through the system is logged to logs/decision_log.json with the following fields:

Field	Description
`timestamp`	ISO 8601 timestamp
`request_text`	Serialized request (method, path, query, status, UA, time)
`modsecurity_blocked`	Whether ModSecurity blocked the request
`ml_score`	ML model risk score (0.0–1.0)
`action`	Final action: `BLOCK`, `ALERT`, or `ALLOW`
`reason`	Human-readable reason for the action
`blocked_by`	Which layer blocked: `"modsec"`, `"ml_model"`, or `null`
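
As a rough illustration of these fields, the sketch below builds a record and computes the ModSec vs ML block distribution shown in the dashboard. It assumes the log is stored as one JSON object per line and that `ml_score` is null when ModSecurity blocked the request before scoring; neither detail is confirmed here, and the function names are hypothetical.

```python
import json
from collections import Counter
from datetime import datetime, timezone
from typing import Iterable, Optional


def make_record(
    request_text: str,
    modsecurity_blocked: bool,
    ml_score: Optional[float],
    action: str,
    reason: str,
    blocked_by: Optional[str],
) -> dict:
    """Build one decision-log entry with the documented fields."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),  # ISO 8601
        "request_text": request_text,
        "modsecurity_blocked": modsecurity_blocked,
        "ml_score": ml_score,
        "action": action,
        "reason": reason,
        "blocked_by": blocked_by,
    }


def append_record(path: str, record: dict) -> None:
    """Append a record as a single JSON line (assumed storage layout)."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(record) + "\n")


def block_distribution(lines: Iterable[str]) -> Counter:
    """Count BLOCK records per blocking layer."""
    counts: Counter = Counter()
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        if record.get("action") == "BLOCK":
            counts[record.get("blocked_by")] += 1
    return counts
```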

Troubleshooting

503 on /score: Model failed to download from HuggingFace Hub. Check internet connectivity and that MODEL_PATH is correct (default: viitheone/waf-model).
Import errors: Run from project root and ensure requirements.txt is installed.
Empty dashboard: Traffic must go through http://localhost:80 (Nginx WAF). ModSec blocks appear in the dashboard with ~60 second delay (background log tailer interval).
Port 80 not loading: Run docker compose ps — all services should be Up. The CRS nginx image listens on 8080 inside the container; compose maps host 80 → container 8080.
Nginx Restarting + bind-mount conf: If nginx/conf.d/99-waf-host-access-log.conf did not exist the first time you ran Compose, Docker may have created a directory with that name. Remove that folder, ensure the path is a regular file, then docker compose up -d --force-recreate nginx.
"WAF doesn't block": ModSecurity blocks at the Nginx layer (403). The ML model blocks inline in the API for requests that pass ModSec. Test with a URL-encoded payload (spaces as %20): curl "http://localhost:80/?id=1'%20OR%201=1--".
Docker build timeout / huge downloads: The API Dockerfile installs CPU-only PyTorch from PyTorch's wheel index. If a build times out, retry with docker compose build --no-cache api.
Dashboard can't reach API: The frontend proxies /api to the API container over the Docker network. Run docker compose ps to confirm both are running.

Limitations

No online learning; model is fixed after training.
ModSec blocks appear in the dashboard with ~60s delay (access log tailing interval).
Docker Compose sets MODSEC_RULE_ENGINE=On; local nginx/nginx.conf samples may still be detection-only if running Nginx manually.
The policy engine implements only the rules listed under Policy Settings (ModSec block first, then ML thresholds 0.9 / 0.6).
