Document Intelligence. Zero Trust.
Forensic document fraud detection for banks and accounting firms — multi-layer AI pipeline, editorial-grade UI, light + dark themes.
FraudScan AI processes a complete credit dossier (CIN + payslip + bank statement + RIB) in under 5 seconds and produces an explainable verdict at three levels:
| Verdict | Score | Action |
|---|---|---|
| 🟢 Clean | ≥ 75 / 100 | Proceed |
| 🟡 Suspicious | 50 – 74 | Manual review |
| 🔴 Fraudulent | < 50 | Reject + flag |
The engine combines five independent forensic layers orchestrated by LangGraph:
┌───────────┐   ┌───────────┐   ┌───────────┐   ┌─────────────┐   ┌─────────┐
│ Ingestion │ → │ Forensics │ → │  Vision   │ → │ Cross-check │ → │ Scoring │
└───────────┘   └───────────┘   └───────────┘   └─────────────┘   └─────────┘
  PDF → JPG       ELA + ORB        GPT-4o        5 fuzzy rules       0–100
                clone detect     extraction     identity/income     verdict
Every decision arrives with a plain-language report, a clickable evidence trail, and per-document forensic heatmaps — defensible to regulators, comprehensible to analysts.
- What's inside
- Tech stack
- Quick start
- Design system — Forensic Ledger
- Features in depth
- Project structure
- API reference
- Database schema
- Deliverables
- Testing
- Configuration
- Roadmap
- License
| Layer | Capability | Implementation |
|---|---|---|
| Frontend | Marketing site, signup, dashboard, dossiers, analytics, billing, settings | Next.js 14 App Router, TypeScript, Tailwind, Framer Motion, Recharts |
| Theme system | Light + dark via CSS variables, no flash on reload | Editorial Forensic Ledger palette (parchment / ink) |
| Search | ⌘K command palette, debounced, categorized | Server-side LIKE across reference, name, ID, filename |
| Backend | REST + SSE, JWT auth, Stripe webhooks | FastAPI 0.111, SQLAlchemy 2.0 async |
| AI pipeline | Five-layer LangGraph orchestration | OpenCV ELA + ORB, GPT-4o Vision, FuzzyWuzzy rules |
| Multi-tenancy | Org-scoped data, audit log | All queries filter by organization_id |
| Storage | S3-compatible, self-hosted | MinIO + auto-bucket on boot |
| Demo data | 25 seeded dossiers across all verdicts on first boot | services/seed.py with realistic Moroccan applicants |
| Test data | Two scenarios (clean + fraudulent), 8 generated PDFs | test-data/generate.py |
| Deliverables | Word report + PowerPoint deck (PFE Big Data & IA) | docs/build_report.py, docs/build_deck.py |
- Python 3.11 · FastAPI 0.111 · SQLAlchemy 2.0 (async) · Pydantic 2.7
- AI / Vision : LangGraph 0.1, OpenAI Python SDK, OpenCV 4.10 (headless), Pillow 10.3, pdf2image
- Cross-check : FuzzyWuzzy + python-Levenshtein
- Storage / Queue : MinIO 7.2 client, Redis 5.0
- Auth / Billing : python-jose (JWT), Stripe 9.11
- Streaming : sse-starlette
- Next.js 14 App Router · TypeScript 5.5
- TailwindCSS 3.4 with CSS-variable-driven theming + tailwindcss-animate
- Framer Motion 11 for component motion (gauge sweep, stamp reveal)
- Recharts 2.12 themed via CSS vars (responds to light/dark)
- Radix UI primitives (Tabs, Dialog, Tooltip)
- Lucide React icons
- Typography : Fraunces (display serif), IBM Plex Sans (body), JetBrains Mono (data)
- PostgreSQL 16 · MinIO (S3-compatible) · Redis 7
- Docker Compose orchestration with health checks
- All five services on a single bridge network
git clone https://github.com/theChefEngineer/fraude-detection.git
cd fraude-detection
docker compose up -d

That's it. After ~60 seconds the full stack is live:
| Service | URL | Credentials |
|---|---|---|
| Frontend | http://localhost:3000 | dev-bypass enabled |
| Backend API | http://localhost:8000 | — |
| OpenAPI docs | http://localhost:8000/docs | — |
| MinIO console | http://localhost:9001 | minioadmin / minioadmin123 |
| Postgres | localhost:5432 | fraud / fraud123 |
| Redis | localhost:6379 | — |
The backend auto-seeds 25 realistic dossiers + 2 in-flight cases on first boot, so the dashboard, list, and analytics are populated immediately.
- Open http://localhost:3000 → click Start free → land on /dashboard
- Browse seeded cases at http://localhost:3000/dossiers
- Click any dossier → see verdict stamp, chronograph dial, cross-check table, document tabs with ELA heatmap
- Hit ⌘K (or Ctrl+K) → search by applicant name, reference, or filename
- Toggle light/dark via the switch in the top right
./test-data/run_pipeline.sh

This script creates two dossiers (clean + fraudulent), uploads 4 PDFs each, kicks off the analysis, polls until done, and prints the verdict + cross-check breakdown for both.
docker compose down # stop, keep data
docker compose down -v # stop and wipe volumes (re-seeds on next boot)

A deliberate departure from the generic cyan-glow-on-black AI aesthetic. The interface is positioned as a publication, not a SaaS template: museum-quality forensic case file × Bloomberg Terminal × The Economist editorial.
| Role | Hex | Use |
|---|---|---|
| Background | #F5F1E8 | Warm parchment |
| Surface | #FDFBF5 | Paper white |
| Elevated | #FFFFFF | Cards |
| Ink | #1A1D24 | Body text, charcoal |
| Accent | #1E3A5F | Deep fountain-pen blue |
| Crimson | #A31D3A | Fraudulent verdict |
| Ochre | #A86C1D | Suspicious verdict |
| Forest | #2F6646 | Clean verdict |
| Role | Hex | Use |
|---|---|---|
| Background | #161618 | Warm graphite (not black) |
| Ink | #ECE8DF | Parchment text (not white) |
| Accent | #9BB8D8 | Lifted ink-blue |
- Display : Fraunces — variable optical-size serif, used for headlines, verdict stamps, gauge numerals
- Body : IBM Plex Sans — institutional, drawn for IBM, characterful
- Mono : JetBrains Mono — tabular numerals for data, scores, references
| Component | Treatment |
|---|---|
| VerdictStamp | Editorial italic word ("FRAUDULENT") inside a double-rule frame, rotated −2°, animated stamp-in keyframe |
| TrustScoreGauge | Chronograph dial with major/minor tick marks every 2 units, numeric scale 0/20/40/60/80/100, central Fraunces numeral |
| Cross-check table | Hairline borders, ink-color status indicators, no glow shadows |
| Pipeline progress | 5-step strip with active-stage pulse, real-time SSE progression |
| Search palette | ⌘K-triggered popover, 180 ms debounce, categorized results |
All colors flow through CSS variables — toggling the theme switches the entire app coherently in one paint frame.
Each layer is documented in its own file and can be swapped independently.
- PDF → JPEG via pdf2image (200 DPI, first page)
- Resize to 2048 px max, JPEG quality 92
- Output stored in MinIO under {org_id}/{dossier_id}/normalized_{doc_id}.jpg
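The ingestion sizing and storage-key rules can be sketched without any imaging library. This is a minimal illustration, not the project's `ingestion.py`: the helper names `target_size` and `normalized_key` are hypothetical, and the choice not to upscale small pages is an assumption.

```python
def target_size(width: int, height: int, max_side: int = 2048) -> tuple[int, int]:
    """Scale (width, height) so the longer side is at most max_side, keeping aspect ratio."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # assumption: smaller pages are not upscaled
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


def normalized_key(org_id: str, dossier_id: str, doc_id: str) -> str:
    """Build the MinIO object key for the normalized JPEG."""
    return f"{org_id}/{dossier_id}/normalized_{doc_id}.jpg"


# An A4 page rasterized at 200 DPI is roughly 1654 x 2339 px.
print(target_size(1654, 2339))                  # (1448, 2048)
print(normalized_key("dev-org", "d1", "doc1"))  # dev-org/d1/normalized_doc1.jpg
```

Keeping the size computation separate from the actual resize call makes the 2048 px rule easy to unit-test.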
- Error Level Analysis (ELA) at q=90 — re-saves and diffs to expose splices
- ORB keypoint clone detection — finds copy-move regions
- Laplacian variance noise score
- Outputs heatmap PNG for the UI overlay
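To make the Laplacian variance noise score concrete, here is a pure-Python sketch on a small grayscale grid. It uses the 4-neighbour Laplacian kernel and the population variance of its responses; the project itself relies on OpenCV, so kernel choice, border handling, and any normalization there may differ. The function name is hypothetical.

```python
from statistics import pvariance


def laplacian_variance(gray: list[list[int]]) -> float:
    """Variance of the 4-neighbour Laplacian over the interior pixels of a grayscale grid."""
    rows, cols = len(gray), len(gray[0])
    if rows < 3 or cols < 3:
        raise ValueError("need at least a 3x3 image")
    responses = []
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            # Sum of the four neighbours minus four times the centre pixel.
            lap = (gray[r - 1][c] + gray[r + 1][c]
                   + gray[r][c - 1] + gray[r][c + 1]
                   - 4 * gray[r][c])
            responses.append(lap)
    return float(pvariance(responses))


flat = [[128] * 5 for _ in range(5)]
checker = [[255 * ((r + c) % 2) for c in range(5)] for r in range(5)]
print(laplacian_variance(flat))     # 0.0, no local variation at all
print(laplacian_variance(checker))  # large, every pixel differs from its neighbours
```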
- GPT-4o Vision with strict JSON-schema prompt
- Returns extracted fields + semantic anomalies + AI-generated probability
- Graceful degradation — falls back to a deterministic stub if the API key is missing or invalid
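The graceful-degradation behaviour of the vision layer can be sketched as a thin wrapper around the model call. Everything here is illustrative: `extract`, `stub_extraction`, the result shape, and the injected `call_model` callable are assumptions, and the real `extraction.py` may detect an invalid key differently.

```python
from typing import Callable, Optional

PLACEHOLDER_KEY = "sk-replace-me"  # default value listed in the configuration table


def stub_extraction() -> dict:
    """Deterministic result used when no real model call is possible (hypothetical shape)."""
    return {"fields": {}, "anomalies": [], "ai_generated_probability": 0.0, "source": "stub"}


def extract(image: bytes, api_key: Optional[str],
            call_model: Callable[[bytes, str], dict]) -> dict:
    """Call the vision model when a usable key exists, otherwise fall back to the stub."""
    if not api_key or api_key == PLACEHOLDER_KEY:
        return stub_extraction()
    try:
        result = call_model(image, api_key)
    except Exception:
        # Assumption: any failure (for example an invalid key) degrades to the stub.
        return stub_extraction()
    return {**result, "source": "model"}


def fake_model(image: bytes, key: str) -> dict:
    """Stand-in for the GPT-4o Vision call, used only for this demonstration."""
    return {"fields": {"name": "Mehdi El Idrissi"}, "anomalies": []}


print(extract(b"...", None, fake_model)["source"])           # stub
print(extract(b"...", "sk-proj-demo", fake_model)["source"])  # model
```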
Five rules implemented as pure Python functions:
| Rule | Logic | Threshold |
|---|---|---|
| R1 Identity match | fuzz.token_sort_ratio across all docs | ≥ 85 % |
| R2 Income verification | abs(payslip_net − bank_credit) / payslip_net | ≤ 10 % |
| R3 Employer consistency | fuzz.partial_ratio(employer, transfer_label) | ≥ 70 % |
| R4 Date logic | payslip_date ≤ credit_date ≤ today | strict |
| R5 RIB holder vs CIN | fuzzy name match | ≥ 85 % |
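Three of the rules above can be sketched as pure functions using only the standard library. This is an illustration, not the project's `cross_check.py`: `difflib.SequenceMatcher` stands in for FuzzyWuzzy's `fuzz.token_sort_ratio` (scores will not match exactly), and comparing every name against the first document's name is an assumed reading of "across all docs".

```python
from datetime import date
from difflib import SequenceMatcher


def token_sort_similarity(a: str, b: str) -> float:
    """Stand-in for fuzz.token_sort_ratio: compare names after sorting their tokens (0-100)."""
    norm_a = " ".join(sorted(a.lower().split()))
    norm_b = " ".join(sorted(b.lower().split()))
    return 100.0 * SequenceMatcher(None, norm_a, norm_b).ratio()


def r1_identity_match(names: list[str], threshold: float = 85.0) -> bool:
    """R1: every extracted name must match the first one at or above the threshold."""
    reference = names[0]
    return all(token_sort_similarity(reference, n) >= threshold for n in names[1:])


def r2_income_verification(payslip_net: float, bank_credit: float,
                           tolerance: float = 0.10) -> bool:
    """R2: relative gap between payslip net pay and the bank credit is within tolerance."""
    if payslip_net <= 0:
        return False
    return abs(payslip_net - bank_credit) / payslip_net <= tolerance


def r4_date_logic(payslip_date: date, credit_date: date, today: date) -> bool:
    """R4: the salary credit cannot precede the payslip or lie in the future."""
    return payslip_date <= credit_date <= today


print(r1_identity_match(["Mehdi El Idrissi", "EL IDRISSI Mehdi"]))  # True
print(r2_income_verification(12000, 11500))                         # True, gap is about 4.2 %
print(r2_income_verification(12000, 8000))                          # False, gap is about 33 %
```

Passing `today` explicitly rather than calling `date.today()` inside the rule keeps the function deterministic in tests.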
score = forensics_avg * 0.35 + vision_avg * 0.35 + crosscheck_score * 0.30
verdict = "clean" if score >= 75 else "suspicious" if score >= 50 else "fraudulent"

Weights determined by Grid Search over 1331 triplets; see REPORT.md §4.5.
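As a worked example of the aggregation, the sketch below wraps the same weights and thresholds in a function and evaluates it on made-up layer scores. The function name `aggregate` and the sample inputs are illustrative only.

```python
def aggregate(forensics_avg: float, vision_avg: float,
              crosscheck_score: float) -> tuple[float, str]:
    """Weighted score (0-100) and verdict, using the weights and thresholds stated above."""
    score = forensics_avg * 0.35 + vision_avg * 0.35 + crosscheck_score * 0.30
    if score >= 75:
        verdict = "clean"
    elif score >= 50:
        verdict = "suspicious"
    else:
        verdict = "fraudulent"
    return round(score, 1), verdict


# 80 * 0.35 + 70 * 0.35 + 60 * 0.30 = 28 + 24.5 + 18 = 70.5
print(aggregate(80, 70, 60))   # (70.5, 'suspicious')
print(aggregate(40, 30, 20))   # (30.5, 'fraudulent')
```

Because cross-checks carry 30 % of the weight, a dossier whose individual documents look clean can still drop into the suspicious band when the documents disagree with each other.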
The pipeline emits progress events to a per-dossier asyncio.Queue, exposed via /api/dossiers/{id}/stream as Server-Sent Events. The frontend <PipelineProgress /> component subscribes and renders the 5-step strip live with stage transitions and percentage.
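The queue-to-SSE hand-off can be sketched with plain asyncio. This is not the project's `progress.py`: the `None` end-of-stream marker, the `progress` event name, and the payload keys are assumptions, and in the real backend the frames would be returned through sse-starlette rather than collected in a list.

```python
import asyncio
import json
from typing import AsyncIterator, Optional


def format_sse(event: str, payload: dict) -> str:
    """Serialize one Server-Sent Event frame: event name, JSON data, blank-line terminator."""
    return f"event: {event}\ndata: {json.dumps(payload)}\n\n"


async def stream_progress(queue: "asyncio.Queue[Optional[dict]]") -> AsyncIterator[str]:
    """Yield SSE frames until the producer pushes None as an end-of-stream marker."""
    while True:
        item = await queue.get()
        if item is None:
            break
        yield format_sse("progress", item)


async def demo() -> list[str]:
    """Simulate a pipeline run that reports three stages, then closes the stream."""
    queue: asyncio.Queue = asyncio.Queue()
    for stage, percent in [("ingestion", 20), ("forensics", 40), ("scoring", 100)]:
        await queue.put({"stage": stage, "percent": percent})
    await queue.put(None)
    return [frame async for frame in stream_progress(queue)]


frames = asyncio.run(demo())
print(frames[0], end="")
```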
Press ⌘K (or Ctrl+K) anywhere in the dashboard. Searches across:
- Dossier reference (e.g. DOSS-2026-001A)
- Applicant name (e.g. Mehdi)
- Applicant ID number (e.g. AB123456)
- Document filename (e.g. payslip_mehdi)
Results are categorized (Dossiers / Documents), keyboard-navigable, and tenant-scoped.
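A tenant-scoped LIKE search of this kind can be demonstrated with an in-memory SQLite database. SQLite is only a stand-in for the project's PostgreSQL, the table layout is trimmed to the searched columns, and the `search` function is hypothetical. Note that SQLite's LIKE is case-insensitive for ASCII by default, whereas PostgreSQL would need ILIKE for the same behaviour.

```python
import sqlite3


def search(conn: sqlite3.Connection, org_id: str, query: str) -> dict:
    """Tenant-scoped substring search over dossiers and documents."""
    pattern = f"%{query}%"  # note: % and _ typed by the user act as wildcards here
    dossiers = conn.execute(
        "SELECT reference FROM dossiers "
        "WHERE organization_id = ? AND "
        "(reference LIKE ? OR applicant_name LIKE ? OR applicant_id_number LIKE ?) "
        "ORDER BY reference",
        (org_id, pattern, pattern, pattern),
    ).fetchall()
    documents = conn.execute(
        "SELECT d.original_filename FROM documents d "
        "JOIN dossiers s ON s.id = d.dossier_id "
        "WHERE s.organization_id = ? AND d.original_filename LIKE ? "
        "ORDER BY d.original_filename",
        (org_id, pattern),
    ).fetchall()
    return {"dossiers": [r[0] for r in dossiers], "documents": [r[0] for r in documents]}


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dossiers (id INTEGER PRIMARY KEY, organization_id TEXT, reference TEXT,
                       applicant_name TEXT, applicant_id_number TEXT);
CREATE TABLE documents (id INTEGER PRIMARY KEY, dossier_id INTEGER, original_filename TEXT);
INSERT INTO dossiers VALUES (1, 'dev-org', 'DOSS-2026-001A', 'Mehdi El Idrissi', 'AB123456');
INSERT INTO dossiers VALUES (2, 'other-org', 'DOSS-2026-002B', 'Mehdi Tazi', 'CD654321');
INSERT INTO documents VALUES (1, 1, 'payslip_mehdi.pdf');
INSERT INTO documents VALUES (2, 2, 'payslip_mehdi_tazi.pdf');
""")
print(search(conn, "dev-org", "mehdi"))
```

The second organization also has a matching applicant and file, but neither appears in the result because both queries filter on `organization_id`.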
Every model has an organization_id foreign key, every endpoint depends on get_current_organization, and the dev tenant (dev-org) is auto-provisioned at boot when DEV_AUTH_BYPASS=true. Production swaps in real Supabase JWTs (HS256) by setting DEV_AUTH_BYPASS=false and providing SUPABASE_JWT_SECRET.
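For readers unfamiliar with HS256, the check performed on a production token can be sketched with the standard library alone. The backend uses python-jose for this; the code below is not its API, only an illustration of what verifying an HMAC-SHA256 JWT against a shared secret involves. `sign_hs256` exists solely to produce a token for the demonstration, and both function names are hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    # JWT segments drop base64 padding, so restore it before decoding.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_hs256(claims: dict, secret: str) -> str:
    """Create a compact HS256 JWT (used here only to produce a demo token)."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url_encode(json.dumps(claims).encode())
    signature = hmac.new(secret.encode(), f"{header}.{payload}".encode(),
                         hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url_encode(signature)}"


def verify_hs256(token: str, secret: str) -> dict:
    """Return the claims if the signature and expiry check out, else raise ValueError."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        claims = json.loads(_b64url_decode(payload_b64))
        signature = _b64url_decode(signature_b64)
    except Exception as exc:
        raise ValueError("malformed token") from exc
    if header.get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = hmac.new(secret.encode(), f"{header_b64}.{payload_b64}".encode(),
                        hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        raise ValueError("bad signature")
    if "exp" in claims and claims["exp"] < time.time():
        raise ValueError("token expired")
    return claims


token = sign_hs256({"sub": "user-1", "exp": time.time() + 60}, "dev-secret")
print(verify_hs256(token, "dev-secret")["sub"])  # user-1
```

Pinning the accepted algorithm to HS256 and comparing signatures with `hmac.compare_digest` are the two details most often missed in hand-rolled checks, which is a good reason to keep using a maintained library in the real service.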
On first boot with an empty database, backend/services/seed.py populates:
- 25 completed dossiers with realistic Moroccan applicant names (Mehdi El Idrissi, Yasmine Tazi, Karim Benali, …)
- 2 in-flight dossiers (status = processing) with live progress bars
- Per-document extracted fields, anomalies, trust scores, forensics scores
- Per-dossier cross-check results matching the verdict (passed for clean, mixed for suspicious, multiple failures for fraudulent)
- Synthetic preview images + heatmap PNG stored in MinIO so document tabs render
The seeder is idempotent — only runs when the org has 0 dossiers. Disable with SEED_DEMO_DATA=false.
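The idempotency guard amounts to two checks before any insert. The sketch below models the organization's dossiers as a plain list; the function names, the `DEMO-` reference format, and the exact parsing of `SEED_DEMO_DATA` are assumptions rather than the behaviour of `seed.py`.

```python
def seeding_enabled(env: dict) -> bool:
    """Read SEED_DEMO_DATA; anything other than 'false' keeps the default (enabled)."""
    return env.get("SEED_DEMO_DATA", "true").strip().lower() != "false"


def seed_if_empty(dossiers: list, env: dict, count: int = 25) -> int:
    """Insert demo dossiers only when seeding is enabled and the org has none yet."""
    if not seeding_enabled(env) or dossiers:
        return 0
    dossiers.extend({"reference": f"DEMO-{i:03d}"} for i in range(1, count + 1))
    return count


store: list = []
print(seed_if_empty(store, {}))  # 25 on first boot
print(seed_if_empty(store, {}))  # 0 on every later boot
```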
fraude-detection/
├── README.md # this file
├── REPORT.md # PFE report (Markdown source)
├── docker-compose.yml # full stack orchestration
├── .env / .env.example # configuration
│
├── backend/ # FastAPI + LangGraph + SQLAlchemy
│ ├── Dockerfile
│ ├── requirements.txt
│ ├── main.py # app entrypoint + lifespan
│ ├── config.py # pydantic settings
│ ├── database.py # async engine + session
│ ├── models/ # SQLAlchemy models
│ │ ├── organization.py
│ │ ├── user.py
│ │ ├── dossier.py
│ │ ├── document.py
│ │ └── audit_log.py
│ ├── routers/ # FastAPI routers
│ │ ├── auth.py
│ │ ├── organizations.py
│ │ ├── dossiers.py
│ │ ├── documents.py
│ │ ├── analysis.py # analytics endpoints
│ │ ├── billing.py # Stripe checkout + webhook
│ │ └── search.py # global search
│ ├── schemas/ # Pydantic DTOs
│ ├── services/
│ │ ├── auth.py # JWT dependency + dev bypass
│ │ ├── audit.py # audit log helper
│ │ ├── bootstrap.py # dev tenant provisioning
│ │ ├── seed.py # demo data seeder (25+2 dossiers)
│ │ ├── storage.py # MinIO client
│ │ ├── stripe_service.py # Stripe wrapper
│ │ └── pipeline/ # the AI engine
│ │ ├── orchestrator.py # LangGraph StateGraph
│ │ ├── ingestion.py # PDF → JPG normalization
│ │ ├── forensics.py # ELA + ORB clone detection
│ │ ├── extraction.py # GPT-4o Vision
│ │ ├── cross_check.py # 5 rules engine
│ │ ├── scoring.py # weighted aggregation
│ │ └── progress.py # SSE pub/sub
│ └── migrations/
│
├── frontend/ # Next.js 14 App Router
│ ├── Dockerfile
│ ├── package.json
│ ├── tailwind.config.ts # Forensic Ledger tokens
│ ├── next.config.mjs
│ ├── tsconfig.json
│ ├── public/
│ │ └── theme-init.js # pre-paint theme bootstrap
│ └── src/
│ ├── app/
│ │ ├── globals.css # CSS variables for both themes
│ │ ├── layout.tsx # root + ThemeProvider
│ │ ├── page.tsx # landing page
│ │ ├── (auth)/
│ │ │ ├── login/
│ │ │ └── signup/
│ │ └── (dashboard)/
│ │ ├── layout.tsx # sidebar + topbar
│ │ ├── dashboard/
│ │ ├── dossiers/
│ │ │ ├── page.tsx
│ │ │ ├── new/page.tsx # 3-step wizard
│ │ │ └── [id]/page.tsx # detail with verdict stamp
│ │ ├── documents/[id]/
│ │ ├── analytics/
│ │ ├── settings/
│ │ └── billing/
│ ├── components/
│ │ ├── ui/ # button, card, badge, input, ThemeToggle
│ │ ├── layout/ # Sidebar, TopBar, Logo, SearchPalette
│ │ ├── landing/ # Hero, Features, Pricing, Footer, Nav
│ │ ├── dossiers/ # DossierCard, DossierTable
│ │ ├── documents/ # UploadZone, DocumentViewer
│ │ ├── analysis/ # VerdictStamp, TrustScoreGauge, …
│ │ └── charts/ # FraudDistributionChart, ProcessingVolumeChart
│ └── lib/
│ ├── api.ts # typed API client
│ ├── auth.ts # client-side session helpers
│ ├── theme.tsx # ThemeProvider + useTheme
│ ├── types.ts
│ └── utils.ts
│
├── test-data/ # generated test PDFs + runner
│ ├── generate.py # PIL-based document generator
│ ├── run_pipeline.sh # end-to-end test runner
│ ├── clean/ # 4 PDFs of a coherent dossier
│ └── fraudulent/ # 4 PDFs with deliberate inconsistencies
│
└── docs/ # PFE deliverables
├── build_report.py # Word document generator
├── build_deck.py # PowerPoint deck generator
├── FraudScan_AI_Rapport_PFE.docx
└── FraudScan_AI_Presentation_PFE.pptx
Full OpenAPI spec at http://localhost:8000/docs once running. Highlights:
| Endpoint | Method | Description |
|---|---|---|
| /health | GET | Health probe |
| /api/auth/me | GET | Current user + org |
| /api/organizations/current | GET | Org info, plan, usage |
| /api/dossiers | GET, POST | List + create |
| /api/dossiers/{id} | GET | Full results (verdict, score, cross-checks, documents) |
| /api/dossiers/{id}/analyze | POST | Trigger pipeline (background task) |
| /api/dossiers/{id}/status | GET | Poll status |
| /api/dossiers/{id}/stream | GET (SSE) | Real-time progress stream |
| /api/documents/{id} | POST | Upload one or more files |
| /api/documents/{id}/preview | GET | Normalized JPEG |
| /api/documents/{id}/heatmap | GET | ELA heatmap PNG |
| /api/analytics/dashboard | GET | Aggregated stats |
| /api/search?q=… | GET | Global search across dossiers + documents |
| /api/billing/create-checkout | POST | Stripe checkout session |
| /api/billing/webhook | POST | Stripe webhook (signed) |
All endpoints are tenant-scoped via get_current_organization dependency.
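A client that triggers an analysis and then polls /api/dossiers/{id}/status needs a small wait loop. The sketch below keeps the HTTP call abstract: `fetch_status` is any callable returning the decoded JSON body, and the response shape (a `status` key with the values from the Dossier schema) is an assumption to be checked against the OpenAPI docs.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_dossier(
    fetch_status: Callable[[], dict],
    interval: float = 1.0,
    timeout: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the dossier reaches a terminal status or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        payload = fetch_status()
        if payload.get("status") in TERMINAL_STATUSES:
            return payload
        if time.monotonic() >= deadline:
            raise TimeoutError("dossier did not finish in time")
        sleep(interval)


# Fake responses standing in for GET /api/dossiers/{id}/status.
responses = iter([
    {"status": "pending"},
    {"status": "processing", "progress_percent": 40},
    {"status": "completed", "verdict": "clean"},
])
result = wait_for_dossier(lambda: next(responses), sleep=lambda _: None)
print(result["verdict"])  # clean
```

For interactive use the SSE stream is the better fit; polling is mainly convenient in scripts and tests.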
Organization (multi-tenant root)
├─ id (UUID, PK)
├─ slug, name, plan (free/starter/pro/enterprise)
├─ stripe_customer_id, stripe_subscription_id
├─ monthly_dossier_limit, dossiers_used_this_month
└─ created_at
User
├─ id (UUID, PK)
├─ organization_id (FK → Organization)
├─ email (unique), full_name, role
└─ supabase_id, created_at
Dossier
├─ id, organization_id (FK), reference (DOSS-YYYY-XXXX)
├─ applicant_name, applicant_id_number, case_type
├─ status (pending/processing/completed/failed)
├─ verdict (clean/suspicious/fraudulent)
├─ global_trust_score, processing_time_ms
├─ progress_stage, progress_percent (live SSE)
├─ summary_text
└─ created_at, completed_at
Document
├─ id, dossier_id (FK)
├─ document_type (CIN/PASSPORT/PAYSLIP/BANK_STATEMENT/RIB/KBIS/OTHER)
├─ original_filename, minio_key, image_minio_key, heatmap_minio_key
├─ file_size_bytes, mime_type, status
├─ trust_score, forensics_score
├─ extraction_data (JSONB), anomalies (JSONB)
└─ processing_time_ms, created_at
CrossCheckResult
├─ id, dossier_id (FK)
├─ rule_id (R1_IDENTITY_MATCH, R2_INCOME_VERIFY, …)
├─ rule_name, status (passed/failed/warning/skipped)
├─ confidence (0–100), details
├─ doc_a_id, doc_b_id (which docs were compared)
└─ created_at
AuditLog
├─ id, organization_id (FK), user_id
├─ action, entity_type, entity_id
├─ metadata (JSONB), ip_address
└─ created_at
The docs/ folder contains generators for the PFE Big Data & IA submission:
- FraudScan_AI_Rapport_PFE.docx: full report following the professor's guidelines (problem understanding, data, EDA, modeling, evaluation, industrialization, limitations, bibliography, appendices). Editorial styling, tables with ink-blue headers, code blocks, hairline rules.
- FraudScan_AI_Presentation_PFE.pptx: 23-slide widescreen deck (16:9) with cover, agenda, chapter dividers, comparison tables, confusion matrix, ablation study, demo flow, "Merci" closing.
Regenerate after edits:
cd docs
python3 build_report.py # → .docx
python3 build_deck.py # → .pptx

Both scripts use only python-docx + python-pptx, with no external assets.
cd backend && python3 -m compileall -q . && echo OK

cd frontend && npm run typecheck

./test-data/run_pipeline.sh

The last script runs the full pipeline on both the clean and fraudulent test scenarios and prints the verdict, score, latency, and cross-check breakdown.
docker cp test-data/generate.py fraudscan-backend:/tmp/generate.py
docker exec fraudscan-backend python /tmp/generate.py
docker cp fraudscan-backend:/tmp/clean test-data/_new_clean
docker cp fraudscan-backend:/tmp/fraudulent test-data/_new_fraud

All configuration via .env at the project root. Defaults are dev-friendly.
| Variable | Default | Notes |
|---|---|---|
| DATABASE_URL | postgresql+asyncpg://fraud:fraud123@postgres:5432/fraudscan | |
| MINIO_ENDPOINT | minio:9000 | |
| MINIO_BUCKET | fraudscan-documents | Auto-created at boot |
| REDIS_URL | redis://redis:6379 | |
| OPENAI_API_KEY | sk-replace-me | Set this for real GPT-4o Vision. Pipeline uses stub when missing. |
| OPENAI_VISION_MODEL | gpt-4o | |
| SUPABASE_JWT_SECRET | dev-secret-… | HS256 |
| DEV_AUTH_BYPASS | true | Disable in production |
| SEED_DEMO_DATA | true | Disable to skip seeding |
| STRIPE_SECRET_KEY | sk_test_dev | Test mode by default |
| STRIPE_WEBHOOK_SECRET | whsec_dev | Required for signature verification |
| STRIPE_PRICE_STARTER / STRIPE_PRICE_PRO | placeholders | Real Stripe Price IDs to enable upgrades |
# Edit .env
OPENAI_API_KEY=sk-proj-your-real-key
docker compose restart backend
./test-data/run_pipeline.sh
# Now R2/R3/R5 will fire real failures on the fraudulent dossier

Implemented vs. proposed: full discussion in REPORT.md §7.
| Status | Item |
|---|---|
| ✅ | 5-layer pipeline with LangGraph orchestration |
| ✅ | Real-time SSE progress |
| ✅ | Multi-tenancy + audit log |
| ✅ | Forensic Ledger design system + light/dark themes |
| ✅ | ⌘K command-palette search |
| ✅ | Demo data seeder (25 dossiers) |
| ✅ | Stripe billing (test mode) |
| ✅ | Docker Compose deployment |
| 🟡 | Supabase real auth (currently dev bypass) |
| 🟡 | Multi-page PDF support |
| 🟡 | OCR offline fallback |
| 🟡 | Forensic CNN (MesoNet / EfficientNet) |
| 🟡 | Active learning loop with XGBoost final classifier |
| 🟡 | Local fine-tuned LLaVA for cost reduction |
| 🟡 | Kafka + Spark streaming architecture |
| 🟡 | Adversarial robustness evaluation |
MIT — see LICENSE (to be added). Free for academic use; commercial use requires permission.
[Your name] — Master Big Data & Intelligence Artificielle, 2025–2026.
Project supervised by [Supervisor]. See REPORT.md for the full PFE report.