
# FraudScan AI

*Document Intelligence. Zero Trust.*

Forensic document fraud detection for banks and accounting firms — multi-layer AI pipeline, editorial-grade UI, light + dark themes.



## Overview

FraudScan AI processes a complete credit dossier (CIN + payslip + bank statement + RIB) in under 5 seconds and produces an explainable verdict at three levels:

| Verdict | Score | Action |
|---|---|---|
| 🟢 Clean | ≥ 75 / 100 | Proceed |
| 🟡 Suspicious | 50 – 74 | Manual review |
| 🔴 Fraudulent | < 50 | Reject + flag |

The engine combines five independent forensic layers orchestrated by LangGraph:

```
┌──────────┐   ┌────────────┐   ┌───────────┐   ┌─────────────┐   ┌─────────┐
│ Ingestion│ → │ Forensique │ → │  Vision   │ → │ Cross-check │ → │ Scoring │
└──────────┘   └────────────┘   └───────────┘   └─────────────┘   └─────────┘
PDF → JPG       ELA + ORB        GPT-4o          5 fuzzy rules     0–100
                clone detect     extraction      identity/income   verdict
```

Every decision arrives with a plain-language report, a clickable evidence trail, and per-document forensic heatmaps — defensible to regulators, comprehensible to analysts.




## What's inside

| Layer | Capability | Implementation |
|---|---|---|
| Frontend | Marketing site, signup, dashboard, dossiers, analytics, billing, settings | Next.js 14 App Router, TypeScript, Tailwind, Framer Motion, Recharts |
| Theme system | Light + dark via CSS variables, no flash on reload | Editorial Forensic Ledger palette (parchment / ink) |
| Search | ⌘K command palette, debounced, categorized | Server-side LIKE across reference, name, ID, filename |
| Backend | REST + SSE, JWT auth, Stripe webhooks | FastAPI 0.111, SQLAlchemy 2.0 async |
| AI pipeline | Five-layer LangGraph orchestration | OpenCV ELA + ORB, GPT-4o Vision, FuzzyWuzzy rules |
| Multi-tenancy | Org-scoped data, audit log | All queries filter by organization_id |
| Storage | S3-compatible, self-hosted | MinIO + auto-bucket on boot |
| Demo data | 25 seeded dossiers across all verdicts on first boot | services/seed.py with realistic Moroccan applicants |
| Test data | Two scenarios (clean + fraudulent), 8 generated PDFs | test-data/generate.py |
| Deliverables | Word report + PowerPoint deck (PFE Big Data & IA) | docs/build_report.py, docs/build_deck.py |

## Tech stack

### Backend

  • Python 3.11 · FastAPI 0.111 · SQLAlchemy 2.0 (async) · Pydantic 2.7
  • AI / Vision: LangGraph 0.1, OpenAI Python SDK, OpenCV 4.10 (headless), Pillow 10.3, pdf2image
  • Cross-check: FuzzyWuzzy + python-Levenshtein
  • Storage / Queue: MinIO 7.2 client, Redis 5.0
  • Auth / Billing: python-jose (JWT), Stripe 9.11
  • Streaming: sse-starlette

### Frontend

  • Next.js 14 App Router · TypeScript 5.5
  • TailwindCSS 3.4 with CSS-variable-driven theming + tailwindcss-animate
  • Framer Motion 11 for component motion (gauge sweep, stamp reveal)
  • Recharts 2.12 themed via CSS vars (responds to light/dark)
  • Radix UI primitives (Tabs, Dialog, Tooltip)
  • Lucide React icons
  • Typography: Fraunces (display serif), IBM Plex Sans (body), JetBrains Mono (data)

### Infrastructure

  • PostgreSQL 16 · MinIO (S3-compatible) · Redis 7
  • Docker Compose orchestration with health checks
  • All five services on a single bridge network

## Quick start

### 1. Clone and launch

```bash
git clone https://github.com/theChefEngineer/fraude-detection.git
cd fraude-detection
docker compose up -d
```

That's it. After ~60 seconds the full stack is live:

| Service | URL | Credentials |
|---|---|---|
| Frontend | http://localhost:3000 | dev-bypass enabled |
| Backend API | http://localhost:8000 | |
| OpenAPI docs | http://localhost:8000/docs | |
| MinIO console | http://localhost:9001 | minioadmin / minioadmin123 |
| Postgres | localhost:5432 | fraud / fraud123 |
| Redis | localhost:6379 | |

### 2. First-run smoke test

The backend auto-seeds 25 realistic dossiers + 2 in-flight cases on first boot, so the dashboard, list, and analytics are populated immediately.

  1. Open http://localhost:3000 → click Start free → land on /dashboard
  2. Browse seeded cases at http://localhost:3000/dossiers
  3. Click any dossier → see verdict stamp, chronograph dial, cross-check table, document tabs with ELA heatmap
  4. Hit ⌘K (or Ctrl+K) → search by applicant name, reference, or filename
  5. Toggle light/dark via the switch in the top right

### 3. Run the pipeline on the test PDFs

```bash
./test-data/run_pipeline.sh
```

This script creates two dossiers (clean + fraudulent), uploads 4 PDFs each, kicks off the analysis, polls until done, and prints the verdict + cross-check breakdown for both.

### 4. Stop and reset

```bash
docker compose down              # stop, keep data
docker compose down -v           # stop and wipe volumes (re-seeds on next boot)
```

## Design system — Forensic Ledger

A deliberate departure from the generic cyan-glow-on-black AI aesthetic. The interface is positioned as a publication, not a SaaS template: museum-quality forensic case file × Bloomberg Terminal × The Economist editorial.

### Palette — light theme (default)

| Role | Hex | Use |
|---|---|---|
| Background | #F5F1E8 | Warm parchment |
| Surface | #FDFBF5 | Paper white |
| Elevated | #FFFFFF | Cards |
| Ink | #1A1D24 | Body text, charcoal |
| Accent | #1E3A5F | Deep fountain-pen blue |
| Crimson | #A31D3A | Fraudulent verdict |
| Ochre | #A86C1D | Suspicious verdict |
| Forest | #2F6646 | Clean verdict |

### Palette — dark theme

| Role | Hex | Use |
|---|---|---|
| Background | #161618 | Warm graphite (not black) |
| Ink | #ECE8DF | Parchment text (not white) |
| Accent | #9BB8D8 | Lifted ink-blue |

### Typography

  • Display: Fraunces — variable optical-size serif, used for headlines, verdict stamps, gauge numerals
  • Body: IBM Plex Sans — institutional, drawn for IBM, characterful
  • Mono: JetBrains Mono — tabular numerals for data, scores, references

### Signature components

| Component | Description |
|---|---|
| VerdictStamp | Editorial italic word ("FRAUDULENT") inside a double-rule frame, rotated −2°, animated stamp-in keyframe |
| TrustScoreGauge | Chronograph dial — major/minor tick marks every 2 units, numeric scale 0/20/40/60/80/100, central Fraunces numeral |
| Cross-check table | Hairline borders, ink-color status indicators, no glow shadows |
| Pipeline progress | 5-step strip with active-stage pulse, real-time SSE progression |
| Search palette | ⌘K-triggered popover, 180 ms debounce, categorized results |

All colors flow through CSS variables — toggling the theme switches the entire app coherently in one paint frame.


## Features in depth

### AI pipeline

Each layer is documented in its own file and can be swapped independently.

#### Layer 1 — Ingestion (`backend/services/pipeline/ingestion.py`)

  • PDF → JPEG via pdf2image (200 DPI, first page)
  • Resize to 2048 px max, JPEG quality 92
  • Output stored in MinIO under {org_id}/{dossier_id}/normalized_{doc_id}.jpg
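The 2048 px cap preserves aspect ratio: only the longer side is clamped, and scans already within bounds pass through untouched. A minimal sketch of that dimension math (the helper name is ours, not the module's):

```python
def normalized_size(width: int, height: int, max_side: int = 2048) -> tuple[int, int]:
    """Compute output dimensions for Layer 1's resize step.

    Scales the longer side down to `max_side`, preserving aspect ratio.
    Illustrative only; the real logic lives in ingestion.py.
    """
    longest = max(width, height)
    if longest <= max_side:
        return width, height          # already small enough
    scale = max_side / longest
    return round(width * scale), round(height * scale)
```

For a 200 DPI A4 page (roughly 1654 × 2339 px) this keeps the page as-is; a 600 DPI rescan would be halved or more before forensics runs.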

#### Layer 2 — Forensics (`backend/services/pipeline/forensics.py`)

  • Error Level Analysis (ELA) at q=90 — re-saves and diffs to expose splices
  • ORB keypoint clone detection — finds copy-move regions
  • Laplacian variance noise score
  • Outputs heatmap PNG for the UI overlay
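The noise score in the third bullet is the classic variance-of-Laplacian sharpness measure (in OpenCV, `cv2.Laplacian(img, cv2.CV_64F).var()`). A dependency-free sketch on a 2-D grid of grayscale values, assuming the standard 4-neighbour kernel:

```python
def laplacian_variance(gray: list[list[float]]) -> float:
    """Variance of the 4-neighbour Laplacian response over interior pixels.

    Abnormally low variance suggests blur or re-compression smoothing,
    one of the signals Layer 2 folds into the forensics score. This is a
    pure-Python stand-in for OpenCV's Laplacian, not the project's code.
    """
    h, w = len(gray), len(gray[0])
    responses = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Kernel: [[0,1,0],[1,-4,1],[0,1,0]]
            lap = (gray[y - 1][x] + gray[y + 1][x]
                   + gray[y][x - 1] + gray[y][x + 1]
                   - 4 * gray[y][x])
            responses.append(lap)
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)
```

A perfectly flat region scores 0; high-frequency texture scores high, which is why pasted-in smooth patches stand out against genuine scan noise.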

#### Layer 3 — Vision (`backend/services/pipeline/extraction.py`)

  • GPT-4o Vision with strict JSON-schema prompt
  • Returns extracted fields + semantic anomalies + AI-generated probability
  • Graceful degradation — falls back to a deterministic stub if the API key is missing or invalid

#### Layer 4 — Cross-check (`backend/services/pipeline/cross_check.py`)

Five rules implemented as pure Python functions:

| Rule | Logic | Threshold |
|---|---|---|
| R1 Identity match | `fuzz.token_sort_ratio` across all docs | ≥ 85 % |
| R2 Income verification | `\|payslip_net − bank_credit\| / payslip_net` | ≤ 10 % |
| R3 Employer consistency | `fuzz.partial_ratio(employer, transfer_label)` | ≥ 70 % |
| R4 Date logic | `payslip_date ≤ credit_date ≤ today` | strict |
| R5 RIB holder vs CIN | fuzzy name match | ≥ 85 % |
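R1 and R2 can be sketched with the standard library alone, using `difflib` as a stand-in for FuzzyWuzzy's `token_sort_ratio` (the real rules live in `cross_check.py`; function names here are illustrative):

```python
from difflib import SequenceMatcher

def token_sort_ratio(a: str, b: str) -> float:
    """Stdlib stand-in for fuzz.token_sort_ratio: lowercase, sort tokens,
    then compare, so 'EL IDRISSI Mehdi' matches 'Mehdi El Idrissi'."""
    norm = lambda s: " ".join(sorted(s.lower().split()))
    return 100.0 * SequenceMatcher(None, norm(a), norm(b)).ratio()

def r1_identity_match(names: list[str], threshold: float = 85.0) -> bool:
    """R1: every pairwise name comparison across documents clears 85 %."""
    return all(token_sort_ratio(a, b) >= threshold
               for i, a in enumerate(names) for b in names[i + 1:])

def r2_income_verify(payslip_net: float, bank_credit: float,
                     tolerance: float = 0.10) -> bool:
    """R2: salary credit on the statement within 10 % of the payslip net."""
    return abs(payslip_net - bank_credit) / payslip_net <= tolerance
```

Each rule being a pure function is what makes the layer swappable: thresholds are plain arguments, and the result maps directly onto a `CrossCheckResult` row.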

#### Layer 5 — Scoring (`backend/services/pipeline/scoring.py`)

```python
score = forensics_avg * 0.35 + vision_avg * 0.35 + crosscheck_score * 0.30
verdict = "clean" if score >= 75 else "suspicious" if score >= 50 else "fraudulent"
```

Weights determined by Grid Search over 1331 triplets — see REPORT.md §4.5.
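1331 = 11³, i.e. an 11-point grid per weight. Under that assumption, such a search might look like the following sketch; the objective function is a placeholder, not the report's actual validation metric:

```python
from itertools import product

def grid_search_weights(score_fn):
    """Enumerate the 11^3 = 1331 weight triplets on a 0.1-step grid and
    keep the best-scoring one whose weights sum to 1.

    `score_fn(w)` stands in for a validation metric, e.g. accuracy on a
    labelled dossier set. Illustrative only; see REPORT.md §4.5 for the
    actual procedure.
    """
    grid = [round(i / 10, 1) for i in range(11)]        # 0.0 .. 1.0
    candidates = [w for w in product(grid, repeat=3)
                  if abs(sum(w) - 1.0) < 1e-9]          # valid convex combos
    return max(candidates, key=score_fn)
```

Only the triplets summing to 1 are actually evaluated; the 1331 figure counts the full grid before that constraint.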

### Real-time progress (SSE)

The pipeline emits progress events to a per-dossier asyncio.Queue, exposed via /api/dossiers/{id}/stream as Server-Sent Events. The frontend <PipelineProgress /> component subscribes and renders the 5-step strip live with stage transitions and percentage.
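The queue-per-dossier pattern can be sketched as follows (registry and function names are ours; in the real app, sse-starlette's `EventSourceResponse` wraps the subscriber side):

```python
import asyncio

# Hypothetical in-process registry: one queue per dossier id.
_queues: dict[str, asyncio.Queue] = {}

def publish(dossier_id: str, stage: str, percent: int) -> None:
    """Called by the pipeline after each layer completes."""
    q = _queues.setdefault(dossier_id, asyncio.Queue())
    q.put_nowait({"stage": stage, "percent": percent})

async def subscribe(dossier_id: str):
    """Async generator the SSE endpoint iterates over; each yielded dict
    becomes one Server-Sent Event on /api/dossiers/{id}/stream."""
    q = _queues.setdefault(dossier_id, asyncio.Queue())
    while True:
        event = await q.get()
        yield event
        if event["percent"] >= 100:
            break                      # pipeline done, close the stream

async def demo():
    publish("d1", "ingestion", 20)
    publish("d1", "scoring", 100)
    return [e async for e in subscribe("d1")]
```

Because the queue is per dossier, two analysts watching different cases never see each other's events, which also keeps the stream tenant-safe.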

### Search palette (⌘K)

Press ⌘K (or Ctrl+K) anywhere in the dashboard. Searches across:

  • Dossier reference (e.g. DOSS-2026-001A)
  • Applicant name (e.g. Mehdi)
  • Applicant ID number (e.g. AB123456)
  • Document filename (e.g. payslip_mehdi)

Results are categorized (Dossiers / Documents), keyboard-navigable, and tenant-scoped.
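Server-side, the matching reduces to a case-insensitive `LIKE '%…%'` over four columns. A stdlib sketch of the same behaviour on plain dicts (field names mirror the schema; the function itself is illustrative, the real endpoint is `/api/search`):

```python
def search(q: str, dossiers: list[dict], documents: list[dict]) -> dict:
    """Categorized substring search over reference, name, ID number,
    and filename -- a Python mirror of the backend's SQL LIKE queries.
    Tenant scoping (organization_id filter) is assumed to happen before
    these lists are built.
    """
    needle = q.lower()
    hit = lambda *fields: any(needle in (f or "").lower() for f in fields)
    return {
        "dossiers": [d for d in dossiers
                     if hit(d.get("reference"), d.get("applicant_name"),
                            d.get("applicant_id_number"))],
        "documents": [d for d in documents
                      if hit(d.get("original_filename"))],
    }
```

The two result buckets map one-to-one onto the palette's "Dossiers / Documents" categories.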

### Multi-tenancy

Every model has an organization_id foreign key, every endpoint depends on get_current_organization, and the dev tenant (dev-org) is auto-provisioned at boot when DEV_AUTH_BYPASS=true. Production swaps in real Supabase JWTs (HS256) by setting DEV_AUTH_BYPASS=false and providing SUPABASE_JWT_SECRET.

### Demo seeding

On first boot with an empty database, backend/services/seed.py populates:

  • 25 completed dossiers with realistic Moroccan applicant names (Mehdi El Idrissi, Yasmine Tazi, Karim Benali, …)
  • 2 in-flight dossiers (status = processing) with live progress bars
  • Per-document extracted fields, anomalies, trust scores, forensics scores
  • Per-dossier cross-check results matching the verdict (passed for clean, mixed for suspicious, multiple failures for fraudulent)
  • Synthetic preview images + heatmap PNG stored in MinIO so document tabs render

The seeder is idempotent — only runs when the org has 0 dossiers. Disable with SEED_DEMO_DATA=false.


## Project structure

```
fraude-detection/
├── README.md                      # this file
├── REPORT.md                      # PFE report (Markdown source)
├── docker-compose.yml             # full stack orchestration
├── .env / .env.example            # configuration
│
├── backend/                       # FastAPI + LangGraph + SQLAlchemy
│   ├── Dockerfile
│   ├── requirements.txt
│   ├── main.py                    # app entrypoint + lifespan
│   ├── config.py                  # pydantic settings
│   ├── database.py                # async engine + session
│   ├── models/                    # SQLAlchemy models
│   │   ├── organization.py
│   │   ├── user.py
│   │   ├── dossier.py
│   │   ├── document.py
│   │   └── audit_log.py
│   ├── routers/                   # FastAPI routers
│   │   ├── auth.py
│   │   ├── organizations.py
│   │   ├── dossiers.py
│   │   ├── documents.py
│   │   ├── analysis.py            # analytics endpoints
│   │   ├── billing.py             # Stripe checkout + webhook
│   │   └── search.py              # global search
│   ├── schemas/                   # Pydantic DTOs
│   ├── services/
│   │   ├── auth.py                # JWT dependency + dev bypass
│   │   ├── audit.py               # audit log helper
│   │   ├── bootstrap.py           # dev tenant provisioning
│   │   ├── seed.py                # demo data seeder (25+2 dossiers)
│   │   ├── storage.py             # MinIO client
│   │   ├── stripe_service.py      # Stripe wrapper
│   │   └── pipeline/              # the AI engine
│   │       ├── orchestrator.py    # LangGraph StateGraph
│   │       ├── ingestion.py       # PDF → JPG normalization
│   │       ├── forensics.py       # ELA + ORB clone detection
│   │       ├── extraction.py      # GPT-4o Vision
│   │       ├── cross_check.py     # 5 rules engine
│   │       ├── scoring.py         # weighted aggregation
│   │       └── progress.py        # SSE pub/sub
│   └── migrations/
│
├── frontend/                      # Next.js 14 App Router
│   ├── Dockerfile
│   ├── package.json
│   ├── tailwind.config.ts         # Forensic Ledger tokens
│   ├── next.config.mjs
│   ├── tsconfig.json
│   ├── public/
│   │   └── theme-init.js          # pre-paint theme bootstrap
│   └── src/
│       ├── app/
│       │   ├── globals.css        # CSS variables for both themes
│       │   ├── layout.tsx         # root + ThemeProvider
│       │   ├── page.tsx           # landing page
│       │   ├── (auth)/
│       │   │   ├── login/
│       │   │   └── signup/
│       │   └── (dashboard)/
│       │       ├── layout.tsx     # sidebar + topbar
│       │       ├── dashboard/
│       │       ├── dossiers/
│       │       │   ├── page.tsx
│       │       │   ├── new/page.tsx          # 3-step wizard
│       │       │   └── [id]/page.tsx         # detail with verdict stamp
│       │       ├── documents/[id]/
│       │       ├── analytics/
│       │       ├── settings/
│       │       └── billing/
│       ├── components/
│       │   ├── ui/                 # button, card, badge, input, ThemeToggle
│       │   ├── layout/             # Sidebar, TopBar, Logo, SearchPalette
│       │   ├── landing/            # Hero, Features, Pricing, Footer, Nav
│       │   ├── dossiers/           # DossierCard, DossierTable
│       │   ├── documents/          # UploadZone, DocumentViewer
│       │   ├── analysis/           # VerdictStamp, TrustScoreGauge, …
│       │   └── charts/             # FraudDistributionChart, ProcessingVolumeChart
│       └── lib/
│           ├── api.ts              # typed API client
│           ├── auth.ts             # client-side session helpers
│           ├── theme.tsx           # ThemeProvider + useTheme
│           ├── types.ts
│           └── utils.ts
│
├── test-data/                     # generated test PDFs + runner
│   ├── generate.py                # PIL-based document generator
│   ├── run_pipeline.sh            # end-to-end test runner
│   ├── clean/                     # 4 PDFs of a coherent dossier
│   └── fraudulent/                # 4 PDFs with deliberate inconsistencies
│
└── docs/                          # PFE deliverables
    ├── build_report.py            # Word document generator
    ├── build_deck.py              # PowerPoint deck generator
    ├── FraudScan_AI_Rapport_PFE.docx
    └── FraudScan_AI_Presentation_PFE.pptx
```

## API reference

Full OpenAPI spec at http://localhost:8000/docs once running. Highlights:

| Endpoint | Method | Description |
|---|---|---|
| `/health` | GET | Health probe |
| `/api/auth/me` | GET | Current user + org |
| `/api/organizations/current` | GET | Org info, plan, usage |
| `/api/dossiers` | GET, POST | List + create |
| `/api/dossiers/{id}` | GET | Full results (verdict, score, cross-checks, documents) |
| `/api/dossiers/{id}/analyze` | POST | Trigger pipeline (background task) |
| `/api/dossiers/{id}/status` | GET | Poll status |
| `/api/dossiers/{id}/stream` | GET (SSE) | Real-time progress stream |
| `/api/documents/{id}` | POST | Upload one or more files |
| `/api/documents/{id}/preview` | GET | Normalized JPEG |
| `/api/documents/{id}/heatmap` | GET | ELA heatmap PNG |
| `/api/analytics/dashboard` | GET | Aggregated stats |
| `/api/search?q=…` | GET | Global search across dossiers + documents |
| `/api/billing/create-checkout` | POST | Stripe checkout session |
| `/api/billing/webhook` | POST | Stripe webhook (signed) |

All endpoints are tenant-scoped via get_current_organization dependency.


## Database schema

```
Organization (multi-tenant root)
├─ id (UUID, PK)
├─ slug, name, plan (free/starter/pro/enterprise)
├─ stripe_customer_id, stripe_subscription_id
├─ monthly_dossier_limit, dossiers_used_this_month
└─ created_at

User
├─ id (UUID, PK)
├─ organization_id (FK → Organization)
├─ email (unique), full_name, role
└─ supabase_id, created_at

Dossier
├─ id, organization_id (FK), reference (DOSS-YYYY-XXXX)
├─ applicant_name, applicant_id_number, case_type
├─ status (pending/processing/completed/failed)
├─ verdict (clean/suspicious/fraudulent)
├─ global_trust_score, processing_time_ms
├─ progress_stage, progress_percent (live SSE)
├─ summary_text
└─ created_at, completed_at

Document
├─ id, dossier_id (FK)
├─ document_type (CIN/PASSPORT/PAYSLIP/BANK_STATEMENT/RIB/KBIS/OTHER)
├─ original_filename, minio_key, image_minio_key, heatmap_minio_key
├─ file_size_bytes, mime_type, status
├─ trust_score, forensics_score
├─ extraction_data (JSONB), anomalies (JSONB)
└─ processing_time_ms, created_at

CrossCheckResult
├─ id, dossier_id (FK)
├─ rule_id (R1_IDENTITY_MATCH, R2_INCOME_VERIFY, …)
├─ rule_name, status (passed/failed/warning/skipped)
├─ confidence (0–100), details
├─ doc_a_id, doc_b_id (which docs were compared)
└─ created_at

AuditLog
├─ id, organization_id (FK), user_id
├─ action, entity_type, entity_id
├─ metadata (JSONB), ip_address
└─ created_at
```

## Deliverables

The docs/ folder contains generators for the PFE Big Data & IA submission:

  • FraudScan_AI_Rapport_PFE.docx — full report following the supervisor's directives (problem understanding, data, EDA, modelling, evaluation, industrialisation, limitations, bibliography, annexes). Editorial styling, tables with ink-blue headers, code blocks, hairline rules.
  • FraudScan_AI_Presentation_PFE.pptx — 23-slide widescreen deck (16:9) with cover, agenda, chapter dividers, comparison tables, confusion matrix, ablation study, demo flow, "Merci" closing.

Regenerate after edits:

```bash
cd docs
python3 build_report.py    # → .docx
python3 build_deck.py      # → .pptx
```

Both scripts use only python-docx + python-pptx — no external assets.


## Testing

### Compile-check the backend

```bash
cd backend && python3 -m compileall -q . && echo OK
```

### Type-check the frontend

```bash
cd frontend && npm run typecheck
```

### End-to-end pipeline

```bash
./test-data/run_pipeline.sh
```

Runs the full pipeline on both the clean and fraudulent test scenarios, prints verdict, score, latency, and cross-check breakdown.

### Generate fresh test documents

```bash
docker cp test-data/generate.py fraudscan-backend:/tmp/generate.py
docker exec fraudscan-backend python /tmp/generate.py
docker cp fraudscan-backend:/tmp/clean       test-data/_new_clean
docker cp fraudscan-backend:/tmp/fraudulent  test-data/_new_fraud
```

## Configuration

All configuration via .env at the project root. Defaults are dev-friendly.

| Variable | Default | Notes |
|---|---|---|
| `DATABASE_URL` | `postgresql+asyncpg://fraud:fraud123@postgres:5432/fraudscan` | |
| `MINIO_ENDPOINT` | `minio:9000` | |
| `MINIO_BUCKET` | `fraudscan-documents` | Auto-created at boot |
| `REDIS_URL` | `redis://redis:6379` | |
| `OPENAI_API_KEY` | `sk-replace-me` | Set this for real GPT-4o Vision. Pipeline uses stub when missing. |
| `OPENAI_VISION_MODEL` | `gpt-4o` | |
| `SUPABASE_JWT_SECRET` | `dev-secret-…` | HS256 |
| `DEV_AUTH_BYPASS` | `true` | Disable in production |
| `SEED_DEMO_DATA` | `true` | Disable to skip seeding |
| `STRIPE_SECRET_KEY` | `sk_test_dev` | Test mode by default |
| `STRIPE_WEBHOOK_SECRET` | `whsec_dev` | Required for signature verification |
| `STRIPE_PRICE_STARTER` / `STRIPE_PRICE_PRO` | placeholders | Real Stripe Price IDs to enable upgrades |

### Enable real GPT-4o Vision

```bash
# Edit .env
OPENAI_API_KEY=sk-proj-your-real-key

docker compose restart backend
./test-data/run_pipeline.sh
# Now R2/R3/R5 will fire real failures on the fraudulent dossier
```

## Roadmap

Implemented vs. proposed — full discussion in REPORT.md §7.

  • ✅ 5-layer pipeline with LangGraph orchestration
  • ✅ Real-time SSE progress
  • ✅ Multi-tenancy + audit log
  • ✅ Forensic Ledger design system + light/dark themes
  • ✅ ⌘K command-palette search
  • ✅ Demo data seeder (25 dossiers)
  • ✅ Stripe billing (test mode)
  • ✅ Docker Compose deployment
  • 🟡 Supabase real auth (currently dev bypass)
  • 🟡 Multi-page PDF support
  • 🟡 OCR offline fallback
  • 🟡 Forensic CNN (MesoNet / EfficientNet)
  • 🟡 Active learning loop with XGBoost final classifier
  • 🟡 Local fine-tuned LLaVA for cost reduction
  • 🟡 Kafka + Spark streaming architecture
  • 🟡 Adversarial robustness evaluation

## License

MIT — see LICENSE (to be added). Free for academic use; commercial use requires permission.


## Author

[Your name] — Master Big Data & Intelligence Artificielle, 2025–2026. Project supervised by [Encadrant]. See REPORT.md for the full PFE report.

