
# FraudScan AI

*Document Intelligence. Zero Trust.*

Forensic document fraud detection for banks and accounting firms — multi-layer AI pipeline, editorial-grade UI, light + dark themes.



## Overview

FraudScan AI processes a complete credit dossier (CIN + payslip + bank statement + RIB) in under 5 seconds and produces an explainable verdict at three levels:

| Verdict | Score | Action |
|---|---|---|
| 🟢 Clean | ≥ 75 / 100 | Proceed |
| 🟡 Suspicious | 50 – 74 | Manual review |
| 🔴 Fraudulent | < 50 | Reject + flag |

The engine combines five independent forensic layers orchestrated by LangGraph:

```
┌──────────┐   ┌────────────┐   ┌───────────┐   ┌─────────────┐   ┌─────────┐
│ Ingestion│ → │ Forensique │ → │  Vision   │ → │ Cross-check │ → │ Scoring │
└──────────┘   └────────────┘   └───────────┘   └─────────────┘   └─────────┘
PDF → JPG       ELA + ORB        GPT-4o          5 fuzzy rules     0–100
                clone detect     extraction      identity/income   verdict
```

Every decision arrives with a plain-language report, a clickable evidence trail, and per-document forensic heatmaps — defensible to regulators, comprehensible to analysts.




## What's inside

| Layer | Capability | Implementation |
|---|---|---|
| Frontend | Marketing site, signup, dashboard, dossiers, analytics, billing, settings | Next.js 14 App Router, TypeScript, Tailwind, Framer Motion, Recharts |
| Theme system | Light + dark via CSS variables, no flash on reload | Editorial Forensic Ledger palette (parchment / ink) |
| Search | ⌘K command palette, debounced, categorized | Server-side LIKE across reference, name, ID, filename |
| Backend | REST + SSE, JWT auth, Stripe webhooks | FastAPI 0.111, SQLAlchemy 2.0 async |
| AI pipeline | Five-layer LangGraph orchestration | OpenCV ELA + ORB, GPT-4o Vision, FuzzyWuzzy rules |
| Multi-tenancy | Org-scoped data, audit log | All queries filter by organization_id |
| Storage | S3-compatible, self-hosted | MinIO + auto-bucket on boot |
| Demo data | 25 seeded dossiers across all verdicts on first boot | services/seed.py with realistic Moroccan applicants |
| Test data | Two scenarios (clean + fraudulent), 8 generated PDFs | test-data/generate.py |
| Deliverables | Word report + PowerPoint deck (PFE Big Data & IA) | docs/build_report.py, docs/build_deck.py |

## Tech stack

### Backend

  • Python 3.11 · FastAPI 0.111 · SQLAlchemy 2.0 (async) · Pydantic 2.7
  • AI / Vision: LangGraph 0.1, OpenAI Python SDK, OpenCV 4.10 (headless), Pillow 10.3, pdf2image
  • Cross-check: FuzzyWuzzy + python-Levenshtein
  • Storage / Queue: MinIO 7.2 client, Redis 5.0
  • Auth / Billing: python-jose (JWT), Stripe 9.11
  • Streaming: sse-starlette

### Frontend

  • Next.js 14 App Router · TypeScript 5.5
  • TailwindCSS 3.4 with CSS-variable-driven theming + tailwindcss-animate
  • Framer Motion 11 for component motion (gauge sweep, stamp reveal)
  • Recharts 2.12 themed via CSS vars (responds to light/dark)
  • Radix UI primitives (Tabs, Dialog, Tooltip)
  • Lucide React icons
  • Typography: Fraunces (display serif), IBM Plex Sans (body), JetBrains Mono (data)

### Infrastructure

  • PostgreSQL 16 · MinIO (S3-compatible) · Redis 7
  • Docker Compose orchestration with health checks
  • All five services on a single bridge network

## Quick start

### 1. Clone and launch

```bash
git clone https://github.com/theChefEngineer/fraude-detection.git
cd fraude-detection
docker compose up -d
```

That's it. After ~60 seconds the full stack is live:

| Service | URL | Credentials |
|---|---|---|
| Frontend | http://localhost:3000 | dev-bypass enabled |
| Backend API | http://localhost:8000 | |
| OpenAPI docs | http://localhost:8000/docs | |
| MinIO console | http://localhost:9001 | minioadmin / minioadmin123 |
| Postgres | localhost:5432 | fraud / fraud123 |
| Redis | localhost:6379 | |

### 2. First-run smoke test

The backend auto-seeds 25 realistic dossiers + 2 in-flight cases on first boot, so the dashboard, list, and analytics are populated immediately.

  1. Open http://localhost:3000 → click Start free → land on /dashboard
  2. Browse seeded cases at http://localhost:3000/dossiers
  3. Click any dossier → see verdict stamp, chronograph dial, cross-check table, document tabs with ELA heatmap
  4. Hit ⌘K (or Ctrl+K) → search by applicant name, reference, or filename
  5. Toggle light/dark via the switch in the top right

### 3. Run the pipeline on the test PDFs

```bash
./test-data/run_pipeline.sh
```

This script creates two dossiers (clean + fraudulent), uploads 4 PDFs each, kicks off the analysis, polls until done, and prints the verdict + cross-check breakdown for both.

### 4. Stop and reset

```bash
docker compose down              # stop, keep data
docker compose down -v           # stop and wipe volumes (re-seeds on next boot)
```

## Design system — Forensic Ledger

A deliberate departure from the generic cyan-glow-on-black AI aesthetic. The interface is positioned as a publication, not a SaaS template: museum-quality forensic case file × Bloomberg Terminal × The Economist editorial.

### Palette — light theme (default)

| Role | Hex | Use |
|---|---|---|
| Background | #F5F1E8 | Warm parchment |
| Surface | #FDFBF5 | Paper white |
| Elevated | #FFFFFF | Cards |
| Ink | #1A1D24 | Body text, charcoal |
| Accent | #1E3A5F | Deep fountain-pen blue |
| Crimson | #A31D3A | Fraudulent verdict |
| Ochre | #A86C1D | Suspicious verdict |
| Forest | #2F6646 | Clean verdict |

### Palette — dark theme

| Role | Hex | Use |
|---|---|---|
| Background | #161618 | Warm graphite (not black) |
| Ink | #ECE8DF | Parchment text (not white) |
| Accent | #9BB8D8 | Lifted ink-blue |

### Typography

  • Display: Fraunces — variable optical-size serif, used for headlines, verdict stamps, gauge numerals
  • Body: IBM Plex Sans — institutional, drawn for IBM, characterful
  • Mono: JetBrains Mono — tabular numerals for data, scores, references

### Signature components

| Component | Description |
|---|---|
| VerdictStamp | Editorial italic word ("FRAUDULENT") inside a double-rule frame, rotated −2°, animated stamp-in keyframe |
| TrustScoreGauge | Chronograph dial — major/minor tick marks every 2 units, numeric scale 0/20/40/60/80/100, central Fraunces numeral |
| Cross-check table | Hairline borders, ink-color status indicators, no glow shadows |
| Pipeline progress | 5-step strip with active-stage pulse, real-time SSE progression |
| Search palette | ⌘K-triggered popover, 180 ms debounce, categorized results |

All colors flow through CSS variables — toggling the theme switches the entire app coherently in one paint frame.


## Features in depth

### AI pipeline

Each layer is documented in its own file and can be swapped independently.

#### Layer 1 — Ingestion (`backend/services/pipeline/ingestion.py`)

  • PDF → JPEG via pdf2image (200 DPI, first page)
  • Resize to 2048 px max, JPEG quality 92
  • Output stored in MinIO under {org_id}/{dossier_id}/normalized_{doc_id}.jpg
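The 2048 px cap preserves aspect ratio: only the longer side is clamped, and scans already within bounds pass through untouched. A minimal sketch of that dimension math (the helper name is ours, not the module's):

```python
def normalized_size(width: int, height: int, max_side: int = 2048) -> tuple[int, int]:
    """Compute output dimensions for Layer 1's resize step.

    Scales the longer side down to `max_side`, preserving aspect ratio.
    Illustrative only; the real logic lives in ingestion.py.
    """
    longest = max(width, height)
    if longest <= max_side:
        return width, height          # already small enough
    scale = max_side / longest
    return round(width * scale), round(height * scale)
```

For a 200 DPI A4 page (roughly 1654 × 2339 px) this keeps the page as-is; a 600 DPI rescan would be halved or more before forensics runs.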

#### Layer 2 — Forensics (`backend/services/pipeline/forensics.py`)

  • Error Level Analysis (ELA) at q=90 — re-saves and diffs to expose splices
  • ORB keypoint clone detection — finds copy-move regions
  • Laplacian variance noise score
  • Outputs heatmap PNG for the UI overlay
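The noise score in the third bullet is the classic variance-of-Laplacian sharpness measure (in OpenCV, `cv2.Laplacian(img, cv2.CV_64F).var()`). A dependency-free sketch on a 2-D grid of grayscale values, assuming the standard 4-neighbour kernel:

```python
def laplacian_variance(gray: list[list[float]]) -> float:
    """Variance of the 4-neighbour Laplacian response over interior pixels.

    Abnormally low variance suggests blur or re-compression smoothing,
    one of the signals Layer 2 folds into the forensics score. This is a
    pure-Python stand-in for OpenCV's Laplacian, not the project's code.
    """
    h, w = len(gray), len(gray[0])
    responses = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Kernel: [[0,1,0],[1,-4,1],[0,1,0]]
            lap = (gray[y - 1][x] + gray[y + 1][x]
                   + gray[y][x - 1] + gray[y][x + 1]
                   - 4 * gray[y][x])
            responses.append(lap)
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)
```

A perfectly flat region scores 0; high-frequency texture scores high, which is why pasted-in smooth patches stand out against genuine scan noise.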

#### Layer 3 — Vision (`backend/services/pipeline/extraction.py`)

  • GPT-4o Vision with strict JSON-schema prompt
  • Returns extracted fields + semantic anomalies + AI-generated probability
  • Graceful degradation — falls back to a deterministic stub if the API key is missing or invalid

#### Layer 4 — Cross-check (`backend/services/pipeline/cross_check.py`)

Five rules implemented as pure Python functions:

| Rule | Logic | Threshold |
|---|---|---|
| R1 Identity match | `fuzz.token_sort_ratio` across all docs | ≥ 85 % |
| R2 Income verification | `\|payslip_net − bank_credit\| / payslip_net` | ≤ 10 % |
| R3 Employer consistency | `fuzz.partial_ratio(employer, transfer_label)` | ≥ 70 % |
| R4 Date logic | `payslip_date ≤ credit_date ≤ today` | strict |
| R5 RIB holder vs CIN | fuzzy name match | ≥ 85 % |
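R1 and R2 can be sketched with the standard library alone, using `difflib` as a stand-in for FuzzyWuzzy's `token_sort_ratio` (the real rules live in `cross_check.py`; function names here are illustrative):

```python
from difflib import SequenceMatcher

def token_sort_ratio(a: str, b: str) -> float:
    """Stdlib stand-in for fuzz.token_sort_ratio: lowercase, sort tokens,
    then compare, so 'EL IDRISSI Mehdi' matches 'Mehdi El Idrissi'."""
    norm = lambda s: " ".join(sorted(s.lower().split()))
    return 100.0 * SequenceMatcher(None, norm(a), norm(b)).ratio()

def r1_identity_match(names: list[str], threshold: float = 85.0) -> bool:
    """R1: every pairwise name comparison across documents clears 85 %."""
    return all(token_sort_ratio(a, b) >= threshold
               for i, a in enumerate(names) for b in names[i + 1:])

def r2_income_verify(payslip_net: float, bank_credit: float,
                     tolerance: float = 0.10) -> bool:
    """R2: salary credit on the statement within 10 % of the payslip net."""
    return abs(payslip_net - bank_credit) / payslip_net <= tolerance
```

Each rule being a pure function is what makes the layer swappable: thresholds are plain arguments, and the result maps directly onto a `CrossCheckResult` row.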

#### Layer 5 — Scoring (`backend/services/pipeline/scoring.py`)

```python
score = forensics_avg * 0.35 + vision_avg * 0.35 + crosscheck_score * 0.30
verdict = "clean" if score >= 75 else "suspicious" if score >= 50 else "fraudulent"
```

Weights determined by Grid Search over 1331 triplets — see REPORT.md §4.5.
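1331 = 11³, i.e. an 11-point grid per weight. Under that assumption, such a search might look like the following sketch; the objective function is a placeholder, not the report's actual validation metric:

```python
from itertools import product

def grid_search_weights(score_fn):
    """Enumerate the 11^3 = 1331 weight triplets on a 0.1-step grid and
    keep the best-scoring one whose weights sum to 1.

    `score_fn(w)` stands in for a validation metric, e.g. accuracy on a
    labelled dossier set. Illustrative only; see REPORT.md §4.5 for the
    actual procedure.
    """
    grid = [round(i / 10, 1) for i in range(11)]        # 0.0 .. 1.0
    candidates = [w for w in product(grid, repeat=3)
                  if abs(sum(w) - 1.0) < 1e-9]          # valid convex combos
    return max(candidates, key=score_fn)
```

Only the triplets summing to 1 are actually evaluated; the 1331 figure counts the full grid before that constraint.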

### Real-time progress (SSE)

The pipeline emits progress events to a per-dossier asyncio.Queue, exposed via /api/dossiers/{id}/stream as Server-Sent Events. The frontend <PipelineProgress /> component subscribes and renders the 5-step strip live with stage transitions and percentage.
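The queue-per-dossier pattern can be sketched as follows (registry and function names are ours; in the real app, sse-starlette's `EventSourceResponse` wraps the subscriber side):

```python
import asyncio

# Hypothetical in-process registry: one queue per dossier id.
_queues: dict[str, asyncio.Queue] = {}

def publish(dossier_id: str, stage: str, percent: int) -> None:
    """Called by the pipeline after each layer completes."""
    q = _queues.setdefault(dossier_id, asyncio.Queue())
    q.put_nowait({"stage": stage, "percent": percent})

async def subscribe(dossier_id: str):
    """Async generator the SSE endpoint iterates over; each yielded dict
    becomes one Server-Sent Event on /api/dossiers/{id}/stream."""
    q = _queues.setdefault(dossier_id, asyncio.Queue())
    while True:
        event = await q.get()
        yield event
        if event["percent"] >= 100:
            break                      # pipeline done, close the stream

async def demo():
    publish("d1", "ingestion", 20)
    publish("d1", "scoring", 100)
    return [e async for e in subscribe("d1")]
```

Because the queue is per dossier, two analysts watching different cases never see each other's events, which also keeps the stream tenant-safe.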

### Search palette (⌘K)

Press ⌘K (or Ctrl+K) anywhere in the dashboard. Searches across:

  • Dossier reference (e.g. DOSS-2026-001A)
  • Applicant name (e.g. Mehdi)
  • Applicant ID number (e.g. AB123456)
  • Document filename (e.g. payslip_mehdi)

Results are categorized (Dossiers / Documents), keyboard-navigable, and tenant-scoped.
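Server-side, the matching reduces to a case-insensitive `LIKE '%…%'` over four columns. A stdlib sketch of the same behaviour on plain dicts (field names mirror the schema; the function itself is illustrative, the real endpoint is `/api/search`):

```python
def search(q: str, dossiers: list[dict], documents: list[dict]) -> dict:
    """Categorized substring search over reference, name, ID number,
    and filename -- a Python mirror of the backend's SQL LIKE queries.
    Tenant scoping (organization_id filter) is assumed to happen before
    these lists are built.
    """
    needle = q.lower()
    hit = lambda *fields: any(needle in (f or "").lower() for f in fields)
    return {
        "dossiers": [d for d in dossiers
                     if hit(d.get("reference"), d.get("applicant_name"),
                            d.get("applicant_id_number"))],
        "documents": [d for d in documents
                      if hit(d.get("original_filename"))],
    }
```

The two result buckets map one-to-one onto the palette's "Dossiers / Documents" categories.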

### Multi-tenancy

Every model has an organization_id foreign key, every endpoint depends on get_current_organization, and the dev tenant (dev-org) is auto-provisioned at boot when DEV_AUTH_BYPASS=true. Production swaps in real Supabase JWTs (HS256) by setting DEV_AUTH_BYPASS=false and providing SUPABASE_JWT_SECRET.

### Demo seeding

On first boot with an empty database, backend/services/seed.py populates:

  • 25 completed dossiers with realistic Moroccan applicant names (Mehdi El Idrissi, Yasmine Tazi, Karim Benali, …)
  • 2 in-flight dossiers (status = processing) with live progress bars
  • Per-document extracted fields, anomalies, trust scores, forensics scores
  • Per-dossier cross-check results matching the verdict (passed for clean, mixed for suspicious, multiple failures for fraudulent)
  • Synthetic preview images + heatmap PNG stored in MinIO so document tabs render

The seeder is idempotent — only runs when the org has 0 dossiers. Disable with SEED_DEMO_DATA=false.


## Project structure

```
fraude-detection/
├── README.md                      # this file
├── REPORT.md                      # PFE report (Markdown source)
├── docker-compose.yml             # full stack orchestration
├── .env / .env.example            # configuration
│
├── backend/                       # FastAPI + LangGraph + SQLAlchemy
│   ├── Dockerfile
│   ├── requirements.txt
│   ├── main.py                    # app entrypoint + lifespan
│   ├── config.py                  # pydantic settings
│   ├── database.py                # async engine + session
│   ├── models/                    # SQLAlchemy models
│   │   ├── organization.py
│   │   ├── user.py
│   │   ├── dossier.py
│   │   ├── document.py
│   │   └── audit_log.py
│   ├── routers/                   # FastAPI routers
│   │   ├── auth.py
│   │   ├── organizations.py
│   │   ├── dossiers.py
│   │   ├── documents.py
│   │   ├── analysis.py            # analytics endpoints
│   │   ├── billing.py             # Stripe checkout + webhook
│   │   └── search.py              # global search
│   ├── schemas/                   # Pydantic DTOs
│   ├── services/
│   │   ├── auth.py                # JWT dependency + dev bypass
│   │   ├── audit.py               # audit log helper
│   │   ├── bootstrap.py           # dev tenant provisioning
│   │   ├── seed.py                # demo data seeder (25+2 dossiers)
│   │   ├── storage.py             # MinIO client
│   │   ├── stripe_service.py      # Stripe wrapper
│   │   └── pipeline/              # the AI engine
│   │       ├── orchestrator.py    # LangGraph StateGraph
│   │       ├── ingestion.py       # PDF → JPG normalization
│   │       ├── forensics.py       # ELA + ORB clone detection
│   │       ├── extraction.py      # GPT-4o Vision
│   │       ├── cross_check.py     # 5 rules engine
│   │       ├── scoring.py         # weighted aggregation
│   │       └── progress.py        # SSE pub/sub
│   └── migrations/
│
├── frontend/                      # Next.js 14 App Router
│   ├── Dockerfile
│   ├── package.json
│   ├── tailwind.config.ts         # Forensic Ledger tokens
│   ├── next.config.mjs
│   ├── tsconfig.json
│   ├── public/
│   │   └── theme-init.js          # pre-paint theme bootstrap
│   └── src/
│       ├── app/
│       │   ├── globals.css        # CSS variables for both themes
│       │   ├── layout.tsx         # root + ThemeProvider
│       │   ├── page.tsx           # landing page
│       │   ├── (auth)/
│       │   │   ├── login/
│       │   │   └── signup/
│       │   └── (dashboard)/
│       │       ├── layout.tsx     # sidebar + topbar
│       │       ├── dashboard/
│       │       ├── dossiers/
│       │       │   ├── page.tsx
│       │       │   ├── new/page.tsx          # 3-step wizard
│       │       │   └── [id]/page.tsx         # detail with verdict stamp
│       │       ├── documents/[id]/
│       │       ├── analytics/
│       │       ├── settings/
│       │       └── billing/
│       ├── components/
│       │   ├── ui/                 # button, card, badge, input, ThemeToggle
│       │   ├── layout/             # Sidebar, TopBar, Logo, SearchPalette
│       │   ├── landing/            # Hero, Features, Pricing, Footer, Nav
│       │   ├── dossiers/           # DossierCard, DossierTable
│       │   ├── documents/          # UploadZone, DocumentViewer
│       │   ├── analysis/           # VerdictStamp, TrustScoreGauge, …
│       │   └── charts/             # FraudDistributionChart, ProcessingVolumeChart
│       └── lib/
│           ├── api.ts              # typed API client
│           ├── auth.ts             # client-side session helpers
│           ├── theme.tsx           # ThemeProvider + useTheme
│           ├── types.ts
│           └── utils.ts
│
├── test-data/                     # generated test PDFs + runner
│   ├── generate.py                # PIL-based document generator
│   ├── run_pipeline.sh            # end-to-end test runner
│   ├── clean/                     # 4 PDFs of a coherent dossier
│   └── fraudulent/                # 4 PDFs with deliberate inconsistencies
│
└── docs/                          # PFE deliverables
    ├── build_report.py            # Word document generator
    ├── build_deck.py              # PowerPoint deck generator
    ├── FraudScan_AI_Rapport_PFE.docx
    └── FraudScan_AI_Presentation_PFE.pptx
```

## API reference

Full OpenAPI spec at http://localhost:8000/docs once running. Highlights:

| Endpoint | Method | Description |
|---|---|---|
| `/health` | GET | Health probe |
| `/api/auth/me` | GET | Current user + org |
| `/api/organizations/current` | GET | Org info, plan, usage |
| `/api/dossiers` | GET, POST | List + create |
| `/api/dossiers/{id}` | GET | Full results (verdict, score, cross-checks, documents) |
| `/api/dossiers/{id}/analyze` | POST | Trigger pipeline (background task) |
| `/api/dossiers/{id}/status` | GET | Poll status |
| `/api/dossiers/{id}/stream` | GET (SSE) | Real-time progress stream |
| `/api/documents/{id}` | POST | Upload one or more files |
| `/api/documents/{id}/preview` | GET | Normalized JPEG |
| `/api/documents/{id}/heatmap` | GET | ELA heatmap PNG |
| `/api/analytics/dashboard` | GET | Aggregated stats |
| `/api/search?q=…` | GET | Global search across dossiers + documents |
| `/api/billing/create-checkout` | POST | Stripe checkout session |
| `/api/billing/webhook` | POST | Stripe webhook (signed) |

All endpoints are tenant-scoped via get_current_organization dependency.


## Database schema

```
Organization (multi-tenant root)
├─ id (UUID, PK)
├─ slug, name, plan (free/starter/pro/enterprise)
├─ stripe_customer_id, stripe_subscription_id
├─ monthly_dossier_limit, dossiers_used_this_month
└─ created_at

User
├─ id (UUID, PK)
├─ organization_id (FK → Organization)
├─ email (unique), full_name, role
└─ supabase_id, created_at

Dossier
├─ id, organization_id (FK), reference (DOSS-YYYY-XXXX)
├─ applicant_name, applicant_id_number, case_type
├─ status (pending/processing/completed/failed)
├─ verdict (clean/suspicious/fraudulent)
├─ global_trust_score, processing_time_ms
├─ progress_stage, progress_percent (live SSE)
├─ summary_text
└─ created_at, completed_at

Document
├─ id, dossier_id (FK)
├─ document_type (CIN/PASSPORT/PAYSLIP/BANK_STATEMENT/RIB/KBIS/OTHER)
├─ original_filename, minio_key, image_minio_key, heatmap_minio_key
├─ file_size_bytes, mime_type, status
├─ trust_score, forensics_score
├─ extraction_data (JSONB), anomalies (JSONB)
└─ processing_time_ms, created_at

CrossCheckResult
├─ id, dossier_id (FK)
├─ rule_id (R1_IDENTITY_MATCH, R2_INCOME_VERIFY, …)
├─ rule_name, status (passed/failed/warning/skipped)
├─ confidence (0–100), details
├─ doc_a_id, doc_b_id (which docs were compared)
└─ created_at

AuditLog
├─ id, organization_id (FK), user_id
├─ action, entity_type, entity_id
├─ metadata (JSONB), ip_address
└─ created_at
```

## Deliverables

The docs/ folder contains generators for the PFE Big Data & IA submission:

  • FraudScan_AI_Rapport_PFE.docx — full report following the supervisor's directives (problem understanding, data, EDA, modelling, evaluation, industrialisation, limitations, bibliography, annexes). Editorial styling, tables with ink-blue headers, code blocks, hairline rules.
  • FraudScan_AI_Presentation_PFE.pptx — 23-slide widescreen deck (16:9) with cover, agenda, chapter dividers, comparison tables, confusion matrix, ablation study, demo flow, "Merci" closing.

Regenerate after edits:

```bash
cd docs
python3 build_report.py    # → .docx
python3 build_deck.py      # → .pptx
```

Both scripts use only python-docx + python-pptx — no external assets.


## Testing

### Compile-check the backend

```bash
cd backend && python3 -m compileall -q . && echo OK
```

### Type-check the frontend

```bash
cd frontend && npm run typecheck
```

### End-to-end pipeline

```bash
./test-data/run_pipeline.sh
```

Runs the full pipeline on both the clean and fraudulent test scenarios, prints verdict, score, latency, and cross-check breakdown.

### Generate fresh test documents

```bash
docker cp test-data/generate.py fraudscan-backend:/tmp/generate.py
docker exec fraudscan-backend python /tmp/generate.py
docker cp fraudscan-backend:/tmp/clean       test-data/_new_clean
docker cp fraudscan-backend:/tmp/fraudulent  test-data/_new_fraud
```

## Configuration

All configuration via .env at the project root. Defaults are dev-friendly.

| Variable | Default | Notes |
|---|---|---|
| `DATABASE_URL` | `postgresql+asyncpg://fraud:fraud123@postgres:5432/fraudscan` | |
| `MINIO_ENDPOINT` | `minio:9000` | |
| `MINIO_BUCKET` | `fraudscan-documents` | Auto-created at boot |
| `REDIS_URL` | `redis://redis:6379` | |
| `OPENAI_API_KEY` | `sk-replace-me` | Set this for real GPT-4o Vision. Pipeline uses stub when missing. |
| `OPENAI_VISION_MODEL` | `gpt-4o` | |
| `SUPABASE_JWT_SECRET` | `dev-secret-…` | HS256 |
| `DEV_AUTH_BYPASS` | `true` | Disable in production |
| `SEED_DEMO_DATA` | `true` | Disable to skip seeding |
| `STRIPE_SECRET_KEY` | `sk_test_dev` | Test mode by default |
| `STRIPE_WEBHOOK_SECRET` | `whsec_dev` | Required for signature verification |
| `STRIPE_PRICE_STARTER` / `STRIPE_PRICE_PRO` | placeholders | Real Stripe Price IDs to enable upgrades |

### Enable real GPT-4o Vision

```bash
# Edit .env
OPENAI_API_KEY=sk-proj-your-real-key

docker compose restart backend
./test-data/run_pipeline.sh
# Now R2/R3/R5 will fire real failures on the fraudulent dossier
```

## Roadmap

Implemented vs. proposed — full discussion in REPORT.md §7.

  • ✅ 5-layer pipeline with LangGraph orchestration
  • ✅ Real-time SSE progress
  • ✅ Multi-tenancy + audit log
  • ✅ Forensic Ledger design system + light/dark themes
  • ✅ ⌘K command-palette search
  • ✅ Demo data seeder (25 dossiers)
  • ✅ Stripe billing (test mode)
  • ✅ Docker Compose deployment
  • 🟡 Supabase real auth (currently dev bypass)
  • 🟡 Multi-page PDF support
  • 🟡 OCR offline fallback
  • 🟡 Forensic CNN (MesoNet / EfficientNet)
  • 🟡 Active learning loop with XGBoost final classifier
  • 🟡 Local fine-tuned LLaVA for cost reduction
  • 🟡 Kafka + Spark streaming architecture
  • 🟡 Adversarial robustness evaluation

## License

MIT — see LICENSE (to be added). Free for academic use; commercial use requires permission.


## Author

[Your name] — Master Big Data & Intelligence Artificielle, 2025–2026. Project supervised by [Encadrant]. See REPORT.md for the full PFE report.

