Financial Disclosure Auditor — Cross-Temporal Contradiction Detector
Codicil reads years of SEC 10-K filings and finds specific, cited instances where management's forward-looking statements in one filing directly contradict what they disclosed had actually occurred in the next. If it cannot quote verbatim from both filings, it outputs nothing.
Built for the lablab.ai × AMD Developer Hackathon 2026.
Live demo: nurusyda-codicil.hf.space
Most AI tools summarise documents. Codicil audits them.
For every consecutive pair of annual filings, the model is given the earlier filing's forward-looking language and the later filing's disclosure of actual outcomes. It is instructed to produce a finding only if it can extract a verbatim quote from each — the earlier projection and the later reality — and explain the contradiction. No inference. No sentiment scores. No hallucinations.
Verified batch run — 10 major US banks, 10 years of 10-K filings:
| Bank | Finding | Documents |
|---|---|---|
| TFC | Contradiction between forward-looking guidance and actual results for FDIC DIF assessment premiums | Amendment No. 64 (2023) → No. 77 (2024) |
| TFC | Contradictory statements regarding the effective end date of the Stress Capital Buffer | Amendment No. 99 (2026) → No. 107 (2026) |
| TFC | Contradiction regarding the regulatory deadline for submitting the first resolution plan under the Tailoring Rules | Amendment No. 34 (2020) → No. 45 (2021) |
| BAC | Contradiction in stated regulatory submission frequency for resolution plans | Amendment No. 2 (2017) → No. 14 (2018) |
| GS | Contradiction in stated regulatory disclosure frequency for ESG risks under the CRR | Amendment No. 3 (2023) → No. 16 (2024) |
| MS | Contradiction in the stated expected compliance date for the final phase of initial margin rules | Amendment No. 3 (2020) → No. 15 (2021) |
Total batch time: ~1,854 seconds. Total GPU compute cost: ~$0.65.
The MI300X's 192GB HBM3 memory is load-bearing, not decorative.
Qwen3.6-35B-A3B at a 131K token context window requires GPU memory that a standard H100 cannot hold at this utilisation level. The MI300X enables loading the full model with headroom for long-context batches, making multi-year cross-filing analysis possible in a single inference pass. No chunking. No retrieval errors. That is the product's core guarantee.
Batch across 10 tickers runs via vLLM's PagedAttention on ROCm, with efficient KV-cache reuse across concurrent requests.
SEC EDGAR
│
▼
scripts/ingest_edgar.py ← downloads 10-K filings, extracts thematic sections
│ groups: capital_liquidity, risk_language,
│ covenant_legal, management_guidance
▼
scripts/codicil_inference.py ← sends section chain to Qwen3.6-35B on MI300X
│ auditor prompt: cite-or-be-silent discipline
│ output: Earlier quote + Later quote + reasoning
▼
scripts/batch_runner.py ← loops ingest + inference across multiple tickers
│ writes data/batch_results/batch_live_*.json
▼
scripts/sector_compare.py ← aggregates batch results into sector_summary.json
│ sorted by finding count descending
▼
scripts/app.py ← Pure Python HTTP server UI
Analyze tab: single-company live analysis
Sector tab: 10-bank comparison dashboard
Gap detection: instant, GPU-free
Offline fallback: demo works without MI300X
Requirements: Python 3.11, a running vLLM server pointed at Qwen3.6-35B-A3B
git clone https://github.com/nurusyda/codicil
cd codicil
python -m venv env && source env/bin/activate
pip install -r requirements.txtSet the inference server URL:
export LLAMA_SERVER_URL=http://<your-mi300x-ip>:8000/v1Run the app:
python scripts/app.py
# Open http://127.0.0.1:7860GPU-free demo mode: The Sector tab loads from data/sector_summary.json without inference. The Analyze tab's gap detection also runs without GPU. Only the Analyze → Run inference step requires the MI300X.
python scripts/batch_runner.py \
--tickers JPM BAC WFC C GS MS USB PNC TFC COF \
--form-type 10-K \
--limit 10
python scripts/sector_compare.py
# Output: data/sector_summary.jsonDry-run (no inference, verifies ingestion only):
python scripts/batch_runner.py --tickers JPM BAC --form-type 10-K --limit 5 --dry-rundocker start rocm && docker exec -it rocm /bin/bash
vllm serve Qwen/Qwen3.6-35B-A3B \
--dtype bfloat16 \
--max-model-len 131072 \
--gpu-memory-utilization 0.92 \
--enable-prefix-caching \
--reasoning-parser qwen3 \
--port 8000 \
--host 0.0.0.0Turn off the GPU immediately after batch completes. Verify "Offline" in AMD portal — not just poweroff from host.
| Script | Purpose |
|---|---|
scripts/app.py |
Main UI — FastAPI + WebSocket + all API endpoints |
scripts/ingest_edgar.py |
SEC EDGAR downloader and section extractor |
scripts/codicil_inference.py |
Auditor prompt + vLLM inference client |
scripts/batch_runner.py |
Multi-ticker batch orchestrator |
scripts/sector_compare.py |
Aggregates batch results into sector summary |
scripts/risk_factor_diff.py |
Risk factor disappearance detector (year-over-year diff) |
cross_check.py |
Cross-module import verifier — run before every commit (repo root, not scripts/) |
- Model: Qwen3.6-35B-A3B (Alibaba Cloud)
- Inference: vLLM on ROCm (AMD MI300X via AMD Developer Cloud)
- Embeddings: BAAI/bge-m3 via sentence-transformers (
local_files_only=True) - Vector store: ChromaDB
- Data source: SEC EDGAR via sec-edgar-downloader
- Backend: Python / http.server (stdlib)
- Frontend: Vanilla JS served by http.server
- Hosting: Hugging Face Spaces
Codicil surfaces candidates for attorney or analyst review. It does not replace legal judgment or investment advice. Every flagged finding requires human verification. Output is based on public SEC filings only.
Built May 2026. AMD Developer Hackathon submission.