🤖 AI-Driven Contract Lifecycle Management (VERDICT)

📋 Overview

VERDICT is an AI-powered contract analysis platform that automates legal contract management using Google Gemini 2.5 Pro. The platform helps legal and procurement teams reduce manual effort, minimize errors, accelerate contract review, and provide actionable insights. It supports contracts in English and Arabic with full RAG-powered Q&A capabilities.

🎯 Key Capabilities

📄 Secure Contract Upload — Support for PDF and DOCX files
🤖 AI-Powered Analysis — Comprehensive contract analysis using Gemini 2.5 Pro
💬 RAG Q&A System — TF-IDF + MMR-powered question answering
🌐 Arabic Translation — Full contract translation with RTL display
📊 Contract Summarization — Automatic summary generation
⚠️ Risk & Compliance Analysis — Identify potential issues and missing clauses
🎨 Professional UI — VERDICT-branded Streamlit interface

🛠 Technology Stack

Component	Technology	Purpose
Frontend	Streamlit	Multi-page web application
LLM	Google Gemini 2.5 Pro Preview	Contract analysis and generation
Document Parsing	`doc_reader.py` (PDF/DOCX)	Text extraction from contracts
RAG System	TF-IDF + MMR Diversification	Question answering with citations
Embeddings	Sentence-Transformers	Semantic search capabilities
Translation	Deep Translator (Google)	Arabic translation with chunking
Storage	File-based (JSON/TXT)	Persistent analysis storage
API Backend	Google Gemini API	AI processing (250 req/day free tier)

📁 Repository Structure

UNSUPERVISED-CLM/
│
├── 🏠 main.py                        # Main Streamlit app (Home page)
│   │                                 # - Full contract analysis pipeline
│   │                                 # - 2-tab interface (Full Analysis + Arabic Translation)
│   │                                 # - RAG-powered Q&A chatbot
│   │                                 # - Saved analyses sidebar
│   │                                 # - Session state management
│
├── 📱 app/                           # UI components & styling
│   ├── theme.css                     # VERDICT design (Aurora Indigo→Teal gradient)
│   │                                 # - Custom header with navigation
│   │                                 # - Section cards & hero blocks
│   │                                 # - Chat interface styling
│   │                                 # - Responsive design system
│   ├── config.py                     # App configuration & settings
│   ├── shared.py                     # Shared utilities (header, theme loader)
│   ├── utils.py                      # Helper functions
│   ├── parse_utils.py                # Document parsing utilities
│   └── assets/                       # Images, logos, VERDICT branding
│
├── 📄 pages/                         # Streamlit multi-page app structure
│   ├── 1_Upload.py                   # Upload page with full analysis
│   │                                 # - Same functionality as main page
│   │                                 # - Parse → Analyze → Display → RAG → Translate
│   ├── 2_Create.py                   # Create new contracts page
│   ├── 3_Edit.py                     # Edit existing contracts page
│   └── 4_Results.py                  # Results display page
│
├── 🧠 src/                           # Core AI logic & processing
│   ├── doc_reader.py                 # Document parsing (PDF/DOCX → text)
│   │                                 # - Language detection (Arabic/English)
│   │                                 # - Text extraction with metadata
│   │
│   ├── llm_handler.py                # LLM wrapper for Gemini API
│   │                                 # - API configuration management
│   │                                 # - Request/response handling
│   │
│   ├── analysis.py                   # Main contract analysis logic
│   │                                 # - analyze_document() function
│   │                                 # - 7-section structured analysis
│   │                                 # - Prompt engineering for contracts
│   │
│   ├── analysis_metrics.py           # Performance & quality metrics
│   │                                 # - ContractAnalysisMetrics class
│   │                                 # - Session tracking
│   │                                 # - Quality assessment scores
│   │                                 # - Performance monitoring
│   │
│   ├── rag_model.py                  # RAG system implementation
│   │                                 # - TfidfIndex class
│   │                                 # - TF-IDF vectorization
│   │                                 # - MMR diversification
│   │                                 # - Extractive + generative answers
│   │                                 # - LEGAL_PATTERNS for contracts
│   │
│   ├── metrics_rag.py                # RAG performance metrics
│   │                                 # - Answer quality evaluation
│   │                                 # - Retrieval accuracy tracking
│   │
│   ├── contract_summary_generator.py # Summary generation system
│   │                                 # - ContractSummaryGenerator class
│   │                                 # - Multi-document grouping
│   │                                 # - Automatic summary creation
│   │
│   ├── translation_utils.py          # Arabic translation module
│   │                                 # - translate_to_arabic() with chunking
│   │                                 # - save_arabic_translation()
│   │                                 # - load_arabic_translation()
│   │                                 # - Rate limiting (0.5s delay)
│   │
│   ├── translation_handler.py        # Translation orchestration
│   ├── bilingual_analyzer.py         # Bilingual analysis support
│   ├── bilingual_contract_summary_generator.py
│   ├── data_extraction.py            # Contract information extraction
│   ├── risk_analysis.py              # Risk assessment logic
│   ├── summarization.py              # Text summarization
│   └── generate_summaries.py         # Batch summary generation
│
├── 💬 prompts/                       # Prompt engineering templates
│   ├── extraction_prompt.txt         # Information extraction prompts
│   ├── risk_prompt.txt               # Risk analysis prompts
│   └── summarization_prompt.txt      # Summarization prompts
│
├── 📊 data/                          # Data storage directories
│   ├── parsed/                       # Parsed contract JSON files
│   │                                 # Format: filename_YYYYMMDD_HHMMSS.json
│   │                                 # Contains: {text, language, metadata}
│   │
│   ├── analysed_documents/           # Gemini analysis output files
│   │                                 # Format: filename_analysis_YYYYMMDD_HHMMSS.txt
│   │                                 # Contains: Full contract analysis text
│   │                                 # Note: Files with -arabic.txt are translations
│   │
│   ├── contract_summaries/           # Generated summaries
│   │                                 # Auto-generated by ContractSummaryGenerator
│   │
│   ├── english/                      # English contract samples
│   │   ├── sample_contract_1.pdf
│   │   └── sample_contract_2.docx
│   │
│   ├── arabic/                       # Arabic contract samples
│   │   ├── sample_contract_1.pdf
│   │   └── sample_contract_2.docx
│   │
│   └── old_runs/                     # Historical analysis runs
│
├── 🧪 tests/                         # Test suite
│   │                                 # Contains unit & integration tests
│   │                                 # - test_imports.py
│   │                                 # - test_data_extraction.py
│   │                                 # - test_risk_analysis.py
│   │                                 # - test_gemini_api.py
│   │                                 # Run with: pytest tests/
│
├── 📋 requirements.txt               # Python dependencies
├── 🔒 .env                          # Environment variables (GEMINI_API_KEY)
├── .gitignore                       # Git ignore file
└── 📖 README.md                     # This file

� Quick Start (Clone & Run)

Follow these steps to set up VERDICT on a new machine without a virtual environment:

Clone the repository

git clone https://github.com/thm-msror/Unsupervised-CLM.git
cd Unsupervised-CLM

Install Python dependencies (uses the global interpreter)
```
python -m pip install --upgrade pip
python -m pip install -r requirements.txt
python -m pip install deep-translator python-magic-bin
```
deep-translator (Arabic translation) and python-magic-bin (file type detection on Windows) are required at runtime; add them to the requirements file if your environment does not already include them.

Create the .env file in the project root with your own credentials

GEMINI_API_KEY=AIza...your_key_here...
STREAMLIT_SERVER_PORT=8501

Launch the Streamlit app
```
streamlit run main.py
```
(Optional) Verify setup
```
pytest
python tests/test_env.py
```

�🔄 Analysis Pipeline

Main Pipeline Flow (main.py & 1_Upload.py)

┌─────────────────────────────────────────────────────────────────┐
│ 1. DOCUMENT UPLOAD                                              │
│    - User uploads PDF/DOCX file                                 │
│    - File stored in session state                               │
└─────────────────┬───────────────────────────────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────────────────────────────────┐
│ 2. DOCUMENT PARSING (doc_reader.py)                            │
│    - Extract text from PDF/DOCX                                 │
│    - Detect language (Arabic/English)                           │
│    - Save to data/parsed/filename_YYYYMMDD_HHMMSS.json         │
│    Output: {text: str, language: str, metadata: dict}          │
└─────────────────┬───────────────────────────────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────────────────────────────────┐
│ 3. GEMINI ANALYSIS (analyze_text_with_gemini)                  │
│    - Create temporary JSON with parsed data                     │
│    - Call analysis.analyze_document()                           │
│    - Use Gemini 2.5 Pro Preview model                          │
│    - Generate 7-section structured analysis:                    │
│      • Contract Overview                                        │
│      • Key Parties and Roles                                    │
│      • Important Dates and Deadlines                            │
│      • Legal Framework and Governing Law                        │
│      • Main Obligations and Responsibilities                    │
│      • Financial Terms and Conditions                           │
│      • Risk Analysis and Potential Issues                       │
│    - Save to data/analysed_documents/filename_analysis_*.txt   │
└─────────────────┬───────────────────────────────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────────────────────────────────┐
│ 4. PARALLEL PROCESSING (Threading)                             │
│    ┌─────────────────────┐  ┌──────────────────────────┐      │
│    │ RAG Index Building  │  │ Summary Generation       │      │
│    │ (build_rag_index)   │  │ (generate_summary)       │      │
│    │                     │  │                          │      │
│    │ • Split analysis    │  │ • Group related docs     │      │
│    │   into sections     │  │ • Call Gemini for        │      │
│    │ • Build TF-IDF      │  │   comprehensive summary  │      │
│    │   vectors           │  │ • Save to                │      │
│    │ • Enable MMR search │  │   contract_summaries/    │      │
│    │ • Save with joblib  │  │                          │      │
│    └─────────────────────┘  └──────────────────────────┘      │
└─────────────────┬───────────────────────────────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────────────────────────────────┐
│ 5. DISPLAY RESULTS (2-Tab Interface)                           │
│    ┌─────────────────────────────────────────────────────┐    │
│    │ Tab 1: Full Analysis                                 │    │
│    │ - Dark blue styled text box (#0d1933 background)    │    │
│    │ - Complete Gemini analysis with all 7 sections      │    │
│    │ - Scrollable with max-height 600px                  │    │
│    └─────────────────────────────────────────────────────┘    │
│    ┌─────────────────────────────────────────────────────┐    │
│    │ Tab 2: Arabic Translation                            │    │
│    │ - "Translate to Arabic" button (if not exists)      │    │
│    │ - Uses translation_utils.translate_to_arabic()      │    │
│    │ - Chunks text (4500 chars) with rate limiting       │    │
│    │ - Saves as filename-arabic.txt                      │    │
│    │ - RTL display with Arabic styling                   │    │
│    └─────────────────────────────────────────────────────┘    │
└─────────────────┬───────────────────────────────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────────────────────────────────┐
│ 6. Q&A CHATBOT (answer_contract_question)                      │
│    - User asks question about contract                          │
│    - Load RAG index from joblib file                           │
│    - TF-IDF search for relevant sections (top_k=3)            │
│    - Apply MMR diversification                                 │
│    - Generate answer using Gemini with context                 │
│    - Display in chat interface with history                    │
└─────────────────────────────────────────────────────────────────┘

Session State Management

Streamlit uses st.session_state to persist data across reruns:

# Main page session state
st.session_state.analysis_results      # List of all analyzed contracts
st.session_state.current_analysis      # Currently displayed analysis
st.session_state.rag_index            # RAG index for Q&A
st.session_state.rag_built            # Flag: RAG ready
st.session_state.chat_history         # Q&A conversation history

# Upload page session state (separate namespace)
st.session_state.upload_analysis_results
st.session_state.upload_current_analysis
st.session_state.upload_rag_index
st.session_state.upload_rag_built
st.session_state.upload_chat_history

File Storage Patterns

Upload → Parse → Analyze → Translate
   ↓        ↓        ↓          ↓
contract.pdf
         parsed/contract_20250126_120000.json
                 analysed_documents/contract_analysis_20250126_120030.txt
                                analysed_documents/contract_analysis_20250126_120030-arabic.txt

Loading Logic (main.py load_saved_analyses_simple()):

Scans data/analysed_documents/ directory
Loads only files matching pattern: *_analysis_*.txt
Skips Arabic translations (files ending with -arabic.txt)
Sorts by timestamp (newest first)
Displays in sidebar with "✅ Loaded Contract Analysis #X from disk"

📊 Analysis Metrics & Quality Assessment

ContractAnalysisMetrics Class (`src/analysis_metrics.py`)

Comprehensive metrics tracking for LLM performance and quality assessment.

1. Session Metadata

{
    'session_id': '20250126_120000',           # Timestamp-based unique ID
    'session_start': '2025-01-26T12:00:00',    # ISO timestamp
    'session_end': '2025-01-26T12:05:30',      # ISO timestamp
    'session_duration_minutes': 5.5            # Total elapsed time
}

2. Processing Volume Metrics

{
    'documents_processed': 10,        # Total attempted
    'successful_analyses': 9,         # Successfully completed
    'failed_analyses': 1,             # Failed processing
    'total_processing_time': 450.2    # Cumulative seconds
}

3. Performance Metrics

API Response Times:

Tracks all Gemini API call latencies
Calculates average, median, p95, p99
Identifies performance bottlenecks

Throughput:

throughput_docs_per_minute = successful_analyses / (total_time / 60)

Success Rate:

success_rate = (successful_analyses / documents_processed) * 100

4. Quality Assessment Metrics (0.0-1.0 scale)

Structure Compliance:

Checks if analysis follows 7-section format
Validates presence of all required sections
Calculation: sections_found / 7

Completeness Scores:

Measures analysis depth and thoroughness
Checks for minimum content length per section
Penalties for missing information

Extraction Accuracy:

Validates extracted entities (parties, dates, amounts)
Cross-references with source text
Pattern matching for legal terms

Consistency Scores:

Checks internal logical consistency
Validates date sequences (effective < expiration)
Cross-references between sections

5. Content Analysis Metrics

Document Sizes:

{
    'document_sizes': [2500, 3200, 1800],     # Character counts
    'avg_document_size': 2500,
    'median_document_size': 2500
}

Language Distribution:

{
    'languages_detected': {
        'en': 7,      # 7 English contracts
        'ar': 3       # 3 Arabic contracts
    }
}

Complexity Scores:

Based on document length, legal jargon density, clause count
Formula: (char_count / 1000) * jargon_ratio * clause_density

6. Per-Document Metrics

Each analyzed document gets detailed tracking:

{
    'filename': 'contract.pdf',
    'text_length': 2500,
    'language': 'en',
    'processing_time': 45.2,
    'api_calls': 3,
    'tokens_used': 5000,
    'errors': [],
    'analysis_quality': {
        'structure_compliance': 1.0,
        'completeness': 0.95,
        'extraction_accuracy': 0.92,
        'consistency': 0.98
    },
    'success': True,
    'timestamp': '2025-01-26T12:01:30'
}

Important Metrics Calculations

Average Response Time

avg_response_time = sum(api_response_times) / len(api_response_times)

Overall Quality Score

quality_score = (
    structure_compliance * 0.25 +
    completeness * 0.30 +
    extraction_accuracy * 0.25 +
    consistency * 0.20
)

Token Efficiency

tokens_per_char = total_tokens / total_chars_processed
efficiency_score = 1.0 / tokens_per_char  # Higher is better

Metrics Output

JSON Export:

metrics.save_metrics('data/metrics/session_20250126_120000.json')

Console Dashboard:

=== VERDICT Contract Analysis Session ===
Session ID: 20250126_120000
Duration: 5.5 minutes

Processing:
- Documents: 10 processed
- Success: 9 (90.0%)
- Failed: 1 (10.0%)

Performance:
- Avg Response Time: 4.2s
- Throughput: 1.8 docs/min
- Total Tokens: 50,000

Quality (avg):
- Structure: 0.95
- Completeness: 0.92
- Accuracy: 0.89
- Consistency: 0.96
- Overall: 0.93

🧪 RAG System Architecture

TfidfIndex Class (`src/rag_model.py`)

Core Components:

TF-IDF Vectorization:
- Converts text to numerical vectors
- Term frequency × Inverse document frequency
- Captures importance of terms in corpus
MMR Diversification:
- Maximal Marginal Relevance algorithm
- Balances relevance with diversity
- Prevents redundant results
- Formula: MMR = λ × Similarity(q,d) - (1-λ) × max Similarity(d,R)
LEGAL_PATTERNS:
- Pre-defined regex patterns for contract entities
- Extracts: parties, dates, amounts, obligations, termination, renewal
- Pattern-based extractive answers
Hybrid Search:
- Extractive: Direct text extraction with patterns
- Generative: Gemini-generated answers with context
- Combined approach for best results

Key Methods:

build_index(documents):

Splits documents into searchable chunks
Builds TF-IDF matrix
Saves index with joblib for persistence

search(query, top_k=5, use_mmr=True):

Vectorizes query
Computes cosine similarity
Applies MMR if enabled
Returns ranked results with scores

ask(question, use_extractive=True, use_generative=True):

Searches relevant context
Tries extractive answer first (pattern matching)
Falls back to generative (Gemini) if needed
Returns answer with source citations

✨ Features

🎯 Core Features

📄 Secure Contract Upload & Processing
Multi-page Streamlit app with dedicated Upload page and Results display
🔍 AI-Powered Comprehensive Analysis
7-section structured analysis using Gemini 2.5 Pro Preview
💬 RAG-Powered Q&A
TF-IDF + MMR search with Gemini-generated answers
🌐 Arabic Translation
Full translation with chunking, rate limiting, and RTL display
📊 Contract Summaries
Automatic summary generation for contract groups
💾 Persistent Storage
File-based storage with automatic loading on app restart

� Requirements

System Requirements

Python 3.8+ — Main runtime
Internet connection — For Google Gemini API access
Google AI API Key — Free tier available at Google AI Studio

Installation Commands

# Install Python dependencies
pip install -r requirements.txt

# Verify installation
python --version
pip list | grep google-generativeai

�🚀 Quick Start

🏠 Local Development (Recommended for Privacy)

Step 1: Clone & Install

git clone https://github.com/thm-msror/Unsupervised-CLM.git
cd Unsupervised-CLM
pip install -r requirements.txt

Step 2: Get Google Gemini API Key

# 1. Go to Google AI Studio: https://makersuite.google.com/app/apikey
# 2. Sign in with Google account
# 3. Click "Create API Key" 
# 4. Copy the API key (starts with AIza...)

Step 3: Configure Environment

# Create a .env file
New-Item -ItemType File -Path .env -Force

# Add your Gemini API key:
echo "GEMINI_API_KEY=your_api_key_here" >> .env

💡 Free Tier: Google Gemini offers generous free limits: 15 req/min, 1500 req/day, 1M tokens/month

Step 4: Run Application

# Run from repository root
python -m streamlit run .\main.py --server.port 8501
# Open your browser at: http://localhost:8501

🔑 API Key Security: Your Gemini API key is used client-side and not stored permanently
📁 Structure Note: The main entry point is main.py at the repository root for simple deployment

☁️ Cloud Deployment (Demo Only)

For hackathon demos, deploy on Render using cloud-hosted LLMs:

Step 1: Connect Repository

Link your GitHub repository to Render

Step 2: Configure Environment

Add environment variables from .env to Render dashboard
run python -m dotenv
Include API keys, model endpoints, and configuration

Step 3: Set Start Command

streamlit run main.py --server.port $PORT --server.address 0.0.0.0

🔒 Security: Add your GEMINI_API_KEY as an environment variable in Render dashboard

🧪 Testing & Development

Running Tests

# Run all tests
pytest tests/

# Run specific test modules
pytest tests/test_data_extraction.py -v
pytest tests/test_risk_analysis.py -v

Development Setup

# Install development dependencies
pip install -r requirements.txt

🔐 AI Configuration & API Setup

🤖 Google Gemini 2.5 Flash

This application uses Google Gemini 2.5 Flash for contract analysis:

Feature	Specification
Model	gemini-2.5-flash (latest)
Context Window	32K tokens
Speed	Optimized for low-latency
Free Tier	15 req/min, 1500 req/day, 1M tokens/month

🔑 API Key Setup

Get Your Free API Key:

Visit Google AI Studio
Sign in with your Google account
Click "Create API Key"
Copy the key (starts with AIza...)

Add to Environment:

# Create .env file
echo "GEMINI_API_KEY=AIza..." >> .env

🧪 Testing Gemini API

Quick Test Script:

# Run the test script
python tests/test_gemini_api.py

Manual Test with Python:

import google.generativeai as genai

# Configure API
genai.configure(api_key="your_api_key_here")
model = genai.GenerativeModel('gemini-2.5-flash')

# Test contract analysis
contract_text = "Agreement between ABC Corp and XYZ Ltd, effective Jan 2025, $150/hour"
prompt = f"Extract key information from this contract: {contract_text}"

response = model.generate_content(prompt)
print(response.text)

💡 Why Gemini for Hackathons?

🆓 Free Tier: Generous limits for development and demos
⚡ Fast: Optimized for quick responses
🧠 Smart: Excellent at structured data extraction
📄 Contract-Ready: Great performance on legal document analysis

📈 Real Performance Metrics (Latest Session)

Session Overview (October 25, 2025)

{
  "session_id": "20251025_232811",
  "session_duration_minutes": 4.75,
  "documents_processed": 10,
  "successful_analyses": 10,
  "failed_analyses": 0,
  "success_rate": 100%
}

Processing Performance

Metric	Value
Total Processing Time	239.96 seconds (~4 minutes)
Average Response Time	23.71 seconds per contract
Throughput	2.5 documents/minute
Fastest Analysis	17.68 seconds (21.5K chars, English)
Slowest Analysis	33.11 seconds (26.6K chars, Arabic)

Document Distribution

Languages:

🇸🇦 Arabic: 5 contracts (50%)
🇺🇸 English: 5 contracts (50%)

Contract Types Detected:

Employment Agreements: 6 (60%)
Purchase Agreements: 2 (20%)
Software Development: 1 (10%)

Document Size Range:

Smallest: 13,034 characters
Largest: 35,482 characters
Average: 24,285 characters
Median: 24,922 characters

Quality Metrics (0.0 - 1.0 scale)

Quality Dimension	Score	Performance
Structure Compliance	1.00	✅ Perfect (100%)
Completeness	1.00	✅ Perfect (100%)
Extraction Accuracy	0.53	⚠️ Good (53%)
Consistency	1.00	✅ Perfect (100%)

Per-Document Quality Breakdown:

Document #1 (Arabic, 17K chars):  Structure ✅ | Complete ✅ | Accuracy 44% | Consistent ✅
Document #2 (Arabic, 28K chars):  Structure ✅ | Complete ✅ | Accuracy 56% | Consistent ✅
Document #3 (Arabic, 27K chars):  Structure ✅ | Complete ✅ | Accuracy 67% | Consistent ✅
Document #4 (Arabic, 13K chars):  Structure ✅ | Complete ✅ | Accuracy 33% | Consistent ✅
Document #5 (English, 22K chars): Structure ✅ | Complete ✅ | Accuracy 44% | Consistent ✅
Document #6 (English, 35K chars): Structure ✅ | Complete ✅ | Accuracy 67% | Consistent ✅
Document #7 (English, 33K chars): Structure ✅ | Complete ✅ | Accuracy 78% | Consistent ✅
Document #8 (English, 16K chars): Structure ✅ | Complete ✅ | Accuracy 33% | Consistent ✅
Document #9 (English, 28K chars): Structure ✅ | Complete ✅ | Accuracy 56% | Consistent ✅

Token Usage & API Performance

API Metric	Value
Total Tokens Processed	68,837 tokens
Average Tokens/Document	7,648 tokens
Smallest Request	4,721 tokens
Largest Request	10,543 tokens
API Success Rate	100% (9/9 API calls)*

*One document processed without API call (cached or alternative method)

Complexity Analysis

Average Contract Complexity: 0.577 (Medium-High)

Complexity Distribution:

High (>0.7): 1 contract (Arabic MSP Agreement - 0.746)
Medium (0.5-0.7): 6 contracts
Low (<0.5): 3 contracts

Complexity Formula:

complexity = (char_count / 1000) * jargon_ratio * clause_density

API Response Time Analysis

Fastest: 17.677s  ━━━━━━━━━━━━━━━━━━░░░░░░░░
Average: 23.707s  ━━━━━━━━━━━━━━━━━━━━━━━━░░
Slowest: 33.105s  ━━━━━━━━━━━━━━━━━━━━━━━━━━

Distribution:
15-20s: ███ (3 contracts)
20-25s: ████ (4 contracts)
25-30s: ░ (0 contracts)
30-35s: ██ (2 contracts)

Key Findings

✅ Strengths:

Perfect structure compliance - all analyses follow 7-section format
100% success rate - no failures or errors
Consistent quality across English and Arabic contracts
Good throughput (2.5 docs/min) for comprehensive analysis

⚠️ Areas for Improvement:

Extraction accuracy varies (33% - 78% range)
Arabic contracts slightly slower (avg 25.5s vs 21.2s for English)
Some contracts have lower information density detection

🎯 Recommendations:

Optimize extraction accuracy through better prompt engineering
Implement caching for repeated contract patterns
Consider parallel processing for batch operations
Fine-tune information density detection algorithms

🏆 System Reliability

Based on the latest session metrics:

Uptime: 100% (no crashes or system failures)
API Reliability: 100% (all API calls successful)
Data Integrity: 100% (all files saved correctly)
Processing Consistency: Stable response times across session
Bilingual Support: Equal performance on English and Arabic contracts

Name		Name	Last commit message	Last commit date
Latest commit History 42 Commits
.cache		.cache
app		app
benches		benches
data		data
pages		pages
prompts		prompts
src		src
tests		tests
.gitignore		.gitignore
README.md		README.md
main.py		main.py
requirements.txt		requirements.txt
suite.json		suite.json

Folders and files

Latest commit

History

Repository files navigation

🤖 AI-Driven Contract Lifecycle Management (VERDICT)

📋 Overview

🎯 Key Capabilities

🛠 Technology Stack

📁 Repository Structure

� Quick Start (Clone & Run)

�🔄 Analysis Pipeline

Main Pipeline Flow (main.py & 1_Upload.py)

Session State Management

File Storage Patterns

📊 Analysis Metrics & Quality Assessment

ContractAnalysisMetrics Class (src/analysis_metrics.py)

1. Session Metadata

2. Processing Volume Metrics

3. Performance Metrics

4. Quality Assessment Metrics (0.0-1.0 scale)

5. Content Analysis Metrics

6. Per-Document Metrics

Important Metrics Calculations

Average Response Time

Overall Quality Score

Token Efficiency

Metrics Output

🧪 RAG System Architecture

TfidfIndex Class (src/rag_model.py)

Core Components:

Key Methods:

✨ Features

🎯 Core Features

� Requirements

System Requirements

Installation Commands

�🚀 Quick Start

🏠 Local Development (Recommended for Privacy)

☁️ Cloud Deployment (Demo Only)

🧪 Testing & Development

Running Tests

Development Setup

🔐 AI Configuration & API Setup

🤖 Google Gemini 2.5 Flash

🔑 API Key Setup

🧪 Testing Gemini API

💡 Why Gemini for Hackathons?

📈 Real Performance Metrics (Latest Session)

Session Overview (October 25, 2025)

Processing Performance

Document Distribution

Quality Metrics (0.0 - 1.0 scale)

Token Usage & API Performance

Complexity Analysis

API Response Time Analysis

Key Findings

🏆 System Reliability

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

ContractAnalysisMetrics Class (`src/analysis_metrics.py`)

TfidfIndex Class (`src/rag_model.py`)

Packages