Skip to content

thm-msror/Unsupervised-CLM

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

42 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

🤖 AI-Driven Contract Lifecycle Management (VERDICT)

📋 Overview

VERDICT is an AI-powered contract analysis platform that automates legal contract management using Google Gemini 2.5 Pro. The platform helps legal and procurement teams reduce manual effort, minimize errors, accelerate contract review, and provide actionable insights. It supports contracts in English and Arabic with full RAG-powered Q&A capabilities.

🎯 Key Capabilities

  • 📄 Secure Contract Upload — Support for PDF and DOCX files
  • 🤖 AI-Powered Analysis — Comprehensive contract analysis using Gemini 2.5 Pro
  • 💬 RAG Q&A System — TF-IDF + MMR-powered question answering
  • 🌐 Arabic Translation — Full contract translation with RTL display
  • 📊 Contract Summarization — Automatic summary generation
  • ⚠️ Risk & Compliance Analysis — Identify potential issues and missing clauses
  • 🎨 Professional UI — VERDICT-branded Streamlit interface

🛠 Technology Stack

Component Technology Purpose
Frontend Streamlit Multi-page web application
LLM Google Gemini 2.5 Pro Preview Contract analysis and generation
Document Parsing doc_reader.py (PDF/DOCX) Text extraction from contracts
RAG System TF-IDF + MMR Diversification Question answering with citations
Embeddings Sentence-Transformers Semantic search capabilities
Translation Deep Translator (Google) Arabic translation with chunking
Storage File-based (JSON/TXT) Persistent analysis storage
API Backend Google Gemini API AI processing (250 req/day free tier)

📁 Repository Structure

UNSUPERVISED-CLM/
│
├── 🏠 main.py                        # Main Streamlit app (Home page)
│   │                                 # - Full contract analysis pipeline
│   │                                 # - 2-tab interface (Full Analysis + Arabic Translation)
│   │                                 # - RAG-powered Q&A chatbot
│   │                                 # - Saved analyses sidebar
│   │                                 # - Session state management
│
├── 📱 app/                           # UI components & styling
│   ├── theme.css                     # VERDICT design (Aurora Indigo→Teal gradient)
│   │                                 # - Custom header with navigation
│   │                                 # - Section cards & hero blocks
│   │                                 # - Chat interface styling
│   │                                 # - Responsive design system
│   ├── config.py                     # App configuration & settings
│   ├── shared.py                     # Shared utilities (header, theme loader)
│   ├── utils.py                      # Helper functions
│   ├── parse_utils.py                # Document parsing utilities
│   └── assets/                       # Images, logos, VERDICT branding
│
├── 📄 pages/                         # Streamlit multi-page app structure
│   ├── 1_Upload.py                   # Upload page with full analysis
│   │                                 # - Same functionality as main page
│   │                                 # - Parse → Analyze → Display → RAG → Translate
│   ├── 2_Create.py                   # Create new contracts page
│   ├── 3_Edit.py                     # Edit existing contracts page
│   └── 4_Results.py                  # Results display page
│
├── 🧠 src/                           # Core AI logic & processing
│   ├── doc_reader.py                 # Document parsing (PDF/DOCX → text)
│   │                                 # - Language detection (Arabic/English)
│   │                                 # - Text extraction with metadata
│   │
│   ├── llm_handler.py                # LLM wrapper for Gemini API
│   │                                 # - API configuration management
│   │                                 # - Request/response handling
│   │
│   ├── analysis.py                   # Main contract analysis logic
│   │                                 # - analyze_document() function
│   │                                 # - 7-section structured analysis
│   │                                 # - Prompt engineering for contracts
│   │
│   ├── analysis_metrics.py           # Performance & quality metrics
│   │                                 # - ContractAnalysisMetrics class
│   │                                 # - Session tracking
│   │                                 # - Quality assessment scores
│   │                                 # - Performance monitoring
│   │
│   ├── rag_model.py                  # RAG system implementation
│   │                                 # - TfidfIndex class
│   │                                 # - TF-IDF vectorization
│   │                                 # - MMR diversification
│   │                                 # - Extractive + generative answers
│   │                                 # - LEGAL_PATTERNS for contracts
│   │
│   ├── metrics_rag.py                # RAG performance metrics
│   │                                 # - Answer quality evaluation
│   │                                 # - Retrieval accuracy tracking
│   │
│   ├── contract_summary_generator.py # Summary generation system
│   │                                 # - ContractSummaryGenerator class
│   │                                 # - Multi-document grouping
│   │                                 # - Automatic summary creation
│   │
│   ├── translation_utils.py          # Arabic translation module
│   │                                 # - translate_to_arabic() with chunking
│   │                                 # - save_arabic_translation()
│   │                                 # - load_arabic_translation()
│   │                                 # - Rate limiting (0.5s delay)
│   │
│   ├── translation_handler.py        # Translation orchestration
│   ├── bilingual_analyzer.py         # Bilingual analysis support
│   ├── bilingual_contract_summary_generator.py
│   ├── data_extraction.py            # Contract information extraction
│   ├── risk_analysis.py              # Risk assessment logic
│   ├── summarization.py              # Text summarization
│   └── generate_summaries.py         # Batch summary generation
│
├── 💬 prompts/                       # Prompt engineering templates
│   ├── extraction_prompt.txt         # Information extraction prompts
│   ├── risk_prompt.txt               # Risk analysis prompts
│   └── summarization_prompt.txt      # Summarization prompts
│
├── 📊 data/                          # Data storage directories
│   ├── parsed/                       # Parsed contract JSON files
│   │                                 # Format: filename_YYYYMMDD_HHMMSS.json
│   │                                 # Contains: {text, language, metadata}
│   │
│   ├── analysed_documents/           # Gemini analysis output files
│   │                                 # Format: filename_analysis_YYYYMMDD_HHMMSS.txt
│   │                                 # Contains: Full contract analysis text
│   │                                 # Note: Files with -arabic.txt are translations
│   │
│   ├── contract_summaries/           # Generated summaries
│   │                                 # Auto-generated by ContractSummaryGenerator
│   │
│   ├── english/                      # English contract samples
│   │   ├── sample_contract_1.pdf
│   │   └── sample_contract_2.docx
│   │
│   ├── arabic/                       # Arabic contract samples
│   │   ├── sample_contract_1.pdf
│   │   └── sample_contract_2.docx
│   │
│   └── old_runs/                     # Historical analysis runs
│
├── 🧪 tests/                         # Test suite
│   │                                 # Contains unit & integration tests
│   │                                 # - test_imports.py
│   │                                 # - test_data_extraction.py
│   │                                 # - test_risk_analysis.py
│   │                                 # - test_gemini_api.py
│   │                                 # Run with: pytest tests/
│
├── 📋 requirements.txt               # Python dependencies
├── 🔒 .env                          # Environment variables (GEMINI_API_KEY)
├── .gitignore                       # Git ignore file
└── 📖 README.md                     # This file

� Quick Start (Clone & Run)

Follow these steps to set up VERDICT on a new machine without a virtual environment:

  1. Clone the repository
    git clone https://github.com/thm-msror/Unsupervised-CLM.git
    cd Unsupervised-CLM
  2. Install Python dependencies (uses the global interpreter)
    python -m pip install --upgrade pip
    python -m pip install -r requirements.txt
    python -m pip install deep-translator python-magic-bin

    deep-translator (Arabic translation) and python-magic-bin (file type detection on Windows) are required at runtime; add them to the requirements file if your environment does not already include them.

  3. Create the .env file in the project root with your own credentials
    GEMINI_API_KEY=AIza...your_key_here...
    STREAMLIT_SERVER_PORT=8501
  4. Launch the Streamlit app
    streamlit run main.py
  5. (Optional) Verify setup
    pytest
    python tests/test_env.py

�🔄 Analysis Pipeline

Main Pipeline Flow (main.py & 1_Upload.py)

┌─────────────────────────────────────────────────────────────────┐
│ 1. DOCUMENT UPLOAD                                              │
│    - User uploads PDF/DOCX file                                 │
│    - File stored in session state                               │
└─────────────────┬───────────────────────────────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────────────────────────────────┐
│ 2. DOCUMENT PARSING (doc_reader.py)                            │
│    - Extract text from PDF/DOCX                                 │
│    - Detect language (Arabic/English)                           │
│    - Save to data/parsed/filename_YYYYMMDD_HHMMSS.json         │
│    Output: {text: str, language: str, metadata: dict}          │
└─────────────────┬───────────────────────────────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────────────────────────────────┐
│ 3. GEMINI ANALYSIS (analyze_text_with_gemini)                  │
│    - Create temporary JSON with parsed data                     │
│    - Call analysis.analyze_document()                           │
│    - Use Gemini 2.5 Pro Preview model                          │
│    - Generate 7-section structured analysis:                    │
│      • Contract Overview                                        │
│      • Key Parties and Roles                                    │
│      • Important Dates and Deadlines                            │
│      • Legal Framework and Governing Law                        │
│      • Main Obligations and Responsibilities                    │
│      • Financial Terms and Conditions                           │
│      • Risk Analysis and Potential Issues                       │
│    - Save to data/analysed_documents/filename_analysis_*.txt   │
└─────────────────┬───────────────────────────────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────────────────────────────────┐
│ 4. PARALLEL PROCESSING (Threading)                             │
│    ┌─────────────────────┐  ┌──────────────────────────┐      │
│    │ RAG Index Building  │  │ Summary Generation       │      │
│    │ (build_rag_index)   │  │ (generate_summary)       │      │
│    │                     │  │                          │      │
│    │ • Split analysis    │  │ • Group related docs     │      │
│    │   into sections     │  │ • Call Gemini for        │      │
│    │ • Build TF-IDF      │  │   comprehensive summary  │      │
│    │   vectors           │  │ • Save to                │      │
│    │ • Enable MMR search │  │   contract_summaries/    │      │
│    │ • Save with joblib  │  │                          │      │
│    └─────────────────────┘  └──────────────────────────┘      │
└─────────────────┬───────────────────────────────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────────────────────────────────┐
│ 5. DISPLAY RESULTS (2-Tab Interface)                           │
│    ┌─────────────────────────────────────────────────────┐    │
│    │ Tab 1: Full Analysis                                 │    │
│    │ - Dark blue styled text box (#0d1933 background)    │    │
│    │ - Complete Gemini analysis with all 7 sections      │    │
│    │ - Scrollable with max-height 600px                  │    │
│    └─────────────────────────────────────────────────────┘    │
│    ┌─────────────────────────────────────────────────────┐    │
│    │ Tab 2: Arabic Translation                            │    │
│    │ - "Translate to Arabic" button (if not exists)      │    │
│    │ - Uses translation_utils.translate_to_arabic()      │    │
│    │ - Chunks text (4500 chars) with rate limiting       │    │
│    │ - Saves as filename-arabic.txt                      │    │
│    │ - RTL display with Arabic styling                   │    │
│    └─────────────────────────────────────────────────────┘    │
└─────────────────┬───────────────────────────────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────────────────────────────────┐
│ 6. Q&A CHATBOT (answer_contract_question)                      │
│    - User asks question about contract                          │
│    - Load RAG index from joblib file                           │
│    - TF-IDF search for relevant sections (top_k=3)            │
│    - Apply MMR diversification                                 │
│    - Generate answer using Gemini with context                 │
│    - Display in chat interface with history                    │
└─────────────────────────────────────────────────────────────────┘

Session State Management

Streamlit uses st.session_state to persist data across reruns:

# Main page session state
st.session_state.analysis_results      # List of all analyzed contracts
st.session_state.current_analysis      # Currently displayed analysis
st.session_state.rag_index            # RAG index for Q&A
st.session_state.rag_built            # Flag: RAG ready
st.session_state.chat_history         # Q&A conversation history

# Upload page session state (separate namespace)
st.session_state.upload_analysis_results
st.session_state.upload_current_analysis
st.session_state.upload_rag_index
st.session_state.upload_rag_built
st.session_state.upload_chat_history

File Storage Patterns

Upload → Parse → Analyze → Translate
   ↓        ↓        ↓          ↓
contract.pdf
         parsed/contract_20250126_120000.json
                 analysed_documents/contract_analysis_20250126_120030.txt
                                analysed_documents/contract_analysis_20250126_120030-arabic.txt

Loading Logic (main.py load_saved_analyses_simple()):

  • Scans data/analysed_documents/ directory
  • Loads only files matching pattern: *_analysis_*.txt
  • Skips Arabic translations (files ending with -arabic.txt)
  • Sorts by timestamp (newest first)
  • Displays in sidebar with "✅ Loaded Contract Analysis #X from disk"

📊 Analysis Metrics & Quality Assessment

ContractAnalysisMetrics Class (src/analysis_metrics.py)

Comprehensive metrics tracking for LLM performance and quality assessment.

1. Session Metadata

{
    'session_id': '20250126_120000',           # Timestamp-based unique ID
    'session_start': '2025-01-26T12:00:00',    # ISO timestamp
    'session_end': '2025-01-26T12:05:30',      # ISO timestamp
    'session_duration_minutes': 5.5            # Total elapsed time
}

2. Processing Volume Metrics

{
    'documents_processed': 10,        # Total attempted
    'successful_analyses': 9,         # Successfully completed
    'failed_analyses': 1,             # Failed processing
    'total_processing_time': 450.2    # Cumulative seconds
}

3. Performance Metrics

API Response Times:

  • Tracks all Gemini API call latencies
  • Calculates average, median, p95, p99
  • Identifies performance bottlenecks

Throughput:

throughput_docs_per_minute = successful_analyses / (total_time / 60)

Success Rate:

success_rate = (successful_analyses / documents_processed) * 100

4. Quality Assessment Metrics (0.0-1.0 scale)

Structure Compliance:

  • Checks if analysis follows 7-section format
  • Validates presence of all required sections
  • Calculation: sections_found / 7

Completeness Scores:

  • Measures analysis depth and thoroughness
  • Checks for minimum content length per section
  • Penalties for missing information

Extraction Accuracy:

  • Validates extracted entities (parties, dates, amounts)
  • Cross-references with source text
  • Pattern matching for legal terms

Consistency Scores:

  • Checks internal logical consistency
  • Validates date sequences (effective < expiration)
  • Cross-references between sections

5. Content Analysis Metrics

Document Sizes:

{
    'document_sizes': [2500, 3200, 1800],     # Character counts
    'avg_document_size': 2500,
    'median_document_size': 2500
}

Language Distribution:

{
    'languages_detected': {
        'en': 7,      # 7 English contracts
        'ar': 3       # 3 Arabic contracts
    }
}

Complexity Scores:

  • Based on document length, legal jargon density, clause count
  • Formula: (char_count / 1000) * jargon_ratio * clause_density

6. Per-Document Metrics

Each analyzed document gets detailed tracking:

{
    'filename': 'contract.pdf',
    'text_length': 2500,
    'language': 'en',
    'processing_time': 45.2,
    'api_calls': 3,
    'tokens_used': 5000,
    'errors': [],
    'analysis_quality': {
        'structure_compliance': 1.0,
        'completeness': 0.95,
        'extraction_accuracy': 0.92,
        'consistency': 0.98
    },
    'success': True,
    'timestamp': '2025-01-26T12:01:30'
}

Important Metrics Calculations

Average Response Time

avg_response_time = sum(api_response_times) / len(api_response_times)

Overall Quality Score

quality_score = (
    structure_compliance * 0.25 +
    completeness * 0.30 +
    extraction_accuracy * 0.25 +
    consistency * 0.20
)

Token Efficiency

tokens_per_char = total_tokens / total_chars_processed
efficiency_score = 1.0 / tokens_per_char  # Higher is better

Metrics Output

JSON Export:

metrics.save_metrics('data/metrics/session_20250126_120000.json')

Console Dashboard:

=== VERDICT Contract Analysis Session ===
Session ID: 20250126_120000
Duration: 5.5 minutes

Processing:
- Documents: 10 processed
- Success: 9 (90.0%)
- Failed: 1 (10.0%)

Performance:
- Avg Response Time: 4.2s
- Throughput: 1.8 docs/min
- Total Tokens: 50,000

Quality (avg):
- Structure: 0.95
- Completeness: 0.92
- Accuracy: 0.89
- Consistency: 0.96
- Overall: 0.93

🧪 RAG System Architecture

TfidfIndex Class (src/rag_model.py)

Core Components:

  1. TF-IDF Vectorization:

    • Converts text to numerical vectors
    • Term frequency × Inverse document frequency
    • Captures importance of terms in corpus
  2. MMR Diversification:

    • Maximal Marginal Relevance algorithm
    • Balances relevance with diversity
    • Prevents redundant results
    • Formula: MMR = λ × Similarity(q,d) - (1-λ) × max Similarity(d,R)
  3. LEGAL_PATTERNS:

    • Pre-defined regex patterns for contract entities
    • Extracts: parties, dates, amounts, obligations, termination, renewal
    • Pattern-based extractive answers
  4. Hybrid Search:

    • Extractive: Direct text extraction with patterns
    • Generative: Gemini-generated answers with context
    • Combined approach for best results

Key Methods:

build_index(documents):

  • Splits documents into searchable chunks
  • Builds TF-IDF matrix
  • Saves index with joblib for persistence

search(query, top_k=5, use_mmr=True):

  • Vectorizes query
  • Computes cosine similarity
  • Applies MMR if enabled
  • Returns ranked results with scores

ask(question, use_extractive=True, use_generative=True):

  • Searches relevant context
  • Tries extractive answer first (pattern matching)
  • Falls back to generative (Gemini) if needed
  • Returns answer with source citations

✨ Features

🎯 Core Features

  1. 📄 Secure Contract Upload & Processing
    Multi-page Streamlit app with dedicated Upload page and Results display

  2. 🔍 AI-Powered Comprehensive Analysis
    7-section structured analysis using Gemini 2.5 Pro Preview

  3. 💬 RAG-Powered Q&A
    TF-IDF + MMR search with Gemini-generated answers

  4. 🌐 Arabic Translation
    Full translation with chunking, rate limiting, and RTL display

  5. 📊 Contract Summaries
    Automatic summary generation for contract groups

  6. 💾 Persistent Storage
    File-based storage with automatic loading on app restart


� Requirements

System Requirements

  • Python 3.8+ — Main runtime
  • Internet connection — For Google Gemini API access
  • Google AI API Key — Free tier available at Google AI Studio

Installation Commands

# Install Python dependencies
pip install -r requirements.txt

# Verify installation
python --version
pip list | grep google-generativeai

�🚀 Quick Start

🏠 Local Development (Recommended for Privacy)

Step 1: Clone & Install

git clone https://github.com/thm-msror/Unsupervised-CLM.git
cd Unsupervised-CLM
pip install -r requirements.txt

Step 2: Get Google Gemini API Key

# 1. Go to Google AI Studio: https://makersuite.google.com/app/apikey
# 2. Sign in with Google account
# 3. Click "Create API Key" 
# 4. Copy the API key (starts with AIza...)

Step 3: Configure Environment

# Create a .env file
New-Item -ItemType File -Path .env -Force

# Add your Gemini API key:
echo "GEMINI_API_KEY=your_api_key_here" >> .env

💡 Free Tier: Google Gemini offers generous free limits: 15 req/min, 1500 req/day, 1M tokens/month

Step 4: Run Application

# Run from repository root
python -m streamlit run .\main.py --server.port 8501
# Open your browser at: http://localhost:8501

🔑 API Key Security: Your Gemini API key is used client-side and not stored permanently
📁 Structure Note: The main entry point is main.py at the repository root for simple deployment


☁️ Cloud Deployment (Demo Only)

For hackathon demos, deploy on Render using cloud-hosted LLMs:

Step 1: Connect Repository

  • Link your GitHub repository to Render

Step 2: Configure Environment

  • Add environment variables from .env to Render dashboard
  • run python -m dotenv
  • Include API keys, model endpoints, and configuration

Step 3: Set Start Command

streamlit run main.py --server.port $PORT --server.address 0.0.0.0

🔒 Security: Add your GEMINI_API_KEY as an environment variable in Render dashboard


🧪 Testing & Development

Running Tests

# Run all tests
pytest tests/

# Run specific test modules
pytest tests/test_data_extraction.py -v
pytest tests/test_risk_analysis.py -v

Development Setup

# Install development dependencies
pip install -r requirements.txt

🔐 AI Configuration & API Setup

🤖 Google Gemini 2.5 Flash

This application uses Google Gemini 2.5 Flash for contract analysis:

Feature Specification
Model gemini-2.5-flash (latest)
Context Window 32K tokens
Speed Optimized for low-latency
Free Tier 15 req/min, 1500 req/day, 1M tokens/month

🔑 API Key Setup

Get Your Free API Key:

  1. Visit Google AI Studio
  2. Sign in with your Google account
  3. Click "Create API Key"
  4. Copy the key (starts with AIza...)

Add to Environment:

# Create .env file
echo "GEMINI_API_KEY=AIza..." >> .env

🧪 Testing Gemini API

Quick Test Script:

# Run the test script
python tests/test_gemini_api.py

Manual Test with Python:

import google.generativeai as genai

# Configure API
genai.configure(api_key="your_api_key_here")
model = genai.GenerativeModel('gemini-2.5-flash')

# Test contract analysis
contract_text = "Agreement between ABC Corp and XYZ Ltd, effective Jan 2025, $150/hour"
prompt = f"Extract key information from this contract: {contract_text}"

response = model.generate_content(prompt)
print(response.text)

💡 Why Gemini for Hackathons?

  • 🆓 Free Tier: Generous limits for development and demos
  • ⚡ Fast: Optimized for quick responses
  • 🧠 Smart: Excellent at structured data extraction
  • 📄 Contract-Ready: Great performance on legal document analysis

📈 Real Performance Metrics (Latest Session)

Session Overview (October 25, 2025)

{
  "session_id": "20251025_232811",
  "session_duration_minutes": 4.75,
  "documents_processed": 10,
  "successful_analyses": 10,
  "failed_analyses": 0,
  "success_rate": 100%
}

Processing Performance

Metric Value
Total Processing Time 239.96 seconds (~4 minutes)
Average Response Time 23.71 seconds per contract
Throughput 2.5 documents/minute
Fastest Analysis 17.68 seconds (21.5K chars, English)
Slowest Analysis 33.11 seconds (26.6K chars, Arabic)

Document Distribution

Languages:

  • 🇸🇦 Arabic: 5 contracts (50%)
  • 🇺🇸 English: 5 contracts (50%)

Contract Types Detected:

  • Employment Agreements: 6 (60%)
  • Purchase Agreements: 2 (20%)
  • Software Development: 1 (10%)

Document Size Range:

  • Smallest: 13,034 characters
  • Largest: 35,482 characters
  • Average: 24,285 characters
  • Median: 24,922 characters

Quality Metrics (0.0 - 1.0 scale)

Quality Dimension Score Performance
Structure Compliance 1.00 ✅ Perfect (100%)
Completeness 1.00 ✅ Perfect (100%)
Extraction Accuracy 0.53 ⚠️ Good (53%)
Consistency 1.00 ✅ Perfect (100%)

Per-Document Quality Breakdown:

Document #1 (Arabic, 17K chars):  Structure ✅ | Complete ✅ | Accuracy 44% | Consistent ✅
Document #2 (Arabic, 28K chars):  Structure ✅ | Complete ✅ | Accuracy 56% | Consistent ✅
Document #3 (Arabic, 27K chars):  Structure ✅ | Complete ✅ | Accuracy 67% | Consistent ✅
Document #4 (Arabic, 13K chars):  Structure ✅ | Complete ✅ | Accuracy 33% | Consistent ✅
Document #5 (English, 22K chars): Structure ✅ | Complete ✅ | Accuracy 44% | Consistent ✅
Document #6 (English, 35K chars): Structure ✅ | Complete ✅ | Accuracy 67% | Consistent ✅
Document #7 (English, 33K chars): Structure ✅ | Complete ✅ | Accuracy 78% | Consistent ✅
Document #8 (English, 16K chars): Structure ✅ | Complete ✅ | Accuracy 33% | Consistent ✅
Document #9 (English, 28K chars): Structure ✅ | Complete ✅ | Accuracy 56% | Consistent ✅

Token Usage & API Performance

API Metric Value
Total Tokens Processed 68,837 tokens
Average Tokens/Document 7,648 tokens
Smallest Request 4,721 tokens
Largest Request 10,543 tokens
API Success Rate 100% (9/9 API calls)*

*One document processed without API call (cached or alternative method)

Complexity Analysis

Average Contract Complexity: 0.577 (Medium-High)

Complexity Distribution:

  • High (>0.7): 1 contract (Arabic MSP Agreement - 0.746)
  • Medium (0.5-0.7): 6 contracts
  • Low (<0.5): 3 contracts

Complexity Formula:

complexity = (char_count / 1000) * jargon_ratio * clause_density

API Response Time Analysis

Fastest: 17.677s  ━━━━━━━━━━━━━━━━━━░░░░░░░░
Average: 23.707s  ━━━━━━━━━━━━━━━━━━━━━━━━░░
Slowest: 33.105s  ━━━━━━━━━━━━━━━━━━━━━━━━━━

Distribution:
15-20s: ███ (3 contracts)
20-25s: ████ (4 contracts)
25-30s: ░ (0 contracts)
30-35s: ██ (2 contracts)

Key Findings

Strengths:

  • Perfect structure compliance - all analyses follow 7-section format
  • 100% success rate - no failures or errors
  • Consistent quality across English and Arabic contracts
  • Good throughput (2.5 docs/min) for comprehensive analysis

⚠️ Areas for Improvement:

  • Extraction accuracy varies (33% - 78% range)
  • Arabic contracts slightly slower (avg 25.5s vs 21.2s for English)
  • Some contracts have lower information density detection

🎯 Recommendations:

  1. Optimize extraction accuracy through better prompt engineering
  2. Implement caching for repeated contract patterns
  3. Consider parallel processing for batch operations
  4. Fine-tune information density detection algorithms

🏆 System Reliability

Based on the latest session metrics:

  • Uptime: 100% (no crashes or system failures)
  • API Reliability: 100% (all API calls successful)
  • Data Integrity: 100% (all files saved correctly)
  • Processing Consistency: Stable response times across session
  • Bilingual Support: Equal performance on English and Arabic contracts

About

AI-powered Contract Lifecycle Management platform that analyzes legal contracts in seconds, extracts key data, identifies risks, and provides actionable insights. Built for QDB AI Hackathon 2025.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors