# AI Agent Master Plan: Virtual Assistant & Task Scheduler

**Goal**: Build a deployable, modular AI agent that combines a conversational assistant with a reliable task scheduler.

**Initial Automations**:
1. **Intelligent Inbox Manager** – sorts, flags, and auto-replies to emails
2. **Quick Data-Analysis Engine** – processes datasets and generates LLM summaries

**Deliverables**: Cloud/on-prem agent, dashboard + CLI, setup docs, and hand-off package.

---

## Table of Contents

1. **Project Architecture & System Design**
2. **Technology Stack Selection & Rationale**
3. **Data Flow & Integration Patterns**
4. **Modularity & Extension Points**
5. **Development Phases & Roadmap**
6. **Cost & Infrastructure Planning**
7. **Security & Compliance Requirements**
8. **Acceptance Testing & Validation**
9. **Master TODO List & Scaffolding**
10. **Checklists (PR, Release, Hand-off)**


# Section 1: Project Architecture & System Design

## High-Level System Diagram

```
┌─────────────────────────────────────────────────────────────────┐
│                        USER INTERFACE                           │
├─────────────────────────────────────────────────────────────────┤
│   Web Dashboard (React/Next.js + TailwindCSS)                   │
│   + CLI Admin Tool (Python Typer)                               │
└──────────────────────────┬──────────────────────────────────────┘
                           │
┌──────────────────────────▼──────────────────────────────────────┐
│                      API LAYER                                  │
├─────────────────────────────────────────────────────────────────┤
│   FastAPI Backend                                               │
│   ├── Auth/OAuth endpoints                                      │
│   ├── Conversation endpoint                                     │
│   ├── Rules & approval UI endpoints                             │
│   ├── Job launcher endpoint                                     │
│   ├── Status & logs endpoints                                   │
│   └── Webhook handlers                                          │
└──────────────────────────┬──────────────────────────────────────┘
                           │
┌──────────────────────────▼──────────────────────────────────────┐
│                 CORE AI ORCHESTRATION                           │
├─────────────────────────────────────────────────────────────────┤
│   LangChain Orchestration                                       │
│   ├── Prompt management & chaining                              │
│   ├── Memory & conversation history                             │
│   ├── Retrieval-augmented generation (RAG)                      │
│   └── LLM provider abstraction                                  │
│                                                                 │
│   Vector Store (FAISS for MVP)                                  │
│   ├── Email embeddings + metadata                               │
│   ├── Context retrieval for rules                               │
│   └── Similarity-based routing                                  │
└────┬─────────────────────────────────────────────┬──────────────┘
     │                                             │
┌────▼───────────────────────────┐  ┌──────────────▼───────────────┐
│  CONNECTORS & DATA SOURCES     │  │ TASK SCHEDULER & WORKER POOL │
├────────────────────────────────┤  ├──────────────────────────────┤
│ Email Connectors:              │  │  Celery + Redis              │
│  ├── IMAP/SMTP                 │  │  ├── Task queues             │
│  ├── Gmail API (OAuth)         │  │  ├── Worker pool             │
│  └── Outlook API (OAuth)       │  │  ├── Retry policies          │
│                                │  │  └── Status tracking         │
│ Data Connectors:               │  │                              │
│  ├── CSV/Excel upload          │  │  Background Jobs:            │
│  ├── Google Sheets             │  │  ├── Email polling           │
│  ├── S3                        │  │  ├── Email classification    │
│  ├── SQL (Postgres/MySQL)      │  │  ├── Auto-reply sending      │
│  └── BigQuery (optional)       │  │  ├── Data analysis           │
└────────────────────────────────┘  └──────────────────────────────┘
```

## Component Responsibilities

| Component | Responsibility |
|-----------|---|
| **Web Dashboard** | UI for inbox preview, template manager, job launcher, logs |
| **API Layer** | RESTful endpoints for conversation, job control, approvals |
| **LangChain Orchestration** | Chain & prompt management, RAG, LLM integration |
| **Vector Store (FAISS)** | Embeddings index for context retrieval, similarity search |
| **Email Connectors** | Poll/receive emails, handle OAuth, manage tokens |
| **Data Connectors** | Ingest CSV, Google Sheets, S3, SQL sources |
| **Task Scheduler (Celery)** | Distribute jobs to workers, manage retries & status |
| **Worker Pool** | Execute email tasks, data analysis, template substitution |
| **Postgres DB** | Users, rules, approvals, audit logs, config |
| **S3-Compatible Storage** | Uploaded datasets, generated reports, logs archive |

---


# Section 2: Technology Stack Selection & Rationale

## Recommended Stack

### Backend & Orchestration

| Component | Choice | Rationale | Alternatives |
|-----------|--------|-----------|---|
| **Language/Framework** | Python + FastAPI | Fast dev, strong async support, strong ecosystem | Flask, Django, Go |
| **LLM Orchestration** | LangChain | Excellent connectors, prompt management, chains | LlamaIndex, Semantic Kernel |
| **LLM Provider** | OpenAI (start) | Strong developer experience, fastest to market | Anthropic, Local LLaMA/Alpaca |

**Cost Estimate**: OpenAI at $0.01–$0.10 per email classification; caching responses to repeated prompts is expected to save 50–90% of that spend.
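The caching idea can be illustrated with a minimal sketch, assuming classification results are reusable for byte-identical prompts. `classify_with_llm` is a hypothetical stand-in for the real provider call, and the in-memory dict would likely be replaced by Redis in a deployed system.

```python
import hashlib
from typing import Callable, Dict


class PromptCache:
    """Memoize LLM responses keyed by a SHA-256 hash of the prompt text."""

    def __init__(self, llm_call: Callable[[str], str]):
        self._llm_call = llm_call          # hypothetical provider call
        self._store: Dict[str, str] = {}   # assumption: swap for Redis in production
        self.hits = 0
        self.misses = 0

    def get(self, prompt: str) -> str:
        key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
        if key in self._store:
            self.hits += 1
            return self._store[key]
        # Cache miss: pay for one LLM call and remember the answer
        self.misses += 1
        result = self._llm_call(prompt)
        self._store[key] = result
        return result


def classify_with_llm(prompt: str) -> str:
    """Placeholder: a real implementation would call the LLM provider here."""
    return "spam" if "unsubscribe" in prompt.lower() else "work"


cache = PromptCache(classify_with_llm)
for subject in ["Unsubscribe now", "Q3 report", "Unsubscribe now"]:
    cache.get(f"Classify this email subject: {subject}")
```

Only exact repeats hit this cache, so the actual savings depend on how often identical prompts recur in a given mailbox.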

**Migration Path**: Start OpenAI → switch to local model (LLaMA 2) or Anthropic for cost/safety after MVP.

### Embeddings & Retrieval

| Component | Choice | Rationale | Alternatives |
|-----------|--------|-----------|---|
| **Vector DB** | FAISS (MVP) | Free, in-process, no external dependency | Pinecone, Weaviate, Milvus |
| **Scaling** | Pinecone | Managed, auto-scaling, serverless | Weaviate Cloud, Milvus |

**Cost**: FAISS = $0 (local), Pinecone = $1–$100/month (managed; varies with scale).

### Task Scheduling & Workers

| Component | Choice | Rationale | Alternatives |
|-----------|--------|-----------|---|
| **Scheduler** | Celery + Redis | Mature, cheap to host, battle-tested | Prefect, Temporal, APScheduler |
| **Message Broker** | Redis | Simple, fast, low ops overhead | RabbitMQ, Amazon SQS |

**Cost**: Redis (managed) = $15–$30/month; Celery = free.

### Dashboard & CLI

| Component | Choice | Rationale | Alternatives |
|-----------|--------|-----------|---|
| **Dashboard** | Next.js + TailwindCSS | Fast dev, great DX, SEO-friendly | React SPA, Vue, Svelte |
| **CLI** | Python Typer | Simple, integrates with backend | Click, Argparse, Clack |

**Cost**: $0 (open-source). Hosting = $10–$30/month for static + API.

### Storage & Database

| Component | Choice | Rationale | Alternatives |
|-----------|--------|-----------|---|
| **Relational DB** | PostgreSQL | ACID, JSON support, scalable | MySQL, MariaDB, SQLite |
| **Object Storage** | S3-compatible (DO Spaces) | Cheap, durable, easy S3 migration | AWS S3, GCS, Backblaze B2 |

**Cost**: Postgres (managed) = $15/month; DO Spaces = $5/month (250 GB).

### Deployment & Infrastructure

| Component | Choice | Rationale | Alternatives |
|-----------|--------|-----------|---|
| **Containerization** | Docker | Standard, portable, efficient | Podman, containerd |
| **MVP Orchestration** | Docker Compose | Simple, single-host, zero ops | Kubernetes, ECS, Nomad |
| **Hosting (MVP)** | DigitalOcean Droplet | $6–$12/month, full control, low ops | Render, Railway, AWS, Linode |
| **Scaling** | Kubernetes | Managed k8s on DO/AWS/GCP | Docker Compose scale (manual), ECS |

**Cost**: Droplet ($12) + Postgres ($15) + Redis ($15) + DO Spaces ($5) = ~$47/month MVP.

### Observability & Security

| Component | Choice | Rationale | Alternatives |
|-----------|--------|-----------|---|
| **Error Tracking** | Sentry (free tier) | Easy setup, good context, free tier sufficient | DataDog, Rollbar, Honeycomb |
| **Logging** | Postgres + logs → S3 | Queryable, cheap storage | ELK, Grafana Loki, CloudWatch |
| **Monitoring** | Prometheus + Grafana | Open-source, self-hosted, flexible | DataDog, New Relic, Honeycomb |
| **Secrets Mgmt** | AWS Secrets Manager or HashiCorp Vault | Encrypted, rotatable, audit trail | 1Password, Doppler |

**Cost**: $0 (Sentry free tier + self-hosted Prometheus/Grafana).

### CI/CD

| Component | Choice | Rationale | Alternatives |
|-----------|--------|-----------|---|
| **CI/CD** | GitHub Actions | Free for public repos, integrated, fast | GitLab CI, CircleCI, Jenkins |

**Cost**: $0 (free tier on GitHub).

---

## Cost Breakdown (Annual MVP)

| Item | Monthly | Annual |
|------|---------|--------|
| Compute (DO Droplet) | $12 | $144 |
| Database (Postgres managed) | $15 | $180 |
| Cache (Redis managed) | $15 | $180 |
| Object Storage (DO Spaces) | $5 | $60 |
| LLM calls (OpenAI, 10k/month @ avg $0.01) | $100 | $1,200 |
| Monitoring (Sentry free tier) | $0 | $0 |
| Domain + SSL (Let's Encrypt) | $0 | $0 |
| **TOTAL** | **~$147** | **~$1,764** |

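*Can be reduced to ~$1,400/year by self-hosting Redis/Postgres on the same droplet (saving $30/month); LLM calls then account for most of the remaining cost, so caching is the main lever for further savings.*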
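The totals in the table can be checked with a few lines of arithmetic. The figures below are copied from the table; the self-hosted variant is a sketch that assumes only the managed Postgres and Redis line items are dropped and LLM usage stays the same.

```python
monthly_costs = {
    "droplet": 12,
    "postgres_managed": 15,
    "redis_managed": 15,
    "spaces": 5,
    "llm_calls": 10_000 * 0.01,  # 10k calls per month at an average of $0.01
}

monthly_total = sum(monthly_costs.values())
annual_total = monthly_total * 12

# Self-hosting Postgres and Redis on the droplet removes both managed fees
self_hosted_monthly = (
    monthly_total
    - monthly_costs["postgres_managed"]
    - monthly_costs["redis_managed"]
)
self_hosted_annual = self_hosted_monthly * 12

print(f"Managed:     ${monthly_total:.0f}/month, ${annual_total:.0f}/year")
print(f"Self-hosted: ${self_hosted_monthly:.0f}/month, ${self_hosted_annual:.0f}/year")
```

Under these assumptions LLM calls dominate the bill, so going much below the self-hosted figure would also require reducing LLM spend, for example through caching.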

---


# Section 3: Data Flow & Integration Patterns

## Email Linking & OAuth Flow

```
User → Dashboard "Link Email" 
  → OpenID/OAuth consent screen
  → Backend exchanges code for token
  → Encrypt token (AES-based, with the key held in a KMS or secrets manager)
  → Store in DB with email_id, provider, scope
  → Poll mailbox every 5 min (or push event)
  → Create "email_job" in Celery queue
```

### OAuth Token Encryption Example

```python
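# Pseudocode: backend/security/encryption.py

import os
from typing import Optional

from cryptography.fernet import Fernet

class TokenEncryptor:
    def __init__(self, key: Optional[str] = None):
        # Key must be a URL-safe base64-encoded 32-byte value,
        # e.g. one produced by Fernet.generate_key()
        key = key or os.getenv("ENCRYPTION_KEY")
        if not key:
            raise RuntimeError("ENCRYPTION_KEY is not set")
        self.cipher = Fernet(key)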
    
    def encrypt_token(self, token: str) -> str:
        return self.cipher.encrypt(token.encode()).decode()
    
    def decrypt_token(self, encrypted: str) -> str:
        return self.cipher.decrypt(encrypted.encode()).decode()

# In DB schema:
# CREATE TABLE user_email_accounts (
#     id SERIAL PRIMARY KEY,
#     user_id INT REFERENCES users(id),
#     provider VARCHAR(20),  -- 'gmail', 'outlook', 'imap'
#     email_address VARCHAR(255),
#     encrypted_token TEXT,  -- Store encrypted OAuth token
#     scopes TEXT,
#     created_at TIMESTAMP DEFAULT NOW()
# );
```

---

## Email Processing Pipeline

```
Email Job Entry
  ↓
[Fetch Email] (worker/tasks/email_processor.py)
  ├── Connect via provider adapter (Gmail API, Outlook API, IMAP)
  ├── Retrieve email metadata + body + attachments
  ↓
[Embed & Classify] (worker/tasks/classifier.py)
  ├── Tokenize subject + body
  ├── Generate embeddings (OpenAI text-embedding-3-small)
  ├── Query FAISS for similar emails + rules
  ├── LangChain chain for classification
  ↓
[Route Decision] (worker/tasks/router.py)
  ├── Rule matches? → Apply label + flag
  ├── Confidence > threshold & approved template? → Queue auto-reply
  ├── Else → Flag to user, create follow-up task
  ↓
[Action Execution] (worker/tasks/action_executor.py)
  ├── Send auto-reply via SMTP/Gmail API
  ├── Mark as processed
  ├── Log to audit table
```

### Email Processor Example

```python
# Pseudocode: worker/tasks/email_processor.py
# embed_text, find_matching_rules, apply_label, and queue_auto_reply_task are
# project helpers assumed to be defined elsewhere.

from celery import shared_task
from backend.connectors.factory import EmailConnectorFactory
from backend.models import EmailJob, AuditLog
from backend.security.encryption import TokenEncryptor

@shared_task(bind=True, max_retries=3)
def process_email(self, email_job_id: int):
    job = EmailJob.query.get(email_job_id)
    user = job.user

    try:
        # Get encrypted token, decrypt it
        account = user.email_accounts[0]
        token = TokenEncryptor().decrypt_token(account.encrypted_token)

        # Get provider adapter
        connector = EmailConnectorFactory.create(
            provider=account.provider,
            token=token
        )
        
        # Fetch email
        raw_email = connector.fetch(job.message_id)
        
        # Classify
        embedding = embed_text(raw_email['subject'] + ' ' + raw_email['body'])
        rules = find_matching_rules(embedding, user_id=user.id)
        
        # Route
        for rule in rules:
            apply_label(job, rule['label'])
            # .get() avoids a KeyError when the user has never reviewed this rule
            if rule['action'] == 'auto_reply' and user.approvals.get(rule['id']):
                queue_auto_reply_task(job.id, rule['template_id'])
        
        job.status = 'processed'
        job.save()
        
        AuditLog.create(
            user_id=user.id,
            action='email_processed',
            resource_id=job.id,
            details={'rules_applied': len(rules)}
        )
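    except Exception as exc:
        # self.retry() raises a Retry exception; re-raise it so Celery
        # reschedules the task (up to max_retries) instead of marking it done
        raise self.retry(exc=exc, countdown=60)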
```
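The route decision step of the pipeline can be sketched as a pure function, which keeps the branching testable without Celery or a mailbox. The `Classification` fields and the action names are assumptions for illustration; the 0.85 default mirrors the confidence check used in the auto-reply example later in this section.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Classification:
    rule_label: Optional[str]   # label from a matched rule, or None if no rule matched
    confidence: float           # classifier confidence in [0, 1]
    template_approved: bool     # user approved the reply template for auto-send


def route(c: Classification, threshold: float = 0.85) -> List[str]:
    """Return the ordered list of actions for one classified email."""
    actions: List[str] = []
    if c.rule_label is not None:
        actions.append(f"apply_label:{c.rule_label}")
    if c.confidence > threshold and c.template_approved:
        actions.append("queue_auto_reply")
    else:
        # Low confidence or no approval: leave the decision to the user
        actions.append("flag_for_review")
    return actions


decisions = route(Classification("urgent", 0.92, True))
```

Returning action names instead of executing them lets the worker log the decision before anything is sent.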

---

## Auto-Reply Workflow with Approval Gate

```
User uploads template
  ↓
Dashboard: Template editor (subject, body, rules)
  ↓
Template stored in DB (versioned, with approval flag)
  ↓
User toggles approval for auto-send
  ↓
When rule + email match (confidence > threshold):
  ├── Check approval flag
  ├── Check daily send limit
  ├── Check confidence score vs. threshold
  ├── If all pass → generate personalized reply (LLM substitution)
  ├── Else → flag to user for manual review
  ↓
Send reply via SMTP/Gmail API
  ↓
Log action (who approved, when, confidence, email_id, template_id)
```

### Pre-Approved Reply Example

```python
# Pseudocode: worker/tasks/auto_reply.py
# substitute_template is a project helper assumed to be defined elsewhere.

from datetime import datetime, timedelta

from celery import shared_task
from backend.connectors.factory import EmailConnectorFactory
from backend.models import AuditLog, EmailJob, Template
from backend.security.encryption import TokenEncryptor, encrypt_audit_data

@shared_task
def send_auto_reply(email_job_id: int, template_id: int):
    job = EmailJob.query.get(email_job_id)
    template = Template.query.get(template_id)
    user = job.user

    # Safety checks. Raise explicitly instead of using assert, which is
    # skipped entirely when Python runs with the -O flag.
    if not template.approved_for_auto_send:
        raise PermissionError("Template not approved")

    daily_sent = AuditLog.query.filter(
        AuditLog.user_id == user.id,
        AuditLog.action == 'auto_reply_sent',
        AuditLog.created_at >= datetime.now() - timedelta(days=1)
    ).count()

    if daily_sent >= user.settings.max_daily_replies:
        raise RuntimeError("Daily limit reached")
    if job.classifier_confidence <= 0.85:
        raise ValueError("Confidence too low")

    # Personalize reply (LLM substitution)
    reply_body = substitute_template(
        template.body,
        context={
            'sender_name': job.email_from_name,
            'subject': job.email_subject,
            'date': job.email_date
        }
    )

    # Send via Gmail API or SMTP
    account = user.email_accounts[0]
    connector = EmailConnectorFactory.create(
        provider=account.provider,
        token=TokenEncryptor().decrypt_token(account.encrypted_token)
    )
    
    message_id = connector.send_reply(
        to=job.email_from,
        subject=f"Re: {job.email_subject}",
        body=reply_body
    )
    
    # Audit
    AuditLog.create(
        user_id=user.id,
        action='auto_reply_sent',
        resource_id=email_job_id,
        details=encrypt_audit_data({
            'template_id': template_id,
            'confidence_score': job.classifier_confidence,
            'message_id': message_id,
            'recipient': job.email_from
        })
    )
```
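The example above calls `substitute_template` without defining it. One possible shape is sketched below using plain placeholder substitution from the standard library, not an LLM call; the `$name` placeholder syntax is an assumption, and an LLM-based rewrite could be layered on top of it.

```python
from string import Template
from typing import Mapping


def substitute_template(body: str, context: Mapping[str, object]) -> str:
    """Fill $placeholders in body; unknown placeholders are left untouched."""
    values = {key: str(value) for key, value in context.items()}
    # safe_substitute never raises on a missing key, so a template typo
    # cannot crash the worker mid-send
    return Template(body).safe_substitute(values)


reply = substitute_template(
    'Hi $sender_name,\n\nThanks for your message about "$subject". $unknown',
    {"sender_name": "Ada", "subject": "Invoice 42"},
)
```

Leaving unknown placeholders visible makes template mistakes easy to spot during manual review.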

---

## Data-Analysis Job Pipeline

```
User uploads CSV or points to data source
  ↓
Dashboard: Data preview + analysis options
  ↓
User selects analysis type (summary stats, trend, forecast)
  ↓
Create "DataAnalysisJob" in Celery queue
  ↓
[Worker] Fetch data → Validate schema → Run analysis script
  ↓
[LLM] Summarize findings + insights
  ↓
[Storage] Save report (PDF/HTML) to S3
  ↓
Return summary + download link to user
  ↓
Log job metadata (source, record count, analysis time, cost)
```

### Data Analysis Job Example

```python
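# Pseudocode: worker/tasks/data_analysis.py
# call_llm is a project helper assumed to be defined elsewhere.

import json
from datetime import datetime

import boto3
import pandas as pd
from celery import shared_task
from backend.connectors.data import DataConnectorFactory
from backend.models import AuditLog, DataAnalysisJob

# For DO Spaces or another S3-compatible store, pass endpoint_url=... here
s3_client = boto3.client("s3")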

@shared_task
def run_data_analysis_job(job_id: int):
    job = DataAnalysisJob.query.get(job_id)
    user = job.user
    
    try:
        job.status = 'running'
        job.save()
        
        # Fetch data
        connector = DataConnectorFactory.create(job.source_type)
        df = connector.fetch(job.source_path)
        
        # Basic analysis
        analysis_result = {
            'record_count': len(df),
            'columns': list(df.columns),
            'dtypes': df.dtypes.to_dict(),
            'summary_stats': df.describe().to_dict(),
            'null_counts': df.isnull().sum().to_dict(),
        }
        
        # Generate LLM summary
        summary_prompt = f"""
        Analyze this dataset:
        - Records: {analysis_result['record_count']}
        - Columns: {', '.join(analysis_result['columns'])}
        - Summary stats: {json.dumps(analysis_result['summary_stats'], default=str)}
        
        Provide 2-3 key insights in plain English.
        """
        
        llm_summary = call_llm(summary_prompt)
        
        # Save report to S3
        report_html = f"""
        <html>
            <h1>Data Analysis Report</h1>
            <p>Generated: {datetime.now()}</p>
            <h2>Summary</h2>
            <p>{llm_summary}</p>
            <h2>Statistics</h2>
            <pre>{json.dumps(analysis_result, indent=2, default=str)}</pre>
        </html>
        """
        
        report_key = f"reports/{user.id}/{job.id}/report.html"
        s3_client.put_object(Bucket='reports-bucket', Key=report_key, Body=report_html)
        
        job.status = 'completed'
        job.result_summary = llm_summary
        job.report_url = f"s3://reports-bucket/{report_key}"
        job.save()
        
        # Audit
        AuditLog.create(
            user_id=user.id,
            action='data_analysis_completed',
            resource_id=job_id,
            details={'record_count': len(df), 'summary': llm_summary}
        )
        
    except Exception as e:
        job.status = 'failed'
        job.error_message = str(e)
        job.save()
        raise
```
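The "Validate schema" step in the pipeline is not shown in the job above. A minimal sketch follows, using only the standard library and assuming the expected schema is expressed as a mapping from column name to a Python type; the function name and the sample columns are illustrative.

```python
import csv
import io
from typing import Dict, List


def validate_csv_schema(text: str, required: Dict[str, type]) -> List[str]:
    """Return human-readable problems; an empty list means the data passed."""
    reader = csv.DictReader(io.StringIO(text))
    missing = [col for col in required if col not in (reader.fieldnames or [])]
    if missing:
        return [f"missing column: {col}" for col in missing]

    problems: List[str] = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        for column, caster in required.items():
            try:
                caster(row[column])
            except (TypeError, ValueError):
                problems.append(
                    f"line {line_no}: {column}={row[column]!r} is not {caster.__name__}"
                )
    return problems


sample = "order_id,amount\n1,19.99\n2,abc\n"
issues = validate_csv_schema(sample, {"order_id": int, "amount": float})
```

Running a check like this before the analysis script lets the job fail early with a message the user can act on.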

---


# Section 4: Modularity & Extension Points

## Connector Interface Pattern

The system uses an **adapter pattern** to allow easy addition of new email and data connectors.

### Base Connector Classes

```python
# Pseudocode: backend/connectors/base.py

from abc import ABC, abstractmethod
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict, List

import pandas as pd

@dataclass
class Email:
    message_id: str
    from_address: str
    subject: str
    body: str
    received_at: datetime
    attachments: List[Dict[str, Any]]

class BaseEmailConnector(ABC):
    """All email adapters implement this interface."""
    
    @abstractmethod
    def authenticate(self, token: str) -> bool:
        """Validate token and establish connection."""
        pass
    
    @abstractmethod
    def fetch_emails(self, limit: int = 10) -> List[Email]:
        """Fetch unread emails."""
        pass
    
    @abstractmethod
    def send_email(self, to: str, subject: str, body: str) -> str:
        """Send an email, return message_id."""
        pass
    
    @abstractmethod
    def mark_as_read(self, message_id: str) -> bool:
        """Mark email as read."""
        pass
    
    @abstractmethod
    def apply_label(self, message_id: str, label: str) -> bool:
        """Apply a label/folder to email."""
        pass

@dataclass
class DataSource:
    source_type: str
    df: pd.DataFrame
    schema: Dict[str, str]
    record_count: int

class BaseDataConnector(ABC):
    """All data source adapters implement this interface."""
    
    @abstractmethod
    def validate_credentials(self, credentials: Dict) -> bool:
        """Check if credentials are valid."""
        pass
    
    @abstractmethod
    def fetch(self, source_path: str, **kwargs) -> pd.DataFrame:
        """Fetch data from source."""
        pass
    
    @abstractmethod
    def schema(self, source_path: str) -> Dict[str, str]:
        """Return column names and types."""
        pass
```
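One practical benefit of the adapter interface is that workers can be tested against an in-memory double instead of a live mailbox. The sketch below repeats a trimmed version of the interface so that it runs on its own; `Message` and `InMemoryEmailConnector` are hypothetical names, not part of the plan.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Message:
    message_id: str
    subject: str
    labels: Set[str] = field(default_factory=set)
    read: bool = False


class EmailConnector(ABC):
    """Trimmed stand-in for BaseEmailConnector so the sketch is self-contained."""

    @abstractmethod
    def fetch_emails(self, limit: int = 10) -> List[Message]: ...

    @abstractmethod
    def mark_as_read(self, message_id: str) -> bool: ...

    @abstractmethod
    def apply_label(self, message_id: str, label: str) -> bool: ...


class InMemoryEmailConnector(EmailConnector):
    """Test double: keeps messages in a dict instead of calling a provider."""

    def __init__(self, messages: List[Message]):
        self._messages: Dict[str, Message] = {m.message_id: m for m in messages}

    def fetch_emails(self, limit: int = 10) -> List[Message]:
        unread = [m for m in self._messages.values() if not m.read]
        return unread[:limit]

    def mark_as_read(self, message_id: str) -> bool:
        msg = self._messages.get(message_id)
        if msg is None:
            return False
        msg.read = True
        return True

    def apply_label(self, message_id: str, label: str) -> bool:
        msg = self._messages.get(message_id)
        if msg is None:
            return False
        msg.labels.add(label)
        return True


connector = InMemoryEmailConnector([Message("1", "Urgent: outage"), Message("2", "Lunch?")])
connector.apply_label("1", "urgent")
connector.mark_as_read("2")
unread_ids = [m.message_id for m in connector.fetch_emails()]
```

Because the double satisfies the same abstract methods, any task written against the interface should accept it unchanged.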

### Concrete Adapter Examples

```python
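# Pseudocode: backend/connectors/email_adapters.py
# Only authenticate() and fetch_emails() are sketched. A concrete adapter must
# also implement send_email(), mark_as_read(), and apply_label() before the
# abstract base class allows it to be instantiated.

import base64
import email
import imaplib
import json
from datetime import datetime
from email.utils import parsedate_to_datetime
from typing import List

import pandas as pd
import requests
from google.oauth2.credentials import Credentials
from googleapiclient.discovery import build
from sqlalchemy import create_engine

from backend.connectors.base import BaseDataConnector, BaseEmailConnector, Email

class GmailConnector(BaseEmailConnector):
    """Gmail API adapter."""

    def __init__(self):
        self.service = None

    def authenticate(self, token: str) -> bool:
        creds = Credentials.from_authorized_user_info(json.loads(token))
        self.service = build('gmail', 'v1', credentials=creds)
        return True

    def fetch_emails(self, limit: int = 10) -> List[Email]:
        results = self.service.users().messages().list(
            userId='me', q='is:unread', maxResults=limit
        ).execute()
        emails = []
        for msg in results.get('messages', []):
            full = self.service.users().messages().get(
                userId='me', id=msg['id'], format='full'
            ).execute()
            headers = {h['name']: h['value'] for h in full['payload']['headers']}
            # Handles simple messages only; multipart bodies live in payload['parts']
            body_data = full['payload'].get('body', {}).get('data', '')
            emails.append(Email(
                message_id=msg['id'],
                from_address=headers.get('From', ''),
                subject=headers.get('Subject', ''),
                # Gmail returns body data as URL-safe base64
                body=base64.urlsafe_b64decode(body_data).decode('utf-8') if body_data else '',
                received_at=datetime.fromtimestamp(int(full['internalDate']) / 1000),
                attachments=[]
            ))
        return emails

class OutlookConnector(BaseEmailConnector):
    """Outlook/Office 365 adapter using the Microsoft Graph REST API."""

    GRAPH_MESSAGES_URL = 'https://graph.microsoft.com/v1.0/me/messages'

    def authenticate(self, token: str) -> bool:
        self.token = json.loads(token)
        self.headers = {'Authorization': f"Bearer {self.token['access_token']}"}
        return True

    def fetch_emails(self, limit: int = 10) -> List[Email]:
        resp = requests.get(
            self.GRAPH_MESSAGES_URL,
            headers=self.headers,
            params={'$top': limit, '$filter': 'isRead eq false'},
            timeout=30,
        )
        resp.raise_for_status()
        emails = []
        for msg in resp.json()['value']:
            emails.append(Email(
                message_id=msg['id'],
                from_address=msg['from']['emailAddress']['address'],
                subject=msg['subject'],
                # bodyPreview is a short excerpt; the full text is in msg['body']['content']
                body=msg['bodyPreview'] or '',
                # Graph timestamps end in 'Z', which fromisoformat() rejects before Python 3.11
                received_at=datetime.fromisoformat(msg['receivedDateTime'].replace('Z', '+00:00')),
                attachments=[]
            ))
        return emails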

class IMAPConnector(BaseEmailConnector):
    """Generic IMAP adapter for any provider."""
    
    def authenticate(self, token: str) -> bool:
        creds = json.loads(token)
        self.conn = imaplib.IMAP4_SSL(creds['imap_server'])
        self.conn.login(creds['email'], creds['password'])
        return True
    
    def fetch_emails(self, limit: int = 10) -> List[Email]:
        self.conn.select('INBOX')
        _, data = self.conn.search(None, 'UNSEEN')
        email_ids = data[0].split()[-limit:]
        emails = []
        for email_id in email_ids:
            _, msg_data = self.conn.fetch(email_id, '(RFC822)')
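            msg = email.message_from_bytes(msg_data[0][1])
            # get_payload(decode=True) returns None for multipart messages,
            # so fall back to the first text/plain part
            part = msg
            if msg.is_multipart():
                part = next((p for p in msg.walk() if p.get_content_type() == 'text/plain'), None)
            payload = part.get_payload(decode=True) if part is not None else None
            emails.append(Email(
                message_id=email_id.decode(),
                from_address=msg.get('From', ''),
                subject=msg.get('Subject', ''),
                body=payload.decode('utf-8', errors='replace') if payload else '',
                received_at=parsedate_to_datetime(msg.get('Date')),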
                attachments=[]
            ))
        return emails

# Data connectors...
class CSVConnector(BaseDataConnector):
    """CSV file connector (local or URL)."""
    
    def fetch(self, source_path: str, **kwargs) -> pd.DataFrame:
        return pd.read_csv(source_path)

class GoogleSheetsConnector(BaseDataConnector):
    """Google Sheets connector."""
    
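    def fetch(self, source_path: str, **kwargs) -> pd.DataFrame:
        # source_path = "<spreadsheet_id>/<sheet name or A1 range>"
        spreadsheet_id, sheet_range = source_path.split('/', 1)
        service = build('sheets', 'v4', credentials=kwargs.get('creds'))
        result = service.spreadsheets().values().get(
            spreadsheetId=spreadsheet_id, range=sheet_range
        ).execute()
        values = result.get('values', [])
        if not values:
            return pd.DataFrame()
        # The first row holds the column headers
        return pd.DataFrame(values[1:], columns=values[0])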

class S3Connector(BaseDataConnector):
    """AWS S3 connector."""
    
    def fetch(self, source_path: str, **kwargs) -> pd.DataFrame:
        # source_path = "s3://bucket/key.csv"
        # pandas needs the optional s3fs package to read s3:// URLs
        return pd.read_csv(source_path)

class PostgresConnector(BaseDataConnector):
    """SQL database connector."""
    
    def fetch(self, source_path: str, **kwargs) -> pd.DataFrame:
        # source_path = "SELECT * FROM table"
        engine = create_engine(kwargs.get('connection_string'))
        return pd.read_sql(source_path, engine)
```

---

## Connector Factory & Registration

```python
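# Pseudocode: backend/connectors/factory.py

from typing import Optional

class EmailConnectorFactory:
    _adapters = {}

    @classmethod
    def register(cls, provider: str, adapter_class):
        # Normalize case so lookups in create() always match
        cls._adapters[provider.lower()] = adapter_class

    @classmethod
    def create(cls, provider: str, token: Optional[str] = None, **kwargs) -> BaseEmailConnector:
        adapter = cls._adapters.get(provider.lower())
        if not adapter:
            raise ValueError(f"Unknown email provider: {provider}")
        connector = adapter(**kwargs)
        # Adapters receive credentials through authenticate(), not __init__()
        if token is not None:
            connector.authenticate(token)
        return connector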

# Register adapters
EmailConnectorFactory.register('gmail', GmailConnector)
EmailConnectorFactory.register('outlook', OutlookConnector)
EmailConnectorFactory.register('imap', IMAPConnector)

# Usage in task:
# connector = EmailConnectorFactory.create('gmail', token=user_token)
# emails = connector.fetch_emails()
```
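An alternative to explicit `register(...)` calls is decorator-based registration, which keeps each adapter's provider name next to its class definition. This is a sketch of that variant under assumed names (`ConnectorRegistry`, `DummyConnector`), not a replacement prescribed by the plan.

```python
from typing import Callable, Dict, List


class ConnectorRegistry:
    """Maps provider names to adapter classes; names are case-insensitive."""

    def __init__(self) -> None:
        self._adapters: Dict[str, type] = {}

    def register(self, provider: str) -> Callable[[type], type]:
        def decorator(adapter_class: type) -> type:
            self._adapters[provider.lower()] = adapter_class
            return adapter_class  # return the class unchanged so it stays usable
        return decorator

    def create(self, provider: str, **kwargs):
        try:
            adapter_class = self._adapters[provider.lower()]
        except KeyError:
            raise ValueError(f"Unknown provider: {provider}") from None
        return adapter_class(**kwargs)

    def providers(self) -> List[str]:
        return sorted(self._adapters)


email_connectors = ConnectorRegistry()


@email_connectors.register("dummy")
class DummyConnector:
    """Hypothetical adapter used only to exercise the registry."""

    def __init__(self, token: str = ""):
        self.token = token


instance = email_connectors.create("DUMMY", token="abc")
```

With this variant, adding a provider means importing one module; no central list of `register` calls has to be edited.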

---

## Task Definition Serialization

Tasks are defined as JSON metadata + handlers, making them pluggable.

```python
# Pseudocode: backend/models/task.py

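from datetime import datetime
from enum import Enum

# `db` is assumed to be the project's shared SQLAlchemy instance.

class TaskType(str, Enum):
    EMAIL_PROCESS = "email_process"
    AUTO_REPLY = "auto_reply"
    DATA_ANALYSIS = "data_analysis"
    FOLLOW_UP = "follow_up"

class Task(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    # Table name matches the users(id) reference in the SQL schema above
    user_id = db.Column(db.Integer, db.ForeignKey('users.id'))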
    task_type = db.Column(db.String(50))
    status = db.Column(db.String(20))  # 'pending', 'running', 'completed', 'failed'
    celery_task_id = db.Column(db.String(255))
    
    # JSON serialization of parameters
    params = db.Column(db.JSON)  # e.g., {'email_id': 123, 'rule_id': 5}
    result = db.Column(db.JSON)  # Output data
    error_message = db.Column(db.Text)
    
    created_at = db.Column(db.DateTime, default=datetime.utcnow)
    updated_at = db.Column(db.DateTime, default=datetime.utcnow, onupdate=datetime.utcnow)

# Example:
# task = Task(
#     user_id=1,
#     task_type='data_analysis',
#     params={'dataset_url': 's3://...', 'analysis_type': 'summary'},
#     status='pending'
# )
```
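The "JSON metadata + handlers" idea can be sketched as a small dispatch table keyed by task type. The handler body is a placeholder and the `handler`/`dispatch` names are assumptions; in the real system the handler would presumably enqueue the matching Celery task.

```python
import json
from typing import Any, Callable, Dict

Handler = Callable[[Dict[str, Any]], Dict[str, Any]]
HANDLERS: Dict[str, Handler] = {}


def handler(task_type: str) -> Callable[[Handler], Handler]:
    """Register a function as the handler for one task type."""
    def decorator(fn: Handler) -> Handler:
        HANDLERS[task_type] = fn
        return fn
    return decorator


@handler("data_analysis")
def handle_data_analysis(params: Dict[str, Any]) -> Dict[str, Any]:
    # Placeholder: a real handler would call something like some_task.delay(...)
    return {"queued": True, "analysis_type": params["analysis_type"]}


def dispatch(serialized_task: str) -> Dict[str, Any]:
    """Decode a JSON task definition and run the handler for its type."""
    task = json.loads(serialized_task)
    fn = HANDLERS.get(task["task_type"])
    if fn is None:
        raise ValueError(f"No handler for task type: {task['task_type']}")
    return fn(task.get("params", {}))


payload = json.dumps({"task_type": "data_analysis", "params": {"analysis_type": "summary"}})
result = dispatch(payload)
```

Because tasks travel as plain JSON, a new task type only needs a new handler function; the dispatcher itself does not change.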

---

## Rules Engine (YAML/JSON DSL)

Users define rules in YAML; the engine evaluates them against emails.

```yaml
# Example rule (stored in DB as JSON or YAML file)
rules:
  - id: urgent_support_alerts
    name: "Flag urgent support emails"
    trigger: "email_received"
    conditions:
      - field: "subject"
        operator: "contains"
        value: ["urgent", "critical", "error"]
      - field: "from_domain"
        operator: "in"
        value: ["company.com"]  # domain only; the engine compares the part after "@"
    actions:
      - type: "apply_label"
        label: "urgent"
      - type: "flag"
        priority: "high"
  
  - id: auto_reply_newsletters
    name: "Auto-reply to newsletters"
    trigger: "email_received"
    conditions:
      - field: "subject"
        operator: "matches_regex"
        value: "newsletter|digest|announcement"
      - field: "similarity"  # Vector DB similarity
        operator: "gt"
        value: 0.8
    actions:
      - type: "auto_reply"
        template_id: 5
        requires_approval: true  # Must be approved in UI
```

### Rules Engine Evaluator

```python
# Pseudocode: backend/engine/rules_engine.py

import yaml
import re
from typing import List, Dict, Any

class RulesEngine:
    def __init__(self, vector_store=None):
        self.vector_store = vector_store
    
    def evaluate_rules(self, email: Email, user_id: int) -> List[Dict]:
        """Evaluate all user rules against an email. Return matching rules."""
        rules = Rule.query.filter_by(user_id=user_id, enabled=True).all()
        matched_rules = []
        
        for rule in rules:
            rule_config = yaml.safe_load(rule.config)
            if self._evaluate_conditions(email, rule_config['conditions']):
                matched_rules.append(rule_config)
        
        return matched_rules
    
    def _evaluate_conditions(self, email: Email, conditions: List[Dict]) -> bool:
        """All conditions must pass (AND logic)."""
        for condition in conditions:
            if not self._evaluate_condition(email, condition):
                return False
        return True
    
    def _evaluate_condition(self, email: Email, condition: Dict) -> bool:
        field = condition.get('field')
        operator = condition.get('operator')
        value = condition.get('value')

        if field == 'subject':
            field_value = email.subject.lower()
        elif field == 'body':
            field_value = email.body.lower()
        elif field == 'from_domain':
            # rpartition tolerates a missing "@"; strip handles "Name <a@b.com>"
            field_value = email.from_address.rpartition('@')[2].strip('> ').lower()
        elif field == 'similarity':
            # Query vector store
            embedding = embed_text(email.subject + ' ' + email.body)
            similarity = self.vector_store.query_similarity(embedding, top_k=1)[0]['score']
            field_value = similarity
        else:
            return False

        # A scalar value (e.g. a single regex string) is treated as a one-item
        # list; iterating a bare string would test it character by character
        values = value if isinstance(value, list) else [value]

        if operator == 'contains':
            return any(str(v).lower() in field_value for v in values)
        elif operator == 'matches_regex':
            return any(re.search(v, field_value, re.IGNORECASE) for v in values)
        elif operator == 'in':
            return field_value in values
        elif operator == 'gt':
            return float(field_value) > float(value)
        else:
            return False
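To make the AND semantics concrete, here is a self-contained sketch that evaluates two rules shaped like the YAML example against a plain dict. It leaves out the vector-store `similarity` field and the database lookup, and the function names are illustrative only.

```python
import re
from typing import Any, Dict, List


def condition_matches(email: Dict[str, str], condition: Dict[str, Any]) -> bool:
    field_value = email.get(condition["field"], "").lower()
    raw = condition["value"]
    values = raw if isinstance(raw, list) else [raw]  # accept scalar or list
    operator = condition["operator"]
    if operator == "contains":
        return any(str(v).lower() in field_value for v in values)
    if operator == "matches_regex":
        return any(re.search(str(v), field_value, re.IGNORECASE) for v in values)
    if operator == "in":
        return field_value in [str(v).lower() for v in values]
    return False  # unknown operators never match


def matching_rule_ids(email: Dict[str, str], rules: List[Dict[str, Any]]) -> List[str]:
    """A rule matches only if every one of its conditions matches (AND logic)."""
    return [
        rule["id"]
        for rule in rules
        if all(condition_matches(email, c) for c in rule["conditions"])
    ]


rules = [
    {"id": "urgent_support_alerts", "conditions": [
        {"field": "subject", "operator": "contains", "value": ["urgent", "critical", "error"]},
        {"field": "from_domain", "operator": "in", "value": ["company.com"]},
    ]},
    {"id": "auto_reply_newsletters", "conditions": [
        {"field": "subject", "operator": "matches_regex", "value": "newsletter|digest|announcement"},
    ]},
]

incoming = {"subject": "URGENT: checkout error", "from_domain": "company.com"}
matched = matching_rule_ids(incoming, rules)
```

The same email from a different domain would match nothing, since the first rule requires both of its conditions to hold.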

---

## How to Add a New Automation

### Example: Add an "Invoice Categorizer" Task

1. **Define the new connector** (if needed):
   ```python
   class InvoiceSourceConnector(BaseDataConnector):
       def fetch(self, source_path: str, **kwargs) -> pd.DataFrame:
           # Fetch invoices from accounting software API
           pass
   ```

2. **Create a Celery task** in `worker/tasks/invoice_categorizer.py`:
   ```python
   @shared_task
   def categorize_invoices(job_id: int):
       job = DataAnalysisJob.query.get(job_id)
       invoices_df = fetch_invoices()
       categories = classify_invoices(invoices_df)
       save_results(job_id, categories)
   ```

3. **Add a task type** to the `TaskType` enum.

4. **Create an API endpoint** in `backend/api/jobs.py`:
   ```python
   @router.post("/jobs/categorize-invoices")
   def launch_invoice_categorizer(request: InvoiceCategorizerRequest):
       job = DataAnalysisJob(task_type='invoice_categorizer', params=request.dict())
       job.save()
       categorize_invoices.delay(job.id)
       return {'job_id': job.id}
   ```

5. **Add dashboard page** to trigger and monitor the job.

**No changes needed to core orchestration!**

---


# Section 5: Development Phases & Roadmap

## Phase A: MVP Scoping & Repo Init

**Goal**: Set up skeleton repo, basic auth, email linking, and simplified inbox processor.

### Tasks:

1. **Init repo + Docker Compose + virtual environment**
   - Deliverables: `Dockerfile`, `docker-compose.yml`, `.env.example`, `pyproject.toml` or `requirements.txt`
   - Key files: `backend/main.py`, `worker/celery_app.py`, `frontend/pages/index.tsx`
   - Success: `docker-compose up` starts all services without errors

2. **Implement user auth (OAuth2 + session)**
   - Deliverables: `backend/api/auth.py`, OAuth endpoints
   - Success: Can register, login, receive session token

3. **Implement email account linking (OAuth flow)**
   - Deliverables: Gmail/Outlook OAuth endpoints, token encryption, DB schema
   - Success: Can link Gmail account, token stored encrypted in DB

4. **Fetch & classify inbox (no auto-reply)**
   - Deliverables: Email connector, classifier chain (LangChain), FAISS index
   - Success: Can see 10 unread emails labeled with category (work, personal, spam)

5. **CSV upload & basic analysis**
   - Deliverables: Data connector, basic stats worker task, report generation
   - Success: Upload CSV → see summary stats + LLM-generated insights

### Expected Output:

```
project-root/
├── backend/
│   ├── main.py (FastAPI app)
│   ├── api/
│   │   ├── auth.py
│   │   ├── email.py
│   │   └── jobs.py
│   ├── connectors/
│   │   ├── base.py
│   │   └── email_adapters.py
│   ├── models.py (Postgres schema)
│   └── security/
│       └── encryption.py
├── worker/
│   ├── celery_app.py
│   └── tasks/
│       ├── email_processor.py
│       └── data_analysis.py
├── frontend/
│   ├── pages/
│   │   ├── index.tsx
│   │   ├── inbox.tsx
│   │   └── analysis.tsx
│   └── components/
├── Dockerfile
├── docker-compose.yml
├── .env.example
├── README.md
├── requirements.txt
└── pyproject.toml
```

---

## Phase B: Core Features & Dashboard

**Goal**: Add scheduler, pre-approved replies, vector store, and feature-complete dashboard.

### Tasks:

1. **Implement Celery task scheduling + retry logic**
   - Deliverables: Task models, Celery configuration, task lifecycle
   - Success: Tasks retried 3 times on failure, visible in logs

2. **Implement pre-approved reply workflow**
   - Deliverables: Template manager API, approval UI, auto-reply execution with safety checks
   - Success: Approve template, send auto-reply on rule match, verify delivery

3. **Add FAISS vector store + RAG retriever**
   - Deliverables: FAISS index builder, retrieval chain, integration with classifier
   - Success: Query similar emails + rules, retrieve context for LLM

4. **Build dashboard pages**
   - Deliverables: Inbox preview, template manager, job launcher, logs viewer
   - Success: View emails, approve templates, launch jobs, see execution logs

5. **Rules engine (YAML/JSON DSL)**
   - Deliverables: Rule model, evaluator, UI for rule builder
   - Success: Create rule, email matches rule, action executed
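
The "retried 3 times on failure" criterion in task 1 can be illustrated without a broker. The helper below is a plain-Python stand-in, not the project's Celery configuration: `run_with_retries` and its parameters are hypothetical names. In Celery itself the same policy is usually declared through task options such as `autoretry_for`, `max_retries`, and `retry_backoff` rather than a hand-written loop.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retries(
    func: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func; on failure retry up to max_retries times with exponential backoff."""
    attempt = 0
    while True:
        try:
            return func()
        except Exception as exc:
            if attempt >= max_retries:
                raise  # retries exhausted: surface the last error
            delay = base_delay * (2 ** attempt)  # 1s, 2s, 4s with the defaults
            print(f"attempt {attempt + 1} failed ({exc}); retrying in {delay:.0f}s")
            sleep(delay)
            attempt += 1
```

The injected `sleep` makes the backoff schedule observable in tests without waiting for real time to pass.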

### Expected Output:

- Email classification confidence scores visible in UI
- Pre-approved replies sent with audit trail
- Dashboard shows job status in real-time
- Rules engine supports regex + similarity matching

---

## Phase C: Hardening & Deployment

**Goal**: Add security, testing, monitoring, and Docker Compose deployment.

### Tasks:

1. **Token encryption + secrets management**
   - Deliverables: AES-KMS encryption utility, .env-based config, Vault/AWS Secrets integration stub
   - Success: Tokens encrypted at rest, never logged

2. **Audit logging for all critical actions**
   - Deliverables: Audit table schema, middleware to log approvals + auto-replies
   - Success: Audit log shows who approved template, when, confidence score, recipient

3. **Unit & integration tests**
   - Deliverables: `tests/test_email_processor.py`, `tests/test_rules_engine.py`, etc.
   - Success: 80%+ code coverage, all tests pass

4. **Observability (Sentry + basic metrics)**
   - Deliverables: Sentry integration, Prometheus metrics exports
   - Success: Errors logged to Sentry, job duration tracked

5. **Docker Compose + CI pipeline**
   - Deliverables: Fully working `docker-compose.yml`, GitHub Actions workflow
   - Success: Push to main → build + test + push Docker image

6. **Acceptance test harness**
   - Deliverables: Script to test email linking, batch processing, scheduling, reporting
   - Success: Run script end-to-end, all checks pass

### Expected Output:

- Production-ready Docker images
- CI/CD pipeline with automated tests
- Full audit trail of actions
- Monitoring + error tracking set up

---

## Phase D: Optional Scaling & Enterprise

**Goal**: Move to managed services, add k8s, RBAC, advanced connectors.

### Tasks:

1. **Migrate FAISS → Pinecone**
   - Deliverables: Pinecone client, embedding sync job
   - Success: Queries work against Pinecone API, latency < 500ms

2. **Migrate Celery → Temporal** (optional)
   - Deliverables: Temporal workflow definitions, migration guide
   - Success: Jobs execute on Temporal, durable execution guaranteed

3. **Setup Kubernetes**
   - Deliverables: Helm charts, namespaces, resource limits
   - Success: Deploy on k8s cluster, auto-scaling works

4. **Role-based access control (RBAC)**
   - Deliverables: Role model, permission checks in API
   - Success: Admin, approver, user roles with appropriate permissions

5. **Advanced connectors**
   - Deliverables: BigQuery, Salesforce, Slack adapters
   - Success: Connect to multiple data sources seamlessly

---

## Integration Points Between Phases

| Phase | Depends On | Enables |
|-------|-----------|---------|
| A | – | B, testing infrastructure |
| B | A | C, advanced features |
| C | A, B | D, production deployment |
| D | C | Enterprise scaling |

---


# Section 6: Cost & Infrastructure Planning

## Infrastructure Decision Matrix

### Compute Options

| Option | Cost/Month | Pros | Cons | Recommendation |
|--------|-----------|------|------|---|
| **DigitalOcean Droplet (2GB) + managed DB** | $12 + $15 | Full control, cheap, straightforward | Manual scaling, limited SLA | **Best for MVP** |
| **Render.com (Free tier)** | $0–$30 | Easy deploy, auto-scaling included | Cold starts, limited free tier | Good for testing |
| **AWS ECS + RDS** | $30–$100 | Managed, auto-scaling, reliability | Complex setup, multi-service | Scale up here |
| **Railway** | $5–$50 | Simple, deploy with Git, managed | Limited customization | Good alternative to Render |
| **Kubernetes (self-hosted)** | $50–$200 | Full control, good scaling | High ops overhead | After MVP |

**Recommendation**: Start with a **DigitalOcean Droplet** (application services on the $12 droplet, with managed Postgres and Redis alongside) → migrate to **Render** or **Railway** when you need auto-scaling.

---

### Storage & Database Options

| Component | MVP Solution | Cost/Month | Scale-Up Solution | Cost/Month |
|-----------|---|---|---|---|
| **Database** | Postgres (managed DO) | $15 | AWS RDS Multi-AZ | $100+ |
| **Object Storage** | DO Spaces (250 GB) | $5 | AWS S3 (pay-per-use) | $1–$50 |
| **Vector DB** | FAISS (in-memory) | $0 | Pinecone (managed) | $20–$100 |
| **Cache/Queue** | Redis (managed DO) | $15 | Redis Cluster (AWS) | $40–$100 |

**Total MVP Storage Cost**: ~$35/month

---

### LLM & AI Cost Estimates

#### OpenAI Pricing (at time of writing; check the current price list)

| Model | Input | Output | Use Case |
|-------|-------|--------|----------|
| GPT-3.5 Turbo | $0.0005/1K | $0.0015/1K | Email classification, simple summaries |
| GPT-4 Turbo | $0.01/1K | $0.03/1K | Complex analysis, high accuracy |
| Embeddings (3-small) | $0.02/1M | – | Vector embeddings for FAISS |

#### Cost Projections

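**Scenario 1: 100 emails/day + 5 data analysis jobs/day**

- Email classification: 100 × 150 tokens = 15K tokens × $0.0005/1K = $0.0075/day ≈ $0.23/month
- Embeddings: 100 × 8 tokens = 0.8K tokens × $0.00002/1K = $0.000016/day ≈ $0.0005/month
- Data analysis summary: 5 × 500 tokens = 2.5K tokens × $0.001/1K = $0.0025/day ≈ $0.08/month
- **Total LLM**: ~$0.30/month

**Scenario 2: 1,000 emails/day + 50 data analysis jobs/day (10x scale)**

- **Total LLM**: ~$3/month

These figures count only the token volumes listed above. The budget tables in this section carry $3/month (MVP) and $28/month (scale) for LLM spend, roughly 10× more; treat the difference as headroom for output tokens, longer prompts, and retries.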
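
The projection can be re-derived mechanically from the per-token prices in the pricing table. The sketch below works in units of 1K tokens so the unit conversion stays explicit; `Workload`, `monthly_llm_cost`, and the 30-day month are illustrative assumptions, and the token counts are the ones listed for Scenario 1.

```python
from dataclasses import dataclass
from typing import Iterable

DAYS_PER_MONTH = 30  # assumption: flat 30-day month


@dataclass(frozen=True)
class Workload:
    name: str
    calls_per_day: int
    tokens_per_call: int
    usd_per_1k_tokens: float

    def monthly_cost(self) -> float:
        """Tokens per day, converted to 1K units, priced, then scaled to a month."""
        tokens_per_day = self.calls_per_day * self.tokens_per_call
        return tokens_per_day / 1000 * self.usd_per_1k_tokens * DAYS_PER_MONTH


def monthly_llm_cost(workloads: Iterable[Workload], scale: int = 1) -> float:
    """Sum the monthly cost of all workloads, optionally at N times the volume."""
    return scale * sum(w.monthly_cost() for w in workloads)


SCENARIO_1 = [
    Workload("email classification", 100, 150, 0.0005),
    Workload("embeddings", 100, 8, 0.00002),  # $0.02 per 1M tokens
    Workload("data analysis summary", 5, 500, 0.001),
]

if __name__ == "__main__":
    for w in SCENARIO_1:
        print(f"{w.name}: ${w.monthly_cost():.5f}/month")
    print(f"total: ${monthly_llm_cost(SCENARIO_1):.2f}/month")
    print(f"10x scale: ${monthly_llm_cost(SCENARIO_1, scale=10):.2f}/month")
```

Changing a price or a token count in `SCENARIO_1` updates every derived figure, which avoids per-1K versus per-token slips when the price list changes.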

**Cost Savings**:
- **Caching**: Prompt caching saves 50–90% on repeated queries
- **Batch processing**: Send non-urgent work through the OpenAI Batch API (asynchronous, results within 24 hours) for a 50% discount
- **Local LLM**: Switch to LLaMA 2 on-prem after MVP ($0 per call, +$200 infra)

---

## Monthly Cost Breakdown

### MVP (100 emails/day)

| Component | Cost |
|-----------|------|
| Compute (Droplet) | $12 |
| Database (Postgres) | $15 |
| Cache (Redis) | $15 |
| Storage (DO Spaces) | $5 |
| LLM (OpenAI) | $3 |
| Monitoring (Sentry free) | $0 |
| Domain + SSL | $0 |
| **TOTAL** | **$50/month** |

### Scale (1,000 emails/day, auto-scaling)

| Component | Cost |
|-----------|------|
| Compute (Render auto-scale) | $50 |
| Database (AWS RDS) | $40 |
| Cache (AWS ElastiCache) | $50 |
| Storage (S3) | $20 |
| LLM (OpenAI) | $28 |
| Monitoring (Sentry + Datadog) | $50 |
| Kubernetes (optional) | $0 (if managed) |
| **TOTAL** | **$238/month** |

---

## Infrastructure Migration Strategy

```
Month 1-2: MVP (DigitalOcean)
  ├── Backend + Frontend: Droplet ($12)
  ├── Database: DO Postgres ($15)
  ├── Cache: DO Redis ($15)
  └── Object storage: DO Spaces ($5)

Month 3-6: Scale Compute (Render)
  ├── Backend + Frontend: Render auto-scale ($30–$50)
  ├── Database: AWS RDS Managed ($40)
  ├── Cache: AWS ElastiCache ($50)
  └── Object storage: AWS S3 ($20)

Month 6+: Enterprise (Kubernetes)
  ├── Backend + Frontend: GKE / EKS cluster ($100+)
  ├── Database: AWS RDS Multi-AZ ($100+)
  ├── Cache: Redis Cluster ($50+)
  ├── Vector DB: Pinecone Managed ($50+)
  └── Task Scheduler: Temporal Cloud ($100+)
```

---

## Cost Optimization Strategies

### Immediate (MVP Phase)

| Strategy | Savings | Effort |
|----------|---------|--------|
| Use FAISS instead of Pinecone | $20–$100/month | Low |
| Cache LLM outputs | 50–90% on tokens | Medium |
| Batch email processing | 20% on compute | Low |
| Self-host Postgres + Redis on Droplet | $30/month | Medium |
| OpenAI Batch API for non-urgent jobs | 50% discount | Medium |
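
The "Cache LLM outputs" row amounts to keying responses by model and prompt so identical requests are paid for once. A minimal in-memory sketch follows; `LLMCache` and the `llm_call` signature are hypothetical, and in the deployed stack the Redis instance already in the plan would likely replace the dictionary.

```python
import hashlib
import json
from typing import Callable, Dict


class LLMCache:
    """In-memory cache keyed by a hash of (model, prompt)."""

    def __init__(self, llm_call: Callable[[str, str], str]):
        self._llm_call = llm_call          # function that actually hits the LLM API
        self._store: Dict[str, str] = {}   # swap for Redis in a multi-worker setup
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(model: str, prompt: str) -> str:
        payload = json.dumps({"model": model, "prompt": prompt}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def complete(self, model: str, prompt: str) -> str:
        """Return a cached completion, calling the LLM only on a miss."""
        key = self._key(model, prompt)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        result = self._llm_call(model, prompt)
        self._store[key] = result
        return result
```

Exact-match caching only pays off for repeated prompts (for example, classifying near-identical notification emails), so the `hits` and `misses` counters are worth exporting as metrics before relying on the projected savings.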

### Medium-term (Phase B–C)

| Strategy | Savings | Effort |
|----------|---------|--------|
| Move to local LLaMA 2 | $28+/month → $0 | High |
| Compress email storage | 30% storage savings | Low |
| Use Celery task grouping | 20% compute | Medium |
| Archive old logs to S3 Glacier | 80% storage cost | Low |

### Long-term (Phase D+)

| Strategy | Savings | Effort |
|----------|---------|--------|
| Temporal (durable execution) | Eliminate retries, save compute | High |
| Regional Pinecone | Save egress costs | Low |
| Reserved instances (AWS/GCP) | 40–70% compute discount | High |
| Data residency optimization | Reduce data transfer | Medium |

---

## ROI & Pricing Model

### Revenue Model Options

1. **Per-user subscription**: $10–$50/month per user
   - MVP cost per user: $0.05/month
   - Gross margin: 95%+

2. **Per-action billing**: $0.01–$0.10 per email processed or analysis
   - Based on LLM token usage
   - Shared infrastructure costs

3. **Enterprise tier**: $500–$5,000/month
   - Dedicated support, advanced connectors, on-prem option

### Break-even Analysis

**Assume SaaS Model: $20/month per user**

- Monthly cost per user: $0.05
- Gross margin per user: 99.75%
- Break-even: 3 users (covers ~$50 MVP infra)
- 10 users: $200 revenue vs $50 cost = $150 profit
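
The break-even figures follow from dividing fixed infrastructure cost by the per-user contribution. A short sketch under the same assumptions ($50 fixed cost, $20 price, $0.05 variable cost per user); the function names are illustrative.

```python
import math


def break_even_users(fixed_cost: float, price_per_user: float,
                     variable_cost_per_user: float) -> int:
    """Smallest number of users whose contribution covers the fixed cost."""
    contribution = price_per_user - variable_cost_per_user
    if contribution <= 0:
        raise ValueError("Price must exceed the variable cost per user")
    return math.ceil(fixed_cost / contribution)


def monthly_profit(users: int, fixed_cost: float, price_per_user: float,
                   variable_cost_per_user: float) -> float:
    """Revenue minus variable and fixed costs for a given user count."""
    return users * (price_per_user - variable_cost_per_user) - fixed_cost


if __name__ == "__main__":
    print(break_even_users(50, 20, 0.05))    # users needed to cover the MVP infra
    print(monthly_profit(10, 50, 20, 0.05))  # profit at 10 users
```

At 10 users the exact profit is $149.50; the $150 figure in the list drops the $0.50 of per-user variable cost.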

---


# Section 7: Security & Compliance Requirements

## OAuth2 Email Account Linking

### Flow Diagram

```
User clicks "Link Email" → Backend generates auth URL
  ↓
User approves scopes on provider (Gmail/Outlook) → Gets authorization code
  ↓
Backend exchanges code for access token + refresh token
  ↓
Encrypt tokens with AES-256 + KMS key ID
  ↓
Store in DB: user_id, provider, encrypted_token, scopes, created_at
  ↓
User sees "Account linked successfully"
```

### Implementation

```python
# Pseudocode: backend/api/auth.py

@router.get("/email/oauth-callback/{provider}")
def email_oauth_callback(provider: str, code: str, state: str):
    """Handle OAuth callback from email provider."""
    
    # Verify state token (CSRF protection), then invalidate it so it cannot be replayed
    stored_state = cache.get(f"oauth_state:{state}")
    if not stored_state:
        raise HTTPException(status_code=400, detail="Invalid state")
    cache.delete(f"oauth_state:{state}")
    
    # Exchange code for token
    if provider.lower() == 'gmail':
        token_data = request_gmail_token(code)
    elif provider.lower() == 'outlook':
        token_data = request_outlook_token(code)
    else:
        raise HTTPException(status_code=400, detail="Unknown provider")
    
    # Encrypt token
    user = get_current_user()
    encrypted_token = encrypt_token(json.dumps(token_data))
    
    # Store in DB
    account = UserEmailAccount(
        user_id=user.id,
        provider=provider,
        email_address=extract_email_from_token(token_data),
        encrypted_token=encrypted_token,
        scopes=json.dumps(token_data.get('scope', '').split()),
        refresh_token_encrypted=encrypt_token(token_data.get('refresh_token', '')),
    )
    db.session.add(account)
    db.session.commit()
    
    # Log action
    AuditLog.create(
        user_id=user.id,
        action='email_account_linked',
        resource_id=account.id,
        details={'provider': provider, 'email': account.email_address}
    )
    
    return {'status': 'success', 'email': account.email_address}
```
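
The callback relies on a `state` value issued when the user clicks "Link Email". A minimal sketch of that first step for Gmail is shown here. The function names, the plain dictionary standing in for the cache, and the read-only Gmail scope are assumptions; the `oauth_state:` key prefix mirrors the callback code.

```python
import secrets
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

# Google's OAuth 2.0 authorization endpoint
GOOGLE_AUTH_ENDPOINT = "https://accounts.google.com/o/oauth2/v2/auth"


def build_gmail_auth_url(client_id: str, redirect_uri: str,
                         state_store: dict, user_id: int) -> str:
    """Create a single-use state token, remember who it belongs to, return the auth URL."""
    state = secrets.token_urlsafe(32)
    state_store[f"oauth_state:{state}"] = user_id  # a Redis key with a TTL in production
    params = {
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "response_type": "code",
        "scope": "https://www.googleapis.com/auth/gmail.readonly",  # assumed scope
        "access_type": "offline",  # ask Google for a refresh token
        "prompt": "consent",
        "state": state,
    }
    return f"{GOOGLE_AUTH_ENDPOINT}?{urlencode(params)}"


def extract_state(auth_url: str) -> str:
    """Read the state parameter back out of an authorization URL."""
    return parse_qs(urlparse(auth_url).query)["state"][0]


def consume_state(state_store: dict, state: str) -> Optional[int]:
    """Return the user id bound to state and delete it, so each state works once."""
    return state_store.pop(f"oauth_state:{state}", None)
```

Binding the state to a user id lets the callback confirm that the account being linked belongs to the session that started the flow.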

---

## Token Encryption at Rest

```python
# Pseudocode: backend/security/encryption.py

import os

from cryptography.fernet import Fernet, InvalidToken


class TokenEncryptor:
    """Encrypt/decrypt OAuth tokens with Fernet (AES-128-CBC + HMAC-SHA256)."""
    
    def __init__(self, master_key: str = None):
        """
        Args:
            master_key: URL-safe base64-encoded 32-byte Fernet key
                (as produced by Fernet.generate_key()). If None, load from env or KMS.
        """
        if not master_key:
            # In production, retrieve from AWS Secrets Manager / HashiCorp Vault
            master_key = os.getenv('ENCRYPTION_MASTER_KEY')
        if not master_key:
            raise ValueError("ENCRYPTION_MASTER_KEY is not set")
        
        self.cipher = Fernet(master_key.encode() if isinstance(master_key, str) else master_key)
    
    def encrypt_token(self, token: str, associated_data: str = None) -> str:
        """
        Encrypt a token.
        
        Args:
            token: Plain text OAuth token
            associated_data: Optional user_id or metadata for audit. Currently
                unused: Fernet does not support associated data.
        
        Returns:
            URL-safe base64-encoded Fernet token
        """
        encrypted = self.cipher.encrypt(token.encode())
        return encrypted.decode()
    
    def decrypt_token(self, encrypted_token: str) -> str:
        """Decrypt a token. Raises InvalidToken if it is corrupted or the key is wrong."""
        decrypted = self.cipher.decrypt(encrypted_token.encode())
        return decrypted.decode()
    
    def is_valid(self, encrypted_token: str) -> bool:
        """Check if token is readable (not corrupted)."""
        try:
            self.decrypt_token(encrypted_token)
            return True
        except InvalidToken:
            return False

# Usage:
# encryptor = TokenEncryptor()
# encrypted = encryptor.encrypt_token(oauth_token)
# original = encryptor.decrypt_token(encrypted)
```

---

## Audit Logging for Auto-Replies & Job Actions

```python
# Pseudocode: backend/models.py

class AuditLog(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    user_id = db.Column(db.Integer, db.ForeignKey('user.id'), nullable=False)
    action = db.Column(db.String(100), nullable=False)  # 'auto_reply_sent', 'template_approved', etc.
    resource_id = db.Column(db.Integer)  # Email ID, template ID, job ID, etc.
    resource_type = db.Column(db.String(50))  # 'email', 'template', 'job'
    
    # Encrypted sensitive data
    _details_encrypted = db.Column(db.Text)  # Encrypted JSON
    
    created_at = db.Column(db.DateTime, default=datetime.utcnow, nullable=False)
    ip_address = db.Column(db.String(45))  # IPv4 or IPv6
    user_agent = db.Column(db.Text)
    
    @property
    def details(self):
        """Return decrypted details."""
        if self._details_encrypted:
            return json.loads(decrypt_token(self._details_encrypted))
        return {}
    
    @details.setter
    def details(self, value):
        """Encrypt and store details."""
        self._details_encrypted = encrypt_token(json.dumps(value))

# Audit log entries for auto-reply:
# {
#     user_id: 123,
#     action: 'auto_reply_sent',
#     resource_id: 456,  # email_id
#     resource_type: 'email',
#     details: {
#         template_id: 5,
#         confidence_score: 0.92,
#         recipient: 'user@example.com',
#         message_id: 'msg_xyz',
#         approved_by: 123  # user_id who approved template
#     },
#     ip_address: '192.168.1.1',
#     user_agent: 'Mozilla/5.0...'
# }
```

---

## Pre-Approved Reply Safety Gates

```python
# Pseudocode: backend/worker/tasks/auto_reply.py

class AutoReplyValidator:
    """Validate that auto-reply is safe to send."""
    
    @staticmethod
    def validate(email_job, template, user) -> tuple[bool, str]:
        """
        Validate auto-reply before sending.
        
        Returns:
            (is_valid: bool, reason: str)
        """
        
        # Gate 1: Template approval
        if not template.approved_for_auto_send:
            return False, "Template not approved for auto-send"
        
        # Gate 2: Confidence threshold
        MIN_CONFIDENCE = 0.85
        if email_job.classifier_confidence < MIN_CONFIDENCE:
            return False, f"Confidence {email_job.classifier_confidence} below threshold {MIN_CONFIDENCE}"
        
        # Gate 3: Daily send limit
        daily_sent = count_replies_sent_today(user.id)
        if daily_sent >= user.settings.max_daily_replies:
            return False, f"Daily limit ({user.settings.max_daily_replies}) reached"
        
        # Gate 4: Opt-in flag
        if not user.settings.enable_auto_replies:
            return False, "Auto-replies not enabled by user"
        
        # Gate 5: Sensitive topics (content-based filter)
        SENSITIVE_KEYWORDS = ['refund', 'dispute', 'complaint', 'legal']
        if any(kw in email_job.email_subject.lower() for kw in SENSITIVE_KEYWORDS):
            return False, "Email matches sensitive topic pattern"
        
        return True, "All validations passed"

# Usage:
# is_valid, reason = AutoReplyValidator.validate(job, template, user)
# if not is_valid:
#     flag_email_for_human_review(job.id, reason)
# else:
#     send_auto_reply(job, template)
```
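
Gate 5 matches substrings in the subject only, so "Paralegal internship" trips the filter while a refund request in the body does not. One possible refinement, sketched here with hypothetical helper names, compiles the keyword list into a word-boundary regex and checks both subject and body.

```python
import re
from typing import Iterable

SENSITIVE_KEYWORDS = ("refund", "dispute", "complaint", "legal")


def build_sensitive_pattern(keywords: Iterable[str]) -> re.Pattern:
    """Compile keywords into one case-insensitive, whole-word pattern."""
    alternation = "|".join(re.escape(kw) for kw in keywords)
    return re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)


SENSITIVE_PATTERN = build_sensitive_pattern(SENSITIVE_KEYWORDS)


def mentions_sensitive_topic(subject: str, body: str) -> bool:
    """True if either the subject or the body contains a sensitive keyword."""
    return bool(SENSITIVE_PATTERN.search(subject) or SENSITIVE_PATTERN.search(body))
```

Whole-word matching trades recall for precision: it no longer fires on "paralegal", but it also misses inflections such as "refunds", so the keyword list would need those variants spelled out.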

---

## Role-Based Access Control (RBAC)

```python
# Pseudocode: backend/models.py

class Role(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String(50), unique=True)
    description = db.Column(db.Text)
    permissions = db.Column(db.JSON)  # List of permission codes

class User(db.Model):
    # ... other fields ...
    role_id = db.Column(db.Integer, db.ForeignKey('role.id'))
    role = db.relationship('Role')

# Example roles:
# {
#     'admin': {
#         'permissions': [
#             'manage_users',
#             'view_all_emails',
#             'approve_templates',
#             'modify_rules',
#             'view_audit_logs'
#         ]
#     },
#     'approver': {
#         'permissions': [
#             'view_own_emails',
#             'approve_templates',
#             'approve_auto_replies'
#         ]
#     },
#     'user': {
#         'permissions': [
#             'view_own_emails',
#             'create_templates',
#             'create_rules',
#             'launch_jobs'
#         ]
#     }
# }

# Decorator for permission checks:
from functools import wraps

def require_permission(permission: str):
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            user = get_current_user()
            if permission not in user.role.permissions:
                raise HTTPException(status_code=403, detail="Forbidden")
            return func(*args, **kwargs)
        return wrapper
    return decorator

# Usage:
# @router.post("/templates/{template_id}/approve")
# @require_permission("approve_templates")
# def approve_template(template_id: int):
#     ...
```

---

## Data Retention & GDPR Compliance

```python
# Pseudocode: backend/models.py

class UserDataRetentionPolicy(db.Model):
    user_id = db.Column(db.Integer, db.ForeignKey('user.id'), primary_key=True)
    email_retention_days = db.Column(db.Integer, default=90)  # 90 days
    audit_log_retention_days = db.Column(db.Integer, default=365)  # 1 year
    reports_retention_days = db.Column(db.Integer, default=30)  # 30 days
    auto_delete_enabled = db.Column(db.Boolean, default=True)

# Background task (run daily):
@shared_task
def cleanup_expired_data():
    """Delete data older than retention period."""
    
    policies = UserDataRetentionPolicy.query.filter_by(auto_delete_enabled=True).all()
    
    for policy in policies:
        user_id = policy.user_id
        
        # Delete old emails
        cutoff_date = datetime.utcnow() - timedelta(days=policy.email_retention_days)
        EmailJob.query.filter(
            EmailJob.user_id == user_id,
            EmailJob.created_at < cutoff_date
        ).delete()
        
        # Delete old audit logs (but keep for compliance if needed)
        cutoff_date = datetime.utcnow() - timedelta(days=policy.audit_log_retention_days)
        AuditLog.query.filter(
            AuditLog.user_id == user_id,
            AuditLog.created_at < cutoff_date
        ).delete()
        
        # Delete old reports
        cutoff_date = datetime.utcnow() - timedelta(days=policy.reports_retention_days)
        DataAnalysisJob.query.filter(
            DataAnalysisJob.user_id == user_id,
            DataAnalysisJob.completed_at < cutoff_date
        ).delete()
    
    db.session.commit()

# GDPR: User can request data export
from fastapi import Response

@router.post("/user/export-data")
def export_user_data():
    """Export all user data as JSON."""
    user = get_current_user()
    
    data = {
        'user': user.to_dict(),
        'email_accounts': [acc.to_dict() for acc in user.email_accounts],
        'emails': [job.to_dict() for job in user.email_jobs],
        'templates': [t.to_dict() for t in user.templates],
        'rules': [r.to_dict() for r in user.rules],
        'jobs': [j.to_dict() for j in user.data_analysis_jobs],
        'audit_logs': [log.to_dict() for log in user.audit_logs],
    }
    
    # Send as a JSON download. FileResponse expects a path on disk, so return
    # the serialized payload with Response and a Content-Disposition header.
    return Response(
        content=json.dumps(data, indent=2, default=str),  # default=str serializes datetimes
        media_type='application/json',
        headers={'Content-Disposition': f'attachment; filename="user_data_export_{user.id}.json"'}
    )

# GDPR: User can request deletion
@router.post("/user/delete-account")
def delete_user_account():
    """Delete all user data (hard delete after retention period)."""
    user = get_current_user()
    
    # Schedule deletion after 30-day grace period
    user.deletion_requested_at = datetime.utcnow()
    db.session.commit()
    
    # Background task will hard delete after 30 days
    return {'status': 'deletion scheduled', 'effective_date': (datetime.utcnow() + timedelta(days=30)).isoformat()}
```
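
The deletion endpoint defers the hard delete to a background task that is not shown. Its selection step could look like the sketch below; `UserRecord` is a stand-in for the real `User` model, `users_due_for_hard_delete` is a hypothetical name, and the sketch uses timezone-aware timestamps, which must not be mixed with the naive `datetime.utcnow()` values used elsewhere.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Optional

GRACE_PERIOD = timedelta(days=30)


@dataclass
class UserRecord:
    """Stand-in for the User model; only the fields this step needs."""
    id: int
    deletion_requested_at: Optional[datetime] = None


def users_due_for_hard_delete(users: Iterable[UserRecord],
                              now: Optional[datetime] = None) -> List[UserRecord]:
    """Return users whose deletion request is at least GRACE_PERIOD old."""
    now = now or datetime.now(timezone.utc)
    return [
        u for u in users
        if u.deletion_requested_at is not None
        and now - u.deletion_requested_at >= GRACE_PERIOD
    ]
```

A daily Celery task could call this with the users that have `deletion_requested_at` set and then delete each returned user's rows, leaving accounts still inside the grace period untouched.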

---

## TLS & Certificate Management

```bash
# Using Let's Encrypt + Certbot (automated renewal)

# Initial setup:
certbot certonly --standalone -d yourdomain.com

# Auto-renewal via cron (runs twice daily):
0 3,15 * * * certbot renew --quiet

# In docker-compose.yml:
# volumes:
#   - /etc/letsencrypt:/etc/letsencrypt
#   - /var/log/letsencrypt:/var/log/letsencrypt
```

---


# Section 8: Acceptance Testing & Validation

## Acceptance Test Cases

### Test 1: User Login & OAuth Email Linking

**Objective**: Verify secure authentication and email account linking.

```python
# Pseudocode: tests/test_acceptance_auth.py

def test_user_login_and_email_linking():
    """User can login and link Gmail account."""
    
    # Step 1: Register user
    response = client.post('/api/auth/register', json={
        'email': 'test@example.com',
        'password': 'secure_password'
    })
    assert response.status_code == 201
    user_id = response.json()['user_id']
    
    # Step 2: Login
    response = client.post('/api/auth/login', json={
        'email': 'test@example.com',
        'password': 'secure_password'
    })
    assert response.status_code == 200
    session_token = response.json()['token']
    
    # Step 3: Initiate Gmail OAuth
    response = client.get(
        '/api/email/oauth-authorize',
        params={'provider': 'gmail'},
        headers={'Authorization': f'Bearer {session_token}'}
    )
    assert response.status_code == 200
    oauth_url = response.json()['authorization_url']
    
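    # Step 4: Simulate OAuth callback. Reuse the state issued in Step 3 (the callback
    # rejects unknown state values) and send the session token so the callback can
    # resolve the current user. The provider token exchange is assumed to be mocked.
    from urllib.parse import parse_qs, urlparse
    state = parse_qs(urlparse(oauth_url).query)['state'][0]
    response = client.get(
        '/api/email/oauth-callback/gmail',
        params={
            'code': 'mock_auth_code',
            'state': state
        },
        headers={'Authorization': f'Bearer {session_token}'}
    )
    assert response.status_code == 200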
    
    # Step 5: Verify token encrypted in DB
    from backend.models import UserEmailAccount
    account = UserEmailAccount.query.filter_by(user_id=user_id).first()
    assert account is not None
    assert account.provider == 'gmail'
    # Token should be encrypted (not plain text)
    assert len(account.encrypted_token) > 100
    assert 'refresh_token' not in account.encrypted_token
    
    print("✓ Test 1 passed: Login and OAuth linking works")
```

### Test 2: Email Batch Processing & Classification

**Objective**: Process test emails and verify classification labels.

```python
def test_email_batch_processing():
    """Upload test emails, process them, verify labels."""
    
    user = create_test_user()
    link_test_email_account(user)
    
    # Queue email processing task
    from backend.worker.tasks.email_processor import process_email
    
    # Create test email job
    test_emails = [
        {
            'id': '1',
            'subject': 'Q3 Financial Report - URGENT',
            'from': 'finance@company.com',
            'body': 'Please review attached quarterly report...'
        },
        {
            'id': '2',
            'subject': 'Happy Birthday from the team!',
            'from': 'hr@company.com',
            'body': 'We hope you have a great day...'
        }
    ]
    
    for email_data in test_emails:
        job = EmailJob(
            user_id=user.id,
            message_id=email_data['id'],
            email_subject=email_data['subject'],
            email_from=email_data['from'],
            email_body=email_data['body'],
            status='queued'
        )
        db.session.add(job)
        db.session.commit()
        
        # Process email
        process_email(job.id)
    
    # Verify results
    finance_email = EmailJob.query.filter_by(message_id='1').first()
    assert 'financial' in finance_email.applied_labels or 'work' in finance_email.applied_labels
    assert finance_email.status == 'processed'
    assert finance_email.classifier_confidence > 0.7
    
    birthday_email = EmailJob.query.filter_by(message_id='2').first()
    assert 'personal' in birthday_email.applied_labels or 'greeting' in birthday_email.applied_labels
    
    print("✓ Test 2 passed: Email classification works")
```

### Test 3: Pre-Approved Template & Auto-Reply

**Objective**: Approve template, send auto-reply, verify delivery.

```python
def test_auto_reply_workflow():
    """Approve template, trigger auto-reply, verify sent."""
    
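    user = create_test_user()
    user_token = login_test_user(user)  # hypothetical helper: returns a bearer token for the API calls below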
    
    # Step 1: Create template
    template = Template(
        user_id=user.id,
        name='Support Auto-Reply',
        subject='Thanks for contacting us',
        body='Thank you for your email. We will get back to you shortly.\nBest regards',
        approved_for_auto_send=False
    )
    db.session.add(template)
    db.session.commit()
    
    # Step 2: Approve template via API
    response = client.post(
        f'/api/templates/{template.id}/approve',
        headers={'Authorization': f'Bearer {user_token}'}
    )
    assert response.status_code == 200
    
    # Verify audit log
    audit = AuditLog.query.filter(
        AuditLog.user_id == user.id,
        AuditLog.action == 'template_approved'
    ).first()
    assert audit is not None
    
    # Step 3: Incoming email matches rule → trigger auto-reply
    email_job = EmailJob(
        user_id=user.id,
        message_id='support_001',
        email_subject='Question about your service',
        email_from='customer@example.com',
        email_body='Hi, I have a question...',
        classifier_confidence=0.9,
        status='processed'
    )
    db.session.add(email_job)
    db.session.commit()
    
    # Execute auto-reply task
    from backend.worker.tasks.auto_reply import send_auto_reply
    send_auto_reply(email_job.id, template.id)
    
    # Verify email was sent (check mock SMTP or Gmail API)
    audit_log = AuditLog.query.filter(
        AuditLog.action == 'auto_reply_sent',
        AuditLog.resource_id == email_job.id
    ).first()
    assert audit_log is not None
    assert 'message_id' in audit_log.details
    
    print("✓ Test 3 passed: Auto-reply workflow works")
```

### Test 4: Follow-Up Task Scheduling

**Objective**: Create follow-up task from email, confirm scheduled in Celery.

```python
def test_follow_up_scheduling():
    """Create follow-up task, verify Celery scheduling."""
    
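    user = create_test_user()
    user_token = login_test_user(user)  # hypothetical helper: returns a bearer token for the API calls below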
    email_job = create_test_email(user)
    
    # Create follow-up task via API
    response = client.post(
        f'/api/emails/{email_job.id}/create-followup',
        json={
            'task_type': 'follow_up',
            'scheduled_for': (datetime.utcnow() + timedelta(days=1)).isoformat(),
            'notes': 'Follow up if no response'
        },
        headers={'Authorization': f'Bearer {user_token}'}
    )
    assert response.status_code == 201
    task_id = response.json()['task_id']
    
    # Verify task in DB
    task = Task.query.get(task_id)
    assert task.status == 'pending'
    assert task.task_type == 'follow_up'
    
    # Verify Celery task ID recorded
    assert task.celery_task_id is not None
    
    print("✓ Test 4 passed: Task scheduling works")
```

### Test 5: Data-Analysis Job End-to-End

**Objective**: Upload CSV, run analysis, verify LLM summary and report.

```python
def test_data_analysis_job():
    """Upload CSV, analyze, verify summary and report generation."""
    
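    user = create_test_user()
    user_token = login_test_user(user)  # hypothetical helper: returns a bearer token for the API calls below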
    
    # Step 1: Upload test CSV
    csv_content = """date,sales,region,product
2024-01-01,1000,North,Product A
2024-01-02,1200,North,Product A
2024-01-01,800,South,Product B
2024-01-02,950,South,Product B"""
    
    response = client.post(
        '/api/data/upload',
        files={'file': ('test.csv', csv_content)},
        headers={'Authorization': f'Bearer {user_token}'}
    )
    assert response.status_code == 201
    job_id = response.json()['job_id']
    
    # Step 2: Launch analysis job
    response = client.post(
        f'/api/jobs/{job_id}/analyze',
        json={'analysis_type': 'summary_stats'},
        headers={'Authorization': f'Bearer {user_token}'}
    )
    assert response.status_code == 202
    
    # Step 3: Wait for job completion
    job = wait_for_job_completion(job_id, timeout=30)
    assert job.status == 'completed'
    
    # Step 4: Verify results
    assert job.result_summary is not None
    assert len(job.result_summary) > 0
    assert 'sales' in job.result_summary.lower() or 'region' in job.result_summary.lower()
    
    # Step 5: Verify report generated
    assert job.report_url is not None
    report_response = requests.get(job.report_url)
    assert report_response.status_code == 200
    assert 'html' in report_response.headers['content-type'].lower()
    
    print("✓ Test 5 passed: Data analysis job works end-to-end")
```
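
Test 5 calls `wait_for_job_completion` without defining it. A possible polling implementation is sketched below. The extra `fetch_job` argument is an assumption added so the helper can be exercised without a database; in the test suite it might wrap `DataAnalysisJob.query.get`. The set of terminal states is also assumed.

```python
import time
from typing import Any, Callable

TERMINAL_STATES = ("completed", "failed")  # assumed terminal job statuses


def wait_for_job_completion(
    job_id: int,
    fetch_job: Callable[[int], Any],
    timeout: float = 30.0,
    poll_interval: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Any:
    """Poll fetch_job(job_id) until the job reaches a terminal state or the timeout expires."""
    deadline = clock() + timeout
    while True:
        job = fetch_job(job_id)
        if job.status in TERMINAL_STATES:
            return job
        if clock() >= deadline:
            raise TimeoutError(f"Job {job_id} still '{job.status}' after {timeout}s")
        sleep(poll_interval)
```

Returning on `failed` as well as `completed` lets the caller's `assert job.status == 'completed'` report a real failure immediately instead of waiting out the timeout.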

### Test 6: Audit Log Verification

**Objective**: Verify all critical actions logged with encryption.

```python
def test_audit_logs():
    """Verify audit logs capture all critical actions."""
    
    user = create_test_user()
    
    # Perform actions that should be logged
    link_test_email_account(user)  # Logs: email_account_linked
    approve_template(user)  # Logs: template_approved
    send_auto_reply(user)  # Logs: auto_reply_sent
    
    # Query audit logs
    logs = AuditLog.query.filter_by(user_id=user.id).all()
    assert len(logs) >= 3
    
    # Verify encrypted details
    for log in logs:
        assert log._details_encrypted is not None
        assert len(log._details_encrypted) > 100
        # Verify we can decrypt and read
        details = log.details
        assert isinstance(details, dict)
    
    print("✓ Test 6 passed: Audit logging works with encryption")
```

---

## Test Harness & Mock Data Generator

```python
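# Sketch: tests/harness.py

import random
from datetime import datetime, timedelta

# Assumption: the six acceptance tests above live in tests/test_acceptance.py (see C.4.3)
from tests.test_acceptance import (
    test_user_login_and_email_linking,
    test_email_batch_processing,
    test_auto_reply_workflow,
    test_follow_up_scheduling,
    test_data_analysis_job,
    test_audit_logs,
)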

class TestDataGenerator:
    """Generate realistic test data."""
    
    @staticmethod
    def generate_emails(count: int = 10) -> list:
        subjects = [
            "Q{} Financial Report",
            "Team Meeting - {}",
            "Action Required: {}",
            "Birthday Reminder",
            "Project Update"
        ]
        from_addresses = [
            "finance@company.com",
            "hr@company.com",
            "manager@company.com",
            "support@company.com"
        ]
        
        emails = []
        for i in range(count):
            emails.append({
                'id': f'test_email_{i}',
                'subject': random.choice(subjects).format(i),
                'from': random.choice(from_addresses),
                'body': f'Test email body {i} with some content...',
                'received_at': (datetime.utcnow() - timedelta(hours=i)).isoformat()
            })
        return emails
    
    @staticmethod
    def generate_csv_dataset() -> str:
        """Generate test CSV data."""
        csv_lines = ['date,product,sales,region']
        for i in range(100):
            date = (datetime.utcnow() - timedelta(days=100-i)).strftime('%Y-%m-%d')
            product = random.choice(['Product A', 'Product B', 'Product C'])
            sales = random.randint(500, 5000)
            region = random.choice(['North', 'South', 'East', 'West'])
            csv_lines.append(f'{date},{product},{sales},{region}')
        return '\n'.join(csv_lines)

# Acceptance test runner
class AcceptanceTestRunner:
    """Run all acceptance tests in sequence."""
    
    def __init__(self, base_url: str, verbose: bool = True):
        self.base_url = base_url
        self.verbose = verbose
        self.results = []
    
    def run_all(self) -> bool:
        """Run all tests. Return True if all pass."""
        tests = [
            ('Login & OAuth Linking', test_user_login_and_email_linking),
            ('Email Batch Processing', test_email_batch_processing),
            ('Auto-Reply Workflow', test_auto_reply_workflow),
            ('Task Scheduling', test_follow_up_scheduling),
            ('Data Analysis Job', test_data_analysis_job),
            ('Audit Logging', test_audit_logs),
        ]
        
        for test_name, test_func in tests:
            try:
                test_func()
                self.results.append((test_name, 'PASS'))
                if self.verbose:
                    print(f"✓ {test_name}")
            except AssertionError as e:
                self.results.append((test_name, f'FAIL: {str(e)}'))
                if self.verbose:
                    print(f"✗ {test_name}: {str(e)}")
            except Exception as e:
                self.results.append((test_name, f'ERROR: {str(e)}'))
                if self.verbose:
                    print(f"✗ {test_name} (ERROR): {str(e)}")
        
        return all(result[1] == 'PASS' for result in self.results)
    
    def print_summary(self):
        """Print test summary."""
        print("\n" + "="*50)
        print("ACCEPTANCE TEST SUMMARY")
        print("="*50)
        for test_name, result in self.results:
            status = "✓ PASS" if result == 'PASS' else "✗ " + result
            print(f"{test_name:<40} {status}")
        
        passed = sum(1 for _, r in self.results if r == 'PASS')
        total = len(self.results)
        print("-"*50)
        print(f"TOTAL: {passed}/{total} tests passed")
        print("="*50 + "\n")

# Run tests:
# if __name__ == '__main__':
#     runner = AcceptanceTestRunner('http://localhost:8000')
#     all_pass = runner.run_all()
#     runner.print_summary()
#     exit(0 if all_pass else 1)
```

---


# Section 9: Master TODO List & Scaffolding

## Complete Master TODO List (Granular, Ordered)

### PHASE A: MVP Scoping & Repo Init

#### A.1 Project Setup & Environment
- [ ] **A.1.1** Create git repo on GitHub, initialize with README and .gitignore
- [ ] **A.1.2** Create virtual environment: `python -m venv venv` + activate
- [ ] **A.1.3** Create `requirements.txt` with: fastapi, uvicorn, sqlalchemy, celery, redis, langchain, openai, cryptography, psycopg2-binary, python-dotenv, pydantic
- [ ] **A.1.4** Create `.env.example` with all required env vars (see `.env.example` under "Key Scaffolding Files")
- [ ] **A.1.5** Create `.env` (local) from `.env.example` with test values
- [ ] **A.1.6** Create `docker-compose.yml` with services: backend, worker, postgres, redis, frontend stub
- [ ] **A.1.7** Create `Dockerfile` for Python backend with multi-stage build

#### A.2 Backend Core Setup
- [ ] **A.2.1** Create `backend/main.py` FastAPI app with health endpoint
- [ ] **A.2.2** Create `backend/config.py` for environment variable loading
- [ ] **A.2.3** Create `backend/models.py` with Postgres schema: User, UserEmailAccount, EmailJob, Task, AuditLog, Template, Rule
- [ ] **A.2.4** Create `backend/database.py` with SQLAlchemy session factory
- [ ] **A.2.5** Create `backend/api/health.py` with `/health` endpoint
- [ ] **A.2.6** Create `backend/security/encryption.py` for token encryption/decryption
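
As a rough sketch of A.2.2, `backend/config.py` could read the variables listed in `.env.example` into one immutable settings object and fail fast when a required one is missing. The `Settings` dataclass, the `load_settings` helper, and the choice of which variables count as required are illustrative assumptions; `python-dotenv` or pydantic settings would work equally well.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings container; field names mirror .env.example."""
    DATABASE_URL: str
    REDIS_URL: str
    SECRET_KEY: str
    OPENAI_MODEL: str = "gpt-3.5-turbo"
    FASTAPI_DEBUG: bool = False


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    required = ("DATABASE_URL", "REDIS_URL", "SECRET_KEY")  # assumed minimum set
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing required env vars: {', '.join(missing)}")
    return Settings(
        DATABASE_URL=env["DATABASE_URL"],
        REDIS_URL=env["REDIS_URL"],
        SECRET_KEY=env["SECRET_KEY"],
        OPENAI_MODEL=env.get("OPENAI_MODEL", "gpt-3.5-turbo"),
        FASTAPI_DEBUG=env.get("FASTAPI_DEBUG", "false").lower() == "true",
    )
```

Passing an explicit mapping instead of relying on `os.environ` keeps the loader easy to unit-test.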

#### A.3 Authentication
- [ ] **A.3.1** Create `backend/api/auth.py` with `/register`, `/login`, `/logout` endpoints
- [ ] **A.3.2** Implement JWT token generation and validation
- [ ] **A.3.3** Create `backend/security/jwt.py` with encode/decode functions
- [ ] **A.3.4** Create user model with password hashing (bcrypt)
- [ ] **A.3.5** Add authentication middleware to check JWT tokens

#### A.4 Email Integration (OAuth Flow)
- [ ] **A.4.1** Create `backend/connectors/base.py` with `BaseEmailConnector` interface
- [ ] **A.4.2** Create `backend/connectors/email_adapters.py` with `GmailConnector`, `OutlookConnector`, `IMAPConnector` stubs
- [ ] **A.4.3** Create `backend/connectors/factory.py` with connector registration and creation
- [ ] **A.4.4** Implement Gmail OAuth2 flow: `/api/email/oauth-authorize`, `/api/email/oauth-callback`
- [ ] **A.4.5** Test email account linking with a dedicated test Gmail account
- [ ] **A.4.6** Store encrypted OAuth tokens in DB (UserEmailAccount table)
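
The connector interface and factory from A.4.1 to A.4.3 could look roughly like the sketch below. The `fetch_unread` method, the `register_connector` decorator, and the `create_connector` helper are hypothetical names chosen for illustration; only `BaseEmailConnector` and `GmailConnector` come from the plan.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List, Type


class BaseEmailConnector(ABC):
    """Interface every provider adapter implements (method names are assumptions)."""

    def __init__(self, access_token: str) -> None:
        self.access_token = access_token

    @abstractmethod
    def fetch_unread(self, limit: int = 10) -> List[dict]:
        """Return up to `limit` unread messages as plain dicts."""


_REGISTRY: Dict[str, Type[BaseEmailConnector]] = {}


def register_connector(provider: str) -> Callable[[Type[BaseEmailConnector]], Type[BaseEmailConnector]]:
    """Class decorator that records an adapter under its provider key."""
    def decorator(cls: Type[BaseEmailConnector]) -> Type[BaseEmailConnector]:
        _REGISTRY[provider] = cls
        return cls
    return decorator


def create_connector(provider: str, access_token: str) -> BaseEmailConnector:
    """Instantiate the adapter registered for `provider`."""
    if provider not in _REGISTRY:
        raise ValueError(f"No connector registered for provider '{provider}'")
    return _REGISTRY[provider](access_token)


@register_connector("gmail")
class GmailConnector(BaseEmailConnector):
    def fetch_unread(self, limit: int = 10) -> List[dict]:
        # Stub: a real adapter would call the Gmail API here.
        return []
```

With this shape, adding Outlook or IMAP support means writing one decorated class; the email processor only ever calls `create_connector(account.provider, token)`.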

#### A.5 Email Processing Pipeline (Basic)
- [ ] **A.5.1** Create `backend/worker/celery_app.py` with Celery config
- [ ] **A.5.2** Create `backend/worker/tasks/email_processor.py` with `process_email` task stub
- [ ] **A.5.3** Implement email fetch via Gmail API
- [ ] **A.5.4** Create `backend/llm/embeddings.py` with OpenAI embedding function
- [ ] **A.5.5** Create `backend/storage/vector_store.py` with FAISS index initialization
- [ ] **A.5.6** Create basic classification chain (LangChain) in `backend/llm/classifier_chain.py`
- [ ] **A.5.7** Implement email classification: label + confidence score
- [ ] **A.5.8** Add endpoint to trigger email processing: `/api/emails/process`

#### A.6 Data Analysis (MVP)
- [ ] **A.6.1** Create `backend/connectors/data_adapters.py` with `CSVConnector`, `BaseDataConnector`
- [ ] **A.6.2** Create `backend/worker/tasks/data_analysis.py` with basic stats job
- [ ] **A.6.3** Implement CSV upload endpoint: `/api/data/upload`
- [ ] **A.6.4** Create analysis worker: read CSV → compute stats → generate LLM summary
- [ ] **A.6.5** Store results in DB and return summary + report link

#### A.7 Frontend Scaffold (Next.js)
- [ ] **A.7.1** Initialize Next.js app with TailwindCSS
- [ ] **A.7.2** Create `pages/index.tsx` (dashboard home)
- [ ] **A.7.3** Create `pages/inbox.tsx` (email list + classify button)
- [ ] **A.7.4** Create `pages/login.tsx` (authentication page)
- [ ] **A.7.5** Create `pages/analysis.tsx` (CSV upload + results)
- [ ] **A.7.6** Create `components/Header.tsx`, `components/Sidebar.tsx`
- [ ] **A.7.7** Add API client utility (`lib/api.ts`) for HTTP requests to backend

#### A.8 Documentation & Testing
- [ ] **A.8.1** Write README.md: overview, setup instructions, running locally
- [ ] **A.8.2** Create `SETUP.md`: environment variables, database migrations, secrets config
- [ ] **A.8.3** Create `tests/test_auth.py` with basic auth tests
- [ ] **A.8.4** Create `tests/test_email_processor.py` with email fetch test
- [ ] **A.8.5** Test entire MVP flow locally: register → login → link email → process emails → see results

---

### PHASE B: Core Features & Dashboard

#### B.1 Celery Scheduler & Task Management
- [ ] **B.1.1** Implement task scheduling with Celery Beat (cron jobs)
- [ ] **B.1.2** Create email polling task: run every 5 minutes
- [ ] **B.1.3** Implement task retry logic (exponential backoff, max 3 retries)
- [ ] **B.1.4** Add task status tracking: pending → running → completed/failed
- [ ] **B.1.5** Create `/api/jobs` endpoint to view job status
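
The retry policy in B.1.3 and the status flow in B.1.4 can be sketched independently of Celery. The 10-second base delay and the helper names `retry_delay` and `advance` are assumptions; the limit of 3 retries and the four statuses come from the items above.

```python
from typing import Dict, Optional, Set

MAX_RETRIES = 3          # from B.1.3
BASE_DELAY_SECONDS = 10  # assumed starting delay

# Allowed status transitions from B.1.4: pending -> running -> completed/failed.
TRANSITIONS: Dict[str, Set[str]] = {
    "pending": {"running"},
    "running": {"completed", "failed"},
    "completed": set(),
    "failed": set(),
}


def retry_delay(attempt: int) -> Optional[int]:
    """Seconds to wait before retry number `attempt` (1-based); None once retries are exhausted."""
    if attempt > MAX_RETRIES:
        return None
    return BASE_DELAY_SECONDS * 2 ** (attempt - 1)


def advance(status: str, new_status: str) -> str:
    """Return `new_status` if the transition is legal, else raise ValueError."""
    if new_status not in TRANSITIONS[status]:
        raise ValueError(f"Illegal transition {status} -> {new_status}")
    return new_status
```

Under these assumptions the three retries wait 10, 20 and 40 seconds, after which the task is marked `failed`.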

#### B.2 Pre-Approved Reply System
- [ ] **B.2.1** Create Template model with versioning
- [ ] **B.2.2** Create `/api/templates` CRUD endpoints
- [ ] **B.2.3** Implement template approval flow (requires user confirmation)
- [ ] **B.2.4** Create `backend/worker/tasks/auto_reply.py` with safety gates:
  - [ ] **B.2.4.a** Confidence threshold check (> 0.85)
  - [ ] **B.2.4.b** Daily send limit enforcement
  - [ ] **B.2.4.c** Sensitive topic detection
  - [ ] **B.2.4.d** Template approval verification
- [ ] **B.2.5** Implement LLM-based template substitution (personalization)
- [ ] **B.2.6** Create `/api/templates/{id}/approve` endpoint
- [ ] **B.2.7** Test auto-reply with test email account
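
The four safety gates in B.2.4 can be expressed as one pure function that reports which gates block a send. This is a sketch only: the daily limit of 20, the keyword list, and the `ReplyCandidate` fields are assumed values, and a production sensitive-topic check would probably use the classifier rather than keywords.

```python
from dataclasses import dataclass
from typing import List

CONFIDENCE_THRESHOLD = 0.85  # from B.2.4.a
DAILY_SEND_LIMIT = 20        # assumed value
SENSITIVE_KEYWORDS = ("legal", "lawsuit", "termination", "medical")  # assumed list


@dataclass
class ReplyCandidate:
    confidence: float
    sent_today: int
    email_text: str
    template_approved: bool


def failed_gates(candidate: ReplyCandidate) -> List[str]:
    """Return the names of the gates that block sending; an empty list means safe to send."""
    failures: List[str] = []
    if candidate.confidence <= CONFIDENCE_THRESHOLD:
        failures.append("confidence")
    if candidate.sent_today >= DAILY_SEND_LIMIT:
        failures.append("daily_limit")
    text = candidate.email_text.lower()
    if any(word in text for word in SENSITIVE_KEYWORDS):
        failures.append("sensitive_topic")
    if not candidate.template_approved:
        failures.append("template_not_approved")
    return failures
```

Returning every failed gate, rather than stopping at the first, gives the audit log a complete reason for each blocked reply.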

#### B.3 Rules Engine (YAML/JSON DSL)
- [ ] **B.3.1** Create Rule model with YAML/JSON storage
- [ ] **B.3.2** Create `backend/engine/rules_engine.py` evaluator
- [ ] **B.3.3** Implement condition evaluation: contains, regex, similarity, domain, etc.
- [ ] **B.3.4** Support action types: apply_label, flag, auto_reply, create_task
- [ ] **B.3.5** Create `/api/rules` CRUD endpoints
- [ ] **B.3.6** Integrate rules into email processor pipeline
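
A minimal evaluator for B.3.2 to B.3.4 might look like this. The JSON schema (`conditions` with `op`/`value`, `actions` with `type`) is an assumed shape, not a fixed DSL, and only three of the condition types are shown; `similarity` would need the vector store.

```python
import json
import re
from typing import Callable, Dict, List

# Each operator takes (email, argument) and returns True on a match.
OPERATORS: Dict[str, Callable[[dict, str], bool]] = {
    "contains": lambda email, arg: arg.lower() in (email["subject"] + " " + email["body"]).lower(),
    "regex": lambda email, arg: re.search(arg, email["subject"]) is not None,
    "domain": lambda email, arg: email["from"].lower().endswith("@" + arg.lower()),
}


def evaluate(rule: dict, email: dict) -> List[dict]:
    """Return the rule's actions if every condition matches, else an empty list."""
    for condition in rule["conditions"]:
        if not OPERATORS[condition["op"]](email, condition["value"]):
            return []
    return rule["actions"]


# Example rule as it might be stored in Rule.config (hypothetical content).
rule = json.loads("""
{
  "name": "Flag finance reports",
  "conditions": [
    {"op": "domain", "value": "company.com"},
    {"op": "regex", "value": "^Q[1-4] Financial Report"}
  ],
  "actions": [
    {"type": "apply_label", "label": "finance"},
    {"type": "flag"}
  ]
}
""")
```

Keeping operators in a dictionary means a new condition type is one extra entry, which fits the extension-point goal of the rules engine.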

#### B.4 Vector Store & Retrieval (FAISS)
- [ ] **B.4.1** Implement FAISS index builder in `backend/storage/vector_store.py`
- [ ] **B.4.2** Create background task to build/update index from emails
- [ ] **B.4.3** Implement retrieval chain: query vector store → find similar emails + rules
- [ ] **B.4.4** Integrate retriever into classifier chain (context augmentation)
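
The retrieval step in B.4.3 boils down to nearest-neighbour search over embeddings. The sketch below is a brute-force cosine-similarity stand-in, not FAISS itself; it is useful for unit tests and for checking retriever logic before the real index exists. The names `cosine` and `top_k` are assumptions.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: Sequence[float],
          index: List[Tuple[str, Sequence[float]]],
          k: int = 3) -> List[Tuple[str, float]]:
    """Return the k (doc_id, score) pairs most similar to the query, best first."""
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in index]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]
```

The returned IDs would then be used to load the matching emails or rules and pass them to the classifier chain as context (B.4.4).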

#### B.5 Dashboard Pages & UI
- [ ] **B.5.1** Build inbox preview page: show emails, labels, confidence scores
- [ ] **B.5.2** Build template manager page: create, edit, approve templates
- [ ] **B.5.3** Build rule builder page: visual editor for YAML rules
- [ ] **B.5.4** Build job launcher page: select analysis type, upload data, view results
- [ ] **B.5.5** Build logs page: view job history, errors, audit trail
- [ ] **B.5.6** Add real-time status updates (WebSocket or polling)

#### B.6 Integration Testing
- [ ] **B.6.1** Create `tests/test_rules_engine.py` with rule evaluation tests
- [ ] **B.6.2** Create `tests/test_auto_reply.py` with safety gate tests
- [ ] **B.6.3** Create `tests/test_data_analysis.py` with analysis job tests
- [ ] **B.6.4** Integration test: email → rule match → auto-reply sent

---

### PHASE C: Hardening & Deployment

#### C.1 Security & Encryption
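- [ ] **C.1.1** Implement token encryption at rest with KMS-style key management (the Fernet key in `.env.example` gives AES-128-CBC + HMAC-SHA256; use AES-256-GCM instead if AES-256 is a hard requirement)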
- [ ] **C.1.2** Create secrets management: load from .env or AWS Secrets Manager
- [ ] **C.1.3** Add HTTPS support (Let's Encrypt + certbot)
- [ ] **C.1.4** Implement CORS policy
- [ ] **C.1.5** Add rate limiting to API endpoints
- [ ] **C.1.6** Implement CSRF protection for POST endpoints
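
For C.1.5, one common approach is a token bucket per client. The class below is a single-process sketch with assumed names and parameters; a multi-worker deployment would need shared state (for example in Redis) rather than an in-memory object.

```python
import time
from typing import Optional


class TokenBucket:
    """Allow `rate` requests per second on average, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: int, now: Optional[float] = None) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.updated = time.monotonic() if now is None else now

    def allow(self, now: Optional[float] = None) -> bool:
        """Consume one token if available; return whether the request may proceed."""
        now = time.monotonic() if now is None else now
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

An API dependency could keep one bucket per user ID or IP address and return HTTP 429 when `allow()` is false.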

#### C.2 Audit Logging
- [ ] **C.2.1** Create audit logging middleware
- [ ] **C.2.2** Log all critical actions: template approval, auto-reply sent, job created, etc.
- [ ] **C.2.3** Encrypt sensitive details in audit logs
- [ ] **C.2.4** Create `/api/audit-logs` endpoint (admin only)
- [ ] **C.2.5** Implement data retention policy and cleanup task

#### C.3 Observability & Monitoring
- [ ] **C.3.1** Integrate Sentry error tracking
- [ ] **C.3.2** Add Prometheus metrics exports (job duration, email count, LLM costs)
- [ ] **C.3.3** Create basic Grafana dashboard
- [ ] **C.3.4** Add structured logging (JSON format for easy parsing)
- [ ] **C.3.5** Implement health check endpoints for monitoring
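
Structured logging (C.3.4) needs nothing beyond the standard library. The formatter below is a minimal sketch; the field names and the `configure_logging` helper are assumptions, and a dedicated JSON logging package could replace it.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            entry["exception"] = self.formatException(record.exc_info)
        return json.dumps(entry)


def configure_logging(level: int = logging.INFO) -> None:
    """Route all root-logger output through the JSON formatter."""
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    # force=True replaces any handlers installed earlier (Python 3.8+).
    logging.basicConfig(level=level, handlers=[handler], force=True)
```

Calling `configure_logging()` once at startup in both the API process and the Celery worker keeps the two log streams in the same format.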

#### C.4 Testing & Quality
- [ ] **C.4.1** Expand unit tests (aim for 80%+ coverage)
- [ ] **C.4.2** Create integration tests for all API endpoints
- [ ] **C.4.3** Create `tests/test_acceptance.py` with full end-to-end acceptance tests
- [ ] **C.4.4** Add load testing script (simulate 100 emails/min)
- [ ] **C.4.5** Set up code quality checks: linting (flake8), formatting (black), type checking (mypy)

#### C.5 Docker & Deployment
- [ ] **C.5.1** Finalize Dockerfile with security best practices
- [ ] **C.5.2** Finalize docker-compose.yml with all services
- [ ] **C.5.3** Add database migration script (Alembic)
- [ ] **C.5.4** Test full deployment locally: `docker-compose up` → all healthy
- [ ] **C.5.5** Create deployment guide with environment variables checklist

#### C.6 CI/CD Pipeline
- [ ] **C.6.1** Create GitHub Actions workflow for: lint, test, build Docker image
- [ ] **C.6.2** Push Docker image to Docker Hub or GitHub Container Registry
- [ ] **C.6.3** Automate deployment (optional: push to DigitalOcean App Platform)
- [ ] **C.6.4** Set up staging environment for testing

#### C.7 Acceptance Testing
- [ ] **C.7.1** Run full acceptance test harness (6 tests from Section 8)
- [ ] **C.7.2** Verify all acceptance criteria pass
- [ ] **C.7.3** Create demo script for hand-off
- [ ] **C.7.4** Document acceptance test results

---

### PHASE D: Optional Scaling & Enterprise

#### D.1 Managed Vector Database
- [ ] **D.1.1** Evaluate Pinecone API
- [ ] **D.1.2** Create migration script: export FAISS → import to Pinecone
- [ ] **D.1.3** Update retriever code to use Pinecone client
- [ ] **D.1.4** Test retrieval performance

#### D.2 Advanced Task Scheduling (Temporal)
- [ ] **D.2.1** Evaluate Temporal/Temporal Cloud
- [ ] **D.2.2** Create Temporal workflow definitions for: email processing, data analysis
- [ ] **D.2.3** Migrate from Celery to Temporal workers
- [ ] **D.2.4** Test durable execution and retry guarantees

#### D.3 Kubernetes Deployment
- [ ] **D.3.1** Create Helm charts for backend, worker, postgres, redis
- [ ] **D.3.2** Set up k8s namespaces and resource limits
- [ ] **D.3.3** Implement auto-scaling: HPA based on job queue depth
- [ ] **D.3.4** Deploy to GKE/EKS/DigitalOcean k8s
- [ ] **D.3.5** Set up Prometheus + Grafana on k8s

#### D.4 Role-Based Access Control (RBAC)
- [ ] **D.4.1** Create Role and Permission models
- [ ] **D.4.2** Implement RBAC middleware for API endpoints
- [ ] **D.4.3** Create roles: admin, approver, user
- [ ] **D.4.4** Add role management UI (admin only)
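
The three roles in D.4.3 map naturally onto a static permission table. The permission strings and helper names below are illustrative assumptions; D.4.1 would eventually store the same data in the Role and Permission models.

```python
from typing import Dict, FrozenSet

# Role names come from D.4.3; the permission strings are illustrative.
ROLE_PERMISSIONS: Dict[str, FrozenSet[str]] = {
    "user": frozenset({"emails:read", "jobs:create"}),
    "approver": frozenset({"emails:read", "jobs:create", "templates:approve"}),
    "admin": frozenset({"emails:read", "jobs:create", "templates:approve",
                        "audit_logs:read", "roles:manage"}),
}


def has_permission(role: str, permission: str) -> bool:
    """Unknown roles get no permissions."""
    return permission in ROLE_PERMISSIONS.get(role, frozenset())


def require_permission(role: str, permission: str) -> None:
    """Raise PermissionError when the role lacks the permission."""
    if not has_permission(role, permission):
        raise PermissionError(f"Role '{role}' lacks '{permission}'")
```

The RBAC middleware in D.4.2 could call `require_permission` with the role from the JWT claims and translate `PermissionError` into HTTP 403.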

#### D.5 Advanced Data Connectors
- [ ] **D.5.1** Create Google Sheets connector
- [ ] **D.5.2** Create S3 connector
- [ ] **D.5.3** Create SQL database connector (Postgres/MySQL)
- [ ] **D.5.4** Create BigQuery connector (optional)

#### D.6 Enterprise Features
- [ ] **D.6.1** Implement GDPR compliance: data export, deletion
- [ ] **D.6.2** Add multi-tenancy support (separate data per customer)
- [ ] **D.6.3** Create billing/metering system
- [ ] **D.6.4** Implement SSO (SAML/OIDC) support

---

## Scaffolding Commands (Quick Start)

### Phase A Quick Setup

```bash
# 1. Create project directory
cd /path/to/projects
mkdir virtual-assistant-scheduler
cd virtual-assistant-scheduler

# 2. Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# 3. Initialize git (skip if you started from a cloned repo)
git init

# 4. Create directory structure
mkdir -p backend/{api,connectors,llm,security,storage,worker/tasks}
mkdir -p frontend
mkdir -p tests
mkdir -p docker

# 5. Create initial files
touch backend/__init__.py
touch backend/main.py
touch backend/config.py
touch backend/models.py
touch backend/database.py
touch requirements.txt
touch .env.example
touch docker-compose.yml
touch Dockerfile
touch README.md

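# 6. Install dependencies and pin them (the Dockerfile installs from requirements.txt)
pip install fastapi uvicorn sqlalchemy psycopg2-binary celery redis langchain openai cryptography python-dotenv pydantic
pip freeze > requirements.txt

# 7. Fill in .env.example, Dockerfile and docker-compose.yml (see "Key Scaffolding Files" below)

# 8. Create a local .env (docker-compose.yml reads it via env_file)
cp .env.example .env

# 9. Run locally
# The frontend service needs frontend/Dockerfile (A.7); until it exists, start only:
#   docker-compose up --build postgres redis backend worker
docker-compose up --build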
```

---


## Key Scaffolding Files

### 1. `.env.example`

```env
# Backend Config
FASTAPI_ENV=development
FASTAPI_DEBUG=true
SECRET_KEY=your-secret-key-here-change-in-production

# Database
DATABASE_URL=postgresql://ai_agent_user:ai_agent_password@postgres:5432/ai_agent_db
SQLALCHEMY_ECHO=true

# Redis & Celery
REDIS_URL=redis://redis:6379/0
CELERY_BROKER_URL=redis://redis:6379/0
CELERY_RESULT_BACKEND=redis://redis:6379/0

# OpenAI
OPENAI_API_KEY=sk-...
OPENAI_MODEL=gpt-3.5-turbo
OPENAI_EMBEDDING_MODEL=text-embedding-3-small

# Gmail OAuth (get from Google Cloud Console)
GMAIL_CLIENT_ID=xxx.apps.googleusercontent.com
GMAIL_CLIENT_SECRET=xxx
GMAIL_REDIRECT_URI=http://localhost:8000/api/email/oauth-callback/gmail

# Outlook OAuth (get from Azure Portal)
OUTLOOK_CLIENT_ID=xxx
OUTLOOK_CLIENT_SECRET=xxx

# Encryption
# Generate with: python -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())"
ENCRYPTION_KEY=your-base64-encoded-fernet-key

# AWS/S3 (if using)
AWS_ACCESS_KEY_ID=xxx
AWS_SECRET_ACCESS_KEY=xxx
AWS_S3_BUCKET=ai-agent-reports
AWS_REGION=us-east-1

# Sentry (optional)
SENTRY_DSN=https://...

# Frontend
NEXT_PUBLIC_API_BASE_URL=http://localhost:8000/api
```

### 2. `Dockerfile`

```dockerfile
# Multi-stage build for backend

FROM python:3.11-slim AS builder

WORKDIR /tmp
RUN apt-get update && apt-get install -y --no-install-recommends \
    gcc \
    postgresql-client \
    && rm -rf /var/lib/apt/lists/*

COPY requirements.txt .
RUN pip install --user --no-cache-dir -r requirements.txt

# Final stage
FROM python:3.11-slim

WORKDIR /app

RUN apt-get update && apt-get install -y --no-install-recommends \
    postgresql-client \
    && rm -rf /var/lib/apt/lists/*

COPY --from=builder /root/.local /root/.local
ENV PATH=/root/.local/bin:$PATH

COPY . .

EXPOSE 8000

CMD ["uvicorn", "backend.main:app", "--host", "0.0.0.0", "--port", "8000"]
```

### 3. `docker-compose.yml`

```yaml
version: '3.8'

services:
  postgres:
    image: postgres:15-alpine
    environment:
      POSTGRES_USER: ai_agent_user
      POSTGRES_PASSWORD: ai_agent_password
      POSTGRES_DB: ai_agent_db
    ports:
      - "5432:5432"
    volumes:
      - postgres_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U ai_agent_user"]
      interval: 10s
      timeout: 5s
      retries: 5

  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"
    volumes:
      - redis_data:/data
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 5s
      retries: 5

  backend:
    build:
      context: .
      dockerfile: Dockerfile
    command: uvicorn backend.main:app --host 0.0.0.0 --port 8000 --reload
    env_file: .env
    ports:
      - "8000:8000"
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_healthy
    volumes:
      - .:/app
    environment:
      - DATABASE_URL=postgresql://ai_agent_user:ai_agent_password@postgres:5432/ai_agent_db
      - REDIS_URL=redis://redis:6379/0

  worker:
    build:
      context: .
      dockerfile: Dockerfile
    command: celery -A backend.worker.celery_app worker -l info
    env_file: .env
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_healthy
    volumes:
      - .:/app
    environment:
      - DATABASE_URL=postgresql://ai_agent_user:ai_agent_password@postgres:5432/ai_agent_db
      - REDIS_URL=redis://redis:6379/0

  frontend:
    build:
      context: ./frontend
      dockerfile: Dockerfile
    ports:
      - "3000:3000"
    depends_on:
      - backend
    environment:
      - NEXT_PUBLIC_API_BASE_URL=http://localhost:8000/api

volumes:
  postgres_data:
  redis_data:
```

### 4. `backend/main.py`

```python
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware
from contextlib import asynccontextmanager
import logging

from backend.config import settings
from backend.database import engine, Base
from backend.api import auth, email, jobs

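# The root logger defaults to WARNING; lower it so the lifespan messages below are emitted
logging.basicConfig(level=logging.INFO)

# Create tables (development convenience; switch to Alembic migrations per C.5.3)
Base.metadata.create_all(bind=engine)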

@asynccontextmanager
async def lifespan(app: FastAPI):
    # Startup
    logging.info("Starting up AI Agent Backend")
    yield
    # Shutdown
    logging.info("Shutting down")

app = FastAPI(
    title="AI Agent API",
    description="Virtual Assistant & Task Scheduler",
    version="0.1.0",
    lifespan=lifespan
)

# CORS middleware
app.add_middleware(
    CORSMiddleware,
    allow_origins=["http://localhost:3000", "http://localhost:8000"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

# Include routers
app.include_router(auth.router, prefix="/api/auth", tags=["auth"])
app.include_router(email.router, prefix="/api/email", tags=["email"])
app.include_router(jobs.router, prefix="/api/jobs", tags=["jobs"])

@app.get("/health")
async def health():
    return {"status": "healthy", "service": "ai-agent-backend"}

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)
```

### 5. `backend/worker/celery_app.py`

```python
from celery import Celery
from backend.config import settings

celery_app = Celery(
    'ai_agent_worker',
    broker=settings.REDIS_URL,
    backend=settings.REDIS_URL
)

celery_app.conf.update(
    task_serializer='json',
    accept_content=['json'],
    result_serializer='json',
    timezone='UTC',
    enable_utc=True,
    task_track_started=True,
    task_time_limit=30 * 60,  # 30 minutes
    task_soft_time_limit=25 * 60,  # 25 minutes
)

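# Register task modules explicitly (created in A.5.2 and A.6.2).
# autodiscover_tasks(['backend.worker.tasks']) would look for a module named
# backend.worker.tasks.tasks, which this layout does not have.
celery_app.conf.update(
    include=[
        'backend.worker.tasks.email_processor',
        'backend.worker.tasks.data_analysis',
    ]
)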
```

### 6. `backend/models.py` (Key Schemas)

```python
from sqlalchemy import Column, Integer, Float, String, Text, DateTime, JSON, ForeignKey, Boolean
from sqlalchemy.orm import relationship
from backend.database import Base
from datetime import datetime

class User(Base):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    email = Column(String(255), unique=True)
    hashed_password = Column(String(255))
    created_at = Column(DateTime, default=datetime.utcnow)
    email_accounts = relationship("UserEmailAccount", back_populates="user")
    email_jobs = relationship("EmailJob", back_populates="user")

class UserEmailAccount(Base):
    __tablename__ = "user_email_accounts"
    id = Column(Integer, primary_key=True)
    user_id = Column(Integer, ForeignKey("users.id"))
    provider = Column(String(50))  # 'gmail', 'outlook', 'imap'
    email_address = Column(String(255))
    encrypted_token = Column(Text)
    scopes = Column(String(500))
    created_at = Column(DateTime, default=datetime.utcnow)
    user = relationship("User", back_populates="email_accounts")

class EmailJob(Base):
    __tablename__ = "email_jobs"
    id = Column(Integer, primary_key=True)
    user_id = Column(Integer, ForeignKey("users.id"))
    message_id = Column(String(500))
    email_subject = Column(String(500))
    email_from = Column(String(255))
    email_body = Column(Text)
    status = Column(String(50))  # 'pending', 'processed', 'failed'
    classifier_confidence = Column(Float, default=0.0)
    applied_labels = Column(String(500))  # Comma-separated
    created_at = Column(DateTime, default=datetime.utcnow)
    user = relationship("User", back_populates="email_jobs")

class Task(Base):
    __tablename__ = "tasks"
    id = Column(Integer, primary_key=True)
    user_id = Column(Integer, ForeignKey("users.id"))
    task_type = Column(String(50))  # 'email_process', 'auto_reply', 'data_analysis'
    status = Column(String(50))  # 'pending', 'running', 'completed', 'failed'
    celery_task_id = Column(String(255))
    params = Column(JSON)
    result = Column(JSON)
    error_message = Column(Text)
    created_at = Column(DateTime, default=datetime.utcnow)

class AuditLog(Base):
    __tablename__ = "audit_logs"
    id = Column(Integer, primary_key=True)
    user_id = Column(Integer, ForeignKey("users.id"))
    action = Column(String(100))  # 'email_processed', 'auto_reply_sent', 'template_approved'
    resource_id = Column(Integer)
    resource_type = Column(String(50))  # 'email', 'template', 'job'
    _details_encrypted = Column(Text)
    created_at = Column(DateTime, default=datetime.utcnow)

class Template(Base):
    __tablename__ = "templates"
    id = Column(Integer, primary_key=True)
    user_id = Column(Integer, ForeignKey("users.id"))
    name = Column(String(255))
    subject = Column(String(500))
    body = Column(Text)
    approved_for_auto_send = Column(Boolean, default=False)
    created_at = Column(DateTime, default=datetime.utcnow)

class Rule(Base):
    __tablename__ = "rules"
    id = Column(Integer, primary_key=True)
    user_id = Column(Integer, ForeignKey("users.id"))
    name = Column(String(255))
    config = Column(Text)  # YAML/JSON rule definition
    enabled = Column(Boolean, default=True)
    created_at = Column(DateTime, default=datetime.utcnow)
```

### 7. `frontend/pages/index.tsx` (Next.js Home)

```typescript
import { useEffect, useState } from 'react';
import Link from 'next/link';

export default function Home() {
  const [user, setUser] = useState<Record<string, unknown> | null>(null);

  useEffect(() => {
    // Check if logged in
    const token = localStorage.getItem('auth_token');
    if (token) {
      // Fetch user profile
      fetch(`${process.env.NEXT_PUBLIC_API_BASE_URL}/auth/me`, {
        headers: { 'Authorization': `Bearer ${token}` }
      })
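        // Treat non-2xx responses (e.g. 401 for an expired token) as "not logged in"
        .then(r => (r.ok ? r.json() : Promise.reject(new Error(`HTTP ${r.status}`))))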
        .then(data => setUser(data))
        .catch(() => setUser(null));
    }
  }, []);

  return (
    <div className="min-h-screen bg-gradient-to-br from-blue-500 to-purple-600 text-white">
      <header className="p-4 border-b border-white/20">
        <h1 className="text-2xl font-bold">AI Agent Dashboard</h1>
      </header>

      <main className="p-8">
        {!user ? (
          <div className="text-center">
            <h2 className="text-4xl font-bold mb-4">Welcome</h2>
            <p className="mb-8 text-lg">Virtual Assistant & Task Scheduler</p>
            <Link href="/login" className="bg-white text-blue-600 px-6 py-2 rounded-lg font-bold">
              Login
            </Link>
          </div>
        ) : (
          <div className="grid grid-cols-1 md:grid-cols-3 gap-6">
            <Link href="/inbox" className="bg-white/20 p-6 rounded-lg hover:bg-white/30 transition">
              <h3 className="text-xl font-bold">📧 Inbox</h3>
              <p className="text-sm mt-2">Manage and process emails</p>
            </Link>
            <Link href="/analysis" className="bg-white/20 p-6 rounded-lg hover:bg-white/30 transition">
              <h3 className="text-xl font-bold">📊 Data Analysis</h3>
              <p className="text-sm mt-2">Upload and analyze datasets</p>
            </Link>
            <Link href="/templates" className="bg-white/20 p-6 rounded-lg hover:bg-white/30 transition">
              <h3 className="text-xl font-bold">📝 Templates</h3>
              <p className="text-sm mt-2">Manage reply templates</p>
            </Link>
          </div>
        )}
      </main>
    </div>
  );
}
```

---


# Section 10: Checklists (PR, Release, Hand-off)

## PR Checklist (Before Merging to Main)

### Code Quality
- [ ] Code follows PEP 8 style guidelines (run `black` formatter)
- [ ] No linting errors (run `flake8` or `pylint`)
- [ ] Type hints added to all functions (`mypy` clean)
- [ ] No hardcoded secrets or credentials
- [ ] No print() statements; use logging instead
- [ ] Remove debug code and commented-out lines

### Testing
- [ ] Unit tests added for new functions (min 80% coverage)
- [ ] Integration tests added for new API endpoints
- [ ] All tests pass locally: `pytest tests/`
- [ ] Load test passed (simulate expected traffic)
- [ ] No flaky tests (tests pass consistently)

### Security
- [ ] No SQL injection vulnerabilities (use parameterized queries)
- [ ] No XSS vulnerabilities in frontend
- [ ] No exposed credentials in logs or error messages
- [ ] CORS policy reviewed and secure
- [ ] Input validation on all API endpoints
- [ ] Rate limiting applied to sensitive endpoints

### Documentation
- [ ] Docstrings added to all public functions
- [ ] README updated if new setup steps required
- [ ] API endpoints documented (in code or OpenAPI)
- [ ] Complex logic has inline comments
- [ ] Changelog updated (if applicable)

### Performance
- [ ] Database queries optimized (no N+1 queries)
- [ ] No unnecessary loops or recursion
- [ ] Cache headers set on static assets
- [ ] API response time < 500ms for most endpoints

### Database
- [ ] Schema migrations created (Alembic)
- [ ] Backward compatibility maintained
- [ ] Indexes added on frequently queried columns
- [ ] No data loss on rollback

### Deployment Readiness
- [ ] Docker image builds without errors
- [ ] docker-compose.yml updated if services changed
- [ ] .env.example updated with new variables
- [ ] Health check endpoints working
- [ ] Graceful shutdown handled

### Review Checklist
- [ ] At least 1 code review approval
- [ ] All review comments addressed
- [ ] CI/CD pipeline passes (GitHub Actions)
- [ ] No merge conflicts with main branch
- [ ] Commit history clean (meaningful messages)

---

## Release Checklist (MVP Ready)

### Pre-Release Testing
- [ ] Full acceptance test harness passes (6/6 tests)
- [ ] Manual smoke test on staging environment
- [ ] Database backups taken
- [ ] Rollback plan documented
- [ ] Load testing completed (100+ concurrent users)
- [ ] Security audit passed
- [ ] Vulnerability scan passed (no critical issues)

### Documentation & Knowledge
- [ ] README.md complete and up-to-date
- [ ] SETUP.md with step-by-step deployment instructions
- [ ] DEPLOYMENT.md with production checklist
- [ ] API documentation complete (Swagger/OpenAPI)
- [ ] Admin runbook created (how to manage email providers, templates, workers)
- [ ] Troubleshooting guide written (common issues + solutions)
- [ ] Demo script prepared and tested

### Infrastructure & Monitoring
- [ ] Production environment configured
- [ ] Environment variables securely managed (secrets manager)
- [ ] Database backups automated (daily)
- [ ] Error tracking and log aggregation set up (Sentry / ELK)
- [ ] Monitoring dashboards created (Prometheus/Grafana)
- [ ] Alerts configured for critical failures
- [ ] Status page set up (for incidents)

### Compliance & Security
- [ ] HTTPS/TLS enabled with valid certificates
- [ ] Token encryption verified
- [ ] Audit logging enabled and tested
- [ ] GDPR data retention policy implemented
- [ ] Data privacy policy written and agreed
- [ ] Security vulnerability disclosure policy created
- [ ] Rate limiting and DDoS protection configured

### Release Artifacts
- [ ] Docker images tagged and pushed to registry
- [ ] Version number bumped (0.1.0)
- [ ] Release notes written (features, fixes, known issues)
- [ ] Changelog updated
- [ ] Git tag created (v0.1.0)
- [ ] Release announcement prepared (blog/email/Slack)

### Go/No-Go Decision
- [ ] Executive stakeholders signed off
- [ ] Product requirements met (feature complete)
- [ ] Performance benchmarks met
- [ ] Security review passed
- [ ] Legal/compliance review passed
- [ ] Support team trained on new features
- [ ] Decision: **GO** / **NO-GO**

---

## Hand-Off Checklist (What to Deliver & Demo)

### Code & Repository
- [ ] **Deliverable**: Git repository with clean commit history
  - [ ] All code on `main` branch
  - [ ] Tagged release version (e.g., `v0.1.0-mvp`)
  - [ ] .gitignore properly configured
  - [ ] README links to all documentation
  
- [ ] **Deliverable**: Docker image in registry (Docker Hub or GCR)
  - [ ] Image tagged `ai-agent:0.1.0`
  - [ ] Image is reproducible (deterministic builds)
  - [ ] Image scanned for vulnerabilities

### Documentation Package

- [ ] **SETUP.md** (Step-by-step local setup)
  ```markdown
  1. Clone repo
  2. Create virtual environment
  3. Copy .env.example → .env and fill in values
  4. docker-compose up
  5. Open http://localhost:3000
  ```

- [ ] **DEPLOYMENT.md** (Production deployment)
  ```markdown
  1. Choose hosting provider (DigitalOcean/AWS/etc.)
  2. Set up secrets manager
  3. Run database migrations
  4. Deploy via Docker Compose or Kubernetes
  5. Verify health checks
  6. Update DNS records
  ```

- [ ] **ADMIN_RUNBOOK.md** (How to operate)
  ```markdown
  - How to link Gmail/Outlook accounts
  - How to create and approve templates
  - How to define rules
  - How to monitor worker health
  - How to view logs and audit trail
  - How to scale worker pool
  - Troubleshooting guide
  ```

- [ ] **API_DOCUMENTATION.md**
  - All endpoints documented (GET, POST, PUT, DELETE)
  - Request/response examples
  - Error codes explained
  - Rate limiting documented

- [ ] **ARCHITECTURE.md** (System design overview)
  - Diagram of system components
  - Data flow explanation
  - Extension points for new automations
  - Technology stack rationale

- [ ] **COST_ANALYSIS.md**
  - Monthly cost breakdown
  - Cost optimization strategies
  - Pricing recommendations for customers

- [ ] **SECURITY.md**
  - OAuth setup instructions
  - Token encryption details
  - How audit logs work
  - Data retention policies
  - Compliance checklist
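
Step 3 of SETUP.md (copy `.env.example` to `.env` and fill in values) is a common source of failed first runs. As a sketch, a small check like the following could be shipped alongside the docs to report settings that were left blank; it assumes simple `KEY=VALUE` lines, and the key names used in any example are hypothetical.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_settings(example_text: str, env_text: str) -> list[str]:
    """Keys listed in .env.example that are absent or empty in .env."""
    env = parse_env(env_text)
    return [key for key in parse_env(example_text) if not env.get(key)]
```

Running `missing_settings` on the contents of the two files before `docker-compose up` gives a short list of what still needs filling in. It does not handle multi-line values or `export` prefixes.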

### Demo & Acceptance Tests

#### Demo Script (15 minutes)

```
1. Show dashboard login (2 min)
   - Register new user
   - Login with credentials
   - Show home page

2. Link email account (3 min)
   - Click "Link Email"
   - Show OAuth flow to Gmail
   - Confirm token stored encrypted

3. Process test emails (3 min)
   - Show test emails in inbox
   - Click "Process Emails"
   - Show classification results + confidence scores
   - Show applied labels

4. Approve template & test auto-reply (4 min)
   - Create email template
   - Show approval workflow
   - Create rule to trigger auto-reply
   - Send test email to mailbox
   - Show auto-reply sent + audit log entry

5. Data analysis job (3 min)
   - Upload test CSV
   - Click "Analyze"
   - Show LLM-generated summary
   - Download report (PDF/HTML)
```

#### Acceptance Test Results

Run and pass all 6 acceptance tests:

1. ✓ **User login & OAuth email linking**
2. ✓ **Email batch processing & classification**
3. ✓ **Pre-approved template & auto-reply**
4. ✓ **Follow-up task scheduling**
5. ✓ **Data-analysis job end-to-end**
6. ✓ **Audit log verification**

**Execute**: `pytest tests/test_acceptance.py -v`
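
To turn a test run into a shareable record for the hand-off package, pytest can write a JUnit-style report via its `--junitxml=PATH` option. The sketch below summarizes such a report with the standard library; the function names are illustrative, and the default count of six simply mirrors the list above.

```python
import xml.etree.ElementTree as ET


def summarize_junit(xml_text: str) -> dict[str, str]:
    """Map each test case name to "passed", "failed", or "skipped"."""
    results = {}
    root = ET.fromstring(xml_text)
    for case in root.iter("testcase"):
        if case.find("failure") is not None or case.find("error") is not None:
            outcome = "failed"
        elif case.find("skipped") is not None:
            outcome = "skipped"
        else:
            outcome = "passed"
        results[case.get("name", "")] = outcome
    return results


def all_passed(results: dict[str, str], expected_count: int = 6) -> bool:
    """True when exactly the expected number of tests ran and all passed."""
    return len(results) == expected_count and all(
        outcome == "passed" for outcome in results.values()
    )
```

Checking the count as well as the outcomes guards against a run that silently collected fewer tests than expected.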

### Hand-Off Call Agenda (45 minutes)

**15 min: System Overview**
- Architecture diagram walkthrough
- Tech stack and the rationale for each choice
- Cost breakdown

**20 min: Live Demo**
- Run demo script (see above)
- Show logs and monitoring dashboard
- Answer technical questions

**10 min: Operations & Support**
- How to scale workers
- How to monitor system health
- Where to get help (docs, GitHub issues, Slack)

**Optional: Q&A**
- Discuss future enhancements (Phase D)
- Talk about migration paths

### Deliverables Checklist

**Before hand-off call:**

- [ ] Code repository (GitHub/GitLab)
- [ ] Docker image in registry
- [ ] SETUP.md (local development)
- [ ] DEPLOYMENT.md (production)
- [ ] ADMIN_RUNBOOK.md (operations)
- [ ] API_DOCUMENTATION.md
- [ ] ARCHITECTURE.md
- [ ] COST_ANALYSIS.md
- [ ] SECURITY.md
- [ ] Demo script (tested & working)
- [ ] Acceptance test results (screenshots or video)
- [ ] Presentation slides (15-20 slides)
- [ ] Post-hand-off support agreement (SLA, response time)

**During hand-off call:**

- [ ] Present system overview (no more than 15 min)
- [ ] Perform live demo (exactly as scripted)
- [ ] Answer questions about architecture & operations
- [ ] Show monitoring & alerting setup
- [ ] Discuss scaling strategy

**After hand-off call:**

- [ ] Provide access to all systems (GitHub/GitLab, Docker registry, hosting provider, monitoring)
- [ ] Create admin user for recipient
- [ ] Set up Slack or email channel for support
- [ ] Schedule first follow-up call (1 week post-handoff)

---

## Post-MVP Roadmap (Optional)

### Immediate Actions (Weeks 1-2 post-launch)
- [ ] Monitor system health (errors, performance, uptime)
- [ ] Gather user feedback on email linking, template approval, auto-reply safety
- [ ] Fix any critical bugs found in production
- [ ] Optimize slow API endpoints (if any)

### Phase B Kickoff (Weeks 3-6)
- [ ] Begin work on scheduler improvements
- [ ] Start building advanced dashboard features
- [ ] Add multi-email account support
- [ ] Plan vector store migration (FAISS → Pinecone)

### Phase C Kickoff (Weeks 7-12)
- [ ] Hardening: security audit, encryption, audit logs
- [ ] Scale testing: prepare for 10x traffic
- [ ] Enterprise features: RBAC, billing, SSO
- [ ] Plan Kubernetes deployment

### Phase D Kickoff (Weeks 13+)
- [ ] Scale to managed services
- [ ] Add advanced connectors (Salesforce, BigQuery)
- [ ] Multi-tenancy support
- [ ] Prepare for enterprise sales

---
