Skip to content

techjusticelab/Motion-API

Repository files navigation

Motion Index Fiber

A high-performance legal document processing API built with Go and Fiber, designed for California public defenders. Features production-ready DigitalOcean integration with MCP-powered services and comprehensive test coverage following TDD principles.

Features

  • 🏗️ Cloud-Native Architecture: Full DigitalOcean integration with MCP tools and S3-compatible APIs
  • 📄 Document Processing: Multi-format text extraction (PDF, DOCX, TXT) with AI-powered classification
  • 🔍 Advanced Search: OpenSearch integration with legal-specific filtering and aggregations
  • ☁️ Hybrid Storage: DigitalOcean Spaces with intelligent CDN delivery and caching
  • 🔐 Secure Authentication: JWT-based auth with Supabase integration and middleware
  • ⚡ High Performance: Built on Fiber v2 with zero-allocation design and concurrent processing
  • 📊 Production Monitoring: Comprehensive health checks, metrics, and observability
  • 🧪 Test-Driven: 100% test coverage following TDD and UNIX philosophy principles

Quick Start

Prerequisites

  • Go 1.21+ with modules support
  • DigitalOcean Account with:
    • Spaces storage bucket
    • Managed OpenSearch cluster
    • CDN configured (automatic via MCP)
  • Supabase Account for authentication
  • OpenAI API Key for document classification
  • Optional: Docker for containerized deployment

Installation

  1. Clone the repository:
git clone <repository-url>
cd motion-index-fiber
  1. Install dependencies:
go mod tidy
  1. Configure environment:
cp .env.example .env
# Edit .env with your configuration
  1. Run the server:
# Development mode
go run cmd/server/main.go

# Or with built binary
go build -o bin/server cmd/server/main.go
./bin/server

The server will start on http://localhost:6000 (default) or the port specified in your .env file.

Configuration

Environment Configuration

Create a .env file from the template and configure the following:

Core Application

# Server Configuration
PORT=6000
ENVIRONMENT=local  # local, staging, production
PRODUCTION=false
ALLOWED_ORIGINS=http://localhost:3000,http://localhost:5173

# Authentication
JWT_SECRET=your-jwt-secret-key

DigitalOcean Services

# DigitalOcean Spaces Storage
DO_SPACES_KEY=your-spaces-access-key
DO_SPACES_SECRET=your-spaces-secret-key
DO_SPACES_BUCKET=motion-index-docs
DO_SPACES_REGION=nyc3
DO_SPACES_CDN_DOMAIN=optional-custom-cdn-domain

# DigitalOcean Managed OpenSearch
OPENSEARCH_HOST=your-cluster.k.db.ondigitalocean.com
OPENSEARCH_PORT=25060
OPENSEARCH_USERNAME=doadmin
OPENSEARCH_PASSWORD=your-opensearch-password
OPENSEARCH_USE_SSL=true
OPENSEARCH_INDEX=documents

# Legacy OpenSearch Variables (for compatibility)
ES_HOST=your-cluster.k.db.ondigitalocean.com
ES_PORT=25060
ES_USERNAME=doadmin
ES_PASSWORD=your-opensearch-password
ES_USE_SSL=true
ES_INDEX=documents

External Services

# Supabase Authentication
SUPABASE_URL=https://your-project.supabase.co
SUPABASE_ANON_KEY=your-anon-key
SUPABASE_SERVICE_KEY=your-service-key

# OpenAI for Document Classification
OPENAI_API_KEY=your-openai-api-key
OPENAI_MODEL=gpt-4

# Processing Configuration
MAX_FILE_SIZE=104857600  # 100MB
MAX_WORKERS=10
BATCH_SIZE=50
PROCESS_TIMEOUT=5m

API Endpoints

Health & Status

  • GET / - Root status and service information
  • GET /health - Basic health check

Document Processing & Management

  • POST /api/v1/categorise - Upload and process documents with AI classification
  • POST /api/v1/analyze-redactions - Analyze PDF redactions for legal compliance
  • POST /api/v1/redact-document - Create redacted version of a document
  • POST /api/v1/update-metadata - Update document metadata (currently unprotected)
  • DELETE /api/v1/documents/:id - Delete documents (currently unprotected)

Search & Discovery

  • POST /api/v1/search - Advanced document search with legal filtering
  • GET /api/v1/legal-tags - Get available legal document types and counts
  • GET /api/v1/document-types - Get document type classifications
  • GET /api/v1/document-stats - Index statistics and analytics
  • GET /api/v1/field-options - Get available search field options
  • GET /api/v1/metadata-fields - Get available metadata fields with types
  • GET /api/v1/metadata-fields/:field - Get values for specific metadata fields
  • GET /api/v1/documents/:id - Get specific document details
  • GET /api/v1/documents/:id/redactions - Get redaction analysis for a document

File Storage & CDN

  • GET /api/v1/documents/* - Serve documents (automatic CDN redirects)

Storage Management

  • GET /api/v1/storage/documents - List documents in storage
  • GET /api/v1/storage/documents/count - Get document count statistics

Batch Processing

  • POST /api/v1/batch/classify - Start batch classification job
  • GET /api/v1/batch/:job_id/status - Get batch job status
  • GET /api/v1/batch/:job_id/results - Get batch job results
  • DELETE /api/v1/batch/:job_id - Cancel batch job

Document Indexing

  • POST /api/v1/index/document - Index a document for search

Testing Endpoints

# Test basic connectivity
curl http://localhost:6000/

# Test health check
curl http://localhost:6000/health

# Test search functionality
curl -X POST http://localhost:6000/api/v1/search \
  -H "Content-Type: application/json" \
  -d '{"query": "motion to dismiss", "size": 10}'

# Test document upload (multipart form)
curl -X POST http://localhost:6000/api/v1/categorise \
  -F "file=@document.pdf" \
  -F "case_name=Test Case" \
  -F "category=motion"

# Test metadata fields
curl http://localhost:6000/api/v1/metadata-fields

# Test specific field values
curl http://localhost:6000/api/v1/metadata-fields/court

# Test document stats
curl http://localhost:6000/api/v1/document-stats

# Test document retrieval
curl http://localhost:6000/api/v1/documents/some-document-id

# Test document redaction analysis
curl http://localhost:6000/api/v1/documents/some-document-id/redactions

# Test redact document
curl -X POST http://localhost:6000/api/v1/redact-document \
  -H "Content-Type: application/json" \
  -d '{"document_id": "some-document-id", "apply_redactions": true}'

# Test storage document listing
curl http://localhost:6000/api/v1/storage/documents

# Test storage document count
curl http://localhost:6000/api/v1/storage/documents/count

Development

Architecture & Project Structure

High-Level Architecture

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Web Client    │    │   Mobile App    │    │  Third Party    │
└─────────┬───────┘    └─────────┬───────┘    └─────────┬───────┘
          │                      │                      │
          └──────────────────────┼──────────────────────┘
                                 │
                    ┌─────────────▼──────────────┐
                    │     Motion Index API       │
                    │    (Fiber v2 + Go 1.21)   │
                    └─────────────┬──────────────┘
                                 │
          ┌──────────────────────┼──────────────────────┐
          │                      │                      │
    ┌─────▼─────┐        ┌───────▼────────┐     ┌──────▼──────┐
    │DigitalOcean│        │   Supabase     │     │   OpenAI    │
    │  Services  │        │ Authentication │     │Classification│
    │ (MCP+S3)   │        │      (JWT)     │     │    (GPT-4)  │
    └─────┬─────┘        └────────────────┘     └─────────────┘
          │
    ┌─────▼─────┐        ┌────────────────┐
    │  Spaces   │        │   OpenSearch   │
    │ Storage   │        │   Cluster      │
    │ + CDN     │        │  (Search)      │
    └───────────┘        └────────────────┘

Project Structure

motion-index-fiber/
├── cmd/server/                    # Application entry point
├── pkg/                          # Public libraries (following Go conventions)
│   ├── cloud/digitalocean/       # DigitalOcean MCP integration
│   │   ├── config/              # Configuration management
│   │   ├── spaces/              # Storage service implementation
│   │   ├── opensearch/          # Search service implementation
│   │   └── factory.go           # Service factory pattern
│   ├── processing/              # Document processing pipeline
│   │   ├── classifier/          # AI-powered classification
│   │   ├── extractor/           # Multi-format text extraction
│   │   └── pipeline/            # Processing workflow
│   ├── search/                  # Search interfaces and models
│   ├── storage/                 # Storage interfaces and utilities
│   ├── auth/                    # Authentication utilities
│   └── redaction/               # Legal document redaction
├── internal/                    # Private application code
│   ├── config/                  # Application configuration
│   ├── handlers/                # HTTP request handlers
│   ├── middleware/              # Custom middleware
│   ├── models/                  # Data models and validation
│   └── testutil/               # Test utilities (UNIX principles)
├── docs/                        # Comprehensive documentation
│   ├── api/                     # API documentation
│   ├── development/             # Development guides
│   └── deployment/              # Deployment guides
├── test/                        # Test suites
│   ├── integration/             # Integration tests
│   ├── unit/                    # Unit tests
│   └── testdata/               # Test fixtures
├── deployments/                 # Deployment configurations
│   ├── digitalocean/           # DigitalOcean App Platform
│   ├── docker/                 # Docker configurations
│   └── k8s/                    # Kubernetes manifests
└── scripts/                     # Utility scripts

Development Workflow

UNIX Philosophy & TDD Principles

This project follows strict UNIX philosophy and Test-Driven Development principles:

  • Do One Thing Well: Each component has a single, focused responsibility
  • Composable: Services can be combined and used independently
  • Testable: 100% test coverage with unit, integration, and benchmark tests
  • Observable: Comprehensive health checks, metrics, and logging

Running Tests

# Run all tests with coverage
go test ./... -v -coverprofile=coverage.out
go tool cover -html=coverage.out

# Unit tests only
go test ./... -short -v

# Integration tests (requires real DigitalOcean credentials)
RUN_INTEGRATION_TESTS=true go test ./... -v -tags=integration

# Benchmark tests
go test ./... -bench=. -benchmem

# Specific package tests
go test ./internal/config/... -v
go test ./pkg/cloud/digitalocean/... -v

# Test with race detection
go test ./... -race -v

Test Categories

  1. Unit Tests (*_test.go): Fast, isolated tests with mocks
  2. Integration Tests (integration_test.go): Tests with real services
  3. Benchmark Tests: Performance validation and optimization
  4. End-to-End Tests: Full API workflow testing

Building & Development Commands

# Development build
go build -o bin/server cmd/server/main.go

# Production build with optimizations
go build -ldflags="-s -w" -o bin/server cmd/server/main.go

# Cross-platform builds
GOOS=linux GOARCH=amd64 go build -ldflags="-s -w" -o bin/server-linux cmd/server/main.go
GOOS=darwin GOARCH=amd64 go build -ldflags="-s -w" -o bin/server-darwin cmd/server/main.go

# Development server with auto-reload (install air)
go install github.com/cosmtrek/air@latest
air

# Code quality checks
go vet ./...
go fmt ./...
golangci-lint run

# Dependency management
go mod tidy
go mod verify
go mod download

Deployment

DigitalOcean App Platform (Recommended)

The application is optimized for DigitalOcean App Platform with auto-scaling and managed services:

# 1. Push to GitHub repository
git push origin main

# 2. Deploy via DigitalOcean CLI
doctl apps create --spec deployments/digitalocean/app.yaml

# 3. Update deployment
doctl apps update <app-id> --spec deployments/digitalocean/app.yaml

# 4. Monitor deployment
doctl apps list
doctl apps get <app-id>

Environment Variables for Production

Configure in App Platform dashboard or via API:

  • All production environment variables from Configuration section
  • ENVIRONMENT=production
  • PRODUCTION=true
  • Real DigitalOcean credentials and endpoints

Docker Deployment

# Build image
docker build -f deployments/docker/Dockerfile -t motion-index-fiber:latest .

# Run with production configuration
docker run -d \
  --name motion-index \
  -p 8003:8003 \
  --env-file .env.production \
  --restart unless-stopped \
  motion-index-fiber:latest

# Docker Compose (includes monitoring)
docker-compose -f deployments/docker/docker-compose.yml up -d

Kubernetes Deployment

# Apply Kubernetes manifests
kubectl apply -f deployments/k8s/

# Monitor deployment
kubectl get pods -l app=motion-index
kubectl logs -f deployment/motion-index

# Scale deployment
kubectl scale deployment motion-index --replicas=3

Implementation Status

✅ Phase 1-6: Core Infrastructure (COMPLETED)

  • Foundation Setup: Fiber v2 application with comprehensive middleware
  • DigitalOcean Integration: Full MCP integration with Spaces and OpenSearch
  • Application Integration: Handler implementation and service factory pattern
  • Search Service: OpenSearch client with legal document support
  • Authentication: JWT middleware with Supabase integration
  • Configuration Management: Environment-based config with validation
  • Health Monitoring: Comprehensive health checks and metrics collection

✅ Phase 7: Test-Driven Quality Assurance (COMPLETED)

  • Unit Test Foundation: Configuration validation and mock services
  • UNIX Philosophy: Testable, composable, single-responsibility components
  • Test Utilities: Comprehensive test helpers following TDD principles
  • Service Layer Testing: DigitalOcean factory and service integration tests

🚧 Phase 7-8: Testing & Production Readiness (IN PROGRESS)

  • Service layer testing with mocks and real implementations
  • Handler testing with HTTP utilities and endpoint validation
  • Integration testing with end-to-end workflows
  • Performance optimization and benchmarking
  • Security hardening and input validation

📋 Phase 9: Feature Completion (PLANNED)

  • Document Processing: Multi-format extraction and AI classification
  • Advanced Search: Legal filtering, aggregations, and analytics
  • Storage Optimization: CDN management, batch operations, and caching
  • Redaction Analysis: PDF compliance and legal document validation
  • Performance Monitoring: Real-time metrics and alerting

🎯 Current Focus: Test Coverage & Production Readiness

  • Test Coverage: Achieving 100% coverage with unit, integration, and benchmark tests
  • Performance: Optimizing for high-throughput legal document processing
  • Security: Hardening authentication, input validation, and error handling
  • Observability: Enhanced monitoring, logging, and health checks

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests
  5. Submit a pull request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Performance Targets

Document Processing

  • Upload Throughput: 50MB/s for large files (>10MB), 200ms for small files (<1MB)
  • Text Extraction: <2 seconds for 50MB PDFs, <500ms for DOCX/TXT files
  • AI Classification: <1 second per document (GPT-4 integration)
  • Batch Processing: 100 documents/second with concurrent workers

Search Performance

  • Query Response: <100ms typical, <500ms complex legal queries
  • Index Operations: <50ms for document indexing
  • Aggregations: <200ms for legal tag and metadata aggregations
  • Full-Text Search: <100ms for most legal document searches

System Performance

  • Concurrent Requests: 1000+ simultaneous connections
  • Memory Usage: <1GB typical, <2GB peak under load
  • CPU Utilization: <70% under normal load, auto-scaling available
  • Storage Throughput: 100MB/s download via CDN, 50MB/s upload
  • Health Checks: <100ms response time, <5% error rate

Support

For issues and questions, please open an issue on GitHub or contact the development team.

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages