# Privacy-Conscious Delegation with DSPy

This notebook demonstrates how to build privacy-conscious AI systems using DSPy that can delegate sensitive tasks while maintaining data privacy.

Based on the DSPy tutorial: [Privacy-Conscious Delegation](https://dspy.ai/tutorials/papillon/)

## Setup

Import necessary libraries and configure the environment.

In [None]:
import os
import sys
sys.path.append('../../')

import dspy
from utils import setup_default_lm, print_step, print_result, print_error
from dotenv import load_dotenv
import hashlib
import uuid

# Load environment variables
load_dotenv('../../.env')

## Language Model Configuration

Set up DSPy with a language model for privacy-conscious processing.

In [None]:
print_step("Setting up Language Model", "Configuring DSPy for privacy-conscious delegation")

try:
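    # Configure the LM that every DSPy module in this notebook calls.
    # This is a hosted OpenAI model, so any text passed to it leaves the local machine.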
    lm = setup_default_lm(provider="openai", model="gpt-4o", max_tokens=1000)
    dspy.configure(lm=lm)
    print_result("Language model configured successfully!", "Status")
except Exception as e:
    print_error(f"Failed to configure language model: {e}")

## Privacy-Preserving Data Handler

Create a class to handle sensitive data with privacy protection.

In [None]:
class PrivacyHandler:
    """Handles privacy-sensitive data processing."""
    
    def __init__(self):
        self.sensitive_data_map = {}
        self.anonymization_map = {}
    
    def anonymize_text(self, text: str, sensitive_entities: list) -> str:
        """Replace sensitive entities with anonymized placeholders."""
        anonymized_text = text
        
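        # Replace longer entities first so that an entity contained in another
        # (e.g. "John" inside "John Smith") cannot split the longer match.
        for entity in sorted(sensitive_entities, key=len, reverse=True):
            if not entity:
                continue  # an empty string would match between every character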
            if entity not in self.anonymization_map:
                # Generate a unique placeholder
                placeholder = f"[ENTITY_{len(self.anonymization_map)}]"
                self.anonymization_map[entity] = placeholder
                self.sensitive_data_map[placeholder] = entity
            
            anonymized_text = anonymized_text.replace(entity, self.anonymization_map[entity])
        
        return anonymized_text
    
    def deanonymize_text(self, text: str) -> str:
        """Restore original entities from anonymized text."""
        deanonymized_text = text
        
        for placeholder, original in self.sensitive_data_map.items():
            deanonymized_text = deanonymized_text.replace(placeholder, original)
        
        return deanonymized_text

# Test the privacy handler
privacy_handler = PrivacyHandler()

# Example sensitive data
sensitive_text = "John Smith works at Google and his email is john.smith@gmail.com"
sensitive_entities = ["John Smith", "Google", "john.smith@gmail.com"]

anonymized = privacy_handler.anonymize_text(sensitive_text, sensitive_entities)
print_result(f"Original: {sensitive_text}")
print_result(f"Anonymized: {anonymized}")
print_result(f"Restored: {privacy_handler.deanonymize_text(anonymized)}")
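
The handler above relies on the caller supplying `sensitive_entities` by hand. As a minimal sketch, assuming simple regular expressions are acceptable for well-structured identifiers such as email addresses and US-style phone numbers, a helper like the hypothetical `find_structured_entities` below could pre-populate part of that list. The patterns are illustrative assumptions, not complete PII detectors, and names or organizations would still need another method.

```python
import re

# Illustrative patterns only (assumptions for this sketch, not complete PII detectors).
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_PATTERN = re.compile(r"\(\d{3}\)\s?\d{3}-\d{4}")


def find_structured_entities(text: str) -> list:
    """Return email addresses and (NNN) NNN-NNNN phone numbers found in text."""
    found = []
    for pattern in (EMAIL_PATTERN, PHONE_PATTERN):
        for match in pattern.findall(text):
            if match not in found:  # keep the first occurrence, preserve order
                found.append(match)
    return found


sample = "Reach John at john.smith@gmail.com or (555) 123-4567."
print(find_structured_entities(sample))
```

The returned list could be concatenated with manually identified names before calling `anonymize_text`.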

## Privacy-Conscious Processing Signatures

Define signatures for privacy-aware text processing.

In [None]:
class SensitiveDataDetection(dspy.Signature):
    """Detect sensitive information in text without exposing the actual data."""
    
    text = dspy.InputField(desc="Text to analyze for sensitive information")
    sensitive_types = dspy.OutputField(desc="Types of sensitive information detected (without revealing actual values)")

class PrivacyAnalysis(dspy.Signature):
    """Analyze anonymized text for insights while preserving privacy."""
    
    anonymized_text = dspy.InputField(desc="Text with sensitive information anonymized")
    analysis = dspy.OutputField(desc="Analysis and insights from the anonymized text")

class SecureRecommendation(dspy.Signature):
    """Provide recommendations based on anonymized analysis."""
    
    analysis = dspy.InputField(desc="Analysis of anonymized data")
    recommendations = dspy.OutputField(desc="Privacy-safe recommendations")

## Privacy-Conscious Processing Module

Create a module that processes sensitive data while maintaining privacy.

In [None]:
class PrivacyConsciousProcessor(dspy.Module):
    """A module for processing sensitive data with privacy protection."""
    
    def __init__(self):
        super().__init__()
        self.privacy_handler = PrivacyHandler()
        self.detect_sensitive = dspy.ChainOfThought(SensitiveDataDetection)
        self.analyze_privacy = dspy.ChainOfThought(PrivacyAnalysis)
        self.secure_recommend = dspy.ChainOfThought(SecureRecommendation)
    
    def forward(self, text: str, sensitive_entities: list = None):
        """Process text while maintaining privacy."""
        
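        # Step 1: Detect sensitive information types.
        # Note: this call sends the raw, un-anonymized text to the configured LM,
        # so that model must be one that is trusted with the original data.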
        print_step("Step 1: Sensitive Data Detection")
        detection_result = self.detect_sensitive(text=text)
        print_result(detection_result.sensitive_types, "Detected Sensitive Types")
        
        # Step 2: Anonymize the text if sensitive entities are provided
        if sensitive_entities:
            anonymized_text = self.privacy_handler.anonymize_text(text, sensitive_entities)
            print_step("Step 2: Text Anonymization")
            print_result(anonymized_text, "Anonymized Text")
        else:
            # No entities supplied: the later steps receive the text unchanged.
            anonymized_text = text
        
        # Step 3: Analyze the anonymized text
        print_step("Step 3: Privacy-Safe Analysis")
        analysis_result = self.analyze_privacy(anonymized_text=anonymized_text)
        print_result(analysis_result.analysis, "Analysis")
        
        # Step 4: Generate secure recommendations
        print_step("Step 4: Secure Recommendations")
        recommendation_result = self.secure_recommend(analysis=analysis_result.analysis)
        print_result(recommendation_result.recommendations, "Recommendations")
        
        return dspy.Prediction(
            sensitive_types=detection_result.sensitive_types,
            anonymized_text=anonymized_text,
            analysis=analysis_result.analysis,
            recommendations=recommendation_result.recommendations
        )

# Initialize the processor
processor = PrivacyConsciousProcessor()
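
The module above stops at analysis. Delegation also involves handing the anonymized request to an external model and mapping its answer back to the real entities. The following is a minimal sketch of that round trip under stated assumptions: `untrusted_model` is a hypothetical stand-in for a remote LM call, and `delegate` is an illustrative name that mirrors the placeholder scheme of `PrivacyHandler`; neither is part of DSPy.

```python
from typing import Callable, Dict, List


def delegate(text: str, entities: List[str], remote: Callable[[str], str]) -> str:
    """Mask entities, call the remote model on the masked text, then restore them."""
    placeholders: Dict[str, str] = {}
    masked = text
    # Deduplicate in input order, then replace longer entities first so a short
    # entity cannot break up a longer one.
    ordered = sorted(dict.fromkeys(entities), key=len, reverse=True)
    for index, entity in enumerate(ordered):
        placeholder = f"[ENTITY_{index}]"
        placeholders[placeholder] = entity
        masked = masked.replace(entity, placeholder)

    # Only the masked text crosses the trust boundary.
    response = remote(masked)

    # Map placeholders in the response back to the original values locally.
    for placeholder, entity in placeholders.items():
        response = response.replace(placeholder, entity)
    return response


seen_by_remote = []


def untrusted_model(prompt: str) -> str:
    """Hypothetical stand-in for an external LM; it records and echoes its input."""
    seen_by_remote.append(prompt)
    return f"Summary: {prompt}"


answer = delegate(
    "Sarah Johnson reports to David Wilson.",
    ["Sarah Johnson", "David Wilson"],
    untrusted_model,
)
print(seen_by_remote[0])
print(answer)
```

Keeping the placeholder map local to the call means the remote side never receives the information needed to reverse the masking.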

## Example: Processing Employee Feedback

Let's process employee feedback that contains sensitive information.

In [None]:
# Example employee feedback with sensitive information
employee_feedback = """
Hi, this is Sarah Johnson from the Marketing Department. 
I wanted to provide feedback about our recent project with Microsoft. 
My manager, David Wilson, has been very supportive, but I think we need 
better tools for collaboration. My work email is sarah.j@company.com 
and I've been working on the Adobe Creative Suite integration project.
The project budget was $150,000 and we completed it on March 15, 2024.
"""

# Identify sensitive entities
sensitive_entities = [
    "Sarah Johnson", 
    "David Wilson", 
    "Microsoft", 
    "sarah.j@company.com", 
    "$150,000", 
    "March 15, 2024"
]

print_step("Processing Employee Feedback", "Analyzing feedback while protecting privacy")

# Process the feedback
result = processor(text=employee_feedback, sensitive_entities=sensitive_entities)

print_step("Final Results Summary")
print("✓ Sensitive information detected and protected")
print("✓ Analysis completed on anonymized data")
print("✓ Secure recommendations generated")
print("✓ Original data privacy maintained")
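
The summary above can be backed by an explicit check. A minimal sketch, assuming case-insensitive exact string matching is an adequate test, is to confirm that no listed entity survives in the anonymized text. `find_leaks` is a hypothetical helper; in this notebook it could be applied to `result.anonymized_text` and `sensitive_entities`.

```python
def find_leaks(anonymized_text: str, sensitive_entities: list) -> list:
    """Return the entities that still appear verbatim (case-insensitive) in the text."""
    lowered = anonymized_text.lower()
    return [entity for entity in sensitive_entities if entity.lower() in lowered]


masked = "[ENTITY_0] from Marketing worked with [ENTITY_1]; contact sarah.j@company.com"
entities = ["Sarah Johnson", "Microsoft", "sarah.j@company.com"]

leaks = find_leaks(masked, entities)
print(leaks)  # the email was not masked, so it is reported
```

An empty result only shows that the listed entities were replaced; it says nothing about sensitive values that were never listed.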

## Example: Customer Service Data Protection

Process customer service interactions while protecting customer privacy.

In [None]:
# Customer service interaction with PII
customer_interaction = """
Customer: Hi, I'm having issues with my order. My name is Emily Chen 
and my order number is ORD-12345-XYZ. I live at 123 Main Street, 
San Francisco, CA 94102. My phone number is (555) 123-4567 and 
email is emily.chen@email.com. The order was placed on my credit 
card ending in 1234.
"""

customer_sensitive_entities = [
    "Emily Chen",
    "ORD-12345-XYZ", 
    "123 Main Street, San Francisco, CA 94102",
    "(555) 123-4567",
    "emily.chen@email.com",
    "1234"
]

print_step("Processing Customer Service Data", "Protecting customer privacy")

# Process customer data
customer_result = processor(text=customer_interaction, sensitive_entities=customer_sensitive_entities)

# Summarize what the pipeline produced without exposing customer data
print_step("Privacy-Protected Customer Insights")
print("✓ Customer issue analyzed using placeholders instead of identifiers")
print("✓ Recommendations generated from the anonymized analysis")
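
Short entities such as the card suffix `1234` can also occur inside longer tokens (the order number here contains `12345`), and plain `str.replace` would mask those as well if the longer token were not replaced first. A minimal sketch of boundary-aware replacement, assuming entities should only match as whole tokens; `replace_whole_token` is a hypothetical helper and `[CARD]` an illustrative placeholder:

```python
import re


def replace_whole_token(text: str, entity: str, placeholder: str) -> str:
    """Replace entity only where it is not embedded in a longer word or number."""
    # Lookarounds require that no word character touches the match on either side.
    pattern = rf"(?<!\w){re.escape(entity)}(?!\w)"
    # A function replacement avoids any backslash processing of the placeholder.
    return re.sub(pattern, lambda _match: placeholder, text)


text = "Order ORD-12345-XYZ was paid with the card ending in 1234."
print(text.replace("1234", "[CARD]"))
print(replace_whole_token(text, "1234", "[CARD]"))
```

The first call corrupts the order number, while the second leaves it intact and masks only the standalone card suffix.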

## Advanced Privacy Features

Implement additional privacy-preserving techniques.

In [None]:
class AdvancedPrivacyProcessor(dspy.Module):
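    """Adds noise-based and group-size-based protections on top of the basic processor.

    These helpers illustrate ideas from differential privacy and k-anonymity;
    they do not provide the formal guarantees of either.
    """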
    
    def __init__(self):
        super().__init__()
        self.basic_processor = PrivacyConsciousProcessor()
    
    def add_noise_to_metrics(self, metrics: dict, noise_level: float = 0.1):
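        """Add Gaussian noise to numeric metrics, scaled to each value.

        The noise is not calibrated to a privacy budget, so this is an
        illustration of the idea rather than a differential-privacy mechanism.
        """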
        import random
        
        noisy_metrics = {}
        for key, value in metrics.items():
            if isinstance(value, (int, float)):
                noise = random.gauss(0, noise_level * abs(value))
                noisy_metrics[key] = max(0, value + noise)  # Ensure non-negative
            else:
                noisy_metrics[key] = value
        
        return noisy_metrics
    
    def k_anonymize_groups(self, data_groups: list, k: int = 3):
        """Report the size of groups with at least k members and suppress smaller ones."""
        # Simplified stand-in for k-anonymity: only group sizes are considered
        anonymized_groups = []
        
        for group in data_groups:
            if len(group) >= k:
                anonymized_groups.append(f"Group of {len(group)} entities")
            else:
                anonymized_groups.append(f"Small group (< {k} entities)")
        
        return anonymized_groups

# Example of advanced privacy features
advanced_processor = AdvancedPrivacyProcessor()

# Simulate some metrics
sample_metrics = {
    "response_time": 2.5,
    "satisfaction_score": 4.2,
    "resolution_rate": 0.85,
    "category": "technical_support"
}

print_step("Advanced Privacy Features")
noisy_metrics = advanced_processor.add_noise_to_metrics(sample_metrics)
print_result(f"Original metrics: {sample_metrics}")
print_result(f"Privacy-protected metrics: {noisy_metrics}")

# K-anonymity example
data_groups = [
    ["user1", "user2", "user3", "user4"],  # Group of 4
    ["user5", "user6"],  # Group of 2
    ["user7", "user8", "user9", "user10", "user11"]  # Group of 5
]

k_anon_groups = advanced_processor.k_anonymize_groups(data_groups, k=3)
print_result(f"K-anonymized groups: {k_anon_groups}")
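
The Gaussian noise above is scaled to each value rather than calibrated to a privacy budget. For comparison, here is a minimal sketch of the Laplace mechanism for a counting query, assuming sensitivity 1 (adding or removing one person changes the count by at most 1) and an illustrative `epsilon`. `laplace_noise` and `private_count` are hypothetical helper names.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Release a noisy count; the sensitivity of the query is assumed to be 1."""
    sensitivity = 1.0
    return true_count + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(0)  # seeded only to make the sketch reproducible
releases = [private_count(100, epsilon=0.5, rng=rng) for _ in range(5)]
print([round(value, 2) for value in releases])
```

A smaller `epsilon` gives a larger noise scale and therefore stronger protection at the cost of accuracy. A real deployment would not use a fixed seed and would also need to track the budget spent across repeated releases.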

## Best Practices for Privacy-Conscious AI

Key principles for building privacy-aware AI systems:

1. **Data Minimization**: Only collect and process necessary data
2. **Purpose Limitation**: Use data only for stated purposes
3. **Anonymization**: Remove or mask identifying information
4. **Differential Privacy**: Add calibrated noise to aggregate results so that no single individual's contribution can be inferred
5. **Secure Processing**: Use encrypted and secure computation methods
6. **Transparency**: Be clear about data usage and privacy protections
7. **Regular Audits**: Monitor and audit privacy protections
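
As one concrete form of the anonymization and audit principles above, a dataset can be checked for k-anonymity over its quasi-identifiers: every combination of those fields should be shared by at least `k` records. The sketch below is a minimal illustration; `violates_k_anonymity`, the field names, and the sample records are all hypothetical.

```python
from collections import Counter


def violates_k_anonymity(records: list, quasi_identifiers: list, k: int) -> list:
    """Return quasi-identifier combinations shared by fewer than k records."""
    counts = Counter(
        tuple(record[field] for field in quasi_identifiers) for record in records
    )
    return [combo for combo, count in counts.items() if count < k]


records = [
    {"department": "Marketing", "age_band": "30-39", "score": 4},
    {"department": "Marketing", "age_band": "30-39", "score": 5},
    {"department": "Marketing", "age_band": "30-39", "score": 3},
    {"department": "Finance", "age_band": "40-49", "score": 2},
]

print(violates_k_anonymity(records, ["department", "age_band"], k=3))
```

Combinations returned by the check would need to be generalized further or suppressed before release.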

## Conclusion

This notebook demonstrated how to build privacy-conscious AI systems using DSPy that can:

- Detect the types of sensitive information present in text
- Anonymize data while preserving utility
- Generate insights without exposing private information
- Illustrate further techniques such as noise addition and small-group suppression
- Maintain transparency about privacy protections

These techniques can support compliance with privacy regulations such as GDPR, CCPA, and HIPAA, although placeholder substitution alone covers only part of what those regulations require.