# Coordination Gap Detection Demo

This notebook demonstrates the coordination gap detection system's capabilities using realistic mock data scenarios.

## What You'll Learn

1. **Loading Data**: How to work with mock Slack scenarios
2. **Gap Detection**: Running the detection pipeline
3. **Entity Extraction**: Identifying teams, people, and projects
4. **Impact Analysis**: Understanding gap severity and cost
5. **Visualization**: Exploring coordination patterns

## Prerequisites

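Start the API with Docker Compose before running any cells (the client defined below assumes it is reachable at `http://localhost:8000`):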
```bash
docker compose up -d
```
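
Before running the cells below, it can help to confirm that something is actually answering at the API address. The following is a minimal sketch using only the standard library. The function name `api_is_reachable` is hypothetical, and it probes the base URL itself because this notebook does not document a dedicated health endpoint.

```python
import urllib.error
import urllib.request


def api_is_reachable(base_url: str = "http://localhost:8000", timeout: float = 2.0) -> bool:
    """Return True if anything answers HTTP at base_url (any status code counts)."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server responded, just not with a 2xx status (for example 404 on "/").
        return True
    except (OSError, ValueError):
        # Connection refused, DNS failure, timeout, or a malformed URL.
        return False
```

Calling `api_is_reachable()` after `docker compose up -d` should return `True` once the container is accepting connections; a `False` result usually means the service is still starting or is bound to a different port.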

In [None]:
# Install required packages (if not already available)
# !pip install requests pandas matplotlib seaborn networkx

In [None]:
import requests
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime, timedelta
from typing import Dict, List, Any
import json

# Configure plotting
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")
%matplotlib inline

## 1. API Client Setup

Let's create a simple client to interact with the Gap Detection API.
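
One detail worth getting right in a client like this is how optional filters become query parameters. The sketch below is a standalone helper (the name `build_list_params` is hypothetical) that mirrors the filters used by the list endpoint in this notebook and compares against `None` explicitly, so that a threshold of `0.0` is still sent.

```python
from typing import Any, Dict, Optional


def build_list_params(
    gap_type: Optional[str] = None,
    min_impact_score: Optional[float] = None,
    limit: int = 10,
) -> Dict[str, Any]:
    """Build query parameters, omitting only the filters that were not supplied."""
    params: Dict[str, Any] = {"limit": limit}
    if gap_type is not None:
        params["gap_type"] = gap_type
    # Compare against None: a plain truthiness test would drop a 0.0 threshold.
    if min_impact_score is not None:
        params["min_impact_score"] = min_impact_score
    return params


print(build_list_params(min_impact_score=0.0))
# → {'limit': 10, 'min_impact_score': 0.0}
```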

In [None]:
class GapDetectionClient:
    """Simple client for interacting with the Gap Detection API."""
    
    def __init__(self, base_url: str = "http://localhost:8000"):
        self.base_url = base_url
        self.session = requests.Session()
    
    def detect_gaps(
        self,
        timeframe_days: int = 30,
        gap_types: List[str] = None,
        min_impact_score: float = 0.6,
        include_evidence: bool = True,
    ) -> Dict[str, Any]:
        """Run gap detection."""
        payload = {
            "timeframe_days": timeframe_days,
            "gap_types": gap_types or ["duplicate_work"],
            "min_impact_score": min_impact_score,
            "include_evidence": include_evidence,
        }
        
        response = self.session.post(
            f"{self.base_url}/api/v1/gaps/detect",
            json=payload,
        )
        response.raise_for_status()
        return response.json()
    
    def list_gaps(
        self,
        gap_type: str = None,
        min_impact_score: float = None,
        limit: int = 10,
    ) -> Dict[str, Any]:
        """List existing gaps."""
        params = {"limit": limit}
        if gap_type:
            params["gap_type"] = gap_type
        if min_impact_score is not None:  # keep an explicit 0.0 threshold
            params["min_impact_score"] = min_impact_score
        
        response = self.session.get(
            f"{self.base_url}/api/v1/gaps",
            params=params,
        )
        response.raise_for_status()
        return response.json()
    
    def get_gap(self, gap_id: str) -> Dict[str, Any]:
        """Get specific gap details."""
        response = self.session.get(f"{self.base_url}/api/v1/gaps/{gap_id}")
        response.raise_for_status()
        return response.json()

# Initialize client
client = GapDetectionClient()
print("✓ API client initialized")

## 2. Load Mock Data Scenarios

The system includes 6 realistic mock data scenarios:

### Positive Cases (Should Detect Gaps)
1. **OAuth Duplication**: Platform and Auth teams independently implementing OAuth2
2. **API Redesign**: Mobile and Backend teams duplicating API work
3. **Auth Migration**: Security and Platform duplicating JWT migration

### Edge Cases (Should NOT Detect)
4. **Similar Topics, Different Scope**: User auth vs service auth (different purposes)
5. **Sequential Work**: Team B starts after Team A completes (no overlap)
6. **Intentional Collaboration**: Teams explicitly coordinating together
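
The "Sequential Work" edge case depends on whether two teams' work periods overlap in time. As an illustration only, and not necessarily how the detection pipeline decides this, a temporal overlap test for two date ranges can be sketched as follows. The function name and the dates are hypothetical.

```python
from datetime import date


def work_periods_overlap(a_start: date, a_end: date, b_start: date, b_end: date) -> bool:
    """Two closed date ranges overlap when each starts on or before the other ends."""
    return a_start <= b_end and b_start <= a_end


# Hypothetical dates, for illustration only.
concurrent = work_periods_overlap(date(2024, 1, 1), date(2024, 1, 31),
                                  date(2024, 1, 15), date(2024, 2, 15))
sequential = work_periods_overlap(date(2024, 1, 1), date(2024, 1, 31),
                                  date(2024, 2, 5), date(2024, 3, 1))
print(concurrent, sequential)  # → True False
```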

The next cell lists the available scenarios by name:

In [None]:
# Note: In production, data would be loaded from Slack/GitHub/etc.
# For this demo, we'll use the mock data scenarios

print("Mock data scenarios available:")
print("""\n1. oauth_duplication - Platform and Auth teams independently implementing OAuth2
2. api_redesign_duplication - Mobile and Backend duplicating API work  
3. auth_migration_duplication - Security and Platform duplicating JWT migration
4. similar_topics_different_scope - Different scopes (should NOT detect)
5. sequential_work - No temporal overlap (should NOT detect)
6. intentional_collaboration - Explicit coordination (should NOT detect)
""")

## 3. Run Gap Detection

Let's run the detection pipeline and analyze the results.
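
The cells in this notebook read a `metadata` block and a `gaps` list from the detection response. The sketch below shows one way to summarize such a response while tolerating missing keys. The helper name `summarize_detection` and the sample values are hypothetical; only the key names are taken from this notebook.

```python
from typing import Any, Dict


def summarize_detection(result: Dict[str, Any]) -> str:
    """Format the metadata block, falling back gracefully when keys are absent."""
    meta = result.get("metadata", {})
    total = meta.get("total_gaps", len(result.get("gaps", [])))
    analyzed = meta.get("messages_analyzed", "unknown")
    elapsed = meta.get("detection_time_ms", "unknown")
    return f"{total} gaps, {analyzed} messages analyzed, {elapsed} ms"


# Hypothetical response with invented values.
sample = {
    "gaps": [{"id": "gap-1"}],
    "metadata": {"total_gaps": 1, "messages_analyzed": 24, "detection_time_ms": 310},
}
print(summarize_detection(sample))  # → 1 gaps, 24 messages analyzed, 310 ms
print(summarize_detection({}))      # → 0 gaps, unknown messages analyzed, unknown ms
```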

In [None]:
# Run gap detection
print("Running gap detection...")
detection_result = client.detect_gaps(
    timeframe_days=30,
    gap_types=["duplicate_work"],
    min_impact_score=0.6,
    include_evidence=True,
)

print(f"\n✓ Detection complete!")
print(f"  - Total gaps detected: {detection_result['metadata']['total_gaps']}")
print(f"  - Messages analyzed: {detection_result['metadata']['messages_analyzed']}")
print(f"  - Detection time: {detection_result['metadata']['detection_time_ms']}ms")

## 4. Analyze Detected Gaps

Let's examine the gaps that were detected.
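
When many gaps come back, it is often useful to look at the most severe ones first. A minimal sketch, assuming each gap is a dictionary with the `impact_score` and `confidence` keys used throughout this notebook (the helper name `rank_gaps` and the sample records are hypothetical):

```python
from typing import Any, Dict, List


def rank_gaps(gaps: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Sort gaps by impact score, then confidence, highest first."""
    return sorted(
        gaps,
        key=lambda g: (g.get("impact_score", 0.0), g.get("confidence", 0.0)),
        reverse=True,
    )


# Hypothetical gap records, for illustration only.
sample_gaps = [
    {"id": "gap-a", "impact_score": 0.65, "confidence": 0.90},
    {"id": "gap-b", "impact_score": 0.85, "confidence": 0.70},
    {"id": "gap-c", "impact_score": 0.85, "confidence": 0.95},
]
print([g["id"] for g in rank_gaps(sample_gaps)])  # → ['gap-c', 'gap-b', 'gap-a']
```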

In [None]:
# Convert gaps to DataFrame for easy analysis
gaps = detection_result.get('gaps', [])

if gaps:
    # Create summary DataFrame
    gap_summary = pd.DataFrame([
        {
            'Gap ID': gap['id'],
            'Type': gap['type'],
            'Topic': gap.get('topic', 'N/A'),
            'Teams': ', '.join(gap.get('teams_involved', [])),
            'Impact Score': gap['impact_score'],
            'Confidence': gap['confidence'],
            'Evidence Count': len(gap.get('evidence', [])),
        }
        for gap in gaps
    ])
    
    print("\nDetected Gaps Summary:")
    print("=" * 80)
    display(gap_summary)
else:
    print("\nNo gaps detected in the current dataset.")
    print("This is expected if:")
    print("  - Detection algorithm is not fully implemented yet")
    print("  - No duplicate work scenarios are loaded")
    print("  - Thresholds are too strict")

## 5. Deep Dive: Gap Evidence Analysis

Let's examine the evidence for the first detected gap (if any).
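
Two small operations recur when presenting evidence: ordering items chronologically and shortening long message bodies. The sketch below assumes evidence items are dictionaries with `timestamp` and `content` keys, as in this notebook, and that timestamps are ISO 8601 strings in a single timezone so that string order matches time order. The helper names are hypothetical.

```python
from typing import Any, Dict, List


def preview(text: str, width: int = 60) -> str:
    """Return text unchanged if short enough, otherwise cut it and append '...'."""
    return text if len(text) <= width else text[:width] + "..."


def evidence_timeline(evidence: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Order evidence by timestamp string; items without one sort first."""
    return sorted(evidence, key=lambda e: e.get("timestamp", ""))


# Hypothetical evidence items, for illustration only.
items = [
    {"timestamp": "2024-01-16T09:00:00", "content": "Starting OAuth2 work"},
    {"timestamp": "2024-01-15T14:30:00", "content": "x" * 80},
]
ordered = evidence_timeline(items)
print(ordered[0]["timestamp"])            # → 2024-01-15T14:30:00
print(len(preview(ordered[0]["content"])))  # → 63
```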

In [None]:
if gaps:
    # Analyze first gap in detail
    gap = gaps[0]
    
    print(f"Gap Analysis: {gap['id']}")
    print("=" * 80)
    print(f"\nType: {gap['type']}")
    print(f"Topic: {gap.get('topic', 'N/A')}")
    print(f"Teams Involved: {', '.join(gap.get('teams_involved', []))}")
    print(f"Impact Score: {gap['impact_score']:.2f}")
    print(f"Confidence: {gap['confidence']:.2f}")
    
    # Evidence timeline
    evidence = gap.get('evidence', [])
    if evidence:
        print(f"\nEvidence ({len(evidence)} items):")
        print("-" * 80)
        
        evidence_df = pd.DataFrame([
            {
                'Source': e.get('source', 'N/A'),
                'Author': e.get('author', 'N/A'),
                'Channel': e.get('channel', 'N/A'),
                'Timestamp': e.get('timestamp', 'N/A'),
                'Preview': e.get('content', '')[:60] + '...' if len(e.get('content', '')) > 60 else e.get('content', ''),
            }
            for e in evidence[:10]  # Show first 10
        ])
        
        display(evidence_df)
    
    # Recommendation
    if 'recommendation' in gap:
        print(f"\nRecommendation:")
        print("-" * 80)
        print(gap['recommendation'])
else:
    print("No gaps available for detailed analysis.")

## 6. Entity Extraction Analysis

Let's analyze the entities involved in detected gaps. The cell below collects teams and people; projects are not extracted here.
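
A compact way to see which teams appear together is to count unordered team pairs across gaps. This is a standard-library sketch and an alternative view to a full matrix. It assumes each gap carries a `teams_involved` list as in this notebook; the function name and the team names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Any, Dict, List


def team_pair_counts(gaps: List[Dict[str, Any]]) -> Counter:
    """Count how often each unordered pair of teams appears in the same gap."""
    counts: Counter = Counter()
    for gap in gaps:
        # Sort and de-duplicate so (A, B) and (B, A) land on the same key.
        teams = sorted(set(gap.get("teams_involved", [])))
        counts.update(combinations(teams, 2))
    return counts


# Hypothetical gaps, for illustration only.
sample_gaps = [
    {"teams_involved": ["platform-team", "auth-team"]},
    {"teams_involved": ["auth-team", "platform-team"]},
    {"teams_involved": ["mobile-team", "backend-team"]},
]
pairs = team_pair_counts(sample_gaps)
print(pairs[("auth-team", "platform-team")])  # → 2
```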

In [None]:
if gaps:
    # Extract all entities from gaps
    all_teams = set()
    all_people = set()
    
    for gap in gaps:
        all_teams.update(gap.get('teams_involved', []))
        
        # Extract from evidence
        for evidence in gap.get('evidence', []):
            if 'author' in evidence:
                all_people.add(evidence['author'])
            if 'team' in evidence.get('metadata', {}):
                all_teams.add(evidence['metadata']['team'])
    
    print("Entity Extraction Summary:")
    print("=" * 80)
    print(f"\nTeams involved: {len(all_teams)}")
    for team in sorted(all_teams):
        print(f"  - {team}")
    
    print(f"\nPeople involved: {len(all_people)}")
    for person in sorted(all_people):
        print(f"  - {person}")
    
    # Team co-occurrence matrix (the diagonal counts the gaps each team appears in)
    teams_list = sorted(all_teams)  # sorted for a stable row/column order
    if len(teams_list) >= 2:
        print("\nTeam Co-occurrence in Gaps:")
        team_cooccurrence = pd.DataFrame(
            0, 
            index=teams_list, 
            columns=teams_list
        )
        
        for gap in gaps:
            involved = gap.get('teams_involved', [])
            for t1 in involved:
                for t2 in involved:
                    if t1 in team_cooccurrence.index and t2 in team_cooccurrence.columns:
                        team_cooccurrence.loc[t1, t2] += 1
        
        display(team_cooccurrence)
else:
    print("No entities to analyze.")

## 7. Impact Score Distribution

Let's visualize the distribution of impact scores across detected gaps.
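
The plots in this section mark 0.6 and 0.8 as the "High" and "Critical" impact thresholds. The same banding can be expressed as a small function. This is a sketch: the label for scores under 0.6 (`below-threshold`) is a placeholder chosen here, and the sample scores are invented.

```python
from collections import Counter


def severity_band(impact_score: float, high: float = 0.6, critical: float = 0.8) -> str:
    """Map an impact score to a band using the thresholds drawn on the plots."""
    if impact_score >= critical:
        return "critical"
    if impact_score >= high:
        return "high"
    return "below-threshold"  # placeholder label, not defined by the API


# Hypothetical scores, for illustration only.
scores = [0.55, 0.6, 0.72, 0.8, 0.91]
bands = Counter(severity_band(s) for s in scores)
print(bands["critical"], bands["high"], bands["below-threshold"])  # → 2 2 1
```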

In [None]:
if gaps:
    # Extract impact scores
    impact_scores = [gap['impact_score'] for gap in gaps]
    confidence_scores = [gap['confidence'] for gap in gaps]
    
    # Create visualization
    fig, axes = plt.subplots(1, 2, figsize=(14, 5))
    
    # Impact score distribution
    axes[0].hist(impact_scores, bins=10, edgecolor='black', alpha=0.7)
    axes[0].axvline(0.8, color='red', linestyle='--', label='Critical (0.8+)')
    axes[0].axvline(0.6, color='orange', linestyle='--', label='High (0.6+)')
    axes[0].set_xlabel('Impact Score')
    axes[0].set_ylabel('Frequency')
    axes[0].set_title('Gap Impact Score Distribution')
    axes[0].legend()
    axes[0].grid(True, alpha=0.3)
    
    # Confidence vs Impact scatter
    axes[1].scatter(confidence_scores, impact_scores, alpha=0.6, s=100)
    axes[1].axhline(0.8, color='red', linestyle='--', alpha=0.5, label='Critical Impact')
    axes[1].axhline(0.6, color='orange', linestyle='--', alpha=0.5, label='High Impact')
    axes[1].axvline(0.7, color='blue', linestyle='--', alpha=0.5, label='Min Confidence')
    axes[1].set_xlabel('Confidence Score')
    axes[1].set_ylabel('Impact Score')
    axes[1].set_title('Confidence vs Impact')
    axes[1].legend()
    axes[1].grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()
    
    # Summary statistics
    print("\nImpact Score Statistics:")
    print("=" * 80)
    print(f"Mean: {pd.Series(impact_scores).mean():.2f}")
    print(f"Median: {pd.Series(impact_scores).median():.2f}")
    print(f"Std Dev: {pd.Series(impact_scores).std():.2f}")
    print(f"Min: {pd.Series(impact_scores).min():.2f}")
    print(f"Max: {pd.Series(impact_scores).max():.2f}")
else:
    print("No gaps available for visualization.")

## 8. Gap Type Breakdown

Analyze the distribution of different gap types.
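
The share of each gap type is simply its count divided by the total. A minimal standard-library sketch follows; `duplicate_work` is the type used in this notebook, while `missing_context` is a made-up second type included only to make the example non-trivial.

```python
from collections import Counter
from typing import Dict, List


def type_shares(gap_types: List[str]) -> Dict[str, float]:
    """Return each gap type's share of the total, as a percentage."""
    counts = Counter(gap_types)
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.most_common()}


# Hypothetical type labels, for illustration only.
shares = type_shares(["duplicate_work", "duplicate_work", "missing_context", "duplicate_work"])
for name, pct in shares.items():
    print(f"{name}: {pct:.1f}%")
# → duplicate_work: 75.0%
# → missing_context: 25.0%
```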

In [None]:
if gaps:
    # Count by type
    gap_types = pd.Series([gap['type'] for gap in gaps]).value_counts()
    
    # Visualization
    fig, axes = plt.subplots(1, 2, figsize=(14, 5))
    
    # Bar chart
    gap_types.plot(kind='bar', ax=axes[0], color='steelblue', edgecolor='black')
    axes[0].set_xlabel('Gap Type')
    axes[0].set_ylabel('Count')
    axes[0].set_title('Gaps by Type')
    axes[0].tick_params(axis='x', rotation=45)
    axes[0].grid(True, alpha=0.3)
    
    # Pie chart
    gap_types.plot(kind='pie', ax=axes[1], autopct='%1.1f%%', startangle=90)
    axes[1].set_ylabel('')
    axes[1].set_title('Gap Type Distribution')
    
    plt.tight_layout()
    plt.show()
    
    print("\nGap Type Summary:")
    print("=" * 80)
    for gap_type, count in gap_types.items():
        print(f"{gap_type}: {count} ({count/len(gaps)*100:.1f}%)")
else:
    print("No gaps available for type analysis.")

## 9. Temporal Analysis

Analyze when gaps were detected over time.
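
Counting gaps per calendar day can also be done without pandas. The sketch below assumes `detected_at` is an ISO 8601 string without a trailing `Z`, which `datetime.fromisoformat` accepts on all supported Python 3 versions; the actual timestamp format returned by the API is not shown in this notebook. Unlike a daily resample, days with no gaps are simply absent from the result.

```python
from collections import Counter
from datetime import date, datetime
from typing import Any, Dict, List


def gaps_per_day(gaps: List[Dict[str, Any]]) -> Dict[date, int]:
    """Count gaps per calendar day, skipping gaps that lack a detected_at value."""
    days: Counter = Counter()
    for gap in gaps:
        stamp = gap.get("detected_at")
        if stamp:
            days[datetime.fromisoformat(stamp).date()] += 1
    return dict(sorted(days.items()))


# Hypothetical timestamps, for illustration only.
sample_gaps = [
    {"detected_at": "2024-01-15T09:30:00"},
    {"detected_at": "2024-01-17T11:00:00"},
    {"detected_at": "2024-01-15T16:05:00"},
    {"id": "no-timestamp"},
]
daily = gaps_per_day(sample_gaps)
print(daily[date(2024, 1, 15)], daily[date(2024, 1, 17)])  # → 2 1
```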

In [None]:
if gaps:
    # Extract detection timestamps
    gap_dates = []
    for gap in gaps:
        if 'detected_at' in gap:
            gap_dates.append(pd.to_datetime(gap['detected_at']))
    
    if gap_dates:
        # Create timeline
        gap_timeline = pd.Series(1, index=gap_dates)
        daily_gaps = gap_timeline.resample('D').sum()
        
        # Plot
        plt.figure(figsize=(14, 5))
        daily_gaps.plot(kind='line', marker='o', linewidth=2, markersize=8)
        plt.xlabel('Date')
        plt.ylabel('Gaps Detected')
        plt.title('Gap Detection Timeline')
        plt.grid(True, alpha=0.3)
        plt.tight_layout()
        plt.show()
        
        print("\nTemporal Statistics:")
        print("=" * 80)
        print(f"First gap detected: {min(gap_dates).strftime('%Y-%m-%d %H:%M:%S')}")
        print(f"Latest gap detected: {max(gap_dates).strftime('%Y-%m-%d %H:%M:%S')}")
        print(f"Total days with gaps: {(daily_gaps > 0).sum()}")
        print(f"Average gaps per day: {daily_gaps.mean():.2f}")
    else:
        print("No temporal data available for gaps.")
else:
    print("No gaps available for temporal analysis.")

## 10. Cost Estimation

Estimate the organizational cost of detected coordination gaps.
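
The estimate used in this section is a rough heuristic: hours = impact score × evidence count × 10, and cost = hours × an assumed hourly rate of $100. As a worked example with invented inputs, a gap with impact 0.75 and 4 evidence items gives 0.75 × 4 × 10 = 30 hours, or $3,000. The function name below is hypothetical.

```python
from typing import Tuple


def estimate_gap_cost(
    impact_score: float,
    evidence_count: int,
    hours_per_unit: float = 10.0,   # multiplier from the heuristic above
    hourly_rate: float = 100.0,     # assumed loaded engineer cost in USD
) -> Tuple[float, float]:
    """Return (estimated_hours, estimated_cost_usd) for one gap."""
    hours = impact_score * evidence_count * hours_per_unit
    return hours, hours * hourly_rate


hours, cost = estimate_gap_cost(0.75, 4)
print(f"{hours:.1f} hours, ${cost:,.2f}")  # → 30.0 hours, $3,000.00
```

Because the estimate scales linearly with evidence count, a gap with no attached evidence is costed at zero, which is worth keeping in mind when `include_evidence` is disabled.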

In [None]:
if gaps:
    # Cost estimation parameters
    AVG_HOURLY_RATE = 100  # assumed fully loaded engineering cost, USD per hour
    
    total_estimated_hours = 0
    gap_costs = []
    
    for gap in gaps:
        # Estimate hours based on impact score and evidence
        impact = gap['impact_score']
        evidence_count = len(gap.get('evidence', []))
        
        # Simple heuristic: impact * evidence * 10 hours
        estimated_hours = impact * evidence_count * 10
        estimated_cost = estimated_hours * AVG_HOURLY_RATE
        
        gap_costs.append({
            'gap_id': gap['id'],
            'type': gap['type'],
            'impact_score': impact,
            'estimated_hours': estimated_hours,
            'estimated_cost_usd': estimated_cost,
        })
        
        total_estimated_hours += estimated_hours
    
    # Create cost DataFrame
    cost_df = pd.DataFrame(gap_costs)
    
    print("\nCost Estimation:")
    print("=" * 80)
    display(cost_df)
    
    # Summary
    total_cost = cost_df['estimated_cost_usd'].sum()
    print(f"\nTotal Estimated Impact:")
    print(f"  - Engineering hours wasted: {total_estimated_hours:.1f} hours")
    print(f"  - Estimated cost: ${total_cost:,.2f}")
    print(f"  - Average cost per gap: ${total_cost/len(gaps):,.2f}")
    
    # Visualization: pass an explicit Axes so pandas draws on this figure
    # instead of opening a second, default-sized one
    fig, ax = plt.subplots(figsize=(10, 6))
    cost_df.plot(x='gap_id', y='estimated_cost_usd', kind='bar',
                 color='coral', edgecolor='black', legend=False, ax=ax)
    ax.set_xlabel('Gap ID')
    ax.set_ylabel('Estimated Cost (USD)')
    ax.set_title('Estimated Organizational Cost by Gap')
    ax.tick_params(axis='x', rotation=45)
    ax.grid(True, alpha=0.3, axis='y')
    plt.tight_layout()
    plt.show()
else:
    print("No gaps available for cost estimation.")

## 11. Actionable Recommendations

Extract and summarize recommendations for resolving detected gaps.

In [None]:
if gaps:
    print("Actionable Recommendations:")
    print("=" * 80)
    
    for i, gap in enumerate(gaps, 1):
        print(f"\n{i}. Gap: {gap['id']}")
        print(f"   Type: {gap['type']}")
        print(f"   Impact: {gap['impact_score']:.2f}")
        print(f"   Teams: {', '.join(gap.get('teams_involved', []))}")
        
        if 'recommendation' in gap:
            print(f"\n   Recommendation:")
            print(f"   {gap['recommendation']}")
        else:
            print(f"\n   Recommendation: [To be generated by LLM]")
        
        print("-" * 80)
else:
    print("No recommendations available.")

## 12. Export Results

Export the analysis results for further processing or reporting.
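
A quick way to check an export is to write it and read it back. The sketch below uses a temporary directory so that nothing is left on disk. The helper name `export_results` and the sample gap are hypothetical, and `default=str` is added as a fallback for values such as `datetime` objects that `json` cannot encode natively.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Dict, List


def export_results(gaps: List[Dict[str, Any]], metadata: Dict[str, Any], path: Path) -> Path:
    """Write gaps and metadata to a JSON file and return the path."""
    export_data = {
        "analysis_date": datetime.now(timezone.utc).isoformat(),
        "total_gaps": len(gaps),
        "metadata": metadata,
        "gaps": gaps,
    }
    # default=str stringifies anything json cannot serialize on its own.
    path.write_text(json.dumps(export_data, indent=2, default=str), encoding="utf-8")
    return path


with tempfile.TemporaryDirectory() as tmp:
    out = export_results(
        [{"id": "gap-1", "impact_score": 0.75}],
        {"total_gaps": 1},
        Path(tmp) / "gap_analysis_results.json",
    )
    reloaded = json.loads(out.read_text(encoding="utf-8"))

print(reloaded["total_gaps"])  # → 1
```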

In [None]:
if gaps:
    # Export to JSON
    export_data = {
        'analysis_date': datetime.now().isoformat(),
        'total_gaps': len(gaps),
        'metadata': detection_result.get('metadata', {}),
        'gaps': gaps,
    }
    
    output_file = 'gap_analysis_results.json'
    with open(output_file, 'w') as f:
        json.dump(export_data, f, indent=2)
    
    print(f"✓ Results exported to: {output_file}")
    
    # Export summary to CSV (only if the summary cell in section 4 was run)
    if 'gap_summary' in globals():
        csv_file = 'gap_summary.csv'
        gap_summary.to_csv(csv_file, index=False)
        print(f"✓ Summary exported to: {csv_file}")
else:
    print("No results to export.")

## Summary

This notebook demonstrated:

1. **API Integration**: Using the Gap Detection API programmatically
2. **Gap Analysis**: Understanding detected coordination failures
3. **Entity Extraction**: Identifying teams and people involved
4. **Impact Assessment**: Quantifying organizational cost
5. **Visualization**: Exploring patterns in coordination gaps
6. **Recommendations**: Actionable steps to resolve gaps

## Next Steps

1. **Load Real Data**: Connect to actual Slack/GitHub/Google Docs sources
2. **Tune Detection**: Adjust thresholds based on your organization
3. **Monitor Trends**: Track gap detection over time
4. **Act on Gaps**: Implement recommendations to improve coordination
5. **Measure Impact**: Quantify time and cost savings

## Resources

- [Gap Detection Documentation](../docs/GAP_DETECTION.md)
- [Entity Extraction Guide](../docs/ENTITY_EXTRACTION.md)
- [API Examples](../docs/API_EXAMPLES.md)
- [README](../README.md)