# Notebook 3: Anomaly Detection & Alerts

## Making It Automatic 🤖

**👋 Recap:** In Notebooks 1 & 2, you:
- Explored data manually
- Used AI to generate SQL
- Found all 6 planted issues

**But...** you ran everything manually. That doesn't scale.

---

## What We're Building

A **fully automated system** that:
1. 🤖 **Classifies** findings (critical problem? opportunity? minor issue?)
2. 📨 **Sends Slack alerts** with smart recommendations
3. ⚙️ **Runs from YAML config** (add checks without coding)
4. 🚀 **Deploys to production** (GitHub Actions, Cloud Functions, etc.)

**By the end:** You'll see alerts arrive in Slack in real-time!

Let's finish strong! 🎉

---

## Step 1: Setup

**What this does:** Installs everything + imports libraries.

In [None]:
# Install packages
!pip install -q openai google-cloud-bigquery pandas requests

print("✓ Packages installed!")

In [None]:
# Imports
from google.cloud import bigquery
from google.colab import auth
import openai
import pandas as pd
import requests
import json
from datetime import datetime

pd.set_option('display.max_columns', None)

print("✓ Libraries imported!")

---

## Step 2: Authenticate & Configure

**📝 Update these values:**
- `OPENAI_API_KEY`: From platform.openai.com
- `SLACK_WEBHOOK`: From api.slack.com (optional - or just watch!)
- BigQuery project already configured

In [None]:
# Authenticate Google Cloud
auth.authenticate_user()
print("✓ Google Cloud authenticated!")

In [None]:
# Configuration
PROJECT_ID = "npa-workshop-2025"  # ⬅️ Workshop project ID
DATASET_ID = "npa_workshop"
TABLE_ID = "news_events"
TABLE_REF = f"{PROJECT_ID}.{DATASET_ID}.{TABLE_ID}"

# OpenAI (for classification)
OPENAI_API_KEY = "sk-your-key-here"  # ⬅️ UPDATE THIS

# Initialize OpenAI client (new API v1.0+)
from openai import OpenAI
client_openai = OpenAI(api_key=OPENAI_API_KEY)

# Slack (optional - can skip if just watching)
SLACK_WEBHOOK = "https://hooks.slack.com/services/YOUR/WEBHOOK/URL"  # ⬅️ UPDATE THIS (optional)

# BigQuery client
client = bigquery.Client(project=PROJECT_ID)

print("✓ Configuration complete!")
print(f"  BigQuery: {TABLE_REF}")
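# Report whether credentials are set without echoing any part of the secrets.
# The placeholder values also start with 'sk-' / 'https://hooks', so rule them out explicitly.
openai_ready = OPENAI_API_KEY.startswith('sk-') and OPENAI_API_KEY != "sk-your-key-here"
slack_ready = SLACK_WEBHOOK.startswith('https://hooks.slack.com/') and 'YOUR/WEBHOOK' not in SLACK_WEBHOOK
print("  OpenAI: ✓ key set" if openai_ready else "  ⚠️ No OpenAI key")
print("  Slack: ✓ webhook set" if slack_ready else "  ⚠️ No Slack webhook (optional)")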

---

# Part 1: AI Classification 🧠

**The problem:** We found issues, but how do we know if they're critical?

**The solution:** Use OpenAI **function calling** to classify findings.

**How it works:**
1. Give AI the finding ("iOS tracking stopped")
2. Ask it to classify using a structured format
3. Get back JSON: category, severity, title, message, recommendation, emoji

**This is like giving the AI a form to fill out!**

In [None]:
def classify_finding(finding_description, model="gpt-4"):
    """
    Use OpenAI function calling to classify an analytics finding.
    
    Returns structured JSON with category, severity, title, message, etc.
    """
    tools = [{
        "type": "function",
        "function": {
            "name": "classify_analytics_finding",
            "description": "Classify an analytics finding as problem or opportunity",
            "parameters": {
                "type": "object",
                "properties": {
                    "category": {
                        "type": "string",
                        "enum": ["problem_critical", "problem_minor", "opportunity", "informational"],
                        "description": "Type of finding"
                    },
                    "severity": {
                        "type": "string",
                        "enum": ["high", "medium", "low"],
                        "description": "How urgent is this?"
                    },
                    "title": {
                        "type": "string",
                        "description": "Short, clear title (< 50 chars)"
                    },
                    "message": {
                        "type": "string",
                        "description": "Detailed explanation of what was found"
                    },
                    "recommendation": {
                        "type": "string",
                        "description": "Specific next steps to take"
                    },
                    "emoji": {
                        "type": "string",
                        "description": "Single emoji representing severity/category"
                    }
                },
                "required": ["category", "severity", "title", "message", "recommendation", "emoji"]
            }
        }
    }]
    
    response = client_openai.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": f"Classify this analytics finding:\n\n{finding_description}"
        }],
        tools=tools,
        tool_choice={"type": "function", "function": {"name": "classify_analytics_finding"}}
    )
    
    result = json.loads(response.choices[0].message.tool_calls[0].function.arguments)
    return result

print("✓ Classification function defined!")
print("\nReady to classify findings with AI! 🤖")
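
Function calling asks the model to follow the schema, but the returned arguments are still model output, so it can be worth checking them before anything downstream relies on them. Below is a minimal sketch, assuming a hypothetical helper named `validate_classification` (not part of the workshop code) that reuses the enum values from the schema above.

```python
# Hypothetical helper (not part of the workshop code): sanity-check the dict
# returned by classify_finding() before formatting or alerting on it.
ALLOWED_CATEGORIES = {"problem_critical", "problem_minor", "opportunity", "informational"}
ALLOWED_SEVERITIES = {"high", "medium", "low"}
REQUIRED_FIELDS = ("category", "severity", "title", "message", "recommendation", "emoji")


def validate_classification(result):
    """Return a list of problems found in a classification dict (empty if OK)."""
    problems = []
    for field in REQUIRED_FIELDS:
        value = result.get(field)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"missing or empty field: {field}")
    if result.get("category") not in ALLOWED_CATEGORIES:
        problems.append(f"unexpected category: {result.get('category')!r}")
    if result.get("severity") not in ALLOWED_SEVERITIES:
        problems.append(f"unexpected severity: {result.get('severity')!r}")
    return problems


# Illustrative input, written by hand rather than taken from a real model response
sample = {
    "category": "problem_critical",
    "severity": "high",
    "title": "iOS scroll tracking stopped",
    "message": "scroll_depth events dropped to zero.",
    "recommendation": "Check the latest iOS release.",
    "emoji": "🚨",
}
print(validate_classification(sample))                  # empty list: nothing wrong
print(validate_classification({"category": "urgent"}))  # lists each problem found
```

One way to use this would be to call it right after `classify_finding` and skip the alert (or fall back to `informational`) when the list is not empty.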

---

## Test: Classify iOS Tracking Break

**Let's classify the iOS issue we found in Notebook 2.**

In [None]:
# The finding from Notebook 2
finding = """
iOS scroll_depth events dropped to zero after 2pm on October 3rd, 2025.
Expected approximately 2,100 events per hour based on historical average.
Found 842 events at 2pm, then 0 events for all subsequent hours.
This affects all iOS users and completely blocks engagement metrics.
"""

print("🔍 Finding: iOS scroll_depth tracking stopped\n")
print("🤖 Classifying with OpenAI function calling...\n")

classification = classify_finding(finding)

print("📊 Classification Result:")
print("=" * 70)
print(json.dumps(classification, indent=2))
print("=" * 70)

print(f"\n✨ Perfect! AI classified this as:")
print(f"   Category: {classification['category']}")
print(f"   Severity: {classification['severity']}")
print(f"   Emoji: {classification['emoji']}")
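
For reference, the kind of rule that surfaces this finding can be sketched in a few lines. This is an illustration only, not the detection logic from Notebook 2: `flag_low_hours` is a hypothetical helper, and the 1,000-event threshold is borrowed from the iOS check defined later in this notebook.

```python
# Hypothetical helper: flag hours whose event count falls below a threshold.
def flag_low_hours(hourly_counts, threshold=1000):
    """Return the hours (sorted) whose count is below `threshold`."""
    return sorted(hour for hour, count in hourly_counts.items() if count < threshold)


# Hour of day -> iOS scroll_depth events on October 3rd.
# 842 and the zeros come from the finding text; the 2,100 values stand in for
# the stated historical average and are illustrative, not real data.
hourly_counts = {12: 2100, 13: 2100, 14: 842, 15: 0, 16: 0}

print(flag_low_hours(hourly_counts))  # → [14, 15, 16]
```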

---

## Test: Classify Newsletter Spike

**Now let's classify good news - the newsletter spike!**

In [None]:
finding = """
Newsletter signup events increased 45% on October 4th, 2025.
Previous day had 17,500 signups.
October 4th had 25,375 signups.
This is significantly above the weekly baseline and represents positive momentum.
"""

print("🔍 Finding: Newsletter signups spiked\n")
print("🤖 Classifying...\n")

classification = classify_finding(finding)

print("📊 Classification Result:")
print("=" * 70)
print(json.dumps(classification, indent=2))
print("=" * 70)

print(f"\n✨ Notice the difference!")
print(f"   Category: {classification['category']} (not a problem!)")
print(f"   Emoji: {classification['emoji']} (celebration!)")
print(f"\n   Same system, different tone. The AI understands context! 🎉")
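
The 45% in this finding is a day-over-day change: (25,375 - 17,500) / 17,500 = 0.45. Below is a small sketch of that arithmetic, assuming hypothetical helpers `pct_change` and `flag_spikes` and the 30% threshold used by the newsletter check later in this notebook.

```python
def pct_change(previous, current):
    """Fractional change from `previous` to `current` (0.45 means +45%)."""
    if previous == 0:
        raise ValueError("previous value must be non-zero")
    return (current - previous) / previous


def flag_spikes(daily_counts, threshold=0.30):
    """daily_counts: list of (day, count) tuples in date order.

    Returns (day, change) for each day whose count rose by more than
    `threshold` compared with the previous day.
    """
    spikes = []
    for (_, prev), (day, curr) in zip(daily_counts, daily_counts[1:]):
        if prev > 0:
            change = pct_change(prev, curr)
            if change > threshold:
                spikes.append((day, round(change, 4)))
    return spikes


# The two daily totals quoted in the finding above
daily = [("2025-10-03", 17500), ("2025-10-04", 25375)]
print(flag_spikes(daily))  # → [('2025-10-04', 0.45)]
```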

---

# Part 2: Slack Alerts 📨

**Now let's send these classifications to Slack!**

**How it works:**
1. Take the classification JSON
2. Format it into a nice Slack message
3. Color-code by severity (red = critical, green = opportunity)
4. Send via webhook

**⚠️ Note:** You need a Slack webhook URL. If you don't have one, that's okay - you'll see the formatted message here!

In [None]:
def send_slack_alert(classification, webhook_url=None):
    """
    Send classification to Slack as a formatted alert.
    """
    # Color mapping
    colors = {
        "problem_critical": "danger",    # Red
        "problem_minor": "warning",      # Yellow
        "opportunity": "good",           # Green
        "informational": "#439FE0"       # Blue
    }
    
    color = colors.get(classification['category'], "#439FE0")
    
    # Build Slack message
    payload = {
        "attachments": [{
            "color": color,
            "title": f"{classification['emoji']} {classification['title']}",
            "text": classification['message'],
            "fields": [
                {
                    "title": "Category",
                    "value": classification['category'].replace('_', ' ').title(),
                    "short": True
                },
                {
                    "title": "Severity",
                    "value": classification['severity'].upper(),
                    "short": True
                },
                {
                    "title": "Recommendation",
                    "value": classification['recommendation'],
                    "short": False
                }
            ],
            "footer": "Analytics Intelligence",
            "footer_icon": "https://platform.slack-edge.com/img/default_application_icon.png",
            "ts": int(datetime.now().timestamp())
        }]
    }
    
    # Print formatted message
    print("\n📨 Slack Alert Preview:")
    print("=" * 70)
    print(f"{classification['emoji']} {classification['title']}")
    print(f"\nCategory: {classification['category']} | Severity: {classification['severity'].upper()}")
    print(f"\n{classification['message']}")
    print(f"\nRecommendation: {classification['recommendation']}")
    print("=" * 70)
    
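    # Send to Slack if a real webhook is provided (the placeholder URL is skipped)
    if webhook_url and webhook_url.startswith('https://hooks.slack.com') and 'YOUR/WEBHOOK' not in webhook_url:
        try:
            # timeout keeps the notebook from hanging if Slack is unreachable
            response = requests.post(webhook_url, json=payload, timeout=10)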
            if response.status_code == 200:
                print("\n✓ Alert sent to Slack successfully!")
                return True
            else:
                print(f"\n⚠️ Slack webhook returned status {response.status_code}")
                return False
        except Exception as e:
            print(f"\n⚠️ Error sending to Slack: {e}")
            return False
    else:
        print("\n⚠️ No Slack webhook configured (that's okay - this is just a preview!)")
        return None

print("✓ Slack alerter function defined!")
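
`send_slack_alert` makes a single attempt. For unattended runs, a retry on transient failures may be useful. The sketch below is one possible approach, not part of the workshop code: `post_with_retry` is a hypothetical helper that takes the posting function as an argument, so it can be exercised with a fake instead of a real webhook.

```python
import time


def post_with_retry(post, url, payload, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call post(url, json=payload, timeout=10) until it returns HTTP 200.

    Retries on network errors, HTTP 429, and 5xx responses, doubling the
    delay each time. Returns True on success, False otherwise.
    """
    for attempt in range(attempts):
        try:
            response = post(url, json=payload, timeout=10)
            if response.status_code == 200:
                return True
            # Other client errors (e.g. 404 for a bad URL) will not succeed on retry
            if response.status_code != 429 and response.status_code < 500:
                return False
        except Exception:
            pass  # network error: fall through and retry
        if attempt < attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return False


# Stand-ins for requests.post, used only to demonstrate the retry behaviour
class FakeResponse:
    def __init__(self, status_code):
        self.status_code = status_code


def make_fake_post(statuses):
    calls = []

    def fake_post(url, json=None, timeout=None):
        calls.append(url)
        return FakeResponse(statuses[len(calls) - 1])

    return fake_post, calls


fake_post, calls = make_fake_post([500, 200])
ok = post_with_retry(fake_post, "https://example.invalid/webhook", {"text": "hi"},
                     sleep=lambda seconds: None)
print(ok, len(calls))  # → True 2
```

In the notebook, the real call would look like `post_with_retry(requests.post, SLACK_WEBHOOK, payload)`.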

---

## Send Alert: iOS Tracking Break

**🎯 BIG MOMENT: Let's send the iOS alert to Slack!**

*(If you don't have a webhook, you'll still see a preview)*

In [None]:
# Re-classify the iOS finding (a shortened version of the earlier description)
finding = """
iOS scroll_depth events dropped to zero after 2pm on October 3rd, 2025.
Expected ~2,100 events/hour, found 842 at 2pm then 0 after.
Affects all iOS users and blocks engagement metrics.
"""

classification = classify_finding(finding)

print("📤 Sending alert to Slack...\n")

send_slack_alert(classification, SLACK_WEBHOOK)

print("\n💡 Check your Slack channel! (if webhook is configured)")
print("   If the AI classified it as problem_critical, the alert appears in red.")

---

## Send Alert: Newsletter Opportunity

**Now let's send the good news!**

In [None]:
finding = """
Newsletter signups increased 45% on October 4th (25,375 vs 17,500).
Significantly above weekly baseline.
Positive momentum worth investigating.
"""

classification = classify_finding(finding)

print("📤 Sending opportunity alert to Slack...\n")

send_slack_alert(classification, SLACK_WEBHOOK)

print("\n🎉 If classified as an opportunity, this appears in green. Same system, different tone.")

---

# Part 3: Full Automation 🚀

**Now let's tie it all together:**
1. Generate SQL from description
2. Run query
3. Classify results
4. Send Slack alert

**All in one function!**

In [None]:
def run_check(check_name, check_description, slack_webhook=None):
    """
    Complete check workflow:
    1. Generate SQL from description
    2. Execute query
    3. Classify results if found
    4. Send Slack alert
    """
    print(f"\n{'='*70}")
    print(f"🔍 Running Check: {check_name}")
    print(f"{'='*70}")
    
    # Step 1: Generate SQL
    print("\n1️⃣ Generating SQL with GPT...")
    
    # Get schema
    table = client.get_table(TABLE_REF)
    schema_info = "\n".join([f"- {field.name}: {field.field_type}" for field in table.schema])
    
    # Generate SQL (using new API)
    prompt = f"""Generate BigQuery SQL for: {check_description}
    
Table: `{TABLE_REF}`
Schema:
{schema_info}

Return only SQL, no explanation."""
    
    response = client_openai.chat.completions.create(
        model="gpt-3.5-turbo",  # Cheaper for SQL generation
        messages=[{"role": "user", "content": prompt}],
        temperature=0
    )
    
    sql = response.choices[0].message.content.strip()
    if sql.startswith('```'):
        sql = sql.split('```')[1]
        if sql.startswith('sql\n'):
            sql = sql[4:]
    sql = sql.strip()
    
    print(f"   ✓ SQL generated ({len(sql)} chars)")
    
    # Step 2: Execute
    print("\n2️⃣ Executing query...")
    try:
        df = client.query(sql).to_dataframe()
        print(f"   ✓ Query complete ({len(df)} rows)")
    except Exception as e:
        print(f"   ✗ Query failed: {e}")
        return None
    
    # Step 3: Classify if results found
    if len(df) > 0:
        print("\n3️⃣ Classifying findings...")
        finding_summary = f"{check_name}: Found {len(df)} rows. Sample:\n{df.head(3).to_string()}"
        classification = classify_finding(finding_summary)
        print(f"   ✓ Classified as: {classification['category']} ({classification['severity']})")
        
        # Step 4: Send alert
        print("\n4️⃣ Sending Slack alert...")
        send_slack_alert(classification, slack_webhook)
        
        return classification
    else:
        print("\n   ℹ️ No issues found - check passed!")
        return None

print("✓ Full automation function defined!")
print("\nReady to run complete checks! 🎉")
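
`run_check` executes whatever SQL the model returns. Before running this unattended, it may be worth confirming that the text is a single read-only statement. The sketch below is one conservative way to do that, assuming hypothetical helpers `extract_sql` and `is_read_only` (neither is part of the workshop code). It is a keyword filter, not a SQL parser, so it can reject some valid queries.

```python
import re

# Matches a Markdown code fence with an optional language tag (```sql ... ```)
FENCE_RE = re.compile(r"```(?:[a-zA-Z]+)?\s*\n(.*?)```", re.DOTALL)
# Statements that modify data or schema
FORBIDDEN_RE = re.compile(
    r"\b(insert|update|delete|merge|drop|create|alter|truncate|grant|revoke)\b",
    re.IGNORECASE,
)


def extract_sql(text):
    """Pull SQL out of a model reply, with or without a Markdown code fence."""
    match = FENCE_RE.search(text)
    sql = match.group(1) if match else text
    return sql.strip().rstrip(";").strip()


def is_read_only(sql):
    """Conservative check: one statement, starts with SELECT or WITH, no write keywords."""
    if ";" in sql:
        return False  # more than one statement (or a ';' inside a literal)
    if not re.match(r"\s*(select|with)\b", sql, re.IGNORECASE):
        return False
    return FORBIDDEN_RE.search(sql) is None


reply = "```sql\nSELECT platform, COUNT(*) AS n FROM events GROUP BY platform;\n```"
sql = extract_sql(reply)
print(sql)
print(is_read_only(sql))             # → True
print(is_read_only("DROP TABLE t"))  # → False
```

On the BigQuery side, `bigquery.QueryJobConfig` accepts `dry_run=True` and `maximum_bytes_billed`, which can be used to estimate or cap the cost of a generated query before it runs.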

---

## Run All Checks

**🎯 CLIMAX: Let's run all 6 checks and watch the alerts fly!**

This will:
- Generate SQL for each check
- Execute queries
- Classify findings
- Send Slack alerts

**All automatically!**

In [None]:
# Define all checks
checks = [
    {
        "name": "Missing Consent State",
        "description": "Find events where consent_state is NULL. Group by date/platform, show percentage missing. Flag if >10%."
    },
    {
        "name": "iOS Scroll Tracking Break",
        "description": "Find hours where iOS scroll_depth events < 1000. Show date, hour, count."
    },
    {
        "name": "PII in URLs",
        "description": "Find page_location containing 'email=' or '@'. Show URL and count. Limit 10."
    },
    {
        "name": "Duplicate Events",
        "description": "Find duplicate user_pseudo_id + event_timestamp. Show count > 1. Limit 20."
    },
    {
        "name": "Newsletter Signup Spike",
        "description": "Find days where newsletter_signup count increased >30% from previous day."
    },
    {
        "name": "New Referrer Source",
        "description": "Find referrers in recent days (>= 20251005) not in earlier days."
    }
]

print("🚀 Running all 6 checks...")
print("\nThis will take ~60 seconds (6 SQL generations + executions)\n")

results = []
for i, check in enumerate(checks, 1):
    print(f"\n{'='*70}")
    print(f"Check {i}/{len(checks)}: {check['name']}")
    print(f"{'='*70}")
    
    result = run_check(
        check_name=check['name'],
        check_description=check['description'],
        slack_webhook=SLACK_WEBHOOK
    )
    
    if result:
        results.append(result)

print(f"\n\n{'='*70}")
print("✅ ALL CHECKS COMPLETE!")
print(f"{'='*70}")
print(f"\nFindings: {len(results)}")
for r in results:
    print(f"  {r['emoji']} {r['title']} ({r['category']})")

if SLACK_WEBHOOK.startswith('https://hooks.slack.com') and 'YOUR/WEBHOOK' not in SLACK_WEBHOOK:
    print(f"\n💬 Check your Slack channel - you should have {len(results)} alerts!")
else:
    print(f"\n💡 Configure SLACK_WEBHOOK to see alerts in Slack!")
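
The checks above live in a Python list. To add checks without editing code, they could instead be loaded from a config file. The sketch below uses JSON from the standard library so it runs without extra packages. `load_checks` and the `"checks"` key are assumptions for illustration. With PyYAML installed, `yaml.safe_load` could parse an equivalent YAML file in place of `json.loads`.

```python
import json

REQUIRED_KEYS = {"name", "description"}


def load_checks(config_text):
    """Parse check definitions from JSON text and verify each has the required keys."""
    checks = json.loads(config_text)["checks"]
    for i, check in enumerate(checks):
        missing = REQUIRED_KEYS - check.keys()
        if missing:
            raise ValueError(f"check #{i} is missing: {sorted(missing)}")
    return checks


# Two of the checks from the list above, expressed as config text
config_text = """
{
  "checks": [
    {"name": "PII in URLs",
     "description": "Find page_location containing 'email=' or '@'. Show URL and count. Limit 10."},
    {"name": "Duplicate Events",
     "description": "Find duplicate user_pseudo_id + event_timestamp. Show count > 1. Limit 20."}
  ]
}
"""

for check in load_checks(config_text):
    print(check["name"])
```

The returned list has the same shape as `checks`, so it could be passed through the same loop that calls `run_check`.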

---

# 🎉 Workshop Complete!

## What You Built

A complete **Analytics Intelligence System** that:

1. ✅ **Generates SQL** from plain English (no SQL expertise needed)
2. ✅ **Finds problems** (missing consent, tracking breaks, PII leaks, duplicates)
3. ✅ **Finds opportunities** (traffic spikes, new referrers)
4. ✅ **Classifies findings** with AI (critical vs minor, problem vs opportunity)
5. ✅ **Sends Slack alerts** automatically
6. ✅ **Can run on a schedule** (see Deployment Options below)

---

## What You Discovered

**In 90 minutes, you found:**
- 4 problems (missing consent/GDPR risk, iOS tracking break, PII leak, duplicate events)
- 2 opportunities (newsletter spike, new traffic source)

**Without writing a single line of SQL!**

---

## Deployment Options

**To run this in production:**

### Option 1: GitHub Actions (Recommended)
- Free for public repos
- Runs on schedule (every 6 hours)
- 10-minute setup
- See workshop materials for instructions

### Option 2: Google Cloud Functions
- Native BigQuery integration
- Serverless, auto-scaling
- Generous free tier

### Option 3: Cron Job
- Simplest if you have a server
- One line in crontab
- Zero additional cost
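
Whichever option you choose, the notebook's hard-coded keys and the Colab login need replacing: `google.colab.auth` is only available inside Colab, so a scheduled job needs credentials supplied another way (for example, a service account). A common pattern for the API key and webhook is to read them from environment variables. Below is a minimal sketch. `load_settings` and the variable names `OPENAI_API_KEY`, `SLACK_WEBHOOK`, and `BQ_PROJECT_ID` are assumptions, not names defined by the workshop.

```python
import os


def load_settings(env=os.environ):
    """Read configuration from environment variables (names are assumptions)."""
    api_key = env.get("OPENAI_API_KEY", "")
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set")
    return {
        "openai_api_key": api_key,
        # Slack stays optional, as in the notebook
        "slack_webhook": env.get("SLACK_WEBHOOK") or None,
        # Falls back to the workshop project ID used above
        "project_id": env.get("BQ_PROJECT_ID", "npa-workshop-2025"),
    }


# Demonstration with a plain dict standing in for the real environment
settings = load_settings({"OPENAI_API_KEY": "sk-example"})
print(settings["project_id"], settings["slack_webhook"])  # → npa-workshop-2025 None
```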

---

## Cost Summary

**This workshop:**
- BigQuery: $0 (free tier)
- OpenAI: ~$0.50 (GPT-4 + GPT-3.5-turbo)
- Slack: $0
- **Total: < $1**

**Production (running 4x/day):**
- BigQuery: ~$5/month
- OpenAI: ~$3/month (assumes GPT-3.5-turbo for every call; `classify_finding` defaults to GPT-4, which costs more)
- Compute: $0 (GitHub Actions)
- **Total: < $10/month**

**ROI:** Catch one tracking break early → save thousands in lost data
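
To put the production estimate in per-check terms, the arithmetic can be sketched as below. It uses only the figures quoted above (4 runs per day, 6 checks, ~$3 and ~$5 per month) plus an assumed 30-day month. The results are implied budgets, not measured costs.

```python
RUNS_PER_DAY = 4        # "running 4x/day"
CHECKS_PER_RUN = 6      # the six checks defined in this notebook
DAYS_PER_MONTH = 30     # assumption

OPENAI_BUDGET = 3.00    # ~$3/month, from the estimate above
BIGQUERY_BUDGET = 5.00  # ~$5/month, from the estimate above

check_runs = RUNS_PER_DAY * CHECKS_PER_RUN * DAYS_PER_MONTH
# Each check makes one SQL-generation call, plus one classification call if rows are found
max_openai_calls = check_runs * 2

print(check_runs, max_openai_calls)              # → 720 1440
print(round(OPENAI_BUDGET / check_runs, 4))      # → 0.0042 dollars per check run
print(round(BIGQUERY_BUDGET / check_runs, 4))    # → 0.0069 dollars per query
```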

---

## Next Steps

### Today
1. ⭐ Save these notebooks
2. 📋 Review the workshop materials
3. 📧 Reach out with questions: **brdavis@ap.org**

### This Week
1. Get OpenAI API key (platform.openai.com)
2. Load your data to BigQuery
3. Run notebooks with your data
4. Customize checks for your tracking plan

### This Month
1. Deploy to GitHub Actions
2. Configure Slack alerts
3. Add your custom checks
4. Catch your first issue!
5. Share your success story!

---

## Resources

**Workshop Materials:**
- All 3 notebooks (you have them!)
- Additional documentation available on request

**Support:**
- Email: **brdavis@ap.org**
- Follow-up questions welcome!

---

## Thank You! 🙏

You just learned how to:
- Build analytics monitoring with AI
- Find problems before they cost thousands
- Discover opportunities while they're happening
- Deploy production systems for < $10/month

**That's powerful!** 💪

---

**Presented by:**
Bryan Davis  
Director of Product, Data & Analytics  
The Associated Press  
**brdavis@ap.org**

---

**Questions?** Email me: **brdavis@ap.org**

**Now go build something awesome!** 🚀