# Friend-Foe Analysis - Quick Start Demo

This notebook demonstrates the complete pipeline for analyzing visual differences between "Us" (ingroup) and "Them" (outgroup) in Nazi propaganda films.

## Pipeline Overview

1. **Frame Extraction** - Extract frames from video files
2. **Annotation** - Label frames as 'us', 'them', 'neutral', or 'unclear'
3. **Feature Extraction** - Extract visual features (lighting, composition, etc.)
4. **Classification** - Train ML model to distinguish groups
5. **Analysis** - Analyze feature importance

In [None]:
# Setup
import sys
sys.path.append('../src')

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path

# Import our modules
from frame_extraction import extract_frames_uniform, get_video_info
from annotation import FrameAnnotator, create_annotation_template, show_frame
from feature_extraction import VisualFeatureExtractor, extract_features_from_directory
from model import FriendFoeClassifier, train_and_evaluate_pipeline

# Visualization settings
sns.set_style('whitegrid')
%matplotlib inline

## Step 1: Frame Extraction

Extract frames from video files. For the proof of concept (POC), we extract a small number of uniformly spaced frames.

In [None]:
# Example: Extract frames from a video file
# Replace with your actual video path

VIDEO_PATH = "path/to/your/video.mp4"  # Change this!

# Check if video exists
if Path(VIDEO_PATH).exists():
    # Get video info
    info = get_video_info(VIDEO_PATH)
    print("Video info:", info)
    
    # Extract 50 uniformly distributed frames
    output_dir = "../data/raw/demo_frames"
    frames = extract_frames_uniform(VIDEO_PATH, output_dir, num_frames=50)
    print(f"Extracted {len(frames)} frames")
else:
    print(f"Video not found: {VIDEO_PATH}")
    print("Please update VIDEO_PATH with your actual video file path")
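As a rough illustration of what "uniformly distributed" means here, the sketch below picks evenly spaced frame indices across a video. It is not the implementation of `extract_frames_uniform`; the helper name `uniform_frame_indices` and its parameters are hypothetical, and the real function may sample differently.

```python
import numpy as np


def uniform_frame_indices(total_frames: int, num_frames: int) -> list[int]:
    """Return up to `num_frames` evenly spaced indices in [0, total_frames - 1].

    Hypothetical helper for illustration only.
    """
    if total_frames <= 0 or num_frames <= 0:
        return []
    # Never ask for more frames than the video contains.
    num_frames = min(num_frames, total_frames)
    # linspace includes both endpoints, so the first and last frame are sampled.
    positions = np.linspace(0, total_frames - 1, num=num_frames)
    # Round to whole frame numbers and drop any accidental duplicates.
    return sorted({int(round(p)) for p in positions})


indices = uniform_frame_indices(total_frames=1000, num_frames=5)
print(indices)
```

Sampling by index rather than by timestamp keeps the sketch independent of the frame rate; a time-based variant would multiply each index by `1 / fps`.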

## Step 2: Create Annotation Template

Create a CSV template for annotating frames.

In [None]:
# Create annotation template
frame_dir = "../data/raw/demo_frames"
annotation_file = "../data/annotated/demo_annotations.csv"

if Path(frame_dir).exists():
    create_annotation_template(frame_dir, annotation_file)
else:
    print(f"Frame directory not found: {frame_dir}")
    print("Please extract frames first (Step 1)")
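To make the idea of an annotation template concrete, here is a minimal sketch that builds one row per frame with empty fields to fill in. The column names (`frame`, `label`, `confidence`, `notes`) are an assumption inferred from the `add_annotation` arguments used later in this notebook; the file actually written by `create_annotation_template` may use different columns.

```python
import tempfile
from pathlib import Path

import pandas as pd


def build_template(frame_dir: Path) -> pd.DataFrame:
    """Build an empty annotation table, one row per .jpg frame (assumed layout)."""
    frame_names = sorted(p.name for p in frame_dir.glob("*.jpg"))
    return pd.DataFrame(
        {
            "frame": frame_names,
            "label": "",       # to be filled with us / them / neutral / unclear
            "confidence": "",  # optional annotator confidence
            "notes": "",       # optional free-text notes
        }
    )


# Demo on a throwaway directory containing two empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    demo_dir = Path(tmp)
    for name in ["frame_0002.jpg", "frame_0001.jpg"]:
        (demo_dir / name).touch()
    template = build_template(demo_dir)
    template.to_csv(demo_dir / "demo_annotations.csv", index=False)

print(template)
```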

## Step 2b: Interactive Annotation (Simple Version)

Annotate frames interactively. For each frame:
- Label: 'us', 'them', 'neutral', or 'unclear'
- Optionally add notes

In [None]:
# Load annotator
annotator = FrameAnnotator(annotation_file)

# Get list of frames to annotate
frame_dir = Path("../data/raw/demo_frames")
if frame_dir.exists():
    frames = sorted(frame_dir.glob("*.jpg"))
    print(f"Found {len(frames)} frames to annotate")
    
    # Example: annotate first frame
    if len(frames) > 0:
        show_frame(frames[0])
        
        # Manually annotate (you would do this for each frame)
        # annotator.add_annotation(frames[0], label='us', confidence=0.9, notes='Heroic pose')
        # annotator.save()
else:
    print("Frame directory not found")

### Manual Annotation Instructions

For the POC, you can manually edit the CSV file:

1. Open `data/annotated/demo_annotations.csv` in Excel/LibreOffice
2. For each frame, fill in the `label` column:
   - `us` - Ingroup (idealized Germans, heroes, Volksgemeinschaft)
   - `them` - Outgroup (Jews, Communists, enemies)
   - `neutral` - Neither (background, landscape, etc.)
   - `unclear` - Can't determine
3. Save the CSV file

**Note:** For training, you need at least 20-30 frames labeled 'us' or 'them' in total. The training cell in Step 4 checks for a minimum of 10 frames per class.
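Hand-edited CSV files tend to pick up stray spaces, capitalization differences, and typos. The sketch below shows one possible sanity check before training. The helper `check_annotations` is hypothetical and assumes only that the CSV has a `label` column, as used in Step 4.

```python
import pandas as pd

ALLOWED_LABELS = {"us", "them", "neutral", "unclear"}


def check_annotations(df: pd.DataFrame, min_per_class: int = 10) -> dict:
    """Summarize label quality in an annotation table (illustrative helper)."""
    # Normalize so that "Us " and "us" count as the same label.
    labels = df["label"].fillna("").astype(str).str.strip().str.lower()
    counts = labels.value_counts()
    us_count = int(counts.get("us", 0))
    them_count = int(counts.get("them", 0))
    return {
        "unlabeled": int((labels == "").sum()),
        "invalid": sorted(set(labels) - ALLOWED_LABELS - {""}),
        "us": us_count,
        "them": them_count,
        "ready": us_count >= min_per_class and them_count >= min_per_class,
    }


# Small in-memory example instead of a real annotation file.
demo = pd.DataFrame({"label": ["Us ", "them", None, "enemy", "neutral"]})
report = check_annotations(demo)
print(report)
```

With a real file, the same check would run on `pd.read_csv(annotations_csv)`.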

## Step 3: Feature Extraction

Extract visual features from all frames.

In [None]:
# Extract features
frame_dir = "../data/raw/demo_frames"
features_csv = "../data/features/demo_features.csv"

if Path(frame_dir).exists():
    features_df = extract_features_from_directory(frame_dir, features_csv)
    print("\nFeature extraction complete!")
    print(f"Extracted {len(features_df)} feature vectors with {len(features_df.columns)-1} features each")
    
    # Show sample features
    print("\nSample features:")
    display(features_df.head())
else:
    print(f"Frame directory not found: {frame_dir}")

## Step 4: Train Classifier

Train a Random Forest classifier to distinguish 'us' from 'them'.

In [None]:
# Check if we have annotations
features_csv = "../data/features/demo_features.csv"
annotations_csv = "../data/annotated/demo_annotations.csv"

if Path(features_csv).exists() and Path(annotations_csv).exists():
    # Load and check annotations
    annotations_df = pd.read_csv(annotations_csv)
    label_counts = annotations_df['label'].value_counts()
    print("Label distribution:")
    print(label_counts)
    
    # Check if we have enough labeled data
    us_count = int(label_counts.get('us', 0))
    them_count = int(label_counts.get('them', 0))
    
    if us_count >= 10 and them_count >= 10:
        print(f"\nSufficient labeled data: {us_count} 'us' samples, {them_count} 'them' samples")
        print("Proceeding with training...\n")
        
        # Train and evaluate
        output_dir = "../results"
        classifier, results = train_and_evaluate_pipeline(
            features_csv,
            annotations_csv,
            test_size=0.2,
            output_dir=output_dir
        )
    else:
        print("\nInsufficient labeled data:")
        print(f"  'us' samples: {us_count} (need at least 10)")
        print(f"  'them' samples: {them_count} (need at least 10)")
        print("\nPlease annotate more frames in the CSV file.")
else:
    print("Features or annotations not found. Please complete previous steps.")
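For readers who want to see roughly what such a pipeline involves, here is a self-contained sketch using scikit-learn on synthetic data. It is not the code behind `train_and_evaluate_pipeline`. The shared `frame` join column, the two feature columns, and all hyperparameters are assumptions chosen for illustration, and the data is randomly generated rather than taken from any film.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-ins for the features and annotations CSV files.
rng = np.random.default_rng(0)
n = 60
features = pd.DataFrame(
    {
        "frame": [f"frame_{i:04d}.jpg" for i in range(n)],
        "high_key_ratio": rng.random(n),
        "contrast": rng.random(n),
    }
)
annotations = pd.DataFrame(
    {
        "frame": features["frame"],
        # Toy rule: brighter frames are labeled 'us', the rest 'them'.
        "label": np.where(features["high_key_ratio"] > 0.5, "us", "them"),
    }
)
annotations.loc[:4, "label"] = "neutral"  # first five rows are excluded below

# Join features with labels and keep only the two classes of interest.
data = features.merge(annotations, on="frame", how="inner")
data = data[data["label"].isin(["us", "them"])]
X = data.drop(columns=["frame", "label"])
y = data["label"]

# Stratify so both classes appear in the train and test splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)

importance = pd.DataFrame(
    {"feature": X.columns, "importance": clf.feature_importances_}
).sort_values("importance", ascending=False)
print(f"Test accuracy: {accuracy:.2f}")
print(importance)
```

With only a few dozen labeled frames, a single train/test split gives a noisy accuracy estimate; cross-validation would be a reasonable alternative.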

## Step 5: Analyze Results

Examine feature importance to understand which visual characteristics distinguish the groups.

In [None]:
# Load feature importance results
importance_file = "../results/feature_importance.csv"

if Path(importance_file).exists():
    importance_df = pd.read_csv(importance_file)
    
    print("Top 10 Most Important Features:")
    print(importance_df.head(10))
    
    # Visualize
    plt.figure(figsize=(10, 6))
    top_features = importance_df.head(15)
    sns.barplot(data=top_features, y='feature', x='importance')
    plt.title('Most Important Visual Features for Friend-Foe Distinction')
    plt.xlabel('Importance')
    plt.tight_layout()
    plt.show()
else:
    print("Results not found. Please train the model first (Step 4).")

## Interpretation Guide

### Key Features to Analyze:

**Lighting Features:**
- `low_key_ratio` - High values = dark, dramatic lighting (often for villains)
- `high_key_ratio` - High values = bright, even lighting (often for heroes)
- `contrast` - High contrast = dramatic, low contrast = soft
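The following sketch shows one plausible way to compute lighting features of this kind from a grayscale frame scaled to [0, 1]. The thresholds (0.25 and 0.75) and the use of the standard deviation as a contrast measure are assumptions; `VisualFeatureExtractor` may define these features differently.

```python
import numpy as np


def lighting_features(gray, dark_thresh=0.25, bright_thresh=0.75):
    """Compute simple lighting statistics for a grayscale image in [0, 1].

    Thresholds are illustrative assumptions, not values from the project.
    """
    gray = np.asarray(gray, dtype=float)
    return {
        # Share of pixels darker than the dark threshold.
        "low_key_ratio": float((gray < dark_thresh).mean()),
        # Share of pixels brighter than the bright threshold.
        "high_key_ratio": float((gray > bright_thresh).mean()),
        # Standard deviation of brightness as a simple contrast measure.
        "contrast": float(gray.std()),
    }


dark_frame = np.full((4, 4), 0.1)
split_frame = np.hstack([np.full((4, 2), 0.1), np.full((4, 2), 0.9)])
dark_stats = lighting_features(dark_frame)
split_stats = lighting_features(split_frame)
print(dark_stats)
print(split_stats)
```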

**Composition Features:**
- `vertical_symmetry` - Symmetrical compositions often convey power/order
- `center_brightness` - A bright center draws attention to the subject
- `edge_density` - High values = busy, chaotic frames; low values = clean, simple frames
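As a rough sketch of how composition features like these could be computed with NumPy alone, the example below measures left-right mirror symmetry, the mean brightness of the central region, and the share of pixels with a strong brightness gradient. All three definitions and the `edge_thresh` value are assumptions made for illustration, not the project's actual formulas.

```python
import numpy as np


def composition_features(gray, edge_thresh=0.1):
    """Compute simple composition statistics for a grayscale image in [0, 1]."""
    gray = np.asarray(gray, dtype=float)
    h, w = gray.shape

    # Symmetry about the vertical axis: compare the image with its mirror.
    mirrored = gray[:, ::-1]
    vertical_symmetry = 1.0 - float(np.abs(gray - mirrored).mean())

    # Mean brightness of the central half of the frame in each dimension.
    center = gray[h // 4 : h - h // 4, w // 4 : w - w // 4]
    center_brightness = float(center.mean())

    # Fraction of pixels whose gradient magnitude exceeds the threshold.
    gy, gx = np.gradient(gray)
    edge_density = float((np.hypot(gx, gy) > edge_thresh).mean())

    return {
        "vertical_symmetry": vertical_symmetry,
        "center_brightness": center_brightness,
        "edge_density": edge_density,
    }


symmetric = np.tile([0.0, 1.0, 1.0, 0.0], (4, 1))  # bright vertical band
flat = np.full((4, 4), 0.5)                        # no structure at all
sym_stats = composition_features(symmetric)
flat_stats = composition_features(flat)
print(sym_stats)
print(flat_stats)
```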

**Color Features:**
- `saturation_mean` - Low values = desaturated (grayscale-like); high values = vibrant colors
- `hue_mean` - Warm tones (red/yellow) vs cool tones (blue/green)
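One caveat when averaging hue is that it is a circular quantity: red sits at both ends of the hue scale, so a plain arithmetic mean of reddish pixels can land in the middle of the scale. The sketch below computes the mean saturation and a saturation-weighted circular mean of hue using `matplotlib.colors.rgb_to_hsv`. Whether the project's `hue_mean` is computed this way is not known; the function below is an illustrative assumption.

```python
import numpy as np
from matplotlib.colors import rgb_to_hsv


def color_features(rgb):
    """Compute saturation and circular hue statistics for an RGB image in [0, 1]."""
    hsv = rgb_to_hsv(np.asarray(rgb, dtype=float))
    hue, sat = hsv[..., 0], hsv[..., 1]

    # Map hue in [0, 1) onto the unit circle and average the vectors.
    # Weighting by saturation keeps gray pixels (undefined hue) from counting.
    angles = 2.0 * np.pi * hue
    sin_sum = float((sat * np.sin(angles)).sum())
    cos_sum = float((sat * np.cos(angles)).sum())
    hue_circular_mean = (np.arctan2(sin_sum, cos_sum) / (2.0 * np.pi)) % 1.0

    return {
        "saturation_mean": float(sat.mean()),
        "hue_circular_mean": float(hue_circular_mean),
    }


red_image = np.zeros((2, 2, 3))
red_image[..., 0] = 1.0   # pure red
blue_image = np.zeros((2, 2, 3))
blue_image[..., 2] = 1.0  # pure blue
red_stats = color_features(red_image)
blue_stats = color_features(blue_image)
print(red_stats)
print(blue_stats)
```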

### Expected Findings:

Based on propaganda research, "Us" (ingroup) typically features:
- High-key lighting (bright, even)
- Symmetrical compositions
- Low-angle shots (convey power)
- Clean, orderly framing

"Them" (outgroup) typically features:
- Low-key lighting (dark, shadowy)
- Asymmetrical compositions
- High-angle shots (diminish subject)
- Cluttered, chaotic framing
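Feature importance shows which features the classifier relies on, but not the direction of the difference. One simple way to check the expectations above is to compare per-group feature means. The sketch below does this on a tiny hand-made table; the numbers are placeholders chosen for illustration, not results from any film, and the column names assume the features and labels have already been merged.

```python
import pandas as pd

# Placeholder values for illustration only; not measured data.
data = pd.DataFrame(
    {
        "label": ["us", "us", "us", "them", "them", "them"],
        "high_key_ratio": [0.62, 0.58, 0.70, 0.20, 0.25, 0.18],
        "low_key_ratio": [0.10, 0.15, 0.08, 0.55, 0.60, 0.48],
        "vertical_symmetry": [0.81, 0.77, 0.85, 0.60, 0.52, 0.58],
    }
)

# Mean of every numeric feature within each group.
group_means = data.groupby("label").mean(numeric_only=True)

# Positive values: higher for 'us'; negative values: higher for 'them'.
diff = (group_means.loc["us"] - group_means.loc["them"]).sort_values()
print(group_means)
print(diff)
```

On real data, raw mean differences are hard to compare across features with different scales, so a standardized effect size or a significance test would be a sensible next step.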

## Next Steps

1. **Expand dataset**: Add more films, more frames
2. **Improve annotations**: More careful labeling, inter-annotator agreement
3. **Advanced features**: Add face detection, pose estimation, scene detection
4. **Deep learning**: Try pre-trained CNNs (VGG, ResNet) for feature extraction
5. **Qualitative analysis**: Select example frames that show clear visual patterns
6. **Write paper**: Document findings in the structured report format
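For the inter-annotator agreement mentioned in step 2, Cohen's kappa is a common statistic for two annotators: it compares the observed agreement with the agreement expected by chance. The sketch below is a minimal pure-Python version; the two label lists are invented examples, and `sklearn.metrics.cohen_kappa_score` offers an equivalent ready-made function.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same frames in order."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both annotators must label the same non-empty set of frames")
    n = len(labels_a)

    # Observed agreement: share of frames with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: product of each annotator's label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)

    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Invented example labels for six frames.
annotator_1 = ["us", "us", "them", "them", "neutral", "us"]
annotator_2 = ["us", "them", "them", "them", "neutral", "us"]
kappa = cohens_kappa(annotator_1, annotator_2)
print(f"Cohen's kappa: {kappa:.3f}")
```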