# 🔍 RawNet2 for Audio Deepfake Detection (ASVspoof 2019 LA)

This notebook applies RawNet2 (the ASVspoof 2021 baseline implementation) to synthetic speech detection on the ASVspoof 2019 Logical Access (LA) dataset.

## ✅ Setup
The official RawNet2 baseline was cloned from the ASVspoof 2021 challenge GitHub repo. All dependencies were installed manually, including `torch`, `librosa`, `soundfile`, and `scikit-learn`.

The dataset used was **ASVspoof2019 LA**, structured as follows:
```text
Desktop/data/LA/
├── ASVspoof2019_LA_train/
├── ASVspoof2019_LA_dev/
├── ASVspoof2019_LA_eval/
├── ASVspoof_LA_cm_protocols/
```
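
Because path alignment was the main source of friction, a quick layout check before training can save a failed run. The snippet below is a minimal sketch: the helper name `missing_dirs` is hypothetical, and the assumption that `Desktop/data/LA/` sits under the home directory should be adjusted to the actual location.

```python
from pathlib import Path

# Sub-directories listed in the layout above.
EXPECTED_DIRS = [
    "ASVspoof2019_LA_train",
    "ASVspoof2019_LA_dev",
    "ASVspoof2019_LA_eval",
    "ASVspoof_LA_cm_protocols",
]


def missing_dirs(root):
    """Return the expected sub-directories that are absent under `root`."""
    root = Path(root)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    # Assumed location: change this to wherever the dataset actually lives.
    root = Path.home() / "Desktop" / "data" / "LA"
    gaps = missing_dirs(root)
    print("Layout OK" if not gaps else f"Missing under {root}: {gaps}")
```

An empty list means every expected folder was found; anything else names the folders to fix before launching training.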

## 🏋️‍♀️ Training Summary
- **Model**: RawNet2
- **Device**: CPU
- **Training samples**: 25,380
- **Validation samples**: 24,844
- **Training status**: Stopped manually because of the long runtime on CPU
- **Observed accuracy (early)**: 79.38% after the initial batches
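
An early accuracy figure is hard to interpret on this dataset because the two classes are heavily imbalanced. The arithmetic below is a sketch under one assumption: the published class counts for the LA training partition (2,580 bona fide and 22,800 spoofed utterances), which are not taken from this run. It shows what a trivial "always predict spoof" classifier would score, which is one reason EER is the usual metric here.

```python
# Class counts for the ASVspoof 2019 LA training partition.
# Assumption: taken from the published dataset statistics, not from this run.
N_BONAFIDE = 2_580
N_SPOOF = 22_800

total = N_BONAFIDE + N_SPOOF  # should match the 25,380 training samples

# Accuracy of a classifier that always predicts the majority class.
majority_accuracy = max(N_BONAFIDE, N_SPOOF) / total

print(f"Total utterances: {total}")
print(f"Majority-class accuracy: {majority_accuracy:.2%}")
```

Under these counts the majority-class baseline lands close to 90%, so accuracy alone says little about how well bona fide speech is separated from spoofed speech.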

## 🧪 Evaluation Result
Full evaluation was not completed because of compute limits. According to the literature and the official baselines, **RawNet2 typically achieves an Equal Error Rate (EER) of ~0.02 on the ASVspoof 2019 LA dev set.**

> ⚠️ _This is a benchmark value from the ASVspoof 2021 baseline paper and not from my run._
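
For reference, the EER is the operating point where the false acceptance rate (spoof accepted as bona fide) equals the false rejection rate (bona fide rejected). The function below is a minimal, dependency-free sketch of that definition, not the official ASVspoof scoring script. The name `compute_eer` and the score convention (higher means more likely bona fide) are assumptions, and the threshold sweep is quadratic, so it is meant for illustration rather than for scoring a full evaluation set.

```python
def compute_eer(bonafide_scores, spoof_scores):
    """Approximate the Equal Error Rate by sweeping a threshold over all scores.

    Assumed convention: a higher score means "more likely bona fide",
    and a trial is accepted when score >= threshold.
    """
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds + [float("inf")]:
        # False rejection rate: bona fide trials scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance rate: spoofed trials scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the point where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer


if __name__ == "__main__":
    # Toy scores for illustration only; these are not model outputs.
    bona = [0.9, 0.8, 0.7, 0.4]
    spoof = [0.1, 0.2, 0.3, 0.6]
    print(f"EER on toy scores: {compute_eer(bona, spoof):.2f}")
```

With real model outputs, the two lists would hold the scores of the bona fide and spoofed dev trials respectively.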

## 🔍 Observations & Challenges
- Dataset setup and protocol path alignment were the most time-consuming parts
- PyYAML version issues and path formatting required minor debugging
- Training on CPU was slow; only early results were observed
- Evaluation was not feasible in the available time, so benchmark values are cited instead
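
Most of the path problems above come down to protocol entries that do not resolve to audio files. The sketch below assumes the five-column ASVspoof 2019 LA protocol layout (speaker ID, utterance ID, an unused field, system ID, key) and a `flac/` folder of `<utterance_id>.flac` files. The helper names `parse_protocol` and `missing_audio` are hypothetical and are not part of the baseline code.

```python
from pathlib import Path


def parse_protocol(lines):
    """Map utterance ID -> label (1 = bonafide, 0 = spoof).

    Assumes five whitespace-separated columns per line:
    speaker_id  utterance_id  -  system_id  key
    """
    labels = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 5:
            continue  # skip blank or malformed lines
        _speaker, utt_id, _unused, _system, key = parts
        labels[utt_id] = 1 if key == "bonafide" else 0
    return labels


def missing_audio(labels, flac_dir):
    """List utterance IDs from the protocol that have no .flac file on disk."""
    flac_dir = Path(flac_dir)
    return [utt for utt in labels if not (flac_dir / f"{utt}.flac").is_file()]


if __name__ == "__main__":
    # Illustrative lines in the assumed format; the IDs are made up.
    sample = [
        "LA_0001 LA_T_0000001 - - bonafide",
        "LA_0002 LA_T_0000002 - A01 spoof",
    ]
    print(parse_protocol(sample))
```

Running `missing_audio` on each partition before training surfaces mismatched paths immediately instead of partway through an epoch.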

## ✅ Model Strengths & ❌ Weaknesses
**Strengths:**
- End-to-end model with raw waveform input
- Lightweight architecture; real-time potential
- Proven strong baseline performance on ASVspoof2019 LA
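
Raw waveform input still needs a fixed length before batching. The sketch below shows one common approach: truncate long clips and repeat short ones until they reach the target length. The default of 64,600 samples (about 4 seconds at 16 kHz) is an assumption about the baseline configuration, and `fix_length` is a hypothetical name. It uses plain lists to stay dependency-free; with NumPy the same idea is usually written with `np.tile`.

```python
def fix_length(samples, target_len=64_600):
    """Truncate or repeat-pad a 1-D sequence of samples to `target_len`."""
    if len(samples) == 0:
        raise ValueError("cannot pad an empty waveform")
    if len(samples) >= target_len:
        # Long clip: keep only the first `target_len` samples.
        return list(samples[:target_len])
    # Short clip: repeat the waveform enough times, then cut to size.
    repeats = -(-target_len // len(samples))  # ceiling division
    return (list(samples) * repeats)[:target_len]


if __name__ == "__main__":
    short_clip = [0.1, 0.2, 0.3]
    print(fix_length(short_clip, target_len=7))
```

Repeating the clip rather than zero-padding avoids feeding the model long stretches of artificial silence.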

**Weaknesses:**
- Generalization to unseen synthesis techniques may require augmentation
- Training on CPU is slow without optimization

## 🔁 Comparison with Other Approaches

**RawNet2 (Implemented)**
- End-to-end, raw waveform model
- Simple pipeline, open-source, effective

**M2S-ADD (Not Implemented)**
- Uses stereo conversion to expose deepfake inconsistencies
- Promising for subtle signal artifacts, but code unavailable

**SONAR (Not Implemented)**
- Benchmark suite using foundation models (like Whisper)
- Generalizes well but is compute-heavy and not a single model

## 🚀 Production Considerations
- With further optimization or distillation, RawNet2 could run in real time
- Suitable for stream-based monitoring systems (e.g., call center deepfake detection)
- Needs robustness improvements for noisy or cross-channel data
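
For stream-based monitoring, incoming audio would typically be cut into overlapping fixed-length windows that are scored one at a time. The generator below is a minimal sketch of that windowing step only. The window and hop sizes are hypothetical defaults (64,600 samples per window and a 1-second hop, assuming 16 kHz audio), and the scoring model is deliberately left out.

```python
def sliding_windows(samples, window_len=64_600, hop_len=16_000):
    """Yield overlapping fixed-length windows over a 1-D sample sequence.

    A trailing remainder shorter than `window_len` is dropped.
    """
    if window_len <= 0 or hop_len <= 0:
        raise ValueError("window_len and hop_len must be positive")
    for start in range(0, len(samples) - window_len + 1, hop_len):
        yield samples[start:start + window_len]


if __name__ == "__main__":
    # Tiny integer "signal" so the windows are easy to read.
    for window in sliding_windows(list(range(10)), window_len=4, hop_len=3):
        print(window)
```

A smaller hop gives faster alerts at the cost of more model calls per second, so the hop size is the main knob for trading latency against compute.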