# Black Box Audio Protocol (BBAP) - Colab Runner

This notebook sets up and runs the BBAP pipeline on Google Colab. 
It handles:
1. Environment Setup (Dependencies & System Tools)
2. Google Drive Mounting
3. Hugging Face Authentication
4. Code Synchronization (Cloning from GitHub)
5. Configuration (writing the `.env` file)
6. Execution

In [None]:
# @title 1. Environment Setup
# @markdown Install system dependencies and Python libraries.

!apt-get update -y
!apt-get install -y ffmpeg

# Install core dependencies
!pip install git+https://github.com/m-bain/whisperx.git
!pip install pyannote.audio torch numpy scipy python-dotenv pyngrok
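
A quick sanity check after installation can catch a missing tool before the pipeline starts. The sketch below is not part of BBAP; `check_environment` is a hypothetical helper, and the module list is an assumption based on the packages installed above (`dotenv` is the import name of `python-dotenv`).

```python
import importlib.util
import shutil


def check_environment(modules=("whisperx", "pyannote.audio", "torch", "dotenv")):
    """Return a dict mapping each requirement to True (found) or False (missing)."""
    # ffmpeg is a system binary, so look it up on PATH instead of importing it.
    report = {"ffmpeg": shutil.which("ffmpeg") is not None}
    for name in modules:
        try:
            # find_spec locates a module without executing it.
            report[name] = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised when a parent package (e.g. "pyannote") is missing.
            report[name] = False
    return report


for requirement, found in check_environment().items():
    print(f"{requirement}: {'ok' if found else 'MISSING'}")
```

If anything is reported as missing, re-running the install cell (or restarting the runtime) is usually the first thing to try.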

In [None]:
# @title 2. Mount Google Drive
# @markdown This lets us read audio files and save results to your Drive.
from google.colab import drive
import os

drive.mount('/content/drive')

# Ensure the project root exists
PROJECT_ROOT_PATH = "/content/drive/My Drive/yuzhe"
if not os.path.exists(PROJECT_ROOT_PATH):
    os.makedirs(PROJECT_ROOT_PATH, exist_ok=True)
    print(f"Created project directory: {PROJECT_ROOT_PATH}")
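
Once Drive is mounted, it can help to confirm that the input folder actually contains recordings. This is a minimal sketch under two assumptions: that the pipeline reads from `BASE_DIR/INPUT_FOLDER_NAME` (the values configured in step 5), and that the extension list below covers your files. `list_audio_files` and `AUDIO_EXTENSIONS` are hypothetical names, not part of BBAP.

```python
from pathlib import Path

# Assumed set of audio extensions; adjust to match your recorder's output.
AUDIO_EXTENSIONS = {".wav", ".mp3", ".m4a", ".flac"}


def list_audio_files(folder):
    """Return the audio files directly inside `folder`, or [] if it does not exist."""
    root = Path(folder)
    if not root.is_dir():
        return []
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix.lower() in AUDIO_EXTENSIONS
    )


# Assumed location: BASE_DIR joined with INPUT_FOLDER_NAME from the .env settings.
input_dir = "/content/drive/My Drive/Voice Record Pro"
files = list_audio_files(input_dir)
print(f"Found {len(files)} audio file(s) in {input_dir}")
```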

In [None]:
# @title 3. Hugging Face Login
# @markdown Enter your HF Token (with READ permissions) to download Pyannote models.
# @markdown 
# @markdown **Crucial:** Ensure you have accepted the user conditions for:
# @markdown - [pyannote/embedding](https://hf.co/pyannote/embedding)
# @markdown - [pyannote/speaker-diarization-3.1](https://hf.co/pyannote/speaker-diarization-3.1)

from huggingface_hub import login

login()

In [None]:
# @title 4. Setup Codebase
# @markdown Clone the repository on the first run, or pull the latest changes if it already exists.

import os

# Clone on the first run; on later runs, pull the latest changes instead
if not os.path.exists("/content/bbap"):
    !git clone https://github.com/xiaoweili/bbap.git /content/bbap
else:
    !git -C /content/bbap pull

%cd /content/bbap
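
The clone-or-pull logic above can also be written in plain Python, which makes it usable outside a notebook. This is a sketch, not the project's own tooling: `plan_sync` and `run_plan` are hypothetical helpers, and `--ff-only` is an added safety choice that makes the pull fail rather than create a merge commit.

```python
import subprocess
from pathlib import Path


def plan_sync(repo_url, dest):
    """Return the git command(s) needed to bring `dest` up to date."""
    if (Path(dest) / ".git").is_dir():
        # Existing checkout: fast-forward only, so local edits are never merged silently.
        return [["git", "-C", str(dest), "pull", "--ff-only"]]
    # No checkout yet: clone from scratch.
    return [["git", "clone", repo_url, str(dest)]]


def run_plan(commands):
    """Run each command in order, raising CalledProcessError on the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)


# Example usage (not executed here):
# run_plan(plan_sync("https://github.com/xiaoweili/bbap.git", "/content/bbap"))
```

Separating the planning step from execution keeps the decision easy to inspect before any command runs.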

In [None]:
# @title 5. Configuration (.env)
# @markdown Configure your run settings here.
hf_token_val = "" # @param {type:"string"}
# @markdown Leave empty to use the token you just logged in with, or paste it here to be explicit.

whisper_model = "medium" # @param ["tiny", "base", "small", "medium", "large-v3"]
whisper_language = "" # @param {type:"string"}

# Write the .env file
env_content = f"""
HF_TOKEN={hf_token_val}
WHISPER_MODEL={whisper_model}
WHISPER_LANGUAGE={whisper_language}
LOCAL_MODE=False
BASE_DIR=/content/drive/My Drive
PROJECT_ROOT_NAME=yuzhe
INPUT_FOLDER_NAME=Voice Record Pro
SIMILARITY_THRESHOLD=0.65
CLUSTER_MERGE_THRESHOLD=0.85
"""

with open(".env", "w") as f:
    f.write(env_content)

print("✅ Configuration saved to .env")
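
Several of the values above contain spaces (`My Drive`, `Voice Record Pro`), and `HF_TOKEN` may be empty. The sketch below shows one way to render the file more defensively by quoting values that contain whitespace and omitting empty ones. `render_env` is a hypothetical helper, and whether BBAP treats a missing key the same as an empty one is an assumption you should verify against `main.py`.

```python
def render_env(settings, skip_empty=True):
    """Render a dict as .env text, double-quoting values that contain whitespace."""
    lines = []
    for key, value in settings.items():
        text = str(value)
        if text == "" and skip_empty:
            # Assumption: an absent key lets the pipeline fall back to its default.
            continue
        if any(ch.isspace() for ch in text):
            # Escape backslashes and quotes, then wrap the value in double quotes.
            text = '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'
        lines.append(f"{key}={text}")
    return "\n".join(lines) + "\n"


example = render_env({
    "HF_TOKEN": "",
    "WHISPER_MODEL": "medium",
    "BASE_DIR": "/content/drive/My Drive",
})
print(example)
```

Here the empty `HF_TOKEN` entry is dropped and `BASE_DIR` is written as a quoted value, a form that `python-dotenv` accepts.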

In [None]:
# @title 6. Run Pipeline
# @markdown Start the daily processing batch.

!python main.py
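
Colab sessions can disconnect mid-run, so keeping a copy of the pipeline output on disk is often useful. The sketch below is an alternative to the `!python main.py` line, not BBAP's own logging: `run_and_log` is a hypothetical helper, and the log path in the usage comment is an assumed location inside the project folder.

```python
import subprocess
import sys


def run_and_log(argv, log_path):
    """Run a command, echo its output live, save it to `log_path`, and return the exit code."""
    with open(log_path, "w", encoding="utf-8") as log:
        proc = subprocess.Popen(
            argv,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr so errors land in the same log
            text=True,
        )
        for line in proc.stdout:
            sys.stdout.write(line)
            log.write(line)
        return proc.wait()


# Example usage (assumed log location; not executed here):
# exit_code = run_and_log([sys.executable, "main.py"], "/content/drive/My Drive/yuzhe/run.log")
# if exit_code != 0:
#     print(f"Pipeline failed with exit code {exit_code}")
```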