# FrogID Audio Inference Pipeline

This notebook demonstrates how to process a single audio file and generate species predictions using a trained FrogID model.

**Pipeline Steps:**
1. **Setup**: Configure the environment, import paths, and model parameters
2. **Load Pipeline**: Fetch the pipeline configuration for the registered model from MLflow and instantiate the pipeline
3. **Load Model**: Retrieve the trained model from the MLflow Model Registry by alias
4. **Process Audio**: Load the audio file, preprocess it, and extract features
5. **Predict**: Generate species predictions with confidence scores


## 1. Setup

In [None]:
%load_ext autoreload
%autoreload 2

import os
import sys
from pathlib import Path
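# Databricks only (`dbutils` is not defined elsewhere): switch to this notebook's folder,
# move up one level, then add the directory two levels above that to sys.path so `utils` is importable.
notebook_path = "/Workspace" + os.path.dirname(
    dbutils.notebook.entry_point.getDbutils().notebook().getContext().notebookPath().get()
)  # notebookPath() already starts with "/", so "/Workspace" needs no trailing slash
os.chdir(notebook_path)
os.chdir("..")
sys.path.append("../..")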

from utils.pipeline import get_pipeline_config, instantiate_pipeline
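
The `os.chdir`/`sys.path` steps above, and the `Path(os.getcwd()).parent.parent` computation in the next cell, assume a fixed folder depth between this notebook and the repository root, and only run on Databricks. A minimal, environment-agnostic alternative is to walk up from a starting directory until a marker directory is found. The helper names `find_repo_root` and `add_repo_to_path`, and the use of `utils/` as the marker, are assumptions for illustration rather than part of this repository.

```python
import sys
from pathlib import Path


def find_repo_root(start: Path, marker: str = "utils") -> Path:
    """Return the nearest ancestor of `start` (inclusive) containing a `marker` directory.

    Hypothetical helper: assumes the repo root is the first directory that
    has a `utils/` sub-directory. Raises FileNotFoundError if none is found.
    """
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).is_dir():
            return candidate
    raise FileNotFoundError(f"No ancestor of {start} contains '{marker}/'")


def add_repo_to_path(start: Path, marker: str = "utils") -> Path:
    """Prepend the repo root to sys.path (once) so `utils.*` imports resolve."""
    root = find_repo_root(start, marker)
    if str(root) not in sys.path:
        sys.path.insert(0, str(root))
    return root
```

With such a helper, `ROOT_DIR = add_repo_to_path(Path.cwd())` would work the same way locally and on Databricks, provided the notebook is started from somewhere inside the repository.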


In [None]:
from utils.environment_setup import setup_mlflow

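# Configuration (point AUDIO_FILE_PATH at a .wav file reachable from where this notebook runs)
AUDIO_FILE_PATH = "/Users/zacharyzhang/Documents/am/frogid-ml-15species-dbx/data/audio_files/5673.wav"
SCHEMA = "frogid_ml"                            # schema the model is registered under
MODEL_NAME = "birdnet-ave-3s-chunk-embeddings"  # registered model name
MODEL_ALIAS = "champion"                        # registry alias selecting which model version to use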

# Detect the runtime, make the repo root importable, and configure MLflow
IS_DATABRICKS = "DATABRICKS_RUNTIME_VERSION" in os.environ
ROOT_DIR = Path(os.getcwd()).parent.parent
sys.path.insert(0, str(ROOT_DIR))
setup_mlflow(ROOT_DIR, IS_DATABRICKS)

print(f"📋 Configuration set - Model: {MODEL_NAME}, Alias: {MODEL_ALIAS}")
print(f"📁 Audio file: {AUDIO_FILE_PATH}")


## 2. Load Pipeline Configuration

In [None]:
model_name = f"{SCHEMA}.{MODEL_NAME}"
config = get_pipeline_config(
    # run_id="542f86b2cb8348549eb8f583ea34f976",  # Uncomment to pin a specific MLflow run
    model_name=model_name,
    model_alias=MODEL_ALIAS,
    # model_version="2",  # Uncomment to use a specific version instead of the alias
)

print("✅ Successfully loaded pipeline config!")

# Build the pipeline components (data preprocessor, data selector, model evaluator) from the config
pipeline = instantiate_pipeline(config)


## 3. Load Model


In [None]:
model = pipeline.model_evaluator.load_model_from_mlflow_registry(
    model_name=model_name,
    model_version_alias=MODEL_ALIAS,
)

print("✅ Model loaded successfully")
model.summary()
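
Sections 2 and 3 both identify the model by registered name plus an alias (or, alternatively, a version number). MLflow expresses these as model URIs of the form `models:/<name>@<alias>` and `models:/<name>/<version>`, which can be passed to loaders such as `mlflow.pyfunc.load_model`. The sketch below builds such a URI; `build_model_uri` is a hypothetical helper, and whether `load_model_from_mlflow_registry` constructs its URI this way internally is an assumption.

```python
from typing import Optional


def build_model_uri(
    schema: str,
    model_name: str,
    alias: Optional[str] = None,
    version: Optional[str] = None,
) -> str:
    """Build an MLflow Model Registry URI for `<schema>.<model_name>`.

    Exactly one of `alias` or `version` must be given, mirroring the
    alias-or-version choice offered by `get_pipeline_config`.
    """
    if (alias is None) == (version is None):
        raise ValueError("Pass exactly one of `alias` or `version`.")
    full_name = f"{schema}.{model_name}"
    if alias is not None:
        # Alias URIs ("@alias") require an MLflow release with registry alias support.
        return f"models:/{full_name}@{alias}"
    return f"models:/{full_name}/{version}"
```

For example, `build_model_uri("frogid_ml", "birdnet-ave-3s-chunk-embeddings", alias="champion")` returns `"models:/frogid_ml.birdnet-ave-3s-chunk-embeddings@champion"`. Resolving by alias means promoting a new version to `champion` in the registry changes what this notebook loads without editing the code.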

## 4. Process Audio

In [None]:
# Load the raw waveform, then preprocess it and extract features for each audio chunk
waveform = pipeline.data_preprocessor.loader.load(AUDIO_FILE_PATH)
X_features = pipeline.data_preprocessor.process_audio_for_inference(waveform)
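
The model name (`...-3s-chunk-embeddings`) and the chunk count printed in Step 5 suggest that `process_audio_for_inference` splits the waveform into fixed-length 3-second chunks before extracting embeddings. The sketch below shows only that chunking step, under stated assumptions: a mono waveform, a 48 kHz sample rate (BirdNET's input rate), no overlap between chunks, and zero-padding of the final partial chunk. The real preprocessor's sample rate, overlap, and padding policy may differ, and `split_into_chunks` is a hypothetical name.

```python
from typing import List, Sequence


def split_into_chunks(
    waveform: Sequence[float],
    sample_rate: int = 48_000,
    chunk_seconds: float = 3.0,
) -> List[List[float]]:
    """Split a mono waveform into non-overlapping, equal-length chunks.

    The final partial chunk is zero-padded to full length. All defaults are
    assumptions for illustration, not the pipeline's actual settings.
    """
    chunk_len = int(round(sample_rate * chunk_seconds))
    if chunk_len <= 0:
        raise ValueError("sample_rate * chunk_seconds must be positive")
    chunks: List[List[float]] = []
    for start in range(0, len(waveform), chunk_len):
        chunk = list(waveform[start:start + chunk_len])
        # Only the last chunk can be short; pad it with silence.
        chunk.extend([0.0] * (chunk_len - len(chunk)))
        chunks.append(chunk)
    return chunks
```

Under these assumptions, a 7-second recording yields three chunks, the last holding 1 second of audio followed by 2 seconds of zeros. Padding rather than dropping the tail means short calls at the end of a recording still reach the model.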


## 5. Generate Predictions


In [None]:
print(f"🔮 Generating predictions for {len(X_features)} chunks...")
# Mapping from the model's class labels to species names, used to label the predictions
class_labels_to_species = pipeline.data_selector.class_labels_to_species

pipeline.model_evaluator.predict(
    model=model,
    features=X_features,
    class_label_to_species_mapping=class_labels_to_species
)
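
`model_evaluator.predict` receives per-chunk features and the class-label-to-species mapping, but this notebook does not show how chunk scores become a file-level result. One common approach, sketched below as an assumption rather than a description of this pipeline, is to take each species' maximum probability across chunks and rank species by that score. `aggregate_chunk_predictions` is a hypothetical helper, and it assumes the mapping is a dict from integer class index to species name.

```python
from typing import Dict, List, Sequence, Tuple


def aggregate_chunk_predictions(
    chunk_probs: Sequence[Sequence[float]],
    class_label_to_species: Dict[int, str],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Max-pool per-chunk class probabilities and return the top-k species.

    `chunk_probs[i][j]` is the (assumed) probability of class j in chunk i.
    Returns (species, confidence) pairs sorted by descending confidence.
    """
    if not chunk_probs:
        return []
    n_classes = len(chunk_probs[0])
    # A species counts as present if any single chunk scores it highly.
    file_scores = [max(row[j] for row in chunk_probs) for j in range(n_classes)]
    ranked = sorted(range(n_classes), key=lambda j: file_scores[j], reverse=True)
    return [(class_label_to_species[j], file_scores[j]) for j in ranked[:top_k]]
```

For example, chunk probabilities `[[0.1, 0.7, 0.2], [0.6, 0.3, 0.1]]` with labels `{0: "species_a", 1: "species_b", 2: "species_c"}` and `top_k=2` give `[("species_b", 0.7), ("species_a", 0.6)]`. Max-pooling lets one confident chunk flag a species anywhere in a long recording, at the cost of sensitivity to a single noisy chunk; averaging across chunks is the more conservative alternative.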