<a href="https://colab.research.google.com/github/thc1006/whisper-colab-tpu-transcriber/blob/main/%5B0606%5Dwhisper_colab_tpu_v2_8_transcriber.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
# -*- coding: utf-8 -*-
"""[Updated version] whisper_colab_tpu_v2_8_transcriber.ipynb

Automatically generated by Colab.

Original file is located at
    https://colab.research.google.com/drive/118C_15XiygjnDnuGRykySIW0mZqPPagp
"""

# @title
# ----------------------------
# Cell 1: Install packages and UI setup
# ----------------------------
print("🚀 Starting Cell 1: Install packages and UI setup...")

import sys
import os
print(f"🐍 Python version: {sys.version}")
# Confirmed Python 3.11 (cp311 wheels will be used)

# torch_xla 2.7.0 was listed as an available release in the earlier pip error log.
# Install torch_xla and let pip try to resolve the matching torch, torchvision,
# and torchaudio versions directly from the XLA release indices.
TARGET_TORCH_XLA_VERSION = "2.7.0"

print(f"🎯 Target TorchXLA version: {TARGET_TORCH_XLA_VERSION}")

# 1. Uninstall potentially conflicting packages
print("🔄 Uninstalling torch, torch_xla, torchvision, torchaudio, and fastai to ensure a clean environment...")
!pip uninstall -y torch torch_xla torchvision torchaudio fastai 2>/dev/null || true
print("✅ Uninstallation attempt complete.")

# 2. Install TorchXLA and its XLA-compatible PyTorch/TorchVision/TorchAudio
# This command focuses on installing torch_xla from its specific repository,
# which should ensure that it pulls compatible versions of torch, torchvision, and torchaudio
# that were built together and are ABI-compatible.
PYTORCH_XLA_RELEASES_INDEX = "https://storage.googleapis.com/pytorch-xla-releases/index.html"
# The libtpu-releases index is also important for the underlying TPU libraries.
LIBTPU_RELEASES_INDEX = "https://storage.googleapis.com/libtpu-releases/index.html"

print(f"🔄 Installing TorchXLA {TARGET_TORCH_XLA_VERSION} and its compatible PyTorch dependencies...")
print(f"   This will use Python 3.11 (cp311) compatible wheels from: {PYTORCH_XLA_RELEASES_INDEX} and {LIBTPU_RELEASES_INDEX}")

# This single command should fetch torch_xla and its specific, compatible torch, torchvision, torchaudio.
# It's crucial that pip resolves these from the XLA indices.
!pip install -q \
    torch_xla=={TARGET_TORCH_XLA_VERSION} \
    -f {PYTORCH_XLA_RELEASES_INDEX} \
    -f {LIBTPU_RELEASES_INDEX}

print("✅ PyTorch/TorchXLA installation attempt complete.")
print("🔍 Verifying installed versions (after restart, these will be effective):")
# These commands might show versions before restart if run immediately.
# The true test is importing after restarting the session.
!pip show torch torch_xla torchvision torchaudio | grep -E "^(Name|Version):" || echo "Verification step: Some packages might not be fully listed until after restart."


# 3. Install Transformers and other utilities
print("🔄 Installing Hugging Face Transformers and other utilities (sentencepiece, librosa, soundfile, ipywidgets, accelerate)...")
!pip install -q "transformers>=4.39.0,<4.43.0" sentencepiece librosa soundfile ipywidgets "accelerate>=0.25.0"
print("✅ Utilities installation complete.")

# 4. Install FFmpeg for audio processing
print("🔄 Updating apt and installing ffmpeg...")
!apt-get update -qq > /dev/null && apt-get install -y -qq ffmpeg > /dev/null
print("✅ ffmpeg installation complete.")

print("\n👍 Cell 1 package installation process finished.")
print("‼️ IMPORTANT: You MUST restart the Colab session now for these changes to take effect.")
print("   Go to 'Runtime' > 'Restart runtime' in the Colab menu. ")
print("   After restarting, re-run all cells starting from this Cell 1 (the UI part below will then execute).")

# UI part - This will effectively run after the restart when the user re-runs Cell 1
import ipywidgets as widgets
from IPython.display import display, HTML, clear_output

# Clear previous output from installations for a cleaner UI display when re-run after restart
clear_output(wait=True)
print("🔄 Session likely restarted (or this is the first run of UI part). Displaying UI configuration options...")
print(f"🐍 Python version: {sys.version}")  # Re-print python version for context

display(HTML("""
<style>
    .widget-label { min-width: 20ex !important; }
    .widget-dropdown > select { background-color: #f0f0f0; border-radius: 4px; }
    .widget-text input[type="text"] { background-color: #f0f0f0; border-radius: 4px; }
    .widget-button { background-color: #4CAF50 !important; color: white !important; border-radius: 5px; }
    .widget-inttext input[type="number"] { background-color: #f0f0f0; border-radius: 4px; }
    .settings-box { padding: 15px; border: 1px solid #ccc; border-radius: 8px; background-color: #fafafa; box-shadow: 2px 2px 5px rgba(0,0,0,0.1); }
    .settings-box .widget-box { margin-bottom: 10px; }
    .settings-box .widget-html-value b { font-size: 1.1em; color: #2c3e50; margin-top: 12px; display: block; border-bottom: 1px solid #eee; padding-bottom: 5px;}
</style>
"""))

# --- FIX: Changed model options to use full, valid Hugging Face model IDs ---
# This prevents errors from trying to load a model that doesn't exist.
model_options = [
    ("Tiny (fastest, lower accuracy)", "openai/whisper-tiny"),
    ("Base (fast, basic accuracy)", "openai/whisper-base"),
    ("Small (recommended, balanced speed and accuracy)", "openai/whisper-small"),
    ("Medium (slower, higher accuracy)", "openai/whisper-medium"),
    ("Large-v1 (v1, high accuracy)", "openai/whisper-large"),
    ("Large-v2 (v2, even higher accuracy)", "openai/whisper-large-v2"),
    ("Large-v3 (v3, newest, best accuracy)", "openai/whisper-large-v3"),
]
# The default is set to large-v3, the best available official model.
model_widget = widgets.Dropdown(options=model_options, value="openai/whisper-large-v3", description="Whisper model:")
# --- End of FIX ---

language_options = [
    ("Auto detect (auto)", "auto"),
    ("Chinese (zh)", "zh"),
    ("English (en)", "en"),
    ("Japanese (ja)", "ja"),
    ("Korean (ko)", "ko"),
    ("Cantonese (yue)", "yue"),
    ("Other (custom)", "custom")
]
language_dropdown_widget = widgets.Dropdown(options=language_options, value="auto", description="Transcription language:")
language_text_widget = widgets.Text(value="", placeholder="If 'Other' selected, enter ISO code (e.g., de, fr)")

def on_language_change(change):
    if change['type'] == 'change' and change['name'] == 'value':
        if change.new == "custom":
            language_text_widget.layout.display = "flex"
        else:
            language_text_widget.layout.display = "none"
language_dropdown_widget.observe(on_language_change, names='value')
language_text_widget.layout.display = "none"

task_widget = widgets.Dropdown(options=["transcribe", "translate"], value="transcribe", description="Task:")

precision_options = [
    ("BF16 (TPU recommended, accelerated)", "bf16"),
    ("FP32 (standard precision, CPU/GPU)", "fp32")
]
precision_widget = widgets.Dropdown(options=precision_options, value="bf16", description="Compute precision:")

# BoundedIntText enforces min/max; a plain IntText does not apply those arguments.
chunk_length_s_widget = widgets.BoundedIntText(value=28, description="Audio chunk length (s):", style={'description_width': 'initial'}, min=1, max=30)
stride_length_s_left_widget = widgets.BoundedIntText(value=5, description="Left overlap (s):", style={'description_width': 'initial'}, min=0)
stride_length_s_right_widget = widgets.BoundedIntText(value=5, description="Right overlap (s):", style={'description_width': 'initial'}, min=0)

settings_box_layout = widgets.Layout(display='flex', flex_flow='column', align_items='stretch', width='auto')
settings_box = widgets.VBox([
    model_widget,
    language_dropdown_widget,
    language_text_widget,
    task_widget,
    precision_widget,
    widgets.HTML("<b>Advanced Long-Form Processing Settings:</b>"),
    chunk_length_s_widget,
    stride_length_s_left_widget,
    stride_length_s_right_widget
], layout=settings_box_layout)

display(HTML("<h2>Speech Transcription Settings</h2>"), settings_box)

print("\n✅ Cell 1 UI setup complete. Please confirm the settings above, and then run the next Cell.")
print("   If you just installed packages and were prompted to restart, please 'Runtime -> Restart runtime' before continuing.")

🔄 Session likely restarted (or this is the first run of UI part). Displaying UI configuration options...
🐍 Python version: 3.11.13 (main, Jun  4 2025, 08:57:29) [GCC 11.4.0]


VBox(children=(Dropdown(description='Whisper model:', index=6, options=(('Tiny (fastest, lower accuracy)', 'op…


✅ Cell 1 UI setup complete. Please confirm the settings above, and then run the next Cell.
   If you just installed packages and were prompted to restart, please 'Runtime -> Restart runtime' before continuing.
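
The chunk length and overlap settings above interact: a chunk has to be longer than the two overlaps combined, otherwise each step covers no new audio. The sketch below is a simplified illustration of that relationship, not the Transformers pipeline's actual code. It shows how a 28-second window with 5-second overlaps on each side could walk across a recording. The function name `chunk_windows` and its parameters are hypothetical and do not appear in this notebook.

```python
def chunk_windows(duration_s, chunk_length_s=28, stride_left_s=5, stride_right_s=5):
    """Return (start, end) offsets in seconds for overlapping chunks.

    Assumption: consecutive windows advance by the chunk length minus
    both overlaps, so neighbouring chunks share audio at their edges.
    """
    step = chunk_length_s - stride_left_s - stride_right_s
    if step <= 0:
        raise ValueError("chunk length must exceed the sum of both overlaps")

    windows = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_length_s, duration_s)
        windows.append((start, end))
        if end >= duration_s:
            break  # the last window already reaches the end of the audio
        start += step
    return windows


# With the default settings each step advances 18 seconds.
for start, end in chunk_windows(60):
    print(f"{start:6.1f}s -> {end:6.1f}s")
```

With the defaults, a 60-second file would be covered by three windows. Raising either overlap shrinks the step and increases the number of chunks to process.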


In [None]:
# @title
# -------------------------------------------------
# Cell 2: Load model, initialize Pipeline, and XLA warm-up
# -------------------------------------------------
print("🚀 Starting Cell 2: Load model, initialize Pipeline, and XLA warm-up...")

# 1. Import necessary libraries
import torch
import warnings
import time
import numpy as np
import gc

# Attempt to import torch_xla; if it fails, prompt the user to check installation and restart
try:
    import torch_xla
    import torch_xla.core.xla_model as xm
    import torch_xla.debug.metrics as met
    print("✅ torch_xla modules imported successfully.")
    print(f"   Torch Version: {torch.__version__}")
    print(f"   Torch XLA Version: {torch_xla.__version__}")
except ImportError as e:
    print(f"❌ Failed to import torch_xla modules! Error details: {e}")
    print("   Please ensure you have successfully executed the PyTorch/XLA installation in Cell 1,")
    print("   and have restarted the runtime after installation (Runtime -> Restart runtime).")
    print("   If the problem persists, check the installation logs in Cell 1 for errors and verify version compatibility.")
    raise
except Exception as e:
    print(f"❌ An unexpected error occurred while importing torch_xla modules: {e}")
    raise

from transformers import WhisperProcessor, WhisperForConditionalGeneration, pipeline

# Silence some non-essential warnings
warnings.filterwarnings("ignore", message=".*TorchScript only supports basic types list, tuple, dict.*")
warnings.filterwarnings("ignore", message=".*PySoundFile failed.*")
warnings.filterwarnings("ignore", message=".*Due to a bug fix.*")
warnings.filterwarnings("ignore", message=".*Passing `max_length` to BeamSearchScorer is deprecated.*")

# 2. Read user settings from Cell 1
print("⚙️ Reading user settings...")
# --- FIX: Directly use the full model ID from the widget value ---
MODEL_NAME = model_widget.value
# --- End of FIX ---

_selected_language_option = language_dropdown_widget.value
if _selected_language_option == "custom":
    selected_language = language_text_widget.value.strip().lower()
    if not selected_language:
        print("   ⚠️ Custom language is empty; defaulting to auto-detect.")
        selected_language = "auto"
else:
    selected_language = _selected_language_option

selected_task = task_widget.value
selected_precision = precision_widget.value
chunk_length = max(1, chunk_length_s_widget.value) if chunk_length_s_widget.value > 0 else 30
stride_left = max(0, stride_length_s_left_widget.value)
stride_right = max(0, stride_length_s_right_widget.value)

print(f"   Model: {MODEL_NAME}, Language: {selected_language}, Task: {selected_task}, Precision: {selected_precision}")
print(f"   Long audio parameters -> Chunks: {chunk_length}s, Overlap: [{stride_left}s, {stride_right}s]")

# 3. Set TPU device and compute precision
tpu_device_acquisition_successful = False
try:
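    # xm.xrt_world_size() is not available in torch_xla 2.7; query the runtime module instead.
    import torch_xla.runtime as xr
    tpu_cores = xr.global_runtime_device_count()
    print(f"🌍 Detected {tpu_cores} XLA device cores.")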
    device = xm.xla_device()
    print(f"✅ TPU device acquired successfully: {device} (representing all {tpu_cores} cores)")
    tpu_device_acquisition_successful = True
except Exception as e:
    print(f"⚠️ Unable to acquire TPU device, error: {e}")
    print("   Please ensure the Colab Runtime is set to TPU (TPU v2). Subsequent processing will fall back to CPU.")
    device = torch.device("cpu")

if selected_precision == "bf16" and tpu_device_acquisition_successful:
    torch_dtype = torch.bfloat16
    print("   Compute precision set to: BF16 (suitable for TPU)")
elif selected_precision == "bf16" and not tpu_device_acquisition_successful:
    torch_dtype = torch.float32
    print("   ⚠️ CPU does not natively support BF16; compute precision adjusted to FP32.")
else:
    torch_dtype = torch.float32
    print(f"   Compute precision set to: FP32 (for {device.type})")

# 4. Load Whisper Processor
processor = None
print(f"\n🔄 Loading Whisper Processor for {MODEL_NAME}...")
try:
    processor = WhisperProcessor.from_pretrained(MODEL_NAME)
    print("✅ Processor loaded successfully!")
except Exception as e:
    print(f"❌ Failed to load Processor: {e}. Please check if the model name ({MODEL_NAME}) is correct or your internet connection.")
    processor = None

# 5. Load Whisper model and move to TPU
model = None
if processor:
    print(f"🔄 Loading Whisper model {MODEL_NAME} (dtype: {torch_dtype}) and moving to device {device}...")
    try:
        model = WhisperForConditionalGeneration.from_pretrained(
            MODEL_NAME,
            torch_dtype=torch_dtype,
            low_cpu_mem_usage="large" in MODEL_NAME,
        ).to(device)
        model.eval()
        print("✅ Model loaded and moved to device successfully!")
    except Exception as e:
        print(f"❌ Failed to load or move model to device: {e}")
        if "out of memory" in str(e).lower() or "OOM" in str(e).upper():
            print("   💡 Tip: It may be due to insufficient device memory. Try:")
            print("      1. Use a smaller Whisper model (e.g., small, base).")
            print("      2. Ensure compute precision is BF16 (if on TPU).")
            print("      3. 'Restart Runtime' to free all resources, then retry.")
        model = None
else:
    print("⚠️ Skipping model load since Processor is unavailable.")

# 6. Initialize ASR Pipeline
asr_pipeline = None
if model and processor:
    print("\n🔄 Initializing ASR Pipeline...")
    try:
        asr_pipeline = pipeline(
            "automatic-speech-recognition",
            model=model,
            tokenizer=processor.tokenizer,
            feature_extractor=processor.feature_extractor,
            torch_dtype=torch_dtype,
            device=device,
        )
        print("✅ ASR Pipeline initialization successful!")
    except Exception as e:
        print(f"❌ ASR Pipeline initialization failed: {e}")
else:
    print("⚠️ Skipping Pipeline initialization due to model or Processor load failure.")

# 7. XLA Warm-up
if asr_pipeline and tpu_device_acquisition_successful:
    print("\n🔥 Starting XLA warm-up (processing 2 seconds of silence to compile computation graph)...")
    print(f"   Using model: {MODEL_NAME}, Task: {selected_task}, Language (during warm-up): {selected_language}")

    warmup_chunk_length = chunk_length
    warmup_stride_config = [stride_left, stride_right] if stride_left >= 0 and stride_right >= 0 else None

    print(f"   Warm-up parameters -> Chunks: {warmup_chunk_length}s, Overlap: {warmup_stride_config}")

    dummy_audio_np = np.zeros(16000 * 2, dtype=np.float32)

    generate_pipeline_kwargs_warmup = {"task": selected_task}
    if selected_language.lower() != "auto":
        generate_pipeline_kwargs_warmup["language"] = selected_language

    t_start_warmup = time.time()
    try:
        with torch.no_grad():
            print("   🚀 Performing first warm-up call (XLA compilation in progress, this step may take some time, please wait patiently)...")
            _ = asr_pipeline(
                dummy_audio_np,
                generate_kwargs=generate_pipeline_kwargs_warmup,
                chunk_length_s=warmup_chunk_length,
                stride_length_s=warmup_stride_config,
            )
            xm.mark_step()
            print("   ✅ First warm-up call complete.")

        t_elapsed_warmup = time.time() - t_start_warmup
        print(f"✅ XLA warm-up completed successfully! It took {t_elapsed_warmup:.2f} seconds.")
        print("   TPU is ready, you can run Cell 3 to process your audio files.")
        if tpu_device_acquisition_successful:
            print(f"   📊 XLA metrics report:\n{met.metrics_report()}")
    except Exception as e:
        print(f"❌ XLA warm-up failed: {e}")
        print("   😭 An error occurred during warm-up. Possible reasons:")
        print("      1. PyTorch/XLA version is incompatible with Colab TPU environment (check Cell 1 installation and restart).")
        print("      2. Model is too large, TPU memory is insufficient (OOM).")
        print("      3. The selected language/task/model combination makes XLA compilation difficult.")
        print("   Suggested actions:")
        print("      - Carefully check the package installation logs in Cell 1 to ensure no errors.")
        print("      - Make sure to 'Restart Runtime' after installing packages in Cell 1.")
        print("      - Try using a smaller model (such as 'tiny' or 'base') for testing.")
        import traceback
        traceback.print_exc()
elif not tpu_device_acquisition_successful and asr_pipeline:
    print("\nℹ️ Running on CPU, skipping XLA warm-up.")
else:
    print("\n⚠️ Pipeline was not successfully initialized or it's not a TPU environment, skipping XLA warm-up. Please check error messages in this Cell.")

if 'dummy_audio_np' in locals():
    del dummy_audio_np
gc.collect()
if tpu_device_acquisition_successful:
    xm.wait_device_ops()

🚀 Starting Cell 2: Load model, initialize Pipeline, and XLA warm-up...
✅ torch_xla modules imported successfully.
   Torch Version: 2.7.1+cu126
   Torch XLA Version: 2.7.0
⚙️ Reading user settings...
   Model: openai/whisper-large-v3, Language: auto, Task: transcribe, Precision: bf16
   Long audio parameters -> Chunks: 28s, Overlap: [5s, 5s]
⚠️ Unable to acquire TPU device, error: module 'torch_xla.core.xla_model' has no attribute 'xrt_world_size'
   Please ensure the Colab Runtime is set to TPU (TPU v2). Subsequent processing will fall back to CPU.
   ⚠️ CPU does not natively support BF16; compute precision adjusted to FP32.

🔄 Loading Whisper Processor for openai/whisper-large-v3...


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


preprocessor_config.json:   0%|          | 0.00/340 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/283k [00:00<?, ?B/s]

vocab.json:   0%|          | 0.00/1.04M [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/2.48M [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/494k [00:00<?, ?B/s]

normalizer.json:   0%|          | 0.00/52.7k [00:00<?, ?B/s]

added_tokens.json:   0%|          | 0.00/34.6k [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/2.07k [00:00<?, ?B/s]

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


✅ Processor loaded successfully!
🔄 Loading Whisper model openai/whisper-large-v3 (dtype: torch.float32) and moving to device cpu...


config.json:   0%|          | 0.00/1.27k [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/3.09G [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/3.90k [00:00<?, ?B/s]

✅ Model loaded and moved to device successfully!

🔄 Initializing ASR Pipeline...
✅ ASR Pipeline initialization successful!

ℹ️ Running on CPU, skipping XLA warm-up.
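
The log above reports `Torch Version: 2.7.1+cu126` next to `Torch XLA Version: 2.7.0`, and the TPU was not acquired. One thing that may be worth checking after a restart is which builds pip actually installed. The sketch below uses only the standard library's `importlib.metadata`. The helper names are hypothetical, and treating a `+cu` local version tag as a sign of a CUDA wheel is a heuristic assumption rather than a guarantee.

```python
from importlib import metadata

PACKAGES = ("torch", "torch_xla", "torchvision", "torchaudio")


def installed_versions(packages=PACKAGES):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def looks_like_cuda_build(version):
    """Heuristic (assumption): CUDA wheels usually carry a '+cuXXX' tag."""
    return version is not None and "+cu" in version


# Print a short report; a CUDA-tagged torch build is flagged for review.
for name, version in installed_versions().items():
    note = "  <- CUDA-tagged build" if looks_like_cuda_build(version) else ""
    print(f"{name}: {version or 'not installed'}{note}")
```

If `torch` is flagged here, it may be worth re-checking the install step in Cell 1 before retrying on a TPU runtime.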


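Cell 2 and Cell 3 each resolve the language choice and build `generate_kwargs` with the same logic. One way to keep the two in sync would be a small shared helper. The sketch below is only an illustration: the name `resolve_language_and_kwargs` is hypothetical and not part of this notebook, and it takes plain values rather than the widgets themselves.

```python
def resolve_language_and_kwargs(dropdown_value, custom_text, task):
    """Return (language, generate_kwargs) from the UI selections.

    An empty custom entry falls back to auto-detection, and the
    "language" key is omitted entirely when auto-detect is chosen.
    """
    if dropdown_value == "custom":
        language = custom_text.strip().lower() or "auto"
    else:
        language = dropdown_value

    generate_kwargs = {"task": task}
    if language != "auto":
        generate_kwargs["language"] = language
    return language, generate_kwargs


# Example: a custom ISO code typed with stray whitespace and capitals.
print(resolve_language_and_kwargs("custom", " DE ", "transcribe"))
```

Both cells could then call the helper with `language_dropdown_widget.value`, `language_text_widget.value`, and `task_widget.value`.
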
In [None]:
# @title
# --------------------------------------------
# Cell 3: Upload audio files and perform long-form transcription
# --------------------------------------------
print("🚀 Starting Cell 3: Upload audio files and perform long-form transcription...")

# 1. Import necessary modules
from google.colab import files
import librosa
import time
import os
import gc
import numpy as np

# 2. Check if Pipeline is ready
if 'asr_pipeline' not in globals() or asr_pipeline is None:
    print("❌ ASR Pipeline is not initialized or initialization failed. Please successfully run Cell 1 and Cell 2 first.")
    print("   Note: If Cell 1 or Cell 2 indicates you need to restart runtime, be sure to do that before returning.")
else:
    print("✅ ASR Pipeline is ready.")

    # 3. Prompt user to upload audio files
    print("\n📤 Please upload audio files (mp3 / wav / m4a / ogg / flac ...)")
    print("   You can select multiple files at once.")
    try:
        uploaded_files = files.upload()
        if not uploaded_files:
            print(" 🤔 No files were uploaded.")
        else:
            print(f"📂 Successfully uploaded {len(uploaded_files)} files.")
    except Exception as e:
        print(f"❌ Error occurred during file upload: {e}")
        uploaded_files = {}

    # 4. Define logging function
    def show_log(msg, prefix=""):
        print(f"{prefix}{time.strftime('[%Y-%m-%d %H:%M:%S]')} {msg}")

    # 5. Process each file
    if uploaded_files:
        # Re-read settings from the UI widgets in Cell 1
        _current_language_option = language_dropdown_widget.value
        if _current_language_option == "custom":
            current_language = language_text_widget.value.strip().lower()
            if not current_language:
                current_language = "auto"
        else:
            current_language = _current_language_option
        current_task = task_widget.value
        current_chunk_length = max(1, chunk_length_s_widget.value) if chunk_length_s_widget.value > 0 else 30
        current_stride_left = max(0, stride_length_s_left_widget.value)
        current_stride_right = max(0, stride_length_s_right_widget.value)
        # Get the friendly name for the output file
        current_model_name = [name for name, model_id in model_widget.options if model_id == model_widget.value][0].split(" ")[0].lower()


        # Simplified generate_kwargs to resolve potential conflicts
        generate_pipeline_kwargs_main = {"task": current_task}
        if current_language.lower() != "auto":
            generate_pipeline_kwargs_main["language"] = current_language

        main_stride_config = [current_stride_left, current_stride_right] if current_stride_left >= 0 and current_stride_right >= 0 else None

        show_log(f"📝 Starting to process {len(uploaded_files)} audio files...")
        show_log(f"   Settings -> Language: {current_language}, Task: {current_task}", prefix="  ")
        show_log(f"   Audio processing -> Chunk: {current_chunk_length}s, Overlap: {main_stride_config}", prefix="  ")

        total_audio_duration_processed = 0
        total_transcription_time = 0

        for i, (fname_original, file_content) in enumerate(uploaded_files.items()):
            show_log(f"--- [{i+1}/{len(uploaded_files)}] Starting to process file: {fname_original} ---", prefix="➡️ ")

            safe_fname = "".join(c if c.isalnum() or c in ('.', '_', '-') else '_' for c in fname_original)
            temp_audio_path = f"./{safe_fname}"

            try:
                with open(temp_audio_path, "wb") as f:
                    f.write(file_content)
                show_log(f"Temporary file written: {temp_audio_path}", prefix="  ")

                audio_duration_seconds = 0.0
                try:
                    y, sr = librosa.load(temp_audio_path, sr=16000, mono=True)
                    audio_duration_seconds = float(librosa.get_duration(y=y, sr=sr))
                    show_log(f"Audio duration: {audio_duration_seconds:.2f} seconds (resampled to 16kHz)", prefix="  ")
                    total_audio_duration_processed += audio_duration_seconds
                    del y
                    gc.collect()
                except Exception as e_librosa:
                    show_log(f"⚠️ Failed to get audio info with librosa: {e_librosa}. Will continue to attempt transcription.", prefix="  ")

                show_log(f"🤖 Transcribing with ASR Pipeline (model: {model_widget.value})...", prefix="  ")
                t_transcribe_start = time.time()

                with torch.no_grad():
                    output = asr_pipeline(
                        temp_audio_path,
                        chunk_length_s=current_chunk_length,
                        stride_length_s=main_stride_config,
                        generate_kwargs=generate_pipeline_kwargs_main,
                        return_timestamps=False,
                    )

                if 'tpu_device_acquisition_successful' in globals() and tpu_device_acquisition_successful and 'xm' in globals():
                    xm.mark_step()

                t_transcribe_elapsed = time.time() - t_transcribe_start
                total_transcription_time += t_transcribe_elapsed

                transcription_text = output["text"] if isinstance(output, dict) and "text" in output else str(output)
                show_log(f"✅ Transcription completed, took {t_transcribe_elapsed:.2f} seconds.", prefix="  ")
                if audio_duration_seconds > 0.001:
                    rtf = t_transcribe_elapsed / audio_duration_seconds
                    show_log(f"   Real-Time Factor (RTF): {rtf:.3f} (lower is faster, <1 means faster than real-time)", prefix="  ")

                preview_length = 250
                preview = transcription_text[:preview_length] + ("..." if len(transcription_text) > preview_length else "")
                print(f"\n📜 Transcript Preview (first {preview_length} characters):\n\"{preview}\"")

                base_fname_no_ext, _ = os.path.splitext(safe_fname)
                lang_suffix = current_language if current_language.lower() != "auto" else "auto"
                out_filename = f"{base_fname_no_ext}_transcript_{current_model_name}_{lang_suffix}.txt"

                save_path = f"/content/{out_filename}"
                with open(save_path, "w", encoding="utf-8") as f:
                    f.write(transcription_text)
                show_log(f"💾 Full transcript saved to (Colab file system): {save_path}", prefix="  ")

            except Exception as e_file_proc:
                show_log(f"❌ A serious error occurred while processing file {fname_original}: {e_file_proc}", prefix="  ")
                import traceback
                traceback.print_exc()
            finally:
                if os.path.exists(temp_audio_path):
                    try:
                        os.remove(temp_audio_path)
                    except Exception as e_del:
                        show_log(f"⚠️ Failed to delete temporary file {temp_audio_path}: {e_del}", prefix="  ")

                gc.collect()
                if 'tpu_device_acquisition_successful' in globals() and tpu_device_acquisition_successful and 'xm' in globals():
                    xm.wait_device_ops()
                show_log(f"--- File {fname_original} processing complete ---\n", prefix="⬅️ ")

        show_log("🎉🎉🎉 All audio files have been processed! 🎉🎉🎉", prefix="🏁 ")
        if total_audio_duration_processed > 0.001 and total_transcription_time > 0:
            overall_rtf = total_transcription_time / total_audio_duration_processed
            show_log(f"Total audio duration: {total_audio_duration_processed:.2f} seconds", prefix="📊 ")
            show_log(f"Total transcription time: {total_transcription_time:.2f} seconds", prefix="📊 ")
            show_log(f"Overall Real-Time Factor (RTF): {overall_rtf:.3f}", prefix="📊 ")
        show_log("Please go to the left 'Files' panel in Colab (folder icon) to download the *_transcript_*.txt files.", prefix="🏁 ")
        if 'tpu_device_acquisition_successful' in globals() and tpu_device_acquisition_successful and 'met' in globals() and 'xm' in globals():
            show_log(f"Final XLA metrics report:\n{met.metrics_report()}", prefix="📊 ")

    elif not uploaded_files and 'asr_pipeline' in globals() and asr_pipeline is not None:
        show_log("🤔 No files selected for transcription. If you did upload files, please check the file list.")

🚀 Starting Cell 3: Upload audio files and perform long-form transcription...
✅ ASR Pipeline is ready.

📤 Please upload audio files (mp3 / wav / m4a / ogg / flac ...)
   You can select multiple files at once.


Saving Failures.m4a to Failures.m4a
📂 Successfully uploaded 1 files.
[2025-06-05 20:08:42] 📝 Starting to process 1 audio files...
  [2025-06-05 20:08:42]    Settings -> Language: auto, Task: transcribe
  [2025-06-05 20:08:42]    Audio processing -> Chunk: 28s, Overlap: [5, 5]
➡️ [2025-06-05 20:08:42] --- [1/1] Starting to process file: Failures.m4a ---
  [2025-06-05 20:08:42] Temporary file written: ./Failures.m4a


	Deprecated as of librosa version 0.10.0.
	It will be removed in librosa version 1.0.
  y, sr_native = __audioread_load(path, offset, duration, dtype)


  [2025-06-05 20:08:46] Audio duration: 1583.49 seconds (resampled to 16kHz)
  [2025-06-05 20:08:46] 🤖 Transcribing with ASR Pipeline (model: openai/whisper-large-v3)...




  [2025-06-05 20:24:45] ✅ Transcription completed, took 958.41 seconds.
  [2025-06-05 20:24:45]    Real-Time Factor (RTF): 0.605 (lower is faster, <1 means faster than real-time)

📜 Transcript Preview (first 250 characters):
" As you see in the picture. An example of such strict serializability is etcd, which is essentially a key-value store that uses Raft consensus algorithm under the hood to store and serve requests without breaking the consistency. etcd is a critical c..."
  [2025-06-05 20:24:45] 💾 Full transcript saved to (Colab file system): /content/Failures_transcript_large-v3_auto.txt
⬅️ [2025-06-05 20:24:45] --- File Failures.m4a processing complete ---

🏁 [2025-06-05 20:24:45] 🎉🎉🎉 All audio files have been processed! 🎉🎉🎉
📊 [2025-06-05 20:24:45] Total audio duration: 1583.49 seconds
📊 [2025-06-05 20:24:45] Total transcription time: 958.41 seconds
📊 [2025-06-05 20:24:45] Overall Real-Time Factor (RTF): 0.605
🏁 [2025-06-05 20:24:45] Please go to the left 'Files' panel in Colab (