<a href="https://colab.research.google.com/github/tanyaclement/audio-class/blob/main/pyannote.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

[`pyannote.audio`](https://github.com/pyannote/pyannote-audio) is an open-source toolkit written in Python for **speaker diarization**.

Based on the [`PyTorch`](https://pytorch.org) machine learning framework, it provides a set of trainable end-to-end neural building blocks that can be combined and jointly optimized to build speaker diarization pipelines.

`pyannote.audio` also comes with pretrained [models](https://huggingface.co/models?other=pyannote-audio-model) and [pipelines](https://huggingface.co/models?other=pyannote-audio-pipeline) covering a wide range of domains for voice activity detection, speaker segmentation, overlapped speech detection, and speaker embedding, reaching state-of-the-art performance for most of them.

**This notebook will teach you how to apply those pretrained pipelines on your own data.**

Make sure you run it on a GPU; otherwise, it may be slow.

## Installation

In [None]:
!pip install -qq pyannote.audio==3.1.1
!pip install -qq ipython==7.34.0

This nice visualization is brought to you by [`pyannote.core`](http://pyannote.github.io/pyannote-core/) and basically indicates when each speaker speaks.

In [None]:
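from pyannote.audio import Audio
from IPython.display import Audio as IPythonAudio

# DEMO_FILE and EXCERPT are not defined in this section; they must be set in an earlier cell
# load the excerpt as a mono waveform and play it
waveform, sr = Audio(mono="downmix").crop(DEMO_FILE, EXCERPT)
IPythonAudio(waveform.flatten(), rate=sr)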

# Processing your own audio file (optional)

If you just want to go ahead with the demo file, skip this section entirely.

If you want to process your own audio file, run this section. It will prompt you to upload an audio file (preferably a `wav` file, but all formats supported by [`SoundFile`](https://pysoundfile.readthedocs.io/en/latest/) should work just fine).

## Upload audio file

In [None]:
from pyannote.core import Segment, notebook
import google.colab

# upload a file; the returned dict maps each file name to its content
own_file, _ = google.colab.files.upload().popitem()
OWN_FILE = {'audio': own_file}
notebook.reset()  # reset the pyannote.core notebook visualization settings

# load audio waveform and play it
# (Audio and IPythonAudio are imported in an earlier cell)
waveform, sample_rate = Audio(mono="downmix")(OWN_FILE)
IPythonAudio(data=waveform.squeeze(), rate=sample_rate, autoplay=True)
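
Since `wav` is the preferred format, it can help to check a file's basic properties before running the pipeline. The following is a minimal sketch using only Python's standard `wave` module, which reads uncompressed PCM WAV files only. The helper names `describe_wav` and `write_silence` are hypothetical, and the silent file is a stand-in fixture, not data from this notebook.

```python
import os
import tempfile
import wave


def describe_wav(path):
    """Return (channels, sample_rate, duration_in_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        sample_rate = wav.getframerate()
        duration = wav.getnframes() / sample_rate
    return channels, sample_rate, duration


def write_silence(path, seconds=1, sample_rate=16000):
    """Write a mono, 16-bit silent WAV file (hypothetical stand-in for real audio)."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * sample_rate * seconds)


# Create a one-second silent file in a temporary directory and inspect it
demo_path = os.path.join(tempfile.mkdtemp(), "silence_demo.wav")
write_silence(demo_path)
print(describe_wav(demo_path))
```

With an uploaded file, the same check would be `describe_wav(own_file)`, assuming the upload is a PCM WAV file; other formats will raise `wave.Error`.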

# Speaker diarization with `pyannote.pipeline`

We are about to run a full speaker diarization pipeline that includes speaker segmentation, speaker embedding, and a final clustering step. **Brace yourself!**

To load the speaker diarization pipeline,

* accept the user conditions on [hf.co/pyannote/speaker-diarization-3.1](https://hf.co/pyannote/speaker-diarization-3.1)
* accept the user conditions on [hf.co/pyannote/segmentation-3.0](https://hf.co/pyannote/segmentation-3.0)
* log in using `notebook_login` below

In [None]:
from huggingface_hub import notebook_login
notebook_login()
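
The loading cell in this notebook passes the access token as a string literal, which is easy to leak when a notebook is shared. One possible alternative, sketched here, is to read the token from an environment variable. The helper name `resolve_hf_token` is hypothetical, and using `HF_TOKEN` as the variable name is an assumption about how your environment is set up.

```python
import os


def resolve_hf_token(env_var="HF_TOKEN"):
    """Return the token stored in `env_var`, or None if it is unset or blank."""
    token = os.environ.get(env_var, "").strip()
    return token or None


# Example: look up the token without printing its value
token = resolve_hf_token()
print("token found" if token else "no token in environment")
```

The result could then be passed as `use_auth_token=resolve_hf_token()`. Whether a value of `None` falls back to the login cached by `notebook_login` depends on the installed `huggingface_hub` version, so treat that fallback as an assumption to verify.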

In [None]:
# load pretrained pipeline
from pyannote.audio import Pipeline

# number of speakers expected in the recording
num_speakers = 2  # replace with the number of speakers in your audio

# replace "ADD-OWN-TOKEN" with your own Hugging Face access token
pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
    use_auth_token="ADD-OWN-TOKEN"
)

# stop early with a clear message if the pipeline was not loaded
if pipeline is None:
    raise ValueError("Pipeline failed to load. Check your internet connection, your authentication token, and that you accepted the user conditions listed above.")

# send pipeline to GPU (when available)
import torch
if torch.cuda.is_available():
    pipeline.to(torch.device("cuda"))

# run the pipeline (with progress bar and num_speakers)
# to process the file you uploaded, pass OWN_FILE instead of DEMO_FILE
from pyannote.audio.pipelines.utils.hook import ProgressHook
with ProgressHook() as hook:
    diarization = pipeline(DEMO_FILE, hook=hook, num_speakers=num_speakers)


That's it? Yes, that's it :-)

In [None]:
diarization
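
To read the result as text rather than as a plot, each speech turn can be printed with human-readable timestamps. The sketch below formats seconds as `HH:MM:SS.mmm`. The helper name `format_timestamp` and the sample turns are hypothetical; the loop works on plain `(start, end, label)` tuples so that it runs without `pyannote`.

```python
def format_timestamp(seconds):
    """Format a time in seconds as HH:MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


# Hypothetical speech turns: (start in seconds, end in seconds, speaker label)
sample_turns = [(0.25, 4.5, "SPEAKER_00"), (4.5, 3725.5, "SPEAKER_01")]

for start, end, label in sample_turns:
    print(f"{format_timestamp(start)} - {format_timestamp(end)}  {label}")
```

With the real output, the tuples would come from `diarization.itertracks(yield_label=True)`, using `segment.start` and `segment.end` for the two times.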

In [None]:

# Save the diarization as a tab-delimited text file
def diarization_to_tsv(diarization, file_name='diarization_results.txt'):
    """Writes a pyannote.core.Annotation object to a tab-delimited text file.

    Each line holds the start time, end time, and speaker label of one
    speech turn. Times are in seconds, rounded to whole numbers.

    Args:
        diarization: A pyannote.core.Annotation object containing the diarization results.
        file_name: The name of the output file. Defaults to 'diarization_results.txt'.

    Returns:
        None. The results are written to `file_name`.
    """
    with open(file_name, 'w') as f:
        for segment, _, label in diarization.itertracks(yield_label=True):
            start_time = round(segment.start)  # whole seconds
            end_time = round(segment.end)  # whole seconds
            f.write(f"{start_time}\t{end_time}\t{label}\n")

# Save the diarization results to a tab-delimited file
diarization_to_tsv(diarization)

# Download the file in Google Colab
from google.colab import files
files.download('diarization_results.txt')
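
Once the tab-delimited file is saved, it can be read back for simple analysis. The following sketch sums the speaking time per speaker using only the standard library. It assumes the three-column layout written above (start, end, speaker label); the helper name `speaking_time_per_speaker` and the sample rows are hypothetical, not actual diarization output.

```python
import csv
import os
import tempfile
from collections import defaultdict


def speaking_time_per_speaker(tsv_path):
    """Sum (end - start) per speaker from a start/end/label tab-delimited file."""
    totals = defaultdict(float)
    with open(tsv_path, newline="") as f:
        for start, end, speaker in csv.reader(f, delimiter="\t"):
            totals[speaker] += float(end) - float(start)
    return dict(totals)


# Hypothetical rows in the same layout as the saved results file
sample_path = os.path.join(tempfile.mkdtemp(), "sample_results.txt")
with open(sample_path, "w") as f:
    f.write("0\t4\tSPEAKER_00\n")
    f.write("4\t9\tSPEAKER_01\n")
    f.write("9\t12\tSPEAKER_00\n")

print(speaking_time_per_speaker(sample_path))
```

Because the saved times are rounded to whole seconds, totals computed this way are approximate, and turns shorter than half a second may contribute zero.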