# Individually identifying songs of Great Tits (*Parus major*)

The Great Tit (*Parus major*) is known for its varied song repertoire. Seventy song types are known, with each individual's repertoire including up to eight song types. 

<center><img src="https://i.guim.co.uk/img/media/b98326a736e0c4e5d88846102bef16414b6450a5/0_0_4960_2976/master/4960.jpg?width=1200&height=900&quality=85&auto=format&fit=crop&s=b693c659d0b233d394e2fa3e28863701" alt="Great Tit (Parus major) singing" width="300" /></center>

In this exercise we will be working with the [Wytham Great Tit Song Dataset](https://nilomr.github.io/great-tit-hits/), which includes a large number of Great Tit songs recorded in the wild. Each burst of song is labeled with the individual that sang it and with its song type.

We will inspect a handful of examples from each individual, with all examples belonging to the same song type for that individual. Using a pretrained convolutional neural network, we will generate feature embeddings for the spectrogram of each song, then use t-SNE to project these embeddings into two dimensions. We will see that the convolutional neural network picks up on enough distinctive features of each song that songs from the same individual cluster together well.

## Import packages

This notebook should be run in an environment containing OpenSoundscape (see instructions for your system [here](https://opensoundscape.readthedocs.io/en/latest/index.html)) and `tensorflow-cpu`.

Alternatively, you may install them on a Google Colab notebook using the cell below:

In [None]:
# if this is a Google Colab notebook, install opensoundscape in the runtime environment
if 'google.colab' in str(get_ipython()):
  %pip install opensoundscape==0.10.1
  %pip install tensorflow-cpu

The following cell may take a while...!

In [None]:
import pandas as pd
import json
from pathlib import Path
from opensoundscape.audio import Audio
from opensoundscape.spectrogram import Spectrogram
import torch

## Download files

These files are located in our Google Drive folder here: https://drive.google.com/drive/u/0/folders/1-XOEcKHOSiDSIlSHElOeJY5Jza4Hc-7v. You have likely already downloaded them if you followed the steps in the README.

They should be downloaded and placed in the `data` folder of this repository.

In [None]:
wav_folder = "../data/Merino-Recalde-Wytham-Great-Tit-Dataset-2023-subset/wytham_songs/"
csv_path = "../data/Merino-Recalde-Wytham-Great-Tit-Dataset-2023-subset/wytham_annotations.csv"

## Load data

Start with a list of wavfiles.

In [None]:
# Wavfiles from subset of the Wytham Great Tit dataset
wavfiles = list(Path(wav_folder).glob("*.wav"))
wavfiles = sorted(wavfiles)

Now, get the annotations for these wavfiles and check out what this dataframe looks like.

In [None]:
annots_df = pd.read_csv(csv_path, index_col=0)
annots_df.head()

The `ID` column contains individual IDs. How many individual IDs are in this dataset?

In [None]:
print("Number of individuals:", len(annots_df.ID.unique()))

The `class_id` column contains the class (song type) of the song. Note that this subset of data was specifically chosen to have only one `class_id` per individual (see the end of this notebook for more discussion on that)!

If you download the original Great Tit dataset (https://osf.io/n8ac9/) and want to reproduce this subsetting, only looking at songs from an individual that are of the same song type as each other, you could use the following code to produce a dataframe of the most common song type per individual:
```
subset_df = annots_df.value_counts(["ID", "class_id"]).reset_index().drop_duplicates("ID")
```
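
To see why that one-liner picks the most common song type, here is a minimal sketch on a made-up table. The names `toy_annots`, `toy_subset`, and `toy_filtered` and all the values in them are hypothetical, not taken from the dataset. The final `merge` step is one possible way to keep only the matching annotations, not necessarily how the subset used here was built.

```python
import pandas as pd

# Hypothetical toy annotations: three individuals, some with several song types
toy_annots = pd.DataFrame(
    {
        "ID": ["A", "A", "A", "B", "B", "B", "B", "C"],
        "class_id": ["A_0", "A_0", "A_1", "B_2", "B_1", "B_1", "B_1", "C_0"],
    }
)

# value_counts sorts (ID, class_id) pairs by count, most frequent first;
# drop_duplicates then keeps the first (most common) pair for each ID
toy_subset = (
    toy_annots.value_counts(["ID", "class_id"])
    .reset_index()
    .drop_duplicates("ID")
)

# Keep only the annotations whose (ID, class_id) pair was selected
toy_filtered = toy_annots.merge(
    toy_subset[["ID", "class_id"]], on=["ID", "class_id"]
)
print(toy_filtered)
```

This works because `value_counts` returns rows in descending order of count, so the first row seen for each `ID` is that individual's most frequent song type.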


Let's look at a subsample of songs from the 20 most common song types in our dataset.

In [None]:
# Get the 20 most common song types
top_song_types = annots_df.value_counts(["class_id"]).reset_index()[:20]

# Get all annotations for these songtypes
all_annots_songtypes = annots_df[annots_df.class_id.isin(top_song_types.class_id.values)]

# Randomly sample 30 recordings per song type
test_songs = all_annots_songtypes.groupby("class_id").sample(30, random_state=3)

# Make sure there are 30 per song type
test_songs.class_id.value_counts()

Let's visualize an example using OpenSoundscape:

In [None]:
# Get a single song ID
idx = 1
song_id = test_songs.index[idx]
annots = test_songs.loc[song_id]
wavfile = Path(wav_folder + song_id + ".wav")

# Load the audio of this song
individ_audio = Audio.from_file(wavfile)

# View and listen to the song
Spectrogram.from_audio(individ_audio).plot()
individ_audio.show_widget()

There's a lot of noise in the recording. One way we could explore noise reduction is to bandpass the file with the annotated frequency limits...

In [None]:
individ_audio = Audio.from_file(wavfile).bandpass(annots.lower_freq, annots.upper_freq, order=4)
Spectrogram.from_audio(individ_audio).plot()

But for simplicity, in this notebook, we will just use the raw audio.

## Generate embeddings

Now let's load a pretrained model that we can use to generate unsupervised feature embeddings for the recordings.

In [None]:
import torch
m = torch.hub.load('kitzeslab/bioacoustics-model-zoo', 'BirdNET',trust_repo=True)

To use this model, we need to describe the clips to embed in a pandas DataFrame, with one row per clip:
* `file`: the path of each recording to predict on
* `start_time` and `end_time`: the start and end of the clip within that file, in seconds.
    * Generally, clips should be 3s long for this model.
    * If you are working with long audio files and want to generate embeddings from the whole audio file, you will want to include many rows for one `file`, each with a different `start_time` and `end_time`.
    * You can generate embeddings with overlap (e.g. a clip from 0s to 3s, a clip from 1s to 4s, a clip from 2s to 5s, etc.) by creating one row per clip.
 
Note that many of the clips in this dataset are not near 3s long, so this is just a first approximation.
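
For a longer recording, the overlapping-clip layout described above could be built along these lines. This is only a sketch: the helper `overlapping_clip_index`, the file name `long_recording.wav`, and the 6s duration are hypothetical, and the only thing assumed about OpenSoundscape is the `(file, start_time, end_time)` index used elsewhere in this notebook.

```python
import numpy as np
import pandas as pd


def overlapping_clip_index(file, duration, clip_length=3.0, step=1.0):
    """Hypothetical helper: one row per clip, indexed by (file, start_time, end_time)."""
    # Start times are `step` seconds apart; keep only clips that fit inside the file
    starts = np.arange(0, duration - clip_length + 1e-9, step)
    clips = pd.DataFrame(
        {"file": file, "start_time": starts, "end_time": starts + clip_length}
    )
    return clips.set_index(["file", "start_time", "end_time"])


# A made-up 6-second recording, cut into 3s clips that start every 1s
clip_df = overlapping_clip_index("long_recording.wav", duration=6.0)
print(clip_df.index.tolist())
```

Setting `step` equal to `clip_length` gives non-overlapping clips instead.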

In [None]:
test_wavfiles = [wav_folder + song_id + ".wav" for song_id in test_songs.index]

# Create a formatted dataframe as required by OpenSoundscape
wavfiles_df = pd.DataFrame(test_wavfiles)
wavfiles_df.columns = ["file"]
wavfiles_df["start_time"] = 0
wavfiles_df["end_time"] = 3

# Need to set these column names as the index for OpenSoundscape
wavfiles_df = wavfiles_df.set_index(["file", "start_time", "end_time"])

Use the model to generate embeddings for all of the recordings. 

This cell will take a little while to run; on the author's system, it took about 40 seconds to run (6-7 seconds per 100 files).

In [None]:
%%time
%%capture --no-stdout
# This will raise many warnings due to the differing 
# lengths of the wavfiles, so in this cell we catch all output
embeddings = m.generate_embeddings(wavfiles_df)

Use t-SNE to reduce the dimensionality of the embeddings. t-SNE is a stochastic, iterative method for projecting high-dimensional data into a low-dimensional space. Parameters you can modify:
* You can change the `random_state` below to initialize t-SNE at a different random starting point.
* You can also increase the number of iterations (`n_iter`, renamed `max_iter` in scikit-learn 1.5 and later), which generally improves the tightness of clusters.
* You can change the `perplexity`, but since we know there are 30 datapoints per class, 30 is probably a good number. In your own applications, if you're not sure how many datapoints there are per class, you can try different numbers to see which produces the best results.
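
Trying several perplexities could look like the sketch below. It runs on synthetic data rather than the real embeddings: `fake_embeddings`, the three made-up classes, and the perplexity values are all assumptions chosen for illustration. The iteration count is left at the scikit-learn default so the snippet does not depend on which name that parameter has in your version.

```python
import numpy as np
from sklearn.manifold import TSNE

# Synthetic stand-in for embeddings: 3 classes x 20 points in 16 dimensions
rng = np.random.default_rng(0)
centers = rng.normal(scale=5.0, size=(3, 16))
fake_embeddings = np.vstack([c + rng.normal(size=(20, 16)) for c in centers])

# Perplexity must be smaller than the number of samples (60 here)
results = {}
for perplexity in (5, 20, 40):
    tsne = TSNE(n_components=2, random_state=42, perplexity=perplexity)
    results[perplexity] = tsne.fit_transform(fake_embeddings)
    print(perplexity, results[perplexity].shape)
```

Each entry of `results` can then be plotted side by side to compare how tightly the classes group at each setting.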

In [None]:
from sklearn.manifold import TSNE

tsne = TSNE(n_components=2, random_state=42, perplexity=30, n_iter=5000)
embeddings_tsne = tsne.fit_transform(embeddings)

Plot the dimensionally reduced embeddings, colored by the ID of each individual:

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.pyplot import cm
fig, ax = plt.subplots()

individual_ids = test_songs.ID
color_map = iter(cm.viridis(np.linspace(0, 1, len(individual_ids.unique()))))
for i, individual_idx in enumerate(np.unique(individual_ids)):
    color = next(color_map)
    embeddings_individual = embeddings_tsne[individual_ids == individual_idx]
    ax.scatter(embeddings_individual[:, 0], y=embeddings_individual[:, 1], label=individual_idx, color=color, alpha=0.5)
ax.legend(ncols=2, bbox_to_anchor=(1, 1), title="Individual ID")
plt.title("t-SNE dimensionality reduction of embeddings of\n songs of individual Great Tits (Parus major)")
plt.show()

This looks fantastic, though not perfect!
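
One way to put a rough number on "fantastic, though not perfect" is to check how often a point's nearest neighbor belongs to the same individual. The sketch below is not part of the original analysis: `leave_one_out_1nn_accuracy` is a hypothetical helper, and the toy points and labels are made up.

```python
import numpy as np


def leave_one_out_1nn_accuracy(points, labels):
    """Fraction of points whose nearest other point has the same label."""
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    # Squared Euclidean distances via |x|^2 + |y|^2 - 2 x.y (memory-friendly)
    sq_norms = (points ** 2).sum(axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2 * points @ points.T
    # Exclude each point from being its own nearest neighbor
    np.fill_diagonal(sq_dists, np.inf)
    nearest = sq_dists.argmin(axis=1)
    return float((labels[nearest] == labels).mean())


# Toy example: two separated groups, plus one "b" point lying closer to group "a"
toy_points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [3, 3]]
toy_labels = ["a", "a", "a", "b", "b", "b"]
toy_accuracy = leave_one_out_1nn_accuracy(toy_points, toy_labels)
print(toy_accuracy)
```

In this notebook, the same function could be called as `leave_one_out_1nn_accuracy(embeddings_tsne, test_songs.ID.values)`, assuming the rows of `embeddings_tsne` are in the same order as `test_songs`.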

Let's see what each individual's songs actually look like!

In [None]:
songs_plot_per_individual = 4
# One row of spectrograms per individual, one column per example song
fig, axs = plt.subplots(test_songs.ID.nunique(), songs_plot_per_individual, figsize=(10,20))
for individual_idx, (individual_id, df) in enumerate(test_songs.groupby(["ID"])):
    for song_idx, song_id in enumerate(df.index[:songs_plot_per_individual]):
        # Get the axis in the correct position
        _ = plt.sca(axs[individual_idx, song_idx])
        ax = plt.gca()
        
        # Get the spectrogram and plot it
        s = Spectrogram.from_audio(Audio.from_file(wav_folder + song_id + ".wav")).bandpass(2000,7000).spectrogram
        _ = plt.imshow(s, aspect=0.3, cmap="Greys")
        ax.invert_yaxis() # When using imshow, have to invert this
        
        # Nice formatting
        # Hide X and Y axes label marks
        ax.xaxis.set_tick_params(labelbottom=False)
        ax.yaxis.set_tick_params(labelleft=False)
        # Hide X and Y axes tick marks
        ax.set_xticks([])
        ax.set_yticks([])
        if song_idx == 0:
            ax.set_title(individual_id[0])
plt.tight_layout()

# Potential next steps

## Improving clustering

One of the biggest challenges of clustering the sounds of wild animals is that other noises overlapping the sound can affect clustering performance. Some options for noise reduction include:
* Modifying the pipeline to bandpass the recordings to the annotated range
* Applying noise reduction using the Python `noisereduce` package (doesn't work as well on short recordings) or the [Google MixIT bird separation model](https://github.com/kitzeslab/bioacoustics-model-zoo?tab=readme-ov-file#mixit-bird-separationmodel) implemented in OpenSoundscape

## Inspecting similar song types across individuals

The subset of recordings includes the song type most commonly sung by each of 42 individuals in "part 1" of the Great Tit dataset. Over 70 different song types are known, with an individual singing up to eight different song types. While "song types" are each very different from each other, two different individuals singing the same song type might actually sound quite similar.

A future approach could be to select a song type that many individuals use, and see if individuals can still be identified by the particular quirks in how they sing that song type!