# Machine learning: prediction

Algorithms trained with machine learning can predict whether bioacoustic recordings contain a sound of interest. For instance, an algorithm trained to detect the sound of a Wood Thrush can be used to predict where Wood Thrushes vocalize in a set of autonomous recordings. 

Note: 
- The Kitzes Lab, the developers of OpenSoundscape, pre-trained a series of [baseline machine learning models](https://pitt.box.com/s/a6jeamnew098vp5a9a7m1h9j5rce6t6y) that can be used to predict the presence of [506 species of common North American birds](https://pitt.app.box.com/s/d0snd1tyilscksbxc36q2slz6s4aa2ag). These are our "beta" models and are for demonstration purposes only, not for research use. We hope to make our more accurate models available soon. 
- To use these machine learning models for research, contact the [Kitzes Lab](https://kitzeslab.org).
- There are example models provided by the Kitzes lab here:


## Resample the data
The recordings should be resampled to 44.1 kHz.
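
Before resampling, it can help to check which files already have the target rate. The following is a minimal sketch using only Python's standard `wave` module; the helper names `get_sample_rate` and `needs_resampling` and the demo file are hypothetical, and the resampling itself would still be done with an audio library of your choice.

```python
import os
import tempfile
import wave

TARGET_RATE = 44100  # Hz, the target rate named above


def get_sample_rate(path):
    """Return the sample rate (Hz) stored in a WAV file's header."""
    with wave.open(str(path), "rb") as wav_file:
        return wav_file.getframerate()


def needs_resampling(path, target_rate=TARGET_RATE):
    """Return True if the file's sample rate differs from the target rate."""
    return get_sample_rate(path) != target_rate


# Demonstration on a hypothetical one-second silent file at 22.05 kHz
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "demo_22050.wav")
with wave.open(demo_path, "wb") as wav_file:
    wav_file.setnchannels(1)      # mono
    wav_file.setsampwidth(2)      # 16-bit samples
    wav_file.setframerate(22050)  # deliberately not the target rate
    wav_file.writeframes(b"\x00\x00" * 22050)

print(get_sample_rate(demo_path), needs_resampling(demo_path))
```

Only files for which `needs_resampling` returns `True` would need to be converted.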

## Import modules
Import the following modules to run a pre-trained machine learning classifier. First, from OpenSoundscape we will need two classes (`Audio` and `SingleTargetAudioDataset`) and three functions (`run_command`, `lowercase_annotations`, and `predict`).

In [1]:
from opensoundscape.audio import Audio
from opensoundscape.datasets import SingleTargetAudioDataset
from opensoundscape.helpers import run_command
from opensoundscape.raven import lowercase_annotations
from opensoundscape.torch.predict import predict

Import the following machine learning-related modules. OpenSoundscape uses PyTorch to do machine learning.

In [2]:
import torch
import torch.nn
import torchvision.models
import torch.utils.data
import torchvision.transforms

Lastly, import a few miscellaneous modules and functions.

In [3]:
import yaml
import os.path
import pandas as pd
from pathlib import Path
from math import floor

## Download model

Before the model can be used, it must be downloaded onto your computer and loaded with the same specifications it was created with.

Download the example model for Wood Thrush, *Hylocichla mustelina*. First, create a folder called `"prediction_example"` to store the model and its data in.

To do: 
- Access the trained model

In [4]:
folder_name = "prediction_example"
folder_path = Path(folder_name)
folder_path.mkdir(exist_ok=True)  # create the folder only if it does not exist yet

## Load model
At its core, a machine learning model consists of two things: its architecture and its weights. 

The architecture is the structure of the model, which in this case is a convolutional neural network (CNN). Convolutional neural networks are a family of algorithms especially suited to extracting and interpreting features from images, such as combinations of lines, dots, and edges. Here, we use a `resnet18` convolutional neural network. After feature extraction, the features from the convolutional neural network are passed to a classifier. The classifier decides how to weight each feature when predicting the final class identity. The model was trained with a `Linear` classifier.

Create the architecture of the model. First, designate the model as a `resnet18` CNN.

In [5]:
model = torchvision.models.resnet18(pretrained=False)

Then, add the `fc` layers. "FC" stands for "fully connected". To set up the proper architecture, we need to specify the correct number of input features, output features, and classifier type. 

The number of input features to the FC layer is equal to the number of features extracted by the convolutional neural network and passed to the FC layer: `model.fc.in_features`

In [6]:
num_cnn_features = model.fc.in_features

The models were trained to predict two classes (bird present and bird absent), so the number of output features of the FC layer is 2.

In [7]:
num_classes = 2

Finally, the classifier type is a `torch.nn.Linear` classifier.

In [8]:
model.fc = torch.nn.Linear(
    in_features = num_cnn_features,
    out_features = num_classes)
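
Conceptually, a linear classifier computes one weighted sum of the input features per output class and then adds a bias. The pure-Python sketch below shows that arithmetic with made-up numbers; the `linear` helper and every value in it are illustrative assumptions, not part of PyTorch or OpenSoundscape. In the real layer the weights and biases are learned, and for `resnet18` there are 512 input features rather than 3.

```python
def linear(features, weights, biases):
    """Return one score per class: the dot product of each weight row
    with the features, plus that class's bias."""
    return [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, biases)
    ]


# Hypothetical example: 3 input features, 2 output classes
features = [1.0, 2.0, 3.0]
weights = [
    [0.5, 0.0, -1.0],  # weights for class 0
    [1.0, 1.0, 1.0],   # weights for class 1
]
biases = [0.0, -1.0]

scores = linear(features, weights, biases)
print(scores)
```

The class with the larger score is the one the classifier favors for this input.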

The weights of the model are distinguished from its architecture because, while the architecture is decided by humans, the weights of the architecture are learned during the machine learning process. When downloading the machine learning model, you downloaded the weights. 

First, use `torch.load` to read the saved checkpoint from its file, then take the model weights from the checkpoint's `model_state_dict` entry.

In [9]:
epoch9 = torch.load('/home/e4e/Desktop/model_train_results/epoch-9.tar')
weights = epoch9['model_state_dict'] # the last epoch


Check the type of the loaded weights. Like the example weights, they should be an ordered dictionary whose keys name the model's parameters.

In [10]:
print('type(weights)', type(weights))


type(weights) <class 'collections.OrderedDict'>
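
To compare the keys of the loaded weights with the keys the architecture expects, a simple set comparison is enough. The sketch below is a hedged illustration using plain Python containers; `compare_keys` and the key names are hypothetical stand-ins. With the real objects, the two key lists would come from `model.state_dict().keys()` and `weights.keys()`.

```python
from collections import OrderedDict


def compare_keys(model_keys, checkpoint_keys):
    """Return (missing, unexpected): keys the model expects but the
    checkpoint lacks, and keys the checkpoint has but the model does not."""
    model_set = set(model_keys)
    checkpoint_set = set(checkpoint_keys)
    missing = [k for k in model_keys if k not in checkpoint_set]
    unexpected = [k for k in checkpoint_keys if k not in model_set]
    return missing, unexpected


# Hypothetical stand-ins for the architecture's keys and the loaded weights
architecture_keys = ["conv1.weight", "fc.weight", "fc.bias"]
checkpoint = OrderedDict([("conv1.weight", None), ("fc.weight", None)])

missing, unexpected = compare_keys(architecture_keys, list(checkpoint))
print("missing:", missing)
print("unexpected:", unexpected)
```

Two empty lists would mean the weights fit the architecture key for key.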


Now, load the weights into the architecture we have created. After this the model is almost ready to use.

In [11]:
model.load_state_dict(weights)

<All keys matched successfully>

## Prepare prediction files

To actually use the model, we need to download and prepare a set of recordings. The model was trained to make predictions on spectrograms made from 10-second-long recordings (from the 'train-xenocanto-binary-bird-classifier' file), so we will have to split the recordings up and transform them into spectrograms.

As example data, we have provided a 1 minute-long soundscape which contains Wood Thrush vocalizations. 

For reference, see the [OpenSoundscape prediction notebook](https://github.com/kitzeslab/opensoundscape/blob/master/docs/predict.ipynb).

The example soundscape must be split up into clips of the same length as the ones the model was trained on. In this case, the clips should be 10 s long.
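
The arithmetic behind the split can be sketched in a few lines. This is a minimal illustration, not OpenSoundscape's implementation: the helper `clip_times` is hypothetical, and it assumes that a trailing partial clip is simply dropped.

```python
from math import floor


def clip_times(total_duration, clip_length):
    """Return (start, end) times in seconds for consecutive full-length
    clips. A trailing partial clip is dropped (an assumption of this sketch)."""
    n_clips = floor(total_duration / clip_length)
    return [(i * clip_length, (i + 1) * clip_length) for i in range(n_clips)]


# A 1-minute recording split into 10 s clips
times = clip_times(60, 10)
print(len(times), times[0], times[-1])
```

A 25 s recording would give only two 10 s clips under this assumption, with the last 5 s left out.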

In [12]:
# An absolute path, so it does not need to be joined onto folder_path
data_directory = Path("/home/e4e/Desktop/binary_bird_small_testset/original_data")
print(data_directory)

/home/e4e/Desktop/binary_bird_small_testset/original_data


Now, build a list of the full path to each file. If you are on a Mac, first delete any `.DS_Store` files with the following bash command: `find . -name ".DS_Store" -delete`.

In [13]:
all_file_dir = []
all_file_names = [] # get rid of the file extension
for filename in os.listdir(data_directory): 
    all_file_names.append(os.path.splitext(filename)[0])
    filepath = os.path.join(data_directory, filename)
    all_file_dir.append(filepath)
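
If the data directory also holds non-audio files, such as `.csv` logs or `.DS_Store`, one option is to keep only the audio files when listing it. The sketch below uses `pathlib` and assumes all recordings have a `.wav` extension; the helper `list_audio_files` and the demo file names are hypothetical.

```python
import tempfile
from pathlib import Path


def list_audio_files(directory, suffix=".wav"):
    """Return the sorted paths of files in `directory` whose extension
    matches `suffix`, ignoring case."""
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() == suffix
    )


# Demonstration in a temporary directory with hypothetical file names
demo_dir = Path(tempfile.mkdtemp())
for name in ["a.wav", "b.WAV", "a_clip_log.csv", ".DS_Store"]:
    (demo_dir / name).touch()

audio_paths = list_audio_files(demo_dir)
audio_stems = [p.stem for p in audio_paths]  # names without the extension
print(audio_stems)
```

The resulting paths and stems could stand in for `all_file_dir` and `all_file_names`.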


Check the first few file paths and filenames.

In [14]:
print(all_file_dir[:10])
print(all_file_names[:10])

['/home/e4e/Desktop/binary_bird_small_testset/original_data/_8MvhMlbwiE_40.000_clip_log.csv', '/home/e4e/Desktop/binary_bird_small_testset/original_data/XC22193 - Red-necked Woodpecker - Campephilus rubricollis_5s-10s.wav', '/home/e4e/Desktop/binary_bird_small_testset/original_data/XC22099 - White-bellied Tody-Tyrant - Hemitriccus griseipectus_5s-10s.wav', '/home/e4e/Desktop/binary_bird_small_testset/original_data/XC22035 - White-bellied Tody-Tyrant - Hemitriccus griseipectus_clip_log.csv', '/home/e4e/Desktop/binary_bird_small_testset/original_data/XC22211 - White-bellied Tody-Tyrant - Hemitriccus griseipectus_5s-10s.wav', '/home/e4e/Desktop/binary_bird_small_testset/original_data/_8MvhMlbwiE_40.000.wav', '/home/e4e/Desktop/binary_bird_small_testset/original_data/_6spzSHLoY0_30.000.wav', '/home/e4e/Desktop/binary_bird_small_testset/original_data/XC22211 - White-bellied Tody-Tyrant - Hemitriccus griseipectus_20s-25s.wav', '/home/e4e/Desktop/binary_bird_small_testset/original_data/XC2201

Next, load each file as an `Audio` object and split the recording into segments of a specified length. Here, we split the recordings into 10 s clips, specify the directory the clips should be saved in (`split_directory`, defined in the next cell), and specify the name prefix for the output files.

In [15]:
split_directory = "/home/e4e/Desktop/binary_bird_small_testset/split_data_files"

In [16]:
for file_path, file_name in zip(all_file_dir, all_file_names):
    base_file = Audio.from_file(file_path)

    base_file.split_and_save(
        clip_length=10,  # seconds; the clip length the model was trained on
        destination=split_directory,
        name=file_name,
        dry=True  # do not save the split wav files
    )

NoBackendError: 

Note that `split_and_save` outputs one .csv file for each input file. Combine all of these .csv files into a single dataframe:

In [None]:
import glob

os.chdir(split_directory)

# Collect every per-file .csv, skipping a previously written combined file
csv_filenames = [f for f in glob.glob('*.csv') if not f.endswith('combined_csv.csv')]

# Concatenate the files, using the "file" column as the index
combined_csv = pd.concat(
    [pd.read_csv(f, index_col='file') for f in csv_filenames],
    ignore_index=False
)

print(combined_csv.columns)
print(combined_csv.index)  # "file" is the index here, not a column

# Export to csv
combined_csv.to_csv("combined_csv.csv", index=True)


In [None]:
print(combined_csv)

The combined dataframe identifies the filename of each split and its start and end times:

In [None]:
combined_csv[:10]
#print(combined_csv['file_path'])

In [None]:
combined_csv.head()

Prepend the "file" index with the directory in which the files are found so that the computer can refer to the files:

In [None]:
print(combined_csv.index[:10])
print(combined_csv[:10])

In [None]:
# Inspect the index before building the full paths
print(type(combined_csv.index))
print(combined_csv.index)  # need to change into chronological order
print(type(combined_csv))
print(combined_csv)

# split_directory is a string, so wrap it in Path before joining
combined_csv['file_path'] = [Path(split_directory).joinpath(name) for name in combined_csv.index]


## Create a Dataset

Now that the data are split, we can create a "dataset" from them. 

Creating a dataset requires a dictionary that associates numeric labels with the class names: 1 is for predicting a bird's presence; 0 is for predicting a bird's absence. In more recent versions of the model, this label dictionary is packaged with the model download.

In [None]:
label_dict = {0:'bird-absent', 1:'bird-present'}

Now we create a `SingleTargetAudioDataset`. This structure enables the data to be transformed into spectrograms when `test_dataset` is accessed.

In [None]:
test_dataset = SingleTargetAudioDataset(
    combined_csv,
    filename_column = "file_path",
    label_dict = label_dict,
    save_dir=split_directory,
)

The `test_dataset` behaves like a list of dictionaries. Each element contains a dictionary for one of the files to predict on.

In [None]:
len(test_dataset)

Each dictionary in `test_dataset` has one or two keys. In all cases, the dictionary has a key `'X'` which refers to the spectrogram. If a dataset is created with true labels, the dictionary also has a `'y'` key which links to the true label. Because it is unknown which of these files contain Wood Thrush songs, no true labels were given when creating the dataset.
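
The list-of-dictionaries behavior described above can be imitated with a small class. This is a simplified, hypothetical sketch and not the `SingleTargetAudioDataset` implementation: `LazyClipDataset` and its placeholder `_load` method are made up to show how an item can be computed only when it is accessed, and how the `'y'` key appears only when labels are supplied.

```python
class LazyClipDataset:
    """Hypothetical list-like dataset that builds each item on access."""

    def __init__(self, paths, labels=None):
        self.paths = list(paths)
        self.labels = None if labels is None else list(labels)

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, index):
        # The "spectrogram" is created here, not when the dataset is built
        item = {"X": self._load(self.paths[index])}
        if self.labels is not None:
            item["y"] = self.labels[index]
        return item

    def _load(self, path):
        # Placeholder for loading audio and computing a spectrogram
        return f"spectrogram of {path}"


unlabeled = LazyClipDataset(["clip_0.wav", "clip_1.wav"])
labeled = LazyClipDataset(["clip_0.wav"], labels=[1])
print(len(unlabeled), sorted(unlabeled[0]), sorted(labeled[0]))
```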

The spectrogram itself is stored as a PyTorch tensor. For example, here is the tensor of the first spectrogram:

In [None]:
first_tensor = test_dataset[0]['X']
first_tensor

To view this spectrogram, use torchvision's `transforms.ToPILImage()`. Calling it returns a transformer. Call the transformer on the first tensor to display the spectrogram as an image.

In [None]:
transformer = torchvision.transforms.ToPILImage()
transformer(first_tensor)

## Use model on prediction files

Finally, the model can be used for prediction. Use OpenSoundscape's `predict` function to call the model on the test dataset. The `label_dict` created above is used to make the classes interpretable; otherwise, the classes would just be numbered.

The source code of OpenSoundscape's `predict` function is [here](https://github.com/kitzeslab/opensoundscape/blob/master/opensoundscape/torch/predict.py).

Notes: 
What is a tensor?
"A PyTorch Tensor is basically the same as a numpy array: it does not know anything about deep learning or computational graphs or gradients, and is just a generic n-dimensional array to be used for arbitrary numeric computation." [source](https://pytorch.org/tutorials/beginner/examples_tensor/two_layer_net_tensor.html)

Probability score: 
- Takes a binary PyTorch model as input
- Iterates over the data with `enumerate(dataloader)`, which supports multi-process data loading
- Applies the softmax function, which outputs numbers between 0 and 1 that sum to 1 and can be read as probabilities
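
As a rough illustration of the softmax step, the sketch below converts two raw scores into probabilities and looks up the most likely class. It is written in plain Python rather than PyTorch, and the raw scores are made-up values, not model output.

```python
from math import exp


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    largest = max(scores)
    # Subtracting the largest score avoids overflow without changing the result
    exps = [exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


label_dict = {0: "bird-absent", 1: "bird-present"}
raw_scores = [0.0, 2.0]  # hypothetical raw outputs for one clip

probs = softmax(raw_scores)
predicted = label_dict[probs.index(max(probs))]
print(probs, predicted)
```

Each probability lies between 0 and 1, and the two values for a clip always add up to 1.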

Questions: 
- What is the range of numbers here and what do they represent? 
- What is a `DataLoader`?

In [None]:
prediction_df = predict(model, test_dataset, apply_softmax=True, label_dict=label_dict)

In [None]:
prediction_df

Save the prediction results as a CSV file.

In [None]:
prediction_df.to_csv('prediction_results.csv')

Finally, this command "cleans up" by deleting the `prediction_example` folder and everything inside it. Only run this if you are ready to remove those files.

In [None]:
import shutil
shutil.rmtree(folder_path)