In [1]:
import os, sys
sys.path.append("../../")
from common.base import MetaData, Processor
from common.predictors import TensorflowFSDSINet

## Download Pretrained Models

In this step, we download the pretrained models and their metadata files: an optional embedding model and the classification model, each with a JSON metadata file (up to four files in total). These models are used for audio feature extraction and classification. In this notebook the embedding URLs are left empty, so only the FSD-SINet classifier and its metadata are fetched. The URLs are defined at the top of the code, and `os.path.join()` builds the local paths under `../../models` where the downloaded files are saved.

Before downloading, we check whether each file already exists at its local path with `os.path.exists()`. Any file that is missing is downloaded with `wget`: the `-P` option sets the target directory, and the `-q` option suppresses `wget`'s console output.
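On systems without `wget`, the same check-then-download logic can be sketched in plain Python. The helper name `download_if_missing` and its `fetch` parameter are assumptions made for this illustration and are not part of the notebook's `common` package; `urllib.request.urlretrieve` comes from the standard library.

```python
import os
import urllib.request


def download_if_missing(url, target_dir, fetch=urllib.request.urlretrieve):
    """Download `url` into `target_dir` unless the file is already there.

    Hypothetical helper: returns the local path, or None when `url` is empty
    (as the embedding model URLs are in this notebook).
    """
    if not url:
        return None
    os.makedirs(target_dir, exist_ok=True)
    local_path = os.path.join(target_dir, os.path.basename(url))
    if not os.path.exists(local_path):
        # Only hit the network when the file is missing locally.
        fetch(url, local_path)
    return local_path
```

Calling `download_if_missing(model_url, '../../models')` would then stand in for the corresponding `!wget` line. The `fetch` argument exists only so the download step can be swapped out, for example in a test.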

In [2]:
# embedding model
embedding_model_url = ''
embedding_model_json_url = ''
embedding_model_file = os.path.join('../..', 'models', os.path.basename(embedding_model_url))
embedding_model_json = os.path.join('../..', 'models', os.path.basename(embedding_model_json_url))

# classification model
model_url = 'https://essentia.upf.edu/models/audio-event-recognition/fsd-sinet/fsd-sinet-vgg42-tlpf_aps-1.pb'
model_json_url = 'https://essentia.upf.edu/models/audio-event-recognition/fsd-sinet/fsd-sinet-vgg42-tlpf_aps-1.json'
model_file = os.path.join('../..', 'models', os.path.basename(model_url))
model_json = os.path.join('../..', 'models', os.path.basename(model_json_url))


In [3]:
# an embedding model is used only when both of its URLs are set
has_embedding_model = embedding_model_url != '' and embedding_model_json_url != ''
if(not os.path.exists(model_json)):
    !wget -q $model_json_url -P ../../models
if(not os.path.exists(model_file)):
    !wget -q $model_url -P ../../models
if(has_embedding_model):
    if(not os.path.exists(embedding_model_file)):
        !wget -q $embedding_model_url -P ../../models
    if(not os.path.exists(embedding_model_json)):
        !wget -q $embedding_model_json_url -P ../../models

## Load Audio

The audio input can be either a single file or a directory of sound files.
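How `Processor` resolves that path is not shown in this notebook. As a rough sketch of the idea, a path could be turned into a list of audio files as follows; the function name `list_audio_files` and the extension set are assumptions for illustration only.

```python
import os

# Assumed set of audio extensions; adjust to the formats you actually use.
AUDIO_EXTENSIONS = {'.wav', '.flac', '.mp3', '.ogg', '.aiff', '.aif'}


def list_audio_files(path, extensions=AUDIO_EXTENSIONS):
    """Return a list of audio files for a single file or a directory path."""
    if os.path.isfile(path):
        # A single file is returned as-is, without checking its extension.
        return [path]
    if os.path.isdir(path):
        # Non-recursive scan: keep only entries with a known audio extension.
        return sorted(
            os.path.join(path, name)
            for name in os.listdir(path)
            if os.path.splitext(name)[1].lower() in extensions
        )
    raise FileNotFoundError(f'No such file or directory: {path}')
```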

In [4]:
# Either open a file dialog to select the audio, or set open_file_dialog to False
# and point file_path at an audio file or a directory of audio files.
from common.base import file_dialog as fd
open_file_dialog = False
file_path = '/Users/dtef/Repos/Sync/Audio/Sounds/Sound Libraries/Tef/OK 01/Burning Tornadic Debris Stereo_01/short'
if open_file_dialog:
    audio_file = fd()
else:
    audio_file = file_path

processor = Processor(audio_file_path=audio_file)
# gets a list of audio files in the selected directory
audio_files = processor.audio_files

## Metadata

Next, we initialise a `MetaData` object with `model_json`, the path to the JSON file that holds the metadata for the TensorFlow model. The `MetaData` object is used to extract the schema and layer information for the model. We do this once, up front, rather than creating a new metadata instance for every audio file we process.

The `get_schema()` method of the `MetaData` object is used to extract the schema for the model. The schema is a dictionary that contains information about the model's input and output layers, including their names, shapes, and data types. The schema is printed to the console using the `print()` function.

The `get_layer()` method of the `MetaData` object is used to extract the input and output layers for the model. The `model_input_layer` and `model_output_layer` variables are assigned the input and output layers, respectively. These variables can be used later in the code to feed data into the model and extract the model's output.
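The implementation of `MetaData.get_layer()` is not shown here. The lookup it performs can be sketched roughly as below, assuming the schema is a dictionary with `inputs` and `outputs` lists whose entries carry a `name` and, for outputs, an `output_purpose`. The function `find_layer` is a hypothetical stand-in, and the sample dictionary is a trimmed copy of the schema layout that the classifier metadata cell prints.

```python
def find_layer(schema, kind, purpose=None):
    """Return the name of the first matching layer in a schema dict.

    kind is 'input' or 'output'; purpose optionally filters on the
    'output_purpose' field of each layer entry.
    """
    for layer in schema[kind + 's']:
        if purpose is None or layer.get('output_purpose') == purpose:
            return layer['name']
    raise KeyError(f'no {kind} layer with purpose {purpose!r}')


# Trimmed copy of the FSD-SINet schema layout (names and shapes only).
schema = {
    'inputs': [
        {'name': 'x', 'shape': ['batch_size', 96, 101, 1]},
    ],
    'outputs': [
        {'name': 'model/predictions/Sigmoid', 'shape': ['batch_size', 200],
         'output_purpose': 'predictions'},
        {'name': 'model/global_max_pooling1d/Max', 'shape': ['batch_size', 512],
         'output_purpose': 'embeddings'},
    ],
}

print(find_layer(schema, 'input'))                   # x
print(find_layer(schema, 'output', 'predictions'))   # model/predictions/Sigmoid
```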



#### Embeddings metadata

In [5]:
if(has_embedding_model):
    embeddings_metadata = MetaData(embedding_model_json)
    embeddings_schema = embeddings_metadata.get_schema()

    embeddings_input_layer = embeddings_metadata.get_layer('input')
    print("Detected input layer: ", embeddings_input_layer)
    embeddings_output_layer = embeddings_metadata.get_layer('output', 'embeddings')
    print("Detected output layer: ", embeddings_output_layer)

    print(embeddings_schema)

#### Predictor's metadata

In [6]:
classifier_metadata = MetaData(model_json)
classifier_schema = classifier_metadata.get_schema()

classifier_input_layer = classifier_metadata.get_layer('input')
print("Detected input layer: ", classifier_input_layer)
classifier_output_layer = classifier_metadata.get_layer('output', 'predictions')
print("Detected output layer: ", classifier_output_layer)

print(classifier_schema)

Detected input layer:  x
Detected output layer:  model/predictions/Sigmoid
{'inputs': [{'name': 'x', 'type': 'float', 'shape': ['batch_size', 96, 101, 1]}], 'outputs': [{'name': 'model/predictions/Sigmoid', 'type': 'float', 'shape': ['batch_size', 200], 'op': 'fully connected', 'output_purpose': 'predictions'}, {'name': 'model/global_max_pooling1d/Max', 'type': 'float', 'shape': ['batch_size', 512], 'op': 'max pooling', 'output_purpose': 'embeddings'}]}


## Classify

Since our model is trained on the VGGish embeddings, the input must follow the same schema. One option is the `TensorflowPredictVGGish` algorithm, which generates the required mel-spectrogram itself, so no separate embeddings are needed. Alternatively, for better transfer learning and noise reduction, precomputed embeddings can be fed to `TensorflowPredict2D`.

The extractor and classifier used below live in the classifiers.py file, which keeps the notebook small.
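The classifier reports one probability per label (200 for this model, according to its schema), so the raw listing is long. A small top-k filter makes the result easier to read. The sketch below is not part of `Processor`; the function `top_k` is hypothetical, and the sample values are rounded figures used purely for illustration. Because the output layer is named `model/predictions/Sigmoid`, the scores are presumably independent per label and need not sum to 1.

```python
def top_k(labels, probabilities, k=5, threshold=0.0):
    """Return up to k (label, probability) pairs, highest probability first.

    Pairs whose probability falls below `threshold` are dropped.
    """
    ranked = sorted(zip(labels, probabilities),
                    key=lambda pair: pair[1], reverse=True)
    return [(label, p) for label, p in ranked[:k] if p >= threshold]


# Illustrative values only (probabilities as fractions, not percentages).
labels = ['Aircraft', 'Animal', 'Bathtub (filling or washing)', 'Bicycle', 'Bird']
probabilities = [0.002, 0.163, 0.013, 0.010, 0.049]

for label, p in top_k(labels, probabilities, k=3):
    print(f'{label}: {p:.1%}')
```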

In [7]:

classifier = TensorflowFSDSINet(model_file, classifier_input_layer, classifier_output_layer)
stats = processor.classify_audio(classifier, classifier_metadata)

Burning_short_9.wav  Label:  Accelerating and revving and vroom  Probability:  0.0%
Burning_short_9.wav  Label:  Accordion  Probability:  0.0%
Burning_short_9.wav  Label:  Acoustic guitar  Probability:  0.0%
Burning_short_9.wav  Label:  Aircraft  Probability:  0.2%
Burning_short_9.wav  Label:  Alarm  Probability:  0.2%
Burning_short_9.wav  Label:  Animal  Probability:  16.3%
Burning_short_9.wav  Label:  Applause  Probability:  0.0%
Burning_short_9.wav  Label:  Bark  Probability:  0.0%
Burning_short_9.wav  Label:  Bass drum  Probability:  0.0%
Burning_short_9.wav  Label:  Bass guitar  Probability:  0.0%
Burning_short_9.wav  Label:  Bathtub (filling or washing)  Probability:  1.3%
Burning_short_9.wav  Label:  Bell  Probability:  0.1%
Burning_short_9.wav  Label:  Bicycle  Probability:  1.0%
Burning_short_9.wav  Label:  Bicycle bell  Probability:  0.0%
Burning_short_9.wav  Label:  Bird  Probability:  4.9%
Burning_short_9.wav  Label:  Bird vocalization and bird call and bird song  Probabili

[   INFO   ] TensorflowPredict: Successfully loaded graph file: `../../models/fsd-sinet-vgg42-tlpf_aps-1.pb`
2023-08-28 12:24:46.882916: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:353] MLIR V1 optimization pass is not enabled
2023-08-28 12:24:46.886438: W tensorflow/tsl/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz


Burning_short_3.wav  Label:  Accelerating and revving and vroom  Probability:  0.2%
Burning_short_3.wav  Label:  Accordion  Probability:  0.0%
Burning_short_3.wav  Label:  Acoustic guitar  Probability:  0.0%
Burning_short_3.wav  Label:  Aircraft  Probability:  0.8%
Burning_short_3.wav  Label:  Alarm  Probability:  0.1%
Burning_short_3.wav  Label:  Animal  Probability:  11.6%
Burning_short_3.wav  Label:  Applause  Probability:  0.1%
Burning_short_3.wav  Label:  Bark  Probability:  0.0%
Burning_short_3.wav  Label:  Bass drum  Probability:  0.0%
Burning_short_3.wav  Label:  Bass guitar  Probability:  0.0%
Burning_short_3.wav  Label:  Bathtub (filling or washing)  Probability:  0.0%
Burning_short_3.wav  Label:  Bell  Probability:  1.2%
Burning_short_3.wav  Label:  Bicycle  Probability:  0.3%
Burning_short_3.wav  Label:  Bicycle bell  Probability:  0.0%
Burning_short_3.wav  Label:  Bird  Probability:  4.1%
Burning_short_3.wav  Label:  Bird vocalization and bird call and bird song  Probabili

[   INFO   ] TensorflowPredict: Successfully loaded graph file: `../../models/fsd-sinet-vgg42-tlpf_aps-1.pb`


## Export Stats

Finally, to export the results, we iterate over the list of audio files, extract their classification statistics, and save the statistics to JSON files. If an analysis file already exists for an audio file under the naming scheme defined below, we add or update its values in the existing dictionary instead of starting from scratch. At the end, we prompt the OS to open the directory containing the files.
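The add-or-update behaviour can be sketched as follows. This is not the code behind `processor.export_data()`, which is not shown in this notebook; the helper `update_stats_file` and the flat dictionary layout are assumptions for illustration.

```python
import json
import os


def update_stats_file(json_path, new_stats):
    """Merge `new_stats` into the JSON file at `json_path`, creating it if needed.

    Existing keys are overwritten by new values; other keys are preserved.
    Returns the merged dictionary.
    """
    merged = {}
    if os.path.exists(json_path):
        with open(json_path, 'r', encoding='utf-8') as fh:
            merged = json.load(fh)
    merged.update(new_stats)
    # Make sure the parent folder exists before writing.
    os.makedirs(os.path.dirname(json_path) or '.', exist_ok=True)
    with open(json_path, 'w', encoding='utf-8') as fh:
        json.dump(merged, fh, indent=2)
    return merged
```

Note that `dict.update()` replaces top-level keys only; nested dictionaries would need a recursive merge if they must be combined rather than overwritten.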

In [8]:
stats_folder = 'Audio Event'
stats_parent = 'FSD-SINet'
processor.export_data(stats, stats_folder, stats_parent)
