# Explanations for differences between models

My models score worse than the models described in the paper [MIMII Dataset: Sound Dataset for
Malfunctioning Industrial Machine Investigation and Inspection](https://www.arxiv-vanity.com/papers/1909.09347/) and in the [DCASE 2020](http://dcase.community/challenge2020/task-unsupervised-detection-of-anomalous-sounds) challenge.

Let's look for differences between the models. A first check is whether Tensorflow and Librosa compute the melspectrogram the same way.

## Libraries

Let's load the libraries for this notebook.
- **Tensorflow Datasets** to load the dataset
- **Tensorflow** to compute the melspectrogram
- **Librosa** to compute the melspectrogram

In [1]:
import tensorflow_datasets as tfds
import tensorflow as tf
import librosa
import numpy as np
import os

In [2]:
print(f"Tensorflow datasets: {tfds.__version__}")
print(f"Tensorflow: {tf.__version__}")
print(f"Librosa: {librosa.__version__}")
print(f"Numpy: {np.__version__}")

Tensorflow datasets: 4.3.0
Tensorflow: 2.4.1
Librosa: 0.8.0
Numpy: 1.19.5


## Comparison

Let's compare how Tensorflow and Librosa load the same audio file and compute its melspectrogram.

### Load audio - Librosa

Let's load an audio sample with Librosa at a sampling rate of 16 kHz, so that we can compare it with the Tensorflow version.

In [3]:
audio_path = os.path.join("..", "audio", "normal_id_00_00000001.wav")
y, sr = librosa.load(audio_path, sr=16_000)

In [4]:
y.shape

(160000,)

In [5]:
y

array([ 0.00753784,  0.00952148,  0.00723267, ...,  0.00387573,
       -0.00448608, -0.00930786], dtype=float32)

### Read audio - Tensorflow

Let's load the train split with Tensorflow Datasets and filter it down to the same file.

In [6]:
# Local dataset module; importing it presumably registers the 'pump' builder with TFDS.
import pump

ds, info = tfds.load('pump', data_dir='../dataset', with_info=True, split="train")

In [7]:
y2 = ds.filter(lambda x: (x["audio/id"] == "0001") & (x["audio/machine"] == "00") & (x["label"] == 0)).take(1)

Librosa returns floating-point samples scaled to [-1, 1], whereas the Tensorflow dataset keeps the raw integer samples. We therefore scale them manually by dividing by 2**15, which assumes 16-bit PCM audio.

In [8]:
def get_audio(item):
    audio = tf.cast(item["audio"], dtype=tf.float32)
    return audio / 2**15

In [9]:
for item in tfds.as_numpy(y2.map(get_audio)):
    print(item.shape)

(160000,)


In [10]:
for item in tfds.as_numpy(y2.map(get_audio)):
    print(item)

[ 0.00753784  0.00952148  0.00723267 ...  0.00387573 -0.00448608
 -0.00930786]
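
The printed values look identical, but only the first and last three samples are displayed. A minimal sketch of a numeric check follows. It uses small stand-in arrays rather than the notebook's data, and the helper name `max_abs_difference` is an illustrative assumption.

```python
import numpy as np

def max_abs_difference(a, b):
    """Return the largest absolute element-wise difference between two arrays."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    if a.shape != b.shape:
        raise ValueError(f"shape mismatch: {a.shape} vs {b.shape}")
    return float(np.max(np.abs(a - b)))

# Stand-in data: a few 16-bit PCM samples and the same samples scaled to [-1, 1).
pcm = np.array([247, 312, 237, 127, -147, -305], dtype=np.int16)
scaled = pcm.astype(np.float32) / 2**15

# Compare the float32 result with a float64 reference computed from the same samples.
print(max_abs_difference(scaled, pcm / 32768.0))
```

In the notebook, the same helper could be applied to `y` and to the array produced by `get_audio`.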


### Melspectrogram - Librosa

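Librosa provides `librosa.feature.melspectrogram`, which computes the melspectrogram directly from the waveform. We transpose the result so that its shape is (frames, mel bins), like the Tensorflow output.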

In [11]:
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=1024, hop_length=512, win_length=1024)
mel = mel.T

In [12]:
print(mel.shape)

(313, 128)


In [13]:
print(mel)

[[0.00101106 0.00114903 0.01032275 ... 0.00013775 0.00030552 0.00011243]
 [0.00108913 0.05064675 0.07002663 ... 0.00029311 0.00026508 0.00019343]
 [0.00242773 0.01337171 0.04502779 ... 0.00035344 0.00019846 0.00027182]
 ...
 [0.00131706 0.01724362 0.02030062 ... 0.00034941 0.00044532 0.00034768]
 [0.00017864 0.00132901 0.00230213 ... 0.00020018 0.0005161  0.0001482 ]
 [0.00027024 0.0052782  0.01378242 ... 0.00060662 0.00054907 0.00031097]]


### Melspectrogram - Tensorflow

Tensorflow has no single function that computes a melspectrogram, so we compute it in two steps:
- Compute the spectrogram with `tf.signal.stft`
- Transform the spectrogram to a melspectrogram with a mel weight matrix

In [14]:
# Mel filter bank that maps the 513 linear-frequency bins of a 1024-point FFT to 128 mel bins.
A = tf.signal.linear_to_mel_weight_matrix(
    num_mel_bins=128, num_spectrogram_bins=512+1, sample_rate=16_000, dtype=tf.float32
)

def get_mel(item):
    audio = tf.cast(item["audio"], dtype=tf.float32)
    audio = audio / 2**15
    # Step 1: Spectrogram
    audio = tf.signal.stft(audio, frame_length=1024, frame_step=512, pad_end=True)
    audio = tf.abs(audio)

    # Step 2: Mel-spectrogram
    melgrams = tf.tensordot(
            tf.square(audio), A, axes=1
    )
    item["audio"] = melgrams

    return item

In [15]:
for item in tfds.as_numpy(y2.map(get_mel)):
    print(item["audio"].shape)

(313, 128)
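
Both pipelines yield 313 frames, but matching shapes do not mean that a given frame covers the same samples in each. A minimal sketch of the two framing conventions, assuming Librosa's default `center=True` and the `pad_end=True` setting used above (the function names are illustrative, not library APIs):

```python
import math

def librosa_frames(n_samples, n_fft, hop_length):
    """Half-open sample spans for a centered STFT: frame t is centered on t * hop_length."""
    n_frames = 1 + n_samples // hop_length
    half = n_fft // 2
    return [(t * hop_length - half, t * hop_length + half) for t in range(n_frames)]

def tf_frames(n_samples, frame_length, frame_step):
    """Half-open sample spans for an STFT padded only at the end: frame t starts at t * frame_step."""
    n_frames = math.ceil(n_samples / frame_step)
    return [(t * frame_step, t * frame_step + frame_length) for t in range(n_frames)]

lib = librosa_frames(160_000, n_fft=1024, hop_length=512)
tfs = tf_frames(160_000, frame_length=1024, frame_step=512)

print(len(lib), len(tfs))  # number of frames in each convention
print(lib[0], tfs[0])      # span of the first frame; negative indices fall in the padding
print(lib[1], tfs[0])      # the second centered frame versus the first uncentered frame
```

In this sketch the rows of the two matrices are offset by one hop, so an element-wise subtraction compares frames that do not cover the same samples.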


In [16]:
for item in tfds.as_numpy(y2.map(get_mel)):
    mel2 = item["audio"]
    
print(mel2)

[[0.10853923 0.15231696 0.3422672  ... 0.11376878 0.0879589  0.11626314]
 [0.05918069 0.06871993 0.11805227 ... 0.02720353 0.04023996 0.11747861]
 [0.09808303 0.09334564 0.09733989 ... 0.13244948 0.14757505 0.12772648]
 ...
 [0.00873664 0.02683789 0.09729988 ... 0.11934654 0.05534454 0.05542446]
 [0.07337632 0.065545   0.05230678 ... 0.16141652 0.14283478 0.16025962]
 [0.00270342 0.00262386 0.00292703 ... 0.00168895 0.00881386 0.0111219 ]]


In [17]:
diff = mel - mel2

In [18]:
print(diff)

[[-0.10752817 -0.15116793 -0.33194444 ... -0.11363102 -0.08765338
  -0.11615071]
 [-0.05809156 -0.01807318 -0.04802564 ... -0.02691042 -0.03997488
  -0.11728518]
 [-0.09565531 -0.07997393 -0.0523121  ... -0.13209604 -0.1473766
  -0.12745465]
 ...
 [-0.00741958 -0.00959427 -0.07699926 ... -0.11899713 -0.05489922
  -0.05507677]
 [-0.07319769 -0.064216   -0.05000465 ... -0.16121633 -0.14231868
  -0.16011143]
 [-0.00243318  0.00265434  0.01085539 ... -0.00108233 -0.00826479
  -0.01081093]]


In [19]:
print(np.abs(diff.mean()))

0.12750958
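
Note that `np.abs(diff.mean())` is the absolute value of the mean difference, so positive and negative errors can cancel each other. A small sketch with made-up numbers shows how this differs from the mean absolute difference (the array `toy_diff` is purely illustrative):

```python
import numpy as np

# Made-up differences whose positive and negative entries cancel exactly.
toy_diff = np.array([-0.5, 0.5, -0.25, 0.25])

abs_of_mean = np.abs(toy_diff.mean())   # what np.abs(diff.mean()) computes
mean_of_abs = np.abs(toy_diff).mean()   # mean absolute difference

print(abs_of_mean, mean_of_abs)
```

For the matrices above, `np.abs(diff).mean()` would be the corresponding mean absolute difference.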


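The melspectrograms differ substantially even though both have shape (313, 128). The two pipelines are not configured identically: `tf.signal.linear_to_mel_weight_matrix` is called without `lower_edge_hertz` and `upper_edge_hertz`, so it uses its defaults of 125 Hz and 3800 Hz, whereas Librosa's filter bank spans 0 Hz to `sr / 2` by default. Librosa also centers each frame (`center=True`) and normalizes its mel filters, which the Tensorflow code above does not do.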
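
The mel scale itself may contribute to the gap. `tf.signal.linear_to_mel_weight_matrix` follows the HTK convention, while Librosa defaults to the Slaney variant (`htk=False`). The sketch below implements both conversions in plain Python to show that they place the same frequency at different relative positions. The function names are illustrative and not part of either library.

```python
import math

def hz_to_mel_htk(freq_hz):
    """HTK mel scale: logarithmic over the whole frequency range."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)

def hz_to_mel_slaney(freq_hz):
    """Slaney mel scale: linear below 1 kHz, logarithmic above."""
    f_sp = 200.0 / 3.0                 # Hz per mel in the linear region
    min_log_hz = 1000.0                # start of the logarithmic region
    min_log_mel = min_log_hz / f_sp    # mel value at 1 kHz
    logstep = math.log(6.4) / 27.0     # step size in the logarithmic region
    if freq_hz < min_log_hz:
        return freq_hz / f_sp
    return min_log_mel + math.log(freq_hz / min_log_hz) / logstep

# Relative position of 1 kHz within the 0 Hz to 8 kHz band on each scale.
nyquist = 8000.0
htk_fraction = hz_to_mel_htk(1000.0) / hz_to_mel_htk(nyquist)
slaney_fraction = hz_to_mel_slaney(1000.0) / hz_to_mel_slaney(nyquist)

print(htk_fraction, slaney_fraction)
```

Because the band edges fall at different frequencies, bin *k* of one filter bank does not cover the same frequencies as bin *k* of the other, even when both use 128 bins over the same range.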