# Input Data Pipeline Testing/Debugging

References that may be helpful while following along:
http://cs230.stanford.edu/blog/datapipeline/ (Very good)

https://www.tensorflow.org/guide/data

https://towardsdatascience.com/how-to-build-efficient-audio-data-pipelines-with-tensorflow-2-0-b3133474c3c1


In [3]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [4]:
mkdir grace_wavs

In [5]:
ls

[0m[01;34mdrive[0m/  [01;34mgrace_wavs[0m/  [01;34msample_data[0m/


In [6]:
cd drive/My Drive/amazing_grace/

/content/drive/.shortcut-targets-by-id/108Ujtp8i29zJ2_EUinbLFEbkDJZjf7jb/amazing_grace


In [7]:
pip install pydub

Collecting pydub
  Downloading https://files.pythonhosted.org/packages/7b/d1/fbfa79371a8cd9bb15c2e3c480d7e6e340ed5cc55005174e16f48418333a/pydub-0.24.1-py2.py3-none-any.whl
Installing collected packages: pydub
Successfully installed pydub-0.24.1


In [8]:
import os
import argparse
import pydub
from pydub import AudioSegment

In [9]:
pip install ffmpeg

Collecting ffmpeg
  Downloading https://files.pythonhosted.org/packages/f0/cc/3b7408b8ecf7c1d20ad480c3eaed7619857bf1054b690226e906fdf14258/ffmpeg-1.4.tar.gz
Building wheels for collected packages: ffmpeg
  Building wheel for ffmpeg (setup.py) ... done
  Created wheel for ffmpeg: filename=ffmpeg-1.4-cp36-none-any.whl size=6083 sha256=d305059887a2015556655c0237efaebb64ff73ff2f6bbf0c67cb38b63e1c3d46
  Stored in directory: /root/.cache/pip/wheels/b6/68/c3/a05a35f647ba871e5572b9bbfc0b95fd1c6637a2219f959e7a
Successfully built ffmpeg
Installing collected packages: ffmpeg
Successfully installed ffmpeg-1.4


In [11]:
cd ..

/content/drive/.shortcut-targets-by-id/108Ujtp8i29zJ2_EUinbLFEbkDJZjf7jb


In [12]:
cd ..

/content/drive/.shortcut-targets-by-id


In [13]:
cd ..

/content/drive


In [14]:
pip install torch




In [16]:
cd My Drive/amazing_grace

/content/drive/.shortcut-targets-by-id/108Ujtp8i29zJ2_EUinbLFEbkDJZjf7jb/amazing_grace


Here, we convert a few of the Amazing Grace M4A files to WAV format. We will also try the librosa -> NumPy array -> float tensor method later; these WAV copies are just kept in case we need that format to check whether the input pipeline is working.

In [17]:
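import shutil

# Convert the first 3 files in this folder to WAV and copy them to /content/grace_wavs
for audio_file in os.listdir('./')[:3]:
  wav_filename = "test" + os.path.splitext(os.path.basename(audio_file))[0] + ".wav"
  # pydub decodes the M4A with ffmpeg and writes a WAV file next to it;
  # export() returns the open output file, so close it before copying
  AudioSegment.from_file(audio_file).export(wav_filename, format="wav").close()
  shutil.copy(wav_filename, '/content/grace_wavs')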

In [18]:
cd ..

/content/drive/.shortcut-targets-by-id/108Ujtp8i29zJ2_EUinbLFEbkDJZjf7jb


In [19]:
cd ..

/content/drive/.shortcut-targets-by-id


In [20]:
cd ..

/content/drive


In [21]:
cd ..

/content


In [22]:
ls

[0m[01;34mdrive[0m/  [01;34mgrace_wavs[0m/  [01;34msample_data[0m/


In [23]:
cd grace_wavs

/content/grace_wavs


The listing below confirms that our WAV files were copied over.

In [24]:
ls

test101016666_37475717.wav  test101035573_143446212.wav
test101016666_42558595.wav


This is where we get our first error. The function tf.audio.decode_wav (used in the audio pipeline tutorial above, and the audio counterpart of the image-decoding function tf.image.decode_jpeg in the Stanford tutorial) can only read 16-bit PCM WAV files. I searched for this error and found a GitHub issue where someone had a similar problem with 24-bit files; no fix seems to have been made, and such files may simply be unsupported by TensorFlow. Reference: https://github.com/tensorflow/tensorflow/issues/30877. **For now, we will move on and try to avoid decode_wav, instead creating tensors ourselves from the NumPy arrays returned by librosa (this also causes issues, however; scroll down).**

In [33]:
import tensorflow as tf

raw_audio = tf.io.read_file('./test101016666_37475717.wav')
audio, sr = tf.audio.decode_wav(raw_audio)

InvalidArgumentError: ignored
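Since decode_wav only accepts 16-bit PCM, a first debugging step could be to check the bit depth of the exported files. The sketch below uses Python's standard-library `wave` module (not used in the original notebook); `getsampwidth()` returns the sample width in bytes, so 16-bit audio reports 2. The helper name `is_16bit_pcm` is hypothetical. If a file reports a different width, one option (untested here) is to call pydub's `set_sample_width(2)` on the `AudioSegment` before `export`, which should produce 16-bit PCM that decode_wav can read.

```python
import os
import wave


def is_16bit_pcm(path):
    """Return True if `path` is a PCM WAV file with 16-bit (2-byte) samples.

    Hypothetical helper: tf.audio.decode_wav only decodes 16-bit PCM WAV,
    so this checks a file before handing it to TensorFlow.
    """
    try:
        with wave.open(path, "rb") as wav:
            return wav.getsampwidth() == 2
    except (wave.Error, EOFError):
        # The stdlib reader rejects formats it does not understand (for
        # example 32-bit float WAV) and truncated files; neither is 16-bit PCM.
        return False


if __name__ == "__main__":
    # Report the bit depth of every WAV file in the current directory,
    # e.g. /content/grace_wavs in this notebook.
    for name in sorted(os.listdir(".")):
        if name.endswith(".wav"):
            status = "16-bit PCM" if is_16bit_pcm(name) else "NOT 16-bit PCM"
            print(name, status)
```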

In [34]:
pip install librosa



In [35]:
import tensorflow as tf
import librosa


We now navigate back to the directory with the M4A files and try librosa from there.

In [36]:
cd ..

/content


In [38]:
cd drive/My Drive/amazing_grace

/content/drive/.shortcut-targets-by-id/108Ujtp8i29zJ2_EUinbLFEbkDJZjf7jb/amazing_grace


First, verify that librosa.load() works (it does).

In [40]:
audio,sr = librosa.load('100950773_22975448.m4a')
print(audio)

[-2.7465820e-04  0.0000000e+00  4.8828125e-04 ...  0.0000000e+00
 -3.0517578e-05  0.0000000e+00]


A quick test of converting the NumPy array to a Tensor:

In [54]:
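audio_tensor = tf.constant(audio)  # NumPy float32 array -> tf.Tensor (avoids shadowing the built-in input)
first_sample = audio_tensor[0]
print(audio_tensor)
print(first_sample.numpy())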

tf.Tensor(
[-2.7465820e-04  0.0000000e+00  4.8828125e-04 ...  0.0000000e+00
 -3.0517578e-05  0.0000000e+00], shape=(4214912,), dtype=float32)
-0.0002746582
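As a quick sanity check on the shape above: librosa.load resamples to 22,050 Hz by default, and the notebook never prints `sr`, so the sketch below assumes that default was used. The clip length in seconds is then simply the number of samples divided by the sample rate. This also hints at why the arrays loaded later have different lengths: each recording has its own duration. The helper name `duration_seconds` is hypothetical.

```python
def duration_seconds(num_samples, sample_rate=22050):
    """Clip length in seconds; 22050 Hz is librosa.load's default sample rate."""
    return num_samples / sample_rate


# shape=(4214912,) comes from the tensor printed above
print(round(duration_seconds(4214912), 2))  # → 191.15 (about 3 min 11 s)
```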


A placeholder parse function for later; it does nothing yet.

In [85]:
def parse_function(filename, label):
    print(filename)
    #print(filename.numpy())
    return filename,label

**Step:** Now we'll construct a dataset from a few of our audio files and try some basic processing (see the "Building an image data pipeline" section of the Stanford tutorial).

In [79]:
filenames = ['108582254_16389370.m4a','128410511_32753534.m4a','160388927_66611192.m4a'] #pick 3 files for testing
labels = ['grace','grace','grace']

In [87]:
dataset = tf.data.Dataset.from_tensor_slices((filenames, labels))
dataset = dataset.shuffle(len(filenames))

Converting Tensors back to NumPy values (not strictly relevant):

In [89]:
for element in dataset:
  print(element)
  print(element[0].numpy()) # .numpy() on a string tensor returns bytes, hence the 'b' prefix
  print(str(element[0].numpy())[1:]) # drops the 'b' but keeps the quotes

(<tf.Tensor: shape=(), dtype=string, numpy=b'128410511_32753534.m4a'>, <tf.Tensor: shape=(), dtype=string, numpy=b'grace'>)
b'128410511_32753534.m4a'
'128410511_32753534.m4a'
(<tf.Tensor: shape=(), dtype=string, numpy=b'160388927_66611192.m4a'>, <tf.Tensor: shape=(), dtype=string, numpy=b'grace'>)
b'160388927_66611192.m4a'
'160388927_66611192.m4a'
(<tf.Tensor: shape=(), dtype=string, numpy=b'108582254_16389370.m4a'>, <tf.Tensor: shape=(), dtype=string, numpy=b'grace'>)
b'108582254_16389370.m4a'
'108582254_16389370.m4a'
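The `b` appears because `.numpy()` on a scalar string tensor returns a Python `bytes` object, not a `str`. Slicing the `str()` representation only removes the `b` and leaves the quote characters in place, as the output above shows. Decoding the bytes is the cleaner conversion. In the sketch below a plain `bytes` literal stands in for `element[0].numpy()`, and the filenames are assumed to be UTF-8.

```python
raw = b'128410511_32753534.m4a'  # stand-in for element[0].numpy()

sliced = str(raw)[1:]          # "'128410511_32753534.m4a'" (quotes kept)
decoded = raw.decode("utf-8")  # "128410511_32753534.m4a" (a real str)

print(sliced)
print(decoded)
```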


*Error:* This is where another issue arises. After constructing a dataset of filenames and labels, we need to call dataset.map() to load and decode each file. However, inside map() the dataset only yields tensors, while librosa reads audio from file paths (or file-like objects), not tensors. Below we try calling librosa on a Tensor, which won't work.

I thought I could access the value of the Tensor (the filename), call librosa.load(), and then build a new Tensor to return (decode_wav would do all of this in one step). The problem is that map() traces the parse function into a TensorFlow graph, so the filename is a symbolic Tensor with no concrete value, and calling .numpy() on it is not possible.

If this is a little confusing see the same "Building an image data pipeline" section of the Stanford tutorial.

In [92]:
def parse_function(filename, label):
    audio_string = tf.io.read_file(filename)

    arr = librosa.load(audio_string)[0] #this obviously won't work, just here for explanation purposes
    parsed = tf.constant(arr)
    return parsed, label

In [93]:
dataset = dataset.map(parse_function, num_parallel_calls=4)

TypeError: ignored
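One common workaround (not tried in this notebook) is to keep the decoding in plain Python and wrap it with `tf.numpy_function`, which runs a Python function on the concrete NumPy values of its input tensors. Inside the wrapped function the filename arrives as `bytes`, so it must be decoded before use. The sketch below shows only the Python side; `load_audio` is a hypothetical name, and to keep it self-contained it reads 16-bit PCM WAV with the standard-library `wave` module instead of librosa. In the real pipeline the body would call `librosa.load(path)[0]`, which already returns a float32 array.

```python
import wave

import numpy as np


def load_audio(filename):
    """Hypothetical loader meant to be wrapped with tf.numpy_function.

    The filename may arrive as bytes, so it is decoded to str first.
    Returns a 1-D float32 array scaled to [-1, 1), matching Tout=tf.float32.
    """
    if isinstance(filename, bytes):
        filename = filename.decode("utf-8")
    # Assumption: 16-bit PCM WAV input (librosa.load would replace this block).
    with wave.open(filename, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError(f"{filename}: expected 16-bit PCM")
        channels = wav.getnchannels()
        frames = wav.readframes(wav.getnframes())
    samples = np.frombuffer(frames, dtype="<i2").astype(np.float32) / 32768.0
    # Average interleaved channels down to mono, like librosa's default.
    return samples.reshape(-1, channels).mean(axis=1).astype(np.float32)
```

In the pipeline, `parse_function` would then compute `audio = tf.numpy_function(load_audio, [filename], tf.float32)` and `return audio, label`. The result has an unknown static shape, so calling `audio.set_shape([None])` may be needed before later steps that rely on shape information.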

**Step:** Another approach I tried was loading NumPy arrays directly into the Dataset at the start, which is a valid option (usually with arrays loaded from .npz files, but in-memory arrays work too).

In [94]:
audio1 = librosa.load('100950773_22975448.m4a')[0]
audio2 = librosa.load('100985136_55416202.m4a')[0]
audio3 = librosa.load('100992845_200619044.m4a')[0]
print(audio1)
print(audio2)
print(audio3)

[-2.7465820e-04  0.0000000e+00  4.8828125e-04 ...  0.0000000e+00
 -3.0517578e-05  0.0000000e+00]
[-5.1879883e-04  6.1035156e-05  2.1362305e-04 ... -3.0517578e-05
  6.1035156e-05  6.1035156e-05]
[-5.4931641e-04  1.8310547e-04  1.2207031e-04 ...  0.0000000e+00
  0.0000000e+00  3.0517578e-05]


In [95]:
audio_examples = [audio1,audio2,audio3]
audio_labels = ['grace','grace','grace']
print(audio_examples)

[array([-2.7465820e-04,  0.0000000e+00,  4.8828125e-04, ...,
        0.0000000e+00, -3.0517578e-05,  0.0000000e+00], dtype=float32), array([-5.1879883e-04,  6.1035156e-05,  2.1362305e-04, ...,
       -3.0517578e-05,  6.1035156e-05,  6.1035156e-05], dtype=float32), array([-5.4931641e-04,  1.8310547e-04,  1.2207031e-04, ...,
        0.0000000e+00,  0.0000000e+00,  3.0517578e-05], dtype=float32)]


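*Error:* Here, however, I get the following error. Based on some searching, I think it is most likely caused by the arrays from librosa.load having different lengths (e.g. one file may yield an array of length 3 and another an array of length 10). from_tensor_slices first converts the list into a single dense tensor, which requires every array to have the same length.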

In [98]:
train_dataset = tf.data.Dataset.from_tensor_slices((audio_examples, audio_labels))

ValueError: ignored
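Assuming the uneven lengths are indeed the cause, one common fix is to give every example the same length before building the dataset, either by padding/truncating each clip or by cutting each clip into fixed-size windows, with each window inheriting its clip's label. The sketch below uses NumPy; `pad_or_truncate`, `make_windows`, and the toy lengths are hypothetical choices for illustration. Other options, such as `tf.data.Dataset.from_generator` or ragged tensors, are not shown.

```python
import numpy as np


def pad_or_truncate(signal, length):
    """Zero-pad or cut a 1-D signal to exactly `length` samples."""
    signal = np.asarray(signal, dtype=np.float32)
    if len(signal) >= length:
        return signal[:length]
    return np.pad(signal, (0, length - len(signal)))


def make_windows(signal, window, label):
    """Split a clip into non-overlapping windows, dropping the leftover tail."""
    signal = np.asarray(signal, dtype=np.float32)
    n = len(signal) // window
    return signal[: n * window].reshape(n, window), [label] * n


# Uneven clips, like the arrays returned by librosa.load above.
clips = [np.ones(3, np.float32), np.ones(10, np.float32)]
labels = ["grace", "grace"]

fixed = np.stack([pad_or_truncate(c, 8) for c in clips])
print(fixed.shape)  # (2, 8): a rectangular array from_tensor_slices can slice

pieces = [make_windows(c, 4, lab) for c, lab in zip(clips, labels)]
windows = np.concatenate([w for w, _ in pieces])
window_labels = [lab for _, labs in pieces for lab in labs]
print(windows.shape, len(window_labels))  # (2, 4) 2
```

Either `fixed` with `labels`, or `windows` with `window_labels`, could then be passed to `tf.data.Dataset.from_tensor_slices`. Padding keeps one example per recording; windowing discards less audio for long recordings but yields several examples per song.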