# Extract MFCC features
This notebook shows how to use PyKaldi to extract MFCC features from a WAV file.
It also serves as an example of PyKaldi's integration with NumPy and scikit-learn.

We begin by importing the necessary components from PyKaldi, NumPy and scikit-learn. If you haven't done so already, you'll need to install them on your system. You can install NumPy and scikit-learn with __pip__ (e.g., `pip install numpy scikit-learn`). To install PyKaldi, please follow the [instructions](https://github.com/pykaldi/pykaldi#installation).

In [1]:
from kaldi.feat.mfcc import Mfcc, MfccOptions
from kaldi.matrix import SubVector, SubMatrix
from kaldi.util.options import ParseOptions
from kaldi.util.table import SequentialWaveReader
from kaldi.util.table import MatrixWriter
from numpy import mean
from sklearn.preprocessing import scale

We create a test scp file containing a dummy key and the location of our test WAV file. Make sure the path to the test file matches the location of your PyKaldi installation.

In [2]:
with open("testfile.scp", "w") as outpt:
    outpt.write("TEST /pykaldi/tools/kaldi/src/feat/test_data/test.wav")
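
To make the layout of the scp file concrete, here is a minimal sketch that reads such a file back with plain Python. It assumes the simple case of one `<key> <location>` pair per line, separated by whitespace. The helper `read_scp` and the file name `demo.scp` are hypothetical names used only for illustration; they are not part of PyKaldi.

```python
import os
import tempfile


def read_scp(path):
    """Parse a Kaldi-style script file into a {key: location} dict.

    Hypothetical helper for illustration; not part of PyKaldi.
    """
    table = {}
    with open(path) as scp:
        for line in scp:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            # The key ends at the first run of whitespace;
            # everything after it is the location.
            key, location = line.split(None, 1)
            table[key] = location
    return table


# Write a one-entry script file to a temporary directory and read it back.
with tempfile.TemporaryDirectory() as tmp:
    scp_path = os.path.join(tmp, "demo.scp")
    with open(scp_path, "w") as out:
        out.write("TEST /pykaldi/tools/kaldi/src/feat/test_data/test.wav\n")
    entries = read_scp(scp_path)

print(entries)
```

In the notebook itself this parsing is done by the table reader, so a helper like this is only useful for inspecting or validating a script file by hand.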

The PyKaldi option parsing API is slightly different from the underlying Kaldi option parsing API. Command-line options for the main script are registered by calling type-specific registration methods that accept a name, a default value and a help string, e.g. `min-duration` in the example. The `parse_args` method of a PyKaldi `ParseOptions` instance returns a simple namespace object containing the parsed option values for the main script. Parsed values for other options are written directly into the appropriate fields of the associated options instances, e.g. `mfcc_opts` in the example.

In [3]:
usage = """Extract MFCC features.
           Usage:  example.py [opts...] <rspec> <wspec>
        """

po = ParseOptions(usage)
po.register_float("min-duration", 0.0,
                  "minimum segment duration")
mfcc_opts = MfccOptions()
mfcc_opts.frame_opts.samp_freq = 8000
mfcc_opts.register(po)

opts = po.parse_args()
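
For readers more familiar with the standard library, the register-then-parse pattern described above resembles `argparse`. The sketch below is only an analogy and does not use PyKaldi's `ParseOptions`; the argument values passed to `parse_args` are made up for illustration. It shows how an option registered as `min-duration` ends up as the attribute `min_duration` on the returned namespace.

```python
import argparse

# Standard-library analogy only; this is not PyKaldi's ParseOptions.
parser = argparse.ArgumentParser(description="Extract MFCC features.")

# Register a float option with a name, a default value and a help string.
parser.add_argument("--min-duration", type=float, default=0.0,
                    help="minimum segment duration")

# Positional arguments standing in for <rspec> and <wspec>.
parser.add_argument("rspec")
parser.add_argument("wspec")

# Pass an explicit argument list so the sketch also runs inside a notebook,
# where sys.argv does not hold the script's own arguments.
args = parser.parse_args(
    ["--min-duration=0.5", "scp:testfile.scp", "ark,t:test_mfcc.ark"]
)

# The dash in the option name becomes an underscore in the attribute name.
print(args.min_duration)
print(args.rspec, args.wspec)
```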

In typical Kaldi fashion, input/output tables are constructed with read/write specifiers, strings that describe how the data should be read/written.

In [4]:
rspec, wspec = "scp:testfile.scp", "ark,t:test_mfcc.ark"
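
As a rough illustration of how these strings are structured, the sketch below splits a specifier at its first colon into a list of comma-separated options and a location. This is a simplification for the two specifiers used here (`scp` for a script file, `ark` for an archive, `t` for text mode), and `split_specifier` is a hypothetical helper, not a PyKaldi function.

```python
def split_specifier(spec):
    """Split 'options:location' at the first colon.

    Hypothetical, simplified helper for illustration; not part of PyKaldi.
    """
    options, _, location = spec.partition(":")
    return options.split(","), location


# The read specifier names a script file; the write specifier names
# an archive written in text mode.
rspec_parts = split_specifier("scp:testfile.scp")
wspec_parts = split_specifier("ark,t:test_mfcc.ark")

print(rspec_parts)
print(wspec_parts)
```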

Kaldi objects are first-class objects in PyKaldi. This allows them to be instantiated and passed as arguments to other objects or to functions.

In [5]:
# Create MFCC object and obtain sample frequency
mfcc = Mfcc(mfcc_opts)
sf = mfcc_opts.frame_opts.samp_freq

PyKaldi table readers/writers implement the context manager interface, so they do not need to be closed explicitly when they are used in a `with` statement. PyKaldi table writers also support a pseudo-dictionary interface for writing key-value pairs. Since PyKaldi matrices implement the NumPy array interface, they can be passed to functions expecting NumPy array arguments, such as `mean` and `scale`, without explicit conversion. The NumPy arrays returned from these functions can be converted back to Kaldi vector and matrix types by constructing new `SubVector` and `SubMatrix` objects, which share the underlying memory buffers with the source arrays whenever possible, i.e. no data is copied unless necessary.

In [6]:
with SequentialWaveReader(rspec) as reader, \
             MatrixWriter(wspec) as writer:
            
    for key, wav in reader:
        if wav.duration < opts.min_duration:
            continue
                    
        assert wav.samp_freq >= sf
        assert wav.samp_freq % sf == 0

        print(">>> print(wav.samp_freq)")
        print(wav.samp_freq)
        print()

        s = wav.data()
        print(">>> print(s)")
        print(s)
        print()

        # downsample to sf (set to 8 kHz above) by keeping every n-th sample
        s = s[:, ::int(wav.samp_freq / sf)]

        # mix-down stereo to mono
        m = SubVector(mean(s, axis=0))

        # compute MFCC features
        f = mfcc.compute_features(m, sf, 1.0)

        # standardize features
        f = SubMatrix(scale(f))
        print(">>> print(f)")
        print(f)
        print()

        # write features to archive
        writer[key] = f

>>> print(wav.samp_freq)
16000.0

>>> print(s)

 11891  28260      0  ...     356    360    362
[kaldi.matrix.Matrix of size 1x23001]


>>> print(f)

 0.9177 -0.6260 -0.0099  ...  -0.8744  0.4648  0.8280
-1.4267  0.5150 -1.0192  ...  -2.6820  0.5182  1.2632
-1.2150  0.7154 -0.8618  ...  -0.4959  0.6084  0.7360
          ...             ⋱             ...          
-1.6294  0.1008  0.5520  ...  -1.3087  1.6499  1.2011
-1.6358 -0.6769 -0.0806  ...  -1.3102  0.1379 -0.0234
-1.9316 -0.0121  0.2750  ...  -0.3298  2.5171  1.1672
[kaldi.matrix.SubMatrix of size 142x13]
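
The array operations in the loop can be tried without PyKaldi. The sketch below repeats the downsampling, mix-down and standardization steps on small made-up NumPy arrays. The variable names and values are assumptions chosen for illustration, and the standardization line is a plain NumPy approximation of what `scale` does with its default arguments (zero mean and unit variance per column), not a call into scikit-learn.

```python
import numpy as np

# Stand-in for wav.data(): 2 channels x 8 samples (made-up values).
wav_samp_freq = 16000.0
target_freq = 8000.0
s = np.arange(16, dtype=np.float32).reshape(2, 8)

# Keep every n-th sample. A strided slice is a view on the original
# buffer, so no sample data is copied. Note that plain decimation
# applies no anti-aliasing filter.
step = int(wav_samp_freq / target_freq)
decimated = s[:, ::step]
print(np.shares_memory(decimated, s))

# Average across channels (axis 0) to obtain a single mono signal.
mono = decimated.mean(axis=0)
print(mono)

# Toy feature matrix (frames x coefficients), standardized per column.
features = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
standardized = (features - features.mean(axis=0)) / features.std(axis=0)
print(standardized)
```

Because the slice is a view, the downsampling step itself allocates no new sample buffer; the mean and the standardization do produce new arrays.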


