# Feature analysis for storm petrel recordings from St Helena

In this notebook we look at the distributions of the numerical features we have calculated for selected recordings from St Helena. The purpose is to determine which features best separate storm petrel calls from *noise*, i.e. any non-petrel sound.

In [1]:
import os
import sys
import pandas as pd

module_path = os.path.abspath(os.path.join('..'))
if module_path not in sys.path:
    sys.path.append(module_path)
    
import plotly
import plotly.plotly as py
import plotly.graph_objs as go

import audioexplorer.audio_io as aio
import audioexplorer.features as features

Load the data and separate petrels from noise.

In [2]:
df = pd.read_csv('s3://stormpetrels/samples/labels/samples_all.csv')
df_petrel = df[df['petrel'] == 1]
df_noise = df[df['petrel'] == 0]

In [3]:
features.FEATURES

{'freq': 'Frequency statistics',
 'pitch': 'Pitch statistics',
 'Chroma': 'Chroma',
 'LPC': 'LPC',
 'LSF': 'LSF',
 'MFCC': 'MFCC',
 'OBSI': 'OBSI',
 'SpectralCrestFactorPerBand': 'Crest factors',
 'SpectralFlatness': 'Flatness',
 'SpectralFlux': 'Flux',
 'SpectralRolloff': 'Rolloff',
 'SpectralVariation': 'Variation'}

Let's use more descriptive names so that it's clear what we're computing.

In [4]:
features.FEATURES.update({
    'LPC': 'Linear Predictor Coefficients',
    'LSF': 'Line Spectral Frequency',
    'MFCC': 'Mel-frequencies cepstrum coefficients',
    'OBSI': 'Octave band signal intensity'
})

Let `0` denote *noise* and `1` *petrels*.

In [8]:
df['petrel'].value_counts()

1    2239
0    1576
Name: petrel, dtype: int64
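
The two classes are not evenly sized, which is worth keeping in mind when reading the boxplots and when judging any later classifier. A minimal sketch of the arithmetic, using only the counts printed above (the variable names are illustrative, not part of `audioexplorer`):

```python
# Counts copied from the value_counts() output above.
counts = {1: 2239, 0: 1576}  # 1 = petrel, 0 = noise

total = sum(counts.values())
shares = {label: n / total for label, n in counts.items()}

# A model that always answers "petrel" would reach this accuracy,
# so it is a floor that any feature-based model has to beat.
majority_baseline = max(shares.values())

print(total)
print(round(majority_baseline, 3))
```

On the real data frame, `df['petrel'].value_counts(normalize=True)` gives the same shares directly.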

We're going to use the `plotly` library to produce boxplots. Follow the [getting started guide](https://plot.ly/python/getting-started/) to install and set up the library (a free account is needed).

In [12]:
def plot_features(df, selection):
    """Draw one boxplot pair (petrels vs noise) per column whose name contains `selection`."""
    df_petrel = df[df['petrel'] == 1]
    df_noise = df[df['petrel'] == 0]
    title = features.FEATURES[selection]
    # Substring match: 'freq' picks up every column with 'freq' in its name
    sel_columns = [column_name for column_name in df.columns.values if selection in column_name]
    # One subplot row per selected column
    fig = plotly.tools.make_subplots(rows=len(sel_columns), subplot_titles=tuple(sel_columns))

    marker_petrels = {'color': '#FF4136', 'size': 2}  # red
    marker_noise = {'color': '#0000FF', 'size': 2}  # blue

    for idx, name in enumerate(sel_columns):
        trace_petrel = go.Box(x=df_petrel[name], name='petrels', marker=marker_petrels)
        trace_noise = go.Box(x=df_noise[name], name='noise', marker=marker_noise)
        # Subplot rows are 1-indexed; both traces share the same row
        fig.append_trace(trace_petrel, idx+1, 1)
        fig.append_trace(trace_noise, idx+1, 1)

    # 400 px of height per subplot keeps each boxplot pair readable
    fig['layout'].update(height=len(sel_columns) * 400, width=1200, title='Storm Petrel: ' + title, showlegend=False)
    return fig
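
`plot_features` picks its columns with a plain substring test (`selection in column_name`). That works for the column names used in this notebook, but it would also pick up any column that merely contains the key somewhere in its name. The sketch below shows the difference on a handful of names; `pitch_freq_ratio` is a made-up column added only to illustrate the pitfall, and `select_by_prefix` is a hypothetical stricter alternative, not something the notebook uses.

```python
# A few column names from this notebook, plus one hypothetical column
# ('pitch_freq_ratio') that exists only to demonstrate the pitfall.
columns = ['freq_Q25', 'freq_mean', 'pitch_Q25', 'yaafe_LPC',
           'yaafe_LSF.0', 'pitch_freq_ratio']


def select_by_substring(columns, selection):
    """Same rule as plot_features: keep names that contain `selection`."""
    return [name for name in columns if selection in name]


def select_by_prefix(columns, selection):
    """Stricter variant: the key must open the name, optionally after 'yaafe_'."""
    prefixes = (selection + '_', selection + '.', 'yaafe_' + selection)
    return [name for name in columns
            if name == selection or name.startswith(prefixes)]


loose = select_by_substring(columns, 'freq')
strict = select_by_prefix(columns, 'freq')
print(loose)
print(strict)
```

Whether the stricter rule is worth the extra code depends on how the feature columns are named in `samples_all.csv`; with the names listed at the end of this notebook both rules should select the same columns.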

Let's plot every feature group and publish each figure to our public plotly space.

In [24]:
selection = 'freq'
fig = plot_features(df, selection)
py.plot(fig, filename='Storm Petrels ' + selection)

This is the format of your plot grid:
[ (1,1) x1,y1 ]
[ (2,1) x2,y2 ]
[ (3,1) x3,y3 ]
[ (4,1) x4,y4 ]
[ (5,1) x5,y5 ]
[ (6,1) x6,y6 ]
[ (7,1) x7,y7 ]



'https://plot.ly/~tracewsl/311?share_key=604K67TKBBwoLRPqZMrelQ'

In [15]:
selection = 'pitch'
fig = plot_features(df, selection)
py.plot(fig, filename=selection)

This is the format of your plot grid:
[ (1,1) x1,y1 ]
[ (2,1) x2,y2 ]
[ (3,1) x3,y3 ]
[ (4,1) x4,y4 ]
[ (5,1) x5,y5 ]
[ (6,1) x6,y6 ]
[ (7,1) x7,y7 ]



'https://plot.ly/~tracewsl/313?share_key=WoXXOcXzpNk1dGHRPJiya5'

In [16]:
selection = 'Chroma'
fig = plot_features(df, selection)
py.plot(fig, filename=selection)

This is the format of your plot grid:
[ (1,1) x1,y1 ]   
[ (2,1) x2,y2 ]   
[ (3,1) x3,y3 ]   
[ (4,1) x4,y4 ]   
[ (5,1) x5,y5 ]   
[ (6,1) x6,y6 ]   
[ (7,1) x7,y7 ]   
[ (8,1) x8,y8 ]   
[ (9,1) x9,y9 ]   
[ (10,1) x10,y10 ]
[ (11,1) x11,y11 ]
[ (12,1) x12,y12 ]



'https://plot.ly/~tracewsl/315?share_key=BIa9dCSsJ74rBJuQFUJN3H'

In [33]:
selection = 'LPC'
fig = plot_features(df, selection)
py.plot(fig, filename=selection)

This is the format of your plot grid:
[ (1,1) x1,y1 ]



'https://plot.ly/~tracewsl/331?share_key=QbevAWkqRJqPozUBubNac1'

In [17]:
selection = 'LSF'
fig = plot_features(df, selection)
py.plot(fig, filename=selection)

This is the format of your plot grid:
[ (1,1) x1,y1 ]   
[ (2,1) x2,y2 ]   
[ (3,1) x3,y3 ]   
[ (4,1) x4,y4 ]   
[ (5,1) x5,y5 ]   
[ (6,1) x6,y6 ]   
[ (7,1) x7,y7 ]   
[ (8,1) x8,y8 ]   
[ (9,1) x9,y9 ]   
[ (10,1) x10,y10 ]



'https://plot.ly/~tracewsl/317?share_key=08PLdeSz12ozUBZcGWpmUg'

In [18]:
selection = 'MFCC'
fig = plot_features(df, selection)
py.plot(fig, filename=selection)

This is the format of your plot grid:
[ (1,1) x1,y1 ]   
[ (2,1) x2,y2 ]   
[ (3,1) x3,y3 ]   
[ (4,1) x4,y4 ]   
[ (5,1) x5,y5 ]   
[ (6,1) x6,y6 ]   
[ (7,1) x7,y7 ]   
[ (8,1) x8,y8 ]   
[ (9,1) x9,y9 ]   
[ (10,1) x10,y10 ]
[ (11,1) x11,y11 ]
[ (12,1) x12,y12 ]
[ (13,1) x13,y13 ]



'https://plot.ly/~tracewsl/319?share_key=5SdA05qgOjP8NA1nK954gW'

In [34]:
selection = 'OBSI'
fig = plot_features(df, selection)
py.plot(fig, filename=selection)

This is the format of your plot grid:
[ (1,1) x1,y1 ]
[ (2,1) x2,y2 ]
[ (3,1) x3,y3 ]
[ (4,1) x4,y4 ]
[ (5,1) x5,y5 ]
[ (6,1) x6,y6 ]
[ (7,1) x7,y7 ]
[ (8,1) x8,y8 ]
[ (9,1) x9,y9 ]



'https://plot.ly/~tracewsl/333?share_key=fu233IqjwRZ1qqjcbrdygh'

In [19]:
selection = 'SpectralCrestFactorPerBand'
fig = plot_features(df, selection)
py.plot(fig, filename=selection)

This is the format of your plot grid:
[ (1,1) x1,y1 ]   
[ (2,1) x2,y2 ]   
[ (3,1) x3,y3 ]   
[ (4,1) x4,y4 ]   
[ (5,1) x5,y5 ]   
[ (6,1) x6,y6 ]   
[ (7,1) x7,y7 ]   
[ (8,1) x8,y8 ]   
[ (9,1) x9,y9 ]   
[ (10,1) x10,y10 ]
[ (11,1) x11,y11 ]
[ (12,1) x12,y12 ]
[ (13,1) x13,y13 ]
[ (14,1) x14,y14 ]
[ (15,1) x15,y15 ]
[ (16,1) x16,y16 ]
[ (17,1) x17,y17 ]
[ (18,1) x18,y18 ]
[ (19,1) x19,y19 ]



'https://plot.ly/~tracewsl/321?share_key=lApd3QLTyp9s3XA36uHNbi'

In [20]:
selection = 'SpectralFlatness'
fig = plot_features(df, selection)
py.plot(fig, filename=selection)

This is the format of your plot grid:
[ (1,1) x1,y1 ]



'https://plot.ly/~tracewsl/323?share_key=Z8wX3wdrksuFeAcQwk0Egi'

In [21]:
selection = 'SpectralFlux'
fig = plot_features(df, selection)
py.plot(fig, filename=selection)

This is the format of your plot grid:
[ (1,1) x1,y1 ]



'https://plot.ly/~tracewsl/325?share_key=yYUfHOjwzXF1CkOhghjELA'

In [22]:
selection = 'SpectralRolloff'
fig = plot_features(df, selection)
py.plot(fig, filename=selection)

This is the format of your plot grid:
[ (1,1) x1,y1 ]



'https://plot.ly/~tracewsl/327?share_key=iYdp06DseJNwqnWCnUZBbv'

In [23]:
selection = 'SpectralVariation'
fig = plot_features(df, selection)
py.plot(fig, filename=selection)

This is the format of your plot grid:
[ (1,1) x1,y1 ]



'https://plot.ly/~tracewsl/329?share_key=Xk58B0O0iBBy4o0SvbJQKJ'
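
The boxplots above are compared by eye. One possible way to put a number on "how well does this feature separate petrels from noise" is the distance between the two medians divided by the average interquartile range. This is only a sketch under that assumption: the score, the function names and the toy values below are illustrative and are not the method used to choose features in this notebook.

```python
import statistics


def iqr(values):
    """Interquartile range (Q3 - Q1) of a list of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


def separation(petrel_values, noise_values):
    """Absolute difference of medians, scaled by the mean of the two IQRs."""
    gap = abs(statistics.median(petrel_values) - statistics.median(noise_values))
    spread = (iqr(petrel_values) + iqr(noise_values)) / 2
    if spread == 0:
        # Degenerate case: no spread at all in either class
        return float('inf') if gap > 0 else 0.0
    return gap / spread


# Toy values standing in for two feature columns (not real measurements).
toy = {
    'feature_a': ([10, 11, 12, 13, 14], [20, 21, 22, 23, 24]),  # boxes far apart
    'feature_b': ([10, 12, 14, 16, 18], [11, 13, 15, 17, 19]),  # boxes overlap
}

scores = {name: separation(p, n) for name, (p, n) in toy.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

With the notebook's data, the two arguments could be `df_petrel[name].dropna().tolist()` and `df_noise[name].dropna().tolist()` for each column `name`. A score well above 1 means the medians are further apart than a typical box is wide.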

In [34]:
# Feature columns to keep, followed by the 'petrel' label and 'filename'.
# Note: no Chroma or SpectralCrestFactorPerBand columns are included.
nice_features = ['freq_Q25', 'freq_Q75', 'freq_mean', 'freq_median',
       'freq_mode', 'freq_peak', 'offset', 'onset', 'pitch_IQR',
       'pitch_Q25', 'pitch_Q75', 'pitch_max', 'pitch_mean',
       'pitch_median', 'pitch_min', 'yaafe_LPC', 'yaafe_LSF.0', 'yaafe_LSF.1',
       'yaafe_LSF.2', 'yaafe_LSF.3', 'yaafe_LSF.4', 'yaafe_LSF.5',
       'yaafe_LSF.6', 'yaafe_LSF.7', 'yaafe_LSF.8', 'yaafe_LSF.9',
       'yaafe_MFCC.0', 'yaafe_MFCC.1', 'yaafe_MFCC.10', 'yaafe_MFCC.11',
       'yaafe_MFCC.12', 'yaafe_MFCC.2', 'yaafe_MFCC.3', 'yaafe_MFCC.4',
       'yaafe_MFCC.5', 'yaafe_MFCC.6', 'yaafe_MFCC.7', 'yaafe_MFCC.8',
       'yaafe_MFCC.9', 'yaafe_OBSI.0', 'yaafe_OBSI.1', 'yaafe_OBSI.2',
       'yaafe_OBSI.3', 'yaafe_OBSI.4', 'yaafe_OBSI.5', 'yaafe_OBSI.6',
       'yaafe_OBSI.7', 'yaafe_OBSI.8', 'yaafe_SpectralFlatness',
       'yaafe_SpectralFlux', 'yaafe_SpectralRolloff',
       'yaafe_SpectralVariation', 'petrel', 'filename']

In [35]:
nice_features

['freq_Q25',
 'freq_Q75',
 'freq_mean',
 'freq_median',
 'freq_mode',
 'freq_peak',
 'offset',
 'onset',
 'pitch_IQR',
 'pitch_Q25',
 'pitch_Q75',
 'pitch_max',
 'pitch_mean',
 'pitch_median',
 'pitch_min',
 'yaafe_LPC',
 'yaafe_LSF.0',
 'yaafe_LSF.1',
 'yaafe_LSF.2',
 'yaafe_LSF.3',
 'yaafe_LSF.4',
 'yaafe_LSF.5',
 'yaafe_LSF.6',
 'yaafe_LSF.7',
 'yaafe_LSF.8',
 'yaafe_LSF.9',
 'yaafe_MFCC.0',
 'yaafe_MFCC.1',
 'yaafe_MFCC.10',
 'yaafe_MFCC.11',
 'yaafe_MFCC.12',
 'yaafe_MFCC.2',
 'yaafe_MFCC.3',
 'yaafe_MFCC.4',
 'yaafe_MFCC.5',
 'yaafe_MFCC.6',
 'yaafe_MFCC.7',
 'yaafe_MFCC.8',
 'yaafe_MFCC.9',
 'yaafe_OBSI.0',
 'yaafe_OBSI.1',
 'yaafe_OBSI.2',
 'yaafe_OBSI.3',
 'yaafe_OBSI.4',
 'yaafe_OBSI.5',
 'yaafe_OBSI.6',
 'yaafe_OBSI.7',
 'yaafe_OBSI.8',
 'yaafe_SpectralFlatness',
 'yaafe_SpectralFlux',
 'yaafe_SpectralRolloff',
 'yaafe_SpectralVariation',
 'petrel',
 'filename']
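
The MFCC entries in this list are in plain string order, so `yaafe_MFCC.10` comes before `yaafe_MFCC.2`. If coefficient order matters (for example when plotting coefficients side by side), a natural sort key is one option. A minimal sketch; `natural_key` is a hypothetical helper, not part of `audioexplorer`:

```python
import re


def natural_key(name):
    """Split runs of digits from text so that '.10' sorts after '.2'."""
    return [int(part) if part.isdigit() else part
            for part in re.split(r'(\d+)', name)]


# A slice of the list above, in the order it appears there.
names = ['yaafe_MFCC.0', 'yaafe_MFCC.1', 'yaafe_MFCC.10',
         'yaafe_MFCC.11', 'yaafe_MFCC.12', 'yaafe_MFCC.2']

ordered = sorted(names, key=natural_key)
print(ordered)
```

The same key can be passed to `sorted(nice_features, key=natural_key)`; names without digits are compared as ordinary strings.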