## Audio Classification Problem
Recordings of dog and cat sounds are classified as either dog or cat.
The model is a RandomForest classifier.

In [1]:
import os
import re
import numpy as np
import pandas as pd
import scipy.io.wavfile as sw
import librosa
import python_speech_features as psf
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix
from sklearn import svm
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier


## Loading the Dataset

In [2]:
base_dir = "./audio-cats-and-dogs/cats_dogs"
file_names = os.listdir(base_dir)
frames = []

for file_name in file_names:
    # Load the waveform (librosa resamples to 22050 Hz by default)
    signal, sr = librosa.load(os.path.join(base_dir, file_name))
    # 50 MFCCs per frame; the result has shape (50, n_frames)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=50)
    # One row per frame, tagged with the source file name
    features = pd.DataFrame(mfcc.T)
    features["Target"] = file_name
    frames.append(features)

# Concatenate once at the end instead of appending row by row
final_dataset = pd.concat(frames, ignore_index=True)

# Label each row from its file name: 1 if the name contains "cat", otherwise 0 (dog)
final_dataset["Target"] = final_dataset["Target"].str.contains("cat").astype(int)
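
The label cleanup above maps each file name to 1 (cat) or 0 (dog). As a minimal sketch of the same mapping in isolation, the helper below parses a single file name. The function name `label_from_filename` and the example file names (`cat_12.wav`, `dog_barking_3.wav`) are assumptions for illustration, inferred from the substrings the cleanup code removes.

```python
import re


def label_from_filename(file_name: str) -> int:
    """Return 1 for a cat recording and 0 for a dog recording (hypothetical helper)."""
    stem = re.sub(r"\.wav$", "", file_name)   # drop the extension
    stem = re.sub(r"[0-9_]+", "", stem)       # "dog_barking_3" -> "dogbarking"
    if stem.startswith("cat"):
        return 1
    if stem.startswith("dog"):
        return 0
    # Fail loudly rather than silently producing a bad label
    raise ValueError(f"unrecognized file name: {file_name!r}")


print(label_from_filename("cat_12.wav"))         # 1
print(label_from_filename("dog_barking_3.wav"))  # 0
```

Raising on an unrecognized name is a design choice of this sketch; the notebook's own cleanup does not validate file names.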

## Training
Train without data augmentation.

In [3]:
fd = final_dataset
y = fd.iloc[:, -1]    # last column holds the label (1 = cat, 0 = dog)
X = fd.iloc[:, 0:26]  # only the first 26 of the 50 MFCC columns are used

# Rows are individual frames, so frames from one recording can land in both splits
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

# Fit the scaler on the training split only, then apply it to the test split
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

model = RandomForestClassifier()
model.fit(X_train, y_train)
predicted = model.predict(X_test)
accuracy_score(y_test, predicted)



0.9723193824629116
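
The scaling step in the cell above fits `StandardScaler` on the training split and reuses the fitted statistics on the test split. The sketch below reproduces that idea with plain NumPy on synthetic data. The arrays `train` and `test` are made up for illustration and are not the notebook's features.

```python
import numpy as np

rng = np.random.default_rng(0)
train = rng.normal(loc=5.0, scale=2.0, size=(100, 3))  # synthetic training features
test = rng.normal(loc=5.0, scale=2.0, size=(20, 3))    # synthetic test features

# Statistics come from the training split only
mean = train.mean(axis=0)
std = train.std(axis=0)  # population std (ddof=0), matching StandardScaler

train_scaled = (train - mean) / std
test_scaled = (test - mean) / std  # reuse training statistics; never refit on test data

print(train_scaled.mean(axis=0))
print(train_scaled.std(axis=0))
```

After scaling, each training column has mean close to 0 and standard deviation close to 1. The test columns generally do not, because they are scaled with the training statistics.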

## Data Augmentation
The training data is augmented by adding small random noise to the features extracted from the audio.

In [4]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

# Triple the training set: the original rows plus two noisy copies
# (the same Gaussian noise matrix, scaled by 0.005 and by 0.015)
wn = np.random.randn(X_train.shape[0], X_train.shape[1])
X_train = np.concatenate((X_train, X_train + 0.005 * wn, X_train + 0.015 * wn), axis=0)
y_train = np.concatenate((y_train, y_train, y_train), axis=0)

# Fit the scaler on the augmented training data; the test split is left unaugmented
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

model = RandomForestClassifier()
model.fit(X_train, y_train)
predicted = model.predict(X_test)
accuracy_score(y_test, predicted)



0.9762996019780484
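
The augmentation in the cell above stacks the training features with noisy copies of themselves and repeats the labels to match. A minimal sketch of that step as a reusable function follows. The name `augment_with_noise`, its `scales` and `seed` parameters, and the synthetic demo arrays are assumptions for illustration, not part of the notebook.

```python
import numpy as np


def augment_with_noise(X, y, scales=(0.005, 0.015), seed=0):
    """Stack X with one noisy copy per scale; labels are repeated to match."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(X.shape)  # one noise matrix shared by all copies
    X_aug = np.concatenate([X] + [X + s * noise for s in scales], axis=0)
    y_aug = np.concatenate([y] * (len(scales) + 1), axis=0)
    return X_aug, y_aug


X_demo = np.arange(12, dtype=float).reshape(4, 3)  # 4 samples, 3 features (synthetic)
y_demo = np.array([0, 1, 0, 1])
X_aug, y_aug = augment_with_noise(X_demo, y_demo)
print(X_aug.shape, y_aug.shape)  # (12, 3) (12,)
```

As in the notebook, only the training split should be passed to such a function, so that the test split stays untouched.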

## Results
Data augmentation improved test accuracy slightly, from 0.9723 to 0.9763.