# **Homework 2-1 Phoneme Classification**

* Slides: https://speech.ee.ntu.edu.tw/~hylee/ml/ml2021-course-data/hw/HW02/HW02.pdf
* Video (Chinese): https://youtu.be/PdjXnQbu2zo
* Video (English): https://youtu.be/ESRr-VCykBs


## The DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus (TIMIT)
The TIMIT corpus of read speech was designed to provide speech data for acquiring acoustic-phonetic knowledge and for developing and evaluating automatic speech recognition systems.

This homework is a multiclass classification task: we will train a deep neural network classifier to predict the phoneme of each frame in the TIMIT speech corpus.

Link: https://academictorrents.com/details/34e2b78745138186976cbc27939b1b34d18bd5b3

## Preparing Data
Load the training and testing data from the `.npy` files (NumPy arrays).

In [16]:
import numpy as np
import os

print('Loading data ...')

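# USERPROFILE is set on Windows; point data_root at your own copy of the data on other systems
data_root = os.environ['USERPROFILE'] + '/desktop/ML/timit_11/'
train = np.load(data_root + 'train_11.npy')
# view each sample as 11 frames x 39 features; -1 infers the number of samples
train = np.reshape(train, (-1, 11, 39))
train_label = np.load(data_root + 'train_label_11.npy')
test = np.load(data_root + 'test_11.npy')
test = np.reshape(test, (-1, 11, 39))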

print('Size of training data: {}'.format(train.shape))
print('Size of testing data: {}'.format(test.shape))

Loading data ...
Size of training data: (922449, 11, 39)
Size of testing data: (307483, 11, 39)
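The `(N, 11, 39)` shape printed above can be read as N samples, each a window of 11 consecutive frames with 39 features per frame. The sketch below illustrates that layout on a tiny synthetic array. The flat 429-value input (11 × 39) and the variable names are assumptions for illustration, not the contents of the real `.npy` files.

```python
import numpy as np

# Synthetic stand-in: 2 samples, each stored as a flat vector of 11 * 39 = 429 values.
flat = np.arange(2 * 429, dtype=np.float32).reshape(2, 429)

# View each sample as 11 frames of 39 features.
windows = flat.reshape(-1, 11, 39)

# Frame index 5 is the middle of the 11-frame window (5 frames on each side).
center = windows[:, 5, :]

print(windows.shape)  # (2, 11, 39)
print(center.shape)   # (2, 39)
```

The middle frame is the one the label refers to in this reading; the surrounding frames act as context.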


## Create Dataset

In [17]:
import torch
from torch.utils.data import Dataset

class TIMITDataset(Dataset):
    def __init__(self, X, y=None):
        self.data = torch.from_numpy(X).float()
        if y is not None:
            y = y.astype(np.int64)  # np.int was deprecated in NumPy 1.20 and later removed
            self.label = torch.LongTensor(y)
        else:
            self.label = None

    def __getitem__(self, idx):
        if self.label is not None:
            return self.data[idx], self.label[idx]
        else:
            return self.data[idx]

    def __len__(self):
        return len(self.data)


Split the labeled data into a training set and a validation set. Modify the variable `VAL_RATIO` to change the fraction of data held out for validation.

In [18]:
VAL_RATIO = 0.01

percent = int(train.shape[0] * (1 - VAL_RATIO))
train_x, train_y, val_x, val_y = train[:percent], train_label[:percent], train[percent:], train_label[percent:]
print('Size of training set: {}'.format(train_x.shape))
print('Size of validation set: {}'.format(val_x.shape))

Size of training set: (913224, 11, 39)
Size of validation set: (9225, 11, 39)
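The sizes above follow directly from the split arithmetic. As a rough check, the hypothetical helper `split_sizes` below (not part of the notebook) reproduces them from the sample count and `VAL_RATIO`. Because the split is a plain slice, the validation set is the tail of the array rather than a random sample.

```python
def split_sizes(n_samples, val_ratio):
    """Return (n_train, n_val) for a head/tail split like the one above."""
    n_train = int(n_samples * (1 - val_ratio))  # int() truncates toward zero
    return n_train, n_samples - n_train

print(split_sizes(922449, 0.01))  # (913224, 9225)
```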


Create data loaders from the datasets. Feel free to tweak the variable `BATCH_SIZE` here.

In [19]:
BATCH_SIZE = 256

from torch.utils.data import DataLoader

train_set = TIMITDataset(train_x, train_y)
val_set = TIMITDataset(val_x, val_y)
train_loader = DataLoader(train_set, batch_size=BATCH_SIZE, shuffle=True) #only shuffle the training data
val_loader = DataLoader(val_set, batch_size=BATCH_SIZE, shuffle=False)
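With `BATCH_SIZE = 256` and the default `drop_last=False`, a `DataLoader` yields `ceil(len(dataset) / BATCH_SIZE)` batches per epoch, and the last batch may be smaller than the rest. A quick check using the set sizes printed earlier (the helper name is illustrative):

```python
import math

def batches_per_epoch(n_samples, batch_size):
    """Number of batches a DataLoader yields when drop_last is False."""
    return math.ceil(n_samples / batch_size)

print(batches_per_epoch(913224, 256))  # 3568 training batches
print(batches_per_epoch(9225, 256))    # 37 validation batches
```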



Clean up the unneeded variables to save memory.

**Note: if you need these variables later, remove this block or clean up the unneeded variables later. The data is quite large, so be aware of memory usage in Colab.**

In [20]:
import gc

del train, train_label, train_x, train_y, val_x, val_y
gc.collect()

362

## Create Model

Define the model architecture. You are encouraged to change and experiment with it.

In [21]:
import torch.nn as nn

class LSTM(nn.Module):
    # Define the LSTM class: a bidirectional LSTM followed by a linear classifier
    def __init__(self, input_dim=39, hidden_dim=1024, layer_dim=4, output_dim=39):
        super().__init__()
        self.input_dim = input_dim
        self.hidden_dim = hidden_dim
        self.layer_dim = layer_dim
        self.output_dim = output_dim
        self.lstm = nn.LSTM(input_dim, hidden_dim, layer_dim, batch_first=True, dropout=0.5, bidirectional=True)
        # a bidirectional LSTM concatenates both directions, hence hidden_dim * 2
        self.linear = nn.Linear(hidden_dim * 2, output_dim)

    def forward(self, x):
        out, _ = self.lstm(x)  # out: (batch, 11, hidden_dim * 2)
        # classify from the center frame (index 5 of the 11-frame window)
        out = self.linear(out[:, 5, :])
        return out
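To see why `forward` indexes `out[:, 5, :]` and why the linear layer takes `hidden_dim * 2` inputs, it can help to trace tensor shapes through a much smaller model. The sketch below uses made-up sizes (a batch of 4, hidden size 8, 2 layers) chosen only to keep it fast. It is not the configuration trained in this notebook.

```python
import torch
import torch.nn as nn

batch, frames, feats, hidden, classes = 4, 11, 39, 8, 39

lstm = nn.LSTM(feats, hidden, num_layers=2, batch_first=True, bidirectional=True)
linear = nn.Linear(hidden * 2, classes)

x = torch.randn(batch, frames, feats)
out, _ = lstm(x)          # one output per frame, both directions concatenated
center = out[:, 5, :]     # keep only the middle frame of the 11-frame window
logits = linear(center)   # one score per phoneme class

print(tuple(out.shape))     # (4, 11, 16)
print(tuple(center.shape))  # (4, 16)
print(tuple(logits.shape))  # (4, 39)
```

Taking the middle frame lets the prediction draw on context from both directions of the window.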

## Training

In [22]:
#check device
def get_device():
    return 'cuda' if torch.cuda.is_available() else 'cpu'
get_device()

'cuda'

Feel free to change the training parameters here.
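The training cell that follows pairs Adam at `lr = 5e-4` with `CosineAnnealingLR` and `T_max = 11`. As a sketch of what that schedule looks like, the function below evaluates the closed-form cosine annealing formula with `eta_min = 0` (PyTorch's default). It assumes the scheduler is stepped once per epoch, which has not been verified here against torchensemble's training loop.

```python
import math

def cosine_annealing_lr(step, base_lr=5e-4, t_max=11, eta_min=0.0):
    """Closed-form cosine annealing: decays from base_lr to eta_min over t_max steps."""
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * step / t_max))

for epoch in range(12):
    print(epoch, f"{cosine_annealing_lr(epoch):.2e}")
```

Setting `T_max` equal to the number of epochs, as the cell does, means the learning rate approaches zero by the end of training under this assumption.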

In [23]:
from torchensemble import BaggingClassifier

# get device
device = get_device()
print(f'DEVICE: {device}')

# path where the checkpoint is saved
model_path = './model.ckpt'

# create the model (a torchensemble bagging ensemble with a single LSTM estimator),
# then set the optimizer and the learning-rate scheduler
para = {'input_dim': 39, 'hidden_dim': 512, 'layer_dim': 4, 'output_dim': 39}
model = BaggingClassifier(
    estimator=LSTM,
    n_estimators=1,
    estimator_args=para,
    cuda=(device == 'cuda')
)
model.set_optimizer("Adam", lr=5e-4, weight_decay=5e-5)
model.set_scheduler("CosineAnnealingLR", T_max=11)
model.fit(train_loader=train_loader, epochs=11)

DEVICE: cuda
Estimator: 000 | Epoch: 000 | Batch: 000 | Loss: 3.67237 | Correct: 3/166
Estimator: 000 | Epoch: 000 | Batch: 100 | Loss: 1.88528 | Correct: 64/152
Estimator: 000 | Epoch: 000 | Batch: 200 | Loss: 1.61322 | Correct: 74/149
Estimator: 000 | Epoch: 000 | Batch: 300 | Loss: 1.47590 | Correct: 87/165
Estimator: 000 | Epoch: 000 | Batch: 400 | Loss: 1.27589 | Correct: 104/163
Estimator: 000 | Epoch: 000 | Batch: 500 | Loss: 1.25772 | Correct: 99/165
Estimator: 000 | Epoch: 000 | Batch: 600 | Loss: 1.42973 | Correct: 92/161
Estimator: 000 | Epoch: 000 | Batch: 700 | Loss: 1.17999 | Correct: 103/163
Estimator: 000 | Epoch: 000 | Batch: 800 | Loss: 1.20388 | Correct: 96/162
Estimator: 000 | Epoch: 000 | Batch: 900 | Loss: 1.19240 | Correct: 102/163
Estimator: 000 | Epoch: 000 | Batch: 1000 | Loss: 0.99661 | Correct: 110/161
Estimator: 000 | Epoch: 000 | Batch: 1100 | Loss: 1.06780 | Correct: 112/170
Estimator: 000 | Epoch: 000 | Batch: 1200 | Loss: 1.07882 | Correct: 106/164
...

Estimator: 000 | Epoch: 002 | Batch: 3500 | Loss: 0.65258 | Correct: 121/156
Estimator: 000 | Epoch: 003 | Batch: 000 | Loss: 0.67233 | Correct: 126/162
Estimator: 000 | Epoch: 003 | Batch: 100 | Loss: 0.71006 | Correct: 122/164
Estimator: 000 | Epoch: 003 | Batch: 200 | Loss: 0.56575 | Correct: 125/161
Estimator: 000 | Epoch: 003 | Batch: 300 | Loss: 0.65840 | Correct: 124/157
Estimator: 000 | Epoch: 003 | Batch: 400 | Loss: 0.73372 | Correct: 122/157
Estimator: 000 | Epoch: 003 | Batch: 500 | Loss: 0.62920 | Correct: 129/163
Estimator: 000 | Epoch: 003 | Batch: 600 | Loss: 0.64083 | Correct: 131/167
Estimator: 000 | Epoch: 003 | Batch: 700 | Loss: 0.81449 | Correct: 116/162
Estimator: 000 | Epoch: 003 | Batch: 800 | Loss: 0.53450 | Correct: 131/161
Estimator: 000 | Epoch: 003 | Batch: 900 | Loss: 0.54585 | Correct: 128/156
Estimator: 000 | Epoch: 003 | Batch: 1000 | Loss: 0.62484 | Correct: 128/161
Estimator: 000 | Epoch: 003 | Batch: 1100 | Loss: 0.49292 | Correct: 136/162
...

Estimator: 000 | Epoch: 005 | Batch: 3400 | Loss: 0.49705 | Correct: 141/171
Estimator: 000 | Epoch: 005 | Batch: 3500 | Loss: 0.44623 | Correct: 137/162
Estimator: 000 | Epoch: 006 | Batch: 000 | Loss: 0.37597 | Correct: 139/158
Estimator: 000 | Epoch: 006 | Batch: 100 | Loss: 0.45023 | Correct: 137/164
Estimator: 000 | Epoch: 006 | Batch: 200 | Loss: 0.56400 | Correct: 133/164
Estimator: 000 | Epoch: 006 | Batch: 300 | Loss: 0.58576 | Correct: 118/162
Estimator: 000 | Epoch: 006 | Batch: 400 | Loss: 0.38497 | Correct: 140/159
Estimator: 000 | Epoch: 006 | Batch: 500 | Loss: 0.41355 | Correct: 145/165
Estimator: 000 | Epoch: 006 | Batch: 600 | Loss: 0.36104 | Correct: 136/161
Estimator: 000 | Epoch: 006 | Batch: 700 | Loss: 0.46293 | Correct: 132/161
Estimator: 000 | Epoch: 006 | Batch: 800 | Loss: 0.26554 | Correct: 152/168
Estimator: 000 | Epoch: 006 | Batch: 900 | Loss: 0.44932 | Correct: 139/164
Estimator: 000 | Epoch: 006 | Batch: 1000 | Loss: 0.46430 | Correct: 140/164
...

Estimator: 000 | Epoch: 008 | Batch: 3300 | Loss: 0.31701 | Correct: 136/156
Estimator: 000 | Epoch: 008 | Batch: 3400 | Loss: 0.31879 | Correct: 147/163
Estimator: 000 | Epoch: 008 | Batch: 3500 | Loss: 0.39955 | Correct: 137/166
Estimator: 000 | Epoch: 009 | Batch: 000 | Loss: 0.35837 | Correct: 145/165
Estimator: 000 | Epoch: 009 | Batch: 100 | Loss: 0.30288 | Correct: 140/157
Estimator: 000 | Epoch: 009 | Batch: 200 | Loss: 0.26989 | Correct: 143/163
Estimator: 000 | Epoch: 009 | Batch: 300 | Loss: 0.23941 | Correct: 143/160
Estimator: 000 | Epoch: 009 | Batch: 400 | Loss: 0.28652 | Correct: 144/163
Estimator: 000 | Epoch: 009 | Batch: 500 | Loss: 0.23561 | Correct: 150/165
Estimator: 000 | Epoch: 009 | Batch: 600 | Loss: 0.23460 | Correct: 143/154
Estimator: 000 | Epoch: 009 | Batch: 700 | Loss: 0.33101 | Correct: 147/165
Estimator: 000 | Epoch: 009 | Batch: 800 | Loss: 0.27656 | Correct: 137/158
Estimator: 000 | Epoch: 009 | Batch: 900 | Loss: 0.27756 | Correct: 141/161
...
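Each log line reports the loss and the number of correct predictions for one logged batch. To turn lines like these into numbers for plotting or comparison, a small parser can help. This is a minimal sketch: the regular expression is written against the format shown above, and `parse_log_line` is a hypothetical helper, not part of torchensemble.

```python
import re

LOG_LINE = re.compile(
    r"Epoch: (\d+) \| Batch: (\d+) \| Loss: ([\d.]+) \| Correct: (\d+)/(\d+)"
)

def parse_log_line(line):
    """Parse one training log line; return None for truncated or unrelated lines."""
    m = LOG_LINE.search(line)
    if m is None:
        return None
    epoch, batch, loss, correct, total = m.groups()
    return {
        "epoch": int(epoch),
        "batch": int(batch),
        "loss": float(loss),
        "accuracy": int(correct) / int(total),
    }

lines = [
    "Estimator: 000 | Epoch: 000 | Batch: 000 | Loss: 3.67237 | Correct: 3/166",
    "Estimator: 000 | Epoch: 009 | Batch: 900 | Loss: 0.27756 | Correct: 141/161",
]
for line in lines:
    rec = parse_log_line(line)
    print(rec["epoch"], rec["batch"], rec["loss"], round(rec["accuracy"], 3))
```

On the two sample lines, batch accuracy rises from about 2% at the first batch to about 88% late in epoch 9.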

In [24]:
import torch
torch.save(model.state_dict(), model_path)  # model_path = './model.ckpt', defined above

## Testing

Create a testing dataset. The model trained above is still in memory, so the code below uses it directly instead of reloading the saved checkpoint.

In [25]:
# create testing dataset
test_set = TIMITDataset(test, None)
test_loader = DataLoader(test_set, batch_size=BATCH_SIZE, shuffle=False)

Make predictions on the testing data.

In [26]:
predict = []
with torch.no_grad():
    for data in test_loader:
        outputs = model.predict(data)
        _, test_pred = torch.max(outputs, 1)  # get the index of the class with the highest score
        predict.extend(test_pred.cpu().numpy())
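`torch.max(outputs, 1)` returns both the maximum value and its index along the class dimension, and only the index is kept as the predicted class. The same idea in NumPy, on a made-up score matrix with 3 classes instead of 39:

```python
import numpy as np

# Hypothetical scores for 2 frames over 3 classes (each row sums to 1).
scores = np.array([[0.1, 0.7, 0.2],
                   [0.3, 0.3, 0.4]])

pred = scores.argmax(axis=1)  # index of the largest score in each row
print(pred.tolist())  # [1, 2]
```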

In [27]:
# post-processing: if a frame's two neighbors agree with each other but not with it, relabel it to match them
print(len(predict))
for i in range(1, len(predict) - 1):
    if predict[i + 1] == predict[i - 1] and predict[i - 1] != predict[i]:
        predict[i] = predict[i - 1]
print(len(predict))

307483
307483
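The loop above relabels a frame when its two neighbors agree with each other but not with it. Wrapped as a function on a toy sequence, the effect is easier to see. `smooth_isolated` is an illustrative name, and unlike the loop above it works on a copy rather than editing the list in place.

```python
def smooth_isolated(labels):
    """Relabel any frame whose two neighbors agree with each other but not with it."""
    out = list(labels)
    for i in range(1, len(out) - 1):
        if out[i + 1] == out[i - 1] and out[i - 1] != out[i]:
            out[i] = out[i - 1]
    return out

print(smooth_isolated([7, 7, 3, 7, 7]))  # [7, 7, 7, 7, 7]
print(smooth_isolated([7, 7, 3, 3, 7]))  # [7, 7, 3, 3, 7]
```

Only single-frame outliers are changed; a run of two or more differing frames is left alone. The sequence length never changes, which matches the two identical counts printed above.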


In [28]:
# write the results to a CSV file

with open('prediction.csv', 'w') as f:
    f.write('Id,Class\n')
    for i, y in enumerate(predict):
        f.write('{},{}\n'.format(i, y))
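The file written above has a header row followed by one `Id,Class` pair per frame. As an alternative sketch, the standard `csv` module produces the same layout. It writes to an in-memory buffer here so the example has no side effects, and the three predictions are made up.

```python
import csv
import io

predict = [3, 3, 17]  # made-up class indices

buf = io.StringIO()
writer = csv.writer(buf, lineterminator="\n")
writer.writerow(["Id", "Class"])
writer.writerows(enumerate(predict))  # rows of (index, predicted class)

print(buf.getvalue())
```

To write a real file instead, the buffer could be replaced with `open('prediction.csv', 'w', newline='')`.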