# **Homework 2-1 Phoneme Classification**

* Slides: https://speech.ee.ntu.edu.tw/~hylee/ml/ml2021-course-data/hw/HW02/HW02.pdf
* Video (Chinese): https://youtu.be/PdjXnQbu2zo
* Video (English): https://youtu.be/ESRr-VCykBs


## The DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus (TIMIT)
The TIMIT corpus of read speech was designed to provide speech data for the acquisition of acoustic-phonetic knowledge and for the development and evaluation of automatic speech recognition systems.

This homework is a multiclass classification task:
we are going to train a deep neural network classifier to predict the phoneme for each frame of the TIMIT speech corpus.

link: https://academictorrents.com/details/34e2b78745138186976cbc27939b1b34d18bd5b3

## Download Data
Download the data from Google Drive, then unzip it.

You should have `timit_11/train_11.npy`, `timit_11/train_label_11.npy`, and `timit_11/test_11.npy` after running this block.<br><br>
`timit_11/`
- `train_11.npy`: training data<br>
- `train_label_11.npy`: training labels<br>
- `test_11.npy`: testing data<br><br>

**Note: if the Google Drive link is dead, you can download the data directly from Kaggle and upload it to the workspace.**




In [1]:
!nvidia-smi

Fri Mar 17 02:54:00 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.12    Driver Version: 525.85.12    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  NVIDIA A100-SXM...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   35C    P0    54W / 400W |      0MiB / 40960MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

In [2]:
!pip install --upgrade --no-cache-dir gdown
!gdown --id '1iDmtJ8vg-SF8dC0r0AoOg4n5UdVOj9To' --output data.zip
!unzip data.zip
!ls 


Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting gdown
  Downloading gdown-4.6.4-py3-none-any.whl (14 kB)
Installing collected packages: gdown
  Attempting uninstall: gdown
    Found existing installation: gdown 4.4.0
    Uninstalling gdown-4.4.0:
      Successfully uninstalled gdown-4.4.0
Successfully installed gdown-4.6.4
Downloading...
From: https://drive.google.com/uc?id=1iDmtJ8vg-SF8dC0r0AoOg4n5UdVOj9To
To: /content/data.zip
100% 272M/272M [00:06<00:00, 44.5MB/s]
Archive:  data.zip
   creating: timit_11/
  inflating: __MACOSX/._timit_11     
  inflating: timit_11/train_11.npy   
  inflating: __MACOSX/timit_11/._train_11.npy  
  inflating: timit_11/test_11.npy    
  inflating: __MACOSX/timit_11/._test_11.npy  
  inflating: timit_11/train_label_11.npy  
  inflating: __MACOSX/timit_11/._train_label_11.npy  
data.zip  __MACOSX  sample_data  timit_11


## Preparing Data
Load the training and testing data from the `.npy` files (NumPy arrays).

In [3]:
import numpy as np

print('Loading data ...')

data_root='./timit_11/'
train = np.load(data_root + 'train_11.npy')
train_label = np.load(data_root + 'train_label_11.npy')
test = np.load(data_root + 'test_11.npy')

print('Size of training data: {}'.format(train.shape))
print('Size of testing data: {}'.format(test.shape))

Loading data ...
Size of training data: (922449, 429)
Size of testing data: (307483, 429)
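
Each row holds 429 values. Assuming, as the `_11` suffix in the file names suggests, that a row concatenates 11 consecutive frames of 39 acoustic features each (11 × 39 = 429), a row can be viewed as a small (frames, features) matrix. The sketch below uses a random stand-in array instead of the real data. The names `N_FRAMES` and `N_FEATS` and the frame-major ordering are assumptions, not something the notebook states.

```python
import numpy as np

N_FRAMES, N_FEATS = 11, 39  # assumed layout: 11 frames x 39 features = 429

# Random stand-in for a few rows of train_11.npy (the real file is not loaded here).
rng = np.random.default_rng(0)
toy = rng.standard_normal((4, N_FRAMES * N_FEATS)).astype(np.float32)

# View each 429-dim row as an (11, 39) matrix; index 5 would be the center frame.
frames = toy.reshape(-1, N_FRAMES, N_FEATS)
center = frames[:, N_FRAMES // 2, :]

print(frames.shape)  # (4, 11, 39)
print(center.shape)  # (4, 39)
```

If this layout holds, the label of a row would describe the center frame, with five frames of context on each side.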


## Create Dataset

In [4]:
import torch
from torch.utils.data import Dataset

class TIMITDataset(Dataset):
    def __init__(self, X, y=None):
        self.data = torch.from_numpy(X).float()
        if y is not None:
            y = y.astype(np.int64)  # np.int was deprecated in NumPy 1.20 and removed in 1.24
            self.label = torch.LongTensor(y)
        else:
            self.label = None

    def __getitem__(self, idx):
        if self.label is not None:
            return self.data[idx], self.label[idx]
        else:
            return self.data[idx]

    def __len__(self):
        return len(self.data)


Split the labeled data into a training set and a validation set. You can modify the variable `VAL_RATIO` to change the ratio of validation data.

In [5]:
VAL_RATIO = 0

percent = int(train.shape[0] * (1 - VAL_RATIO))
train_x, train_y, val_x, val_y = train[:percent], train_label[:percent], train[percent:], train_label[percent:]
print('Size of training set: {}'.format(train_x.shape))
print('Size of validation set: {}'.format(val_x.shape))

Size of training set: (922449, 429)
Size of validation set: (0, 429)
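
With `VAL_RATIO = 0` the validation set is empty, so the training loop further down skips validation and saves the model after the last epoch. The sketch below shows what the same slicing would produce with a nonzero ratio. The toy arrays and the value `0.2` are illustrative assumptions.

```python
import numpy as np

VAL_RATIO = 0.2  # hypothetical value; the notebook run above uses 0

# Toy stand-ins for `train` and `train_label`.
toy_x = np.arange(10 * 3, dtype=np.float32).reshape(10, 3)
toy_y = np.arange(10)

# Same slicing as the cell above: the first rows train, the last rows validate.
split = int(toy_x.shape[0] * (1 - VAL_RATIO))
train_x, val_x = toy_x[:split], toy_x[split:]
train_y, val_y = toy_y[:split], toy_y[split:]

print(train_x.shape, val_x.shape)  # (8, 3) (2, 3)
```

Note that the split is sequential, not shuffled: the validation rows are always the last ones in the array.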


Create a data loader from the dataset. Feel free to tweak the variable `BATCH_SIZE` here.

In [6]:
BATCH_SIZE = 2048

from torch.utils.data import DataLoader

train_set = TIMITDataset(train_x, train_y)
val_set = TIMITDataset(val_x, val_y)
train_loader = DataLoader(train_set, batch_size=BATCH_SIZE, shuffle=True) #only shuffle the training data
val_loader = DataLoader(val_set, batch_size=BATCH_SIZE, shuffle=False)

Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  y = y.astype(np.int)
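
To see what a loader yields, the sketch below iterates over a small stand-in dataset. `TensorDataset` is used in place of `TIMITDataset` only to keep the example self-contained, and the sizes are toy values.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-in: 10 samples of 429 features with integer labels in [0, 39).
torch.manual_seed(0)
features = torch.randn(10, 429)
labels = torch.randint(0, 39, (10,))

# TensorDataset plays the role of TIMITDataset here (an assumption for brevity).
loader = DataLoader(TensorDataset(features, labels), batch_size=4, shuffle=False)

batch_shapes = [(x.shape, y.shape) for x, y in loader]
for x_shape, y_shape in batch_shapes:
    print(tuple(x_shape), tuple(y_shape))
# Batches hold 4, 4, and 2 samples: the last batch is smaller unless drop_last=True.
```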


Clean up the unneeded variables to save memory.<br>

**Note: if you need these variables later, remove this block or clean them up later.<br>The data is large, so watch memory usage in Colab.**

In [7]:
import gc

del train, train_label, train_x, train_y, val_x, val_y
gc.collect()

0

## Create Model

Define the model architecture. You are encouraged to change and experiment with it.

In [14]:
import torch
import torch.nn as nn

class Classifier(nn.Module):
    def __init__(self):
        super(Classifier, self).__init__()
        self.net = nn.Sequential(
            nn.Linear(429, 2048),
            nn.LeakyReLU(),
            nn.BatchNorm1d(2048),
            nn.Dropout(0.5),
            nn.Linear(2048, 2048),
            nn.LeakyReLU(),
            nn.BatchNorm1d(2048),
            nn.Dropout(0.5),
            nn.Linear(2048, 2048),
            nn.LeakyReLU(),
            nn.BatchNorm1d(2048),
            nn.Dropout(0.5),
            nn.Linear(2048, 1024),
            nn.LeakyReLU(),
            nn.BatchNorm1d(1024),
            nn.Dropout(0.5),
            nn.Linear(1024, 512),
            nn.LeakyReLU(),
            nn.BatchNorm1d(512),
            nn.Dropout(0.5),
            nn.Linear(512, 256),
            nn.LeakyReLU(),
            nn.BatchNorm1d(256),
            nn.Dropout(0.5),
            nn.Linear(256, 39)
        )
        self.criterion = nn.CrossEntropyLoss()

    def forward(self, x):
        return self.net(x)

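    def cal_loss(self, pred, target, l1):
        # Cross-entropy plus an L1 penalty: l1 * sum of |w| over all parameters.
        # Uses self and pred.device rather than the globals `model` and `device`.
        l1_penalty = torch.tensor(0., device=pred.device)
        for parameter in self.parameters():
            l1_penalty += torch.sum(torch.abs(parameter))
        return self.criterion(pred, target) + (l1 * l1_penalty)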
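
The L1 term in `cal_loss` is the sum of absolute values of every parameter, scaled by `l1`. The sketch below checks that sum on a tiny layer with hand-set weights. The helper name `l1_penalty` and the layer `tiny` are hypothetical and exist only for this illustration.

```python
import torch
import torch.nn as nn

def l1_penalty(module: nn.Module) -> torch.Tensor:
    """Sum of absolute values over all parameters of `module`."""
    return sum(p.abs().sum() for p in module.parameters())

# Tiny stand-in layer with hand-set weights so the penalty is easy to verify.
tiny = nn.Linear(2, 1)
with torch.no_grad():
    tiny.weight.copy_(torch.tensor([[1.0, -2.0]]))
    tiny.bias.copy_(torch.tensor([0.5]))

penalty = l1_penalty(tiny)
print(penalty.item())  # 3.5  (|1| + |-2| + |0.5|)
```

As written in the notebook, the penalty covers all parameters, including biases and the `BatchNorm1d` scale and shift terms.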

## Training

In [15]:
#check device
def get_device():
  return 'cuda' if torch.cuda.is_available() else 'cpu'

Fix random seeds for reproducibility.

In [16]:
# fix random seed
def same_seeds(seed):
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)  
    np.random.seed(seed)  
    torch.backends.cudnn.benchmark = False
    torch.backends.cudnn.deterministic = True
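
A quick way to check that seeding works is to draw random numbers twice after resetting the seed. The sketch below does this on CPU with a reduced helper. `seed_everything` is a hypothetical name that mirrors only part of `same_seeds`.

```python
import numpy as np
import torch

def seed_everything(seed: int) -> None:  # hypothetical, reduced version of same_seeds
    torch.manual_seed(seed)
    np.random.seed(seed)

seed_everything(0)
a = torch.rand(3)
seed_everything(0)
b = torch.rand(3)

print(torch.equal(a, b))  # True
```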

Feel free to change the training parameters here.

In [21]:
# fix random seed for reproducibility
same_seeds(0)

# get device 
device = get_device()
print(f'DEVICE: {device}')

# training parameters
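num_epoch = 250               # number of training epochs
learning_rate = 0.0001       # learning rate
l2 = 1e-4                     # L2 weight; defined here but not passed to the optimizer below
l1 = 1e-5                     # L1 penalty weight passed to model.cal_loss

# path where the checkpoint is saved
model_path = './model.ckpt'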

# create model, define a loss function, and optimizer
model = Classifier().to(device)
criterion = nn.CrossEntropyLoss() 
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

DEVICE: cuda
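
The variable `l2` is defined above but never used. If L2 regularization was intended, one common option is the `weight_decay` argument of `torch.optim.Adam`. The sketch below shows that option on a stand-in model. It is an assumption about intent, not what the run in this notebook did.

```python
import torch
import torch.nn as nn

learning_rate = 0.0001
l2 = 1e-4

# Stand-in model; the notebook would pass Classifier().parameters() instead.
toy_model = nn.Linear(429, 39)

# weight_decay adds l2 * parameter to each gradient, which acts as an L2 penalty.
optimizer = torch.optim.Adam(toy_model.parameters(), lr=learning_rate, weight_decay=l2)

print(optimizer.param_groups[0]['weight_decay'])  # 0.0001
```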


In [22]:
# start training

best_acc = 0.0
for epoch in range(num_epoch):
    train_acc = 0.0
    train_loss = 0.0
    val_acc = 0.0
    val_loss = 0.0

    # training
    model.train() # set the model to training mode
    for i, data in enumerate(train_loader):
        inputs, labels = data
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad() 
        outputs = model(inputs) 
        batch_loss = model.cal_loss(outputs, labels, l1=l1)
        #batch_loss = criterion(outputs, labels)
        _, train_pred = torch.max(outputs, 1) # get the index of the class with the highest probability
        batch_loss.backward() 
        optimizer.step() 

        train_acc += (train_pred.cpu() == labels.cpu()).sum().item()
        train_loss += batch_loss.item()

    # validation
    if len(val_set) > 0:
        model.eval() # set the model to evaluation mode
        with torch.no_grad():
            for i, data in enumerate(val_loader):
                inputs, labels = data
                inputs, labels = inputs.to(device), labels.to(device)
                outputs = model(inputs)
                batch_loss = criterion(outputs, labels) 
                _, val_pred = torch.max(outputs, 1) # get the index of the class with the highest probability

                val_acc += (val_pred.cpu() == labels.cpu()).sum().item()
                val_loss += batch_loss.item()

            print('[{:03d}/{:03d}] Train Acc: {:3.6f} Loss: {:3.6f} | Val Acc: {:3.6f} loss: {:3.6f}'.format(
                epoch + 1, num_epoch, train_acc/len(train_set), train_loss/len(train_loader), val_acc/len(val_set), val_loss/len(val_loader)
            ))

            # if the model improves, save a checkpoint at this epoch
            if val_acc > best_acc:
                best_acc = val_acc
                torch.save(model.state_dict(), model_path)
                print('saving model with acc {:.3f}'.format(best_acc/len(val_set)))
    else:
        print('[{:03d}/{:03d}] Train Acc: {:3.6f} Loss: {:3.6f}'.format(
            epoch + 1, num_epoch, train_acc/len(train_set), train_loss/len(train_loader)
        ))

# if not validating, save the last epoch
if len(val_set) == 0:
    torch.save(model.state_dict(), model_path)
    print('saving model at last epoch')


[001/250] Train Acc: 0.385302 Loss: 3.774057
[002/250] Train Acc: 0.522625 Loss: 3.091101
[003/250] Train Acc: 0.567984 Loss: 2.827559
[004/250] Train Acc: 0.595037 Loss: 2.627228
[005/250] Train Acc: 0.613059 Loss: 2.458342
[006/250] Train Acc: 0.627666 Loss: 2.311500
[007/250] Train Acc: 0.638813 Loss: 2.185113
[008/250] Train Acc: 0.647801 Loss: 2.076537
[009/250] Train Acc: 0.655257 Loss: 1.988262
[010/250] Train Acc: 0.661449 Loss: 1.912144
[011/250] Train Acc: 0.666893 Loss: 1.849963
[012/250] Train Acc: 0.671618 Loss: 1.797631
[013/250] Train Acc: 0.675911 Loss: 1.755273
[014/250] Train Acc: 0.679741 Loss: 1.715894
[015/250] Train Acc: 0.682837 Loss: 1.685744
[016/250] Train Acc: 0.686518 Loss: 1.657172
[017/250] Train Acc: 0.689672 Loss: 1.632060
[018/250] Train Acc: 0.692683 Loss: 1.610316
[019/250] Train Acc: 0.694857 Loss: 1.592502
[020/250] Train Acc: 0.697940 Loss: 1.573131
[021/250] Train Acc: 0.700326 Loss: 1.557883
[022/250] Train Acc: 0.702642 Loss: 1.544698
[023/250] 
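
The accuracy in the log is the number of correct predictions divided by the dataset size, and the loss is the mean of the per-batch losses. The sketch below reproduces the accuracy step on hand-written logits. The tensor values are toy numbers.

```python
import torch

# Hand-written logits for 3 samples and 4 classes (toy values).
outputs = torch.tensor([[0.1, 2.0, 0.3, 0.0],
                        [1.5, 0.2, 0.1, 0.0],
                        [0.0, 0.1, 0.2, 3.0]])
labels = torch.tensor([1, 2, 3])

# torch.max(outputs, 1) returns (values, indices); the indices are the predictions.
_, pred = torch.max(outputs, 1)
correct = (pred == labels).sum().item()

print(pred.tolist())          # [1, 0, 3]
print(correct / len(labels))  # 0.6666666666666666
```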

## Testing

Create a testing dataset and load the model from the saved checkpoint.

In [23]:
# create testing dataset
test_set = TIMITDataset(test, None)
test_loader = DataLoader(test_set, batch_size=BATCH_SIZE, shuffle=False)

# create model and load weights from checkpoint
model = Classifier().to(device)
model.load_state_dict(torch.load(model_path))

<All keys matched successfully>

Make predictions.

In [24]:
predict = []
model.eval() # set the model to evaluation mode
with torch.no_grad():
    for i, data in enumerate(test_loader):
        inputs = data
        inputs = inputs.to(device)
        outputs = model(inputs)
        _, test_pred = torch.max(outputs, 1) # get the index of the class with the highest probability

        for y in test_pred.cpu().numpy():
            predict.append(y)

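Post-process the predictions: when the frames before and after a frame agree with each other but not with that frame, replace its prediction with theirs.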

In [25]:
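def post_process():
  # Smooth isolated predictions in the global list `predict`, in place.
  count = 0
  for index in range(1, len(predict)-1):
    pre = predict[index-1]  # may already have been corrected in this pass
    cur = predict[index]
    nex = predict[index+1]
    if pre == nex and pre != cur:
      print('index', index, 'correct', cur, 'to', pre)
      predict[index] = pre
      count += 1
  # count/len(predict) is a fraction in [0, 1], not a percentage
  print('correction: %d, ratio: %.3f'%(count, count/len(predict)))
post_process()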

index 163 correct 2 to 4
index 249 correct 21 to 35
index 403 correct 20 to 19
index 551 correct 11 to 7
index 601 correct 4 to 12
index 800 correct 4 to 10
index 807 correct 23 to 13
index 932 correct 30 to 27
index 953 correct 4 to 7
index 1018 correct 18 to 19
index 1111 correct 29 to 25
index 1156 correct 16 to 13
index 1193 correct 13 to 5
index 1281 correct 35 to 31
index 1394 correct 12 to 5
index 1413 correct 5 to 38
index 1425 correct 31 to 35
index 1525 correct 33 to 38
index 1538 correct 1 to 2
index 1601 correct 12 to 2
index 1603 correct 12 to 2
index 1713 correct 38 to 33
index 1775 correct 21 to 29
index 1876 correct 27 to 38
index 1895 correct 7 to 9
index 1925 correct 4 to 12
index 2115 correct 33 to 38
index 2143 correct 9 to 3
index 2211 correct 35 to 31
index 2316 correct 14 to 17
index 2336 correct 31 to 35
index 2349 correct 31 to 35
index 2578 correct 24 to 14
index 2621 correct 31 to 35
index 2623 correct 31 to 35
index 2663 correct 4 to 7
index 2669 correct 9 t
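
The same smoothing rule can be written as a pure function that takes a list and returns a corrected copy, which makes it easy to try on a small input. The function name `smooth_isolated` and the toy sequence are hypothetical.

```python
def smooth_isolated(labels):
    """Return a copy where a frame that differs from two agreeing neighbors is replaced."""
    out = list(labels)
    changed = 0
    for i in range(1, len(out) - 1):
        # out[i - 1] may already have been corrected, as in the in-place loop above.
        if out[i - 1] == out[i + 1] and out[i - 1] != out[i]:
            out[i] = out[i - 1]
            changed += 1
    return out, changed

toy = [4, 4, 2, 4, 4, 7, 7, 3, 3]
smoothed, n_changed = smooth_isolated(toy)
print(smoothed)   # [4, 4, 4, 4, 4, 7, 7, 3, 3]
print(n_changed)  # 1
```

Only the single-frame outlier `2` is changed. Real phoneme boundaries, such as the step from `4` to `7`, are left alone because the neighbors on either side disagree.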

Write the predictions to a CSV file.

After this block finishes running, download the file `prediction.csv` (written to the `result/` folder) from the files section on the left-hand side and submit it to Kaggle.

In [26]:
!mkdir -p 'result'
with open('result/prediction.csv', 'w') as f:
    f.write('Id,Class\n')
    for i, y in enumerate(predict):
        f.write('{},{}\n'.format(i, y))
!zip -r 'result/prediction.zip' 'result/prediction.csv'

  adding: result/prediction.csv (deflated 72%)
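
For reference, the sketch below writes the same `Id,Class` layout with the standard `csv` module into an in-memory buffer, so the expected file content is visible without touching disk. The three predictions are toy values.

```python
import csv
import io

predict = [4, 4, 35]  # toy predictions, not real model output

# Write to an in-memory buffer instead of result/prediction.csv.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator='\n')
writer.writerow(['Id', 'Class'])
writer.writerows(enumerate(predict))  # each row is (index, predicted class)

print(buffer.getvalue())
```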
