<a href="https://colab.research.google.com/github/vlozg/aicovid/blob/main/%5BTorch002%5D_AICOVID_115M.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In experiment 001, I tried chunking the data and training a ResNet18. Results on the training set and the chunked validation set were quite impressive (93%), and testing on unchunked data gave 94%. However, the test-set submission scored only 53%. This points to the following problems with the model:
- Chunking is just a way of shifting the audio --> I don't think it adds much when using a CNN, because of its spatial invariance.
- The dataset is not large enough (7k chunks is not really sufficient).
- Majority voting could not be applied to the chunked test data (there was an error related to boolean indexing with CUDA).
- The validation dataloader used already-chunked data --> it does not reflect real usage.
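
The majority-voting step mentioned in the list above can be sketched without any tensor indexing. This is a minimal, framework-free illustration, assuming one class prediction per chunk and a parallel list naming the recording each chunk came from; `majority_vote`, `chunk_preds`, and `chunk_owner` are hypothetical names, not code from experiment 001.

```python
from collections import Counter, defaultdict

def majority_vote(chunk_preds, chunk_owner):
    """Aggregate per-chunk class predictions into one label per recording."""
    votes = defaultdict(list)
    for pred, owner in zip(chunk_preds, chunk_owner):
        votes[owner].append(pred)
    # most_common(1) yields the most frequent label; ties go to the label seen first
    return {owner: Counter(preds).most_common(1)[0][0]
            for owner, preds in votes.items()}

# Hypothetical predictions: recording "a" has 3 chunks, recording "b" has 4
chunk_preds = [1, 0, 1, 0, 0, 0, 1]
chunk_owner = ["a", "a", "a", "b", "b", "b", "b"]
result = majority_vote(chunk_preds, chunk_owner)
print(result)
```

Moving the predictions to the CPU (for example as plain Python lists) before grouping sidesteps device-specific indexing issues, at the cost of a small transfer.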

In this experiment I will:
- Improve augmentation.
- Build better dataloaders.
- Organize the notebook more systematically.

The exploratory visualizations are in the [Explore002](https://colab.research.google.com/drive/1Y8fV_L70dBqh8gv5qv08pHXpAcjbTUSu) notebook. This notebook contains only the training code.

In [None]:
#@title Google authentication for uploading/downloading files
#@markdown Click the link when prompted and paste the code you receive

# Authenticate with Google to upload/download through Google Drive
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials

auth.authenticate_user()
gauth =  GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

In [None]:
#@markdown File upload helper
def driveUpload(file_path, parent_id, file_name=None):
  if file_name is None:
    file_name = file_path.split('/')[-1]
  # Check whether the file already exists
  file_list = drive.ListFile({'q': f"'{parent_id}' in parents and title = '{file_name}'"}).GetList()
  if len(file_list) > 1:
    for file1 in file_list:
      print('title: %s, id: %s' % (file1['title'], file1['id']))
    raise NameError('More than one file with the same name exists, please resolve this')

  elif len(file_list) == 0:
    # The file does not exist yet, so create it
    file = drive.CreateFile({'title': file_name,
                             'parents': [{'id': parent_id}]})

  else:
    # Exactly one file exists
    file = file_list[0]
  
  file.SetContentFile(file_path)
  file.Upload()

In [None]:
#@markdown Helper for downloading a file by name
def driveDownload(file_name, parent_id):
  # Check whether the file exists
  file_list = drive.ListFile({'q': f"'{parent_id}' in parents and title = '{file_name}'"}).GetList()
  if len(file_list) > 1:
    for file1 in file_list:
      print('title: %s, id: %s' % (file1['title'], file1['id']))
    raise NameError('More than one file with the same name exists, please resolve this')

  elif len(file_list) == 0:
    raise NameError(f'File named {file_name} does not exist')

  else:
    # Exactly one file exists
    file = file_list[0]
  
  file.GetContentFile(file_name)

# Detect COVID-19 patients via forced-cough cell phone recording

- **Problem**: Detect COVID-19 infection from forced-cough recordings
    - **Input**: A cough recording, age, and gender
    - **Output**: Classification of whether the person is infected

## Understanding the problem
Based on the paper (https://dspace.mit.edu/bitstream/handle/1721.1/128954/09208795.pdf?sequence=1&isAllowed=y)

# Experiment configuration variables

In [None]:
# Set to True to train the model
train_mode = True
experiment_id = '002'

In [None]:
# ID of the Drive folder where models are stored
model_zoo = 'secret'
# ID of the folder that holds submissions
submission_folder = 'secret'
# Name of the zip file to submit
zip_name = f'Torch_ver{experiment_id}'

In [None]:
val_split = 0.8  # Passed as train_size below: 80% train, 20% validation

In [None]:
forced_sr = 8000 # -1 means not enforced
n_mfcc_ceptrum = 200
n_delta_features = 1

# Import and install libraries

In [None]:
# Install libraries. Note: the runtime must be restarted after installing
try:
  import torchaudio
  import pytorch_lightning
except ImportError:
  !pip install torchaudio
  !pip install pytorch-lightning
  exit() # Kill the process automatically (lightning needs a newer TF, so a restart is required)

In [None]:
# File and folder management
import os
from shutil import copyfile, rmtree
import gc 

# Audio processing
import torchaudio

# Display audio for listening
import IPython.display as ipd
from IPython.display import Audio, display
from tqdm import tqdm

import random
import numpy as np
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision.models import resnet18

from torch.utils import data as ptdata
from torch.utils.data import Dataset, TensorDataset
from torch.utils.data import DataLoader

import pytorch_lightning as pl

import torchmetrics
from sklearn.metrics import confusion_matrix

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
device

device(type='cuda')

# Download data

In [None]:
#@markdown Download the data: warmup (public train, public test, private set)
%%capture
# download public train data
# official link: https://drive.google.com/file/d/1MPhz3zYl2yefCq-J5XySbFJt99BfKIZD/view
# personal link: https://drive.google.com/file/d/1hoGLxjLmPY-pX-jSVGIaWIZhovQBMKU1/view?usp=sharing
if not os.path.isfile('./aicv115m_public_train.zip'):
  !gdown --id 1MPhz3zYl2yefCq-J5XySbFJt99BfKIZD
  !unzip -o aicv115m_public_train.zip

# download public test data
# official link: https://drive.google.com/file/d/1UrMudzopA3CyR1Ih2J63Kfi2mY_0uhRK/view
# personal link: https://drive.google.com/file/d/1X7vOjHos9f9w48-iTWyu5JElFqCjcH_R/view?usp=sharing
if not os.path.isfile('./aicv115m_public_test.zip'):
  !gdown --id 1UrMudzopA3CyR1Ih2J63Kfi2mY_0uhRK
  !unzip -o aicv115m_public_test.zip

# download private test data
# personal link: https://drive.google.com/file/d/1Ec64sSm2dZqe3da_LVyE_jUBD0DnLyqB/view?usp=sharing
if not os.path.isfile('./aicv115m_private_test.zip'):
  !gdown --id 1Ec64sSm2dZqe3da_LVyE_jUBD0DnLyqB
  !unzip -o aicv115m_private_test.zip

# Set up the data directories and read the metadata

In [None]:
#@markdown Extract the data
%%capture
!unzip -n aicv115m_public_train/train_audio_files_8k.zip
!unzip -n aicv115m_public_test/public_test_audio_files_8k.zip

train_path = 'train_audio_files_8k/'
test_path = 'public_test_audio_files_8k/'
private_test_path = 'aicv115m_private_test/private_test_audio_files_8k/'

In [None]:
print(f'Train path: {train_path}')
print(f'Test path: {test_path}')
print(f'Private test path: {private_test_path}')

Train path: train_audio_files_8k/
Test path: public_test_audio_files_8k/
Private test path: aicv115m_private_test/private_test_audio_files_8k/


In [None]:
#@markdown Read the metadata
train_meta = pd.read_csv('aicv115m_public_train/metadata_train_challenge.csv')
train_meta['file_path'] = train_path+train_meta['file_path']
test_meta = pd.read_csv('aicv115m_public_test/metadata_public_test.csv')
test_meta['file_path'] = test_path+test_meta['file_path']
private_test_meta = pd.read_csv('aicv115m_private_test/metadata_private_test.csv')
private_test_meta['file_path'] = private_test_path+private_test_meta['file_path']

In [None]:
display(train_meta.shape)
train_meta.head()

(1199, 5)

Unnamed: 0,uuid,subject_gender,subject_age,assessment_result,file_path
0,3284bcf1-2446-4f3a-ac66-14c76b294177,male,23.0,0,train_audio_files_8k/3284bcf1-2446-4f3a-ac66-1...
1,431334e1-5946-4576-bb51-8e342ccc22b4,,,0,train_audio_files_8k/431334e1-5946-4576-bb51-8...
2,1d6fac4b-1e7f-4bdc-81cd-3a720bfbb1e1,,,0,train_audio_files_8k/1d6fac4b-1e7f-4bdc-81cd-3...
3,c7ee0695-b2e7-4beb-b904-f1455c9609d9,male,49.0,0,train_audio_files_8k/c7ee0695-b2e7-4beb-b904-f...
4,dd541704-b696-4181-8fd8-816daac0fcf9,,,0,train_audio_files_8k/dd541704-b696-4181-8fd8-8...


In [None]:
display(test_meta.shape)
test_meta.head()

(350, 4)

Unnamed: 0,uuid,subject_gender,subject_age,file_path
0,66ef1f05-fbb0-44cb-8bdb-8eb4df83359a,female,28.0,public_test_audio_files_8k/66ef1f05-fbb0-44cb-...
1,73d13a12-f9bc-4554-af49-be24f6024a25,,,public_test_audio_files_8k/73d13a12-f9bc-4554-...
2,d27dbe98-e061-4018-9900-d1f1d47feab1,,,public_test_audio_files_8k/d27dbe98-e061-4018-...
3,43c30e4c-5d35-4ebc-8235-8920b7688550,female,,public_test_audio_files_8k/43c30e4c-5d35-4ebc-...
4,1952aa84-d077-495d-a1a9-9686a30722e0,female,,public_test_audio_files_8k/1952aa84-d077-495d-...


In [None]:
display(private_test_meta.shape)
private_test_meta.head()

(450, 4)

Unnamed: 0,uuid,subject_gender,subject_age,file_path
0,bce020a3-6ab7-46df-8a75-7f8009a1883e,,,aicv115m_private_test/private_test_audio_files...
1,efe397fd-5ff1-41d8-b991-b8acdafd663c,male,45.0,aicv115m_private_test/private_test_audio_files...
2,5954077a-4c41-4a2e-9cad-e3bb2d6402c4,female,27.0,aicv115m_private_test/private_test_audio_files...
3,2b330c25-0816-480a-bb87-9d3d0d632c0c,,,aicv115m_private_test/private_test_audio_files...
4,bfa78793-b3b8-42b8-bad0-77e3c55abfda,,,aicv115m_private_test/private_test_audio_files...


# Audio processing functions

In [None]:
#@markdown ## Wrapper functions for reading files
#@markdown `read_audio(path)`: wrapper around `torchaudio.load(path)`.<br>
#@markdown `read_resample_audio(path, resample)`: returns only the wave, because the sample rate is fixed by `resample`.

def read_audio(full_audio_path):
  '''Read audio from the given path and return (wave, sample_rate).'''
  return torchaudio.load(full_audio_path)

def read_resample_audio(
    full_audio_path, resample,
    resampler=None
):
  '''
  Read audio from the given path, resample it if its sample rate does not
  match `resample`, and return the wave.

  Tip:
    when batch-resampling with the same parameters, pass a prebuilt
    resampler from torchaudio.transforms; this can give a large speed-up.
    If given, the resampler is applied unconditionally.
  '''
  wave, sr = torchaudio.load(full_audio_path)
  if resampler is not None:
      wave = resampler(wave)
  elif sr != resample:
      wave = torchaudio.functional.resample(wave, sr, resample)
  return wave

## Audio features

In [None]:
# Spectrogram transformation
n_fft = 2048
win_length = 160
hop_length = 80
n_mels = 200
n_mfcc = 200
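
To make these settings concrete, here is a small arithmetic sketch of the spectrogram shape they imply at 8 kHz. It assumes the usual centered STFT, where the signal is padded by `n_fft // 2` on each side; `stft_layout` is a hypothetical helper, not part of the notebook.

```python
def stft_layout(n_samples, sample_rate=8000, n_fft=2048, win_length=160, hop_length=80):
    """Return basic shape and timing facts for a centered, one-sided STFT."""
    return {
        "freq_bins": n_fft // 2 + 1,             # one-sided spectrum
        "frames": 1 + n_samples // hop_length,   # holds when center=True
        "window_ms": 1000 * win_length / sample_rate,
        "hop_ms": 1000 * hop_length / sample_rate,
    }

layout = stft_layout(n_samples=8000)  # one second of 8 kHz audio
print(layout)
```

So each frame covers a 20 ms window with a 10 ms hop, and the mel filterbank then reduces the 1025 linear frequency bins to `n_mels` bands.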

In [None]:
#@markdown `spectrogram(waveform)` --> spec 
spectrogram = torchaudio.transforms.Spectrogram(
    n_fft=n_fft,
    win_length=win_length,
    hop_length=hop_length,
    center=True,
    normalized=True,
    pad_mode="reflect",
    power=2.0,
)

#@markdown `mel_spectrogram(waveform)` --> mel_spec 
mel_spectrogram = torchaudio.transforms.MelSpectrogram(
    sample_rate=8000,
    n_fft=n_fft,
    win_length=win_length,
    hop_length=hop_length,
    center=True,
    pad_mode="reflect",
    power=2.0,
    #norm='slaney',
    onesided=True,
    normalized=True,
    n_mels=n_mels,
    mel_scale="htk",
)

#@markdown `log_spectrogram(spec)` --> log(spec)
log_spectrogram = torchaudio.transforms.AmplitudeToDB(
    stype='power',
    top_db=80
)

## Audio augmentation
Includes: noise injection (at several levels), SpecAugment, chunking

In [None]:
#@markdown `AudioChunking(chunk_size=400, chunk_step=200)`
class AudioChunking(torch.nn.Module):
    def __init__(self,
                 chunk_size: int=400,
                 chunk_step: int=200) -> None:
        super(AudioChunking, self).__init__()
        self.chunk_size = chunk_size
        self.chunk_step = chunk_step
        
    def forward(self, spec: torch.Tensor) -> torch.Tensor:
        _, _, spec_len = spec.shape
        pad_size = self.chunk_size - spec_len%self.chunk_size
        pad_size = (pad_size//2, pad_size//2+pad_size%2)
        padded_spec = torch.nn.functional.pad(spec, pad_size, mode='constant', value=0)
        chunks = padded_spec.unfold(-1, self.chunk_size, self.chunk_step).permute(2,0,1,3)
        return chunks

In [None]:
#@markdown `SpecAugment(time_W=50, freq_W=50, T=80, F=80)`
def _h_poly(t):
    tt = t.unsqueeze(-2)**torch.arange(4, device=t.device).view(-1,1)
    A = torch.tensor([
        [1, 0, -3, 2],
        [0, 1, -2, 1],
        [0, 0, 3, -2],
        [0, 0, -1, 1]
    ], dtype=t.dtype, device=t.device)
    return A @ tt


def _cspline_interpolate(x, y, xs):
    '''
    Input x and y must be of shape (batch, n) or (n)
    '''
    m = (y[..., 1:] - y[..., :-1]) / (x[..., 1:] - x[..., :-1])
    m = torch.cat([m[...,[0]], (m[...,1:] + m[...,:-1]) / 2, m[...,[-1]]], -1)
    idxs = torch.searchsorted(x[..., 1:], xs)
    dx = (x.take_along_dim(idxs+1, dim=-1) - x.take_along_dim(idxs, dim=-1))
    hh = _h_poly((xs - x.take_along_dim(idxs, dim=-1)) / dx)
    return hh[...,0,:] * y.take_along_dim(idxs, dim=-1) \
        + hh[...,1,:] * m.take_along_dim(idxs, dim=-1) * dx \
        + hh[...,2,:] * y.take_along_dim(idxs+1, dim=-1) \
        + hh[...,3,:] * m.take_along_dim(idxs+1, dim=-1) * dx
        

class SpecAugment(torch.nn.Module):
  def __init__(
      self,
      time_W: int = 50,
      freq_W: int = 50,
      T: int = 80,
      F: int = 80,
      mT: int = 1,
      mF: int = 1
  ) -> None:
      super(SpecAugment, self).__init__()
      self.time_W = time_W
      self.freq_W = freq_W
      if time_W==0 and freq_W==0:
          self.cum_warping = lambda x: x
      elif time_W!=0 and freq_W==0:
          self.cum_warping = self.time_warping
      elif time_W==0 and freq_W!=0:
          self.cum_warping = self.freq_warping
      else:
          self.cum_warping = self.time_freq_warping
      self.time_masking = torchaudio.transforms.TimeMasking(time_mask_param=T)
      self.freq_masking = torchaudio.transforms.FrequencyMasking(freq_mask_param=F)


  def _get_warping_flow(self,
                        warp_p: torch.Tensor,
                        warp_d: torch.Tensor,
                        interp_len: int) -> torch.Tensor:
      '''
      Get interpolated flow
      Warning: This function doesn't check for batch size match between warp_p and warp_d
      '''
      device = warp_p.device
      batch_size = warp_p.shape[0]

      src_control_points = torch.stack([torch.tensor([0], device=device).expand(batch_size),
                                        warp_p, torch.tensor([interp_len-1], device=device).expand(batch_size)], dim=1)
      dest_control_points = torch.stack([torch.tensor([-1.], device=device).expand(batch_size),
                                        (warp_p-warp_d)*2/(interp_len-1)-1, torch.tensor([1], device=device).expand(batch_size)], dim=1)

      # Interpolate from 3 points to interp_len points
      src_interp_points = torch.linspace(0, interp_len-1, interp_len, device=device).unsqueeze(0).expand(batch_size, -1)
      dest_interp_points = _cspline_interpolate(src_control_points, dest_control_points, src_interp_points)

      return dest_interp_points


  def freq_warping(self, specs: torch.Tensor) -> torch.Tensor:
      '''
      Frequency warping augmentation; returns the warped spectrograms

      param:
        specs: spectrogram of size (batch, channel, freq_bin, length)
      '''
      W = self.freq_W
      device = specs.device
      batch_size, _, num_freqs, num_frames = specs.shape

      warp_p = torch.randint(W, num_freqs - W, (batch_size,), device=device)

      # Warp distance drawn uniformly from [-W, W)
      warp_d = torch.randint(-W, W, (batch_size,), device=device)
      
      dest_freq_points = self._get_warping_flow(warp_p, warp_d, num_freqs)
      dest_frame_points = torch.linspace(-1, 1, num_frames, device=device)

      grid = torch.cat(
          (dest_frame_points.view(-1,1).expand(batch_size,num_freqs,-1,-1),
          dest_freq_points.view(batch_size,-1,1,1).expand(-1,-1,num_frames,-1)), dim=-1)

      return torch.nn.functional.grid_sample(specs, grid, align_corners=True)


  def time_warping(self, specs: torch.Tensor) -> torch.Tensor:
      '''
      Time warping augmentation; returns the warped spectrograms

      param:
        specs: spectrogram of size (batch, channel, freq_bin, length)
      '''
      W = self.time_W
      device = specs.device
      batch_size, _, num_freqs, num_frames = specs.shape

      warp_p = torch.randint(W, num_frames - W, (batch_size,), device=device)

      # Warp distance drawn uniformly from [-W, W)
      warp_d = torch.randint(-W, W, (batch_size,), device=device)

      # Interpolate from 3 points to num_frames points
      dest_frame_points = self._get_warping_flow(warp_p, warp_d, num_frames)
      dest_freq_points = torch.linspace(-1, 1, num_freqs, device=device)

      grid = torch.cat(
          (dest_frame_points.view(batch_size,1,-1,1).expand(-1,num_freqs,-1,-1),
          dest_freq_points.view(-1,1,1).expand(batch_size,-1,num_frames,-1)), dim=-1)

      return torch.nn.functional.grid_sample(specs, grid, align_corners=True)


  def time_freq_warping(self,specs: torch.Tensor) -> torch.Tensor:
      '''
      Doing both time warping and frequency warping augmentation

      param:
        specs: spectrogram of size (batch, channel, freq_bin, length)
        W: strength of warp
      '''
      device = specs.device
      batch_size, _, num_freqs, num_frames = specs.shape

      time_warp_p = torch.randint(self.time_W, num_frames - self.time_W, (batch_size,), device=device)
      freq_warp_p = torch.randint(self.freq_W, num_freqs - self.freq_W, (batch_size,), device=device)

      # Warp distances drawn uniformly from [-W, W) for each axis
      time_warp_d = torch.randint(-self.time_W, self.time_W, (batch_size,), device=device)
      freq_warp_d = torch.randint(-self.freq_W, self.freq_W, (batch_size,), device=device)

      # Interpolate up to the spectrogram size
      dest_freq_points = self._get_warping_flow(freq_warp_p, freq_warp_d, num_freqs)
      dest_frame_points = self._get_warping_flow(time_warp_p, time_warp_d, num_frames)

      grid = torch.cat(
          (dest_frame_points.view(batch_size,1,-1,1).expand(-1,num_freqs,-1,-1),
          dest_freq_points.view(batch_size,-1,1,1).expand(-1,-1,num_frames,-1)), dim=-1)

      return torch.nn.functional.grid_sample(specs, grid, align_corners=True)


  def forward(self, specs: torch.Tensor) -> torch.Tensor:
      aug_specs = self.cum_warping(specs)
      aug_specs = self.time_masking(aug_specs)
      aug_specs = self.freq_masking(aug_specs)
      return aug_specs
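
As an illustration of the masking half of SpecAugment, here is a simplified, dependency-free sketch of time masking on a nested-list "spectrogram". It is an assumption-laden toy (hypothetical `time_mask`, integer widths, one mask shared by all frequency bins), not a reimplementation of `torchaudio.transforms.TimeMasking`.

```python
import random

def time_mask(spec, max_width, rng):
    """Zero one random span of frames in `spec`, a list of rows (one per frequency bin)."""
    n_frames = len(spec[0])
    width = rng.randint(0, min(max_width, n_frames))   # inclusive bounds
    start = rng.randint(0, n_frames - width)
    masked = [[0.0 if start <= t < start + width else v for t, v in enumerate(row)]
              for row in spec]
    return masked, start, width

rng = random.Random(0)
spec = [[1.0] * 10 for _ in range(3)]   # 3 frequency bins, 10 frames
masked, start, width = time_mask(spec, max_width=4, rng=rng)
print(start, width, masked[0])
```

Frequency masking is the same idea applied along the other axis: a random band of rows is zeroed instead of a span of columns.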

In [None]:
#@markdown Download noise audio
import requests

!mkdir _sample_data
SAMPLE_NOISE_URL = "https://pytorch-tutorial-assets.s3.amazonaws.com/VOiCES_devkit/distant-16k/distractors/rm1/babb/Lab41-SRI-VOiCES-rm1-babb-mc01-stu-clo.wav"
SAMPLE_NOISE_PATH = os.path.join('_sample_data', "bg.wav")
SAMPLE_RIR_URL = "https://pytorch-tutorial-assets.s3.amazonaws.com/VOiCES_devkit/distant-16k/room-response/rm1/impulse/Lab41-SRI-VOiCES-rm1-impulse-mc01-stu-clo.wav"
SAMPLE_RIR_PATH = os.path.join('_sample_data', "rir.wav")

def _fetch_data():
  uri = [
    (SAMPLE_NOISE_URL, SAMPLE_NOISE_PATH),
    (SAMPLE_RIR_URL, SAMPLE_RIR_PATH)
  ]
  for url, path in uri:
    with open(path, 'wb') as file_:
      file_.write(requests.get(url).content)

_fetch_data()

mkdir: cannot create directory ‘_sample_data’: File exists


In [None]:
def _get_sample(path, resample=None):
  effects = [
    ["remix", "1"]
  ]
  if resample:
    effects.extend([
      ["lowpass", f"{resample // 2}"],
      ["rate", f'{resample}'],
    ])
  return torchaudio.sox_effects.apply_effects_file(path, effects=effects)

def get_noise_sample(*, resample=None):
  return _get_sample(SAMPLE_NOISE_PATH, resample=resample)

def get_rir_sample(*, resample=None, processed=False):
  rir_raw, sample_rate = _get_sample(SAMPLE_RIR_PATH, resample=resample)
  if not processed:
    return rir_raw, sample_rate
  rir = rir_raw[:, int(sample_rate*1.01):int(sample_rate*1.3)]
  rir = rir / torch.norm(rir, p=2)
  rir = torch.flip(rir, [1])
  return rir, sample_rate
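
The `torch.flip` on the impulse response matters because `conv1d` computes a sliding dot product (cross-correlation) rather than a flipped-kernel convolution. The plain-Python sketch below, with hypothetical helper names and toy values, shows that correlating a left-padded signal with the flipped kernel reproduces a causal convolution.

```python
def convolve_causal(signal, kernel):
    """True causal convolution, truncated to len(signal) samples."""
    return [sum(kernel[k] * signal[n - k] for k in range(len(kernel)) if n - k >= 0)
            for n in range(len(signal))]

def correlate_left_padded(signal, kernel):
    """Sliding dot product (no kernel flip) over a signal padded on the left."""
    padded = [0.0] * (len(kernel) - 1) + list(signal)
    return [sum(kernel[k] * padded[n + k] for k in range(len(kernel)))
            for n in range(len(signal))]

signal = [1.0, 2.0, 3.0, 4.0]
rir = [1.0, 0.5, 0.25]                      # toy impulse response
direct = convolve_causal(signal, rir)
via_flip = correlate_left_padded(signal, rir[::-1])
print(direct, via_flip)
```

This is the same pattern `RoomReverb.forward` follows: left-pad by `len(rir) - 1`, then slide the already-flipped impulse response over the waveform.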

In [None]:
import math

#@markdown `RoomReverb`, `NoiseInject`, `PhoneSim`
class RoomReverb(torch.nn.Module):
    def __init__(self, rir_list):
        super(RoomReverb, self).__init__()
        self.rirs = rir_list

    def _get_rir(self):
        if type(self.rirs) is list:
            return random.choice(self.rirs)
        else: 
            return next(self.rirs)

    def forward(self, wave: torch.Tensor):
        rir = self._get_rir()
        _wave = torch.nn.functional.pad(wave, (rir.shape[-1]-1, 0))
        _wave = torch.nn.functional.conv1d(_wave[None, ...], rir[None, ...])[0]
        return _wave


class NoiseInject(torch.nn.Module):
    def __init__(self, noise_list, snr_db):
        super(NoiseInject, self).__init__()
        self.noises = noise_list
        self.snr_db = snr_db

    def _get_noise(self):
        if type(self.noises) is list:
            return random.choice(self.noises)
        else: 
            return next(self.noises)

    def forward(self, wave: torch.Tensor):
        noise = self._get_noise()
        _noise = noise.repeat(1, 1 + wave.shape[-1] // noise.shape[-1])[..., :wave.shape[-1]]
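        # NOTE: math.exp(snr_db / 10) is not the standard dB conversion
        # (10 ** (snr_db / 20) for an amplitude ratio), so snr_db is only
        # a nominal noise level here.
        scale = math.exp(self.snr_db / 10) * _noise.norm(p=2) / wave.norm(p=2)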
        _wave = (scale * wave + _noise) / 2
        return _wave


class PhoneSim(torch.nn.Module):
    def __init__(self):
        super(PhoneSim, self).__init__()

    def forward(self, wave: torch.Tensor):
        device = wave.device
        _wave = wave.cpu()
        _wave, _ = torchaudio.sox_effects.apply_effects_tensor(
          _wave, 8000,
          effects=[["lowpass", "4000"],
                   ["compand", "0.02,0.05", "-60,-60,-30,-10,-20,-8,-5,-8,-2,-8", "-8", "-7", "0.05"]]
        )
        _wave = torchaudio.functional.apply_codec(_wave, 8000, format="gsm")
        return _wave.to(device)
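
For reference, mixing at a target signal-to-noise ratio under the standard decibel convention can be sketched as follows. This is not what `NoiseInject` above computes (it uses `math.exp(snr_db / 10)`); the helper names and sample values are hypothetical.

```python
import math

def l2_norm(xs):
    return math.sqrt(sum(x * x for x in xs))

def snr_scale(signal, noise, snr_db):
    """Gain for `signal` so that the signal/noise amplitude ratio equals snr_db (in dB)."""
    return 10 ** (snr_db / 20) * l2_norm(noise) / l2_norm(signal)

signal = [0.5, -0.5, 0.25, -0.25]
noise = [0.1, 0.1, -0.1, -0.1]
gain = snr_scale(signal, noise, snr_db=8)
scaled = [gain * s for s in signal]

# Verify: the ratio of L2 norms, expressed in dB, matches the requested SNR
achieved_db = 20 * math.log10(l2_norm(scaled) / l2_norm(noise))
print(round(achieved_db, 6))
```

With this convention a higher `snr_db` always means a cleaner mix, and the value can be compared directly with SNR figures quoted elsewhere.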

In [None]:
rir, _ = get_rir_sample(resample=8000, processed=True)
noise, _ = get_noise_sample(resample=8000)

In [None]:
#@markdown `StandardScaler()`
class StandardScaler(torch.nn.Module):
    def __init__(self) -> None:
        super(StandardScaler, self).__init__()
        
    def forward(self, spec: torch.Tensor) -> torch.Tensor:
        return ((spec-spec.mean())/spec.std()).nan_to_num(posinf=0.0, neginf=0.0)

#@markdown `MinMaxScaler()`
class MinMaxScaler(torch.nn.Module):
    def __init__(self, min=None, max=None) -> None:
        super(MinMaxScaler, self).__init__()
        # Compare against None so that an explicit bound of 0 is respected
        if min is not None:
            self._min = lambda x: min
        else:
            self._min = lambda x: x.min()
        if max is not None:
            self._max = lambda x: max
        else:
            self._max = lambda x: x.max()
        
    def forward(self, spec: torch.Tensor) -> torch.Tensor:
        return ((spec-self._min(spec))/(self._max(spec)-self._min(spec))).nan_to_num(posinf=0.0, neginf=0.0)

# Visualization helper functions

In [None]:
#@markdown Plot a specgram: `plot_specgram(wave, sr, title, xlim)`
#@markdown (a specgram is simply a short-time discrete Fourier transform of the signal)

def plot_specgram(waveform, sample_rate, title="Spectrogram", xlim=None):
  # Tensor --> Numpy
  waveform = waveform.numpy()

  num_channels, num_frames = waveform.shape
  time_axis = torch.arange(0, num_frames) / sample_rate

  figure, axes = plt.subplots(num_channels, 1)
  if num_channels == 1:
    axes = [axes]

  # Plot specgram for each channel
  for c in range(num_channels):
    axes[c].specgram(waveform[c], Fs=sample_rate)
    if num_channels > 1:
      axes[c].set_ylabel(f'Channel {c+1}')
    if xlim:
      axes[c].set_xlim(xlim)
  figure.suptitle(title)
  plt.show(block=False)



#@markdown Plot a waveform: `plot_waveform(wave, sr, title, xlim, ylim)`

def plot_waveform(waveform, sample_rate, title="Waveform", xlim=None, ylim=None):
  # Tensor --> Numpy
  waveform = waveform.numpy()

  num_channels, num_frames = waveform.shape
  time_axis = torch.arange(0, num_frames) / sample_rate

  figure, axes = plt.subplots(num_channels, 1)
  if num_channels == 1:
    axes = [axes]

  # Plot waveform for each channel
  for c in range(num_channels):
    axes[c].plot(time_axis, waveform[c], linewidth=1)
    axes[c].grid(True)
    if num_channels > 1:
      axes[c].set_ylabel(f'Channel {c+1}')
    if xlim:
      axes[c].set_xlim(xlim)
    if ylim:
      axes[c].set_ylim(ylim)
  figure.suptitle(title)
  plt.show(block=False)



#@markdown Plot a spectrogram: `plot_spectrogram(spec, fig, axs, title, ylabel, aspect, xmax)`

def plot_spectrogram(spec, fig=None, axs=None, title=None, ylabel='freq_bin', aspect='auto', xmax=None):
  if axs is None:
    fig, axs = plt.subplots(1, 1)
  elif fig is None:
    # Recover the parent figure so the colorbar call below still works
    fig = axs.figure
  axs.set_title(title or 'Spectrogram (db)')
  axs.set_ylabel(ylabel)
  axs.set_xlabel('frame')
  im = axs.imshow(log_spectrogram(spec), origin='lower', aspect=aspect)
  if xmax:
    axs.set_xlim((0, xmax))
  fig.colorbar(im, ax=axs)



#@markdown Show an audio player: `play_audio(wave, sr)`

def play_audio(waveform, sample_rate):
  waveform = waveform.numpy()

  num_channels, num_frames = waveform.shape
  if num_channels == 1:
    display(Audio(waveform[0], rate=sample_rate))
  elif num_channels == 2:
    display(Audio((waveform[0], waveform[1]), rate=sample_rate))
  else:
    raise ValueError("Waveforms with more than 2 channels are not supported.")

# Read and prepare the dataset

## Split off the validation set

In [None]:
from sklearn.model_selection import train_test_split

idx_train, idx_val = train_test_split(train_meta.index,train_size=val_split)

val_meta = train_meta.iloc[idx_val]
train_meta = train_meta.iloc[idx_train]

display(len(train_meta))
display(len(val_meta))

959

240
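
The split above is neither seeded nor stratified, so the class balance of the validation set can change between runs. scikit-learn's `train_test_split` accepts `random_state` and `stratify` for this; the dependency-free sketch below shows the underlying idea, with `stratified_split` and the toy labels being hypothetical.

```python
import random
from collections import defaultdict

def stratified_split(labels, train_frac=0.8, seed=0):
    """Split indices so each class keeps roughly the same train/validation ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train_idx, val_idx = [], []
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)                      # reproducible thanks to the seed
        cut = round(len(idxs) * train_frac)
        train_idx += idxs[:cut]
        val_idx += idxs[cut:]
    return sorted(train_idx), sorted(val_idx)

labels = [0] * 80 + [1] * 20                   # toy, imbalanced labels
train_idx, val_idx = stratified_split(labels)
print(len(train_idx), len(val_idx), sum(labels[i] for i in val_idx))
```

With an imbalanced label such as `assessment_result`, keeping the positive rate equal in both splits makes validation metrics easier to compare across runs.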

## Create the dataset

In [None]:
class AICOVIDDataset(Dataset):
    def __init__(self, meta_df, audio_transforms: torch.nn.ModuleList, stacking: bool=False):
        self.meta_df = meta_df
        self.transforms = audio_transforms
        self.specs = []
        self.idxs = []  # One audio file can yield several items (up to 4 transforms, or many chunks),
                        # so this index array maps each item back to its row in the dataframe

        for id, file in enumerate(self.meta_df['file_path']):
            specs = self._read_spec_audio(file)
            if stacking:
                specs = torch.cat(specs)
                self.specs += [specs]
            else:
                self.specs += specs
            self.idxs += [id]*len(specs)

        if stacking:
            self.specs = torch.cat(self.specs)
        
    def _read_spec_audio(self, file):
        wave = read_resample_audio(file, 8000).cuda()
        specs = [trans(wave) for trans in self.transforms]
        return specs

    def __len__(self):
        return len(self.specs)

    def __getitem__(self, idx):
        spec = self.specs[idx]
        meta = self.meta_df.iloc[self.idxs[idx]]
        try:
            label = torch.tensor(meta['assessment_result'])
        except KeyError:
            label = None
        id = meta['uuid']
        gender = meta['subject_gender']
        age = meta['subject_age']

        return spec, label, id, gender, age
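
The role of `self.idxs` is easiest to see in isolation: every generated item records the metadata row it came from, so `__getitem__` can look up the label and demographics. A minimal sketch with a hypothetical `build_index` helper:

```python
def build_index(items_per_row):
    """Repeat each metadata row number once per item generated from that row."""
    idxs = []
    for row, n_items in enumerate(items_per_row):
        idxs += [row] * n_items
    return idxs

# Hypothetical case: row 0 produced 3 items, row 1 produced 1, row 2 produced 2
idxs = build_index([3, 1, 2])
print(idxs)       # item position -> metadata row
print(idxs[4])    # the fifth item belongs to metadata row 2
```

The length of this list always equals the number of stored items, which is what `__len__` reports.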

In [None]:
# Read, duplicate (augment), and extract mel spectrograms
basic_transform = torch.nn.Sequential(mel_spectrogram,
                                      log_spectrogram,
                                      StandardScaler()).cuda()
transform0 = torch.nn.Sequential(basic_transform).cuda()
transform1 = torch.nn.Sequential(NoiseInject([noise.cuda()], 8),
                                 basic_transform).cuda()
transform2 = torch.nn.Sequential(NoiseInject([noise.cuda()], 16),
                                 basic_transform).cuda()
transform3 = torch.nn.Sequential(RoomReverb([rir.cuda()]), 
                                 NoiseInject([noise.cuda()], 8), 
                                 PhoneSim(),
                                 basic_transform).cuda()

In [None]:
chunking = AudioChunking(400, 200)
transform0_chunking = torch.nn.Sequential(transform0,
                                 chunking).cuda()
transform1_chunking = torch.nn.Sequential(transform1,
                                 chunking).cuda()
transform2_chunking = torch.nn.Sequential(transform2,
                                 chunking).cuda()
transform3_chunking = torch.nn.Sequential(transform3,
                                 chunking).cuda()

In [None]:
train_set = AICOVIDDataset(train_meta, torch.nn.ModuleList([transform0_chunking,transform1_chunking,transform2_chunking,transform3_chunking]),
                           stacking=True)
val_set = AICOVIDDataset(val_meta, torch.nn.ModuleList([basic_transform]))
test_set = AICOVIDDataset(test_meta, torch.nn.ModuleList([basic_transform]))

In [None]:
# TEST DATALOADER
train_set[0]

In [None]:
len(train_set)

## Create the dataloaders

In [None]:
# def overlap_chunking(tensor, chunk_size, chunk_step):
#   # Chunk_num x MFCC_features x chunk_size
#   chunks = tensor.unfold(1, chunk_size, chunk_step).permute(1,0,2)
#   return chunks[ torch.amax(chunks, dim=[1,2]) > 1.7 ]

def collate_chunking_fn(batch):
    # A data tuple has the form:
    # spec, label

    specs, labels = [], []

    # Gather in lists, and encode labels as indices
    for spec, label, _, _, _ in batch:
        chunks = chunking(spec)
        specs += [chunks]
        labels += [label]*chunks.size()[0]

    # Group the list of tensors into a batched tensor
    specs = torch.cat(specs)
    try:
      labels = torch.stack(labels)
      return specs, labels
    except TypeError:  # labels are None for unlabeled (test) data
      return specs

In [None]:
from torch.nn.utils.rnn import pad_sequence

def collate_pad_seq_fn(batch):
    # A data tuple has the form:
    # spec, label

    specs, labels = [], []

    # Gather in lists, and encode labels as indices
    for spec, label, _, _, _ in batch:
        specs += [spec.permute(2,0,1)]
        labels += [label]

    # Group the list of tensors into a batched tensor
    specs = pad_sequence(specs, batch_first=True).permute(0,2,3,1)
    try:
      labels = torch.stack(labels)
      return specs, labels
    except TypeError:  # labels are None for unlabeled (test) data
      return specs
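
The permutes around `pad_sequence` exist because it pads along the first dimension of each tensor, while the variable-length axis of a spectrogram is the last one. A minimal shape check, assuming toy spectrograms of shape `(channel, n_mels, frames)`:

```python
import torch
from torch.nn.utils.rnn import pad_sequence

# Two toy spectrograms with 4 mel bands and different numbers of frames
specs = [torch.ones(1, 4, 5), torch.ones(1, 4, 3)]

# Move the time axis first, pad to the longest item, then restore the layout
time_first = [s.permute(2, 0, 1) for s in specs]            # (frames, channel, n_mels)
batch = pad_sequence(time_first, batch_first=True)           # (batch, max_frames, channel, n_mels)
batch = batch.permute(0, 2, 3, 1)                            # (batch, channel, n_mels, max_frames)

print(tuple(batch.shape))
print(batch[1, 0, 0])   # the shorter item is zero-padded at the end
```

Padding is applied on the right with zeros, so shorter recordings keep their content aligned at frame 0.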

In [None]:
def collate_fn(batch):
    # A data tuple has the form:
    # spec, label

    specs, labels = [], []

    # Gather in lists, and encode labels as indices
    for spec, label, _, _, _ in batch:
        specs += [spec]
        labels += [label]

    # Group the list of tensors into a batched tensor
    specs = torch.stack(specs)
    try:
      labels = torch.stack(labels)
      return specs, labels
    except TypeError:  # labels are None for unlabeled (test) data
      return specs

In [None]:
chunking_batch_size = 32
batch_size = 16

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
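# AICOVIDDataset already keeps its spectrograms on the GPU (it calls .cuda()),
# while worker processes and pinned memory are meant for CPU tensors,
# so both stay disabled regardless of the device.
num_workers = 0
pin_memory = False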

In [None]:
train_loader = torch.utils.data.DataLoader(
    train_set,
    batch_size=chunking_batch_size,
    shuffle=True,
    collate_fn=collate_fn,
    num_workers=num_workers,
    pin_memory=pin_memory,
)

val_loader = torch.utils.data.DataLoader(
    val_set,
    batch_size=batch_size,
    shuffle=False,
    collate_fn=collate_pad_seq_fn,
    num_workers=num_workers,
    pin_memory=pin_memory,
)

test_loader = torch.utils.data.DataLoader(
    test_set,
    batch_size=batch_size,
    shuffle=False,
    collate_fn=collate_pad_seq_fn,
    num_workers=num_workers,
    pin_memory=pin_memory,
)

In [None]:
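# NOTE: this rebinds test_loader to the validation split (which has labels),
# replacing the loader built from test_set above
test_loader = torch.utils.data.DataLoader(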
    AICOVIDDataset(val_meta, torch.nn.ModuleList([basic_transform])),
    batch_size=batch_size,
    shuffle=False,
    collate_fn=collate_pad_seq_fn,
    num_workers=num_workers,
    pin_memory=pin_memory,
)

# Monitor variables that have not been freed

In [None]:
def pretty_size(size):
    """Pretty prints a torch.Size object"""
    assert isinstance(size, torch.Size)
    return " × ".join(map(str, size))

def dump_tensors(gpu_only=True):
    """Prints a list of the Tensors being tracked by the garbage collector."""
    import gc
    total_size = 0
    for obj in gc.get_objects():
        try:
            if torch.is_tensor(obj):
                if not gpu_only or obj.is_cuda:
                    # is_pinned is a method, so it must be called
                    print("%s:%s%s %s" % (type(obj),
                                          " GPU" if obj.is_cuda else "",
                                          " pinned" if obj.is_pinned() else "",
                                          pretty_size(obj.size())))
                    total_size += obj.numel()
            elif hasattr(obj, "data") and torch.is_tensor(obj.data):
                if not gpu_only or obj.is_cuda:
                    print("%s → %s:%s%s%s%s %s" % (type(obj),
                                                   type(obj.data).__name__,
                                                   " GPU" if obj.is_cuda else "",
                                                   " pinned" if obj.data.is_pinned() else "",
                                                   " grad" if obj.requires_grad else "",
                                                   " volatile" if obj.volatile else "",
                                                   pretty_size(obj.data.size())))
                    total_size += obj.data.numel()
        except Exception:
            # Some tracked objects raise on attribute access; skip them
            pass
    print("Total size:", total_size)

# Training the model (can be skipped, since the model is already saved on Drive)

## ResNet18 model

In [None]:
class AICOVIDModule(pl.LightningModule):
    def __init__(self, model, learning_rate, augment=None):
        super().__init__()
        self.model = model
        if augment:
            self.augment = augment
        else:
            self.augment = lambda x: x
        self.learning_rate = learning_rate
        self.lr = learning_rate
        self.acc_metric = torchmetrics.Accuracy()

    def forward(self, x):
        x = self.model(x)
        return F.log_softmax(x, dim=1)

    def cross_entropy_loss(self, logits, labels):
        return F.nll_loss(logits, labels)

    def training_step(self, train_batch, batch_idx):
        x, y = train_batch
        x = self.augment(x)
        logits = self.forward(x)
        
        # negative log-likelihood for a tensor of size (batch x n_output)
        loss = self.cross_entropy_loss(logits, y)
        acc = self.acc_metric(logits, y)
        self.log('train_loss', loss, prog_bar=True)
        self.log('train_acc', acc, prog_bar=True)

        return loss
    
    def validation_step(self, val_batch, batch_idx):
        x, y = val_batch
        logits = self.forward(x)

        # negative log-likelihood for a tensor of size (batch x n_output)
        loss = self.cross_entropy_loss(logits, y)
        acc = self.acc_metric(logits, y)
        
        self.log('val_loss', loss, prog_bar=True)
        self.log('val_acc', acc, prog_bar=True)

    def test_step(self, test_batch, batch_idx):
        x, y = test_batch
        logits = self.forward(x)

        # negative log-likelihood for a tensor of size (batch x n_output)
        loss = self.cross_entropy_loss(logits, y)
        acc = self.acc_metric(logits, y)
        
        self.log('test_loss', loss, prog_bar=True)
        self.log('test_acc', acc, prog_bar=True)

    def configure_optimizers(self):
        self.optimizer = torch.optim.Adam(self.parameters(), lr=(self.lr or self.learning_rate))
        self.scheduler = torch.optim.lr_scheduler.OneCycleLR(self.optimizer, max_lr=(self.lr or self.learning_rate), 
                                                             steps_per_epoch=len(self.train_dataloader()) // self.trainer.accumulate_grad_batches, epochs=self.trainer.max_epochs)
        sched = {
            'scheduler': self.scheduler,
            'interval': 'step'
        }
        return [self.optimizer], [sched]

    def on_epoch_start(self):
        # get_last_lr() returns the rate most recently set by the scheduler
        self.log('lr', self.scheduler.get_last_lr()[0], prog_bar=True)
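
A note on the `steps_per_epoch` value passed to `OneCycleLR` above: with gradient accumulation, the number of optimizer steps per epoch is the number of batches divided by `accumulate_grad_batches`, and the floor division only matches exactly when the two divide evenly. The sketch below is plain arithmetic, not Lightning code. The batch count is a made-up number, and whether the trainer also steps on a leftover partial group is an assumption that may depend on the Lightning version.

```python
import math


def steps_floor(n_batches: int, accumulate: int) -> int:
    """Optimizer steps per epoch, computed the same way as in configure_optimizers."""
    return n_batches // accumulate


def steps_ceil(n_batches: int, accumulate: int) -> int:
    """Steps per epoch if a leftover partial group also triggers a step (assumption)."""
    return math.ceil(n_batches / accumulate)


# Hypothetical numbers: 101 batches per epoch, accumulate_grad_batches=2, 60 epochs
n_batches, accumulate, epochs = 101, 2, 60
print(steps_floor(n_batches, accumulate) * epochs)  # 3000
print(steps_ceil(n_batches, accumulate) * epochs)   # 3060
```

If the scheduler ends up being stepped more often than the total it was built for, `OneCycleLR` raises an error, so the rounding is worth checking whenever the loader length is not a multiple of the accumulation factor.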

In [None]:
def reset_weight(model):
  model.load_state_dict(torch.load('init_weights.pth'))

In [None]:
resnet = resnet18(pretrained=True)
resnet.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
num_filters = resnet.fc.in_features
resnet.fc = nn.Linear(num_filters, 2)
model = AICOVIDModule(resnet, learning_rate=0.05)
torch.save(model.state_dict(), 'init_weights.pth')

Downloading: "https://download.pytorch.org/models/resnet18-f37072fd.pth" to /root/.cache/torch/hub/checkpoints/resnet18-f37072fd.pth



## Profiling and bottleneck check

In [None]:
# logger = pl.loggers.TensorBoardLogger(
#                 save_dir='.',
#                 version='benchmarking',
#                 name='lightning_logs'
#                 )

In [None]:
# # Try 3 epochs for benchmarking (matches max_epochs below)
# reset_weight(model)
# trainer = pl.Trainer(gpus=1,
#                      profiler="simple", logger=logger, log_gpu_memory='all', max_epochs=3, progress_bar_refresh_rate=1)
# trainer.fit(model, train_loader, val_loader)

## Full training for 60 epochs

Create a callback that automatically backs up the model's Lightning logs to Drive.

In [None]:
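import os

from pytorch_lightning.callbacks import Callback

class BackupCallback(Callback):
    # Note: on_train_end fires once, when fitting finishes, so this check is
    # evaluated only for the last epoch of the run.
    def on_train_end(self, trainer, pl_module):
        if (trainer.current_epoch+1)%20 == 0:
            # -r is needed so that zip recurses into the version subdirectories
            os.system("zip -r ./tmp_lightning_logs.zip ./lightning_logs")
            try:
                driveUpload("tmp_lightning_logs.zip", model_zoo)
            except Exception:
                print("Upload failed.")
            else:
                print(f"Lightning logs backed up at epoch {trainer.current_epoch}.")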
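
To see which epochs the `(current_epoch + 1) % 20 == 0` test would select if it were evaluated after every epoch of a 60-epoch run, here is a small stand-alone check. `is_backup_epoch` is a hypothetical helper written for this illustration only; it is not part of the callback.

```python
def is_backup_epoch(current_epoch: int, every: int = 20) -> bool:
    """Same test as in the callback: epochs are 0-based, so add 1 first."""
    return (current_epoch + 1) % every == 0


backup_epochs = [e for e in range(60) if is_backup_epoch(e)]
print(backup_epochs)  # [19, 39, 59]
```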

## Experiments on raw data

In [None]:
try:
    del train_set
    del val_set
    del train_loader
    del val_loader
except NameError:
    print("No variable to delete")
gc.collect()
with torch.no_grad():
    torch.cuda.empty_cache()
pl.utilities.memory.garbage_collection_cuda()

No variable to delete


In [None]:
# Reserved-but-unallocated GPU memory on device 0, in KiB (bytes / 1024)
(torch.cuda.memory_reserved(0) - torch.cuda.memory_allocated(0))/1024

2249.0
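
The memory functions return byte counts, so dividing by 1024 once gives KiB, not MiB. A tiny conversion sketch makes the unit explicit. `bytes_to_mib` is a hypothetical helper, and the byte count is simply the value printed above converted back to bytes.

```python
def bytes_to_mib(n_bytes: int) -> float:
    """Convert a byte count to mebibytes (1 MiB = 1024 * 1024 bytes)."""
    return n_bytes / 1024 ** 2


# 2249.0 KiB, as printed above, expressed in bytes
free_bytes = 2249 * 1024
print(round(bytes_to_mib(free_bytes), 2))  # 2.2
```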

In [None]:
print(torch.cuda.memory_summary())

|                  PyTorch CUDA memory summary, device ID 0                 |
|---------------------------------------------------------------------------|
|            CUDA OOMs: 0            |        cudaMalloc retries: 0         |
|        Metric         | Cur Usage  | Peak Usage | Tot Alloc  | Tot Freed  |
|---------------------------------------------------------------------------|
| Allocated memory      |    4467 MB |    8985 MB |  221297 MB |  216829 MB |
|       from large pool |    4466 MB |    8984 MB |  192905 MB |  188439 MB |
|       from small pool |       1 MB |     429 MB |   28391 MB |   28390 MB |
|---------------------------------------------------------------------------|
| Active memory         |    4467 MB |    8985 MB |  221297 MB |  216829 MB |
|       from large pool |    4466 MB |    8984 MB |  192905 MB |  188439 MB |
|       from small pool |       1 MB |     429 MB |   28391 MB |   28390 MB |
|---------------------------------------------------------------

In [None]:
train_set = AICOVIDDataset(train_meta, torch.nn.ModuleList([basic_transform]))
val_set = AICOVIDDataset(val_meta, torch.nn.ModuleList([basic_transform]))

In [None]:
batch_size = 8

train_loader = torch.utils.data.DataLoader(
    train_set,
    batch_size=batch_size,
    shuffle=True,
    collate_fn=collate_pad_seq_fn,
    num_workers=num_workers,
    pin_memory=pin_memory,
)

val_loader = torch.utils.data.DataLoader(
    val_set,
    batch_size=batch_size,
    shuffle=False,
    collate_fn=collate_pad_seq_fn,
    num_workers=num_workers,
    pin_memory=pin_memory,
)
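
Both loaders here rely on `collate_pad_seq_fn`, defined earlier in the notebook, to batch spectrograms of different lengths. As a rough illustration of the idea only (plain Python lists instead of tensors, and a hypothetical `pad_batch` helper that may differ from what the real collate function does), right-padding to the longest item in the batch looks like this:

```python
from typing import List


def pad_batch(seqs: List[List[float]], pad_value: float = 0.0) -> List[List[float]]:
    """Right-pad every sequence to the length of the longest one in the batch."""
    max_len = max(len(s) for s in seqs)
    return [s + [pad_value] * (max_len - len(s)) for s in seqs]


batch = [[1.0, 2.0, 3.0], [4.0], [5.0, 6.0]]
padded = pad_batch(batch)
print(padded)  # [[1.0, 2.0, 3.0], [4.0, 0.0, 0.0], [5.0, 6.0, 0.0]]
```

Because padding is done per batch, a single very long recording inflates every item in its batch, which is one reason to keep `batch_size` small here.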

### Experiment 1: train on raw data (with sequence padding)

Create a logger with a fixed name

In [None]:
logger1 = pl.loggers.TensorBoardLogger(
                save_dir='.',
                version='train_60_epochs_exp1',
                name='lightning_logs'
                )

In [None]:
reset_weight(model)
trainer = pl.Trainer(gpus=1, max_epochs=60, accumulate_grad_batches=2,
                     logger=logger1, progress_bar_refresh_rate=10, callbacks=[BackupCallback()])
trainer.fit(model, train_loader, val_loader)

GPU available: True, used: True
TPU available: False, using: 0 TPU cores
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name       | Type     | Params
----------------------------------------
0 | model      | ResNet   | 11.2 M
1 | acc_metric | Accuracy | 0     
----------------------------------------
11.2 M    Trainable params
0         Non-trainable params
11.2 M    Total params
44.685    Total estimated model params size (MB)



RuntimeError: ignored

In [None]:
trainer.test(test_dataloaders=test_loader)

### Experiment 2: raw data with SpecAugment, no chunking

In [None]:
model.augment = SpecAugment(time_W=100, freq_W=50, F=50, T=50)
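
The `SpecAugment` class itself is defined earlier in the notebook. For intuition, here is a minimal sketch of the masking part only (no time warping), on a list-of-lists "spectrogram". It assumes that `F` and `T` are the maximum widths of the frequency and time masks, following the notation of the SpecAugment paper. `mask_bands` and its parameter names are hypothetical and not the notebook's implementation.

```python
import random
from typing import List


def mask_bands(spec: List[List[float]], max_freq_width: int,
               max_time_width: int, rng: random.Random) -> List[List[float]]:
    """Zero out one random frequency band and one random time band (on a copy)."""
    n_freq, n_time = len(spec), len(spec[0])
    out = [row[:] for row in spec]
    f = rng.randint(0, min(max_freq_width, n_freq))  # height of the frequency mask
    f0 = rng.randint(0, n_freq - f)                  # first masked frequency bin
    t = rng.randint(0, min(max_time_width, n_time))  # width of the time mask
    t0 = rng.randint(0, n_time - t)                  # first masked time frame
    for i in range(n_freq):
        for j in range(n_time):
            if f0 <= i < f0 + f or t0 <= j < t0 + t:
                out[i][j] = 0.0
    return out


rng = random.Random(0)
spec = [[1.0] * 8 for _ in range(6)]  # 6 frequency bins x 8 time frames
masked = mask_bands(spec, max_freq_width=2, max_time_width=3, rng=rng)
for row in masked:
    print(row)
```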

Create a logger with a fixed name

In [None]:
logger2 = pl.loggers.TensorBoardLogger(
                save_dir='.',
                version='train_60_epochs_exp2',
                name='lightning_logs'
                )

In [None]:
reset_weight(model)
trainer = pl.Trainer(gpus=1, max_epochs=60, accumulate_grad_batches=2,
                     logger=logger2, progress_bar_refresh_rate=10, callbacks=[BackupCallback()])
trainer.fit(model, train_loader, val_loader)

In [None]:
trainer.test(test_dataloaders=test_loader)

## Experiments on raw data with chunking

In [None]:
try:
    del train_set
    del val_set
    del train_loader
    del val_loader
except NameError:
    print("No variable to delete")
gc.collect()
with torch.no_grad():
    torch.cuda.empty_cache()
pl.utilities.memory.garbage_collection_cuda()

In [None]:
train_set = AICOVIDDataset(train_meta, torch.nn.ModuleList([transform0_chunking]), stacking=True)
val_set = AICOVIDDataset(val_meta, torch.nn.ModuleList([transform0_chunking]), stacking=True)

In [None]:
batch_size = 16

train_loader = torch.utils.data.DataLoader(
    train_set,
    batch_size=batch_size,
    shuffle=True,
    collate_fn=collate_fn,
    num_workers=num_workers,
    pin_memory=pin_memory,
)

val_loader = torch.utils.data.DataLoader(
    val_set,
    batch_size=batch_size,
    shuffle=False,
    collate_fn=collate_fn,
    num_workers=num_workers,
    pin_memory=pin_memory,
)
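
The chunking transforms used above are defined earlier in the notebook. As a generic sketch of overlapping chunking (a hypothetical `overlap_chunks` helper on a plain list, with window starts `hop` frames apart), the idea is roughly the following. Trailing frames that do not fill a whole window are dropped here, which may differ from how the notebook's transform handles them.

```python
from typing import List, Sequence


def overlap_chunks(seq: Sequence[int], chunk_size: int, hop: int) -> List[List[int]]:
    """Split seq into fixed-size windows whose start indices are `hop` apart."""
    if len(seq) < chunk_size:
        raise ValueError("sequence is shorter than one chunk; pad it first")
    starts = range(0, len(seq) - chunk_size + 1, hop)
    return [list(seq[s:s + chunk_size]) for s in starts]


frames = list(range(10))
chunks = overlap_chunks(frames, chunk_size=4, hop=2)
print(chunks)  # [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7], [6, 7, 8, 9]]
```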

### Experiment 3: raw data with chunking

In [None]:
model = AICOVIDModule(resnet, learning_rate=0.05)

Create a logger with a fixed name

In [None]:
logger3 = pl.loggers.TensorBoardLogger(
                save_dir='.',
                version='train_60_epochs_exp3',
                name='lightning_logs'
                )

In [None]:
reset_weight(model)
trainer = pl.Trainer(gpus=1, max_epochs=60, accumulate_grad_batches=2,
                     logger=logger3, progress_bar_refresh_rate=10, callbacks=[BackupCallback()])
trainer.fit(model, train_loader, val_loader)

In [None]:
trainer.test(test_dataloaders=test_loader)

### Experiment 4: raw data with chunking and per-chunk SpecAugment

In [None]:
model.augment = SpecAugment(time_W=100, freq_W=50, F=50, T=50)

Create a logger with a fixed name

In [None]:
logger4 = pl.loggers.TensorBoardLogger(
                save_dir='.',
                version='train_60_epochs_exp4',
                name='lightning_logs'
                )

In [None]:
reset_weight(model)
trainer = pl.Trainer(gpus=1, max_epochs=60, accumulate_grad_batches=2,
                     logger=logger4, progress_bar_refresh_rate=10, callbacks=[BackupCallback()])
trainer.fit(model, train_loader, val_loader)

In [None]:
trainer.test(test_dataloaders=test_loader)

## Experiments on noise-injected data without chunking

In [None]:
try:
    del train_set
    del val_set
    del train_loader
    del val_loader
except NameError:
    print("No variable to delete")
gc.collect()
with torch.no_grad():
    torch.cuda.empty_cache()
pl.utilities.memory.garbage_collection_cuda()

In [None]:
train_set = AICOVIDDataset(train_meta, torch.nn.ModuleList([transform0,transform1,transform2,transform3]))
val_set = AICOVIDDataset(val_meta, torch.nn.ModuleList([transform0,transform1,transform2,transform3]))

In [None]:
batch_size = 16

train_loader = torch.utils.data.DataLoader(
    train_set,
    batch_size=batch_size,
    shuffle=True,
    collate_fn=collate_pad_seq_fn,
    num_workers=num_workers,
    pin_memory=pin_memory,
)

val_loader = torch.utils.data.DataLoader(
    val_set,
    batch_size=batch_size,
    shuffle=False,
    collate_fn=collate_pad_seq_fn,
    num_workers=num_workers,
    pin_memory=pin_memory,
)

### Experiment 5: noise-injected data, no chunking

In [None]:
model = AICOVIDModule(resnet, learning_rate=0.05)

Create a logger with a fixed name

In [None]:
logger5 = pl.loggers.TensorBoardLogger(
                save_dir='.',
                version='train_60_epochs_exp5',
                name='lightning_logs'
                )

In [None]:
reset_weight(model)
trainer = pl.Trainer(gpus=1, max_epochs=60, accumulate_grad_batches=2,
                     logger=logger5, progress_bar_refresh_rate=10, callbacks=[BackupCallback()])
trainer.fit(model, train_loader, val_loader)

In [None]:
trainer.test(test_dataloaders=test_loader)

### Experiment 6: noise-injected data with SpecAugment, no chunking

In [None]:
model.augment = SpecAugment(time_W=100, freq_W=50, F=50, T=50)

Create a logger with a fixed name

In [None]:
logger6 = pl.loggers.TensorBoardLogger(
                save_dir='.',
                version='train_60_epochs_exp6',
                name='lightning_logs'
                )

In [None]:
reset_weight(model)
trainer = pl.Trainer(gpus=1, max_epochs=60, accumulate_grad_batches=2,
                     logger=logger6, progress_bar_refresh_rate=10, callbacks=[BackupCallback()])
trainer.fit(model, train_loader, val_loader)

In [None]:
trainer.test(test_dataloaders=test_loader)

## Experiments on noise-injected data with chunking

In [None]:
try:
    del train_set
    del val_set
    del train_loader
    del val_loader
except NameError:
    print("No variable to delete")
gc.collect()
with torch.no_grad():
    torch.cuda.empty_cache()
pl.utilities.memory.garbage_collection_cuda()

In [None]:
train_set = AICOVIDDataset(train_meta, torch.nn.ModuleList([transform0_chunking,transform1_chunking,transform2_chunking,transform3_chunking]), stacking=True)
val_set = AICOVIDDataset(val_meta, torch.nn.ModuleList([transform0_chunking,transform1_chunking,transform2_chunking,transform3_chunking]), stacking=True)

In [None]:
batch_size = 16

train_loader = torch.utils.data.DataLoader(
    train_set,
    batch_size=batch_size,
    shuffle=True,
    collate_fn=collate_fn,
    num_workers=num_workers,
    pin_memory=pin_memory,
)

val_loader = torch.utils.data.DataLoader(
    val_set,
    batch_size=batch_size,
    shuffle=False,
    collate_fn=collate_fn,
    num_workers=num_workers,
    pin_memory=pin_memory,
)

### Experiment 7: noise-injected data, with chunking

In [None]:
model = AICOVIDModule(resnet, learning_rate=0.05)

Create a logger with a fixed name

In [None]:
logger7 = pl.loggers.TensorBoardLogger(
                save_dir='.',
                version='train_60_epochs_exp7',
                name='lightning_logs'
                )

In [None]:
reset_weight(model)
trainer = pl.Trainer(gpus=1, max_epochs=60, accumulate_grad_batches=2,
                     logger=logger7, progress_bar_refresh_rate=10, callbacks=[BackupCallback()])
trainer.fit(model, train_loader, val_loader)

In [None]:
trainer.test(test_dataloaders=test_loader)

### Experiment 8: noise-injected data with SpecAugment, with chunking

In [None]:
model.augment = SpecAugment(time_W=100, freq_W=50, F=50, T=50)

Create a logger with a fixed name

In [None]:
logger8 = pl.loggers.TensorBoardLogger(
                save_dir='.',
                version='train_60_epochs_exp8',
                name='lightning_logs'
                )

In [None]:
reset_weight(model)
trainer = pl.Trainer(gpus=1, max_epochs=60, accumulate_grad_batches=2,
                     logger=logger8, progress_bar_refresh_rate=10, callbacks=[BackupCallback()])
trainer.fit(model, train_loader, val_loader)

In [None]:
trainer.test(test_dataloaders=test_loader)

# Model evaluation

## TensorBoard

In [None]:
%load_ext tensorboard
%tensorboard --logdir ./lightning_logs

## Inspecting misclassified samples

In [None]:
def number_of_correct(pred, target):
    # count number of correct predictions
    return pred.squeeze().eq(target).sum().item()


def get_likely_index(tensor):
    # find most likely label index for each element in the batch
    return tensor.argmax(dim=-1)


def test(model):
    # Evaluate in inference mode on whichever device the model currently lives on
    model.eval()
    model_device = next(model.parameters()).device
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            output = model(data.to(model_device))

            pred = get_likely_index(output)
            correct += number_of_correct(pred, target.to(model_device))

    print(f"Accuracy: {correct}/{len(test_loader.dataset)} ({100. * correct / len(test_loader.dataset):.0f}%)\n")

test(model)

In [None]:
correct = 0
preds = []
for i in range(len(train_meta)):
  x = read_MFCC_audio(train_meta['file_path'].iloc[i]).view(1,1,200,-1).cuda()
  y = torch.Tensor([train_meta['assessment_result'].iloc[i]]).long().cuda()
  output = trainer.call_hook('forward', x)

  pred = get_likely_index(output)
  preds += [pred.item()]
  correct += number_of_correct(pred, y)

correct/len(train_meta)

In [None]:
confusion_matrix(train_meta['assessment_result'], preds)
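
For reading the matrix: assuming `confusion_matrix` here is `sklearn.metrics.confusion_matrix` (its import is not shown in this part of the notebook), rows are the true labels and columns are the predicted labels. A hand-rolled binary version, with a hypothetical `binary_confusion` helper and made-up labels, shows the layout:

```python
from typing import List, Sequence


def binary_confusion(y_true: Sequence[int], y_pred: Sequence[int]) -> List[List[int]]:
    """2x2 matrix with rows = true label, columns = predicted label."""
    matrix = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


# Made-up labels, for illustration only
y_true = [0, 0, 1, 1, 1]
y_pred = [0, 1, 1, 1, 0]
print(binary_confusion(y_true, y_pred))  # [[1, 1], [1, 2]]
```

So `matrix[1][0]` counts positive samples predicted as negative, which is the cell to watch for this task.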

In [None]:
torch.Tensor([False, True, True]).nonzero()

In [None]:
correct = 0
preds = []
for i in range(len(train_meta)):
  x = read_MFCC_audio(train_meta['file_path'].iloc[i]).cuda()
  x = overlap_chunking(pad_tensor(x, 300), 300, 50, max_thresh=False).unsqueeze(1)
  y = torch.Tensor([train_meta['assessment_result'].iloc[i]]).long().cuda()
  output = trainer.call_hook('forward', x)

  pred = get_likely_index(output)
  pred = torch.mode(pred, 0)[0]
  preds += [pred.item()]
  correct += number_of_correct(pred, y)

correct/len(train_meta)

In [None]:
confusion_matrix(train_meta['assessment_result'], preds)
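
The loop above turns per-chunk predictions into one label per recording with `torch.mode`, i.e. majority voting. The same idea in plain Python is sketched below. `majority_vote` is a hypothetical helper, and breaking ties toward the smallest label is my own choice for the sketch, so the tie behaviour of `torch.mode` should be checked separately.

```python
from collections import Counter
from typing import Sequence


def majority_vote(chunk_preds: Sequence[int]) -> int:
    """Return the most frequent label; ties go to the smallest label."""
    counts = Counter(chunk_preds)
    best = max(counts.values())
    return min(label for label, c in counts.items() if c == best)


print(majority_vote([1, 0, 1, 1, 0]))  # 1
print(majority_vote([0, 1]))           # 0
```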

# Predicting on the test set for submission

In [None]:
preds = []
for i in range(len(private_test_meta)):
  x = read_MFCC_audio(private_test_meta['file_path'].iloc[i]).view(1,1,200,-1).cuda()
  output = trainer.call_hook('forward', x)

  pred = get_likely_index(output)
  preds += [pred.item()]
np.array(preds)

# Saving results

In [None]:
#@markdown Save the model to Google Drive
if train_mode:
  os.makedirs('trained_models', exist_ok=True)
  compressed_name = f'{zip_name}_model.zip'
  torch.save(model.state_dict(), './trained_models/model_weights.pth')
  
  os.system(f'zip -j ./{compressed_name} ./trained_models/*')
  driveUpload(compressed_name, model_zoo)

In [None]:
#@markdown Save the public test submission to Google Drive
submit_df = pd.DataFrame({'uuid': test_meta['uuid'],
                          'assessment_result': preds})
submit_df.to_csv('results.csv', index=False)

# Compress the file
os.system(f'zip -j ./{zip_name}.zip ./results.csv')

driveUpload(zip_name+'.zip', submission_folder)
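
Since `preds` is reused across cells (the prediction cell above iterates over `private_test_meta`, while the public submission pairs `preds` with `test_meta['uuid']`), it may be worth checking the lengths before writing the file. The sketch below uses only the standard library. `submission_csv` is a hypothetical helper, and the uuids are placeholders.

```python
import csv
import io
from typing import Sequence


def submission_csv(uuids: Sequence[str], preds: Sequence[int]) -> str:
    """Build the two-column submission text, refusing mismatched lengths."""
    if len(uuids) != len(preds):
        raise ValueError(f"{len(uuids)} uuids but {len(preds)} predictions")
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["uuid", "assessment_result"])
    writer.writerows(zip(uuids, preds))
    return buf.getvalue()


# Placeholder uuids, for illustration only
print(submission_csv(["a1", "b2"], [0, 1]))
```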

In [None]:
#@markdown Save the private test submission to Google Drive
submit_df = pd.DataFrame({'uuid': private_test_meta['uuid'],
                          'assessment_result': preds})
submit_df.to_csv('results.csv', index=False)

# Compress the file
os.system(f'zip -j ./{zip_name}_private_test.zip ./results.csv')

driveUpload(zip_name+'_private_test.zip', submission_folder)

In [None]:
#@markdown Load the saved model
if not train_mode:
  driveDownload(f'{zip_name}_model.zip', model_zoo)
  os.system(f'unzip -o {zip_name}_model.zip')
  model.load_state_dict(torch.load('model_weights.pth'))