<a href="https://colab.research.google.com/github/usamireko/WFL-ASR/blob/main/training.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Data Structure Guide

### Zip file
___

```
your_zip.zip
├── folder1
│   ├── audio.wav
│   ├── audio.lab
│   └── ...
├── folder2
│   ├── audio.wav
│   ├── audio.lab
│   └── ...
├── folder3
│   ├── audio.wav
│   ├── audio.lab
│   └── ...
└── ...
```

> Folder names are used as language names. For the language codes, see the `langs.txt` file generated by preprocessing.

- Within each folder, every `.wav` file needs a `.lab` file with the same name (example: `song.wav` and `song.lab`)
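Before uploading, you can sanity-check the zip locally. The sketch below is a hypothetical helper (`check_dataset_zip` is not part of WFL-ASR): it treats each top-level folder as a language and reports any file stem that is missing its `.wav` or its `.lab`. It only inspects file names, not label contents, and the actual language codes still come from `langs.txt` after preprocessing.

```python
import zipfile
from collections import defaultdict
from pathlib import PurePosixPath


def check_dataset_zip(zip_path):
    """Hypothetical pre-upload check for a <language>/<name>.wav + <name>.lab zip layout.

    Returns (languages, unpaired): `languages` are the top-level folder names,
    `unpaired` lists file stems that lack either the .wav or the .lab.
    """
    # language folder -> file stem -> set of extensions seen for that stem
    found = defaultdict(lambda: defaultdict(set))
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            path = PurePosixPath(name)
            ext = path.suffix.lower()
            # Skip directory entries, other file types, and files at the zip root
            if ext not in (".wav", ".lab") or len(path.parts) < 2:
                continue
            found[path.parts[0]][str(path.with_suffix(""))].add(ext)

    unpaired = sorted(
        stem
        for stems in found.values()
        for stem, exts in stems.items()
        if exts != {".wav", ".lab"}
    )
    return sorted(found), unpaired
```

For example, `languages, unpaired = check_dataset_zip("data.zip")` should list `JPN` and `ENG` for the multi-language example below, and `unpaired` should be empty if every `.wav` has its `.lab`.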
___

### Example

```
(multi-lang)
data.zip
├── JPN
│   ├── sample_001.wav
│   ├── sample_001.lab
│   └── ...
└── ENG
    ├── sample_001.wav
    ├── sample_001.lab
    └── ...
```

```
(single-lang)
data.zip
└── JPN
    ├── sample_001.wav
    ├── sample_001.lab
    └── ...
```

Please avoid bad labels, such as junk phonemes or junk labels: the model may learn to predict them!
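One way to spot junk labels is to count how often each phoneme appears per language and review the rarest ones. The sketch below assumes HTK-style `.lab` files (one whitespace-separated `<start> <end> <phoneme>` entry per line); that format, the `min_count` threshold, and the helper names are assumptions, so adjust the parsing if your labels look different. Point it at the extracted dataset folder.

```python
from collections import Counter
from pathlib import Path


def phoneme_counts(data_dir):
    """Count phoneme labels per language folder.

    Assumes each .lab line looks like '<start> <end> <phoneme>' (HTK-style);
    lines with fewer than three fields are ignored.
    """
    counts = {}
    for lab in sorted(Path(data_dir).glob("*/*.lab")):
        lang_counts = counts.setdefault(lab.parent.name, Counter())
        for line in lab.read_text(encoding="utf-8").splitlines():
            fields = line.split()
            if len(fields) >= 3:
                lang_counts[fields[2]] += 1
    return counts


def rare_phonemes(counts, min_count=5):
    """Phonemes seen fewer than `min_count` times per language: candidates for typos or junk."""
    return {
        lang: sorted(ph for ph, n in lang_counts.items() if n < min_count)
        for lang, lang_counts in counts.items()
    }
```

A rare phoneme is not necessarily wrong, so treat the output as a list of things to double-check rather than a list of errors.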


# Training

In [None]:
#@title # Setup
#```pip install torch torchaudio soundfile transformers torchcrf matplotlib tqdm pyyaml tensorboard```
!rm -rf /content/sample_data

from google.colab import drive
drive.mount('/content/drive')
import os
os.environ["HF_HUB_DOWNLOAD_TIMEOUT"] = "60"

%cd /content
!git clone https://github.com/MLo7Ghinsan/WFL-ASR.git
!pip install pytorch_optimizer

In [None]:
#@title # Extract data
#@markdown Path to your dataset zip (for example, on Google Drive)
data_zip = "" #@param {type:"string"}
if not data_zip:
    raise ValueError("data_zip is not set, please enter the path to your dataset zip")

%cd /content
!apt-get install -y p7zip-full
!mkdir -p /content/training_dataset
# e.g. data_zip = "/content/drive/MyDrive/WFL_Training_Kit/long_data.zip"
!7z x -y "{data_zip}" -o/content/training_dataset

In [None]:
#@title # Edit Config
# copied from @HAI-D, I'm too lazy to make this, ty <333

data_dir = "/content/training_dataset"
#@markdown # Model
#@markdown ___
encoder_type = "whisper" #@param ["whisper", "wavlm"]
whisper_model = "small" #@param ["tiny", "base", "small", "medium", "large"]
whisper_model_path = "openai/whisper-" + whisper_model
wavlm_model = "base" #@param ["base", "base-sd", "base-sv", "base-plus", "base-plus-sd", "base-plus-sv", "large"]
wavlm_model_path = "microsoft/wavlm-" + wavlm_model
#@markdown <font size="-1.5"> Check to freeze the encoder (no fine-tuning); uncheck to fine-tune it
freeze_encoder = False #@param {type:"boolean"}


#@markdown # Training
#@markdown ___
#@markdown <font size="-1.5"> Batch size finally working!
#@markdown <font size="-1.5"> Prodigy is the default optimizer and expects a learning rate of 1.0; change it only if you manually switch to another optimizer
batch_size = 16 #@param {type: "integer"}
num_workers = 4 #@param {type: "integer"}
learning_rate = 1 #@param {type: "number"}
#@markdown <font size="-1.5"> Learning-rate decay factor (gamma), applied every `val_check_interval`; keep at 1 for Prodigy
lr_decay_gamma = 1 #@param {type: "number"}
#@markdown <font size="-1.5"> Label smoothing, to keep the model from becoming overconfident in its predicted labels
label_smoothing = 0.1 #@param {type: "number"}
# TensorBoard logs are written to the `logs` subfolder of save_dir (set below)
#@markdown # Finetuning
#@markdown ___
#@markdown <font size="-1.5"> Enable fine-tuning from an existing WFL model
finetune_enable = False #@param {type:"boolean"}
#@markdown <font size="-1.5"> Path to the model to fine-tune
finetune_model_path = "" #@param {type: "string"}
#@markdown # Saving
#@markdown ___
#@markdown <font size="-1.5"> Path to the output folder (checkpoints; TensorBoard logs go to its `logs` subfolder)
save_dir = "" #@param {type: "string"}
#@markdown <font size="-1.5"> Total number of training steps
max_steps = 100000 # @param {"type":"slider","min":1000,"max":1500000,"step":1000}
#@markdown <font size="-1.5"> Validation/saving interval, in training steps
val_check_interval = 500 # @param {"type":"slider","min":100,"max":10000,"step":100}

import os
import yaml

if not save_dir:
    raise ValueError("save_dir is not set, please set a saving directory")
if finetune_enable and not finetune_model_path:
    raise ValueError("finetune_enable is checked but finetune_model_path is empty")

log_dir = os.path.join(save_dir, "logs")
config_file = "/content/WFL-ASR/config.yaml"

with open(config_file, "r") as config:
    wfl_config = yaml.safe_load(config)
# data
wfl_config["data"]["data_dir"] = data_dir
# model
wfl_config["model"]["encoder_type"] = encoder_type
wfl_config["model"]["whisper_model"] = whisper_model_path
wfl_config["model"]["wavlm_model"] = wavlm_model_path
wfl_config["model"]["freeze_encoder"] = freeze_encoder
# training
wfl_config["training"]["batch_size"] = batch_size
wfl_config["training"]["num_workers"] = num_workers
wfl_config["training"]["learning_rate"] = learning_rate
wfl_config["training"]["lr_decay_gamma"] = lr_decay_gamma
wfl_config["training"]["label_smoothing"] = label_smoothing
wfl_config["training"]["max_steps"] = max_steps
wfl_config["training"]["val_check_interval"] = val_check_interval
wfl_config["training"]["log_dir"] = log_dir
# finetuning
wfl_config["finetuning"]["enable"] = finetune_enable
wfl_config["finetuning"]["model_path"] = finetune_model_path
# output
wfl_config["output"]["save_dir"] = save_dir

# sort_keys=False keeps the original key order of config.yaml
with open(config_file, "w") as config:
    yaml.dump(wfl_config, config, sort_keys=False)
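The form note says the learning rate is multiplied by `lr_decay_gamma` once every `val_check_interval`. Assuming that is a plain step decay (the actual scheduler lives in the WFL-ASR training code and is not reproduced here), the sketch below works out the learning rate at a given step and how many validation/saving points a run reaches before `max_steps`. The function names are hypothetical. With the Prodigy defaults (`learning_rate = 1`, `lr_decay_gamma = 1`) the multiplier stays at 1.0 for the whole run.

```python
def lr_at_step(step, base_lr, gamma, interval):
    """Step-decay sketch: multiply base_lr by gamma once per completed `interval` steps."""
    completed_intervals = step // interval
    return base_lr * gamma ** completed_intervals


def num_checkpoints(max_steps, interval):
    """Number of validation/saving points reached before training stops at max_steps."""
    return max_steps // interval


# Form defaults from the config cell
max_steps, val_check_interval = 100_000, 500
print(num_checkpoints(max_steps, val_check_interval))  # 200
print(lr_at_step(max_steps, 1.0, 1.0, val_check_interval))  # 1.0 (Prodigy: no decay)
# Hypothetical non-Prodigy setup: lr = 1e-4 with gamma = 0.99
print(lr_at_step(max_steps, 1e-4, 0.99, val_check_interval))
```

If you switch away from Prodigy, this kind of quick calculation shows how small the learning rate gets by the end of training for a given gamma.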
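`label_smoothing` softens the one-hot training targets so the model is not pushed toward 100% confidence on every label. The sketch below uses the common formulation (the one PyTorch's `CrossEntropyLoss(label_smoothing=...)` documents as mixing the one-hot target with a uniform distribution): the true class gets `1 - ε + ε/K` and every other class gets `ε/K`, where `K` is the number of labels. Whether WFL-ASR applies it exactly this way is an assumption not confirmed here, and `smoothed_targets` is a hypothetical helper.

```python
def smoothed_targets(target_index, num_classes, epsilon=0.1):
    """Label-smoothed target distribution: (1 - eps) * one_hot + eps / K."""
    uniform = epsilon / num_classes  # share of eps spread over every class
    targets = [uniform] * num_classes
    targets[target_index] += 1.0 - epsilon  # remaining mass goes to the true class
    return targets


# Example: 4 label classes, true class 2, label_smoothing = 0.1
print(smoothed_targets(2, 4, 0.1))
```

With the default `label_smoothing = 0.1` and 4 classes, the true class ends up at 0.925 and each of the other three at 0.025, so the targets still sum to 1.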

In [None]:
#@title # Preprocess
%cd /content/WFL-ASR
!python preprocess.py

In [None]:
#@title # Training
#@markdown Enter the path to the config file generated by preprocessing
config_path = "" # @param {"type":"string"}
if not config_path:
    raise ValueError("config_path is not set, please enter the config generated by preprocessing")

import yaml

with open(config_path, "r") as config:
    wfl_config = yaml.safe_load(config)

log_dir = wfl_config["training"]["log_dir"]

%load_ext tensorboard
%tensorboard --logdir "{log_dir}"
%cd /content/WFL-ASR
!python train.py "{config_path}"