### Model training notebook for **EfficientSpeech: An On-Device Text to Speech Model**
The goal of this notebook is to streamline training new EfficientSpeech models.   
  

#### Links
Official repository: https://github.com/roatienza/efficientspeech  
Paper: https://ieeexplore.ieee.org/abstract/document/10094639



Mount drive

In [78]:
from google.colab import drive

drive.mount("/content/drive")

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


# Configuration options
Dataset parameters
* dataset_name: the name of the dataset
* dataset_location: folder path to dataset files
* config_dir: location of configuration files
* checkpoints_dir: location of checkpoint folder containing .ckpt files 
* output_dir: folder path to save all generated .ckpt files

In [79]:
dataset_name = 'dataset' #@param {type:"string"}

dataset_location = '/content/drive/MyDrive/dataset' #@param {type:"string"}

config_dir = '/content/efficientspeech/config' #@param {type:"string"}

checkpoints_dir = '/content/efficientspeech/checkpoints' #@param {type:"string"}

output_dir = '/content/drive/MyDrive/saved_checkpoints' #@param {type:"string"}

dataset_config_dir = f'{config_dir}/{dataset_name}'
# -p creates parent directories as needed and does not fail if the directory already exists
!mkdir -p $dataset_config_dir


Model training options
* accelerator: One of `cpu`, `gpu`, `tpu`, `cuda`, `auto`
* devices: Per the PyTorch Lightning documentation, this is mapped to either `gpus`, `tpu_cores`, `num_processes` or `ipus`, based on the accelerator type.
* model_size_to_train: One of `base`, `small`, `tiny`. Determines the command line options passed to train.py.
* resume_from_checkpoint: Path to a .ckpt file to resume from, or `None`. If set, the checkpoint's model size must match model_size_to_train.
* max_epochs: Maximum number of training epochs. The officially published weights were trained for 5000 epochs.

In [83]:
# Accelerator type; should match the Colab runtime (e.g. 'tpu' on a TPU runtime)
accelerator = 'tpu' #@param ['tpu', 'cuda', 'auto'] {allow-input: true}

# Devices
devices = 1 #@param {type:'integer'}

# Cmd line opts for training different size models 
model_size_to_train = "base" #@param ["tiny", "small", "base"]
match (model_size_to_train):
  case "small":
    cmd_line_opts = '--n-blocks 3 --reduction 2'
  case "base":
    cmd_line_opts = '--head 2 --reduction 1 --expansion 2  --kernel-size 5 --n-blocks 3 --block-depth 3'
  case _: #tiny
    cmd_line_opts = ''

# Path to a .ckpt checkpoint to resume from
resume_from_checkpoint = "/content/efficientspeech/checkpoints/base_eng_4M.ckpt" #@param ['None','/content/efficientspeech/checkpoints/base_eng_4M.ckpt','/content/efficientspeech/checkpoints/small_eng_952k.ckpt','/content/efficientspeech/checkpoints/tiny_eng_266k.ckpt'] {allow-input: true}
if resume_from_checkpoint != "None":
  cmd_line_opts += f' --resume-from-checkpoint {resume_from_checkpoint}'

# The pretrained checkpoints were trained for 5000 epochs; use a larger value to continue training from them
max_epochs = 10000 #@param {type:"integer"}
cmd_line_opts += f' --max_epochs {max_epochs}'


!echo Command line arguments: $cmd_line_opts


Command line arguments: --head 2 --reduction 1 --expansion 2 --kernel-size 5 --n-blocks 3 --block-depth 3 --resume-from-checkpoint /content/efficientspeech/checkpoints/base_eng_4M.ckpt --max_epochs 10000
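The option assembly in the cell above can also be written as a lookup table, which makes an unknown model size an explicit error instead of silently falling back to `tiny`. This is only a sketch: `SIZE_OPTS` and `build_cmd_line_opts` are hypothetical names, and the flags are copied verbatim from the cell above rather than checked against train.py.

```python
# Flags per model size, copied from the notebook cell above.
# "tiny" passes no extra flags, so train.py uses its defaults.
SIZE_OPTS = {
    "tiny": [],
    "small": ["--n-blocks", "3", "--reduction", "2"],
    "base": ["--head", "2", "--reduction", "1", "--expansion", "2",
             "--kernel-size", "5", "--n-blocks", "3", "--block-depth", "3"],
}


def build_cmd_line_opts(model_size, resume_from_checkpoint=None, max_epochs=5000):
    """Hypothetical helper: build the train.py option string for a model size."""
    if model_size not in SIZE_OPTS:
        raise ValueError(f"unknown model size: {model_size!r}")
    opts = list(SIZE_OPTS[model_size])
    # The notebook form uses the string "None" to mean "do not resume".
    if resume_from_checkpoint and resume_from_checkpoint != "None":
        opts += ["--resume-from-checkpoint", resume_from_checkpoint]
    opts += ["--max_epochs", str(max_epochs)]
    return " ".join(opts)


print(build_cmd_line_opts("small", max_epochs=10000))
```

Building the options as a list and joining once also avoids the stray double spaces that string concatenation tends to leave behind.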


# Set up dependencies


In [11]:
# Clone repository (Note: this is my fork with additional training options)
!git clone https://github.com/v0xie/efficientspeech

# Checkout training branch
%cd /content/efficientspeech
!git checkout training

# Download model files
!mkdir -p /content/efficientspeech/checkpoints
!wget --show-progress --continue -O /content/efficientspeech/checkpoints/base_eng_4M.ckpt  https://github.com/roatienza/efficientspeech/releases/download/icassp2023/base_eng_4M.ckpt 
!wget --show-progress --continue -O /content/efficientspeech/checkpoints/small_eng_952k.ckpt  https://github.com/roatienza/efficientspeech/releases/download/icassp2023/small_eng_952k.ckpt
!wget --show-progress --continue -O /content/efficientspeech/checkpoints/tiny_eng_266k.ckpt  https://github.com/roatienza/efficientspeech/releases/download/icassp2023/tiny_eng_266k.ckpt 

# Install requirements
!pip install -r requirements.txt

Cloning into 'efficientspeech'...
remote: Enumerating objects: 137, done.
remote: Counting objects: 100% (38/38), done.
remote: Compressing objects: 100% (33/33), done.
remote: Total 137 (delta 13), reused 9 (delta 3), pack-reused 99
Receiving objects: 100% (137/137), 4.85 MiB | 14.83 MiB/s, done.
Resolving deltas: 100% (45/45), done.
/content/efficientspeech
Branch 'training' set up to track remote branch 'training' from 'origin'.
Switched to a new branch 'training'
--2023-05-13 20:29:02--  https://github.com/roatienza/efficientspeech/releases/download/icassp2023/base_eng_4M.ckpt
Resolving github.com (github.com)... 140.82.121.3
Connecting to github.com (github.com)|140.82.121.3|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://objects.githubusercontent.com/github-production-release-asset-2e65be/483135884/534649d3-be57-4dcd-88c4-fa0e9fbf0fd4?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20230513%2Fus-east-1%2F
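Before training, it can help to confirm that the three checkpoint downloads actually landed, since an interrupted `wget -O` can leave an empty file behind. A minimal sketch, assuming the file names from the wget commands above; `missing_checkpoints` is a hypothetical helper, not part of the repository.

```python
import os

# File names taken from the wget commands in the setup cell.
EXPECTED_CHECKPOINTS = [
    "base_eng_4M.ckpt",
    "small_eng_952k.ckpt",
    "tiny_eng_266k.ckpt",
]


def missing_checkpoints(checkpoints_dir, names=EXPECTED_CHECKPOINTS):
    """Return the names that are absent or zero bytes in checkpoints_dir."""
    missing = []
    for name in names:
        path = os.path.join(checkpoints_dir, name)
        if not os.path.isfile(path) or os.path.getsize(path) == 0:
            missing.append(name)
    return missing
```

Calling `missing_checkpoints('/content/efficientspeech/checkpoints')` after the setup cell should return an empty list; any name it returns is a file worth downloading again.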

Run inference to test


In [None]:
!python demo.py --checkpoint https://github.com/roatienza/efficientspeech/releases/download/icassp2023/tiny_eng_266k.ckpt \
  --infer-device cpu --text "the quick brown fox jumps over the lazy dog" --wav-filename fox.wav

Make a new config (optional)

In [81]:
pp_config = f"""
dataset: "{dataset_name}"

path:
  corpus_path: "{dataset_location}/corpus"
  lexicon_path: "/content/efficientspeech/lexicon/librispeech-lexicon.txt"
  raw_path: "{dataset_location}/raw_data"
  preprocessed_path: "{dataset_location}/preprocessed_data"

preprocessing:
  val_size: 64
  text:
    text_cleaners: ["english_cleaners"]
    language: "en"
    max_length: 4096
  audio:
    sampling_rate: 22050
    max_wav_value: 32768.0
  stft:
    filter_length: 1024
    hop_length: 256
    win_length: 1024
  mel:
    n_mel_channels: 80
    mel_fmin: 0
    mel_fmax: 8000 # please set to 8000 for HiFi-GAN vocoder, set to null for MelGAN vocoder
  pitch:
    feature: "phoneme_level" # support 'phoneme_level' or 'frame_level'
    normalization: True
  energy:
    feature: "phoneme_level" # support 'phoneme_level' or 'frame_level'
    normalization: True
"""

# Write config to file
with open(f'{dataset_config_dir}/preprocess.yaml', mode='w') as f:
  f.write(pp_config)
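The audio and STFT values in the config determine the time resolution of the mel spectrograms. The arithmetic below is a small sketch using only the numbers written above (22050 Hz sampling rate, hop length 256, window length 1024); `approx_frames` is a hypothetical helper, and the exact frame count in practice depends on how the preprocessing pads the signal.

```python
# Values copied from the preprocess.yaml template above.
sampling_rate = 22050  # samples per second
hop_length = 256       # samples between consecutive frames
win_length = 1024      # samples per analysis window

frame_shift_ms = 1000 * hop_length / sampling_rate  # time step between frames
window_ms = 1000 * win_length / sampling_rate       # duration of one window
frames_per_second = sampling_rate / hop_length


def approx_frames(seconds):
    """Approximate number of mel frames for a clip, ignoring padding."""
    return int(seconds * sampling_rate / hop_length)


print(f"frame shift: {frame_shift_ms:.2f} ms")
print(f"window:      {window_ms:.2f} ms")
print(f"frames/sec:  {frames_per_second:.2f}")
```

With these settings each frame advances by roughly 11.6 ms, so changing `sampling_rate` without adjusting `hop_length` changes the time resolution the model is trained on.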


# Train a model
### Run training

In [84]:
# Train
%cd /content/efficientspeech/

!python train.py $cmd_line_opts

/content/efficientspeech
  warn_missing_pkg("wandb")
  "lr_options": generate_power_seq(LEARNING_RATE_CIFAR, 11),
  contrastive_task: Union[FeatureMapContrastiveTask] = FeatureMapContrastiveTask("01, 02, 11"),
  self.nce_loss = AmdimNCELoss(tclip)
Traceback (most recent call last):
  File "/content/efficientspeech/train.py", line 58, in <module>
    pl_module = EfficientFSModule(preprocess_config=preprocess_config, lr=args.lr,
  File "/content/efficientspeech/model.py", line 79, in __init__
    self.hifigan = get_hifigan(checkpoint=hifigan_checkpoint,
  File "/content/efficientspeech/model.py", line 32, in get_hifigan
    ckpt = torch.load(checkpoint)
  File "/usr/local/lib/python3.10/dist-packages/torch/serialization.py", line 795, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/usr/local/lib/python3.10/dist-packages/torch/serialization.py", line 1012, in _legacy_load
    result = unpickler.load()
  File "/usr/lib/python3.10/pickl

### *IMPORTANT* - Copy all checkpoints to your drive

In [74]:
import os
import shutil

source_dir = '/content/efficientspeech/lightning_logs'

# Create the target directory if it doesn't exist
os.makedirs(output_dir, exist_ok=True)

# Walk every subdirectory of the source directory
for root, dirs, files in os.walk(source_dir):
  for file in files:
    # Only copy checkpoint files
    if file.endswith('.ckpt'):
      source_file_path = os.path.join(root, file)
      target_file_path = os.path.join(output_dir, file)

      # Copy the file only if the target doesn't already exist.
      # Note: checkpoints with the same file name in different runs map to
      # the same target, so only the first one found is copied.
      if not os.path.exists(target_file_path):
        shutil.copy2(source_file_path, target_file_path)
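To resume training later from the copies saved in `output_dir`, the newest checkpoint can be selected by modification time. A minimal sketch: `latest_checkpoint` is a hypothetical helper, and it relies on `shutil.copy2` preserving the original modification time, so the ordering reflects when each checkpoint was written rather than when it was copied.

```python
import glob
import os


def latest_checkpoint(directory):
    """Return the most recently modified .ckpt path in directory, or None."""
    paths = glob.glob(os.path.join(directory, "*.ckpt"))
    if not paths:
        return None
    return max(paths, key=os.path.getmtime)
```

The returned path could then be used as the `resume_from_checkpoint` value in the training options cell, provided the checkpoint matches the selected model size.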

### Monitor training with TensorBoard

In [None]:
# Load the TensorBoard notebook extension
%load_ext tensorboard

%tensorboard --logdir /content/efficientspeech/lightning_logs/