### Model training notebook for **EfficientSpeech: An On-Device Text to Speech Model**
The goal of this notebook is to streamline training new EfficientSpeech models.
  

#### Links
Official repository: https://github.com/roatienza/efficientspeech  
Paper: https://ieeexplore.ieee.org/abstract/document/10094639



Mount drive

In [1]:
from google.colab import drive

drive.mount("/content/drive")

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


# Configuration options
#### Dataset parameters
* dataset_name: the name of the dataset
* dataset_location: folder path to dataset files
* config_dir: location of configuration files
* checkpoints_dir: location of checkpoint folder containing .ckpt files 
* output_dir: folder path to save all generated .ckpt files

#### Model training options
* accelerator: One of `cpu`, `gpu`, `tpu`, `cuda`, `auto`
* devices: Per the PyTorch Lightning documentation, mapped to `gpus`, `tpu_cores`, `num_processes`, or `ipus` depending on the accelerator type.
* model_size_to_train: One of `base`, `small`, `tiny`. Determines the command line options passed to train.py.
* resume_from_checkpoint: If not None, make sure the .ckpt file was trained with the same model size as model_size_to_train.
* max_epochs: Maximum number of training epochs. The officially published weights were trained for 5000 epochs.
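
The checkpoint/model size match described above can be checked before training starts. The sketch below is only an illustration: `checkpoint_matches_size` is a hypothetical helper, and it assumes checkpoints keep the naming of the published release files (`base_eng_4M.ckpt`, `small_eng_952k.ckpt`, `tiny_eng_266k.ckpt`), where the file name starts with the model size.

```python
from pathlib import Path

MODEL_SIZES = ("tiny", "small", "base")


def checkpoint_matches_size(checkpoint_path, model_size):
    """Return True if the checkpoint file name starts with the model size.

    Assumption: the file follows the release naming, e.g. base_eng_4M.ckpt.
    A renamed checkpoint would fail this check even if it is compatible.
    """
    if model_size not in MODEL_SIZES:
        raise ValueError(f"unknown model size: {model_size!r}")
    return Path(checkpoint_path).name.startswith(f"{model_size}_")


resume_from_checkpoint = "/content/efficientspeech/checkpoints/base_eng_4M.ckpt"
print(checkpoint_matches_size(resume_from_checkpoint, "base"))  # True
print(checkpoint_matches_size(resume_from_checkpoint, "tiny"))  # False
```

The check only looks at the file name; it does not inspect the weights stored in the checkpoint.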

In [27]:
# Dataset parameters
dataset_name = 'dataset' #@param {type:"string"}

dataset_location = '/content/drive/MyDrive/dataset' #@param {type:"string"}

config_dir = '/content/efficientspeech/config' #@param {type:"string"}

checkpoints_dir = '/content/efficientspeech/checkpoints' #@param {type:"string"}

output_dir = '/content/drive/MyDrive/saved_checkpoints' #@param {type:"string"}

dataset_config_dir = f'{config_dir}/{dataset_name}'
# -p avoids a "File exists" error when this cell is re-run
!mkdir -p $dataset_config_dir


# Model training Options
cmd_line_opts = ''

# Accelerator is TPU for Colab
accelerator = 'tpu' #@param {type:'string'} ['tpu', 'cuda', 'auto']
cmd_line_opts += f' --accelerator {accelerator}'

# Devices
devices = 8 #@param {type:'integer'}
cmd_line_opts += f' --devices {devices}'

# Cmd line opts for training different size models 
model_size_to_train = "base" #@param ["tiny", "small", "base"]
match (model_size_to_train):
  case "small":
    cmd_line_opts += ' --n-blocks 3 --reduction 2'
  case "base":
    cmd_line_opts += ' --head 2 --reduction 1 --expansion 2  --kernel-size 5 --n-blocks 3 --block-depth 3'
  case _: #tiny
    pass

# Max epochs
max_epochs = 10000 #@param {type:"integer"}
cmd_line_opts += f' --max_epochs {max_epochs}'

# Inference device
infer_device = 'cpu' if accelerator in ('tpu', 'cpu') else 'cuda'
cmd_line_opts += f' --infer-device {infer_device}'


!echo Command line arguments: $cmd_line_opts

Command line arguments: --accelerator tpu --devices 8 --head 2 --reduction 1 --expansion 2 --kernel-size 5 --n-blocks 3 --block-depth 3 --max_epochs 10000 --infer-device cpu
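
The option assembly above can also be written as a plain function, which is easier to test outside Colab. This is a sketch, not part of the repository: `build_cmd_line_opts` and `SIZE_FLAGS` are hypothetical names, and the flags are copied from the configuration cell. Unlike the cell, which treats any unrecognized size as `tiny`, the sketch rejects unknown sizes.

```python
# Flags per model size, copied from the configuration cell.
SIZE_FLAGS = {
    "tiny": [],
    "small": ["--n-blocks", "3", "--reduction", "2"],
    "base": ["--head", "2", "--reduction", "1", "--expansion", "2",
             "--kernel-size", "5", "--n-blocks", "3", "--block-depth", "3"],
}


def build_cmd_line_opts(accelerator, devices, model_size, max_epochs):
    """Return the command line options as a single space-separated string."""
    if model_size not in SIZE_FLAGS:
        raise ValueError(f"unknown model size: {model_size!r}")
    # Same rule as the cell: infer on CPU when training on TPU or CPU.
    infer_device = "cpu" if accelerator in ("tpu", "cpu") else "cuda"
    parts = ["--accelerator", accelerator, "--devices", str(devices)]
    parts += SIZE_FLAGS[model_size]
    parts += ["--max_epochs", str(max_epochs), "--infer-device", infer_device]
    return " ".join(parts)


print(build_cmd_line_opts("tpu", 8, "base", 10000))
```

With these inputs the function reproduces the argument string echoed by the cell.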


# Setup dependencies


In [4]:
# Delete existing
# !rm -rf /content/efficientspeech

# Clone repository (Note: this is my fork with additional training options)
!git clone https://github.com/v0xie/efficientspeech

# Checkout training branch
%cd /content/efficientspeech
!git checkout training

# Download model files
!mkdir /content/efficientspeech/checkpoints
!wget --show-progress --continue -O /content/efficientspeech/checkpoints/base_eng_4M.ckpt  https://github.com/roatienza/efficientspeech/releases/download/icassp2023/base_eng_4M.ckpt 
!wget --show-progress --continue -O /content/efficientspeech/checkpoints/small_eng_952k.ckpt  https://github.com/roatienza/efficientspeech/releases/download/icassp2023/small_eng_952k.ckpt
!wget --show-progress --continue -O /content/efficientspeech/checkpoints/tiny_eng_266k.ckpt  https://github.com/roatienza/efficientspeech/releases/download/icassp2023/tiny_eng_266k.ckpt 

Cloning into 'efficientspeech'...
remote: Enumerating objects: 141, done.
remote: Counting objects: 100% (42/42), done.
remote: Compressing objects: 100% (37/37), done.
remote: Total 141 (delta 15), reused 10 (delta 2), pack-reused 99
Receiving objects: 100% (141/141), 4.86 MiB | 13.70 MiB/s, done.
Resolving deltas: 100% (47/47), done.
/content/efficientspeech
Branch 'training' set up to track remote branch 'training' from 'origin'.
Switched to a new branch 'training'
--2023-05-13 23:56:07--  https://github.com/roatienza/efficientspeech/releases/download/icassp2023/base_eng_4M.ckpt
Resolving github.com (github.com)... 140.82.112.3
Connecting to github.com (github.com)|140.82.112.3|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://objects.githubusercontent.com/github-production-release-asset-2e65be/483135884/534649d3-be57-4dcd-88c4-fa0e9fbf0fd4?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20230513%2Fus-east-1%2
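
The `wget` calls above fetch all three checkpoints each time the cell runs. A Python alternative that skips files already on disk might look like the sketch below. `download_missing` and `checkpoint_url` are hypothetical helpers; the release URL and file names are the ones used in the cell. Unlike `wget --continue`, this version does not resume partial downloads, so an incomplete file would have to be deleted by hand.

```python
import urllib.request
from pathlib import Path

RELEASE_URL = "https://github.com/roatienza/efficientspeech/releases/download/icassp2023"
CHECKPOINT_NAMES = ("base_eng_4M.ckpt", "small_eng_952k.ckpt", "tiny_eng_266k.ckpt")


def checkpoint_url(name):
    """Build the release download URL for a checkpoint file name."""
    return f"{RELEASE_URL}/{name}"


def download_missing(checkpoints_dir, names=CHECKPOINT_NAMES,
                     fetch=urllib.request.urlretrieve):
    """Download each checkpoint that is not present yet; return the names fetched.

    `fetch` takes (url, destination path) and can be swapped out for testing.
    """
    target = Path(checkpoints_dir)
    target.mkdir(parents=True, exist_ok=True)
    fetched = []
    for name in names:
        dest = target / name
        if dest.exists():
            continue  # already downloaded
        fetch(checkpoint_url(name), str(dest))
        fetched.append(name)
    return fetched


# Example call (performs network requests):
# download_missing("/content/efficientspeech/checkpoints")
```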

In [29]:
# https://pytorch-lightning.readthedocs.io/en/1.2.10/advanced/tpu.html#tpu-terminology
!pip install cloud-tpu-client==0.10 # https://storage.googleapis.com/tpu-pytorch/wheels/torch_xla-1.9-cp37-cp37m-linux_x86_64.whl
!pip install torch-xla --index-url https://pip.repos.neuron.amazonaws.com

# Install requirements
!pip install -r requirements.txt

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting torch_xla
  Downloading torch_xla-1.0-py3-none-any.whl (1.4 kB)
Installing collected packages: torch_xla
Successfully installed torch_xla-1.0
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


Run inference to test the setup


In [7]:
!python demo.py --checkpoint https://github.com/roatienza/efficientspeech/releases/download/icassp2023/tiny_eng_266k.ckpt \
  --infer-device cpu --text "the quick brown fox jumps over the lazy dog" --wav-filename fox.wav

  warn_missing_pkg("wandb")
  "lr_options": generate_power_seq(LEARNING_RATE_CIFAR, 11),
  contrastive_task: Union[FeatureMapContrastiveTask] = FeatureMapContrastiveTask("01, 02, 11"),
  self.nce_loss = AmdimNCELoss(tclip)
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.
[nltk_data] Downloading package cmudict to /root/nltk_data...
[nltk_data]   Unzipping corpora/cmudict.zip.
100% 6.76M/6.76M [00:00<00:00, 81.1MB/s]
Removing weight norm...
Removing weight norm...


Make a new config (optional)

In [12]:
pp_config = f"""
dataset: "{dataset_name}"

path:
  corpus_path: "{dataset_location}/corpus"
  lexicon_path: "/content/efficientspeech/lexicon/librispeech-lexicon.txt"
  raw_path: "{dataset_location}/raw_data"
  preprocessed_path: "{dataset_location}/preprocessed_data"

preprocessing:
  val_size: 64
  text:
    text_cleaners: ["english_cleaners"]
    language: "en"
    max_length: 4096
  audio:
    sampling_rate: 22050
    max_wav_value: 32768.0
  stft:
    filter_length: 1024
    hop_length: 256
    win_length: 1024
  mel:
    n_mel_channels: 80
    mel_fmin: 0
    mel_fmax: 8000 # please set to 8000 for HiFi-GAN vocoder, set to null for MelGAN vocoder
  pitch:
    feature: "phoneme_level" # support 'phoneme_level' or 'frame_level'
    normalization: True
  energy:
    feature: "phoneme_level" # support 'phoneme_level' or 'frame_level'
    normalization: True
"""

# Write config to file
with open(f'{dataset_config_dir}/preprocess.yaml', mode='w') as f:
  f.write(pp_config)
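
YAML does not allow tab characters for indentation, and some parsers may reject a line that contains only a tab, so it can help to scan the config text before writing it. The sketch below uses a hypothetical helper, `lines_with_tabs`, and a made-up sample string; in the notebook it would be called on `pp_config`.

```python
def lines_with_tabs(text):
    """Return the 1-based numbers of lines that contain a tab character."""
    return [n for n, line in enumerate(text.splitlines(), start=1) if "\t" in line]


# Made-up sample: the second line holds a stray tab.
sample = 'dataset: "demo"\n\t\npath:\n  corpus_path: "/tmp/corpus"\n'
print(lines_with_tabs(sample))  # [2]
```

An empty list means the text is free of tabs; any reported line can be fixed by replacing the tab with spaces or removing it.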


# Train a model
### Run training

In [39]:
%cd /content/efficientspeech/
!git pull

/content/efficientspeech
remote: Enumerating objects: 4, done.
remote: Counting objects: 100% (4/4), done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 3 (delta 1), reused 3 (delta 1), pack-reused 0
Unpacking objects: 100% (3/3), 1.28 KiB | 1.28 MiB/s, done.
From https://github.com/v0xie/efficientspeech
   80c4b27..add4ac7  training   -> origin/training
Updating 80c4b27..add4ac7
Fast-forward
 train_tpu.py | 83 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 83 insertions(+)
 create mode 100644 train_tpu.py


In [40]:
# Train
%cd /content/efficientspeech/

!python /content/efficientspeech/train_tpu.py $cmd_line_opts

/content/efficientspeech
  warn_missing_pkg("wandb")
  "lr_options": generate_power_seq(LEARNING_RATE_CIFAR, 11),
  contrastive_task: Union[FeatureMapContrastiveTask] = FeatureMapContrastiveTask("01, 02, 11"),
  self.nce_loss = AmdimNCELoss(tclip)
Removing weight norm...
  rank_zero_deprecation(
  rank_zero_warn(
  rank_zero_warn(
GPU available: False, used: False
TPU available: True, using: 8 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
Exception in device=TPU:0: [Errno 2] No such file or directory: './preprocessed_data/LJSpeech/train.txt'
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/torch_xla/distributed/xla_multiprocessing.py", line 331, in _mp_start_fn
    _start_fn(index, pf_cfg, fn, args)
  File "/usr/local/lib/python3.10/dist-packages/torch_xla/distributed/xla_multiprocessing.py", line 325, in _start_fn
    fn(gindex, *args)
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/strategies/laun
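
The run above stops because `./preprocessed_data/LJSpeech/train.txt` does not exist. A pre-flight check can surface this before the TPU processes are spawned. The sketch below is an illustration only: `missing_training_files` is a hypothetical helper, and the list of required files contains just `train.txt`, the one named in the error. The training script may need other files as well.

```python
from pathlib import Path


def missing_training_files(preprocessed_path, required=("train.txt",)):
    """Return the required file names that are absent from preprocessed_path.

    Assumption: only train.txt is checked by default, because it is the
    file named in the error message.
    """
    base = Path(preprocessed_path)
    return [name for name in required if not (base / name).is_file()]


missing = missing_training_files("./preprocessed_data/LJSpeech")
if missing:
    print("Missing preprocessed files:", missing)
else:
    print("All required preprocessed files found.")
```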

### *IMPORTANT* - Copy all checkpoints to your drive

In [None]:
import os
import shutil

source_dir = '/content/efficientspeech/lightning_logs'

# Create the target directory if it doesn't exist
os.makedirs(output_dir, exist_ok=True)

# Iterate through all subdirectories in the source directory
for root, dirs, files in os.walk(source_dir):
  for file in files:
    # Skip anything that is not a checkpoint file
    if not file.endswith('.ckpt'):
      continue

    source_file_path = os.path.join(root, file)
    target_file_path = os.path.join(output_dir, file)

    # Copy the file to the target directory if it doesn't exist
    if not os.path.exists(target_file_path):
      shutil.copy2(source_file_path, target_file_path)
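
The loop above copies every checkpoint into one flat folder, so two runs that produce a file with the same name would collide and the second file would be skipped. One possible variant keeps the folder layout below the source directory. This is a sketch with a hypothetical helper, `copy_checkpoints`; it assumes nothing about how the logger names its subfolders.

```python
import shutil
from pathlib import Path


def copy_checkpoints(source_dir, output_dir):
    """Copy all .ckpt files, keeping their path relative to source_dir.

    Files that already exist at the target are skipped.
    Returns the list of newly written target paths.
    """
    source = Path(source_dir)
    copied = []
    for ckpt in sorted(source.rglob("*.ckpt")):
        target = Path(output_dir) / ckpt.relative_to(source)
        if target.exists():
            continue
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(ckpt, target)
        copied.append(target)
    return copied


# Example call with the notebook's variables:
# copy_checkpoints('/content/efficientspeech/lightning_logs', output_dir)
```

Because existing files are skipped, the function can be called repeatedly during a long training run to copy only the new checkpoints.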

### Monitor training with TensorBoard

In [None]:
# Load the TensorBoard notebook extension
%load_ext tensorboard

%tensorboard --logdir /content/efficientspeech/lightning_logs/