### Model training notebook for **EfficientSpeech: An On-Device Text to Speech Model**
The goal of this notebook is to streamline training new EfficientSpeech models. Use the preprocess_dataset notebook for dataset preparation.
  

#### Links
Official EfficientSpeech repository: https://github.com/roatienza/efficientspeech  
Paper: https://ieeexplore.ieee.org/abstract/document/10094639



Mount drive

In [1]:
from google.colab import drive

drive.mount("/content/drive")

Mounted at /content/drive


### Unzip Preprocessed dataset
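This section has no code cell. The sketch below is one way to do it with Python's standard-library `zipfile` module, assuming the preprocess_dataset notebook left you with a single `.zip` archive. The archive path in the example and the helper name `unzip_dataset` are hypothetical; point them at wherever your archive actually lives.

```python
import zipfile
from pathlib import Path


def unzip_dataset(archive_path: str, target_dir: str) -> list[str]:
    """Extract a dataset archive into target_dir and return the archive's member names."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)  # create the destination if needed
    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(target)
        return zf.namelist()


# Hypothetical paths; replace with your own archive and dataset folder.
# unzip_dataset('/content/drive/MyDrive/MyDataset.zip', '/content/drive/MyDrive/MyDataset')
```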

# Configuration options
#### Dataset parameters
* dataset_name: the name of the dataset
* dataset_location: folder path to dataset files
* config_dir: location of configuration files
* checkpoints_dir: location of checkpoint folder containing .ckpt files 
* output_dir: folder path to save all generated .ckpt files

#### Model training options
* accelerator: One of `cpu`, `gpu`, `tpu`, `cuda`, `auto` (only GPU training currently works)
* devices: Per the pytorch_lightning documentation, this will be mapped to either `gpus`, `tpu_cores`, `num_processes` or `ipus`, based on the accelerator type.
* model_size_to_train: One of `base`, `small`, `tiny`. Determines which architecture command-line options the training script is started with.
* resume_from_checkpoint: If not empty, make sure the .ckpt model type matches the model_size_to_train option
* max_epochs: The officially published weights were trained for 5000 epochs
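The resume_from_checkpoint note above can be checked before launching a long run. This sketch assumes checkpoint filenames start with the model size, as the released `base_eng_4M.ckpt`, `small_eng_952k.ckpt` and `tiny_eng_266k.ckpt` do; checkpoints written by Lightning during your own runs do not follow that naming, so the check only applies to the released weights. `checkpoint_matches_size` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path

VALID_SIZES = ("tiny", "small", "base")


def checkpoint_matches_size(ckpt_path: str, model_size: str) -> bool:
    """Return True if the checkpoint filename starts with the requested model size.

    Assumes the naming convention of the released weights (e.g. 'small_eng_952k.ckpt').
    An empty path means 'train from scratch' and is always accepted.
    """
    if model_size not in VALID_SIZES:
        raise ValueError(f"model_size must be one of {VALID_SIZES}, got {model_size!r}")
    if not ckpt_path:
        return True
    return Path(ckpt_path).name.startswith(f"{model_size}_")


# Example: checkpoint_matches_size('/content/efficientspeech/checkpoints/base_eng_4M.ckpt', 'base')
```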

In [2]:
# Dataset parameters
dataset_name = 'MyDataset' #@param {type:"string"}

dataset_location = '/content/drive/MyDrive/MyDataset' #@param {type:"string"}

config_dir = '/content/efficientspeech/config' #@param {type:"string"}

checkpoints_dir = '/content/efficientspeech/checkpoints' #@param {type:"string"}

output_dir = '/content/drive/MyDrive/saved_checkpoints' #@param {type:"string"}

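dataset_config_dir = f'{config_dir}/{dataset_name}'
# The config folder only exists after the repository is cloned (see "Setup dependencies");
# creating it earlier would make `git clone` fail on a non-empty directory.
import os
if os.path.isdir(config_dir):
  os.makedirs(dataset_config_dir, exist_ok=True)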


# Model training Options
cmd_line_opts = ''

# Accelerator (only GPU training currently works)
accelerator = 'cuda' #@param ["cuda", "gpu", "tpu", "cpu", "auto"]
cmd_line_opts += f' --accelerator {accelerator}'

# Preprocess config name (not currently passed to the trainer)
preprocess_cfg = 'con'

# Num Workers
num_workers = 4 #@param {type:'integer'}
cmd_line_opts += f' --num_workers {num_workers}'

# Devices
devices = 1 #@param {type:'integer'}
cmd_line_opts += f' --devices {devices}'

# Cmd line opts for training different size models 
model_size_to_train = "base" #@param ["tiny", "small", "base"]
match (model_size_to_train):
  case "small":
    cmd_line_opts += ' --n-blocks 3 --reduction 2'
  case "base":
    cmd_line_opts += ' --head 2 --reduction 1 --expansion 2  --kernel-size 5 --n-blocks 3 --block-depth 3'
  case _: #tiny
    pass

# Max epochs
max_epochs = 5000 #@param {type:"integer"}
cmd_line_opts += f' --max_epochs {max_epochs}'

# # Inference device
# infer_device = 'cpu' if accelerator == 'tpu' or accelerator == 'cpu' else 'cuda'
# cmd_line_opts += f' --infer-device {infer_device}'

# Precision
precision = '16-mixed'
cmd_line_opts += f' --precision {precision}'

# Batch size (128 is default)
batch_size = 128 #@param [16, 32, 64, 128]
cmd_line_opts += f' --batch-size {batch_size}'

# Resume from checkpoint path
resume_from_checkpoint = '' #@param {type:'string'}
if len(resume_from_checkpoint) > 0:
  cmd_line_opts += f' --resume-from-checkpoint {resume_from_checkpoint}'

# GPU-only arguments
if accelerator in ('cuda', 'gpu'):
  cmd_line_opts += ' --pin-memory --persistent-workers'

!echo Command line arguments: $cmd_line_opts

mkdir: cannot create directory ‘/content/efficientspeech/config/dataset’: No such file or directory
Command line arguments: --accelerator tpu --num_workers 2 --devices 8 --head 2 --reduction 1 --expansion 2 --kernel-size 5 --n-blocks 3 --block-depth 3 --max_epochs 10000 --infer-device cpu --precision 16 --batch-size 16
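Appending strings by hand makes it easy to add the same flag twice. A minimal alternative sketch keeps the options in a dict so each flag appears at most once, then joins them with `shlex.join` (Python 3.8+). The flag names are copied from the cell above; `build_cmd_line` and `SIZE_FLAGS` are hypothetical names, not part of the repository.

```python
import shlex
from typing import Optional

# Per-size architecture flags, copied from the match statement above (tiny uses the defaults).
SIZE_FLAGS = {
    "tiny": {},
    "small": {"--n-blocks": 3, "--reduction": 2},
    "base": {"--head": 2, "--reduction": 1, "--expansion": 2,
             "--kernel-size": 5, "--n-blocks": 3, "--block-depth": 3},
}


def build_cmd_line(accelerator: str, devices: int, num_workers: int, model_size: str,
                   max_epochs: int, precision: str, batch_size: int,
                   resume_from_checkpoint: Optional[str] = "") -> str:
    """Build the training command-line string with each flag appearing once."""
    opts = {
        "--accelerator": accelerator,
        "--num_workers": num_workers,
        "--devices": devices,
        **SIZE_FLAGS[model_size],
        "--max_epochs": max_epochs,
        "--precision": precision,
        "--batch-size": batch_size,
    }
    if resume_from_checkpoint:
        opts["--resume-from-checkpoint"] = resume_from_checkpoint
    args = []
    for flag, value in opts.items():
        args += [flag, str(value)]
    # Boolean switches that are only added for GPU training.
    if accelerator in ("cuda", "gpu"):
        args += ["--pin-memory", "--persistent-workers"]
    return shlex.join(args)  # quotes any value that needs it (e.g. paths with spaces)
```

Usage in the notebook would be `cmd_line_opts = build_cmd_line(accelerator, devices, num_workers, model_size_to_train, max_epochs, precision, batch_size, resume_from_checkpoint)`, after which the training cell can pass `$cmd_line_opts` as before.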


# Setup dependencies


In [3]:
# Delete existing
# !rm -rf /content/efficientspeech

# Clone repository (Note: this is my fork with additional training options)
!git clone https://github.com/v0xie/efficientspeech

# Checkout training branch
%cd /content/efficientspeech
!git checkout feature/faster-training
#!git pull

# Download model files
!mkdir -p /content/efficientspeech/checkpoints
!wget --show-progress --continue -O /content/efficientspeech/checkpoints/base_eng_4M.ckpt  https://github.com/roatienza/efficientspeech/releases/download/icassp2023/base_eng_4M.ckpt 
!wget --show-progress --continue -O /content/efficientspeech/checkpoints/small_eng_952k.ckpt  https://github.com/roatienza/efficientspeech/releases/download/icassp2023/small_eng_952k.ckpt
!wget --show-progress --continue -O /content/efficientspeech/checkpoints/tiny_eng_266k.ckpt  https://github.com/roatienza/efficientspeech/releases/download/icassp2023/tiny_eng_266k.ckpt 

Cloning into 'efficientspeech'...
remote: Enumerating objects: 155, done.
remote: Counting objects: 100% (56/56), done.
remote: Compressing objects: 100% (45/45), done.
remote: Total 155 (delta 22), reused 21 (delta 6), pack-reused 99
Receiving objects: 100% (155/155), 4.87 MiB | 17.24 MiB/s, done.
Resolving deltas: 100% (54/54), done.
/content/efficientspeech
Branch 'training' set up to track remote branch 'training' from 'origin'.
Switched to a new branch 'training'
Already up to date.
--2023-05-14 00:59:47--  https://github.com/roatienza/efficientspeech/releases/download/icassp2023/base_eng_4M.ckpt
Resolving github.com (github.com)... 140.82.114.3
Connecting to github.com (github.com)|140.82.114.3|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://objects.githubusercontent.com/github-production-release-asset-2e65be/483135884/534649d3-be57-4dcd-88c4-fa0e9fbf0fd4?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20
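`wget -O` writes to the named output file even if the download then fails, which can leave an empty `.ckpt` behind. This sketch flags checkpoints that are missing or zero bytes before they are used. `find_bad_checkpoints` is a hypothetical helper; the expected filenames are the ones requested by the wget commands above, and the directory is the `checkpoints_dir` configured earlier.

```python
from pathlib import Path

# Filenames requested by the wget commands above.
EXPECTED = ["base_eng_4M.ckpt", "small_eng_952k.ckpt", "tiny_eng_266k.ckpt"]


def find_bad_checkpoints(ckpt_dir: str, expected=EXPECTED) -> list[str]:
    """Return the expected checkpoint names that are missing or zero bytes."""
    bad = []
    for name in expected:
        path = Path(ckpt_dir) / name
        if not path.is_file() or path.stat().st_size == 0:
            bad.append(name)
    return bad


# Example: print(find_bad_checkpoints('/content/efficientspeech/checkpoints'))
```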

In [None]:
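# TPU-only dependencies (not needed for GPU training)
# https://pytorch-lightning.readthedocs.io/en/1.2.10/advanced/tpu.html#tpu-terminology
!pip install cloud-tpu-client==0.10 # https://storage.googleapis.com/tpu-pytorch/wheels/torch_xla-1.9-cp37-cp37m-linux_x86_64.whl
# Note: this index hosts AWS Neuron builds of torch-xla (see the "+torchneuron" version in the output),
# which target AWS Trainium/Inferentia rather than Google Cloud TPUs
!pip install torch-xla --index-url https://pip.repos.neuron.amazonaws.com
!pip install wandb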

# Install requirements
!pip install -r requirements.txt

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting cloud-tpu-client==0.10
  Downloading cloud_tpu_client-0.10-py3-none-any.whl (7.4 kB)
Collecting google-api-python-client==1.8.0 (from cloud-tpu-client==0.10)
  Downloading google_api_python_client-1.8.0-py3-none-any.whl (57 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 57.7/57.7 kB 2.5 MB/s eta 0:00:00
Collecting google-api-core<2dev,>=1.13.0 (from google-api-python-client==1.8.0->cloud-tpu-client==0.10)
  Downloading google_api_core-1.34.0-py3-none-any.whl (120 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 120.2/120.2 kB 7.0 MB/s eta 0:00:00
Collecting uritemplate<4dev,>=3.0.0 (from google-api-python-client==1.8.0->cloud-tpu-client==0.10)
  Downloading uritemplate-3.0.1-py2.py3-none-any.whl (15 kB)
Installing collected packages: uritemplate, google-api-core, google-api-python-client, cloud-t

Looking in indexes: https://pip.repos.neuron.amazonaws.com, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting torch-xla
  Downloading https://pip.repos.neuron.amazonaws.com/torch-xla/torch_xla-1.13.1%2Btorchneuron6-cp310-cp310-linux_x86_64.whl (267.7 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 267.7/267.7 MB 3.7 MB/s eta 0:00:00
Installing collected packages: torch-xla
Successfully installed torch-xla-1.13.1+torchneuron6
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting wandb
  Downloading wandb-0.15.2-py3-none-any.whl (2.0 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.0/2.0 MB 26.4 MB/s eta 0:00:00
Collecting GitPython!=3.1.29,>=1.0.0 (from wandb)
  Downloading GitPython-3.1.31-py3-none-any.whl (184 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 184.3/184.3 kB 17.7 MB/s eta 0:00:00

Run inference to test


In [7]:
!python demo.py --checkpoint https://github.com/roatienza/efficientspeech/releases/download/icassp2023/tiny_eng_266k.ckpt \
  --infer-device cpu --text "the quick brown fox jumps over the lazy dog" --wav-filename fox.wav

  warn_missing_pkg("wandb")
  "lr_options": generate_power_seq(LEARNING_RATE_CIFAR, 11),
  contrastive_task: Union[FeatureMapContrastiveTask] = FeatureMapContrastiveTask("01, 02, 11"),
  self.nce_loss = AmdimNCELoss(tclip)
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.
[nltk_data] Downloading package cmudict to /root/nltk_data...
[nltk_data]   Unzipping corpora/cmudict.zip.
100% 6.76M/6.76M [00:00<00:00, 81.1MB/s]
Removing weight norm...
Removing weight norm...
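To sanity-check the demo output, the generated `fox.wav` can be inspected with the standard-library `wave` module. This sketch assumes the file is an uncompressed integer PCM WAV written to the working directory; `wave` cannot read 32-bit float WAVs and raises `wave.Error` for them, in which case a library such as soundfile is needed. `wav_info` is a hypothetical helper.

```python
import wave


def wav_info(path: str) -> dict:
    """Return sample rate, channel count and duration (seconds) of a PCM WAV file."""
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        frames = wf.getnframes()
        return {
            "sample_rate": rate,
            "channels": wf.getnchannels(),
            "duration_s": frames / rate,
        }


# Example: print(wav_info('/content/efficientspeech/fox.wav'))
```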


Make a new config (optional)

In [12]:
pp_config = f"""
dataset: "{dataset_name}"

path:
  corpus_path: "{dataset_location}/corpus"
  lexicon_path: "/content/efficientspeech/lexicon/librispeech-lexicon.txt"
  raw_path: "{dataset_location}/raw_data"
  preprocessed_path: "{dataset_location}/preprocessed_data"

preprocessing:
  val_size: 64
  text:
    text_cleaners: ["english_cleaners"]
    language: "en"
    max_length: 4096
  audio:
    sampling_rate: 22050
    max_wav_value: 32768.0
  stft:
    filter_length: 1024
    hop_length: 256
    win_length: 1024
  mel:
    n_mel_channels: 80
    mel_fmin: 0
    mel_fmax: 8000 # please set to 8000 for HiFi-GAN vocoder, set to null for MelGAN vocoder
  pitch:
    feature: "phoneme_level" # support 'phoneme_level' or 'frame_level'
    normalization: True
  energy:
    feature: "phoneme_level" # support 'phoneme_level' or 'frame_level'
    normalization: True
"""

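# Write config to file (create the dataset config folder first if needed)
import os
os.makedirs(dataset_config_dir, exist_ok=True)
with open(f'{dataset_config_dir}/preprocess.yaml', mode='w') as f:
  f.write(pp_config)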
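The `audio`, `stft` and `mel` values above are related: each mel frame advances by `hop_length` samples, the analysis window (`win_length`) cannot be longer than the FFT size (`filter_length`), and `mel_fmax` cannot exceed the Nyquist frequency `sampling_rate / 2`. A quick check of those relationships; `check_stft_config` is an illustrative name, and a `None` mel_fmax (the YAML `null` suggested for MelGAN) is treated as "up to Nyquist".

```python
from typing import Optional


def check_stft_config(sampling_rate: int, filter_length: int, hop_length: int,
                      win_length: int, mel_fmin: float, mel_fmax: Optional[float]) -> dict:
    """Derive frame timing from the STFT settings and sanity-check the mel range."""
    if win_length > filter_length:
        raise ValueError("win_length must not exceed filter_length (the FFT size)")
    nyquist = sampling_rate / 2
    upper = nyquist if mel_fmax is None else mel_fmax
    if not 0 <= mel_fmin < upper <= nyquist:
        raise ValueError(f"need 0 <= mel_fmin < mel_fmax <= {nyquist}")
    return {
        "frame_shift_ms": 1000 * hop_length / sampling_rate,
        "frames_per_second": sampling_rate / hop_length,
        "window_ms": 1000 * win_length / sampling_rate,
        "nyquist_hz": nyquist,
    }


info = check_stft_config(22050, 1024, 256, 1024, 0, 8000)
# frame_shift_ms ≈ 11.61, frames_per_second ≈ 86.13, window_ms ≈ 46.44, nyquist_hz = 11025.0
```

With the values in this config, one second of audio becomes roughly 86 mel frames, and the 8000 Hz mel ceiling sits comfortably below the 11025 Hz Nyquist limit.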


# Train a model
### Run training

In [50]:
# Temporary workaround for the trainer expecting data in relative dirs
!mkdir -p /content/efficientspeech/preprocessed_data/LJSpeech /content/efficientspeech/raw_data/LJSpeech

!cp -R "$dataset_location"/preprocessed_data/* /content/efficientspeech/preprocessed_data/LJSpeech/
!cp -R "$dataset_location"/raw_data/* /content/efficientspeech/raw_data/LJSpeech/
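The same workaround can be written in Python with `shutil.copytree(..., dirs_exist_ok=True)` (Python 3.8+), which creates the destination folders itself and can be re-run safely. This sketch assumes the dataset folder uses the `preprocessed_data`/`raw_data` layout shown above and keeps the `LJSpeech` subfolder name the trainer expects; `copy_dataset_to_repo` is a hypothetical helper.

```python
import shutil
from pathlib import Path


def copy_dataset_to_repo(dataset_dir: str, repo_dir: str, corpus: str = "LJSpeech") -> list[Path]:
    """Copy <dataset_dir>/{preprocessed_data,raw_data} into <repo_dir>/<subdir>/<corpus>."""
    copied = []
    for subdir in ("preprocessed_data", "raw_data"):
        src = Path(dataset_dir) / subdir
        dst = Path(repo_dir) / subdir / corpus
        # dirs_exist_ok merges into an existing destination instead of raising.
        shutil.copytree(src, dst, dirs_exist_ok=True)
        copied.append(dst)
    return copied


# Example: copy_dataset_to_repo(dataset_location, '/content/efficientspeech')
```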

In [65]:
# Train
%cd /content/efficientspeech/

!python /content/efficientspeech/train_tpu.py $cmd_line_opts

/content/efficientspeech
  warn_missing_pkg("wandb")
  "lr_options": generate_power_seq(LEARNING_RATE_CIFAR, 11),
  contrastive_task: Union[FeatureMapContrastiveTask] = FeatureMapContrastiveTask("01, 02, 11"),
  self.nce_loss = AmdimNCELoss(tclip)
Removing weight norm...
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/lightning_lite/accelerators/tpu.py", line 93, in wrapper
    return queue.get_nowait()
  File "/usr/lib/python3.10/multiprocessing/queues.py", line 135, in get_nowait
    return self.get(False)
  File "/usr/lib/python3.10/multiprocessing/queues.py", line 116, in get
    raise Empty
_queue.Empty
Traceback (most recent call last):
  File "/content/efficientspeech/train_tpu.py", line 73, in <module>
    trainer = Trainer(accelerator=args.accelerator, 
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/utilities/argparse.py", line 340, in insert_env_defaults
    return fn(self, **kwargs)
  File "/usr/local/lib/python3.10/dist-

### *IMPORTANT* - Copy all checkpoints to your drive

In [None]:
import os
import shutil

source_dir = '/content/efficientspeech/lightning_logs'

# Create the target directory if it doesn't exist
os.makedirs(output_dir, exist_ok=True)

# Walk every subdirectory (version_0, version_1, ...) of the source directory
for root, dirs, files in os.walk(source_dir):
  for file in files:
    # Only copy checkpoint files
    if not file.endswith('.ckpt'):
      continue
    source_file_path = os.path.join(root, file)
    # Keep the relative folder layout so same-named checkpoints from
    # different runs (e.g. last.ckpt) don't collide in the target folder
    rel_path = os.path.relpath(source_file_path, source_dir)
    target_file_path = os.path.join(output_dir, rel_path)
    os.makedirs(os.path.dirname(target_file_path), exist_ok=True)

    # Copy the file only if it isn't already in the target directory
    if not os.path.exists(target_file_path):
      shutil.copy2(source_file_path, target_file_path)
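To resume later, `resume_from_checkpoint` needs the path to one of these files. This sketch picks the most recently modified `.ckpt` anywhere under a directory; treating the newest modification time as "latest" is an assumption, and `latest_checkpoint` is a hypothetical helper.

```python
from pathlib import Path
from typing import Optional


def latest_checkpoint(search_dir: str) -> Optional[str]:
    """Return the path of the newest .ckpt file under search_dir, or None if there are none."""
    ckpts = list(Path(search_dir).rglob("*.ckpt"))  # searches all subfolders
    if not ckpts:
        return None
    return str(max(ckpts, key=lambda p: p.stat().st_mtime))


# Example: resume_from_checkpoint = latest_checkpoint(output_dir) or ''
```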

### Monitor training with Tensorboard

In [None]:
# Load the TensorBoard notebook extension
%load_ext tensorboard

%tensorboard --logdir /content/efficientspeech/lightning_logs/