<a href="https://colab.research.google.com/github/xiaoyufan/speech-data-augmentation/blob/main/inference_and_evaluation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Setup

### Create logger

In [103]:
from importlib import reload
import logging
import sys

reload(logging)

LOGGING_LEVEL = 'DEBUG'

logger = logging.getLogger('baseline')
logger.setLevel(LOGGING_LEVEL)

formatter = logging.Formatter('[%(asctime)s - logger %(name)s - %(levelname)s] %(message)s')

ch = logging.StreamHandler(sys.stdout)
ch.setFormatter(formatter)
logger.addHandler(ch)

logger.debug('debug test')
logger.info('info test')

[2020-12-16 21:29:53,890 - logger baseline - DEBUG] debug test
[2020-12-16 21:29:53,891 - logger baseline - INFO] info test
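
The cell above reloads `logging` before configuring the logger, presumably so that re-running it does not stack a second handler and print every message twice. A possible alternative is sketched below. The helper name `get_notebook_logger` and the logger name `baseline-demo` are made up for illustration and are not part of the notebook.

```python
import logging
import sys

def get_notebook_logger(name, level='DEBUG'):
  """Return a logger with exactly one stdout handler, even if called repeatedly."""
  logger = logging.getLogger(name)
  logger.setLevel(level)
  # Drop handlers left over from a previous run of the cell
  for handler in list(logger.handlers):
    logger.removeHandler(handler)
  handler = logging.StreamHandler(sys.stdout)
  handler.setFormatter(logging.Formatter(
    '[%(asctime)s - logger %(name)s - %(levelname)s] %(message)s'))
  logger.addHandler(handler)
  # Avoid a second copy of each message via the root logger
  logger.propagate = False
  return logger

demo_logger = get_notebook_logger('baseline-demo')
demo_logger = get_notebook_logger('baseline-demo')  # second call adds no extra handler
demo_logger.info('printed once')
```

With this approach the `reload(logging)` call would no longer be needed, at the cost of a few extra lines.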


### Mount Google Drive

In [104]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


### Configurations

#### Configure mode

In [105]:
MODES = {
  'BASELINE': 'baseline',
}
MODE =  MODES['BASELINE']
logger.info(f'Notebook runs in {MODE} mode.')

[2020-12-16 21:29:54,319 - logger baseline - INFO] Notebook runs in baseline mode.


#### Get notebook's start time

In [106]:
from datetime import datetime
import pytz

NB_RUN_TIME = datetime.now(tz=pytz.timezone('US/Eastern')).strftime('%Y%m%d-%H%M%S')
logger.info(f'Notebook started at {NB_RUN_TIME}.')

[2020-12-16 21:29:54,326 - logger baseline - INFO] Notebook started at 20201216-162954.


#### Paths

In [107]:
DEEPSPEECH_PATH = '/content/DeepSpeech'
PROJECT_ROOT_PATH = '/content/drive/MyDrive/nlp-project'

DATASET_PATH = f'{PROJECT_ROOT_PATH}/cmu_arctic'
TEST_DIR = f'{DATASET_PATH}/test'
WAV_TEST_DIR = f'{TEST_DIR}/audio'
TEST_FILES_PATH = f'{TEST_DIR}/test_files.csv'

INFERENCE_DIR = f'{PROJECT_ROOT_PATH}/xiaoyu-baseline/20201216-154833-all-b8-e80'
INFERENCE_LOG_LOCAL_PATH = '/content/inference.log'
INFERENCE_OUTPUT_DIR = f'{INFERENCE_DIR}/inferences'
INFERENCE_RESULTS_PATH = f'{INFERENCE_OUTPUT_DIR}/results.csv'
EVALUATION_RESULTS_PATH = f'{INFERENCE_OUTPUT_DIR}/evaluation.json'
MODEL_PATH = f'{INFERENCE_DIR}/models/output_graph.pb'
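
Since every later cell depends on these paths, it may be worth checking them before starting a long inference run. The following is a minimal sketch; `find_missing_paths` is a hypothetical helper, and the demo uses a temporary directory rather than the Drive paths above.

```python
import os
import tempfile

def find_missing_paths(paths):
  """Return the names of entries in `paths` (name -> path) that do not exist."""
  return [name for name, path in paths.items() if not os.path.exists(path)]

# Demo with a throwaway directory: the directory exists, the file inside it does not
tmp_dir = tempfile.mkdtemp()
demo_paths = {
  'existing_dir': tmp_dir,
  'missing_model': os.path.join(tmp_dir, 'output_graph.pb'),
}
print(find_missing_paths(demo_paths))
# ['missing_model']
```

In this notebook it could be called with, for example, `{'MODEL_PATH': MODEL_PATH, 'WAV_TEST_DIR': WAV_TEST_DIR, 'TEST_FILES_PATH': TEST_FILES_PATH}`.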

### Create missing directories

In [108]:
!mkdir -p $INFERENCE_OUTPUT_DIR

### Install packages

In [109]:
%%bash -s "$DEEPSPEECH_PATH"
DEEPSPEECH_PATH=$1

if [ ! -d "$DEEPSPEECH_PATH" ] ; then
  git clone --branch v0.9.2 https://github.com/mozilla/DeepSpeech $DEEPSPEECH_PATH
fi

cd $DEEPSPEECH_PATH
pip install --upgrade pip==20.2.2 wheel==0.34.2 setuptools==49.6.0
pip install --upgrade -e .

# pip uninstall tensorflow -y
pip install --upgrade tensorflow==1.15.4
pip install tensorflow-gpu==1.15.4

# Install other python packages
pip install pandas
# tensorflow 1.15.4 requires numpy<1.19.0,>=1.16.0, but you'll have numpy 1.19.4 which is incompatible.
pip install --upgrade numpy==1.16.0

Requirement already up-to-date: pip==20.2.2 in /usr/local/lib/python3.6/dist-packages (20.2.2)
Requirement already up-to-date: wheel==0.34.2 in /usr/local/lib/python3.6/dist-packages (0.34.2)
Requirement already up-to-date: setuptools==49.6.0 in /usr/local/lib/python3.6/dist-packages (49.6.0)
Obtaining file:///content/DeepSpeech
Installing collected packages: deepspeech-training
  Attempting uninstall: deepspeech-training
    Found existing installation: deepspeech-training 0.9.2
    Can't uninstall 'deepspeech-training'. No files were found to uninstall.
  Running setup.py develop for deepspeech-training
Successfully installed deepspeech-training
Requirement already up-to-date: tensorflow==1.15.4 in /usr/local/lib/python3.6/dist-packages (1.15.4)
Requirement already up-to-date: numpy==1.16.0 in /usr/local/lib/python3.6/dist-packages (1.16.0)


In [110]:
%%bash

pip install --upgrade deepspeech-gpu==0.9.2

Requirement already up-to-date: deepspeech-gpu==0.9.2 in /usr/local/lib/python3.6/dist-packages (0.9.2)


# Inference

## Run inference

In [111]:
%%bash -s "$MODEL_PATH" "$WAV_TEST_DIR" "$INFERENCE_LOG_LOCAL_PATH" "$INFERENCE_OUTPUT_DIR"
MODEL_PATH=$1
WAV_TEST_DIR=$2
INFERENCE_LOG_LOCAL_PATH=$3
INFERENCE_OUTPUT_DIR=$4

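# Note: tee -a appends, so delete the log first when re-running this cell
for file in "$WAV_TEST_DIR"/female_40/*; do
  echo "=========================="
  echo "Running inference on $file"
  deepspeech --model "$MODEL_PATH" --audio "$file"
done 2>&1 | tee -a "$INFERENCE_LOG_LOCAL_PATH"

# Copy the log from its configured location instead of relying on the working directory
cp "$INFERENCE_LOG_LOCAL_PATH" "$INFERENCE_OUTPUT_DIR/inference.log"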

Running inference on /content/drive/MyDrive/nlp-project/cmu_arctic/test/audio/female_40/arctic_a0161.wav
2020-12-16 21:30:14.867159: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
TensorFlow: v2.3.0-6-g23ad988
DeepSpeech: v0.9.2-0-gb2920c7
2020-12-16 21:30:14.995350: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2020-12-16 21:30:14.997393: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2020-12-16 21:30:15.029916: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so retu
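
The loop above calls the `deepspeech` command-line client once per file, so the model is loaded again for every utterance and the transcripts have to be recovered from the log afterwards. As a rough alternative, the `deepspeech` Python package can load the model once and return transcripts directly. This is only a sketch under assumptions: the helper names `read_wav` and `transcribe_files` are invented here, the audio is assumed to be 16-bit mono at the model's sample rate, and no external scorer is used.

```python
import wave

def read_wav(path):
  """Return (sample_rate, raw_pcm_bytes) for a 16-bit mono WAV file."""
  with wave.open(path, 'rb') as wav:
    if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
      raise ValueError(f'{path}: expected 16-bit mono audio')
    return wav.getframerate(), wav.readframes(wav.getnframes())

def transcribe_files(model_path, wav_paths):
  """Yield (wav_path, transcript) pairs using the deepspeech Python package."""
  # Imported lazily so read_wav can be used without deepspeech installed
  import numpy as np
  from deepspeech import Model

  model = Model(model_path)  # loaded once for all files
  for wav_path in wav_paths:
    rate, pcm = read_wav(wav_path)
    if rate != model.sampleRate():
      # Unlike this sketch, no resampling is attempted here
      raise ValueError(f'{wav_path}: {rate} Hz, model expects {model.sampleRate()} Hz')
    yield wav_path, model.stt(np.frombuffer(pcm, dtype=np.int16))
```

In the notebook this might be driven with `transcribe_files(MODEL_PATH, sorted(glob.glob(f'{WAV_TEST_DIR}/female_40/*.wav')))`, which would make the log-parsing step unnecessary.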

## Collect hypotheses

In [112]:
%%bash

pip install jiwer



In [113]:
import jiwer
import re

transformation = jiwer.Compose([
  jiwer.ToLowerCase(),
  jiwer.RemoveWhiteSpace(replace_by_space=True),
  jiwer.RemoveMultipleSpaces(),
  jiwer.Strip(),
]) 

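# The hyphen is escaped so it is matched literally instead of forming the range ,-.
# The apostrophe is not in the set, so contractions such as "that's" are preserved.
PUNCTUATIONS_TO_REMOVE = re.compile(r'[!"#$%&()*+,\-./\\:;<=>?@\[\]^_`{|}~]')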

def process_hypothesis(raw):
  processed = re.sub(PUNCTUATIONS_TO_REMOVE, ' ', raw)
  processed = transformation(processed)
  return processed
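
To make the effect of `process_hypothesis` concrete, here is a standard-library sketch that is intended to mirror the same steps (punctuation to spaces, lowercasing, whitespace collapsing, stripping) without `jiwer`. The name `normalize_text` and the sample sentence are illustrative only, and the equivalence is assumed for ordinary ASCII text rather than guaranteed for every input.

```python
import re

PUNCTUATION = re.compile(r'[!"#$%&()*+,\-./\\:;<=>?@\[\]^_`{|}~]')

def normalize_text(raw):
  """Approximate process_hypothesis using only the standard library."""
  text = PUNCTUATION.sub(' ', raw)  # punctuation becomes a space; apostrophes are kept
  text = text.lower()
  return ' '.join(text.split())     # collapse whitespace runs and strip both ends

print(normalize_text("  That's the Sub-Foreman,\texplained   Thorpe.\n"))
# that's the sub foreman explained thorpe
```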

In [114]:
import pandas

RESULTS_FILE_COLUMNS = ['wav_filename', 'hypothesis', 'ground_truth']

test_files_df = pandas.read_csv(TEST_FILES_PATH)

data = []
# Initialize the state so the first log lines cannot hit an unbound name
wav_filename = None

with open(INFERENCE_LOG_LOCAL_PATH) as file:
  for line in file:
    # Remove the trailing newline character
    line = line.rstrip()

    # Find wav filename
    if line.startswith('Running inference on'):
      wav_filename = line.replace('Running inference on ', '')
      continue

    # Find predicted text: the transcript is the line after 'Inference took ...'
    if line.startswith('Inference took'):
      # Fall back to '' if the log ends right after this line
      raw_hypothesis = next(file, '')
      hypothesis = process_hypothesis(raw_hypothesis)

      # Look up the reference transcript; it stays None if the file is not listed
      ground_truth = None
      if wav_filename:
        matches = test_files_df.loc[test_files_df['wav_filename'] == wav_filename, 'transcript']
        if not matches.empty:
          ground_truth = matches.iloc[0]

      if wav_filename and hypothesis and ground_truth:
        logger.debug(f'wav_filename: {wav_filename}, hypothesis: {hypothesis}, ground_truth: {ground_truth}')
        data.append((wav_filename, hypothesis, ground_truth))
      else:
        logger.warning(f'wav_filename: {wav_filename} or hypothesis: {hypothesis} or ground_truth: {ground_truth} is missing. Not writing to the result file')

      wav_filename = None

# Write the results once, after the whole log has been parsed
df = pandas.DataFrame(data=data, columns=RESULTS_FILE_COLUMNS)
df.to_csv(INFERENCE_RESULTS_PATH, index=False)

[2020-12-16 21:35:01,187 - logger baseline - DEBUG] wav_filename: /content/drive/MyDrive/nlp-project/cmu_arctic/test/audio/female_40/arctic_a0161.wav, hypothesis: thaicowavther formen anthi witin bewor er brob, ground_truth: take away their foreman and they wouldn't be worth their grub
[2020-12-16 21:35:01,380 - logger baseline - DEBUG] wav_filename: /content/drive/MyDrive/nlp-project/cmu_arctic/test/audio/female_40/arctic_a0162.wav, hypothesis: dac esub frmrenig expling thorp, ground_truth: that's the sub foreman explained thorpe
[2020-12-16 21:35:01,570 - logger baseline - DEBUG] wav_filename: /content/drive/MyDrive/nlp-project/cmu_arctic/test/audio/female_40/arctic_a0163.wav, hypothesis: philipt mad no e er to falow, ground_truth: philip made no effort to follow
[2020-12-16 21:35:01,769 - logger baseline - DEBUG] wav_filename: /content/drive/MyDrive/nlp-project/cmu_arctic/test/audio/female_40/arctic_a0164.wav, hypothesis: he came firstas yearego andre heeldam self to jhealn, ground_
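
The parsing logic relies on a specific log layout: a `Running inference on <path>` line, some unrelated output, an `Inference took` line, and then the transcript on the following line. Pulling that logic into a pure function makes it easy to check on a small synthetic log. The function name `parse_inference_log` and the sample lines below are invented for illustration; they are not real DeepSpeech output.

```python
def parse_inference_log(lines):
  """Return (wav_filename, raw_transcript) pairs from an iterable of log lines."""
  lines = iter(lines)
  prefix = 'Running inference on '
  wav_filename = None
  pairs = []
  for line in lines:
    line = line.rstrip()
    if line.startswith(prefix):
      wav_filename = line[len(prefix):]
    elif line.startswith('Inference took') and wav_filename:
      # The transcript is assumed to be the very next line
      pairs.append((wav_filename, next(lines, '').rstrip()))
      wav_filename = None
  return pairs

# Synthetic log lines for the demo only
sample_log = [
  '==========================\n',
  'Running inference on /tmp/example/a.wav\n',
  'some unrelated library output\n',
  'Inference took (timing details elided)\n',
  'hello world\n',
]
print(parse_inference_log(sample_log))
# [('/tmp/example/a.wav', 'hello world')]
```

Because the function takes any iterable of lines, it could be fed `open(INFERENCE_LOG_LOCAL_PATH)` directly.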

# Evaluation

## Run evaluation

In [115]:
def levenshtein(a, b):
  "Calculates the Levenshtein distance between a and b."
  n, m = len(a), len(b)
  if n > m:
    # Make sure n <= m, to use O(min(n,m)) space
    a, b = b, a
    n, m = m, n

  current = list(range(n+1))
  for i in range(1, m+1):
    previous, current = current, [i]+[0]*n
    for j in range(1, n+1):
      add, delete = previous[j]+1, current[j-1]+1
      change = previous[j-1]
      if a[j-1] != b[i-1]:
        change = change + 1
      current[j] = min(add, delete, change)

  return current[n]

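def compute_cer(ground_truth, hypothesis):
  """Corpus-level character error rate over two lists of sentences.

  Each list is concatenated without a separator, so a single edit distance is
  computed over the whole corpus and the alignment may cross utterance boundaries.
  """
  ground_truth_str = ''.join(ground_truth)
  hypothesis_str = ''.join(hypothesis)
  return levenshtein(ground_truth_str, hypothesis_str) / len(ground_truth_str)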
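
`compute_cer` joins all sentences into one string before measuring the edit distance. A common alternative is to sum the edit distances of each utterance pair and divide by the total number of reference characters; the two can differ because the concatenated alignment is free to borrow characters from a neighbouring utterance. The sketch below contrasts the two on a toy input. The function names and the toy data are illustrative, and the per-utterance variant is not what the notebook used for the reported figure.

```python
def edit_distance(a, b):
  """Levenshtein distance between sequences a and b, using one rolling row."""
  previous = list(range(len(b) + 1))
  for i, item_a in enumerate(a, start=1):
    current = [i]
    for j, item_b in enumerate(b, start=1):
      current.append(min(
        previous[j] + 1,                      # deletion
        current[j - 1] + 1,                   # insertion
        previous[j - 1] + (item_a != item_b)  # substitution, or match at no cost
      ))
    previous = current
  return previous[-1]

def concatenated_cer(references, hypotheses):
  """Same idea as compute_cer: one distance over the joined strings."""
  ref, hyp = ''.join(references), ''.join(hypotheses)
  return edit_distance(ref, hyp) / len(ref)

def per_utterance_cer(references, hypotheses):
  """Sum of per-pair distances divided by total reference characters."""
  errors = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
  return errors / sum(len(r) for r in references)

refs = ['ab', 'cd']
hyps = ['abc', 'd']  # the 'c' has moved to the wrong utterance
print(edit_distance('kitten', 'sitting'))  # 3
print(concatenated_cer(refs, hyps))        # 0.0
print(per_utterance_cer(refs, hyps))       # 0.5
```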

In [116]:
import jiwer
import json
import pandas

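# keep_default_na=False stops pandas from parsing a transcript such as 'null' or 'nan' as NaN
results_df = pandas.read_csv(INFERENCE_RESULTS_PATH, keep_default_na=False)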
  
ground_truth = results_df['ground_truth'].tolist()
hypothesis = results_df['hypothesis'].tolist()

measures = jiwer.compute_measures(ground_truth, hypothesis)
measures['cer'] = compute_cer(ground_truth, hypothesis)
logger.info('Evaluation:')
logger.info(measures)

with open(EVALUATION_RESULTS_PATH, 'w') as file:
  json.dump(measures, file, indent=2)

[2020-12-16 21:36:16,396 - logger baseline - INFO] Evaluation:
[2020-12-16 21:36:16,397 - logger baseline - INFO] {'wer': 0.8197674418604651, 'mer': 0.8075601374570447, 'wil': 0.957543849092933, 'wip': 0.042456150907067, 'hits': 336, 'substitutions': 1184, 'deletions': 200, 'insertions': 26, 'cer': 0.3530285714285714}
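
The word-level measures in this output can be cross-checked from the four counts alone. The sketch below assumes the usual definitions: WER divides errors by the number of reference words, MER divides errors by hits plus errors, WIP is the product of the hit ratios on the reference and hypothesis sides, and WIL is its complement.

```python
# Counts copied from the evaluation output above
hits, substitutions, deletions, insertions = 336, 1184, 200, 26

reference_words = hits + substitutions + deletions    # 1720
hypothesis_words = hits + substitutions + insertions  # 1546
errors = substitutions + deletions + insertions       # 1410

wer = errors / reference_words
mer = errors / (hits + errors)
wip = (hits / reference_words) * (hits / hypothesis_words)
wil = 1 - wip

print(round(wer, 4), round(mer, 4), round(wip, 4), round(wil, 4))
# 0.8198 0.8076 0.0425 0.9575
```

These values agree with the logged `wer`, `mer`, `wip`, and `wil` to the displayed precision.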
