<span style="font-family:Lucida Bright;">
<p style="margin-bottom:0.5cm"></p>
<center>
<font size="8"><b>Deep Learning, Fall 2021</b></font>
<p style="margin-bottom:0.6cm"></p>
<font size="3"><b>Final Project:</b></font>
<p style="margin-bottom:0.6cm"></p>
<font size="5"><b>Enhancing Voices for Better Speech Intelligibility</b></font>
<p style="margin-bottom:2cm"></p>
<font size="6"><b>Start</b></font>
</center>
<p style="margin-bottom:2cm"></p>

<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Setup" data-toc-modified-id="Setup-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Setup</a></span><ul class="toc-item"><li><span><a href="#Initialization" data-toc-modified-id="Initialization-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Initialization</a></span></li><li><span><a href="#Load-data" data-toc-modified-id="Load-data-1.2"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>Load data</a></span></li></ul></li><li><span><a href="#Load-wav-file" data-toc-modified-id="Load-wav-file-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Load wav file</a></span></li></ul></div>

# Setup

## Initialization

The initialization procedure is defined in the notebook: [Initialization](initialization.ipynb).

In [40]:
# %run ./initialization.ipynb

In [1]:
from pathlib import Path

import pandas as pd

from toolbox.initialization import *

## Load data

Let's define a function that loads the sentence, word, and phoneme annotations for a given sentence code from a folder of our choice. Each annotation line has the form `<start_sample> <end_sample> <text>`:

In [42]:
def get_sentence_data(sentence_code: str,
                      path_folder: Path):
    # Define the extensions of the files containing
    # the data about the sentences.
    file_extensions = ['wav', 'txt', 'wrd', 'phn']

    # Create a path for each file type.
    sentence_paths = {
        extension: path_folder / f'{sentence_code}.{extension}'
        for extension in file_extensions
    }

    ## Sentence data.
    # Get the content of the 'txt' file as a list of lines.
    file_content = (
        sentence_paths['txt']
            .read_text()
            .split('\n')
    )

    # Get the data about the sentence
    sentence = dict()
    sentence_data = file_content[0].strip().split()
    sentence['start_sample'] = int(sentence_data[0])
    sentence['end_sample'] = int(sentence_data[1])
    sentence['text'] = ' '.join(sentence_data[2:])
    sentence['audio_path'] = sentence_paths['wav']

    ## Word data.
    # Get the content of the 'wrd' file as a list of lines.
    file_content = (
        sentence_paths['wrd']
            .read_text()
            .split('\n')
    )

    # Get the data about the words in the sentence.
    words = list()
    for line in file_content:
        # Skip empty lines.
        if not line.strip():
            continue

        # Initialize the dict in which the data about the words
        # will be saved.
        word = dict()

        # Extract the data about the words in the sentence.
        word_data = line.strip().split()
        word['start_sample'] = int(word_data[0])
        word['end_sample'] = int(word_data[1])
        word['text'] = ' '.join(word_data[2:])

        # Append the extracted data to the list of words.
        words.append(word)

    ## Phoneme data.
    # Get the content of the 'phn' file as a list of lines.
    file_content = (
        sentence_paths['phn']
            .read_text()
            .split('\n')
    )

    # Get the data about the phonemes in the sentence.
    phonemes = list()
    for line in file_content:
        # Skip empty lines.
        if not line.strip():
            continue

        # Initialize the dict in which the data about the phoneme
        # will be saved.
        phoneme = dict()

        # Extract the data about the phoneme.
        phoneme_data = line.strip().split()
        phoneme['start_sample'] = int(phoneme_data[0])
        phoneme['end_sample'] = int(phoneme_data[1])
        phoneme['text'] = ' '.join(phoneme_data[2:])

        # Append the extracted data to the list of phonemes.
        phonemes.append(phoneme)

    return sentence, words, phonemes
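
The three parsing steps above all read lines of the same shape: a start sample, an end sample, and a label. As a minimal sketch of that shared logic, the helper below (the name `parse_annotation_lines` is hypothetical and not part of the notebook) parses such lines into dicts. The two example entries are the first two words of a sentence that appears in the sample output further down.

```python
from typing import Dict, List, Union

Annotation = Dict[str, Union[int, str]]


def parse_annotation_lines(lines: List[str]) -> List[Annotation]:
    """Parse lines of the form '<start_sample> <end_sample> <label>'."""
    annotations = []
    for line in lines:
        # Skip empty lines (e.g. the one produced by a trailing newline).
        if not line.strip():
            continue

        # The first two fields are sample indices; the rest is the label.
        start, end, *label = line.split()
        annotations.append({
            'start_sample': int(start),
            'end_sample': int(end),
            'text': ' '.join(label),
        })
    return annotations


# Illustrative input: two word entries followed by an empty line.
example_lines = ['2360 4842 they', '4842 10026 often', '']
example_words = parse_annotation_lines(example_lines)
print(example_words)
```

A helper like this could replace the repeated loops for the `txt`, `wrd` and `phn` files, but the function above works as written without it.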

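Now, let's get the data. The next two cells build the dataframe and save it as a pickle file; they are commented out here, and the cached result is loaded further down: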

In [43]:
# column_names = [
#     'data_group',
#     'dialect',
#     'gender',
#     'speaker',
#     'type',
#     'text',
#     'audio_path',
#     'start_sample',
#     'end_sample',
#     'words_text',
#     'words_start_sample',
#     'words_end_sample',
#     'phonemes_text',
#     'phonemes_start_sample',
#     'phonemes_end_sample'
# ]
#
# df_sentences = pd.DataFrame(columns=column_names)

In [44]:
# # Define the folder paths for the training and test data
# data_folders = dict(
#     train=paths.data.train,
#     test=paths.data.test
# )
#
# count = 0
#
# for data_group, path_data_folder in data_folders.items():
#     # Get the dialects.
#     dialects = [
#         folder.stem
#         for folder in path_data_folder.iterdir()
#         if folder.is_dir()
#     ]
#
#     # Get the speaker codes.
#     for dialect in dialects:
#         # Get the path to the folder representing
#         # a given dialect.
#         path_dialect_folder = path_data_folder / dialect
#
#         # Get the speaker codes present in the folder.
#         speaker_codes = [
#             folder.stem
#             for folder in path_dialect_folder.iterdir()
#             if folder.is_dir()
#         ]
#
#         # Get information for each speaker.
#         for speaker_code in speaker_codes:
#             # Get the gender and ID code for the speaker.
#             gender = speaker_code[0]
#             speaker = speaker_code[1:]
#
#             # Define the path to the folder containing the
#             # sentences spoken by the speaker.
#             speaker_folder = path_dialect_folder / speaker_code
#
#             # Get the codes for all the sentences present in the
#             # speaker folder.
#             sentence_codes = [
#                 path.stem
#                 for path in speaker_folder.glob('**/*.wav')
#             ]
#
#             # Get the information for each sentence.
#             for sentence_code in sentence_codes:
#                 # Extract and interpret the sentence types.
#                 sentence_type_code = sentence_code[:2]
#
#                 if sentence_type_code.lower() == 'si':
#                     sentence_type = 'phonetically-diverse'
#
#                 elif sentence_type_code.lower() == 'sa':
#                     sentence_type = 'dialect'
#
#                 elif sentence_type_code.lower() == 'sx':
#                     sentence_type = 'phonetically-compact'
#
#                 else:
#                     raise ValueError(
#                         f'Invalid sentence type: '
#                         f'"{sentence_type_code}"'
#                     )
#
#                 # Extract the sentence number.
#                 sentence_number = sentence_code[2:]
#
#                 # Get the data about the sentence, its words
#                 # and its phonemes.
#                 sentence, words, phonemes = \
#                     get_sentence_data(sentence_code, speaker_folder)
#
#                 # Put the data about the sentence in a dict.
#                 new_row = {
#                     'data_group': data_group,
#                     'dialect': dialect,
#                     'gender': gender,
#                     'speaker': speaker,
#                     'type': sentence_type,
#                     'text': sentence['text'],
#                     'audio_path': sentence['audio_path'],
#                     'start_sample': sentence['start_sample'],
#                     'end_sample': sentence['end_sample'],
#                     'words_text': [word['text']
#                                    for word in words],
#                     'words_start_sample': [word['start_sample']
#                                            for word in words],
#                     'words_end_sample': [word['end_sample']
#                                            for word in words],
#                     'phonemes_text': [phoneme['text']
#                                       for phoneme in phonemes],
#                     'phonemes_start_sample': [phoneme['start_sample']
#                                            for phoneme in phonemes],
#                     'phonemes_end_sample': [phoneme['end_sample']
#                                            for phoneme in phonemes]
#                 }
#
#                 # Append the data to the sentence dataframe.
#                 # (DataFrame.append was removed in pandas 2.0.)
#                 df_sentences = pd.concat(
#                     [df_sentences, pd.DataFrame([new_row])],
#                     ignore_index=True
#                 )
#
# # Save the dataframe to pickle
# df_sentences.to_pickle(paths.cache.sentence_data)

In [50]:
# Load the cached data from disk.
df_sentences = pd.read_pickle(paths.cache.sentence_data)

In [51]:
display(df_sentences.sample(20))

Unnamed: 0,data_group,dialect,gender,speaker,type,text,audio_path,start_sample,end_sample,words_text,words_start_sample,words_end_sample,phonemes_text,phonemes_start_sample,phonemes_end_sample
545,train,DR2,F,MMH0,phonetically-compact,They often go out in the evening.,G:\My Drive\DTU\Kurser\Deep_Learning_02456\fin...,0,33792,"[they, often, go, out, in, the, evening]","[2360, 4842, 10026, 12616, 15749, 17721, 19938]","[4842, 10026, 12616, 15749, 17721, 19258, 25152]","[h#, dh, ey, aa, f, ix, ng, gcl, g, ow, aw, dx...","[0, 2360, 2924, 4842, 6760, 8260, 9080, 10026,...","[2360, 2924, 4842, 6760, 8260, 9080, 10026, 10..."
4155,train,DR7,M,KLR0,phonetically-compact,The government sought authorization of his cit...,G:\My Drive\DTU\Kurser\Deep_Learning_02456\fin...,0,47104,"[the, government, sought, authorization, of, h...","[9333, 9744, 15303, 19480, 31137, 32779, 35619]","[9744, 15303, 19480, 31137, 32779, 35619, 45500]","[h#, th, ax-h, gcl, g, ah, r, m, ix, tcl, s, a...","[0, 9333, 9577, 9744, 10790, 11100, 12437, 128...","[9333, 9577, 9744, 10790, 11100, 12437, 12828,..."
4794,test,DR2,F,RAM1,phonetically-diverse,Another memo for sightseers: bring your camera...,G:\My Drive\DTU\Kurser\Deep_Learning_02456\fin...,0,71373,"[another, memo, for, sightseers, bring, your, ...","[2440, 9024, 14955, 17503, 32980, 36396, 38251...","[9024, 14955, 17503, 31010, 36396, 38251, 4572...","[h#, q, ax, n, ah, dh, er, m, eh, m, ow, f, er...","[0, 2440, 2713, 3216, 4971, 6760, 7520, 9024, ...","[2440, 2713, 3216, 4971, 6760, 7520, 9024, 105..."
2379,train,DR4,M,LSH0,phonetically-compact,Last year's gas shortage caused steep price in...,G:\My Drive\DTU\Kurser\Deep_Learning_02456\fin...,0,54887,"[last, year's, gas, shortage, caused, steep, p...","[2590, 8610, 12200, 16250, 24680, 32920, 36260...","[8610, 12200, 16250, 24680, 32920, 36260, 4196...","[h#, l, ae, s, tcl, ch, y, ih, axr, z, gcl, g,...","[0, 2590, 3467, 5932, 6934, 7728, 8610, 9082, ...","[2590, 3467, 5932, 6934, 7728, 8610, 9082, 100..."
6020,test,DR7,F,SXA0,dialect,She had your dark suit in greasy wash water al...,G:\My Drive\DTU\Kurser\Deep_Learning_02456\fin...,0,57960,"[she, had, your, dark, suit, in, greasy, wash,...","[2200, 5200, 9107, 11240, 17440, 23143, 26223,...","[5200, 9566, 11240, 17440, 23143, 26223, 32920...","[h#, sh, iy, hv, ae, dcl, jh, er, dcl, d, aa, ...","[0, 2200, 3960, 5200, 6080, 8360, 9107, 9566, ...","[2200, 3960, 5200, 6080, 8360, 9107, 9566, 112..."
1048,train,DR2,M,RLJ0,phonetically-compact,Those who are not purists use canned vegetable...,G:\My Drive\DTU\Kurser\Deep_Learning_02456\fin...,0,54989,"[those, who, are, not, purists, use, canned, v...","[2170, 5529, 7018, 8280, 12600, 20160, 23394, ...","[5529, 7018, 8280, 12600, 20160, 23394, 29160,...","[h#, dh, ow, z, hv, uw, er, nx, aa, tcl, p, y,...","[0, 2170, 2440, 4400, 5529, 5863, 7018, 8280, ...","[2170, 2440, 4400, 5529, 5863, 7018, 8280, 884..."
5430,test,DR4,M,DRM0,dialect,She had your dark suit in greasy wash water al...,G:\My Drive\DTU\Kurser\Deep_Learning_02456\fin...,0,48538,"[she, had, your, dark, suit, in, greasy, wash,...","[2570, 6099, 9453, 11159, 16240, 21383, 22544,...","[6099, 10100, 11159, 16240, 21383, 22544, 2748...","[h#, sh, iy, hv, eh, dcl, jh, axr, dcl, d, aa,...","[0, 2570, 4803, 6099, 7352, 8992, 9453, 10100,...","[2570, 4803, 6099, 7352, 8992, 9453, 10100, 11..."
505,train,DR2,F,LMA0,phonetically-compact,They remained lifelong friends and companions.,G:\My Drive\DTU\Kurser\Deep_Learning_02456\fin...,0,36660,"[they, remained, lifelong, friends, and, compa...","[2040, 3200, 8691, 16158, 21947, 22680]","[3200, 8691, 16158, 21947, 22680, 34339]","[h#, dh, ih, r, axr, m, ey, n, dcl, d, l, ay, ...","[0, 2040, 2520, 3200, 3592, 4762, 5994, 7737, ...","[2040, 2520, 3200, 3592, 4762, 5994, 7737, 811..."
4817,test,DR2,M,ABW0,phonetically-compact,"If people were more generous, there would be n...",G:\My Drive\DTU\Kurser\Deep_Learning_02456\fin...,0,49972,"[if, people, were, more, generous, there, woul...","[2320, 3760, 9566, 11995, 15960, 25322, 26569,...","[3760, 9566, 11995, 15960, 24310, 26569, 29040...","[h#, ix, f, pcl, p, iy, pcl, p, el, w, axr, m,...","[0, 2320, 2800, 3760, 4880, 5500, 6600, 7700, ...","[2320, 2800, 3760, 4880, 5500, 6600, 7700, 796..."
1680,train,DR3,M,MAR0,dialect,She had your dark suit in greasy wash water al...,G:\My Drive\DTU\Kurser\Deep_Learning_02456\fin...,0,64308,"[she, had, your, dark, suit, in, greasy, wash,...","[2340, 5765, 10213, 12116, 17867, 23680, 27650...","[5765, 10213, 12116, 17867, 23680, 27650, 3429...","[h#, sh, iy, hv, ae, dcl, d, y, axr, dcl, d, a...","[0, 2340, 4423, 5765, 6360, 8731, 9680, 10213,...","[2340, 4423, 5765, 6360, 8731, 9680, 10213, 10..."
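
Each row stores the word annotations as three parallel lists. As a rough sketch of how these columns can be used, the example below rebuilds one row from the output above (only the columns it needs), expands it to one row per word, and converts sample counts to seconds. It assumes a sample rate of 16 kHz, which is not stated in this notebook, and a pandas version of at least 1.3, which is needed for `explode` on several columns at once. The names `SAMPLE_RATE_HZ`, `df_example` and `df_words` are illustrative.

```python
import pandas as pd

# Assumption: the audio is sampled at 16 kHz.
SAMPLE_RATE_HZ = 16_000

# One row copied from the sample output above (subset of columns).
df_example = pd.DataFrame([{
    'text': 'They often go out in the evening.',
    'words_text': ['they', 'often', 'go', 'out', 'in', 'the', 'evening'],
    'words_start_sample': [2360, 4842, 10026, 12616, 15749, 17721, 19938],
    'words_end_sample': [4842, 10026, 12616, 15749, 17721, 19258, 25152],
}])

# Expand the parallel lists into one row per word.
word_columns = ['words_text', 'words_start_sample', 'words_end_sample']
df_words = df_example.explode(word_columns, ignore_index=True)

# explode() leaves the columns with object dtype, so cast the
# sample indices back to integers before doing arithmetic.
df_words = df_words.astype({
    'words_start_sample': int,
    'words_end_sample': int,
})

# Convert the word length from samples to seconds.
df_words['duration_s'] = (
    (df_words['words_end_sample'] - df_words['words_start_sample'])
    / SAMPLE_RATE_HZ
)
print(df_words[['words_text', 'duration_s']])
```

The same pattern should apply to the `phonemes_*` columns.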


# Load wav file