# Verbalizing Hillary

The goals of this notebook are:

- Construct an algorithm to verbalize the Hillary dataset.
- Understand the distribution of unverbalized symbols, such as numbers and special characters.
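
As a rough illustration of the second goal, the sketch below counts characters that are not letters, whitespace, or basic punctuation. It is a minimal sketch, not the notebook's method: `sample_texts` is a hypothetical stand-in holding three transcripts copied from the dataset sample shown in this notebook, and the `NOT_VERBALIZED` pattern and `count_symbols` helper are assumptions about what counts as "not verbalized".

```python
import re
from collections import Counter

# Hypothetical stand-in for the dataset: transcripts copied from the sample in this notebook.
sample_texts = [
    'Savar building collapse, which killed over 1000',
    'who can challenge her/his role in the organization and reduce it to that of a figurehead.',
    'through July 27th 1953.',
]

# Assumption: letters, whitespace and basic punctuation count as already verbalized.
NOT_VERBALIZED = re.compile(r"[^A-Za-z\s.,;:!?'\"-]")


def count_symbols(texts):
    """Count every character that ``NOT_VERBALIZED`` flags, across all ``texts``."""
    counts = Counter()
    for text in texts:
        counts.update(NOT_VERBALIZED.findall(text))
    return counts


symbol_counts = count_symbols(sample_texts)
for symbol, count in symbol_counts.most_common():
    print(repr(symbol), count)
```

On the real dataset, the same counter could be fed `row['text']` for every row in `data`; the most common symbols then suggest which verbalization rules to write first.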

In [5]:
import re
import sys

# Add the repository root to the module search path (same effect as extending "PYTHONPATH")
sys.path.insert(0, '../')

from src.datasets.lj_speech import _iterate_and_replace

In [6]:
from src.datasets import hillary_dataset

data, _ = hillary_dataset(directory='../data')

No config for `hillary.hillary_dataset` (`src.datasets.hillary.hillary_dataset`)
100%|██████████| 10528/10528 [00:00<00:00, 342004.99it/s]


In [7]:
import random
import os

# ``display`` is imported explicitly so the function below also works outside a notebook global scope.
from IPython.display import Audio, FileLink, Markdown, display

def find_examples(regex, display_n=5, load_audio=False, replace=True, group=1, context=25):
    """ Print ``display_n`` examples of ``regex`` in the Hillary dataset (``data``).
    
    This is the bread-and-butter function for our data analysis; it enables us to query the dataset
    with a regex and retrieve samples.
    
    Args:
        regex (str): Pattern or compiled regex object.
        display_n (int or None, optional): Number of examples to display; ``None`` displays all.
        load_audio (bool, optional): If ``True``, load and display the audio for each example.
        replace (bool, optional): If ``True``, replace the matched characters with ``X``s; note that
            this modifies ``data`` in place.
        group (int, optional): Group to select in regex.
        context (int, optional): Number of characters to include on the left and right of the matched
            text as context.
        
    Returns:
        None
    """
    examples = []
    for row in data:
        matches = re.finditer(regex, row['text'])
        for match in matches:
            if match.start(group) == match.end(group):  # Skip empty matches
                continue
            
            text = row['text']    
            start = match.start(group)
            end = match.end(group)
            start_context = max(start - context, 0)
            end_context = min(end + context, len(text))
            
            if replace:
                row['text'] = '{}{}{}'.format(text[:start],
                                              'X' * (end - start), 
                                              text[end:])
                
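            # Only add an ellipsis where the context window actually cuts text off.
            truncated_left = start_context != 0
            truncated_right = end_context != len(text)
            if start != 0 or end != len(text):
                text = '{}**{}**{}'.format(text[start_context:start],
                                           match.group(group),
                                           text[end:end_context])
            if truncated_left:
                text = '…' + text
            if truncated_right:
                text = text + '…'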
                
            examples.append({
                'text': text,
                'audio': row['wav_filename']
            })
            
    # Print Examples
    display(Markdown('### Examples Captured by Regex'))
    display(Markdown('**Regex:** ' + str(regex)))
    display(Markdown('**Number of Examples:** ' + str(len(examples))))
    
    random.shuffle(examples)
    if display_n is not None:
        examples = examples[:display_n]
    
    display(Markdown('**Number of Examples Shown:** ' + str(len(examples))))
    display(Markdown('\n\n ___'))
    
    for example in examples:
        display(Markdown('**Text:** "' + example['text'] + '"'))
        display(FileLink(example['audio']))
        if load_audio:
            display(Audio(str(example['audio'])))
        display(Markdown('\n\n ___'))
        display()
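
The context-window logic inside `find_examples` can also be tried on a single string, without the dataset or IPython. The following is a minimal sketch under that assumption; `highlight_matches` is a hypothetical helper name, not part of `src`, and it adds an ellipsis only on a side where the context window cuts text off.

```python
import re


def highlight_matches(text, regex, group=0, context=25):
    """Return one Markdown snippet per non-empty match of ``regex`` in ``text``."""
    snippets = []
    for match in re.finditer(regex, text):
        start, end = match.span(group)
        if start == end:  # Skip empty matches.
            continue
        start_context = max(start - context, 0)
        end_context = min(end + context, len(text))
        # Bold the matched group and keep ``context`` characters on each side.
        snippet = '{}**{}**{}'.format(text[start_context:start],
                                      match.group(group),
                                      text[end:end_context])
        # Mark truncation only where text was actually cut off.
        if start_context > 0:
            snippet = '…' + snippet
        if end_context < len(text):
            snippet = snippet + '…'
        snippets.append(snippet)
    return snippets


snippets = highlight_matches(
    'More than 16 million Americans served in the armed forces during the war.',
    r'\d+',
    context=10)
print(snippets)  # ['More than **16** million A…']
```

Keeping this logic in a pure function makes it easy to check the edge cases (empty matches, matches at the start or end of the text) separately from the display code.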

## Sample of the Dataset

In [8]:
find_examples(r'(?s).*', display_n=100, replace=False, group=0, load_audio=False)

### Examples Captured by Regex

**Regex:** (?s).*

**Number of Examples:** 8422

**Number of Examples Shown:** 100



 ___

**Text:** "Businessballs education website."



 ___

**Text:** "Why, he's bought forty pounds of goods from you already."



 ___

**Text:** "During the early nineteenth century, the beaver pelt was the single most valuable commodity;"



 ___

**Text:** "faces of Caucasian and Japanese individuals"



 ___

**Text:** "functionalist perspective,"



 ___

**Text:** "are often hard to test objectively, e."



 ___

**Text:** "give "clear, conspicuous, and accurate statements" of their information-sharing practices."



 ___

**Text:** "If your new puppy is younger than 10 weeks when you bring him home, we suggest placing an exercise pen around the crate."



 ___

**Text:** "Savar building collapse, which killed over 1000"



 ___

**Text:** "Instead, use shims to"



 ___

**Text:** "As a result, many organizations doing business within the EU began to draft policies to comply with this Directive."



 ___

**Text:** "There is a small suction cup underneath the counting device, making sure that the count is accurate"



 ___

**Text:** "Kenneth R. Andrews helped popularize the framework"



 ___

**Text:** "extended corporate social responsibility from the traditional economic and legal responsibility"



 ___

**Text:** "CEOs' political ideologies are evident manifestations"



 ___

**Text:** "The major branches of management are financial management, marketing management,"



 ___

**Text:** "In Information Technology, the disambiguation"



 ___

**Text:** "who can challenge her/his role in the organization and reduce it to that of a figurehead."



 ___

**Text:** "Critical thinking is significant in academics due to being significant in learning."



 ___

**Text:** "that its invention within living memory, and by Alfred North Whitehead"



 ___

**Text:** "requires notice in writing of the privacy practices of health care services,"



 ___

**Text:** "A business name structure does not separate the business entity from the owner,"



 ___

**Text:** "they define and discuss information and policies from top management to lower management,"



 ___

**Text:** "These changes required a greater mental capacity and, in turn, a larger brain size."



 ___

**Text:** "There were three phases according to Hymer's work. The first phase of Hymer's work was his dissertation in 1960"



 ___

**Text:** "reciprocal"



 ___

**Text:** "and a standing army are not the ideal ingredients for a 'get away from it all' holiday."



 ___

**Text:** "which take over once strategic management decisions are implemented."



 ___

**Text:** "Many state institutions and enterprises in China and Russia"



 ___

**Text:** "More than 16 million Americans served in the armed forces during the war."



 ___

**Text:** "MMP will attempt to open and seal both the sealed and unsealed envelopes."



 ___

**Text:** "after their arrival at Cuatitlan ."



 ___

**Text:** "But the brainwashing was soon to follow, for that is how true selling works."



 ___

**Text:** "Policy actors attempt to determine whether the course of action is a success or failure by examining its impact and outcomes."



 ___

**Text:** "Fungi communicate with their own and related species"



 ___

**Text:** "Whistleblowers"



 ___

**Text:** "may file a formal report, rather than confronting the wrongdoer, because confrontation would be more emotionally and psychologically stressful."



 ___

**Text:** "Canadian media scholar Harold Innis"



 ___

**Text:** "when invading British troops set fire to the Capitol Building, burning and pillaging the contents of the small library."



 ___

**Text:** "Your face is red with blood."



 ___

**Text:** "Most stores and catalog companies are distributors or retailers."



 ___

**Text:** "Clicking on an underlined word or phrase will darken it."



 ___

**Text:** "With a beginning that was unlike any other beginning of a ‘war’ up until then,"



 ___

**Text:** "steps that make it happen. When you're ready to begin, click "episode 8" on your screen. Ready?"



 ___

**Text:** "Strategy should be seen as laying out the general path rather than precise steps."



 ___

**Text:** "Our goal is to teach and explain the fundamental rules of money, wealth and debt management."



 ___

**Text:** "Whistleblowing in the United Kingdom is subject to the Public Interest Disclosure Act 1998."



 ___

**Text:** "His voice was passionately rebellious."



 ___

**Text:** "shows that the company aims to treat all its staff equally."



 ___

**Text:** "is why and under what circumstances do people either act on the spot to stop illegal and otherwise unacceptable behavior"



 ___

**Text:** "preparing them for top positions."



 ___

**Text:** "MFC samples.  Myapp lets you open new child windows,"



 ___

**Text:** "and your environment. Recording should always be done in the quietest possible space,"



 ___

**Text:** "suffered what some thought might be terminal health complications,"



 ___

**Text:** "The need for continuous adaption"



 ___

**Text:** "Rough edges on your nails will impair the tone of your playing,"



 ___

**Text:** "Then "the risk that a government will indiscriminately change the laws, regulations, or contracts governing an investment—"



 ___

**Text:** "it has "flow-through taxation to the members" and must be "dissolved upon the death or bankruptcy of a member"."



 ___

**Text:** "From 1997"



 ___

**Text:** "and see if their means are normally distributed. If not, the sample size should be increased."



 ___

**Text:** "The reason it is called a cycle is that once one is completed with a problem another usually will pop up."



 ___

**Text:** "The boy grew and prospered."



 ___

**Text:** "Power is a stronger form of influence because it reflects a person's ability to enforce action"



 ___

**Text:** "we begin working on the shorter axis."



 ___

**Text:** "Liane Gabora, posits that creativity arises due to the self-organizing,"



 ___

**Text:** "Critical thinking calls for the ability to:

Recognize problems, to find workable means for meeting those problems"



 ___

**Text:** "In the English and Welsh school systems, Critical Thinking is offered as a subject that 16- to"



 ___

**Text:** "concluded that bachelor's degree and master's degree holders felt that the training received through education"



 ___

**Text:** "Now Irvine was a man of impulse, a poet."



 ___

**Text:** "for Germans it provides secure employment;"



 ___

**Text:** "The local leader of the Romanian management consulting market is Ensight Management Consulting."



 ___

**Text:** "represents the key to solving the problem."



 ___

**Text:** "or sometimes being eliminated outright)."



 ___

**Text:** "or collect some news from the television"



 ___

**Text:** "within the boundaries set by the organization's strategy."



 ___

**Text:** "Suddenly his fingers closed tightly over the handkerchief."



 ___

**Text:** "Pioneering scientists are trekking across new frontiers of neuroscience"



 ___

**Text:** "organizations are increasingly adopting the use of consolidated and"



 ___

**Text:** "In 1987"



 ___

**Text:** "NYC STUFF Exchange"



 ___

**Text:** "they strive to be helpful towards others, warm in relation to others, understanding, and mindful of others' feelings."



 ___

**Text:** "'one's own',"



 ___

**Text:** "to attain"



 ___

**Text:** "of"



 ___

**Text:** "About two weeks later, return to those banks and ask to borrow the same amount of money you already have on deposit."



 ___

**Text:** "Some specialized businesses may also require licenses, either due to laws governing entry"



 ___

**Text:** "by which incoming stimuli can be ordered and dispatched.""



 ___

**Text:** "through July 27th 1953."



 ___

**Text:** "with Journal of Business Ethics and Business Ethics Quarterly considered the leaders."



 ___

**Text:** "so be sure to use a good set of clippers and an emery board to maintain smooth nail tips."



 ___

**Text:** "in his book On Killing, suggests that military training artificially creates"



 ___

**Text:** "21 February 2014. The Whistle Blowers Protection Act,"



 ___

**Text:** "and dissolving one's limiting beliefs and habits,"



 ___

**Text:** "namely the Explicit–Implicit Interaction theory of creativity."



 ___

**Text:** "and its position at any particular instant in time."



 ___

**Text:** "They must have been swept away by the chaotic currents."



 ___

**Text:** "published in the following year."



 ___

**Text:** "whether it is a business, a not-for-profit organization, or government body."



 ___

**Text:** "Floating candles add instant atmosphere to a casual summer table,"



 ___

**Text:** "while the other 50 percent lies within academic skills."



 ___