# Verbalizing Hillary

The goals of this notebook are:

- Construct an algorithm to verbalize the Hillary dataset.
- Understand the distribution of non-verbalized symbols, such as numbers and special characters.
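For the second goal, a rough first cut is to tally every character that falls outside a set we assume can be read as-is. The `SPEAKABLE` class below (ASCII letters, whitespace and basic punctuation) is an assumption for illustration, not a definition taken from the dataset. In this notebook the input would be `row['text']` for each `row` in `data`. Typographic characters such as the curly apostrophe in "I’d" will show up in the tally too, which is arguably useful.

```python
import re
from collections import Counter

# Assumption: letters, whitespace and a few punctuation marks need no verbalization rule;
# every other character does.
SPEAKABLE = re.compile(r"[A-Za-z\s.,;:!?'\"-]")


def count_unverbalized(texts):
    """Count the characters in ``texts`` that do not match ``SPEAKABLE``."""
    counts = Counter()
    for text in texts:
        counts.update(char for char in text if not SPEAKABLE.fullmatch(char))
    return counts


# Usage with a few sample rows:
texts = ['was $212,512 in 2010.', 'and that 68%', '417 feet above the sea,']
print(count_unverbalized(texts).most_common(5))
```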

In [1]:
import re
import sys

# Add the repository root to the module search path (like setting PYTHONPATH) so `src` imports
sys.path.insert(0, '../')

from src.datasets.lj_speech import _iterate_and_replace
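Verbalization largely comes down to regex-driven search and replace. The `_iterate_and_replace` helper imported above presumably plays that role, but its signature isn't shown in this notebook, so the sketch below uses only the standard library's `re.sub` with a replacement callback. `verbalize_symbols` and its two rules (`$` amounts and `%` percentages) are hypothetical examples, not the project's rules, and they leave the digits themselves untouched.

```python
import re

# Hypothetical rule set. A number with optional thousands separators and decimals:
NUMBER = r'\d+(?:,\d{3})*(?:\.\d+)?'
# Group 1: an amount after '$'. Group 2: a number before '%'.
SYMBOLS = re.compile(r'\$(' + NUMBER + r')|(' + NUMBER + r')%')


def verbalize_symbols(text):
    """Rewrite '$212,512' as '212,512 dollars' and '68%' as '68 percent'."""
    def replace(match):
        if match.group(1) is not None:
            amount = match.group(1)
            return amount + (' dollar' if amount == '1' else ' dollars')
        return match.group(2) + ' percent'
    return SYMBOLS.sub(replace, text)


print(verbalize_symbols('was $212,512 in 2010.'))
print(verbalize_symbols('and that 68%'))
```

Scale words are not handled: `$5 million` would come out as `5 dollars million`, so a fuller rule set would need to look ahead for words like *million* before appending the unit.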

In [21]:
from src.datasets import hillary_dataset

data, _ = hillary_dataset(directory='../data')

No config for `hillary.hillary_dataset` (`src.datasets.hillary.hillary_dataset`)
100%|██████████| 10528/10528 [00:00<00:00, 309411.29it/s]


In [42]:
import random
import re

from IPython.display import Audio
from IPython.display import FileLink
from IPython.display import Markdown
from IPython.display import display

def find_examples(regex, display_n=5, load_audio=False, replace=True, group=1, context=25):
    """ Print ``display_n`` examples of ``regex`` in the Hillary dataset (``data``).
    
    This is the workhorse of our data analysis: it lets us query the dataset with a regex and
    retrieve matching samples.
    
    Args:
        regex (str or re.Pattern): Pattern or compiled regex object.
        display_n (int or None, optional): Number of examples to display; ``None`` displays all.
        load_audio (bool, optional): Whether to embed an audio player for each example.
        replace (bool, optional): Whether to overwrite the matched characters in ``data`` with
            ``X`` characters.
        group (int, optional): Regex group to select.
        context (int, optional): Number of characters to include on the left and right of the
            matched text as context.
        
    Returns:
        None
    """
    examples = []
    for row in data:
        text = row['text']
        for match in re.finditer(regex, text):
            start, end = match.start(group), match.end(group)
            # Skip zero-length matches (and groups that did not participate, where both are -1).
            if start == end:
                continue
            
            if replace:
                # The replacement has the same length as the match, so later offsets stay valid.
                row['text'] = row['text'][:start] + 'X' * (end - start) + row['text'][end:]
            
            # Bold the match and keep up to ``context`` characters on each side.
            snippet = text
            if start != 0 or end != len(text):
                start_context = max(start - context, 0)
                end_context = min(end + context, len(text))
                snippet = '{}**{}**{}'.format(text[start_context:start],
                                              match.group(group),
                                              text[end:end_context])
                # Add an ellipsis only on the sides where the context window cut text off.
                if start_context > 0:
                    snippet = '…' + snippet
                if end_context < len(text):
                    snippet = snippet + '…'
            
            examples.append({
                'text': snippet,
                'audio': row['wav_filename']
            })
            
    # Print examples
    display(Markdown('### Examples Captured by Regex'))
    display(Markdown('**Regex:** ' + str(getattr(regex, 'pattern', regex))))
    display(Markdown('**Number of Examples:** ' + str(len(examples))))
    
    random.shuffle(examples)
    if display_n is not None:
        examples = examples[:display_n]
    
    display(Markdown('**Number of Examples Shown:** ' + str(len(examples))))
    display(Markdown('\n\n ___'))
    
    for example in examples:
        display(Markdown('**Text:** "' + example['text'] + '"'))
        display(FileLink(str(example['audio'])))
        if load_audio:
            display(Audio(str(example['audio'])))
        display(Markdown('\n\n ___'))
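The bolding and context-window logic is easier to check outside the notebook's display machinery. Below is a minimal sketch of it as a pure function; `highlight` is a hypothetical name, not part of the project. It adds `…` only on a side where the context window actually cut text off, so a match near the start of a row gets no leading ellipsis.

```python
import re


def highlight(text, start, end, context=25):
    """Bold ``text[start:end]`` and keep up to ``context`` characters on each side.

    An ellipsis marks each side where the context window cut ``text`` off.
    """
    if start == 0 and end == len(text):
        return text  # The match spans the whole text, so there is nothing to highlight.
    start_context = max(start - context, 0)
    end_context = min(end + context, len(text))
    snippet = '{}**{}**{}'.format(text[start_context:start], text[start:end],
                                  text[end:end_context])
    if start_context > 0:
        snippet = '…' + snippet
    if end_context < len(text):
        snippet = snippet + '…'
    return snippet


sample = ('The survey found out that businesses have assimilated a much more strategic '
          'view, and that 68%')
match = re.search(r'\d+%', sample)
print(highlight(sample, match.start(), match.end(), context=10))
```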

## Sample of the Dataset
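The sample in this section queries `(?s).*` with `group=0`. The inline `(?s)` flag (DOTALL) lets `.` match newlines, so a multi-line row is returned as a single match instead of one match per line. The pattern also yields a zero-length match at the end of each string, which is why `find_examples` skips empty matches. A quick check with the standard `re` module, using a multi-line row in the style of the dataset:

```python
import re

text = 'And every story matters.\n\nThe first men painted stories on stone walls.'

# Without DOTALL, '.' stops at newlines, so each line becomes its own match,
# interleaved with zero-length matches.
per_line = re.findall(r'.*', text)

# With the inline (?s) flag, the whole row is one match, followed by a single
# zero-length match at the end of the string.
whole_row = re.findall(r'(?s).*', text)

print(per_line)
print(whole_row)
```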

In [43]:
find_examples(r'(?s).*', display_n=100, replace=False, group=0, load_audio=True)

### Examples Captured by Regex

**Regex:** (?s).*

**Number of Examples:** 8422

**Number of Examples Shown:** 100



 ___

**Text:** "Physical examinations to determine whether a person has a physical condition that would create a problem."



 ___

**Text:** "management by members or managers, and limitations on ownership transfer", i.e., L.L.C."



 ___

**Text:** "therefore being responsible for any harm to come as a result."



 ___

**Text:** "Customer: 	I’d like to open a checking account.

Banker:	 Great! I can help you with that. Please sit down, my   name is John."



 ___

**Text:** "and critics were accused of supporting communists."



 ___

**Text:** "whereas members of a jury come to a decision."



 ___

**Text:** "More importantly, we’ll teach you a new way to look at your spending, saving, investing"



 ___

**Text:** "shows that the company aims to treat all its staff equally."



 ___

**Text:** "But they make the mistake of ignoring their own duality."



 ___

**Text:** "Thus, the first step in evaluating organizational effectiveness"



 ___

**Text:** "Wash your hands prior to playing. Clean hands transmit less dirt and help maintain string life and tone."



 ___

**Text:** "brought about by insufficient training to enable them to carry out particular tasks,"



 ___

**Text:** "and trait EI."



 ___

**Text:** "Historically this use of the term often contrasted"



 ___

**Text:** "Preckel et al., investigating fluid intelligence and creativity, reported small correlations of r = .3"



 ___

**Text:** "The Falcon Heavy consists of 27 engines with 5 million pounds of thrust -"



 ___

**Text:** "More than unspoiled"



 ___

**Text:** "Nonverbal communication describes the processes of conveying a type of information in the form of non-linguistic representations."



 ___

**Text:** "because how it will actualize depends on the different internally or externally generated contexts"



 ___

**Text:** "And every story matters.

The first men painted stories on stone walls, the ancient Egyptians chose the chisel instead."



 ___

**Text:** "THIS MAJESTIC WILDERNESS OF TOWERING TREES IS A SANCTUARY. FOR WILDLIFE… FOR CHAMPION TREES…"



 ___

**Text:** "Congress has, at times, considered comprehensive laws regulating the collection of information online,"



 ___

**Text:** "This radiation is very powerful and dangerous to living things."



 ___

**Text:** "They are also being considered in Kenya and Rwanda."



 ___

**Text:** "dollar line of unsecured"



 ___

**Text:** "inferred from it. These character traits can be powerful forces which are totally unconscious to the person."



 ___

**Text:** "They put American science on the world stage and nearly destroyed one another in the process."



 ___

**Text:** "1864, Sherman offered the port city to President Lincoln as a Christmas gift. Union victory was near."



 ___

**Text:** "Mpofu et al. surveyed"



 ___

**Text:** "Sheehy examined a range of different disciplinary approaches to defining CSR."



 ___

**Text:** "was $212,512 in 2010."



 ___

**Text:** "Using proprietary advice technology, proven financial methodology,"



 ___

**Text:** "wrote in"



 ___

**Text:** "Note that many of the assumptions made by management"



 ___

**Text:** "Emotions motivate individual behavior that aids in solving communal challenges"



 ___

**Text:** "There are laws in a number of states. The former NSW Police Commissioner Tony Lauer"



 ___

**Text:** "God bless 'em, I hope I'll go on seeing them forever."



 ___

**Text:** "The survey found out that businesses have assimilated a much more strategic view, and that 68%"



 ___

**Text:** "is that it can overly constrain managerial discretion in a dynamic environment. ""



 ___

**Text:** "overbilling for days not worked, speed at the cost of quality,"



 ___

**Text:** "risk management systems and appropriate skills and expertise to manage the superannuation fund."



 ___

**Text:** "The gray eyes faltered; the flush deepened."



 ___

**Text:** "through the Canadian Securities Administrators.

Keeping the Promise for a Strong Economy Act, 2002 is"



 ___

**Text:** "EI and methods of developing it have become more widely coveted in the past decade."



 ___

**Text:** "1977, Abraham Zaleznik distinguished leaders from managers."



 ___

**Text:** "increase the knowledge, sharpen and add to the skills,"



 ___

**Text:** "My, I'm almost homesick for it already."



 ___

**Text:** "Robbery, bribery, fraud,"



 ___

**Text:** "It is essential for innovation, and is a factor affecting economic growth and businesses."



 ___

**Text:** "He cried in such genuine dismay that she broke into hearty laughter."



 ___

**Text:** "The deer approaches the opening, unaware of the cougar's presence. Slowly and quietly, Shuka creeps toward his prey."



 ___

**Text:** "In 2013,"



 ___

**Text:** "endows them with the authority attached to their position."



 ___

**Text:** "Besides, that noise makes me deaf."



 ___

**Text:** "Government employees could be at a similar risk for bringing threats to health or the environment to public attention,"



 ___

**Text:** "colours and textures can be blended in any combination or ratio - its up to you to decide what you like."



 ___

**Text:** "Nobody knows how the natives got them."



 ___

**Text:** "and Tolerance""



 ___

**Text:** "Experts believe it may have started in the Philippines by a"



 ___

**Text:** "This environment consists of three interrelated dimensions, which continuously interact with individuals, organizations, and systems."



 ___

**Text:** "and sets out standards of good practice in relation to board leadership and effectiveness,"



 ___

**Text:** "simple problems were used for reasons of convenience and with the expectation that thought generalizations"



 ___

**Text:** "To be considered a"



 ___

**Text:** "A limited liability company. "A company—statutorily authorized in certain states—that is"



 ___

**Text:** "to maintain competitiveness."



 ___

**Text:** "417 feet above the sea,"



 ___

**Text:** "into certain trades, occupations or professions, that require special education or to raise revenue for local governments."



 ___

**Text:** "Reflected in the liquid's shimmering surface, the flickers of a candle's flame appear doubly radiant and twice as beautiful."



 ___

**Text:** "around. Still, a country that gives a wig-wearing ex-junkie balladeer a knighthood"



 ___

**Text:** "'creativity' may affect the views of creativity among speakers of such languages. However, more research would be needed to establish this,"



 ___

**Text:** "Many companies employ benchmarking to assess their CSR policy, implementation and effectiveness."



 ___

**Text:** "Service providers must establish an infrastructure to support the availability requirements, or they may be subject to fines and penalties."



 ___

**Text:** "but offers more protection and benefits for the owner."



 ___

**Text:** "The Torrance Tests of Creative Thinking assesses"



 ___

**Text:** "Sociology distinguishes the term organisation into planned formal and unplanned informal organisations."



 ___

**Text:** "of multinational enterprises."



 ___

**Text:** "The application you'll create (called Myapp) is a subset of the VIEWEX sample application provided with the"



 ___

**Text:** "Functional fixedness is a specific form of mental set and fixation, which was alluded to earlier in the"



 ___

**Text:** "Examples of the benefit of understanding local culture include the following:"



 ___

**Text:** "The coming ISO 37001 – anti-bribery management systems standard,"



 ___

**Text:** "The festival of the dead was gradually incorporated into Christian ritual."



 ___

**Text:** "to such famous landmarks as St. Stephen’s Green, University College, and the Martello Tower in the nearby suburb of Sandycove."



 ___

**Text:** "In"



 ___

**Text:** "They ate dinner at the fifth, and rested for two hours."



 ___

**Text:** "In general, avoid putting orchids in drafty shelves."



 ___

**Text:** "and
Involves both conceptual and analytical thought processes."



 ___

**Text:** "whistleblowers has become a serious issue in many parts of the world:"



 ___

**Text:** "This work represents an initial step in the development of process-based theories of creativity"



 ___

**Text:** "I have hunted along this ridge, replied Philip."



 ___

**Text:** "An employee will, however, not breach his duty of good faith if he reports an irregularity to an authority and

a period set by the employer"



 ___

**Text:** "organisation analysis. A number of different perspectives exist, some of which are compatible:"



 ___

**Text:** "I'll only be in the way."



 ___

**Text:** "and he is said to have been the first to call himself a psychologist;"



 ___

**Text:** "The subject is complex, and several different definitions exist, which generally include the rational, skeptical,"



 ___

**Text:** "'ways' to achieve 'ends'.""



 ___

**Text:** "Generally, a smaller business is more flexible, while larger businesses, or those with wider ownership or more formal structures,"



 ___

**Text:** "Both of these notable works lent great initial support for the notion that leadership is rooted in characteristics of a leader."



 ___

**Text:** "some have skills but not the disposition to use them, some are disposed but lack strong skills, and some have neither."



 ___

**Text:** "People skills are patterns of behavior and behavioral interactions."



 ___

**Text:** "The sixth day he spent in the cabin with Gregson."



 ___
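The sample shows plenty of material that still needs verbalizing: amounts (`$212,512`), percentages (`68%`), counts (`27 engines`, `417 feet`), years (`1864`, `2010`, `2013`) and decimals (`r = .3`). As a starting point for the cardinal numbers, here is a minimal sketch that spells out non-negative integers below one trillion in US English. It is not the project's verbalizer; `number_to_words` and `verbalize_integers` are hypothetical names, and the sketch deliberately ignores years (usually read in pairs, e.g. "eighteen sixty-four"), ordinals and decimals. Numbers attached to a decimal point are left untouched.

```python
import re

ONES = ['zero', 'one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine',
        'ten', 'eleven', 'twelve', 'thirteen', 'fourteen', 'fifteen', 'sixteen',
        'seventeen', 'eighteen', 'nineteen']
TENS = ['', '', 'twenty', 'thirty', 'forty', 'fifty', 'sixty', 'seventy', 'eighty', 'ninety']
SCALES = [(10**9, 'billion'), (10**6, 'million'), (10**3, 'thousand')]

# An integer, optionally with thousands separators, that is not part of a decimal
# such as '.3' or '1.5'.
INTEGER = re.compile(r'(?<![\d.,])(\d{1,3}(?:,\d{3})+|\d+)(?!\d|[.,]\d)')


def _below_thousand(n):
    """Spell out 0 < n < 1000."""
    words = []
    if n >= 100:
        words += [ONES[n // 100], 'hundred']
        n %= 100
    if n >= 20:
        words.append(TENS[n // 10] + ('-' + ONES[n % 10] if n % 10 else ''))
    elif n > 0:
        words.append(ONES[n])
    return ' '.join(words)


def number_to_words(n):
    """Spell out a non-negative integer below one trillion, e.g. 417 -> 'four hundred seventeen'."""
    if not 0 <= n < 10**12:
        raise ValueError('only 0 <= n < 10**12 is supported')
    if n == 0:
        return 'zero'
    words = []
    for value, name in SCALES:
        if n >= value:
            words += [_below_thousand(n // value), name]
            n %= value
    if n:
        words.append(_below_thousand(n))
    return ' '.join(words)


def verbalize_integers(text):
    """Replace integers such as '27' or '212,512' in ``text`` with words."""
    def spell(match):
        value = int(match.group(1).replace(',', ''))
        # Leave numbers outside the supported range as digits.
        return number_to_words(value) if value < 10**12 else match.group(1)
    return INTEGER.sub(spell, text)


print(verbalize_integers('The Falcon Heavy consists of 27 engines with 5 million pounds of thrust -'))
```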