# Verbalizing LJ Speech

The goal of this notebook is to experiment with different methods for verbalizing LJ Speech symbols.

In [1]:
import re
import sys

sys.path.insert(0, '../')

from src.datasets.lj_speech import _iterate_and_replace
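
The helper `_iterate_and_replace` is imported from the project source and is not shown in this notebook. Judging only from how it is called below (a regex, a text, and a callable that receives the matched string), it might behave roughly like the following sketch. The name `iterate_and_replace_sketch`, the `group` parameter, and the return value are assumptions made for illustration; the real helper may differ.

```python
import re


def iterate_and_replace_sketch(regex, text, replace, group=1):
    """Hypothetical stand-in: rewrite ``group`` of every ``regex`` match in ``text``.

    ``replace`` is called with the matched substring and returns its verbalization.
    """
    pieces = []
    last = 0
    for match in re.finditer(regex, text):
        # Keep the text between the previous match and this one untouched.
        pieces.append(text[last:match.start(group)])
        pieces.append(replace(match.group(group)))
        last = match.end(group)
    pieces.append(text[last:])
    return ''.join(pieces)


# prints: Chapter <4>. Part <2>.
print(iterate_and_replace_sketch(r'([0-9]+)', 'Chapter 4. Part 2.', lambda s: '<' + s + '>'))
```

Only the selected group is rewritten, so a non-capturing prefix such as `(?:No|no)\. ` stays in the text.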

In [2]:
import random
import os

from IPython.display import Audio
from IPython.display import Markdown
from IPython.display import FileLink

from src.datasets import lj_speech_dataset

data = lj_speech_dataset(directory='../data', verbalize=False)

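def get_unique(examples, get_key):
    """ Get a unique list of ``examples`` based on ``get_key``.
    
    Args:
        examples (list): Examples to dedup.
        get_key (callable): Maps an example to the key used to dedup examples.

    Returns:
        (list): Examples with duplicate keys removed; the first occurrence is kept.
    """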
    seen = set() 
    filtered = []
    for example in examples:
        key = get_key(example)
        if key not in seen:
            seen.add(key)
            filtered.append(example)
    return filtered

def find_examples(regex, display_n=5, match_to_key=None, load_audio=False, replace=True, group=1):
    """ Print ``display_n`` examples of ``regex`` in ``lj_speech_dataset``.
    
    Args:
        regex (str): Pattern or compiled regex object.
        display_n (int or None, optional): Number of examples to display; ``None`` displays all.
        match_to_key (callable or None, optional): Maps a matched string to a key used to filter
            duplicate examples.
        load_audio (bool, optional): Whether to embed an audio player instead of a file link.
        replace (bool, optional): Whether to overwrite the matched characters in ``data`` with
            ``X`` so that later regexes skip them.
        group (int, optional): Group to select in regex.
        
    Returns:
        None
    """
    examples = []
    for row in data:
        matches = re.finditer(regex, row['text'])
        for match in matches:
            start = max(match.start(group) - 25, 0)
            end = min(match.end(group) + 25, len(row['text']))
            if replace:
                row['text'] = (row['text'][:match.start(group)] + 
                               'X' * (match.end(group) - match.start(group)) + 
                               row['text'][match.end(group):])
            if match_to_key is not None:
                key = match_to_key(match.group(group))
            else:
                key = None
            text = (row['text'][start:match.start(group)] + '**' + match.group(group) +
                    '**' + row['text'][match.end(group):end])
            examples.append({
                'text': '…' + text + '…',
                'audio': os.path.join('../data/LJSpeech-1.1/', row['wav']),
                'key': key
            })
            
    # Print Examples
    display(Markdown('### Examples Captured by Regex'))
    display(Markdown('**Regex:** ' + str(regex)))
    display(Markdown('**Number of Examples:** ' + str(len(examples))))
    
    if match_to_key is not None:
        examples = get_unique(examples, lambda example: example['key'])
    random.shuffle(examples)
    if display_n is not None:
        examples = examples[:display_n]
    
    display(Markdown('**Number of Examples Shown:** ' + str(len(examples))))
    display(Markdown('\n\n ___'))
    
    for example in examples:
        display(Markdown('**Text:** "' + example['text'] + '"'))
        if load_audio:
            display(Audio(example['audio']))
        else:
            display(FileLink(example['audio']))
        display(Markdown('\n\n ___'))
        display()

## Sample of Words with a Number

In [3]:
find_examples(r'\S*(\d+)\S*', display_n=100, replace=False, group=0, load_audio=False)

### Examples Captured by Regex

**Regex:** \S*(\d+)\S*

**Number of Examples:** 2118

**Number of Examples Shown:** 100



 ___

**Text:** "…rthur Griffiths. Section **21:** Newgate Notorieties, par…"



 ___

**Text:** "…where he had reported at **12:54** p.m.…"



 ___

**Text:** "…**145** are proposed for the fie…"



 ___

**Text:** "…The Service had **28** agents participating in …"



 ___

**Text:** "…In **1844**…"



 ___

**Text:** "…which was **2** points above the minimum…"



 ___

**Text:** "…Lee Oswald was No. **3;**…"



 ___

**Text:** "…Oswald was carrying only **$13.87** at the time of his arres…"



 ___

**Text:** "…deficiency of upwards of **£70,000.**…"



 ___

**Text:** "…erviewed her on November **1.**…"



 ___

**Text:** "…se forgeries amounted to **£8000** or £10,000,…"



 ___

**Text:** "…the reform introduced in **1837,** were once more in the as…"



 ___

**Text:** "…In **1850** Sir George Grey brought …"



 ___

**Text:** "…e costs were 6 shillings **10** pence.…"



 ___

**Text:** "… part prior to August 9, **1963,**…"



 ___

**Text:** "…s thought this was about **11:55** a.m.…"



 ___

**Text:** "…fortune of her own, some **£1700** or £1800,…"



 ___

**Text:** "…approximately **3** years before. Robert Osw…"



 ___

**Text:** "… of the Oswalds' effects **3** days later.…"



 ___

**Text:** "…eekend of November 16 to **17,** 1963.…"



 ___

**Text:** "… The Assassination: Part **3.**…"



 ___

**Text:** "… New Orleans in April of **1963.**…"



 ___

**Text:** "…On September 20, **1963,** Mrs. Paine and her two c…"



 ___

**Text:** "…ve the weight as between **165** and 175 pounds and the h…"



 ___

**Text:** "…d seen the day before at **10th** and Patton.…"



 ___

**Text:** "…Prior to November 22, **1963**…"



 ___

**Text:** "…viewed the evidence that **(1)** Lee Harvey Oswald purcha…"



 ___

**Text:** "…At **15** yards each man's shots l…"



 ___

**Text:** "…ress was Post Office Box **2915,** Dallas, Texas.…"



 ___

**Text:** "…sk from the street about **2** minutes after the shooti…"



 ___

**Text:** "…1469, **1470;**…"



 ___

**Text:** "…On October **23rd,** I had attended a ultra-r…"



 ___

**Text:** "…ing to this committee of **1863,** beds in the smaller and …"



 ___

**Text:** "…The following **23rd** April, it is stated that…"



 ___

**Text:** "…the House of Commons. In **1849** Mr. Charles Pearson, M.P…"



 ___

**Text:** "…a period ending early in **1962.**…"



 ___

**Text:** "…ry five Presidents since **1865** has been assassinated;…"



 ___

**Text:** "…On Sunday, November **17,** 1963,…"



 ___

**Text:** "…Love Field shortly after **11:50** a.m. and drove at speeds…"



 ___

**Text:** "…time he was in Dallas in **1963**…"



 ___

**Text:** "…ber 19, 1962 to March of **1963.**…"



 ___

**Text:** "…At **12:54** p.m., Tippit reported th…"



 ___

**Text:** "…periments had shown that **24** hours was a likely maxim…"



 ___

**Text:** "…as tested in December of **1956,** and obtained a score of …"



 ___

**Text:** "…was on November **20** to 21, 1963.…"



 ___

**Text:** "… 1963 the total exceeded **32,000** items.…"



 ___

**Text:** "…ayed no longer than 3 or **4** minutes.…"



 ___

**Text:** "…was paid on either April **2** or April 3.…"



 ___

**Text:** "…e sovereign by Oxford in **1840**…"



 ___

**Text:** "…is case between November **5** and November 22? Answer:…"



 ___

**Text:** "…As described in chapter **2,** the President directed t…"



 ___

**Text:** "…ase of poor debtors whom **£4** each could free.…"



 ___

**Text:** "…is represents a speed of **11.2** miles per hour.…"



 ___

**Text:** "… office which is located **4** blocks from the drugstor…"



 ___

**Text:** "…**3.** That I want to, and I sh…"



 ___

**Text:** "… them to Newgate between **1797** and 1808,…"



 ___

**Text:** "…tober 15th for a debt of **1** shilling 5 pence.…"



 ___

**Text:** "…ad fallen into disuse by **1832.**…"



 ___

**Text:** "…ted at Kennington on the **24th** March.…"



 ___

**Text:** "…testified that **20** identifiable fingerprint…"



 ___

**Text:** "…Jones, a burglar, was in **1793** ordered for execution in…"



 ___

**Text:** "…he first, I find that in **1786**…"



 ___

**Text:** "…These were: **1.** The male debtors' side.…"



 ___

**Text:** "… at 500 North Beckley at **12:45** p.m.…"



 ___

**Text:** "…**5.6,** and 6.5 seconds.…"



 ___

**Text:** "…than 5 minutes on August **17,** 1963.…"



 ___

**Text:** "…that it was **12:30** p.m., the time they were…"



 ___

**Text:** "…to May **14,** 1963.…"



 ___

**Text:** "…that date there had been **700** or 800 frequently, and o…"



 ___

**Text:** "…obtain possession of the **£2000,** but had failed, and had …"



 ___

**Text:** "…ons were still living in **1855** who had witnessed dissec…"



 ___

**Text:** "…he city rebelled, and in **484** B.C. he captured it, and…"



 ___

**Text:** "…PRS had received, over a **20-year** period, basic informatio…"



 ___

**Text:** "…ers had risen to 275 and **375** respectively, or 650 in …"



 ___

**Text:** "…th the provisions of the **1865** Act for four consecutive…"



 ___

**Text:** "…included in the group of **400**…"



 ___

**Text:** "…The **500** block of North Beckley i…"



 ___

**Text:** "…In June **1964** PRS had arrangements to …"



 ___

**Text:** "…n was unavailing, and in **275** B.C., the inhabitants of…"



 ___

**Text:** "…s-Herald on November 15, **1963.**…"



 ___

**Text:** "…rthur Griffiths. Section **5:** Newgate down to 1818, pa…"



 ___

**Text:** "…formation to Fritz about **15** or 20 minutes after the …"



 ___

**Text:** "…unloaded at **500** North Beckley at 12:45 p…"



 ___

**Text:** "…ewgate Notorieties, part **1.**…"



 ___

**Text:** "…d Commission Exhibit No. **162,** the jacket found by Capt…"



 ___

**Text:** "…ate of £8 per £100, with **£4** for every additional hun…"



 ___

**Text:** "…and arrived on September **27,** 1963.…"



 ___

**Text:** "…en I got pretty close to **500** block at Neches and Nort…"



 ___

**Text:** "…esident Kennedy. Chapter **7.** Lee Harvey Oswald:…"



 ___

**Text:** "…, 8 pence, with costs of **7** shillings, 6 pence. Quot…"



 ___

**Text:** "…icles of Newgate, Volume **2.** By Arthur Griffiths. Sec…"



 ___

**Text:** "…Chapter **4.** The Assassin: Part 4. Os…"



 ___

**Text:** "…ert, and used a borrowed **.22** caliber bolt-action rifl…"



 ___

**Text:** "…n day, the 27th October, **1862,** the two were arrested si…"



 ___

**Text:** "…lin D Roosevelt, Section **7.**…"



 ___

**Text:** "…This price included **$19.95** for the rifle and the sc…"



 ___

**Text:** "…ixth floor approximately **35** minutes before the assas…"



 ___

**Text:** "…idate Johnson during the **1960** campaign,…"



 ___

**Text:** "…uilding at approximately **12:33** p.m., Lee Harvey Oswald …"



 ___

**Text:** "…atrolman Tippit at about **1:16** p.m.:…"



 ___

## Special Cases

In [4]:
import os

from IPython.display import Markdown
from IPython.display import FileLink

lookup = {
    'LJ044-0055': ('544 Camp Street New', 'five four four Camp Street New'),
    'LJ028-0180': ('In the year 562', 'In the year five sixty-two'),
    'LJ047-0063': ('602 Elsbeth Street', 'six oh two Elsbeth Street'),
    'LJ047-0160': ('411 Elm Street', 'four one one Elm Street'),
    'LJ047-0069': ('214 Neely Street', 'two one four Neely Street'),
    'LJ040-0121': ('P.S. 117', 'P.S. one seventeen'),
    'LJ032-0036': ('No. 2,202,130,462', 'No. two two zero two one three zero four six two'),
    'LJ029-0193': ('100 extra off-duty', 'one hundred extra off-duty'),
}

def special_cases():
    for row in data:
        basename = os.path.basename(row['wav']).split('.')[0]
        if basename in lookup:
            original = row['text']
            row['text'] = row['text'].replace(*lookup[basename])
            display(Markdown(original + ' → ' + row['text']))

special_cases()

In the year 562, after a long reign of forty-three years, Nebuchadnezzar died. → In the year five sixty-two, after a long reign of forty-three years, Nebuchadnezzar died.

to call in 100 extra off-duty officers to help protect President Kennedy. → to call in one hundred extra off-duty officers to help protect President Kennedy.

purchased as No. 2,202,130,462 in Dallas, Texas, on March 12, 1963. → purchased as No. two two zero two one three zero four six two in Dallas, Texas, on March 12, 1963.

On September 30, 1952, Lee enrolled in P.S. 117 → On September 30, 1952, Lee enrolled in P.S. one seventeen

While the legend, quote, FPCC, 544 Camp Street New Orleans, Louisiana, end quote, → While the legend, quote, FPCC, five four four Camp Street New Orleans, Louisiana, end quote,

Agent Hosty was told by Mrs. M. F. Tobias, a former landlady of the Oswalds at 602 Elsbeth Street in Dallas, → Agent Hosty was told by Mrs. M. F. Tobias, a former landlady of the Oswalds at six oh two Elsbeth Street in Dallas,

that the Oswalds were living at 214 Neely Street in Dallas. → that the Oswalds were living at two one four Neely Street in Dallas.

found it to be 411 Elm Street. End quote. → found it to be four one one Elm Street. End quote.

## Time of the Day

In [5]:
regex = r'([0-9]{1,2}:[0-9]{1,2})'
find_examples(regex)

### Examples Captured by Regex

**Regex:** ([0-9]{1,2}:[0-9]{1,2})

**Number of Examples:** 84

**Number of Examples Shown:** 5



 ___

**Text:** "…At **1:35** p.m., after Governor Con…"



 ___

**Text:** "…imately 12 hours between **2:30** p.m., on November 22, an…"



 ___

**Text:** "…d at his watch and said "**12:30**" to the driver, Special …"



 ___

**Text:** "…showed the numerals **12:30** as the Vice-Presidential…"



 ___

**Text:** "…unloaded at **12:15** p.m.,…"



 ___

In [6]:
from IPython.display import Markdown
from functools import partial
from num2words import num2words

cases = [
    ('alone in the shop about 9:30', 'nine thirty'),
    ('San Antonio at 1:30 p.m.,', 'one thirty'),
    ('At 1:51 p.m., police car 2 report', 'one fifty-one'),
]

def replace(text, true):
    split = text.split(':')
    assert len(split) == 2
    words = [num2words(int(num)) for num in split]
    ret = ' '.join(words)
    display(Markdown(text + ' → ' + ret + ' (' + true + ')'))
    return ret

for text in cases:
    _iterate_and_replace(regex, text[0], partial(replace, true=text[1]))

9:30 → nine thirty (nine thirty)

1:30 → one thirty (one thirty)

1:51 → one fifty-one (one fifty-one)
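
The three cases above all have minutes of ten or more. A time such as `12:05` would come out as "twelve five" with the `replace` function above. The sketch below shows one way to handle single-digit minutes and whole hours without any third-party dependency. The readings "oh five" for `:05` and the bare hour for `:00` are assumptions that would need to be checked against the audio, and `verbalize_time` and `under_sixty` are hypothetical names.

```python
ONES = ['zero', 'one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine',
        'ten', 'eleven', 'twelve', 'thirteen', 'fourteen', 'fifteen', 'sixteen',
        'seventeen', 'eighteen', 'nineteen']
TENS = ['', '', 'twenty', 'thirty', 'forty', 'fifty']


def under_sixty(n):
    """Spell out an integer in the range 0-59."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] + ('-' + ONES[ones] if ones else '')


def verbalize_time(text):
    """Verbalize ``H:MM``; the 'oh' and whole-hour readings are assumptions."""
    hours, minutes = (int(part) for part in text.split(':'))
    if minutes == 0:
        return under_sixty(hours)  # e.g. "2:00 p.m." is read as "two p.m."
    if minutes < 10:
        return under_sixty(hours) + ' oh ' + under_sixty(minutes)
    return under_sixty(hours) + ' ' + under_sixty(minutes)


for example in ['9:30', '1:51', '12:05', '2:00']:
    print(example, '→', verbalize_time(example))
```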

## Ordinals

In [7]:
regex = r'([0-9]+(st|nd|rd|th))'
find_examples(regex)

### Examples Captured by Regex

**Regex:** ([0-9]+(st|nd|rd|th))

**Number of Examples:** 71

**Number of Examples Shown:** 5



 ___

**Text:** "…, end quote, on November **22nd**,…"



 ___

**Text:** "…ses found on the lawn at **10th** Street and Patton Avenue…"



 ___

**Text:** "…e prison was laid on the **10th** April, 1840, by the Marq…"



 ___

**Text:** "…ted at Kennington on the **24th** March.…"



 ___

**Text:** "…in on the morning of the **24th**.…"



 ___

In [8]:
from IPython.display import Markdown
from functools import partial

from num2words import num2words

cases = [('shortly before Lee\'s 13th birthday', 'thirteenth'), 
         ('On October 23rd, I had attended a ultra', 'twenty-third'),
         ('between May 1st, 1827,', 'first'),
         ('and 30th April, 1831,', 'thirtieth')]

def replace(text, true):
    digit = ''.join([c for c in text if c.isdigit()])
    ret = num2words(int(digit), ordinal=True)
    display(Markdown(text + ' → ' + ret + ' (' + true + ')'))
    return ret


for text in cases:
    _iterate_and_replace(regex, text[0], partial(replace, true=text[1]))

13th → thirteenth (thirteenth)

23rd → twenty-third (twenty-third)

1st → first (first)

30th → thirtieth (thirtieth)

## Money (dollars or pounds)

In [9]:
regex = r'(\S*([$£]{1}[0-9\,\.]+\b))'
find_examples(regex)

### Examples Captured by Regex

**Regex:** (\S*([$£]{1}[0-9\,\.]+\b))

**Number of Examples:** 128

**Number of Examples Shown:** 5



 ___

**Text:** "…ropriated one cheque for **£1400**,…"



 ___

**Text:** "…**£285,950**; in other words, that to…"



 ___

**Text:** "…robbery there were still **£6000** worth in the warehouse.…"



 ___

**Text:** "…ses of these agents cost **£15**, and another £10 were sp…"



 ___

**Text:** "…963, included an item of **$21.45**. Klein's shipping order …"



 ___

In [10]:
from IPython.display import Markdown

from functools import partial
from num2words import num2words

cases = [('rough diamonds valued at £4000.', 'four thousand pounds'), 
         ('inch BBL, unquote, cost $29.95.', 'twenty-nine dollars, ninety-five cents'),
         ('was indebted upwards of £50,000 subsequently stopped pay', 'fifty thousand pounds'),
         ('warden, whose income was £2372.', 'two thousand, three hundred seventy-two pounds'),
         ('plus $1.27', 'one dollar, twenty-seven cents'),
         ('$19.95,', 'nineteen dollars, ninety-five cents'),
         ('were out to the value of £367,800.', 'three hundred sixty-seven thousand and eight hundred pounds'),
         ('the offer of a reward of £1500 for the detection of the', 'fifteen hundred pounds'),
         ('of England notes for £1000 each,', 'one thousand pounds each'),
         ('of approximately $3,000,000 during that period', 'three million'),
         ('only afford to give £1750 for stones', 'one thousand seven-fifty pounds'),
         ('e surrender of the other £1200', 'one thousand, two hundred pounds')]

def replace(text, true):
    digit = text[1:].replace(',', '')
    ret = num2words(digit, to='currency', currency='USD')
    ret = ret.replace(', zero cents', '')
    ret = ret.replace('hundred and', 'hundred')
    if '£' in text:
        # num2words has bugs with GBP, so convert as USD and swap the unit names
        ret = ret.replace('dollar', 'pound')
        ret = ret.replace('cents', 'pence')
        ret = ret.replace('cent', 'penny')
    display(Markdown(text + ' → ' + ret + ' (' + true + ')'))
    return ret

for text in cases:
    _iterate_and_replace(regex, text[0], partial(replace, true=text[1]))

£4000 → four thousand pounds (four thousand pounds)

$29.95 → twenty-nine dollars, ninety-five cents (twenty-nine dollars, ninety-five cents)

£50,000 → fifty thousand pounds (fifty thousand pounds)

£2372 → two thousand, three hundred seventy-two pounds (two thousand, three hundred seventy-two pounds)

$1.27 → one dollar, twenty-seven cents (one dollar, twenty-seven cents)

$19.95 → nineteen dollars, ninety-five cents (nineteen dollars, ninety-five cents)

£367,800 → three hundred sixty-seven thousand, eight hundred pounds (three hundred sixty-seven thousand and eight hundred pounds)

£1500 → one thousand, five hundred pounds (fifteen hundred pounds)

£1000 → one thousand pounds (one thousand pounds each)

$3,000,000 → three million dollars (three million)

£1750 → one thousand, seven hundred fifty pounds (one thousand seven-fifty pounds)

£1200 → one thousand, two hundred pounds (one thousand, two hundred pounds)

In [11]:
# Sanity check: no currency symbols remain after the masking done by find_examples above
find_examples(r'([$£])', replace=False)

### Examples Captured by Regex

**Regex:** ([$£])

**Number of Examples:** 0

**Number of Examples Shown:** 0



 ___

## PO Box Numbers & Serial Numbers

In [12]:
find_examples(r'([Bb]ox [0-9]+\b)')

### Examples Captured by Regex

**Regex:** ([Bb]ox [0-9]+\b)

**Number of Examples:** 14

**Number of Examples Shown:** 5



 ___

**Text:** "…e had rented post office **box 2915**, Dallas,…"



 ___

**Text:** "…Post Office **Box 2915**, Dallas, Texas, on March…"



 ___

**Text:** "…e address of post office **box 2915** in Dallas.…"



 ___

**Text:** "…as listed on post office **box 30061**, New Orleans,…"



 ___

**Text:** "…d had rented post office **box 30061** in New Orleans on June 3…"



 ___

In [13]:
find_examples(r'(\b[A-Za-z]+[0-9]+\b)')

### Examples Captured by Regex

**Regex:** (\b[A-Za-z]+[0-9]+\b)

**Number of Examples:** 16

**Number of Examples Shown:** 5



 ___

**Text:** "…ano rifle, serial number **C2766**,…"



 ___

**Text:** "…h the Mannlicher-Carcano **C2766** rifle, over 100 rounds o…"



 ___

**Text:** "… Commando, serial number **V510210**, end quote,…"



 ___

**Text:** "…bearing serial number **C2766**.…"



 ___

**Text:** "… internal control number **VC836** on this rifle.…"



 ___

In [14]:
from IPython.display import Markdown

from functools import partial
from num2words import num2words

cases = [('Post Office Box 2915, Dallas, Texas, on March', 'two nine one five'), 
         ('Post Office Box 30016, New Orleans', 'three zero zero one six'),
         ('serial No. C2766, which was also found', 'C two seven six six'),
         ('control number VC836, serial number', 'V C eight three six'),
         ('Commando, serial number V510210, end quote', 'V five one zero two one zero')]

def replace(text, true):
    split = text.split(' ')
    ret = [num2words(int(t)) if t.isdigit() else t for t in list(split[-1])]
    ret = ' '.join(ret)
    if len(split) == 2:
        ret = split[0] + ' ' + ret
    display(Markdown(text + ' → ' + ret + ' (' + true + ')'))
    return ret

for regex in [r'([Bb]ox [0-9]+\b)', r'(\b[A-Za-z]+[0-9]+\b)']:
    for text in cases:
        _iterate_and_replace(regex, text[0], partial(replace, true=text[1]))

Box 2915 → Box two nine one five (two nine one five)

Box 30016 → Box three zero zero one six (three zero zero one six)

C2766 → C two seven six six (C two seven six six)

VC836 → V C eight three six (V C eight three six)

V510210 → V five one zero two one zero (V five one zero two one zero)
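
Box and serial numbers are read one character at a time, while the street addresses in the special cases use "oh" for zero ("six oh two"). The sketch below spells a token character by character with a configurable word for zero, using no third-party packages. `spell_characters` and its `zero` parameter are hypothetical names, and which reading applies to a given token still has to be decided by listening to the audio.

```python
DIGIT_WORDS = ['zero', 'one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine']


def spell_characters(token, zero='zero'):
    """Spell ``token`` one character at a time; letters are kept as they are."""
    words = []
    for char in token:
        if char in '0123456789':
            words.append(zero if char == '0' else DIGIT_WORDS[int(char)])
        else:
            words.append(char)
    return ' '.join(words)


print(spell_characters('C2766'))            # C two seven six six
print(spell_characters('602', zero='oh'))   # six oh two
```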

## Year

In [15]:
regexes = [r'(\b[0-9]{4}\b)', r'\b(?:in|In) ([0-9]{3})\b', r'\b([0-9]{3}) B\.C\b']
for regex in regexes:
    find_examples(regex)

### Examples Captured by Regex

**Regex:** (\b[0-9]{4}\b)

**Number of Examples:** 582

**Number of Examples Shown:** 5



 ___

**Text:** "…In **1813**…"



 ___

**Text:** "…al Consolidation Acts of **1861**.…"



 ___

**Text:** "…o Texas in late November **1963**.…"



 ___

**Text:** "…In **1818** prisoners awaiting trial…"



 ___

**Text:** "…This was in May **1842**.…"



 ___

### Examples Captured by Regex

**Regex:** \b(?:in|In) ([0-9]{3})\b

**Number of Examples:** 13

**Number of Examples Shown:** 5



 ___

**Text:** "…and in **555** Nabonidus, the father of…"



 ___

**Text:** "…In **597**, when he sent his army t…"



 ___

**Text:** "…The next year, in **605**, Nabopolassar died, and …"



 ___

**Text:** "…In **538** the city fell, and for a…"



 ___

**Text:** "…he city rebelled, and in **484** B.C. he captured it, and…"



 ___

### Examples Captured by Regex

**Regex:** \b([0-9]{3}) B\.C\b

**Number of Examples:** 2

**Number of Examples Shown:** 2



 ___

**Text:** "…t Babylon, writing about **250** B.C.,…"



 ___

**Text:** "…and there on June 13, **323** B.C., he met his death.…"



 ___

In [16]:
from IPython.display import Markdown

from functools import partial
from num2words import num2words

cases = [('dated April XXXX, 1787, describing an', 'seventeen eighty-seven'), 
         ('Newgate down to 1818,', 'eighteen eighteen'),
         ('It was about 2250 B.C., when the great', 'twenty-two fifty'),
         ('In 597, when he sent his army', 'five ninety-seven'),
         ('writing about 250 B.C.', 'two fifty'),
         ('In 606, Nineveh', 'six oh-six'),
         ('June 13, 323 B.C.,', 'three twenty-three')]

def replace(text, true):
    ret = num2words(int(text), lang='en', to='year')
    display(Markdown(text + ' → ' + ret + ' (' + true + ')'))
    return ret


for regex in regexes:
    for text in cases:
        _iterate_and_replace(regex, text[0], partial(replace, true=text[1]))


1787 → seventeen eighty-seven (seventeen eighty-seven)

1818 → eighteen eighteen (eighteen eighteen)

2250 → twenty-two fifty (twenty-two fifty)

597 → five ninety-seven (five ninety-seven)

606 → six oh-six (six oh-six)

250 → two fifty (two fifty)

323 → three twenty-three (three twenty-three)
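
The outputs above follow a simple pairing rule: the last two digits are read as one number and the leading digits as another. The sketch below reproduces that rule by hand to make it explicit. It is not the num2words implementation; `verbalize_year` and `two_digits` are hypothetical names, only years from 100 to 9999 are covered, and readings such as "two thousand five" for 2005 are out of scope.

```python
ONES = ['zero', 'one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine',
        'ten', 'eleven', 'twelve', 'thirteen', 'fourteen', 'fifteen', 'sixteen',
        'seventeen', 'eighteen', 'nineteen']
TENS = ['', '', 'twenty', 'thirty', 'forty', 'fifty', 'sixty', 'seventy', 'eighty', 'ninety']


def two_digits(n):
    """Spell out an integer in the range 0-99."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] + ('-' + ONES[ones] if ones else '')


def verbalize_year(year):
    """Read a year as two pairs of digits, e.g. 1787 as 'seventeen eighty-seven'."""
    if not 100 <= year <= 9999:
        raise ValueError('Only years between 100 and 9999 are handled.')
    if year % 1000 == 0:
        return ONES[year // 1000] + ' thousand'
    high, low = divmod(year, 100)
    if low == 0:
        return two_digits(high) + ' hundred'
    if low < 10:
        return two_digits(high) + ' oh-' + ONES[low]
    return two_digits(high) + ' ' + two_digits(low)


for year in [1787, 2250, 597, 606, 250]:
    print(year, '→', verbalize_year(year))
```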

## Numero (no.)

In [17]:
regex = r'(?:No|no)\. ([0-9]+)'
find_examples(regex)

### Examples Captured by Regex

**Regex:** (?:No|no)\. ([0-9]+)

**Number of Examples:** 29

**Number of Examples Shown:** 5



 ___

**Text:** "…uld not test Exhibit No. **133**-A in the same way becaus…"



 ___

**Text:** "…d along them towards No. **1**, Newgate Street.…"



 ___

**Text:** "…Commission Exhibit No. **133**-B, to Oswald's Imperial …"



 ___

**Text:** "…se pictures, Exhibit No. **133**-A, shows most of the rif…"



 ___

**Text:** "…d Commission Exhibit No. **162** as the light-colored jac…"



 ___

In [18]:
from IPython.display import Markdown

from functools import partial
from num2words import num2words

cases = [('Commission Exhibit No. 133-B,', 'one thirty-three'), 
         ('Commission Exhibit No. 162 as', 'one sixty-two')]

def replace(text, true):
    ret = num2words(int(text), lang='en', to='year')
    display(Markdown(text + ' → ' + ret + ' (' + true + ')'))
    return ret

for text in cases:
    _iterate_and_replace(regex, text[0], partial(replace, true=text[1]))

133 → one thirty-three (one thirty-three)

162 → one sixty-two (one sixty-two)

## Other Numbers

In [19]:
find_examples(r'(\b[0-9]{1}[0-9\.\,]{0,}\b)', display_n=50)

### Examples Captured by Regex

**Regex:** (\b[0-9]{1}[0-9\.\,]{0,}\b)

**Number of Examples:** 1171

**Number of Examples Shown:** 50



 ___

**Text:** "…(**2**) testimony of firearms i…"



 ___

**Text:** "…nd B. Tom Carter on June **26**, XXXX,…"



 ___

**Text:** "…ld's actions on November **22**, XXXX.…"



 ___

**Text:** "…or class (**1**), with whom, under the i…"



 ___

**Text:** "…A. J. Hidell, aged **28**, end quote. The date of …"



 ___

**Text:** "…y landscaped triangle of **3** acres.…"



 ___

**Text:** "…On November **4**, Sorrels told Behn he be…"



 ___

**Text:** "…Based on (**1**) the contents of the not…"



 ___

**Text:** "…X foot XX inches, weight **165** pounds, end quote.…"



 ___

**Text:** "…X in New Orleans on June **3**, XXXX,…"



 ___

**Text:** "…fold. On November XX and **23**, Oswald refused to tell …"



 ___

**Text:** "…nk deposit made on March **13**, XXXX, included an item …"



 ___

**Text:** "…d Possible Motives, Part **6**.…"



 ___

**Text:** "…accumulated over a **20**-year period, some of whi…"



 ___

**Text:** "…tayed approximately X to **5** car lengths ahead of the…"



 ___

**Text:** "…total of between XXX and **5.6** seconds between the two …"



 ___

**Text:** "…Dallas, Texas," on March **20**, XXXX.…"



 ___

**Text:** "…iday, November XX, about **1** p.m., he entered the hou…"



 ___

**Text:** "…In addition, the **3** degree downward slope of…"



 ___

**Text:** "…n Washington on November **29**, mounted on a card on wh…"



 ___

**Text:** "…more than **5,000** names were referred to t…"



 ___

**Text:** "…y stated, I have between **25** and 40 cases assigned to…"



 ___

**Text:** "…February XX through June **30**, XXXX,…"



 ___

**Text:** "…Mrs. Lillian Murret, for **2** or 3 weeks.…"



 ___

**Text:** "…on April **10**, XXXX.…"



 ___

**Text:** "…for New Orleans on April **24**, XXXX.…"



 ___

**Text:** "…**10**. I left you as much mone…"



 ___

**Text:** "…arged with misdemeanors. **5**. Debtors.…"



 ___

**Text:** "…d Possible Motives, Part **4**.…"



 ___

**Text:** "…rthur Griffiths. Section **12**: Executions, part two.…"



 ___

**Text:** "…After about a **2**-week separation, Marina …"



 ___

**Text:** "…ved there about XXXXX to **1** p.m.…"



 ___

**Text:** "…PRS received items in **8,709** cases.…"



 ___

**Text:** "…, X pence, with costs of **7** shillings, 6 pence. Quot…"



 ___

**Text:** "…t regarded approximately **100** of these 400 cases as se…"



 ___

**Text:** "…and Engels at the age of **15**, the conclusive thing th…"



 ___

**Text:** "…tion on Monday, November **18**, XXXX,…"



 ___

**Text:** "…e costs were X shillings **10** pence.…"



 ___

**Text:** "…was reviewed on November **8**,…"



 ___

**Text:** "… November XX to November **18**, when he was joined by A…"



 ___

**Text:** "…detained for X shilling **9** pence, with costs of 5 s…"



 ___

**Text:** "…required XXXX, XXXX, and **7** seconds.…"



 ___

**Text:** "…and John Lancaster for **1** shilling, 8 pence, with …"



 ___

**Text:** "…ons described in chapter **3**.…"



 ___

**Text:** "…Soon afterward, at about **3** p.m., police officers ar…"



 ___

**Text:** "…nterviewed by FBI agents **2** months after the shootin…"



 ___

**Text:** "…ile stayed approximately **4** to 5 car lengths ahead o…"



 ___

**Text:** "…shots from the weapon at **15** yards in 6, 7, and 9 sec…"



 ___

**Text:** "…icles of Newgate, Volume **2**. By Arthur Griffiths. Se…"



 ___

**Text:** "…ly after serving out his **3** years in the U.S. Marine…"



 ___

In [20]:
from IPython.display import Markdown

from functools import partial
from num2words import num2words

cases = [('Chapter 4. The Assassin:', 'four'), 
         ('the morning of November 22 prior to the motorcade', 'twenty-two'),
         ('was shipped on March 20, and the shooting', 'twenty'),
         ('Kennedy in the neck at 176.9', 'one hundred seventy-six point nine'), 
         ('distance of 265.3 feet was, quote', 'two hundred sixty-five point three'),
         ('ries they required XXXX, 6.45,', 'six point four five'),
         ('information on some 50,000 cases', 'fifty thousand'), 
         ('actually had only 1,000 printed.', 'one thousand'),
         ('PRS received items in 8,709 cases', 'eight thousand, seven hundred nine'),
         ('debtors and 182 felons,', 'one hundred eighty-two')]

def replace(text, true):
    text = text.replace(',', '')
    ret = num2words(float(text))
    ret = ret.replace('hundred and', 'hundred')
    display(Markdown(text + ' → ' + ret + ' (' + true + ')'))
    return ret

for text in cases:
    _iterate_and_replace(r'(\b[0-9\.\,]+\b)', text[0], partial(replace, true=text[1]))

4 → four (four)

22 → twenty-two (twenty-two)

20 → twenty (twenty)

176.9 → one hundred seventy-six point nine (one hundred seventy-six point nine)

265.3 → two hundred sixty-five point three (two hundred sixty-five point three)

6.45 → six point four five (six point four five)

50000 → fifty thousand (fifty thousand)

1000 → one thousand (one thousand)

8709 → eight thousand, seven hundred nine (eight thousand, seven hundred nine)

182 → one hundred eighty-two (one hundred eighty-two)

## Roman Numerals

In [21]:
find_examples(r'\b(?:George|Charles|Napoleon|Henry|Nebuchadnezzar|William) ([IV]+\.{0,})')

### Examples Captured by Regex

**Regex:** \b(?:George|Charles|Napoleon|Henry|Nebuchadnezzar|William) ([IV]+\.{0,})

**Number of Examples:** 22

**Number of Examples Shown:** 5



 ___

**Text:** "…irements of the X George **IV.**…"



 ___

**Text:** "…tried to stab George **III.** as he was alighting from…"



 ___

**Text:** "…wever, was the XX George **III.** c. XX, s. X (XXXX)…"



 ___

**Text:** "…nd in XXX Nebuchadnezzar **III.**, a native Babylonian, wa…"



 ___

**Text:** "… as the reign of Charles **II.**, a law was passed declar…"



 ___

In [22]:
from IPython.display import Markdown

from functools import partial
from num2words import num2words

cases = [('reign of Charles II., a law was passed', 'the second'), 
         ('William IV. was also the victim', 'the fourth'),
         ('the reign of Henry VIII. a new and most', 'the eighth')]

def replace(text, true):
    if text[-1] == '.':
        text = text[:-1]
        
    # Only handles numerals built from ``I`` and ``V`` (I through VIII).
    num = 0
    if 'V' not in text:
        num = len(text)
    elif 'IV' == text:
        num = 4
    else:
        num = 5 + len(text) - 1
        
    ret = 'the ' + num2words(int(num), to='ordinal')
    display(Markdown(text + ' → ' + ret + ' (' + true + ')'))
    return ret

for text in cases:
    _iterate_and_replace(r'\b(?:George|Charles|Napoleon|Henry|Nebuchadnezzar|William) ([IV]+\.{0,})',
                         text[0], partial(replace, true=text[1]))

II → the second (the second)

IV → the fourth (the fourth)

VIII → the eighth (the eighth)
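
The `replace` function above covers only numerals made of `I` and `V`, which matches the `[IV]+` character class in the regex. If the regex were widened to include `X` or larger symbols (an assumption, since no such case appears above), a general conversion could look like this sketch. `roman_to_int` is a hypothetical helper and does not validate that the numeral is well formed.

```python
ROMAN_VALUES = {'I': 1, 'V': 5, 'X': 10, 'L': 50, 'C': 100, 'D': 500, 'M': 1000}


def roman_to_int(numeral):
    """Convert a Roman numeral such as ``'XIV'`` or ``'VIII.'`` to an integer."""
    numeral = numeral.rstrip('.')
    total = 0
    for i, char in enumerate(numeral):
        value = ROMAN_VALUES[char]
        # A smaller symbol before a larger one is subtracted (IV = 4, IX = 9).
        if i + 1 < len(numeral) and value < ROMAN_VALUES[numeral[i + 1]]:
            total -= value
        else:
            total += value
    return total


for numeral in ['II.', 'IV.', 'VIII.', 'XIV']:
    print(numeral, '→', roman_to_int(numeral))
```

The resulting integer can then be passed to the same ordinal conversion used above.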