## Encoding words (phoneme strings)

Let's use the SPA to encode words,
or any arbitrary string of phonemes.
Consider the word BAT.
Each phoneme is bound to a position pointer,
and a final STOP marks the end of the word.

$$\text{WORD} = \text{PH1} \circledast \text{B} \oplus \text{PH2} \circledast \text{A} \oplus \text{PH3} \circledast \text{T} \oplus \text{PH4} \circledast \text{STOP}$$

Then, we can decode the phoneme at any position
by binding the word with the inverse of
that position's pointer.
So, for the first phoneme:

$$\text{B} \approx \text{WORD} \circledast \text{PH1}^{-1}$$

This result would then be sent to a cleanup memory.
Unlike other cleanup memories, however,
the outputs of the cleanup would be oscillators.
These would generate the trajectory of articulators
for each phoneme.
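
As a rough illustration of the binding, unbinding, and cleanup steps above, here is a minimal non-neural NumPy sketch. The helper names (`cconv`, `inverse`, `random_unit`, `cleanup`), the dimensionality of 512, and the small candidate list are assumptions made for this example and are not part of the model built below. A plain dot-product lookup stands in for the cleanup memory, and no oscillators are involved.

```python
import numpy as np

def cconv(a, b):
    """Circular convolution (binding), computed via the FFT."""
    return np.fft.irfft(np.fft.rfft(a) * np.fft.rfft(b), n=len(a))

def inverse(a):
    """Approximate inverse (involution): keep a[0], reverse the rest."""
    return np.concatenate(([a[0]], a[:0:-1]))

def random_unit(d, rng):
    """Random vector of unit length."""
    v = rng.randn(d)
    return v / np.linalg.norm(v)

d = 512  # assumed for this sketch; larger d means less cross-talk
rng = np.random.RandomState(0)
names = ['PH1', 'PH2', 'PH3', 'PH4', 'B', 'A', 'T', 'D', 'STOP']
sp = {name: random_unit(d, rng) for name in names}

# WORD = PH1*B + PH2*A + PH3*T + PH4*STOP
word = (cconv(sp['PH1'], sp['B']) + cconv(sp['PH2'], sp['A'])
        + cconv(sp['PH3'], sp['T']) + cconv(sp['PH4'], sp['STOP']))

def cleanup(v, candidates=('B', 'A', 'T', 'D', 'STOP')):
    """Return the candidate whose pointer is most similar to v."""
    return max(candidates, key=lambda name: np.dot(v, sp[name]))

# Unbinding gives B plus cross-talk from the other three terms
noisy_b = cconv(word, inverse(sp['PH1']))
first_phoneme = cleanup(noisy_b)
print(first_phoneme)
```

The unbound vector is only approximately `B`, which is why the cleanup step is needed before anything downstream uses it.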

As always, timing is the difficult thing here.
Since these are oscillators, we really want to
just give them a kick, so the cleanup shouldn't
be cleaning up all the time, just when
we query for the next phoneme.
The kick should also advance the query
to the next phone,
so that the next kick gets the next phone.

We'd need to build in some structure to the SPs
representing phone positions.

$$\text{PH2} = \text{PH1} \circledast \text{NEXTPH}$$

And so on for the later positions.
The code below calls this increment pointer `INC` rather than `NEXTPH`.
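
To see why this structure is convenient, the following NumPy sketch builds a chain of position pointers from two unitary vectors (vectors whose Fourier coefficients all have magnitude 1). It is an illustration under assumptions: `random_unitary`, `step`, and `ph` are names made up for this example, and `step` plays the role of $\text{NEXTPH}$ (named `INC` in the vocabulary code below).

```python
import numpy as np

def cconv(a, b):
    """Circular convolution (binding), computed via the FFT."""
    return np.fft.irfft(np.fft.rfft(a) * np.fft.rfft(b), n=len(a))

def inverse(a):
    """Involution: keep a[0], reverse the rest. Exact inverse for unitary a."""
    return np.concatenate(([a[0]], a[:0:-1]))

def random_unitary(d, rng):
    """Random vector whose Fourier coefficients all have magnitude 1."""
    coeffs = np.fft.rfft(rng.randn(d))
    return np.fft.irfft(coeffs / np.abs(coeffs), n=d)

d = 128
rng = np.random.RandomState(0)
step = random_unitary(d, rng)       # plays the role of NEXTPH
ph = {1: random_unitary(d, rng)}    # PH1
for k in range(2, 9):
    ph[k] = cconv(ph[k - 1], step)  # PH_k = PH_{k-1} * NEXTPH

# Binding unitary vectors preserves length, so the chain does not drift
norms = [np.linalg.norm(ph[k]) for k in range(1, 9)]

# Unbinding neighbouring positions recovers the step itself: PH3 * ~PH2
recovered_step = cconv(ph[3], inverse(ph[2]))
```

Because every pointer in the chain stays unit length, advancing the index is just one more binding with the same vector, no matter how far into the word we are.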

In [None]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns

import nengo
from nengo import spa
from nengo.spa import Vocabulary
import nengo_gui.ipython

In [None]:
# Number of dimensions for the Semantic Pointers
dimensions = 128

# Change the seed of this RNG to change the vocabulary
rng = np.random.RandomState(0)

# Assume a maximum of 8 phonemes. There are words with more,
# but I would argue that you can make up those words from
# smaller word parts chained together.
# To give the positions structure, we define
# `PH2` to equal the convolution of `INC` with `PH1`, and so on.
vocab = Vocabulary(dimensions=dimensions, rng=rng, unitary=['INC', 'PH1'])
vocab.parse('INC')
vocab.parse('PH1')
for i in range(2, 9):
    vocab.add('PH%d' % i, vocab.parse('PH%d * INC' % (i-1)))

# English has 42 phonemes. That's nothing!
# These are all the phonemes.
#
# /A/ /a/ /b/ /k/ /d/ /E/ /e/ /f/ /g/ /h/ /I/ /i/ /j/ /l/ /m/ /n/ /O/ /o/ /p/ /kw/ /r/ /s/ /t/
# /U/ /u/ /v/ /w/ /ks/ /y/ /z/ /OO/ /oo/ /oi/ /ou/ /aw/ /ar/ /sh/ /hw/ /ch/ /th/ /ng/ /zh/
#
# Since SPs are all caps, we'll adopt the convention of doubling a letter for a capital;
# e.g., /O/ becomes OO. We'll use underscores between letters to avoid ambiguities.
#
# TODO: build in something to do with vowels vs consonants?

phs = ('AA', 'A', 'B', 'K', 'D', 'EE', 'E', 'F', 'G', 'H', 'II', 'I', 'J',
       'L', 'M', 'N', 'OO', 'O', 'P', 'K_W', 'R', 'S', 'T',
       'UU', 'U', 'V', 'W', 'K_S', 'Y', 'Z', 'OO_OO', 'O_O', 'O_I',
       'O_U', 'A_W', 'A_R', 'S_H', 'H_W', 'C_H', 'T_H', 'N_G', 'Z_H')
for ph in phs:
    vocab.parse(ph)
vocab.parse('STOP')  # special one to denote end of word

just_phones = vocab.create_subset(phs + ('STOP',))

In [None]:
difference_gain = 15
neuron_per_d = 100

with spa.SPA() as model:
    model.word = spa.Buffer(dimensions=dimensions, neurons_per_dimension=neuron_per_d)

    # ph_idx uses the two WM model from
    # https://github.com/ctn-waterloo/summerschool2015/blob/master/tutorials/memory/Notebooks/4.UsingWM.ipynb
    model.ph_idx1 = nengo.networks.InputGatedMemory(
        neuron_per_d, dimensions=dimensions, difference_gain=difference_gain)
    model.ph_idx2 = nengo.networks.InputGatedMemory(
        neuron_per_d, dimensions=dimensions, difference_gain=difference_gain)
    nengo.Connection(model.ph_idx1.output, model.ph_idx2.input) # Purple line
    # Feed ph_idx2 back into ph_idx1, bound with INC, so each cycle advances the index
    nengo.Connection(model.ph_idx2.output, model.ph_idx1.input,
                     transform=vocab['INC'].get_convolution_matrix())

    model.kick = nengo.Ensemble(100, dimensions=1,
                                encoders=nengo.dists.Choice([[1]]),
                                intercepts=nengo.dists.Uniform(0.2, 1))
    nengo.Connection(model.kick, model.ph_idx1.gate)
    # bias so that model.ph_idx2.gate gets 1 - kick
    nengo.Connection(nengo.Node(1, label='kick bias'), model.ph_idx2.gate)
    nengo.Connection(model.kick, model.ph_idx2.gate, transform=[-1])
    
    # phone = word * ~ph_idx
    model.bind = nengo.networks.CircularConvolution(
        neuron_per_d, dimensions, invert_a=False, invert_b=True)
    nengo.Connection(model.word.state.output, model.bind.A)
    nengo.Connection(model.ph_idx1.output, model.bind.B)
    
    # For quick testing: a square-wave kick with a 0.2 s period
    nengo.Connection(
        nengo.Node(lambda t: 1 if (t % 0.2) > 0.1 else 0, label='manual kick'),
        model.kick)
    # Present PH1 for the first 0.2 s, then nothing
    ph1_v, zero_v = vocab.parse('PH1').v, vocab.parse('0').v
    init_ph = nengo.Node(lambda t: ph1_v if t < 0.2 else zero_v, label='init phone')
    nengo.Connection(init_ph, model.ph_idx1.input)
    # Note: this test word is B-A-D, not the BAT used in the equation above
    nengo.Connection(
        nengo.Node(vocab.parse('PH1*B + PH2*A + PH3*D + PH4*STOP').v, label="word in"),
        model.word.state.input)

    kick_p = nengo.Probe(model.kick, synapse=0.01)
    phidx_p = nengo.Probe(model.ph_idx1.output, synapse=0.01)

Normally, we use the associative memory simply
to clean up to the correct semantic pointer.
In this model, we'll leverage the fact that
the cleanup ensembles already have
a dead zone that they need to be
kicked out of;
we will change the cleanup ensembles
so that they essentially
do what the oscillators in
the 'synth choices' notebook do,
with the exception of the `'STOP'`
pointer, which will pass through as normal.
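
The "dead zone" idea can be sketched without neurons as a cleanup that stays silent unless some match is strong enough. This is only an analogue under stated assumptions: the function `thresholded_cleanup`, the threshold of 0.3, and the toy vocabulary are invented for this example, and the sketch does not model the oscillator behaviour described above.

```python
import numpy as np

def random_unit(d, rng):
    """Random vector of unit length."""
    v = rng.randn(d)
    return v / np.linalg.norm(v)

d = 512
rng = np.random.RandomState(1)
phones = {name: random_unit(d, rng) for name in ['B', 'A', 'T', 'STOP']}

def thresholded_cleanup(v, threshold=0.3):
    """Return the best-matching name, or None if nothing clears the threshold."""
    sims = {name: np.dot(v, vec) for name, vec in phones.items()}
    best = max(sims, key=sims.get)
    return best if sims[best] > threshold else None

strong = phones['B'] + 0.2 * random_unit(d, rng)  # clear input plus some noise
weak = 0.1 * phones['B']                          # too weak to leave the dead zone

print(thresholded_cleanup(strong), thresholded_cleanup(weak))
```

With this behaviour, a phoneme only "fires" when the unbinding result clearly points at it, which is the property the kick mechanism relies on.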

In [None]:
with model:
    model.phone = spa.AssociativeMemory(just_phones)  # modify them here...
    nengo.Connection(model.bind.output, model.phone.input)

    phone_p = nengo.Probe(model.phone.output, synapse=0.01)

The real engineering here is in how these clean up ensembles
interact with the kick.
The kick needs to start whenever
we don't have an oscillator going.
So, we have all of the oscillators
inhibit the kick
(the kick ensemble also uses modified intercepts,
so a decoded connection is enough to inhibit it).
Since the `'STOP'` ensemble also inhibits the kick,
we will 

In [None]:
# Do that

In order to start an utterance, then,
we just need to feed a word to `model.word`
and `'PH1'` to `model.ph_idx1`.
This should unbind to the right pointer,
which will go through the cleanup and
start that phoneme's oscillator.
Once the oscillator is done (or nearly done),
activity will cease,
uninhibiting the kick ensemble.
That causes the next phoneme index
to be loaded up,
which will unbind to a new pointer,
starting that phoneme's oscillator.
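
The sequence just described can be sketched as an ordinary loop, ignoring neurons and timing entirely. Everything here is an assumption for illustration: `encode`, `utter`, `inc`, `ph1`, and the five-phoneme vocabulary are hypothetical names, appending to a list stands in for running an oscillator, and the loop body stands in for one kick.

```python
import numpy as np

def cconv(a, b):
    """Circular convolution (binding), computed via the FFT."""
    return np.fft.irfft(np.fft.rfft(a) * np.fft.rfft(b), n=len(a))

def inverse(a):
    """Involution: keep a[0], reverse the rest. Exact inverse for unitary a."""
    return np.concatenate(([a[0]], a[:0:-1]))

def random_unit(d, rng):
    v = rng.randn(d)
    return v / np.linalg.norm(v)

def random_unitary(d, rng):
    coeffs = np.fft.rfft(rng.randn(d))
    return np.fft.irfft(coeffs / np.abs(coeffs), n=d)

d = 512
rng = np.random.RandomState(2)
inc = random_unitary(d, rng)  # increment pointer
ph1 = random_unitary(d, rng)  # first position
phones = {name: random_unit(d, rng) for name in ['B', 'A', 'T', 'D', 'STOP']}

def encode(phonemes):
    """Bind each phoneme (plus a final STOP) to successive position pointers."""
    word, idx = np.zeros(d), ph1
    for name in list(phonemes) + ['STOP']:
        word = word + cconv(idx, phones[name])
        idx = cconv(idx, inc)
    return word

def utter(word, max_len=8):
    """Unbind, clean up, and advance the index until STOP is decoded."""
    spoken, idx = [], ph1
    for _ in range(max_len):
        noisy = cconv(word, inverse(idx))
        phone = max(phones, key=lambda name: np.dot(noisy, phones[name]))
        if phone == 'STOP':
            break
        spoken.append(phone)   # in the model, this would start an oscillator
        idx = cconv(idx, inc)  # the kick: move on to the next position
    return spoken

print(utter(encode(['B', 'A', 'T'])))
```

The hard part that this sketch skips is exactly what the neural model has to solve: deciding when each step of the loop should happen.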

In [None]:
# nengo_gui.ipython.IPythonViz(model, cfg='word.cfg')

In [None]:
sim = nengo.Simulator(model)
sim.run(.8)

In [None]:
plt.figure(figsize=(12, 12))

plt.subplot(3, 1, 1)
plt.plot(sim.trange(),
         spa.similarity(sim.data[phidx_p],
                        vocab.create_subset(['PH1', 'PH2', 'PH3', 'PH4'])))
# Only four lines are plotted, so the legend lists four labels
legend = plt.legend(['PH1', 'PH2', 'PH3', 'PH4'])
legend.get_frame().set_facecolor('1')
plt.ylim([-0.3, 1.1])

plt.subplot(3, 1, 2)
plt.plot(sim.trange(), sim.data[kick_p])
plt.ylim([-0.1, 1.1])

plt.subplot(3, 1, 3)
plt.plot(sim.trange(), 
         spa.similarity(sim.data[phone_p],
                        vocab.create_subset(['B', 'A', 'D', 'STOP'])))
legend = plt.legend(['B', 'A', 'D', 'STOP'])
legend.get_frame().set_facecolor('1')