In [1]:
import random

One way to understand the structure of [syllables](https://en.wikipedia.org/wiki/Syllable) is with [context-free grammars](https://en.wikipedia.org/wiki/Context-free_grammar). A context-free grammar is a set of production rules that map symbolic tokens, called *non-terminal* symbols, to strings made up of other symbols. By repeatedly applying these rules until every non-terminal has been replaced by literal strings of *terminal* symbols, we can generate strings belonging to the [context-free language](https://en.wikipedia.org/wiki/Context-free_language) of the grammar.

For example, this grammar might be used to represent the syntax of an individual syllable in a word:

In [2]:
syllables = {
    "Syllable" : ["Rime", "Onset Rime"],
    "Rime" : ["Nucleus", "Nucleus Coda"],
    "Onset" : ["Consonant"],
    "Nucleus" : ["Vowel", "Vowel Vowel"],
    "Coda" : ["Consonant", "Cluster"],
    "Consonant" : ["b", "p", "g", "k", "t", "d", "s", "z", "m", "n"],
    "Vowel" : ["a", "e", "i", "o"],
    "Cluster" : ["ts", "ks"],
}

To generate a word that belongs to this language, we start with a list containing one initial non-terminal symbol, `["Syllable"]`, and rewrite the list according to the rules of the grammar, choosing at random whenever a symbol has more than one production.

Every non-terminal maps to at least one string of other symbols, some of which are themselves non-terminal. These non-terminals can be expanded into sub-words by recursively applying the grammar's production rules to each of them. When the list contains only terminal symbols, we can stop and return the resulting word:

In [3]:
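def generate_word(grammar, start, word=None):
    # On the initial call, begin with just the start symbol.
    # (None is used as the default to avoid a mutable default argument.)
    if word is None:
        word = [start]

    for i, symbol in enumerate(word):
        if symbol in grammar:
            # Pick one production at random and expand it recursively.
            subwords = random.choice(grammar[symbol]).split()
            word[i] = generate_word(grammar, None, subwords)

    return word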
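
To make the individual rewriting steps visible, here is a minimal sketch of a leftmost derivation that yields every intermediate form. The names `derive` and `toy` are hypothetical and not part of the code above, and the toy grammar gives each non-terminal a single production so that the result is deterministic.

```python
import random

def derive(grammar, start, rng=random):
    """Yield each intermediate form of a leftmost derivation from `start`."""
    form = [start]
    yield list(form)
    while True:
        # Find the leftmost non-terminal, if any remain.
        index = next((i for i, s in enumerate(form) if s in grammar), None)
        if index is None:
            return
        # Replace that one symbol with a randomly chosen production.
        replacement = rng.choice(grammar[form[index]]).split()
        form[index:index + 1] = replacement
        yield list(form)

# Hypothetical toy grammar: one production per non-terminal.
toy = {
    "Syllable": ["Onset Rime"],
    "Onset": ["k"],
    "Rime": ["Nucleus Coda"],
    "Nucleus": ["o"],
    "Coda": ["d"],
}

for step in derive(toy, "Syllable"):
    print(' '.join(step))
```

Unlike `generate_word`, this sketch keeps the list flat and expands one symbol at a time, which is easier to trace by hand but discards the tree structure.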

`generate_word` returns a nested list, which preserves the structure of the word's parse tree. However, it's also useful to be able to flatten the result into a single list of symbols:

In [4]:
def flatten(lst):
    flat = []
    
    for item in lst:
        if isinstance(item, list):
            flat.extend(flatten(item))
        else:
            flat.append(item)
            
    return flat

Now we can generate syllables, which end up looking like short pronounceable gibberish:

In [5]:
for _ in range(10):
    word = generate_word(syllables, "Syllable")
    print(''.join(flatten(word)))

o
ae
di
kod
zi
a
zoots
ki
ed
gi


It's worth pointing out that our syllable grammar is non-recursive: no non-terminal can ever expand, directly or through other rules, into a string that contains itself. This limits the grammar to a finite number of possible results; for syllables, this is exactly what we want!
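
Because the language is finite, it can be enumerated exhaustively. The following is a minimal sketch, assuming a hypothetical helper `enumerate_language` and a copy of the syllable grammar under the name `syllable_grammar`; it only terminates for non-recursive grammars.

```python
from itertools import product

def enumerate_language(grammar, symbol):
    """Return the set of all strings derivable from `symbol`.

    Only terminates when the grammar is non-recursive.
    """
    if symbol not in grammar:
        return {symbol}  # terminal symbols stand for themselves

    results = set()
    for production in grammar[symbol]:
        # Every string each symbol of the production can produce, in order.
        parts = [enumerate_language(grammar, s) for s in production.split()]
        # Concatenate one choice from each part, for every combination.
        for combo in product(*parts):
            results.add(''.join(combo))
    return results

# Copy of the syllable grammar so the example is self-contained.
syllable_grammar = {
    "Syllable": ["Rime", "Onset Rime"],
    "Rime": ["Nucleus", "Nucleus Coda"],
    "Onset": ["Consonant"],
    "Nucleus": ["Vowel", "Vowel Vowel"],
    "Coda": ["Consonant", "Cluster"],
    "Consonant": ["b", "p", "g", "k", "t", "d", "s", "z", "m", "n"],
    "Vowel": ["a", "e", "i", "o"],
    "Cluster": ["ts", "ks"],
}

all_syllables = enumerate_language(syllable_grammar, "Syllable")
print(len(all_syllables))
```

For this grammar the count is 2,860: 20 nuclei times 13 coda options (none, ten consonants, two clusters) gives 260 rimes, each of which can appear with no onset or with one of ten onsets.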

For a simple example of a recursive grammar, we could extend the previous grammar to produce random words from our syllables:

In [6]:
words = {
    # here we say a word must have at least two syllables
    "Word" : ["Syllable - Word", "Syllable - Syllable"],
}

words.update(syllables)

for _ in range(10):
    word = generate_word(words, "Word")
    print(''.join(flatten(word)))

ats-ep-ez
ots-o-oa-o-e
its-nee-ie-zeit-ze
en-ob
mats-ok
i-ge-pa
kiiks-zet-keots-naots-gi
ots-bei-ob-i
pei-sits
ots-sots
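
Because `Word` can expand into another `Word`, there is no upper bound on word length, only a decreasing probability of longer words. As a rough check, the sketch below samples from a stripped-down version of the grammar and counts syllables. The helper `expand`, the grammar `toy_words`, and the stand-in terminal `"la"` are all hypothetical and are used only to keep the example self-contained.

```python
import random

def expand(grammar, symbol):
    """Randomly expand `symbol` into a flat list of terminal symbols."""
    if symbol not in grammar:
        return [symbol]
    production = random.choice(grammar[symbol]).split()
    return [t for s in production for t in expand(grammar, s)]

toy_words = {
    "Word": ["Syllable - Word", "Syllable - Syllable"],
    "Syllable": ["la"],  # stand-in for the full syllable grammar
}

random.seed(0)  # fixed seed so repeated runs sample the same words
counts = [expand(toy_words, "Word").count("la") for _ in range(10_000)]
print("shortest:", min(counts))
print("average: ", sum(counts) / len(counts))
```

Each `Word` expansion recurses with probability 1/2, so the number of syllables is 2 plus a geometrically distributed number of extra syllables, giving an expected length of 3.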


Of course, formal grammars are applicable to natural language in general. As a more complicated example, we could try to describe a subset of the English language with a context-free grammar. In reality, English has context-sensitive features, so this approach isn't strong enough to produce sentences that are always grammatically correct, but we can create rough approximations that are at least recognizable to English speakers:

In [7]:
english = {
    "Sentence" : [
        "NounPhrase VerbPhrase"
    ],
    "NounPhrase" : [
        "Article QualifiedNoun", "QualifiedNoun"
    ],
    "VerbPhrase" : [
        "Verb Preposition NounPhrase", "Verb Article QualifiedNoun", "Verb"
    ],
    "QualifiedNoun" : [
        "Adjective QualifiedNoun", "Noun"
    ],
    "Article" : [
        "a", "an", "the"
    ],
    "Preposition" : [
        "in", "of", "on", "over"
    ],
    "Noun" : [
        'actor', 'afternoon', 'appliance', 'bottle', 'box', 'bulb', 'bush',
        'car', 'clam', 'club', 'coal', 'committee', 'cushion', 'doctor', 'dog',
        'duck', 'earthquake', 'father', 'fox', 'frog', 'god', 'hall',
        'harmony', 'horse', 'ink', 'iron', 'knot', 'money', 'month', 'noise',
        'number', 'page', 'ring', 'road', 'robin', 'roof', 'shade', 'shirt',
        'sister', 'snake', 'sneeze', 'tent', 'test', 'thing', 'tomato',
        'toothbrush', 'truck', 'vein'
    ],
    "Adjective" : [
        'addicted', 'afraid', 'angry', 'assorted', 'beneficial', 'black',
        'brave', 'burly', 'closed', 'confused', 'creepy', 'cruel', 'dazzling',
        'diligent', 'dramatic', 'elegant', 'empty', 'fancy', 'flowery', 'furry',
        'fuzzy', 'gigantic', 'grubby', 'hallowed', 'heavenly', 'jobless',
        'juicy', 'kindhearted', 'laughable', 'lewd', 'ludicrous', 'maniacal',
        'muddled', 'petite', 'quick', 'reflective', 'repulsive', 'rightful',
        'scandalous', 'scarce', 'tasty', 'thick', 'tremendous', 'unknown',
        'unsightly', 'young'
    ],
    "Verb" : [
        'attracts', 'carves', 'claims', 'collects', 'compares', 'concerns',
        'destroys', 'disapproves', 'educates', 'exists', 'extends', 'grips',
        'hugs', 'includes', 'increases', 'is', 'jumps', 'lightens', 'obeys',
        'occurs', 'performs', 'points', 'receives', 'recognises', 'soothes',
        'spills', 'taps', 'teases', 'traps', 'wails', 'wanders', 'whispers',
        'wonders', 'wriggles'
    ]
}

for _ in range(10):
    sentence = generate_word(english, "Sentence")
    print(' '.join(flatten(sentence)))

the car claims
month whispers
the ink increases the cruel quick duck
an confused repulsive thick fancy petite sister taps
vein wanders
an addicted empty afternoon wriggles on thing
iron claims a dog
burly assorted duck occurs a page
ink educates a noise
thick kindhearted clam performs in addicted rightful frog
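
One visible flaw in the output above is "an confused": the grammar picks an article without regard to the word that follows it. A possible workaround is to fix articles in a post-processing pass over the flattened sentence. This is a minimal sketch with a hypothetical helper `fix_articles`, and it assumes that spelling is a good enough proxy for pronunciation, which is not always true in English ("an hour", "a unicorn").

```python
def fix_articles(tokens):
    """Choose 'a' or 'an' based on the first letter of the following word.

    Spelling-based approximation only; pronunciation is ignored.
    """
    fixed = list(tokens)
    for i in range(len(fixed) - 1):
        if fixed[i] in ("a", "an"):
            fixed[i] = "an" if fixed[i + 1][0] in "aeiou" else "a"
    return fixed

sentence = "an confused repulsive thick fancy petite sister taps".split()
print(' '.join(fix_articles(sentence)))
```

The same effect could be built into the grammar by splitting nouns and adjectives into vowel-initial and consonant-initial categories, at the cost of roughly doubling the number of rules.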
