> The general case can be described as follows: There
exist a finite number of possible “states” of a system; $S_1, S_2, \ldots, S_n$. In addition there is a set of transition
probabilities; $p_i(j)$ the probability that if the system is in state $S_i$ it will next go to state $S_j$. To make this
Markoff process into an information source we need only assume that a letter is produced for each transition
from one state to another.

- A Mathematical Theory of Communication, Claude Shannon
- https://people.math.harvard.edu/~ctm/home/text/others/shannon/entropy/entropy.pdf
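
Before moving to a real corpus, Shannon's description can be sketched with a toy source. The three states and their transition probabilities below are invented purely for illustration, and the "letter" produced on each transition is simply the name of the state being entered, which is a simplifying assumption rather than anything the paper prescribes.

```python
import random

# Hypothetical three-state source; the states and the probabilities
# are made up for illustration. Each row is p_i(j) for one state S_i.
transitions = {
    "A": {"A": 0.1, "B": 0.6, "C": 0.3},
    "B": {"A": 0.5, "B": 0.0, "C": 0.5},
    "C": {"A": 1.0, "B": 0.0, "C": 0.0},
}

def emit(start, n, rng):
    """Produce n letters, one per transition, starting from state `start`."""
    state = start
    letters = []
    for _ in range(n):
        row = transitions[state]
        # weighted choice of the next state using the current row
        state = rng.choices(list(row), weights=list(row.values()))[0]
        letters.append(state)
    return "".join(letters)

print(emit("A", 20, random.Random(0)))
```

Because some probabilities are zero, certain letter pairs (such as "BB" or "CC") can never appear in the output, however long the sequence runs.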

In [4]:
from typing import Sequence, Iterator
from collections import defaultdict, Counter
import random 
from textwrap import fill 

from itertools import islice

rng = random.Random()

Open a corpus, in this case 'Alice in Wonderland'

In [5]:
# the text contains curly quotes, so read it explicitly as UTF-8
with open("../texts/alice.txt", encoding="utf-8") as f:
    words = f.read().split()

Iterate through the corpus in pairs

In [6]:
def pairs(sequence: Sequence[str]) -> Iterator[tuple[str, str]]:
    a = None
    for b in sequence:
        if a is not None:
            yield (a, b)
        a = b
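
As a quick check of what pairing produces, here is a small sketch on a made-up sentence (not taken from the corpus). It uses `zip` over the list and a copy shifted by one, which should yield the same pairs as the generator above for any list; on Python 3.10 and later, `itertools.pairwise` does the same job.

```python
# A made-up sentence, used only to illustrate the pairing.
sample = "the cat sat on the mat".split()

# Pair each word with its successor by zipping the list
# against a copy of itself shifted by one position.
sample_pairs = list(zip(sample, sample[1:]))

print(sample_pairs)
# [('the', 'cat'), ('cat', 'sat'), ('sat', 'on'), ('on', 'the'), ('the', 'mat')]
```

A list of n words gives n - 1 pairs, and every word except the last appears as the first element of some pair.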


Create a data structure that tracks the statistical structure of these pairs: i.e. for any given word, which words typically follow it?

In [7]:
counters = defaultdict(Counter)
for a, b in pairs(words):
    counters[a][b]+=1



In [8]:
counters["up"]

Counter({'and': 20,
         'to': 7,
         'into': 7,
         'the': 6,
         'in': 6,
         'by': 3,
         'like': 3,
         'at': 3,
         'on': 2,
         'this': 2,
         'very': 2,
         'again': 2,
         'somewhere.”': 1,
         'one': 1,
         'my': 1,
         'again,': 1,
         'eagerly,': 1,
         'now,”': 1,
         'any': 1,
         'his': 1,
         'I': 1,
         'a': 1,
         'again!': 1,
         'towards': 1,
         'against': 1,
         'closer': 1,
         'both': 1,
         'as': 1,
         'again.)': 1,
         'with': 1})
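
These counts play the role of Shannon's transition probabilities once each one is divided by the row total. The sketch below shows that normalization on a small, hypothetical counter (the numbers are not from the corpus); `to_probabilities` is a helper name introduced here for illustration.

```python
from collections import Counter

def to_probabilities(counter):
    """Convert follower counts into estimated transition probabilities."""
    total = sum(counter.values())
    return {word: count / total for word, count in counter.items()}

# Hypothetical counts, chosen so the arithmetic is easy to follow.
follows = Counter({"and": 2, "to": 1, "into": 1})
probs = to_probabilities(follows)

print(probs)
# {'and': 0.5, 'to': 0.25, 'into': 0.25}
```

The explicit normalization is only for inspection: `random.choices` accepts relative weights, so the raw counts can be passed to it directly.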

Starting with a random word, produce words based only on:
* The current word (i.e. the state of the process)
* Our constructed probability table

In [9]:
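def process() -> Iterator[str]:
    current = rng.choice(words)
    while True:
        yield current
        # .get avoids adding empty entries to the defaultdict
        counter = counters.get(current)
        if not counter:
            # dead end (e.g. the corpus's final word): restart at random
            current = rng.choice(words)
            continue
        # make a weighted selection from words
        # that typically follow this one:
        choices, weights = zip(*counter.items())
        current = rng.choices(choices, weights)[0]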


In [10]:
p = process()
print(next(p))
print(next(p))
print(next(p))

with
one
foot


In [11]:
print(fill(" ".join(islice(process(), 250))))

as there were all because they’re making such a coaxing tone, “so now
about half expecting nothing so much sooner than ever: she helped
herself in the first speech. “You are _these?_” said a delightful
thing I must have put their heads are tarts on the middle of the
Pigeon, raising its wings. “Serpent!” screamed the soldiers, or “Off
with a delightful it would be a great relief. “Now tell you!” said the
most confusing thing I say,” added the Dodo, a dunce? Go on!” Alice
would go with a cart-horse, and they all spoke to ask them into little
chin into the King. “Nearly two people. “But they repeated their
shoulders, that cats if he knows it so much pleasanter at this, and
under the verses to say ‘Who am I had taught them: such a whisper,
half hoping she thought. “I was dozing off, for him: the next question
the door into the same thing is another question. “Why did they liked,
so often, you liked.” “Is that she was surprised to be true): If she
was just as it had made. “He must be quite 