In this exercise, we start to see how a model trained on a corpus can help us write new text, explore options and alternatives, and assess how closely a sentence matches the style of the corpus.

In [13]:
import markovify

In [14]:
corpus = '../data/moby_dick.txt'

In [15]:

# Get raw text as string.
with open(corpus) as f:
    text = f.read()

# Build the model.
text_model = markovify.Text(text, state_size=3)

# Print five randomly generated sentences.
# make_sentence() returns None when no attempt passes its overlap checks.
for _ in range(5):
    print(text_model.make_sentence())

# Print three randomly generated sentences of no more than 140 characters.
for _ in range(3):
    print(text_model.make_short_sentence(140))

Nor does Hogarth, in painting the same scene in his own absolute body the whale is moored alongside the whale-ship so that he died embalmed.
None
None
None
The Narwhale I have heard of Moby Dick—but it was not much chance to think over the matter, for Captain Peleg was now all quiescence, at least so far as could be expected.
“Here they saw such huge troops of whales, that they were nothing more than pieces of small squid bones embalmed in that manner.
Thus I soon engaged his interest; and from that live punch-bowl quaff the living stuff.”
But what it was to be the first time I ever _did_ pray.
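
The `None` lines in the output above are not errors: `make_sentence` returns `None` when it cannot build a sentence that passes its checks within its allotted attempts. One way to filter them out is sketched below. `collect_sentences` is a hypothetical helper, not part of markovify, and it takes any zero-argument callable so it can be tried without a trained model.

```python
def collect_sentences(make, n, max_attempts=50):
    """Call `make()` until `n` non-None results are collected or attempts run out."""
    sentences = []
    for _ in range(max_attempts):
        if len(sentences) == n:
            break
        sentence = make()
        if sentence is not None:
            sentences.append(sentence)
    return sentences


# With the model built in the cell above, this could be used as:
# collect_sentences(text_model.make_sentence, 5)
```

`make_sentence` also accepts a `tries` keyword that raises the number of internal attempts, which may reduce the number of `None` results without an outer loop.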


In [16]:
corpus1 = corpus
corpus2 = '../data/lyrics.txt'

In [17]:
with open(corpus2) as f:
    text2 = f.read()

# Build the model.
text_model2 = markovify.Text(text2, state_size=3)

In [18]:
combined = markovify.combine([text_model, text_model2], [1, 50])

In [38]:
combined.make_short_sentence(128)

'Chowder for breakfast, and chowder for supper, till you began to look for it when it shall have ascended again.'
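
The weights `[1, 50]` passed to `markovify.combine` control how much each model contributes. The sketch below illustrates the general idea of scaling each model's transition counts by its weight before adding them. It is a simplified stand-in under that assumption, not markovify's source; `combine_counts` and the toy counts are made up for illustration.

```python
from collections import Counter


def combine_counts(models, weights):
    """Merge per-state next-word counts, scaling each model's counts by its weight."""
    combined = {}
    for model, weight in zip(models, weights):
        for state, followers in model.items():
            bucket = combined.setdefault(state, Counter())
            for word, count in followers.items():
                bucket[word] += count * weight
    return combined


# Toy single-state models (made-up counts, not taken from the corpora).
novel = {("of", "the"): {"whale": 3, "sea": 1}}
lyrics = {("of", "the"): {"night": 2, "sea": 1}}

merged = combine_counts([novel, lyrics], [1, 50])
print(merged[("of", "the")])
```

With a weight of 50, the second model's followers dominate any state the two models share, while states that occur only in the first model are unaffected.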

In [34]:
# Run this several times to see how the sentence changes.
# Try different starts. Only starts the model recognises will work, so try
# borrowing one from a short sentence generated in the cell above.
text_model.make_sentence_with_start('For besides', strict=False)

'For besides the great length of the harpoon as compared with the lance, yet it is in the head, as in some part of his ivory leg.'

In [64]:
start = ('For', 'besides', 'the')
list(combined.chain.walk(init_state=start))

['great',
 'length',
 'of',
 'the',
 'entire',
 'whale',
 'fleet',
 'carefully',
 'collated,',
 'then',
 'the',
 'migrations',
 'of',
 'the',
 'sperm',
 'whale’s',
 'resorting',
 'to',
 'given',
 'waters,',
 'that',
 'many',
 'hunters',
 'believe',
 'that,',
 'could',
 'he',
 'be',
 'found,',
 'would',
 'seem',
 'the',
 'very',
 'man',
 'to',
 'dart',
 'his',
 'iron',
 'and',
 'lift',
 'his',
 'lance',
 'against',
 'the',
 'most',
 'appalling',
 'to',
 'mankind.']
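
The walk above starts from a three-word state and keeps sampling a next word until the chain reaches an end marker. A minimal sketch of that loop follows, assuming a model stored as a dict that maps state tuples to follower counts. The `walk` function and the toy model here are illustrative, not markovify's implementation.

```python
import random

BEGIN = "___BEGIN__"
END = "___END__"


def walk(model, state_size, init_state=None, rng=random):
    """Yield words by repeatedly sampling a follower of the current state."""
    state = init_state or (BEGIN,) * state_size
    while True:
        followers = model[state]
        words = list(followers)
        counts = list(followers.values())
        # Sample the next word in proportion to how often it followed this state.
        word = rng.choices(words, weights=counts, k=1)[0]
        if word == END:
            return
        yield word
        # Slide the window: drop the oldest word, append the new one.
        state = state[1:] + (word,)


# Hand-written toy model with one follower per state, so the walk is deterministic.
toy = {
    (BEGIN, BEGIN): {"Call": 1},
    (BEGIN, "Call"): {"me": 1},
    ("Call", "me"): {"Ishmael.": 1},
    ("me", "Ishmael."): {END: 1},
}
print(list(walk(toy, state_size=2)))
print(list(walk(toy, state_size=2, init_state=("Call", "me"))))
```

As in the cell above, the words of `init_state` are not part of the result; only the words generated after it are yielded.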

In [46]:
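# Sentence-boundary markers, as markovify appears to define them internally.
BEGIN = "___BEGIN__"
END = "___END__"
# With no init_state, a walk starts from a state made only of BEGIN markers
# (three of them here, matching state_size=3).
init_state = None
state = init_state or (BEGIN,) * 3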

In [49]:
combined.chain.begin_choices[:10]

('\ufeff',
 'By',
 'CHAPTER',
 'Loomings.',
 'The',
 'Breakfast.',
 'A',
 'Nightgown.',
 'Biographical.',
 'Wheelbarrow.')
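
The `begin_choices` above are the words that can follow a state made only of `BEGIN` markers. To show where such states come from, here is a rough sketch of building a count model from sentences. `build_model` is a hypothetical helper, and splitting on whitespace is a simplification of whatever tokenisation the library actually performs.

```python
BEGIN = "___BEGIN__"
END = "___END__"


def build_model(sentences, state_size):
    """Count which word follows each `state_size`-word state."""
    model = {}
    for sentence in sentences:
        # Pad the front so the first real word has a full state before it.
        words = [BEGIN] * state_size + sentence.split() + [END]
        for i in range(len(words) - state_size):
            state = tuple(words[i:i + state_size])
            follower = words[i + state_size]
            followers = model.setdefault(state, {})
            followers[follower] = followers.get(follower, 0) + 1
    return model


toy = build_model(["Call me Ishmael.", "Call me a whale."], state_size=2)
print(toy[(BEGIN, BEGIN)])   # words that can open a sentence
print(toy[("Call", "me")])   # words seen after "Call me"
```

A larger `state_size` makes each state more specific, so generated text stays closer to the corpus but has fewer alternatives at each step.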

In [66]:
state = ('length', 'of', 'the')
options = combined.chain.model[state].items()
print(options)

dict_items([('whaling', 1), ('Greenland', 1), ('boat,', 1), ('upper', 1), ('entire', 1), ('creature,', 1), ('Mediterranean,', 1), ('harpoon', 1)])


In [77]:
# What is the likelihood of any one of these words following the state?
# Every option has a count of 1, so each is equally likely.
1 / sum(dict(options).values())

0.125
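
The same counts can be used to estimate how well a sentence matches the corpus, which is one of the goals stated at the top of this exercise. The sketch below averages the log-probability of each word given the preceding state. Both functions and the `unseen` floor are assumptions made for illustration; markovify does not provide this scoring.

```python
import math


def next_word_probabilities(followers):
    """Turn raw follower counts into probabilities that sum to 1."""
    total = sum(followers.values())
    return {word: count / total for word, count in followers.items()}


def average_log_prob(model, words, state_size, unseen=1e-6):
    """Mean log-probability of each word given the `state_size` words before it.

    Transitions missing from the model get the small `unseen` probability
    (an arbitrary floor chosen for this sketch).
    """
    log_probs = []
    for i in range(state_size, len(words)):
        state = tuple(words[i - state_size:i])
        probs = next_word_probabilities(model.get(state, {}))
        log_probs.append(math.log(probs.get(words[i], unseen)))
    return sum(log_probs) / len(log_probs) if log_probs else float("-inf")


# Counts copied from the `options` output above.
model = {
    ("length", "of", "the"): {
        "whaling": 1, "Greenland": 1, "boat,": 1, "upper": 1,
        "entire": 1, "creature,": 1, "Mediterranean,": 1, "harpoon": 1,
    }
}
probs = next_word_probabilities(model[("length", "of", "the")])
print(probs["harpoon"])

score_seen = average_log_prob(model, ["length", "of", "the", "harpoon"], 3)
score_unseen = average_log_prob(model, ["length", "of", "the", "piano"], 3)
print(score_seen > score_unseen)
```

A higher (less negative) average suggests the sentence follows transitions the corpus has seen more often. With a real model, the dict in `combined.chain.model` could be passed in place of the toy `model`.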