Building the model

In [16]:
from collections import Counter, defaultdict

class LanguageModel:
    def __init__(self, data, order=4):
        self.order = order
        self.ngrams = defaultdict(Counter)
        pad = '~' * order
        data = pad + data
        ### YOUR CODE HERE
        # For each ngram in data, count all characters following this ngram.
        # For instance, for order = 2 and data = 'abcbcb', self.ngrams should be the following:
        # self.ngrams['~~']['a'] == 1
        # self.ngrams['~a']['b'] == 1
        # self.ngrams['ab']['c'] == 1
        # self.ngrams['bc']['b'] == 2
        # self.ngrams['cb']['c'] == 1
        
        for i in range(len(data) - order):
            ngram = data[i: i + order]
            char = data[i + order]
            self.ngrams[ngram][char] += 1       
        
        ### END YOUR CODE
        self.lm = {history: self.normalize(chars) for history, chars in self.ngrams.items()}
             
        
    
    def normalize(self, counter):
        ### YOUR CODE HERE
        # Normalize the entries of counter so that they sum to 1.
        # For instance, if you have Counter(['a', 'b', 'a', 'a']),
        # you should return the following list:
        # [('a', 0.75), ('b', 0.25)]

        total = sum(counter.values())
        return [(char, count / total) for char, count in counter.items()]

        ### END YOUR CODE
    
    def __getitem__(self, history):
        return self.lm[history]

Simple tests:

In [17]:
lm = LanguageModel('abcabdabc', order=2)

In [18]:
lm['ab']

[('c', 0.6666666666666666), ('d', 0.3333333333333333)]
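
The probabilities above can be checked by hand. The following is a minimal sketch that recounts the characters following 'ab' in the padded test string without using the LanguageModel class; the names `following` and `probs` are introduced only for this illustration.

```python
from collections import Counter

order = 2
data = '~' * order + 'abcabdabc'   # same padding convention as LanguageModel

# Collect every character that directly follows the history 'ab'.
following = Counter(
    data[i + order]
    for i in range(len(data) - order)
    if data[i:i + order] == 'ab'
)

# Turn the raw counts into relative frequencies.
total = sum(following.values())
probs = {char: count / total for char, count in following.items()}
```

Here 'ab' is followed by 'c' twice and by 'd' once, which gives the 2/3 and 1/3 shown in the output.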

A model trained on Shakespeare

(The model was trained on the shortened corpus shakespeare_short, because there was not enough memory for the full one.)
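
If memory is the constraint, one option is to count n-grams while streaming the file line by line instead of reading it all at once. The sketch below assumes the memory goes into holding the full text and its padded copy; this has not been measured, and if the count dictionary itself dominates, streaming will not help. `count_ngrams_streaming` is a hypothetical helper, and `io.StringIO` stands in for an open text file.

```python
import io
from collections import Counter, defaultdict

def count_ngrams_streaming(lines, order=4):
    """Count next-character frequencies while keeping only one line in memory."""
    ngrams = defaultdict(Counter)
    history = '~' * order              # same padding convention as LanguageModel
    for line in lines:                 # a file object yields one line at a time
        for char in line:
            ngrams[history][char] += 1
            # Slide the window: append the new character, drop the oldest one.
            history = (history + char)[1:]
    return ngrams

# Hypothetical in-memory stand-in for an open file.
fake_file = io.StringIO('abc\nabd\n')
counts = count_ngrams_streaming(fake_file, order=2)
```

The resulting counts have the same shape as self.ngrams, so they could be normalized in the same way.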

In [27]:
with open('shakespeare_short.txt', 'r') as fin:
    lm = LanguageModel(fin.read())

In [28]:
lm['ello']

[('r', 0.05), ('u', 0.05), ('w', 0.9)]

In [29]:
lm['firs']

[('t', 1.0)]

Text generation

In [22]:
import numpy

def generate_letter(lm, history):
    history = history[-lm.order:]
    ### YOUR CODE HERE
    # Generate the next character according to the history.
    # Don't forget to use your probabilities!
    # Sample the next letter according to your probability distribution.
    
    candidates = lm[history]
    chars = [char for char, _ in candidates]
    probs = [prob for _, prob in candidates]

    # Draw one character from chars with the given probabilities.
    return str(numpy.random.choice(chars, p=probs))

    ### END YOUR CODE
        
def generate_text(lm, n_letters=1000):
    history = '~' * lm.order
    out = []
    ### YOUR CODE HERE
    # Generate random text and stash it into out variable.

    for _ in range(n_letters):
        char = generate_letter(lm, history)
        out.append(char)
        # Slide the window: drop the oldest character, append the new one.
        history = history[1:] + char
    
    ### END YOUR CODE
    return ''.join(out)
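
generate_letter looks up lm[history] directly, so it can raise KeyError if the generated text reaches a history that never appears as a key, for example the final characters of the corpus. The following is a minimal sketch of the same sampling loop using only the standard library (`random.choices`) and stopping early on an unseen history. `sample_next`, `generate`, and the plain-dict table `toy` are hypothetical names for this illustration, not part of the assignment code.

```python
import random

def sample_next(distribution, rng=random):
    """Draw one character from a list of (char, probability) pairs."""
    chars = [char for char, _ in distribution]
    probs = [prob for _, prob in distribution]
    return rng.choices(chars, weights=probs, k=1)[0]

def generate(lm_table, order, n_letters, seed=None):
    """Generate up to n_letters characters from a history -> distribution dict."""
    rng = random.Random(seed)
    history = '~' * order
    out = []
    for _ in range(n_letters):
        distribution = lm_table.get(history)
        if distribution is None:       # unseen history: stop instead of raising
            break
        char = sample_next(distribution, rng)
        out.append(char)
        history = (history + char)[1:]
    return ''.join(out)

# Toy order-1 table in which every history has a single continuation.
toy = {'~': [('a', 1.0)], 'a': [('b', 1.0)], 'b': [('a', 1.0)]}
text = generate(toy, order=1, n_letters=5, seed=0)
```

Stopping early is only one possible policy; backing off to a shorter history would be another.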

Tests:

In [26]:
with open('shakespeare_short.txt', 'r') as fin:
    lm = LanguageModel(fin.read())
    
print(generate_text(lm, 1000))

First Senators, and some charge he is come.

AUFIDIUS:
On fairly.

CORIOLANUS:
Hear Come, you well, which
in ther.

VOLUMNIA:
How? who way will the voluptuously.

MENENIUS:
I have as if I besider ancient often his cause had chievingman:
Therefore thine times
To held; and battles in alike the flouts well.

CORIOLANUS:
Say, more the will not alone sea was 'tis the scents!

MENENIUS:
Master it? Prithere upon him out.

MENENIUS:
Cholesome I am hang 'em in conside only surcease you noble home, get you,
Let me news?

First Senators, led with these this sworn an 't:
Pray your enterrily and I'll party, 'tis these shall's vouch'd, what loves, am a bowl thin.

SICINIUS:
Say you make it, I think 'twas necessity
Than I send me by creater and god: he's good it go about as Coriolanus.

COMINIUS:
This dearth the rive!

MENENIUS:
Nay, your Roman; in love forwards twenty, I prison writes, our body, which doit, Corioli gates
Which in our own you reads must
Confusion on much a rod to these ask'd of your 

In [15]:
with open('shakespeare_short.txt', 'r') as fin:
    lm = LanguageModel(fin.read(), 8)
    
print(generate_text(lm, 2000))

First Citizen:
An 'twere to gird the gods, keep you in free contempt
When he hath said
Which were inshell'd when Marcius home against thy valour. Know thou hast a grim appearance, and shut your loves,
Cog their god: he leads them like a hare.

MARCIUS:

COMINIUS:
What, what, what? his choler
And pass'd him unelected.

BRUTUS:
If it were son
and heir to Mars; set up the blood upon yourselves. What do you two
have not indeed loved
the people,
Permitted by our putting him to the people! Coriolanus; never more
To enter our Rome embraced with all the point of battle;
The one half of my common cry of curs! whose bed, whose meal, and exercise,
Are still and wonder,
When one but the bran.' What says the other.

Third Servingman:
Where is their vulgar station: or veil'd till when
They needs must show them the unaching scars which caused
Our swifter composition.

CORIOLANUS:
I am known to the good horse is mine.

MARCIUS:
'Tis not to call us the tribune.

MENENIUS:
As with a voice as free
As wor

In [25]:
with open('shakespeare_short.txt', 'r') as fin:
    lm = LanguageModel(fin.read(), 16)
    
print(generate_text(lm, 2000))

First Citizen:
Before we proceed any further, hear me speak.

All:
Speak, speak.

First Citizen:
Our business is not unknown to the senate; they have
had inkling this fortnight what we intend to do,
which now we'll show 'em in deeds. They say poor
suitors have strong breaths: they shall know our mind: away!

COMINIUS:
Breathe you, my friends: well fought;
we are come off
Like Romans, neither foolish in our stands,
Nor cowardly in retire: believe me, sirs,
We shall be charged again. Whiles we have struck,
By interims and conveying gusts we have heard
The charges of our friends. Ye Roman gods!
Lead their successes as we wish our own,
That both our powers, with smiling
fronts encountering,
May give you thankful sacrifice.
Thy news?

Messenger:
They lie in view; but have not spoke as yet.

LARTIUS:
So, let the ports be guarded: keep your duties,
As I have set them down. If I do send, dispatch
Those centuries to our aid: the rest will serve
For a short holding: if we lose the field,
We cann
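
The order-16 sample reads like verbatim passages of the corpus, while the order-4 sample invents words. One way to look into this, sketched here as an assumption rather than a measured result, is to compute the share of histories that were followed by only one distinct character in the training text. `deterministic_fraction` is a hypothetical helper, and the toy string stands in for the real corpus.

```python
from collections import Counter, defaultdict

def deterministic_fraction(text, order):
    """Share of histories that have exactly one observed continuation."""
    ngrams = defaultdict(Counter)
    data = '~' * order + text          # same padding convention as LanguageModel
    for i in range(len(data) - order):
        ngrams[data[i:i + order]][data[i + order]] += 1
    if not ngrams:                     # empty text: nothing to measure
        return 0.0
    single = sum(1 for chars in ngrams.values() if len(chars) == 1)
    return single / len(ngrams)

# Toy stand-in for the corpus; replace with the real text to compare orders.
sample = 'abcabdabc'
fractions = {order: deterministic_fraction(sample, order) for order in (2, 4)}
```

A fraction close to 1 means that sampling mostly has no choice and replays the training text.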