# <center>Other NLP Packages: spaCy and Gensim</center>

References: 
- https://nlpforhackers.io/complete-guide-to-spacy/
- https://radimrehurek.com/gensim/models/phrases.html

## 1. spaCy
- spaCy is a relatively new framework in the Python Natural Language Processing, but is getting popular
- Provides models for Part Of Speech tagging, Named Entity Recognition and Dependency Parsing
- Supports 8 languages out of the box
- Provides easy and beautiful visualizations
- PProvides pretrained word vectors
- installation:
  1. pip install spacy
  2. python -m spacy download en

In [1]:
# Exercise 1.1. Load package and language library

import spacy
nlp = spacy.load('en')


  return f(*args, **kwds)
  return f(*args, **kwds)
  return f(*args, **kwds)
  return f(*args, **kwds)


In [2]:
# Exercise 1.2. Get POS, lemmatization, and other NLP tasks all in one task

doc = nlp("Next week I'll be in Madrid.")
for token in doc:
    print("{0}\t{1}\t{2}\t{3}\t{4}\t{5}".format(
        token.text,         # original text
        token.lemma_,       # lemma
        token.is_punct,     # is it a punctuation ?
        token.is_space,     # is it a space
        token.pos_,         # The simple part-of-speech tag.
        token.tag_          # The detailed part-of-speech tag
    ))

Next	next	False	False	ADJ	JJ
week	week	False	False	NOUN	NN
I	-PRON-	False	False	PRON	PRP
'll	will	False	False	VERB	MD
be	be	False	False	VERB	VB
in	in	False	False	ADP	IN
Madrid	madrid	False	False	PROPN	NNP
.	.	True	False	PUNCT	.


In [3]:
# Exercise 1.3. Segment by sentences

doc = nlp("These are apples. These are oranges.")
 
for sent in doc.sents:
    print(sent)

These are apples.
These are oranges.


In [4]:
# Exercise 1.4. Entity Recognition

doc = nlp("Next week I'll be in Madrid.")
for ent in doc.ents:
    print(ent.text, ent.label_)

Next week DATE
Madrid GPE


In [5]:
# Exercise 1.5. Visulaize named entities

from spacy import displacy
 
doc = nlp('I just bought 2 shares at 9 a.m. because the stock went up 30% in just 2 days according to the WSJ')
displacy.render(doc, style='ent', jupyter=True)


In [6]:
# Exercise 1.6. Visualized dependency graph

from spacy import displacy
 
doc = nlp('Wall Street Journal just published an interesting piece on crypto currencies')
displacy.render(doc, style='dep', jupyter=True, options={'distance': 90})
 

## 2. gensim
- Gensim is an open source Python library for NLP, with a focus on topic modeling.
- It is not an everything-including-the-kitchen-sink NLP research library (like NLTK); instead, Gensim is a mature, focused, and efficient suite of NLP tools for topic modeling, including 
  - Word2Vec word embedding 
  - Topic modeling
  - Text preprocessing like **phrase extraction**
  
- Gensim Phrase Model: 
    - **gensim.models.phrases.Phrases(sentences, min_count, threshold, max_vocab_size, delimiter, scoring, ...)**
        - *sentences*: list of sentences or iterables, each of which can be a document
        - *min_count*: Ignore all words and bigrams with total collected count lower than this value.
        - *threshold*: Represent a score threshold for forming the phrases (higher means fewer phrases). A phrase of words $a$ followed by $b$ is accepted if the score of the phrase is greater than threshold. Heavily depends on concrete scoring-function.
        - *max_vocab_size*: Maximum size (number of tokens) of the vocabulary. 
        - *delimiter*: Glue character used to join collocation tokens, should be a byte string (e.g. b’\_’).
        - *scoring: Specify how potential phrases are scored. 
           - **default** - original_scorer(), by Mikolov et al. (2013) (https://arxiv.org/pdf/1310.4546.pdf)
           - **npmi** - npmi_scorer().

In [7]:
# Exercise 2.1. Find bigrams using gensim

import nltk
from nltk.collocations import *

from gensim.models.phrases import Phrases, Phraser

# load a built-in NLTK corpus as a list of words
words=nltk.corpus.gutenberg.words('austen-sense.txt')

# Train phrase model to find phrases using original_scorer
phrases = Phrases([words], min_count=5, threshold=10)

for phrase, score in phrases.export_phrases([words]):
    print(phrase, score)




b'. Their' 11.64303411473223
b'at Norland' 20.021396132777753
b'good opinion' 23.366226812009945
b'. The' 12.143253358179983
b'many years' 38.16597251145356
b'. But' 10.316775695849927
b'his own' 11.49898631823725
b'a great' 10.23511614384689
b'Mr .' 14.937097024945233
b'. In' 10.79970083290838
b"' s" 68.69154667529526
b'. His' 11.740059399021664
b'. The' 12.143253358179983
b'Mr .' 14.937097024945233
b'Mrs .' 15.194818559392427
b'Mr .' 14.937097024945233
b'. The' 12.143253358179983
b'young man' 98.03578592634197
b'had been' 16.907632048034525
b'his own' 11.49898631823725
b'soon afterwards' 102.84680134680134
b"' s" 68.69154667529526
b'. Their' 11.64303411473223
b'thousand pounds' 792.5661650233524
b'his own' 11.49898631823725
b"' s" 68.69154667529526
b'. The' 12.143253358179983
b'almost every' 18.349931582284817
b'. He' 10.296342287077485
b';-- but' 14.820204470629008
b'Mr .' 14.937097024945233
b'his wife' 29.14836582879698
b';-- but' 14.820204470629008
b"' s" 68.69154667529526
b'four 

b'the world' 14.291366936528226
b'." "' 23.798926347315316
b'It is' 21.756054131054128
b'," replied' 60.498118439294906
b'Mr .' 14.937097024945233
b"' s" 68.69154667529526
b"' s" 68.69154667529526
b"' s" 68.69154667529526
b'such a' 10.026163945754513
b'no means' 32.943809318377916
b"' s" 68.69154667529526
b'." "' 23.798926347315316
b'. They' 10.416634521313767
b'no more' 10.791937535330694
b'my own' 12.273759216721023
b'any thing' 47.53833113318974
b'may be' 13.850007934888572
b'." "' 23.798926347315316
b'I believe' 14.385958420238175
b'will be' 10.446974911790807
b'will be' 10.446974911790807
b'." "' 23.798926347315316
b'I am' 24.613436090781402
b'. The' 12.143253358179983
b'I dare' 26.250568307828786
b'they are' 19.116426068090785
b'my life' 17.204424557231366
b'my dear' 50.85915516408864
b'Mr .' 14.937097024945233
b'- law' 130.40257855191257
b'her daughters' 11.145958766648423
b'thousand pounds' 792.5661650233524
b'thousand pounds' 792.5661650233524
b'- piece' 15.17411823149528
b'- 

b'the whole' 10.788126015398744
b'well -' 12.234629359425513
b'. His' 11.740059399021664
b'can hardly' 27.37460791635549
b'at least' 53.05669975186104
b'?" "' 25.519811208998124
b'I shall' 14.622661181701638
b'do not' 10.728115801445009
b'tell me' 10.323494620627498
b'I shall' 14.622661181701638
b'no more' 10.791937535330694
b'had been' 16.907632048034525
b'. She' 11.556388279459801
b'. She' 11.556388279459801
b'; but' 15.937227835206098
b"' s" 68.69154667529526
b'. She' 11.556388279459801
b'her mother' 10.595141036901262
b'. She' 11.556388279459801
b'her sister' 12.36134256833083
b'do not' 10.728115801445009
b'," said' 80.0495254815154
b'burst forth' 361.48520710059177
b'- hearted' 50.074590163934424
b'Oh !' 75.72091101585498
b'worse than' 19.96437908496732
b'- hearted' 50.074590163934424
b'could not' 13.843658485566868
b'," said' 80.0495254815154
b'my own' 12.273759216721023
b'affection for' 10.653293806290135
b'. But' 10.316775695849927
b'I am' 24.613436090781402
b'no means' 32.9438

b'next day' 86.15397435897435
b'. They' 10.416634521313767
b'so much' 20.354581853297116
b'at Barton' 25.54897259319151
b'Lady Middleton' 544.4830659536542
b'more than' 31.765982484947997
b'. Her' 11.794664326458975
b'her husband' 16.33012563485699
b"' s" 68.69154667529526
b'. But' 10.316775695849927
b'would have' 11.795675902910675
b'well -' 12.234629359425513
b'Sir John' 354.89098213800963
b'Lady Middleton' 544.4830659536542
b'at home' 17.575718344301794
b'. In' 10.79970083290838
b'present case' 20.202050264550262
b'ten minutes' 216.63475177304963
b'his father' 10.820226709174637
b'every body' 84.61357340720222
b'every body' 84.61357340720222
b'Sir John' 354.89098213800963
b'the park' 12.130648363981697
b'next day' 86.15397435897435
b'Barton Park' 187.20429009193055
b'. The' 12.143253358179983
b'had passed' 12.12411685321902
b'at home' 17.575718344301794
b'. The' 12.143253358179983
b'. The' 12.143253358179983
b'Sir John' 354.89098213800963
b"' s" 68.69154667529526
b'the latter' 11.39

b'they were' 26.78341068029684
b'more than' 31.765982484947997
b'. They' 10.416634521313767
b'set off' 64.9558745348219
b'at first' 12.7904544044665
b'a few' 20.97065049487957
b'. He' 10.296342287077485
b'. She' 11.556388279459801
b'had been' 16.907632048034525
b'able to' 12.747857119926376
b'. The' 12.143253358179983
b'had been' 16.907632048034525
b'her mother' 10.595141036901262
b'fixed on' 26.358296351084814
b'Had he' 10.921296089385475
b'Mrs .' 15.194818559392427
b'would have' 11.795675902910675
b'; but' 15.937227835206098
b'. She' 11.556388279459801
b'. But' 10.316775695849927
b'Mrs .' 15.194818559392427
b'. His' 11.740059399021664
b'he replied' 10.707153028809289
b'at Allenham' 12.632547559966914
b'Miss Dashwood' 75.7576884920635
b'. The' 12.143253358179983
b'still more' 21.71738357625311
b'. His' 11.740059399021664
b'more than' 31.765982484947997
b'had seen' 13.509730207872623
b'. But' 10.316775695849927
b'had seen' 13.509730207872623
b'. His' 11.740059399021664
b'. His' 11.7400

b'able to' 12.747857119926376
b'well -' 12.234629359425513
b'well -' 12.234629359425513
b'I believe' 14.385958420238175
b'." "' 23.798926347315316
b'Miss Dashwood' 75.7576884920635
b'," cried' 58.51628352490421
b'You are' 38.352329799107146
b'. But' 10.316775695849927
b'Colonel Brandon' 311.43854367373154
b'he has' 10.921296089385473
b'I cannot' 11.988298683531813
b'will be' 10.446974911790807
b'I believe' 14.385958420238175
b'I am' 24.613436090781402
b'Mrs .' 15.194818559392427
b'her daughters' 11.145958766648423
b'into Devonshire' 40.86354515050167
b'at home' 17.575718344301794
b'Sir John' 354.89098213800963
b'had been' 16.907632048034525
b'. The' 12.143253358179983
b'the park' 12.130648363981697
b'. In' 10.79970083290838
b'could not' 13.843658485566868
b'. She' 11.556388279459801
b'or twice' 14.421860245514637
b'self -' 120.86970039570379
b'. But' 10.316775695849927
b'the same' 14.581597777676208
b'Every thing' 41.277702702702705
b'Every thing' 41.277702702702705
b'the park' 12.1306

b'. But' 10.316775695849927
b'- law' 130.40257855191257
b'Colonel Brandon' 311.43854367373154
b'could not' 13.843658485566868
b'. The' 12.143253358179983
b'Sir John' 354.89098213800963
b'at least' 53.05669975186104
b'. They' 10.416634521313767
b'a great' 10.23511614384689
b"' s" 68.69154667529526
b'every thing' 62.202410720970285
b'every day' 27.076343490304712
b'Mrs .' 15.194818559392427
b'had already' 12.86640972178345
b'at home' 17.575718344301794
b'. She' 11.556388279459801
b'; but' 15.937227835206098
b'still more' 21.71738357625311
b'did not' 26.994208781067677
b"o '" 42.575834131893025
b'the whole' 10.788126015398744
b'the park' 12.130648363981697
b'they were' 26.78341068029684
b'. The' 12.143253358179983
b'. They' 10.416634521313767
b'rather than' 11.623097412480973
b'they were' 26.78341068029684
b'Colonel Brandon' 311.43854367373154
b'" What' 10.141268260292165
b'?" said' 10.836733245822543
b'Sir John' 354.89098213800963
b'I hope' 12.964685571385962
b'he has' 10.921296089385473

b'" Perhaps' 13.521691013722885
b'ill -' 73.44273224043715
b'; but' 15.937227835206098
b'Mr .' 14.937097024945233
b'I assure' 11.431699101796408
b'There is' 15.73428914835165
b'up stairs' 57.93361782835467
b'It is' 21.756054131054128
b'did not' 26.994208781067677
b'hundred pounds' 70.95354239256679
b'would have' 11.795675902910675
b'Colonel Brandon' 311.43854367373154
b"' s" 68.69154667529526
b'the park' 12.130648363981697
b'Mrs .' 15.194818559392427
b'or three' 23.07497639282342
b'a great' 10.23511614384689
b'must be' 12.248569741413643
b'their acquaintance' 10.308281587473001
b'. She' 11.556388279459801
b'must be' 12.248569741413643
b'must be' 12.248569741413643
b'I am' 24.613436090781402
b'," said' 80.0495254815154
b'I am' 24.613436090781402
b'may be' 13.850007934888572
b'. The' 12.143253358179983
b'at Delaford' 12.632547559966916
b'more than' 31.765982484947997
b'two thousand' 71.71357300073367
b'have been' 26.323428523149712
b'Miss Williams' 41.9581043956044
b'I dare' 26.250568307

b'at present' 15.339522037102682
b'. But' 10.316775695849927
b'." "' 23.798926347315316
b'my dear' 50.85915516408864
b'have been' 26.323428523149712
b'every day' 27.076343490304712
b'." "' 23.798926347315316
b'," said' 80.0495254815154
b'." "' 23.798926347315316
b'I am' 24.613436090781402
b'." "' 23.798926347315316
b'has been' 28.46285227272727
b'." "' 23.798926347315316
b'at least' 53.05669975186104
b'each other' 149.81758158837954
b'? How' 21.612382075471697
b'such a' 10.026163945754513
b'? How' 21.612382075471697
b'be supposed' 11.703256704980843
b'must be' 12.248569741413643
b'your sister' 34.961227951846624
b"' s" 68.69154667529526
b'?" "' 25.519811208998124
b'I confess' 10.161510312707916
b'," replied' 60.498118439294906
b'; but' 15.937227835206098
b'." "' 23.798926347315316
b'You must' 13.685259856630825
b'between them' 22.03860028860029
b'they are' 19.116426068090785
b'your sister' 34.961227951846624
b'Do you' 11.782256509161043
b'?" "' 25.519811208998124
b'I cannot' 11.9882986

b'. His' 11.740059399021664
b'. He' 10.296342287077485
b'; but' 15.937227835206098
b'. The' 12.143253358179983
b'Mrs .' 15.194818559392427
b'sat down' 304.15149359886203
b'" What' 10.141268260292165
b'Mrs .' 15.194818559392427
b"' s" 68.69154667529526
b'at present' 15.339522037102682
b'?" said' 10.836733245822543
b'a great' 10.23511614384689
b'in spite' 25.66848739495798
b'?" "' 25.519811208998124
b'I hope' 12.964685571385962
b'convinced that' 12.155296341433083
b'no more' 10.791937535330694
b'!" "' 10.05152252347542
b'must be' 12.248569741413643
b'affection for' 10.653293806290135
b'." "' 23.798926347315316
b'I shall' 14.622661181701638
b'I cannot' 11.988298683531813
b'." "' 23.798926347315316
b'." "' 23.798926347315316
b'the world' 14.291366936528226
b'I believe' 14.385958420238175
b'every body' 84.61357340720222
b'; but' 15.937227835206098
b'every body' 84.61357340720222
b'must be' 12.248569741413643
b'my own' 12.273759216721023
b'." "' 23.798926347315316
b'!" cried' 86.997660461804

b'have been' 26.323428523149712
b'tell you' 14.36860549897688
b'?" "' 25.519811208998124
b'you mean' 11.221196675391468
b'?" "' 25.519811208998124
b'tell you' 14.36860549897688
b'." "' 23.798926347315316
b'." "' 23.798926347315316
b'Mr .' 14.937097024945233
b'could not' 13.843658485566868
b'a moment' 12.38390265849884
b"' s" 68.69154667529526
b'" Oh' 23.007056351707597
b'! How' 11.965332636939348
b'I hope' 12.964685571385962
b'I am' 24.613436090781402
b'." "' 23.798926347315316
b'do not' 10.728115801445009
b'," replied' 60.498118439294906
b'Mr .' 14.937097024945233
b'a week' 10.55385678500475
b'Mrs .' 15.194818559392427
b'; but' 15.937227835206098
b'self -' 120.86970039570379
b'be gone' 12.391683569979715
b'. His' 11.740059399021664
b'or three' 23.07497639282342
b'be gone' 12.391683569979715
b'. He' 10.296342287077485
b'. He' 10.296342287077485
b'at Norland' 20.021396132777753
b'in town' 15.669716142270861
b'; but' 15.937227835206098
b'. He' 10.296342287077485
b'any thing' 47.538331133

b'. He' 10.296342287077485
b'Sir John' 354.89098213800963
b'had been' 16.907632048034525
b'next day' 86.15397435897435
b'the park' 12.130648363981697
b'Mrs .' 15.194818559392427
b'did not' 26.994208781067677
b'her own' 10.642128527196602
b'her daughters' 11.145958766648423
b'. But' 10.316775695849927
b'Mr .' 14.937097024945233
b'Mrs .' 15.194818559392427
b'any other' 16.886695967050887
b'. They' 10.416634521313767
b'. But' 10.316775695849927
b'Sir John' 354.89098213800963
b'Lady Middleton' 544.4830659536542
b'did not' 26.994208781067677
b'Mrs .' 15.194818559392427
b'Mrs .' 15.194818559392427
b'young ladies' 197.70550161812295
b'?" said' 10.836733245822543
b'as soon' 14.39331825464141
b'they were' 26.78341068029684
b'; but' 15.937227835206098
b'we are' 23.784390573089702
b'the park' 12.130648363981697
b'." "' 23.798926347315316
b'," said' 80.0495254815154
b'a few' 20.97065049487957
b'. The' 12.143253358179983
b'Miss Dashwoods' 146.8533653846154
b'drawing -' 71.53512880562062
b'the park'

b'Upon my' 44.349183303085304
b'Colonel Brandon' 311.43854367373154
b'told me' 31.122299959244668
b'." "' 23.798926347315316
b'very much' 13.844593637574008
b'Colonel Brandon' 311.43854367373154
b'tell you' 14.36860549897688
b'must be' 12.248569741413643
b'person who' 13.426593406593406
b'could not' 13.843658485566868
b'Colonel Brandon' 311.43854367373154
b'." "' 23.798926347315316
b'assure you' 36.81955159112825
b'tell you' 14.36860549897688
b'there is' 12.446901782428098
b'they are' 19.116426068090785
b'very pretty' 20.694783197831978
b'Mr .' 14.937097024945233
b'Combe Magna' 2082.647727272727
b'have been' 26.323428523149712
b'?" "' 25.519811208998124
b'did not' 26.994208781067677
b'; but' 15.937227835206098
b'will be' 10.446974911790807
b'?" "' 25.519811208998124
b'Mr .' 14.937097024945233
b'very well' 20.15303494657983
b'I hope' 12.964685571385962
b'?" "' 25.519811208998124
b'Oh !' 75.72091101585498
b'full of' 10.281795231416549
b'nothing but' 12.038821558774263
b'." "' 23.79892634

b'the whole' 10.788126015398744
b'. She' 11.556388279459801
b'Lady Middleton' 544.4830659536542
b'less than' 14.141435185185184
b'Sir John' 354.89098213800963
b'," cried' 58.51628352490421
b'Miss Dashwood' 75.7576884920635
b"' s" 68.69154667529526
b'without any' 19.286384341316015
b'. She' 11.556388279459801
b'my life' 17.204424557231366
b'I am' 24.613436090781402
b'." "' 23.798926347315316
b'," said' 80.0495254815154
b'this morning' 16.988598442714128
b'." "' 23.798926347315316
b'," said' 80.0495254815154
b'too much' 30.57040922232931
b'may be' 13.850007934888572
b'; but' 15.937227835206098
b'Lady Middleton' 544.4830659536542
b'full of' 10.281795231416549
b'I cannot' 11.988298683531813
b'they are' 19.116426068090785
b'." "' 23.798926347315316
b'I confess' 10.161510312707916
b'," replied' 60.498118439294906
b'I am' 24.613436090781402
b'at Barton' 25.54897259319151
b'Miss Steele' 170.04073886639674
b'very much' 13.844593637574008
b'Miss Dashwood' 75.7576884920635
b'I suppose' 14.9651333

b'a few' 20.97065049487957
b'. Her' 11.794664326458975
b'at first' 12.7904544044665
b'; but' 15.937227835206098
b'?" "' 25.519811208998124
b'have been' 26.323428523149712
b'four years' 36.66926770708283
b'." "' 23.798926347315316
b'!" "' 10.05152252347542
b'did not' 26.994208781067677
b'," said' 80.0495254815154
b'." "' 23.798926347315316
b'many years' 38.16597251145356
b'. He' 10.296342287077485
b"' s" 68.69154667529526
b'you know' 13.831344597710789
b'." "' 23.798926347315316
b'!" "' 10.05152252347542
b'Mr .' b"' s" 68.69154667529526
b'. But' 10.316775695849927
b"' s" 68.69154667529526
b"' s" 68.69154667529526
b'her own' 10.642128527196602
b'affection for' 10.653293806290135
b'could not' 13.843658485566868
b'had already' 12.86640972178345
b'did not' 26.994208781067677
b'her own' 10.642128527196602
b'. But' 10.316775695849927
b'an opportunity' 31.529211395540873
b'at least' 53.05669975186104
b'the park' 12.130648363981697
b'could not' 13.843658485566868
b'be supposed' 11.7032567049808

b'They were' 27.959267734553777
b'I believe' 14.385958420238175
b'put an' 32.8490388493077
b'at once' 22.1069582299421
b'. But' 10.316775695849927
b'Miss Dashwood' 75.7576884920635
b'?" "' 25.519811208998124
b'," answered' 12.85585016835017
b'such a' 10.026163945754513
b'You know' 10.375509510869565
b'very well' 20.15303494657983
b'would have' 11.795675902910675
b'." "' 23.798926347315316
b'," replied' 60.498118439294906
b'put an' 32.8490388493077
b'Edward Ferrars' 12.65273093082431
b'will be' 10.446974911790807
b"Edward '" 10.034571594062571
b'too much' 30.57040922232931
b'," said' 80.0495254815154
b'be supposed' 11.703256704980843
b'your own' 13.642619394493657
b'no answer' 22.76117734724292
b'each other' 149.81758158837954
b'in town' 15.669716142270861
b'Miss Dashwood' 75.7576884920635
b'?" said' 10.836733245822543
b'" Certainly' 23.66295927401505
b'." "' 23.798926347315316
b'I am' 24.613436090781402
b'would have' 11.795675902910675
b'! But' 10.971671795117395
b'I dare' 26.250568307

b'in town' 15.669716142270861
b"' s" 68.69154667529526
b'be gone' 12.391683569979715
b'her own' 10.642128527196602
b'her sister' 12.36134256833083
b'her sister' 12.36134256833083
b'. They' 10.416634521313767
b"' s" 68.69154667529526
b'Mrs .' 15.194818559392427
b'. She' 11.556388279459801
b'her own' 10.642128527196602
b'her sister' 12.36134256833083
b'Mrs .' 15.194818559392427
b'Mrs .' 15.194818559392427
b'could not' 13.843658485566868
b'. They' 10.416634521313767
b"o '" 42.575834131893025
b'such a' 10.026163945754513
b'. The' 12.143253358179983
b'young ladies' 197.70550161812295
b"' s" 68.69154667529526
b'a great' 10.23511614384689
b'in town' 15.669716142270861
b'less than' 14.141435185185184
b'two hours' 11.087295825771324
b'their arrival' 26.389200863930885
b'her mother' 10.595141036901262
b'sat down' 304.15149359886203
b'. In' 10.79970083290838
b'a few' 20.97065049487957
b'the same' 14.581597777676208
b'I am' 24.613436090781402
b'," said' 80.0495254815154
b'or two' 25.06419849565302

b'Mrs .' 15.194818559392427
b"' s" 68.69154667529526
b'her sister' 12.36134256833083
b'more than' 31.765982484947997
b'any thing' 47.53833113318974
b'. She' 11.556388279459801
b'next morning' 162.0450928381963
b"' s" 68.69154667529526
b'full of' 10.281795231416549
b'in Berkeley' 11.32433267424617
b'; but' 15.937227835206098
b'a moment' 12.38390265849884
b"' s" 68.69154667529526
b'her sister' 12.36134256833083
b'!" cried' 86.9976604618045
b'" No' 21.803726759628155
b"ma '" 58.37546007927519
b'It is' 21.756054131054128
b'Mrs .' 15.194818559392427
b'!" "' 10.05152252347542
b'You are' 38.352329799107146
b'?" said' 10.836733245822543
b'" Yes' 31.550612365353402
b'" You' 16.162646289840637
b'." "' 23.798926347315316
b'!" "' 10.05152252347542
b'." "' 23.798926347315316
b'," answered' 12.85585016835017
b'any thing' 47.53833113318974
b'do not' 10.728115801445009
b'at liberty' 27.92457881676897
b'Mrs .' 15.194818559392427
b'It was' 11.381436768760711
b'Lady Middleton' 544.4830659536542
b'their a

b'Mrs .' 15.194818559392427
b'could not' 13.843658485566868
b'her own' 10.642128527196602
b'could not' 13.843658485566868
b'. Her' 11.794664326458975
b'would have' 11.795675902910675
b'his own' 11.49898631823725
b'have been' 26.323428523149712
b'her sister' 12.36134256833083
b'without any' 19.286384341316015
b'might have' 13.379775798968328
b'might have' 13.379775798968328
b'such a' 10.026163945754513
b'could not' 13.843658485566868
b'still more' 21.71738357625311
b'could not' 13.843658485566868
b'. Her' 11.794664326458975
b'. But' 10.316775695849927
b'such an' 10.905397823869432
b'next day' 86.15397435897435
b'. In' 10.79970083290838
b'a few' 20.97065049487957
b'" No' 21.803726759628155
b'no longer' 44.878170429941235
b'the same' 14.581597777676208
b'It was' 11.381436768760711
b'more than' 31.765982484947997
b'would have' 11.795675902910675
b'still more' 21.71738357625311
b'the world' 14.291366936528226
b'. In' 10.79970083290838
b"' s" 68.69154667529526
b'a moment' 12.38390265849884
b

b'had passed' 12.12411685321902
b'have been' 26.323428523149712
b'; but' 15.937227835206098
b'." "' 23.798926347315316
b'have been' 26.323428523149712
b'?" "' 25.519811208998124
b'the world' 14.291366936528226
b'rather than' 11.623097412480973
b'his own' 11.49898631823725
b'capable of' 11.424216923796166
b'your own' 13.642619394493657
b'have been' 26.323428523149712
b'the world' 14.291366936528226
b'have been' 26.323428523149712
b'my dear' 50.85915516408864
b'your own' 13.642619394493657
b'It is' 21.756054131054128
b'." "' 23.798926347315316
b'," cried' 58.51628352490421
b'I am' 24.613436090781402
b'. The' 12.143253358179983
b'may be' 13.850007934888572
b'the world' 14.291366936528226
b'may be' 13.850007934888572
b'I cannot' 11.988298683531813
b'must be' 12.248569741413643
b'they are' 19.116426068090785
b'." "' 23.798926347315316
b"mother '" 10.994631585416172
b's sake' 13.223160173160172
b'more than' 31.765982484947997
b'my own' 12.273759216721023
b'. But' 10.316775695849927
b'; but' 

b'" What' 10.141268260292165
b'?" said' 10.836733245822543
b'!" "' 10.05152252347542
b'has been' 28.46285227272727
b'It is' 21.756054131054128
b'" You' 16.162646289840637
b'I cannot' 11.988298683531813
b'I am' 24.613436090781402
b'the world' 14.291366936528226
b'I cannot' 11.988298683531813
b'dared not' 18.329132913291332
b"' s" 68.69154667529526
b'any other' 16.886695967050887
b'a few' 20.97065049487957
b'the world' 14.291366936528226
b'at least' 53.05669975186104
b'Mrs .' 15.194818559392427
b"' s" 68.69154667529526
b'as soon' 14.39331825464141
b'did not' 26.994208781067677
b'Miss Dashwoods' 146.8533653846154
b'the whole' 10.788126015398744
b'every day' 27.076343490304712
b'every day' 27.076343490304712
b'Conduit Street' 193.93968253968254
b'at least' 53.05669975186104
b'Mrs .' 15.194818559392427
b"' s" 68.69154667529526
b'; but' 15.937227835206098
b'. Their' 11.64303411473223
b'Lady Middleton' 544.4830659536542
b'Miss Steeles' 234.96538461538464
b'. They' 10.416634521313767
b'too muc

b'John Dashwood' 47.592560132437434
b'. The' 12.143253358179983
b'Miss Steeles' 234.96538461538464
b'Harley Street' 933.3347222222222
b'Sir John' 354.89098213800963
b'more than' 31.765982484947997
b'they were' 26.78341068029684
b'must be' 12.248569741413643
b'Mrs .' 15.194818559392427
b'so much' 20.354581853297116
b'did not' 26.994208781067677
b'able to' 12.747857119926376
b'Mrs .' 15.194818559392427
b'her mother' 10.595141036901262
b'no longer' 44.878170429941235
b'the whole' 10.788126015398744
b'or twice' 14.421860245514637
b'her own' 10.642128527196602
b'her own' 10.642128527196602
b'Miss Dashwoods' 146.8533653846154
b'in Berkeley' 11.32433267424617
b'Mrs .' 15.194818559392427
b'Mrs .' 15.194818559392427
b'drawing -' 71.53512880562062
b'by saying' 12.752531051038511
b'Lord !' 87.47070755279799
b'my dear' 50.85915516408864
b'Miss Dashwood' 75.7576884920635
b'?" "' 25.519811208998124
b"ma '" 58.37546007927519
b'?" "' 25.519811208998124
b'! But' 10.971671795117395
b'Mr .' 14.9370970249

b'any thing' 47.53833113318974
b'. She' 11.556388279459801
b'so much' 20.354581853297116
b'had been' 16.907632048034525
b'so much' 20.354581853297116
b'had been' 16.907632048034525
b'It was' 11.381436768760711
b'well -' 12.234629359425513
b'very much' 13.844593637574008
b'my heart' 15.714277548337309
b'" What' 10.141268260292165
b'Mrs .' 15.194818559392427
b'had been' 16.907632048034525
b'be supposed' 11.703256704980843
b'such a' 10.026163945754513
b'could not' 13.843658485566868
b'might have' 13.379775798968328
b'at last' 13.508070658182442
b'. He' 10.296342287077485
b'. But' 10.316775695849927
b'I am' 24.613436090781402
b'Mrs .' 15.194818559392427
b'put an' 32.8490388493077
b"' s" 68.69154667529526
b'every thing' 62.202410720970285
b'. His' 11.740059399021664
b'Miss Morton' 172.76866515837105
b'- year' 37.0922890103218
b'. His' 11.740059399021664
b'two thousand' 71.71357300073367
b'so far' 17.73366554585259
b'the smallest' 11.866938616938617
b'!" "' 10.05152252347542
b'," replied' 60

b'know what' 13.457727536231884
b'this morning' 16.988598442714128
b'had been' 16.907632048034525
b'Harley Street' 933.3347222222222
b'had been' 16.907632048034525
b'as soon' 14.39331825464141
b'went away' 55.658709912536445
b"mother '" 10.994631585416172
b'on purpose' 16.944619082840237
b'must be' 12.248569741413643
b'nothing but' 12.038821558774263
b'two thousand' 71.71357300073367
b'any thing' 47.53833113318974
b'nothing but' 12.038821558774263
b'could not' 13.843658485566868
b'put an' 32.8490388493077
b'his own' 11.49898631823725
b'Miss Morton' 172.76866515837105
b'any thing' 47.53833113318974
b'. But' 10.316775695849927
b'a great' 10.23511614384689
b'you know' 13.831344597710789
b"' t" 59.960966402416005
b'you know' 13.831344597710789
b'the world' 14.291366936528226
b'might have' 13.379775798968328
b'very glad' 16.93209534368071
b'you know' 13.831344597710789
b'some time' 32.53381672196961
b'could not' 13.843658485566868
b'tell me' 10.323494620627498
b'Mrs .' 15.194818559392427
b'

b'able to' 12.747857119926376
b'the utmost' 10.548389881723216
b'between them' 22.03860028860029
b'," said' 80.0495254815154
b'Mr .' 14.937097024945233
b'he has' 10.921296089385473
b'young woman' 34.88920616790405
b'he replied' 10.707153028809289
b'each other' 149.81758158837954
b'Mrs .' 15.194818559392427
b'does not' 17.137739273927394
b'know what' 13.457727536231884
b'may be' 13.850007934888572
b'Mr .' 14.937097024945233
b'or three' 23.07497639282342
b'in Harley' 12.032103466386555
b'. He' 10.296342287077485
b'young man' 98.03578592634197
b'short time' 27.617992766726942
b'his own' 11.49898631823725
b'still more' 21.71738357625311
b'I am' 24.613436090781402
b"' s" 68.69154667529526
b'may be' 13.850007934888572
b'It is' 21.756054131054128
b'I believe' 14.385958420238175
b'did not' 26.994208781067677
b'more than' 31.765982484947997
b'capable of' 11.424216923796166
b'such an' 10.905397823869432
b'will be' 10.446974911790807
b"' s" 68.69154667529526
b'have been' 26.323428523149712
b'. Th

b'your own' 13.642619394493657
b'Colonel Brandon' 311.43854367373154
b"' s" 68.69154667529526
b'did not' 26.994208781067677
b'might have' 13.379775798968328
b'such a' 10.026163945754513
b'; but' 15.937227835206098
b'the same' 14.581597777676208
b'short time' 27.617992766726942
b'at last' 13.508070658182442
b'Colonel Brandon' 311.43854367373154
b'your brother' 19.090274643241553
b'. He' 10.296342287077485
b'." "' 23.798926347315316
b'," replied' 60.498118439294906
b'I believe' 14.385958420238175
b'will be' 10.446974911790807
b'no answer' 22.76117734724292
b'; but' 15.937227835206098
 11.002289801379026
b'. In' 10.79970083290838
b'might have' 13.379775798968328
b'. In' 10.79970083290838
b'could not' 13.843658485566868
b'. The' 12.143253358179983
b'next morning' 162.0450928381963
b'. The' 12.143253358179983
b'affection for' 10.653293806290135
b'. But' 10.316775695849927
b'. In' 10.79970083290838
b'my own' 12.273759216721023
b'." "' 23.798926347315316
b'Mr .' 14.937097024945233
b'?" said' 

b'at liberty' 27.92457881676897
b'" Well' 19.040748570344473
b'he replied' 10.707153028809289
b'I shall' 14.622661181701638
b'." "' 23.798926347315316
b'you mean' 11.221196675391468
b'?" "' 25.519811208998124
b'Your sister' 25.486441385064662
b"' s" 68.69154667529526
b'." "' 23.798926347315316
b'You are' 38.352329799107146
b'. She' 11.556388279459801
b'." "' 23.798926347315316
b'will be' 10.446974911790807
b'some time' 32.53381672196961
b'some time' 32.53381672196961
b'too much' 30.57040922232931
b'her sister' 12.36134256833083
b'half an' 12.840987913820284
b'in spite' 25.66848739495798
b'. She' 11.556388279459801
b'. But' 10.316775695849927
b'at last' 13.508070658182442
b"' s" 68.69154667529526
b'. The' 12.143253358179983
b"' s" 68.69154667529526
b"' s" 68.69154667529526
b'her mother' 10.595141036901262
b"' s" 68.69154667529526
b'her sister' 12.36134256833083
b'half an' 12.840987913820284
b"' s" 68.69154667529526
b'her mother' 10.595141036901262
b"' s" 68.69154667529526
b'Mrs .' 15.19

b'I shall' 14.622661181701638
b'a great' 10.23511614384689
b'the same' 14.581597777676208
b'had been' 16.907632048034525
b'self -' 120.86970039570379
b'. Her' 11.794664326458975
b'at least' 53.05669975186104
b'her sister' 12.36134256833083
b"' s" 68.69154667529526
b'. But' 10.316775695849927
b'had been' 16.907632048034525
b'or three' 23.07497639282342
b'at home' 17.575718344301794
b'. But' 10.316775695849927
b'at last' 13.508070658182442
b"' s" 68.69154667529526
b"mother '" 10.994631585416172
b"' s" 68.69154667529526
b'. The' 12.143253358179983
b'so far' 17.73366554585259
b'I am' 24.613436090781402
b'I hope' 12.964685571385962
b'," said' 80.0495254815154
b'have done' 15.993245719671187
b'as far' 10.05384021368982
b'do not' 10.728115801445009
b'my feelings' 12.150461178927477
b'have been' 26.323428523149712
b'they are' 19.116426068090785
b';-- but' 14.820204470629008
b'." "' 23.798926347315316
b'has been' 28.46285227272727
b'has been' 28.46285227272727
b'" How' 13.011438522639004
b'her 

b'?" "' 25.519811208998124
b"ma '" 58.37546007927519
b'." "' 23.798926347315316
b'Do you' 11.782256509161043
b'know where' 12.64824016563147
b'?" "' 25.519811208998124
b'Mrs .' 15.194818559392427
b'told me' 31.122299959244668
b'." "' 23.798926347315316
b'?" "' 25.519811208998124
b"ma '" 58.37546007927519
b'. They' 10.416634521313767
b'back again' 53.40122377622377
b'Mrs .' 15.194818559392427
b'; but' 15.937227835206098
b'better than' 10.878027065527064
b'. She' 11.556388279459801
b'the whole' 10.788126015398744
b'. She' 11.556388279459801
b'low voice' 276.4298642533937
b'her mother' 10.595141036901262
b'they were' 26.78341068029684
b'Mr .' 14.937097024945233
b"' s" 68.69154667529526
b"' s" 68.69154667529526
b'" Did' 11.063201738500544
b'?" "' 25.519811208998124
b"ma '" 58.37546007927519
b'could not' 13.843658485566868
b'." "' 23.798926347315316
b'Mrs .' 15.194818559392427
b'?" "' 25.519811208998124
b"ma '" 58.37546007927519
b'very well' 20.15303494657983
b'young lady' 55.17362835854595

b'; but' 15.937227835206098
b"' s" 68.69154667529526
b'we are' 23.784390573089702
b'ill -' 73.44273224043715
b'am sure' 114.82001105583194
b'will be' 10.446974911790807
b'could not' 13.843658485566868
b'we are' 23.784390573089702
b'a few' 20.97065049487957
b'" Your' 11.93090383563784
b'well -' 12.234629359425513
b'without any' 19.286384341316015
b'," said' 80.0495254815154
b'I believe' 14.385958420238175
b'." "' 23.798926347315316
b'," said' 80.0495254815154
b'they are' 19.116426068090785
b'. The' 12.143253358179983
b'his own' 11.49898631823725
b'- year' 37.0922890103218
b'. She' 11.556388279459801
b'I suppose' 14.965133369624388
b"' s" 68.69154667529526
b'would have' 11.795675902910675
b'." "' 23.798926347315316
b'will be' 10.446974911790807
b'.-- She' 12.837839731017093
b'will be' 10.446974911790807
b'the same' 14.581597777676208
b'at present' 15.339522037102682
b'between them' 22.03860028860029
b'. He' 10.296342287077485
b"' s" 68.69154667529526
b'did not' 26.994208781067677
b'. He'

b'a great' 10.23511614384689
b'at home' 17.575718344301794
b'can tell' 14.466256215960216
b'Mrs .' 15.194818559392427
b'they were' 26.78341068029684
b'his wife' 29.14836582879698
b'had passed' 12.12411685321902
b'. The' 12.143253358179983
b'the latter' 11.392261072261073
b'at first' 12.7904544044665
b'as soon' 14.39331825464141
b'the smallest' 11.866938616938617
b'Mrs .' 15.194818559392427
b'. The' 12.143253358179983
b"' s" 68.69154667529526
b'may be' 13.850007934888572
b'self -' 120.86970039570379
b'may be' 13.850007934888572
b"Bartlett '" 11.530955077387693
b'. He' 10.296342287077485
b'or two' 25.064198495653027
b'. In' 10.79970083290838
b"' s" 68.69154667529526
b'. His' 11.740059399021664
b'any other' 16.886695967050887
b'his own' 11.49898631823725
b'. He' 10.296342287077485
b"mother '" 10.994631585416172
b'. They' 10.416634521313767
b'Mrs .' 15.194818559392427
b"' s" 68.69154667529526
b'. The' 12.143253358179983
b'at first' 12.7904544044665
b'. But' 10.316775695849927
b'self -' 120

In [8]:
# Exercise 2.2. Find bigrams by NPMI

# find phrases using NPMI

phrases = Phrases([words], min_count=5, threshold=0.4, scoring='npmi')

for phrase, score in phrases.export_phrases([words]):
    print(phrase, score)



b'. Their' 0.4110816344570768
b', and' 0.43983091376412053
b'at Norland' 0.4646236206342064
b'good opinion' 0.49700004455618746
b'. The' 0.5112902108414912
b', and' 0.43983091376412053
b'many years' 0.5585483801754227
b'. But' 0.4872164107824855
b'his own' 0.428604982060435
b'a great' 0.4160463029398818
b'Mr .' 0.5349475102611524
b', and' 0.43983091376412053
b'. In' 0.42363682322819096
b', and' 0.43983091376412053
b"' s" 0.9562587141318204
b'. His' 0.4365673921107498
b'. The' 0.5112902108414912
b'Mr .' 0.5349475102611524
b'Mrs .' 0.6388509199633257
b'; and' 0.5300099823509559
b'Mr .' 0.5349475102611524
b'. The' 0.5112902108414912
b'young man' 0.6536836157022572
b'had been' 0.5263227074804739
b', and' 0.43983091376412053
b'his own' 0.428604982060435
b'soon afterwards' 0.6423546487520422
b"' s" 0.9562587141318204
b'. Their' 0.4110816344570768
b', and' 0.43983091376412053
b'thousand pounds' 0.9100024359251441
b'his own' 0.428604982060435
b"' s" 0.9562587141318204
b', and' 0.43983091376412

b', and' 0.43983091376412053
b'will be' 0.43718059461004055
b'to be' 0.41990744680061143
b'," said' 0.8061628790313721
b'her husband' 0.45405887756524654
b'. The' 0.5112902108414912
b'would be' 0.4122234816516538
b'." "' 0.693249204903288
b'." "' 0.693249204903288
b'would be' 0.4122234816516538
b'hundred pounds' 0.6847126446226847
b'would be' 0.4122234816516538
b'" Oh' 0.5061969002074319
b'so much' 0.5109416447650224
b'such a' 0.4502731675622843
b'would not' 0.4243830217611643
b'any thing' 0.6177683619092494
b'too much' 0.5277521915033289
b'at least' 0.6236333254738505
b'I have' 0.4273545238545112
b'can hardly' 0.5026287905578061
b'." "' 0.693249204903288
b'There is' 0.4441070466039112
b'knowing what' 0.4342985549970214
b'," said' 0.8061628790313721
b'we are' 0.48474356661787155
b'." "' 0.693249204903288
b'five hundred' 0.682022745535601
b'- piece' 0.5315236722119306
b'without any' 0.4665443453341391
b'thousand pounds' 0.9100024359251441
b"mother '" 0.41745134524422184
b'young woman' 0

b'did not' 0.5894846130208785
b'ill -' 0.6236923795747431
b'. She' 0.5044940325692563
b'. It' 0.45091216045154525
b'her mother' 0.45557817465871986
b'It is' 0.5103821207462266
b'," said' 0.8061628790313721
b'. It' 0.45091216045154525
b'." "' 0.693249204903288
b'you will' 0.4045135619029827
b'," said' 0.8061628790313721
b'you know' 0.4568852596794373
b'." "' 0.693249204903288
b'!" replied' 0.4016786030091226
b'her mother' 0.45557817465871986
b'." "' 0.693249204903288
b'." "' 0.693249204903288
b'I have' 0.4273545238545112
b'Mrs .' 0.6388509199633257
b'acquainted with' 0.5324281930616914
b'. Her' 0.4450096913002978
b', and' 0.43983091376412053
b'. She' 0.5044940325692563
b'his merits' 0.43174281956608906
b'regard for' 0.4256538040009272
b'; but' 0.5638928643646233
b'really felt' 0.42802499571608876
b'young man' 0.6536836157022572
b"' s" 0.9562587141318204
b'to be' 0.41990744680061143
b'no longer' 0.561821715681069
b'to be' 0.41990744680061143
b', and' 0.43983091376412053
b'a few' 0.510358

b', and' 0.43983091376412053
b'young woman' 0.5384851507000227
b'Mrs .' 0.6388509199633257
b'to be' 0.41990744680061143
b'to be' 0.41990744680061143
b'. She' 0.5044940325692563
b', and' 0.43983091376412053
b'might be' 0.4025483528650405
b'. In' 0.42363682322819096
b'. It' 0.45091216045154525
b'her own' 0.45918368894467154
b'. The' 0.5112902108414912
b', and' 0.43983091376412053
b'. He' 0.46418457637624266
b'; and' 0.5300099823509559
b'. He' 0.46418457637624266
b'Barton Park' 0.7058039192019276
b'his own' 0.428604982060435
b'from whence' 0.5456352897379463
b'the same' 0.49164836691948793
b'. He' 0.46418457637624266
b'the whole' 0.4182235497798612
b'could not' 0.5162635917428708
b'a moment' 0.4322659708717282
b'. She' 0.5044940325692563
b'. Her' 0.4450096913002978
b'. The' 0.5112902108414912
b'so far' 0.45021304971922055
b'a few' 0.5103586742524525
b'would have' 0.4535230220036108
b'no longer' 0.561821715681069
b'- law' 0.6966716174747276
b"' s" 0.9562587141318204
b'; and' 0.530009982350

b'would not' 0.4243830217611643
b'every day' 0.5096224679475897
b'Lady Middleton' 0.9846252550902825
b'Mrs .' 0.6388509199633257
b'as soon' 0.4587326731575739
b'would be' 0.4122234816516538
b'; and' 0.5300099823509559
b'her ladyship' 0.4153375783137343
b'next day' 0.6240426548127025
b'. They' 0.42532703744684097
b'so much' 0.5109416447650224
b'at Barton' 0.5099452777349327
b'; and' 0.5300099823509559
b'Lady Middleton' 0.9846252550902825
b'more than' 0.5843156278981673
b', and' 0.43983091376412053
b'. Her' 0.4450096913002978
b'her husband' 0.45405887756524654
b"' s" 0.9562587141318204
b'. But' 0.4872164107824855
b'would have' 0.4535230220036108
b'; and' 0.5300099823509559
b'well -' 0.40938330491059133
b', and' 0.43983091376412053
b'Sir John' 0.9462146078941119
b', and' 0.43983091376412053
b'Lady Middleton' 0.9846252550902825
b'years old' 0.5652066012338234
b'to be' 0.41990744680061143
b', and' 0.43983091376412053
b'her ladyship' 0.4153375783137343
b'at home' 0.4513263124191663
b'to be' 

b'Colonel Brandon' 0.9487658632212199
b'much greater' 0.4070723622175507
b'; but' 0.5638928643646233
b'can hardly' 0.5026287905578061
b'" Did' 0.4068853197986566
b'?" "' 0.5943533305092514
b'," said' 0.8061628790313721
b'her mother' 0.45557817465871986
b'must be' 0.45471532823910565
b'; and' 0.5300099823509559
b'my life' 0.44759051852339776
b'has been' 0.5358774592015813
b'." "' 0.693249204903288
b'very well' 0.4824602538360517
b'Colonel Brandon' 0.9487658632212199
b'. He' 0.46418457637624266
b'. But' 0.4872164107824855
b'thirty -' 0.4618195346562432
b'." "' 0.693249204903288
b'Perhaps ,"' 0.42049527375716217
b'said Elinor' 0.4433341026134319
b'thirty -' 0.4618195346562432
b'any thing' 0.6177683619092494
b'. But' 0.4872164107824855
b'to be' 0.41990744680061143
b'I should' 0.4015548692731018
b'Colonel Brandon' 0.9487658632212199
b"' s" 0.9562587141318204
b'thirty -' 0.4618195346562432
b'." "' 0.693249204903288
b'," said' 0.8061628790313721
b'a moment' 0.4322659708717282
b', and' 0.43983

b'." "' 0.693249204903288
b'will be' 0.43718059461004055
b'," said' 0.8061628790313721
b'Sir John' 0.9462146078941119
b'will be' 0.43718059461004055
b'will be' 0.43718059461004055
b', and' 0.43983091376412053
b'." "' 0.693249204903288
b'That is' 0.44339787744106685
b'Sir John' 0.9462146078941119
b'," said' 0.8061628790313721
b'; and' 0.5300099823509559
b"' s" 0.9562587141318204
b'a man' 0.40344149002006935
b'. Their' 0.4110816344570768
b'; and' 0.5300099823509559
b'Sir John' 0.9462146078941119
b'did not' 0.5894846130208785
b'; but' 0.5638928643646233
b', and' 0.43983091376412053
b'" Ay' 0.4109626854947605
b'you will' 0.4045135619029827
b'I dare' 0.5143957874927514
b', and' 0.43983091376412053
b'very well' 0.4824602538360517
b'can tell' 0.4473824648244856
b'in spite' 0.5081296456779361
b"Marianne '" 0.40014254854975595
b'next morning' 0.695290818312787
b'. He' 0.46418457637624266
b'Mrs .' 0.6388509199633257
b'more than' 0.5843156278981673
b'Sir John' 0.9462146078941119
b"' s" 0.95625871

b'; and' 0.5300099823509559
b', and' 0.43983091376412053
b'could not' 0.5162635917428708
b'. She' 0.5044940325692563
b'; and' 0.5300099823509559
b'or twice' 0.5264738215369208
b'self -' 0.6798910137031118
b'. But' 0.4872164107824855
b'; and' 0.5300099823509559
b'the same' 0.49164836691948793
b'; and' 0.5300099823509559
b'she had' 0.426031866576212
b'Every thing' 0.5863584601356592
b'Every thing' 0.5863584601356592
b'the park' 0.4142538797311001
b'they were' 0.5662775820264453
b'; and' 0.5300099823509559
b'any body' 0.4852508261367605
b'; but' 0.5638928643646233
b'could not' 0.5162635917428708
b', and' 0.43983091376412053
b'Mrs .' 0.6388509199633257
b'. Her' 0.4450096913002978
b', and' 0.43983091376412053
b'to be' 0.41990744680061143
b'she had' 0.426031866576212
b'bestowed on' 0.4885232063833453
b"' s" 0.9562587141318204
b'. Her' 0.4450096913002978
b'so much' 0.5109416447650224
b'. They' 0.42532703744684097
b'amends for' 0.47101545853434357
b'she had' 0.426031866576212
b'Lady Middleton'

b', and' 0.43983091376412053
b'every body' 0.6472900585504116
b'; and' 0.5300099823509559
b'. But' 0.4872164107824855
b'. She' 0.5044940325692563
b'convinced that' 0.41134810026909513
b'fixed on' 0.49609725013748157
b'could not' 0.5162635917428708
b'Mrs .' 0.6388509199633257
b'; but' 0.5638928643646233
b'may be' 0.4538330150177461
b'." "' 0.693249204903288
b'," replied' 0.6283724536996631
b'told me' 0.5150762628107255
b'." This' 0.41657629224438186
b', and' 0.43983091376412053
b'" Oh' 0.5061969002074319
b'let us' 0.5865246566124922
b'," said' 0.8061628790313721
b'Mrs .' 0.6388509199633257
b"' s" 0.9562587141318204
b'?" "' 0.5943533305092514
b"ma '" 0.6048068868088367
b'. But' 0.4872164107824855
b'very well' 0.4824602538360517
b'; and' 0.5300099823509559
b'know where' 0.4334845463791907
b'." "' 0.693249204903288
b'his own' 0.428604982060435
b'at Norland' 0.4646236206342064
b'to be' 0.41990744680061143
b'. He' 0.46418457637624266
b'I dare' 0.5143957874927514
b'." "' 0.693249204903288
b'.

b'young ladies' 0.7154425412087156
b'." "' 0.693249204903288
b'" Oh' 0.5061969002074319
b'; and' 0.5300099823509559
b'I dare' 0.5143957874927514
b'Sir John' 0.9462146078941119
b'they were' 0.5662775820264453
b'; and' 0.5300099823509559
b'. The' 0.5112902108414912
b"' s" 0.9562587141318204
b', and' 0.43983091376412053
b'. He' 0.46418457637624266
b'the park' 0.4142538797311001
b', and' 0.43983091376412053
b'they were' 0.5662775820264453
b'; and' 0.5300099823509559
b'did not' 0.5894846130208785
b'. They' 0.42532703744684097
b'delighted with' 0.46333816171572284
b'; but' 0.5638928643646233
b'. It' 0.45091216045154525
b', and' 0.43983091376412053
b'every body' 0.6472900585504116
b', and' 0.43983091376412053
b'Sir John' 0.9462146078941119
b'Miss Dashwoods' 0.6962706131497786
b'Mrs .' 0.6388509199633257
b"' s" 0.9562587141318204
b'; and' 0.5300099823509559
b', and' 0.43983091376412053
b'I have' 0.4273545238545112
b'in spite' 0.5081296456779361
b'know where' 0.4334845463791907
b', and' 0.43983

b'I should' 0.4015548692731018
b'." "' 0.693249204903288
b'," said' 0.8061628790313721
b'?" "' 0.5943533305092514
b'." "' 0.693249204903288
b'do not' 0.43052961219856084
b'," replied' 0.6283724536996631
b'? What' 0.45506458720505905
b'tell you' 0.4314723219158311
b'? What' 0.45506458720505905
b'; but' 0.5638928643646233
b'" Do' 0.4508242722544636
b'you know' 0.4568852596794373
b'Do not' 0.40220457168638446
b'you know' 0.4568852596794373
b'does not' 0.4492345057412554
b', and' 0.43983091376412053
b'no answer' 0.4831248864546338
b'. His' 0.4365673921107498
b'some time' 0.5338618556878509
b'. His' 0.4365673921107498
b'his own' 0.428604982060435
b'. It' 0.45091216045154525
b'the same' 0.49164836691948793
b'she had' 0.426031866576212
b'no doubt' 0.43358623875422264
b'; but' 0.5638928643646233
b'; and' 0.5300099823509559
b'. He' 0.46418457637624266
b'- room' 0.479052207743605
b'next morning' 0.695290818312787
b'; and' 0.5300099823509559
b'. But' 0.4872164107824855
b', and' 0.4398309137641205

b'I am' 0.6301218872108818
b'. But' 0.4872164107824855
b'. They' 0.42532703744684097
b'a great' 0.4160463029398818
b'. The' 0.5112902108414912
b'to be' 0.41990744680061143
b'young men' 0.668858469754086
b', and' 0.43983091376412053
b'. But' 0.4872164107824855
b'at length' 0.4444766598209814
b'might be' 0.4025483528650405
b'the whole' 0.4182235497798612
b'to be' 0.41990744680061143
b', and' 0.43983091376412053
b'young man' 0.6536836157022572
b'at Oxford' 0.46258417695996856
b'have been' 0.6094572622119955
b'ever since' 0.533785583359924
b'." "' 0.693249204903288
b'I suppose' 0.44266324364511533
b'will be' 0.43718059461004055
b'," said' 0.8061628790313721
b'Mrs .' 0.6388509199633257
b'your own' 0.42630012735428585
b'will be' 0.43718059461004055
b', and' 0.43983091376412053
b"' s" 0.9562587141318204
b'." "' 0.693249204903288
b'will be' 0.43718059461004055
b'," said' 0.8061628790313721
b'to be' 0.41990744680061143
b'. In' 0.42363682322819096
b'every thing' 0.6660144460212637
b'." "' 0.6932

b'Mr .' 0.5349475102611524
b'Mrs .' 0.6388509199633257
b', and' 0.43983091376412053
b'any other' 0.4501584758860145
b'. They' 0.42532703744684097
b', and' 0.43983091376412053
b'to be' 0.41990744680061143
b'. But' 0.4872164107824855
b'Sir John' 0.9462146078941119
b'would not' 0.4243830217611643
b'Lady Middleton' 0.9846252550902825
b'did not' 0.5894846130208785
b'Mrs .' 0.6388509199633257
b'Mrs .' 0.6388509199633257
b'; and' 0.5300099823509559
b'young ladies' 0.7154425412087156
b'as soon' 0.4587326731575739
b'they were' 0.5662775820264453
b'to be' 0.41990744680061143
b'; but' 0.5638928643646233
b'we are' 0.48474356661787155
b'the park' 0.4142538797311001
b'." "' 0.693249204903288
b'to be' 0.41990744680061143
b'," said' 0.8061628790313721
b'a few' 0.5103586742524525
b'. The' 0.5112902108414912
b'We must' 0.40241771817995675
b'Miss Dashwoods' 0.6962706131497786
b'drawing -' 0.6047123634102674
b'the park' 0.4142538797311001
b'next day' 0.6240426548127025
b'Mrs .' 0.6388509199633257
b'good h

b'they were' 0.5662775820264453
b'acquainted with' 0.5324281930616914
b'" Oh' 0.5061969002074319
b'," replied' 0.6283724536996631
b'Mrs .' 0.6388509199633257
b'; but' 0.5638928643646233
b'I have' 0.4273545238545112
b'in town' 0.462481215943391
b'to be' 0.41990744680061143
b'at Barton' 0.5099452777349327
b'at Allenham' 0.4455077175771628
b';-- but' 0.4359243981044014
b'my uncle' 0.447658979035532
b'I dare' 0.5143957874927514
b'a great' 0.4160463029398818
b'have been' 0.6094572622119955
b'. He' 0.46418457637624266
b'I believe' 0.4518104820153916
b'; but' 0.5638928643646233
b'so much' 0.5109416447650224
b'do not' 0.43052961219856084
b'Mr .' 0.5349475102611524
b'you know' 0.4568852596794373
b', and' 0.43983091376412053
b'such a' 0.4502731675622843
b'very well' 0.4824602538360517
b'your sister' 0.5781157347281806
b'I am' 0.6301218872108818
b'monstrous glad' 0.7109389762623912
b'I shall' 0.4679987624187578
b'you know' 0.4568852596794373
b'." "' 0.693249204903288
b'Upon my' 0.5633126028601719

b'very true' 0.40347232296494095
b', and' 0.43983091376412053
b'a great' 0.4160463029398818
b'will be' 0.43718059461004055
b'delighted with' 0.46333816171572284
b'I am' 0.6301218872108818
b'. They' 0.42532703744684097
b'the whole' 0.4182235497798612
b'How can' 0.4745725418633536
b'they are' 0.47502166333664075
b'you know' b'would be' 0.4122234816516538
b'to be' 0.41990744680061143
b', and' 0.43983091376412053
b'. But' 0.4872164107824855
b'have been' 0.6094572622119955
b'let us' 0.5865246566124922
b', and' 0.43983091376412053
b'Miss Dashwood' 0.688656519657195
b'so much' 0.5109416447650224
b'." "' 0.693249204903288
b"ma '" 0.6048068868088367
b'," said' 0.8061628790313721
b', and' 0.43983091376412053
b'I am' 0.6301218872108818
b'capable of' 0.4015024033144336
b'to be' 0.41990744680061143
b'able to' 0.43416154330515977
b'. But' 0.4872164107824855
b', and' 0.43983091376412053
b'to be' 0.41990744680061143
b'Oh !' 0.6307054879986161
b'. It' 0.45091216045154525
b'Mrs .' 0.6388509199633257
b'M

b"' s" 0.9562587141318204
b', and' 0.43983091376412053
b'a great' 0.4160463029398818
b'in town' 0.462481215943391
b'to be' 0.41990744680061143
b'less than' 0.4276292948509782
b'two hours' 0.5003590564125457
b'their arrival' 0.4962222320217063
b'her mother' 0.45557817465871986
b', and' 0.43983091376412053
b'sat down' 0.7697691705928612
b'. In' 0.42363682322819096
b'a few' 0.5103586742524525
b'the same' 0.49164836691948793
b'I am' 0.6301218872108818
b'," said' 0.8061628790313721
b'or two' 0.49700540204912663
b'?" "' 0.5943533305092514
b'I am' 0.6301218872108818
b'," replied' 0.6283724536996631
b', and' 0.43983091376412053
b'no more' 0.4102937839422034
b'; and' 0.5300099823509559
b'must be' 0.45471532823910565
b', and' 0.43983091376412053
b"Marianne '" 0.40014254854975595
b'few minutes' 0.683801180281663
b'no more' 0.4102937839422034
b', and' 0.43983091376412053
b'; and' 0.5300099823509559
b'at once' 0.48241102494193366
b'. Her' 0.4450096913002978
b'; but' 0.5638928643646233
b'her sister'

b"' s" 0.9562587141318204
b'I dare' 0.5143957874927514
b'we shall' 0.5609569605322652
b'Sir John' 0.9462146078941119
b'Lady Middleton' 0.9846252550902825
b'in town' 0.462481215943391
b'." "' 0.693249204903288
b'my dear' 0.6194679929982065
b'her own' 0.45918368894467154
b'." "' 0.693249204903288
b"' s" 0.9562587141318204
b'might be' 0.4025483528650405
b', and' 0.43983091376412053
b'could not' 0.5162635917428708
b', and' 0.43983091376412053
b'. The' 0.5112902108414912
b'Mrs .' 0.6388509199633257
b"' s" 0.9562587141318204
b'in town' 0.462481215943391
b'; and' 0.5300099823509559
b"Don '" 0.5190868701057512
b'can hardly' 0.5026287905578061
b'. It' 0.45091216045154525
b'. The' 0.5112902108414912
b'will be' 0.43718059461004055
b'a moment' 0.4322659708717282
b', and' 0.43983091376412053
b'we shall' 0.5609569605322652
b'; but' 0.5638928643646233
b', and' 0.43983091376412053
b', and' 0.43983091376412053
b'. The' 0.5112902108414912
b'Miss Dashwoods' 0.6962706131497786
b'to be' 0.41990744680061143

b'the same' 0.49164836691948793
b'more than' 0.5843156278981673
b'. She' 0.5044940325692563
b'she had' 0.426031866576212
b'been informed' 0.4497704745026794
b'each other' 0.7150900506380493
b'their mutual' 0.4510958076220618
b'she had' 0.426031866576212
b'no doubt' 0.43358623875422264
b', and' 0.43983091376412053
b'. He' 0.46418457637624266
b', and' 0.43983091376412053
b'his seat' 0.43174281956608906
b', and' 0.43983091376412053
b'your sister' 0.5781157347281806
b'took leave' 0.5324613321912962
b', and' 0.43983091376412053
b'went away' 0.5809639978800969
b'Colonel Brandon' 0.9487658632212199
b"' s" 0.9562587141318204
b', and' 0.43983091376412053
b'or four' 0.4451654232110011
b'she had' 0.426031866576212
b'her mother' 0.45557817465871986
b'. They' 0.42532703744684097
b'Lady Middleton' 0.9846252550902825
b'Mrs .' 0.6388509199633257
b'; and' 0.5300099823509559
b', and' 0.43983091376412053
b'. She' 0.5044940325692563
b'drawing -' 0.6047123634102674
b'Lady Middleton' 0.9846252550902825
b"' 

b', and' 0.43983091376412053
b'know what' 0.42310674567031237
b'the door' 0.4085907107991782
b', and' 0.43983091376412053
b'or three' 0.480303394426024
b'without saying' 0.48139903291752634
b'; and' 0.5300099823509559
b', and' 0.43983091376412053
b'at first' 0.42395846665561443
b"Marianne '" 0.40014254854975595
b'. The' 0.5112902108414912
b', and' 0.43983091376412053
b'some time' 0.5338618556878509
b"' s" 0.9562587141318204
b'; and' 0.5300099823509559
b', and' 0.43983091376412053
b"' s" 0.9562587141318204
b'I have' 0.4273545238545112
b'I am' 0.6301218872108818
b'last night' 0.589865677786315
b'did not' 0.5894846130208785
b'; and' 0.5300099823509559
b'I am' 0.6301218872108818
b'assure you' 0.5403608031347624
b'have been' 0.6094572622119955
b'I shall' 0.4679987624187578
b'reflect on' 0.47549722003307554
b', and' 0.43983091376412053
b'; but' 0.5638928643646233
b'I have' 0.4273545238545112
b'more than' 0.5843156278981673
b'I shall' 0.4679987624187578
b'I should' 0.4015548692731018
b'you wi

b'. The' 0.5112902108414912
b'may be' 0.4538330150177461
b'the world' 0.48263276871731303
b'may be' 0.4538330150177461
b'I cannot' 0.4231591085011541
b'must be' 0.45471532823910565
b'they are' 0.47502166333664075
b'." "' 0.693249204903288
b"mother '" 0.41745134524422184
b's sake' 0.43059072990149333
b'more than' 0.5843156278981673
b'my own' 0.42167170898110257
b'. But' 0.4872164107824855
b'I am' 0.6301218872108818
b'Oh !' 0.6307054879986161
b'they were' 0.5662775820264453
b'without knowing' 0.5109009030113832
b'; and' 0.5300099823509559
b'took up' 0.4691428815942414
b"' s" 0.9562587141318204
b', and' 0.43983091376412053
b'It is' 0.5103821207462266
b'too much' 0.5277521915033289
b'might have' 0.43927129894689865
b'told me' 0.5150762628107255
b'bestowed on' 0.4885232063833453
b'That is' 0.44339787744106685
b'?" "' 0.5943533305092514
b'." "' 0.693249204903288
b'And yet' 0.44534297642255777
b'have been' 0.6094572622119955
b'have been' 0.6094572622119955
b', and' 0.43983091376412053
b'Oh !'

b'. But' 0.4872164107824855
b'in spite' 0.5081296456779361
b'. It' 0.45091216045154525
b'to be' 0.41990744680061143
b'you know' 0.4568852596794373
b'quite out' 0.4021477651483763
b'too much' 0.5277521915033289
b'Colonel Brandon' 0.9487658632212199
b'must be' 0.45471532823910565
b'; and' 0.5300099823509559
b'. It' 0.45091216045154525
b'. In' 0.42363682322819096
b'will be' 0.43718059461004055
b'That is' 0.44339787744106685
b'very much' 0.4380232378242788
b'I assure' 0.4015749299878283
b'her mother' 0.45557817465871986
b'Mrs .' 0.6388509199633257
b'good -' 0.4899138752920593
b'I am' 0.6301218872108818
b'great pleasure' 0.41569865441723813
b'would not' 0.4243830217611643
b'would be' 0.4122234816516538
b'the same' 0.49164836691948793
b'And yet' 0.44534297642255777
b'." "' 0.693249204903288
b'Mr .' 0.5349475102611524
b'Edward Ferrars' 0.41781923867578347
b'," said' 0.8061628790313721
b'to be' 0.41990744680061143
b'?" "' 0.5943533305092514
b'It is' 0.5103821207462266
b'there is' 0.42424122463

b'dared not' 0.47187423538359236
b"Bartlett '" 0.5042557576808325
b's Buildings' 0.5179428186911491
b', and' 0.43983091376412053
b'their mutual' 0.4510958076220618
b'to be' 0.41990744680061143
b'at present' 0.4365941335898709
b'in town' 0.462481215943391
b'short time' 0.5035642117292565
b'in Berkeley' 0.40935071661900574
b"' s" 0.9562587141318204
b'; and' 0.5300099823509559
b'still more' 0.47374319475770343
b'she had' 0.426031866576212
b'. The' 0.5112902108414912
b'delighted with' 0.46333816171572284
b'; and' 0.5300099823509559
b'soon after' 0.4223248513291634
b'their acquaintance' 0.4045369464560328
b'in Harley' 0.4157574127075131
b'three months' 0.6181376995330952
b'. Their' 0.4110816344570768
b'Mrs .' 0.6388509199633257
b', and' 0.43983091376412053
b'John Dashwood' 0.587699653848866
b'Colonel Brandon' 0.9487658632212199
b'to be' 0.41990744680061143
b'Miss Dashwoods' 0.6962706131497786
b'some surprise' 0.4830963387997464
b'. They' 0.42532703744684097
b'Mrs .' 0.6388509199633257
b'; b

b"' s" 0.9562587141318204
b', and' 0.43983091376412053
b', and' 0.43983091376412053
b'" Dear' 0.4582476407426849
b"don '" 0.5586779212189561
b"Don '" 0.5190868701057512
b'no more' 0.4102937839422034
b'quite overcome' 0.590945127452901
b', and' 0.43983091376412053
b"' s" 0.9562587141318204
b"' s" 0.9562587141318204
b', and' 0.43983091376412053
b'almost every' 0.45458059731103795
b'Colonel Brandon' 0.9487658632212199
b'without knowing' 0.5109009030113832
b'Mrs .' 0.6388509199633257
b'" Ah' 0.4109626854947605
b'; and' 0.5300099823509559
b'Sir John' 0.9462146078941119
b'his seat' 0.43174281956608906
b'Lucy Steele' 0.555056628384462
b', and' 0.43983091376412053
b'the whole' 0.4182235497798612
b'. In' 0.42363682322819096
b'a few' 0.5103586742524525
b'put an' 0.518296865530678
b', and' 0.43983091376412053
b'had passed' 0.40875353818004434
b'the whole' 0.4182235497798612
b'!" said' 0.45033895761490567
b'Colonel Brandon' 0.9487658632212199
b'low voice' 0.7469926696671594
b'as soon' 0.4587326731

b', and' 0.43983091376412053
b'something else' 0.5764361320020666
b'such a' 0.4502731675622843
b'in Harley' 0.4157574127075131
b'I have' 0.4273545238545112
b'their mutual' 0.4510958076220618
b', and' 0.43983091376412053
b'they were' 0.5662775820264453
b'But why' 0.47344604427990195
b'?" "' 0.5943533305092514
b'." "' 0.693249204903288
b'to be' 0.41990744680061143
b'?" "' 0.5943533305092514
b'," cried' 0.6142347348788296
b'young men' 0.668858469754086
b'I am' 0.6301218872108818
b'Harley Street' 0.8862155149904902
b'the world' 0.48263276871731303
b', and' 0.43983091376412053
b'. He' 0.46418457637624266
b', and' 0.43983091376412053
b'any body' 0.4852508261367605
b', and' 0.43983091376412053
b'must be' 0.45471532823910565
b'those who' 0.44122641022592474
b'present case' 0.5599473121248175
b'to be' 0.41990744680061143
b'ill -' 0.6236923795747431
b', and' 0.43983091376412053
b'got up' 0.5851055182619299
b'go away' 0.407448761369466
b'!" said' 0.45033895761490567
b'my dear' 0.6194679929982065


b'did not' 0.5894846130208785
b'soon afterwards' 0.6423546487520422
b', and' 0.43983091376412053
b'Mrs .' 0.6388509199633257
b', and' 0.43983091376412053
b'I shall' 0.4679987624187578
b'very much' 0.4380232378242788
b'Mrs .' 0.6388509199633257
b'delighted with' 0.46333816171572284
b', and' 0.43983091376412053
b'such a' 0.4502731675622843
b'able to' 0.43416154330515977
b', and' 0.43983091376412053
b'go away' 0.407448761369466
b'passed between' 0.5198567287533165
b'I have' 0.4273545238545112
b'," said' 0.8061628790313721
b'Mr .' 0.5349475102611524
b'he has' 0.40927299249678084
b'young woman' 0.5384851507000227
b'two young' 0.4013264396740548
b'each other' 0.7150900506380493
b'Mrs .' 0.6388509199633257
b'does not' 0.4492345057412554
b'know what' 0.42310674567031237
b'may be' 0.4538330150177461
b'I have' 0.4273545238545112
b'Mr .' 0.5349475102611524
b'or three' 0.480303394426024
b'in Harley' 0.4157574127075131
b', and' 0.43983091376412053
b'. He' 0.46418457637624266
b'young man' 0.65368361

b'. He' 0.46418457637624266
b'Mrs .' 0.6388509199633257
b'the door' 0.4085907107991782
b'; and' 0.5300099823509559
b'by saying' 0.42190274328600175
b'Miss Dashwood' 0.688656519657195
b', and' 0.43983091376412053
b'might be' 0.4025483528650405
b'at least' 0.6236333254738505
b'. Her' 0.4450096913002978
b'. She' 0.5044940325692563
b', and' 0.43983091376412053
b'to be' 0.41990744680061143
b'acquainted with' 0.5324281930616914
b'she had' 0.426031866576212
b', and' 0.43983091376412053
b'she had' 0.426031866576212
b'some minutes' 0.4186851625422356
b'. He' 0.46418457637624266
b'; and' 0.5300099823509559
b'sat down' 0.7697691705928612
b'first coming' 0.4666731296150237
b'could not' 0.5162635917428708
b'; but' 0.5638928643646233
b'to be' 0.41990744680061143
b'as soon' 0.4587326731575739
b'any thing' 0.6177683619092494
b'Mrs .' 0.6388509199633257
b'told me' 0.5150762628107255
b'," said' 0.8061628790313721
b'at least' 0.6236333254738505
b'such a' 0.4502731675622843
b'the same' 0.49164836691948793

b'her husband' 0.45405887756524654
b'. He' 0.46418457637624266
b'great pleasure' 0.41569865441723813
b'had been' 0.5263227074804739
b'in Berkeley' 0.40935071661900574
b', and' 0.43983091376412053
b'would be' 0.4122234816516538
b'very glad' 0.4518605024162352
b'. They' 0.42532703744684097
b'up stairs' 0.6205476931837914
b'drawing -' 0.6047123634102674
b'her own' 0.45918368894467154
b'I suppose' 0.44266324364511533
b'," said' 0.8061628790313721
b'I am' 0.6301218872108818
b'the world' 0.48263276871731303
b'far from' 0.4367772046563632
b'would not' 0.4243830217611643
b'I am' 0.6301218872108818
b'I have' 0.4273545238545112
b'Colonel Brandon' 0.9487658632212199
b"' s" 0.9562587141318204
b', and' 0.43983091376412053
b'on purpose' 0.4755321717292636
b'." "' 0.693249204903288
b'It is' 0.5103821207462266
b'Colonel Brandon' 0.9487658632212199
b'." "' 0.693249204903288
b'between them' 0.47402723496631644
b'such a' 0.4502731675622843
b'?" "' 0.5943533305092514
b'." "' 0.693249204903288
b'have been'

b'she had' 0.426031866576212
b'her husband' 0.45405887756524654
b', and' 0.43983091376412053
b'every thing' 0.6660144460212637
b'in Harley' 0.4157574127075131
b"' s" 0.9562587141318204
b'so far' 0.45021304971922055
b'without any' 0.4665443453341391
b', and' 0.43983091376412053
b'Colonel Brandon' 0.9487658632212199
b"' s" 0.9562587141318204
b'or two' 0.49700540204912663
b'in town' 0.462481215943391
b'to be' 0.41990744680061143
b'all things' 0.4437055758583848
b'at Delaford' 0.42581010687812676
b'. It' 0.45091216045154525
b'Mrs .' 0.6388509199633257
b', and' 0.43983091376412053
b'Berkeley Street' 0.8795446742933583
b'set out' 0.4772260684338886
b'they were' 0.5662775820264453
b'to be' 0.41990744680061143
b'more than' 0.5843156278981673
b'two days' 0.5066330891761408
b'their journey' 0.43933414550996996
b', and' 0.43983091376412053
b'Mr .' 0.5349475102611524
b'Colonel Brandon' 0.9487658632212199
b'at Cleveland' 0.42116746309955067
b'soon after' 0.4223248513291634
b'their arrival' 0.496222

b"' s" 0.9562587141318204
b'would be' 0.4122234816516538
b'; and' 0.5300099823509559
b'at once' 0.48241102494193366
b'at Cleveland' 0.42116746309955067
b'Miss Dashwood' 0.688656519657195
b'her sister' 0.4920177709974366
b'& c' 1.0
b'his own' 0.428604982060435
b'could not' 0.5162635917428708
b'Mrs .' 0.6388509199633257
b"' s" 0.9562587141318204
b'Mr .' 0.5349475102611524
b'able to' 0.43416154330515977
b'Miss Dashwood' 0.688656519657195
b'. She' 0.5044940325692563
b'she had' 0.426031866576212
b'their arrival' 0.4962222320217063
b'. It' 0.45091216045154525
b'Mrs .' 0.6388509199633257
b'; and' 0.5300099823509559
b'passed away' 0.51809625506954
b'Mr .' 0.5349475102611524
b"' s" 0.9562587141318204
b', and' 0.43983091376412053
b'the same' 0.49164836691948793
b'Mr .' 0.5349475102611524
b'every day' 0.5096224679475897
b', and' 0.43983091376412053
b'Miss Dashwood' 0.688656519657195
b'; but' 0.5638928643646233
b'no means' 0.5179698402567694
b'Mrs .' 0.6388509199633257
b', and' 0.43983091376412053

b', and' 0.43983091376412053
b'I should' 0.4015548692731018
b'Mr .' 0.5349475102611524
b"' s" 0.9562587141318204
b', and' 0.43983091376412053
b'a great' 0.4160463029398818
b'come back' 0.5611623113028302
b'." "' 0.693249204903288
b'tell you' 0.4314723219158311
b'?" "' 0.5943533305092514
b"ma '" 0.6048068868088367
b'. She' 0.5044940325692563
b', and' 0.43983091376412053
b'she had' 0.426031866576212
b'. She' 0.5044940325692563
b'young lady' 0.5859124678661557
b', and' 0.43983091376412053
b'." "' 0.693249204903288
b'Mr .' 0.5349475102611524
b'?" "' 0.5943533305092514
b"ma '" 0.6048068868088367
b'did not' 0.5894846130208785
b"' s" 0.9562587141318204
b'; and' 0.5300099823509559
b'Mrs .' 0.6388509199633257
b'the same' 0.49164836691948793
b'?" "' 0.5943533305092514
b"ma '" 0.6048068868088367
b'." "' 0.693249204903288
b'Do you' 0.4185196843099375
b'know where' 0.4334845463791907
b'?" "' 0.5943533305092514
b'Mrs .' 0.6388509199633257
b'told me' 0.5150762628107255
b'." "' 0.693249204903288
b'?" 

b"mother '" 0.41745134524422184
b'had already' 0.4152460975228856
b'more than' 0.5843156278981673
b'four years' 0.6191552605591046
b'. His' 0.4365673921107498
b'at Barton' 0.5099452777349327
b'. It' 0.45091216045154525
b'such a' 0.4502731675622843
b'might be' 0.4025483528650405
b'present case' 0.5599473121248175
b'so much' 0.5109416447650224
b'an opportunity' 0.5215034718445966
b', and' 0.43983091376412053
b'need not' 0.4570839436178804
b'sat down' 0.7697691705928612
b"o '" 0.5461985764648504
b'her mother' 0.45557817465871986
b"' s" 0.9562587141318204
b', and' 0.43983091376412053
b'. His' 0.4365673921107498
b'more than' 0.5843156278981673
b'. He' 0.46418457637624266
b'more than' 0.5843156278981673
b', and' 0.43983091376412053
b'. He' 0.46418457637624266
b'without any' 0.4665443453341391
b'at once' 0.48241102494193366
b'as soon' 0.4587326731575739
b'. He' 0.46418457637624266
b'such a' 0.4502731675622843
b'. His' 0.4365673921107498
b', and' 0.43983091376412053
b'It was' 0.439637578549109

b'I have' 0.4273545238545112
b'. But' 0.4872164107824855
b', and' 0.43983091376412053
b', and' 0.43983091376412053
b'so much' 0.5109416447650224
b'Colonel Brandon' 0.9487658632212199
b', and' 0.43983091376412053
b'so much' 0.5109416447650224
b'to be' 0.41990744680061143
b'between them' 0.47402723496631644
b'to be' 0.41990744680061143
b'. They' 0.42532703744684097
b'each other' 0.7150900506380493
b'two thousand' 0.6022886432354179
b', and' 0.43983091376412053
b'was impossible' 0.40967101756847807
b'Mrs .' 0.6388509199633257
b'; and' 0.5300099823509559
b'they were' 0.5662775820264453
b'fifty pounds' 0.7488928785002833
b'- year' 0.531958182518809
b'; and' 0.5300099823509559
b'. But' 0.4872164107824855
b'Miss Morton' 0.6990288494914197
b', and' 0.43983091376412053
b'had been' 0.5263227074804739
b'Mrs .' 0.6388509199633257
b"' s" 0.9562587141318204
b'Lucy Steele' 0.555056628384462
b"' s" 0.9562587141318204
b"Edward '" 0.4037491247270704
b'Colonel Brandon' 0.9487658632212199
b'Mrs .' 0.63885

b', and' 0.43983091376412053
b"Bartlett '" 0.5042557576808325
b's Buildings' 0.5179428186911491
b'. He' 0.46418457637624266
b'give up' 0.4297354245767897
b'; and' 0.5300099823509559
b'or two' 0.49700540204912663
b'. In' 0.42363682322819096
b', and' 0.43983091376412053
b"' s" 0.9562587141318204
b'. His' 0.4365673921107498
b', and' 0.43983091376412053
b'any other' 0.4501584758860145
b', and' 0.43983091376412053
b'his own' 0.428604982060435
b'; and' 0.5300099823509559
b'. He' 0.46418457637624266
b', and' 0.43983091376412053
b"mother '" 0.41745134524422184
b'. They' 0.42532703744684097
b'she had' 0.426031866576212
b'Mrs .' 0.6388509199633257
b"' s" 0.9562587141318204
b'. The' 0.5112902108414912
b'at first' 0.42395846665561443
b'; and' 0.5300099823509559
b'. But' 0.4872164107824855
b'self -' 0.6798910137031118
b"' s" 0.9562587141318204
b', and' 0.43983091376412053
b', and' 0.43983091376412053
b'soon afterwards' 0.6423546487520422
b'Mrs .' 0.6388509199633257
b'; and' 0.5300099823509559
b', a

In [9]:
# Exercise 2.3. Tokenize by unigrams and bigrams

# Initialize phrase tokenizer
bigram = Phraser(phrases)

sent="As dinner was not to be ready in less than two hours from their arrival,"
print(bigram[nltk.word_tokenize(sent.lower())])

['as', 'dinner', 'was', 'not', 'to_be', 'ready', 'in', 'less_than', 'two_hours', 'from', 'their_arrival', ',']
